About Ghostarchive/Questions

Ghostarchive is a web archiving website and platform. It is designed to be fast, versatile, and easy to use. An open-source release is planned for the near future.

I can be contacted at ghostarchive@ghostarchive.org

The following FAQ is inspired by the archive.today, archive.org, freezepage.com, and webcitation.org FAQs.

Which parts of the web page are saved? How does it work?

The archiving process is simple: Ghostarchive takes a snapshot of a cited web page and stores a copy of its HTML, including images and any other files (for example, PDFs), on the Ghostarchive server.

The following parts are saved:
  1. Textual content of the web page.
  2. Images, both linked directly and embedded in web pages.
  3. Content of frames.
  4. JavaScript-heavy sites.
  5. Videos from certain sites.
  6. PDF files.
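Saving a page therefore means storing its HTML plus the files it references. The following is a minimal Python sketch of that resource-discovery step, not Ghostarchive's actual code. It assumes a static parse with the standard-library html.parser, collects the URLs of images, frames, and embedded files, and resolves them against the page URL so they can be fetched. The tag/attribute mapping RESOURCE_ATTRS and the helper collect_resources are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical mapping: tag -> attribute that points at a file the snapshot
# would also need to store. Not Ghostarchive's actual rule set.
RESOURCE_ATTRS = {
    "img": "src",
    "iframe": "src",
    "frame": "src",
    "embed": "src",
    "source": "src",
    "object": "data",
}


class ResourceCollector(HTMLParser):
    """Collects absolute URLs of images, frames and embedded files in a page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.resources: list[str] = []

    def handle_starttag(self, tag, attrs):
        wanted = RESOURCE_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative paths against the page URL so they can be fetched later.
                self.resources.append(urljoin(self.base_url, value))


def collect_resources(html: str, base_url: str) -> list[str]:
    """Return the resource URLs referenced by `html`, in document order."""
    parser = ResourceCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.resources
```

For example, `collect_resources('<img src="/logo.png"><iframe src="frame.html"></iframe>', "https://example.com/a/page.html")` yields the logo and frame URLs resolved to absolute form. Content generated by scripts is not visible to a static parse like this one.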

Web pages are stored using two different rendering methods, so if a page doesn't display correctly with one method, you can always try the other.

How long does it take to make a snapshot?

About as long as it takes to load the page in your browser. Pages with video or many scripts may take longer. There is a 5-minute timeout: if the page is not fully loaded within 5 minutes, the save is considered to have failed. You can always try again.
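A timeout of this kind can be sketched with asyncio.wait_for. This is a minimal illustration under the assumption that the page load is an asynchronous operation (for example, driving a headless browser); it is not how Ghostarchive is actually implemented. The coroutine function load_page is a hypothetical stand-in for the real loader. On timeout the pending load is cancelled and the save is reported as failed by returning None.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Five-minute limit from the FAQ.
SNAPSHOT_TIMEOUT_SECONDS = 5 * 60


async def save_snapshot(
    load_page: Callable[[], Awaitable[str]],
    timeout: float = SNAPSHOT_TIMEOUT_SECONDS,
) -> Optional[str]:
    """Return the loaded page, or None if loading did not finish within `timeout` seconds.

    `load_page` is a hypothetical coroutine function standing in for the real page loader.
    """
    try:
        # wait_for cancels the pending load if the deadline passes.
        return await asyncio.wait_for(load_page(), timeout=timeout)
    except asyncio.TimeoutError:
        # The save is considered failed; the user can simply retry.
        return None
```

Returning None rather than raising keeps the "failed, try again" outcome explicit for the caller.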

Is there a limit on the page size?

The stored page, including all images, must be smaller than 50 MB. For video, the archiver tries to save the highest quality available, up to 720p, that is also under 100 MB. For example, if the 720p version is 90 MB, it picks the 720p version. If the 720p version is 110 MB, it picks the 480p version. If the 480p version is also 110 MB, it picks the 360p version, and so on.
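The selection rule above can be written as a short Python sketch. It assumes the available video formats are known up front as hypothetical (height, size in MB) pairs; the function and constant names are illustrative, not Ghostarchive's own. Among the formats that are at most 720p and under 100 MB, it picks the one with the highest resolution. A separate check covers the 50 MB page limit.

```python
from typing import List, Optional, Tuple

# Limits taken from the FAQ text; sizes are in megabytes (MB).
MAX_PAGE_MB = 50
MAX_VIDEO_MB = 100
MAX_VIDEO_HEIGHT = 720


def page_within_limit(total_mb: float) -> bool:
    """True if the stored page, including all images, is under the page limit."""
    return total_mb < MAX_PAGE_MB


def pick_video_format(formats: List[Tuple[int, float]]) -> Optional[Tuple[int, float]]:
    """Pick the highest-resolution format that is at most 720p and under 100 MB.

    `formats` is a hypothetical list of (height_in_pixels, size_in_mb) pairs,
    e.g. [(1080, 300.0), (720, 90.0), (480, 40.0)]. Returns None if nothing fits.
    """
    candidates = [
        (height, size)
        for height, size in formats
        if height <= MAX_VIDEO_HEIGHT and size < MAX_VIDEO_MB
    ]
    if not candidates:
        return None
    # Highest resolution wins among the formats that satisfy both limits.
    return max(candidates, key=lambda fmt: fmt[0])
```

With a 110 MB 720p file and a 60 MB 480p file, `pick_video_format([(720, 110.0), (480, 60.0)])` returns `(480, 60.0)`, matching the second example in the answer.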

How long will the page be stored?

Virtually forever. We have plenty of free space, and although the archive grows over time, storage and bandwidth keep getting cheaper.

Do you delete my stored page(s)?

Pages that violate our hosting provider's rules (cracks, porn, etc.) may be deleted. Completely empty pages (or pages that contain nothing but text like “502 Server Timeout”) may also be deleted, as may videos with no content or with looped content.

How is the archive funded?

It is privately funded; there are no complex finances behind it. Depending on which risks you take into account, this may look more or less reliable than startup-style funding or a university project.

Does it support an API?

Ghostarchive is in the process of adding support for the MementoWeb (Memento) API.
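Since Ghostarchive's Memento support is still in progress, no Ghostarchive endpoint is shown here. As a general illustration of the Memento protocol (RFC 7089): a client asks a TimeGate for the snapshot closest to a given moment by sending an Accept-Datetime header with an RFC 1123 date in GMT, and the TimeGate typically answers with a 302 redirect whose Location header points at the chosen memento. The sketch below only builds that request header; the function name is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def timegate_request_headers(when: datetime) -> dict:
    """Headers for a Memento (RFC 7089) TimeGate request for the snapshot closest to `when`.

    `when` must be timezone-aware. The helper name is hypothetical.
    """
    # RFC 7089 expects an RFC 1123 date in GMT, e.g. "Thu, 31 May 2007 20:35:00 GMT".
    utc = when.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(utc, usegmt=True)}
```

These headers would accompany a GET or HEAD request to whatever TimeGate URL the archive eventually publishes.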

Why are some sites harder to archive than others?

If you look at our collection of archived sites, you will find some broken pages, missing graphics, and some sites that aren't archived at all. Some of the things that may cause this are:

JavaScript -- JavaScript elements are often hard to archive, especially if they generate links whose full URL never appears in the page. And if the JavaScript needs to authenticate with the originating server in order to work, it will fail when archived.

Server-side image maps -- Like any other functionality on the web, if they need to contact the originating server in order to work, they will fail when archived.

Pages that are only reachable by clicking a button on an existing page may not be archived correctly either.

As a rule of thumb, simple HTML is the easiest to archive.
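To illustrate why script-generated links are a problem, here is a minimal Python sketch assuming an archiver that discovers links by statically parsing the saved HTML (an assumption for illustration, not a description of Ghostarchive's internals). The parser finds an ordinary link, but a URL assembled by JavaScript at run time never appears in the markup, so a static pass cannot find it. The names LinkCollector, static_links, and PAGE are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags as they appear in the static HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def static_links(html: str) -> list[str]:
    """Return the links visible without running any JavaScript."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page: one plain link, one link built by a script at run time.
# Script content is treated as raw text by html.parser, so the generated <a> is not seen.
PAGE = """
<a href="/about.html">About</a>
<script>
  var page = "gal" + "lery";
  document.write('<a href="/' + page + '.html">Gallery</a>');
</script>
"""
```

Here `static_links(PAGE)` returns only `["/about.html"]`; the gallery page would be missed unless the archiver actually executes the script.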

Can I have an account to manage my bookmarks?

No, but you can keep bookmarks to archived pages in any existing bookmark manager, such as Delicious, Google Bookmarks, …

Why does ghostarchive.org not obey robots.txt?

It is not a self-directed crawler: it saves only one page or video at a time, acting as a direct agent of a human user. Such services don't obey robots.txt (e.g. Google Feedfetcher, screenshot- or PDF-making services, isup.me, …).

The archived video does not play!

Try downloading the video and playing it locally.

I found incorrect, inaccurate, or obsolete information. Can I request that it be altered or deleted?

The archive is neither a news agency nor an authoritative source of reference information. It merely certifies that at a given point in time there was a page on the web. The page might well contain a fairy tale, and although “One day Little Red Riding Hood goes to visit her granny” is a false statement, that is no reason to burn the books. Note that weather forecasts on archived pages are outdated as well.

My question is not here!

More questions and answers: https://ghostarchiving.tumblr.com