About Ghostarchive/Questions

Ghostarchive is a web archiving website and platform. It is designed to be fast, versatile, and easy to use. An open-source release is planned soon.

I can be contacted at ghostarchive@ghostarchive.org

The following FAQ is inspired by the archive.today, archive.org, freezepage.com, and webcitation.org FAQs.

Which parts of a web page are saved?

  1. Textual content of the web page.
  2. Images.
  3. Content of the frames.
  4. Content and images loaded or generated by JavaScript on Web 2.0 sites.
  5. Videos from certain sites.

Webpages are stored in two different ways (technically three: two that are very similar and one that is completely different), so if a page doesn't work with one rendering method, you can always try another to see if it works.

Which parts of a web page are not saved?

  1. Flash and content loaded by Flash.
  2. PDF (support for this will be added soon).
  3. RSS and other XML pages are not saved reliably. Most of them are either not saved or saved as a blank page.

How long does it take to make a snapshot?

About as long as it takes to load the page in your browser. However, saving pages with heavy scripts or pages full of ads may take up to a few minutes. There is a 5-minute timeout: if the page is not fully loaded within 5 minutes, the save is considered failed. This does not happen often, but it happens.
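The timeout rule above can be sketched in Python as follows. This is only an illustration, not Ghostarchive's actual code: the function name `save_with_timeout` and the `load_page` callable are hypothetical, and only the 5-minute limit comes from the answer above.

```python
import asyncio

# From the FAQ: a save fails if the page has not loaded within 5 minutes.
SNAPSHOT_TIMEOUT_SECONDS = 5 * 60


async def save_with_timeout(load_page, timeout=SNAPSHOT_TIMEOUT_SECONDS):
    """Return True if load_page() finishes in time, False if it times out.

    load_page is a hypothetical coroutine function standing in for
    "load the page fully and store it".
    """
    try:
        await asyncio.wait_for(load_page(), timeout=timeout)
    except asyncio.TimeoutError:
        # The page did not finish loading in time: the save is considered failed.
        return False
    return True
```

A caller would run it with something like `asyncio.run(save_with_timeout(my_loader))`, where `my_loader` is whatever coroutine function performs the page load.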

Is there a limit on the page size?

The stored page with all its images must be smaller than 50 Mb. For video, the archiver tries to archive the highest quality available (up to 720p) that is also under 100 Mb. For example, if the 720p version is 90 Mb, it picks the 720p version. If the 720p version is 110 Mb, it picks the 480p version. If the 480p version is also 110 Mb, it picks the 360p version, and so on.
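The video selection rule described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the archiver's real implementation: the function name `pick_video_quality` and the input format (a mapping from video height to file size in Mb) are made up for the example. Only the 720p cap and the 100 Mb limit come from the answer above.

```python
from typing import Dict, Optional

MAX_HEIGHT = 720    # highest quality the archiver will try, per the FAQ
MAX_SIZE_MB = 100   # the video must be under this size, per the FAQ


def pick_video_quality(sizes_mb: Dict[int, float]) -> Optional[int]:
    """Pick the highest available quality (up to 720p) that is under 100 Mb.

    sizes_mb maps a video height (e.g. 720, 480, 360) to its file size in Mb.
    Returns the chosen height, or None if no version fits.
    """
    candidates = [
        height
        for height, size in sizes_mb.items()
        if height <= MAX_HEIGHT and size < MAX_SIZE_MB
    ]
    return max(candidates, default=None)
```

With the figures from the answer above, `pick_video_quality({720: 90, 480: 60})` selects 720, while `pick_video_quality({720: 110, 480: 110, 360: 40})` falls through to 360.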

How long will the page be stored?

Virtually forever. We have a lot of free space, and although the archive grows over time, storage and bandwidth keep getting cheaper.

Do you delete my stored page(s)?

Pages which violate our hosting provider's rules (cracks, porn, etc.) may be deleted. Also, completely empty pages (or pages which have nothing but text like “502 Server Timeout”) may be deleted. Empty videos with no content or looped content may also be deleted.

How is the archive funded?

It is privately funded; there are no complex finances behind it. It may look more or less reliable compared to startup-style funding or a university project, depending on which risks are taken into account.

Does it support any API?

Ghostarchive is in the process of adding support for the MementoWeb API.

Can I have an account to manage my bookmarks?

No. But you can keep bookmarks to archived pages in one of the existing bookmark managers, like Delicious, Google Bookmarks, …

Why does ghostarchive.org not obey robots.txt?

Because it is not a free-walking crawler. It saves only one page at a time, acting as a direct agent of the human user. Such services don't obey robots.txt (e.g. Google Feedfetcher, screenshot- or PDF-making services, isup.me, …).

I found incorrect/inaccurate/obsolete information. Can I request it to be altered or deleted?

The archive is neither a news agency nor an authoritative source of reference information. It merely certifies that a page existed on the web at a given point in time. The page might well contain a fairy tale, and although “One day Little Red Riding Hood goes to visit her granny” is a false statement, that is no reason to burn the books. Note that weather forecasts on archived pages are outdated as well.

My question is not here!

More questions and answers: https://ghostarchiving.tumblr.com