About Ghostarchive

Ghostarchive is a free-to-use archiving website designed to be fast and easy to use.

If your question is not answered here, feel free to ask it through the Google Form or by email at [email protected].

How does Ghostarchive work?

The process is simple: enter the link of the page you would like archived, and Ghostarchive will store a snapshot of the website as it appeared at the time of archival. The snapshot will include any images and framed content. For some websites, videos are also saved.

There are two main archival replay systems. The first is based on ReplayWeb.page technology, which can execute scripts in a sandbox, allowing for "high-fidelity" snapshot replay. However, ReplayWeb.page makes use of Service Workers, which require JavaScript and an up-to-date browser. You also need to be able to connect to the HTTPS version of the website.

For readers who browse the web with JavaScript disabled, use the HTTP version of the site, or use a browser that does not support Service Workers, a second archival replay system is available and can be accessed from any archived page. This "noscript" system does not rely on JavaScript or any fancy web technologies. For the vast majority of sites, both replay systems work, but some sites will only work with the ReplayWeb.page library, and others will only work with the "noscript" replay system.

How long does archival take?

It depends on how many pages are in the queue, and on the complexity and size of the page being archived. Once your page leaves the queue, the archiver has up to 5 minutes to completely archive it.

Do you delete my stored page(s)?

Pages that violate the hosting provider's rules (cracks, porn, etc.) may be deleted. Completely empty pages (or pages that contain nothing but text like "502 Server Timeout") may also be deleted, along with videos with looped content.

Is there an API available?

Ghostarchive is in the process of adding support for the MementoWeb API. In the meantime, making direct web queries and scraping the result should suffice.

Are you keeping backups of the data? What are the steps being taken to ensure the data stays safe?

Webpages are duplicated three times, and videos are duplicated two times.

Are there storage limits on webpages being archived?

The stored page (including imported CSS files, fonts, images, etc.) must be smaller than 50 megabytes.
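As a rough illustration of the page limit, the check amounts to adding up the size of the page and everything it imports, then comparing the total against 50 megabytes. The sketch below is not Ghostarchive's actual code: the function name is hypothetical, and it assumes a megabyte means 1,000,000 bytes, which the FAQ does not specify.

```python
from typing import Iterable

# Assumption: "50 megabytes" is read as 50 * 1,000,000 bytes.
MAX_PAGE_BYTES = 50 * 1000 * 1000


def fits_page_limit(resource_sizes: Iterable[int]) -> bool:
    """Return True if the page plus its imported resources is under the limit.

    resource_sizes holds the size in bytes of the HTML document and of each
    imported file (CSS, fonts, images, and so on).
    """
    return sum(resource_sizes) < MAX_PAGE_BYTES
```

For example, a 2 MB document with 30 MB of images passes, while a page whose resources total exactly 50 MB does not, because the limit is "smaller than".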

For video, the archiver will try to archive the highest quality possible (up to 360p) that is under 100 megabytes. For example, if the 360p version is 90 megabytes, it will pick the 360p version. If the 360p version is 110 megabytes, it will pick the 240p version. If the 240p version is 110 megabytes, it will pick the 144p version. If the 144p version is 110 megabytes, the video will not be archived.
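The selection rule described above can be sketched as a short function. This is only an illustration of the stated rule, not the archiver's real implementation: the function name and the input format (a mapping from video height in pixels to file size in megabytes) are assumptions made for the example.

```python
from typing import Dict, Optional

MAX_HEIGHT = 360     # highest quality considered, per the FAQ
MAX_SIZE_MB = 100    # a version must be under this size, per the FAQ


def pick_video_version(sizes_mb: Dict[int, float]) -> Optional[int]:
    """Return the height (e.g. 360) of the version to archive, or None.

    sizes_mb maps each available height to its file size in megabytes.
    Versions are tried from highest to lowest quality; the first one that
    is at most 360p and under 100 MB wins.
    """
    for height in sorted(sizes_mb, reverse=True):
        if height <= MAX_HEIGHT and sizes_mb[height] < MAX_SIZE_MB:
            return height
    return None  # no version qualifies, so the video is not archived
```

Called with {360: 110, 240: 60, 144: 30}, the function returns 240, matching the second case in the FAQ. If every version is 110 megabytes, it returns None.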

Archiving livestreams or any video longer than 30 minutes may not work.

Can I have an account to manage archives, bookmarks, etc.?

No, there is currently no account system in place. All parts of the site can be used without signing up.

How long will archived content be stored?

Archived content will be stored indefinitely. After all, that is the point of an archive.

We currently have plenty of free space. In case of a storage shortage, the ability to store new archives will be disabled, but existing archived content will stay available (WebCite-style).

What is the purpose of Ghostarchive?

Most societies place importance on preserving artifacts of their culture and heritage. Without such artifacts, civilization has no memory and no mechanism to learn from its successes and failures. Our culture now produces more and more artifacts in digital form. The purpose of Ghostarchive is to help preserve those artifacts and showcase a comprehensive and reflective piece of history for the general public.

Do you collect all the sites on the Web?

No, we collect only publicly accessible Web pages. We do not archive pages that require a password to access, or pages that are only accessible after a person fills in and submits a form.

The above FAQ is inspired by the archive.today, archive.org, freezepage.com, and webcitation.org FAQs, with additional questions and information.

You may also find the Terms of Service useful.