Comparison of web archiving services

The table below compares a number of web page archiving (snapshotting) services I've used over the years. They let you make a mirror of a web page on demand, which you can use later for a variety of purposes:

  • as a copy, in case the original page is no longer accessible for whatever reason
    • in particular, to cite scholarly articles and rest assured that the citation will work for years to come (this is the primary application of WebCite)
  • as proof of the state of a particular page at a particular point in time
    • since the copy of the page is stored by a third-party service, you could conceivably use it as legal evidence. For instance, if a web page infringes on your intellectual property or impersonates you, you can take a snapshot of it, which would carry far more weight than a local screenshot
  • to search within the history of the web pages you've bookmarked this way (Pinboard, not compared below, offers this as a premium service)

Candidates

Feature | Wayback Machine | Archive.is | WebCite | iterasi | freezePAGE | Citebite | Rooh.it
--- | --- | --- | --- | --- | --- | --- | ---
Established | 1996 | May 2012 | Feb 1999 | ? | ? | ? | ?
Cost | free | free | free, considering premium features | ? | ? | ? | ?
Expiration | none | none mentioned | no new material accepted after the end of 2013 unless the funding goal is met | ? | 31 days (!), except for premium accounts | ? | ?
Archive 404 pages | ? | yes | no | yes | no | ? | ?
Archive pages requiring login | no | no | no | yes, using the browser extension | ? | ? | ?
Archive embedded elements | ? | yes | yes | yes | expired; yes, within limitations | ? | ?
Archive hashtag pages | ? | yes | no | ? | ? | ? | ?
Archive scripts (e.g. Google Maps) | Google Maps: no, because of robots.txt; can't archive Disqus comments | yes, but scripting is disabled and the page is captured at 1024x768 | broken | broken, and scripting disabled | allegedly yes, but fails | no | no
Archive Twitter status pages | ? | yes | text disappears | yes | expired; yes, with limitations | ? | ?
Bookmarklet | yes | yes | yes | yes | browser button | yes | yes, but it didn't work
Browser extension | ? | no | no | yes, IE7 and Firefox | ? | ? | ?
DOI support | ? | no | assign/retrieve | no | no | no | no
Limitations | ? | 1024-pixel width | none by design | | 750 KB for page + embedded elements; 5 MB storage space for unregistered users, 10 MB for registered users; accounts closed after 30 days of inactivity (unregistered users) or 60 days (registered users), and archived URLs are deleted along with the account; no SSL pages | must enter some text found on the page, to highlight | ?
Override robots tags | no | yes | no¹ (by design, to avoid copyright violations) | yes | yes | ? | ?
Override no-cache tag | ? | yes | no (by design, to avoid copyright violations) | yes, even without the browser extension | ? | ? | ?
Organize pages | no | no | can enter keywords, but apparently can't search by them | tags, folders, by date, search, publish collection, RSS | folders | no | ?
Pop-up support (closing stays in archive) | ? | no | yes | ? | ? | ? | ?
Private archiving | no | no | always | optional | ? | ? | ?
Signup required | no | no | no | yes | optional, with benefits (10 MB storage space; must log in once every 31 days to keep the account active); otherwise identified by a browser cookie, with half the signed-up allowances | no | ?
Short URL | ? | 15 characters | 32 characters | 21 characters | 42 characters | yes | ?
Transparent URLs² | yes | all snapshots, not yours | yes | no | no | no | ?
Usability | okay | great; progress indicator | must enter an e-mail address to receive the archiving confirmation | must navigate to "Completed archives" after archiving a page and wait for queuing | ? | good | never worked for me
View other snapshots of a URL | yes | yes | yes | no | ? | ? | ?
Will be around for years | probably | ? | probably, if it manages to raise $50k | ? | ? | ? | ?
Extra features | search | 1024x768 image screenshot for extra accuracy; link to any section within the page | emails you the archived URL; has tags, description | ? | ? | ? | ?

Defunct services

Footnotes


  1. When attempting to archive certain pages, WebCite's notification email lists among the possible reasons for failure "The site in question refuses connections by crawling robots". Overriding robots.txt in this case would be ethically justified, because an actual user, not an automated crawler, is behind the request.

  2. A transparent URL such as http://www.webcitation.org/query?url=http%3A%2F%2Funcyclopedia.wikia.com%2Fwiki%2FNihilism&date=2009-08-23 preserves the original URL in case the archiving service is down. If a service with opaque URLs such as https://iterasi.net/Viewer.aspx?RootAssetID=2961840 goes down, there is no way to retrieve the original URL. You can therefore bookmark transparent URLs confident that, if the original page outlasts the archiving service, its URL can still be recovered.
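To illustrate footnote 2, here is a minimal Python sketch that recovers the original URL from a transparent archive URL. It assumes the original is carried in a `url` query parameter, as in the WebCite example in that footnote; `original_url` is a hypothetical helper, not part of any archiving service's API.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def original_url(archive_url: str) -> Optional[str]:
    """Return the archived page's URL from a transparent archive URL.

    Assumes the original URL travels in a `url` query parameter, as in the
    WebCite example; opaque URLs (e.g. an asset ID) return None.
    """
    # parse_qs also percent-decodes the values (http%3A%2F%2F -> http://)
    params = parse_qs(urlsplit(archive_url).query)
    values = params.get("url")
    return values[0] if values else None


if __name__ == "__main__":
    transparent = ("http://www.webcitation.org/query?url=http%3A%2F%2F"
                   "uncyclopedia.wikia.com%2Fwiki%2FNihilism&date=2009-08-23")
    opaque = "https://iterasi.net/Viewer.aspx?RootAssetID=2961840"
    print(original_url(transparent))  # http://uncyclopedia.wikia.com/wiki/Nihilism
    print(original_url(opaque))       # None
```

The opaque iterasi URL only carries an internal asset ID, so nothing about the original page can be recovered from it without the service itself.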
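Footnote 1 concerns sites that refuse crawling robots. As a rough illustration of how a robots.txt file turns a crawler away, the sketch below uses Python's standard `urllib.robotparser` on two made-up robots.txt files. The file contents, the `ExampleArchiver` user agent, and the `crawler_allowed` helper are hypothetical; this is not how WebCite or any other service listed above is known to implement its check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that turns away every crawler, the situation
# behind a "refuses connections by crawling robots" failure.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
# Hypothetical robots.txt that only fences off one directory.
BLOCK_PRIVATE = "User-agent: *\nDisallow: /private/\n"


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from text, no network access
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://example.com/article.html"
    print(crawler_allowed(BLOCK_ALL, "ExampleArchiver", page))      # False
    print(crawler_allowed(BLOCK_PRIVATE, "ExampleArchiver", page))  # True
```

A service that honors robots.txt stops at the first result; one that overrides it, as some services in the table do, fetches the page anyway on the user's behalf.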
