Comparison of web archiving services

The table below compares a number of web page archiving/snapshotting services I've used over the years. These are great for making an on-demand mirror of a web page that you can use later for a variety of purposes:

  • as a copy, in case the original page is no longer accessible for whatever reason
    • in particular, to cite scholarly articles and rest assured that the citation will work for years to come (this is the primary application of WebCite)
  • as proof of the state of a particular page at a particular point in time
    • since the copy of the page is stored by a third-party service, you could conceivably use it as legal evidence. For instance, if a web page is infringing on your intellectual property or impersonating you, you can take a web snapshot of it, which would carry far more weight than a local screenshot
  • to search within the archived copies of the web pages you've bookmarked this way; Pinboard (not compared below) offers this as a premium service

Candidates

Feature | Archive.is | WebCite | iterasi | freezePAGE | backupURL | Citebite | Rooh.it
Established | May 2012 | Feb 1999 | ? | ? | ? | ? | ?
Cost | free | free, considering premium features | ? | ? | ? | ? | ?
Expiration | none mentioned | no new material accepted after the end of 2013 unless the funding goal is met | ? | 31 days (!), except for premium accounts | the service itself is down, as of 2012-Sep-19 | ? | ?
Archive 404 pages | yes | no | yes | no | no | ? | ?
Archive pages requiring login | no | no | yes, using the browser extension | ? | no | ? | ?
Archive embedded elements | yes | yes | yes | expired; yes, within limitations | yes | ? | ?
Archive hashtag pages | yes | no | ? | ? | ? | ? | ?
Archive scripts (e.g. Google Maps) | yes, but scripting disabled, and page is 1024x768 | broken | broken, and scripting disabled | allegedly yes, but fails | no | no | no
Archive Twitter status pages | yes | text disappears | yes | expired; yes, with limitations | ? | ? | ?
Bookmarklet | yes | yes | yes | browser button | no | yes | yes, but didn't work
Browser extension | no | no | yes, IE7 and Firefox | ? | no | ? | ?
DOI support | no | assign/retrieve | no | no | no | no | no
Limitations | 1024 pixel width | none, by design |  | 750 KB for page + embedded elements; 5 MB storage space for unregistered users, 10 MB for registered users; accounts closed after 30 days of inactivity (unregistered users) or 60 days (registered users), and archived URLs are deleted along with the account; no SSL pages | ? | must enter some text that's found on the page, to highlight | 
Override robots tags | yes | no, no[1] (by design, to avoid copyright violations) | yes | yes | ? | ? | ?
Override no-cache tag | yes | no (by design, to avoid copyright violations) | yes, even when not using the browser extension | ? | ? | ? | ?
Organize pages | no | can enter keywords, but apparently can't search by them | tags, folders, by date, search, publish collection, RSS | folders | no | no | ?
Pop-up support (closing stays in archive) | no | yes | ? | ? | ? | ? | ?
Private archiving | no | always | optional | ? | ? | ? | ?
Signup required | no | no | required | optional, with benefits (10 MB storage space; must log in once every 31 days to keep the account active); otherwise identified by a browser cookie, with half the signed-up allowances | optional, no benefits | no | ?
Short URL | 15 characters | 32 characters | 21 characters | 42 characters | 27 characters | yes | ?
Transparent URLs[2] | all snapshots, not yours | yes | no | no | no | no | ?
Usability | great; progress indicator | must enter an e-mail address at which to receive archiving confirmation | must navigate to "Completed archives" after archiving a page and wait for queuing | ? | must enter email to archive; no PHP error page | good | never worked for me
View other snapshots of a URL | yes | yes | no | ? | ? | ? | ?
Will be around for years | ? | probably, if it manages to raise $50k | ? | ? | ? | ? | ?
Extra features | 1024x768 image screenshot for extra accuracy, live feed, all domains archived |  | tags, description | ? | ? | ? | ?
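
The "Override robots tags" row is about whether a service honors a site's robots exclusion rules (robots.txt). As a rough illustration of the check a robots-respecting archiver performs before fetching a page, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content, the example.com URLs, the user-agent name "ExampleArchiver" and the helper may_archive are all hypothetical; none of the services above necessarily implements the check this way.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that excludes all crawlers from /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_archive(page_url: str, robots_txt: str, user_agent: str = "ExampleArchiver") -> bool:
    """Return True if robots_txt allows user_agent to fetch page_url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, page_url)


if __name__ == "__main__":
    for url in ("https://example.com/public/page.html",
                "https://example.com/private/page.html"):
        verdict = "archive" if may_archive(url, ROBOTS_TXT) else "refuse"
        print(url, "->", verdict)
```

A service that overrides robots tags simply skips this check and fetches the page anyway.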

Footnotes


  1. When WebCite fails to archive certain pages, its notification email lists among the possible reasons for failure "The site in question refuses connections by crawling robots". In my view, overriding the robots.txt file is ethically justified here, because there is an actual user, not a crawler, behind the request.

  2. A transparent URL like http://www.webcitation.org/query?url=http%3A%2F%2Funcyclopedia.wikia.com%2Fwiki%2FNihilism&date=2009-08-23 embeds the original URL, so the original address can still be recovered if the archiving service is down. If a service with opaque URLs like https://iterasi.net/Viewer.aspx?RootAssetID=2961840 is down, there is no way of retrieving the original URL. One can bookmark transparent URLs with confidence that, if the original page outlasts the archiving service, its URL will still be available.
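
To make the difference concrete, here is a minimal Python sketch that recovers the original address from a transparent URL by reading its query string. It assumes, as in the WebCite example, that the original URL is carried in a query parameter named url; other services with transparent URLs may encode it differently, and the helper name original_url is hypothetical.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def original_url(archive_url: str, param: str = "url") -> Optional[str]:
    """Return the archived page's URL if archive_url carries it in a query
    parameter (a "transparent" URL); return None for opaque URLs."""
    # parse_qs percent-decodes values, so "%3A%2F%2F" becomes "://".
    values = parse_qs(urlsplit(archive_url).query).get(param)
    return values[0] if values else None


if __name__ == "__main__":
    webcite = ("http://www.webcitation.org/query?url=http%3A%2F%2F"
               "uncyclopedia.wikia.com%2Fwiki%2FNihilism&date=2009-08-23")
    iterasi = "https://iterasi.net/Viewer.aspx?RootAssetID=2961840"
    print(original_url(webcite))  # http://uncyclopedia.wikia.com/wiki/Nihilism
    print(original_url(iterasi))  # None
```

The opaque iterasi URL only carries an internal asset ID, so nothing about the original page can be recovered from it without the service itself.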
