Wayback Machine

“The Internet Archive Wayback Machine is a service that allows people to visit archived versions of Web sites. Visitors to the Wayback Machine can type in a URL, select a date range, and then begin surfing on an archived version of the Web. Imagine surfing circa 1999 and looking at all the Y2K hype, or revisiting an older version of your favorite Web site. The Internet Archive Wayback Machine can make all of this possible.” (https://help.archive.org/help/wayback-machine-general-information/)

“What may be of interest to its use in corpus approaches studies to language is not only its status as a vast historical collection of web data but also the ability to provide users a simple tool to create a copy of any (public) web content and store it inside the Wayback Machine database.” (CATLISM, 119-120)
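
The address of an archived version can also be composed by hand instead of going through the search form. The following is a minimal sketch, assuming the Wayback Machine's usual replay address pattern `https://web.archive.org/web/<timestamp>/<URL>` with the timestamp written as `YYYYMMDDhhmmss`. The function name `build_snapshot_url` is a hypothetical helper introduced here for illustration and is not part of any Internet Archive library.

```python
from datetime import datetime

# Assumption: replay addresses follow the pattern
# https://web.archive.org/web/<YYYYMMDDhhmmss>/<original URL>
REPLAY_PREFIX = "https://web.archive.org/web/"


def build_snapshot_url(page_url: str, when: datetime) -> str:
    """Return the replay address for `page_url` at (or near) the moment `when`."""
    timestamp = when.strftime("%Y%m%d%H%M%S")
    return f"{REPLAY_PREFIX}{timestamp}/{page_url}"


if __name__ == "__main__":
    # Hypothetical example: a page as it looked around the end of 1999.
    print(build_snapshot_url("http://example.com/", datetime(1999, 12, 31, 23, 59, 59)))
```

Opening the resulting address in a browser normally leads to the capture closest to the requested date, provided that at least one capture of the page exists.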

The procedure for saving preservation copies of online pages can be carried out manually (i.e. saving one URL at a time[2]), or can be automated by providing a spreadsheet containing a list of links (maximum 50,000). The latter option is only mentioned in the book (CATLISM, 120) and is described here in detail through step-by-step instructions and images (Figures 0.01-0.05).
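
The manual route (one URL at a time) can also be prepared from a script. The sketch below only builds the address that requests a capture, assuming the "Save Page Now" endpoint `https://web.archive.org/save/` followed by the page address. It does not send any request itself, and `build_save_url` is a hypothetical helper name used for illustration.

```python
from urllib.parse import urlsplit

# Assumption: a capture is requested by visiting this prefix followed by the page address.
SAVE_PREFIX = "https://web.archive.org/save/"


def build_save_url(page_url: str) -> str:
    """Return the address that asks the Wayback Machine to capture `page_url`."""
    parts = urlsplit(page_url)
    # Only absolute http(s) addresses make sense for a public web capture.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"Not an absolute http(s) URL: {page_url!r}")
    return SAVE_PREFIX + page_url


if __name__ == "__main__":
    # Hypothetical example address.
    print(build_save_url("https://example.com/"))
```

The resulting address can then be opened in a browser. For more than a handful of pages, the spreadsheet-based procedure described below is likely the more practical choice.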

  1. Browse to the “Batch process Google Sheets using archive.org services” homepage (fig. 0.01); make sure you have already registered for an Internet Archive account (it is free) and are logged in;

  2. Click on Sign in with Google and, after selecting the Google account you wish to use (i.e. the one through which the spreadsheet is created), follow the steps to allow Internet Archive to access it;

  3. The service homepage now shows a number of options (fig. 0.02);

  4. Select Archive URLs, which will activate the input field for pasting the link to the Google Sheet containing the URLs you want to archive (fig. 0.03);

  5. Create a spreadsheet in your Google Drive (from the account used in step 2) making sure the URLs are included in the first column (one URL per row), then click on Share (top-right corner) and set “General access” to Anyone with the link. Lastly, click on Copy link (fig. 0.04);

  6. Paste the copied link into the “Google Spreadsheet URL” field (fig. 0.05) and click on Archive (the options may be modified before clicking the button); you will receive an email with a report of the operation once it has finished.
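
When the list of links already exists in a script or a text file, the spreadsheet required in step 5 can be prepared programmatically. The following is a minimal sketch, assuming the links are available as a Python list: it writes a one-column CSV file (one URL per row) that can then be imported into Google Sheets, and it enforces the 50,000-link limit mentioned above. The names `prepare_rows` and `write_csv` and the output filename are hypothetical and chosen for illustration only.

```python
import csv
from urllib.parse import urlsplit

MAX_URLS = 50_000  # limit on the number of links stated in the text above


def prepare_rows(urls):
    """Strip whitespace, drop blanks and duplicates (keeping order), and validate each URL."""
    seen = set()
    rows = []
    for raw in urls:
        url = raw.strip()
        if not url or url in seen:
            continue
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"Not an absolute http(s) URL: {url!r}")
        seen.add(url)
        rows.append(url)
    if len(rows) > MAX_URLS:
        raise ValueError(f"{len(rows)} URLs exceed the limit of {MAX_URLS}")
    return rows


def write_csv(urls, path):
    """Write the cleaned URLs to `path`, one per row in the first column; return the row count."""
    rows = prepare_rows(urls)
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        for url in rows:
            writer.writerow([url])
    return len(rows)
```

A call such as `write_csv(my_links, "urls_to_archive.csv")` produces a file that can be imported into a new Google Sheet, after which the sharing and linking steps (5 and 6) are carried out as described above.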

Figure 0.01 “Batch process Google Sheets using archive.org services” homepage

Figure 0.02 Archiving options after linking a Google account to Internet Archive

Figure 0.03 Setup options for the “Archive URLs” tool

Figure 0.04 Setup sharing options for the Google Sheet file

Figure 0.05 Using the Google Sheet link to feed URLs to the Wayback Machine