Get Old Webpages for any Domain

Get saved webpages for a domain from Wayback Machine in Google Sheets. 🚀

Check out the GIF to see it in action 🎥


How to add the Script to Google Sheets

1. Copy the script below:

/**
 * Get saved webpages for a domain from Wayback Machine in Google Sheets 
 *
 * @param {"adidas.com"} website input the domain (without the protocol)
 * @param {"20190101"} datefrom - [OPTIONAL] input the from date for the search (YYYYMMDD)
 * @param {"20200131"} dateto - [OPTIONAL] input the to date for the search (YYYYMMDD)
 * @param {"1000"} limit - [OPTIONAL] input a number to limit the number of rows returned
 * @customfunction
 */

function wayback(website, datefrom, dateto, limit) {

	// limit response to a max of 3000
	limit = limit || 3000;

	// return errors if there's no website or only one date input
	if (!website) return 'Error: you need to add a website';
	if (datefrom && !dateto) return 'Error: you need to add an end date';

	// build query dependent on whether date is selected
	const query = !datefrom ? `?url=${website}*&output=json` : `?url=${website}*&output=json&from=${datefrom}&to=${dateto}`;

	// fetch response and remove headers
	const response = UrlFetchApp.fetch(`http://web.archive.org/cdx/search/cdx${query}&limit=${limit}`);
	const content = JSON.parse(response.getContentText()).splice(1);

	if (content.length > 0) {

		// create a new array with columns needed
		const cols = () => {
			const col = [2, 3, 5];
			return content.map(r => col.map(i => r[i - 1]));
		}

		// format timestamp to readable date format
		const formatDate = (t) => {
			const year = t.substring(0, 4)
			const month = t.substring(4, 6);
			const day = t.substring(6, 8);
			return `${year}-${month}-${day}`;
		}

		//remove :80 from page if found
		const formatPage = (p) => {
			const query = ':80';
			return p.search(query) === -1 ? p : p.replace(query, '');
		}

		// create the wayback URL to view renderered page
		const formatWayback = (t, p) => {
			return `https://web.archive.org/web/${t}/${p}`;
		}

		// loop through new array, format it and add wayback URL column  
		let rows = [];
		for (i = 0; i < cols().length; i++) {

			data = cols()[i];
			const [date, page, status] = data;

			rows.push([formatDate(date), formatPage(page), status, formatWayback(date, page)]);
		}

		rows.unshift(['Date', 'URL', 'Status Code', 'Wayback URL']);

		return rows;

	} else {
		return 'Error: no data for website';
	}

}

2. Head over to Google Sheets

Or if you’re really smart, create a new sheet by going to: https://sheets.new

Select Script editor from the Tools menu.

Paste the script and save it.


3. Add the formula to any cell in your sheet

=wayback(domain, datefrom, dateto, limit)

domain represents any domain you want to input. Only include the domain and not the subdomain or protocol, ie. “adidas.com” and not “https://www.adidas.com”.

datefrom and dateto parameters allow you to specify a start and end date for the search. It must be formatted like YYYYMMDD eg. 20200101. You can also just input the year eg. 2020 or the year and month eg. 202005.

To make sure the request doesn’t timeout, the limit is set to 3000 rows by default. You can also set a smaller limit in the function or change the variable within the script itself.


Thanks for stopping by 👋

I’m Andrew Charlton, the Google Sheets nerd behind Keywords in Sheets. 🤓

Questions? Get in touch with me on social or comment below 👇


More Scripts

Submit your response

Your email address will not be published. Required fields are marked *

2 Responses on this post

  1. Hey Andrew,

    Thanks for the script.

    One suggestion: if you use the “collapse” & “matchType” parameters from the API (see https://github.com/internetarchive/wayback/tree/master/wayback-cdx-server), we’d be able to retrieve unique URLs from archive.org. Correct me if I’m wrong, but in its current form, your template will retrieve several times the same URL. Which is not always what we need 🙂

    Happy to share (privately) a simple demo a have in Google Colab (I’m not fluent in JS XD) if you want.

    Thanks!