Preservation copies of online materials
Preservation copies of online materials referenced in the book are included for two main reasons:
to give readers access to contents that resemble, as closely as possible, the way they appeared when consulted during the final revision of the book (February-November 2022). The ‘look’ of the contents may nonetheless differ, since the Wayback Machine is not always able to replicate the graphical layout of a webpage in full;
to keep the materials available in case they are changed, or in case a webpage or website is taken down or closed.
It must be stressed that not all contents ‘are created equal’, as some are more ‘at risk’ than others: contents on websites such as Wikipedia or Encyclopedia Britannica are less likely to disappear because the website is closed than contents on community projects or on blogs owned by social media platforms. Examples of the latter two cases are the website of the Libav project, abandoned in late 2022, and the Instagram “About us” page, whose contents have been partially moved to other pages; details about the original founders, however, appear to have been deleted from the website.
While copies have been included for the majority of online contents, no preservation copies are provided for the following types of contents:
Academic papers, with the exception of those hosted on conference or university websites and available through Open Access or Creative Commons licences
GitHub code repositories (but not websites hosted on GitHub, such as github.io pages): these will be included in future versions of the compendium, using Software Heritage archival resources
Note
Preservation copies hosted on the Wayback Machine (web.archive.org) may at times be slow to load, depending on the size of the material (e.g. PDF files, YouTube videos) and on the current server load.
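For readers who want to look up a capture of a link themselves, the sketch below shows how a Wayback Machine address is typically composed: the `web.archive.org/web/` prefix, a 14-digit timestamp (YYYYMMDDhhmmss), and the original URL. When no capture exists at exactly that moment, the Wayback Machine usually redirects to the nearest available one. The function name `wayback_url` and the example date are illustrative assumptions; they are not the exact addresses or timestamps used for the preservation copies in the table.

```python
from datetime import datetime

# Prefix of Wayback Machine capture URLs
WAYBACK_PREFIX = "https://web.archive.org/web/"


def wayback_url(original_url: str, when: datetime) -> str:
    """Build a Wayback Machine address for `original_url` around `when`.

    Hypothetical helper: the timestamp is formatted as YYYYMMDDhhmmss,
    and the archive normally redirects to the closest existing capture.
    """
    timestamp = when.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}{timestamp}/{original_url}"


# Example with an illustrative date inside the revision period
print(wayback_url("https://jsonlines.org/", datetime(2022, 11, 1)))
```

Running the example prints `https://web.archive.org/web/20221101000000/https://jsonlines.org/`; pasting such an address into a browser should open the capture closest to 1 November 2022, if one exists.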
The URL column contains the link as it appears in the printed volume, while clicking on the symbol in the preservation copy column opens the preservation copy.
Some links are referenced more than once in the book; the table below only records their first occurrence (i.e. the page on which the link is first referenced). The Search function can be used to look up a portion of a link and find the relevant entries.
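The two rules just described (keep only the first page on which a link appears; find entries by typing part of a link) can be sketched in a few lines of Python. This is an illustration only, not the code behind this page or its Search function: the helper names `first_occurrences` and `search` are hypothetical, and the repeated reference on page 130 is an invented example, while the other entries come from the table.

```python
def first_occurrences(references):
    """For each URL, keep the lowest page number on which it is referenced."""
    first = {}
    for url, page in references:
        if url not in first or page < first[url]:
            first[url] = page
    return first


def search(table, fragment):
    """Return (url, page) entries whose URL contains `fragment`, ignoring case."""
    fragment = fragment.lower()
    return [(url, page) for url, page in table.items() if fragment in url.lower()]


references = [
    ("https://git-scm.com/", 123),
    ("https://git-scm.com/downloads/guis", 137),
    ("https://git-scm.com/", 130),  # hypothetical second reference to the same link
    ("https://jsonlines.org/", 136),
]
table = first_occurrences(references)
print(search(table, "git-scm"))
```

Here the repeated `https://git-scm.com/` keeps page 123, and searching for `git-scm` returns the two Git entries, in the order they were first seen.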
URL | page | preservation copy | notes |
---|---|---|---|
http://aoir.org/reports/ethics2.pdf | 66 | | |
http://blog.research.google/2006/08/all-our-n-gram-are-belong-to-you.html | 305 | | |
http://blurrt.co.uk/ | 369 | | |
http://corpora.lancs.ac.uk/lancsbox/ | 307 | | |
http://data.europa.eu/eli/dir/2019/790/oj/eng | 62 | | |
http://perverted-justice.com | 369 | | |
http://perverted-justice.com/?archive=mg0942 | 370 | | |
http://ytdl-org.github.io/youtube-dl/supportedsites.html | 304 | | |
https://about.fb.com/news/2020/06/labeling-state-controlled-media/ | 308 | | |
https://about.fb.com/news/2021/04/how-we-combat-scraping/ | 307 | | |
https://about.instagram.com/blog/announcements/combatting-misinformation-on-instagram | 309 | | |
https://about.instagram.com/blog/announcements/instagram-verification-and-authentication-tool-updates | 310 | | |
https://about.instagram.com/blog/announcements/introducing-family-center-and-supervision-tools | 310 | | |
https://about.instagram.com/blog/announcements/introducing-stories-highlights-and-stories-archive | 309 | | |
https://about.instagram.com/blog/announcements/supporting-well-being-with-instagram-guides | 309 | | |
https://algorithmwatch.org/en/instagram-research-shut-down-by-facebook/ | 65 | | |
https://aoir.org | 59 | | |
https://aoir.org/reports/ethics3.pdf | 63 | | |
https://appleinsider.com/articles/12/07/25/apple_kills_windows_pc_support_in_safari_60 | 136 | | |
https://archive-it.org/blog/post/the-stack-warc-file/ | 137 | | |
https://automatetheboringstuff.com | 104 | | |
https://beautiful-soup-4.readthedocs.io/_/downloads/en/latest/pdf/ | 311 | | |
https://blog.archive.org/2016/10/23/defining-web-pages-web-sites-and-web-captures/ | 136 | | |
https://blog.archive.org/2017/04/17/robots-txt-meant-for-search-engines-dont-work-well-for-web-archives/ | 308 | | |
https://blog.codinghorror.com/the-problem-with-urls/ | 306 | | |
https://blog.mattheworiordan.com/post/13174566389/url-regular-expression-for-links-with-or-without | 305 | | |
https://blog.twitter.com/en_us/a/2013/twitter-amplify-partnerships-great-content-great-brands-great-engagement | 304 | | |
https://blog.twitter.com/en_us/topics/company/2020/suspension | 136 | | |
https://blog.twitter.com/en_us/topics/product/2020/new-labels-for-government-and-state-affiliated-media-accounts | 312 | | |
https://blog.youtube/news-and-events/keep-fans-engaged-with-cards-end/ | 59 | | |
https://blog.youtube/news-and-events/update-to-youtube/ | 60 | | |
https://blogs.lse.ac.uk/impactofsocialsciences/2021/05/18/using-twitter-as-a-data-source-an-overview-of-social-media-research-tools-2021/ | 61 | | |
https://business.twitter.com/en/help/campaign-setup/conversational-ad-formats.html | 304 | | |
https://cass.lancs.ac.uk/log-ratio-an-informal-introduction/ | 94 | | |
https://clarin.ids-mannheim.de/standards/views/view-spec.xq?id=SpecCes | 59 | | |
https://collins.co.uk/pages/elt-cobuild-reference-the-collins-corpus | 59 | | |
https://corpus-analysis.com/ | 92 | | |
https://cwb.sourceforge.io | 305 | | |
https://cwb.sourceforge.io/index.php | 308 | | |
https://developer.chrome.com/blog/headless-chrome/ | 306 | | |
https://developer.mozilla.org/en-US/docs/Glossary/Base64 | 310 | | |
https://developer.mozilla.org/en-US/docs/Web/HTML/Element | 382 | | |
https://developer.mozilla.org/en-US/docs/Web/HTML/Element/table | 303 | | |
https://developer.twitter.com/en/blog/product-news/2021/enabling-the-future-of-academic-research-with-the-twitter-api | 59 | | |
https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/geo-objects.html#place | 303 | | |
https://developer.twitter.com/en/docs/twitter-for-websites/cards/overview/abouts-cards | 304 | | |
https://developers.whatismybrowser.com/useragents/explore/ | 302 | | |
https://diasporafoundation.org/ | 59 | | |
https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-large-files-on-github | 137 | | |
https://edition.cnn.com/2020/11/05/tech/steve-bannon-twitter-permanent-suspension/index.html | 369 | | |
https://electionemails2020.org/ | 59 | | |
https://en.wikipedia.org/wiki/Carriage_return | 137 | | |
https://en.wikipedia.org/wiki/Comparison_of_video_hosting_services#General_information | 304 | | |
https://en.wikipedia.org/wiki/Computer_science | 92 | | |
https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes | 304 | | |
https://en.wikipedia.org/wiki/List_of_Web_archiving_initiatives | 136 | | |
https://en.wikipedia.org/wiki/Markup_language | 60 | | |
https://en.wikipedia.org/wiki/Periscope_(service) | 304 | | |
https://en.wikipedia.org/wiki/Snaptu | 60 | | |
https://en.wikipedia.org/wiki/Vine_(service) | 304 | | |
https://ffmpeg.org/ | 305 | | |
https://foia.state.gov/Search/Results.aspx?collection=Clinton_Email | 59 | | Since the link leads to the results of a search query, the preservation copy shows the search interface without any results. |
https://git-scm.com/ | 123 | | |
https://git-scm.com/book/en/v2 | 123 | | |
https://git-scm.com/downloads/guis | 137 | | |
https://github.blog/2020-11-16-standing-up-for-developers-youtube-dl-is-back/ | 312 | | |
https://googleblog.blogspot.com/2012/02/unicode-over-60-percent-of-web.html | 137 | | |
https://help.instagram.com/138925576505882 | 304 | | |
https://help.twitter.com/en/rules-and-policies/state-affiliated | 304 | | |
https://help.twitter.com/en/using-twitter/spaces | 304 | | |
https://help.twitter.com/en/using-twitter/twitter-moments | 304 | | |
https://html.spec.whatwg.org/multipage/introduction.html | 136 | | |
https://html.spec.whatwg.org/multipage/introduction.html#a-quick-introduction-to-html | 136 | | |
https://html.spec.whatwg.org/multipage/syntax.html#elements-2 | 136 | | |
https://iipc.github.io/warc-specifications/ | 136 | | |
https://iipc.github.io/warc-specifications/specifications/warc-format/warc-1.1/ | 139 | | |
https://jacobstar.medium.com/the-first-complete-guide-to-youtube-captions-f886e06f7d9d | 311 | | |
https://jalammar.github.io/illustrated-bert/ | 92 | | |
https://joinmastodon.org/ | 59 | | |
https://joinpeertube.org/ | 59 | | |
https://joinup.ec.europa.eu/collection/eupl/solution/joinup-licensing-assistant/jla-find-and-compare-software-licenses | 59 | | |
https://jsonlines.org/ | 136 | | |
https://kunststube.net/encoding/ | 130 | | |
https://labelstud.io | 137 | | |
https://lareviewofbooks.org/article/neoliberal-tools-archives-political-history-digital-humanities/ | 19 | | The webpage looks empty (black background), but the contents are available: the text is displayed in black on the black background, and selecting (highlighting) it makes it visible. |
https://later.com/blog/instagram-add-reminder/ | 312 | | |
https://lxml.de/ | 303 | | |
https://melaniewalsh.github.io/Intro-Cultural-Analytics/ | 104 | | |
https://mercury.postlight.com/web-parser/ | 302 | | |
https://morethesis.unimore.it/ | 303 | | |
https://netpreserve.org | 136 | | |
https://nlp.fi.muni.cz/raslan/2011/paper16.pdf | 139 | | |
https://nlp.stanford.edu/ | 305 | | |
https://ojs.aaai.org/index.php/ICWSM/article/view/15010 | 67 | | |
https://openrefine.org/documentation.html | 298 | | |
https://opensource.org/licenses/ | 59 | | |
https://pixelfed.org/ | 59 | | |
https://plotly.com/python/ | 370 | | |
https://pytorch.org/ | 305 | | |
https://returnyoutubedislike.com/ | 305 | | |
https://returnyoutubedislike.com/faq | 305 | | |
https://scrapy.org/ | 303 | | |
https://sfconservancy.org/GiveUpGitHub/ | 140 | | |
https://social.techcrunch.com/2019/12/16/instagram-fact-checking/ | 307 | | |
https://stackoverflow.com/questions/49625771/how-to-recreate-the-preview-from-instagrams-media-preview-raw-data | 304 | | |
https://support.apple.com/guide/safari/use-the-developer-tools-in-the-develop-menu-sfri20948/15.1/mac/12.0 | 136 | | |
https://support.mozilla.org/en-US/kb/firefox-reader-view-clutter-free-web-pages | 302 | | |
https://tei-c.org/ | 59 | | |
https://tutorial.djangogirls.org/en/intro_to_command_line/#open-the-command-line-interface | 138 | | |
https://twitter.com/search-advanced | 303 | | |
https://twittercommunity.com/t/introducing-the-new-academic-research-product-track/148632 | 59 | | |
https://ucrel.lancs.ac.uk/vard/userguide/ | 275 | | |
https://users.ox.ac.uk/~martinw/dlc/index.htm | 130 | | |
https://vanderwal.net/folksonomy.html | 312 | | |
https://w3techs.com/technologies/overview/character_encoding | 137 | | The webpage cannot be saved through the Wayback Machine, presumably because the webmaster has excluded the whole website from archiving. A snapshot of the whole page as it appeared on 04/09/2023 has therefore been included in place of the preservation copy. |
https://wallabag.org/ | 369 | | |
https://wiki.archiveteam.org/index.php/GeoCities | 136 | | |
https://wiki.u-gov.it/confluence/display/ESSE3/Normativa+e+Tipo+Corso+di+Studio | 303 | | |
www.facebook.com/journalismproject/programs/third-party-fact-checking/new-ratings | 304 | | |
www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/ | 130 | | |
www.loc.gov/standards/iso639-2/php/English_list.php | 305 | | |
www.similarweb.com/top-websites/ | 304 | | |
www.theguardian.com/technology/2022/apr/04/mind-blowing-ai-da-becomes-first-robot-to-paint-like-an-artist | 307 | | |
www.w3.org/TR/NOTE-datetime | 312 | | |
www.youtube.com/watch?v=5RwhEHzuulA | 304 | | |
www.baal.org.uk/wp-content/uploads/2021/03/BAAL-Good-Practice-Guidelines-2021.pdf | 59 | | |
www.cs.cmu.edu/~enron/ | 59 | | |
www.csoonline.com/article/3662039/hiq-v-linkedin-court-ruling-will-have-a-material-efect-on-privacy.html | 61 | | |
www.forbes.com/sites/bernardmarr/2018/05/21/how-much-data-do-we-create-every-day-the-mind-blowing-stats-everyone-should-read | 18 | | |
www.fsf.org/ | 59 | | |
www.gnu.org/licenses/license-list.html | 59 | | |
www.gnu.org/philosophy/essays-and-articles.html | 59 | | |
www.huffpost.com/entry/ubersocial-ubertwitter_n_825360 | 60 | | |
www.instagram.com/about/us/ | 304 | | |
www.json.org/json-en.html | 136 | | |
www.kaggle.com/datasets/rtatman/fraudulent-email-corpus | 59 | | |
www.linkedin.com/posts/sarahgwight_user-agreement-linkedin-activity-6994402330884386816-UAkW/ | 59 | | |
www.nytimes.com/interactive/2016/12/10/business/media/pizzagate.html | 60 | | |
www.thunderbird.net/en-US/features/ | 59 | | |
www.versionmuseum.com/history-of/youtube-website | 60 | | |
www.w3.org/TR/1999/REC-html401-19991224/intro/sgmltut.html#h-3.2.1 | 136 | | |
www.eff.org/it/deeplinks/2022/04/scraping-public-websites-still-isnt-crime-court-appeals-declares | 62 | | |
www.gnu.org/philosophy/free-sw.html | 63 | | |
www.technologyreview.com/2022/08/31/1058800/what-does-gpt-3-know-about-me/ | 64 | | |
www.theguardian.com/technology/2018/jan/12/google-racism-ban-gorilla-black-people | 64 | | |
www.harvardmagazine.com/2000/01/code-is-law-html | 66 | | |
www.nytimes.com/2013/03/17/opinion/sunday/morozov-open-and-closed.html | 66 | | |
www.eff.org/it/deeplinks/2021/07/eff-ninth-circuit-recent-supreme-court-decision-van-buren-does-not-criminalize-web | 68 | | |
www.niso.org/publications/understanding-metadata-2017 | 68 | | |
www.gnu.org/philosophy/open-source-misses-the-point.en.html | 68 | | |
www.gnu.org/licenses/rms-why-gplv3.html | 68 | | |
www.gnu.org/philosophy/stallman-kth.html | 68 | | |
www.gnu.org/philosophy/nonsoftware-copyleft.html | 68 | | |
www.laurenceanthony.net/software/antconc/ | 73 | | |
https://lexically.net/wordsmith/ | 73 | | |
https://nlp.fi.muni.cz/trac/noske | 73 | | |
www.laurenceanthony.net/software/antconc/releases/AntConc411/license.pdf | 92 | | |
www.youtube.com/watch?v=ka4yDJLtSSc | 94 | | |
www.sketchengine.eu/wp-content/uploads/The_TenTen_Corpus_2013.pdf | 94 | | |
www.sketchengine.eu/wp-content/uploads/ske-statistics.pdf | 95 | | |
www.R-project.org | 95 | | |
www.nltk.org/book/ | 104 | | |
www.w3.org/International/questions/qa-what-is-encoding | 130 | | |
www.w3.org/International/getting-started/characters | 130 | | |
www.rfc-editor.org/rfc/rfc1866 | 136 | | |
www.w3.org/standards/xml/core | 136 | | |
www.iana.org/assignments/character-sets/character-sets.xhtml | 137 | | |
www.laurenceanthony.net/software/encodeant/ | 137 | | |
https://gwern.net/dnm-archive | 138 | | |
www.lrec-conf.org/proceedings/lrec2004/pdf/480.pdf | 138 | | |
www.britannica.com/event/January-6-U-S-Capitol-attack | 138 | | |
www.theguardian.com/technology/2018/may/25/gdpr-us-based-news-websites-eu-internet-users-la-times | 138 | | |
www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/introduction.html | 139 | | |
www.crummy.com/software/BeautifulSoup/ | 161 | | |
www.digitalocean.com/community/tutorials/how-to-scrape-web-pages-with-beautiful-soup-and-python-3 | 180 | | |
www.robotstxt.org/robotstxt.html | 302 | | |
www.theguardian.com/news/series/cambridge-analytica-files | 302 | | |
www.gnu.org/software/wget/ | 303 | | |
www.markdownguide.org/cheat-sheet | 303 | | |
www.tweepy.org | 303 | | |
www.youtube.com/watch?v=WySXiFsG0qU | 305 | | |
www.unicode.org/emoji/charts/full-emoji-list.html | 305 | | |
www.digit.fyi/facebook-and-instagram-to-age-gate-sexual-content-for-minors/ | 306 | | |
www.laurenceanthony.net/software | 306 | | |
www.theguardian.com/news/2018/mar/17/cambridge-analytica-facebook-influence-us-election | 307 | | |
www.theguardian.com/news/2018/mar/26/the-cambridge-analytica-files-the-story-so-far | 308 | | |
www.eff.org/deeplinks/2020/11/riaa-abuses-dmca-take-down-popular-tool-downloading-online-video | 308 | | |
www.eff.org/deeplinks/2020/11/github-reinstates-youtube-dl-after-riaas-abuse-dmca | 308 | | |
www.kielipankki.fi/development/korp/corpus-input-format/ | 309 | | |
www.unicode.org/L2/L2010/10132-emojidata.pdf | 311 | | |
www.swansea.ac.uk/gdpo/ | 369 | | |
www.unodc.org/unodc/en/commissions/CND/session/cnd-documents-index.html | 369 | | |
www.counterextremism.com/supremacy/traditionalist-worker-party-traditional-youth-network | 369 | | |
www.australia.gov.au/directories/australia/centrelink | 369 | | |
www.swansea.ac.uk/project-dragon-s/ | 369 | | |
www.pjfi.org/ | 370 | | |
https://libav.org | 305 | | The project was abandoned in late 2022, and the website is now deactivated. |