Preservation copies of online materials

Preservation copies of online materials referenced in the book are included for two main reasons:

  • to give readers access to contents that resemble, as closely as possible, the way they appeared when consulted during the final revision of the book (February-November 2022). The ‘look’ of the contents may nevertheless differ, since the Wayback Machine cannot always replicate the graphical layout of a webpage exactly

  • to keep the materials available in case they are changed, or in case a webpage or website is taken down or closed

It must be stressed that not all contents ‘are created equal’, as some are more ‘at risk’ than others: contents on e.g. Wikipedia or Encyclopedia Britannica are less likely to disappear because a website closes than contents on e.g. community projects or blogs owned by social media platforms. Examples of the latter two cases are the website of the Libav project, abandoned in late 2022, and the Instagram “About us” page, whose contents have been (partially) moved to other pages; details about the original founders, however, appear to have been deleted from the website.

While copies have been included for the majority of online contents, no preservation copies are provided for the following types of contents:

  • Academic papers, with the exception of those hosted on conference/university websites and available through Open Access or Creative Commons licences

  • GitHub code repositories (but not GitHub-hosted websites): these will be included in future versions of the compendium, using Software Heritage archival resources

Note

Preservation copies hosted on the Wayback Machine (web.archive.org) may at times be slow to load, depending on the size of the material (e.g. PDF files, YouTube videos) as well as on current server load.

The URL column contains the link as it appears in the printed volume, while clicking on the symbol in the preservation copy column opens the preservation copy. Some links are referenced more than once in the book; the table below lists only their first occurrence (i.e. the page on which the link is first referenced). The Search function accepts a portion of a link and returns the relevant results.

| URL | page | preservation copy | notes |
| --- | --- | --- | --- |
| http://aoir.org/reports/ethics2.pdf | 66 | | |
| http://blog.research.google/2006/08/all-our-n-gram-are-belong-to-you.html | 305 | | |
| http://blurrt.co.uk/ | 369 | | |
| http://corpora.lancs.ac.uk/lancsbox/ | 307 | | |
| http://data.europa.eu/eli/dir/2019/790/oj/eng | 62 | | |
| http://perverted-justice.com | 369 | | |
| http://perverted-justice.com/?archive=mg0942 | 370 | | |
| http://ytdl-org.github.io/youtube-dl/supportedsites.html | 304 | | |
| https://about.fb.com/news/2020/06/labeling-state-controlled-media/ | 308 | | |
| https://about.fb.com/news/2021/04/how-we-combat-scraping/ | 307 | | |
| https://about.instagram.com/blog/announcements/combatting-misinformation-on-instagram | 309 | | |
| https://about.instagram.com/blog/announcements/instagram-verification-and-authentication-tool-updates | 310 | | |
| https://about.instagram.com/blog/announcements/introducing-family-center-and-supervision-tools | 310 | | |
| https://about.instagram.com/blog/announcements/introducing-stories-highlights-and-stories-archive | 309 | | |
| https://about.instagram.com/blog/announcements/supporting-well-being-with-instagram-guides | 309 | | |
| https://algorithmwatch.org/en/instagram-research-shut-down-by-facebook/ | 65 | | |
| https://aoir.org | 59 | | |
| https://aoir.org/reports/ethics3.pdf | 63 | | |
| https://appleinsider.com/articles/12/07/25/apple_kills_windows_pc_support_in_safari_60 | 136 | | |
| https://archive-it.org/blog/post/the-stack-warc-file/ | 137 | | |
| https://automatetheboringstuff.com | 104 | | |
| https://beautiful-soup-4.readthedocs.io/_/downloads/en/latest/pdf/ | 311 | | |
| https://blog.archive.org/2016/10/23/defining-web-pages-web-sites-and-web-captures/ | 136 | | |
| https://blog.archive.org/2017/04/17/robots-txt-meant-for-search-engines-dont-work-well-for-web-archives/ | 308 | | |
| https://blog.codinghorror.com/the-problem-with-urls/ | 306 | | |
| https://blog.mattheworiordan.com/post/13174566389/url-regular-expression-for-links-with-or-without | 305 | | |
| https://blog.twitter.com/en_us/a/2013/twitter-amplify-partnerships-great-content-great-brands-great-engagement | 304 | | |
| https://blog.twitter.com/en_us/topics/company/2020/suspension | 136 | | |
| https://blog.twitter.com/en_us/topics/product/2020/new-labels-for-government-and-state-affiliated-media-accounts | 312 | | |
| https://blog.youtube/news-and-events/keep-fans-engaged-with-cards-end/ | 59 | | |
| https://blog.youtube/news-and-events/update-to-youtube/ | 60 | | |
| https://blogs.lse.ac.uk/impactofsocialsciences/2021/05/18/using-twitter-as-a-data-source-an-overview-of-social-media-research-tools-2021/ | 61 | | |
| https://business.twitter.com/en/help/campaign-setup/conversational-ad-formats.html | 304 | | |
| https://cass.lancs.ac.uk/log-ratio-an-informal-introduction/ | 94 | | |
| https://clarin.ids-mannheim.de/standards/views/view-spec.xq?id=SpecCes | 59 | | |
| https://collins.co.uk/pages/elt-cobuild-reference-the-collins-corpus | 59 | | |
| https://corpus-analysis.com/ | 92 | | |
| https://cwb.sourceforge.io | 305 | | |
| https://cwb.sourceforge.io/index.php | 308 | | |
| https://developer.chrome.com/blog/headless-chrome/ | 306 | | |
| https://developer.mozilla.org/en-US/docs/Glossary/Base64 | 310 | | |
| https://developer.mozilla.org/en-US/docs/Web/HTML/Element | 382 | | |
| https://developer.mozilla.org/en-US/docs/Web/HTML/Element/table | 303 | | |
| https://developer.twitter.com/en/blog/product-news/2021/enabling-the-future-of-academic-research-with-the-twitter-api | 59 | | |
| https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/geo-objects.html#place | 303 | | |
| https://developer.twitter.com/en/docs/twitter-for-websites/cards/overview/abouts-cards | 304 | | |
| https://developers.whatismybrowser.com/useragents/explore/ | 302 | | |
| https://diasporafoundation.org/ | 59 | | |
| https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-large-files-on-github | 137 | | |
| https://edition.cnn.com/2020/11/05/tech/steve-bannon-twitter-permanent-suspension/index.html | 369 | | |
| https://electionemails2020.org/ | 59 | | |
| https://en.wikipedia.org/wiki/Carriage_return | 137 | | |
| https://en.wikipedia.org/wiki/Comparison_of_video_hosting_services#General_information | 304 | | |
| https://en.wikipedia.org/wiki/Computer_science | 92 | | |
| https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes | 304 | | |
| https://en.wikipedia.org/wiki/List_of_Web_archiving_initiatives | 136 | | |
| https://en.wikipedia.org/wiki/Markup_language | 60 | | |
| https://en.wikipedia.org/wiki/Periscope_(service) | 304 | | |
| https://en.wikipedia.org/wiki/Snaptu | 60 | | |
| https://en.wikipedia.org/wiki/Vine_(service) | 304 | | |
| https://ffmpeg.org/ | 305 | | |
| https://foia.state.gov/Search/Results.aspx?collection=Clinton_Email | 59 | | Since the link leads to the results of a search query, the preservation copy shows the search interface without any results. |
| https://git-scm.com/ | 123 | | |
| https://git-scm.com/book/en/v2 | 123 | | |
| https://git-scm.com/downloads/guis | 137 | | |
| https://github.blog/2020-11-16-standing-up-for-developers-youtube-dl-is-back/ | 312 | | |
| https://googleblog.blogspot.com/2012/02/unicode-over-60-percent-of-web.html | 137 | | |
| https://help.instagram.com/138925576505882 | 304 | | |
| https://help.twitter.com/en/rules-and-policies/state-affiliated | 304 | | |
| https://help.twitter.com/en/using-twitter/spaces | 304 | | |
| https://help.twitter.com/en/using-twitter/twitter-moments | 304 | | |
| https://html.spec.whatwg.org/multipage/introduction.html | 136 | | |
| https://html.spec.whatwg.org/multipage/introduction.html#a-quick-introduction-to-html | 136 | | |
| https://html.spec.whatwg.org/multipage/syntax.html#elements-2 | 136 | | |
| https://iipc.github.io/warc-specifications/ | 136 | | |
| https://iipc.github.io/warc-specifications/specifications/warc-format/warc-1.1/ | 139 | | |
| https://jacobstar.medium.com/the-first-complete-guide-to-youtube-captions-f886e06f7d9d | 311 | | |
| https://jalammar.github.io/illustrated-bert/ | 92 | | |
| https://joinmastodon.org/ | 59 | | |
| https://joinpeertube.org/ | 59 | | |
| https://joinup.ec.europa.eu/collection/eupl/solution/joinup-licensing-assistant/jla-find-and-compare-software-licenses | 59 | | |
| https://jsonlines.org/ | 136 | | |
| https://kunststube.net/encoding/ | 130 | | |
| https://labelstud.io | 137 | | |
| https://lareviewofbooks.org/article/neoliberal-tools-archives-political-history-digital-humanities/ | 19 | | The webpage looks empty (black background) but the contents are available: the text is black on a black background, and selecting (highlighting) it makes it visible. |
| https://later.com/blog/instagram-add-reminder/ | 312 | | |
| https://lxml.de/ | 303 | | |
| https://melaniewalsh.github.io/Intro-Cultural-Analytics/ | 104 | | |
| https://mercury.postlight.com/web-parser/ | 302 | | |
| https://morethesis.unimore.it/ | 303 | | |
| https://netpreserve.org | 136 | | |
| https://nlp.fi.muni.cz/raslan/2011/paper16.pdf | 139 | | |
| https://nlp.stanford.edu/ | 305 | | |
| https://ojs.aaai.org/index.php/ICWSM/article/view/15010 | 67 | | |
| https://openrefine.org/documentation.html | 298 | | |
| https://opensource.org/licenses/ | 59 | | |
| https://pixelfed.org/ | 59 | | |
| https://plotly.com/python/ | 370 | | |
| https://pytorch.org/ | 305 | | |
| https://returnyoutubedislike.com/ | 305 | | |
| https://returnyoutubedislike.com/faq | 305 | | |
| https://scrapy.org/ | 303 | | |
| https://sfconservancy.org/GiveUpGitHub/ | 140 | | |
| https://social.techcrunch.com/2019/12/16/instagram-fact-checking/ | 307 | | |
| https://stackoverflow.com/questions/49625771/how-to-recreate-the-preview-from-instagrams-media-preview-raw-data | 304 | | |
| https://support.apple.com/guide/safari/use-the-developer-tools-in-the-develop-menu-sfri20948/15.1/mac/12.0 | 136 | | |
| https://support.mozilla.org/en-US/kb/firefox-reader-view-clutter-free-web-pages | 302 | | |
| https://tei-c.org/ | 59 | | |
| https://tutorial.djangogirls.org/en/intro_to_command_line/#open-the-command-line-interface | 138 | | |
| https://twitter.com/search-advanced | 303 | | |
| https://twittercommunity.com/t/introducing-the-new-academic-research-product-track/148632 | 59 | | |
| https://ucrel.lancs.ac.uk/vard/userguide/ | 275 | | |
| https://users.ox.ac.uk/~martinw/dlc/index.htm | 130 | | |
| https://vanderwal.net/folksonomy.html | 312 | | |
| https://w3techs.com/technologies/overview/character_encoding | 137 | | The webpage cannot be saved through the Wayback Machine, arguably because the webmaster decided to exclude the whole website from archiving efforts. A snapshot of the whole page as it appeared on 04/09/2023 has therefore been included in place of the preservation copy. |
| https://wallabag.org/ | 369 | | |
| https://wiki.archiveteam.org/index.php/GeoCities | 136 | | |
| https://wiki.u-gov.it/confluence/display/ESSE3/Normativa+e+Tipo+Corso+di+Studio | 303 | | |
| www.facebook.com/journalismproject/programs/third-party-fact-checking/new-ratings | 304 | | |
| www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/ | 130 | | |
| www.loc.gov/standards/iso639-2/php/English_list.php | 305 | | |
| www.similarweb.com/top-websites/ | 304 | | |
| www.theguardian.com/technology/2022/apr/04/mind-blowing-ai-da-becomes-first-robot-to-paint-like-an-artist | 307 | | |
| www.w3.org/TR/NOTE-datetime | 312 | | |
| www.youtube.com/watch?v=5RwhEHzuulA | 304 | | |
| www.baal.org.uk/wp-content/uploads/2021/03/BAAL-Good-Practice-Guidelines-2021.pdf | 59 | | |
| www.cs.cmu.edu/~enron/ | 59 | | |
| www.csoonline.com/article/3662039/hiq-v-linkedin-court-ruling-will-have-a-material-efect-on-privacy.html | 61 | | |
| www.forbes.com/sites/bernardmarr/2018/05/21/how-much-data-do-we-create-every-day-the-mind-blowing-stats-everyone-should-read | 18 | | |
| www.fsf.org/ | 59 | | |
| www.gnu.org/licenses/license-list.html | 59 | | |
| www.gnu.org/philosophy/essays-and-articles.html | 59 | | |
| www.huffpost.com/entry/ubersocial-ubertwitter_n_825360 | 60 | | |
| www.instagram.com/about/us/ | 304 | | |
| www.json.org/json-en.html | 136 | | |
| www.kaggle.com/datasets/rtatman/fraudulent-email-corpus | 59 | | |
| www.linkedin.com/posts/sarahgwight_user-agreement-linkedin-activity-6994402330884386816-UAkW/ | 59 | | |
| www.nytimes.com/interactive/2016/12/10/business/media/pizzagate.html | 60 | | |
| www.thunderbird.net/en-US/features/ | 59 | | |
| www.versionmuseum.com/history-of/youtube-website | 60 | | |
| www.w3.org/TR/1999/REC-html401-19991224/intro/sgmltut.html#h-3.2.1 | 136 | | |
| www.eff.org/it/deeplinks/2022/04/scraping-public-websites-still-isnt-crime-court-appeals-declares | 62 | | |
| www.gnu.org/philosophy/free-sw.html | 63 | | |
| www.technologyreview.com/2022/08/31/1058800/what-does-gpt-3-know-about-me/ | 64 | | |
| www.theguardian.com/technology/2018/jan/12/google-racism-ban-gorilla-black-people | 64 | | |
| www.harvardmagazine.com/2000/01/code-is-law-html | 66 | | |
| www.nytimes.com/2013/03/17/opinion/sunday/morozov-open-and-closed.html | 66 | | |
| www.eff.org/it/deeplinks/2021/07/eff-ninth-circuit-recent-supreme-court-decision-van-buren-does-not-criminalize-web | 68 | | |
| www.niso.org/publications/understanding-metadata-2017 | 68 | | |
| www.gnu.org/philosophy/open-source-misses-the-point.en.html | 68 | | |
| www.gnu.org/licenses/rms-why-gplv3.html | 68 | | |
| www.gnu.org/philosophy/stallman-kth.html | 68 | | |
| www.gnu.org/philosophy/nonsoftware-copyleft.html | 68 | | |
| www.laurenceanthony.net/software/antconc/ | 73 | | |
| https://lexically.net/wordsmith/ | 73 | | |
| https://nlp.fi.muni.cz/trac/noske | 73 | | |
| www.laurenceanthony.net/software/antconc/releases/AntConc411/license.pdf | 92 | | |
| www.youtube.com/watch?v=ka4yDJLtSSc | 94 | | |
| www.sketchengine.eu/wp-content/uploads/The_TenTen_Corpus_2013.pdf | 94 | | |
| www.sketchengine.eu/wp-content/uploads/ske-statistics.pdf | 95 | | |
| www.R-project.org | 95 | | |
| www.nltk.org/book/ | 104 | | |
| www.w3.org/International/questions/qa-what-is-encoding | 130 | | |
| www.w3.org/International/getting-started/characters | 130 | | |
| www.rfc-editor.org/rfc/rfc1866 | 136 | | |
| www.w3.org/standards/xml/core | 136 | | |
| www.iana.org/assignments/character-sets/character-sets.xhtml | 137 | | |
| www.laurenceanthony.net/software/encodeant/ | 137 | | |
| https://gwern.net/dnm-archive | 138 | | |
| www.lrec-conf.org/proceedings/lrec2004/pdf/480.pdf | 138 | | |
| www.britannica.com/event/January-6-U-S-Capitol-attack | 138 | | |
| www.theguardian.com/technology/2018/may/25/gdpr-us-based-news-websites-eu-internet-users-la-times | 138 | | |
| www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/introduction.html | 139 | | |
| www.crummy.com/software/BeautifulSoup/ | 161 | | |
| www.digitalocean.com/community/tutorials/how-to-scrape-web-pages-with-beautiful-soup-and-python-3 | 180 | | |
| www.robotstxt.org/robotstxt.html | 302 | | |
| www.theguardian.com/news/series/cambridge-analytica-files | 302 | | |
| www.gnu.org/software/wget/ | 303 | | |
| www.markdownguide.org/cheat-sheet | 303 | | |
| www.tweepy.org | 303 | | |
| www.youtube.com/watch?v=WySXiFsG0qU | 305 | | |
| www.unicode.org/emoji/charts/full-emoji-list.html | 305 | | |
| www.digit.fyi/facebook-and-instagram-to-age-gate-sexual-content-for-minors/ | 306 | | |
| www.laurenceanthony.net/software | 306 | | |
| www.theguardian.com/news/2018/mar/17/cambridge-analytica-facebook-influence-us-election | 307 | | |
| www.theguardian.com/news/2018/mar/26/the-cambridge-analytica-files-the-story-so-far | 308 | | |
| www.eff.org/deeplinks/2020/11/riaa-abuses-dmca-take-down-popular-tool-downloading-online-video | 308 | | |
| www.eff.org/deeplinks/2020/11/github-reinstates-youtube-dl-after-riaas-abuse-dmca | 308 | | |
| www.kielipankki.fi/development/korp/corpus-input-format/ | 309 | | |
| www.unicode.org/L2/L2010/10132-emojidata.pdf | 311 | | |
| www.swansea.ac.uk/gdpo/ | 369 | | |
| www.unodc.org/unodc/en/commissions/CND/session/cnd-documents-index.html | 369 | | |
| www.counterextremism.com/supremacy/traditionalist-worker-party-traditional-youth-network | 369 | | |
| www.australia.gov.au/directories/australia/centrelink | 369 | | |
| www.swansea.ac.uk/project-dragon-s/ | 369 | | |
| www.pjfi.org/ | 370 | | |
| https://libav.org | 305 | | The project was abandoned in late 2022, and the website is now deactivated. |
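
Searching the list by a portion of a link, as described before the table, amounts to a substring match on the URL column. The sketch below is only an illustration of that idea, not the compendium's actual Search function: the `Entry` type, the `search` helper and the four-row sample (copied from the table) are assumptions made for the example.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """One table row: the printed URL and the page of its first occurrence."""
    url: str
    page: int


# A small sample of rows taken from the table (hypothetical in-memory form).
ENTRIES = [
    Entry("https://aoir.org", 59),
    Entry("https://aoir.org/reports/ethics3.pdf", 63),
    Entry("www.gnu.org/software/wget/", 303),
    Entry("https://libav.org", 305),
]


def search(entries: list[Entry], fragment: str) -> list[Entry]:
    """Return the entries whose URL contains `fragment`, ignoring case."""
    needle = fragment.strip().lower()
    if not needle:
        return []
    return [entry for entry in entries if needle in entry.url.lower()]


if __name__ == "__main__":
    for entry in search(ENTRIES, "aoir"):
        print(entry.page, entry.url)
```

Because some links are printed with a scheme (`https://`) and others without (`www.`), matching on a distinctive part of the domain or path is more reliable than typing the full address.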