Corpus Approaches to Language in Social Media | Online Compendium

Note

You are browsing version v1.0.0

This website serves as the online compendium for the book Corpus Approaches to Language in Social Media (CATLISM; Di Cristofaro 2023), published in the series Routledge Advances in Corpus Linguistics. A preview of the book (Chapter 1) is available from taylorfrancis.com.

The focus of both this compendium and the book is the collection, processing, and formatting of digital data from social media (understood as “any digital content [that] provides the user with the ability to interact with it […] through a unique uniform resource locator (URL)”; CATLISM, 4) for corpus purposes (see also On scripts and tools).
The aim is “proposing a broad view of corpus approaches able to include those notions and mechanisms [‘digital technicalities’] that – while not classically associated with natural language – are […] i) foundational of the digital environments in which language production and exchanges occur and ii) at the core of the techniques that are used to produce, collect, and process the focus of investigation, that is, digital textual data” (CATLISM, 2-4).

As such, this online compendium contains:

the scripts included in the volume, downloadable and formatted using colour-coded syntax highlighting, aimed at collecting and processing data from webpages, blogs, fora, Facebook, Instagram, Twitter, and YouTube (version 1.0.0 reflects the contents, scripts, and code snippets as they appear in the printed book; subsequent versions may contain modifications and updates, which are listed in the changelog);

interactive videos documenting the use of the commands and tools employed throughout the volume;

further scripts and instructions for tools aimed at collecting data from platforms that, due to reasons of space, could not be included in the volume;

updates to scripts and commands in case, due to technical changes, they become ineffective or outdated;

updates to topics discussed in the book;

links to preservation copies of all the online materials referenced in the volume, as archived through the Wayback Machine.

Where possible and unless stated differently (e.g. in the case of quotations), all the textual contents are published under Creative Commons CC BY-NC 4.0, while all the scripts are licensed under the open source GPLv3 licence. See the FAQs for more details on how to (re)use the materials.

Important

Descriptions and further details for scripts and code originally available in the book are left out of this compendium. Scripts and code exclusive to this online compendium are fully described and detailed in each relevant page/section.
A number of answers to common questions are included in the FAQs section.

How to use this online compendium

Consult the Using the online compendium section for more details on how to use this website, as well as a legend of the symbols used throughout the pages.

Structure of the online compendium

Contents