# Reference list for online contents

The table below lists the commands (IDs starting with c), examples (e), scripts (s), tables, and figures included in the book and contained in the online compendium. The online version of each element can be accessed by clicking on the relevant ID field.
Elements are listed in the order in which they appear in the book. Clicking on a column header changes the sorting order, while the Search field looks for a string across all contents (i.e. the search is applied to all columns at once). A generic sketch of the conda and git syntax behind entries [c4.01]-[c4.13] is provided after the table.

| ID | Description | Page |
| --- | --- | --- |
| e4.01 | Default appearance of the terminal when a conda environment is active | 101 |
| c4.01 | Command to create a new virtual environment in conda | 101 |
| c4.02 | Command to activate a virtual environment in conda | 101 |
| c4.03 | Command to deactivate the virtual environment in conda | 101 |
| e4.02 | Default appearance of the terminal when the conda environment myenv is active | 101 |
| c4.04 | Command to install pip in conda | 101 |
| c4.06 | Initialise git in a local folder | 124 |
| c4.07 | Clone a remote repository | 124 |
| c4.08 | Add all changes (even from previously untracked files) to the local git database (i.e. stage the changes) | 125 |
| c4.09 | Record (commit) all changes, along with a textual description of what has been changed | 125 |
| c4.10 | Send (push) all changes to the remote repository | 125 |
| c4.11 | Obtain (fetch) all changes from the remote repository | 125 |
| c4.12 | Include/apply (merge) all fetched changes into the local repository | 125 |
| c4.13 | Obtain and include/apply (pull) all changes from the remote repository into the local repository | 125 |
| Figure 5.1 | #LancsBox main interface | 149 |
| Figure 5.2 | #LancsBox data collection interface | 149 |
| c5.01 | Install archivebox | 152 |
| c5.02 | Set up archivebox | 153 |
| c5.03 | Start archivebox | 153 |
| c5.04 | Install trafilatura | 156 |
| c5.05 | Start the CLI version of trafilatura | 156 |
| c5.06 | Start the GUI version of trafilatura | 156 |
| c5.07 | Install the gooey package | 157 |
| c5.08 | Use trafilatura to download the .txt version of the URLs contained in a list | 157 |
| c5.09 | Use trafilatura to download the .xml version of the URLs contained in a list | 157 |
| e5.04 | Example of the XML structure created by trafilatura | 158-159 |
| c5.10 | Use trafilatura to extract XML files from local HTML files, including formatting, links, and images | 159 |
| s5.01 | Extract links from HTML pages using BeautifulSoup | 162-163 |
| s5.02a | Download and scrape HTML pages from the links extracted with [s5.01], using Selenium and BeautifulSoup | 164-166 |
| s5.02b | Download and scrape HTML pages from the links extracted with [s5.01], using requests and BeautifulSoup | 166 |
| s5.03 | Extract metadata from the downloaded HTML pages using BeautifulSoup | 166-171 |
| e5.08 | Basic structure of the metadata table included in MoreThesis pages | 171-173 |
| s5.04 | Download the PDF files linked in HTML pages | 174-175 |
| s5.05 | Extract the contents of PDF files as plain text using textract | 176 |
| s5.06 | Create an XML corpus combining the metadata from HTML pages and the contents of PDF files, using lxml | 177-180 |
| c5.11 | Install snscrape | 183 |
| c5.12 | Basic snscrape syntax | 183 |
| c5.13 | Access the “help” section for a specific snscrape scraper | 183 |
| c5.14 | Basic syntax to scrape tweets to .jsonl using an snscrape advanced search query | 191 |
| c5.15 | Example of how to scrape tweets to .jsonl using an snscrape advanced search query | 191 |
| c5.16 | Example of an snscrape advanced search query using operators | 192 |
| c5.17 | Example of an snscrape advanced search query using operators | 192 |
| c5.18 | Example of an snscrape advanced search query using operators | 192 |
| c5.19 | Example of an snscrape advanced search query using operators | 192 |
| c5.20 | Example of an snscrape advanced search query using operators | 192 |
| c5.21 | Install pandas | 193 |
| c5.22 | Use script [s5.07] to scrape tweets using a list of queries | 193 |
| s5.07 | Scrape tweets with snscrape using a list of queries | 193-196 |
| e5.09 | Example of a filename saved by script [s5.07] | 196 |
| Table 5.12 | Metadata data points collected by snscrape | 197-203 |
| e5.10 | Example of data extracted with [s5.08] | 204 |
| s5.08 | Convert tweets extracted with snscrape to XML format | 204-206 |
| c5.23 | Install instaloader | 206 |
| c5.24 | Basic instaloader syntax | 206 |
| c5.25 | Example of an instaloader scraping command that downloads comments and geolocations | 208 |
| Table 5.17 | Metadata data points collected by instaloader for posts | 209-218 |
| Table 5.18 | Metadata data points collected by instaloader for comments | 218 |
| e5.11 | Example of data extracted with [s5.09] | 220 |
| s5.09 | Convert Instagram posts and comments extracted with instaloader to XML format | 220-226 |
| c5.26 | Install facebook-scraper | 228 |
| c5.27 | Basic facebook-scraper syntax | 228 |
| Table 5.21 | Metadata data points collected by facebook-scraper for posts | 229-233 |
| Table 5.22 | Metadata data points collected by facebook-scraper for profiles | 233-235 |
| Table 5.23 | Metadata data points collected by facebook-scraper for groups | 236 |
| e5.12 | Example of data extracted with [s5.10] | 236-237 |
| s5.10 | Convert Facebook posts and comments extracted with facebook-scraper to XML format | 237-242 |
| s5.11 | Get profile details from Facebook using facebook-scraper | 242-245 |
| s5.12 | Implement the collection of profile details ([s5.11]) into [s5.10] | 245-246 |
| c5.28 | Install youtube-dl | 247 |
| c5.29 | Basic youtube-dl syntax | 247 |
| e5.14 | Example of the TTML format | 252-253 |
| e5.15 | Example of the SRV format without auto-captioning | 253 |
| e5.16 | Example of the SRV format with auto-captioning | 253-254 |
| c5.30 | Install youtube-comment-downloader | 254 |
| c5.31 | Basic youtube-comment-downloader syntax | 254 |
| Table 5.28 | Metadata data points collected by youtube-dl for videos | 255-262 |
| Table 5.29 | Metadata data points collected by youtube-comment-downloader for comments | 262 |
| c5.32 | Extract video details, metadata, and subtitles from YouTube without downloading the multimedia files | 263 |
| e5.18 | Example of data extracted with [s5.13] | 264 |
| e5.19 | Example of data extracted with [s5.14] | 264 |
| s5.13 | Extract the collected YouTube data (everything except comments) to XML format | 264-269 |
| s5.14 | Extract the collected YouTube comments to XML format | 269-272 |
| s5.15 | Sample usage of the dateutil.parser module to parse a date in string format | 274 |
| Figure 5.8 | Example of recognised spelling variants in VARD | 276 |
| Figure 5.9 | Example of unrecognised spelling variants in VARD | 277 |
| e5.21 | Example of normalised data in XML format generated with VARD | 278 |
| c5.33 | Install textract | 279 |
| c5.34 | Basic textract syntax | 279 |
| s5.16 | Identify a set of predefined languages in .txt files and write a summary report in spreadsheet format | 281-283 |
| e5.23 | Example of hashtags transformed through [s5.17] | 286 |
| s5.17 | Segment hashtags and transform them into XML tags in an XML corpus file | 287-288 |
| e5.24 | Regular expression to capture usernames/username handles | 289 |
| e5.25 | Regular expression to capture simple URLs | 289 |
| e5.26 | Regular expression to capture complex URLs | 289 |
| e5.27 | Regular expression to capture cashtags | 289 |
| c5.35 | Install stanza | 291 |
| s5.18 | Install stanza language models | 291 |
| e5.28 | Example of data in XML format extracted with [s5.20] | 292 |
| s5.19 | Annotate .txt files and output the results in XML format | 292-294 |
| s5.20 | Annotate .txt files and output the results in verticalised XML format | 294-295 |
| e5.29 | Example of the verticalised XML format | 296 |
| Figure 5.10 | OpenRefine main page | 298 |
| Figure 5.11 | Preview for CSV import in OpenRefine | 299 |
| Figure 5.12 | Preview for JSON import in OpenRefine | 299 |
| Figure 5.13 | Preview for XML import (step 1) in OpenRefine | 300 |
| Figure 5.14 | Preview for XML import (step 2) in OpenRefine | 300 |
| Figure 5.15 | Using ‘facets’ (filters) in OpenRefine | 301 |
| Figure 6.1 | Example of a page collected from the Silk Road 1 forum | 317 |
| e6.01 | Example (modified) of the post structure in Silk Road 1 HTML pages | 320-321 |
| e6.02 | XML meta-structure of the data extracted through [s6.01] (Silk Road 1 corpus) | 321-322 |
| s6.01 | Convert Silk Road 1 HTML pages to XML format using BeautifulSoup | 323-328 |
| e6.03 | XML meta-structure of the documents included in the DPM corpus | 328 |
| c6.01 | Scrape tweets created after a specific date with twint | 337 |
| c6.02 | Scrape tweets created after a specific date with snscrape (replicating the results produced with [c6.01]) | 338 |
| s6.02 | Convert tweets extracted with twint from CSV to XML format | 338-340 |
| e6.04 | Example of data extracted with [s6.02] | 340 |
| e6.05 | Example of the syntax used by WordPress to show all the posts available in a website | 342 |
| s6.03 | Collect (crawl) all post links from a WordPress website | 342-344 |
| e6.06 | Example of a message containing an emoji | 344 |
| e6.07 | Examples of the emoji transliterations applied to [e6.06] through [s6.04] | 344-345 |
| s6.04 | Function to transliterate emojis using the emoji module | 345-346 |
| e6.08 | Example of data extracted with [s6.06] (PJ corpus) | 352-353 |
| s6.05 | Collect all chatlogs from perverted-justice.com | 353-357 |
| s6.06 | Convert PJ chatlogs into XML format | 360-366 |
| Figure 6.3 | Example of the interactive plot created for the visual exploration of collocations | 368 |