Archivebox#
Data collection from websites can be obtained using Archivebox
.
Options and arguments for the tool can be found in the official documentation.
CATLISM, 152
Installing the tool1CATLISM, 152
#
Command
[c5.01]
#pip install archivebox
[c5.01]
CATLISM, 153
Using the tool2CATLISM, 153
#
Command
[c5.02]
#archivebox init --setup
Command
[c5.03]
#archivebox server
In the video, a local folder for storing archivebox
settings and downloaded data is created through the command mkdir archivebox_folder
, only available in Unix-like systems ( and ).
Windows user should instead employ md archivebox_folder
().
Once [c5.03]
is issued, it is possible to access the web application by browsing the address htpp://127.0.0.1:8000
(as indicated in the CLI).
[c5.02-03]
CATLISM, 153-155
Extracting the data3CATLISM, 153-155
#

Figure 5.3 Archivebox main page#

Figure 5.4 Archivebox URLs collection page#

Figure 5.5 Archivebox main page showing the list of collected web pages#