Archivebox
Data can be collected from websites using Archivebox.
Options and arguments for the tool can be found in the official documentation.
CATLISM, 152
Installing the tool
Command [c5.01]
pip install archivebox
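After running [c5.01], pip may place the archivebox executable in a folder that is not on the PATH (pip prints a warning when this happens), and the later commands would then fail with "command not found". The minimal sketch below, assuming Python 3, checks whether the command can be found; the helper name is_on_path is hypothetical.

```python
import shutil


def is_on_path(command="archivebox"):
    """Return True if `command` can be found on the PATH (hypothetical helper)."""
    # shutil.which returns the full path of the executable, or None if not found
    return shutil.which(command) is not None


print("archivebox found" if is_on_path() else "archivebox not on PATH")
```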
CATLISM, 153
Using the tool
Command [c5.02]
archivebox init --setup
Command [c5.03]
archivebox server
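In the video, a local folder for storing Archivebox settings and downloaded data is created with the command mkdir archivebox_folder, as used on Unix-like systems. Windows users can employ md archivebox_folder instead (in the Windows Command Prompt, mkdir also works as an equivalent of md). Commands [c5.02] and [c5.03] must then be issued from inside this folder (e.g. after moving into it with cd archivebox_folder), since archivebox init sets up the current directory.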
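Folder creation is the only platform-dependent step. As a minimal sketch, assuming Python 3 and that [c5.01] has been run, pathlib creates the folder in the same way on every operating system, and subprocess can then run [c5.02] from inside it. The function setup_archivebox_folder and its run parameter are hypothetical, for illustration only; by default the function only creates the folder and returns the command without executing it.

```python
import subprocess
from pathlib import Path


def setup_archivebox_folder(folder="archivebox_folder", run=False):
    """Create the data folder and, if run=True, initialise ArchiveBox in it.

    Hypothetical helper: Path.mkdir behaves the same on Unix-like systems
    and on Windows, replacing the `mkdir` / `md` commands.
    """
    data_dir = Path(folder)
    # exist_ok=True: do not fail if the folder already exists
    data_dir.mkdir(parents=True, exist_ok=True)

    # Same command as [c5.02]; cwd=data_dir plays the role of `cd archivebox_folder`
    init_cmd = ["archivebox", "init", "--setup"]
    if run:
        # check=True raises CalledProcessError if archivebox exits with an error
        subprocess.run(init_cmd, cwd=data_dir, check=True)
    return data_dir, init_cmd
```

Calling setup_archivebox_folder(run=True) performs the equivalent of the folder creation plus [c5.02]; [c5.03] is then issued from the same folder.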
Once command [c5.03] is issued, the web application can be accessed by browsing to http://127.0.0.1:8000 (as indicated in the CLI output).
[c5.02-03]
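If the page at http://127.0.0.1:8000 does not load, a quick check is whether anything is listening on that address at all, i.e. whether [c5.03] is still running in its terminal. The sketch below, assuming Python 3, only tests that the port accepts TCP connections, not that the Archivebox page renders correctly; the helper name server_is_listening is hypothetical.

```python
import socket


def server_is_listening(host="127.0.0.1", port=8000, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds (hypothetical helper)."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError) if nothing listens
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```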
CATLISM, 153-155
Extracting the data
data:image/s3,"s3://crabby-images/0b340/0b3407d29ed9bcd2d958c007f5f79862bf4ab9ad" alt="Figure 5.3 Archivebox main page"
Figure 5.3 Archivebox main page
data:image/s3,"s3://crabby-images/89dfe/89dfec6ac97f09dd8b03186287597bf5dceba904" alt="Figure 5.4 Archivebox URLs collection page"
Figure 5.4 Archivebox URLs collection page
data:image/s3,"s3://crabby-images/783e0/783e0696560f60c3262cc79c27a198c5ee86e73a" alt="Figure 5.5 Archivebox main page showing the list of collected web pages"
Figure 5.5 Archivebox main page showing the list of collected web pages
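Besides the URLs collection page in Figure 5.4, Archivebox offers an archivebox add subcommand that accepts one or more URLs as arguments and, like [c5.02] and [c5.03], is issued from inside the data folder. The wrapper below is a minimal sketch: the function name add_urls is hypothetical, https://example.com is a placeholder, and the command is only executed when run=True.

```python
import subprocess
from pathlib import Path


def add_urls(urls, data_dir="archivebox_folder", run=False):
    """Build (and optionally run) `archivebox add` for a list of URLs (hypothetical helper)."""
    if not urls:
        raise ValueError("at least one URL is required")
    cmd = ["archivebox", "add", *urls]
    if run:
        # Executed from the data folder, as for [c5.02] and [c5.03]
        subprocess.run(cmd, cwd=Path(data_dir), check=True)
    return cmd
```

For example, add_urls(["https://example.com"]) returns the command list without running it; the collected pages would then appear in the main page list shown in Figure 5.5.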