Archivebox#

Data collection from websites can be obtained using Archivebox.
Options and arguments for the tool can be found in the official documentation.

1CATLISM, 152

Installing the tool1CATLISM, 152#

Command [c5.01]#
pip install archivebox
[c5.01]
2CATLISM, 153

Using the tool2CATLISM, 153#

Command [c5.02]#
archivebox init --setup
Command [c5.03]#
archivebox server

In the video, a local folder for storing archivebox settings and downloaded data is created through the command mkdir archivebox_folder, only available in Unix-like systems ( and ).
Windows user should instead employ md archivebox_folder ().
Once [c5.03] is issued, it is possible to access the web application by browsing the address htpp://127.0.0.1:8000 (as indicated in the CLI).

[c5.02-03]
3CATLISM, 153-155

Extracting the data3CATLISM, 153-155#

Figure 5.3 Archivebox main page

Figure 5.3 Archivebox main page#

Figure 5.4 Archivebox URLs collection page

Figure 5.4 Archivebox URLs collection page#

Figure 5.5 Archivebox main page showing the list of collected web pages

Figure 5.5 Archivebox main page showing the list of collected web pages#