Archivebox

Archivebox can be used to collect data from websites.
The tool's options and arguments are described in the official documentation.

¹CATLISM, 152

Installing the tool ¹

Command [c5.01]

pip install archivebox

²CATLISM, 153

Using the tool ²

Command [c5.02]

archivebox init --setup

Command [c5.03]

archivebox server

In the video, a local folder for storing Archivebox settings and downloaded data is created with the command mkdir archivebox_folder on Unix-like systems.
Windows users can instead use md archivebox_folder.
Once [c5.03] has been issued, the web application can be accessed by browsing to http://127.0.0.1:8000 (as indicated in the CLI).
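The steps described here can be put together as a short shell sketch for Unix-like systems. It is an illustration rather than the exact sequence shown in the video: it assumes that [c5.02] and [c5.03] are run from inside the newly created folder, and the DATA_DIR variable is a hypothetical name introduced only for this example.

```shell
# Folder name taken from the video; DATA_DIR is a hypothetical variable
# used here so the name can be overridden if needed.
DATA_DIR="${DATA_DIR:-archivebox_folder}"

# Create the folder; -p avoids an error if it already exists.
mkdir -p "$DATA_DIR"

# Assumption: the two commands are run from inside the folder.
# A subshell keeps the caller's working directory unchanged.
if command -v archivebox >/dev/null 2>&1; then
    (
        cd "$DATA_DIR" || exit 1
        archivebox init --setup   # [c5.02]
        archivebox server         # [c5.03], keeps running until interrupted
    )
else
    # Nothing is started when the tool is missing.
    echo "archivebox not found on PATH; run [c5.01] first" >&2
fi
```

If the tool is not yet installed, the sketch only creates the folder and prints a reminder. Once the server is running, the address reported in the CLI can be opened in a browser.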

³CATLISM, 153-155

Extracting the data ³

Figure 5.3 Archivebox main page

Figure 5.4 Archivebox URLs collection page

Figure 5.5 Archivebox main page showing the list of collected web pages
