Trafilatura
Data can be collected from websites using Trafilatura, a tool oriented towards corpus linguistics.
Options and arguments for the tool can be found in the official documentation.
Installing the tool (CATLISM, 156)
pip install trafilatura
[c5.04]
pip install gooey
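After running the two pip commands above, one way to confirm that both packages are visible to the current Python interpreter is a short check like the one below. This is a minimal sketch using only the standard library; the helper name `check_installed` is hypothetical and is not part of Trafilatura.

```python
from importlib.util import find_spec


def check_installed(packages):
    """Map each top-level package name to True if Python can locate it."""
    return {name: find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    # The two package names are the ones installed with pip above.
    for name, found in check_installed(["trafilatura", "gooey"]).items():
        print(f"{name}: {'installed' if found else 'missing'}")
```

If either package is reported as missing, the corresponding pip command was probably run in a different Python environment.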
Using the tool (CATLISM, 156-157; 159)
trafilatura
trafilatura_gui
trafilatura -i list.txt -o txtfiles
[c5.09]
In the videos, the argument -vv is added to this and the following two commands so that trafilatura prints messages to the console; without it, nothing is printed when the operation succeeds.
trafilatura --xml -i list.txt -o xmlfiles
[c5.09]
trafilatura --xml --formatting --links --images --inputdir localHTML -o xmlfiles
[c5.10]
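The batch commands above read their URLs from list.txt. Assuming the file holds one URL per line, a list of this kind could be prepared as sketched below. The helper name `write_url_list` is an illustrative assumption, not something provided by Trafilatura.

```python
from pathlib import Path


def write_url_list(urls, path="list.txt"):
    """Write unique, non-empty URLs to `path`, one per line, keeping order."""
    seen = set()
    cleaned = []
    for url in urls:
        url = url.strip()
        # Skip blank entries and URLs that were already added.
        if url and url not in seen:
            seen.add(url)
            cleaned.append(url)
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return cleaned
```

Calling `write_url_list(urls)` with the default path produces a list.txt that can be passed to the -i argument used in the commands above.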
Example of data extracted with Trafilatura (CATLISM, 158-159)
<doc sitename="The Guardian" title="‘Mind-blowing’: Ai-Da becomes first robot to paint like an artist" author="Caroline Davies" date="2022-04-04" source="https://www.theguardian.com/technology/2022/apr/04/mind-blowing-ai-da-becomes-first-robot-to-paint-like-an-artist" hostname="theguardian.com" excerpt="AI algorithms prompt robot to interrogate, select, decision-make to create a painting" categories="Technology" tags="Robots,Technology,Artificial intelligence (AI),Art,Computing,Consciousness" fingerprint="ETYg93u3aaAiAbJW0sOWW472T+4=">
  <main>
    <p>Brush clamped firmly in bionic hand, Ai-Da’s robotic arm moves slowly, dipping in to a paint palette then making slow, deliberate strokes across the paper in front of her.</p>
    <p>This, according to Aidan Meller, the creator of the world’s first ultra-realistic humanoid robot, Ai-Da, is “mind-blowing” and “groundbreaking” stuff.</p>
    [...]
    <graphic src="https://i.guim.co.uk/img/media/bce15258910b44eefb4855a9cdd5f87d76725d59/0_168_5068_3042/master/5068.jpg?width=620&quality=85&fit=max&s=e724f5ca70f9995e856648c2556a7637" alt="Ai-Da takes more than five hours to make a painting, but no two works are exactly the same." />
    <p>
      With rapidly developing
      <ref target="https://www.theguardian.com/technology/artificialintelligenceai">artificial intelligence</ref>
      , growing accessibility to super computers and machine learning on the up, Ai-Da – named after the computing pioneer Ada Lovelace – exists as a “comment and critique” on rapid technological change.
    </p>
    <graphic src="https://i.guim.co.uk/img/media/c52e761d528cc2685705ba8a19956155389ccd5b/0_8_4480_5600/master/4480.jpg?width=380&quality=85&fit=max&s=36006bc6f77064e854f4c2301804bc78" alt="Ai-Da Robot with creator Aidan Meller." />
  </main>
  <comments />
</doc>
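An XML file shaped like the example above can be read back with Python's standard library, for instance to build a corpus table of metadata and text. The following is a minimal sketch under stated assumptions: `SAMPLE` is a shortened, hand-written document modelled on the example (not real tool output), and the function name `read_document` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened sample modelled on the example above (not real tool output).
SAMPLE = """<doc sitename="The Guardian" author="Caroline Davies" date="2022-04-04">
  <main>
    <p>Brush clamped firmly in bionic hand.</p>
    <p>
      With rapidly developing
      <ref target="https://www.theguardian.com/technology/artificialintelligenceai">artificial intelligence</ref>
      , Ai-Da exists as a comment on technological change.
    </p>
  </main>
  <comments />
</doc>"""


def read_document(xml_text):
    """Return (metadata dict, list of paragraph strings) for one <doc>."""
    root = ET.fromstring(xml_text)
    metadata = dict(root.attrib)  # sitename, author, date, ...
    paragraphs = []
    main = root.find("main")
    if main is not None:
        for p in main.iter("p"):
            # itertext() also collects text inside nested elements such as
            # <ref>; split/join collapses line breaks and indentation.
            paragraphs.append(" ".join("".join(p.itertext()).split()))
    return metadata, paragraphs


if __name__ == "__main__":
    meta, paras = read_document(SAMPLE)
    print(meta["author"], meta["date"], len(paras))
```

Because whitespace is collapsed to single spaces, punctuation that starts a new line after a <ref> element ends up preceded by a space; a corpus pipeline may want to normalise this in a later step.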