Other elements

Transformation of various elements (e.g. URLs, email addresses) from their original format into XML elements can be achieved by using specific regular expressions in conjunction with script [s5.17]. As noted in the volume [CATLISM, 287]:

"while more efficient and safer options exist (i.e. the use of the lxml module to modify an existing XML file to avoid the deletion of elements that may result in a malformed structure), the advantage of this strategy is that it can be applied to any type of file (.txt, .csv, .xml, .json, etc.) and adapted to transform [any element] into any required syntax."
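The strategy described above can be sketched in Python as follows. This is not script [s5.17] itself but a minimal illustration under stated assumptions: the file is read as plain UTF-8 text whatever its extension, every match of a regular expression is enclosed in an XML element, and the result is written to a new file. The email pattern `EMAIL_RE`, the functions `wrap_matches` and `transform_file`, and the element name `<email>` are hypothetical names chosen for this example.

```python
import re
from pathlib import Path

# Illustrative email pattern: an assumption for this sketch, not one of the
# regular expressions given in the book.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def wrap_matches(text, pattern, tag):
    """Enclose every match of `pattern` in <tag>...</tag>."""
    return pattern.sub(lambda m: f"<{tag}>{m.group(0)}</{tag}>", text)


def transform_file(in_path, out_path, pattern, tag):
    """Apply wrap_matches to any text-based file (.txt, .csv, .xml, .json, ...)."""
    text = Path(in_path).read_text(encoding="utf-8")
    Path(out_path).write_text(wrap_matches(text, pattern, tag), encoding="utf-8")


print(wrap_matches("write to info@example.org now", EMAIL_RE, "email"))
# prints: write to <email>info@example.org</email> now
```

Because the substitution works on raw text, nothing checks that the output is well-formed; for an existing XML file, the book points to the lxml module as the safer option.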

Each regular expression is accompanied by a direct link to its interactive version on RegExr, the tool suggested in the book for inspecting and creating regular expressions.

Regular expression to capture usernames (e.g. @matteodic) [CATLISM, 289]

Example [e5.24]
(?<=^|\s)(@[\w.]+)(?<!\.)

Inspect regular expression [e5.24] on RegExr
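A minimal sketch of how [e5.24] might be used from Python. One caveat: RegExr accepts the lookbehind `(?<=^|\s)`, but Python's built-in `re` module raises `re.error` for it, because its two alternatives have different widths (`^` is zero-width, `\s` matches one character). The sketch therefore rewrites the check as the equivalent `(?:^|(?<=\s))`; the third-party `regex` module, which supports variable-width lookbehinds, is another option. The `<username>` element and the function names are assumptions made for this example.

```python
import re

# Equivalent of [e5.24] for Python's `re`: the variable-width lookbehind
# (?<=^|\s) is split into "start of string" OR a fixed-width (?<=\s).
USERNAME_RE = re.compile(r"(?:^|(?<=\s))(@[\w.]+)(?<!\.)")


def find_usernames(text):
    """Return every @username at the start of the text or after whitespace."""
    return USERNAME_RE.findall(text)


def tag_usernames(text, tag="username"):
    """Enclose each username in a (hypothetical) <username> element."""
    # The match contains only the username, so group(1) replaces it whole.
    return USERNAME_RE.sub(lambda m: f"<{tag}>{m.group(1)}</{tag}>", text)


sample = "Thanks @matteodic. See @user_1 or write to a@b.com"
print(find_usernames(sample))  # ['@matteodic', '@user_1']
print(tag_usernames(sample))
```

The trailing `(?<!\.)` makes `[\w.]+` backtrack so that a sentence-final full stop is not captured as part of the username, while `a@b.com` is skipped because its `@` is not preceded by whitespace.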

Regular expression to capture simple URLs (e.g. http://example.com and https://example.com) [CATLISM, 289]

Example [e5.25]
http[s]?:\/\/(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+

Inspect regular expression [e5.25] on RegExr
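Pattern [e5.25] compiles unchanged with Python's `re` module. The sketch below wraps each URL in an element (the `<url>` element and the function name are hypothetical) and escapes `&`, `<` and `>` inside the match with `xml.sax.saxutils.escape`, since a raw `&` in a query string would otherwise make the XML output malformed; the rest of the text is assumed to be XML-safe already.

```python
import re
from xml.sax.saxutils import escape

# Regular expression [e5.25], unchanged. All its groups are non-capturing,
# so findall() returns whole URLs.
URL_RE = re.compile(
    r"http[s]?:\/\/(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+"
)


def tag_urls(text, tag="url"):
    """Enclose each URL in a (hypothetical) <url> element, escaping & < >."""
    return URL_RE.sub(lambda m: f"<{tag}>{escape(m.group(0))}</{tag}>", text)


sample = "Visit http://example.com or https://example.com/?a=1&b=2 today"
print(URL_RE.findall(sample))  # ['http://example.com', 'https://example.com/?a=1&b=2']
print(tag_urls(sample))
```

Since `.`, `,` and `)` all fall inside the accepted character ranges, a URL at the end of a sentence (e.g. `https://example.com.`) is captured together with the final full stop.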

Regular expression to capture cashtags (e.g. $EUR) [CATLISM, 289]

Example [e5.27]
(?:^|\s)([\$]{1})(\w+)

Inspect regular expression [e5.27] on RegExr
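Unlike [e5.24], pattern [e5.27] consumes the whitespace that precedes the `$` through `(?:^|\s)`, and it captures the symbol and the name in two separate groups. A substitution must therefore put the consumed whitespace back, as in this sketch (the `<cashtag>` element and the function names are hypothetical).

```python
import re

# Regular expression [e5.27], unchanged: group 1 is "$", group 2 the name.
CASHTAG_RE = re.compile(r"(?:^|\s)([\$]{1})(\w+)")


def find_cashtags(text):
    """Return cashtags as strings, joining the two captured groups."""
    # With two groups, findall() returns (symbol, name) tuples.
    return ["".join(groups) for groups in CASHTAG_RE.findall(text)]


def tag_cashtags(text, tag="cashtag"):
    """Enclose each cashtag in a (hypothetical) <cashtag> element."""
    def repl(m):
        # Keep whatever (?:^|\s) consumed before the "$" (a space, or nothing).
        prefix = m.group(0)[: m.start(1) - m.start(0)]
        return f"{prefix}<{tag}>{m.group(1)}{m.group(2)}</{tag}>"
    return CASHTAG_RE.sub(repl, text)


sample = "$EUR fell while $USD rose, not US$5"
print(find_cashtags(sample))  # ['$EUR', '$USD']
print(tag_cashtags(sample))
```

Because `\w` also matches digits, an amount such as `$5` preceded by a space is captured as a cashtag too; restricting the name (for instance to letters) is a possible refinement.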