Data processing
- Date, time, and Unix
- Text normalisation
- PDF, Word, images
- Language detection
- Emoticons and emojis
- Hashtags (word segmentation)
- Other elements
  - Regular expression to capture usernames (e.g. @matteodic)
  - Regular expression to capture simple URLs (e.g. http://example.com and https://example.com)
  - Regular expression to capture complex URLs (e.g. simple URLs plus email addresses, mailto: links, URLs with optional parameters)
  - Regular expression to capture cashtags (e.g. $EUR)
- Annotations
- Verticalised (.vrt) format
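The four regular-expression items under "Other elements" can be illustrated with a minimal Python sketch using the standard `re` module. The patterns are assumptions made for illustration, not the expressions used in the original material: handles are assumed to follow Twitter-style rules (1 to 15 ASCII letters, digits or underscores), cashtags are assumed to be 1 to 6 letters, and the names `USERNAME_RE`, `SIMPLE_URL_RE`, `COMPLEX_URL_RE` and `CASHTAG_RE` are hypothetical.

```python
import re

# All patterns below are illustrative assumptions, not the course's own expressions.

# Usernames: "@" plus 1-15 ASCII letters, digits or underscores (Twitter-style
# handle rules assumed). The lookbehind skips the "@" inside e-mail addresses.
USERNAME_RE = re.compile(r"(?<![\w@])@[A-Za-z0-9_]{1,15}(?![A-Za-z0-9_])")

# Simple URLs: scheme plus host name only; any path or query is not captured.
SIMPLE_URL_RE = re.compile(r"https?://[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}")

# Complex URLs: e-mail addresses (optionally prefixed by "mailto:") or
# http(s) URLs with an optional path, query parameters and fragment.
COMPLEX_URL_RE = re.compile(
    r"""
    (?:
        (?:mailto:)?                                      # optional mailto: prefix
        [A-Za-z0-9._%+-]+@                                # e-mail local part
        [A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}   # e-mail domain
      |
        https?://[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}  # scheme + host
        (?:/[^\s?\#]*)?                                   # optional path
        (?:\?[^\s\#]*)?                                   # optional query parameters
        (?:\#\S*)?                                        # optional fragment
    )
    (?<![.,;:!?)])      # backtrack so trailing punctuation is not captured
    """,
    re.VERBOSE,
)

# Cashtags: "$" plus 1-6 letters, so amounts such as "$100" are not matched.
CASHTAG_RE = re.compile(r"(?<![\w$])\$[A-Za-z]{1,6}(?![A-Za-z0-9_])")

if __name__ == "__main__":
    sample = (
        "Ask @matteodic about $EUR: see http://example.com and "
        "https://example.com/docs?page=2, or write to mailto:matteo@example.com."
    )
    for name, pattern in [
        ("usernames", USERNAME_RE),
        ("simple URLs", SIMPLE_URL_RE),
        ("complex URLs", COMPLEX_URL_RE),
        ("cashtags", CASHTAG_RE),
    ]:
        print(f"{name}: {pattern.findall(sample)}")
```

The lookbehind in `USERNAME_RE` keeps the "@example" inside an e-mail address from being counted as a username, and the final negative lookbehind in `COMPLEX_URL_RE` forces the engine to backtrack so that a comma or full stop after a URL is left out of the match. Real-world URL and e-mail syntax is considerably broader than these patterns cover, so they are best treated as a starting point for tweets and similar short texts.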