Data processing
- Date, time, and Unix
- Text normalisation
- PDF, Word, images
- Language detection
- Emoticons and emojis
- Hashtags (word segmentation)
- Other elements
  - Regular expression to capture usernames (e.g. @matteodic)
  - Regular expression to capture simple URLs (e.g. http://example.com and https://example.com)
  - Regular expression to capture complex URLs (e.g. simple URLs plus email addresses, mailto: links, URLs with optional parameters)
  - Regular expression to capture cashtags (e.g. $EUR)
- Annotations
- Verticalised (.vrt) format
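
As a rough illustration of the regular-expression entries listed under "Other elements", the sketch below shows one way usernames, simple URLs, and cashtags might be captured in Python. The patterns, the variable names, and the sample sentence are assumptions made for this example; they are not the expressions used in the linked pages, and they do not attempt the complex-URL case.

```python
import re

# Hypothetical patterns for illustration only.
# Username: "@" not preceded by a word character (so e-mail addresses are skipped).
USERNAME = re.compile(r"(?<!\w)@(\w+)")
# Simple URL: http:// or https:// followed by any run of non-whitespace characters.
SIMPLE_URL = re.compile(r"https?://\S+")
# Cashtag: "$" followed by one to six letters, e.g. $EUR.
CASHTAG = re.compile(r"(?<!\w)\$([A-Za-z]{1,6})\b")

text = "Thanks @matteodic, see https://example.com for $EUR rates"

usernames = USERNAME.findall(text)   # captured without the leading "@"
urls = SIMPLE_URL.findall(text)      # full matched URLs
cashtags = CASHTAG.findall(text)     # captured without the leading "$"

print(usernames, urls, cashtags)
```

The simple-URL pattern is deliberately permissive: it also swallows trailing punctuation that directly follows a URL, which is one reason a more careful expression is needed for complex URLs.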