Verticalised (.vrt) format
Resembling, and making use of, XML syntax, the verticalised format (VRT, VeRticalised text) is a “token-oriented columnar text format” (https://www.kielipankki.fi/development/korp/corpus-input-format/) in which each token occupies its own line and the tab character (Tab ↹) separates the token from its POS and lemma details (and potentially any further annotation as well). It is the default format accepted by the IMS Corpus Workbench ([Evert and Hardie, 2011]) as well as by a number of other corpus tools (e.g. Sketch Engine).
In example [e5.29], which shows a sample of the format, the symbol → is the graphical representation of the tab character.
<?xml version='1.0' encoding='UTF-8'?>
<text>
 <s n="1">
 And→and→CCONJ
 now→now→ADV
 ,→,→PUNCT
 for→for→ADP
 something→something→PRON
 completely→completely→ADV
 different→different→ADJ
 !→!→PUNCT
 </s>
</text>
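To make the column layout concrete, a file like the sample above could be read with a few lines of Python. This is only a minimal sketch, not the parser used by the Corpus Workbench or any other tool: the function name `parse_vrt` and the column labels `word`, `lemma` and `pos` are hypothetical, the column order is taken from the sample, and a line is treated as a structural tag only when it starts with `<`, ends with `>` and contains no tab.

```python
from typing import Dict, List, Sequence


def parse_vrt(
    text: str,
    columns: Sequence[str] = ("word", "lemma", "pos"),  # assumed labels and order
) -> List[List[Dict[str, str]]]:
    """Return a list of sentences, each a list of token dictionaries."""
    sentences: List[List[Dict[str, str]]] = []
    current = None  # tokens of the sentence being read, if any

    for raw in text.splitlines():
        line = raw.strip(" ")  # tolerate indentation, but keep tabs intact
        if not line:
            continue

        # Assumption: structural lines look like tags and never contain a tab.
        is_tag = "\t" not in line and line.startswith("<") and line.endswith(">")
        if is_tag:
            if line == "<s>" or line.startswith("<s "):
                current = []  # a new sentence starts
            elif line == "</s>" and current is not None:
                sentences.append(current)
                current = None
            # Any other tag (<?xml ...?>, <text>, </text>) is skipped here.
            continue

        # Token line: one tab-separated field per annotation column.
        if current is None:
            current = []  # token outside <s>: open an implicit sentence
        current.append(dict(zip(columns, line.split("\t"))))

    if current:  # flush a sentence that was never closed
        sentences.append(current)
    return sentences


# A shortened version of the sample, with real tab characters instead of →.
sample = "\n".join([
    "<?xml version='1.0' encoding='UTF-8'?>",
    "<text>",
    '<s n="1">',
    "And\tand\tCCONJ",
    "now\tnow\tADV",
    ",\t,\tPUNCT",
    "</s>",
    "</text>",
])

sentences = parse_vrt(sample)
for token in sentences[0]:
    print(token["word"], token["lemma"], token["pos"])
```

Splitting on the tab character alone, rather than on any whitespace, keeps tokens or annotations that themselves contain spaces in a single column.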