CT Specifications Guide
Positional and Structural Attributes
(Corpus Taurinense Ver. 1.8, 2008.05.08)
POSIT |
word | the token | mangia | ... ... ... |
lemma | the lemma to which the token is reduced | mangiare | cf. separate list |
pos | the Part of Speech with its Hierarchy-Defining Features (HDF) | v.m.f.ind.pr | cf. FD 1 hereunder |
kat | + the Morphosyntactic Features (MSF) codes | 3,0,6,0,0 | cf. FD 2 hereunder |
typ | the structure of a (portion of a) text: {prose; verse; rubrica}; crossreferenced with type | /P | /P /V /R |
y | whether the wordform changes in manuscript or edition | y | y / n |
genre | the literary genre of a text: {documentary, didactic, historical, narrative, lyric}; crossreferenced with genr | nar | doc; did; stor; nar; lir |
msform | the unaltered token as it actually appears in the manuscript | magia | ... ... ... |
philform | the philological emendation, with the usual diacritics (round and square brackets, italics) | ma(n)gia; ma[n]gia; ma¦n¦gia | cf. list hereunder |
mwlword | the multiword (MW) lemma | a°÷l°postutto | cf. separate list |
mwlkat | the MW pos | 45 | cf. FD 1 hereunder |
mwlnum | the order of the token within the MW | 1 | {1;2;n} |
STRCT |
author | the author of a text | Anonimo | cf. CT texts hereunder |
title | title of a text | Novellino | cf. CT texts hereunder |
chapter | chapter number | n | ... ... ... |
par | paragraph number | n | ... ... ... |
s | sentence number | n | ... ... ... |
line | line number (of the page) | n | ... ... ... |
page | page number (of the printed edition) | n | ... ... ... |
type | crossreference with typ | /P | /P /V /R |
genr | crossreference with genre | nar | doc; did; stor; nar; lir |
mwl | MW boundaries | mwl | mwl |
Feature Declaration (FD) 1: POS & HDF
(CT Tagset Ver. 1.4, 2008.05.08)
category | kat (HCF) | pos (HDF) | pos (HDF), expanded |
Tagset |
noun | 20 | n.c | noun.common |
| 21 | n.p | noun.proper |
adjective | 26 | adj | adjective |
pro-det | 30 | pd.dem.s | pro-det.demonstrative.strong |
| 31 | pd.dem.w | pro-det.demonstrative.weak |
| 32 | pd.idf | pro-det.indefinite |
| 33 | pd.pos.s | pro-det.possessive.strong |
| 34 | pd.pos.w | pro-det.possessive.weak |
| 35 | pd.int | pro-det.interrogative |
| 36 | pd.rel | pro-det.relative |
| 37 | pd.per.s.no | pro-det.personal.strong.nominative |
| 38 | pd.per.s.ob | pro-det.personal.strong.oblique |
| 39 | pd.per.w.ob | pro-det.personal.weak.oblique |
| 40 | pd.exc | pro-det.exclamative |
| 41 | pd.per.w.no | pro-det.personal.weak.nominative |
adverb | 45 | adv.gn | adverb.general |
| 46 | adv.pc | adverb.particle |
| 47 | adv.cnt | adverb.connective |
conjunction | 50 | conj.co | conjunction.coordinative |
| 51 | conj.sb | conjunction.subordinative |
adposition | 56 | adp.pre | adposition.preposition |
| 57 | adp.post | adposition.postposition |
article | 60 | art.d | article.determinative |
| 61 | art.i | article.indeterminative |
numeral | 64 | num.car | numeral.cardinal |
| 65 | num.ord | numeral.ordinal |
interjection | 68 | intj | interjection |
punctuation | 70 | punct.fi | punctuation.final |
| 71 | punct.nfi | punctuation.non-final |
residuals | 75 | r.frg | residual.foreign |
| 76 | r.abb | residual.abbreviation |
| 77 | r.for | residual.formulae |
| 78 | r.epe | residual.epenthesis |
verb (main) | 111 | v.m.f.ind.pr | verb.main.finite.indicative.present |
| 112 | v.m.f.ind.ipf | verb.main.finite.indicative.imperfect |
| 113 | v.m.f.ind.pt | verb.main.finite.indicative.past |
| 114 | v.m.f.ind.ft | verb.main.finite.indicative.future |
| 115 | v.m.f.sub.pr | verb.main.finite.subjunctive.present |
| 116 | v.m.f.sub.ipf | verb.main.finite.subjunctive.imperfect |
| 117 | v.m.f.cnd.pr | verb.main.finite.conditional.present |
| 118 | v.m.f.imp.pr | verb.main.finite.imperative.present |
| 121 | v.m.nf.inf.pr | verb.main.non-finite.infinitive.present |
| 122 | v.m.nf.par.pr | verb.main.non-finite.participle.present |
| 123 | v.m.nf.par.pt | verb.main.non-finite.participle.past |
| 124 | v.m.nf.ger.pr | verb.main.non-finite.gerund.present |
verb (auxiliar) | 211 | v.a.f.ind.pr | verb.auxiliar.finite.indicative.present |
| 212 | v.a.f.ind.ipf | verb.auxiliar.finite.indicative.imperfect |
| 213 | v.a.f.ind.pt | verb.auxiliar.finite.indicative.past |
| 214 | v.a.f.ind.ft | verb.auxiliar.finite.indicative.future |
| 215 | v.a.f.sub.pr | verb.auxiliar.finite.subjunctive.present |
| 216 | v.a.f.sub.ipf | verb.auxiliar.finite.subjunctive.imperfect |
| 217 | v.a.f.cnd.pr | verb.auxiliar.finite.conditional.present |
| 218 | v.a.f.imp.pr | verb.auxiliar.finite.imperative.present |
| 221 | v.a.nf.inf.pr | verb.auxiliar.non-finite.infinitive.present |
| 222 | v.a.nf.par.pr | verb.auxiliar.non-finite.participle.present |
| 223 | v.a.nf.par.pt | verb.auxiliar.non-finite.participle.past |
| 224 | v.a.nf.ger.pr | verb.auxiliar.non-finite.gerund.present |
verb (modal) | 311 | v.md.f.ind.pr | verb.modal.finite.indicative.present |
| 312 | v.md.f.ind.ipf | verb.modal.finite.indicative.imperfect |
| 313 | v.md.f.ind.pt | verb.modal.finite.indicative.past |
| 314 | v.md.f.ind.ft | verb.modal.finite.indicative.future |
| 315 | v.md.f.sub.pr | verb.modal.finite.subjunctive.present |
| 316 | v.md.f.sub.ipf | verb.modal.finite.subjunctive.imperfect |
| 317 | v.md.f.cnd.pr | verb.modal.finite.conditional.present |
| 318 | v.md.f.imp.pr | verb.modal.finite.imperative.present |
| 321 | v.md.nf.inf.pr | verb.modal.non-finite.infinitive.present |
| 322 | v.md.nf.par.pr | verb.modal.non-finite.participle.present |
| 323 | v.md.nf.par.pt | verb.modal.non-finite.participle.past |
| 324 | v.md.nf.ger.pr | verb.modal.non-finite.gerund.present |
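The verb codes in FD 1 look compositional: the hundreds digit follows the verb class (1 main, 2 auxiliar, 3 modal), the tens digit the finiteness (1 finite, 2 non-finite), and the units digit the mood and tense. The guide does not state this as a rule; it is only a regularity read off the table. Under that assumption, a minimal Python sketch can rebuild the pos (HDF) tag of a verb from its kat (HCF) code. The function name `verb_kat_to_pos` and the three lookup tables are hypothetical helpers, not part of CT.

```python
# Lookup tables read off FD 1 (assumption: the numbering is compositional).
VERB_CLASS = {1: "m", 2: "a", 3: "md"}  # main, auxiliar, modal
FINITE_FORMS = {
    1: "ind.pr", 2: "ind.ipf", 3: "ind.pt", 4: "ind.ft",
    5: "sub.pr", 6: "sub.ipf", 7: "cnd.pr", 8: "imp.pr",
}
NON_FINITE_FORMS = {1: "inf.pr", 2: "par.pr", 3: "par.pt", 4: "ger.pr"}


def verb_kat_to_pos(kat: int) -> str:
    """Rebuild the pos (HDF) tag of a verb from its three-digit kat (HCF) code."""
    hundreds, rest = divmod(kat, 100)
    tens, units = divmod(rest, 10)
    if hundreds not in VERB_CLASS:
        raise ValueError(f"{kat} is not a verb code")
    if tens == 1:
        finiteness, forms = "f", FINITE_FORMS
    elif tens == 2:
        finiteness, forms = "nf", NON_FINITE_FORMS
    else:
        raise ValueError(f"{kat}: unknown finiteness digit {tens}")
    if units not in forms:
        raise ValueError(f"{kat}: unknown mood/tense digit {units}")
    return f"v.{VERB_CLASS[hundreds]}.{finiteness}.{forms[units]}"


print(verb_kat_to_pos(111))  # v.m.f.ind.pr
print(verb_kat_to_pos(223))  # v.a.nf.par.pt
```

Non-verb codes (20 to 78) show no such pattern and would still need a plain lookup table.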
Feature Declaration (FD) 2: MSF
(CT Tagset Ver. 1.4, 2008.05.08)
feature | kat (MSF) | value | position |
MSF |
person | 1 | pers=1 | position 1 |
| 2 | pers=2 | |
| 3 | pers=3 | |
gender | 4 | gend=masc | position 2 |
| 5 | gend=fem | |
| 4;5 | gend=c | |
number | 6 | numb=sg | position 3 |
| 7 | numb=pl | |
| 6;7 | numb=n | |
degree | 8 | degr=pos | position 4 |
| 9 | degr=comp | |
| 10 | degr=sup | |
multiword | 11-19 | loc=11-19 | position 5 |
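The five MSF positions can be decoded mechanically. The Python sketch below turns a kat string such as the example "3,0,6,0,0" into the feature values of FD 2. Two points are assumptions rather than statements of the guide: that "0" marks a position with no value, and that a multiword code such as 11 reads as "loc=11". The names `MSF_TABLE` and `decode_msf` are hypothetical.

```python
from typing import List

# One lookup table per MSF position, in the order given by FD 2.
MSF_TABLE = [
    {"1": "pers=1", "2": "pers=2", "3": "pers=3"},
    {"4": "gend=masc", "5": "gend=fem", "4;5": "gend=c"},
    {"6": "numb=sg", "7": "numb=pl", "6;7": "numb=n"},
    {"8": "degr=pos", "9": "degr=comp", "10": "degr=sup"},
    # Assumption: each multiword code 11..19 is rendered as loc=<code>.
    {str(n): f"loc={n}" for n in range(11, 20)},
]


def decode_msf(codes: str) -> List[str]:
    """Decode a comma-separated MSF string into feature=value labels."""
    fields = [field.strip() for field in codes.split(",")]
    if len(fields) != len(MSF_TABLE):
        raise ValueError(f"expected {len(MSF_TABLE)} positions, got {len(fields)}")
    features = []
    for position, (code, table) in enumerate(zip(fields, MSF_TABLE), start=1):
        if code == "0":  # assumption: 0 means the position is empty
            continue
        if code not in table:
            raise ValueError(f"code {code!r} is not valid in position {position}")
        features.append(table[code])
    return features


print(decode_msf("3,0,6,0,0"))  # ['pers=3', 'numb=sg']
```

Keeping one table per position lets the decoder reject a code that appears in the wrong slot, for instance a gender code in position 1.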
CT texts
(Corpus Taurinense Ver. 1.8, 2008.05.08)
author | title | genre | type |
CT Texts |
MaestroRinuccino | Sonetti | lir | /V |
BonoGiamboni | LibroViziVirtù | did | /P |
BonoGiamboni | TrattatoViziVirtù | did | /P |
BrunettoLatini | Favolello | did | /V |
BrunettoLatini | Tesoretto | did | /V |
BrunettoLatini | Rettorica | did | /P |
Anonimi | CapitoliCompagniaSanGilio(Statuti84) | doc | /P |
DanteAlighieri | VitaNuova | lir | /P /V |
Anonimi | CapitoliCompagniaMadonnaOrsanmichele(Statuti94/97) | doc | /P |
ConsiglioDe'Cerchi | Lettera | doc | /P |
Consiglio&LapoDe'Cerchi | Lettera | doc | /P |
CastraGualfredi&c | LibroDareEdAvere | doc | /P |
LapoRiccomanni | LibroDareEdAvere | doc | /P |
Anonimo | FioreDiFilosafi | nar | /P |
Anonimi | LibroOrdinamentiCompagniaSMariaCarmine(Statuti80) | doc | /P |
Anonimo | CronicaFiorentina | stor | /P |
Anonimo | VolgarizzamentoDisciplinaClericalis | nar | /P |
Anonimo | Novellino | nar | /P /V |
GuidoCavalcanti(?) | DueBallate(I'Vidi/SolPerPietà) | lir | /P |
GuidoCavalcanti | Rime | lir | /V |
JacopoCavalcanti | TreSonetti | lir | /V |
Special conventions
Symbols |
¬ | logicalnot | compounds | Mei¬di¬donna |
| | n\d-phonosyntax | nonn¬ è, foglia¬d è |
¦ | brokenbar | philological italics | or¦o¦ |
· | periodcenter | proclisis (with assimilation) | de· regno, Be· ll' ho |
÷ | divide | graphoclisis | porta ÷l ÷te ÷ne |
~ | tilde | compendia | ka~ agosto |
^ | caret | ellipsis | ^^^ (corresponding to usual "...") |
| | lacuna | molto [^^^^] francamente |
× | multiply | deperdita | ××osta (unreadable characters) |
* | asterisk | vacua | die *** d’ aprile (blank in ms.) |
Ø | Oslash | zero morphemes | a ÷Ø demonî 'ai demonii' |
©...® | copyright & register | typographic italics | le credenze del © Credo in Deo ® |
(...) | round brackets | solved abbreviation | mante(n)gna |
[...] | square brackets | integration | rag[g]io |
{...} | brace brackets | graphical symbols | {SN} 'signum notari' |
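The attribute examples above pair the philform "ma(n)gia; ma[n]gia; ma¦n¦gia" with the word "mangia" and the msform "magia". Read literally, the word keeps the marked letters and drops only the marks, while the msform drops the marked letters as well. The Python sketch below encodes that reading. Whether the relation holds for every CT token is an assumption drawn from this single example, and the function names `reading_form` and `manuscript_form` are hypothetical.

```python
import re

# Marks that delimit editorial material in philform:
# round brackets (solved abbreviation), square brackets (integration),
# broken bar (philological italics).
_MARKS = re.compile(r"[()\[\]¦]")
_MARKED_SPANS = re.compile(r"\([^)]*\)|\[[^\]]*\]|¦[^¦]*¦")


def reading_form(philform: str) -> str:
    """Drop the marks but keep the letters they enclose."""
    return _MARKS.sub("", philform)


def manuscript_form(philform: str) -> str:
    """Drop the marks together with the letters they enclose."""
    # Note: lacuna markers such as [^^^^] are not special-cased here.
    return _MARKED_SPANS.sub("", philform)


for form in ("ma(n)gia", "ma[n]gia", "ma¦n¦gia"):
    print(form, reading_form(form), manuscript_form(form))
```

Each of the three example forms gives "mangia" as reading form and "magia" as manuscript form.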
Beware!
+ Graphoclitics are treated as individual tokens. They are marked by the divide (ASCII 246 = ANSI 247) "÷".
+ Lemmas are introduced by the formula "lemma=".
+ Cassationes and expunctiones were not included in the CT texts.
+ The template of annotation in CT is "token_lemma=lemma,HDF/HCF,MSF1,MSF2,MSF3,MSF4,MSF5".
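The annotation template can be split back into its fields. The Python sketch below takes the template literally, with "_lemma=" between token and lemma, "/" between HDF and HCF, and commas elsewhere. The sample string is assembled from the examples given for word, lemma, pos and kat in this guide; it is not quoted from the corpus. `Annotation` and `parse_annotation` are hypothetical names.

```python
from typing import NamedTuple, Tuple


class Annotation(NamedTuple):
    token: str
    lemma: str
    hdf: str
    hcf: str
    msf: Tuple[str, ...]


def parse_annotation(text: str) -> Annotation:
    """Split 'token_lemma=lemma,HDF/HCF,MSF1,...,MSF5' into its fields."""
    token, marker, rest = text.partition("_lemma=")
    if not marker:
        raise ValueError("missing '_lemma=' marker")
    # Split from the right so that a comma inside the lemma survives.
    parts = rest.rsplit(",", 6)
    if len(parts) != 7:
        raise ValueError("expected lemma, HDF/HCF and five MSF fields")
    lemma, tags, *msf = parts
    hdf, slash, hcf = tags.rpartition("/")
    if not slash:
        raise ValueError("missing '/' between HDF and HCF")
    return Annotation(token, lemma, hdf, hcf, tuple(msf))


sample = "mangia_lemma=mangiare,v.m.f.ind.pr/111,3,0,6,0,0"
print(parse_annotation(sample))
```

Splitting on the "_lemma=" marker first, and only then from the right, keeps punctuation tokens such as a comma from being mistaken for field separators.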
Manuel Barbera, 29 August 2000; Rev. 1 November 2008.