CT Specifications Guide

Positional and Structural Attributes

(Corpus Taurinense Ver. 1.8, 2008.05.08)

POSIT

word

the token

mangia

...   ...   ...

 

lemma

the lemma to which the token is reduced

mangiare

cf. separate list

 

pos

the Part of Speech with its Hierarchy-Defining Features (HDF)

v.m.f.ind.pr

cf. FD 1 hereunder

 

kat

+ the Morphosyntactic Features (MSF) codes
+ the Hierarchy-Collapsed Features (HCF) codes

3,0,6,0,0
111

cf. FD 2 hereunder
cf. FD 1 hereunder

 

typ

the structure of a text (or of a portion of a text): {prose; verse; rubric}; cross-referenced with type

/P

/P /V /R

 

y

whether the word form differs between manuscript and edition

y

y / n

 

genre

the literary genre of a text: {documentary, didactic, historical, narrative, lyric}; cross-referenced with genr

nar

doc; did; stor; nar; lir

 

msform

the unaltered token as it actually appears in the manuscript

magia

...   ...   ...

 

philform

the philological emendation, with the usual diacritical signs (round and square brackets, italics)

ma(n)gia; ma[n]gia; ma¦n¦gia

cf. Special conventions hereunder

 

mwlword

the multiword (MW) lemma

a°÷l°postutto

cf. separate list

 

mwlkat

the part of speech of the MW lemma

45

cf. FD 1 hereunder

 

mwlnum

the position of the token within the MW unit

1

{1;2;n}

STRCT

author

the author of a text

Anonimo

cf. CT texts hereunder

 

title

the title of a text

Novellino

cf. CT texts hereunder

 

chapter

chapter number

n

...   ...   ...

 

par

paragraph number

n

...   ...   ...

 

s

sentence number

n

...   ...   ...

 

line

line number (of the page)

n

...   ...   ...

 

page

page number (of the printed edition)

n

...   ...   ...

 

type

cross-reference to typ

/P

/P /V /R

 

genr

cross-reference to genre

nar

doc; did; stor; nar; lir

 

mwl

boundaries of the MW unit

mwl

mwl
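As an illustration, the example column of the positional attributes above can be read as a single token record. The sketch below is a hypothetical Python representation: the class `Token`, the helper `variant_flag`, and the split of kat into separate MSF and HCF fields are assumptions of this sketch, not part of CT. It also assumes that y simply records whether msform differs from word, which matches the example values (mangia / magia / y) but is not stated as a rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    """Hypothetical container for the positional attributes of one CT token."""
    word: str        # the token
    lemma: str       # the lemma
    pos: str         # HDF tag, e.g. "v.m.f.ind.pr"
    kat_msf: str     # MSF codes, e.g. "3,0,6,0,0" (assumed separate field)
    kat_hcf: int     # HCF code, e.g. 111 (assumed separate field)
    typ: str         # "/P", "/V" or "/R"
    y: str           # "y" or "n"
    genre: str       # "doc", "did", "stor", "nar" or "lir"
    msform: str      # form as it appears in the manuscript
    philform: str    # philological emendation
    mwlword: Optional[str] = None  # MW lemma, only for multiword units
    mwlkat: Optional[int] = None   # part of speech of the MW lemma (kat code)
    mwlnum: Optional[int] = None   # position of the token inside the MW unit


def variant_flag(word: str, msform: str) -> str:
    """Assumed rule: 'y' when the manuscript form differs from the edited token."""
    return "y" if word != msform else "n"


# Values taken from the example column of the attribute table.
mangia = Token(
    word="mangia", lemma="mangiare", pos="v.m.f.ind.pr",
    kat_msf="3,0,6,0,0", kat_hcf=111, typ="/P", y="y", genre="nar",
    msform="magia", philform="ma(n)gia",
)

assert mangia.y == variant_flag(mangia.word, mangia.msform)
print(mangia.pos, mangia.kat_hcf)  # v.m.f.ind.pr 111
```

A dataclass keeps every attribute named and typed; the optional MW fields stay empty for tokens that are not part of a multiword unit.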

 

 

 

Feature Declaration (FD) 1: POS & HDF

(CT Tagset Ver. 1.4, 2008.05.08)

 

Tagset
category          kat (HCF)  pos (HDF)       expansion
noun              20         n.c             noun.common
                  21         n.p             noun.proper
adjective         26         adj             adjective
pro-det           30         pd.dem.s        pro-det.demonstrative.strong
                  31         pd.dem.w        pro-det.demonstrative.weak
                  32         pd.idf          pro-det.indefinite
                  33         pd.pos.s        pro-det.possessive.strong
                  34         pd.pos.w        pro-det.possessive.weak
                  35         pd.int          pro-det.interrogative
                  36         pd.rel          pro-det.relative
                  37         pd.per.s.no     pro-det.personal.strong.nominative
                  38         pd.per.s.ob     pro-det.personal.strong.oblique
                  39         pd.per.w.ob     pro-det.personal.weak.oblique
                  40         pd.exc          pro-det.exclamative
                  41         pd.per.w.no     pro-det.personal.weak.nominative
adverb            45         adv.gn          adverb.general
                  46         adv.pc          adverb.particle
                  47         adv.cnt         adverb.connective
conjunction       50         conj.co         conjunction.coordinative
                  51         conj.sb         conjunction.subordinative
adposition        56         adp.pre         adposition.preposition
                  57         adp.post        adposition.postposition
article           60         art.d           article.determinative
                  61         art.i           article.indeterminative
numeral           64         num.car         numeral.cardinal
                  65         num.ord         numeral.ordinal
interjection      68         intj            interjection
punctuation       70         punct.fi        punctuation.final
                  71         punct.nfi       punctuation.non-final
residuals         75         r.frg           residual.foreign
                  76         r.abb           residual.abbreviation
                  77         r.for           residual.formulae
                  78         r.epe           residual.epenthesis
verb (main)       111        v.m.f.ind.pr    verb.main.finite.indicative.present
                  112        v.m.f.ind.ipf   verb.main.finite.indicative.imperfect
                  113        v.m.f.ind.pt    verb.main.finite.indicative.past
                  114        v.m.f.ind.ft    verb.main.finite.indicative.future
                  115        v.m.f.sub.pr    verb.main.finite.subjunctive.present
                  116        v.m.f.sub.ipf   verb.main.finite.subjunctive.imperfect
                  117        v.m.f.cnd.pr    verb.main.finite.conditional.present
                  118        v.m.f.imp.pr    verb.main.finite.imperative.present
                  121        v.m.nf.inf.pr   verb.main.non-finite.infinitive.present
                  122        v.m.nf.par.pr   verb.main.non-finite.participle.present
                  123        v.m.nf.par.pt   verb.main.non-finite.participle.past
                  124        v.m.nf.ger.pr   verb.main.non-finite.gerund.present
verb (auxiliary)  211        v.a.f.ind.pr    verb.auxiliary.finite.indicative.present
                  212        v.a.f.ind.ipf   verb.auxiliary.finite.indicative.imperfect
                  213        v.a.f.ind.pt    verb.auxiliary.finite.indicative.past
                  214        v.a.f.ind.ft    verb.auxiliary.finite.indicative.future
                  215        v.a.f.sub.pr    verb.auxiliary.finite.subjunctive.present
                  216        v.a.f.sub.ipf   verb.auxiliary.finite.subjunctive.imperfect
                  217        v.a.f.cnd.pr    verb.auxiliary.finite.conditional.present
                  218        v.a.f.imp.pr    verb.auxiliary.finite.imperative.present
                  221        v.a.nf.inf.pr   verb.auxiliary.non-finite.infinitive.present
                  222        v.a.nf.par.pr   verb.auxiliary.non-finite.participle.present
                  223        v.a.nf.par.pt   verb.auxiliary.non-finite.participle.past
                  224        v.a.nf.ger.pr   verb.auxiliary.non-finite.gerund.present
verb (modal)      311        v.md.f.ind.pr   verb.modal.finite.indicative.present
                  312        v.md.f.ind.ipf  verb.modal.finite.indicative.imperfect
                  313        v.md.f.ind.pt   verb.modal.finite.indicative.past
                  314        v.md.f.ind.ft   verb.modal.finite.indicative.future
                  315        v.md.f.sub.pr   verb.modal.finite.subjunctive.present
                  316        v.md.f.sub.ipf  verb.modal.finite.subjunctive.imperfect
                  317        v.md.f.cnd.pr   verb.modal.finite.conditional.present
                  318        v.md.f.imp.pr   verb.modal.finite.imperative.present
                  321        v.md.nf.inf.pr  verb.modal.non-finite.infinitive.present
                  322        v.md.nf.par.pr  verb.modal.non-finite.participle.present
                  323        v.md.nf.par.pt  verb.modal.non-finite.participle.past
                  324        v.md.nf.ger.pr  verb.modal.non-finite.gerund.present
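The verb rows of FD 1 follow a regular numbering: the hundreds digit gives the verb class (1 main, 2 auxiliary, 3 modal), the tens digit gives finiteness (1 finite, 2 non-finite), and the units digit selects the mood and tense. The sketch below rebuilds the dotted pos tag from a verb kat code under that reading. The pattern is read off the table rather than stated in it, and the function name `verb_tag` is hypothetical; the non-verb codes (20 to 78) are not decomposed here.

```python
# Pieces of the dotted pos tag, as used in the verb rows of FD 1.
VERB_CLASS = {1: "v.m", 2: "v.a", 3: "v.md"}
FINITE = {  # units digit when the tens digit is 1 (finite forms)
    1: "ind.pr", 2: "ind.ipf", 3: "ind.pt", 4: "ind.ft",
    5: "sub.pr", 6: "sub.ipf", 7: "cnd.pr", 8: "imp.pr",
}
NON_FINITE = {  # units digit when the tens digit is 2 (non-finite forms)
    1: "inf.pr", 2: "par.pr", 3: "par.pt", 4: "ger.pr",
}


def verb_tag(kat: int) -> str:
    """Rebuild the pos tag of a verb from its HCF code (e.g. 111 -> v.m.f.ind.pr)."""
    cls, fin, slot = kat // 100, (kat // 10) % 10, kat % 10
    if cls not in VERB_CLASS:
        raise ValueError(f"{kat} is not a verb code")
    if fin == 1 and slot in FINITE:
        return f"{VERB_CLASS[cls]}.f.{FINITE[slot]}"
    if fin == 2 and slot in NON_FINITE:
        return f"{VERB_CLASS[cls]}.nf.{NON_FINITE[slot]}"
    raise ValueError(f"{kat} is not listed in FD 1")


for code in (111, 213, 322):
    print(code, verb_tag(code))
# 111 v.m.f.ind.pr
# 213 v.a.f.ind.pt
# 322 v.md.nf.par.pr
```

Because the code is positional, a single lookup per digit replaces a 36-row table for the verbs; unlisted combinations (such as 119 or 120) are rejected rather than guessed.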

 

 

 

Feature Declaration (FD) 2: MSF

(CT Tagset Ver. 1.4, 2008.05.08)

 

MSF
feature     kat (MSF)  value       position
person      1          pers=1      position 1
            2          pers=2
            3          pers=3
gender      4          gend=masc   position 2
            5          gend=fem
            4;5        gend=c
number      6          numb=sg     position 3
            7          numb=pl
            6;7        numb=n
degree      8          degr=pos    position 4
            9          degr=comp
            10         degr=sup
multiword   11-19      loc=11-19   position 5
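The five positions above can be decoded mechanically. The sketch below turns an MSF string such as the "3,0,6,0,0" of the mangia example into feature=value pairs. It assumes (from that example) that 0 marks a feature that does not apply, and that combined codes are written with ";" exactly as in the table ("4;5", "6;7"); the function name `decode_msf` is hypothetical.

```python
# Code -> value, per position, following FD 2.
MSF_CODES = {
    "pers": {"1": "1", "2": "2", "3": "3"},
    "gend": {"4": "masc", "5": "fem", "4;5": "c"},
    "numb": {"6": "sg", "7": "pl", "6;7": "n"},
    "degr": {"8": "pos", "9": "comp", "10": "sup"},
}
POSITIONS = ["pers", "gend", "numb", "degr", "loc"]


def decode_msf(kat_msf: str) -> dict:
    """Turn an MSF string such as "3,0,6,0,0" into feature=value pairs."""
    fields = [f.strip() for f in kat_msf.split(",")]
    if len(fields) != len(POSITIONS):
        raise ValueError(f"expected 5 positions, got {len(fields)}")
    features = {}
    for name, code in zip(POSITIONS, fields):
        if code == "0":  # assumed: 0 = feature not applicable
            continue
        if name == "loc":  # multiword codes 11-19 are kept as they are
            if not 11 <= int(code) <= 19:
                raise ValueError(f"multiword code out of range: {code}")
            features[name] = code
        else:
            features[name] = MSF_CODES[name][code]
    return features


print(decode_msf("3,0,6,0,0"))  # {'pers': '3', 'numb': 'sg'}
```

An unknown code raises an error (a `KeyError` or `ValueError`) instead of being silently dropped, which helps catch annotation slips.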

 

 

 

CT texts

(Corpus Taurinense Ver. 1.8, 2008.05.08)

CT Texts
author                   genre  type   title
MaestroRinuccino         lir    /V     Sonetti
BonoGiamboni             did    /P     LibroViziVirtù
BonoGiamboni             did    /P     TrattatoViziVirtù
BrunettoLatini           did    /V     Favolello
BrunettoLatini           did    /V     Tesoretto
BrunettoLatini           did    /P     Rettorica
Anonimi                  doc    /P     CapitoliCompagniaSanGilio(Statuti84)
DanteAlighieri           lir    /P /V  VitaNuova
Anonimi                  doc    /P     CapitoliCompagniaMadonnaOrsanmichele(Statuti94/97)
ConsiglioDe'Cerchi       doc    /P     Lettera
Consiglio&LapoDe'Cerchi  doc    /P     Lettera
CastraGualfredi&c        doc    /P     LibroDareEdAvere
LapoRiccomanni           doc    /P     LibroDareEdAvere
Anonimo                  nar    /P     FioreDiFilosafi
Anonimi                  doc    /P     LibroOrdinamentiCompagniaSMariaCarmine(Statuti80)
Anonimo                  stor   /P     CronicaFiorentina
Anonimo                  nar    /P     VolgarizzamentoDisciplinaClericalis
Anonimo                  nar    /P /V  Novellino
GuidoCavalcanti(?)       lir    /P     DueBallate(I'Vidi/SolPerPietà)
GuidoCavalcanti          lir    /V     Rime
JacopoCavalcanti         lir    /V     TreSonetti
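Since each text carries a genre code and one or more type codes, subcorpora can be selected from this list directly. The sketch below holds a few rows as records and filters them; the names `CTText`, `TEXTS` and `select` are hypothetical, and only five rows are copied, so the results reflect that subset rather than the whole corpus.

```python
from typing import NamedTuple


class CTText(NamedTuple):
    author: str
    title: str
    genre: str    # doc, did, stor, nar or lir
    types: tuple  # one or more of "/P", "/V", "/R"


# A few rows copied from the CT Texts table (not the full list).
TEXTS = [
    CTText("MaestroRinuccino", "Sonetti", "lir", ("/V",)),
    CTText("BrunettoLatini", "Tesoretto", "did", ("/V",)),
    CTText("BrunettoLatini", "Rettorica", "did", ("/P",)),
    CTText("DanteAlighieri", "VitaNuova", "lir", ("/P", "/V")),
    CTText("Anonimo", "Novellino", "nar", ("/P", "/V")),
]


def select(texts, genre=None, typ=None):
    """Filter by genre code and/or type (a text matches if it contains that type)."""
    return [t for t in texts
            if (genre is None or t.genre == genre)
            and (typ is None or typ in t.types)]


print([t.title for t in select(TEXTS, typ="/V")])
print([t.title for t in select(TEXTS, genre="lir", typ="/P")])
```

Storing types as a tuple keeps mixed texts such as VitaNuova (/P /V) in both the prose and the verse selections.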

 

 

 

Special conventions

Symbols
symbol name                function                       example
¬      logicalnot          compounds                      Mei¬di¬donna
                           n\d-phonosyntax                nonn¬ è, foglia¬d è
¦      brokenbar           philological italics           or¦o¦
·      periodcenter        proclisis (with assimilation)  de· regno, Be· ll' ho
÷      divide              graphoclisis                   porta ÷l ÷te ÷ne
~      tilde               compendia                      ka~ agosto
^      caret               ellipsis                       ^^^ (corresponding to the usual "...")
                           lacuna                         molto [^^^^] francamente
×      multiply            deperdita                      ××osta (unreadable characters)
*      asterisk            vacua                          die *** d’ aprile (blank in ms.)
Ø      Oslash              zero morphemes                 a ÷Ø demonî 'ai demonii'
©...®  copyright&register  typographic italics            le credenze del © Credo in Deo ®
(...)  round brackets      resolved abbreviation          mante(n)gna
[...]  square brackets     integration                    rag[g]io
{...}  curly brackets      graphical symbols              {SN} 'signum notarii'
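When reading philform values it can help to list which of these conventions a form uses. The sketch below is a rough heuristic built from the symbol table; the name `conventions_in` is hypothetical, bracketed conventions are detected only as matched pairs within one string, and a lacuna ([^^^^]) is removed before plain carets are checked so that it is not also reported as an ellipsis.

```python
import re

# Single-character markers and their function, from the symbol table.
MARKS = {
    "¬": r"compounds or n\d-phonosyntax",
    "¦": "philological italics",
    "·": "proclisis (with assimilation)",
    "÷": "graphoclisis",
    "~": "compendia",
    "×": "deperdita",
    "*": "vacua",
    "Ø": "zero morphemes",
}
# Paired or repeated markers, checked in this order.
PAIRS = [
    (r"\[\^+\]", "lacuna"),
    (r"\^+", "ellipsis"),
    (r"©.*?®", "typographic italics"),
    (r"\([^)]*\)", "resolved abbreviation"),
    (r"\[[^\]^]*\]", "integration"),
    (r"\{[^}]*\}", "graphical symbols"),
]


def conventions_in(form: str) -> list:
    """Return the functions of the special symbols found in a form (rough heuristic)."""
    found, rest = [], form
    for pattern, label in PAIRS:
        if re.search(pattern, rest):
            found.append(label)
            rest = re.sub(pattern, " ", rest)  # avoid double counting, e.g. ^ inside [^^^^]
    found += [label for ch, label in MARKS.items() if ch in rest]
    return found


for example in ("ma(n)gia", "rag[g]io", "molto [^^^^] francamente", "porta ÷l ÷te ÷ne"):
    print(example, "->", conventions_in(example))
```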

 

 

 

Beware!

+ Graphoclitics are treated as individual tokens. They are marked by the divide sign "÷" (character 246 in the DOS code page 437, often called "ASCII 246"; character 247 in ANSI/Windows-1252).
+ Lemmas are introduced by the string "lemma=".
+ Cassationes and expunctiones were not included in the CT texts.
+ The CT annotation template is "token_lemma=lemma,HDF/HCF,MSF1,MSF2,MSF3,MSF4,MSF5".
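The annotation template lends itself to a simple line parser. The sketch below assumes that a line contains exactly one "_lemma=" separator, that the HDF tag and the HCF code are joined by "/", and that the five MSF fields follow as comma-separated values; it also flags graphoclitic tokens by their leading "÷". The example line is built from the mangia example in the attribute table, not copied from the corpus, and the names `Annotation` and `parse_ct` are hypothetical.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    token: str
    lemma: str
    hdf: str            # dotted pos tag, e.g. "v.m.f.ind.pr"
    hcf: str            # kat code, e.g. "111"
    msf: tuple          # five MSF fields, e.g. ("3", "0", "6", "0", "0")
    graphoclitic: bool  # True if the token starts with the divide sign


def parse_ct(line: str) -> Annotation:
    """Split a line following token_lemma=lemma,HDF/HCF,MSF1,...,MSF5."""
    token, sep, rest = line.strip().partition("_lemma=")
    if not sep:
        raise ValueError(f"no '_lemma=' in {line!r}")
    fields = rest.split(",")
    if len(fields) != 7:
        raise ValueError(f"expected 7 comma-separated fields, got {len(fields)}")
    lemma, tag = fields[0], fields[1]
    hdf, _, hcf = tag.partition("/")
    return Annotation(token, lemma, hdf, hcf, tuple(fields[2:]),
                      token.startswith("÷"))


# Hypothetical line built from the mangia example.
print(parse_ct("mangia_lemma=mangiare,v.m.f.ind.pr/111,3,0,6,0,0"))
```

Splitting on the fixed "_lemma=" marker first keeps the token intact even if it contains other punctuation; a lemma containing a comma would break this sketch.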

 



Manuel Barbera, 29 August 2000; Rev. 1 November 2008.