Projects.

Corpora built at bmanuel.org.

CT.

"Corpus Taurinense".

Corpus Taurinense is a corpus of 13th-century Florentine texts, POS-tagged according to an expressly devised EAGLES tagset. From the homepage, both the plain CT (Ver. 1.8, 2008-05-08; 22 texts) and the improved CT+ (Ver. 0.1, 2010-09-19; currently 26 texts, but still growing) can be reached. The CQP-encoded corpus is also available for download.

NUNC.

"Newsgroups UseNet Corpora".

NUNC is a multilingual (It. De. Fr. En. Es. Ma. Su. Ee. Pt.) suite of corpora based on the language of newsgroups, freely available and queryable online. Devised by Manuel Barbera, NUNC was born in 2002 and is currently under development by A. Allora, M. Barbera, S. Colombo, E. Corino, C. Marello, S. Casavecchia, C. Onesti, M. Tomatis, L. Valle and others. Some corpora are already available for testing (Italian, UK English, French and Spanish).

Jus Jurium.

Jus Jurium (viz. 'minestrone of Laws') is a free Italian corpus covering the full legal universe of discourse current in contemporary Italy, POS-tagged and with textual and diplomatic markup. Devised by Manuel Barbera, soon joined by Cristina Onesti and Elisa Corino, the Corpus Juris was born in February 2005. Besides the homepage, full documentation and a first beta of the corpus will be available soon.

Corpus Segusinum.

Corpus Segusinum is an Italian regional newspaper corpus, freely available online. Devised by Manuel Barbera, joined by Cristina Onesti, the Corpus Segusinum was born in February 2005. Besides the homepage, full documentation and a first beta of the corpus are already available.

Athenaeum.

Athenaeum is a free corpus built from texts produced by Turin University, POS-tagged and classified by topic and text genre. Athenaeum was born in 2004 to celebrate Turin University's 6th centenary. Besides the homepage, a first version of the corpus is already available.

VALICO.

"Varietà di Apprendimento della Lingua Italiana: Corpus Online".

VALICO is an international learner corpus of Italian, freely available and queryable online. Devised by Manuel Barbera and Carla Marello, soon joined by Elisa Corino, VALICO was born on the 17th of June 2003. The project is now under new direction (C. Marello and E. Corino only) and has migrated to another website: http://www.valico.org/. Only the old homepage of the first version of the project and its original Guidelines are maintained here, mainly for historical documentation.

VINCA.

"Varietà di Italiano di Nativi Corpus Appaiato".

VINCA is a corpus of native written Italian, freely available and queryable online. Devised by Manuel Barbera and Carla Marello, soon joined by Elisa Corino, VINCA was born in 2004 as a paired corpus for VALICO. The project is now under new direction (C. Marello and E. Corino only) and has migrated to another website: http://www.valico.org/. Only the old homepage of the first version of the project and its original Guidelines are maintained here, mainly for historical documentation.

Internal Documentation.

Some useful (but unstable! it's always under development) documentation, intended primarily for internal use: Athenaeum header template and markup file, CT specification, NUNC header template and markup file, Valico header template, Vinca header template, and the FIRB macro-header template. Beware that all these TXT files may sometimes look scrambled when viewed in some web browsers, but will be just fine once downloaded to your computer.

Additional Information.

All the corpora are (and will remain) freely available online: you are legally entitled to use them, provided that you acknowledge where your data came from. All corpora are encoded in CQP format and are accessed through the Corpus Query Workbench of IMS Stuttgart. A CQP Query Language Tutorial and a Corpus Encoding Tutorial, each in both PDF and HTML, are available on the website.

***HTML code & design by Manuel Barbera***