(1) - Corpora and Corpus Linguistics.
(2) - Multilingual and Parallel Corpora.
(3) - Electronic Literary Text Archives.
(4) - References, Standards & Educational Resources.
(5) - Tools.
Basically, this kind of text collection differs from true corpora because the texts are not only untagged (markup is seldom present) but not even tokenized: they are intended to be read, not computationally processed. Usually they are in plain formats (TXT, RTF, HTML, PDF), and more rarely in SGML with TEI encoding. As a rule, queries can be run only against strings of raw text (sometimes even online, defining the "corpus" of your search), but for more sophisticated queries you must at least tokenize the texts first. True electronic editions are usually free and are provided by people who support the public domain and fight the copyright lobby, but there are also some institutional sites (often, unfortunately, less useful).
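To make the difference between a raw string search and a query on tokenized text concrete, here is a minimal Python sketch. The sample sentence, the `tokenize` helper and its regular expression are illustrative assumptions, not the method of any archive listed here.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens.

    Assumption: a token is a run of letters/digits, optionally with
    internal apostrophes. Real tokenizers are language-specific.
    """
    return re.findall(r"[^\W_]+(?:'[^\W_]+)*", text.lower())

# Hypothetical sample text, invented for this illustration
raw = "The cat sat on the mat. Cats, concatenated, scatter!"

# Raw string search: counts every occurrence of the character sequence "cat",
# including those buried inside longer words
substring_hits = raw.lower().count("cat")

# Token-level query: counts only tokens that are exactly the word "cat"
tokens = tokenize(raw)
token_hits = tokens.count("cat")

print(substring_hits, token_hits)
```

The string search also matches inside "Cats", "concatenated" and "scatter", while the token query finds only the word itself; this is why plain archives need at least tokenization before finer queries are possible.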
The Association des Bibliophiles Universels, or ABU (pronounced "abou"), was founded in April 1993 in order to give free access on the Web to francophone texts in the public domain (i.e. by authors deceased at least 70 years ago). It is, so to speak, the French equivalent of Project Gutenberg. All texts are stored in HTML and in plain TXT format (with line breaks and the ISO Latin character set) and are freely downloadable. Simple string searches can be done online.
"A Thousand Books of Wisdom" is the fourth major release of Tibetan data by the Asian Classics Input Project. The core of the entire Asian Classics Input Project consists of a dedicated group of Tibetan refugees in south Asia who are accomplishing the great majority of the Project’s work: the input of tens of thousands of pages of Tibetan woodblock prints, in the hope of saving the disappearing Tibetan books. First they search the globe for the remaining collections of books and record their location and contents in catalog form (e.g. the St. Petersburg Catalog); next they copy the books in e-text format and send these copies to be input onto computer media at data entry centers. Over the past ten years ACIP has released tens of thousands of pages of great books, on tens of thousands of computer disks and through the World Wide Web, completely free. Nearly all texts, in RTF format (Tibetan script and romanization), can be freely accessed, either by requesting them on disk (order infos) or by downloading them from the site. Beware that a small number of the items in the ACIP database are restricted and do not appear in the public releases (in respect of the centuries-old tradition of the Buddhist lineages of Tibet and India, and in particular out of respect for the current holders of these lineages who work closely with ACIP in locating and preserving these materials, ACIP has a policy of not releasing to the general public those texts which are by tradition considered secret; users who have received the necessary initiations to study these materials may however submit a request). Please note that the Sambhota fonts with the ACIP encoding for Tibetan are freely downloadable from the Nitartha site. [2001 May 1].
The Alex Catalogue of Electronic Texts is a collection of digital documents in the subject areas of English literature, American literature, and Western philosophy. You can also download complete collections of texts (i.e. corpora), such as "All philosophy texts" (10 MB gzipped tar file) and "All English literature texts" (22 MB gzipped tar file).
The Catalogue isn't only an archive of downloadable texts: you can also search the content of located texts and run queries online. For example, you can search for Mark Twain's The Adventures Of Huckleberry Finn. Simple. You can then search the content of The Adventures for words like fish and belly to get a description of Huck Finn's father. Moreover, you can search the content of multiple documents simultaneously. For example, you can first locate all the documents in the collection authored by Mark Twain. Next, you can search selected documents for something like slav* (which includes slave, slaves, slavery, etc.) to draw out themes across texts. Another feature of the Catalogue is the on-the-fly creation of PDF files. Using this option you can specify things like fonts and font sizes for your output.
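The wildcard search across several documents described above can be sketched locally in a few lines of Python. This is only an illustration under stated assumptions: the `search` function, the document titles and the sample sentences are invented here, and nothing is known about how the Catalogue implements its own search.

```python
import re
from fnmatch import fnmatchcase

def search(documents, pattern):
    """Return {title: sorted distinct words matching a shell-style wildcard}.

    `documents` maps a title to its plain text; `pattern` may use `*`,
    as in the slav* example. Documents without matches are left out.
    """
    results = {}
    for title, text in documents.items():
        # Crude tokenization: distinct lowercase ASCII words (an assumption)
        words = set(re.findall(r"[a-z]+", text.lower()))
        hits = sorted(w for w in words if fnmatchcase(w, pattern))
        if hits:
            results[title] = hits
    return results

# Hypothetical mini-collection, invented for this illustration
documents = {
    "Text A": "The slave trade and slavery were debated.",
    "Text B": "He went fishing by the river.",
    "Text C": "Slaves escaped north.",
}

matches = search(documents, "slav*")
print(matches)
```

Matching on distinct tokens rather than on raw text keeps `slav*` from firing on strings buried inside unrelated words, at the cost of losing positions and frequencies.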
A rich and well-structured list of electronic text resources on the Web. It is written in Italian, but is not limited in scope to Italian (only one of the six pages displayed is devoted to Italian).
The American Verse Project is a collaborative project between the University of Michigan Humanities Text Initiative (HTI) and the University of Michigan Press. The project is assembling an electronic archive of volumes of American poetry prior to 1920. The full text of each volume of poetry is being converted into digital form and coded in SGML using the TEI Guidelines. All texts are freely readable and downloadable (by individuals for personal use, research, and teaching) in either HTML or SGML format. Simple, boolean or co-occurrence searches can be submitted throughout the entire American Verse Project collection; there is also an interface for searching only personally selected works in the collection.
So far there are only three texts, freely available. Beware that by "annotation" they mean only the (predominantly lexical) notes.
This Anthology, a part of the "Repertorio Ibero e Iberoamericano de Ensayistas y Filósofos", provides a fair amount of Spanish-language texts of philosophical, critical and scholarly genres. All texts (HTML format only) are free, but often they are only extracts of larger works.
Riccardo Scateni's page contains about forty Italian literary texts, mainly from Progetto Manuzio, free in HTML format.
It is a collection of commercial CD-ROMs of Italian Literary Texts in electronic editions, available for reading and for queries with Eugenio Picchi's powerful DBT. Now available at discount prices also from Libreria Chiari.
+ Archivio della tradizione lirica da Petrarca a Marino (Lyric Poetry from Petrarca to Marino) edited by Amedeo Quondam. Works by 112 authors and the main anthologies of Italian XIV-XVI century poetry (incl. Rimatori bolognesi del Trecento e del Quattrocento; Canti carnascialeschi and Nuovi canti carnascialeschi; Rime del Codice Isoldiano; Testi del primo Certame coronario; Lirici toscani del Quattrocento - corpus Lanza; Canzoniere italiano del secolo XIV; Poeti fidenziani; Rime degli Accademici Eterei; Rime del Cinquecento - corpus Frati).
+ Francesco Petrarca - Opera Omnia (Petrarca's Complete Edition) edited by Francesco Stoppelli.
+ Torquato Tasso - Tutte le Opere (Tasso Complete Edition) edited by Amedeo Quondam. The Correspondence and the Biography by G. B. Manso are included.
+ Giacomo Leopardi - Tutte le Opere (Leopardi Complete Edition) edited by Lucio Felici. Complete Correspondence, different Redactions, the two Chrestomathiae and the Commentary to Petrarca are included.
+ Commenti danteschi dei secoli XIV, XV e XVI (Comments on Dante from XIV, XV and XVI centuries) edited by Paolo Procaccioli.
A rich collection of Latin e-Texts, ranging from Biblical and Liturgical texts (from the Vulgata - English version as well - to the Tridentine Latin Mass, with links to Bible browsers, a Database of Gregorian Chants, etc.) to Classical and Late Latin literary texts. There is also a collection of Greek Classical texts that influenced the Latin tradition, of Medieval Latin Texts and Translations (c.400-c.1500), and even of Grammatical Texts (Donatus). Good links to many Latin and Medieval resources and a lot of freely downloadable material (but it depends on the links ...): all in all, a great site.
Started in Aug 1991, arXiv.org (formerly xxx.lanl.gov) is a fully automated electronic archive and distribution server for research papers. Covered areas include physics and related disciplines, mathematics, nonlinear sciences, computational linguistics, and neuroscience. Users can retrieve papers from the archive either through an on-line world wide web interface, or by sending commands to the system via e-mail. Texts are usually in Tex / Latex format and can be freely retrieved. For more details, cf. under the References section. [2001 April 23].
The French Texts Archive of the Université de Genève. The number of texts hosted or (more often) linked to is very high (see the list at this address), but the availability of texts varies from really free and downloadable (e.g. the texts directly hosted on the site) to very poor (e.g. the Gallica texts). There is also a special list of texts by French authors written in or translated into other languages, available at the following address.
This site points to web resources for more than thirty languages: a lot of linguistic papers, but also a few native language e-Texts.
Babelot is a powerful multilingual catalogue of e-texts; it offers metasearch for 25,153 e-texts in 39 languages on 25 sites (incl. Project Gutenberg, ETC Virginia, Gutenberg DE, Perseus, etc.). It is maintained by Èulogos. [2002 February 18].
This Library (maintained by the Tabula Fati Publishing of Chieti) includes only a few literary Italian texts (originals and translations). All are free in HTML format, though sometimes not download-friendly (larger texts are divided into several pages).
The CiBIT (Centro interuniversitario Biblioteca Italiana Telematica) project, directed by Mirko Tavoni and maintained at Pisa University, aims to collect Italian literary texts (Latin and dialectal as well) and make them freely available for reading, download (not always so free ...) and simple queries. It has been working only since March 2000 and is still under construction. There is also a small POS-tagged and lemmatized corpus, cf. CiBIT Lemmatized Corpus.
This free Italian Electronic Literary Text Archive is the library of Cyberia by Nem0, and also has a mirror at Tiscali. The catalogue of classics is not as large as Progetto's, but there are also some new modern texts published by the authors directly on the web. Most texts are in HTML format (but there are also a few unusable JPG facsimiles) and can be freely viewed and copied. The most outstanding feature of the site, however, is the availability of speech synthesis: the plug-in MyVoice Net, freely available from the Cyberia site for both Win and Mac, interfaces your browser with a speech synthesis system based on the Eloquens 2000 technology by CSELT. [Rev. 2001 December 31].
Besides a lot of other library services and resources, this site offers a rich collection of Catalan e-texts of every genre, all freely browsable (and downloadable) in HTML format with frames. The site is wholly in Catalan. [2002 February 18].
Polish Literary Texts in HTML format. You can browse them through pop-up windows. All pages are in Polish.
An Archive of Literary text all written in Latin: how nice! The Bibliotheca Latina is by far the biggest Archive, and has also a lot of medieval, humanistic and modern Latin texts. There are also smaller Greek, German, English and French Archives.
All the texts are freely readable online but are not planned for downloads.
This free Electronic Literary Text Archive contains 301 poems in HTML format, which can be freely viewed and copied (beware of the Bulgarian font encoding: they don't say what it is).
This is one of those things you don't know how to classify, but it is a good educational resource indeed. Essentially it is a Latin hypertext reader for De Bello Gallico, I. It is freeware (C 1999 by Michael Cummings) for Windows and PC DOS operating systems that you can download and install on your PC. These programs are designed for the student of Latin who knows some grammar, but who lacks vocabulary. See more under the Latin section.
The Connecticut College CAPA is a huge electronic archive designed to make out-of-print volumes of American poetry available through the Web to readers, scholars, and researchers. All texts are stored in HTML and are freely downloadable. Contact: Wendy Battin.
The INaLF (Institut National de la Langue Française - CNRS) Catalogue provides a selection, made on scientific grounds (both philological and computational), of the French literary text resources available on the Web. Once you have digested all their disclaimers and evaluation grids, however, the links to truly freely downloadable texts are relatively few.
CELT is the main online resource for contemporary and historical Irish documents in literature, history and politics. Its mission statement is to bring the wealth of Irish literary and historical culture (in Irish, Latin, Old Norse, Anglo-Norman French, and English) to the Internet in a rigorously scholarly project that is, at the same time, user-friendly for the widest possible range of readers and researchers - academic scholars, teachers, students (at all levels), and the general public, in Ireland and internationally. The published texts are freely available in various formats: HTML (for browsing), TEI-compliant SGML (download from FTP), plain text and PS for printing. The number of texts still in progress is huge, and now [2001 June 19] they have approx. 3,800,000 words online. Simple searches (for a word or part of a word) throughout the CELT database can be made both for texts and markup. Definitely a great site. [Last Rev. 2001 August 27].
Le Chateau has a room called "Chef d'Oeuvre de la Littérature Française", where you can freely and easily download (in ZIP format) some French texts, ranging from Molière's Complete Works to Pascal's Pensées, Baudelaire's Fleurs du mal, and Proust's Du côté de chez Swann.
A precious repository of free Chinese e-texts ranging from the classical pre-Qin and Song to the Qing and modern ones. There are electronic versions of Chinese philosophical texts created by the Wesleyan Confucian E-text Project; electronic versions of Chinese philosophical texts from other sources, some with minor improvements; and information on, and links to more information on, the preparation and use of these texts. [2001 May 1].
This is a web edition, created at Virginia Library ETC (cf. the Electronic Text Center at the University of Virginia file), of the Chiricahua and Mescalero Apache Texts by Harry Hoijer, originally published by University of Chicago Press, 1938. There are 46 Chiricahua and 9 Mescalero texts, all free, but not very download-friendly, because they are displayed in frames: you can e.g. display a bilingual Apache - English version (either with or without notes), or Apache only, or notes only, or English version with ethnological notes, etc. For correct display a special Apache - Navajo font (a Times New Roman supplement), developed at the San Juan School District and freely downloadable from this page, is needed. [2001 July 21; rev. 2002 January 25].
A collection of patristic texts, mainly in English translation, but with a few Latin originals, and some translations in other languages as well (mainly Russian and Chinese). All texts are freely downloadable with theological markup (ThML) or HTML, plain or zipped.
A few free French texts (mostly incomplete) and a lot of links that often don't take you directly to the text you want to download.
The Computation and Language E-Print Archive (Cmp-Lg) was a fully automated electronic archive and distribution server for papers on computational linguistics, natural-language processing, speech processing, and related fields. Now absorbed into, and superseded by, the CoRR (Computing Research Repository). For more details, cf. under the References section. [2001 April 23].
All the plays are here in HTML format; the texts are all in the public domain, but each is broken into many files, and they are ultimately based on the free and easily available Moby Shakespeare, a part of the Moby Project. The most interesting feature of the site is the free search engine, but the last time I checked it was down. [2001 April 23].
Several links to free Irish e-texts (HTML), often with English translation side-by-side. They range from poems, to crosswords (e.g.), to Gaelic language Bulletin Board messages on several topics (do you want a bilingual recipe for "Seabhdar Cóilise le Duileasc / Cauliflower Chowder with Dulse"? Here it is). Beware that in all other respects the site is strictly Irish language only. [2001 May 1].
The CME is an online queryable collection of Middle English texts (see the Bibliography) assembled from works contributed by University of Michigan faculty and from texts provided by the Oxford Text Archive, as well as works created specifically for the Corpus by the HTI (Humanities Text Initiative). All texts in the archive are valid SGML documents, tagged in conformance with the TEI Guidelines, and converted to the TEI Lite DTD for wider use. The full collection of 61 texts (or any selection of them you define) can be freely searched online, and every text can be freely accessed in full in HTML format. [2001 August 27].
The online Computing Research Repository (CoRR) was established in September 1998 in order to provide a single repository to which researchers from the whole field of computing could submit reports and have them published on the web within 24 hours. CoRR is freely available, and several formats are accepted, from TeX to PDF. The CoRR has superseded and absorbed the former Cmp-Lg E-Print Archive; for more details, cf. under the References section. [2001 April 23].
In the Corvinus Virtual Library you will find freely readable and downloadable transcriptions of a good lot of books on Hungarian history, published in the United States of America, in the English language (some others are translated from Hungarian). Texts are usually in DOC format (a few are PDF).
The Text Collection of the Creolist Archives at Stockholm Universitet contains text samples in various contact languages. The length of the texts varies considerably, which to a great extent depends on the scarcity of early texts. On the other hand, as these early attestations are crucial to our understanding of how language restructuring works, an effort has been made to include a fair number of such texts, even though they usually consist of only a few sentences. Many of those presently available were discovered in old travellers' accounts by Philip Baker and Mikael Parkvall, whereas others are the oft-cited "classical" examples, or newly written material. Texts, all freely downloadable, are from:
(1) Dutch-based varieties, spanning from New York remnants of 1744 and a few (South) African attestations from 1600-1870 and 1682-3, to Berbice (Caribbean Dutch) from 1827 and 1881, and Negerhollands (Caribbean Dutch, Virgin Islands) from 1923.
(2) English-based varieties, ranging from Caribbean Creoles, with Miskito Creole (scanty old witnesses from 1707, 1827, 1847, 1872, 1899), Jamaican Creole (old texts, namely: 1733, 1780, 1788, 1789, 1803, 1806, 1817, 1826, 1830s, 1834, 1839, 1841, 1859), Virgin Islands Creole, with old documentation from St. Croix (1878) and Tortola (1830), Leeward Islands Creole from Antigua (witnesses of 1788, 1810, 1825, 1834, and 1850), Nevis (1802 and 1825), Montserrat (1825) and St. Kitts (1708, 1718, 1802, 1834), Commonwealth Windward Islands (old texts from St. Vincent, ranging from 1791, 1817, 1828, to 1834), Barbadian (three small witnesses from 1652, 1692, 1859), Tobago (with a single short witness from 1774), and Guyana Creole (a few and mainly old witnesses, ranging from 1796, 1797, 1835, 1836, 1859, to 1990s); to Papua Creoles, with two Tok Pisin pages (a scientific abstract and a traditional story); and to African Creoles, with West African Pidgin English from Cameroon (1933), Ghana (1795, 1686, 1721), and Nigeria (1793, 1804, 1807, 1825, 1890-1930, 1926, 1963, 1965, 1995), with Kru Liberian Pidgin English (texts from 1819, 1821, 1832), and with Krio Creole English, mainly from Sierra Leone (1787, 1791, 1815, 1820, 1822, 1830s, 1840s, 1843, 1846, 1990s).
(3) French-based varieties, ranging from North American Creoles, with Mitchif (a single text from 1986) and Louisiana Creole (with a few old relics from 1720s, 1731, 1748, 1750s, 1773, 1800s, 1881, to 1902); to Caribbean Creoles, with Haitian (besides old texts from 1786, 1791, 1797 and 1802, there is a large collection of the Nan Peyi Dayiti, i.e. the abridged e-mail edition of the newsweekly "Haïti Progrès", and the Haitian Bible), French Antillean Creole, from St. Thomas (1884), Guadeloupe (besides old texts from 1664 and 1886, there are Un Conte Créole from 1997, and texts since 1997 from the journal "Madjoumbé", formerly "Kimafoutiésa") and Martinique (1671, 1695, 1698, 1790, 1793), and Commonwealth Antillean Creole, from St. Lucia (1900) and Grenada (1650s); to South American Creoles, with Guyanais (a few old relics: 1743, 1744, 1789, 1790s, 1799, 1872); and to Pacific Creoles, with Mauritian Creole, i.e. Isle de France CF (1850).
(4) Spanish-based varieties, with the Caribbean Colombia Creole Palenquero (a single long dialogue from 1988).
(5) Portuguese-based varieties, ranging from West African Creoles, with Upper Guinea Creoles (a single old and scanty witness from the Guinea-Bissau Creole: 1696), Gulf of Guinea Creoles (a single small and old text from São Tomé: 1766) and a batch of old fragments and witnesses of untraceable origin (1621, 1676, 1682, 1760), to Asian Creoles, with Macaísta (two relatively small but recent texts: Jorge Remedios on Macaísta and Unga Lobo co Unga Cordêro / Unga 'Stória di'Sopo).
[2001 August 14].
The CRIBeCu (Centro di Ricerca per i BEni CUlturali), besides some tools, provides a few online queryable SGML Italian literary texts, namely Vasari's Vite and the Vocabolario della Crusca.
A good list of links on corpus linguistics maintained by Cristiana De Santis (e-mail). Especially worth noting is the e-text resources list; for more details cf. under References & standards section. [2001 July 7]
The Croatian E-text Project (electronic text) is an effort to bring printed Croatian books to the Internet. Following the Project Gutenberg philosophy, the Croatian E-text Project strives to make as much information available as possible in Croatian and in other languages (there is also a conspicuous group of texts in Spanish), translated from Croatian or about Croatia. There is a lot of information on Croatian e-writing. The texts appear in the form of links to other sites (many to Croatia Net in the USA) where they are deposited, and their substance varies widely, ranging from pages mainly devoted to literary criticism to true downloadable texts.
There are five short narratives in Guerrero Nahuatl with Spanish version side-by-side. Free HTML pages. [2001 May 9].
This page, from the Department of Asian and Pacific Linguistics - Institute of Cross-Cultural Studies - Tokyo University, maintained by Kazuto Matsumura, provides a few free Uralic languages and Korean texts. All HTML format. The home page is under construction.
+ Estonian e-Texts: http://184.108.40.206/kmatsum/estonian.html
a few free Estonian e-Texts: the 1918 Declaration of Independence, five legal texts and three newspaper articles.
+ Finnish e-Texts: http://220.127.116.11/linguistics/texts/kmatsum/suomi/findex.html: few free Finnish e-Texts: two standard Finnish, and one historical (Kieliasetus 1873).
+ Karelian e-Texts: http://18.104.22.168/linguistics/texts/kmatsum/finnic/karjala.html
a single short free Karelian e-Text: Pekka Zaikov, Luvemma vienankarjalaksi. 3.-4. luokka, Petroskoi "Karjala", 1995.
+ Middle Korean e-Texts: http://22.214.171.124/linguistics/texts/fkr.html
a single free Middle Korean e-Text: the 1449 'uer'incengaqjigog saq romanized text with SJIS kanjis.
+ Mari e-Texts: http://126.96.36.199/linguistics/texts/kmatsum/mari/marindex.html#index
a few free standard Eastern Meadow Mari e-Texts: three 1994 newspaper articles.
+ Veps e-Texts: http://188.8.131.52/linguistics/texts/kmatsum/finnic/vepsa.html
a single short free Veps e-Text: Nina Zaiceva, Maria Mullonen, Ic^emoi lugemišt, Vepsän kelen lugendkirj 3.-4. klassale, Petroskoi "Karjala", 1994.
The Mudcat Café (cf. homepage), an online magazine dedicated to blues and folk music, maintains the Digitrad Lyric Database, a huge e-text archive with over 8000 popular lyrics in English, all freely downloadable in simple HTML format. Usually even the music is available. [2002 February 19].
The Danish National Literary Archive, maintained by the Kongelige Bibliotek, holds a few Danish texts in HTML format, freely readable online. There are no download-friendly versions of the texts.
Early Canadiana Online (ECO) is a full text online collection of more than 3,000 books and pamphlets (English and French languages) documenting Canadian history from the first European contact to the late 19th century. The collection is particularly strong in literature, women's history, native studies, travel and exploration, and the history of French Canada. You can make simple queries online, but unfortunately you can download texts only one page at a time. For the English version go to this page and for the French version go to this other one.
This Russian-only site by Evgenij Peskin is a good source for free Russian e-texts (mostly Pushkin, Chekhov, Blok, Dostoevskij, Gogol’) in sober HTML format. Besides the texts there are also pages dealing with Russian authors. [2001 June 20].
The Center, established in 1992, combines an online archive of thousands of SGML-encoded electronic texts and images with a library service that offers hardware and software suitable for the creation and analysis of text. The Electronic Text Center's holdings are a large collection of browseable and searchable texts, and include approximately 45,000 on- and off-line humanities texts in twelve languages (chiefly English), with more than 50,000 related images. Unfortunately most of this material is usually available only to U.V. students, faculty, and staff. See however this page: there are useful links as well.
"Den elektroniske bokhylla" is a nice page of Nynorsk texts. It is very simple and easy to read and download (in HTML). Contact: Jon Grepstad.
Finally a really free site from a university institution, and a very good one! The EServer, formerly at Carnegie Mellon University and now at the University of Washington, was founded in 1990 as "The English Server". Today it offers a huge collection of 30,366 free works online, mainly in TXT format. Texts, all in English, are both originals and translations, and cover a wide range of literary texts, from novels to drama, essays and journals (such as Bad Subjects and Cultronics).
Home to electronic texts of all kinds, from contemporary American amateur authors to Shakespeare, from mainstream and off-beat religious texts to profane personal poetry; there are e-zines of every kind, from the political to the technical; many texts come from Usenet. English is prevalent, but there is also space for other languages. There is for example a short story about Cinderella in Russian. The peculiarity of this story is that it is written using only words which start with the same letter as Cinderella (in Russian it is 'z' for Zolushka=Cinderella). And so on.
All texts are freely downloadable from the E-text FTP.
The Electronic Text Corpus of Sumerian Literature is in preparation at the University of Oxford. Its aim is to make accessible, via the World Wide Web, over 400 literary works composed in the Sumerian language in ancient Mesopotamia during the late third and early second millennia BC. At this site you will find a catalogue of these works, together with a Sumerian text, English prose translation and bibliographical information for each composition: all are freely downloadable. New material, and new user facilities, are added to the site regularly. Although minor corrections will be made, no major changes are planned for the editions presented here until the end of the first phase of the project in late 2000. If you wish to use or cite the corpus, please use the following form of citation: J.A. Black - G. Cunningham - E. Robson - G. Zólyomi, The Electronic Text Corpus of Sumerian Literature (http://www-etcsl.orient.ox.ac.uk/), Oxford 1998. [2001 May 1].
Giuseppe Bonghi's Italian Classics Library provides 118 texts by 36 Italian authors (the source editions often are not the best available). All texts are in HTML and access to them is free; since many texts are divided into a number of smaller pages, downloading is not an easy job.
This page, maintained by Lyle Neff (cf. homepage), is simply a list of literary e-text resources on the Web. Small but interesting.
This French literature "petit bibliothèque portatif" is hosted by the French Foreign Office and is maintained by Olivier D. J. Tableau. It offers a good number of texts in PDF / RTF / ClarisWorks format, all free and easy to download. Texts range from Montaigne to Diderot, from Balzac to Leroux, and from Baudelaire to Mallarmé; there are also some French translations of foreign authors (e.g. Shakespeare, Kafka and Goethe). Surely a good site.
The Ezio Galiano Foundation for Italian blind people, besides other services, provides about 2,500 literary works in Italian, ranging from classical masterpieces to leisure fiction: see the Catalogue. Free downloads can be made by typing "http://www.galiano.it/biblio/autori?/filenoun.zip", where "autori?" must be replaced by "autoriA" if the author's name begins with A, "autoriB" if it begins with B, and so on.
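The URL rule above can be written down as a small helper. This is a hedged sketch: the function name `galiano_url` and the file name `example.zip` are hypothetical, the real file names must be taken from the Catalogue, and it is an assumption here that the deciding initial is the first letter of the author name as listed there.

```python
BASE = "http://www.galiano.it/biblio"  # base of the pattern quoted in the entry above

def galiano_url(author_name, zip_name):
    """Build a download URL of the form .../autoriX/<zip_name>.

    `author_name` supplies the initial X (uppercased); `zip_name` is the
    archive file name, which this sketch cannot guess and takes as given.
    """
    name = author_name.strip()
    if not name:
        raise ValueError("author_name must not be empty")
    initial = name[0].upper()
    return f"{BASE}/autori{initial}/{zip_name}"

# Hypothetical file name, for illustration only
url = galiano_url("Alighieri", "example.zip")
print(url)
```

The helper only assembles the string; it does not check that the resulting address exists.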
The E-Text section of the Biblioteca magistrale "Freinet" (Tangram, in via Portici 204 - Merano) includes 70 texts of Classics of western literature in Italian language. All are either in TXT or ZIP format, free and ready for download.
The site of the Bibliothèque nationale de France, online since 1997, promises 70,000 digitized documents, both in image format (BNF manuscripts, etc.) and in text format (from the INaLF Frantext database, in cooperation with the publishing houses Acamédia, Bibliopolis and Honoré Champion). Unfortunately, many items are not available (the notice "Ce document est protégé au titre de la propriété littéraire et artistique. Pour le consulter, vous devez vous rendre à la Bibliothèque nationale de France", i.e. "This document is protected as literary and artistic property. To consult it, you must go to the Bibliothèque nationale de France", often appears even for texts that could be put in the public domain), and even the free texts can be read online but downloaded only partially in PDF format. If you want texts and not legalese, this is not the right place for you. Some simple string searches can be done online.
This elegant site by Pedro Benito Somalo offers all the works of Gonzalo de Berceo, fully provided with vocabulary, biography, critical documentation, a photographic gallery of Berceo landscapes, etc. All texts (in flowery HTML, often divided by chapters) are freely browsable and downloadable. It's only a pity that such a nice site has to be hosted by Geocities, with all its annoying popups. [2002 February 18].
The Green Library (or Cactus Library), a project of the Saint-Petersburg School of Religion and Philosophy (SRPh), has by now a few Greek (Aristotle’s De Anima, Plato’s De Republica), Russian (Dostoevskij, Leskov, Turgenev, Derzhavin) and French (Turgenev) texts. More titles are announced as forthcoming, also in Latin (Abelard), French (Casanova) and Old Slavonian (Bible). All titles are freely downloadable in PDF format. [2001 May 1].
The Internet Medieval Sourcebook is a great page of resources for medieval texts online, ranging from Literary Texts to Saints' Lives and Acts of Councils, in several languages but mainly translated into English. The textual quality of the linked pages is variable.
The Newspaper page at the Institut für Maschinelle Sprachverarbeitung Stuttgart, made by Isabel Trancoso, is a first-rate resource list for 21 countries (Argentina, Australia, Austria, Belgium, Brazil, Czechoslovakia, Denmark, Estonia, France, Germany, Italy, The Netherlands, New Zealand, North America, Norway, Poland, Portugal, South America, Spain, Sweden, Thailand, UK). Most are links to plain newspaper web editions, some to e-text repositories.
A small library of interactive hypertexts for free reading and search maintained by Éulogos. All literary texts, many religious (the BRI, Bibliotheca Religiosa). Nine languages are so far supported (Albanian, German, English, Spanish, French, Italian, Latin, Finnish).
A short collection of texts from the isLa, a band popular in Philippines. Three texts are in tagalog, one in tagalog with English version, and one in English. [2002 February 22].
Keimena is a large collection of modern Greek texts. The pages are all in Greek (you must have a Greek font installed) are are full of annoying Tripods popups. The texts are freely readable, but are displayed in heavy (and slow) HTML pages unsuitable for download.
The Textlist page of the Kirchenmusik online site (a good and well known resource for music lovers) by Joachim Vogelsänger unfolds a huge and free collection of texts of Oratorios, Cantatas, Sacred Hymns and so like. The mosts are in German, spanning from Schütz's Musikalische Exequien to Graun's Der Tod Jesu, J. S. Bach's Matthäuspassion, Mendelssohn's Paulus, Brahms' Vier ernste Gesänge and Webern's Kantate op. 31. English texts are also well represented, spanning from Händel's Alexanderfest to Britten's Ceremony of Carols and Tippett's A child of our time. A few texts are also in Latin, such as Ave Maris Stella, Stabat Mater, Vexilla Regis, Lamentationes Hieremiae, and so on. There are also only very few texts in French (Fauré's Cantique de Jean Racine and Saint-Saens' Weihnachtsoratorium) or in Italian (Monteverdi's Combattimento di Tancredi e Clorinda and Lamento d'Arianna). All the texts are freely downloadable in simple HTML format. [2001 August 27].
The Reference link page of the Library of Congress for American electronic publishing.
A rich library of Russian e-texts (1,680 MB of data as of 28 July 2000), online since 1994. Texts range from literary to technical ones (e.g. Unix materials), and are all freely downloadable as TXT or HTML in Cyrillic encodings (koi8-ru, windows-1251). A great site! This library, which is maintained by Maksim Eugenievich Moshkow (cf. this home page), also has a lot of mirror sites, listed at this address.
This page contains the texts of 1842 poets and 1217 composers in 22 different languages: of course German is prevalent (with 3449 texts), followed by English (1865) and French (1135), but minor languages are well attested too, ranging from Russian (818, incl. Ukrainian), Italian (576), Finnish (137), Polish (133), Spanish (124), Swedish (122), Czech (97, incl. Slovak and Moravian), Norwegian (93), Hungarian (77), Danish (59), Latin (31), Romanian (20), Yiddish (12) and Dutch (8), down to Hebrew (2) and Armenian, Azerbaijani, Greek and Portuguese (1 each). All texts are HTML and freely downloadable. A treasure for the music lover; for computational linguists the texts are too short, but besides the arrangements by poet and by composer there is also an arrangement by language, which can be of some use.
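Since files from such a library may arrive in either Cyrillic encoding, it can help to detect the encoding before converting to UTF-8. The following is only a minimal sketch, not something provided by the site: the function names are hypothetical, Python's `koi8_r` codec is used as a stand-in for koi8-ru, and the heuristic simply assumes that ordinary Russian prose has more lowercase than uppercase letters (the two encodings place the two cases in swapped byte ranges).

```python
def guess_cyrillic_encoding(raw: bytes) -> str:
    """Guess between windows-1251 and KOI8-R (hypothetical helper).

    Decoding with the wrong codec mostly flips letter case, so the
    candidate that yields more lowercase than uppercase letters wins.
    """
    candidates = ("cp1251", "koi8_r")

    def score(encoding: str) -> int:
        text = raw.decode(encoding, errors="replace")
        lower = sum(1 for ch in text if "а" <= ch <= "я")
        upper = sum(1 for ch in text if "А" <= ch <= "Я")
        return lower - upper

    return max(candidates, key=score)


def cyrillic_to_utf8(raw: bytes) -> bytes:
    """Re-encode a downloaded Cyrillic text as UTF-8."""
    encoding = guess_cyrillic_encoding(raw)
    return raw.decode(encoding, errors="replace").encode("utf-8")
```

The heuristic can fail on very short strings or on text set entirely in capitals, so for serious work the encoding stated by the download page should take precedence.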
A site devoted to contemporary Austrian literary texts. Original poetry is prevalent, but there is also some translation (e.g. Lull, Shakespeare). All texts are HTML and are freely downloadable.
A hypertext (HTML) library of Brazilian literary texts. Several classic Brazilian literary works are already available in HTML format for browsing.
An archive of original and translated literary texts in Esperanto. It is huge, but the texts are stored as HTML pages for free online reading, and there are no more download-friendly versions.
The Little Sailing library offers some Classical Greek (Aeschines, Aeschylus, Apollodorus, Aristotle, Aristophanes, Euripides, Herodotus, Hesiod, Lucian, Xenophon, Homer, Pausanias, Plato, Plutarch, Sophocles, Thucydides) and a few Modern Greek (Skaribas, Doumenis) texts in Unicode encoding. All texts can be freely downloaded or browsed online, often with a side-by-side translation. The downloadable texts are compressed, and each file contains a full text in MS Word 97 format (the original text only, with no translation). A Unicode font that supports the Greek Extended range is all you need in order to view the texts. [2001 May 1].
These texts (poetry, folklore and papers) are taken from "Līvõd Tekstõd", Rīga, 1991, edited by Valda Šuvcāne, and were transcribed into HTML by Uldis Balodis (email; other address). This anthology was intended to make reading material available to learners of the Livonian language, mainly because written Livonian texts are so scarce. The texts (HTML pages only) are short and sometimes have transcription problems, but they are free. [2001 May 7].
A simple page with versions of the Pater Noster in many Germanic languages (Afrikaans, Alsatian, Bavarian, English, Danish, Dutch, Frisian, German, Gothic, Icelandic, Norn, Norwegian, Old Saxon, Pennsylvania Dutch, Plattdeutsch, Swedish) by Catherine Ball (see her homepage), the webmistress of the Old English Pages. There is also an easy interface that allows the comparison of any two texts, thus creating a sort of parallel text. This page was prepared for use in classes in linguistics, history of the English language, and Old English. [2001 July 13].
Awabakal is a dialect, extinct since the second half of the 19th century, of the Wanarua language (Yuin-Kuric, New South Wales), which itself became extinct in the second half of the 20th century. A translation of Luke was made between 1827 and 1831 by the missionary Lancelot Threlkeld and Biraban (McGill) in the Newcastle/Lake Macquarie area. Over that same period, the Awabakal population declined to such an extent that only a few families could be found. For this reason, much to his regret, Threlkeld was unable to publish the gospel or make any attempt to teach the Awabakal people to read it. The original manuscript of Threlkeld's fourth revision is still in the Sir James Grey collection at the Auckland Public Library, New Zealand. The current edition (1997, published on the occasion of the bicentenary of white settlement in Newcastle, NSW), however, makes use of the larger 1892 publication, An Australian Language, published after Threlkeld's death by Dr Fraser and printed by the New South Wales Government Printer. Both the full text and a smaller sample are freely available in PDF format. [2001 May 18]
MedDb of the "Centro Ramón Piñeiro para a Investigación en Humanidades" is a database providing the complete corpus of Medieval Galego-Portuguese lyrics. The online search can be used only after registering, but registration is easy and free.
This page of the Forgotten Ground Regained site provides a good collection of medieval Germanic (mainly Middle English) poetry texts, ranging from Old English to Middle English and Scots, with a little Old Norse and Old (High and Low) German. All texts are freely downloadable (but often split across several files). [Rev. 2001 December 2].
The Moby Shakespeare edition, a part of the Moby Project, is the only complete freeware e-text of all Shakespeare’s works. It is easily available, in more or less complete versions and formats, from nearly all repositories of English literary e-texts. The FTP address given here has a fairly complete version, with a single TXT file for each work. A full zipped version is available by FTP from a lot of sites, such as Gatekeeper, Sheffield University, etc. [2001 April 23].
An East African language and culture resource page. There are also some links to Swahili e-Texts.
This site provides some unconventional Italian texts, ranging from translations of the classics (Herodas' Mimiambi, Theophrastus' Characters and Poggio's Facetiae) to the anonymous Manganello and Visconti Venosta's Prode Anselmo. All texts are in HTML and freely downloadable.
Myriobiblos, The E-text Library of the Church of Greece, provides a lot of free HTML e-texts (you can browse and save them) from Classical to Modern Greek; there are also fewer texts (mainly translations) in Bulgarian, English, French, German, Italian, Romanian and Russian. Texts are also categorized by subject: Bible, Liturgical texts, Saints, Catechism, Orthodox spirituality, Patristic texts, Patrology, Theology, Church history, Church dialogue, Church art, Christian art, Christian archaeology, Byzantine history, Modern Greek history, Modern Greek Literature, Philosophy, Debates, and Bibliographies. [2001 May 1].
This page, by SorrentoRadio, collects over 500 texts of Neapolitan songs, from the classics to the lesser-known ones. All texts are freely browsable and downloadable in plain HTML format. There is also a Neapolitan Proverbs page (with glosses in Italian) and a short Old Neapolitan - Italian Glossary. [2002 February 23].
NCSTRL (pronounced "ancestral") is an international collection of computer science research reports and papers made available for non-commercial use from a number of participating institutions and archives. Texts can be freely retrieved. [2001 April 23].
This electronic database, maintained by the Libera Associazione Nuovo Rinascimento (which was born in the Italian Studies Department of the University of Florence), contains some Italian literary texts (mainly but not exclusively Renaissance texts) and some modern scholarly works as well. All are freely downloadable in several formats.
The Online Book Initiative "Online Book Repository" (OBR) is a large collection of English language texts (originals and translations) and related materials ranging from Shakespeare and The Bible to novels, poetry, standards documents, etc. The page is only an index, but it is speedy and all texts are ready to be freely downloaded. Contact: email@example.com.
This page, which is a part of the rich Old English Pages site by Catherine Ball (see her homepage), is a valuable index to electronic editions of Old English texts, translations, and images of Anglo-Saxon manuscripts available on the Web. [2001 July 13].
This frequently updated site provides references to German literature resources. Although its interest is exclusively literary, it sometimes also provides useful links to German e-texts.
The Online Medieval and Classical Library (hosted by The Berkeley Digital Library SunSITE) is a collection of some of the most important literary works of Classical and Medieval civilization translated into English. Texts can be browsed and searched online, and you can also freely download them in ZIP format from the OMACL FTP Site at the University of Kansas. At present only 30 texts are available; many of the larger ones are also available in multiple-file editions.
The Book Listing page (http://digital.library.upenn.edu/books/lists.html) is not a true archive of texts, but a database of web resources on works and authors of English literature. The Archive page, instead, is a good list of sites with free online texts in different languages.
This page collects links to every work of classic horror and fantasy fiction available on the Internet (in the broad sense: Shakespeare, Goethe's Faust and Milton's Paradise poems are included as well!). All texts are in English. The main purpose of this page is not to display the works themselves, but rather to direct the reader to other sites where the works are housed. When a particular text can be found in several different places, the site which is easiest to access and has the most graphics, information, etc., has been selected. Only texts which are hard to find and not easily accessible are reproduced on this site. Due to space limitations, links to other sites are used as much as possible. All texts are freely downloadable, usually in ZIP format.
There are so far 19,748 links to German literary texts, but a lot of them lead to subscription sites or to texts unsuitable for downloading.
Opera e-Libretto (Collection Ulric Voyer) is a collection of 220 free e-texts of opera libretti. The libretti on offer are in Italian (Monteverdi, Provenzale, Haendel, Vivaldi, Piccinini, Pergolesi, Cimarosa, Mozart, Salieri, Jommelli, Spontini, Bortnjanskij, Fioravanti, Rossini, Bellini, Donizetti, Soliva, Verdi, Boito, Ricci, Mancinelli, Anfossi, Giordano, Catalani, Mascagni, Leoncavallo, Zandonai, Puccini); French (Charpentier, Rameau, Campra, Rousseau, Gluck, Méhul, Grétry, Berlioz, Auber, Bizet, Boieldieu, Offenbach, Dukas, Massenet, Lalo, Chabrier, Saint-Saëns, Thomas, Gounod, Massé, Reyer, Delibes, Debussy, Cui, Halévy, Bruneau, Roussel, Pierné, Laparra, Voyer; for Cendrillon, Jongleur de Notre-Dame and Sapho there are also English versions, for Werther Italian and English versions); English (Purcell, Blow, Clay, Cellier, Edwards, Sullivan, Ford, Cadman, Gershwin, Yanelow, Hoiby, Neff, plus an English translation of Marschner’s Vampyr); German (Mozart, Weber, Wagner, Johann Strauss, Richard Strauss, Freiherr von Franckenstein); Russian (Rimsky-Korsakov, Mussorgskij), mainly in English-style transcription (a KOI-8 version is available only for Bojarynja Vera Šeloga and May Night); Danish (Nielsen). All texts are in HTML, usually split into several files according to act divisions. [2001 June].
The Text page of the ORB (Online Reference Book for Medieval Studies) has a good list of links to excerpts and full texts from primary and secondary sources, housed at the ORB server or elsewhere on the World Wide Web. There are many English translations, some Latin, and a few other languages as well.
(Beware that the page has a lot of frames and Java.)
A large catalogue (you can download the PDF) of texts, mainly of literary, philological and scholarly genres. English is prevalent but not exclusive (there is room as well for the Nibelungenlied, the Biblia Hebraica Stuttgartensia, Ariosto's Orlando Furioso, Mistral's Mireio, Boethius' De Syllogismo Hypothetico, and so on). The texts are all in TEI-conformant XML / SGML / HTML formats, but versions in ASCII or RTF are often supplied as well.
They also offer some linguistic corpora for free after you send a disclaimer statement (e.g. Lampeter Corpus, Northern Ireland Speech Corpus, SUSANNE Corpus): query their catalogue with the search author=corpora.
Most of the materials are freely available for non-commercial use, after you have signed a legal disclaimer; for a few materials, however, you have to pay for membership. The anonymous FTP is said to be publicly accessible, but the last time I tried I did not succeed in logging in ... Email.
This little page offers the text of the Lord's Prayer in Taino (with glosses either in English or Spanish) from Dr. Cayetano Coll y Toste's Prehistoria de Puerto Rico, 1493. It's a part of the Official Jatibonicu Taino Tribal Government Web Site. For the reviving of the Taino language, cf. also their Dictionary of the Spoken Taino language. [2001 July 21].
The Patrologia Latina Database is an electronic version of the first edition of Jacques-Paul Migne's Patrologia Latina, published between 1844 and 1855, and the four volumes of indexes published between 1862 and 1865. The Patrologia Latina comprises the works of the Church Fathers from Tertullian in 200 AD to the death of Pope Innocent III in 1216. The Patrologia Latina Database contains the complete Patrologia Latina, including all prefatory material, original texts, critical apparatus and indexes. Migne's column numbers, essential references for scholars, are also included. The text is encoded in SGML and the search interface permits searching by the SGML tags. The database can be searched by single words, truncated terms or phrases, or by using a combination of Boolean operators. Searches can be limited to specified authors and texts or performed across the entire corpus. Contact: Chadwyck-Healey Ltd, The Quorum, Barnwell Road, Cambridge CB5 8SW. Tel: 01223 215 512. Fax: 01223 215 514. Email.
+ The Patrologia is available as a collection of CD-ROMs (cf. this address), on conditions to be discussed with a Chadwyck-Healey representative (follow this link).
+ The Patrologia Latina Database is also available as an online database, accessed through the Internet on payment of an annual subscription fee, to be discussed with a Chadwyck-Healey representative (see this address).
mirrors: United Kingdom; Germany.
Perseus, a Tufts University project, is a database of Classical Greek and Latin texts, tagged with markup, that you can query online. Texts, images, and maps in the Perseus Digital Library are all interconnected, making it easy for readers to look something up across several texts using a single Lookup Tool. Unfortunately, you can neither download nor read continuously any of their texts.
"Filosofia in Italia" Page of Philosophical Texts (translated) in Italian Language. A lot of items (but not all) are only links to Progetto Manuzio; most texts are freely downloadable in zip format.
Starry.com is an archive of unpublished, prepublished and cyber-published American literature, from traditional novels written for the web to real "virtual novels". All texts are in HTML, designed for reading online rather than for downloading, but of course they can be freely downloaded as well. If you want fresh, modern and post-modern raw narrative texts in English, this is surely a good spot.
Progetto Duecento is a database, made by Francesco Bonomi, covering most of thirteenth-century Italian poetry. Online you can make only very simple queries (and direct access to the texts is forbidden) unless you buy the offline program at the small cost of $40.20. The program allows searching for rhymes, co-occurrences of two items and wildcards, and extracting contexts and full texts. It is a small but fully functional freelance competitor to the big OVI database and its GATTO query system.
Liber Liber has the largest Italian library of electronic literary texts. All are freely downloadable in zip format.
The oldest (founded by Michael Hart in 1971) and largest project to put out-of-copyright literature online, freely available. There are a lot of mirror sites you can choose from to search the database and download files (the following is the Italian one).
Texts are usually in TXT format. The majority of the texts (about 1350) are in English, but there are also a few titles in other European languages, for example: Brentano's Das Maerchen von dem Myrtenfraeulein (note the orthographical adaptation), Ariosto's Orlando Furioso, Dumas fils' La dame aux camélias, Cervantes' Don Quijote, etc.
Most Project Gutenberg e-texts are public domain. You can do anything you like with these – you can re-post them on your site, print them, distribute them, convert them to other formats. Some Project Gutenberg e-texts have copyright restrictions. You can still download and read these, but you may not be allowed to reproduce, modify or distribute them. When browsing or searching on the site, you will see these copyright-restricted texts indicated in the listings. For fuller information about them, download the e-text and read the header of the file, which will spell out the conditions in detail.
+ Sealsoft Literary Works is a page where you can run online searches on all the Gutenberg texts, treated as a corpus of more than 80 million words.
Laurens Jz. Coster Ontwerp is currently trying to set up a comprehensive collection of Dutch literary masterpieces on the World Wide Web. Although most of the pages are (and will continue to be) in Dutch only, the home page is in English. There are a lot of Dutch texts in HTML format, freely readable online, but no download-friendly versions.
An austere FTP site from Washington with an archive of Classical Latin texts in TeX format (Apuleius, Ausonius, Caesar, Catullus, Cicero, Horatius, Livius, Nepos, Ovidius, Propertius, Prudentius, Sallustius, Tibullus, Vergilius) with separate commentary texts. The texts and commentaries stored in this directory tree have been scanned into ASCII form, edited by hand and converted into TeX format from documents free of copyright, and are freely downloadable. This usually means texts that are sufficiently old that their copyright has run out (see README.copyright); so you are not likely to find recent scholarship here. What you will find, however, is material from the last century. TeX is a good (and free!) typesetter; but if you don't have the program, having the files in that format will not help you. Therefore, in the Utils directory there is a program "tex2asc", a rudimentary TeX-to-ASCII converter. It was written solely for the purpose of converting these particular documents into ASCII form, so it probably won't work if you try to use it on any more complicated TeX document. If you do plan to use TeX to format the documents, make sure that you have the file "ks_macros.tex", also stored in the Utils directory. This file contains macros needed by some of the documents. If the document you plan to format contains Greek characters, be sure also that you have the Greek font specifications (gr*.pk and gr*.tfm, also in the Utils directory).
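To give an idea of what a rudimentary TeX-to-ASCII conversion involves, here is a minimal sketch. It is not the archive's "tex2asc" program, whose source is not described here; the function name `tex_to_ascii` is hypothetical, and the sketch assumes only the simplest markup (comments, control words such as `\it`, escaped specials and grouping braces), so macros from "ks_macros.tex" or Greek passages would need extra handling.

```python
import re


def tex_to_ascii(tex: str) -> str:
    """Strip simple TeX markup and return plain text (illustrative only)."""
    # Drop comments: an unescaped % and everything after it on the line.
    text = re.sub(r"(?<!\\)%.*", "", tex)
    # Turn escaped specials (\%, \&, \$, \#, \_) into the bare character.
    text = re.sub(r"\\([%&$#_])", r"\1", text)
    # Remove control words such as \it or \noindent, plus one trailing space.
    text = re.sub(r"\\[A-Za-z]+ ?", "", text)
    # Remove remaining control symbols such as \\ or \, (backslash + non-letter).
    text = re.sub(r"\\[^A-Za-z]", "", text)
    # Remove grouping braces.
    text = text.replace("{", "").replace("}", "")
    # Trim trailing whitespace left behind on each line.
    return "\n".join(line.rstrip() for line in text.splitlines())
```

For example, `tex_to_ascii(r"{\it Arma} virumque cano % Aen. 1.1")` keeps only the words of the verse and discards the font switch, the braces and the comment.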
Nordic free Electronic Literary Text Archive. Project Runeberg has been publishing Nordic literature on the Internet since 1992: this means free electronic editions of old books from Sweden and the other Nordic countries. The PR catalogue lists more than 200 titles, most of which are in Swedish. All texts are freely downloadable, mainly as HTML files.
Projecto Vercial has the largest Portuguese library of electronic literary texts. All are freely downloadable.
Slovak free Electronic Literary Text Archive. About 60 Slovak authors are represented. Documents are stored in HTML format. All pages are in Slovak.
Nothing to do with Project Gutenberg ... Projekt Gutenberg - DE has German-language texts by more than 300 authors, from Aesopus to Zola (all in German). All texts are freely readable online but are not intended for download (longer works are divided into several chapters). You can instead order a CD-ROM with the whole corpus of Gutenberg-DE for only 39.80 DM.
This page, maintained by Lyle Neff (cf. homepage), is a rich database of online sources of opera libretti. A lot of e-texts (HTML format) are freely available directly from the site; others are only linked to. Besides libretti, secular songs and sacred vocal music are also covered. The languages covered are Italian, French, English, German, Russian, Spanish (zarzuelas), Latin (sacred vocal music) and Jewish (songs). There are also links to other, less specific musical and linguistic resources. [2001 June 20].
Korean texts: there are eight collections of papers dealing with Korean lexicography. Each paper is downloadable as a PDF file. Obviously the whole site is strictly in Korean ... Beware also that downloading may be a nightmare; at least it was the last time I tried. [2001 April 25].
A small page of cooking recipes (Tagalog only) from the University of Pennsylvania Tagalog website.
+ A second page, with other recipes. [2002 February 22].
The lack of publicly accessible, machine-readable data is a major impediment to research in Southeast Asia. These archives, maintained by the SEASRC (South East Asian Computing And Linguistics Center), serve as a repository for raw, tagged or otherwise prepared texts, word lists, dictionaries, and the like. Given interest and availability, the Center will also archive sound files. So far only a few Thai texts (mainly onomastics and toponomastics) are available.
It is only an anthology of poems, but it covers 179 Catalan poets, ranging from the classics (e.g. Ausiàs March) to modern unknown and unpublished authors. All texts are freely readable as HTML pages (one for each author), and you can save them in this format. Some biographical information is provided as well.
A gold mine of resources for Biblical studies and Semitic philology, including some links to e-texts (the Bible in Hebrew, the Septuagint and Old Greek versions, the Latin Vulgate, the Dead Sea Scrolls, etc.).
This SETIS Library (The Scholarly Electronic Text and Image Service) provides a rich collection of 18th, 19th and early 20th century Australian texts. All are freely readable and downloadable in TEI2 conformant HTM format.
A gold mine of links to downloadable Chinese e-texts on the Web, ranging from the Classics to online newspapers and Usenet archives. Beware, however, that a lot of the links are often down.
Star Thrower Publishing provides modern and experimental American literature freely online. All texts are in HTML, designed for reading online rather than for downloading, but of course they can be freely downloaded as well. Go to this page and see what they have. [Last checked 2001 December 25].
The Sumerian Text Archive offers a growing collection of transliterated Sumerian texts. These texts have been transliterated using only characters from the ASCII alphabet so that the text files can be used on every type of computer. As a result, however, the transliterations deviate in a number of ways from what is common practice in Sumerology (cf. the List of Conventions). All texts (administrative UR III, Old Sumerian and Old Akkadian; Royal Inscriptions) are freely downloadable. [2001 May 1].
The TITUS server provides text materials from languages that are relevant for Indo-European studies. A lot of languages (including non-Indo-European ones) are present, either with text editions by TITUS itself or with external e-texts or projects which are simply linked to: see the general Index, but beware that it is very heavy. TITUS also hosts many projects and initiatives, such as the Armazi, Ogamica and Tocharian Projects. Texts are mostly in HTML format, but often also in WordCruncher format, and sometimes in TXT. Usually the versions in HTML (UTF-8) and WordCruncher format are publicly available (that is to say, they can be downloaded and used freely for scholarly purposes, provided that they are quoted as sources and that the name(s) of the editor(s) and the date of the last changes are indicated in publications), whilst the TXT versions are restricted to TITUS members. Most of the texts are in fact accessible on the TITUS WordCruncher Server for investigations of many kinds. [2001 July 14; Rev. 2001 August 30].
The Universal Library, hosted by the Computer Science Department of Carnegie Mellon University, provides a good page of resources on e-texts on the Web, mainly (but not exclusively) English ones.
A rich archive of English literary texts, but, unfortunately, restricted to University of Toronto students and staff. Always the same old story.
Brazilian free Electronic Literary Text Archive. The catalogue lists a lot of texts, all freely downloadable, mainly as RTF or PDF files.
On these pages by R. J. C. Watt there are online concordances (KWIC format, made with the Concordance tool), shown alongside the e-texts, for a few English poets (Shelley, Coleridge, Keats, Blake, Hopkins). They seem quite nice.
The texts of the main works of the famous French philosopher Gilles Deleuze, freely available directly on his site. All texts are in HTML format and are split into several pages for better online reading (but for worse downloading ...). Besides the French original, English and Spanish translations are available as well, so you can construct three parallel texts (if not a true parallel corpus).
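For readers unfamiliar with the KWIC (keyword in context) format, the idea can be sketched in a few lines. This is only an illustration of the format, not how the Concordance tool works internally; the function name `kwic` and the `width` parameter are hypothetical.

```python
import re


def kwic(text: str, keyword: str, width: int = 30) -> list[str]:
    """Return one line per occurrence of `keyword`, with `width`
    characters of context on each side (keyword in context)."""
    pattern = re.compile(r"\b%s\b" % re.escape(keyword), re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        # Take the context windows and flatten line breaks to spaces.
        left = text[max(0, match.start() - width):match.start()].replace("\n", " ")
        right = text[match.end():match.end() + width].replace("\n", " ")
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines


if __name__ == "__main__":
    poem = "Tyger Tyger, burning bright,\nIn the forests of the night;"
    for line in kwic(poem, "the", width=12):
        print(line)
```

Matching on word boundaries keeps "the" from being found inside longer words; a real concordancer would normally work on a tokenized text instead of raw character offsets.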
The WES Section of the University of Virginia Library provides a list of resources for 17 European literatures, some of which may be useful for accessing text archives. The languages displayed are: Catalan, Danish, Dutch, Finnish, French, Galego-Portuguese, German, Greek, Irish, Italian, Latin, Norwegian, Old Norse & Icelandic, Occitan, Portuguese, Spanish and Swedish.
A useful site for collecting corpora of literary texts.
Slovenian free Electronic Literary Text Archive. A rich collection of Slovenian authors, and some translation as well. Documents are stored in HTML format. All pages are in Slovenian.