Presentation of The Pangloss Collection
Pangloss Collection: Corpora of natural speech in "rare" languages
The Pangloss Collection provides free access to documents of connected, spontaneous speech, mostly in "rare" or endangered languages, recorded in their cultural context and transcribed in consultation with native speakers. Its goal is to contribute to the documentation and study of a precious human heritage: the world's languages.
At present, the archive contains 1460 records in 70 languages, with 430 documents annotated.
Pangloss is itself part of a larger repository called Cocoon (for “Collections de Corpus Oraux Numériques”, i.e. Collections of Digital Oral Corpora) – formerly CRDO-Paris. While Pangloss has a specific focus on endangered and underdocumented languages, Cocoon is open to corpora in all the world’s languages.
A sound archive with synchronized transcriptions
For linguistic science, language is first and foremost spoken language. The medium of spoken language is sound. The Pangloss Collection gives access to original recordings simultaneously with transcriptions and translations, as a guarantee of authenticity and as a resource for further research.
The documents prepared by the project incorporate both sound and text: at a minimum a phonological transcription and a free translation and, where available, word-by-word glosses, notes, etc. The text is indexed to the sound at the level of the "sentence" or intonational group.
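As a rough illustration of sentence-level indexing, the sketch below parses a small XML document in which each sentence carries a start and end time in the recording, then looks up the sentence being spoken at a given playback time. The element and attribute names (TEXT, S, AUDIO, FORM, TRANSL, start, end) and the sample content are assumptions made for this example; they are not presented as the archive's actual schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical document: element names, attribute names and content are
# illustrative only, not the archive's actual format.
SAMPLE = """\
<TEXT id="demo" lang="und">
  <S id="s1">
    <AUDIO start="0.00" end="2.35"/>
    <FORM>first sentence transcription</FORM>
    <TRANSL lang="en">First sentence translation.</TRANSL>
  </S>
  <S id="s2">
    <AUDIO start="2.35" end="5.10"/>
    <FORM>second sentence transcription</FORM>
    <TRANSL lang="en">Second sentence translation.</TRANSL>
  </S>
</TEXT>
"""


@dataclass
class Sentence:
    sid: str
    start: float  # seconds from the beginning of the recording
    end: float
    form: str
    translation: str


def load_sentences(xml_text):
    """Read every S element into a Sentence with its audio interval."""
    root = ET.fromstring(xml_text)
    sentences = []
    for s in root.iter("S"):
        audio = s.find("AUDIO")
        sentences.append(Sentence(
            sid=s.get("id"),
            start=float(audio.get("start")),
            end=float(audio.get("end")),
            form=s.findtext("FORM", default=""),
            translation=s.findtext("TRANSL", default=""),
        ))
    return sentences


def sentence_at(sentences, t):
    """Return the sentence whose interval [start, end) contains t, else None."""
    for s in sentences:
        if s.start <= t < s.end:
            return s
    return None


sentences = load_sentences(SAMPLE)
hit = sentence_at(sentences, 3.0)
print(hit.sid, hit.form)
```

With alignment stored this way, a player can highlight the transcription and translation of whichever sentence is currently audible, and a click on a sentence can seek the recording to its start time.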
A structured, open architecture
To ensure the continuity of these resources, all documents are stored in a structured archive in accordance with current data-processing standards (XML, OLAC, Dublin Core...), in an open format, and may be downloaded for research purposes.
The tools used to prepare and disseminate it are open-source.
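To give a concrete idea of what standards-based metadata can look like, here is a minimal sketch that builds a small record using elements from the Dublin Core element set, on which OLAC metadata is based. The wrapper element name (record) and all field values are placeholders chosen for this example; they do not describe an actual record from the archive.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_record(fields):
    """Build a record whose children are Dublin Core elements.

    The wrapper element name "record" is a placeholder for this sketch.
    """
    record = ET.Element("record")
    for name, value in fields:
        element = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        element.text = value
    return record


# Placeholder values for illustration only.
fields = [
    ("title", "Example narrative (placeholder)"),
    ("language", "und"),
    ("type", "Sound"),
    ("format", "audio/x-wav"),
    ("rights", "Non-commercial use for research purposes"),
]

record = build_record(fields)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Because the description relies on shared element names rather than a project-specific vocabulary, a record of this kind can be read by any tool that understands Dublin Core.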
Publications 
Michailovsky, Boyd; Alexis Michaud and Séverine Guillaume. 2011. "A simple architecture for the fine-grained documentation of endangered languages: the LACITO multimedia archive". Keynote talk at Oriental-COCOSDA 2011, Hsinchu, Taiwan.
Jacobson, Michel; Boyd Michailovsky and John Brandon Lowe. 2001. "Linguistic documents synchronizing sound and text". Speech Communication, n°33, special issue "Speech Annotation and Corpus Tools".
Pre-print version available in PDF.
Talks
Jacobson, Michel. 2004. « La documentation sur les langues : de la collecte des données brutes à leur archivage ». Colloque: Sources et ressources pour les sciences sociales - Nature/Traitement/Modélisation. Paris (EHESS). 9-11 December 2004.
Lowe, John; Michel Jacobson and Boyd Michailovsky. 2004. « Interlinear Text Editor Demonstration and Projet Archivage Progress Report ». 4th E-MELD (Electronic Metastructure for Endangered Languages Data) workshop on language engineering: Linguistic Databases and Best Practice. Detroit. 15-18 July 2004.
Jacobson, Michel and Boyd Michailovsky. 2003. « Le programme "Archivage" et la mise en ligne des archives orales du LACITO ». La diffusion du patrimoine sonore. Après-midi d'étude de l'Association Française des détenteurs d'Archives Sonores (AFAS). Paris. 2 October 2003.
Jacobson, Michel. 2003. « Corpus informatisé en linguistique de terrain ». Linguistique et Corpus. 36e Colloque International de la Societas Linguistica Europaea. Lyon. 4-7 September 2003.
Jacobson, Michel and Alexis Michaud. 2002. « Le projet archivage du LACITO ». Collecte et exploitation de corpus dans le domaine berbère. Journées d'étude de l'INALCO. Paris. 6 December 2002.
Jacobson, Michel and Boyd Michailovsky. 2002. « Linking Linguistic Resources: time aligned corpus and dictionary ». Resources and Tools in Field Linguistics. International workshop. Las Palmas. 26-27 May 2002.
Michailovsky, Boyd. 2001. « Searchable speech/text corpora: a resource for spoken language ». SCALLA (Sharing Capability in Localisation and Human Language Technologies). Working Conference, National Center for Software Technology (Programme of the European Commission: ASIA-Information Technology and Communications). Bangalore. 21-23 November 2001.
Jacobson, Michel. 2001. « Un système d'édition pour des documents linguistiques texte-son ». 4e Colloque International sur le Document Electronique. Toulouse. 24-26 October 2001.
Mazaudon, Martine and John Brandon Lowe. 2001. « Limbu and Hayu online text demonstration ». Workshop on Tibeto-Burman Linguistics. U.C. Santa Barbara. 28-29 July 2001.
Jacobson, Michel. 2001. « Archivage de documents linguistiques et leurs applications ». Journée Jeunes chercheurs en Linguistique Appliquée. Paris. 27 June 2001.
Michailovsky, Boyd. 2001. « The LACITO Archive project markup » (PDF). The Need for Standards. Workshop The Digitization of Language Data. Santa Barbara. 22 June 2001.
Jacobson, Michel; Boyd Michailovsky; Françoise Ozanne-Rivierre and Jean-Claude Rivierre. 2001. « La documentation des langues et des traditions orales: Le projet Archivage du LACITO ». Colloque international Identités autochtones: Paroles, écrits et nouvelles technologies. Paris (UNESCO). 17 May 2001.
Jacobson, Michel and Boyd Michailovsky. 2000. « A linguistic Archive on the web ». Workshop Web-Based Language Documentation and Description. Philadelphia. 14 December 2000.
Jacobson, Michel. 2000. « Les outils modernes pour la notation de corpus de parole ». Colloque Transcription de la parole normale et pathologique. Tours. 8 December 2000.
Jacobson, Michel. 2000. « XML Tools for managing linguistic data: The LACITO Archives Project ». Workshop Linguistic Exploration: New methods for creating, exploring and disseminating linguistic field data. Chicago. 6 January 2000.
Jacobson, Michel. 1998. « Projet Archivage: pour la conservation de documents sonores et textuels synchronisés en linguistique ». Colloque Immagini, Memoria, Digitalizzazione. Milan. 26 November 1998.
Team 
Séverine Guillaume 
Boyd Michailovsky 
Alexis Michaud 
Copyright 
The Archive is an ongoing project of the LACITO research unit ("lab") of the French National Center for Scientific Research.
Copyright of Archive documents is held by the authors. Non-commercial use for research purposes is authorized; please contact the project for any other use.