Abstract
This paper presents the Latvian Language Learner Corpus (LaVA), developed at the Institute of Mathematics and Computer Science, University of Latvia. The LaVA corpus contains 1015 essays (190k tokens and 790k characters, excluding whitespace) written by foreign students at Latvian higher education institutions who are learning Latvian as a foreign language in their first or second semester and reaching the A1 (possibly A2) level of Latvian language proficiency. The corpus has morphological and error annotations. The paper also provides an error analysis and statistics of the LaVA corpus. The corpus is publicly available at: http://www.korpuss.lv/id/LaVA.
Original language | English |
---|---|
Title of host publication | 13th Language Resources and Evaluation Conference, LREC 2022 |
Subtitle of host publication | Proceedings |
Editors | Nicoletta Calzolari, Frederic Bechet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Helene Mazo, Jan Odijk, Stelios Piperidis |
Publisher | European Language Resources Association (ELRA) |
Pages | 727-731 |
Number of pages | 5 |
ISBN (Electronic) | 9791095546726 |
ISBN (Print) | 9791095546726 |
Publication status | Published - 2022 |
Event | 13th International Conference on Language Resources and Evaluation, LREC 2022, Palais du Pharo, Marseille, France. Duration: 20 Jun 2022 → 25 Jun 2022. Conference number: 13. https://aclanthology.org/2022.lrec-1.0.pdf ; https://lrec2022.lrec-conf.org/en/ |
Conference
Conference | 13th International Conference on Language Resources and Evaluation, LREC 2022 |
---|---|
Abbreviated title | LREC 2022 |
Country/Territory | France |
City | Marseille |
Period | 20/06/22 → 25/06/22 |
Keywords
- acquisition
- annotated
- Latvian
- learner corpus
Field of Science
- 5.3 Educational sciences
- 6.2 Languages and Literature
Publication Type
- 3.1. Articles or chapters in proceedings/scientific books indexed in Web of Science and/or Scopus database