LaVA - Latvian Language Learner corpus

Roberts Darģis, Ilze Auziņa, Inga Kaija, Kristīne Levāne-Petrova, Kristīne Pokratniece

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review


This paper presents the Latvian Language Learner Corpus (LaVA), developed at the Institute of Mathematics and Computer Science, University of Latvia. The LaVA corpus contains 1015 essays (190k tokens and 790k characters, excluding whitespace) written by foreign students at Latvian higher education institutions who are learning Latvian as a foreign language in their first or second semester and reaching the A1 (possibly A2) proficiency level. The corpus has morphological and error annotations. The paper also provides an error analysis and statistics of the LaVA corpus. The corpus is publicly available at:

Original language: English
Title of host publication: 13th Language Resources and Evaluation Conference, LREC 2022
Subtitle of host publication: Proceedings
Editors: Nicoletta Calzolari, Frederic Bechet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Helene Mazo, Jan Odijk, Stelios Piperidis
Publisher: European Language Resources Association (ELRA)
Number of pages: 5
ISBN (Electronic): 9791095546726
ISBN (Print): 9791095546726
Publication status: Published - 2022
Event: 13th International Conference on Language Resources and Evaluation, LREC 2022 - PALAIS DU PHARO, Marseille, France
Duration: 20 Jun 2022 to 25 Jun 2022
Conference number: 13


Conference: 13th International Conference on Language Resources and Evaluation, LREC 2022
Abbreviated title: LREC 2022

  • acquisition
  • annotated
  • Latvian
  • learner corpus

Field of Science*

  • 5.3 Educational sciences
  • 6.2 Languages and Literature

Publication Type*

  • 3.1. Articles or chapters in proceedings/scientific books indexed in Web of Science and/or Scopus database

