Valodas korpusu izmantošana latviešu valodas uzdevumu automātiskā ģenerēšanā

Translated title of the contribution: Use of the Language Corpora in Automatic Generation of Latvian Language Exercises

Ilze Auziņa, Roberts Darģis, Inga Kaija, Kristīne Levāne-Petrova, Kristīne Pokratniece

Research output: Contribution to journal › Article › peer-review


Today, language corpora are not only an empirical basis for research but can also be used to develop a variety of data-driven teaching materials and tools. Experience in other countries shows that the development of self-assessment exercises for language learning can be partially or fully automated using language corpora and natural language processing (NLP) tools, providing both a variety of exercises and support for teachers in implementing the curriculum. The Latvian Language Learners Corpus (LaVA), developed at the Institute of Mathematics and Computer Science, University of Latvia, includes more than 1000 texts written by foreign learners of Latvian who are in their first or second semester of study at Latvian higher education institutions and are reaching the A1 (possibly A2) proficiency level. The corpus contains more than 180 000 words. Exercises and tests are generated on the basis of an analysis of the LaVA data, including learner error analysis, which makes it possible to identify problematic areas of spelling, grammar, and vocabulary. The exercises are intended to help learners strengthen their linguistic competence in Latvian, for example in the use of indicative-mood verb forms in both the indefinite and perfect tenses. The article discusses the methodology by which, based on statistical and quantitative analysis of the LaVA corpus data, sample sentences are selected from different corpora of Latvian, for example the Balanced Corpus of Modern Latvian (LVK2018) and the Corpus of Students’ Essays (SPK). It also describes the task-development algorithms and the development of an online site for self-assessment exercises.
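
The abstract does not spell out the task-development algorithms, so the following is only a minimal sketch of the general idea of turning a selected corpus sentence into a gap-fill item on a verb form. Everything in it is an assumption made for illustration: the `Token` structure, the `"VERB"` label, the 12-token length limit, and the function names `is_candidate` and `make_gap_fill` are hypothetical and are not taken from the article or from the LaVA, LVK2018, or SPK tooling.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    form: str   # surface form as it appears in the sentence
    lemma: str  # dictionary form, shown to the learner as a hint
    pos: str    # coarse part-of-speech label (hypothetical tagset)


def is_candidate(sentence: List[Token], max_tokens: int = 12) -> bool:
    """Keep short sentences that contain exactly one verb (assumed criteria)."""
    verbs = [t for t in sentence if t.pos == "VERB"]
    return len(sentence) <= max_tokens and len(verbs) == 1


def make_gap_fill(sentence: List[Token]) -> Optional[dict]:
    """Turn a candidate sentence into a gap-fill item, or return None."""
    if not is_candidate(sentence):
        return None
    parts = []
    answer = ""
    for tok in sentence:
        if tok.pos == "VERB":
            # Replace the verb with a gap and give its lemma as a hint.
            parts.append(f"____ ({tok.lemma})")
            answer = tok.form
        else:
            parts.append(tok.form)
    return {"prompt": " ".join(parts), "answer": answer}


# Hand-annotated example sentence: "Es lasu grāmatu" ("I read a book").
example = [
    Token("Es", "es", "PRON"),
    Token("lasu", "lasīt", "VERB"),
    Token("grāmatu", "grāmata", "NOUN"),
]
item = make_gap_fill(example)
```

In this sketch `item["prompt"]` is `Es ____ (lasīt) grāmatu` and `item["answer"]` is `lasu`. A real pipeline would presumably take its annotations from a morphological tagger and weight sentence selection by the error statistics drawn from the learner corpus, neither of which is modelled here.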

Original language: Latvian
Pages (from-to): 264-283
Number of pages: 20
Issue number: 47
Publication status: Published - 2022


  • Computational linguistics
  • Exercises
  • Language corpora
  • Latvian language acquisition
  • Sentence selection

Field of Science*

  • 6.2 Languages and Literature

Publication Type*

  • 1.1. Scientific article indexed in Web of Science and/or Scopus database

