Data collection for learner corpus of Latvian: copyright and personal data protection

Inga Kaija (Corresponding Author), Ilze Auzina

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review


Copyright and personal data protection are two of the most important legal aspects of collecting
data for a learner corpus. The paper explains the challenges in data collection for the learner
corpus of Latvian “LaVA” and describes the procedure undertaken to ensure protection of the
texts’ authors’ rights. An agreement / metadata questionnaire form was created to inform the
authors of the ways their texts are used and to receive the authors’ permission to use them in the
stated way. The information, permission, and the metadata questionnaire are printed on one side
of an A4 size paper sheet, and the author is supposed to write the text on the other side by hand,
thus eliminating the need to identify the author of the text separately. After scanning and adding
to the corpus, the text originals are returned to the authors.
Original language: English
Title of host publication: Selected Papers from the CLARIN Annual Conference 2019
Editors: Kiril Simov, Maria Eskevich
Number of pages: 7
Publication status: Published - 3 Jul 2020
Event: CLARIN Annual Conference 2019 - Leipzig, Germany
Duration: 30 Sept 2019 to 2 Oct 2019

Publication series

Name: Linköping Electronic Conference Proceedings
ISSN (Print): 1650-3740


Conference: CLARIN Annual Conference 2019


  • copyright
  • personal data protection
  • learner corpus
  • Latvian

Field of Science*

  • 6.2 Languages and Literature
  • 1.1 Mathematics

Publication Type*

  • 3.2. Articles or chapters in other proceedings other than those included in 3.1., with an ISBN or ISSN code

