2019.1.6

 ČASOPIS PRO MODERNÍ FILOLOGII 2019 (101) 1

Korpus českého jazyka 2. poloviny 19. století

CORPUS OF THE CZECH LANGUAGE OF THE 2ND HALF OF THE 19TH CENTURY

 

Karel Kučera — Kateřina Najbrtová — Klára Pivoňková — Anna Řehořková — Martin Stluka

 


 ABSTRACT (en)

The paper describes the principles and structure of the one-million-word DIA1900 corpus, built at the Institute of the Czech National Corpus (CNC) in Prague and focused on the language of Czech texts published between 1851 and 1900. DIA1900 is planned for publication by June 2020 and is to be followed by DIA1850, a corpus built on the same principles but focused on the first half of the 19th century. DIA1900 adopts both the balanced representation of the three major text types (belles-lettres, journalistic texts, technical/scientific texts) and the system of morphological tagging implemented in the synchronic corpora of the CNC project, thus facilitating the diachronic comparison of two stages in the development of Czech.
The paper also briefly describes the structure of the morphological terminology used in the lemmatisation and tagging of the corpus, and two tools designed to help search 19th-century texts, whose inconsistent orthography is combined with the phonological and morphological variation characteristic of the language of the period: (1) a multiple select/suggest feature, which reminds the user, before a lemma search is started, of the non-standard orthographic and phonological variants of the lemma found in the corpus, and (2) the position attribute, which informs the user of the ambiguous status of a word in the text resulting from a misprint or misspelling, a damaged page, etc.

 KEYWORDS (en)

diachronic corpus, lemmatisation, morphological tagging, post-national revival Czech, 19th century Czech, phonological variability, orthographic variability, morphological variability

 KLÍČOVÁ SLOVA (cs)

diachronní korpus, lemmatizace, morfologické značkování, poobrozenská čeština, čeština 19. století, hlásková variabilita, pravopisná variabilita, morfologická variabilita

 DOI

https://doi.org/10.14712/23366591.2019.1.6

