CZLC/CNC_KSK
Source: Hugging Face. Updated 2024-08-19; indexed 2025-04-12.
Download link:
https://hf-mirror.com/datasets/CZLC/CNC_KSK
Resource description:
---
license: cc-by-nc-sa-4.0
language:
- cs
---
## Introduction
This is a sample from the [Corpus of Private Correspondence (KSK-dopisy)](https://wiki.korpus.cz/doku.php/en:cnk:ksk-dopisy) dataset, maintained by the [Czech National Corpus](https://korpus.cz/) project.
The dataset was created from the shared `.vert` file using the [convert_ksk.py](https://huggingface.co/datasets/CZLC/CNC_KSK/blob/main/convert_ksk.py) script.
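The following is a minimal sketch of how a `.vert` file could be turned into plain-text records. It does not reproduce `convert_ksk.py`. It assumes the conventional corpus "vertical" layout: one token per line, tab-separated attributes with the word form first, and structural tags on their own lines. The `doc` and `s` tag names, the `id` attribute, the `iter_docs` helper, and the sample text are all hypothetical and may not match the actual KSK-dopisy file.

```python
import re
from typing import Dict, Iterator, List, Optional

# Hypothetical sample in vertical format; the real KSK-dopisy tags,
# attributes, and columns may differ.
SAMPLE_VERT = """\
<doc id="example-1">
<s>
Ahoj\tahoj
Jano\tJana
,\t,
</s>
<s>
jak\tjak
se\tse
máš\tmít
?\t?
</s>
</doc>
"""

# A structural line is a single opening or closing tag on its own line.
TAG_RE = re.compile(r"^<(/?)(\w+)([^>]*)>$")
ID_RE = re.compile(r'\bid="([^"]*)"')


def iter_docs(vert_text: str) -> Iterator[Dict[str, Optional[str]]]:
    """Yield one {"id", "text"} record per <doc> element (assumed layout)."""
    doc_id: Optional[str] = None
    sentences: List[List[str]] = []
    current: List[str] = []
    for raw in vert_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        tag = TAG_RE.match(line)
        if tag is None:
            # Token line: keep only the first column (the word form).
            current.append(line.split("\t")[0])
            continue
        closing, name, attrs = tag.groups()
        if name == "doc" and not closing:
            found = ID_RE.search(attrs)
            doc_id = found.group(1) if found else None
            sentences, current = [], []
        elif name == "s" and closing:
            if current:
                sentences.append(current)
            current = []
        elif name == "doc" and closing:
            if current:  # flush tokens not wrapped in a sentence tag
                sentences.append(current)
                current = []
            text = " ".join(" ".join(words) for words in sentences)
            yield {"id": doc_id, "text": text}
            sentences = []


docs = list(iter_docs(SAMPLE_VERT))
print(docs[0]["id"], "|", docs[0]["text"])
```

Tokens are joined with single spaces, so punctuation stays separated from the preceding word; a real conversion would likely need detokenization rules suited to the corpus.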
## About the Dataset
(Taken from the project [Wiki](https://wiki.korpus.cz/doku.php/en:cnk:ksk-dopisy) and translated.)
**Private Correspondence Corpus (KSK-Letters)** allows insight into the language and style of contemporary epistolary texts of a private nature. This corpus captures what might be the final stage in the existence of traditional handwritten correspondence.
- **Corpus Name:** KSK-Letters
- **Number of Letters:** 2,000
- **Number of Positions (Tokens):** 942,573
- **Number of Positions (Tokens) without Punctuation and Other Marks:** 764,918
- **Number of Word Forms (Words):** 76,587
- **Years of Writing:** 1990–2004
The text selection preserves a diversity of idiolects: it represents the language of 2,000 different writers from across the Czech Republic, covering all age and educational categories. Particular emphasis is placed on the communication of young people, which best illustrates current developmental trends in the Czech language, the transformation of the correspondence genre, and written expression in general.
## Citation
If you use the corpus, please cite the following work:
```bibtex
@misc{hladka2006ksk,
author = {Zdeňka Hladká},
title = {KSK-dopisy (Korpus soukromé korespondence): přepisy ručně psaných dopisů z let 1990--2004},
year = {2006},
howpublished = {Ústav Českého národního korpusu FF UK, Praha},
note = {Released corpus}
}
```
Provider:
CZLC



