
CZLC/CNC_fictree

Source: Hugging Face. Updated 2024-08-21. Indexed 2025-04-12.
Download link:
https://hf-mirror.com/datasets/CZLC/CNC_fictree
Resource description:
---
license: cc-by-nc-sa-4.0
language:
- cs
---

## Introduction

This is a sample from the [FicTree](https://wiki.korpus.cz/doku.php/en:cnk:fictree) dataset, maintained by the [Czech National Corpus](https://korpus.cz/) project. The dataset was created from the shared `.vert` file format using the [convert_FICTREE.py](https://huggingface.co/datasets/CZLC/CNC_fictree/blob/main/convert_FICTREE.py) script.

## About Original Dataset

(Taken from the project [Wiki](https://wiki.korpus.cz/doku.php/en:cnk:fictree).)

The **FicTree treebank** is a syntactically annotated corpus of Czech fiction. It consists of 135,000 words (166,000 tokens). The lemmatization, morphological, and syntactic annotation were performed manually.

### Composition of the FicTree Treebank:

The FicTree treebank consists of eight literary works published in the Czech Republic between 1991 and 2007. The texts in the treebank include six fiction titles, a children’s fiction book, and a book of memoirs. Most of the texts were first published between 1991 and 2007, except for one text published in 1969. Five texts (80% of all tokens) are original Czech texts, while the other three are translations (from German and Slovak).

## Citation

If you use this resource, please cite the following work:

```bibtex
@misc{jelinek2017fictree,
  author = {T. Jelínek and M. Hnátková and H. Skoumalová},
  title = {FicTree: Manuálně syntakticky anotovaný korpus české beletrie},
  year = {2017},
  howpublished = {Ústav Českého národního korpusu FF UK, Praha},
  note = {Available from WWW: \url{http://www.korpus.cz}}
}
```
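The Introduction notes that the data was converted from the `.vert` (vertical) format. As a rough illustration of what reading such a file can involve, here is a minimal sketch. It assumes the common vertical-text layout: one token per line with tab-separated attributes, and sentences wrapped in `<s>` ... `</s>` structure lines. The function name `iter_sentences`, the sample text, and its three columns (word, lemma, tag) are hypothetical. This is not the logic of `convert_FICTREE.py`, and the real FicTree column layout may differ.

```python
from typing import Iterable, Iterator, List, Tuple

Token = Tuple[str, ...]  # one tuple of attribute strings per token line


def iter_sentences(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield sentences as lists of tokens from vertical-format lines.

    Assumes <s> ... </s> delimit sentences and that any other line which
    starts with '<' and ends with '>' is a structural tag to be skipped.
    """
    sentence: List[Token] = []
    in_sentence = False
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith("<s>") or line.startswith("<s "):
            sentence, in_sentence = [], True
        elif line.startswith("</s>"):
            if sentence:
                yield sentence
            sentence, in_sentence = [], False
        elif line.startswith("<") and line.endswith(">"):
            continue  # other structure lines such as <doc ...> or </doc>
        elif in_sentence:
            # Token line: split the tab-separated attributes.
            sentence.append(tuple(line.split("\t")))


# Hypothetical sample; the columns here are word, lemma, tag.
SAMPLE = """<doc id="example">
<s>
Pes\tpes\tNOUN
spí\tspát\tVERB
.\t.\tPUNCT
</s>
</doc>
"""

sentences = list(iter_sentences(SAMPLE.splitlines()))
for sent in sentences:
    print(" ".join(token[0] for token in sent))
```

Yielding one sentence at a time keeps memory use flat for large files. To process a real file, an open file handle can be passed in place of `SAMPLE.splitlines()`.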
Provided by:
CZLC