CZLC/CNC_fictree

Name: CZLC/CNC_fictree
Creator: CZLC
Published: 2024-08-21 10:02:53
License: No description available

Source: Hugging Face. Updated: 2024-08-21. Indexed: 2025-04-12.

Download link:

https://hf-mirror.com/datasets/CZLC/CNC_fictree


Resource description:

---
license: cc-by-nc-sa-4.0
language:
- cs
---

## Introduction

This is a sample from the [FicTree](https://wiki.korpus.cz/doku.php/en:cnk:fictree) dataset, maintained by the [Czech National Corpus](https://korpus.cz/) project. The dataset was created from the shared `.vert` file format using the [convert_FICTREE.py](https://huggingface.co/datasets/CZLC/CNC_fictree/blob/main/convert_FICTREE.py) script.

## About Original Dataset

(Taken from the project [Wiki](https://wiki.korpus.cz/doku.php/en:cnk:fictree).)

The **FicTree treebank** is a syntactically annotated corpus of Czech fiction. It consists of 135,000 words (166,000 tokens). The lemmatization, morphological annotation, and syntactic annotation were performed manually.

### Composition of the FicTree Treebank:

The FicTree treebank consists of eight literary works published in the Czech Republic between 1991 and 2007. The texts in the treebank include six fiction titles, a children’s fiction book, and a book of memoirs. Most of the texts were first published between 1991 and 2007, except for one text published in 1969. Five texts (80% of all tokens) are original Czech texts, while the other three are translations (from German and Slovak).

## Citation

If you use this resource, please cite the following work:

```bibtex
@misc{jelinek2017fictree,
  author = {T. Jelínek and M. Hnátková and H. Skoumalová},
  title = {FicTree: Manuálně syntakticky anotovaný korpus české beletrie},
  year = {2017},
  howpublished = {Ústav Českého národního korpusu FF UK, Praha},
  note = {Available from WWW: \url{http://www.korpus.cz}}
}
```
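The Introduction mentions that the data was converted from the `.vert` (vertical) format. As a rough illustration of what reading such a file can involve, here is a minimal sketch. It is not the logic of `convert_FICTREE.py`: the sample text, the three-column layout (word, lemma, tag), the tag values, and the helper name `iter_sentences` are all assumptions made for this example, and "words" is assumed to mean tokens excluding punctuation.

```python
from typing import Dict, Iterator, List

# Hypothetical sample in a vertical layout: structural tags on their own
# lines, one token per line, attributes separated by tabs. The columns
# (word, lemma, tag) and the tag values are assumptions for illustration.
SAMPLE_VERT = """<doc id="sample">
<s>
Pes\tpes\tNOUN
spí\tspát\tVERB
.\t.\tPUNCT
</s>
<s>
Kočka\tkočka\tNOUN
přede\tpříst\tVERB
.\t.\tPUNCT
</s>
</doc>
"""


def iter_sentences(vert_text: str) -> Iterator[List[Dict[str, str]]]:
    """Yield each <s>...</s> block as a list of token dictionaries."""
    sentence = None
    for line in vert_text.splitlines():
        if not line:
            continue
        # A structural tag has angle brackets and no tab-separated columns.
        is_tag = line.startswith("<") and line.endswith(">") and "\t" not in line
        if is_tag:
            if line == "<s>" or line.startswith("<s "):
                sentence = []
            elif line == "</s>" and sentence is not None:
                yield sentence
                sentence = None
            continue
        if sentence is not None:
            word, lemma, tag = line.split("\t")
            sentence.append({"word": word, "lemma": lemma, "tag": tag})


sentences = list(iter_sentences(SAMPLE_VERT))
tokens = sum(len(s) for s in sentences)
# Assumption: "words" are tokens that are not punctuation.
words = sum(1 for s in sentences for t in s if t["tag"] != "PUNCT")
print(len(sentences), tokens, words)
```

The real files may carry more columns and additional structural tags, so the column unpacking would need to be adjusted to the actual layout before use.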

Provided by:

CZLC
