ANTILLES
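Dataset Overview
Dataset Name
- ANTILLES: An Open French Linguistically Enriched Part-of-Speech Corpus
Dataset Description
- ANTILLES is a part-of-speech tagged corpus based on UD_French-GSD, which was originally created in 2015 and is itself based on the universal dependency treebank v2.0.
- The original corpus contains 400,399 words (16,341 sentences) and 17 distinct classes. After applying the tag-augmentation script transform.py, it now has 60 classes, adding semantic information such as gender, number, mood, person, tense, or verb form.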
Dataset Source
- The dataset's tags are based on the LIA_TAGG statistical POS tagger, written by Frédéric Béchet in 2001.
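Dataset Statistics
- Training set: 14,449 documents, averaging 24.54 tokens per document.
- Development set: 1,476 documents, averaging 24.19 tokens per document.
- Test set: 416 documents, averaging 24.08 tokens per document.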
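As a quick sanity check on the figures above, the split sizes can be summed and compared with the sentence count given in the description. The sketch below only reuses the numbers reported in this card; the variable names are illustrative, and the token estimate is approximate because the per-document averages are rounded.

```python
# Split sizes and average tokens per document, as reported in this card.
splits = {
    "train": (14_449, 24.54),
    "dev": (1_476, 24.19),
    "test": (416, 24.08),
}

# Total number of documents (sentences) across the three splits.
total_documents = sum(count for count, _ in splits.values())

# Rough token estimate: documents times the (rounded) average per split.
estimated_tokens = sum(count * avg for count, avg in splits.values())

print(total_documents)          # 16341, matching the sentence count in the description
print(round(estimated_tokens))  # approximate; compare with the reported 400,399 words
```

The document total matches the 16,341 sentences reported for the original corpus, and the estimated token count lands close to the reported 400,399 words.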
Dataset Tags
- Original tags: PRON, VERB, SCONJ, ADP, CCONJ, DET, NOUN, ADJ, AUX, ADV, PUNCT, PROPN, NUM, SYM, PART, X, INTJ
- New tags: PREP, AUX, ADV, COSUB, COCO, PART, PRON, PDEMMS, PDEMMP, PDEMFS, PDEMFP, PINDMS, PINDMP, PINDFS, PINDFP, PROPN, XFAMIL, NUM, DINTMS, DINTFS, PPOBJMS, PPOBJMP, PPOBJFS, PPOBJFP, PPER1S, PPER2S, PPER3MS, PPER3MP, PPER3FS, PPER3FP, PREFS, PREF, PREFP, VERB, VPPMS, VPPMP, VPPFS, VPPFP, VPPRE, DET, DETMS, DETFS, ADJ, ADJMS, ADJMP, ADJFS, ADJFP, NOUN, NMS, NMP, NFS, NFP, PREL, PRELMS, PRELMP, PRELFS, PRELFP, PINTFS, INTJ, CHIF, SYM, YPFOR, PUNCT, MOTINC, X
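To illustrate how a coarse tag could be enriched with gender and number, here is a minimal sketch. It is not the actual transform.py, whose rules are not described in this card. It assumes the input carries Universal Dependencies style morphological features (for example `Gender=Masc|Number=Sing`), and the helper names `parse_feats` and `enrich_tag` as well as the mapping tables are hypothetical. Only nouns and adjectives are handled; every other tag is returned unchanged.

```python
# Hypothetical illustration only: this is NOT the actual transform.py.
# Assumes UD-style FEATS strings such as "Gender=Masc|Number=Sing".

GENDER = {"Masc": "M", "Fem": "F"}
NUMBER = {"Sing": "S", "Plur": "P"}
# Prefixes chosen to match tags in the list above (NMS, NFP, ADJMS, ...).
PREFIX = {"NOUN": "N", "ADJ": "ADJ"}


def parse_feats(feats: str) -> dict:
    """Parse a UD FEATS string into a dict; '_' or '' means no features."""
    if not feats or feats == "_":
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


def enrich_tag(upos: str, feats: str) -> str:
    """Return an enriched tag when gender and number are both known."""
    parsed = parse_feats(feats)
    prefix = PREFIX.get(upos)
    gender = GENDER.get(parsed.get("Gender"))
    number = NUMBER.get(parsed.get("Number"))
    if prefix and gender and number:
        return prefix + gender + number
    # Fall back to the coarse tag when information is missing.
    return upos


print(enrich_tag("NOUN", "Gender=Fem|Number=Plur"))  # NFP
print(enrich_tag("ADJ", "Gender=Masc|Number=Sing"))  # ADJMS
print(enrich_tag("NOUN", "_"))                       # NOUN
```

Falling back to the coarse tag mirrors the fact that the enriched tag set still contains plain NOUN and ADJ alongside their gender and number variants.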
Dataset License
- License: Creative Commons Attribution-ShareAlike 4.0 International License
Dataset Contributors
- LABRAK Yanis: LIA, Avignon University, Avignon, France.
- DUFOUR Richard: LS2N, Nantes University, Nantes, France.
Citation Information
- ANTILLES extended corpus:
  @inproceedings{labrak:hal-03696042,
    TITLE = {{ANTILLES: An Open French Linguistically Enriched Part-of-Speech Corpus}},
    AUTHOR = {Labrak, Yanis and Dufour, Richard},
    URL = {https://hal.archives-ouvertes.fr/hal-03696042},
    BOOKTITLE = {{25th International Conference on Text, Speech and Dialogue (TSD)}},
    ADDRESS = {Brno, Czech Republic},
    PUBLISHER = {{Springer}},
    YEAR = {2022},
    MONTH = Sep,
    KEYWORDS = {Part-of-speech corpus ; POS tagging ; Open tools ; Word embeddings ; Bi-LSTM ; CRF ; Transformers},
    PDF = {https://hal.archives-ouvertes.fr/hal-03696042/file/ANTILLES_A_freNch_linguisTIcaLLy_Enriched_part_of_Speech_corpus.pdf},
    HAL_ID = {hal-03696042},
    HAL_VERSION = {v1},
  }
- UD_French-GSD corpus:
  @misc{universaldependencies,
    title = {UniversalDependencies/UD_French-GSD},
    url = {https://github.com/UniversalDependencies/UD_French-GSD},
    journal = {GitHub},
    author = {UniversalDependencies}
  }
- LIA TAGG:
  @techreport{LIA_TAGG,
    author = {Frédéric Béchet},
    title = {LIA_TAGG: a statistical POS tagger + syntactic bracketer},
    institution = {Aix-Marseille University & CNRS},
    year = {2001}
  }




