Sequence Tagging of FA-KES Dataset

Indexed from NIAID Data Ecosystem on 2026-03-12

Download link:

https://zenodo.org/record/3534273

Resource description:

We used the BIOE sequence tagging strategy, as employed in OpenTag [1], to associate every word in the dataset with a label called a ‘tag’. A tag consists of one of the letters B, I, O, or E, which stand respectively for the beginning, inside, outside, or end of an attribute, followed by a ‘-’ sign and three letters that represent the type of information that was initially extracted.

Tokens were labeled with one of the following tags: ‘B-LOC’, ‘I-LOC’, ‘E-LOC’, ‘B-CIV’, ‘I-CIV’, ‘E-CIV’, ‘B-NCV’, ‘I-NCV’, ‘E-NCV’, ‘B-WMN’, ‘I-WMN’, ‘E-WMN’, ‘B-CHD’, ‘I-CHD’, ‘E-CHD’, ‘B-ACT’, ‘I-ACT’, ‘E-ACT’, ‘B-COD’, ‘I-COD’, ‘E-COD’, ‘B-DAT’, ‘I-DAT’, ‘E-DAT’, or ‘O’. Here O marks words outside the scope, LOC the incident location, CIV the number of civilians dead, NCV the number of non-civilians dead, WMN the number of women targeted, CHD the number of children killed, ACT the actor or authority responsible for the incident, COD the cause of death, and DAT the date of the incident. The tagging was performed by a parser that automatically assigns the appropriate tag to each word.

We created three subsets of the FA-KES dataset. The first consists of the articles' titles, the second of the articles' titles concatenated with their first paragraphs, and the third of the articles' titles along with their full contents. Each subset is a CSV file with three columns: the first holds the article number in the dataset, the second the sequence of words for the article, and the third the tags linked to the tokens of the second column.

[1] G. Zheng, S. Mukherjee, X. L. Dong, and F. Li, “OpenTag: Open attribute value extraction from product profiles,” CoRR, vol. abs/1806.01264, 2018.
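The BIOE scheme described above can be illustrated with a minimal Python sketch. It is not the parser used to build the dataset: the function name `bioe_tag`, the span format `(start, end, type)`, and the example sentence are all hypothetical. It also assumes that an attribute consisting of a single token receives only a ‘B-’ tag, which the description does not specify.

```python
from typing import List, Tuple

# Hypothetical span format: (start index, end index exclusive, three-letter type).
Span = Tuple[int, int, str]


def bioe_tag(tokens: List[str], spans: List[Span]) -> List[str]:
    """Assign a BIOE tag to every token; tokens outside all spans get 'O'."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        if not (0 <= start < end <= len(tokens)):
            raise ValueError(f"span ({start}, {end}) is out of range")
        tags[start] = f"B-{label}"           # first token of the attribute
        for i in range(start + 1, end - 1):  # tokens strictly between first and last
            tags[i] = f"I-{label}"
        if end - start > 1:                  # assumption: single-token spans keep 'B-' only
            tags[end - 1] = f"E-{label}"
    return tags


# Made-up example sentence, not taken from the FA-KES dataset.
tokens = "Airstrike kills 12 civilians in eastern Aleppo city".split()
spans = [(0, 1, "COD"), (2, 3, "CIV"), (5, 8, "LOC")]
tags = bioe_tag(tokens, spans)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Each token is paired with exactly one tag, which mirrors the relationship between the second and third columns of the CSV files.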

Created:

2020-10-18
