Sequence Tagging of FA-KES Dataset
Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://zenodo.org/record/3534273
Description:
We used the BIOE sequence tagging strategy, as used in OpenTag [1], to associate every word in the dataset with a label called a ‘tag’. A tag consists of one of the letters B, I, O, or E, which stand for the beginning, inside, outside, or end of an attribute, respectively. For B, I, and E, the letter is followed by a ‘-’ sign and three letters that represent the type of information that was initially extracted. Tokens were labeled with one of the following tags: ‘B-LOC’, ‘I-LOC’, ‘E-LOC’, ‘B-CIV’, ‘I-CIV’, ‘E-CIV’, ‘B-NCV’, ‘I-NCV’, ‘E-NCV’, ‘B-WMN’, ‘I-WMN’, ‘E-WMN’, ‘B-CHD’, ‘I-CHD’, ‘E-CHD’, ‘B-ACT’, ‘I-ACT’, ‘E-ACT’, ‘B-COD’, ‘I-COD’, ‘E-COD’, ‘B-DAT’, ‘I-DAT’, ‘E-DAT’, or ‘O’. Here O marks words outside the scope, LOC is the incident location, CIV the number of civilians dead, NCV the number of non-civilians dead, WMN the number of women targeted, CHD the number of children killed, ACT the actor or authority responsible for the incident, COD the cause of death, and DAT the date of the incident. The tagging was done by a parser that automatically assigns the appropriate tag to each word.
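The tagging scheme can be illustrated with a minimal Python sketch. This is not the parser used to build the dataset; the function name bioe_tags, the span representation, and the example sentence are all hypothetical. It also assumes that a single-token attribute receives only a B- tag, which the description above does not specify.

```python
from typing import List, Tuple


def bioe_tags(n_tokens: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Return one BIOE tag per token.

    Each span is (start, end_exclusive, attribute_type), for example
    (2, 5, "LOC"). Tokens not covered by any span are tagged 'O'.
    Assumption: a one-token span is tagged 'B-<type>' only.
    """
    tags = ["O"] * n_tokens
    for start, end, kind in spans:
        if not (0 <= start < end <= n_tokens):
            raise ValueError(f"span ({start}, {end}) is out of range")
        tags[start] = f"B-{kind}"          # first token of the attribute
        for i in range(start + 1, end - 1):
            tags[i] = f"I-{kind}"          # tokens strictly inside
        if end - start > 1:
            tags[end - 1] = f"E-{kind}"    # last token of the attribute
    return tags


# Made-up sentence, not taken from the dataset.
example_tokens = ["Shelling", "in", "the", "old", "city", "kills", "five", "civilians"]
example_tags = bioe_tags(len(example_tokens), [(2, 5, "LOC"), (6, 7, "CIV")])
for token, tag in zip(example_tokens, example_tags):
    print(f"{token}\t{tag}")
```

In this sketch the three-token location is tagged B-LOC, I-LOC, E-LOC, and the single-token count is tagged B-CIV; every other token is tagged O.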
We created three subsets of the FA-KES dataset. The first one consists of the articles' titles, the second one of the articles' titles concatenated with the articles' first paragraphs, and the third one consists of the articles' titles along with their contents. The first column in each of the three CSV files represents the article number in the dataset, the second column contains the sequence of words for each article and the third one holds the tags linked to the tokens of the previous column.
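As a rough sketch, a file with this three-column layout could be read in Python as follows. Several details are assumptions rather than facts about the published files: that words and tags are stored as whitespace-separated strings, that a header row may or may not be present, and the helper name read_tagged_rows. Inspect the actual CSV files before relying on any of this.

```python
import csv
import io
from typing import Iterator, List, TextIO, Tuple


def read_tagged_rows(
    handle: TextIO, has_header: bool = False
) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (article_id, words, tags) from a three-column CSV.

    Assumption: columns 2 and 3 hold whitespace-separated sequences.
    """
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # skip the header row if there is one
    for row in reader:
        if len(row) < 3:
            continue  # ignore blank or malformed rows
        article_id, words, tags = row[0], row[1].split(), row[2].split()
        if len(words) != len(tags):
            raise ValueError(f"article {article_id}: word/tag count mismatch")
        yield article_id, words, tags


# In-memory stand-in for one of the CSV files (made-up content).
sample = io.StringIO('1,"Shelling in the old city","O O B-LOC I-LOC E-LOC"\n')
rows = list(read_tagged_rows(sample))
print(rows)
```

The length check is there because each tag is meant to correspond to exactly one token; a mismatch would suggest that the assumed storage format does not match the real files.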
[1]: G. Zheng, S. Mukherjee, X. L. Dong, and F. Li, “OpenTag: Open attribute value extraction from product profiles,” CoRR, vol. abs/1806.01264, 2018.
Created:
2020-10-18



