Tudorx95/NER_Political_Economic
Source: Hugging Face. Updated: 2026-04-23. Indexed: 2026-04-26.
Download link:
https://hf-mirror.com/datasets/Tudorx95/NER_Political_Economic
Description:
A custom NER dataset for politico-economic entity recognition, built from three kinds of sources: 1) pre-annotated corpora: CoNLL-2003 and WNUT-2017 (re-mapped via gazetteers); 2) weak supervision: Snorkel labeling functions over CC-News, Wikipedia, and SEC EDGAR; 3) synthetic templates for rare classes. The dataset defines 11 labels: POLITICIAN, POLITICAL_PARTY, POLITICAL_ORG, FINANCIAL_ORG, ECONOMIC_INDICATOR, POLICY, LEGISLATION, MARKET_EVENT, CURRENCY, TRADE_AGREEMENT, and GPE (geopolitical entity). It is split into training, validation, and test sets, and the test set includes the CoNLL-2003 test split as a gold standard. Each entry is a JSON object containing the text, a list of entities, and source information.
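To make the record layout concrete, here is a minimal Python sketch of reading one entry and checking it against the 11 labels. The description only says that each entry holds text, a list of entities, and source information, so the field names used here (`text`, `entities`, `start`, `end`, `label`, `source`), the character-offset convention, and the sample record itself are assumptions for illustration, not the dataset's confirmed schema.

```python
import json

# The 11 entity labels named in the dataset description.
LABELS = [
    "POLITICIAN", "POLITICAL_PARTY", "POLITICAL_ORG", "FINANCIAL_ORG",
    "ECONOMIC_INDICATOR", "POLICY", "LEGISLATION", "MARKET_EVENT",
    "CURRENCY", "TRADE_AGREEMENT", "GPE",
]


def bio_tag_set(labels):
    """Return the BIO tag inventory: "O" plus a B- and I- tag per label."""
    tags = ["O"]
    for label in labels:
        tags.append(f"B-{label}")
        tags.append(f"I-{label}")
    return tags


def parse_record(line):
    """Parse one JSON entry and validate its entities.

    ASSUMED (hypothetical) schema, not confirmed by the dataset card:
      {"text": str,
       "entities": [{"start": int, "end": int, "label": str}, ...],
       "source": str}
    with start/end as character offsets, end exclusive.
    """
    record = json.loads(line)
    text = record["text"]
    for ent in record["entities"]:
        if ent["label"] not in LABELS:
            raise ValueError(f"unknown label: {ent['label']}")
        if not (0 <= ent["start"] < ent["end"] <= len(text)):
            raise ValueError(f"span out of range: {ent}")
    return record


def entity_surface_forms(record):
    """Return (surface text, label) pairs for every entity in a record."""
    text = record["text"]
    return [(text[e["start"]:e["end"]], e["label"]) for e in record["entities"]]


# Hypothetical sample entry, invented for this sketch.
_text = "The ECB raised rates while the euro fell."
_ecb = _text.index("ECB")
_euro = _text.index("euro")
sample_line = json.dumps({
    "text": _text,
    "entities": [
        {"start": _ecb, "end": _ecb + len("ECB"), "label": "FINANCIAL_ORG"},
        {"start": _euro, "end": _euro + len("euro"), "label": "CURRENCY"},
    ],
    "source": "synthetic",
})

if __name__ == "__main__":
    record = parse_record(sample_line)
    print(entity_surface_forms(record))
    print(len(bio_tag_set(LABELS)), "BIO tags")
```

If the actual files use different keys or token-level rather than character-level spans, only `parse_record` and `entity_surface_forms` would need adjusting; the label list and BIO tag inventory follow directly from the description.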
Provided by:
Tudorx95



