neuralshift/sigarra
Hugging Face · Updated 2024-05-15 · Indexed 2024-06-12
Download link:
https://hf-mirror.com/datasets/neuralshift/sigarra
Resource description:
---
language:
- pt
license: cc-by-sa-4.0
size_categories:
- n<1K
pretty_name: SIGARRA News Corpus
dataset_info:
  features:
  - name: id
    dtype: string
  - name: tokens
    sequence: string
  - name: ner_tags
    sequence: string
  splits:
  - name: train
    num_bytes: 2954783
    num_examples: 905
  download_size: 544454
  dataset_size: 2954783
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
# Dataset Card for "sigarra"
The "sigarra" dataset available on Hugging Face is not the property of NeuralShift. We are uploading it to the platform to make it more accessible and to foster further research.
Here's some additional information about the original SIGARRA News Corpus:
- Source: University of Porto (UP) SIGARRA information system
- Content: A collection of academic news articles with manual annotations for named entity recognition.
- Size: Approximately 4.22 MB
- Format: Comma-separated values (CSV), ZIP archive, and XML
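
The Hub copy can be loaded and inspected along the following lines. This is a minimal sketch, assuming the `datasets` library is installed and that the repository id is `neuralshift/sigarra`, as in the download link. The helper names `load_sigarra` and `count_tags` are illustrative, and the sample record with its `ORG` label is invented for demonstration; this card does not list the actual tag inventory.

```python
from collections import Counter


def load_sigarra():
    """Load the single train split declared in the card metadata (needs network access)."""
    # Imported lazily so count_tags below works without the library installed.
    from datasets import load_dataset

    return load_dataset("neuralshift/sigarra", split="train")


def count_tags(records):
    """Count how often each NER tag string occurs across an iterable of records."""
    counts = Counter()
    for record in records:
        counts.update(record["ner_tags"])
    return counts


# Hypothetical record shaped like the declared features (id, tokens, ner_tags).
sample = [
    {
        "id": "0",
        "tokens": ["Universidade", "do", "Porto"],
        "ner_tags": ["ORG", "ORG", "ORG"],
    }
]
tag_counts = count_tags(sample)
print(tag_counts)
```

With network access, `count_tags(load_sigarra())` should give the tag distribution over the full train split.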
Provider:
neuralshift
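Summary of original information
Dataset overview
Basic information
- Name: SIGARRA News Corpus
- Language: Portuguese (pt)
- License: CC-BY-SA-4.0
- Size category: fewer than 1K examples
Dataset details
- Features:
  - id: string
  - tokens: sequence of strings
  - ner_tags: sequence of strings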
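
The feature schema above can be checked per record with a small helper. This is only a sketch: `validate_record` is a hypothetical name, the two sample records and their labels are invented, and the requirement that `tokens` and `ner_tags` have equal length is an assumption based on the usual one-tag-per-token layout of NER data, not something this card states.

```python
def validate_record(record):
    """Return a list of schema problems for one record; an empty list means it looks valid."""
    problems = []
    if not isinstance(record.get("id"), str):
        problems.append("id must be a string")
    for field in ("tokens", "ner_tags"):
        value = record.get(field)
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            problems.append(f"{field} must be a list of strings")
    tokens, tags = record.get("tokens"), record.get("ner_tags")
    # Assumption: one tag per token, so both sequences should have the same length.
    if isinstance(tokens, list) and isinstance(tags, list) and len(tokens) != len(tags):
        problems.append("tokens and ner_tags differ in length")
    return problems


# Hypothetical records: one well-formed, one with a non-string id and a missing tag.
good = {"id": "0", "tokens": ["Porto"], "ner_tags": ["LOC"]}
bad = {"id": 7, "tokens": ["Porto", "2024"], "ner_tags": ["LOC"]}

good_problems = validate_record(good)
bad_problems = validate_record(bad)
print(good_problems)
print(bad_problems)
```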
Splits
- Train:
  - Size: 2954783 bytes
  - Number of examples: 905
Download and dataset size
- Download size: 544454 bytes
- Dataset size: 2954783 bytes
Configuration
- Default config:
  - Data files:
    - Train split path: data/train-*



