olafmeneses/EhuHac
Hugging Face | Updated 2025-11-15 | Indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/olafmeneses/EhuHac
Resource description:
---
dataset_info:
  features:
  - name: id
    dtype: int64
  - name: source_text
    dtype: string
  - name: dest_text
    dtype: string
  splits:
  - name: complete_dataset
    num_bytes: 128004936
    num_examples: 586839
  - name: train
    num_bytes: 2181261
    num_examples: 10000
  - name: validation
    num_bytes: 218126
    num_examples: 1000
  - name: test
    num_bytes: 218126
    num_examples: 1000
  download_size: 79209185
  dataset_size: 130622449
configs:
- config_name: default
  data_files:
  - split: complete_dataset
    path: data/complete_dataset-*
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
  - split: test
    path: data/test-*
---
# Dataset Card for EhuHac (Es-Eu)
HAC (Hizkuntzen Arteko Corpusa) is a corpus of texts translated into four languages (it also includes one of the originals in the field).
This dataset contains only the Spanish (es) - Basque (Euskera, eu) sentence pairs.
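
A minimal loading sketch, assuming the Hugging Face `datasets` package is installed and the Hub is reachable. The split names, example counts, and column names are taken from the metadata above; `load_split` and `to_pair` are hypothetical helpers introduced here for illustration. The card does not state which column holds which language, so treating `source_text` as Spanish and `dest_text` as Basque is an assumption based on the `es&eu` ordering in the OPUS path.

```python
# Split names and example counts as declared in the card's metadata.
SPLITS = {
    "complete_dataset": 586_839,
    "train": 10_000,
    "validation": 1_000,
    "test": 1_000,
}


def load_split(split: str = "train"):
    """Load one split from the Hub (hypothetical helper, not part of the dataset).

    Requires the `datasets` package and network access.
    """
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {sorted(SPLITS)}")
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset("olafmeneses/EhuHac", split=split)


def to_pair(row: dict) -> tuple:
    """Return (source_text, dest_text) for one row.

    Assumption: source_text is Spanish and dest_text is Basque.
    """
    return row["source_text"], row["dest_text"]
```

If the Hub copy matches the metadata, `load_split("validation")` should return a `datasets.Dataset` with 1,000 rows, and `to_pair(ds[0])` gives one sentence pair. The small `train`, `validation`, and `test` splits are probably the practical starting point for experiments, since `complete_dataset` is far larger.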
## Dataset Sources
Source: Hizkuntzen Arteko Corpusa (HAC). Ibon Sarasola, Pello Salaburu, Josu Landa, 2015. Bilbo: UPV/EHU (Euskara Institutua). ISBN: 978-84-693-9891-3 (https://www.ehu.eus/ehg/hac/)
The es-eu data is also available through OPUS at https://opus.nlpl.eu/EhuHac/es&eu/v1/EhuHac (J. Tiedemann, 2012. Parallel Data, Tools and Interfaces in OPUS. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012)).
Provider:
olafmeneses



