collectivat/salom-ladino-articles
Source: Hugging Face · Updated 2022-10-25 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/collectivat/salom-ladino-articles
Resource description:
---
annotations_creators:
- found
language_creators:
- found
language:
- lad
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- 100K<n<1M
source_datasets:
- original
task_categories:
- text-generation
task_ids:
- language-modeling
---
# Şalom Ladino articles text corpus
Text corpus compiled from 397 articles from the Judeo-Espanyol section of [Şalom newspaper](https://www.salom.com.tr/haberler/17/judeo-espanyol). Original sentences and articles belong to Şalom.
Size: 176,843 words
[Official link](https://data.sefarad.com.tr/dataset/salom-ladino-articles-text-corpus)
Paper on [ArXiv](https://arxiv.org/abs/2205.15599)
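The word count reported above can be approximated on a local copy of the corpus. The following is a minimal sketch, assuming plain whitespace tokenization (the tokenization behind the reported figure is not stated here, so the result may differ). The function name `corpus_stats` and the sample tokens are hypothetical placeholders, not corpus content.

```python
from collections import Counter
from typing import Iterable, Tuple


def corpus_stats(lines: Iterable[str]) -> Tuple[int, Counter]:
    """Return (total word count, word frequency table) for an iterable of text lines."""
    freqs: Counter = Counter()
    for line in lines:
        # Assumption: words are separated by whitespace. Punctuation stays
        # attached to tokens, so counts are only approximate.
        freqs.update(line.split())
    return sum(freqs.values()), freqs


# Placeholder tokens for illustration only; these are not taken from the corpus.
sample = ["aa bb cc", "bb dd"]
total, freqs = corpus_stats(sample)
print(total, freqs.most_common(1))
```

To run this on the real data, pass an open file handle for the downloaded text file in place of `sample`, since a file object yields its lines one at a time.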
Citation:
```
Preparing an endangered language for the digital age: The Case of Judeo-Spanish. Alp Öktem, Rodolfo Zevallos, Yasmin Moslem, Güneş Öztürk, Karen Şarhon.
Workshop on Resources and Technologies for Indigenous, Endangered and Lesser-resourced Languages in Eurasia (EURALI) @ LREC 2022. Marseille, France. 20 June 2022
```
This dataset was created as part of the project "Judeo-Spanish: Connecting the two ends of the Mediterranean", carried out by Col·lectivaT and the Sephardic Center of Istanbul within the framework of the “Grant Scheme for Common Cultural Heritage: Preservation and Dialogue between Turkey and the EU–II (CCH-II)” implemented by the Ministry of Culture and Tourism of the Republic of Turkey with the financial support of the European Union. The content of this website is the sole responsibility of Col·lectivaT and does not necessarily reflect the views of the European Union.
Provided by:
collectivat
Summary of original information
Şalom Ladino articles text corpus
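Basic information
- Language: Ladino (Judeo-Espanyol)
- License: CC-BY-4.0
- Multilinguality: monolingual
- Dataset size: 176,843 words
- Source datasets: original
- Task category: text generation
- Task ID: language modeling
Dataset description
- The dataset consists of 397 articles from the Judeo-Espanyol section of the Şalom newspaper.
- The dataset supports research on preparing an endangered language for the digital age.
Citation information
- Authors: Alp Öktem, Rodolfo Zevallos, Yasmin Moslem, Güneş Öztürk, Karen Şarhon
- Publication: Workshop on Resources and Technologies for Indigenous, Endangered and Lesser-resourced Languages in Eurasia (EURALI) @ LREC 2022
- Place and date: Marseille, France, 20 June 2022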



