collectivat/salom-ladino-articles

Hugging Face · Updated 2022-10-25 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/collectivat/salom-ladino-articles
Resource summary:
---
annotations_creators:
- found
language_creators:
- found
language:
- lad
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- 100K<n<1M
source_datasets:
- original
task_categories:
- text-generation
task_ids:
- language-modeling
---

# Şalom Ladino articles text corpus

Text corpus compiled from 397 articles from the Judeo-Espanyol section of [Şalom newspaper](https://www.salom.com.tr/haberler/17/judeo-espanyol). Original sentences and articles belong to Şalom.

Size: 176,843 words

[Official link](https://data.sefarad.com.tr/dataset/salom-ladino-articles-text-corpus)

Paper on [ArXiv](https://arxiv.org/abs/2205.15599)

Citation:

```
Preparing an endangered language for the digital age: The Case of Judeo-Spanish. Alp Öktem, Rodolfo Zevallos, Yasmin Moslem, Güneş Öztürk, Karen Şarhon. Workshop on Resources and Technologies for Indigenous, Endangered and Lesser-resourced Languages in Eurasia (EURALI) @ LREC 2022. Marseille, France. 20 June 2022
```

This dataset was created as part of the project "Judeo-Spanish: Connecting the two ends of the Mediterranean", carried out by Col·lectivaT and the Sephardic Center of Istanbul within the framework of the "Grant Scheme for Common Cultural Heritage: Preservation and Dialogue between Turkey and the EU–II (CCH-II)", implemented by the Ministry of Culture and Tourism of the Republic of Turkey with the financial support of the European Union. The content of this website is the sole responsibility of Col·lectivaT and does not necessarily reflect the views of the European Union.
Provider:
collectivat
Summary of original information

Şalom Ladino articles text corpus

Basic information

  • Language: Ladino (Judeo-Espanyol)
  • License: CC-BY-4.0
  • Multilinguality: monolingual
  • Dataset size: 176,843 words
  • Source datasets: original
  • Task category: text generation
  • Task ID: language modeling
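For language-modeling work it can help to sanity-check the corpus size locally. The sketch below is a minimal illustration under stated assumptions, not the authors' method: the card does not say how the 176,843-word figure was computed, so plain whitespace splitting is only one plausible choice and may not reproduce that number exactly. The helper names (`count_words`, `top_words`, `load_salom_lines`), the split name `"train"`, and the column name `"text"` are all hypothetical; check the dataset viewer on the Hub before relying on them. Loading uses the Hugging Face `datasets` library's `load_dataset`, imported lazily so the counting helpers work offline.

```python
from collections import Counter
from typing import Iterable


def count_words(lines: Iterable[str]) -> int:
    """Count whitespace-separated tokens across all lines.

    Assumption: whitespace splitting is one plausible word definition;
    the dataset card does not document how its word count was obtained.
    """
    return sum(len(line.split()) for line in lines)


def top_words(lines: Iterable[str], n: int = 5) -> list[tuple[str, int]]:
    """Return the n most frequent lowercased whitespace tokens."""
    counts = Counter(tok.lower() for line in lines for tok in line.split())
    return counts.most_common(n)


def load_salom_lines(split: str = "train") -> list[str]:
    """Load the corpus from the Hugging Face Hub (needs network access and
    the `datasets` package). The split name and the "text" column are
    assumptions, not confirmed by the dataset card."""
    from datasets import load_dataset  # lazy import keeps the helpers offline-friendly

    ds = load_dataset("collectivat/salom-ladino-articles", split=split)
    return [row["text"] for row in ds]


if __name__ == "__main__":
    # Made-up sample strings for illustration only, not taken from the corpus.
    sample = ["uno dos tres", "kuatro sinko"]
    print(count_words(sample))  # prints 5
```

To run it on the real data, something like `count_words(load_salom_lines())` should work if the split and column assumptions hold; the result will differ from the card's figure if the authors tokenized differently.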

Dataset description

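  • The dataset consists of 397 articles from the Judeo-Espanyol section of the Şalom newspaper.
  • The dataset was built as part of work on preparing Judeo-Spanish, an endangered language, for the digital age.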

Citation information

  • Title: Preparing an endangered language for the digital age: The Case of Judeo-Spanish
  • Authors: Alp Öktem, Rodolfo Zevallos, Yasmin Moslem, Güneş Öztürk, Karen Şarhon
  • Publication: Workshop on Resources and Technologies for Indigenous, Endangered and Lesser-resourced Languages in Eurasia (EURALI) @ LREC 2022
  • Location and date: Marseille, France, 20 June 2022