
Croatian Adult Spoken Language Corpus (HrAL)

Source: SSH Open MarketPlace · Updated 2025-07-04 · Indexed 2025-07-05
Download link:
https://marketplace.sshopencloud.eu/dataset/u9acgY
Description:
This corpus contains spontaneous conversations among 617 speakers from all Croatian counties, and it comprises more than 250 000 tokens and more than 100 000 types. Data for the corpus were collected from 2010 to 2012, from 2014 to 2015 and during 2016. Participants were adults who spoke Croatian as their mother tongue and first language. Transcripts were annotated with the ages and genders of the speakers, as well as the location of the conversation. A separate spreadsheet lists the speakers' origin, where they have spent most of their life and their level of education. The coverage of metadata for individual samples varies, and is in general more complete for samples collected from 2014 onwards. The corpus is available for download and browsing from a dedicated website.
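The token and type counts above measure two different things: tokens are running word occurrences, and types are distinct word forms. The listing does not state which tokenization rules the HrAL team used, so the sketch below is only an illustration under an assumed scheme: lowercased runs of Unicode word characters. The function name `count_tokens_and_types` and the sample utterance are hypothetical, and counts produced by the corpus's own tools may differ.

```python
import re


def count_tokens_and_types(text: str) -> tuple[int, int]:
    """Return (token count, type count) for a transcript string.

    Assumed tokenization (hypothetical, not HrAL's documented method):
    lowercase the text, then take runs of Unicode word characters, so
    Croatian letters such as č, ć, đ, š and ž stay inside words.
    """
    tokens = re.findall(r"\w+", text.lower())
    # Types are the distinct word forms among the tokens.
    types = set(tokens)
    return len(tokens), len(types)


sample = "Ja sam rekao da ja ne znam. Ne znam ni ja."
print(count_tokens_and_types(sample))  # (11, 7)
```

Here "ja" occurs three times and "ne" and "znam" twice each, so 11 tokens collapse into 7 types. Over a whole corpus this gap widens, which is why the type count is much smaller than the token count.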

Created: 2025-07-04