Zoo-Lǎohǔ word set: A cross-linguistic lexical priming word set for animacy judgements (English and Mandarin Chinese) | Cross-linguistic research dataset | Language development dataset

DataCite Commons · Updated 2025-05-17 · Indexed 2024-07-13

Cross-linguistic research

Language development

Download link:

https://researchdata.ntu.edu.sg/citation?persistentId=doi:10.21979/N9/FEUSIO


Description:

Audio tokens selected and edited from BLIP Lab's Singapore Early Word List - Audio Recordings. Selection was performed by Tong, Zhane C. in 2023, under the supervision of SJS. The original audio recordings were conducted by Woon Fei Ting and Annabel Loh under the supervision of Suzy J Styles as part of the Talkathon study in 2022-4. Tokens in both languages were spoken by a female Singaporean early-parallel bilingual of Singapore English and Singapore Mandarin in her 20s. For full details of the audio recording conditions, please consult the main repository.

Audio tokens are stereo wav files. Each token is a single word, which is listed in the file name. This repository contains 86 English and 82 Mandarin tokens of words that are suitable for use with children under the age of 5 in Singapore. For English tokens, the name of the audio file is the word in English. To facilitate checking, handling and picture matching by our multilingual team, the file names of the Mandarin tokens are given as English translation equivalents followed by the letter M. For the Pinyin and Hanzi of the Mandarin words, please consult the AudioFileList. Use the "tree" to view files in folder context.

The majority of these words were used in a semantic decision task (animate/inanimate), with thematic semantic priming (related/unrelated) within and across languages (language match/mismatch). The study was preregistered (https://doi.org/10.21979/N9/ERB6J8), and materials, data and code have been archived (https://doi.org/10.21979/N9/JXMRVM). The list of semantic priming pairs is included in this repository.
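
The file-naming convention described above (English tokens named by the word itself, Mandarin tokens named by the English translation equivalent followed by the letter M) could be handled with a small helper like the one below. This is a minimal sketch, not part of the repository: the function name `parse_token_name` and the example file names are hypothetical, and the sketch assumes the trailing letter is an uppercase "M" placed directly before the `.wav` extension, optionally after an underscore or hyphen. The actual file names should be checked against the AudioFileList before relying on it.

```python
from pathlib import PurePath


def parse_token_name(filename: str) -> tuple[str, str]:
    """Return (language, english_gloss) for one audio token file name.

    Assumption (not confirmed by the source): Mandarin tokens end in an
    uppercase "M", optionally preceded by "_" or "-", and English tokens
    never end in an uppercase "M".
    """
    stem = PurePath(filename).stem  # drop the ".wav" extension
    if len(stem) > 1 and stem.endswith("M"):
        # Strip the language marker and any separator before it.
        return "Mandarin", stem[:-1].rstrip("_-")
    return "English", stem


# Hypothetical file names, used only to illustrate the convention.
examples = ["tiger.wav", "tigerM.wav", "tiger_M.wav"]
parsed = [parse_token_name(name) for name in examples]
```

Grouping the parsed results by English gloss would give candidate translation-equivalent pairs across the two languages, which could then be checked against the Pinyin and Hanzi listed in the AudioFileList.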

Provider:

DR-NTU (Data)

Created:

2024-04-16
