Replication Data for: Norwegian compounds and corresponding constructions in Russian: The case of nouns with deverbal heads|Linguistics dataset|Corpus research dataset

DataONE, updated 2022-07-18, indexed 2024-06-08

Linguistics

Corpus research

Download link:

https://search.dataone.org/view/https://doi.org/10.18710/YRIQ2V


Resource description:

The database included in this TROLLing post concerns Norwegian compounds with deverbal heads (e.g. papirproduksjon ‘paper production’ from produsere ‘to produce’) and corresponding constructions in Russian, such as the genitive (proizvodstvo bumagi ‘paper production’), the adjective (bumažnoe proizvodstvo ‘paper production’), the preposition (priglašenie na užin ‘dinner invitation’), and compound constructions (zemlevladelec ‘landowner’). The database contains examples excerpted from the parallel RuN corpus available at http://tekstlab.uio.no/glossa2/run.

Article abstract: This article presents a corpus study of Norwegian compounds with deverbal heads (e.g., papirproduksjon ‘paper production’ from produsere ‘produce’) and corresponding constructions in Russian, such as the genitive (proizvodstvo bumagi ‘paper production’), the adjective (bumažnoe proizvodstvo ‘paper production’), the preposition (priglašenie na užin ‘dinner invitation’), and compound constructions (zemlevladelec ‘landowner’). A test of the “Non-Head Function Hypothesis” (Mezhevich 2002) indicates that the genitive construction is the most frequent equivalent of Norwegian compounds where the non-head functions as an internal argument (object). However, the adjective and compound constructions represent important competitors, while the preposition construction is more marginal. The genitive construction is shown to be particularly frequent for non-agentive nouns. A number of generalizations about the use of compounds are proposed, and it is argued that the adjective construction involves “typification”, which is an example of the general cognitive process “construal” (Langacker 2008). Finally, an “Extended Non-Head Function Hypothesis” is proposed, according to which the choice of a Russian construction depends on the closeness of the relation between head and non-head of the Norwegian compound. The closer the relation, the more likely the use of the genitive; the more distant the relation, the more likely the use of the adjective construction.

Created:

2022-07-18
