
infinite-dataset-hub/EnglishGermanSyntax

Hugging Face · Updated 2024-08-31 · Indexed 2025-04-12
Download link:
https://hf-mirror.com/datasets/infinite-dataset-hub/EnglishGermanSyntax
Resource description:
---
license: mit
tags:
- infinite-dataset-hub
- synthetic
---

# EnglishGermanSyntax

tags: Comparison, NLP, Syntax Analysis

_Note: This is an AI-generated dataset so its content may be inaccurate or false_

**Dataset Description:**

The 'EnglishGermanSyntax' dataset comprises sentences in both English and German that are syntactically paired. It is used to analyze and compare the syntactic structures of English and German, focusing on aspects such as word order, case marking, and syntactic constructions. The dataset serves as a valuable resource for NLP research, particularly for tasks involving cross-linguistic syntactic analysis and the development of translation models that require syntactic understanding.

The dataset is constructed by collecting sentences that have direct translations or syntactic parallels in English and German. Each sentence pair in the dataset has a 'label' indicating whether it represents a 'parallel structure' or 'non-parallel structure', highlighting the syntactic similarities or differences between the two languages.

**CSV Content Preview:**

```csv
sentence_id,english_sentence,german_sentence,label
1,The cat sat on the mat.,Der Kater saß auf der Matte.,parallel_structure
2,She enjoys reading books.,Sie liest gerne Bücher.,parallel_structure
3,He plays the guitar beautifully.,Er spielt Gitarre schön.,non_parallel_structure
4,The dog chased the ball.,Das Hund chased the Ball.,non_parallel_structure
5,I am learning Spanish.,Ich lerne Spanisch.,parallel_structure
```

The 'sentence_id' column uniquely identifies each sentence pair. The 'english_sentence' and 'german_sentence' columns contain the text in English and German, respectively. The 'label' column indicates the syntactic relationship between the English and German sentences: 'parallel_structure' denotes sentences that have a clear syntactic parallel, while 'non_parallel_structure' denotes sentences that demonstrate syntactic differences between English and German.
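
As a rough illustration of the column layout described above, the following sketch parses the five preview rows with Python's standard `csv` module and tallies the two label values. The rows are embedded inline and copied verbatim from the preview; the helper names `load_pairs` and `label_counts` are hypothetical and not part of the dataset or of any library. It assumes the full file follows the same four-column header, which the card does not explicitly guarantee.

```python
import csv
import io
from collections import Counter

# The five preview rows from the dataset card, copied verbatim.
# Row 4's German text mixes English words ("chased the"); it is kept
# as-is because the card warns the content is AI-generated.
PREVIEW_CSV = """\
sentence_id,english_sentence,german_sentence,label
1,The cat sat on the mat.,Der Kater saß auf der Matte.,parallel_structure
2,She enjoys reading books.,Sie liest gerne Bücher.,parallel_structure
3,He plays the guitar beautifully.,Er spielt Gitarre schön.,non_parallel_structure
4,The dog chased the ball.,Das Hund chased the Ball.,non_parallel_structure
5,I am learning Spanish.,Ich lerne Spanisch.,parallel_structure
"""


def load_pairs(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def label_counts(rows):
    """Count how many sentence pairs carry each 'label' value."""
    return Counter(row["label"] for row in rows)


rows = load_pairs(PREVIEW_CSV)
counts = label_counts(rows)

for label, n in sorted(counts.items()):
    print(f"{label}: {n}")
```

Because `csv.DictReader` returns every field as a string, `sentence_id` would need an explicit `int(...)` conversion before any numeric use. Given the card's own accuracy warning, it may be worth inspecting individual rows before relying on the labels.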

**Source of the data:**

The dataset was generated using the [Infinite Dataset Hub](https://huggingface.co/spaces/infinite-dataset-hub/infinite-dataset-hub) and microsoft/Phi-3-mini-4k-instruct using the query 'English German':

- **Dataset Generation Page**: https://huggingface.co/spaces/infinite-dataset-hub/infinite-dataset-hub?q=English+German&dataset=EnglishGermanSyntax&tags=Comparison,+NLP,+Syntax+Analysis
- **Model**: https://huggingface.co/microsoft/Phi-3-mini-4k-instruct
- **More Datasets**: https://huggingface.co/datasets?other=infinite-dataset-hub

Provider:
infinite-dataset-hub