CELEX2
DataCite Commons | Updated 2025-04-15 | Indexed 2025-04-16
Download link:
https://datasets.lib.berkeley.edu/citation?persistentId=doi:10.60503/D3/ETE0WL
Resource summary:
This corpus contains ASCII versions of the CELEX lexical databases of English (Version 2.5), Dutch (Version 3.1) and German (Version 2.0). CELEX was developed as a joint enterprise of the University of Nijmegen, the Institute for Dutch Lexicology in Leiden, the Max Planck Institute for Psycholinguistics in Nijmegen, and the Institute for Perception Research in Eindhoven. Pre-mastering and production were done by the LDC.

For each language, this data set contains detailed information on:
- orthography (variations in spelling, hyphenation)
- phonology (phonetic transcriptions, variations in pronunciation, syllable structure, primary stress)
- morphology (derivational and compositional structure, inflectional paradigms)
- syntax (word class, word class-specific subcategorizations, argument structures)
- word frequency (summed word and lemma counts, based on recent and representative text corpora)

The databases have not been tailored to fit any particular database management program. Instead, the information is stored in ASCII files in a UNIX directory tree that can be queried with tools such as AWK or ICON. Unique identity numbers allow information from different files to be linked. Some kinds of information have to be computed online; wherever necessary, AWK functions are provided to recover this information, and README files specify the details of their use.

A detailed User Guide describing the various kinds of lexical information available is supplied. All sections of this guide are PostScript files, except for some additional notes on the German lexicon, which are in plain ASCII.
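Linking records through their shared identity numbers is essentially an inner join. The distribution itself provides AWK functions for this kind of work; the Python below is only a swapped-in illustration of the idea, not the CELEX tooling. It assumes, without confirmation from the description above, that each line is one record, that fields are separated by a backslash, and that the first field is the identity number. The two tiny "files" and their column layouts are hypothetical toy data, not real CELEX rows.

```python
# Minimal sketch: join two CELEX-style ASCII files on their shared ID number.
# ASSUMPTIONS (not taken from the CELEX documentation):
#   - one record per line, fields separated by a backslash;
#   - the first field is the unique identity number;
#   - the column layouts and sample rows below are hypothetical toy data.
from typing import Dict, Iterable, List

SEP = "\\"  # assumed field separator (a single backslash)


def parse(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map ID number -> remaining fields, skipping blank lines."""
    table: Dict[str, List[str]] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split(SEP)
        table[fields[0]] = fields[1:]
    return table


def join_on_id(left: Dict[str, List[str]],
               right: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Inner join: keep IDs present in both tables, concatenating their fields."""
    return {idnum: left[idnum] + right[idnum] for idnum in left if idnum in right}


# Hypothetical toy rows: an "orthography" file (ID, headword, frequency)
# and a "phonology" file (ID, headword, transcription).
ortho = ["1\\cat\\120", "2\\dog\\40", "3\\emu\\2"]
phon = ["1\\cat\\kat", "2\\dog\\dOg"]

joined = join_on_id(parse(ortho), parse(phon))

# AWK-style filter, comparable to `$3 > 50`: keep joined entries whose
# (assumed) frequency column exceeds 50.
frequent = {k: v for k, v in joined.items() if int(v[1]) > 50}
print(frequent)
```

Here ID 3 drops out of the join because it has no phonology record, and ID 2 is removed by the frequency filter. With real files, `parse` would be fed an open file handle instead of a list of strings.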
Provider:
UC Berkeley Library Dataverse
Created:
2025-04-15
Compiled Summary
Dataset Introduction

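Background and Challenges
Background Overview
CELEX2 is a multilingual lexical database with detailed lexical information for English, Dutch, and German. The data are stored as plain ASCII files that are not tied to any particular database management program, and the release includes AWK functions, README files, and a user guide to support querying and processing.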
The above summary was collected and generated by 遇见数据集.



