Corpus of Contemporary American English (COCA) 1990 to 2012 Datasets

Name: Corpus of Contemporary American English (COCA) 1990 to 2012 Datasets
Creator: University of Arizona Research Data Repository
Published: 2022-05-16 14:39:35
License: No description available

DataCite Commons. Updated: 2022-05-16. Indexed: 2024-07-13.

Download link:

https://arizona.figshare.com/articles/dataset/Corpus_of_Contemporary_American_English_COCA_1990_to_2012_Datasets/16638328/2

Description:

Dataset available only to University of Arizona affiliates. To obtain access, you must log into ReDATA with your NetID. Data is for research use by each individual downloader only. Sharing and/or redistribution of any portion of this dataset is prohibited. In no case can substantial amounts of the full-text data (typically a total of 50,000 words or more) be distributed outside the University of Arizona. If portions of the derived data are made available to others, they cannot include substantial portions of the raw frequency of words. Any publications or products that are based on the data should contain a reference to the source of the data: http://corpus.byu.edu/full-text/. COCA cannot be used to create software or products that will be sold to others.

This database is only available on the COCA website. To access the data, follow the link provided (https://coca.library.arizona.edu).

The Corpus of Contemporary American English (COCA) is the largest freely available corpus of English, and the only large and balanced corpus of American English. The corpus was created by Mark Davies of Brigham Young University, and it is used by tens of thousands of users every month (linguists, teachers, translators, and other researchers). COCA is also related to other large corpora that we have created. The corpus contains more than 450 million words of text and is equally divided among spoken, fiction, popular magazines, newspapers, and academic texts. It includes 20 million words each year from 1990 to 2012, and the corpus is also updated regularly (the most recent texts are from Summer 2012). Because of its design, it is perhaps the only corpus of English that is suitable for looking at current, ongoing changes in the language.

For inquiries regarding the contents of this dataset, please contact the Corresponding Author listed in the README.txt file. Administrative inquiries (e.g., removal requests, trouble downloading, etc.) can be directed to data-management@arizona.edu.

Provider:

University of Arizona Research Data Repository

Created:

2022-05-16
