SouthernCrossAI/ACE_Australian_Corpus_of_English
Source: Hugging Face. Updated 2024-08-15; indexed 2025-04-12.
Download link:
https://hf-mirror.com/datasets/SouthernCrossAI/ACE_Australian_Corpus_of_English
Resource description:
---
license: mit
language:
- en
tags:
- australia
- corpus
- english
size_categories:
- 1M<n<10M
---
# Australian Corpus of English (ACE)
## Overview
**Keywords**: Australian English, Corpus linguistics.
The [Australian Corpus of English (ACE)](https://figshare.mq.edu.au/articles/dataset/Australian_Corpus_of_English_ACE_/24629712?file=43778418) was compiled to provide Australian data from 1986 that is comparable to the standard American and British corpora of the 1960s (Brown and LOB). It includes **1 million words of published text** in **500 samples from 15 categories of nonfiction and fiction**.
## Data Source
The original dataset comes from [Macquarie University Research Data - Australian Corpus of English (ACE)](https://figshare.mq.edu.au/articles/dataset/Australian_Corpus_of_English_ACE_/24629712?file=43778418) and is licensed under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/).
The current dataset was cleaned by [Gillian Law](https://huggingface.co/gillianalaw); the uncleaned dataset is available on [GitHub](https://github.com/southern-cross-ai/ACE).
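A minimal sketch of how the cleaned dataset might be loaded with the Hugging Face `datasets` library. The repository ID is taken from the dataset URL; the helper names `load_ace` and `describe` are hypothetical and used only for illustration. This card does not state the split or column names, so the sketch inspects them instead of assuming any.

```python
# Illustrative sketch only; requires `pip install datasets` and network access.
# Repository ID taken from the dataset URL on this page.
REPO_ID = "SouthernCrossAI/ACE_Australian_Corpus_of_English"


def load_ace(repo_id: str = REPO_ID):
    """Download the corpus; with no split given, this returns a mapping of split name to split."""
    # Imported lazily so this module can be imported without the library installed.
    from datasets import load_dataset

    return load_dataset(repo_id)


def describe(dataset_dict) -> dict:
    """Map each split name to its row count and column names.

    Split and column names are not documented in the card, so they are
    read from the loaded object rather than hard-coded.
    """
    return {
        name: {"rows": split.num_rows, "columns": split.column_names}
        for name, split in dataset_dict.items()
    }
```

Calling `describe(load_ace())` should list whichever splits and columns the repository actually provides, which is a reasonable first check before relying on a particular field name.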
Provider:
SouthernCrossAI



