SouthernCrossAI/COOEE_The_Corpus_of_Oz_Early_English
Hugging Face · updated 2024-08-16 · indexed 2025-04-12
Download link:
https://hf-mirror.com/datasets/SouthernCrossAI/COOEE_The_Corpus_of_Oz_Early_English
Resource description:
---
license: mit
language:
- en
tags:
- australia
- corpus
- english
size_categories:
- 10K<n<100K
---
# A COrpus of Oz Early English (COOEE)
## Overview
Material to be included had to meet a regional and a temporal criterion. The temporal criterion required texts to have been produced between 1788 and 1900; the regional criterion required them to have been written in Australia, New Zealand or Norfolk Island. In a few cases, texts written elsewhere were also admitted, for example a shipboard diary or an account of travel in other countries written by a native Australian or by someone who had lived in Australia for a considerable time.
Contains: **letters**, **published material** in book form, and **historical texts**.
The collection is stratified in two ways:
- Time period: the corpus is divided into four time periods (the initial numeral of each file name indicates the period from which the document comes):
  - Period 1: 1788-1825
  - Period 2: 1826-1850
  - Period 3: 1851-1875
  - Period 4: 1876-1900
- Register: the corpus contains material from four registers (the register to which a file belongs is specified in the metadata at the start of each file in the form `<r=[register]>`, using the abbreviations listed below):
  - Speech-based (sb)
  - Private written (prw)
  - Public written (pcw)
  - Government English (ge)
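The two stratification schemes above can be read back out of the files programmatically. The following is a minimal Python sketch, assuming files keep the original COOEE conventions: a file name that begins with the period numeral, and a `<r=...>` register tag near the top of the file. This card does not state whether the cleaned Hugging Face release preserves either convention. The function names and the example file name `1-001.txt` are hypothetical.

```python
import re

# Period boundaries as listed in the dataset card (inclusive year ranges).
PERIODS = {
    1: (1788, 1825),
    2: (1826, 1850),
    3: (1851, 1875),
    4: (1876, 1900),
}

# Register abbreviations as listed in the dataset card.
REGISTERS = {
    "sb": "Speech-based",
    "prw": "Private written",
    "pcw": "Public written",
    "ge": "Government English",
}

# Assumption: the tag looks like <r=prw>; the optional square brackets also
# accept a literal reading of the card's <r=[register]> notation.
REGISTER_TAG = re.compile(r"<r=\[?([a-z]+)\]?>")


def period_from_filename(filename: str) -> int:
    """Return the period encoded by the file name's initial numeral (hypothetical helper)."""
    first = filename[:1]
    if first not in {str(p) for p in PERIODS}:
        raise ValueError(f"no period numeral at start of {filename!r}")
    return int(first)


def period_from_year(year: int) -> int:
    """Map a year of composition to its COOEE period (hypothetical helper)."""
    for period, (start, end) in PERIODS.items():
        if start <= year <= end:
            return period
    raise ValueError(f"{year} is outside the 1788-1900 range")


def register_from_text(text: str) -> str:
    """Return the full register name from the first register tag found (hypothetical helper)."""
    match = REGISTER_TAG.search(text)
    if match is None or match.group(1) not in REGISTERS:
        raise ValueError("no recognised <r=...> register tag found")
    return REGISTERS[match.group(1)]
```

For example, `period_from_filename("1-001.txt")` returns `1`, and `register_from_text("<r=prw> ...")` returns `"Private written"`, which is enough to group documents by period and register before further analysis.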
## Data Source
The original data was downloaded from [LDaCA - A COrpus of Oz Early English (COOEE)](https://data.ldaca.edu.au/collection?id=arcp%3A%2F%2Fname%2Cdoi10.26180%252F23961609&_crateId=arcp%3A%2F%2Fname%2Cdoi10.26180%252F23961609) and is licensed under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/).
The current dataset was cleaned by [Yifan Luo](https://huggingface.co/yifan-luo). You can also find the dataset on [GitHub](https://github.com/southern-cross-ai/COOEE).
Provided by:
SouthernCrossAI



