
SouthernCrossAI/COOEE_The_Corpus_of_Oz_Early_English

Hugging Face. Updated 2024-08-16, indexed 2025-04-12.
Download link:
https://hf-mirror.com/datasets/SouthernCrossAI/COOEE_The_Corpus_of_Oz_Early_English
Resource description:
---
license: mit
language:
- en
tags:
- australia
- corpus
- english
size_categories:
- 10K<n<100K
---

# A COrpus of Oz Early English (COOEE)

## Overview

Material included in COOEE had to meet a temporal and a regional criterion. The temporal criterion required a text to have been produced between 1788 and 1900. The regional criterion required a text to have been written in Australia, New Zealand or Norfolk Island. In a few cases other localities were allowed, for example when a person who was a native Australian, or who had lived in Australia for a considerable time, wrote a shipboard diary or wrote while travelling in other countries.

The corpus contains **letters**, **published materials** in book form and **historical texts**.

The collection is stratified in two ways:

- Time period: the corpus is divided into four time periods (the initial numeral of each file name indicates the period from which the document comes):
  - Period 1: 1788-1825
  - Period 2: 1826-1850
  - Period 3: 1851-1875
  - Period 4: 1876-1900
- Register: the corpus contains material from four registers (the register to which a file belongs is specified in the metadata at the start of each file in the form `<r=[register]>`, using the abbreviations below):
  - Speech-based (sb)
  - Private written (prw)
  - Public written (pcw)
  - Government English (ge)

## Data Source

The original data was downloaded from [LDaCA - A COrpus of Oz Early English (COOEE)](https://data.ldaca.edu.au/collection?id=arcp%3A%2F%2Fname%2Cdoi10.26180%252F23961609&_crateId=arcp%3A%2F%2Fname%2Cdoi10.26180%252F23961609) and is licensed under CC BY 4.0. The current dataset was cleaned by [Yifan Luo](https://huggingface.co/yifan-luo). The dataset is also available on [GitHub](https://github.com/southern-cross-ai/COOEE).
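The two stratification keys described above (the period numeral at the start of each file name and the `<r=[register]>` tag in each file's metadata) can be read programmatically. The following is a minimal sketch, assuming the original LDaCA file layout described in the overview; the cleaned Hugging Face version may expose these as separate fields instead. The function names, the file name `2-001.txt` and the 500-character header window are hypothetical choices for illustration, not part of the corpus documentation.

```python
import re

# Inclusive year ranges for each period, as listed in the COOEE overview.
PERIODS = {
    1: (1788, 1825),
    2: (1826, 1850),
    3: (1851, 1875),
    4: (1876, 1900),
}

# Register abbreviations used in the <r=[register]> metadata tag.
REGISTERS = {
    "sb": "Speech-based",
    "prw": "Private written",
    "pcw": "Public written",
    "ge": "Government English",
}

# Matches a register tag such as "<r=prw>".
REGISTER_TAG = re.compile(r"<r=([a-z]+)>")


def period_from_filename(filename: str) -> tuple:
    """Return the (start, end) years encoded by the file name's initial numeral."""
    # Hypothetical helper: only the leading digit of the base name is used.
    base = filename.rsplit("/", 1)[-1]
    if not base or not base[0].isdigit():
        raise ValueError(f"no leading period digit in {filename!r}")
    period = int(base[0])
    if period not in PERIODS:
        raise ValueError(f"unknown period {period} in {filename!r}")
    return PERIODS[period]


def register_from_text(text: str, header_chars: int = 500) -> str:
    """Return the full register name from the <r=...> tag near the start of a file."""
    # Assumption: the tag appears within the first `header_chars` characters.
    match = REGISTER_TAG.search(text[:header_chars])
    if match is None:
        raise ValueError("no <r=...> register tag found in header")
    abbrev = match.group(1)
    if abbrev not in REGISTERS:
        raise ValueError(f"unknown register abbreviation {abbrev!r}")
    return REGISTERS[abbrev]


if __name__ == "__main__":
    # Hypothetical file name and header text, for illustration only.
    print(period_from_filename("2-001.txt"))
    print(register_from_text("<r=prw> Dear Mother, ..."))
```

Raising `ValueError` on an unrecognised digit or abbreviation, rather than returning a default, makes files that do not follow the documented conventions easy to spot.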
Provider:
SouthernCrossAI