surajp/sanskrit_classic
Source: Hugging Face. Updated 2024-01-18; indexed 2024-05-25.
Download link:
https://hf-mirror.com/datasets/surajp/sanskrit_classic
Resource description:
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- sa
license:
- other
multilinguality:
- monolingual
size_categories:
- 100K<n<1M
source_datasets:
- original
task_categories:
- text-generation
- fill-mask
task_ids:
- language-modeling
- masked-language-modeling
paperswithcode_id: null
pretty_name: SanskritClassic
dataset_info:
  features:
  - name: text
    dtype: string
  config_name: combined
  splits:
  - name: train
    num_bytes: 40299787
    num_examples: 342033
  download_size: 7258904
  dataset_size: 40299787
---
# Dataset Card for SanskritClassic
## Table of Contents
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)
## Dataset Description
- **Homepage:** [sanskrit_classic](https://github.com/parmarsuraj99/hf_datasets/tree/master/sanskrit_classic)
- **Repository:** [GitHub](https://github.com/parmarsuraj99/hf_datasets/tree/master/sanskrit_classic)
- **Paper:** N/A
- **Leaderboard:** N/A
- **Point of Contact:** [parmarsuraj99](mailto:parmarsuraj99@gmail.com)
### Dataset Summary
A collection of classical Sanskrit texts, provided as 342,033 lines of text in a single `train` split.
### Supported Tasks and Leaderboards
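The dataset supports language modeling (`text-generation`) and masked language modeling (`fill-mask`). No leaderboard is available.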
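To make the fill-mask task listed in the card metadata concrete, here is a minimal sketch of turning one line into a masked training example. It assumes naive whitespace tokenization and a hypothetical `[MASK]` placeholder; neither comes from the dataset itself, and a real pipeline would use the tokenizer and mask token of the chosen model.

```python
import random
from typing import Tuple

# Hypothetical placeholder; real tokenizers define their own mask token.
MASK_TOKEN = "[MASK]"


def mask_one_token(line: str, rng: random.Random) -> Tuple[str, str]:
    """Replace one whitespace-separated token with MASK_TOKEN.

    Returns the masked line and the original token that was hidden.
    """
    tokens = line.split()
    if not tokens:
        raise ValueError("cannot mask an empty line")
    index = rng.randrange(len(tokens))
    target = tokens[index]
    tokens[index] = MASK_TOKEN
    return " ".join(tokens), target


# The sample line is the instance shown in this card.
line = "मा कर्मफलहेतुर्भूर्मा ते सङ्गोऽस्त्वकर्मणि॥"
masked, target = mask_one_token(line, random.Random(0))
print(masked)
print(target)
```

Whitespace splitting is a deliberate simplification: Sanskrit sandhi fuses words, so subword tokenization is likely a better fit in practice.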
### Languages
Sanskrit (`sa`)
## Dataset Structure
### Data Instances
{'text': 'मा कर्मफलहेतुर्भूर्मा ते सङ्गोऽस्त्वकर्मणि॥'}
### Data Fields
`text` (string): one line of Sanskrit text.
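The record layout above can be checked and loaded with a short sketch like the following. The repository id `surajp/sanskrit_classic` and the config name `combined` are taken from this page; whether the installed version of the `datasets` library loads the repository without extra arguments is an assumption. The helper names `is_valid_record` and `load_train_split` are hypothetical.

```python
from typing import Any, Mapping


def is_valid_record(record: Mapping[str, Any]) -> bool:
    """Check that a record has exactly one field, `text`, holding a string."""
    return set(record.keys()) == {"text"} and isinstance(record["text"], str)


def load_train_split():
    """Load the `train` split of the `combined` config (needs network access)."""
    # Imported lazily so the validation helper works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset("surajp/sanskrit_classic", "combined", split="train")


# The sample record is the instance shown in this card.
example = {"text": "मा कर्मफलहेतुर्भूर्मा ते सङ्गोऽस्त्वकर्मणि॥"}
print(is_valid_record(example))
```

Once loaded, each row can be passed through `is_valid_record` as a quick sanity check before training.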
### Data Splits
| | Train |
|-------------------|--------|
| n_instances | 342033 |
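As a rough orientation, the size figures in the card metadata give an average record size and a download compression ratio. The sketch below uses only the reported `num_bytes`, `num_examples`, and `download_size` values; the constant names are illustrative, and the byte count is the size reported in the metadata, which may include storage overhead beyond the raw text.

```python
# Values copied from the card metadata.
NUM_BYTES = 40299787
NUM_EXAMPLES = 342033
DOWNLOAD_SIZE = 7258904

# Average reported size of one line, in bytes.
avg_bytes_per_line = NUM_BYTES / NUM_EXAMPLES

# How many times larger the loaded dataset is than the download.
compression_ratio = NUM_BYTES / DOWNLOAD_SIZE

print(f"{avg_bytes_per_line:.1f} bytes per line on average")
print(f"dataset is {compression_ratio:.2f}x the download size")
```

Since Devanagari characters take three bytes each in UTF-8, the average line is considerably shorter in characters than the byte figure suggests.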
## Dataset Creation
### Curation Rationale
[More Information Needed]
### Source Data
#### Initial Data Collection and Normalization
[More Information Needed]
#### Who are the source language producers?
[More Information Needed]
### Annotations
#### Annotation process
[More Information Needed]
#### Who are the annotators?
[More Information Needed]
### Personal and Sensitive Information
[More Information Needed]
## Considerations for Using the Data
### Social Impact of Dataset
[More Information Needed]
### Discussion of Biases
[More Information Needed]
### Other Known Limitations
[More Information Needed]
## Additional Information
### Dataset Curators
[More Information Needed]
### Licensing Information
[More Information Needed]
### Citation Information
```
@Misc{johnsonetal2014,
author = {Johnson, Kyle P. and Patrick Burns and John Stewart and Todd Cook},
title = {CLTK: The Classical Language Toolkit},
url = {https://github.com/cltk/cltk},
year = {2014--2020},
}
```
### Contributions
Thanks to [@parmarsuraj99](https://github.com/parmarsuraj99) for adding this dataset.
Provider: surajp