WitchesSocialStream/HowItsMade
Hugging Face | Updated 2024-03-30 | Indexed 2024-06-11
Download link:
https://hf-mirror.com/datasets/WitchesSocialStream/HowItsMade
Resource summary:
---
task_categories:
- text-generation
- text-classification
language:
- en
pretty_name: How It's Made
size_categories:
- n<1K
configs:
- config_name: default
  data_files:
  - split: train
    path: "HowItsMade.jsonl"
---
# Dataset Card for How It's Made
This tiny dataset contains parsed subtitles from the Canadian documentary series "How It's Made".
## Dataset Details
### Dataset Description
- **Curated by:** KaraKaraWitch
- **Funded by [optional]:** N/A
- **Shared by [optional]:** N/A
- **Language(s) (NLP):** English
- **License:** Not Specified.
## Uses
This dataset is intended for use with large language models, as grounding material for questions of the form "How is X made?"
### Direct Use
N/A. The dataset is released as-is.
### Out-of-Scope Use
The author notes that this dataset could be used to generate inaccurate descriptions.
## Dataset Structure
Refer to the image as shown
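
A minimal sketch for inspecting the record structure directly, assuming a local copy of `HowItsMade.jsonl` (the path named in the card's config). No field names are assumed, since the card does not list them; `read_jsonl` and `field_names` are hypothetical helper names.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Yield one parsed JSON value per non-empty line of a JSON Lines file."""
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


def field_names(path, limit=100):
    """Return the sorted set of keys seen in the first `limit` records."""
    keys = set()
    for i, record in enumerate(read_jsonl(path)):
        if i >= limit:
            break
        if isinstance(record, dict):
            keys.update(record)
    return sorted(keys)


if __name__ == "__main__":
    # Assumes the file has been downloaded next to this script.
    print(field_names("HowItsMade.jsonl"))
```

Printing the keys of a handful of records is usually enough to see how the data is laid out before wiring the file into a training or retrieval pipeline.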

## Dataset Creation
### Curation Rationale
<!-- Motivation for the creation of this dataset. -->
[More Information Needed]
### Source Data
The true source is undisclosed. However, there are some more tech-savvy people who would know where this would come from. *wink.*
#### Data Collection and Processing
1. Obtain Episodes
2. Process the subtitles: remove extraneous content, then split into sentences.
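
The processing step above could be sketched roughly as follows. This is not the author's actual pipeline: it assumes the subtitles are in SRT format (the card does not say), treats cue numbers, timestamps, and markup tags as the "extra" to remove, and uses a naive punctuation-based sentence splitter. The function names and the sample text are made up for illustration.

```python
import re

# Assumption: SRT-style timestamps such as "00:00:01,000 --> 00:00:03,000".
_TIMESTAMP = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)
# Markup such as <i>...</i> or {\an8} positioning tags.
_TAG = re.compile(r"<[^>]+>|\{[^}]*\}")


def clean_subtitles(srt_text: str) -> str:
    """Drop cue numbers, timestamps, and markup; join the cue text into one string.

    Caveat: a text line made only of digits is also dropped as a cue number.
    """
    kept = []
    for line in srt_text.splitlines():
        line = line.strip()
        if not line or line.isdigit() or _TIMESTAMP.match(line):
            continue
        line = _TAG.sub("", line).strip()
        if line:
            kept.append(line)
    return re.sub(r"\s+", " ", " ".join(kept))


def split_sentences(text: str) -> list[str]:
    """Naive split after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


if __name__ == "__main__":
    # Made-up sample, not taken from the dataset.
    sample = (
        "1\n00:00:01,000 --> 00:00:03,000\n<i>Steel is melted</i>\nin a furnace.\n\n"
        "2\n00:00:03,500 --> 00:00:05,000\nIt is then poured into molds.\n"
    )
    for sentence in split_sentences(clean_subtitles(sample)):
        print(sentence)
```

A punctuation-based splitter will mis-split abbreviations and decimals that end a cue; a dedicated sentence tokenizer would be a reasonable swap if that matters.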
#### Who are the source data producers?
<!-- This section describes the people or systems who originally created the data. It should also include self-reported demographic or identity information for the source data creators if this information is available. -->
[More Information Needed]
## Bias, Risks, and Limitations
Weights and temperatures are given in US customary (imperial) units rather than metric. Further processing is required for correct use.
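
As one possible form of that further processing, the sketch below rewrites Fahrenheit temperatures and pound weights in a sentence to metric. It assumes the units are spelled out as "degrees Fahrenheit" and "pounds", which is a guess about the subtitle wording and not something the card states; `to_metric` is a hypothetical helper, and numbers with thousands separators (for example "1,500") are not handled.

```python
import re

# Assumed phrasings; adjust the patterns to whatever the subtitles actually use.
_FAHRENHEIT = re.compile(r"(-?\d+(?:\.\d+)?)\s*degrees Fahrenheit", re.IGNORECASE)
_POUNDS = re.compile(r"(\d+(?:\.\d+)?)\s*pounds\b", re.IGNORECASE)

KG_PER_POUND = 0.45359237  # exact definition of the avoirdupois pound


def to_metric(sentence: str) -> str:
    """Replace Fahrenheit and pound quantities with Celsius and kilograms."""

    def f_to_c(match: re.Match) -> str:
        celsius = (float(match.group(1)) - 32) * 5 / 9
        return f"{celsius:.0f} degrees Celsius"

    def lb_to_kg(match: re.Match) -> str:
        kilograms = float(match.group(1)) * KG_PER_POUND
        return f"{kilograms:.1f} kilograms"

    sentence = _FAHRENHEIT.sub(f_to_c, sentence)
    return _POUNDS.sub(lb_to_kg, sentence)


if __name__ == "__main__":
    # Made-up sentences, not taken from the dataset.
    print(to_metric("The oven heats to 212 degrees Fahrenheit."))
    print(to_metric("Each block weighs 10 pounds."))
```

Rounding is deliberately coarse here, since narration figures are usually approximate; keep the original value alongside the converted one if precision matters.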
### Recommendations
Users should be made aware of the risks, biases and limitations of the dataset. More information needed for further recommendations.
## Dataset Card Contact
Contact via the Community tab.
Provided by:
WitchesSocialStream
Summary of original information
Dataset overview: How It's Made
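Basic information
- Task categories:
  - Text generation
  - Text classification
- Language: English
- Dataset size: fewer than 1,000 records
- Configs:
  - Config name: default
  - Data files:
    - Split: train
    - Path: "HowItsMade.jsonl"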
Dataset description
- Creator: KaraKaraWitch
- Language: English
- License: not specified
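Uses
- Primary use: for large language models, to answer questions of the form "How is X made?"
- Direct use: the dataset is released as-is.
- Potential risk: may be used to generate inaccurate descriptions.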
Dataset structure
- Structure details: refer to the provided image link.
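Data collection and processing
- Data source: undisclosed, though tech-savvy people may know where it comes from.
- Processing steps:
  - Obtain the episodes
  - Process the subtitles: remove extraneous parts and split into sentences.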
Bias, risks, and limitations
- Limitations: uses US imperial units rather than metric; further processing is required for correct use.
Recommendations
- Notes for users: users should be aware of the dataset's risks, biases, and limitations.



