
WitchesSocialStream/HowItsMade

Source: Hugging Face. Updated: 2024-03-30. Indexed: 2024-06-11.
Download link:
https://hf-mirror.com/datasets/WitchesSocialStream/HowItsMade
Resource description:
---
task_categories:
- text-generation
- text-classification
language:
- en
pretty_name: How It's Made
size_categories:
- n<1K
configs:
- config_name: default
  data_files:
  - split: train
    path: "HowItsMade.jsonl"
---

# Dataset Card for How It's Made

This tiny dataset contains parsed subtitles from the Canadian documentary series "How It's Made".

## Dataset Details

### Dataset Description

- **Curated by:** KaraKaraWitch
- **Funded by [optional]:** N/A
- **Shared by [optional]:** N/A
- **Language(s) (NLP):** English
- **License:** Not specified.

## Uses

This dataset is intended for grounding large language models on questions of the form "How is X made?"

### Direct Use

N/A. Dataset released as-is.

### Out-of-Scope Use

The author thinks that this can be used to generate inaccurate descriptions.

## Dataset Structure

Refer to the image shown:

![](https://huggingface.co/datasets/KaraKaraWitch/HowItsMade/resolve/main/Code_Rwe2mhkBsS.png?download=true)

## Dataset Creation

### Curation Rationale

<!-- Motivation for the creation of this dataset. -->

[More Information Needed]

### Source Data

The true source is undisclosed. However, some more tech-savvy people will know where this comes from. *wink.*

#### Data Collection and Processing

1. Obtain the episodes.
2. Process the subtitles: remove extraneous material and split into sentences.

#### Who are the source data producers?

<!-- This section describes the people or systems who originally created the data. It should also include self-reported demographic or identity information for the source data creators if this information is available. -->

[More Information Needed]

## Bias, Risks, and Limitations

Weights and temperatures use US imperial units instead of the more typical metric units. Further processing is required for correct use.

### Recommendations

Users should be made aware of the risks, biases and limitations of the dataset. More information is needed for further recommendations.

## Dataset Card Contact

Contact via the community tab.
Provided by:
WitchesSocialStream
Summary of original information

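Dataset overview: How It's Made

Basic information

  • Task categories:
    • Text generation
    • Text classification
  • Language: English
  • Dataset size: fewer than 1,000 records
  • Configuration:
    • Config name: default
    • Data files:
      • Split: train
      • Path: "HowItsMade.jsonl"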
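
The configuration above points at a single JSON Lines file. As a minimal sketch, assuming `HowItsMade.jsonl` has already been downloaded to the working directory, the records can be read with the Python standard library. The helper name `read_jsonl` is hypothetical, and no field names are assumed, because the card documents the record structure only through an image.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator


def read_jsonl(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON object per non-empty line of a JSON Lines file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines instead of failing on them
                yield json.loads(line)
```

Calling `list(read_jsonl("HowItsMade.jsonl"))` would then give a list of dictionaries. Printing the keys of the first record is a quick way to check the actual schema before relying on any field.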

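Dataset description

  • Creator: KaraKaraWitch
  • Language: English
  • License: not specified

Uses

  • Primary use: grounding large language models when answering questions of the form "How is X made?"
  • Direct use: the dataset is released as-is.
  • Potential risk: it may be used to generate inaccurate descriptions.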

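Dataset structure

  • Structure details: see the image linked in the dataset card.

Data collection and processing

  • Data source: undisclosed, though technically savvy readers may recognize where it comes from.
  • Processing steps:
    1. Obtain the episodes.
    2. Process the subtitles: remove extraneous material and split into sentences.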
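
The card does not say how the subtitles were cleaned or split, so the following is only an illustrative sketch of step 2, not the curator's actual pipeline. It assumes that the "extra" material consists of HTML-style tags and bracketed cues, and that sentences end at `.`, `!` or `?` followed by whitespace. The function name `clean_and_split` and the regular expressions are hypothetical.

```python
import re
from typing import List

# Assumption: "extra" material means HTML-style tags and bracketed cues.
TAG_RE = re.compile(r"<[^>]+>")
CUE_RE = re.compile(r"\[[^\]]*\]")
# Assumption: a sentence ends at ., ! or ? followed by whitespace.
SENTENCE_END_RE = re.compile(r"(?<=[.!?])\s+")


def clean_and_split(raw: str) -> List[str]:
    """Strip tags and cues, collapse whitespace, then split into sentences."""
    text = TAG_RE.sub("", raw)
    text = CUE_RE.sub("", text)
    text = re.sub(r"\s+", " ", text).strip()
    if not text:
        return []
    return [part for part in SENTENCE_END_RE.split(text) if part]
```

A punctuation-based split like this will mis-split on abbreviations and decimal numbers followed by a space. A dedicated sentence tokenizer would be a more robust choice for real subtitle text.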

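Bias, risks, and limitations

  • Limitation: weights and temperatures use US imperial units rather than metric; further processing is required for correct use.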
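
One possible form of that further processing is sketched below. The conversion formulas are standard, but the phrasing matched by `FAHRENHEIT_RE` ("N degrees Fahrenheit") is an assumption about how temperatures appear in the subtitles, and all helper names are hypothetical.

```python
import re


def fahrenheit_to_celsius(deg_f: float) -> float:
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32.0) * 5.0 / 9.0


def pounds_to_kilograms(pounds: float) -> float:
    """Convert avoirdupois pounds to kilograms (1 lb = 0.45359237 kg)."""
    return pounds * 0.45359237


# Assumption: temperatures are written like "350 degrees Fahrenheit".
FAHRENHEIT_RE = re.compile(r"(\d+(?:\.\d+)?) degrees Fahrenheit")


def annotate_fahrenheit(sentence: str) -> str:
    """Append a rounded Celsius value after each matched Fahrenheit value."""
    def add_celsius(match: "re.Match[str]") -> str:
        celsius = fahrenheit_to_celsius(float(match.group(1)))
        return f"{match.group(0)} (about {celsius:.0f} °C)"

    return FAHRENHEIT_RE.sub(add_celsius, sentence)
```

The sketch annotates rather than replaces the original values so that the source wording stays intact. Weights could be handled the same way with a second pattern and `pounds_to_kilograms`.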

Recommendations

  • Note for users: be aware of the dataset's risks, biases, and limitations.