ianiket23/podcast_llama_chat_format-5k

Name: ianiket23/podcast_llama_chat_format-5k
Creator: ianiket23
Published: 2024-07-01 09:02:37
License: 暂无描述

Hugging Face2024-07-01 更新2024-07-06 收录

下载链接：

https://hf-mirror.com/datasets/ianiket23/podcast_llama_chat_format-5k

下载链接

链接失效反馈

官方服务：

资源简介：

该数据集（5K）是为了对llama 3聊天模型进行微调而格式化的现有播客数据集（64bits/lex_fridman_podcast_for_llm_vicuna）。它代表了Lex Fridman Podcast的音频到文本的转录，这是一个由MIT的AI研究员Lex Fridman主持的播客。

This dataset (5K) formats an existing podcast dataset (64bits/lex_fridman_podcast_for_llm_vicuna) for llama 3 chat model fine tuning. It represents a compilation of audio-to-text transcripts from the Lex Fridman Podcast, hosted by AI researcher at MIT, Lex Fridman. The dataset includes a train split with 5000 examples. There might be some minor issues during the transcribe phase, and the next step is to use whisper to directly load the podcast and transcribe it in this format.

提供机构：

ianiket23

原始信息汇总

数据集概述

数据集信息

特征:
- 名称: text
- 数据类型: string
分割:
- 名称: train
- 字节数: 44928134
- 样本数: 5000
下载大小: 24185163
数据集大小: 44928134

配置

配置名称: default
- 数据文件:
  - 分割: train
  - 路径: data/train-*

数据集简介

该数据集（5K）用于为llama 3聊天模型微调格式化现有的播客数据集（64bits/lex_fridman_podcast_for_llm_vicuna）。它包含了Lex Fridman播客的音频转文本的转录内容。Lex Fridman播客由MIT的AI研究员Lex Fridman主持。

问题

在转录阶段可能存在一些轻微问题。

下一步

使用whisper直接加载播客并按此格式转录。

5,000+

优质数据集

54 个

任务类型

进入经典数据集