MLINSEA/Moroccan_ads

Source: Hugging Face · Updated: 2024-03-23 · Indexed: 2024-06-11
Download link:
https://hf-mirror.com/datasets/MLINSEA/Moroccan_ads
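The following is a minimal sketch of loading the `train` split from Python. It assumes the `datasets` library is installed and that the mirror above serves the standard Hub API. It relies on `huggingface_hub` reading the `HF_ENDPOINT` environment variable when it is first imported. The helper names `check_columns` and `load_moroccan_ads` are hypothetical, and the expected column list is taken from the card's `dataset_info.features`.

```python
import os

# Columns declared in the card's `dataset_info.features` (all dtype: string).
EXPECTED_COLUMNS = ("ad", "title", "link", "channel")


def check_columns(column_names) -> None:
    """Raise if any column listed on the card is missing (hypothetical helper)."""
    missing = [c for c in EXPECTED_COLUMNS if c not in column_names]
    if missing:
        raise ValueError(f"missing columns: {missing}")


def load_moroccan_ads(endpoint: str = "https://hf-mirror.com"):
    """Load the `train` split, optionally through a Hub mirror (hypothetical helper).

    huggingface_hub reads HF_ENDPOINT once, at import time, so the mirror is
    only used if neither `datasets` nor `huggingface_hub` was imported earlier
    in this process. setdefault keeps any HF_ENDPOINT the user already set.
    """
    os.environ.setdefault("HF_ENDPOINT", endpoint)
    from datasets import load_dataset  # deferred so the env var is set first

    ds = load_dataset("MLINSEA/Moroccan_ads", split="train")
    check_columns(ds.column_names)
    return ds
```

Calling `load_moroccan_ads()` should return a `Dataset` whose `num_rows` matches the 3992 examples stated on the card. Because the import of `datasets` is deferred until the function runs, the schema check can be reused on its own, for example on the column list of a DataFrame read from the downloaded Parquet files.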
Overview:
---
dataset_info:
  features:
  - name: ad
    dtype: string
  - name: title
    dtype: string
  - name: link
    dtype: string
  - name: channel
    dtype: string
  splits:
  - name: train
    num_bytes: 1115354
    num_examples: 3992
  download_size: 366806
  dataset_size: 1115354
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

# Dataset Card for "Moroccan_ads"

# YouTube Ads Dataset from Moroccan Channels

## Description

This dataset contains advertisements and related information from Moroccan YouTube channels. It's designed to facilitate research in digital marketing, content analysis, and linguistic studies focused on Moroccan Arabic and French.

## Dataset Structure

The dataset consists of 3992 records, each representing an advertisement from YouTube. The data is organized into four columns:

- `ad`: The text of the advertisement.
- `title`: The title of the YouTube video from which the ad was extracted.
- `link`: The URL to the YouTube video.
- `channel`: The identifier of the YouTube channel (e.g., `@orangemaroc`).

## Data Cleaning

Users should be aware that the dataset contains raw data that may need to be cleaned and preprocessed for analysis. This can include removing special characters, correcting typos, or standardizing text format.

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
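The card's Data Cleaning note leaves preprocessing open. The following is a sketch of one possible normalization of the `ad` text, not the dataset authors' pipeline. The helper name `clean_ad_text` is hypothetical. The sketch makes two assumptions. First, it treats "special characters" as emoji, pictographic symbols, and invisible control or format characters such as the right-to-left mark U+200F. Second, it drops Arabic tatweel and, optionally, the short-vowel diacritics (harakat). Typo correction is not attempted.

```python
import re
import unicodedata

# Arabic harakat/tanween/shadda/sukun: U+064B..U+0652 (assumed safe to drop).
_ARABIC_DIACRITICS = re.compile("[\u064b-\u0652]")
# Arabic tatweel (kashida), a purely typographic elongation character.
_TATWEEL = "\u0640"
# Emoji presentation marks: variation selectors and the combining keycap.
_EMOJI_MARKS = re.compile("[\ufe00-\ufe0f\u20e3]")
_URL = re.compile(r"https?://\S+")
_WHITESPACE = re.compile(r"\s+")


def clean_ad_text(text: str, strip_diacritics: bool = True) -> str:
    """Normalize one `ad` string (hypothetical helper, not the authors' method)."""
    # NFKC folds compatibility forms, e.g. Arabic presentation forms and
    # no-break spaces, into their ordinary equivalents.
    text = unicodedata.normalize("NFKC", text)
    text = _URL.sub(" ", text)  # drop inline URLs
    text = text.replace(_TATWEEL, "")
    if strip_diacritics:
        text = _ARABIC_DIACRITICS.sub("", text)
    text = _EMOJI_MARKS.sub("", text)

    kept = []
    for ch in text:
        category = unicodedata.category(ch)
        if ch.isspace():
            kept.append(" ")  # newlines, tabs, etc. become plain spaces
        elif category in ("So", "Sk") or category.startswith("C"):
            continue  # emoji/pictographs, modifier symbols, control/format chars
        else:
            kept.append(ch)  # letters, marks, digits, punctuation, Sc/Sm signs
    # Collapse whitespace runs and trim the ends.
    return _WHITESPACE.sub(" ", "".join(kept)).strip()
```

The function keeps letters, digits, punctuation, and currency and math signs. As a result, price text such as `-50%` survives. So do the numerals used in Latin-script Darija (Arabizi), where digits such as 3 and 7 stand for Arabic letters. To keep harakat, pass `strip_diacritics=False`. With the `datasets` library, the function could be applied column-wise via `ds.map(lambda row: {"ad": clean_ad_text(row["ad"])})`.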
Provider:
MLINSEA