MLINSEA/Moroccan_ads
Source: Hugging Face. Updated 2024-03-23; indexed 2024-06-11.
Download link:
https://hf-mirror.com/datasets/MLINSEA/Moroccan_ads
Resource description:
---
dataset_info:
  features:
  - name: ad
    dtype: string
  - name: title
    dtype: string
  - name: link
    dtype: string
  - name: channel
    dtype: string
  splits:
  - name: train
    num_bytes: 1115354
    num_examples: 3992
  download_size: 366806
  dataset_size: 1115354
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
# Dataset Card for "Moroccan_ads"
# YouTube Ads Dataset from Moroccan Channels
## Description
This dataset contains advertisements and related information from Moroccan YouTube channels. It's designed to facilitate research in digital marketing, content analysis, and linguistic studies focused on Moroccan Arabic and French.
## Dataset Structure
The dataset consists of 3992 records, each representing an advertisement from YouTube. The data is organized into four columns:
- `ad`: The text of the advertisement.
- `title`: The title of the YouTube video from which the ad was extracted.
- `link`: The URL to the YouTube video.
- `channel`: The identifier of the YouTube channel (e.g., `@orangemaroc`).
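The four-column layout above can be checked with a few lines of Python. This is a minimal sketch, not an official loader: `load_ads`, `missing_columns`, `ads_per_channel`, and `EXPECTED_COLUMNS` are hypothetical names introduced here, and `load_ads` assumes the third-party `datasets` package is installed and the Hub is reachable.

```python
from collections import Counter

# Column names as listed in the dataset card.
EXPECTED_COLUMNS = ("ad", "title", "link", "channel")


def load_ads():
    """Fetch the train split from the Hugging Face Hub (needs network access)."""
    # Imported lazily so the helpers below work without the package installed.
    from datasets import load_dataset  # third-party: pip install datasets

    return load_dataset("MLINSEA/Moroccan_ads", split="train")


def missing_columns(record):
    """Return the expected column names that are absent from one record."""
    return [name for name in EXPECTED_COLUMNS if name not in record]


def ads_per_channel(records):
    """Count how many ads each channel contributes."""
    return Counter(record["channel"] for record in records)
```

Because iterating over a loaded split yields one dictionary per row, `ads_per_channel(load_ads())` should give a per-channel tally, and `missing_columns` can flag any row that does not match the documented schema.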
## Data Cleaning
Users should be aware that the dataset contains raw data that may need to be cleaned and preprocessed for analysis. This can include removing special characters, correcting typos, or standardizing text format.
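One possible first cleaning pass is sketched below. It is an assumption about what "standardizing text format" could mean here, not the dataset authors' procedure: the hypothetical helper `clean_ad_text` applies Unicode NFKC normalization, drops invisible control and format characters, and collapses whitespace, while leaving Arabic and accented French letters intact. Typo correction is out of scope for this sketch.

```python
import unicodedata


def clean_ad_text(text: str) -> str:
    """Normalize one ad string (hypothetical helper, not part of the dataset)."""
    # NFKC folds compatibility forms such as full-width digits into plain ones.
    text = unicodedata.normalize("NFKC", text)
    # Keep whitespace for now; drop other characters in the Unicode "C"
    # categories (control, format, unassigned), e.g. zero-width spaces.
    kept = [
        ch
        for ch in text
        if ch.isspace() or not unicodedata.category(ch).startswith("C")
    ]
    # split() with no arguments splits on any whitespace run, so joining
    # with a single space collapses runs and trims both ends.
    return " ".join("".join(kept).split())
```

Applied to the `ad` and `title` columns, this keeps the wording unchanged and only removes invisible characters and irregular spacing. Note that dropping format characters also removes zero-width joiners, which may matter for some scripts.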
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
Provided by:
MLINSEA



