liangyuch/tiktok-ad-rl-preview
Hugging Face · Updated 2026-04-28 · Indexed 2026-05-03
Download link:
https://hf-mirror.com/datasets/liangyuch/tiktok-ad-rl-preview
Resource description:
---
license: other
license_name: tiktok-creative-center-research-only
license_link: LICENSE
language: ["en"]
pretty_name: TikTok Creative Center — Top Ads (Smoke Set)
size_categories: ["n<1K"]
task_categories:
- video-classification
- feature-extraction
tags:
- tiktok
- ads
- creative-engagement
- video
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
# TikTok Creative Center — Top Ads (Smoke Set)
100-ad smoke set scraped from the TikTok Creative Center Top Ads surface
(US, period=180, sort=for_you). Used to validate the v2 creative-centric
crawler (`tiktok-ad-rl` repo, tag `v2.0.0`).
## Schema
One row per `ad_id`. ~42 columns including:
- **identity:** `ad_id`, `vid`, `ad_title` (short headline), `brand_name`,
`industry_key` (+ `industry_name` derived), `objective_key` (+ `objective_name` derived)
- **assets:** `video_local_path` (HF `Video`), `cover_local_path` (HF `Image`)
- **detail metadata:** `caption` (full on-screen ad copy from `baseDetail.adTitle`),
`objectives_detail` (list of `{label, value}` — multi-objective ads),
`objectives_named` (derived human labels), `keyword_list`, `pattern_label`,
`landing_page`, `countries_delivered`, `voice_over`
- **engagement (creative-global):** `ctr_global` (percentile rank, lower=better),
`cost_bucket` (0=Low, 1=Med, 2=High), `like_cnt`, `comments_cnt`, `shares_cnt`
- **per-second curves (1 Hz, length = ceil(duration)+1):**
`ctr_curve`, `cvr_curve`, `clicks_curve`, `conversion_curve`, `retention_curve`
Full schema and gotchas: see `data/SCHEMA.md` in the source repo.
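
The curve-length rule above (`ceil(duration)+1` samples at 1 Hz) can be checked with a small helper. This is only a sketch: the parameter name `duration_s` is hypothetical, since this card does not name the duration column, so pass in whatever value your copy of the schema provides.

```python
import math


def expected_curve_length(duration_s: float) -> int:
    """Sample count the card describes for a 1 Hz curve: ceil(duration) + 1."""
    return math.ceil(duration_s) + 1


def curve_length_ok(curve, duration_s: float) -> bool:
    """True if a curve has the documented number of samples.

    `duration_s` is a hypothetical argument (video duration in seconds);
    the card does not state which column holds it.
    """
    return len(curve) == expected_curve_length(duration_s)
```

For example, a 14.2-second video would be expected to carry 16 samples per curve under this rule.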
## Usage
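```python
from datasets import load_dataset

# Load the single "train" split declared in the card's config.
ds = load_dataset("liangyuch/tiktok-ad-rl-preview", split="train")
row = ds[0]  # indexing a row decodes its image and video columns

row["cover_local_path"]  # PIL.Image
row["video_local_path"]  # decoded video (decord per this card; type may vary by `datasets` version)
row["ctr_curve"]         # list[float], 1 Hz over playback seconds
```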
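
Because each curve is a plain list sampled once per playback second, simple summaries need no extra libraries. The helper below is a minimal sketch, not part of the dataset tooling. It assumes that list index `i` corresponds to playback second `i`, starting at 0, and that missing samples (if any) appear as `None`. Neither assumption is confirmed by this card.

```python
def peak_second(curve):
    """Return (second, value) for the largest sample in a 1 Hz curve.

    Assumes index i == playback second i (an assumption, not stated in the card).
    Skips None samples; returns None for an empty or missing curve.
    On ties, the earliest second wins.
    """
    samples = [(i, v) for i, v in enumerate(curve or []) if v is not None]
    if not samples:
        return None
    return max(samples, key=lambda pair: pair[1])
```

With a loaded row, `peak_second(row["ctr_curve"])` would give the playback second at which the curve peaks.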
## Provenance and intended use
- Source: ads.tiktok.com/business/creativecenter (publicly browsable).
- Scraped via Playwright route-interception (`tiktok-ad-rl` crawler, tag v2.0.0).
- **Research-use only.** TikTok's general Terms of Service restrict automated
collection of platform data; this set is shared as a private, small-scale
verification artifact for an academic crawler. Not for commercial use,
redistribution, or training models intended for production.
- Some videos may be Spark Ads (with creator usernames burned in); advertiser IP
belongs to the original advertisers.
## Known caveats (read before analysis)
- `ctr_global` is a **percentile rank**, not a click-through rate. Lower = better.
- `cost_bucket` is an **ordinal** (0/1/2); ~85% of rows in this top-ads set are High.
- Curves are per **playback second**, not calendar time. Values are
TikTok's full-precision floats (the web UI rounds for display only).
- `ad_title` (short headline) ≠ `caption` (full on-screen copy). Both columns
are present and meaningful.
- `objective_key` reflects the slice filter that surfaced the ad; the ad's
full objective set is in `objectives_detail` (often multi-valued).
- Some older ad_ids may have been delisted by TikTok after our crawl; those
rows have populated curves/engagement but `caption` / `objectives_detail`
may be NULL because the detail page no longer resolves.
- Videos can be partially watermarked (Spark Ads).
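
Several of the caveats above translate into small guards when handling rows. The following is a sketch that works on plain dicts keyed by the column names in this card. The helper names are hypothetical, and returning `"unknown"` for an unexpected bucket value is a choice made here, not something the dataset specifies.

```python
# Ordinal mapping taken from the schema: 0=Low, 1=Med, 2=High.
COST_BUCKET_LABELS = {0: "Low", 1: "Med", 2: "High"}


def cost_label(bucket):
    """Map the ordinal cost_bucket to its label; "unknown" is this sketch's fallback."""
    return COST_BUCKET_LABELS.get(bucket, "unknown")


def rank_by_ctr_global(rows):
    """Sort rows best-first. ctr_global is a percentile rank, so lower is better."""
    ranked = [r for r in rows if r.get("ctr_global") is not None]
    return sorted(ranked, key=lambda r: r["ctr_global"])


def objective_labels(row):
    """Labels from objectives_detail (a list of {label, value}); [] if it is NULL."""
    detail = row.get("objectives_detail") or []
    return [item["label"] for item in detail]


def has_detail(row):
    """False for rows whose detail page no longer resolved (delisted ads)."""
    return row.get("caption") is not None and row.get("objectives_detail") is not None
```

Sorting ascending in `rank_by_ctr_global` is the point to watch: treating `ctr_global` as a rate and sorting descending would invert the ranking.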
Provider:
liangyuch



