
liangyuch/tiktok-ad-rl-preview

Source: Hugging Face; updated 2026-04-28; indexed 2026-05-03
Download link:
https://hf-mirror.com/datasets/liangyuch/tiktok-ad-rl-preview
Description:
---
license: other
license_name: tiktok-creative-center-research-only
license_link: LICENSE
language: ["en"]
pretty_name: TikTok Creative Center — Top Ads (Smoke Set)
size_categories: ["n<1K"]
task_categories:
- video-classification
- feature-extraction
tags:
- tiktok
- ads
- creative-engagement
- video
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

# TikTok Creative Center — Top Ads (Smoke Set)

A 100-ad smoke set scraped from the TikTok Creative Center Top Ads surface (US, period=180, sort=for_you). It was used to validate the v2 creative-centric crawler (`tiktok-ad-rl` repo, tag `v2.0.0`).

## Schema

One row per `ad_id`, with ~42 columns, including:

- **identity:** `ad_id`, `vid`, `ad_title` (short headline), `brand_name`, `industry_key` (+ derived `industry_name`), `objective_key` (+ derived `objective_name`)
- **assets:** `video_local_path` (HF `Video`), `cover_local_path` (HF `Image`)
- **detail metadata:** `caption` (full on-screen ad copy from `baseDetail.adTitle`), `objectives_detail` (list of `{label, value}` for multi-objective ads), `objectives_named` (derived human-readable labels), `keyword_list`, `pattern_label`, `landing_page`, `countries_delivered`, `voice_over`
- **engagement (creative-global):** `ctr_global` (percentile rank, lower = better), `cost_bucket` (0 = Low, 1 = Med, 2 = High), `like_cnt`, `comments_cnt`, `shares_cnt`
- **per-second curves (1 Hz, length = ceil(duration) + 1):** `ctr_curve`, `cvr_curve`, `clicks_curve`, `conversion_curve`, `retention_curve`

For the full schema and gotchas, see `data/SCHEMA.md` in the source repo.

## Usage

```python
from datasets import load_dataset

ds = load_dataset("liangyuch/tiktok-ad-rl-preview", split="train")
row = ds[0]
row["cover_local_path"]  # PIL.Image
row["video_local_path"]  # decord Video
row["ctr_curve"]         # list[float], 1 Hz over playback seconds
```

## Provenance and intended use

- Source: ads.tiktok.com/business/creativecenter (publicly browsable).
- Scraped via Playwright route interception (`tiktok-ad-rl` crawler, tag `v2.0.0`).
- **Research use only.** Under TikTok's general Terms of Service, automated collection of platform data is restricted. This set is shared as a private, small-scale verification artifact for an academic crawler. It is not for commercial use, redistribution, or training models intended for production.
- Some videos may be Spark Ads (with creator usernames burned in). Advertiser IP belongs to the original advertisers.

## Known caveats (read before analysis)

- `ctr_global` is a **percentile rank**, not a click-through rate. Lower = better.
- `cost_bucket` is an **ordinal** (0/1/2). About 85% of these top ads are High.
- Curves are indexed by **playback second**, not calendar time. Values are TikTok's full-precision floats; the web UI rounds them for display only.
- `ad_title` (short headline) ≠ `caption` (full on-screen copy). Both columns are present and meaningful.
- `objective_key` reflects the slice filter that surfaced the ad. The ad's full objective set is in `objectives_detail`, which is often multi-valued.
- Some older `ad_id`s may have been delisted by TikTok after our crawl. Those rows still have populated curves and engagement, but `caption` and `objectives_detail` may be NULL because the detail page no longer resolves.
- Videos can be partially watermarked (Spark Ads).
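The caveats above are easy to trip over when working with rows programmatically. What follows is a minimal sketch, not code from the `tiktok-ad-rl` repo. It assumes each row is a plain dict keyed by the schema's column names, as `ds[i]` returns.

The helper names are hypothetical. They do four things:

- decode `cost_bucket`;
- check curve lengths against ceil(duration) + 1;
- rank ads best-first by `ctr_global`;
- drop rows whose detail fields are NULL.

The card does not name a duration column, so the sketch takes the duration as an explicit argument.

```python
import math
from typing import Any, Optional

# Ordinal mapping documented in the card: 0 = Low, 1 = Med, 2 = High.
COST_BUCKET_LABELS = {0: "Low", 1: "Med", 2: "High"}

# The five per-second curve columns listed in the schema.
CURVE_COLUMNS = ("ctr_curve", "cvr_curve", "clicks_curve", "conversion_curve", "retention_curve")


def cost_bucket_label(bucket: Optional[int]) -> Optional[str]:
    """Map the ordinal cost_bucket code to its label; None passes through."""
    return None if bucket is None else COST_BUCKET_LABELS[bucket]


def expected_curve_length(duration_s: float) -> int:
    """Curves are sampled at 1 Hz with length ceil(duration) + 1."""
    return math.ceil(duration_s) + 1


def curves_have_expected_length(row: dict[str, Any], duration_s: float) -> bool:
    """True if every non-null curve in the row has ceil(duration) + 1 samples.

    duration_s is a hypothetical caller-supplied parameter: the card does not
    name a duration column, so where this value comes from is an assumption.
    """
    n = expected_curve_length(duration_s)
    return all(row.get(col) is None or len(row[col]) == n for col in CURVE_COLUMNS)


def rank_by_ctr(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Order ads best-first. ctr_global is a percentile rank where lower = better,
    so an ascending sort puts the strongest creatives first."""
    return sorted(rows, key=lambda r: r["ctr_global"])


def with_resolved_detail(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep rows whose detail page resolved, i.e. caption and objectives_detail
    are both non-NULL (delisted ads keep curves/engagement but lose these)."""
    return [
        r for r in rows
        if r.get("caption") is not None and r.get("objectives_detail") is not None
    ]
```

As a usage example, `rank_by_ctr(with_resolved_detail(list(ds.remove_columns(["video_local_path", "cover_local_path"]))))` would sort the fully resolved rows best-first. Dropping the two media columns first avoids decoding images and videos just to read tabular fields.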
Provider:
liangyuch