
Aulvem/recall-radar

Hugging Face · Updated 2026-04-28 · Indexed 2026-05-03
Download link:
https://hf-mirror.com/datasets/Aulvem/recall-radar
Resource summary:
---
language:
- en
- ja
license: cc-by-4.0
tags:
- product-safety
- recalls
- consumer-protection
- cross-border-ecommerce
- cosmetic
- baby-product
- food-safety
size_categories:
- n<1K
task_categories:
- text-classification
- question-answering
pretty_name: Cross-Border Product Recall Dataset
configs:
- config_name: default
  data_files:
  - split: train
    path: recall_radar_v0.1.csv
---

# Cross-Border Product Recall Dataset

A structured, machine-readable dataset of consumer-facing product recalls in three high-stakes categories (cosmetic, baby_product, food) across three regions (US, EU, JP), drawn from four authoritative agencies. Designed for cross-border e-commerce sellers and AI shopping assistants.

## Overview

Product recalls are scattered across CPSC's web table, FDA's RSS feed, Safety Gate's search interface, and 消費者庁's Japanese-language portal. None of them publish in a uniform structure, and most online aggregators are stale or anonymous.

`recall_radar` provides a typed, AI-readable layer with explicit `current_state` flags and an append-only `timeline` per recall, so a downstream agent can answer "is this product currently under recall?" without scraping.

Every record cites the issuing authority's official recall page and carries an explicit `last_checked_at` date and a `confidence` rating.

> **Reference data, not safety advice. Always verify with the issuing authority's official recall page before discarding, returning, or making purchase decisions.**

## What's in v0.1

| Metric | Value |
|---|---|
| Records (planned) | 30–50 |
| Authorities | 4 (CPSC, FDA, EU Safety Gate, 消費者庁) |
| Regions | 3 (US, EU, JP) |
| Categories | 3 (cosmetic, baby_product, food) |
| Window | Rolling past 3 months (`announced_date` ≥ 2026-02-01) |
| Schema version | 0.1 |
| License | CC-BY 4.0 |

The four authorities covered:

1. **CPSC**: U.S. Consumer Product Safety Commission (cosmetic + baby_product)
2. **FDA**: U.S. Food and Drug Administration (food + some cosmetic)
3. **EU Safety Gate**: pan-EU consumer-product alert system (successor to RAPEX)
4. **消費者庁**: Japan Consumer Affairs Agency (Recall Information Site)

## How to use

### Python (`datasets` library)

```python
from datasets import load_dataset

ds = load_dataset("Aulvem/recall-radar")
print(ds["train"][0])
# {'id': 'recall_us_cpsc_2026_04_001', 'record_type': 'product_recall', 'country': 'US', ...}
```

### Direct JSON fetch (preserves nested timeline)

The CSV backs the default config's `train` split and is flattened, with the timeline collapsed to a latest-event view. The JSON file preserves the full `timeline` array; use it if your code needs to walk the complete event log.

```python
import json
import urllib.request

url = "https://huggingface.co/datasets/Aulvem/recall-radar/resolve/main/recall_radar_v0.1.json"
data = json.load(urllib.request.urlopen(url))

# Filter by category
cosmetic_recalls = [r for r in data["records"] if r["product_category"] == "cosmetic"]

# Walk the timeline of one record
for ev in data["records"][0]["timeline"]:
    print(ev["date"], ev["event_type"], ev["description_en"])
```

### Schema validation

```python
import json
import urllib.request

import jsonschema  # pip install jsonschema

base = "https://huggingface.co/datasets/Aulvem/recall-radar/resolve/main"
schema = json.load(urllib.request.urlopen(f"{base}/schema.json"))
dataset = json.load(urllib.request.urlopen(f"{base}/recall_radar_v0.1.json"))
jsonschema.validate(dataset, schema)  # raises jsonschema.ValidationError if the file does not conform
```

## Schema

The full JSON Schema (Draft 2020-12) ships in this dataset as `schema.json`. v0.1 has one `record_type`; the `oneOf` wrapper anticipates additive v0.2+ types.
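Because v0.1 defines a single `record_type` while the `oneOf` wrapper (and the roadmap) anticipate new types, a consumer can stay forward-compatible by dispatching on `record_type` and skipping values it does not recognize. The sketch below is a minimal illustration assuming records are plain dicts loaded from the JSON file; `KNOWN_RECORD_TYPES` and `iter_known_records` are hypothetical names, not part of the dataset or its tooling.

```python
from typing import Any, Iterable, Iterator

# Hypothetical allow-list: record_type values this consumer knows how to handle.
# v0.1 defines only "product_recall".
KNOWN_RECORD_TYPES = {"product_recall"}


def iter_known_records(records: Iterable[dict[str, Any]]) -> Iterator[dict[str, Any]]:
    """Yield only records whose record_type this consumer understands.

    Skipping unknown types (rather than failing on them) keeps a v0.1
    consumer working if later versions add new record types.
    """
    for record in records:
        if record.get("record_type") in KNOWN_RECORD_TYPES:
            yield record


# Hypothetical sample: one v0.1 record plus a type a future version might add.
sample = [
    {"id": "recall_us_cpsc_2026_04_001", "record_type": "product_recall"},
    {"id": "example_future_record", "record_type": "safety_alert"},
]
print([r["id"] for r in iter_known_records(sample)])
# ['recall_us_cpsc_2026_04_001']
```

The same reasoning applies to fields: since schema changes are described as additive, reading only the keys you need and ignoring unfamiliar ones is the safer default.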
### `product_recall` fields

| Field | Type | Notes |
|---|---|---|
| `id` | string | `recall_<cc>_<authority>_<YYYY>_<MM>_<NNN>` |
| `record_type` | const | `"product_recall"` |
| `country` | string | ISO 3166-1 alpha-2 (uppercase); `"EU"` for Safety Gate |
| `recall_authority_en` | string | Issuing agency's official name |
| `recall_id_official` | string | Authority's verbatim identifier (e.g. `"26-001"`) |
| `recall_url` | string (URI) | Single canonical authority detail page |
| `announced_date` | string (ISO date) | Public announcement date |
| `product_name_en/ja` | string | Authority's verbatim spelling preferred |
| `product_category` | enum | `cosmetic` / `baby_product` / `food` / `other` |
| `manufacturer_name` / `brand_name` | string \| null | |
| `affected_units` | number \| null | Authority's stated estimate (the verbatim phrase goes in `notes_en` if no number is given) |
| `hazard_type` | enum | `chemical` / `mechanical` / `choking` / `contamination` / `allergen` / `labeling` / `burn_fire` / `electrical` / `fall` / `other` / `unknown` |
| `hazard_description_en/ja` | string | Authority's wording preserved |
| `remedy_en/ja` | string \| null | Refund / exchange / repair / destroy instructions |
| `affected_regions` | array of country codes | Always non-empty |
| `notes_en/ja` | string \| null | Verbatim phrases, lot ranges, distribution period |
| `source_urls` | array of URIs | All sources cited |
| `last_checked_at` | string (ISO date) | Verification date |
| `confidence` | enum | `high` / `medium` / `low` / `unknown` |
| `current_state` | object | `{ status, as_of }`; `status` enum: `active` / `expanded` / `closed` / `superseded` / `unknown` |
| `timeline` | array of objects | Append-only event log; `event_type` is a 21-value enum. Substantive events are in descending order, with the `initial_verification` foundation marker pinned to the end of the array |

The CSV is a flattened view of the same data (timeline collapsed to a latest-event summary; the full timeline is preserved in the JSON only).
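As an illustration of how these fields fit together, the sketch below works on one record loaded from the JSON file: it splits `id` into its documented components, answers "is this still under recall?" from `current_state.status`, and reads the newest substantive `timeline` event. It is a minimal sketch under stated assumptions, not the project's own tooling: the regex assumes lowercase id tokens as in the sample id `recall_us_cpsc_2026_04_001`, counting `expanded` as still in effect is a judgment call, "descending" is read as newest-first, and the sample record (including the `announced` event type) is hypothetical.

```python
import re
from typing import Any, Optional

# Documented id layout: recall_<cc>_<authority>_<YYYY>_<MM>_<NNN>.
# Assumption: tokens are lowercase, as in the sample id. <authority> may itself
# contain underscores; the fixed-width date/sequence suffix anchors the split.
ID_PATTERN = re.compile(
    r"recall_(?P<cc>[a-z0-9]+)_(?P<authority>[a-z0-9_]+)"
    r"_(?P<year>\d{4})_(?P<month>\d{2})_(?P<seq>\d{3})"
)

# Assumption: "active" and "expanded" both mean the recall is still in effect.
OPEN_STATUSES = {"active", "expanded"}


def parse_recall_id(record_id: str) -> dict[str, str]:
    """Split a recall id into its components; raise ValueError on a mismatch."""
    match = ID_PATTERN.fullmatch(record_id)
    if match is None:
        raise ValueError(f"unexpected id format: {record_id!r}")
    return match.groupdict()


def is_currently_recalled(record: dict[str, Any]) -> bool:
    """Answer "is this product currently under recall?" from current_state.status."""
    return record.get("current_state", {}).get("status") in OPEN_STATUSES


def latest_substantive_event(record: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the first timeline event that is not the initial_verification marker.

    Assumes the documented descending order is newest-first, so the first
    substantive event in the array is the most recent one.
    """
    for event in record.get("timeline", []):
        if event.get("event_type") != "initial_verification":
            return event
    return None


# Hypothetical sample record; "announced" is an illustrative event_type value.
record = {
    "id": "recall_us_cpsc_2026_04_001",
    "current_state": {"status": "active", "as_of": "2026-04-28"},
    "timeline": [
        {"date": "2026-04-15", "event_type": "announced"},
        {"date": "2026-04-20", "event_type": "initial_verification"},
    ],
}
print(parse_recall_id(record["id"]))
# {'cc': 'us', 'authority': 'cpsc', 'year': '2026', 'month': '04', 'seq': '001'}
print(is_currently_recalled(record), latest_substantive_event(record)["date"])
# True 2026-04-15
```

Treating `unknown` as "not currently recalled" is deliberately conservative only in one direction; a consumer-facing agent may prefer to surface `unknown` as "check the authority's page" instead.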
Both the CSV and JSON files are kept bit-identical with their canonical sources in the GitHub repository.

## Important notice

> **Reference only. Always verify with the issuing authority's official recall page before discarding, returning, or making purchase decisions.**

Product safety recalls have direct consumer safety implications. Confirm the latest status directly with the issuing authority's official recall page (CPSC, FDA, EU Safety Gate, 消費者庁) before taking any consumer action. The dataset maintainers accept no liability for outcomes derived from this data.

## Update cadence

The dataset is automatically re-scanned and re-extracted weekly (Mondays 03:00 UTC) by a GitHub Actions workflow in the source repository. Each run produces a pull request with a structured diff against the prior snapshot. Updates are merged into the canonical dataset only after human review, and the merged result is mirrored here on Hugging Face.

You can pin to a specific revision using standard Hugging Face dataset versioning, e.g. `load_dataset("Aulvem/recall-radar", revision="<commit-sha>")`.

## Roadmap

| Version | Focus |
|---|---|
| v0.1 (current) | 30–50 recalls across 4 authorities and 3 categories, rolling 3-month window |
| v0.2 | New `record_type` values: `safety_alert`, `market_withdrawal`; expanded categories |
| v0.3 | Incident counts, brand cross-references, and distribution-channel detail |
| v1.0 | Public REST API with category / hazard / brand filters and webhook subscriptions |
| v2.0 | MCP server for direct AI-agent integration (shopping assistants, compliance copilots) |

Schema changes are additive, so consumers pinned to v0.1 will not be broken by later versions.

## Source repository

The canonical source, weekly extraction pipeline, schema definition, and contribution process all live on GitHub:

**https://github.com/aulvem/recall_radar**

Issues, pull requests, and inaccuracy reports should go there. This Hugging Face dataset is a downstream mirror.
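Since this mirror is refreshed on the weekly cadence described under *Update cadence*, a consumer may want to flag records that have not been re-verified within roughly one cycle before relying on them. The sketch below assumes each weekly run refreshes `last_checked_at` for the records it re-verifies and that the field uses `YYYY-MM-DD`; the 8-day threshold (one cycle plus a day of grace) and the function name `stale_records` are hypothetical choices, not part of the dataset.

```python
from datetime import date, timedelta
from typing import Any, Iterable, Optional

# Assumption: one weekly re-scan cycle plus a one-day grace period.
MAX_AGE = timedelta(days=8)


def stale_records(
    records: Iterable[dict[str, Any]], today: Optional[date] = None
) -> list[str]:
    """Return the ids of records whose last_checked_at is older than MAX_AGE."""
    if today is None:
        today = date.today()
    stale = []
    for record in records:
        checked = date.fromisoformat(record["last_checked_at"])  # "YYYY-MM-DD"
        if today - checked > MAX_AGE:
            stale.append(record["id"])
    return stale


# Hypothetical records; the ids are placeholders, not real dataset entries.
records = [
    {"id": "recall_a", "last_checked_at": "2026-04-27"},
    {"id": "recall_b", "last_checked_at": "2026-04-01"},
]
print(stale_records(records, today=date(2026, 5, 3)))
# ['recall_b']
```

Passing `today` explicitly keeps the check deterministic in tests. A stale flag is a cue to re-check the authority's page, not evidence that the recall status has changed.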
## How to cite

If you use this dataset in research, products, or downstream tooling, please cite it as:

> recall_radar contributors (2026). *Cross-Border Product Recall Dataset* (v0.1). Available at https://huggingface.co/datasets/Aulvem/recall-radar (mirror) and https://github.com/aulvem/recall_radar (canonical), licensed under CC-BY 4.0.

CC-BY 4.0 requires attribution. The citation above, or any substantively equivalent form that names the dataset, version, and license, satisfies that requirement.

## License

Released under [Creative Commons Attribution 4.0 International (CC-BY 4.0)](https://creativecommons.org/licenses/by/4.0/). You are free to share and adapt, including for commercial use, provided you give appropriate credit.

The underlying recall text and authority names remain the property of each authority. This dataset extracts factual information (product names, hazard types, dates, identifiers) under principles applicable to factual extraction; it does not redistribute the original prose. If you republish raw authority-authored text obtained via the source URLs, observe each authority's own terms.

## Contact

Questions, bug reports, and corrections via GitHub issues:

**https://github.com/aulvem/recall_radar/issues**

Please cite the official source page when reporting an inaccuracy.
Provider:
Aulvem