Aulvem/recall-radar
Hugging Face · Updated 2026-04-28 · Indexed 2026-05-03
Download link:
https://hf-mirror.com/datasets/Aulvem/recall-radar
Resource description:
---
language:
- en
- ja
license: cc-by-4.0
tags:
- product-safety
- recalls
- consumer-protection
- cross-border-ecommerce
- cosmetic
- baby-product
- food-safety
size_categories:
- n<1K
task_categories:
- text-classification
- question-answering
pretty_name: Cross-Border Product Recall Dataset
configs:
- config_name: default
data_files:
- split: train
path: recall_radar_v0.1.csv
---
# Cross-Border Product Recall Dataset
A structured, machine-readable dataset of consumer-facing product recalls in three high-stakes categories (cosmetic, baby_product, food) across three regions (US, EU, JP), drawn from four authoritative agencies. Designed for cross-border e-commerce sellers and AI shopping assistants.
## Overview
Product recalls are scattered across CPSC's web table, FDA's RSS feed, Safety Gate's search interface, and 消費者庁's Japanese-language portal. None of them publish in a uniform structure, and most online aggregators are stale or anonymous. `recall_radar` provides a typed, AI-readable layer with explicit `current_state` flags and an append-only `timeline` per recall — so a downstream agent can answer "is this product currently under recall?" without scraping.
Every record cites the issuing authority's official recall page and carries an explicit `last_checked_at` date and a `confidence` rating.
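As an illustration of how `current_state` supports that question, here is a minimal sketch of the check a downstream agent might run over the JSON records. It assumes the field layout described in the Schema section and treats `active` and `expanded` as "currently under recall". That status mapping and the helper name `currently_recalled` are our assumptions, not part of the dataset.

```python
from typing import Any, Iterable, Optional

# Assumption: "active" and "expanded" mean the recall is still in effect;
# "closed", "superseded", and "unknown" are not treated as current.
OPEN_STATUSES = {"active", "expanded"}


def currently_recalled(
    records: Iterable[dict[str, Any]],
    name_query: str,
    region: Optional[str] = None,
) -> list[dict[str, Any]]:
    """Records whose product_name_en contains name_query (case-insensitive)
    and whose current_state.status is open. If region is given (e.g. "JP"),
    it must also appear in affected_regions."""
    query = name_query.casefold()
    hits = []
    for r in records:
        if query not in (r.get("product_name_en") or "").casefold():
            continue
        if (r.get("current_state") or {}).get("status") not in OPEN_STATUSES:
            continue
        if region is not None and region not in (r.get("affected_regions") or []):
            continue
        hits.append(r)
    return hits
```

Substring matching on product names is deliberately crude; a real agent would likely also compare brand names or the lot ranges recorded in `notes_en`.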
> **Reference data, not safety advice. Always verify with the issuing authority's official recall page before discarding, returning, or making purchase decisions.**
## What's in v0.1
| Metric | Value |
|---|---|
| Records (planned) | 30–50 |
| Authorities | 4 (CPSC, FDA, EU Safety Gate, 消費者庁) |
| Regions | 3 (US, EU, JP) |
| Categories | 3 (cosmetic, baby_product, food) |
| Window | rolling past 3 months (`announced_date` ≥ 2026-02-01) |
| Schema version | 0.1 |
| License | CC-BY 4.0 |
The 4 authorities covered:
1. **CPSC** — U.S. Consumer Product Safety Commission (cosmetic + baby_product)
2. **FDA** — U.S. Food and Drug Administration (food + some cosmetic)
3. **EU Safety Gate** — pan-EU consumer-product alert system (RAPEX successor)
4. **消費者庁** — Japan Consumer Affairs Agency (Recall Information Site)
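To compare a downloaded snapshot with the figures in the table above, a sketch like the following tallies records by authority, country, and category, and lists any record announced before the window start. It assumes the JSON layout used in the How to use examples (a `records` list of `product_recall` objects); the function name `summarize` is hypothetical, and the 2026-02-01 default is taken from the table.

```python
from collections import Counter
from datetime import date
from typing import Any


def summarize(records: list[dict[str, Any]],
              window_start: date = date(2026, 2, 1)) -> dict[str, Any]:
    """Tally records and flag any announced before window_start."""
    return {
        "records": len(records),
        "by_authority": Counter(r["recall_authority_en"] for r in records),
        "by_country": Counter(r["country"] for r in records),
        "by_category": Counter(r["product_category"] for r in records),
        # announced_date is an ISO date string, e.g. "2026-03-15"
        "outside_window": [
            r["id"] for r in records
            if date.fromisoformat(r["announced_date"]) < window_start
        ],
    }
```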
## How to use
### Python (`datasets` library)
```python
from datasets import load_dataset
ds = load_dataset("Aulvem/recall-radar")
print(ds["train"][0])
# {'id': 'recall_us_cpsc_2026_04_001', 'record_type': 'product_recall', 'country': 'US', ...}
```
### Direct JSON fetch (preserves nested timeline)
The CSV backs the default config (its single `train` split) and is flattened, with the timeline collapsed to a latest-event view; the JSON file preserves the full `timeline` array. Use the JSON if your code needs to walk the complete event log.
```python
import json
import urllib.request

url = "https://huggingface.co/datasets/Aulvem/recall-radar/resolve/main/recall_radar_v0.1.json"
with urllib.request.urlopen(url) as resp:
    data = json.load(resp)

# Filter by category
cosmetic_recalls = [r for r in data["records"] if r["product_category"] == "cosmetic"]

# Walk the timeline of one record
for ev in data["records"][0]["timeline"]:
    print(ev["date"], ev["event_type"], ev["description_en"])
```
### Schema validation
```python
import json
import urllib.request

import jsonschema  # pip install jsonschema

base = "https://huggingface.co/datasets/Aulvem/recall-radar/resolve/main"
with urllib.request.urlopen(f"{base}/schema.json") as resp:
    schema = json.load(resp)
with urllib.request.urlopen(f"{base}/recall_radar_v0.1.json") as resp:
    dataset = json.load(resp)

# Raises jsonschema.ValidationError (the best-matching error) if the data does not conform
jsonschema.validate(dataset, schema)
```
## Schema
The full JSON Schema (Draft 2020-12) ships in this dataset as `schema.json`. v0.1 defines a single `record_type` (`product_recall`); the `oneOf` wrapper anticipates additional record types in v0.2+.
### `product_recall` fields
| Field | Type | Notes |
|---|---|---|
| `id` | string | `recall_<cc>_<authority>_<YYYY>_<MM>_<NNN>` |
| `record_type` | const | `"product_recall"` |
| `country` | string | ISO 3166-1 alpha-2 (uppercase); `"EU"` for Safety Gate |
| `recall_authority_en` | string | Issuing agency's official name |
| `recall_id_official` | string | Authority's verbatim identifier (e.g. `"26-001"`) |
| `recall_url` | string (URI) | Single canonical authority detail page |
| `announced_date` | string (ISO date) | Public announcement date |
| `product_name_en/ja` | string | Authority's verbatim spelling preferred |
| `product_category` | enum | `cosmetic` / `baby_product` / `food` / `other` |
| `manufacturer_name` / `brand_name` | string \| null | |
| `affected_units` | number \| null | Authority's stated estimate (verbatim phrase in `notes_en` if no number) |
| `hazard_type` | enum | `chemical` / `mechanical` / `choking` / `contamination` / `allergen` / `labeling` / `burn_fire` / `electrical` / `fall` / `other` / `unknown` |
| `hazard_description_en/ja` | string | Authority's wording preserved |
| `remedy_en/ja` | string \| null | Refund / exchange / repair / destroy instructions |
| `affected_regions` | array of country codes | Always non-empty |
| `notes_en/ja` | string \| null | Verbatim phrases, lot ranges, distribution period |
| `source_urls` | array of URIs | All sources cited |
| `last_checked_at` | string (ISO date) | Verification date |
| `confidence` | enum | `high` / `medium` / `low` / `unknown` |
| `current_state` | object | `{ status, as_of }`; status enum `active` / `expanded` / `closed` / `superseded` / `unknown` |
| `timeline` | array of objects | Append-only event log (21-value `event_type` enum); substantive events in descending date order, with the `initial_verification` foundation marker pinned to the end of the array |
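
Two conventions in this table lend themselves to a mechanical check: the `id` pattern and the `timeline` ordering. The sketch below is our reading of them, not code from the source repository. It assumes that `<cc>` is two lowercase letters matching `country` (as in the `recall_us_cpsc_2026_04_001` example above), that each event has an ISO `date`, and that "descending" means newest-first. `check_record` is a hypothetical helper name.

```python
import re
from typing import Any

# Assumed reading of recall_<cc>_<authority>_<YYYY>_<MM>_<NNN>; the greedy
# authority group still matches if an authority slug contains underscores.
ID_RE = re.compile(r"^recall_([a-z]{2})_(.+)_(\d{4})_(\d{2})_(\d{3})$")


def check_record(record: dict[str, Any]) -> list[str]:
    """Return the convention violations found (empty list if none)."""
    problems = []
    m = ID_RE.match(record.get("id") or "")
    if m is None:
        problems.append("id does not match recall_<cc>_<authority>_<YYYY>_<MM>_<NNN>")
    elif m.group(1).upper() != record.get("country"):
        # Assumption: the <cc> part of the id mirrors the `country` field.
        problems.append("id country code disagrees with `country`")

    timeline = record.get("timeline") or []
    if not timeline or timeline[-1].get("event_type") != "initial_verification":
        problems.append("initial_verification is not pinned to the array tail")
    # Substantive events: everything before the tail marker.
    dates = [ev["date"] for ev in timeline[:-1]]
    # ISO dates compare correctly as strings; newest-first means non-increasing.
    if any(a < b for a, b in zip(dates, dates[1:])):
        problems.append("substantive events are not in descending date order")
    return problems
```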
The CSV is a flattened view of the same data (timeline collapsed to a latest-event summary; full timeline preserved in JSON only). Both files are kept bit-identical with the canonical sources in the GitHub repository.
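A row in the spirit of that flattened view can be approximated from the JSON. Given the ordering convention in the `timeline` row, the latest substantive event should be the first element that is not the `initial_verification` marker. This is a sketch only: the output column names (`current_status`, `latest_event_date`, etc.) and the `|` list delimiter are hypothetical and may not match the columns in `recall_radar_v0.1.csv`.

```python
from typing import Any, Optional


def latest_event(record: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Newest substantive event, relying on the newest-first ordering.
    Falls back to the initial_verification marker if nothing else exists."""
    timeline = record.get("timeline") or []
    for ev in timeline:
        if ev.get("event_type") != "initial_verification":
            return ev
    return timeline[-1] if timeline else None


def flatten(record: dict[str, Any]) -> dict[str, Any]:
    """One flat row: scalars copied, lists joined, nested objects summarized."""
    row: dict[str, Any] = {}
    for key, value in record.items():
        if key == "timeline":
            continue  # summarized below instead
        if key == "current_state":
            row["current_status"] = value.get("status")
            row["current_as_of"] = value.get("as_of")
        elif isinstance(value, list):
            row[key] = "|".join(value)  # hypothetical delimiter
        else:
            row[key] = value
    ev = latest_event(record) or {}
    row["latest_event_date"] = ev.get("date")
    row["latest_event_type"] = ev.get("event_type")
    return row
```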
## Important notice
> **Reference only. Always verify with the issuing authority's official recall page before discarding, returning, or making purchase decisions.**
Product safety recalls have direct consumer safety implications. Confirm the latest status directly with the issuing authority's official recall page (CPSC, FDA, EU Safety Gate, 消費者庁) before taking any consumer action. The dataset maintainers accept no liability for outcomes derived from this data.
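A consumer-facing tool could at least surface which records are due for a manual check against their `recall_url`. The sketch below assumes that a `last_checked_at` older than one weekly cycle, or a `confidence` other than `high`, should trigger re-verification. Both thresholds and the name `needs_reverification` are our assumptions, not dataset rules.

```python
from datetime import date, timedelta
from typing import Any, Optional


def needs_reverification(
    records: list[dict[str, Any]],
    today: Optional[date] = None,
    max_age_days: int = 7,
) -> list[tuple[str, str]]:
    """(id, recall_url) pairs to re-check by hand: last_checked_at older than
    max_age_days, or confidence other than "high".
    The 7-day default (matching the weekly scan) is an assumption."""
    today = today or date.today()
    cutoff = today - timedelta(days=max_age_days)
    flagged = []
    for r in records:
        stale = date.fromisoformat(r["last_checked_at"]) < cutoff
        if stale or r.get("confidence") != "high":
            flagged.append((r["id"], r["recall_url"]))
    return flagged
```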
## Update cadence
The dataset is automatically re-scanned and re-extracted weekly (Mondays 03:00 UTC) by a GitHub Actions workflow in the source repository. Each run produces a pull request with a structured diff against the prior snapshot. Updates are merged into the canonical dataset only after human review, and the merged result is mirrored here on Hugging Face.
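The workflow's own diff format is not described here, but a consumer can compute a comparable record-level diff between two downloaded JSON snapshots, keyed on `id`. The sketch below is not the workflow's implementation, and `diff_snapshots` is a hypothetical name.

```python
from typing import Any


def diff_snapshots(old: list[dict[str, Any]],
                   new: list[dict[str, Any]]) -> dict[str, Any]:
    """Record-level diff keyed on `id`: added ids, removed ids,
    and, for ids present in both, the names of fields whose values differ."""
    old_by_id = {r["id"]: r for r in old}
    new_by_id = {r["id"]: r for r in new}
    changed = {}
    for rid in old_by_id.keys() & new_by_id.keys():
        a, b = old_by_id[rid], new_by_id[rid]
        fields = sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))
        if fields:
            changed[rid] = fields
    return {
        "added": sorted(new_by_id.keys() - old_by_id.keys()),
        "removed": sorted(old_by_id.keys() - new_by_id.keys()),
        "changed": changed,
    }
```

If `last_checked_at` is refreshed on every run, you may want to exclude it so the diff highlights substantive changes.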
You can pin to a specific revision using the standard HF dataset versioning, e.g. `load_dataset("Aulvem/recall-radar", revision="<commit-sha>")`.
## Roadmap
| Version | Focus |
|---|---|
| v0.1 (current) | 30–50 recalls across 4 authorities and 3 categories, rolling 3-month window |
| v0.2 | New `record_type` values: `safety_alert`, `market_withdrawal`; expanded categories |
| v0.3 | Incident counts + brand cross-references + distribution channel detail |
| v1.0 | Public REST API with category / hazard / brand filters and webhook subscriptions |
| v2.0 | MCP server for direct AI-agent integration (shopping assistants, compliance copilots) |
Schema changes are additive — v0.1-pinned consumers will not be broken by later versions.
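Additive changes only stay non-breaking if consumer code tolerates record types and fields it does not recognise. One way to do that, sketched here on the assumption that new types are distinguished by `record_type` (as the `oneOf` wrapper suggests), is to keep only the types you handle and project onto the fields you read. `KNOWN_TYPES`, `USED_FIELDS`, and `load_product_recalls` are hypothetical names.

```python
from typing import Any

# Record types this consumer was written against (v0.1).
KNOWN_TYPES = {"product_recall"}

# Fields this hypothetical consumer reads; anything else is ignored.
USED_FIELDS = ("id", "country", "product_category", "hazard_type", "current_state")


def load_product_recalls(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep known record types and project onto the fields we use, so new
    record types or extra fields in later versions are skipped, not fatal."""
    return [
        {k: r.get(k) for k in USED_FIELDS}
        for r in records
        if r.get("record_type") in KNOWN_TYPES
    ]
```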
## Source repository
Canonical source, weekly extraction pipeline, schema definition, and contribution process all live on GitHub:
**https://github.com/aulvem/recall_radar**
Issues, pull requests, and inaccuracy reports should go there. This Hugging Face dataset is a downstream mirror.
## How to cite
If you use this dataset in research, products, or downstream tooling, please cite as:
> recall_radar contributors (2026). *Cross-Border Product Recall Dataset* (v0.1). Available at https://huggingface.co/datasets/Aulvem/recall-radar (mirror) and https://github.com/aulvem/recall_radar (canonical), licensed under CC-BY 4.0.
CC-BY 4.0 requires attribution; the citation above — or any substantively equivalent form that names the dataset, version, and license — satisfies that requirement.
## License
Released under [Creative Commons Attribution 4.0 International (CC-BY 4.0)](https://creativecommons.org/licenses/by/4.0/). You are free to share and adapt, including for commercial use, provided you give appropriate credit.
The underlying recall text and authority names remain the property of each authority. This dataset extracts factual information (product names, hazard types, dates, identifiers) under principles applicable to factual extraction; it does not redistribute the original prose. If you republish raw authority-authored text obtained via the source URLs, observe each authority's own terms.
## Contact
Questions, bug reports, and corrections via GitHub issues:
**https://github.com/aulvem/recall_radar/issues**
Please cite the official source page when reporting an inaccuracy.
Provided by:
Aulvem



