Neurofold/getnid

Source: Hugging Face | Updated 2026-02-25 | Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/Neurofold/getnid
Description:
---
license: apache-2.0
tags:
- knowledge-graph
- wiki-data
- wikipedia
language:
- en
pretty_name: "GetNID: Neurofold Identity Registry v1.0"
size_categories:
- 1M<n<10M
---

# GetNID: Neurofold Identity Registry v1.0

**A deterministic, cryptographically-immutable identifier registry for 6.7 million Wikipedia concepts.**

GetNID mints a canonical **Neurofold ID (NID)** for every resolved Wikipedia article, bridging human-readable titles and Wikidata QIDs through a trustless, serverless resolution protocol. The registry is designed to function as foundational namespace infrastructure for decentralized knowledge systems.

---

## What is a NID?

A Neurofold ID is a deterministic 13-character identifier derived directly from the SHA-256 hash of a concept's Wikidata QID.

```
"Mathematics"     → Q395 → N5336CE94E17E
"Albert Einstein" → Q937 → N6F2A8C14B301
```

NIDs are **permanent**, **content-addressed**, and **collision-resistant**. They require no central authority to issue or resolve.

---

## Resolution Architecture

The registry implements a **double-sharding scheme** for O(1) lookup with no index scans, no search, and no server-side compute.

```
Title lookup (double-hop):
  sha256(normalize(title))[:2] % 256 → router shard → NID
  NID[1:5] % 8192                    → data shard   → record

QID / NID lookup (single-hop):
  sha256(QID)[:4] % 8192             → data shard   → record
  NID[1:5] % 8192                    → data shard   → record
```

| Layer | Shards | Purpose |
|---|---|---|
| Data shards | 8,192 | `ledger(nid, qid, title, lang)` |
| Router shards | 256 | `routes(key_hash, target_nid)` |

All shards are static SQLite files (~100 KB each). The full registry is ~1.5 GB. Every shard is independently verifiable against a cryptographic `manifest.json` generated at genesis time.

---

## Live Demo

**[getnid.org](https://getnid.org)**: resolve any Wikipedia title, QID, or NID in the browser. Resolution runs entirely client-side via a Web Worker using sql.js (SQLite compiled to WebAssembly).
Resolved shards are cached locally via the browser's **Origin Private File System (OPFS)**, enabling zero-latency offline resolution after first access. Users can opt into a full global sync to permanently mirror the entire registry locally.

---

## Dataset Structure

```
v1/
├── manifest.json          # SHA-256 hash of every shard (trustless verification)
├── meta.json              # Version, genesis timestamp, master checksum, metrics
├── shards/
│   ├── shard_0000.db      # SQLite: ledger table
│   ├── shard_0001.db
│   └── ... (8,192 total)
└── routers/
    ├── router_000.db      # SQLite: routes table
    ├── router_001.db
    └── ... (256 total)
```

### `ledger` schema (data shards)

```sql
CREATE TABLE ledger (
    nid   TEXT PRIMARY KEY,  -- e.g. N5336CE94E17E
    qid   TEXT UNIQUE,       -- e.g. Q395
    title TEXT,              -- e.g. Mathematics
    lang  TEXT               -- e.g. en
);
```

### `routes` schema (router shards)

```sql
CREATE TABLE routes (
    key_hash   TEXT PRIMARY KEY,  -- sha256(norm_title)[:16]
    shard_id   INTEGER,
    norm_title TEXT,
    target_nid TEXT
);
```

---

## Usage

### Python (local resolution)

```python
from getnid.registry import LocalRegistryClient
from pathlib import Path

client = LocalRegistryClient(Path("./v1"))

# Resolve by title
client.get_by_title("Mathematics")
# → {"nid": "N5336CE94E17E", "qid": "Q395", "title": "Mathematics", "lang": "en"}

# Resolve by QID
client.get_by_qid("Q42")
# → {"nid": "N...", "qid": "Q42", "title": "Douglas Adams", "lang": "en"}

# Resolve by NID
client.get_by_nid("N5336CE94E17E")
# → {"nid": "N5336CE94E17E", "qid": "Q395", "title": "Mathematics", "lang": "en"}
```

### Direct shard query (any language)

```python
import hashlib
import sqlite3
from contextlib import closing
from typing import Optional


def resolve_qid(qid: str, shard_dir: str) -> Optional[dict]:
    """Look up a QID in its data shard; return None if no record exists."""
    qid = qid.upper()
    # The first 4 hex digits of sha256(QID) select one of the 8,192 data shards.
    h = hashlib.sha256(qid.encode()).hexdigest().upper()
    shard_id = int(h[:4], 16) % 8192
    db_path = f"{shard_dir}/shard_{shard_id:04d}.db"
    # closing() closes the connection on exit; sqlite3's own context
    # manager only commits or rolls back the transaction.
    with closing(sqlite3.connect(db_path)) as conn:
        row = conn.execute(
            "SELECT nid, qid, title, lang FROM ledger WHERE qid = ?",
            (qid,),
        ).fetchone()
    return dict(zip(["nid", "qid", "title", "lang"], row)) if row else None
```

### JavaScript / Browser

```javascript
// Uppercase hex SHA-256 digest of a string, via the Web Crypto API.
const sha256 = async (text) => {
  const buf = await crypto.subtle.digest('SHA-256', new TextEncoder().encode(text));
  return Array.from(new Uint8Array(buf))
    .map(b => b.toString(16).padStart(2, '0'))
    .join('')
    .toUpperCase();
};

async function resolveQID(qid) {
  const hash = await sha256(qid.toUpperCase());
  // The first 4 hex digits select one of the 8,192 data shards.
  const shardId = parseInt(hash.slice(0, 4), 16) % 8192;
  const fileName = `shard_${String(shardId).padStart(4, '0')}.db`;
  // Fetch shard, query with sql.js
}
```

---

## Metrics

| Metric | Value |
|---|---|
| Minted NIDs | ~6.7M |
| Languages | en (v1.0) |
| Data shards | 8,192 |
| Router shards | 256 |
| Avg shard size | ~100 KB |
| Total payload | ~1.5 GB |
| Lookup complexity | O(1) |

---

## Licensing

| Component | License |
|---|---|
| Registry code & protocol | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
| Ledger data (derived from Wikidata) | [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/) |

Wikidata content is made available by the Wikimedia Foundation under CC BY-SA 4.0. Use of the registry data is subject to those upstream terms.

---

## Repository

[github.com/neurofold/getnid](https://github.com/neurofold/getnid)

---

## Citation

```bibtex
@misc{getnid2026,
  author = {Larson, JB},
  title  = {GetNID: Neurofold Identity Registry v1.0},
  year   = {2026},
  url    = {https://huggingface.co/datasets/Neurofold/getnid},
  note   = {Deterministic identifier registry for 6.7M Wikipedia concepts}
}
```
Provider:
Neurofold
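
---

## Appendix: Routing Sketch

The routing rules listed under Resolution Architecture in the dataset card can be illustrated with a few pure functions. This is a minimal sketch under stated assumptions, not the registry's actual implementation. It assumes a NID is `N` followed by the first 12 uppercase hex characters of `sha256(QID)`; the card states only that a NID has 13 characters and is derived from that hash, although the two single-hop rules do imply that `NID[1:5]` matches the first four hex digits of the hash. It also assumes that `normalize` lowercases the title and collapses whitespace, since the real normalization rule is not documented in the card. All function names are hypothetical.

```python
import hashlib

DATA_SHARDS = 8192    # data shard count, from the card's shard table
ROUTER_SHARDS = 256   # router shard count, from the card's shard table


def sha256_hex(text: str) -> str:
    """Uppercase hex SHA-256 digest, matching the card's query examples."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest().upper()


def mint_nid(qid: str) -> str:
    """ASSUMPTION: NID = 'N' + first 12 hex characters of sha256(QID)."""
    return "N" + sha256_hex(qid.upper())[:12]


def data_shard_for_qid(qid: str) -> int:
    """Single-hop rule: sha256(QID)[:4] % 8192."""
    return int(sha256_hex(qid.upper())[:4], 16) % DATA_SHARDS


def data_shard_for_nid(nid: str) -> int:
    """Single-hop rule: NID[1:5] % 8192 (skips the leading 'N')."""
    return int(nid[1:5], 16) % DATA_SHARDS


def normalize(title: str) -> str:
    """ASSUMPTION: placeholder normalization (collapse whitespace, lowercase)."""
    return " ".join(title.split()).lower()


def router_shard_for_title(title: str) -> int:
    """First hop of a title lookup: sha256(normalize(title))[:2] % 256."""
    return int(sha256_hex(normalize(title))[:2], 16) % ROUTER_SHARDS


def shard_filename(shard_id: int) -> str:
    """File name of a data shard, e.g. shard_0007.db."""
    return f"shard_{shard_id:04d}.db"


def router_filename(router_id: int) -> str:
    """File name of a router shard, e.g. router_007.db."""
    return f"router_{router_id:03d}.db"
```

Under these assumptions, `data_shard_for_nid(mint_nid(qid))` and `data_shard_for_qid(qid)` always agree, which is what lets a QID lookup and a NID lookup land on the same data shard in a single hop. Whether `mint_nid` reproduces the NIDs published in the registry depends on the derivation assumption above and has not been checked against the data.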