cahlen/class-numbers-real-quadratic

Name: cahlen/class-numbers-real-quadratic
Creator: cahlen
Published: 2026-04-08 00:36:53
License: CC BY 4.0

Source: Hugging Face (updated 2026-04-08, indexed 2026-04-12)

Download link:

https://hf-mirror.com/datasets/cahlen/class-numbers-real-quadratic

Description:

---
license: cc-by-4.0
task_categories:
- tabular-classification
tags:
- number-theory
- class-numbers
- real-quadratic-fields
- cohen-lenstra
- gpu-computation
- mathematics
- computational-number-theory
- algebraic-number-theory
- continued-fractions
pretty_name: "Class Numbers of Real Quadratic Fields (GPU-Computed)"
size_categories:
- 1B<n<10B
configs:
- config_name: 1e9_to_1e10
  data_files: "data/1e9_to_1e10/*.parquet"
  description: "All fundamental discriminants d in [10^9, 10^10)"
dataset_info:
- config_name: 1e9_to_1e10
  features:
  - name: discriminant
    dtype: uint64
  - name: class_number
    dtype: int32
  splits:
  - name: train
    num_examples: 2735671820
---

# Class Numbers of Real Quadratic Fields

**2.74 billion** class numbers of real quadratic fields Q(√d), computed for every fundamental discriminant d in [10⁹, 10¹⁰) on an 8× NVIDIA B200 DGX node in 30 minutes.

To our knowledge, this is the first openly available per-discriminant class number table at this scale. The previous systematic frontier was d ≤ 10¹¹ (Jacobson, Ramachandran, Williams 2006), but the raw per-discriminant data from that work was never published.

> Part of the [bigcompute.science](https://bigcompute.science) project: GPU-accelerated exploration of open conjectures in number theory and combinatorics.
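As a quick plausibility check on the row count, positive fundamental discriminants have natural density 3/π² among the positive integers, so an interval of length 9 × 10⁹ should contain roughly 2.736 billion of them. The sketch below compares that estimate with the count stated in this card. The function name is hypothetical and not part of the project's code.

```python
from math import pi


def expected_fundamental_discriminants(lo, hi):
    """Approximate count of positive fundamental discriminants in [lo, hi)."""
    # Density 3/pi^2 splits as:
    #   2/pi^2 for squarefree d with d = 1 (mod 4), plus
    #   1/pi^2 for d = 4m with m squarefree and m = 2 or 3 (mod 4).
    return (hi - lo) * 3 / pi**2


estimate = expected_fundamental_discriminants(10**9, 10**10)
reported = 2_735_671_820  # num_examples stated in this card

print(f"estimate: {estimate:,.0f}")
print(f"reported: {reported:,}")
print(f"relative difference: {abs(estimate - reported) / reported:.2e}")
```

The same density also gives a rough row count for any other range before downloading it.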
## Quick Start

```python
from datasets import load_dataset

# Stream the split so the full 2.7-billion-row table is not downloaded up front.
ds = load_dataset(
    "cahlen/class-numbers-real-quadratic",
    "1e9_to_1e10",
    split="train",
    streaming=True,
)

for row in ds.take(10):
    print(f"d = {row['discriminant']}, h(d) = {row['class_number']}")
```

## What's In This Dataset

Every row is a fundamental discriminant d and its class number h(d):

| Column | Type | Description |
|--------|------|-------------|
| `discriminant` | `uint64` | Fundamental discriminant d > 0 |
| `class_number` | `int32` | Class number h(d) of the real quadratic field Q(√d) |

A **fundamental discriminant** is either:

- d ≡ 1 (mod 4) and squarefree, or
- d = 4m where m ≡ 2 or 3 (mod 4) and m is squarefree

The **class number** h(d) measures the failure of unique factorization in the ring of integers of Q(√d). When h(d) = 1, the ring has unique factorization.

## Summary Statistics

| Statistic | Value |
|-----------|-------|
| Range | d ∈ [10⁹, 10¹⁰) |
| Fundamental discriminants | 2,735,671,820 |
| Computation time | 30 minutes |
| Hardware | 8× NVIDIA B200 DGX (1.43 TB VRAM, NVLink 5) |
| Throughput | 1.53 million discriminants/sec |

### Class Number Distribution

Counts for selected values of h:

| h | Count | Fraction |
|---|-------|----------|
| 1 | 456,984,420 | 16.70% |
| 2 | 606,415,562 | 22.17% |
| 3 | 73,409,125 | 2.68% |
| 4 | 540,733,202 | 19.77% |
| 5 | 22,715,143 | 0.83% |
| 6 | 96,852,027 | 3.54% |
| 7 | 10,849,013 | 0.40% |
| 8 | 298,291,861 | 10.90% |
| 9 | 9,027,194 | 0.33% |
| 10 | 30,106,984 | 1.10% |
| 12 | 85,877,392 | 3.14% |
| 16 | 123,589,441 | 4.52% |

### Cohen-Lenstra p-Divisibility

| Divisor | Observed | Cohen-Lenstra (asymptotic) |
|---------|----------|---------------------------|
| 3 divides h | 15.28% | ~43.99% |
| 5 divides h | 4.89% | ~23.84% |
| 7 divides h | 2.35% | ~16.33% |

The asymptotic column appears to correspond to the Cohen-Lenstra values for imaginary quadratic fields, 1 − ∏_{k≥1}(1 − p⁻ᵏ). The real quadratic heuristic, 1 − ∏_{k≥2}(1 − p⁻ᵏ), gives about 15.98%, 4.96%, and 2.37% for p = 3, 5, 7, which is much closer to the observed rates.

## Key Finding: Non-Monotone Convergence

Cohen and Lenstra (1984) predict that h(d) = 1 occurs with probability ≈ 75.446% asymptotically.
Our data shows the observed rate is **decreasing** at this scale:

| Range | h = 1 fraction |
|-------|---------------|
| d < 10⁴ | 42.1% |
| d ~ 10⁶ | 25.7% |
| d ∈ [10⁹, 10¹⁰) | 16.7% |
| Asymptotic prediction | 75.4% |

The rate must eventually reverse and increase toward 75.4%, but at d ~ 10¹⁰ it hasn't turned around yet. This is because genus theory (the 2-part of the class group, determined by the number of prime factors of d) dominates at moderate discriminants. The values h = 2, 4, 8, 16 alone account for 57% of all discriminants. The odd part of the class group, where Cohen-Lenstra actually applies, must eventually dominate, but convergence is extremely slow.

See the [full analysis](https://bigcompute.science/findings/class-number-convergence/) on bigcompute.science.

## Computation Method

For each fundamental discriminant d, we compute h(d) via the analytic class number formula:

```
h(d) = round( sqrt(d) * L(1, χ_d) / (2 * R(d)) )
```

### Step 1: GPU Squarefree Sieve

Each GPU thread tests its candidate d for divisibility by p² for all primes p ≤ √d. The kernel then classifies fundamental discriminants and stream-compacts them into a packed array. Everything runs on-device, so there is no CPU bottleneck.

### Step 2: Regulator R(d)

The regulator R(d) = log(ε_d) is computed from the continued fraction expansion, entirely in log-space to avoid integer overflow at d > 10⁹:

- d ≡ 0 (mod 4): CF expansion of √(d/4), with first D = 1 detection for cycle completion
- d ≡ 1 (mod 4): CF expansion of (1 + √d)/2 with reduced-state cycle detection

### Step 3: L-Function via Euler Product

```
L(1, χ_d) ≈ ∏(p ≤ 99991) (1 - χ_d(p)/p)⁻¹
```

The 9,592 primes are stored in CUDA `__constant__` memory. The Kronecker symbol χ_d(p) = (d/p) is computed via modular exponentiation (Jacobi symbol algorithm).

### Step 4: Assembly

Round sqrt(d) * L / (2R) to the nearest integer. Aggregate statistics are accumulated with atomic histogram updates.
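The four steps above can be sketched on the CPU in plain Python. This is a minimal illustration under stated assumptions, not the project's CUDA kernel: all function names are hypothetical, the squarefree test uses slow trial division instead of a sieve, the Kronecker symbol uses Euler's criterion for odd primes, and the regulator loop expands (P + √d)/2 for both residue classes and stops when the denominator Q returns to 2, which may differ from the cycle detection used in the kernel.

```python
from math import exp, isqrt, log, sqrt


def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i, flag in enumerate(sieve) if flag]


PRIMES = primes_up_to(99991)  # same prime bound as the Euler product above


def is_fundamental(d):
    """Step 1 (CPU stand-in): is d > 1 a fundamental discriminant?"""
    if d <= 1:
        return False
    if d % 4 == 1:
        m = d
    elif d % 4 == 0 and (d // 4) % 4 in (2, 3):
        m = d // 4
    else:
        return False
    # m must be squarefree; trial division is fine for a sketch.
    return all(m % (k * k) for k in range(2, isqrt(m) + 1))


def kronecker(d, p):
    """Kronecker symbol (d/p) for a prime p."""
    if p == 2:
        if d % 2 == 0:
            return 0
        return 1 if d % 8 in (1, 7) else -1
    r = pow(d % p, (p - 1) // 2, p)  # Euler's criterion: 0, 1, or p - 1
    return -1 if r == p - 1 else r


def regulator(d):
    """Step 2: R(d) = log(fundamental unit), summed in log space."""
    s, root = isqrt(d), sqrt(d)
    P, Q = d % 2, 2  # start from (P + sqrt(d)) / 2
    total = 0.0
    while True:
        a = (P + s) // Q            # partial quotient, exact integer floor
        P = a * Q - P
        Q = (d - P * P) // Q
        total += log((P + root) / Q)  # log of the next complete quotient
        if Q == 2:                  # one full period has been traversed
            return total


def class_number(d):
    """Steps 3 and 4: truncated Euler product, then round."""
    log_l = -sum(log(1.0 - kronecker(d, p) / p) for p in PRIMES)
    return round(sqrt(d) * exp(log_l) / (2.0 * regulator(d)))


if __name__ == "__main__":
    for d in (5, 8, 13, 40, 229):
        print(d, is_fundamental(d), class_number(d))
```

Because the Euler product is truncated, the quotient before rounding is only approximate. Rounding recovers h(d) as long as the relative error stays below 1/(2h), which is why the method works best when class numbers are small relative to the truncation accuracy.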
### Validation

- **Exact match** with PARI/GP `qfbclassno()` on 1,000 randomly sampled discriminants across the full range
- h = 1 rate of 42.13% for d < 10⁴ matches PARI exactly
- Cross-validated: regulator values match PARI `quadregulator()` to 12+ digits

## Hardware

| Component | Specification |
|-----------|---------------|
| Node | NVIDIA DGX B200 |
| GPUs | 8× NVIDIA B200 (183 GB VRAM each) |
| Total VRAM | 1.43 TB |
| Interconnect | NVLink 5 (NV18), full mesh |
| CPUs | 2× Intel Xeon Platinum 8570 (112 cores / 224 threads) |
| System RAM | 2 TB DDR5 |

## Reproduce It Yourself

```bash
git clone https://github.com/cahlen/idontknow
cd idontknow

# Compile (adjust -arch for your GPU: sm_100a for B200, sm_120a for RTX 5090)
nvcc -O3 -arch=sm_100a -o class_v2 \
    scripts/experiments/class-numbers/class_numbers_v2.cu -lpthread -lm

# Validate against PARI/GP (should give h=1 at 42.13%)
./class_v2 5 10000

# Full run: d = 10^9 to 10^10 (~30 min on 8x B200, longer on fewer GPUs)
./class_v2 1000000000 10000000000 | tee run.log

# Raw (d, h) binary files appear in data/class-numbers/raw_gpu*.bin
# Format: repeating (uint64 discriminant, int32 class_number) = 12 bytes per record
```

The kernel auto-detects available GPUs and distributes the range evenly.

## Planned Extensions

| Range | Est. Discriminants | Est. Time (8x B200) |
|-------|-------------------|---------------------|
| [10¹⁰, 10¹¹) | ~27B | ~65 hours (running now) |
| [10¹¹, 10¹²) | ~270B | ~27 days |
| [10¹², 10¹³) | ~2.7T | ~270 days |

The [10¹⁰, 10¹¹) computation is in progress as of 2026-03-30 and will be added to this dataset when complete.
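For anyone working with the raw output of a local run rather than the Parquet files, the 12-byte record layout described under Reproduce It Yourself can be decoded with the standard library. The sketch below assumes the records are tightly packed and little-endian (typical for x86 hosts, but not stated in the card). The helper names and the example path in the comment are hypothetical.

```python
import struct
from collections import Counter

# Assumed layout: little-endian, no padding, uint64 discriminant + int32 class number.
RECORD = struct.Struct("<Qi")  # RECORD.size == 12


def read_records(path, chunk_records=1_000_000):
    """Yield (discriminant, class_number) tuples from a raw binary file."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(RECORD.size * chunk_records)
            if not chunk:
                break
            # A short read only happens at end of file; drop any partial record there.
            usable = len(chunk) - len(chunk) % RECORD.size
            yield from RECORD.iter_unpack(chunk[:usable])


def class_number_histogram(path):
    """Count how often each class number occurs in one raw file."""
    return Counter(h for _, h in read_records(path))


# Example (hypothetical file name matching the raw_gpu*.bin pattern):
# hist = class_number_histogram("data/class-numbers/raw_gpu0.bin")
# print(hist.most_common(5))
```

Reading in fixed-size chunks keeps memory use flat regardless of file size, which matters once the planned larger ranges are produced.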
## Related

- **Source code**: [github.com/cahlen/idontknow](https://github.com/cahlen/idontknow) (CUDA kernels, experiment infrastructure)
- **Experiment page**: [bigcompute.science/experiments/class-numbers-real-quadratic](https://bigcompute.science/experiments/class-numbers-real-quadratic/)
- **Finding writeup**: [bigcompute.science/findings/class-number-convergence](https://bigcompute.science/findings/class-number-convergence/)
- **All experiments**: [bigcompute.science](https://bigcompute.science) (Zaremba's conjecture, Ramsey R(5,5), Hausdorff spectrum, and more)
- **Agent-readable index**: [bigcompute.science/llms.txt](https://bigcompute.science/llms.txt)

## Understanding This Data

Every positive integer that is not a perfect square has a "class number", a measure of how complicated the arithmetic is in a certain number system built from that integer. The question here: how are these class numbers distributed for large numbers?

Each row is a pair: a fundamental discriminant d (think of it as a specially chosen integer) and its class number h(d). The dataset covers all 2.74 billion fundamental discriminants between 10^9 and 10^10.

A concrete example: a row of the form (d, 2) means that the real quadratic field with discriminant d has class number 2, so its arithmetic has a mild complication, but not much. A familiar small case, far below this dataset's range, is Q(√10), which has discriminant 40 and class number 2. When h(d) = 1, the arithmetic in that number system is as simple as it can be: every number factors uniquely, just like the regular integers.

The Cohen-Lenstra heuristics, a famous set of predictions from 1984, say that asymptotically 75.4% of these class numbers should equal 1. But in our data, only 16.70% have h=1. The most common value is actually h=2 at 22.17%, followed by h=4 at 19.77%. The convergence toward 75.4% is extraordinarily slow: you would need to go to astronomically large discriminants before h=1 starts dominating.
This slow convergence is itself a finding: anyone testing Cohen-Lenstra at "merely" 10^10 would see numbers that look nothing like the asymptotic prediction. This matters because class numbers connect to deep questions in algebraic number theory: they show up in cryptography, in the study of prime numbers, and in understanding which equations have integer solutions.

## Citation

```bibtex
@dataset{humphreys2026classnumbers,
  title     = {Class Numbers of Real Quadratic Fields: GPU-Accelerated Computation to 10^10},
  author    = {Humphreys, Cahlen},
  year      = {2026},
  month     = mar,
  publisher = {Hugging Face},
  url       = {https://huggingface.co/datasets/cahlen/class-numbers-real-quadratic},
  note      = {2.74 billion fundamental discriminants, 8x NVIDIA B200}
}
```

## References

1. Cohen, H. and Lenstra, H.W. Jr. (1984). "Heuristics on class groups of number fields." *Number Theory Noordwijkerhout 1983*, Lecture Notes in Mathematics 1068, pp. 33-62.
2. Jacobson, M.J. Jr., Ramachandran, S., and Williams, H.C. (2006). "Numerical results on class groups of imaginary quadratic fields." *Mathematics of Computation*, 75(254), pp. 1003-1024.
3. Stevenhagen, P. (1993). "The number of real quadratic fields having units of negative norm." *Experimental Mathematics*, 2(2), pp. 121-136.
4. Watkins, M. (2004). "Class numbers of imaginary quadratic fields." *Mathematics of Computation*, 73(246), pp. 907-938.
## Source

- **Code**: [class-numbers](https://github.com/cahlen/idontknow/tree/main/scripts/experiments/class-numbers)
- **Findings**: [Cohen-Lenstra at Scale](https://bigcompute.science/findings/class-number-convergence/)
- **Project**: [bigcompute.science](https://bigcompute.science)
- **MCP Server**: `mcp.bigcompute.science` (22 tools, no auth)
- **AGENTS.md**: [Contribution guide](https://github.com/cahlen/idontknow/blob/main/AGENTS.md)

## Citation

```bibtex
@misc{humphreys2026class_numbers_real_quadratic,
  author    = {Humphreys, Cahlen and Claude (Anthropic)},
  title     = {Class Numbers of Real Quadratic Fields to 10^11},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/datasets/cahlen/class-numbers-real-quadratic}
}
```

Human-AI collaborative work (Cahlen Humphreys + Claude). Not independently peer-reviewed. All code and data open for verification. CC BY 4.0.

