cahlen/class-numbers-real-quadratic
Hugging Face · updated 2026-04-08 · indexed 2026-04-12
Download link: https://hf-mirror.com/datasets/cahlen/class-numbers-real-quadratic
Resource description:
---
license: cc-by-4.0
task_categories:
- tabular-classification
tags:
- number-theory
- class-numbers
- real-quadratic-fields
- cohen-lenstra
- gpu-computation
- mathematics
- computational-number-theory
- algebraic-number-theory
- continued-fractions
pretty_name: "Class Numbers of Real Quadratic Fields (GPU-Computed)"
size_categories:
- 1B<n<10B
configs:
- config_name: 1e9_to_1e10
  data_files: "data/1e9_to_1e10/*.parquet"
  description: "All fundamental discriminants d in [10^9, 10^10)"
dataset_info:
- config_name: 1e9_to_1e10
  features:
  - name: discriminant
    dtype: uint64
  - name: class_number
    dtype: int32
  splits:
  - name: train
    num_examples: 2735671820
---
# Class Numbers of Real Quadratic Fields
**2.74 billion** class numbers of real quadratic fields Q(√d), computed for every fundamental discriminant d in [10⁹, 10¹⁰) on an 8× NVIDIA B200 DGX cluster in 30 minutes.
This dataset does not exist anywhere else. The previous systematic frontier was d ≤ 10¹¹ (Jacobson, Ramachandran, Williams 2006), but their raw per-discriminant data was never published. This is the first openly available, per-discriminant class number table at this scale.
> Part of the [bigcompute.science](https://bigcompute.science) project — GPU-accelerated exploration of open conjectures in number theory and combinatorics.
## Quick Start
```python
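from datasets import load_dataset

# Stream the split so the 2.74-billion-row table is never downloaded in one go.
ds = load_dataset("cahlen/class-numbers-real-quadratic", "1e9_to_1e10", split="train", streaming=True)

# Print the first ten (discriminant, class number) pairs.
for row in ds.take(10):
    print(f"d = {row['discriminant']}, h(d) = {row['class_number']}")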
```
## What's In This Dataset
Every row is a fundamental discriminant d and its class number h(d):
| Column | Type | Description |
|--------|------|-------------|
| `discriminant` | `uint64` | Fundamental discriminant d > 0 |
| `class_number` | `int32` | Class number h(d) of the real quadratic field Q(√d) |
A **fundamental discriminant** is either:
- d ≡ 1 (mod 4) and squarefree, or
- d = 4m where m ≡ 2 or 3 (mod 4) and m is squarefree
The **class number** h(d) measures the failure of unique factorization in the ring of integers of Q(√d). When h(d) = 1, the ring has unique factorization.
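The definition of a fundamental discriminant above translates directly into a predicate that can be used to spot-check rows. The following is a minimal sketch, not the project's code: the function names are hypothetical, and the trial-division squarefree test is only suitable for checking a handful of values.

```python
def is_squarefree(n: int) -> bool:
    """True if no square f*f with f >= 2 divides n (trial division; spot checks only)."""
    if n < 1:
        return False
    f = 2
    while f * f <= n:
        if n % (f * f) == 0:
            return False
        f += 1
    return True


def is_fundamental_discriminant(d: int) -> bool:
    """Apply the two-case definition for positive fundamental discriminants d > 1."""
    if d <= 1:
        return False
    if d % 4 == 1:
        # Case 1: d = 1 (mod 4) and squarefree.
        return is_squarefree(d)
    if d % 4 == 0:
        # Case 2: d = 4m with m = 2 or 3 (mod 4) and m squarefree.
        m = d // 4
        return m % 4 in (2, 3) and is_squarefree(m)
    return False


if __name__ == "__main__":
    print([d for d in range(2, 50) if is_fundamental_discriminant(d)])
```

At d near 10¹⁰ the loop runs about 10⁵ iterations per value, which is fine for validating samples but not for regenerating the table.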
## Summary Statistics
| Statistic | Value |
|-----------|-------|
| Range | d ∈ [10⁹, 10¹⁰) |
| Fundamental discriminants | 2,735,671,820 |
| Computation time | 30 minutes |
| Hardware | 8× NVIDIA B200 DGX (1.43 TB VRAM, NVLink 5) |
| Throughput | 1.53 million discriminants/sec |
### Class Number Distribution
| h | Count | Fraction |
|---|-------|----------|
| 1 | 456,984,420 | 16.70% |
| 2 | 606,415,562 | 22.17% |
| 3 | 73,409,125 | 2.68% |
| 4 | 540,733,202 | 19.77% |
| 5 | 22,715,143 | 0.83% |
| 6 | 96,852,027 | 3.54% |
| 7 | 10,849,013 | 0.40% |
| 8 | 298,291,861 | 10.90% |
| 9 | 9,027,194 | 0.33% |
| 10 | 30,106,984 | 1.10% |
| 12 | 85,877,392 | 3.14% |
| 16 | 123,589,441 | 4.52% |
### Cohen-Lenstra p-Divisibility
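| Divisor | Observed | Cohen-Lenstra (asymptotic, real quadratic) |
|---------|----------|---------------------------|
| 3 divides h | 15.28% | ~15.98% |
| 5 divides h | 4.89% | ~4.96% |
| 7 divides h | 2.35% | ~2.37% |
For real quadratic fields and odd primes p, the Cohen-Lenstra prediction is Prob(p divides h) = 1 − ∏(k ≥ 2) (1 − p⁻ᵏ).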
## Key Finding: Non-Monotone Convergence
Cohen and Lenstra (1984) predict that h(d) = 1 occurs with probability ≈ 75.446% asymptotically. Our data shows the observed rate is **decreasing** at this scale:
| Range | h = 1 fraction |
|-------|---------------|
| d < 10⁴ | 42.1% |
| d ~ 10⁶ | 25.7% |
| d ∈ [10⁹, 10¹⁰) | 16.7% |
| Asymptotic prediction | 75.4% |
The rate must eventually reverse and increase toward 75.4%, but at d ~ 10¹⁰ it hasn't turned around yet. This is because genus theory (the 2-part of the class group, determined by the number of prime factors of d) dominates at moderate discriminants. The values h = 2, 4, 8, 16 alone account for 57% of all discriminants. The odd part of the class group — where Cohen-Lenstra actually applies — must eventually dominate, but convergence is extremely slow.
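The 57% figure can be re-derived from the Class Number Distribution table. The snippet below is only an arithmetic check on the published counts; the variable names are illustrative.

```python
# Counts copied from the Class Number Distribution table in this card.
counts = {
    1: 456_984_420,
    2: 606_415_562,
    4: 540_733_202,
    8: 298_291_861,
    16: 123_589_441,
}
total = 2_735_671_820  # all fundamental discriminants in [10^9, 10^10)

# Share of discriminants whose class number is exactly 2, 4, 8, or 16.
power_of_two_share = sum(counts[h] for h in (2, 4, 8, 16)) / total
h1_share = counts[1] / total

print(f"h in {{2, 4, 8, 16}}: {power_of_two_share:.2%}")
print(f"h = 1: {h1_share:.2%}")
```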
See the [full analysis](https://bigcompute.science/findings/class-number-convergence/) on bigcompute.science.
## Computation Method
For each fundamental discriminant d, we compute h(d) via the analytic class number formula:
```
h(d) = round( sqrt(d) * L(1, χ_d) / (2 * R(d)) )
```
### Step 1: GPU Squarefree Sieve
Each GPU thread tests the integer at its position for divisibility by p² for every prime p ≤ √d. The kernel then classifies fundamental discriminants and stream-compacts them into a packed array. Everything runs on-device, so there is no CPU bottleneck.
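A CPU-side sketch of the same idea is shown here, assuming a small range that fits in memory. It is not the CUDA kernel: the function name is hypothetical, it marks multiples of p² for every odd p rather than only primes (redundant but harmless), and it handles the prime 2 through congruences mod 4 and mod 16 instead of sieving.

```python
import math


def fundamental_discriminants(lo: int, hi: int) -> list[int]:
    """Return all fundamental discriminants d with max(lo, 2) <= d < hi."""
    # odd_squarefree[i] stays True while no odd square p*p divides lo + i.
    odd_squarefree = [True] * (hi - lo)
    for p in range(3, math.isqrt(hi - 1) + 1, 2):
        sq = p * p
        first = ((lo + sq - 1) // sq) * sq  # smallest multiple of p*p that is >= lo
        for n in range(first, hi, sq):
            odd_squarefree[n - lo] = False

    result = []
    for d in range(max(lo, 2), hi):
        if not odd_squarefree[d - lo]:
            continue
        # d = 1 (mod 4): odd, so squarefree once the odd squares are excluded.
        # d = 8 or 12 (mod 16): d = 4m with m = 2 or 3 (mod 4), so 4 does not divide m.
        if d % 4 == 1 or d % 16 in (8, 12):
            result.append(d)
    return result


if __name__ == "__main__":
    print(fundamental_discriminants(2, 50))
```

Collecting the survivors into `result` plays the role of the stream-compaction step.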
### Step 2: Regulator R(d)
The regulator R(d) = log(ε_d) is computed from the continued fraction expansion, entirely in log-space to avoid integer overflow at d > 10⁹:
- d ≡ 0 (mod 4): CF expansion of √(d/4), where the cycle is complete at the first occurrence of D = 1
- d ≡ 1 (mod 4): CF expansion of (1 + √d)/2 with reduced-state cycle detection
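The two cases above can be sketched in a few lines of Python. This is an illustrative reimplementation under stated assumptions, not the CUDA code: it uses double precision, assumes d is a fundamental discriminant greater than 1, and detects the end of the period in both cases by waiting for the first reduced state (P, Q) to recur. The function name is hypothetical.

```python
import math


def regulator(d: int) -> float:
    """R(d) = log of the fundamental unit, summed in log-space over one CF period."""
    if d % 4 == 0:
        n, p, q = d // 4, 0, 1   # expand sqrt(d/4) = (0 + sqrt(n)) / 1
    else:
        n, p, q = d, 1, 2        # expand (1 + sqrt(d)) / 2
    root = math.isqrt(n)
    sqrt_n = math.sqrt(n)

    # One step moves from the starting number to the first reduced complete quotient.
    a = (p + root) // q
    p = a * q - p
    q = (n - p * p) // q
    start = (p, q)

    log_unit = 0.0
    while True:
        # The fundamental unit is the product of the complete quotients
        # (P + sqrt(n)) / Q over one period, so its log is the sum of their logs.
        log_unit += math.log((p + sqrt_n) / q)
        a = (p + root) // q
        p = a * q - p
        q = (n - p * p) // q
        if (p, q) == start:
            return log_unit


if __name__ == "__main__":
    for d in (5, 8, 12, 13, 17):
        print(d, regulator(d))
```

Only the small integers P and Q and a running floating-point sum are stored, so the unit itself, which can have a very large number of digits, is never formed.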
### Step 3: L-Function via Euler Product
```
L(1, χ_d) = ∏(p ≤ 99991) (1 - χ_d(p)/p)⁻¹
```
The 9,592 primes up to 99,991 are stored in CUDA `__constant__` memory. The Kronecker symbol χ_d(p) = (d/p) is computed via modular exponentiation (Jacobi symbol algorithm).
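As a rough illustration of the truncated Euler product, the sketch below evaluates it in Python. It is an assumption-laden stand-in for the kernel: the helper names are hypothetical, the symbol at odd primes is evaluated with Euler's criterion, and p = 2 uses the usual rule on d mod 8.

```python
import math


def primes_up_to(limit: int) -> list[int]:
    """Sieve of Eratosthenes returning all primes <= limit."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(limit) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(range(i * i, limit + 1, i)))
    return [i for i, flag in enumerate(sieve) if flag]


def kronecker_at_prime(d: int, p: int) -> int:
    """Kronecker symbol (d/p) for a prime p."""
    if p == 2:
        if d % 2 == 0:
            return 0
        return 1 if d % 8 in (1, 7) else -1
    r = pow(d % p, (p - 1) // 2, p)  # Euler's criterion: result is 0, 1, or p - 1
    if r == 0:
        return 0
    return 1 if r == 1 else -1


def euler_product_l1(d: int, primes: list[int]) -> float:
    """Approximate L(1, chi_d) by the product over the supplied primes."""
    log_l = 0.0
    for p in primes:
        log_l -= math.log1p(-kronecker_at_prime(d, p) / p)
    return math.exp(log_l)


if __name__ == "__main__":
    primes = primes_up_to(99_991)
    print(len(primes), euler_product_l1(5, primes))
```

For d = 5 the exact value is 2·ln((1 + √5)/2)/√5 ≈ 0.4304, which gives a quick sanity check on the truncation error.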
### Step 4: Assembly
Round sqrt(d) * L / (2R) to the nearest integer. Aggregate statistics are accumulated with atomic histogram updates.
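The assembly step is small enough to sketch directly. In the example below the function name is hypothetical and the (d, L, R) triples are reference values rounded to four digits for three small fields whose class numbers are well known (h = 1 for d = 5 and d = 8, h = 2 for d = 40); they are not rows from this dataset. A `Counter` stands in for the atomic histogram.

```python
import math
from collections import Counter


def assemble_class_number(d: int, l_value: float, regulator: float) -> tuple[int, float]:
    """Return (h, distance of the raw estimate from the nearest integer)."""
    estimate = math.sqrt(d) * l_value / (2.0 * regulator)
    h = round(estimate)
    return h, abs(estimate - h)


# Illustrative inputs only: (d, approximate L(1, chi_d), approximate R(d)).
inputs = [
    (5, 0.4304, 0.4812),
    (8, 0.6232, 0.8814),
    (40, 1.1501, 1.8184),
]

histogram = Counter()
results = {}
for d, l_value, reg in inputs:
    h, margin = assemble_class_number(d, l_value, reg)
    results[d] = h
    histogram[h] += 1
    print(f"d = {d}: h = {h} (rounding margin {margin:.4f})")
```

Returning the rounding margin alongside h is a design choice of this sketch: a margin close to 0.5 would flag a discriminant where the truncated L-value is too inaccurate to trust the rounding.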
### Validation
- **Exact match** with PARI/GP `qfbclassno()` on 1,000 randomly sampled discriminants across the full range
- h = 1 rate of 42.13% for d < 10⁴ matches PARI exactly
- Cross-validated: regulator values match PARI `quadregulator()` to 12+ digits
## Hardware
| Component | Specification |
|-----------|---------------|
| Node | NVIDIA DGX B200 |
| GPUs | 8× NVIDIA B200 (183 GB VRAM each) |
| Total VRAM | 1.43 TB |
| Interconnect | NVLink 5 (NV18), full mesh |
| CPUs | 2× Intel Xeon Platinum 8570 (112 cores / 224 threads) |
| System RAM | 2 TB DDR5 |
## Reproduce It Yourself
```bash
git clone https://github.com/cahlen/idontknow
cd idontknow
# Compile (adjust -arch for your GPU: sm_100a for B200, sm_120a for RTX 5090)
nvcc -O3 -arch=sm_100a -o class_v2 \
scripts/experiments/class-numbers/class_numbers_v2.cu -lpthread -lm
# Validate against PARI/GP (should give h=1 at 42.13%)
./class_v2 5 10000
# Full run: d = 10^9 to 10^10 (~30 min on 8x B200, longer on fewer GPUs)
./class_v2 1000000000 10000000000 | tee run.log
# Raw (d, h) binary files appear in data/class-numbers/raw_gpu*.bin
# Format: repeating (uint64 discriminant, int32 class_number) = 12 bytes per record
```
The program auto-detects available GPUs and distributes the range evenly.
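The raw files can be read with Python's `struct` module. This is a minimal sketch: the 12-byte record layout comes from the comment in the build instructions, while little-endian byte order and the absence of padding are assumptions, and `read_records` is a hypothetical helper name.

```python
import io
import struct
from typing import BinaryIO, Iterator

# Assumption: little-endian, packed (uint64 discriminant, int32 class_number).
RECORD = struct.Struct("<Qi")


def read_records(stream: BinaryIO) -> Iterator[tuple[int, int]]:
    """Yield (discriminant, class_number) pairs from a raw_gpu*.bin style stream."""
    while True:
        chunk = stream.read(RECORD.size)
        if not chunk:
            return
        if len(chunk) != RECORD.size:
            raise ValueError("truncated record at end of stream")
        yield RECORD.unpack(chunk)


if __name__ == "__main__":
    # Placeholder bytes built in memory; a real file would be opened with open(path, "rb").
    demo = io.BytesIO(RECORD.pack(5, 1) + RECORD.pack(8, 1))
    for d, h in read_records(demo):
        print(d, h)
```

If the decoded discriminants fall outside the requested range, the byte-order or packing assumption is the first thing to revisit.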
## Planned Extensions
| Range | Est. Discriminants | Est. Time (8x B200) |
|-------|-------------------|---------------------|
| [10¹⁰, 10¹¹) | ~27B | ~65 hours (running now) |
| [10¹¹, 10¹²) | ~270B | ~27 days |
| [10¹², 10¹³) | ~2.7T | ~270 days |
The [10¹⁰, 10¹¹) computation is in progress as of 2026-03-30 and will be added to this dataset when complete.
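The estimated counts in the table are consistent with the standard density result that positive fundamental discriminants have asymptotic density 3/π². The sketch below applies that density to each range; it is an estimate only, and the function name is hypothetical.

```python
import math


def expected_fundamental_discriminants(lo: float, hi: float) -> float:
    """Asymptotic estimate: positive fundamental discriminants have density 3 / pi^2."""
    return 3.0 / math.pi ** 2 * (hi - lo)


if __name__ == "__main__":
    observed = 2_735_671_820  # exact count for [10^9, 10^10) reported in this card
    estimate = expected_fundamental_discriminants(1e9, 1e10)
    print(f"[1e9, 1e10): estimate {estimate:,.0f}, observed {observed:,}")
    for exponent in (10, 11, 12):
        lo, hi = 10.0 ** exponent, 10.0 ** (exponent + 1)
        print(f"[1e{exponent}, 1e{exponent + 1}): ~{expected_fundamental_discriminants(lo, hi):.3g}")
```

For the completed range the estimate agrees closely with the reported count, which also serves as a quick completeness check on the sieve output.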
## Related
- **Source code**: [github.com/cahlen/idontknow](https://github.com/cahlen/idontknow) — CUDA kernels, experiment infrastructure
- **Experiment page**: [bigcompute.science/experiments/class-numbers-real-quadratic](https://bigcompute.science/experiments/class-numbers-real-quadratic/)
- **Finding writeup**: [bigcompute.science/findings/class-number-convergence](https://bigcompute.science/findings/class-number-convergence/)
- **All experiments**: [bigcompute.science](https://bigcompute.science) — Zaremba's conjecture, Ramsey R(5,5), Hausdorff spectrum, and more
- **Agent-readable index**: [bigcompute.science/llms.txt](https://bigcompute.science/llms.txt)
## Understanding This Data
Every positive integer that is not a perfect square has a "class number" -- a measure of how complicated the arithmetic is in a certain number system built from that integer. The question here: how are these class numbers distributed for large numbers?
Each row is a pair: a fundamental discriminant d (think of it as a specially chosen integer) and its class number h(d). The dataset covers all 2.74 billion fundamental discriminants between 10^9 and 10^10.
A concrete example: a row (d, 2) means that the real quadratic field with discriminant d has class number 2 -- its arithmetic has a mild complication, but not much. When h(d) = 1, the arithmetic in that number system is as simple as it can be: every number factors uniquely, just like the regular integers.
The Cohen-Lenstra heuristics, a famous set of predictions from 1984, say that asymptotically 75.4% of these class numbers should equal 1. But in our data, only 16.70% have h=1. The most common value is actually h=2 at 22.17%, followed by h=4 at 19.77%. The convergence toward 75.4% is extraordinarily slow -- you would need to go to astronomically large discriminants before h=1 starts dominating. This slow convergence is itself a finding: anyone testing Cohen-Lenstra at "merely" 10^10 would see numbers that look nothing like the asymptotic prediction.
This matters because class numbers connect to deep questions in algebraic number theory -- they show up in cryptography, in the study of prime numbers, and in understanding which equations have integer solutions.
## Citation
```bibtex
@dataset{humphreys2026classnumbers,
title = {Class Numbers of Real Quadratic Fields: GPU-Accelerated Computation to 10^10},
author = {Humphreys, Cahlen},
year = {2026},
month = mar,
publisher = {Hugging Face},
url = {https://huggingface.co/datasets/cahlen/class-numbers-real-quadratic},
note = {2.74 billion fundamental discriminants, 8x NVIDIA B200}
}
```
## References
1. Cohen, H. and Lenstra, H.W. Jr. (1984). "Heuristics on class groups of number fields." *Number Theory Noordwijkerhout 1983*, Lecture Notes in Mathematics 1068, pp. 33-62.
2. Jacobson, M.J. Jr., Ramachandran, S., and Williams, H.C. (2006). "Numerical results on class groups of imaginary quadratic fields." *Mathematics of Computation*, 75(254), pp. 1003-1024.
3. Stevenhagen, P. (1993). "The number of real quadratic fields having units of negative norm." *Experimental Mathematics*, 2(2), pp. 121-136.
4. Watkins, M. (2004). "Class numbers of imaginary quadratic fields." *Mathematics of Computation*, 73(246), pp. 907-938.
## Source
- **Code**: [class-numbers](https://github.com/cahlen/idontknow/tree/main/scripts/experiments/class-numbers)
- **Findings**: [Cohen-Lenstra at Scale](https://bigcompute.science/findings/class-number-convergence/)
- **Project**: [bigcompute.science](https://bigcompute.science)
- **MCP Server**: `mcp.bigcompute.science` (22 tools, no auth)
- **AGENTS.md**: [Contribution guide](https://github.com/cahlen/idontknow/blob/main/AGENTS.md)
## Citation
```bibtex
@misc{humphreys2026class_numbers_real_quadratic,
author = {Humphreys, Cahlen and Claude (Anthropic)},
title = {Class Numbers of Real Quadratic Fields to 10^11},
year = {2026},
publisher = {Hugging Face},
url = {https://huggingface.co/datasets/cahlen/class-numbers-real-quadratic}
}
```
Human-AI collaborative work (Cahlen Humphreys + Claude). Not independently peer-reviewed. All code and data open for verification. CC BY 4.0.