cahlen/class-numbers-real-quadratic

Source: Hugging Face. Updated 2026-04-08. Indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/cahlen/class-numbers-real-quadratic
Resource description:
---
license: cc-by-4.0
task_categories:
- tabular-classification
tags:
- number-theory
- class-numbers
- real-quadratic-fields
- cohen-lenstra
- gpu-computation
- mathematics
- computational-number-theory
- algebraic-number-theory
- continued-fractions
pretty_name: "Class Numbers of Real Quadratic Fields (GPU-Computed)"
size_categories:
- 1B<n<10B
configs:
- config_name: 1e9_to_1e10
  data_files: "data/1e9_to_1e10/*.parquet"
  description: "All fundamental discriminants d in [10^9, 10^10)"
dataset_info:
- config_name: 1e9_to_1e10
  features:
  - name: discriminant
    dtype: uint64
  - name: class_number
    dtype: int32
  splits:
  - name: train
    num_examples: 2735671820
---

# Class Numbers of Real Quadratic Fields

**2.74 billion** class numbers of real quadratic fields Q(√d), computed for every fundamental discriminant d in [10⁹, 10¹⁰) on an 8× NVIDIA B200 DGX cluster in 30 minutes.

This dataset does not exist anywhere else. The previous systematic frontier was d ≤ 10¹¹ (Jacobson, Ramachandran, Williams 2006), but their raw per-discriminant data was never published. This is the first openly available, per-discriminant class number table at this scale.

> Part of the [bigcompute.science](https://bigcompute.science) project: GPU-accelerated exploration of open conjectures in number theory and combinatorics.

## Quick Start

```python
from datasets import load_dataset

ds = load_dataset("cahlen/class-numbers-real-quadratic", "1e9_to_1e10", split="train", streaming=True)
for row in ds.take(10):
    print(f"d = {row['discriminant']}, h(d) = {row['class_number']}")
```

## What's In This Dataset

Every row is a fundamental discriminant d and its class number h(d):

| Column | Type | Description |
|--------|------|-------------|
| `discriminant` | `uint64` | Fundamental discriminant d > 0 |
| `class_number` | `int32` | Class number h(d) of the real quadratic field Q(√d) |

A **fundamental discriminant** is either:

- d ≡ 1 (mod 4) and squarefree, or
- d = 4m where m ≡ 2 or 3 (mod 4) and m is squarefree

The **class number** h(d) measures the failure of unique factorization in the ring of integers of Q(√d). When h(d) = 1, the ring has unique factorization.

## Summary Statistics

| Statistic | Value |
|-----------|-------|
| Range | d ∈ [10⁹, 10¹⁰) |
| Fundamental discriminants | 2,735,671,820 |
| Computation time | 30 minutes |
| Hardware | 8× NVIDIA B200 DGX (1.43 TB VRAM, NVLink 5) |
| Throughput | 1.53 million discriminants/sec |

### Class Number Distribution

| h | Count | Fraction |
|---|-------|----------|
| 1 | 456,984,420 | 16.70% |
| 2 | 606,415,562 | 22.17% |
| 3 | 73,409,125 | 2.68% |
| 4 | 540,733,202 | 19.77% |
| 5 | 22,715,143 | 0.83% |
| 6 | 96,852,027 | 3.54% |
| 7 | 10,849,013 | 0.40% |
| 8 | 298,291,861 | 10.90% |
| 9 | 9,027,194 | 0.33% |
| 10 | 30,106,984 | 1.10% |
| 12 | 85,877,392 | 3.14% |
| 16 | 123,589,441 | 4.52% |

### Cohen-Lenstra p-Divisibility

| Divisor | Observed | Cohen-Lenstra (asymptotic) |
|---------|----------|---------------------------|
| 3 divides h | 15.28% | ~43.99% |
| 5 divides h | 4.89% | ~23.84% |
| 7 divides h | 2.35% | ~16.33% |

## Key Finding: Non-Monotone Convergence

Cohen and Lenstra (1984) predict that h(d) = 1 occurs with probability ≈ 75.446% asymptotically.
Our data shows the observed rate is **decreasing** at this scale:

| Range | h = 1 fraction |
|-------|---------------|
| d < 10⁴ | 42.1% |
| d ~ 10⁶ | 25.7% |
| d ∈ [10⁹, 10¹⁰) | 16.7% |
| Asymptotic prediction | 75.4% |

The rate must eventually reverse and increase toward 75.4%, but at d ~ 10¹⁰ it hasn't turned around yet. This is because genus theory (the 2-part of the class group, determined by the number of prime factors of d) dominates at moderate discriminants. The values h = 2, 4, 8, 16 alone account for 57% of all discriminants. The odd part of the class group, where Cohen-Lenstra actually applies, must eventually dominate, but convergence is extremely slow.

See the [full analysis](https://bigcompute.science/findings/class-number-convergence/) on bigcompute.science.

## Computation Method

For each fundamental discriminant d, we compute h(d) via the analytic class number formula:

```
h(d) = round( sqrt(d) * L(1, χ_d) / (2 * R(d)) )
```

### Step 1: GPU Squarefree Sieve

Each GPU thread checks its position for divisibility by p² for all primes p ≤ √d. The kernel classifies fundamental discriminants and stream-compacts them into a packed array. Everything runs on-device, with no CPU bottleneck.

### Step 2: Regulator R(d)

The regulator R(d) = log(ε_d) is computed from the continued fraction expansion, entirely in log-space to avoid integer overflow at d > 10⁹:

- d ≡ 0 (mod 4): CF expansion of √(d/4), with first D = 1 detection for cycle completion
- d ≡ 1 (mod 4): CF expansion of (1 + √d)/2 with reduced-state cycle detection

### Step 3: L-Function via Euler Product

```
L(1, χ_d) = ∏(p ≤ 99991) (1 - χ_d(p)/p)⁻¹
```

The 9,592 primes are stored in CUDA `__constant__` memory. The Kronecker symbol χ_d(p) = (d/p) is computed via modular exponentiation (Jacobi symbol algorithm).

### Step 4: Assembly

Round sqrt(d) * L / (2R) to the nearest integer. Aggregate statistics are collected with atomic histogram updates.

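The four steps above can be sketched on the CPU in plain Python. This is a minimal illustration under stated assumptions, not the project's CUDA kernel: all function names are hypothetical, the squarefree test uses trial division instead of a GPU sieve, and the Kronecker symbol for odd primes is evaluated with Euler's criterion. Only the formula, the two continued-fraction starting points, and the prime bound 99991 are taken from the text.

```python
import math


def is_squarefree(n: int) -> bool:
    """Trial-division squarefree test (a stand-in for the GPU sieve)."""
    p = 2
    while p * p <= n:
        if n % (p * p) == 0:
            return False
        p += 1
    return True


def is_fundamental(d: int) -> bool:
    """d = 1 (mod 4) squarefree, or d = 4m with m = 2, 3 (mod 4) squarefree."""
    if d <= 1:
        return False
    if d % 4 == 1:
        return is_squarefree(d)
    if d % 4 == 0:
        m = d // 4
        return m % 4 in (2, 3) and is_squarefree(m)
    return False


def regulator(d: int) -> float:
    """R(d) = log(eps_d), accumulated in log-space over one CF period."""
    if d % 4 == 0:
        D, P, Q = d // 4, 0, 1   # expand sqrt(d/4)
    else:
        D, P, Q = d, 1, 2        # expand (1 + sqrt(d)) / 2
    q0, root, sqrt_D = Q, math.isqrt(D), math.sqrt(D)
    log_eps = 0.0
    while True:
        a = (P + root) // Q                     # partial quotient
        P = a * Q - P
        Q = (D - P * P) // Q
        log_eps += math.log((P + sqrt_D) / Q)   # log of next complete quotient
        if Q == q0:                             # period complete
            return log_eps


def primes_up_to(n: int) -> list:
    """Sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i, flag in enumerate(sieve) if flag]


def kronecker(d: int, p: int) -> int:
    """Kronecker symbol (d/p) for a prime p."""
    if p == 2:
        if d % 2 == 0:
            return 0
        return 1 if d % 8 in (1, 7) else -1
    r = pow(d % p, (p - 1) // 2, p)   # Euler's criterion: 0, 1, or p - 1
    return -1 if r == p - 1 else r


def l_value(d: int, primes: list) -> float:
    """Truncated Euler product for L(1, chi_d), summed as logarithms."""
    return math.exp(-sum(math.log1p(-kronecker(d, p) / p) for p in primes))


def class_number(d: int, primes: list) -> int:
    """h(d) = round(sqrt(d) * L(1, chi_d) / (2 * R(d)))."""
    return round(math.sqrt(d) * l_value(d, primes) / (2.0 * regulator(d)))


PRIMES = primes_up_to(99991)   # same prime bound as Step 3
```

With these definitions, `class_number(229, PRIMES)` can be compared against the known value h(229) = 3. The sketch is meant for checking individual discriminants by hand; it makes no attempt at the throughput of the GPU pipeline.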
### Validation

- **Exact match** with PARI/GP `qfbclassno()` on 1,000 randomly sampled discriminants across the full range
- h = 1 rate of 42.13% for d < 10⁴ matches PARI exactly
- Cross-validated: regulator values match PARI `quadregulator()` to 12+ digits

## Hardware

| Component | Specification |
|-----------|---------------|
| Node | NVIDIA DGX B200 |
| GPUs | 8× NVIDIA B200 (183 GB VRAM each) |
| Total VRAM | 1.43 TB |
| Interconnect | NVLink 5 (NV18), full mesh |
| CPUs | 2× Intel Xeon Platinum 8570 (112 cores / 224 threads) |
| System RAM | 2 TB DDR5 |

## Reproduce It Yourself

```bash
git clone https://github.com/cahlen/idontknow
cd idontknow

# Compile (adjust -arch for your GPU: sm_100a for B200, sm_120a for RTX 5090)
nvcc -O3 -arch=sm_100a -o class_v2 \
    scripts/experiments/class-numbers/class_numbers_v2.cu -lpthread -lm

# Validate against PARI/GP (should give h=1 at 42.13%)
./class_v2 5 10000

# Full run: d = 10^9 to 10^10 (~30 min on 8x B200, longer on fewer GPUs)
./class_v2 1000000000 10000000000 | tee run.log

# Raw (d, h) binary files appear in data/class-numbers/raw_gpu*.bin
# Format: repeating (uint64 discriminant, int32 class_number) = 12 bytes per record
```

The kernel auto-detects available GPUs and distributes the range evenly.

## Planned Extensions

| Range | Est. Discriminants | Est. Time (8x B200) |
|-------|-------------------|---------------------|
| [10¹⁰, 10¹¹) | ~27B | ~65 hours (running now) |
| [10¹¹, 10¹²) | ~270B | ~27 days |
| [10¹², 10¹³) | ~2.7T | ~270 days |

The [10¹⁰, 10¹¹) computation is in progress as of 2026-03-30 and will be added to this dataset when complete.

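For anyone who runs the kernel locally, the 12-byte raw record layout noted under Reproduce It Yourself could be parsed along the following lines. This is a sketch under assumptions the text does not state: records are taken to be little-endian, tightly packed, and written without any file header. The names `RECORD` and `read_records` are hypothetical.

```python
import struct
from typing import BinaryIO, Iterator, Tuple

# Assumed layout: little-endian uint64 discriminant followed by int32
# class number, with no padding, i.e. 12 bytes per record.
RECORD = struct.Struct("<Qi")


def read_records(stream: BinaryIO) -> Iterator[Tuple[int, int]]:
    """Yield (discriminant, class_number) pairs from a raw binary stream."""
    while True:
        chunk = stream.read(RECORD.size)
        if not chunk:
            return                      # clean end of file
        if len(chunk) != RECORD.size:
            raise ValueError("truncated record at end of stream")
        yield RECORD.unpack(chunk)
```

A file would be opened in binary mode, for example `with open(path, "rb") as f:`, and the pairs consumed lazily so that multi-gigabyte outputs never need to fit in memory. If the files turn out to use a different byte order or padding, only the format string needs to change.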
## Related

- **Source code**: [github.com/cahlen/idontknow](https://github.com/cahlen/idontknow) (CUDA kernels, experiment infrastructure)
- **Experiment page**: [bigcompute.science/experiments/class-numbers-real-quadratic](https://bigcompute.science/experiments/class-numbers-real-quadratic/)
- **Finding writeup**: [bigcompute.science/findings/class-number-convergence](https://bigcompute.science/findings/class-number-convergence/)
- **All experiments**: [bigcompute.science](https://bigcompute.science) (Zaremba's conjecture, Ramsey R(5,5), Hausdorff spectrum, and more)
- **Agent-readable index**: [bigcompute.science/llms.txt](https://bigcompute.science/llms.txt)

## Understanding This Data

Every positive integer that is not a perfect square has a "class number": a measure of how complicated the arithmetic is in a certain number system built from that integer. The question here: how are these class numbers distributed for large numbers?

Each row is a pair: a fundamental discriminant d (think of it as a specially chosen integer) and its class number h(d). The dataset covers all 2.74 billion fundamental discriminants between 10^9 and 10^10.

A concrete example: a row whose `class_number` is 2 means that the real quadratic field with that discriminant has class number 2. Its arithmetic has a mild complication, but not much. When h(d) = 1, the arithmetic in that number system is as simple as it can be: every number factors uniquely, just like the regular integers.

The Cohen-Lenstra heuristics, a famous set of predictions from 1984, say that asymptotically 75.4% of these class numbers should equal 1. But in our data, only 16.70% have h=1. The most common value is actually h=2 at 22.17%, followed by h=4 at 19.77%. The convergence toward 75.4% is extraordinarily slow: you would need to go to astronomically large discriminants before h=1 starts dominating.
This slow convergence is itself a finding: anyone testing Cohen-Lenstra at "merely" 10^10 would see numbers that look nothing like the asymptotic prediction.

This matters because class numbers connect to deep questions in algebraic number theory. They show up in cryptography, in the study of prime numbers, and in understanding which equations have integer solutions.

## Citation

```bibtex
@dataset{humphreys2026classnumbers,
  title     = {Class Numbers of Real Quadratic Fields: GPU-Accelerated Computation to 10^10},
  author    = {Humphreys, Cahlen},
  year      = {2026},
  month     = mar,
  publisher = {Hugging Face},
  url       = {https://huggingface.co/datasets/cahlen/class-numbers-real-quadratic},
  note      = {2.74 billion fundamental discriminants, 8x NVIDIA B200}
}
```

## References

1. Cohen, H. and Lenstra, H.W. Jr. (1984). "Heuristics on class groups of number fields." *Number Theory Noordwijkerhout 1983*, Lecture Notes in Mathematics 1068, pp. 33-62.
2. Jacobson, M.J. Jr., Ramachandran, S., and Williams, H.C. (2006). "Numerical results on class groups of imaginary quadratic fields." *Mathematics of Computation*, 75(254), pp. 1003-1024.
3. Stevenhagen, P. (1993). "The number of real quadratic fields having units of negative norm." *Experimental Mathematics*, 2(2), pp. 121-136.
4. Watkins, M. (2004). "Class numbers of imaginary quadratic fields." *Mathematics of Computation*, 73(246), pp. 907-938.

## Source

- **Code**: [class-numbers](https://github.com/cahlen/idontknow/tree/main/scripts/experiments/class-numbers)
- **Findings**: [Cohen-Lenstra at Scale](https://bigcompute.science/findings/class-number-convergence/)
- **Project**: [bigcompute.science](https://bigcompute.science)
- **MCP Server**: `mcp.bigcompute.science` (22 tools, no auth)
- **AGENTS.md**: [Contribution guide](https://github.com/cahlen/idontknow/blob/main/AGENTS.md)

## Citation

```bibtex
@misc{humphreys2026class_numbers_real_quadratic,
  author    = {Humphreys, Cahlen and Claude (Anthropic)},
  title     = {Class Numbers of Real Quadratic Fields to 10^11},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/datasets/cahlen/class-numbers-real-quadratic}
}
```

Human-AI collaborative work (Cahlen Humphreys + Claude). Not independently peer-reviewed. All code and data open for verification. CC BY 4.0.

Provider: cahlen