
Complete List of Mathematical Expressions in all Wikimedia Projects, including Wikipedia

NIAID Data Ecosystem, indexed 2026-05-02
Download link:
https://zenodo.org/record/15162181
Description:
This dataset contains a deduplicated list of all mathematical expressions used across all Wikimedia projects. The data is provided as a JSON file in which each key is the MD5 hash of an input and each value is the input itself, that is, the text extracted from the wikitext sources.

The list was built as follows:

1. All current dumps were filtered for the math tag (see https://doi.org/10.5281/zenodo.15107679 for details).
2. The filtered dumps were imported into a MediaWiki installation with the MathSearch extension, using one database per wiki.
3. The data from all the mathlog tables was combined into one table, which was exported to a JSON file.

The scripts are available from swh:1:cnt:faec2206a154db5a2711791f4211097e36bf1413; origin=https://github.com/MaRDI4NFDI/wikiFilter; visit=swh:1:snp:28ed43d0e16ca3d6ce4bad1b484cec9d1124cd48; anchor=swh:1:rev:855735a5c90a0db3ccfd20c3899af4c82bc6704f; path=/wmcloud/allFormulae.sql

Example: The Wikipedia article on mass-energy equivalence contains the wikitext <math>E = mc^2</math>. The MathSearch extension extracts the user input E = mc^2, whose MD5 hash is 281a70c20b16a38d7781189936e1ac9f. The row

    "281a70c20b16a38d7781189936e1ac9f": "E = mc^2",

in the JSON file therefore corresponds to that input.
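
The key-value layout described above can be sketched in Python. This is only an illustration under stated assumptions: the helper names (md5_key, build_index, lookup) are hypothetical, the input is assumed to be hashed as UTF-8 bytes, and a two-entry in-memory JSON string stands in for the real export, whose file name is not given here.

```python
import hashlib
import json


def md5_key(expression: str) -> str:
    """Hex MD5 digest of an expression (UTF-8 encoding is an assumption)."""
    return hashlib.md5(expression.encode("utf-8")).hexdigest()


def build_index(expressions):
    """Map md5(input) -> input, mirroring the described JSON layout."""
    return {md5_key(e): e for e in expressions}


def lookup(index, expression):
    """Return the stored input for an expression, or None if it is absent."""
    return index.get(md5_key(expression))


# Hypothetical stand-in for the exported file: serialize, then parse again,
# the same way a consumer would load the real JSON with json.load().
sample_json = json.dumps(build_index(["E = mc^2", "a^2 + b^2 = c^2"]))
index = json.loads(sample_json)

print(lookup(index, "E = mc^2"))  # found: the stored input string
print(lookup(index, "x + y"))     # not in the sample: None
```

Because the key is a hash of the exact input string, inputs that differ only in whitespace, such as "E = mc^2" and "E=mc^2", produce different keys and would be separate entries.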

Created:
2025-04-06