ibm-research/struct-text
Source: Hugging Face | Updated: 2025-07-21 | Added: 2025-05-31
Download link:
https://hf-mirror.com/datasets/ibm-research/struct-text
Description:
StructText is a synthetic dataset of tables paired with text, derived from SEC filings and WikiDB tables. It contains both the original structured data and text generated from those tables. Multiple configurations are available, varying by data type (original, generated, or planned text) and by filtering status (unfiltered or filtered). The dataset is split into train, validation, and test sets, with separate configurations for subsets and for the full dataset. The README also covers the dataset's folder layout, quick-start examples, the dataset creation process, and citation details.
Provided by:
ibm-research
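The description mentions several configurations and train/validation/test splits. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and that this repository loads through the standard `load_dataset` path. Rather than guessing configuration names, it looks them up at runtime with `get_dataset_config_names`. Routing requests through the mirror in the download link via the `HF_ENDPOINT` environment variable is an assumption about how you reach the Hub. The helper names `use_mirror` and `load_struct_text` are hypothetical.

```python
import os

# Repository ID as listed on this page.
REPO_ID = "ibm-research/struct-text"
# Mirror endpoint taken from the download link; using it is an assumption, not a requirement.
MIRROR = "https://hf-mirror.com"


def use_mirror(endpoint: str = MIRROR) -> str:
    """Hypothetical helper: point huggingface_hub at a mirror unless HF_ENDPOINT is already set.

    huggingface_hub reads HF_ENDPOINT when it is first imported, so this must run
    before anything imports `datasets` or `huggingface_hub`.
    """
    os.environ.setdefault("HF_ENDPOINT", endpoint)
    return os.environ["HF_ENDPOINT"]


def load_struct_text(config: str, split: str = "train"):
    """Hypothetical helper: load one configuration/split of StructText (network required)."""
    use_mirror()
    # Imported lazily so HF_ENDPOINT is already set when huggingface_hub is first loaded.
    from datasets import get_dataset_config_names, load_dataset

    # Validate against the real configuration names instead of hard-coding guesses.
    available = get_dataset_config_names(REPO_ID)
    if config not in available:
        raise ValueError(f"unknown config {config!r}; available: {available}")
    return load_dataset(REPO_ID, config, split=split)
```

For example, after calling `use_mirror()`, `get_dataset_config_names("ibm-research/struct-text")` lists the available names, and any of them can be passed to `load_struct_text`. The split name "train" follows the card's description; `get_dataset_split_names` can confirm the exact split names for a given configuration.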



