DuongTrongChi/chunked-tokenized-legal-docs
Source: Hugging Face · Updated: 2025-06-17 · Indexed: 2025-10-25
Download link:
https://hf-mirror.com/datasets/DuongTrongChi/chunked-tokenized-legal-docs
Resource summary:
The dataset contains multiple configurations, each with its own number of examples and size in bytes. Every configuration has a single feature, input_ids, a sequence of 32-bit integers. The configurations are eval, qwen-1.7-8k-eval, qwen-1.7-8k-train, qwen-4-4k-eval, qwen-4-4k-train, qwen-4-8k-train, and train. Each configuration's data files sit under their own path and are split for training, and the download size varies by configuration. The configuration names suggest a connection to particular Qwen versions, possibly marking different sizes or versions of a dataset for Chinese question answering or similar tasks. The README gives no description of the dataset's purpose or content beyond these details.
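The configuration names listed in the summary can be handled programmatically. The sketch below is illustrative only: `parse_config` and `load_config` are hypothetical helpers, not part of the dataset or of any library. Reading a name such as qwen-1.7-8k-train as "tokenizer family, chunk length, role" is an assumption based on the naming pattern alone, and the split name "train" is assumed from the summary's note that the files are split for training.

```python
from typing import Optional, TypedDict

REPO_ID = "DuongTrongChi/chunked-tokenized-legal-docs"

# Configuration names exactly as listed in the dataset summary.
CONFIGS = [
    "eval",
    "qwen-1.7-8k-eval",
    "qwen-1.7-8k-train",
    "qwen-4-4k-eval",
    "qwen-4-4k-train",
    "qwen-4-8k-train",
    "train",
]


class ConfigInfo(TypedDict):
    tokenizer: Optional[str]  # e.g. "qwen-1.7" (assumed meaning)
    chunk: Optional[str]      # e.g. "8k" (assumed to be the chunk length)
    role: str                 # "train" or "eval"


def parse_config(name: str) -> ConfigInfo:
    """Split a config name into its assumed parts (hypothetical helper)."""
    parts = name.split("-")
    role = parts[-1]
    if len(parts) == 1:
        # Bare "train" / "eval" carry no tokenizer or chunk information.
        return {"tokenizer": None, "chunk": None, "role": role}
    # e.g. ["qwen", "1.7", "8k", "train"] -> "qwen-1.7", "8k", "train"
    return {"tokenizer": "-".join(parts[:-2]), "chunk": parts[-2], "role": role}


def load_config(name: str):
    """Load one configuration from the Hub (needs `datasets` and network access).

    The split name "train" is an assumption; adjust it if the repository
    exposes a different split.
    """
    from datasets import load_dataset  # imported lazily so parsing works offline

    return load_dataset(REPO_ID, name, split="train")
```

If the assumptions hold, `load_config("qwen-4-8k-train")` would return a dataset whose rows each carry one `input_ids` list, and `parse_config` can be used to group configurations by tokenizer or chunk length before loading any of them.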
Provider:
DuongTrongChi



