jcarbonnell/preTrainingNEAR

Name: jcarbonnell/preTrainingNEAR
Creator: jcarbonnell
Published: 2024-05-23 00:00:29
License: apache-2.0

Source: Hugging Face. Updated 2024-05-23. Indexed 2024-06-12.

Download link:

https://hf-mirror.com/datasets/jcarbonnell/preTrainingNEAR


Resource description:

---
language:
- en
license: apache-2.0
dataset_info:
  features:
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 6855034
    num_examples: 1022
  - name: val
    num_bytes: 850752
    num_examples: 114
  download_size: 3998284
  dataset_size: 7705786
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: val
    path: data/val-*
---

This dataset is a subset of the original nearData dataset, prepared for the continued pre-training of a pre-trained LLM. The idea behind continued pre-training is to expose an already pre-trained model to domain-specific information, in this case about the Near Protocol blockchain, before fine-tuning it.

The preTrainingNEAR dataset was prepared from local text files using the datasets library from Hugging Face. It includes:

- nearBlog: 481 blog articles from Near Blog collected on March 13th, 2024.
- nearBosWebEngine: 13 docs files from the Near BOS Web Engine collected on May 21st, 2024.
- nearDocs: 395 docs files from Near Docs collected on March 13th, 2024.
- nearNEPs: 124 docs files from the NEAR Enhancement Proposals (NEPs) collected on May 21st, 2024.
- nearNode: 40 docs files from the Near Node Docs collected on May 21st, 2024.
- nearPapers: 3 papers from the Near Papers collected on May 21st, 2024.
- nearWiki: 98 docs from the Near Wiki collected on May 21st, 2024.
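The description says the dataset was built from local text files and published with a train split and a val split, but it does not show the preparation script. The following is a minimal pure-Python sketch of that kind of preparation, not the author's code. The `.txt` extension, the function names, the seed, and the 10% validation fraction are all assumptions; the fraction was chosen only because it reproduces the published split sizes (1022 and 114) for a corpus of 1136 rows.

```python
import math
import random
from pathlib import Path


def read_text_files(root):
    """Return the contents of every .txt file under `root`, one string per file.

    The .txt extension is an assumption; the source only says "local text files".
    """
    return [p.read_text(encoding="utf-8") for p in sorted(Path(root).rglob("*.txt"))]


def train_val_split(texts, val_fraction=0.1, seed=0):
    """Shuffle `texts` and split them into train and val rows with a `text` field."""
    rows = [{"text": t} for t in texts]
    random.Random(seed).shuffle(rows)
    n_val = math.ceil(val_fraction * len(rows))  # validation size, rounded up
    return {"train": rows[n_val:], "val": rows[:n_val]}


# Split sizes for a stand-in corpus with as many rows as the published dataset.
splits = train_val_split([f"doc {i}" for i in range(1136)])
print(len(splits["train"]), len(splits["val"]))  # prints "1022 114"
```

Each row carries a single `text` string, which matches the one feature declared in the dataset card.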

Provided by:

jcarbonnell

Summary of original information

Dataset overview

Basic information

Language: English
License: Apache-2.0

Dataset features

Feature name: text
Data type: string

Dataset splits

Training set
- Number of examples: 1022
- Data size: 6855034 bytes
Validation set
- Number of examples: 114
- Data size: 850752 bytes

Dataset size

Download size: 3998284 bytes
Total data size: 7705786 bytes
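As a quick consistency check on these figures, the short sketch below recomputes the total from the two split sizes and derives the average example size and the download-to-dataset ratio. All numbers are copied from the dataset card; the variable names are illustrative only.

```python
# Figures copied from the dataset card.
splits = {
    "train": {"num_bytes": 6855034, "num_examples": 1022},
    "val": {"num_bytes": 850752, "num_examples": 114},
}
download_size = 3998284
dataset_size = 7705786

total_bytes = sum(s["num_bytes"] for s in splits.values())
total_examples = sum(s["num_examples"] for s in splits.values())
print(total_bytes == dataset_size)  # prints "True"
print(total_examples)  # prints "1136"

# Average size of one example in each split.
for name, s in splits.items():
    avg = s["num_bytes"] / s["num_examples"]
    print(f"{name}: {avg:.0f} bytes per example")  # train: 6707, val: 7463

ratio = download_size / dataset_size
print(f"download / dataset size: {ratio:.2f}")  # prints "download / dataset size: 0.52"
```

The download is roughly half the size of the dataset, presumably because the hosted files are stored in a compressed format; the listing itself does not say which.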

Data file configuration

Default configuration
- Training set path: data/train-*
- Validation set path: data/val-*
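The two paths are glob patterns: each split is made up of every file in the repository whose path matches its pattern. The sketch below illustrates the idea with Python's standard `fnmatch` module as a stand-in for whatever matcher the hosting library actually uses. The file names in `repo_files` are hypothetical and are not taken from the repository.

```python
from fnmatch import fnmatchcase

# Split-to-pattern mapping copied from the default configuration.
data_files = {"train": "data/train-*", "val": "data/val-*"}

# Hypothetical repository listing, for illustration only.
repo_files = [
    "README.md",
    "data/train-00000-of-00001.parquet",
    "data/val-00000-of-00001.parquet",
]


def resolve(patterns, files):
    """Map each split name to the files whose path matches its glob pattern."""
    return {
        split: [f for f in files if fnmatchcase(f, pattern)]
        for split, pattern in patterns.items()
    }


resolved = resolve(data_files, repo_files)
for split, matches in resolved.items():
    print(split, matches)
```

A wildcard pattern lets a split be stored as one file or as several shards without changing the configuration.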

Data sources

nearBlog: 481 blog articles
nearBosWebEngine: 13 docs
nearDocs: 395 docs
nearNEPs: 124 docs
nearNode: 40 docs
nearPapers: 3 papers
nearWiki: 98 docs
