
InnerI/ATGCT-NNNT

Source: Hugging Face. Updated 2024-05-19. Indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/InnerI/ATGCT-NNNT
Resource description:
---
license: mit
size_categories:
- 1K<n<10K
---

# ATGCT Neural Network Training Data

## Overview

The ATGCT Neural Network Training Data is a dataset designed for training machine learning models, particularly in the context of advanced communication technologies like neutrino communication. This dataset contains 10,000 entries, each with a category, a question, and a response. It is structured to facilitate the development of language models that can understand and generate text related to scientific research, project details, and technical descriptions.

## Dataset Details

NSF Awards CSV data source: https://www.nsf.gov/awardsearch/ExportResultServlet?exportType=csv

### Structure

The dataset is provided in CSV format with the following columns:

- **Category**: The research area or organization related to the question and response.
- **Question**: A question pertaining to the project's title, methodology, findings, or implications.
- **Response**: A detailed response that provides insights or answers related to the question.

### Sample Entries

Here are a few examples of the data entries included in the dataset. The `Category` values are abbreviations used in the [NSF (National Science Foundation) awards data](https://www.nsf.gov/awardsearch/advancedSearchResult?ProgEleCode=011Y00%2C123300%2C124200%2C124400%2C125200%2C128500%2C128800%2C160900%2C164300%2C167800%2C511500&BooleanElement=Any&BooleanRef=Any&ActiveAwards=true#results):

- **OPP** stands for Office of Polar Programs. This office supports research in polar regions, including Antarctica and the Arctic. Research areas include glaciology, atmospheric and climate sciences, and ecosystem studies, among others.
- **PHY** stands for Division of Physics. This division supports research in all areas of physics, from fundamental theoretical and experimental studies to applied physics. Areas of interest include particle physics, nuclear physics, gravitational physics, and quantum mechanics, among others.

These abbreviations indicate the specific NSF organizations or divisions responsible for funding and overseeing the research projects described in the dataset.

| Category | Question | Response |
|----------|----------|----------|
| PHY | What is the significance of IceCube Data Analysis in the U.S. 2022-2026? | This award provides funding for U.S. scientists participating in the IceCube Neutrino Observatory. |
| OPP | How does the project High-Latitude Conjugate Study contribute to the field of Astrophysics? | The project studies the interactions between the Earth's magnetosphere and the solar wind. |
| PHY | Can you explain the findings of Research in Classical and Quantum Gravity? | This award supports research in gravitational theories, including investigations into the nature of spacetime. |

## Usage

### Loading the Dataset

You can load the dataset using the `datasets` library from Hugging Face:

```python
from datasets import load_dataset

# Repository ID plus the CSV file name given in this README
dataset = load_dataset('InnerI/ATGCT-NNNT', data_files='ATGCT_Neural_Network_Training_Data.csv')
```

### Example

```python
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForQuestionAnswering
from transformers import TrainingArguments, Trainer

# Load dataset (a single CSV file loads as one 'train' split)
dataset = load_dataset('InnerI/ATGCT-NNNT', data_files='ATGCT_Neural_Network_Training_Data.csv')

# Hold out part of the rows for evaluation (the 10% fraction is an arbitrary choice)
dataset = dataset['train'].train_test_split(test_size=0.1, seed=42)

# Load pre-trained model and tokenizer
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModelForQuestionAnswering.from_pretrained('bert-base-uncased')

# Tokenize the inputs; pad to a fixed length so batches can be stacked
def preprocess_function(examples):
    return tokenizer(examples['Question'], examples['Response'],
                     truncation=True, padding='max_length')

tokenized_dataset = dataset.map(preprocess_function, batched=True)

# Training arguments
training_args = TrainingArguments(
    output_dir='./results',
    evaluation_strategy='epoch',  # named `eval_strategy` in recent transformers releases
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
)

# Initialize Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset['train'],
    eval_dataset=tokenized_dataset['test'],
)

# Train the model
# NOTE: the extractive-QA head only computes a loss when `start_positions` and
# `end_positions` labels are present. The preprocessing above does not create
# them, so add such labels (or switch to a different model head) before training.
trainer.train()
```

## Citation

```bibtex
@dataset{atgct_neural_network_training_data,
  author = {Your Name},
  title = {ATGCT Neural Network Training Data},
  year = {2024},
  url = {https://huggingface.co/datasets/your-dataset-name},
}
```

## Contact

For any questions or issues, please contact [your-email@example.com] or [I](mailto:i@innerinetcompany.com).

### Instructions:

1. **Update the Paths**: Ensure that the paths in the example code snippets are correct and point to the actual location of your dataset.
2. **Provide Contact Information**: Update the contact information section with your details.
3. **Upload to Hugging Face**: Once you have your dataset ready, follow the Hugging Face [dataset upload guide](https://huggingface.co/docs/datasets/share) to upload and share your dataset.

This README.md provides an overview of the dataset, details its structure, and includes example code for loading and using the dataset.

> The goals of WoU-MMA are to build the capabilities and accelerate the synergy between observations and theory to realize integrated, multi-messenger astrophysical explorations of the Universe.
> [https://t.co/JXYUiPrAW7](https://t.co/JXYUiPrAW7)
>
> 𓆣 Inner⚕I⚕NetCompany/ 🤝, 🐘🕸.arweave.dev (@innerinetco), [May 19, 2024](https://twitter.com/innerinetco/status/1792008375254856183?ref_src=twsrc%5Etfw)
Provider:
InnerI
Summary of original information

ATGCT Neural Network Training Data

Overview

ATGCT Neural Network Training Data is a comprehensive dataset containing 10,000 entries, each with a category, a question, and a response. It is designed for training machine learning models, particularly in the context of advanced communication technologies like neutrino communication. The dataset aims to facilitate the development of language models that can understand and generate text related to scientific research, project details, and technical descriptions.

Dataset Details

Structure

The dataset is provided in CSV format with the following columns:

  • Category: The research area or organization related to the question and response.
  • Question: A question pertaining to the project's title, methodology, findings, or implications.
  • Response: A detailed response that provides insights or answers related to the question.
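
The three-column layout can be checked before any training run. The following is a minimal sketch using only the Python standard library; it assumes the CSV header uses the column names exactly as listed above, and `read_entries` is a hypothetical helper name, not part of the dataset or of any library.

```python
import csv
import io

# Column names as listed in the dataset description (assumed to match the CSV header)
EXPECTED_COLUMNS = ["Category", "Question", "Response"]

def read_entries(csv_text):
    """Parse CSV text into a list of dicts, verifying the header first."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)

# One row taken from the sample entries; the response contains a comma,
# so it is quoted the way a CSV writer would quote it.
sample_csv = (
    "Category,Question,Response\n"
    "PHY,Can you explain the findings of Research in Classical and Quantum Gravity?,"
    '"This award supports research in gravitational theories, '
    'including investigations into the nature of spacetime."\n'
)

entries = read_entries(sample_csv)
print(entries[0]["Category"], "|", entries[0]["Question"])
```

Using `csv.DictReader` rather than splitting on commas matters here because responses are free text and may themselves contain commas.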

Sample Entries

  • Category: PHY (Division of Physics)

    • Question: What is the significance of IceCube Data Analysis in the U.S. 2022-2026?
    • Response: This award provides funding for U.S. scientists participating in the IceCube Neutrino Observatory.
  • Category: OPP (Office of Polar Programs)

    • Question: How does the project High-Latitude Conjugate Study contribute to the field of Astrophysics?
    • Response: The project studies the interactions between the Earth's magnetosphere and the solar wind.
  • Category: PHY (Division of Physics)

    • Question: Can you explain the findings of Research in Classical and Quantum Gravity?
    • Response: This award supports research in gravitational theories, including investigations into the nature of spacetime.
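
The category codes in these samples can be expanded and tallied with a small lookup table. This is an illustrative sketch only: the mapping covers just the two codes named in this page (PHY and OPP), and `CATEGORY_NAMES` and `describe_category` are hypothetical names introduced here.

```python
from collections import Counter

# Only the two abbreviations explained on this page; the full dataset may use others.
CATEGORY_NAMES = {
    "PHY": "Division of Physics",
    "OPP": "Office of Polar Programs",
}

def describe_category(code):
    """Return 'CODE (Full Name)' when the code is known, else the bare code."""
    name = CATEGORY_NAMES.get(code)
    return f"{code} ({name})" if name else code

# Category column of the three sample entries, in order
sample_categories = ["PHY", "OPP", "PHY"]

counts = Counter(describe_category(code) for code in sample_categories)
for label, count in counts.items():
    print(f"{label}: {count}")
```

Unknown codes fall through unchanged, so the tally still works on rows whose division is not in the table.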

Usage

The dataset can be loaded using the datasets library from Hugging Face:

    from datasets import load_dataset

    # Repository ID plus the CSV file name given by the dataset card
    dataset = load_dataset('InnerI/ATGCT-NNNT', data_files='ATGCT_Neural_Network_Training_Data.csv')
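
A single CSV file loaded through `load_dataset` normally arrives as one `train` split, so an evaluation set has to be held out separately. The sketch below shows the idea with the standard library only; `holdout_split`, the 20% fraction, and the placeholder rows are assumptions made for illustration, not part of the dataset.

```python
import random

def holdout_split(rows, test_fraction=0.1, seed=0):
    """Shuffle a copy of rows deterministically and split off a test portion."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    # First n_test shuffled rows become the test set, the rest the training set
    return shuffled[n_test:], shuffled[:n_test]

# Placeholder rows with the dataset's three columns (contents are dummy values)
rows = [
    {"Category": "PHY", "Question": f"question {i}", "Response": f"response {i}"}
    for i in range(10)
]

train_rows, test_rows = holdout_split(rows, test_fraction=0.2, seed=0)
print(len(train_rows), len(test_rows))
```

When working with the `datasets` library directly, the `Dataset.train_test_split` method serves the same purpose.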

Citation

    @dataset{atgct_neural_network_training_data,
      author = {Your Name},
      title = {ATGCT Neural Network Training Data},
      year = {2024},
      url = {https://huggingface.co/datasets/your-dataset-name},
    }
