
KVN-AI/CannaNBot

Source: Hugging Face. Updated 2023-03-12. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/KVN-AI/CannaNBot
Resource description:
import torch
from transformers import RobertaTokenizer, RobertaForQuestionAnswering
from sklearn.model_selection import train_test_split
import pandas as pd

# Load the pre-trained RoBERTa tokenizer and model
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
model = RobertaForQuestionAnswering.from_pretrained('roberta-base')

# Define the preprocessed data as a list of tuples, where the first element
# is the question and the second element is the answer
data = [
    ("What types of products do you sell?",
     "We offer a wide range of cannabis products, including dried flowers, pre-rolls, oils, capsules, and accessories."),
    ("How do I place an order?",
     "You can place an order on our website or by calling our customer service line."),
    ("How long will it take to receive my order?",
     "Orders are typically processed and shipped within 1-2 business days. Shipping times vary depending on your location and the shipping method you choose."),
    ("What payment methods do you accept?",
     "We accept Visa, Mastercard, and American Express credit cards, as well as Visa Debit and Mastercard Debit cards. You can also pay with an electronic funds transfer (EFT) from your bank account."),
    ("Do you offer free shipping?",
     "We offer free shipping on orders over $99 (before taxes and shipping fees) to most locations in Canada."),
    ("Can I track my order?",
     "Yes, you will receive a confirmation email with a tracking number once your order has shipped. You can use the tracking number to check the status of your order on the carrier's website."),
    ("What are your store hours?",
     "Our stores are open Monday to Sunday from 10:00am to 9:00pm."),
    ("Where are your store locations?",
     "We have several retail locations across New Brunswick. You can find a list of our stores and their addresses on our website."),
    ("What is your return policy?",
     "We do not accept returns on cannabis products due to health and safety regulations. However, if you receive a damaged or defective product, please contact our customer service team for assistance."),
    ("What are the legal regulations for buying and using cannabis in New Brunswick?",
     "In New Brunswick, you must be 19 years or older to purchase and use cannabis. It is illegal to drive under the influence of cannabis, and you can face fines and penalties for doing so. You can possess up to 30 grams of dried cannabis or the equivalent in public, and up to 150 grams in your home. It is also illegal to buy or sell cannabis from anyone other than an authorized retailer. For more information, please visit the Government of New Brunswick's website."),
]

# Convert the data into a pandas dataframe
df = pd.DataFrame(data, columns=["Question", "Answer"])

# Split the data into training, validation, and testing datasets
train_data, test_data = train_test_split(df, test_size=0.2, random_state=42)
train_data, val_data = train_test_split(train_data, test_size=0.2, random_state=42)

# Tokenize the input sequences and convert them to tensors
train_input_ids = tokenizer.batch_encode_plus(train_data.Question.tolist(), padding=True, truncation=True, max_length=512, return_tensors="pt")
val_input_ids = tokenizer.batch_encode_plus(val_data.Question.tolist(), padding=True, truncation=True, max_length=512, return_tensors="pt")
test_input_ids = tokenizer.batch_encode_plus(test_data.Question.tolist
Provider:
KVN-AI
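Summary of original information

Dataset overview

Dataset contents

  • Question-and-answer list: the dataset contains a series of questions with their corresponding answers, 10 questions in total.
  • Example questions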
    • “What types of products do you sell?”
    • “How do I place an order?”
    • “How long will it take to receive my order?”
    • “What payment methods do you accept?”
    • “Do you offer free shipping?”
    • “Can I track my order?”
    • “What are your store hours?”
    • “Where are your store locations?”
    • “What is your return policy?”
    • “What are the legal regulations for buying and using cannabis in New Brunswick?”

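Data processing

  • Data format: the data is converted to a Pandas DataFrame with two columns, “Question” and “Answer”.
  • Data split: the data is split into training, validation, and test sets. First 20% of all rows are held out as the test set; then 20% of the remaining 80% is held out as the validation set, and the rest is used for training.
  • Preprocessing: the questions are batch-encoded into input IDs with the RoBERTa tokenizer, with padding and truncation to a maximum length of 512.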
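
The two-stage split described above can be sketched as follows. This is a minimal illustration rather than the dataset author's exact code: the `questions` list is a hypothetical placeholder standing in for the 10 real questions, and only the `train_test_split` arguments (`test_size=0.2`, `random_state=42`) are taken from the description.

```python
from sklearn.model_selection import train_test_split

# Hypothetical placeholders standing in for the 10 questions in the dataset.
questions = [f"question {i}" for i in range(10)]

# Stage 1: hold out 20% of all rows as the test set.
train_val, test = train_test_split(questions, test_size=0.2, random_state=42)

# Stage 2: hold out 20% of the remaining rows as the validation set.
train, val = train_test_split(train_val, test_size=0.2, random_state=42)

split_sizes = {"train": len(train), "val": len(val), "test": len(test)}
print(split_sizes)
```

With only 10 rows, the fractional sizes are rounded to whole rows, so the split comes out as 6 training, 2 validation, and 2 test examples. Evaluation figures computed on 2 examples are very coarse.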

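Model and tools

  • Tokenizer: the pretrained RoBERTa tokenizer (roberta-base).
  • Model: the pretrained RoBERTa model (roberta-base), used for the question-answering task.
  • Other libraries: torch, transformers, sklearn.model_selection, and pandas are used for data processing and model training.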
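
For context on the question-answering task: `RobertaForQuestionAnswering` adds a span-classification head that outputs a start logit and an end logit for every input token, and the answer is read off as the highest-scoring span. The sketch below illustrates that span-selection step only. The `best_span` helper, the token list, and the logit values are all hypothetical, made up for illustration, and no model is loaded.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return the (start, end) token indices that maximize
    start_logits[start] + end_logits[end], subject to start <= end."""
    best = None
    best_score = float("-inf")
    for s, s_score in enumerate(start_logits):
        # Only consider spans that begin at s and are at most max_answer_len tokens long.
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_score + end_logits[e]
            if score > best_score:
                best_score = score
                best = (s, e)
    return best


# Hypothetical context tokens and made-up logits, for illustration only.
tokens = ["Our", "stores", "are", "open", "from", "10:00am", "to", "9:00pm", "."]
start_logits = [0.1, 0.0, 0.2, 0.3, 0.5, 4.0, 0.2, 1.0, 0.0]
end_logits = [0.0, 0.1, 0.0, 0.2, 0.1, 1.5, 0.3, 3.8, 0.2]

start, end = best_span(start_logits, end_logits)
answer = " ".join(tokens[start:end + 1])
print(answer)
```

In a real pipeline the logits would come from the model's output for a tokenized question and context pair, and the selected token span would be decoded back to text with the tokenizer.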