muneeb-py/autotrain-data-medocr-berta
Source: Hugging Face. Updated 2023-04-13; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/muneeb-py/autotrain-data-medocr-berta
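Summary:
This dataset was automatically processed by AutoTrain for the project medocr-berta. Its BCP-47 language code is unk, meaning the language is unknown. The fields are context, question, answer text, answer start position, an unnamed feature, image path, and a context feature. The dataset is split into a training set of 773 samples and a validation set of 194 samples.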
Provider:
muneeb-py
Original information summary
Dataset overview
Dataset name
- AutoTrain Dataset for project: medocr-berta
Language
- BCP-47 code: unk
Dataset structure
- Example data instances (JSON):

  [
    {
      "context": "[597, 612]",
      "question": "What are the contents?",
      "answers.text": ["INCL.ALL TAXES"],
      "answers.answer_start": [597],
      "feat_Unnamed: 0": [831],
      "feat_image_path": "/content/train/images/20221121_165501_71f9_jpg.rf.446486cc21bc5e8b824c6554a12e5dfd.jpg",
      "feat_context": "..."
    },
    {
      "context": "[100, 111]",
      "question": "What is Expiry Date?",
      "answers.text": ["EXP.APR.25"],
      "answers.answer_start": [100],
      "feat_Unnamed: 0": [25],
      "feat_image_path": "/content/train/images/20221121_161649_7e1f_jpg.rf.f9286b7e77dc417400d771a1274f3ab8.jpg",
      "feat_context": "..."
    }
  ]
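In both example instances the "context" field is a string holding a two-number list, and its first number equals the value in "answers.answer_start". The sketch below parses that string and checks the match for the two records shown. The helper name parse_pair is hypothetical, and what the two numbers mean (for example, a character span) is not stated in the source, so the code only compares values and does not interpret them.

```python
import json

# The two records from the example above. "feat_context" is elided ("...") in
# the source, so it is left out here rather than guessed.
records = [
    {
        "context": "[597, 612]",
        "question": "What are the contents?",
        "answers.text": ["INCL.ALL TAXES"],
        "answers.answer_start": [597],
    },
    {
        "context": "[100, 111]",
        "question": "What is Expiry Date?",
        "answers.text": ["EXP.APR.25"],
        "answers.answer_start": [100],
    },
]


def parse_pair(context: str) -> tuple:
    """Parse a context string such as "[597, 612]" into a pair of integers.

    Hypothetical helper: the source does not document the context format.
    """
    first, second = json.loads(context)
    return int(first), int(second)


pairs = [parse_pair(r["context"]) for r in records]

# Does the first number in "context" equal the recorded answer start?
starts_match = [
    pair[0] == r["answers.answer_start"][0] for pair, r in zip(pairs, records)
]

for r, pair, ok in zip(records, pairs, starts_match):
    print(r["question"], pair, "start matches" if ok else "start differs")
```

Whether this relationship holds for the remaining rows can only be confirmed by running the same check over the full dataset.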
Dataset fields
- Field list (JSON):

  {
    "context": "Value(dtype=string, id=None)",
    "question": "Value(dtype=string, id=None)",
    "answers.text": "Sequence(feature=Value(dtype=string, id=None), length=-1, id=None)",
    "answers.answer_start": "Sequence(feature=Value(dtype=int32, id=None), length=-1, id=None)",
    "feat_Unnamed: 0": "Sequence(feature=Value(dtype=int64, id=None), length=-1, id=None)",
    "feat_image_path": "Sequence(feature=Value(dtype=string, id=None), length=-1, id=None)",
    "feat_context": "Sequence(feature=Value(dtype=string, id=None), length=-1, id=None)"
  }
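The field list can be turned into a small record check. The sketch below is standard-library Python only and makes two assumptions of its own: Value(dtype=string) maps to str, and a Sequence of T maps to a Python list of T. The names SCHEMA and schema_errors are hypothetical and are not part of AutoTrain or the datasets library.

```python
# Expected Python-level types, derived from the field list above.
# A one-element list such as [str] stands for "sequence of str".
SCHEMA = {
    "context": str,
    "question": str,
    "answers.text": [str],
    "answers.answer_start": [int],
    "feat_Unnamed: 0": [int],
    "feat_image_path": [str],
    "feat_context": [str],
}


def schema_errors(record: dict) -> list:
    """Return a list of human-readable problems; an empty list means the record fits."""
    errors = []
    for name, expected in SCHEMA.items():
        if name not in record:
            errors.append(f"missing field: {name}")
            continue
        value = record[name]
        if isinstance(expected, list):
            item_type = expected[0]
            if not isinstance(value, list):
                errors.append(f"{name}: expected a list, got {type(value).__name__}")
            elif not all(isinstance(v, item_type) for v in value):
                errors.append(f"{name}: expected every item to be {item_type.__name__}")
        elif not isinstance(value, expected):
            errors.append(f"{name}: expected {expected.__name__}, got {type(value).__name__}")
    return errors


# Illustrative record: values come from the first example instance, with the
# two feat_* string fields wrapped in lists to follow the declared Sequence
# types. "..." is the elided placeholder from the source, kept as-is.
sample = {
    "context": "[597, 612]",
    "question": "What are the contents?",
    "answers.text": ["INCL.ALL TAXES"],
    "answers.answer_start": [597],
    "feat_Unnamed: 0": [831],
    "feat_image_path": [
        "/content/train/images/20221121_165501_71f9_jpg.rf.446486cc21bc5e8b824c6554a12e5dfd.jpg"
    ],
    "feat_context": ["..."],
}
sample_errors = schema_errors(sample)
print(sample_errors)
```

The example instances print "feat_image_path" and "feat_context" as bare strings while the field list declares them as sequences, so a check like this would flag the example records exactly as printed. The source does not say which form the stored data actually uses.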
Dataset splits
- Split details:

  Split name   Num samples
  train        773
  valid        194
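The split sizes above work out to roughly an 80/20 division between training and validation. A short sketch of that arithmetic, using only the two counts given in the table:

```python
# Sample counts taken from the split table above.
splits = {"train": 773, "valid": 194}

total = sum(splits.values())
fractions = {name: count / total for name, count in splits.items()}

for name, count in splits.items():
    print(f"{name}: {count} samples ({fractions[name]:.1%} of {total})")
```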



