vichetkao/table_translation_eng_kh_100k
Source: Hugging Face. Updated 2026-04-10; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/vichetkao/table_translation_eng_kh_100k
Resource description:
---
dataset_info:
  features:
  - name: image_eng
    dtype: image
  - name: image_kh
    dtype: image
  - name: text_eng
    dtype: string
  - name: text_kh
    dtype: string
  splits:
  - name: train
    num_bytes: 8060700610
    num_examples: 90000
  - name: test
    num_bytes: 900873813
    num_examples: 10000
  download_size: 8961574423
  dataset_size: 8961574423
configs:
- config_name: default
  data_files:
  - split: train
    path: data/hf_tables_train.parquet
  - split: test
    path: data/hf_tables_test.parquet
license: apache-2.0
task_categories:
- table-to-text
- translation
language:
- en
- km
size_categories:
- 100K<n<1M
---
# Multilingual Chart Dataset
## Dataset Overview
- **Total examples:** 100,000 (train: 90,000, test: 10,000)
- **Total size:** about 8.96 GB (8,961,574,423 bytes; train: 8,060,700,610, test: 900,873,813)
- **Languages:** English (en), Khmer (km)
- **Chart types:** 22+ types (bar, line, pie, gauge, heatmap, candlestick, etc.)
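The size figures can be cross-checked against the `dataset_info` front matter. The following is a minimal sketch that only does arithmetic on the byte and example counts listed there; the variable names are illustrative and not part of the dataset.

```python
# Byte and example counts copied from the dataset_info front matter.
SPLITS = {
    "train": {"num_bytes": 8_060_700_610, "num_examples": 90_000},
    "test": {"num_bytes": 900_873_813, "num_examples": 10_000},
}
DATASET_SIZE = 8_961_574_423  # dataset_size from the front matter

total_bytes = sum(split["num_bytes"] for split in SPLITS.values())
total_examples = sum(split["num_examples"] for split in SPLITS.values())

size_gb = total_bytes / 10**9   # decimal gigabytes
size_gib = total_bytes / 2**30  # binary gibibytes
avg_bytes_per_example = total_bytes / total_examples

print(f"examples: {total_examples}")
print(f"size: {size_gb:.2f} GB ({size_gib:.2f} GiB)")
print(f"average per example: {avg_bytes_per_example / 1000:.1f} kB")
```

Each example holds two images and two text strings, so the per-example average covers all four fields together.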
## Features
- **image_eng**: PNG image of the chart with English text
- **image_kh**: PNG image of the same chart with Khmer text
- **text_eng**: JSON string with the chart metadata in English (chart_type, title, axis labels, data)
- **text_kh**: JSON string with the chart metadata in Khmer (chart_type, title, axis labels, data)
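```python
import io
import json

import pandas as pd
from PIL import Image

# Path as listed in the dataset config; adjust it to your local copy.
df = pd.read_parquet('data/hf_tables_train.parquet')
row = df.iloc[0]

# Assumption: an image cell may load as raw bytes or as a {'bytes': ..., 'path': ...} dict.
raw = row['image_eng']
if isinstance(raw, dict):
    raw = raw['bytes']
image = Image.open(io.BytesIO(raw))      # English chart image
metadata = json.loads(row['text_eng'])   # English chart metadata as a dict
```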
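Because `text_eng` and `text_kh` describe the same chart, a row can be turned into an English/Khmer text pair by parsing both JSON strings and reading the same field from each. The sketch below is one possible way to do this, not the author's pipeline. The helper `make_translation_pair` is hypothetical, the key name `title` is assumed from the field list above, and the sample records are made up for illustration.

```python
import json


def make_translation_pair(text_eng, text_kh, field="title"):
    """Return (english, khmer) values of one metadata field, or None if either is missing."""
    eng = json.loads(text_eng)
    kh = json.loads(text_kh)
    if field not in eng or field not in kh:
        return None
    return eng[field], kh[field]


# Hypothetical records, made up for illustration; real rows may use other keys.
sample_eng = json.dumps({"chart_type": "bar", "title": "Monthly Sales"})
sample_kh = json.dumps(
    {"chart_type": "bar", "title": "ការលក់ប្រចាំខែ"}, ensure_ascii=False
)

pair = make_translation_pair(sample_eng, sample_kh)
print(pair)
```

With a loaded DataFrame, the same helper could be applied row by row, for example `make_translation_pair(row['text_eng'], row['text_kh'])`. Returning `None` for a missing field keeps malformed rows easy to filter out.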
Provided by:
vichetkao



