harfoush4/arabic-eou-dataset
Source: Hugging Face | Updated: 2025-12-12 | Indexed: 2025-12-20
Download link:
https://hf-mirror.com/datasets/harfoush4/arabic-eou-dataset
Resource description:
---
dataset_info:
  features:
  - name: text
    dtype: string
  - name: eou
    dtype: int64
  - name: source
    dtype: string
  - name: word_position
    dtype: int64
  - name: total_words
    dtype: int64
  - name: is_final
    dtype: bool
  - name: word_count
    dtype: int64
  - name: char_count
    dtype: int64
  - name: position_ratio
    dtype: float64
  - name: has_punctuation
    dtype: bool
  - name: ends_with_period
    dtype: bool
  - name: ends_with_question
    dtype: bool
  - name: ends_with_exclamation
    dtype: bool
  - name: is_single_backchannel
    dtype: bool
  - name: ends_with_continuation
    dtype: bool
  - name: starts_with_question
    dtype: bool
  - name: has_strong_eou
    dtype: bool
  - name: has_saudi_marker
    dtype: bool
  - name: is_very_short
    dtype: bool
  - name: is_short
    dtype: bool
  - name: is_medium
    dtype: bool
  - name: is_long
    dtype: bool
  splits:
  - name: train
    num_bytes: 3782488
    num_examples: 35739
  - name: validation
    num_bytes: 811094
    num_examples: 7659
  - name: test
    num_bytes: 810530
    num_examples: 7659
  download_size: 709554
  dataset_size: 5404112
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
  - split: test
    path: data/test-*
---
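
The split sizes in the metadata above can be sanity-checked with a little arithmetic. The sketch below copies the `num_examples` and `num_bytes` values from the card, computes each split's share of the data (roughly 70% / 15% / 15%), and confirms that the per-split byte counts add up to `dataset_size`. The helper names (`split_fractions`, `total_bytes`, `load_split`) are illustrative and not part of the dataset. `load_split` assumes the Hugging Face `datasets` package is installed and the repository is reachable, so it is defined here but not called.

```python
# Values copied from the `splits` entries in the card metadata.
SPLITS = {
    "train": {"num_bytes": 3782488, "num_examples": 35739},
    "validation": {"num_bytes": 811094, "num_examples": 7659},
    "test": {"num_bytes": 810530, "num_examples": 7659},
}
DATASET_SIZE = 5404112  # `dataset_size` from the card metadata


def split_fractions(splits):
    """Return each split's share of the total number of examples."""
    total = sum(s["num_examples"] for s in splits.values())
    return {name: s["num_examples"] / total for name, s in splits.items()}


def total_bytes(splits):
    """Sum the per-split `num_bytes` values."""
    return sum(s["num_bytes"] for s in splits.values())


def load_split(split="train"):
    """Load one split from the Hub.

    Assumption: requires the `datasets` package and network access;
    imported lazily so the arithmetic above runs without either.
    """
    from datasets import load_dataset

    return load_dataset("harfoush4/arabic-eou-dataset", split=split)


if __name__ == "__main__":
    for name, frac in split_fractions(SPLITS).items():
        print(f"{name}: {frac:.1%}")
    print("bytes consistent:", total_bytes(SPLITS) == DATASET_SIZE)
```

The `download_size` (709554 bytes) is much smaller than `dataset_size`, presumably because the files matched by `data/train-*` and the other `path` patterns are stored compressed.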
Provider:
harfoush4



