kardosdrur/opensubtitles-da-sv
Source: Hugging Face. Updated 2023-10-26. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/kardosdrur/opensubtitles-da-sv
Resource description:
---
license: mit
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
dataset_info:
  features:
  - name: link_id
    dtype: string
  - name: da
    dtype: string
  - name: 'no'
    dtype: string
  - name: overlap
    dtype: float64
  splits:
  - name: train
    num_bytes: 270499727.08648384
    num_examples: 1772983
  - name: test
    num_bytes: 67624969.91351616
    num_examples: 443246
  download_size: 201404638
  dataset_size: 338124697.0
---
# OpenSubtitles Danish-Swedish
Aligned Danish and Swedish sentence pairs from OpenSubtitles, filtered with heuristics.
The source code for producing the dataset is included in the repository.
The dataset was created to aid training sentence transformers in the Danish Foundation Models project.
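
As a minimal sketch of how the data might be consumed, the snippet below loads one split with the Hugging Face `datasets` library and turns rows into sentence pairs. The helper names `load_split` and `to_pairs` are hypothetical and not part of the repository. Note that the metadata names the second text column `no` even though the card describes Swedish; the sketch uses the column name exactly as the metadata lists it.

```python
from typing import Any, Iterable, Iterator, Mapping, Tuple

# Repository name as given in the page title.
DATASET_ID = "kardosdrur/opensubtitles-da-sv"


def load_split(split: str = "train"):
    """Download one split ("train" or "test") from the Hugging Face Hub.

    Hypothetical helper: requires the `datasets` package and network access.
    """
    # Imported lazily so to_pairs() stays usable without the library installed.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split=split)


def to_pairs(rows: Iterable[Mapping[str, Any]]) -> Iterator[Tuple[str, str]]:
    """Yield (da, no) sentence pairs from rows with the listed feature names."""
    for row in rows:
        # Column names follow the metadata: 'da' and 'no'.
        yield row["da"], row["no"]
```

Calling `to_pairs(load_split("test"))` would stream pairs from the smaller split; the `link_id` and `overlap` columns are ignored here because the card does not define how they should be interpreted.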
Provider:
kardosdrur
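Summary of original information
OpenSubtitles Danish-Swedish dataset overview
Dataset information
- License: MIT
- Configs:
  - Default config:
    - Train split: path data/train-*
    - Test split: path data/test-*
Dataset features
- Feature list:
  - link_id: string
  - da: string
  - no: string
  - overlap: float (float64)
Dataset splits
- Train split:
  - Bytes: 270499727.08648384
  - Examples: 1772983
- Test split:
  - Bytes: 67624969.91351616
  - Examples: 443246
Dataset size
- Download size: 201404638 bytes
- Dataset size: 338124697.0 bytes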
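
The split figures above work out to roughly an 80/20 train/test division, and the per-split byte counts add up to the stated dataset size. A quick arithmetic check, with all numbers copied from the metadata (the function name `split_fractions` is just illustrative):

```python
# Figures copied from the dataset metadata.
SPLITS = {
    "train": {"num_examples": 1_772_983, "num_bytes": 270_499_727.08648384},
    "test": {"num_examples": 443_246, "num_bytes": 67_624_969.91351616},
}
DATASET_SIZE = 338_124_697.0


def split_fractions(splits):
    """Return each split's share of the total example count."""
    total = sum(s["num_examples"] for s in splits.values())
    return {name: s["num_examples"] / total for name, s in splits.items()}


fractions = split_fractions(SPLITS)
total_examples = sum(s["num_examples"] for s in SPLITS.values())
total_bytes = sum(s["num_bytes"] for s in SPLITS.values())

print(f"total examples: {total_examples}")
for name, share in fractions.items():
    print(f"{name}: {share:.4f} of examples")
print(f"bytes match dataset_size: {abs(total_bytes - DATASET_SIZE) < 1e-3}")
```

The fractional byte counts suggest the per-split sizes were derived by proportionally dividing a single total rather than measured separately, though the card does not say so.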



