multilingual_cc_news
Source: ModelScope community (魔搭社区). Updated 2025-12-05; indexed 2025-02-15.
Download link: https://modelscope.cn/datasets/intfloat/multilingual_cc_news
### Dataset Summary
This dataset is based on [CloverSearch/cc-news-mutlilingual](https://huggingface.co/datasets/CloverSearch/cc-news-mutlilingual).
We add a loading script so that the multilingual CC-News dataset can be accessed through the Hugging Face `datasets` API instead of downloading the raw data files directly.
### Data Fields
- `title`: a `string` feature.
- `maintext`: a `string` feature.
- `url`: a `string` feature.
- `date_publish`: a `string` feature.
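
As a rough illustration of the schema above, the four fields can be described with a typed record and checked for completeness. This is only a sketch: the names `NewsRecord`, `FIELDS`, and `is_complete` are hypothetical helpers, not part of the dataset or the `datasets` library, and the sample values (including the date format) are made up.

```python
from typing import Any, Mapping, TypedDict


class NewsRecord(TypedDict):
    """Hypothetical type mirroring the four documented string fields."""
    title: str
    maintext: str
    url: str
    date_publish: str


# Field names as listed in the dataset card.
FIELDS = ("title", "maintext", "url", "date_publish")


def is_complete(record: Mapping[str, Any]) -> bool:
    """Return True if every documented field is present as a non-empty string."""
    return all(
        isinstance(record.get(name), str) and bool(record[name].strip())
        for name in FIELDS
    )


# Made-up sample record; the real date format is not specified in the card.
sample: NewsRecord = {
    "title": "Example headline",
    "maintext": "Example body text.",
    "url": "https://example.com/article",
    "date_publish": "2020-01-01 00:00:00",
}
```

A check like this may be useful before downstream processing, since the card only states that each field is a string and says nothing about empty values.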
### How to use this dataset
You can load any subset of CC-News per language:
```python
from datasets import load_dataset
dataset = load_dataset("intfloat/multilingual_cc_news", languages=["af"])
```
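
If several languages are needed, one possible pattern is to clean up the requested codes and load each language separately. The sketch below assumes that the `languages` argument shown above accepts a single-element list per call; `normalize_codes` and `load_per_language` are hypothetical helper names, and lowercasing is an assumption based on the codes in this card all being lowercase.

```python
from typing import Dict, Iterable, List


def normalize_codes(codes: Iterable[str]) -> List[str]:
    """Strip whitespace, lowercase, and drop empty or repeated codes (order kept)."""
    seen = set()
    result = []
    for code in codes:
        cleaned = code.strip().lower()
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result


def load_per_language(codes: Iterable[str]) -> Dict[str, object]:
    """Load one dataset object per language code (requires network access)."""
    # Imported lazily so the helper above works without `datasets` installed.
    from datasets import load_dataset

    return {
        code: load_dataset("intfloat/multilingual_cc_news", languages=[code])
        for code in normalize_codes(codes)
    }
```

Keeping one dataset object per language makes it easy to process or cache languages independently; whether loading them together in one call is preferable depends on the use case.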
### Supported Languages
```
af
als
am
an
ar
arz
as
ast
av
az
azb
ba
bar
bcl
be
bg
bh
bn
bo
bpy
br
bs
bxr
ca
cbk
ce
ceb
ckb
co
cs
cv
cy
da
de
diq
dsb
dty
dv
el
eml
en
eo
es
et
eu
fa
fi
fr
fy
ga
gd
gl
gn
gom
gu
gv
he
hi
hif
hr
hsb
ht
hu
hy
ia
id
ie
ilo
io
is
it
ja
jbo
jv
ka
kk
km
kn
ko
krc
ku
kv
kw
ky
la
lb
lez
li
lmo
lo
lt
lv
mai
mg
mhr
min
mk
ml
mn
mr
mrj
ms
mt
mwl
my
myv
mzn
nah
nap
nds
ne
new
nl
nn
no
oc
or
os
pa
pam
pfl
pl
pms
pnb
ps
pt
qu
rm
ro
ru
sa
sah
sc
scn
sco
sd
sh
si
sk
sl
so
sq
sr
su
sv
sw
ta
te
tg
th
tk
tl
tr
tt
tyv
ug
uk
ur
uz
vec
vep
vi
vls
vo
wa
war
wuu
xal
xmf
yi
yo
yue
zh
```
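
Because the list above is long, it may help to validate requested codes before calling `load_dataset`. The following is a minimal sketch: `parse_language_list` and `unknown_codes` are hypothetical helpers, and `EXCERPT` contains only the first three codes from the list for brevity.

```python
from typing import Iterable, List, Set


def parse_language_list(block: str) -> Set[str]:
    """Turn a newline-separated block of codes into a set, skipping blank lines."""
    return {line.strip() for line in block.splitlines() if line.strip()}


def unknown_codes(requested: Iterable[str], supported: Iterable[str]) -> List[str]:
    """Return the requested codes that are not in the supported set, sorted."""
    return sorted(set(requested) - set(supported))


# Short excerpt of the supported-language list; paste the full list in practice.
EXCERPT = """
af
als
am
"""

supported = parse_language_list(EXCERPT)
```

With the full list pasted in, `unknown_codes(["af", "xx"], supported)` would report `xx` as unsupported before any download is attempted.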
Provider: maas
Created: 2025-02-12



