
The Knesset Meetings Corpus 2004-2005

Source: NIAID Data Ecosystem (indexed 2026-03-11)
Download link:
https://zenodo.org/record/2707355
Description:
The Knesset Meetings Corpus 2004-2005 is made up of two components:

- Raw texts: 282 files totaling 867,725 lines. These can be downloaded in two formats:

  As doc files, encoded using windows-1255:
  - kneset16.zip: contains 164 text files totaling 543,228 lines. [MILA host] [Github Mirror]
  - kneset17.zip: contains 118 text files totaling 324,497 lines. [MILA host] [Github Mirror]

  As txt files, encoded using UTF-8:
  - kneset.tar.gz: an archive of all the raw text files, divided into two folders: [Github mirror]
    - 16: contains 164 text files totaling 543,228 lines.
    - 17: contains 118 text files totaling 324,497 lines.
  - knesset_txt_16.tar.gz: contains 164 text files totaling 543,228 lines. [MILA host] [Github Mirror]
  - knesset_txt_17.zip: contains 118 text files totaling 324,497 lines. [MILA host] [Github Mirror]

- Tokenized and morphologically tagged texts: tagged versions exist only for the files in the 16 folder. The texts are represented using MILA's XML schema for corpora. These can be downloaded in two ways:
  - knesset_tagged_16.tar.gz: an archive of all tokenized and tagged files. [MILA host] [Archive.org mirror]
  - By cloning this repository; the unarchived versions of these files are in the knesset_tagged folder.
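Since one set of raw files is distributed in windows-1255 and the other in UTF-8, it can be useful to re-encode the former yourself. The following is a minimal sketch, not part of the corpus distribution. It assumes the windows-1255 files are plain text once unpacked (the listing does not confirm this), and the function name `reencode_to_utf8` is hypothetical.

```python
from pathlib import Path


def reencode_to_utf8(src: Path, dst: Path, src_encoding: str = "cp1255") -> str:
    """Decode `src` from `src_encoding`, write it to `dst` as UTF-8, return the text.

    Assumption: `src` is a plain-text file, not a binary Word document.
    """
    # Work on raw bytes so line endings pass through unchanged.
    raw = src.read_bytes()
    # cp1255 leaves some byte values undefined; "replace" substitutes U+FFFD
    # for those bytes instead of raising UnicodeDecodeError.
    text = raw.decode(src_encoding, errors="replace")
    dst.write_bytes(text.encode("utf-8"))
    return text
```

The function can be applied to each unpacked file in turn. Counting U+FFFD characters in the returned text is a quick way to spot files where the plain-text assumption does not hold.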
Created:
2020-01-24