The Knesset Meetings Corpus 2004-2005

NIAID Data Ecosystem, indexed 2026-03-11

Download link:

https://zenodo.org/record/2707355

Description:

The Knesset Meetings Corpus 2004-2005 is made up of two components:

- Raw texts: 282 files totaling 867,725 lines. These can be downloaded in two formats.

  As doc files, encoded in windows-1255:
  - kneset16.zip: contains 164 text files totaling 543,228 lines. [MILA host] [Github Mirror]
  - kneset17.zip: contains 118 text files totaling 324,497 lines. [MILA host] [Github Mirror]

  As txt files, encoded in UTF-8:
  - kneset.tar.gz: an archive of all the raw text files, divided into two folders: [Github mirror]
    - 16: contains 164 text files totaling 543,228 lines.
    - 17: contains 118 text files totaling 324,497 lines.
  - knesset_txt_16.tar.gz: contains 164 text files totaling 543,228 lines. [MILA host] [Github Mirror]
  - knesset_txt_17.zip: contains 118 text files totaling 324,497 lines. [MILA host] [Github Mirror]

- Tokenized and morphologically tagged texts: tagged versions exist only for the files in the 16 folder. The texts are represented using MILA's XML schema for corpora. These can be downloaded in two ways:
  - knesset_tagged_16.tar.gz: an archive of all tokenized and tagged files. [MILA host] [Archive.org mirror]
  - By cloning this repository: the unarchived version of these files is in the knesset_tagged folder.
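For readers who start from the windows-1255 archives, the following is a minimal sketch of re-encoding the extracted files to UTF-8 while counting files and lines. It assumes the archive has already been unpacked and that each extracted file is plain text; the function names, the directory layout, and the line-counting convention (counting newline bytes) are illustrative assumptions, so the totals it reports may not match the figures quoted above exactly.

```python
from pathlib import Path


def transcode_file(src: Path, dst: Path, src_encoding: str = "windows-1255") -> int:
    """Re-encode one text file to UTF-8 and return its newline count.

    Assumption: the file is plain text in `src_encoding`. Bytes that are
    undefined in that code page are replaced with U+FFFD instead of raising.
    """
    raw = src.read_bytes()
    text = raw.decode(src_encoding, errors="replace")
    dst.parent.mkdir(parents=True, exist_ok=True)
    # Write bytes directly so the original line endings are preserved.
    dst.write_bytes(text.encode("utf-8"))
    return raw.count(b"\n")


def transcode_folder(src_dir: Path, dst_dir: Path, pattern: str = "*") -> tuple[int, int]:
    """Transcode every matching file under src_dir; return (files, lines)."""
    files = 0
    lines = 0
    # Materialize the listing first so files written during the run are not revisited.
    for src in sorted(src_dir.rglob(pattern)):
        if not src.is_file():
            continue
        lines += transcode_file(src, dst_dir / src.relative_to(src_dir))
        files += 1
    return files, lines
```

A call such as `transcode_folder(Path("kneset16"), Path("kneset16_utf8"))` (both paths hypothetical) returns the number of files converted and the total newline count, which can be compared against the counts listed in the description as a rough sanity check.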

Created:

2020-01-24
