The Knesset Meetings Corpus 2004-2005
Indexed from NIAID Data Ecosystem on 2026-03-11
Download link:
https://zenodo.org/record/2707355
Description:
The Knesset Meetings Corpus 2004-2005 is made up of two components:
Raw texts - 282 files with 867,725 lines in total. They can be downloaded in two formats:
As doc files, encoded in windows-1255:
kneset16.zip - Contains 164 text files with 543,228 lines in total. [MILA host] [GitHub mirror]
kneset17.zip - Contains 118 text files with 324,497 lines in total. [MILA host] [GitHub mirror]
As txt files, encoded in UTF-8:
kneset.tar.gz - An archive of all the raw text files, divided into two folders: [GitHub mirror]
16 - Contains 164 text files with 543,228 lines in total.
17 - Contains 118 text files with 324,497 lines in total.
knesset_txt_16.tar.gz - Contains 164 text files with 543,228 lines in total. [MILA host] [GitHub mirror]
knesset_txt_17.zip - Contains 118 text files with 324,497 lines in total. [MILA host] [GitHub mirror]
Tokenized and morphologically tagged texts - Tagged versions exist only for the files in the 16 folder. The texts are represented using MILA's XML schema for corpora and can be downloaded in two ways:
knesset_tagged_16.tar.gz - An archive of all tokenized and tagged files. [MILA host] [Archive.org mirror]
By cloning this repository, which contains the unarchived files under the knesset_tagged folder.
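As a rough sketch (not part of the corpus distribution), the Python snippet below shows two chores implied by the formats listed above: tallying files and lines per folder in a UTF-8 .tar.gz archive, so the totals can be compared with the published counts, and re-encoding a windows-1255 .zip archive to UTF-8. The function names `count_files_and_lines` and `convert_zip_to_utf8` are hypothetical. The sketch assumes each file sits directly inside a folder named 16 or 17, and that the members of the windows-1255 archives are plain text despite the "doc" label. Binary Word documents would need a separate converter.

```python
import posixpath
import tarfile
import zipfile
from collections import Counter
from pathlib import Path


def count_files_and_lines(tar_gz_path):
    """Return {folder: (file_count, line_count)} for a UTF-8 .tar.gz archive.

    Hypothetical helper. The folder is the immediate parent directory of each
    file, so '16/x.txt' and 'kneset/16/x.txt' are both counted under '16'.
    """
    files, lines = Counter(), Counter()
    with tarfile.open(tar_gz_path, "r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            folder = posixpath.basename(posixpath.dirname(member.name))
            # 'utf-8-sig' also strips a leading byte-order mark if one is present.
            text = tar.extractfile(member).read().decode("utf-8-sig")
            files[folder] += 1
            # Assumption: a "line" is whatever str.splitlines() yields.
            lines[folder] += len(text.splitlines())
    return {folder: (files[folder], lines[folder]) for folder in files}


def convert_zip_to_utf8(zip_path, out_dir):
    """Decode each member of a windows-1255 .zip archive and save it as UTF-8.

    Hypothetical helper. Assumes the members are plain text, not binary .doc files.
    """
    written = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            text = zf.read(info).decode("cp1255")  # Python's codec for windows-1255
            target = Path(out_dir) / info.filename
            target.parent.mkdir(parents=True, exist_ok=True)
            # Write bytes directly so original line endings are preserved.
            target.write_bytes(text.encode("utf-8"))
            written.append(target)
    return written
```

For example, `count_files_and_lines("kneset.tar.gz")` should report roughly 164 files for folder 16 and 118 for folder 17. Line totals may differ slightly from the published figures, depending on how lines were counted originally.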
Created:
2020-01-24