Automatic construction and analysis of interrogative corpora from texts

Source: DataCite Commons. Updated: 2026-02-10. Indexed: 2026-05-04.
Download link:
https://www.ortolang.fr/market/item/automatic-interro-corpora/v1
Description:
These three Perl scripts extract and annotate written interrogatives from a text file. Corpus_builder_novels.pl extracts and annotates all sentences ending in a question mark. Corpus_builder_novels_direct-speech.pl extracts and annotates only those sentences that start with an indicator of direct speech (quotation marks, a hyphen, or a dash). Corpus_analyzer_novels.pl sorts the interrogatives alphabetically (by semantic type > morphosyntactic form > ID), counts the different variants, and calculates ratios. It also prints a script that can be pasted into R, which calculates statistical probability values for comparing the samples with previous corpus studies. Corpus_builder_novels.pl does not annotate everything accurately, but together with Corpus_analyzer_novels.pl it is useful for initial data exploration and gives a quick overall picture of the semantic and morphosyntactic distribution.
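The extraction and counting steps described above can be illustrated with a minimal Python sketch. This is not the code of the Perl scripts: the function names, the naive sentence splitting, the set of direct-speech markers, and the example labels are all assumptions made for illustration only.

```python
import re
from collections import Counter

# Assumed direct-speech markers: straight and curly opening quotes,
# an opening guillemet, a hyphen, an en dash, and an em dash.
DIRECT_SPEECH_START = ('"', "\u201c", "\u00ab", "-", "\u2013", "\u2014")


def extract_interrogatives(text):
    """Return every sentence-like chunk that ends in a question mark."""
    # Naive segmentation: split on whitespace that follows '.', '!' or '?'.
    chunks = re.split(r"(?<=[.!?])\s+", text.strip())
    return [c for c in chunks if c.endswith("?")]


def only_direct_speech(sentences):
    """Keep only sentences that start with a direct-speech marker."""
    return [s for s in sentences if s.startswith(DIRECT_SPEECH_START)]


def sort_records(records):
    """Sort (semantic_type, form, id) tuples by type, then form, then ID."""
    return sorted(records)


def variant_ratios(labels):
    """Count each variant label and return its share of the total."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}
```

For example, `extract_interrogatives("She left. - Where are you going? Is it late?")` keeps the two questions, and `only_direct_speech` then keeps only the one that starts with a hyphen. A question followed by a closing quotation mark and a reporting clause would be missed by this simple splitter.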
Provider:
ORTOLANG (Open Resources and TOols for LANGuage) - www.ortolang.fr
Created:
2026-02-10