Automatic construction and analysis of interrogative corpora from texts
Source: DataCite Commons. Updated: 2026-02-10. Indexed: 2026-05-04.
Download link:
https://www.ortolang.fr/market/item/automatic-interro-corpora/v1
Description:
These three Perl scripts extract and annotate written interrogatives from a text file. Corpus_builder_novels.pl extracts and annotates all sentences ending in a question mark. Corpus_builder_novels_direct-speech.pl extracts and annotates only those sentences that start with an indicator of direct speech (i.e. quotation marks, a hyphen, or a dash). Corpus_analyzer_novels.pl sorts the interrogatives alphabetically (i.e. according to semantic type > morphosyntactic form > ID), counts the different variants, and calculates ratios. It also prints a script that can be run in R, where statistical probability values are calculated in order to compare the samples to previous corpus studies. The Corpus_builder_novels.pl script does not annotate everything accurately, but together with Corpus_analyzer_novels.pl it is a useful tool for initial data exploration and gives a quick overall picture of the semantic and morphosyntactic distribution.
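The extraction and sorting steps described above can be sketched roughly as follows. This is a Python illustration, not the Perl scripts themselves: the sentence-splitting regex, the set of direct-speech openers, the function names, and the record layout `(semantic_type, form, id)` are all assumptions made for this example, and the annotation step is not reproduced.

```python
import re
from collections import Counter

# Assumed direct-speech openers: straight and curly quotes, an opening
# guillemet, a hyphen, an en dash, and an em dash (hypothetical set).
DIRECT_SPEECH_OPENERS = ('"', "'", "\u201c", "\u2018", "\u00ab",
                         "-", "\u2013", "\u2014")
CLOSING_QUOTES = "\"'\u201d\u2019\u00bb"

# Rough approximation of a sentence: text up to a run of terminators,
# plus any closing quotation marks that follow directly.
SENTENCE_RE = re.compile(r"[^.!?]+[.!?]+[" + CLOSING_QUOTES + r"]*")


def split_sentences(text):
    """Split text into approximate sentences (naive, regex-based)."""
    return [m.group().strip() for m in SENTENCE_RE.finditer(text)]


def extract_interrogatives(text, direct_speech_only=False):
    """Keep sentences ending in '?', optionally only direct speech."""
    kept = []
    for sentence in split_sentences(text):
        # Ignore trailing closing quotes when checking for the '?'.
        if not sentence.rstrip(CLOSING_QUOTES + " ").endswith("?"):
            continue
        if direct_speech_only and not sentence.startswith(DIRECT_SPEECH_OPENERS):
            continue
        kept.append(sentence)
    return kept


def sort_and_count(records):
    """Sort (semantic_type, form, id) records and compute variant ratios.

    Tuple comparison gives the order semantic type > form > ID.
    """
    ordered = sorted(records)
    counts = Counter((sem, form) for sem, form, _ in ordered)
    total = len(ordered)
    ratios = {variant: n / total for variant, n in counts.items()}
    return ordered, counts, ratios


sample = 'He left. "Where are you going?" she asked. Is it late? - Tu viens ? Yes.'
all_questions = extract_interrogatives(sample)
direct_questions = extract_interrogatives(sample, direct_speech_only=True)

# Placeholder labels for illustration only; the real annotation scheme
# is defined by the Perl scripts and is not shown here.
records = [("wh", "inversion", 3), ("polar", "intonation", 2),
           ("polar", "intonation", 1), ("wh", "in-situ", 4)]
ordered, counts, ratios = sort_and_count(records)
```

A regex splitter like this will mis-segment abbreviations and ellipses, which is consistent with the caveat that automatic extraction is best treated as a first pass before manual checking.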
Provider:
ORTOLANG (Open Resources and TOols for LANGuage) - www.ortolang.fr
Created:
2026-02-10



