Automatic construction and analysis of interrogative corpora from texts

Name: Automatic construction and analysis of interrogative corpora from texts
Creator: ORTOLANG (Open Resources and TOols for LANGuage) - www.ortolang.fr
Published: 2026-02-10 21:41:53
License: no description available

Source: DataCite Commons. Updated: 2026-02-10. Indexed: 2026-05-04

Download link:

https://www.ortolang.fr/market/item/automatic-interro-corpora/v1

Resource description:

These three Perl scripts extract and annotate written interrogatives from a text file. Corpus_builder_novels.pl extracts and annotates all sentences ending in a question mark. Corpus_builder_novels_direct-speech.pl extracts and annotates only those sentences that start with an indicator of direct speech (i.e. quotation marks, a hyphen or a dash). Corpus_analyzer_novels.pl sorts the interrogatives alphabetically (i.e. by semantic type > morphosyntactic form > ID), counts the different variants, and calculates ratios. It also prints a script that can be pasted into R, where statistical probability values are calculated in order to compare the samples with previous corpus studies. The Corpus_builder_novels.pl script does not annotate everything accurately, but together with Corpus_analyzer_novels.pl it is a useful tool for initial data exploration and gives a quick overall picture of the semantic and morphosyntactic distribution.
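The three steps described above (extracting sentences that end in a question mark, keeping only those that open with a direct-speech marker, and sorting and counting annotated variants) can be illustrated with a minimal Python sketch. This is not the code of the Perl scripts: the function names, the naive sentence splitter, the exact set of marker characters, and the record fields `semantic_type`, `form` and `id` are all assumptions made for illustration only.

```python
import re
from collections import Counter

# Assumed opening markers of direct speech: quotation marks, a hyphen, or a dash.
DIRECT_SPEECH_MARKERS = ('"', "“", "«", "-", "–", "—")
# Closing quotation marks that may follow the final punctuation of a sentence.
CLOSING_QUOTES = '"”»'


def split_sentences(text):
    """Naive splitter: text up to ., ! or ?, plus an optional closing quote."""
    return [s.strip() for s in re.findall(r'[^.!?]+[.!?]+["”»]?', text)]


def extract_interrogatives(text):
    """Keep the sentences whose last character (ignoring quotes) is '?'."""
    return [s for s in split_sentences(text)
            if s.rstrip(CLOSING_QUOTES).endswith("?")]


def is_direct_speech(sentence):
    """True if the sentence opens with one of the assumed markers."""
    return sentence.startswith(DIRECT_SPEECH_MARKERS)


def summarize(records):
    """Sort by semantic type > form > ID, then count variants and ratios.

    Each record is a dict with the hypothetical keys
    'semantic_type', 'form' and 'id'.
    """
    ordered = sorted(
        records, key=lambda r: (r["semantic_type"], r["form"], r["id"])
    )
    counts = Counter((r["semantic_type"], r["form"]) for r in ordered)
    total = sum(counts.values())
    ratios = {key: n / total for key, n in counts.items()} if total else {}
    return ordered, counts, ratios


# Usage on a made-up sample text.
sample = 'He left. - Where are you going? Why not? "Who is it?" It was late.'
questions = extract_interrogatives(sample)
direct = [q for q in questions if is_direct_speech(q)]
```

A splitter this simple misses cases such as a quoted question followed by a reporting clause, which fits the description's caveat that automatic extraction and annotation are not fully accurate and are best suited to a first exploration of the data.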

Provider:

ORTOLANG (Open Resources and TOols for LANGuage) - www.ortolang.fr

Created:

2026-02-10
