Automatic expansion of lexicon for organic residues valorization in emerging and developing countries
Source: DataCite Commons. Updated: 2026-02-09. Indexed: 2026-03-29.
Download link:
https://dataverse.cirad.fr/citation?persistentId=doi:10.18167/DVN1/G2IFIA
Resource description:
<p>
A previously created lexicon (<a href="https://doi.org/10.18167/DVN1/HNZZSI" target="_blank">https://doi.org/10.18167/DVN1/HNZZSI</a>) was processed to:
</p>
<ol>
<li>combine existing synonymic terms into consolidated relevant terms, and</li>
<li>extract additional morphosyntactic variants from a corpus of 7,692 publications (titles and abstracts).</li>
</ol>
<p>
Variant extraction relied on the Fastr method (Jacquemin C., 1994), a linguistically driven approach that extracts term variants from full-text documents.
</p>
<p>
The dataset contains two types of files:
</p>
<ul>
<li>
<strong>"XX_consolidated"</strong>: these files contain the lexicons built for three lists of relevant terms (OWT, TM and AV) after automatic consolidation with Fastr (sheet <em>raw</em>) and after manual validation (sheet <em>validated</em>). Terms are split into two columns: <em>seed_terms</em> and <em>variants</em>. The <em>variants</em> column contains terms from the lexicon that were considered synonyms of the corresponding <em>seed_term</em>.
</li>
<li>
<strong>"XX_variants_from_corpus"</strong>: these files contain the lexicons' seed terms (from their consolidated version) along with the morphosyntactic variants extracted using the Fastr method. The code is openly accessible at: <a href="https://github.com/SarahVal/FastrOrganicWastes/tree/main" target="_blank">https://github.com/SarahVal/FastrOrganicWastes/tree/main</a>.
</li>
</ul>
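<p>
The two-column layout described above (<em>seed_terms</em> and <em>variants</em>) can be read into a simple lookup structure. The following is only a minimal sketch, not the authors' code: it assumes a sheet has been exported to CSV with one row per seed term/variant pair, which may not match the actual file layout. The sample rows and the helper names <code>load_lexicon</code> and <code>find_seed</code> are hypothetical.
</p>

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows, made up for illustration only.
# Assumption: one (seed term, variant) pair per row, exported to CSV.
SAMPLE_CSV = """seed_terms,variants
compost,composted material
compost,composting product
manure,animal manure
"""


def load_lexicon(handle):
    """Group variants under their seed term, preserving first-seen order."""
    lexicon = defaultdict(list)
    for row in csv.DictReader(handle):
        seed = (row.get("seed_terms") or "").strip()
        variant = (row.get("variants") or "").strip()
        if not seed:
            continue  # skip rows without a seed term
        entry = lexicon[seed]  # keeps seeds that have no variants
        if variant and variant not in entry:
            entry.append(variant)
    return dict(lexicon)


def find_seed(lexicon, term):
    """Return the seed term matching `term` (case-insensitive), or None."""
    needle = term.strip().lower()
    for seed, variants in lexicon.items():
        if needle == seed.lower() or needle in (v.lower() for v in variants):
            return seed
    return None


lexicon = load_lexicon(io.StringIO(SAMPLE_CSV))
```

<p>
With the sample rows, <code>find_seed(lexicon, "Animal Manure")</code> maps the variant back to its seed term. For real use, replace the in-memory sample with an open file handle on the exported sheet.
</p>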
__________________________________________________________________
<p>
Jacquemin C. Fastr: a unification-based front-end to automatic indexing. In: Intelligent multimedia information retrieval systems and management - Volume 1, Le centre de hautes études internationales d’informatique documentaire, Paris, FRA, RIAO ’94, pp. 34–47, 1994
</p>
Provider:
CIRAD Dataverse
Created:
2026-01-19



