
Automatic expansion of lexicon for organic residues valorization in emerging and developing countries

DataCite Commons | Updated 2026-02-09 | Indexed 2026-03-29
Download link:
https://dataverse.cirad.fr/citation?persistentId=doi:10.18167/DVN1/G2IFIA
Description:
<p> A previously created lexicon (<a href="https://doi.org/10.18167/DVN1/HNZZSI" target="_blank">https://doi.org/10.18167/DVN1/HNZZSI</a>) was processed to: </p>
<ol> <li>combine existing synonymous terms into consolidated relevant terms, and</li> <li>extract additional morphosyntactic variants from a corpus of 7,692 publications (titles and abstracts).</li> </ol>
<p> Variants were extracted with the Fastr method (Jacquemin, 1994), a linguistically driven approach for extracting term variants from full-text documents. </p>
<p> The dataset contains two types of files: </p>
<ul> <li> <strong>"XX_consolidated"</strong>: these files contain the lexicons built for three lists of relevant terms (OWT, TM and AV) after automatic consolidation with Fastr (sheet <em>raw</em>) and after manual validation (sheet <em>validated</em>). Terms are split into two columns: <em>seed_terms</em> and <em>variants</em>. The <em>variants</em> column contains the lexicon terms that were considered synonyms of the corresponding <em>seed_term</em>. </li> <li> <strong>"XX_variants_from_corpus"</strong>: these files contain the seed terms of each lexicon (taken from its consolidated version) along with the morphosyntactic variants extracted from the corpus using the Fastr method. The code is openly accessible at: <a href="https://github.com/SarahVal/FastrOrganicWastes/tree/main" target="_blank">https://github.com/SarahVal/FastrOrganicWastes/tree/main</a>. </li> </ul>
__________________________________________________________________
<p> Jacquemin C. Fastr: a unification-based front-end to automatic indexing. In: Intelligent multimedia information retrieval systems and management - Volume 1, Le centre de hautes études internationales d’informatique documentaire, Paris, FRA, RIAO ’94, pp. 34–47, 1994. </p>
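The two-column layout described above (seed_terms and variants) lends itself to a lookup that maps any variant back to its seed term. The following is a minimal sketch only: it assumes each row pairs one seed term with one variant, which the description does not confirm, and the function name `build_variant_index` and the example rows are hypothetical, not taken from the dataset or from the FastrOrganicWastes code.

```python
def build_variant_index(rows):
    """Map each normalized variant, and each seed term itself, to its seed term.

    `rows` is an iterable of (seed_term, variant) pairs. One variant per row
    is an assumption made for this sketch; adapt it if a cell holds several.
    """
    index = {}
    for seed, variant in rows:
        seed_norm = seed.strip().lower()
        # A seed term always resolves to itself.
        index.setdefault(seed_norm, seed_norm)
        var_norm = variant.strip().lower()
        if var_norm:
            # Keep the first seed term seen for a given variant.
            index.setdefault(var_norm, seed_norm)
    return index


# Hypothetical rows invented for illustration; not taken from the dataset.
example_rows = [
    ("organic waste", "organic wastes"),
    ("organic waste", "Organic residue"),
    ("compost", ""),
]
index = build_variant_index(example_rows)
```

With such an index, a term found in a title or abstract can be normalized to its seed term by a single dictionary lookup after lowercasing.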
Provider:
CIRAD Dataverse
Created:
2026-01-19