English Compounds Dataset for Compositionality Tests
Source: DataCite Commons. Updated 2024-05-19; indexed 2024-07-13.
Download link:
https://fdat.uni-tuebingen.de/records/sm2mc-njn64
Description:
The ENglish COMpositionality dataset containing COMpounds (en-comcom) was constructed from two existing compound datasets, the Tratz (2011) dataset and the Ó Séaghdha (2008) dataset, and a selection of the nominal compounds in the WordNet database.
The Tratz (2011) dataset contains 19158 compounds and is part of the semantically-enriched parser described in Tratz (2011) available at http://www.isi.edu/publications/licensed-sw/fanseparser/
The Ó Séaghdha (2008) dataset contains 1443 compounds and is available at http://www.cl.cam.ac.uk/~do242/Resources/1443_Compounds.tar.gz
Additional compounds were collected from the WordNet 3.1 (Fellbaum, 1998) 'data.noun' file. The extracted list contained 18775 compounds.
The compounds combined from the three sources were additionally pre-processed and frequency-filtered; see Dima (2019) for details. The final dataset has 27220 compounds. The train, test and dev splits contain 19054, 5444 and 2722 compounds, respectively.
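As a quick sanity check on the figures above, the split sizes can be added up and expressed as shares of the full dataset. This is only an arithmetic sketch over the numbers stated in this description; the variable names are illustrative and nothing is read from the dataset files.

```python
# Split sizes as stated in the dataset description.
splits = {"train": 19054, "test": 5444, "dev": 2722}

# The three splits should add up to the reported dataset size.
total = sum(splits.values())

# Share of each split, in percent of the full dataset.
shares = {name: round(100 * n / total, 1) for name, n in splits.items()}

print(total)   # 27220
print(shares)  # {'train': 70.0, 'test': 20.0, 'dev': 10.0}
```

The splits therefore correspond to a 70/20/10 partition of the 27220 compounds.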
The train/test/dev files have the following format:
modifier head compound (e.g. police car police_car)
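The three-column format above can be read with a few lines of Python. This is a minimal sketch, not part of the dataset distribution: it assumes the columns are separated by whitespace (spaces or tabs) and that each field contains no internal whitespace, as in the `police car police_car` example. The names `Compound`, `parse_lines` and `load_split`, and any file name passed to them, are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List, NamedTuple


class Compound(NamedTuple):
    """One dataset row: modifier, head and the joined compound form."""
    modifier: str
    head: str
    compound: str


def parse_lines(lines: Iterable[str]) -> List[Compound]:
    """Parse 'modifier head compound' rows, skipping blank lines."""
    records = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()  # assumption: whitespace-separated columns
        if not fields:
            continue  # ignore empty lines
        if len(fields) != 3:
            raise ValueError(f"line {line_no}: expected 3 fields, got {len(fields)}")
        records.append(Compound(*fields))
    return records


def load_split(path: str) -> List[Compound]:
    """Read one split file (the path is supplied by the caller)."""
    with Path(path).open(encoding="utf-8") as handle:
        return parse_lines(handle)


# Usage with the example row from the description.
rows = parse_lines(["police car police_car\n"])
print(rows[0].modifier, rows[0].head, rows[0].compound)
```

Raising on rows that do not have exactly three fields is a design choice made here so that format deviations surface early; if the actual files use a different delimiter, `line.split()` would need to be adjusted.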
For results of compositionality models evaluated on this dataset, see Dima (2016) and Dima (2019).
- Dima, C. 2015. Reverse-engineering Language: A Study on the Semantic Compositionality of German Compounds. In Proceedings of EMNLP 2015, pages 1637–1642, Lisbon, Portugal.
[Download paper: https://aclweb.org/anthology/D/D15/D15-1188.pdf]
- Dima, C. 2016. On the Compositionality and Semantic Interpretation of English Noun Compounds. In Proceedings of the 1st Workshop on Representation Learning for NLP @ ACL 2016, pages 27–39, Berlin, Germany.
- Dima, C. 2019. Composition Models for the Representation and Semantic Interpretation of Nominal Compounds. PhD thesis. University of Tübingen.
- Fellbaum, C. 1998. WordNet. Wiley Online Library.
- Ó Séaghdha, D. 2008. Learning compound noun semantics. PhD thesis, Computer Laboratory, University of Cambridge. Published as University of Cambridge Computer Laboratory Technical Report 735.
- Tratz, S. 2011. Semantically-enriched parsing for natural language understanding. PhD thesis, University of Southern California.
Keywords: English nominal compounds - compositional distributional representations - semantic composition
Provider:
University of Tübingen
Created:
2024-05-19



