English Compounds Dataset for Compositionality Tests

Source: DataCite Commons. Updated 2024-05-19; indexed 2024-07-13.
Download link:
https://fdat.uni-tuebingen.de/records/sm2mc-njn64
Description:
The ENglish COMpositionality dataset containing COMpounds (en-comcom) was constructed from two existing compound datasets, the Tratz (2011) dataset and the Ó Séaghdha (2008) dataset, and a selection of the nominal compounds in the WordNet database.

The Tratz (2011) dataset contains 19158 compounds and is part of the semantically-enriched parser described in Tratz (2011), available at http://www.isi.edu/publications/licensed-sw/fanseparser/

The Ó Séaghdha (2008) dataset contains 1443 compounds and is available at http://www.cl.cam.ac.uk/~do242/Resources/1443_Compounds.tar.gz

Additional compounds were collected from the WordNet 3.1 (Fellbaum, 1998) 'data.noun' file. The extracted list contained 18775 compounds.

The combination of compounds from the three sources was additionally pre-processed and frequency-filtered; details are in Dima (2019). The final dataset has 27220 compounds. The train, test and dev splits contain 19054, 5444 and 2722 compounds, respectively.

The train/test/dev files have the following format:

    modifier head compound (e.g. police car police_car)

For results of compositionality models evaluated on this dataset, see Dima (2016) and Dima (2019).

References:
- Dima, C. 2015. Reverse-engineering Language: A Study on the Semantic Compositionality of German Compounds. In Proceedings of EMNLP 2015, Lisbon, Portugal, pp. 1637–1642. [Download paper: https://aclweb.org/anthology/D/D15/D15-1188.pdf]
- Dima, C. 2016. On the Compositionality and Semantic Interpretation of English Noun Compounds. In Proceedings of the 1st Workshop on Representation Learning for NLP @ ACL 2016, pp. 27–39, Berlin, Germany.
- Dima, C. 2019. Composition Models for the Representation and Semantic Interpretation of Nominal Compounds. PhD thesis, University of Tübingen.
- Fellbaum, C. 1998. WordNet. Wiley Online Library.
- Ó Séaghdha, D. 2008. Learning compound noun semantics. PhD thesis, Computer Laboratory, University of Cambridge. Published as University of Cambridge Computer Laboratory Technical Report 735.
- Tratz, S. 2011. Semantically-enriched parsing for natural language understanding. PhD thesis, University of Southern California.

Keywords: English nominal compounds - compositional distributional representations - semantic composition
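As a minimal sketch of how the train/test/dev format described in the dataset summary might be read, the Python snippet below parses lines of the form `modifier head compound` and checks that the reported split sizes add up to the reported total. It assumes the three fields are separated by whitespace and that the compound field joins its parts with an underscore, as in the `police car police_car` example; the `Compound` type and `parse_compound_lines` function are hypothetical names introduced here for illustration, not part of the dataset's tooling.

```python
from typing import Iterable, List, NamedTuple


class Compound(NamedTuple):
    """One dataset entry (hypothetical type, named here for illustration)."""
    modifier: str
    head: str
    compound: str


def parse_compound_lines(lines: Iterable[str]) -> List[Compound]:
    """Parse 'modifier head compound' lines, skipping blank lines."""
    entries = []
    for line_number, line in enumerate(lines, start=1):
        fields = line.split()  # assumption: fields are whitespace-separated
        if not fields:
            continue
        if len(fields) != 3:
            raise ValueError(
                f"line {line_number}: expected 3 fields, got {len(fields)}"
            )
        entries.append(Compound(*fields))
    return entries


# The single example line given in the dataset description.
sample_lines = ["police car police_car\n"]
entries = parse_compound_lines(sample_lines)

# Split sizes as reported in the description.
SPLIT_SIZES = {"train": 19054, "test": 5444, "dev": 2722}
total = sum(SPLIT_SIZES.values())
```

To read an actual split, the same function could be applied to an open file handle, for example `parse_compound_lines(open(path, encoding="utf-8"))`, where `path` points at one of the downloaded files; the file names are not stated in the description, so none are assumed here.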
Provider:
University of Tübingen
Created:
2024-05-19