five

Tab2Know evaluation data

收藏
NIAID Data Ecosystem2026-03-11 收录
下载链接:
https://zenodo.org/record/3983012
下载链接
链接失效反馈
官方服务:
资源简介:
Evaluation data for the paper "Tab2Know: Building a Knowledge Base from Tables in Scientific Papers" published at ISWC2020. For code, see https://github.com/karmaresearch/tab2know .   This resource contains the following files: - `venues.txt`: The venues that were use for selecting PDFs from the [Semantic Scholar Open Research Corpus](http://s2-public-api-prod.us-west-2.elasticbeanstalk.com/corpus/) that were published in the last 5 years. - `extracted-tables.tar.gz`: All tables that we extracted using [Tabula](https://github.com/tabulapdf/tabula) from these PDFs. - `sample-400.tar.gz`: A sample of these tables which we used for annotation. - `ontology.ttl`: The annotation ontology in Turtle format. - `all_metadata.jsonl`: Annotations for this sample in the JSON format described below. - `labelqueries.csv`: The label queries used for weak annotation, created using the annotation interface. This CSV file contains 6 columns: a numeric ID, the label query template name (`template`), the template slots (`slots`), the label type (`label`), the annotation value (`value`), and a toggle for the interface (`enabled`). - `labelqueries-sparql-templates.zip`: The label query templates. These are SPARQL queries with slots of the form `{{slot}}`. The templates in `labelqueries.csv` refer to these files. - `rules.txt`: Datalog rules that we used for entity resolution. - `tab2know-graph.nt.gz`: The final RDF graph that contains all extracted table structures, predicted table and column classes, and resolved entity links.
创建时间:
2020-08-25
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作