Tab2Know evaluation data
Source: NIAID Data Ecosystem (indexed 2026-03-11)
Download link:
https://zenodo.org/record/3983012
Description:
Evaluation data for the paper "Tab2Know: Building a Knowledge Base from Tables in Scientific Papers" published at ISWC2020.
For code, see https://github.com/karmaresearch/tab2know .
This resource contains the following files:
- `venues.txt`: The venues used to select PDFs, published in the last 5 years, from the [Semantic Scholar Open Research Corpus](http://s2-public-api-prod.us-west-2.elasticbeanstalk.com/corpus/).
- `extracted-tables.tar.gz`: All tables we extracted from these PDFs using [Tabula](https://github.com/tabulapdf/tabula).
- `sample-400.tar.gz`: A sample of these tables which we used for annotation.
- `ontology.ttl`: The annotation ontology in Turtle format.
- `all_metadata.jsonl`: Annotations for this sample in the JSON format described below.
- `labelqueries.csv`: The label queries used for weak annotation, created using the annotation interface. This CSV file contains 6 columns: a numeric ID, the label query template name (`template`), the template slots (`slots`), the label type (`label`), the annotation value (`value`), and a toggle for the interface (`enabled`).
- `labelqueries-sparql-templates.zip`: The label query templates. These are SPARQL queries with slots of the form `{{slot}}`. The templates in `labelqueries.csv` refer to these files.
- `rules.txt`: Datalog rules that we used for entity resolution.
- `tab2know-graph.nt.gz`: The final RDF graph that contains all extracted table structures, predicted table and column classes, and resolved entity links.
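The label query templates can be illustrated with a minimal Python sketch of slot filling. It assumes only what is stated above: templates are SPARQL text with slots written as `{{slot}}`. The description does not say how the `slots` column of `labelqueries.csv` encodes its values or which file names appear inside `labelqueries-sparql-templates.zip`. The helpers `fill_template` and `read_template_from_zip`, the member name `example.sparql`, and the example query are therefore hypothetical, and the tab2know code may substitute slots differently.

```python
import re
import zipfile
from pathlib import Path

# Matches slots written as {{slot}}; spaces inside the braces are tolerated
# (an assumption, not confirmed by the dataset description).
SLOT_PATTERN = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def read_template_from_zip(zip_path: Path, member: str) -> str:
    """Return the text of one template file stored in a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return archive.read(member).decode("utf-8")


def fill_template(template: str, slots: dict[str, str]) -> str:
    """Replace every {{name}} in a SPARQL template with slots[name].

    Values are inserted verbatim (no SPARQL escaping). A slot that appears
    in the template but is missing from `slots` raises KeyError, so a
    half-filled query is never returned silently.
    """
    return SLOT_PATTERN.sub(lambda match: slots[match.group(1)], template)


if __name__ == "__main__":
    # Hypothetical template: find tables whose caption mentions a keyword.
    template = 'SELECT ?table WHERE { ?table ?hasCaption ?c . FILTER(CONTAINS(?c, "{{keyword}}")) }'
    print(fill_template(template, {"keyword": "accuracy"}))
```

Single SPARQL braces such as `WHERE { ... }` are left untouched because the pattern requires two adjacent braces around a bare name.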
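A line-oriented pass over the gzipped N-Triples file gives a quick overview of which kinds of statements `tab2know-graph.nt.gz` contains. This sketch is a shortcut, not the project's own tooling. It relies on two facts about N-Triples: each triple sits on one line, and subjects and predicates (IRIs or blank-node labels) contain no whitespace. It does not validate syntax. The function name `count_predicates` is hypothetical. For full parsing, an RDF library such as rdflib is the safer choice.

```python
import gzip
import re
from collections import Counter
from pathlib import Path

# Subjects and predicates contain no whitespace in N-Triples, so splitting on
# the first two whitespace runs isolates them; the third part is the object
# followed by the terminating ".".
_TERM_SPLIT = re.compile(r"\s+")


def count_predicates(path: Path) -> Counter[str]:
    """Count how often each predicate occurs in a gzipped N-Triples file."""
    counts: Counter[str] = Counter()
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comment lines
            _subject, predicate, _rest = _TERM_SPLIT.split(line, maxsplit=2)
            counts[predicate] += 1
    return counts
```

For example, `count_predicates(Path("tab2know-graph.nt.gz")).most_common(10)` lists the ten most frequent predicates with their counts.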
Created:
2020-08-25



