Marchanjo/spider-FIT-en
Source: Hugging Face. Updated 2024-01-16. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/Marchanjo/spider-FIT-en
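Resource overview:
mRAT-SQL and mRAT-SQL+GAP are two projects that translate natural language into SQL queries. mRAT-SQL uses database schema pruning to handle long text sequences and fine-tunes the mT5-large model on four languages: English, Portuguese, Spanish, and French. mRAT-SQL+GAP focuses on Portuguese text-to-SQL translation, using a multilingual BART model fine-tuned on English and Portuguese training data. Both projects are based on the Spider dataset and provide source code, evaluations, and model checkpoints.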
Provider:
Marchanjo
Summary of original information
mRAT-SQL-FIT
A Multilingual Translator to SQL with Database Schema Pruning to Improve Self-Attention
- Authors: Marcelo Archanjo José, Fabio Gagliardi Cozman
- Description: This project addresses the difficulty transformers have with long text sequences by introducing a training process with database schema pruning. Pruning shortens the input so that it fits transformers limited to 512 input tokens. The model takes a multilingual approach: mT5-large is fine-tuned on a data-augmented Spider dataset in four languages (English, Portuguese, Spanish, and French).
- Performance: The proposed technique increased exact set match accuracy from 0.718 to 0.736 on the validation (Dev) dataset.
- Resources: Source code, evaluations, and checkpoints are available at mRAT-SQL.
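To make the schema pruning idea concrete, here is a minimal sketch. It assumes that pruning means dropping tables and columns that the target SQL query never mentions. The helper names `prune_schema` and `serialize`, the toy schema, and the regex-based matching are all illustrative assumptions, not the project's actual code, which would rely on a proper SQL parser and the model's own tokenizer.

```python
import re

def prune_schema(schema, sql):
    """Keep only tables and columns whose names occur as identifiers in the SQL.

    Naive by design: string literals in the query are also treated as
    identifiers, so a real implementation should parse the SQL instead.
    """
    tokens = set(re.findall(r"[a-z_][a-z0-9_]*", sql.lower()))
    pruned = {}
    for table, columns in schema.items():
        if table.lower() not in tokens:
            continue  # table is never mentioned in the query
        pruned[table] = [c for c in columns if c.lower() in tokens]
    return pruned

def serialize(question, schema):
    """Flatten the question and schema into one input string (hypothetical format)."""
    parts = [f"{table}: {', '.join(cols)}" for table, cols in schema.items()]
    return " | ".join([question] + parts)

# Toy schema and example, invented for illustration only.
schema = {
    "singer": ["singer_id", "name", "country", "age"],
    "concert": ["concert_id", "concert_name", "year"],
    "stadium": ["stadium_id", "location", "capacity"],
}
question = "How many singers are from France?"
sql = "SELECT count(*) FROM singer WHERE country = 'France'"

pruned = prune_schema(schema, sql)
full_text = serialize(question, schema)
pruned_text = serialize(question, pruned)
print(pruned_text)
```

On this toy input only the `singer` table and its `country` column survive, so the serialized input is much shorter than the unpruned one. Since this kind of pruning needs the target query, it applies naturally to training data, where the query is known.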
mRAT-SQL+GAP
mRAT-SQL+GAP: A Portuguese Text-to-SQL Transformer
- Authors: Marcelo Archanjo José, Fabio Gagliardi Cozman
- Description: This project focuses on translating natural language questions to SQL queries in Portuguese. It adapts the RAT-SQL+GAP system using a multilingual BART model and produces a translated version of the Spider dataset. The model is fine-tuned with a double-size training dataset (English and Portuguese).
- Performance: When making inferences on the Portuguese test dataset, the multilingual BART model reached 83% of the baseline.
- Resources: The multilingual ready version of RAT-SQL+GAP and the data are available at mRAT-SQL.
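The double-size training dataset described above can be pictured as the English examples concatenated with their Portuguese translations. The sketch below assumes that the two lists are aligned and that translation changes only the question, leaving the SQL query untouched. The function name `build_bilingual_train` and the `lang` tag are hypothetical additions for illustration. The `db_id`, `question`, and `query` fields follow the Spider JSON layout.

```python
def build_bilingual_train(english, portuguese):
    """Concatenate aligned English and Portuguese examples into one training list."""
    if len(english) != len(portuguese):
        raise ValueError("expected one Portuguese translation per English example")
    combined = []
    for lang, examples in (("en", english), ("pt", portuguese)):
        for example in examples:
            # "lang" is an illustrative tag, not a field of the Spider dataset.
            combined.append({**example, "lang": lang})
    return combined

# Toy examples, invented for illustration only.
english = [
    {"db_id": "concert_singer",
     "question": "How many singers do we have?",
     "query": "SELECT count(*) FROM singer"},
]
portuguese = [
    {"db_id": "concert_singer",
     "question": "Quantos cantores nós temos?",
     "query": "SELECT count(*) FROM singer"},
]

train = build_bilingual_train(english, portuguese)
print(len(train))
```

The combined list holds twice as many examples as the English set alone, with each SQL query appearing once per language.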



