Marchanjo/spider-FIT-pt

Source: Hugging Face · Updated 2024-01-16 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/Marchanjo/spider-FIT-pt
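Resource summary:
mRAT-SQL is a multilingual translator to SQL that improves self-attention through database schema pruning, which lets it handle long text sequences. It uses the mT5-large model, fine-tuned on four languages: English, Portuguese, Spanish, and French. mRAT-SQL+GAP is a Portuguese text-to-SQL translator that uses a multilingual BART model and improves results by enlarging the training dataset.
Provider:
Marchanjo
Summary of original information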

mRAT-SQL-FIT

A Multilingual Translator to SQL with Database Schema Pruning to Improve Self-Attention

Marcelo Archanjo Jose, Fabio Gagliardi Cozman

  • Description: This project addresses the challenge of long text sequences in transformers by introducing a training process with database schema pruning. This technique removes table and column names that are irrelevant to the query of interest, allowing the transformer to work with inputs of up to 512 tokens. The model uses a multilingual approach: mT5-large fine-tuned on a data-augmented Spider dataset in four languages (English, Portuguese, Spanish, and French).
  • Performance: The proposed technique improved exact set match accuracy from 0.718 to 0.736 on the validation dataset (Dev).
  • Resources: Source code, evaluations, and checkpoints are available at mRAT-SQL.
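The pruning idea can be illustrated with a minimal sketch. It assumes, purely for illustration, that during training a table or column counts as "relevant" when its name appears as an identifier in the gold SQL query; the actual mRAT-SQL pruning criterion may differ. The names `Schema`, `prune_schema`, `serialize`, and `fits` are hypothetical, and the whitespace token count is only a rough proxy for mT5's SentencePiece tokenization.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Schema:
    # Hypothetical representation: table name -> list of column names.
    tables: dict[str, list[str]] = field(default_factory=dict)


def prune_schema(schema: Schema, gold_sql: str) -> Schema:
    """Keep only tables/columns whose names occur as identifiers in gold_sql.

    Assumption: relevance is approximated by a lexical mention in the gold SQL.
    """
    identifiers = {t.lower() for t in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", gold_sql)}
    pruned: dict[str, list[str]] = {}
    for table, columns in schema.tables.items():
        kept_cols = [c for c in columns if c.lower() in identifiers]
        # Keep a table if it is named in the query or any of its columns are.
        if table.lower() in identifiers or kept_cols:
            pruned[table] = kept_cols
    return Schema(pruned)


def serialize(question: str, schema: Schema) -> str:
    """Flatten question + schema into one input string (hypothetical format)."""
    parts = [question]
    for table, columns in schema.tables.items():
        parts.append(f"| {table} : " + " , ".join(columns))
    return " ".join(parts)


MAX_TOKENS = 512  # input limit mentioned in the description


def fits(text: str, limit: int = MAX_TOKENS) -> bool:
    # Crude proxy: whitespace tokens, not real mT5 subword tokens.
    return len(text.split()) <= limit


if __name__ == "__main__":
    schema = Schema({
        "singer": ["singer_id", "name", "age"],
        "concert": ["concert_id", "year"],
    })
    pruned = prune_schema(schema, "SELECT name FROM singer WHERE age > 30")
    print(pruned.tables)  # {'singer': ['name', 'age']}
    print(fits(serialize("Names of singers older than 30?", pruned)))  # True
```

Dropping the irrelevant `concert` table shortens the serialized input, which is what makes it easier to stay within the 512-token budget on databases with many tables.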

mRAT-SQL+GAP

mRAT-SQL+GAP: A Portuguese Text-to-SQL Transformer

Marcelo Archanjo José, Fabio Gagliardi Cozman

  • Description: This project focuses on the translation of natural language questions to SQL queries in Portuguese. It adapts the RAT-SQL+GAP system using a multilingual BART model and produces a translated version of the Spider dataset. The experiments show that training with both original and translated datasets improves performance, even when targeting a single language.
  • Performance: The multilingual BART model fine-tuned on a double-size training dataset (English and Portuguese) reached 83% of the baseline when making inferences on the Portuguese test dataset.
  • Resources: The multilingual-ready version of RAT-SQL+GAP and the data are open-sourced as mRAT-SQL+GAP at mRAT-SQL.
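The "double-size" training set can be pictured as the original English Spider examples concatenated with their Portuguese translations. The sketch below assumes that only the natural-language questions are translated while the SQL query and database id stay the same, and that the two lists are aligned one-to-one; `Example` and `build_bilingual_train` are hypothetical names, not part of the released code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    question: str  # natural-language question
    query: str     # target SQL (assumed unchanged by translation)
    db_id: str     # Spider database identifier
    lang: str      # language tag, e.g. "en" or "pt"


def build_bilingual_train(english: list[Example], portuguese: list[Example]) -> list[Example]:
    """Concatenate original examples with their translations (double-size set).

    Assumption: example i in each list shares the same SQL and database.
    """
    if len(english) != len(portuguese):
        raise ValueError("translated set must align one-to-one with the original")
    for en, pt in zip(english, portuguese):
        if en.query != pt.query or en.db_id != pt.db_id:
            raise ValueError(f"misaligned pair: {en.question!r} / {pt.question!r}")
    return english + portuguese


if __name__ == "__main__":
    en = [Example("How many singers are there?", "SELECT count(*) FROM singer", "concert_singer", "en")]
    pt = [Example("Quantos cantores existem?", "SELECT count(*) FROM singer", "concert_singer", "pt")]
    train = build_bilingual_train(en, pt)
    print(len(train))  # 2
```

The alignment check guards against silently pairing a translated question with the wrong SQL, which would add noise instead of extra training signal.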