350M Model

Figshare · updated 2025-05-23 · indexed 2026-04-28

Download link:

https://figshare.com/articles/dataset/350M_Model/29135096

Description:

# 350M Model

**RAG-350M** is a 350-million-parameter Small Reasoning Model, trained for retrieval-augmented generation (RAG), search and source summarization. Along with RAG-1B, it belongs to our family of specialized reasoning models.

RAG-350M outperforms most SLMs (4 billion parameters and below) on standardized benchmarks for retrieval-augmented generation (HotpotQA, 2wiki) and is a highly cost-effective alternative to popular larger models, including Qwen-2.5-7B, Llama-3.1-8B and Gemma-3-4B. It is the only SLM to date to maintain consistent RAG performance across leading European languages and to ensure systematic reference grounding for statements. Thanks to its small size, its ease of deployment on constrained infrastructure (including mobile phones) and its built-in support for factual and accurate information, RAG-350M unlocks a range of new use cases for generative AI.

## Features

RAG-350M is a specialized language model that uses a series of special tokens to process a structured input (query and sources) and to generate a structured output (reasoning sequence and answer with sources). For easier implementation, we encourage the use of the associated API library.

### Citation support

RAG-350M natively generates grounded answers on the basis of excerpts and citations extracted from the provided sources, using a custom syntax inspired by Wikipedia. It is one of a handful of open-weights models to date to have been developed with this feature, and the first one designed for actual deployment. In contrast with Anthropic's approach (*Citation mode*), citations are generated entirely by the model and are not the product of external chunking. As a result, we can offer another feature that simplifies source checking: citation shortening for longer excerpts (using "(…)").

### RAG reasoning

RAG-350M generates a specific reasoning sequence incorporating several proto-agentic abilities for RAG applications.
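As a rough picture of the structured input described under *Features*, the sketch below serializes a query and its numbered sources into one string, after which the model would emit its reasoning trace and answer. The special-token strings, the `Source` class and the `build_structured_input` function are all hypothetical placeholders chosen for illustration; they are not the RAG-350M vocabulary, and in practice the associated API library is the recommended route.

```python
from dataclasses import dataclass

# HYPOTHETICAL special-token strings: placeholders that only illustrate the
# structure; they are not taken from the RAG-350M vocabulary.
QUERY_START, QUERY_END = "<|query_start|>", "<|query_end|>"
SOURCE_START, SOURCE_END = "<|source_start|>", "<|source_end|>"
SOURCE_ID_START, SOURCE_ID_END = "<|source_id_start|>", "<|source_id_end|>"
REASONING_START = "<|reasoning_start|>"  # hypothetical cue for the reasoning trace


@dataclass
class Source:
    source_id: str  # identifier that citations in the answer would refer back to
    text: str       # retrieved passage


def build_structured_input(query: str, sources: list[Source]) -> str:
    """Serialize a query and its sources into a single structured input."""
    parts = [f"{QUERY_START}{query}{QUERY_END}"]
    for src in sources:
        parts.append(
            f"{SOURCE_START}{SOURCE_ID_START}{src.source_id}{SOURCE_ID_END}"
            f"{src.text}{SOURCE_END}"
        )
    # Generation would continue from this cue: reasoning trace, then answer.
    parts.append(REASONING_START)
    return "\n".join(parts)


if __name__ == "__main__":
    prompt = build_structured_input(
        "When did the bridge open?",
        [Source("1", "The bridge opened in 1932."), Source("2", "It spans the river.")],
    )
    print(prompt)
```

Keeping an explicit identifier per source is what lets the generated citations point back to a specific passage.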
The model is able to make a series of decisions directly:

* Assessing whether the query is understandable.
* Assessing whether the query is trivial enough not to require a lengthy pre-analysis (*adjustable reasoning*).
* Assessing whether the sources contain enough input to generate a grounded answer.

The structured reasoning trace includes the following steps:

* Language detection of the query. The model will always strive to answer in the language of the original query.
* Query analysis and the associated query report. The analysis can lead to a standard answer, a shortened reasoning trace/answer for trivial questions, a reformulated query, or a refusal (which the application could turn into a request for further input from the user).
* Source analysis and the associated source report. This step evaluates the coverage and depth of the provided sources with regard to the query.
* Draft of the final answer.

### Multilinguality

RAG-350M is able to read and write in the main European languages: French, German, Italian, Spanish and, to a lesser extent, Polish, Latin and Portuguese.

To date, it is the only small language model with negligible loss of performance in leading European languages for RAG-related tasks. On a translated set of HotpotQA, we observed a significant drop in performance for most SLMs, ranging from 10% up to 30-35% for sub-1B models. We expect the results of any standard English evaluation of our RAG models to be largely transferable to the main European languages, limiting the costs of evaluation and deployment in multilingual settings.

## Training

RAG-350M is trained on a large synthetic dataset emulating retrieval from a wide variety of multilingual open sources from Common Corpus. The models provide native support for citation and grounding with literal quotes.
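Grounding with literal quotes makes citations mechanically checkable. The sketch below assumes a hypothetical Wikipedia-style tag of the form `<ref name="1">"excerpt"</ref>` (the model's actual syntax may differ) and verifies that each quoted excerpt appears verbatim in the cited source. For excerpts shortened with "(…)", it checks that every remaining fragment appears in the source in order. The helper names are illustrative, and whitespace normalization is a choice of this sketch, not of the model.

```python
import re

# ASSUMED citation syntax, modeled on Wikipedia's <ref> tags:
#   <ref name="SOURCE_ID">"literal excerpt"</ref>
# The exact syntax emitted by RAG-350M may differ.
REF_PATTERN = re.compile(r'<ref name="([^"]+)">"(.*?)"</ref>', re.DOTALL)
ELLIPSIS = "(…)"  # marker used to shorten long excerpts


def normalize(text: str) -> str:
    """Collapse runs of whitespace so line breaks do not defeat literal matching."""
    return " ".join(text.split())


def excerpt_is_grounded(excerpt: str, source: str) -> bool:
    """True if every fragment of a (possibly shortened) excerpt occurs
    literally in the source, in the same order."""
    haystack = normalize(source)
    position = 0
    for fragment in excerpt.split(ELLIPSIS):
        fragment = normalize(fragment)
        if not fragment:
            continue  # e.g. an excerpt that starts or ends with "(…)"
        found = haystack.find(fragment, position)
        if found == -1:
            return False
        position = found + len(fragment)
    return True


def check_citations(answer: str, sources: dict[str, str]) -> list[tuple[str, bool]]:
    """Return (source_id, grounded) for each citation found in the answer."""
    results = []
    for source_id, excerpt in REF_PATTERN.findall(answer):
        source = sources.get(source_id)
        grounded = source is not None and excerpt_is_grounded(excerpt, source)
        results.append((source_id, grounded))
    return results


if __name__ == "__main__":
    sources = {"1": "The bridge opened in 1932. It was designed by a local "
                    "engineer and took four years to build."}
    answer = ('It opened in 1932 <ref name="1">"The bridge opened in 1932. '
              '(…) took four years to build."</ref>.')
    print(check_citations(answer, sources))
```

Requiring the fragments to appear in order means a shortened quote cannot silently reorder the source text.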
Following the latest trends in agentification, the models reintegrate multiple features associated with RAG workflows, such as query routing, query reformulation and source reranking.

## Evaluation

RAG-350M was evaluated on three standard RAG benchmarks: 2wiki, HotpotQA and MuSiQue. All the benchmarks only assess the "trivial" mode, on questions requiring some form of multi-hop reasoning over sources (the answer is spread across different sources) as well as discrimination of distractor sources.

RAG-350M is not simply a cost-effective version of larger models. We found that it correctly answers several hundred HotpotQA questions that neither Llama-3-8B nor Qwen-2.5-7B could solve. Consequently, we encourage its use as part of multi-model RAG systems.
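The observation that RAG-350M solves questions its larger counterparts miss amounts to a set difference over per-question results, and the same bookkeeping can help judge whether adding a model to a multi-model RAG system is worthwhile. This is a minimal sketch with hypothetical helper names and toy question IDs; the data are illustrative only and are not the evaluation results reported above.

```python
def complementary_wins(correct_by_model: dict[str, set[str]], target: str) -> set[str]:
    """Question IDs the target model answers correctly that no other model does."""
    others = set().union(
        *(ids for name, ids in correct_by_model.items() if name != target)
    )
    return correct_by_model[target] - others


def oracle_coverage(correct_by_model: dict[str, set[str]], total: int) -> float:
    """Share of questions answered by at least one model (perfect-router upper bound)."""
    return len(set().union(*correct_by_model.values())) / total


# Toy, illustrative data only: NOT the reported benchmark results.
correct = {
    "RAG-350M": {"q1", "q2", "q5"},
    "Llama-3-8B": {"q1", "q3"},
    "Qwen-2.5-7B": {"q1", "q2", "q4"},
}
print(sorted(complementary_wins(correct, "RAG-350M")))  # ['q5']
print(oracle_coverage(correct, total=6))
```

The oracle coverage is only an upper bound: it assumes a router that always picks a model that answers correctly, which a real multi-model system can at best approximate.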

Created:

2025-05-23