Small Language Model (SLM) Survey Comparison
DataCite Commons | Updated: 2024-12-20 | Indexed: 2025-04-16
Download link:
https://orkg.org/comparison/R790020
Resource description:
This comparison is based on the survey paper "A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness" by Fali Wang et al. The paper surveys Small Language Models (SLMs) in the era of Large Language Models (LLMs), covering their techniques, enhancements, applications, collaboration with LLMs, and trustworthiness. Its key points are:
1) Definition and Importance of SLMs: SLMs are defined by their ability to perform specialized tasks and their suitability for resource-constrained settings and devices. Compared with LLMs, they offer lower inference latency, lower cost, and more efficient development.
2) Foundational Concepts: The paper discusses the architecture and training of SLMs, highlighting the Transformer architecture and its self-attention mechanism, as well as training and compression techniques such as knowledge distillation and quantization.
3) Enhancement Techniques: The paper explores methods for improving SLM performance, including training from scratch, fine-tuning, knowledge distillation, and applying techniques originally developed to enhance LLMs.
4) Applications of SLMs: SLMs are applied to diverse tasks such as question answering, coding, recommender systems, web search, and mobile-device control, which demonstrates their versatility and efficiency in specific domains.
5) SLMs for LLMs: The paper examines how SLMs can assist LLMs by improving their reliability, extracting prompts, supporting fine-tuning, and evaluating LLMs. It also covers how SLMs help address LLM issues such as high inference latency and susceptibility to knowledge noise.
6) Trustworthiness: The trustworthiness of SLMs is investigated, focusing on robustness, privacy, reliability, safety, and fairness, with a summary of current evaluation methods and challenges.
7) Future Directions: Promising research directions include developing efficient SLM architectures, expanding domain-specific SLMs, establishing benchmarking platforms, enhancing SLM performance and efficiency, and ensuring the trustworthiness of SLMs.
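Knowledge distillation appears above both as a foundational technique and as an enhancement method. The sketch below illustrates one common formulation in a classification-style setting. The student is trained to match the teacher's temperature-softened output distribution and also the true label. The function names, the temperature of 2.0, and the weight `alpha = 0.5` are illustrative assumptions. They are not values taken from the survey or from any SLM in the comparison.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of a list of logits, softened by `temperature`."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-target term (student matches the teacher's
    softened distribution) and a hard-label cross-entropy term.
    temperature and alpha are illustrative hyperparameters."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Soft-target gradients shrink roughly as 1/T^2, so the term is scaled
    # by T^2 to keep it comparable to the hard-label term.
    soft = kl_divergence(p_teacher, p_student) * temperature ** 2
    hard = -math.log(softmax(student_logits)[label])
    return alpha * soft + (1 - alpha) * hard


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]         # hypothetical teacher logits, 3 classes
    close_student = [3.5, 1.2, 0.4]   # hypothetical student close to the teacher
    far_student = [0.5, 3.0, 1.0]     # hypothetical student far from the teacher
    print(distillation_loss(close_student, teacher, label=0))
    print(distillation_loss(far_student, teacher, label=0))
```

With these hypothetical logits, the student whose logits resemble the teacher's receives the smaller loss. For language models, the same loss can be applied per token over the vocabulary.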
This comparison provides a high-level overview and training details of various generic-domain Small Language Models (SLMs). The properties and their main points are summarized as follows:
**Model**: Lists the names of the SLMs, such as Llama 3.2, Qwen, Gemma, and others.
**Number of parameters**: Indicates the number of parameters in each model, ranging from hundreds of millions (M) to several billion (B).
**Date**: Specifies the release date of each model.
**Paradigm**: The primary training approach or methodology used to develop each SLM. The values indicate whether a model was pre-trained, instruction-tuned, or obtained through other specific techniques such as distillation or pruning.
**Domain**: Identifies the general application domain of each model. The possible values are "Generic", "Coding", and "Scientific", and most models are labeled "Generic".
**Training Datasets**: Lists the datasets used to train each SLM, that is, the specific corpora or data collections employed during training. These include well-known corpora such as the Pile, C4, and RedPajama.
**Attention mechanism**: The main types of attention mechanisms used in SLMs are the following. 1) Multi-Head Attention (MHA) is the standard Transformer attention, in which every head has its own query, key, and value projections. 2) Multi-Query Attention (MQA) lets all query heads share a single key head and a single value head, which shrinks the Key-Value (KV) cache. 3) Grouped-Query Attention (GQA) divides the query heads into groups, and each group shares one key head and one value head. It is a middle ground between MHA and MQA. 4) Multi-Head Latent Attention (MLA) applies low-rank joint compression to keys and values and therefore requires a much smaller KV cache. 5) Flash Attention is an IO-aware implementation of exact attention. It computes attention tile by tile in fast on-chip memory instead of materializing the full attention matrix. This reduces memory overhead and lets models process longer sequences more efficiently.
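MHA, MQA, and GQA differ mainly in how many key/value heads the query heads share, and that number sets the KV-cache size. The following pure-Python sketch makes several simplifying assumptions: a single sequence, no causal mask, and no learned projections (queries, keys, and values are passed in directly). All function names and the example configuration are hypothetical. The sketch also includes the online-softmax recurrence that FlashAttention builds on, shown for a single query. The real kernel additionally tiles queries and runs in on-chip GPU memory, which this sketch does not model. MLA's low-rank compression is not shown.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(q, keys, values):
    """Scaled dot-product attention for one query vector (no causal mask)."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(len(values[0]))]


def grouped_query_attention(queries, keys, values):
    """queries: [num_q_heads][seq][head_dim]; keys/values: [num_kv_heads][seq][head_dim].
    num_kv_heads == num_q_heads gives MHA, num_kv_heads == 1 gives MQA,
    anything in between gives GQA."""
    num_q_heads, num_kv_heads = len(queries), len(keys)
    assert num_q_heads % num_kv_heads == 0, "query heads must split evenly into groups"
    group_size = num_q_heads // num_kv_heads
    outputs = []
    for h in range(num_q_heads):
        kv = h // group_size  # all query heads in a group read the same K/V head
        outputs.append([attend(q, keys[kv], values[kv]) for q in queries[h]])
    return outputs


def attend_online(q, keys, values, block_size=2):
    """Same result as attend(), but streams over K/V blocks, keeping only a
    running max, a running normalizer, and a running weighted sum, so the full
    score vector is never stored (the online-softmax idea behind FlashAttention)."""
    scale = 1.0 / math.sqrt(len(q))
    running_max, running_sum = -math.inf, 0.0
    acc = [0.0] * len(values[0])
    for start in range(0, len(keys), block_size):
        scores = [dot(q, k) * scale for k in keys[start:start + block_size]]
        new_max = max(running_max, max(scores))
        # Rescale earlier partial results to the new max
        # (math.exp(-inf) == 0.0 on the first block).
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, values[start:start + block_size]):
            p = math.exp(s - new_max)
            running_sum += p
            acc = [a + p * x for a, x in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]


def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """KV-cache size for one sequence: keys + values for every layer,
    KV head, and cached token."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


if __name__ == "__main__":
    # Hypothetical configuration: 32 layers, 32 query heads, head_dim 128,
    # 4096 cached tokens, 16-bit values.
    for name, kv_heads in [("MHA", 32), ("GQA", 8), ("MQA", 1)]:
        mib = kv_cache_bytes(32, kv_heads, 128, 4096) / 2**20
        print(f"{name}: {mib:.0f} MiB")
```

For the assumed configuration, the script prints 2048 MiB for MHA, 512 MiB for GQA, and 64 MiB for MQA. The cache shrinks in proportion to the number of KV heads, while the attention computation per query head is unchanged.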
**Activation method/function**: The main activation functions used in the feed-forward networks (FFNs) of SLMs are ReLU (Rectified Linear Unit), GELU (Gaussian Error Linear Unit), GELUtanh (the tanh approximation of GELU), SiLU (Sigmoid Linear Unit), and SwiGLU (Swish-Gated Linear Unit).
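For reference, the activation functions listed above can be written out directly. This minimal sketch uses only Python's `math` module. The SwiGLU block follows the common gated-FFN form: SiLU of a gate projection, multiplied elementwise by an up projection, followed by a down projection. The weight names `w_gate`, `w_up`, and `w_down` are hypothetical.

```python
import math


def relu(x):
    return max(0.0, x)


def gelu(x):
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x):
    """tanh approximation of GELU (the 'GELUtanh' variant)."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def sigmoid(x):
    # Branch on the sign so math.exp never overflows for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def silu(x):
    """SiLU, i.e. Swish with beta = 1: x * sigmoid(x)."""
    return x * sigmoid(x)


def matvec(x, w):
    """Row vector x (length d_in) times matrix w, given as d_in rows of length d_out."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) for j in range(len(w[0]))]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward block: (SiLU(x W_gate) * (x W_up)) W_down,
    where * is the elementwise product."""
    gate = [silu(g) for g in matvec(x, w_gate)]
    up = matvec(x, w_up)
    return matvec([g * u for g, u in zip(gate, up)], w_down)


if __name__ == "__main__":
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(f"{x:+.1f}  relu={relu(x):.4f}  gelu={gelu(x):.4f}  "
              f"gelu_tanh={gelu_tanh(x):.4f}  silu={silu(x):.4f}")
```

SiLU and SwiGLU differ in kind. SiLU is a scalar activation, while SwiGLU is a gated FFN block that uses SiLU (Swish) as its gate. That is why a SwiGLU FFN needs three weight matrices instead of the two used by a plain ReLU or GELU FFN.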
**Additional Training Techniques**: Lists the additional methods and strategies used to build and train each SLM. These techniques help optimize the performance, efficiency, and adaptability of SLMs across a wide range of tasks. Examples include RoPE (Rotary Position Embedding), RMSNorm (Root Mean Square Layer Normalization), and RLHF (Reinforcement Learning from Human Feedback), among others.
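Two of the techniques named above, RoPE and RMSNorm, are simple enough to sketch directly. The code below is a minimal illustration. It assumes the base of 10000 from the original RoPE formulation and pairs adjacent dimensions. Individual models may use a different base or a different dimension pairing. The function names and example vectors are hypothetical.

```python
import math


def rms_norm(x, gain, eps=1e-6):
    """RMSNorm: divide x by its root mean square (no mean subtraction),
    then apply a learned per-dimension gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for g, v in zip(gain, x)]


def rope(x, position, base=10000.0):
    """Rotary position embedding for one vector of even length d: each pair
    (x[2i], x[2i+1]) is rotated by the angle position * base**(-2i/d).
    This sketch pairs adjacent dimensions; some implementations pair them differently."""
    d = len(x)
    assert d % 2 == 0, "RoPE needs an even dimension"
    out = []
    for i in range(d // 2):
        theta = position * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out.extend([a * c - b * s, a * s + b * c])
    return out


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    q = [0.3, -1.2, 0.8, 0.5]  # hypothetical query vector
    k = [1.0, 0.4, -0.6, 0.2]  # hypothetical key vector
    # Same relative offset (2) at different absolute positions.
    print(dot(rope(q, 5), rope(k, 3)))
    print(dot(rope(q, 12), rope(k, 10)))
    print(rms_norm([3.0, 4.0], gain=[1.0, 1.0]))
```

The two printed dot products agree up to floating-point rounding. With RoPE, the query-key score depends only on the offset between the two positions, not on their absolute values.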
Provided by:
Open Research Knowledge Graph
Created:
2024-12-20