teleprint-me/misty-qa
Source: Hugging Face. Updated 2025-01-24. Indexed 2025-04-26.
Download link:
https://hf-mirror.com/datasets/teleprint-me/misty-qa
Description:
Misty-QA is a synthetic dataset for training semantic similarity models. It is generated by an automated script built on the llama_cpp_client library. It contains queries, related documents, and unrelated documents, labeled with a similarity value (1 for similar, -1 for dissimilar). The data is stored as JSON for easy integration into training workflows. The dataset is currently small (e.g., 20 examples) and intended for prototyping. It is licensed under CC BY-NC-SA 4.0 (Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International).
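The following is a minimal sketch of how records of this kind could be flattened into labeled training pairs. The field names `query`, `related`, and `unrelated`, the sample record, and the helper `to_pairs` are all assumptions made for illustration; the listing does not give the dataset's actual schema, so check the JSON file before relying on them.

```python
import json

# Hypothetical record layout: the field names "query", "related", and
# "unrelated" are assumptions for illustration, not the confirmed schema.
SAMPLE = """
[
  {
    "query": "How do I reset my password?",
    "related": ["Open account settings and choose 'Reset password'."],
    "unrelated": ["The cafeteria opens at 8 am on weekdays."]
  }
]
"""


def to_pairs(records):
    """Flatten records into (query, document, label) triples.

    Label 1 marks a related document and -1 an unrelated one,
    following the labeling convention stated in the description.
    """
    pairs = []
    for record in records:
        for doc in record["related"]:
            pairs.append((record["query"], doc, 1))
        for doc in record["unrelated"]:
            pairs.append((record["query"], doc, -1))
    return pairs


pairs = to_pairs(json.loads(SAMPLE))
for query, doc, label in pairs:
    print(label, "|", query, "|", doc)
```

The 1 / -1 convention matches the target format expected by cosine-based losses such as PyTorch's `CosineEmbeddingLoss`, although the listing does not say which loss the author uses.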
Provider:
teleprint-me



