futo-org/swipe-negatives
Source: Hugging Face | Updated: 2026-04-28 | Indexed: 2026-05-03
Download link:
https://hf-mirror.com/datasets/futo-org/swipe-negatives
Description:
Swipe Hard Negatives is a dataset for training swipe-keyboard language models. It contains hard-negative word sets mined by KNN over swipe-shape embeddings from the SwipeALot encoder. For each positive word, it lists the 128 most confusable words together with their mean pairwise sample-to-sample cosine similarity to the positive word. These negatives are considered hard because their swipe paths are visually similar to the positive word's path, not merely because they share lexical features. The dataset currently covers only the en_qwerty keyboard layout. The generation process embeds each word's swipe samples with the SwipeALot encoder, runs KNN to find the nearest-neighbor samples, aggregates the mean cosine similarity, and then normalizes and deduplicates the results.
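The scoring idea in the description can be illustrated with a minimal sketch. It is not the dataset's actual pipeline: the function names, the toy words, and the hand-written 2-D vectors are all assumptions standing in for SwipeALot encoder output, and an exhaustive all-pairs comparison replaces the KNN search used for the real data.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so dot products become cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def mean_pairwise_cosine(a, b):
    """Mean cosine similarity over every (sample of a, sample of b) pair."""
    return float((l2_normalize(a) @ l2_normalize(b).T).mean())


def mine_hard_negatives(embeddings, top_k=128):
    """For each word, rank all other words by mean pairwise cosine similarity.

    `embeddings` maps word -> array of shape (n_samples, dim). This is a
    brute-force stand-in for a KNN index, which is only practical for toy inputs.
    """
    result = {}
    for pos, pos_emb in embeddings.items():
        scored = [
            (neg, mean_pairwise_cosine(pos_emb, neg_emb))
            for neg, neg_emb in embeddings.items()
            if neg != pos  # a word is never its own negative
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        result[pos] = scored[:top_k]
    return result


# Hypothetical toy embeddings: two swipe samples per word, 2 dimensions each.
toy = {
    "hello": np.array([[1.0, 0.0], [0.9, 0.1]]),
    "hells": np.array([[0.95, 0.05], [1.0, 0.1]]),
    "zebra": np.array([[0.0, 1.0], [0.1, 0.9]]),
}

negatives = mine_hard_negatives(toy, top_k=2)
for word, ranked in negatives.items():
    print(word, [(neg, round(score, 3)) for neg, score in ranked])
```

In this toy data, "hells" is placed close to "hello" on purpose, so it ranks as the hardest negative for "hello", while "zebra" scores much lower. With real encoder output and a large vocabulary, an approximate or exact nearest-neighbor index would replace the nested loop.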
Provider:
futo-org



