
Egocentric-10K-Evaluation

ModelScope Community · Updated 2025-12-05 · Indexed 2025-12-06
Download link:
https://modelscope.cn/datasets/builddotai/Egocentric-10K-Evaluation
Resource description:
<div style="margin: 20px 0;">
<table style="border-collapse: collapse; width: 100%;">
<tr>
<td style="text-align: center; padding: 10px; width: 33.33%;"><img src="https://cdn-uploads.huggingface.co/production/uploads/690d75303df78b892c337cd4/SHyQth6VqSqbAOf_47Swp.png" style="width: 100%; max-width: 100%;"/></td>
<td style="text-align: center; padding: 10px; width: 33.33%;"><img src="https://cdn-uploads.huggingface.co/production/uploads/690d75303df78b892c337cd4/ba_6c35-M_qrzjXe1aYOf.png" style="width: 100%; max-width: 100%;"/></td>
<td style="text-align: center; padding: 10px; width: 33.33%;"><img src="https://cdn-uploads.huggingface.co/production/uploads/690d75303df78b892c337cd4/O2JIcQw7eEcqlngCXsWWV.png" style="width: 100%; max-width: 100%;"/></td>
</tr>
<tr>
<td style="text-align: center; padding: 5px;"><strong>Egocentric-10K</strong></td>
<td style="text-align: center; padding: 5px;"><strong>Ego4D</strong></td>
<td style="text-align: center; padding: 5px;"><strong>EPIC-KITCHENS</strong></td>
</tr>
</table>
</div>

<p style="margin: 20px 0; line-height: 1.6;">
To compare three in-the-wild egocentric datasets (Egocentric-10K, Ego4D, and EPIC-KITCHENS-100) on hand visibility and active manipulation density, used here as proxies for data efficiency, we randomly sample 10,000 frames from each dataset and label each frame with gemini-2.5-flash.
</p>

## Hand Visibility

<div style="border: 1px solid #d0d7de; border-radius: 6px; padding: 15px; margin: 15px 0;">
<p style="margin: 0 0 10px 0; font-size: 14px; line-height: 1.6;">
<strong>Prompt:</strong><br/>
You are labeling an egocentric first-person image. Your task is to count how many camera-wearer's hands are visually present in the image: 0, 1, or 2.<br/><br/>
<strong>Rules:</strong><br/>
• Only count hands that are directly visible.<br/>
• Do not infer hands that are outside the frame or potentially behind objects.<br/>
• Ignore hands belonging to other people.<br/>
• Any amount of visibility counts (even fingertips).<br/>
• Return only one of: 0, 1, 2. No extra words.
</p>
<p style="margin: 10px 0 5px 0; font-size: 14px;"><strong>Response Schema:</strong></p>
<pre style="padding: 10px; border-radius: 4px; margin: 0; overflow-x: auto;"><code>{
  "type": "OBJECT",
  "properties": {
    "hand_count": { "type": "INTEGER" }
  },
  "required": ["hand_count"]
}</code></pre>
</div>

<div style="width: 100%; overflow-x: auto;">

| Dataset | Frames | 0 Hands | 1+ Hands | 2 Hands |
|---------|--------|---------|----------|---------|
| **Egocentric-10K** | 10,000 | **3.58%** | **96.42%** | **76.34%** |
| **Ego4D** | 10,000 | 32.67% | 67.33% | 36.95% |
| **EPIC-KITCHENS** | 10,000 | 9.63% | 90.37% | 61.05% |

</div>

<div style="margin: 20px 0;">
<table style="border-collapse: collapse; width: 100%;">
<tr>
<td style="text-align: center; padding: 10px; width: 33.33%;"><img src="https://cdn-uploads.huggingface.co/production/uploads/690d75303df78b892c337cd4/7hjr5j56RJG6D5bX4DroF.png" style="width: 100%; max-width: 100%;"/></td>
<td style="text-align: center; padding: 10px; width: 33.33%;"><img src="https://cdn-uploads.huggingface.co/production/uploads/690d75303df78b892c337cd4/JucJX20yGU8PALGPbKzzZ.png" style="width: 100%; max-width: 100%;"/></td>
<td style="text-align: center; padding: 10px; width: 33.33%;"><img src="https://cdn-uploads.huggingface.co/production/uploads/690d75303df78b892c337cd4/-oRVJBnoyKJxW9KIRY6ed.png" style="width: 100%; max-width: 100%;"/></td>
</tr>
<tr>
<td style="text-align: center; padding: 5px;"><strong>Egocentric-10K</strong><br/>2 hands</td>
<td style="text-align: center; padding: 5px;"><strong>Ego4D</strong><br/>1 hand</td>
<td style="text-align: center; padding: 5px;"><strong>EPIC-KITCHENS</strong><br/>2 hands</td>
</tr>
</table>
</div>

## Active Manipulation

<div style="border: 1px solid #d0d7de; border-radius: 6px; padding: 15px; margin: 15px 0;">
<p style="margin: 0 0 10px 0; font-size: 14px; line-height: 1.6;">
<strong>Prompt:</strong><br/>
You are labeling an egocentric first-person image. Your task is to determine whether the camera-wearer is actively manipulating an object at this exact moment.<br/><br/>
<strong>Definition:</strong><br/>
"Active Manpulation" means the wearer is visibly using their hands to work on, modify, assemble, process, or handle physical objects, materials, components in pursuit of a specific goal<br/><br/>
<strong>Rules:</strong><br/>
• Do not infer actions that are not visible in the frame.<br/>
• If the action is ambiguous or not clearly happening, respond "no."<br/>
• Ignore objects held by other people.<br/>
• Respond only with: "yes" or "no."
</p>
<p style="margin: 10px 0 5px 0; font-size: 14px;"><strong>Response Schema:</strong></p>
<pre style="padding: 10px; border-radius: 4px; margin: 0; overflow-x: auto;"><code>{
  "type": "OBJECT",
  "properties": {
    "answer": { "type": "STRING", "enum": ["yes", "no"] }
  },
  "required": ["answer"]
}</code></pre>
</div>

<div style="width: 100%; overflow-x: auto;">

| Dataset | Frames | Active Labor |
|---------|--------|--------------|
| **Egocentric-10K** | 10,000 | **91.66%** |
| **Ego4D** | 10,000 | 50.07% |
| **EPIC-KITCHENS** | 10,000 | 85.04% |

</div>

<div style="margin: 20px 0;">
<table style="border-collapse: collapse; width: 100%;">
<tr>
<td style="text-align: center; padding: 10px; width: 33.33%;"><img src="https://cdn-uploads.huggingface.co/production/uploads/690d75303df78b892c337cd4/oPDy1unv--pv45acYePL8.png" style="width: 100%; max-width: 100%;"/></td>
<td style="text-align: center; padding: 10px; width: 33.33%;"><img src="https://cdn-uploads.huggingface.co/production/uploads/690d75303df78b892c337cd4/uJYe6p8aM-rrM2nk-KoAY.png" style="width: 100%; max-width: 100%;"/></td>
<td style="text-align: center; padding: 10px; width: 33.33%;"><img src="https://cdn-uploads.huggingface.co/production/uploads/690d75303df78b892c337cd4/q2G_-CGnSxHyYDrwacq_l.png" style="width: 100%; max-width: 100%;"/></td>
</tr>
<tr>
<td style="text-align: center; padding: 5px;"><strong>Egocentric-10K</strong><br/>Active Labor: Yes</td>
<td style="text-align: center; padding: 5px;"><strong>Ego4D</strong><br/>Active Labor: No</td>
<td style="text-align: center; padding: 5px;"><strong>EPIC-KITCHENS</strong><br/>Active Labor: Yes</td>
</tr>
</table>
</div>
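
The percentages in the two tables can be derived from the per-frame model responses with a small amount of bookkeeping. The following is a minimal sketch, not the authors' pipeline: it assumes each response arrives as a JSON string matching the response schemas above, and it assumes the "1+ Hands" column is the share of frames labeled 1 or 2. All function names are hypothetical, and the sample responses are synthetic.

```python
import json
from collections import Counter


def parse_hand_count(raw: str) -> int:
    """Parse one Hand Visibility response and return the hand count.

    The schema only requires an INTEGER; the 0/1/2 range check comes from
    the prompt's rules, so it is an assumption layered on top of the schema.
    """
    count = json.loads(raw)["hand_count"]
    if isinstance(count, bool) or not isinstance(count, int) or count not in (0, 1, 2):
        raise ValueError(f"unexpected hand_count: {count!r}")
    return count


def parse_answer(raw: str) -> bool:
    """Parse one Active Manipulation response; True means "yes"."""
    answer = json.loads(raw)["answer"]
    if answer not in ("yes", "no"):
        raise ValueError(f"unexpected answer: {answer!r}")
    return answer == "yes"


def hand_visibility_row(counts: list[int]) -> dict:
    """Aggregate per-frame hand counts into the table's percentage columns."""
    n = len(counts)
    if n == 0:
        raise ValueError("no frames to aggregate")
    tally = Counter(counts)

    def pct(k: int) -> float:
        return round(100.0 * k / n, 2)

    return {
        "frames": n,
        "0_hands": pct(tally[0]),
        "1plus_hands": pct(tally[1] + tally[2]),  # assumed: labels 1 or 2
        "2_hands": pct(tally[2]),
    }


def active_rate(flags: list[bool]) -> float:
    """Percentage of frames labeled "yes" for active manipulation."""
    if not flags:
        raise ValueError("no frames to aggregate")
    return round(100.0 * sum(flags) / len(flags), 2)


# Synthetic responses for illustration only (not drawn from any dataset).
hand_responses = ['{"hand_count": 0}', '{"hand_count": 1}',
                  '{"hand_count": 2}', '{"hand_count": 2}']
answer_responses = ['{"answer": "yes"}', '{"answer": "no"}',
                    '{"answer": "yes"}', '{"answer": "yes"}']

print(hand_visibility_row([parse_hand_count(r) for r in hand_responses]))
print(active_rate([parse_answer(r) for r in answer_responses]))
```

Under this reading, the "0 Hands" and "1+ Hands" columns always sum to 100%, which matches every row of the Hand Visibility table (for example, 3.58% + 96.42% for Egocentric-10K).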

Provider:
maas
Created:
2025-11-11