Egocentric-10K-Evaluation
ModelScope community. Updated 2025-12-05; indexed 2025-12-06
Download link:
https://modelscope.cn/datasets/builddotai/Egocentric-10K-Evaluation
Description:
<div style="margin: 20px 0;">
<table style="border-collapse: collapse; width: 100%;">
<tr>
<td style="text-align: center; padding: 10px; width: 33.33%;"><img src="https://cdn-uploads.huggingface.co/production/uploads/690d75303df78b892c337cd4/SHyQth6VqSqbAOf_47Swp.png" style="width: 100%; max-width: 100%;"/></td>
<td style="text-align: center; padding: 10px; width: 33.33%;"><img src="https://cdn-uploads.huggingface.co/production/uploads/690d75303df78b892c337cd4/ba_6c35-M_qrzjXe1aYOf.png" style="width: 100%; max-width: 100%;"/></td>
<td style="text-align: center; padding: 10px; width: 33.33%;"><img src="https://cdn-uploads.huggingface.co/production/uploads/690d75303df78b892c337cd4/O2JIcQw7eEcqlngCXsWWV.png" style="width: 100%; max-width: 100%;"/></td>
</tr>
<tr>
<td style="text-align: center; padding: 5px;"><strong>Egocentric-10K</strong></td>
<td style="text-align: center; padding: 5px;"><strong>Ego4D</strong></td>
<td style="text-align: center; padding: 5px;"><strong>EPIC-KITCHENS</strong></td>
</tr>
</table>
</div>
<p style="margin: 20px 0; line-height: 1.6;">
To compare three in-the-wild egocentric datasets (Egocentric-10K, Ego4D, and EPIC-KITCHENS-100) on hand visibility and active manipulation density, used here as proxies for data efficiency, we randomly sample 10,000 frames from each dataset and label each frame with gemini-2.5-flash.
</p>
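The card does not say how the 10,000 frames were drawn. As a minimal sketch, assuming frames are sampled uniformly without replacement across all videos of a dataset, the step could look like the following. The function name `sample_frames` and the `frame_counts` mapping are hypothetical and not part of the original evaluation.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def sample_frames(frame_counts, k, seed=0):
    """Uniformly sample k distinct (video_id, frame_index) pairs.

    frame_counts: dict mapping a video id to its number of frames
    (a hypothetical input; the card does not describe the real index).
    """
    video_ids = list(frame_counts)
    # Cumulative totals map a global frame number back to its video.
    cumulative = list(accumulate(frame_counts[v] for v in video_ids))
    total = cumulative[-1]
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    picks = rng.sample(range(total), k)  # distinct global frame numbers
    result = []
    for g in picks:
        i = bisect_right(cumulative, g)        # video containing frame g
        start = cumulative[i - 1] if i else 0  # first global frame of that video
        result.append((video_ids[i], g - start))
    return result
```

Sampling over global frame numbers, rather than picking a video first and then a frame, weights long videos in proportion to their length, so every frame in the dataset has the same chance of selection.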
## Hand Visibility
<div style="border: 1px solid #d0d7de; border-radius: 6px; padding: 15px; margin: 15px 0;">
<p style="margin: 0 0 10px 0; font-size: 14px; line-height: 1.6;">
<strong>Prompt:</strong><br/>
You are labeling an egocentric first-person image. Your task is to count how many camera-wearer's hands are visually present in the image: 0, 1, or 2.<br/><br/>
<strong>Rules:</strong><br/>
• Only count hands that are directly visible.<br/>
• Do not infer hands that are outside the frame or potentially behind objects.<br/>
• Ignore hands belonging to other people.<br/>
• Any amount of visibility counts (even fingertips).<br/>
• Return only one of: 0, 1, 2. No extra words.
</p>
<p style="margin: 10px 0 5px 0; font-size: 14px;"><strong>Response Schema:</strong></p>
<pre style="padding: 10px; border-radius: 4px; margin: 0; overflow-x: auto;"><code>{
"type": "OBJECT",
"properties": {
"hand_count": {
"type": "INTEGER"
}
},
"required": ["hand_count"]
}</code></pre>
</div>
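The response schema above only requires `hand_count` to be an integer; the 0, 1, or 2 range comes from the prompt. A cautious pipeline might therefore re-check the range on the client side. The sketch below is one way to do that; `parse_hand_count` is a hypothetical helper, not something the card describes.

```python
import json


def parse_hand_count(raw):
    """Return the hand count (0, 1, or 2) from a JSON response, or None if invalid."""
    try:
        data = json.loads(raw)
    except (TypeError, ValueError):
        return None
    if not isinstance(data, dict):
        return None
    value = data.get("hand_count")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        return None
    # The schema allows any integer; the prompt allows only 0, 1, or 2.
    return value if value in (0, 1, 2) else None
```

Returning `None` instead of raising lets the caller count and report rejected responses separately from valid labels.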
<div style="width: 100%; overflow-x: auto;">
| Dataset | Frames | 0 Hands | 1+ Hands | 2 Hands |
|---------|--------|---------|----------|---------|
| **Egocentric-10K** | 10,000 | **3.58%** | **96.42%** | **76.34%** |
| **Ego4D** | 10,000 | 32.67% | 67.33% | 36.95% |
| **EPIC-KITCHENS** | 10,000 | 9.63% | 90.37% | 61.05% |
</div>
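In each row, "0 Hands" and "1+ Hands" sum to 100%, and "2 Hands" is a subset of "1+ Hands", so the share of frames with exactly one hand is the difference between those two columns. The sketch below shows how such columns could be computed from per-frame labels and derives the exactly-one-hand share from the reported figures. The helper name is an assumption for illustration.

```python
from collections import Counter


def hand_visibility_shares(counts):
    """Percentages of frames with 0, 1+, and 2 visible hands."""
    n = len(counts)
    tally = Counter(counts)
    return {
        "0 Hands": 100.0 * tally[0] / n,
        "1+ Hands": 100.0 * (tally[1] + tally[2]) / n,
        "2 Hands": 100.0 * tally[2] / n,
    }


# (1+ Hands, 2 Hands) percentages as reported in the table above.
reported = {
    "Egocentric-10K": (96.42, 76.34),
    "Ego4D": (67.33, 36.95),
    "EPIC-KITCHENS": (90.37, 61.05),
}

# Exactly-one-hand share = (1+ Hands) - (2 Hands).
exactly_one = {
    name: round(one_plus - two, 2) for name, (one_plus, two) in reported.items()
}
```

This gives roughly 20.08% for Egocentric-10K, 30.38% for Ego4D, and 29.32% for EPIC-KITCHENS.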
<div style="margin: 20px 0;">
<table style="border-collapse: collapse; width: 100%;">
<tr>
<td style="text-align: center; padding: 10px; width: 33.33%;"><img src="https://cdn-uploads.huggingface.co/production/uploads/690d75303df78b892c337cd4/7hjr5j56RJG6D5bX4DroF.png" style="width: 100%; max-width: 100%;"/></td>
<td style="text-align: center; padding: 10px; width: 33.33%;"><img src="https://cdn-uploads.huggingface.co/production/uploads/690d75303df78b892c337cd4/JucJX20yGU8PALGPbKzzZ.png" style="width: 100%; max-width: 100%;"/></td>
<td style="text-align: center; padding: 10px; width: 33.33%;"><img src="https://cdn-uploads.huggingface.co/production/uploads/690d75303df78b892c337cd4/-oRVJBnoyKJxW9KIRY6ed.png" style="width: 100%; max-width: 100%;"/></td>
</tr>
<tr>
<td style="text-align: center; padding: 5px;"><strong>Egocentric-10K</strong><br/>2 hands</td>
<td style="text-align: center; padding: 5px;"><strong>Ego4D</strong><br/>1 hand</td>
<td style="text-align: center; padding: 5px;"><strong>EPIC-KITCHENS</strong><br/>2 hands</td>
</tr>
</table>
</div>
## Active Manipulation
<div style="border: 1px solid #d0d7de; border-radius: 6px; padding: 15px; margin: 15px 0;">
<p style="margin: 0 0 10px 0; font-size: 14px; line-height: 1.6;">
<strong>Prompt:</strong><br/>
You are labeling an egocentric first-person image. Your task is to determine whether the camera-wearer is actively manipulating an object at this exact moment.<br/><br/>
<strong>Definition:</strong><br/>
"Active Manipulation" means the wearer is visibly using their hands to work on, modify, assemble, process, or handle physical objects, materials, or components in pursuit of a specific goal.<br/><br/>
<strong>Rules:</strong><br/>
• Do not infer actions that are not visible in the frame.<br/>
• If the action is ambiguous or not clearly happening, respond "no."<br/>
• Ignore objects held by other people.<br/>
• Respond only with: "yes" or "no."
</p>
<p style="margin: 10px 0 5px 0; font-size: 14px;"><strong>Response Schema:</strong></p>
<pre style="padding: 10px; border-radius: 4px; margin: 0; overflow-x: auto;"><code>{
"type": "OBJECT",
"properties": {
"answer": {
"type": "STRING",
"enum": ["yes", "no"]
}
},
"required": ["answer"]
}</code></pre>
</div>
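The card does not state which SDK or endpoint was used to call gemini-2.5-flash. As a sketch only, the prompt, one frame, and the schema above could be packed into a single JSON request body as follows. The field names (`contents`, `parts`, `inline_data`, `generationConfig`, `responseMimeType`, `responseSchema`) follow my understanding of the public Gemini REST API and should be checked against the current documentation. `build_request_body` is a hypothetical helper.

```python
import base64
import json

# Schema copied from the card's "Response Schema" block.
ACTIVE_MANIPULATION_SCHEMA = {
    "type": "OBJECT",
    "properties": {"answer": {"type": "STRING", "enum": ["yes", "no"]}},
    "required": ["answer"],
}


def build_request_body(image_bytes, prompt, schema, mime_type="image/jpeg"):
    """Assemble a JSON-serializable request body for one frame (assumed layout)."""
    return {
        "contents": [{
            "parts": [
                # The frame is sent inline as base64 text.
                {"inline_data": {
                    "mime_type": mime_type,
                    "data": base64.b64encode(image_bytes).decode("ascii"),
                }},
                {"text": prompt},
            ]
        }],
        # Ask for JSON output constrained by the given schema.
        "generationConfig": {
            "responseMimeType": "application/json",
            "responseSchema": schema,
        },
    }
```

The same function would serve the hand-visibility task by passing that task's prompt and schema instead.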
<div style="width: 100%; overflow-x: auto;">
| Dataset | Frames | Active Manipulation ("yes") |
|---------|--------|--------------|
| **Egocentric-10K** | 10,000 | **91.66%** |
| **Ego4D** | 10,000 | 50.07% |
| **EPIC-KITCHENS** | 10,000 | 85.04% |
</div>
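With 10,000 frames per dataset, the sampling uncertainty on each percentage is small. The sketch below computes a normal-approximation 95% confidence interval, assuming the frames are an independent simple random sample and ignoring any labeling errors made by the model. Neither assumption is verified by the card.

```python
import math


def proportion_ci(percent, n, z=1.96):
    """Normal-approximation confidence interval for a proportion given in percent."""
    p = percent / 100.0
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return 100.0 * (p - half_width), 100.0 * (p + half_width)


# Percentages as reported in the table above.
for name, pct in [("Egocentric-10K", 91.66), ("Ego4D", 50.07), ("EPIC-KITCHENS", 85.04)]:
    low, high = proportion_ci(pct, 10_000)
    print(f"{name}: {pct:.2f}% (95% CI {low:.2f}% to {high:.2f}%)")
```

Under these assumptions the half-width is at most about one percentage point (the worst case is a proportion near 50%), which is smaller than the gaps between the three datasets in the table.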
<div style="margin: 20px 0;">
<table style="border-collapse: collapse; width: 100%;">
<tr>
<td style="text-align: center; padding: 10px; width: 33.33%;"><img src="https://cdn-uploads.huggingface.co/production/uploads/690d75303df78b892c337cd4/oPDy1unv--pv45acYePL8.png" style="width: 100%; max-width: 100%;"/></td>
<td style="text-align: center; padding: 10px; width: 33.33%;"><img src="https://cdn-uploads.huggingface.co/production/uploads/690d75303df78b892c337cd4/uJYe6p8aM-rrM2nk-KoAY.png" style="width: 100%; max-width: 100%;"/></td>
<td style="text-align: center; padding: 10px; width: 33.33%;"><img src="https://cdn-uploads.huggingface.co/production/uploads/690d75303df78b892c337cd4/q2G_-CGnSxHyYDrwacq_l.png" style="width: 100%; max-width: 100%;"/></td>
</tr>
<tr>
<td style="text-align: center; padding: 5px;"><strong>Egocentric-10K</strong><br/>Active Manipulation: Yes</td>
<td style="text-align: center; padding: 5px;"><strong>Ego4D</strong><br/>Active Manipulation: No</td>
<td style="text-align: center; padding: 5px;"><strong>EPIC-KITCHENS</strong><br/>Active Manipulation: Yes</td>
</tr>
</table>
</div>
Provider:
maas
Created:
2025-11-11



