
EmbodiedCity | Embodied Intelligence Dataset | Urban Environment Simulation Dataset

Source: arXiv. Updated 2024-10-13; indexed 2024-10-16.
Embodied intelligence
Urban environment simulation
Download link:
https://embodied-city.fiblab.net
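Resource overview:
EmbodiedCity is a benchmark platform built by Tsinghua University for evaluating how embodied intelligence performs in real-world urban environments. It is based on a commercial district in Beijing and provides a highly realistic 3D simulation environment containing real streets, buildings, urban elements, pedestrians, and traffic flow. Pedestrian and vehicle flows are simulated by combining historically collected real-world traffic data with simulation algorithms. During construction, detailed 3D models of the city's buildings were created, and a complete set of input and output interfaces is provided so that embodied agents can easily obtain task requirements and environment observations, make decisions, and have their performance evaluated. The dataset is mainly used for evaluating and training embodied intelligence, and it targets the perception, planning, and action capabilities of embodied agents in open outdoor urban environments.
Provided by:
Tsinghua University
Created:
2024-10-13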
AI-Collected Summary
Dataset Introduction
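Construction
The EmbodiedCity dataset is built on a highly realistic 3D simulation environment modeled on a commercial district in Beijing, one of China's largest cities, with streets, buildings, urban elements, pedestrians, and traffic reconstructed in detail. Combining historically collected real-world traffic data with simulation algorithms yields a high-fidelity simulation of pedestrian and vehicle flows. In addition, a set of evaluation tasks covering different embodied AI capabilities was designed, together with a complete set of input and output interfaces, so that embodied agents can easily receive task requirements and current environment observations, make decisions, and obtain performance evaluations.
Features
The distinguishing feature of the EmbodiedCity dataset is its highly realistic 3D urban environment, which includes fine-grained models of buildings and streets and also simulates the behavior of dynamic elements such as vehicles and pedestrians. The dataset also covers several embodied AI tasks: scene description, question answering, dialogue, vision-language navigation, and task planning. Together these tasks cover three key aspects: perception, reasoning, and decision-making.
Usage
The EmbodiedCity dataset can be accessed through the provided Python client SDK and through an HTTP-based Python proxy server. Through these interfaces, users can control the behavior of embodied agents in the simulated environment, obtain real-time observations, execute tasks, and evaluate performance. The dataset also offers an online platform that supports simulating and controlling up to 8 agents at the same time; users can operate the agents with the keyboard, a web GUI, or an online Python code editor.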
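The HTTP-based access described above can be illustrated with a minimal sketch. It only builds a JSON request for a single agent action and does not send it. The endpoint path, the payload fields, the action name, the base URL, and the 0-based agent numbering are all hypothetical placeholders chosen for illustration; they are not the platform's documented API. Only the limit of 8 agents comes from the description above.

```python
import json
from urllib.request import Request

# The online platform is described as supporting up to 8 agents at once.
MAX_AGENTS = 8


def build_action_request(base_url: str, agent_id: int, action: str, params: dict) -> Request:
    """Build (but do not send) a JSON POST request for one agent action.

    The path "/agents/<id>/action" and the payload fields are hypothetical
    placeholders, not the platform's real interface.
    """
    # Assumption: agents are numbered 0..MAX_AGENTS-1.
    if not 0 <= agent_id < MAX_AGENTS:
        raise ValueError(f"agent_id must be in [0, {MAX_AGENTS})")
    body = json.dumps({"action": action, "params": params}).encode("utf-8")
    return Request(
        url=f"{base_url.rstrip('/')}/agents/{agent_id}/action",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical usage: ask agent 0 to move forward by 5 metres.
req = build_action_request(
    "http://localhost:8000", 0, "move_forward", {"distance_m": 5.0}
)
```

Passing `req` to `urllib.request.urlopen` would send the request. In practice the names above would need to be replaced with those from the platform's own SDK documentation.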
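Background and Challenges
Background Overview
The EmbodiedCity dataset was built in 2024 by a group of researchers at Tsinghua University to provide a benchmark platform for embodied agents in urban environments. Its core research question is how to evaluate and improve the capabilities of embodied agents, including perception, planning, and action, in real-world urban settings. By building a highly realistic 3D simulation environment and combining historically collected data with simulation algorithms, the researchers designed a set of evaluation tasks covering different embodied-agent capabilities. This work extends what existing embodied agents can do and offers greater value for practical applications of artificial general intelligence.
Current Challenges
Building the EmbodiedCity dataset involved several challenges. First, creating a highly realistic urban environment requires precise 3D modeling and complex data processing, which demands strong technical skills and substantial resources. Second, designing evaluation tasks that cover multiple embodied-agent capabilities requires a deep understanding of how agents behave and make decisions in urban environments. Third, the dataset requires extensive data annotation and manual correction to ensure that the evaluation tasks are accurate and reliable. Finally, how to effectively evaluate and improve embodied-agent capabilities in a simulated environment remains an open research question.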
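Common Scenarios
Classic Use Cases
The classic use of the EmbodiedCity dataset is to evaluate the perception, planning, and action capabilities of embodied agents in real-world urban environments. By building a highly realistic 3D simulation environment and combining historically collected data with simulation algorithms, the dataset simulates pedestrian and vehicle flows with high fidelity. It also defines a set of evaluation tasks covering different embodied-agent capabilities, including scene description, question answering, dialogue, vision-language navigation, and task planning, so that agents can be tested across multiple levels and dimensions in open outdoor urban environments.
Academic Problems Addressed
The EmbodiedCity dataset addresses the fact that existing embodied-agent research has focused mainly on limited indoor environments, and it extends the task scope of embodied agents to outdoor urban environments. This extension raises the capability level of existing embodied agents, has higher practical value, and supports more potential applications of artificial general intelligence. By evaluating popular large language models, the dataset was used to examine their embodied-intelligence capabilities across different dimensions and difficulty levels, providing an important benchmark and reference for the development of embodied agents.
Derived and Related Work
The release of the EmbodiedCity dataset has given rise to a series of related research efforts, including improved embodied-agent algorithms based on the dataset, research on multimodal data fusion, and explorations of embodied-agent applications in urban environments. For example, some work has used the dataset to develop new vision-language navigation algorithms that significantly improve agents' navigation ability in complex urban environments. Other work has explored how to use the dataset for cross-modal learning to improve agents' overall performance across different sensing modalities.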
The content above was collected and summarized by AI.