CRAWDAD icsi/netalyzr-android
DataCite Commons | Updated 2022-11-17 | Indexed 2025-04-16
Download link:
https://ieee-dataport.org/open-access/crawdad-icsinetalyzr-android
Resource description:
Mobile data collected using the Netalyzr for Android app.

This dataset was collected by the ICSI Netalyzr app for Android to characterize how operational decisions, such as network configurations, business models, and relationships between operators, introduce diversity in service quality and affect user security and privacy. We look in detail beyond the radio link and into network configuration and business relationships in six countries. We identify the widespread use of transparent middleboxes such as HTTP and DNS proxies, analyzing how they actively modify user traffic, compromise user privacy, and potentially undermine user security. In addition, we identify network sharing agreements between operators, highlighting the implications of roaming and characterizing the properties of MVNOs, including that a majority are simply rebranded versions of major operators. More broadly, our findings using this data highlight the importance of considering higher-layer relationships when seeking to analyze mobile traffic in a sound fashion.

date/time of measurement start: 2013-10-22

date/time of measurement end: 2014-09-01

collection environment: This dataset was collected using the Netalyzr for Android app. The app is available for free from the Google Play website for anyone to install and run. We analyzed data for a 9-month period from six countries: US, CA, UK, FR, DE, and AU.

network configuration: Android phones connected through 3G and 4G networks. Rooted and unrooted devices, and multi-user/multi-device.

data collection methodology: The data was collected by crowd-sourcing. Users proactively run the Netalyzr for Android app to troubleshoot their network configuration or to understand their network and how it behaves. No private data is collected without the user's consent.
Google Play link to the app: https://play.google.com/store/apps/details?id=edu.berkeley.icsi.netalyzr.android

sanitization: The dataset contains exclusively the sessions used for the core of the paper (MNO and MVNO characterization in the USA, Canada, France, Germany, Great Britain and Australia). We excluded users connected through VPNs, users with public IP addresses, users on international roaming, users connected through femtocells, users with customized network configurations (e.g. custom HTTP proxies and DNS resolvers), and sessions coming from engineering mode networks according to ITU. For the remaining Netalyzr sessions, we excluded sensitive fields such as passwords/usernames of APN settings, location information, base station information, and sensitive information injected into HTTP headers by proxies. IPv4 and IPv6 addresses are anonymized by truncating them to /16 and /32 subnets respectively. FQDNs are also not released, as in many cases they contain information that can identify users. For access to a larger public Netalyzr dataset with more detailed values and all the collected variables, visit PREDICT: https://www.predict.org

hole: Operator name, MCC/MNC values, and extra carrier information can be noisy or missing as a consequence of sessions generated by MVNO subscribers, network sharing agreements between operators, inconsistencies in Android's API (the dataset comprises handsets running versions from 2.2.3 to 5, which may also be modified by the vendor/mobile provider in their subsidized phones), or inaccurate APN settings on the handset (e.g. sometimes Android returns an empty MCC/MNC or an empty operator name). These sessions can be reconstructed. Other errors may appear in Netalyzr-specific tests (e.g. proxy detection and behavior characterization) due to connectivity problems or peculiar handset configurations.
Our MobiSys'15 paper "Beyond the Radio: Illuminating the Higher Layers of Mobile Networks" contains further details about the data sanitization process and the method followed for the study.

error: This dataset was collected through crowd-sourcing. Caution is advised when interpreting the data.

limitation: Due to technical limitations, we cannot release an app for iOS, so this data is limited to Android users.

note: Do not hesitate to contact us at netalyzr-help@icsi.berkeley.edu with questions.

Traceset

icsi/netalyzr-android/middleboxes

Details of middlebox behavior in cellular networks. The traceset contains a subset of the data collected from the Netalyzr for Android app.

measurement purpose: Network Diagnosis, Network Performance Analysis

IP Addressing: Netalyzr identifies the client's local IP address via Android's APIs and system properties, and uses TCP connections and UDP flows to our echo servers to identify the public IP address of the device. We use the whois tool to identify the organization owning the IP address.

Cellular Network Provider Identification: To identify the network service operator we use Android's TelephonyManager and ConnectivityManager APIs, and extract the APN settings as reported by the handset. This allows us to identify the name of the mobile operator, the name of the operator as reported by the SIM card, the APN providing the service, the cell ID (where users allow it), the 3GPP standard providing the service, and the MNC and MCC parameters.

Location: Android allows us to extract city-level device location if the user allows it. This information is useful to identify where roaming happens between mobile operators, and to identify locations with poor network performance.

HTTP proxies.

Non-responsive server test: TCP-terminating proxies may be deployed in cellular networks for performance improvement.
Such proxies are likely to respond with a SYN-ACK to a client's connection request before connecting to the intended origin server. We test for this behavior by attempting a connection to a server that replies with a RST. If the Netalyzr client's attempt to connect to this server on port 80 initially succeeds, this indicates the presence of a TCP-terminating proxy.

Header modification test: RFC 2616 specifies that systems should treat HTTP header names as case-insensitive and, with few exceptions, free of ordering requirements. Furthermore, RFC 2616 indicates that any proxy must add the Via header to indicate its presence to intermediate protocols and recipients. Netalyzr fetches custom content from our server using mixed-case request and response headers in a known order. Any change indicates the presence of a proxy. This method also identifies additional headers added by the HTTP proxy, such as tracking headers, and whether intermediate proxies modify traffic using techniques such as image transcoding, which can affect the fidelity of content delivered to mobile clients through CDNs and other cloud infrastructure.

HTTP enforcement test: In addition to standard HTTP, Netalyzr attempts to fetch an entity using the protocol declaration ICSI/1.1 instead of HTTP/1.1. If this request is rejected, we know that the network has a protocol-parsing proxy.

Invalid Host header value test: CERT VU 435052 describes how some in-path proxies interpret the Host request header and attempt to contact the listed host rather than forward the request to the intended address. We check for this vulnerability by fetching from our server with an alternate Host header of www.google.com.
The presence of this vulnerability in commercial proxies is alarming, as it suggests that operators may not have upgraded their middlebox software, which may therefore have other vulnerabilities not covered by our test suite.

icsi/netalyzr-android/middleboxes Trace

middleboxes-trace: The data exposing middlebox (HTTP and DNS) behaviour in cellular networks

configuration: Crowdsourced data collection using the Netalyzr for Android app

format: The tuple (id,time,raw_op_name,clean_op_name,country,raw_cellular_technology,3gpp_family,mcc,mnc,apn,apn_name,extra_carrier_info,global_ip,ip_dns,ip_dns_proxy,ip_http_proxy,http_content_change,http_hdr_reorder,http_hdr_injection,invalid_host_name_vulnerability,http_enforcement,http_default_compression,transcoding,dns_direct_mangled,dns_direct_proxy,dns_direct_changed_id,roaming_indicator,rooted,http_header_injected_list)
- id - integer
- time - timestamp
- raw_op_name - operator name as reported by Android's TelephonyManager
- clean_op_name - operator name after applying our filter
- country - device country as reported by Android
- raw_cellular_technology - 3GPP technology as reported by Android's ConnectivityManager/TelephonyManager
- 3gpp_family - 3GPP family after applying our filter (i.e. UMTS/HSPA, LTE, CDMA)
- mcc - Mobile Country Code. Assigned by ITU. Identifies the country. As reported by Android's TelephonyManager
- mnc - Mobile Network Code. Assigned by ITU. Identifies the operator (generally the radio operator). As reported by Android's TelephonyManager
- apn - APN information (not all Android devices return a value)
- apn_name - APN name (not all Android devices return a value)
- extra_carrier_info - Optionally supplied extra information about the network state. Provided by Android's ConnectivityManager
- global_ip - Public IP address (/16 for IPv4 and /64 for IPv6)
- ip_dns - IP address of the default DNS resolver (as seen by Netalyzr)
- ip_dns_proxy - Address of a DNS proxy (as seen from the Netalyzr server)
- ip_http_proxy - IP address of the in-network proxy (as seen from the Netalyzr server)
- http_content_change - HTTP content has been modified. Boolean not as reported by Android
- rooted - Whether the phone is rooted or not (allows executing "su"). Security vulnerability.
- http_header_injected_list - List of HTTP headers injected by the proxy.
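This dataset was collected using the Netalyzr for Android app. It was gathered by the ICSI Netalyzr app for Android to characterize how operational decisions, such as network configurations, business models, and relationships between operators, introduce diversity in service quality and affect user security and privacy. The study looks beyond the radio link and into network configuration and business relationships in six countries (US, Canada, UK, France, Germany, and Australia). It identifies the widespread deployment of transparent middleboxes such as HTTP and DNS proxies, and analyzes how they actively modify user traffic, compromise user privacy, and potentially undermine user security. It also identifies network sharing agreements between operators, highlights the implications of roaming, and characterizes the properties of mobile virtual network operators (MVNOs), a majority of which are simply rebranded versions of major mobile network operators (MNOs). The findings drawn from this dataset highlight the importance of considering higher-layer relationships when analyzing mobile traffic in a sound fashion.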
### Measurement period
Measurement start: 2013-10-22; measurement end: 2014-09-01
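### Collection environment
This dataset was collected using the Netalyzr for Android app, which anyone can download and install for free from the Google Play store. The study analyzed data covering a 9-month period from six countries (US, Canada, UK, France, Germany, and Australia).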
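### Network configuration
Android phones connected through 3G and 4G networks, covering rooted and unrooted devices as well as multi-user/multi-device scenarios.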
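### Data collection methodology
The data was collected by crowd-sourcing: users proactively run the Netalyzr for Android app to troubleshoot their network configuration or to understand their network and how it behaves. No private data is collected without the user's consent.
Google Play link to the app: https://play.google.com/store/apps/details?id=edu.berkeley.icsi.netalyzr.android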
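### Sanitization
The dataset contains exclusively the sessions used for the core analysis of the paper (MNO and MVNO characterization in the US, Canada, France, Germany, the UK, and Australia). The following sessions were excluded: users connected through virtual private networks (VPNs), users with public IP addresses, users on international roaming, users connected through femtocells, users with customized network configurations (e.g. custom HTTP proxies and DNS resolvers), and sessions coming from engineering-mode networks according to the International Telecommunication Union (ITU).
For the remaining Netalyzr sessions, sensitive fields were removed, including passwords/usernames in Access Point Name (APN) settings, location information, base station information, and sensitive information injected into HTTP headers by proxies. IPv4 and IPv6 addresses are anonymized by truncating them to /16 and /32 prefixes respectively. Fully qualified domain names (FQDNs) are not released, because in many cases they contain information that can identify users.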
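The prefix-based anonymization described above can be illustrated with a minimal Python sketch. It is not the authors' tooling: the function name `anonymize_ip` is hypothetical, and the prefix lengths are kept as parameters because this description is not fully consistent about the IPv6 prefix.

```python
import ipaddress


def anonymize_ip(address, v4_prefix=16, v6_prefix=32):
    """Truncate an IP address to its enclosing network prefix.

    Hypothetical helper: the defaults (/16 for IPv4, /32 for IPv6) follow
    the sanitization notes and are assumptions about the released data.
    """
    ip = ipaddress.ip_address(address)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False zeroes the host bits instead of raising an error.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network)


# Documentation-range addresses (RFC 5737 / RFC 3849), not dataset values.
print(anonymize_ip("192.0.2.55"))
print(anonymize_ip("2001:db8:abcd::1"))
```

Truncating to a prefix keeps enough of the address to attribute a session to an operator's address block while discarding the bits that single out one subscriber.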
For access to a larger public Netalyzr dataset with more detailed values and all the collected variables, visit PREDICT: https://www.predict.org
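### Data gaps and limitations
1. Operator name, MCC (Mobile Country Code)/MNC (Mobile Network Code) values, and extra carrier information can be noisy or missing. Causes include sessions generated by MVNO subscribers, network sharing agreements between operators, inconsistencies in Android's API (the dataset covers Android versions 2.2.3 to 5, which vendors and mobile providers may also modify on their subsidized phones), and inaccurate APN settings on the handset (e.g. Android sometimes returns an empty MCC/MNC or an empty operator name). These sessions can be reconstructed.
2. Other errors may appear in Netalyzr-specific tests (e.g. proxy detection and behavior characterization) due to connectivity problems or peculiar handset configurations.
The MobiSys'15 paper "Beyond the Radio: Illuminating the Higher Layers of Mobile Networks" contains further details about the data sanitization process and the method followed for the study.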
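### Error
This dataset was collected through crowd-sourcing. Caution is advised when interpreting the data.
### Limitation
Due to technical limitations, an app for iOS cannot be released, so this dataset is limited to Android users.
### Note
For questions, contact netalyzr-help@icsi.berkeley.edu.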
---
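### icsi/netalyzr-android/middleboxes traceset
This traceset contains a subset of the data collected by the Netalyzr for Android app and gives details of middlebox behavior in cellular networks.
#### Measurement purpose
Network diagnosis, network performance analysis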
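#### IP addressing
Netalyzr identifies the client's local IP address via Android's APIs and system properties, and uses TCP connections and UDP flows to echo servers to identify the public IP address of the device. The whois tool is used to identify the organization owning the IP address.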
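The echo-server idea can be sketched as follows, assuming a toy TCP service that simply reports the peer address it observes. This is an illustration only, not Netalyzr's protocol: `run_echo_server` and `observe_addresses` are hypothetical names, and the demo runs on loopback, where no address translation takes place.

```python
import socket
import threading


def run_echo_server(server_sock):
    """Accept one connection and send back the client address the server sees."""
    conn, peer = server_sock.accept()
    with conn:
        conn.sendall(peer[0].encode("ascii"))


def observe_addresses(host, port):
    """Return (local_ip, ip_seen_by_server) for a single TCP connection."""
    with socket.create_connection((host, port), timeout=5) as sock:
        local_ip = sock.getsockname()[0]
        chunks = []
        while True:  # read until the server closes the connection
            data = sock.recv(64)
            if not data:
                break
            chunks.append(data)
    return local_ip, b"".join(chunks).decode("ascii")


# Loopback demo: the toy server stands in for a remote echo server.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
server.listen(1)
port = server.getsockname()[1]
worker = threading.Thread(target=run_echo_server, args=(server,), daemon=True)
worker.start()

local_ip, seen_ip = observe_addresses("127.0.0.1", port)
worker.join()
server.close()

# A mismatch would suggest address translation or a proxy on the path.
address_translated = local_ip != seen_ip
print(local_ip, seen_ip, address_translated)
```

Against a real remote server, a private local address paired with a different server-observed address is the usual sign that the device sits behind carrier-grade NAT.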
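#### Cellular network provider identification
To identify the network service operator, the study uses Android's TelephonyManager and ConnectivityManager APIs and extracts the APN settings reported by the handset. This yields the name of the mobile operator, the name of the operator as reported by the SIM card, the APN providing the service, the cell ID (where users allow it), the 3GPP standard providing the service, and the MNC and MCC parameters.
#### Location
Android can provide city-level device location if the user allows it. This information is useful for identifying where roaming happens between mobile operators and for locating areas with poor network performance.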
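#### HTTP proxies
1. Non-responsive server test: TCP-terminating proxies may be deployed in cellular networks to improve performance. Such proxies are likely to respond with a SYN-ACK to a client's connection request before connecting to the intended origin server. The test attempts a connection to a server that replies with a RST: if the Netalyzr client's attempt to connect to this server on port 80 initially succeeds, a TCP-terminating proxy is present.
2. Header modification test: RFC 2616 specifies that systems should treat HTTP header names as case-insensitive and, with few exceptions, free of ordering requirements. RFC 2616 also indicates that any proxy must add the Via header to indicate its presence to intermediate protocols and recipients. Netalyzr fetches custom content from the server using mixed-case request and response headers in a known order; any change indicates the presence of a proxy. This method also identifies additional headers added by the HTTP proxy, such as tracking headers, and whether intermediate proxies modify traffic using techniques such as image transcoding, which can affect the fidelity of content delivered to mobile clients through content delivery networks (CDNs) and other cloud infrastructure.
3. HTTP enforcement test: In addition to standard HTTP, Netalyzr attempts to fetch an entity using the protocol declaration ICSI/1.1 instead of HTTP/1.1. If this request is rejected, the network has a protocol-parsing proxy.
4. Invalid Host header value test: CERT VU 435052 describes how some in-path proxies interpret the Host request header and attempt to contact the listed host rather than forward the request to the intended address. The test checks for this vulnerability by fetching from the server with an alternate Host header of www.google.com. The presence of this vulnerability in commercial proxies is alarming, as it suggests that operators may not have upgraded their middlebox software, which may therefore have other vulnerabilities not covered by this test suite.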
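The header modification test amounts to comparing the headers a client sent with the headers the server reports having received. The Python sketch below shows one way to make that comparison. It assumes the server echoes the received headers back in wire order; the function name `compare_headers` and the sample header names and values are hypothetical, not taken from Netalyzr.

```python
def compare_headers(sent, received):
    """Compare two header lists given as (name, value) tuples in wire order.

    Hypothetical helper: reports injected or removed headers, header names
    whose capitalization changed, and whether surviving headers were reordered.
    """
    sent_names = [name for name, _ in sent]
    recv_names = [name for name, _ in received]
    sent_lower = [name.lower() for name in sent_names]
    recv_lower = [name.lower() for name in recv_names]

    injected = [n for n in recv_names if n.lower() not in sent_lower]
    removed = [n for n in sent_names if n.lower() not in recv_lower]
    # Same header (case-insensitively) but not byte-identical to any sent name.
    case_changed = [
        n for n in recv_names if n.lower() in sent_lower and n not in sent_names
    ]
    # Compare the relative order of headers present on both sides.
    sent_common = [n for n in sent_lower if n in recv_lower]
    recv_common = [n for n in recv_lower if n in sent_lower]
    reordered = sent_common != recv_common

    return {
        "injected": injected,
        "removed": removed,
        "case_changed": case_changed,
        "reordered": reordered,
        "proxy_suspected": bool(injected or removed or case_changed or reordered),
    }


# Illustrative values only: mixed-case headers sent in a known order.
sent = [("HoSt", "server.example"), ("X-TeSt-HeAdEr", "1"), ("AcCePt", "*/*")]
received = [
    ("Host", "server.example"),
    ("Accept", "*/*"),
    ("X-TeSt-HeAdEr", "1"),
    ("Via", "1.1 proxy.example"),
]
report = compare_headers(sent, received)
print(report)
```

Because a direct path preserves header case and order byte for byte, any difference the comparison reports points to an intermediary that parsed and re-serialized the request.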
---
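### icsi/netalyzr-android/middleboxes trace (field details)
#### middleboxes-trace
The data exposing middlebox (HTTP and DNS) behavior in cellular networks.
#### Configuration
Crowdsourced data collection using the Netalyzr for Android app
#### Format
Each record is a tuple with the following structure: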
`(id, time, raw_op_name, clean_op_name, country, raw_cellular_technology, 3gpp_family, mcc, mnc, apn, apn_name, extra_carrier_info, global_ip, ip_dns, ip_dns_proxy, ip_http_proxy, http_content_change, http_hdr_reorder, http_hdr_injection, invalid_host_name_vulnerability, http_enforcement, http_default_compression, transcoding, dns_direct_mangled, dns_direct_proxy, dns_direct_changed_id, roaming_indicator, rooted, http_header_injected_list)`
The fields are described below:
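- `id`: integer; unique session identifier
- `time`: timestamp of the session
- `raw_op_name`: raw operator name as reported by Android's TelephonyManager
- `clean_op_name`: normalized operator name after applying the study's filter
- `country`: device country as reported by Android
- `raw_cellular_technology`: raw 3GPP technology as reported by Android's ConnectivityManager/TelephonyManager
- `3gpp_family`: 3GPP family after applying the study's filter (i.e. UMTS/HSPA, LTE, CDMA)
- `mcc`: Mobile Country Code (MCC), assigned by ITU; identifies the country; as reported by Android's TelephonyManager
- `mnc`: Mobile Network Code (MNC), assigned by ITU; identifies the operator (generally the radio operator); as reported by Android's TelephonyManager
- `apn`: APN information (not all Android devices return a value)
- `apn_name`: APN name (not all Android devices return a value)
- `extra_carrier_info`: optionally supplied extra information about the network state, provided by Android's ConnectivityManager
- `global_ip`: public IP address (anonymized to a /16 prefix for IPv4 and a /32 prefix for IPv6)
- `ip_dns`: IP address of the default DNS resolver (as seen by Netalyzr)
- `ip_dns_proxy`: IP address of a DNS proxy (as seen from the Netalyzr server)
- `ip_http_proxy`: IP address of the in-network HTTP proxy (as seen from the Netalyzr server)
- `http_content_change`: Boolean; whether HTTP content has been modified (not as reported by Android)
- `rooted`: Boolean; whether the device is rooted (allows executing `su`); a security vulnerability
- `http_header_injected_list`: list of HTTP headers injected by the proxy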
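The sketch below shows how records with this tuple layout might be loaded in Python. It rests on an assumption: that each record is stored as one comma-separated line with fields in the order of the tuple above. The description does not specify the on-disk format, so check it against the actual files. The helper names and the sample rows are synthetic.

```python
import csv
import io

# Field order copied from the tuple in the format description.
FIELDS = [
    "id", "time", "raw_op_name", "clean_op_name", "country",
    "raw_cellular_technology", "3gpp_family", "mcc", "mnc", "apn",
    "apn_name", "extra_carrier_info", "global_ip", "ip_dns", "ip_dns_proxy",
    "ip_http_proxy", "http_content_change", "http_hdr_reorder",
    "http_hdr_injection", "invalid_host_name_vulnerability",
    "http_enforcement", "http_default_compression", "transcoding",
    "dns_direct_mangled", "dns_direct_proxy", "dns_direct_changed_id",
    "roaming_indicator", "rooted", "http_header_injected_list",
]


def read_sessions(stream):
    """Parse header-less CSV lines into dicts keyed by field name (assumed format)."""
    return list(csv.DictReader(stream, fieldnames=FIELDS))


def has_operator_gap(record):
    """True when operator name, MCC or MNC is empty, as the description warns."""
    return any(not record[key].strip() for key in ("raw_op_name", "mcc", "mnc"))


# Synthetic rows for illustration only; these are not values from the dataset.
row_a = dict.fromkeys(FIELDS, "")
row_a.update(id="1", raw_op_name="ExampleOp", mcc="310", mnc="260", rooted="false")
row_b = dict.fromkeys(FIELDS, "")
row_b.update(id="2")  # operator name, MCC and MNC left empty

buffer = io.StringIO()
csv.DictWriter(buffer, fieldnames=FIELDS).writerows([row_a, row_b])
buffer.seek(0)

sessions = read_sessions(buffer)
incomplete = [s["id"] for s in sessions if has_operator_gap(s)]
print(len(sessions), incomplete)
```

Flagging sessions with an empty operator name, MCC, or MNC early makes it easier to decide whether to drop them or try to reconstruct them from other fields such as `apn`.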
Provider:
IEEE DataPort
Created:
2022-11-17



