Internet
Databricks · Indexed 2024-05-09
Download link:
https://marketplace.databricks.com/details/08b18478-aa8d-49b1-a6e8-50dbef03e6b9/John-Snow-Labs_Internet
Resource summary:
**Overview**
This data package contains datasets on Austin Google Analytics, an IPv4 address network database, and internet top-level domains such as .com and .uk. It also contains a list of media types, media subtypes, and their file extensions, as well as membership details for the copyright treaties administered by WIPO (World Intellectual Property Organization).
**Description**
This data package contains a dataset about Austin Google Analytics. Google Analytics is implemented with "page tags", in this case called the Google Analytics Tracking Code: a snippet of JavaScript that the website owner adds to every page of the website. The tracking code runs in the visitor's browser when the page is loaded (if JavaScript is enabled in the browser), collects visitor data, and sends it to a Google data collection server as part of a request for a web beacon. Another dataset contains Internet Protocol (IP) addresses. In its IPv4 form, an IP address is a string of numbers separated by full stops that uniquely identifies each computer using the Internet Protocol to communicate over a network.
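To make the dotted-decimal notation concrete, here is a minimal sketch (not part of the data package) that splits an IPv4 address into its four numbers, checks that each is a valid octet (0 to 255), and packs them into the single 32-bit integer the address stands for. The function name `ipv4_to_int` is hypothetical; the result is cross-checked against Python's standard `ipaddress` module.

```python
import ipaddress


def ipv4_to_int(address: str) -> int:
    """Pack a dotted-decimal IPv4 address such as "192.0.2.1" into a 32-bit integer."""
    parts = address.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated numbers, got {address!r}")
    value = 0
    for part in parts:
        # Each part must be an ASCII decimal number from 0 to 255 (one octet).
        if not (part.isascii() and part.isdigit()) or int(part) > 255:
            raise ValueError(f"invalid octet {part!r} in {address!r}")
        # Shift the octets seen so far left by 8 bits and append this one.
        value = (value << 8) | int(part)
    return value


if __name__ == "__main__":
    addr = "192.0.2.1"  # RFC 5737 documentation address, not a real host
    n = ipv4_to_int(addr)
    # The standard library's ipaddress module performs the same conversion.
    assert n == int(ipaddress.IPv4Address(addr))
    print(addr, "->", n)  # prints: 192.0.2.1 -> 3221225985
```

For production parsing, the standard `ipaddress` module is the better choice; the hand-written loop only exposes the arithmetic behind the notation.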
TCP/IP (Transmission Control Protocol/Internet Protocol) is the standard protocol suite for communicating over the network. In TCP/IP, each computer or device is identified by its IP address. There are two IP address standards: IP Version 4 (IPv4) and IP Version 6 (IPv6).
The Internet Top Level Domain Names dataset provides the delegation details of top-level domains, including gTLDs (generic top-level domains) such as .com and country-code TLDs such as .uk. Another dataset lists all Media Types and Media Subtypes with their file extensions. The Media Type and Subtype details are taken from the official Media Types registry maintained by IANA, and the extension details are taken from the website of the Apache Software Foundation.
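The split between generic and country-code top-level domains can be approximated in code. The sketch below is an illustrative assumption, not how the dataset classifies TLDs: it takes the last label of a hostname and treats two-letter ASCII labels as country-code TLDs (ccTLDs, such as .uk), since two-letter codes are reserved for countries. Everything longer, including the infrastructure TLD .arpa, is lumped in with the generic TLDs, and punycode (`xn--`) labels are left unclassified. For authoritative answers, the delegation records in the dataset itself would be needed. Both function names are hypothetical.

```python
def top_level_domain(hostname: str) -> str:
    """Return the last label of a hostname, lower-cased, ignoring a trailing dot."""
    label = hostname.rstrip(".").lower().rsplit(".", 1)[-1]
    if not label:
        raise ValueError(f"no top-level domain in {hostname!r}")
    return label


def classify_tld(tld: str) -> str:
    """Heuristically label a TLD; an illustrative assumption, not the dataset's method."""
    if tld.startswith("xn--"):
        # Punycode-encoded internationalized TLDs may be generic or country-code;
        # the label alone does not say which.
        return "internationalized (unknown)"
    if len(tld) == 2 and tld.isascii() and tld.isalpha():
        # Two-letter ASCII TLDs are reserved for country codes.
        return "ccTLD"
    # Everything else is treated as generic in this sketch.
    return "gTLD"


if __name__ == "__main__":
    for host in ["www.example.com", "www.bbc.co.uk", "example.org."]:
        tld = top_level_domain(host)
        print(host, tld, classify_tld(tld))
```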
**Benefits**
- Marketing campaign optimization
- Website usability improvement
- Business benefits
- Target audience identification
- Budget allocation
**License Information**
The use of John Snow Labs datasets is free for personal and research purposes. For commercial use, please subscribe to the [Data Library](https://www.johnsnowlabs.com/marketplace/) on the John Snow Labs website. The subscription allows you to use all John Snow Labs datasets and data packages for commercial purposes.
**Included Datasets**
- [Austin Google Analytics](https://www.johnsnowlabs.com/marketplace/austin-google-analytics)
  - This dataset provides the Austin Google Analytics data. Google Analytics is a freemium web analytics service offered by Google that tracks and reports website traffic.
- [IPv4 Geolocation](https://www.johnsnowlabs.com/marketplace/ipv4-geolocation)
  - This dataset is a database of IPv4 address networks with their respective geographical locations.
- [Internet Top Level Domain Names](https://www.johnsnowlabs.com/marketplace/internet-top-level-domain-names)
  - This dataset provides the delegation details of top-level domains, including gTLDs (generic top-level domains) such as .com and country-code TLDs such as .uk.
- [List of Internet Media Types And Subtypes](https://www.johnsnowlabs.com/marketplace/list-of-internet-media-types-and-subtypes)
  - This dataset lists all Media Types and Media Subtypes with their file extensions. The Media Type and Subtype details are taken from the official Media Types registry maintained by IANA, and the extension details are taken from the website of the Apache Software Foundation.
- [Membership to International Copyright Treaties](https://www.johnsnowlabs.com/marketplace/membership-to-international-copyright-treaties)
  - This dataset provides details of membership in the copyright treaties administered by WIPO (World Intellectual Property Organization).
- [NYC Social Media Usage](https://www.johnsnowlabs.com/marketplace/nyc-social-media-usage)
  - The Demographic Reports are produced by the Economic, Demographic and Statistical Research unit within the Countywide Service Integration and Planning Management (CSIPM) Division of the Fairfax County Department of Neighborhood and Community Services. Information produced by the Economic, Demographic and Statistical Research unit is used by every county department, board, and authority, and by the Fairfax County Public Schools.
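A database of IPv4 networks with geographic locations is usually queried by finding the network that contains a given address. The sketch below illustrates that lookup with Python's standard `ipaddress` module and prefers the most specific (longest-prefix) match when networks overlap. The rows are hypothetical placeholders built from the RFC 5737 documentation ranges; they are not records from the IPv4 Geolocation dataset, whose actual columns are not described here, and `locate` is a hypothetical helper name.

```python
import ipaddress
from typing import Optional

# Hypothetical rows: (CIDR network, location label). The real dataset's
# column names and values may differ.
NETWORKS = [
    ("192.0.2.0/24", "placeholder-location-A"),
    ("192.0.2.128/25", "placeholder-location-B"),  # more specific, inside the /24
    ("198.51.100.0/24", "placeholder-location-C"),
]

# Parse each CIDR string once, up front.
PARSED = [(ipaddress.ip_network(cidr), label) for cidr, label in NETWORKS]


def locate(address: str) -> Optional[str]:
    """Return the label of the most specific network containing address, or None."""
    ip = ipaddress.ip_address(address)
    best = None
    for network, label in PARSED:
        # `in` is False when the address lies outside the network or when the
        # IP versions differ (an IPv6 address never matches an IPv4 network).
        if ip in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, label)
    return best[1] if best else None


if __name__ == "__main__":
    for addr in ["192.0.2.10", "192.0.2.200", "203.0.113.5"]:
        print(addr, locate(addr))
```

A linear scan keeps the sketch short; for a table with many rows, a common alternative is to store each network as an integer start/end range, sort by start, and search with `bisect`.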
**Data Engineering Overview**
**We deliver high-quality data**
- Each dataset goes through 3 levels of quality review
  - 2 manual reviews are done by domain experts
  - Then, an automated set of 60+ validations verifies that every datum matches the metadata and defined constraints
- Data is normalized into one unified type system
  - All dates, units, codes, and currencies look the same
  - All null values are normalized to the same value
  - All dataset and field names are SQL and Hive compliant
- Data and Metadata
  - Data is available in both CSV and Apache Parquet formats, optimized for high read performance on distributed Hadoop, Spark & MPP clusters
  - Metadata is provided in the open Frictionless Data standard, and every metadata field is normalized & validated
- Data Updates
  - Data updates support replace-on-update: outdated foreign keys are deprecated, not deleted
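The claim that automated validations check every datum against the metadata and its constraints can be illustrated with a small, hand-rolled sketch. It does not use the official Frictionless tooling and is not John Snow Labs' validation suite: a descriptor shaped like a Frictionless Table Schema lists fields with a `type` and optional `constraints` (here only `required`, `enum`, and `minimum`, which the Table Schema specification defines), and each CSV row is checked against it, with empty strings treated as nulls (the Table Schema default for `missingValues`). The descriptor, field names, sample rows, and `validate_rows` function are all hypothetical.

```python
import csv
import io

# Hypothetical descriptor shaped like a Frictionless Table Schema; the real
# package's field names and constraints are not given here.
SCHEMA = {
    "fields": [
        {"name": "tld", "type": "string", "constraints": {"required": True}},
        {"name": "tld_type", "type": "string",
         "constraints": {"enum": ["generic", "country-code"]}},
        {"name": "label_length", "type": "integer",
         "constraints": {"minimum": 2}},
    ],
    "missingValues": [""],  # Table Schema's default: an empty string means null
}


def validate_rows(text: str, schema: dict) -> list:
    """Check CSV text against a schema; return (line, field, message) tuples."""
    missing = set(schema.get("missingValues", [""]))
    errors = []
    reader = csv.DictReader(io.StringIO(text))
    # Line 1 is the header, so the first data row is line 2.
    for line_no, row in enumerate(reader, start=2):
        for field in schema["fields"]:
            name = field["name"]
            rules = field.get("constraints", {})
            raw = row.get(name)
            if raw is None or raw in missing:
                # Nulls are allowed unless the field is required.
                if rules.get("required"):
                    errors.append((line_no, name, "required value is missing"))
                continue
            value = raw
            if field.get("type") == "integer":
                try:
                    value = int(raw)
                except ValueError:
                    errors.append((line_no, name, f"not an integer: {raw!r}"))
                    continue
            if "enum" in rules and value not in rules["enum"]:
                errors.append((line_no, name, f"{value!r} is not an allowed value"))
            # Only numeric values are compared against a minimum in this sketch.
            if "minimum" in rules and isinstance(value, int) and value < rules["minimum"]:
                errors.append((line_no, name, f"{value} is below {rules['minimum']}"))
    return errors


SAMPLE = """tld,tld_type,label_length
com,generic,3
uk,country-code,2
,generic,0
org,sponsor,x
"""

if __name__ == "__main__":
    for error in validate_rows(SAMPLE, SCHEMA):
        print(error)
```

In this sample, lines 2 and 3 pass, line 4 fails the `required` and `minimum` checks, and line 5 fails the `enum` check and the integer type check.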
**Our data is curated and enriched by domain experts**
Each dataset is manually curated by our team of doctors, pharmacists, public health & medical billing experts:
- Field names, descriptions, and normalized values are chosen by people who actually understand their meaning
- Healthcare & life science experts add categories, search keywords, descriptions and more to each dataset
- Both manual and automated data enrichment supported for clinical codes, providers, drugs, and geo-locations
- The data is always kept up to date – even when the source requires manual effort to get updates
- Support for data subscribers is provided directly by the domain experts who curated the data sets
- Every data source’s license is manually verified to allow for royalty-free commercial use and redistribution.
**Need Help?**
If you have questions about our products, contact us at [info@johnsnowlabs.com](mailto:info@johnsnowlabs.com).
Provided by:
John Snow Labs
Dataset introduction

Background overview
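This dataset brings together multiple internet-related datasets covering website traffic analytics, IP geolocation, top-level domain information, media types and file extensions, and copyright treaties, making it suitable for marketing optimization and network research. The data undergoes rigorous quality review and is provided in CSV and Parquet formats. It is free for personal and research use; commercial use requires a subscription.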
The content above was collected and summarized by 遇见数据集.



