
company_names_data

Indexed from NIAID Data Ecosystem on 2026-03-14
Download link:
https://figshare.com/articles/dataset/company_names_data/21385260
Description:
The data contain a sample of 1,597,336 firm pairs matched from two data sources: the HeadHunter job board platform and the Ruslana firm-level data aggregator. The columns are as follows:
hh_name - the set of lowercased names (original and transliterated) of a firm from the HeadHunter platform;
rus_name - the set of lowercased names (original and transliterated) of a firm from the Ruslana platform;
J_M - the Jaccard similarity between the two preceding sets, estimated with a MinHash approximation (100 hash functions) and converted to the integer scale {0, 1, ..., 100};
d_H - the geographic distance between the two firms, in km;
match - a Boolean indicator of whether the pair is a match (labeled and manually validated);
ind - a Boolean indicator of the same company industry, based on the similarity of the company and industry descriptions;
entity - a Boolean indicator of the same company legal form, based on company keywords;
subs - a Boolean indicator that at least one name variant of a company from one data source is a substring of a company name from the other data source;
sample - "train" or "test", indicating the training or test sample.
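To make the J_M and subs columns more concrete, here is a minimal Python sketch of how such values could be computed. It is not the dataset authors' code. The description does not say how names are tokenized or which hash functions were used, so the character-trigram shingling, the seeded BLAKE2b hashes, and all function names below are assumptions chosen for illustration.

```python
from __future__ import annotations

import hashlib

NUM_HASHES = 100  # the description mentions 100 hash functions


def shingles(names: set[str], k: int = 3) -> set[str]:
    """Character k-grams over all name variants (tokenization is an assumption)."""
    out: set[str] = set()
    for name in names:
        s = name.lower()
        if len(s) < k:
            out.add(s)
        else:
            out.update(s[i:i + k] for i in range(len(s) - k + 1))
    return out


def _hash(token: str, seed: int) -> int:
    """One member of a seeded hash family (assumed; the real family is unknown)."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens: set[str], num_hashes: int = NUM_HASHES) -> list[int]:
    """For each hash function, keep the minimum hash value over the token set."""
    if not tokens:
        raise ValueError("token set must not be empty")
    return [min(_hash(t, seed) for t in tokens) for seed in range(num_hashes)]


def jaccard_minhash_percent(a: set[str], b: set[str],
                            num_hashes: int = NUM_HASHES) -> int:
    """Estimate Jaccard similarity as the share of agreeing signature slots,
    scaled to the integer range 0..100 as in the J_M column."""
    sig_a = minhash_signature(a, num_hashes)
    sig_b = minhash_signature(b, num_hashes)
    agree = sum(x == y for x, y in zip(sig_a, sig_b))
    return round(100 * agree / num_hashes)


def is_substring_pair(names_a: set[str], names_b: set[str]) -> bool:
    """True if any name variant from one source is a substring of a name
    from the other source (the idea behind the subs column)."""
    return any(x in y or y in x for x in names_a for y in names_b)


if __name__ == "__main__":
    hh = {"газпром", "gazprom"}            # hypothetical HeadHunter names
    rus = {"оао газпром", "oao gazprom"}   # hypothetical Ruslana names
    print(jaccard_minhash_percent(shingles(hh), shingles(rus)))
    print(is_substring_pair(hh, rus))
```

With 100 hash functions the estimate moves in steps of 0.01, which is why it maps directly onto the integer scale {0, 1, ..., 100}. Identical token sets always score 100.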
Created:
2022-10-23