company_names_data

Source: DataCite Commons | Updated: 2022-10-23 | Indexed: 2024-07-29
Download link:
https://figshare.com/articles/dataset/company_names_data/21385260
Description:
The data contain a sample of 1,597,336 pairs of firms matched across two data sources: the HeadHunter job board platform and the Ruslana firm-level data aggregator. The columns are:

- hh_name: the set of lowercased names (original and transliterated) of the firm on the HeadHunter platform;
- rus_name: the set of lowercased names (original and transliterated) of the firm on the Ruslana platform;
- J_M: the Jaccard similarity between the hh_name and rus_name sets, approximated with MinHash (100 hash functions) and converted to an integer scale {0, 1, ..., 100};
- d_H: the geographic distance between the two firms, in km;
- match: Boolean flag indicating whether the pair is a true match (marked up and manually validated);
- ind: Boolean flag indicating that both firms belong to the same industry, based on the similarity of their company and industry descriptions;
- entity: Boolean flag indicating that both firms have the same legal form, based on company keywords;
- subs: Boolean flag indicating that at least one name variant of the firm in one data source is a substring of a name of the firm in the other data source;
- sample: "train" or "test", indicating whether the pair belongs to the training or the test sample.
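The J_M and subs columns can be illustrated with a minimal sketch for a single pair of name sets. The description does not say how names were tokenized before MinHashing, which hash family or seed was used, or exactly how substrings were tested, so the character 3-gram shingling, the universal hash family modulo 2^61 - 1, the seed, and all function names below are hypothetical choices, not the authors' method. One point does follow directly from the description: with 100 hash functions the MinHash estimate is (number of agreeing signature slots) / 100, so rescaling it to the {0, ..., 100} scale simply gives the count of agreeing slots.

```python
import hashlib
import random

# Hypothetical parameters: the dataset description does not document them.
MERSENNE_PRIME = (1 << 61) - 1  # modulus of the universal hash family
NUM_HASHES = 100                # matches the "100 hash-functions" in the description
SHINGLE_LEN = 3                 # assumed character n-gram length


def shingles(names, k=SHINGLE_LEN):
    """Union of character k-grams over every lowercased name in the set (assumed tokenization)."""
    grams = set()
    for name in names:
        text = name.lower()
        if not text:
            continue
        if len(text) < k:
            grams.add(text)
        else:
            grams.update(text[i:i + k] for i in range(len(text) - k + 1))
    return grams


def base_hash(token):
    """Deterministic integer hash; the built-in hash() of str is salted per process."""
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % MERSENNE_PRIME


def make_hash_params(num_hashes=NUM_HASHES, seed=42):
    """Random (a, b) pairs, each defining one hash function h(x) = (a * x + b) mod p."""
    rng = random.Random(seed)
    return [(rng.randrange(1, MERSENNE_PRIME), rng.randrange(MERSENNE_PRIME))
            for _ in range(num_hashes)]


def minhash_signature(tokens, params):
    """For each hash function, the minimum hash value over the token set."""
    hashed = [base_hash(t) for t in tokens]
    return [min((a * x + b) % MERSENNE_PRIME for x in hashed) for a, b in params]


def minhash_jaccard_0_100(names_a, names_b, params):
    """MinHash Jaccard estimate rescaled to an integer in 0..100 (hypothetical J_M)."""
    tokens_a, tokens_b = shingles(names_a), shingles(names_b)
    if not tokens_a or not tokens_b:
        return 0
    sig_a = minhash_signature(tokens_a, params)
    sig_b = minhash_signature(tokens_b, params)
    agreeing = sum(x == y for x, y in zip(sig_a, sig_b))
    # With 100 hash functions this equals the number of agreeing slots.
    return round(100 * agreeing / len(params))


def substring_flag(names_a, names_b):
    """True if a non-empty name from one set is a substring of a name from the other (hypothetical subs)."""
    return any(a in b or b in a for a in names_a for b in names_b if a and b)


# Usage with made-up example names (not taken from the dataset).
params = make_hash_params()
hh_name = {"ооо ромашка", "ooo romashka"}
rus_name = {"ромашка", "romashka"}
print(minhash_jaccard_0_100(hh_name, rus_name, params))  # an integer between 0 and 100
print(substring_flag(hh_name, rus_name))                 # True
```

Hashing every shingle with 100 functions and keeping only the minima lets J_M be computed from fixed-size signatures, which is what makes comparing over 1.5 million candidate pairs cheap; the exact values in the dataset will differ from this sketch unless the original tokenization and hash functions are used.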
Provider: figshare
Created: 2022-10-23