
lang-uk/recruitment-dataset-job-descriptions-ukrainian

Hugging Face · Updated 2024-06-02 · Indexed 2024-06-11
Download link:
https://hf-mirror.com/datasets/lang-uk/recruitment-dataset-job-descriptions-ukrainian
Resource description:
---
dataset_info:
  features:
  - name: Position
    dtype: string
  - name: Long Description
    dtype: string
  - name: Company Name
    dtype: string
  - name: Exp Years
    dtype: string
  - name: Primary Keyword
    dtype: string
  - name: English Level
    dtype: string
  - name: Published
    dtype: string
  - name: Long Description_lang
    dtype: string
  - name: id
    dtype: string
  - name: __index_level_0__
    dtype: int64
  splits:
  - name: train
    num_bytes: 83166918
    num_examples: 27461
  download_size: 40645342
  dataset_size: 83166918
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
license: mit
language:
- uk
size_categories:
- 10K<n<100K
---

# Djinni Dataset (Ukrainian Job Descriptions part)

## Overview

The full [Djinni Recruitment Dataset](https://github.com/Stereotypes-in-LLMs/recruitment-dataset) contains 150,000 job descriptions and 230,000 anonymized candidate CVs, posted between 2020 and 2023 on the [Djinni](https://djinni.co/) IT job platform, with samples in English and Ukrainian. This part holds the Ukrainian job descriptions: 27,461 rows in a single `train` split.

Each row describes one job posting: position title, job description, company name, experience requirement, primary keyword, English proficiency level, publication date, language of the job description, and a unique identifier.

## Intended Use

The Djinni dataset supports a wide range of applications:

- **Recommender Systems and Semantic Search:** A resource for improving job recommendation engines and semantic search, making the job search more intuitive and better tailored to individual preferences.
- **Advancement of Large Language Models (LLMs):** Training data for English and Ukrainian domain-specific LLMs, improving their understanding and generation capabilities in specialized recruitment contexts.
- **Fairness in AI-assisted Hiring:** A benchmark for AI fairness that helps mitigate biases in AI-assisted recruitment and promotes more equitable hiring practices.
- **Recruitment Automation:** Enables tools for automated creation of resumes and job descriptions, streamlining the recruitment process.
- **Market Analysis:** Offers insights into the dynamics of Ukraine's tech sector, including the impacts of conflicts.
- **Trend Analysis and Topic Discovery:** Supports modeling and classification for trend analysis and topic discovery within the tech industry.
- **Strategic Planning:** Enables automatic identification of company domains, which assists in strategic market planning.

## Load Dataset

```python
from datasets import load_dataset

data = load_dataset("lang-uk/recruitment-dataset-job-descriptions-ukrainian")["train"]
```

## BibTeX entry and citation info

*When publishing results based on this dataset please refer to:*

```bibtex
@inproceedings{drushchak-romanyshyn-2024-introducing,
    title = "Introducing the Djinni Recruitment Dataset: A Corpus of Anonymized {CV}s and Job Postings",
    author = "Drushchak, Nazarii and Romanyshyn, Mariana",
    editor = "Romanyshyn, Mariana and Romanyshyn, Nataliia and Hlybovets, Andrii and Ignatenko, Oleksii",
    booktitle = "Proceedings of the Third Ukrainian Natural Language Processing Workshop (UNLP) @ LREC-COLING 2024",
    month = may,
    year = "2024",
    address = "Torino, Italia",
    publisher = "ELRA and ICCL",
    url = "https://aclanthology.org/2024.unlp-1.2",
    pages = "8--13",
}
```

## Attribution

Special thanks to [Djinni](https://djinni.co/) for providing this dataset. Their effort in compiling and sharing it supports research and development in AI, machine learning, and the broader tech industry, and is greatly appreciated by the community.
Provider:
lang-uk
Summary of original information

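Dataset overview

Dataset name

  • Name: Djinni Dataset (Ukrainian Job Descriptions part)

Dataset contents

  • Contents: 150,000 job descriptions and 230,000 anonymized candidate CVs posted on the Djinni IT job platform between 2020 and 2023 (figures for the full Djinni Recruitment Dataset; this part is the Ukrainian job descriptions).
  • Languages: English and Ukrainian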

Dataset features

  • Feature list:
    • Position: string
    • Long Description: string
    • Company Name: string
    • Exp Years: string
    • Primary Keyword: string
    • English Level: string
    • Published: string
    • Long Description_lang: string
    • id: string
    • __index_level_0__: int64
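
The feature list above can be turned into a quick sanity check on individual rows. The following is a minimal sketch, not part of the dataset card: the `EXPECTED_COLUMNS` mapping and the `schema_problems` helper are hypothetical names, the sample row is invented for illustration, and treating `None` as acceptable is an assumption about how missing values might appear.

```python
# Column names and Python types corresponding to the dtypes in the feature list.
EXPECTED_COLUMNS = {
    "Position": str,
    "Long Description": str,
    "Company Name": str,
    "Exp Years": str,
    "Primary Keyword": str,
    "English Level": str,
    "Published": str,
    "Long Description_lang": str,
    "id": str,
    "__index_level_0__": int,
}


def schema_problems(row):
    """Return a list of problems found in one row dict (empty if none)."""
    problems = []
    for name, expected_type in EXPECTED_COLUMNS.items():
        if name not in row:
            problems.append(f"missing column: {name}")
        elif row[name] is not None and not isinstance(row[name], expected_type):
            # Assumption: None is tolerated as a missing value.
            problems.append(
                f"{name}: expected {expected_type.__name__}, "
                f"got {type(row[name]).__name__}"
            )
    return problems


# Invented example row, for illustration only (not taken from the dataset).
sample_row = {name: "example" for name in EXPECTED_COLUMNS}
sample_row["__index_level_0__"] = 0

print(schema_problems(sample_row))
```

Rows obtained by iterating over the loaded split are plain dictionaries, so the same helper can be applied to them directly.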

Dataset size

  • Train split size:
    • Bytes: 83166918
    • Examples: 27461
  • Download size: 40645342
  • Total dataset size: 83166918
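
To put the size figures above in perspective, a small calculation helps. This is only a sketch using the numbers listed; it assumes that the byte count is the size of the prepared dataset and that the download size is the size of the files fetched from the hub, and the variable names are illustrative.

```python
# Figures copied from the size listing above.
NUM_BYTES = 83_166_918       # size of the train split in bytes
NUM_EXAMPLES = 27_461        # number of rows in the train split
DOWNLOAD_SIZE = 40_645_342   # size of the files to download, in bytes

# Average size of one job description record.
avg_bytes_per_example = NUM_BYTES / NUM_EXAMPLES

# Fraction of the prepared size that actually has to be downloaded.
download_ratio = DOWNLOAD_SIZE / NUM_BYTES

print(f"average size per example: {avg_bytes_per_example:.0f} bytes")
print(f"download size / dataset size: {download_ratio:.1%}")
```

In other words, each posting averages roughly 3 KB, and the download is a little under half the prepared size.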

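Dataset uses

  • Recommender systems and semantic search
  • Advancement of large language models (LLMs)
  • Fairness in AI-assisted hiring
  • Recruitment automation
  • Market analysis
  • Trend analysis and topic discovery
  • Strategic planning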

Loading the dataset

```python
from datasets import load_dataset

data = load_dataset("lang-uk/recruitment-dataset-job-descriptions-ukrainian")["train"]
```
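
Once the split is loaded, a first exploratory step for the market and trend analysis uses listed earlier might be counting postings per `Primary Keyword`. The sketch below is an assumption about how one could do this, not something the dataset card prescribes: `top_keywords` and `load_train_rows` are hypothetical helper names, and `sample_rows` is invented data so the counting logic can be tried without downloading anything.

```python
from collections import Counter


def top_keywords(rows, n=5):
    """Return the n most frequent 'Primary Keyword' values among row dicts."""
    counts = Counter(
        row["Primary Keyword"] for row in rows if row.get("Primary Keyword")
    )
    return counts.most_common(n)


def load_train_rows():
    """Load the train split; needs the `datasets` package and network access."""
    from datasets import load_dataset  # imported lazily so the rest runs offline

    return load_dataset(
        "lang-uk/recruitment-dataset-job-descriptions-ukrainian"
    )["train"]


# Invented rows for illustration only (not taken from the dataset).
sample_rows = [
    {"Primary Keyword": "Python"},
    {"Primary Keyword": "QA"},
    {"Primary Keyword": "Python"},
    {"Primary Keyword": None},
]

print(top_keywords(sample_rows, n=2))
```

On the real data, `top_keywords(load_train_rows())` would iterate over every row, which is acceptable for a split of this size.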

Citation information

```bibtex
@inproceedings{drushchak-romanyshyn-2024-introducing,
    title = "Introducing the Djinni Recruitment Dataset: A Corpus of Anonymized {CV}s and Job Postings",
    author = "Drushchak, Nazarii and Romanyshyn, Mariana",
    editor = "Romanyshyn, Mariana and Romanyshyn, Nataliia and Hlybovets, Andrii and Ignatenko, Oleksii",
    booktitle = "Proceedings of the Third Ukrainian Natural Language Processing Workshop (UNLP) @ LREC-COLING 2024",
    month = may,
    year = "2024",
    address = "Torino, Italia",
    publisher = "ELRA and ICCL",
    url = "https://aclanthology.org/2024.unlp-1.2",
    pages = "8--13",
}
```
