
Lead Scoring Dataset

www.kaggle.com | Updated 2020-08-17 | Indexed 2025-03-24
Download link:
https://www.kaggle.com/amritachatterjee09/lead-scoring-dataset
Description:
### Context

An education company named X Education sells online courses to industry professionals. On any given day, many professionals who are interested in the courses land on its website and browse for courses. The company markets its courses on several websites and search engines like Google. Once these people land on the website, they might browse the courses, fill out a form for a course, or watch some videos. When they fill out a form providing their email address or phone number, they are classified as a lead. The company also gets leads through past referrals. Once these leads are acquired, employees from the sales team start making calls, writing emails, etc. Through this process, some of the leads get converted while most do not. The typical lead conversion rate at X Education is around 30%.

Although X Education gets a lot of leads, its lead conversion rate is very poor. For example, if it acquires 100 leads in a day, only about 30 of them are converted. To make this process more efficient, the company wishes to identify the leads with the highest potential, also known as ‘Hot Leads’. If it successfully identifies this set of leads, the lead conversion rate should go up, because the sales team will focus on communicating with the promising leads rather than making calls to everyone.

A lot of leads are generated in the initial stage (top), but only a few of them come out as paying customers at the bottom. In the middle stage, the potential leads need to be nurtured well (i.e. educating the leads about the product, constantly communicating, etc.) in order to get a higher lead conversion.

X Education wants to select the most promising leads, i.e. the leads that are most likely to convert into paying customers. The company requires you to build a model that assigns a lead score to each lead, such that customers with a higher lead score have a higher conversion chance and customers with a lower lead score have a lower conversion chance. The CEO, in particular, has given a ballpark target lead conversion rate of around 80%.

### Content

Variables Description

* Prospect ID - A unique ID with which the customer is identified.
* Lead Number - A lead number assigned to each lead procured.
* Lead Origin - The origin identifier with which the customer was identified to be a lead. Includes API, Landing Page Submission, etc.
* Lead Source - The source of the lead. Includes Google, Organic Search, Olark Chat, etc.
* Do Not Email - An indicator variable selected by the customer, indicating whether or not they want to be emailed about the course.
* Do Not Call - An indicator variable selected by the customer, indicating whether or not they want to be called about the course.
* Converted - The target variable. Indicates whether a lead has been successfully converted or not.
* TotalVisits - The total number of visits made by the customer on the website.
* Total Time Spent on Website - The total time spent by the customer on the website.
* Page Views Per Visit - Average number of pages on the website viewed during the visits.
* Last Activity - Last activity performed by the customer. Includes Email Opened, Olark Chat Conversation, etc.
* Country - The country of the customer.
* Specialization - The industry domain in which the customer worked before. Includes the level 'Select Specialization', which means the customer had not selected this option while filling out the form.
* How did you hear about X Education - The source from which the customer heard about X Education.
* What is your current occupation - Indicates whether the customer is a student, unemployed or employed.
* What matters most to you in choosing this course - An option selected by the customer indicating their main motive for taking this course.
* Search - Indicates whether the customer had seen the ad in any of the listed items.
* Magazine
* Newspaper Article
* X Education Forums
* Newspaper
* Digital Advertisement
* Through Recommendations - Indicates whether the customer came in through recommendations.
* Receive More Updates About Our Courses - Indicates whether the customer chose to receive more updates about the courses.
* Tags - Tags assigned to customers indicating the current status of the lead.
* Lead Quality - Indicates the quality of the lead based on the data and the intuition of the employee who has been assigned to the lead.
* Update me on Supply Chain Content - Indicates whether the customer wants updates on the Supply Chain Content.
* Get updates on DM Content - Indicates whether the customer wants updates on the DM Content.
* Lead Profile - A lead level assigned to each customer based on their profile.
* City - The city of the customer.
* Asymmetric Activity Index - An index and score assigned to each customer based on their activity and their profile.
* Asymmetric Profile Index
* Asymmetric Activity Score
* Asymmetric Profile Score
* I agree to pay the amount through cheque - Indicates whether the customer has agreed to pay the amount through cheque or not.
* A free copy of Mastering The Interview - Indicates whether the customer wants a free copy of 'Mastering the Interview' or not.
* Last Notable Activity - The last notable activity performed by the student.

### Acknowledgements

UpGrad Case Study

### Inspiration

Your data will be in front of the world's largest data science community. What questions do you want to see answered?
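
As a rough illustration of the lead-score idea in the Context section, the sketch below shows how a score cutoff relates to the conversion rate of the leads the sales team would contact. Everything in it is an assumption made for illustration: the `(score, converted)` pairs are invented rather than taken from the dataset, the 0 to 100 score scale is hypothetical, and the helper names `conversion_rate` and `lowest_cutoff_meeting` are not part of any library. It does not show how to fit the scoring model itself.

```python
# Hypothetical (score, converted) pairs; NOT taken from the dataset.
# score: a lead score on an assumed 0-100 scale produced by some model.
# converted: 1 if the lead converted, 0 otherwise (mirrors the Converted column).
leads = [
    (95, 1), (90, 1), (85, 0), (80, 1), (70, 0),
    (60, 0), (50, 0), (40, 0), (30, 0), (20, 0),
]


def conversion_rate(leads, min_score=0):
    """Share of converted leads among those with score >= min_score.

    Returns None when no lead meets the cutoff.
    """
    selected = [converted for score, converted in leads if score >= min_score]
    if not selected:
        return None
    return sum(selected) / len(selected)


def lowest_cutoff_meeting(leads, target):
    """Smallest score cutoff whose selected leads convert at >= target rate.

    Returns None when no cutoff reaches the target.
    """
    for cutoff in sorted({score for score, _ in leads}):
        rate = conversion_rate(leads, cutoff)
        if rate is not None and rate >= target:
            return cutoff
    return None


baseline = conversion_rate(leads)          # contact everyone
cutoff = lowest_cutoff_meeting(leads, 0.8)  # contact only high-scoring leads
print(f"baseline rate: {baseline:.0%}, cutoff for 80% target: {cutoff}")
```

With this toy data, contacting every lead gives the 30% baseline, while the 80% target is only reached by restricting calls to the highest-scoring leads. A higher cutoff raises the conversion rate among contacted leads but also means fewer leads are contacted, so the cutoff is a business trade-off rather than a purely statistical choice.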

Provider:
www.kaggle.com