Data Science Jobs SQL Case Study
收藏Data Science Jobs SQL Case Study
数据集描述
该数据集包含跨国公司薪酬数据,涉及多个国家的薪资、工作条件和公司特征。关键属性如下:
-
work_year: 薪资支付的年份,包括:
- 2020: 过去年份的确定金额。
- 2024: 当前年份的估计金额。
-
experience_level: 工作年份中的经验水平,包括:
- EN: 初级/初级。
- MI: 中级/中级。
- SE: 高级/专家。
- EX: 执行级/总监。
-
employment_type: 雇佣类型,包括:
- PT: 兼职。
- FT: 全职。
- CT: 合同。
- FL: 自由职业。
-
job_title: 工作年份中的角色(例如,数据科学家、机器学习工程师)。
-
Salary: 支付的总薪资金额。
-
salary_currency: 薪资支付的货币,遵循ISO 4217货币代码标准。
-
salary_in_usd: 基于当年平均汇率转换为美元的薪资。
-
employee_residence: 员工工作年份中的主要居住国家,表示为ISO 3166国家代码。
-
remote_ratio: 远程工作比例,包括:
- 0: 无远程工作(少于20%)。
- 50: 部分远程工作。
- 100: 完全远程工作(超过80%)。
-
company_location: 雇主主要办公室或合同分支所在国家,表示为ISO 3166国家代码。
-
company_size: 公司年平均员工人数,包括:
- S: 小型公司(少于50名员工)。
- M: 中型公司(50至250名员工)。
- L: 大型公司(超过250名员工)。
案例研究问题及SQL查询
1. 识别提供全远程管理职位且薪资超过$90,000 USD的国家
SQL SELECT DISTINCT (t.company_location) FROM ( SELECT company_location, job_title, remote_ratio, AVG(salary_in_usd), COUNT(remote_ratio) FROM salaries GROUP BY company_location, job_title, remote_ratio HAVING remote_ratio = 100 AND job_title LIKE %manager% AND AVG(salary_in_usd) > 90000 ) t;
结果:
- US (United States)
- IN (India)
- MX (Mexico)
- AU (Australia)
- FR (France)
2. 识别拥有最多大型公司(公司规模L)的前5个国家
SQL select company_location,count(company_size) from salaries where company_size = L and experience_level = EN group by company_location order by count(company_size) desc limit 5;
结果:
- United States (US) – 53家大型公司
- Germany (DE) – 10家大型公司
- Canada (CA) – 10家大型公司
- United Kingdom (GB) – 8家大型公司
- India (IN) – 6家大型公司
3. 计算享受全远程工作且薪资超过$100,000 USD的员工比例
SQL select ((select count() from salaries where salary > 100000 and remote_ratio =100) /(select count() from salaries where salary > 100000) * 100) as enjoying_remote_position_with_100k_salary;
结果:
- 32.35%的员工享受全远程工作且薪资超过$100,000 USD。
4. 识别初级职位平均薪资超过市场平均薪资的国家
SQL select t1.job_title,t2.company_location,t1.average,t2.average_per_country from ( select job_title,avg(salary_in_usd) as average from salaries where experience_level = EN GROUP BY job_title ) t1 join (select company_location,job_title,avg(salary_in_usd) as average_per_country from salaries where experience_level = EN group by company_location,job_title) t2 on t1.job_title = t2.job_title and t2.average_per_country > t1.average;
5. 识别每个职位在不同国家支付的最高平均薪资
SQL select t2.company_location,t2.job_title,t1.max_avg from (select job_title,max(avg_sal) as max_avg from (select company_location,job_title,avg(salary_in_usd) as avg_sal from salaries GROUP BY company_location,job_title order by job_title)t group by job_title order by job_title) t1 join (select company_location,job_title,avg(salary_in_usd) as avg_sal from salaries GROUP BY company_location,job_title order by job_title) t2 on t1.job_title = t2.job_title where t1.max_avg = avg_sal order by job_title;
6. 识别过去三年(2022-2024)平均薪资持续增长的国家
SQL with temp as ( select * from salaries where company_location in ( select company_location from ( select company_location,avg(salary),count(distinct work_year) from salaries where work_year >= (year(current_date()))-2 GROUP BY company_location having count(DISTINCT work_year) =3 order by company_location ) t )) select company_location, max(case when work_year = 2022 then avg_salary end) as Average_2022, max(case when work_year = 2023 then avg_salary end) as Average_2023, max(case when work_year = 2024 then avg_salary end) as Average_2024 from ( select company_location,work_year,avg(salary_in_usd) as avg_salary from temp group by company_location,work_year order by company_location) t group by company_location having Average_2024 > Average_2023 and Average_2023 > Average_2022;
7. 2021与2024年各经验水平的全远程工作比例
sql SELECT experience_level, SUM(CASE WHEN work_year = 2021 THEN remote_ratio ELSE 0 END) / COUNT() AS remote_2021, SUM(CASE WHEN work_year = 2024 THEN remote_ratio ELSE 0 END) / COUNT() AS remote_2024 FROM salaries GROUP BY experience_level;
8. 2023-2024年各经验水平和职位的平均薪资增长百分比
sql WITH salary_2023 AS ( SELECT job_title, experience_level, AVG(salary_in_usd) AS avg_salary_2023 FROM salaries WHERE work_year = 2023 GROUP BY job_title, experience_level ), salary_2024 AS ( SELECT job_title, experience_level, AVG(salary_in_usd) AS avg_salary_2024 FROM salaries WHERE work_year = 2024 GROUP BY job_title, experience_level ) SELECT s2023.job_title, s2023.experience_level, ((s2024.avg_salary_2024 - s2023.avg_salary_2023) / s2023.avg_salary_2023) * 100 AS salary_increase_percentage FROM salary_2023 s2023 JOIN salary_2024 s2024 ON s2023.job_title = s2024.job_title AND s2023.experience_level = s2024.experience_level;
9. 为经验水平实施基于角色的访问控制(RBAC)
sql -- 伪代码 CREATE ROLE entry_level_access; CREATE ROLE mid_level_access; CREATE ROLE senior_level_access; GRANT SELECT ON salaries TO entry_level_access WHERE experience_level = EN; GRANT SELECT ON salaries TO mid_level_access WHERE experience_level = MI; GRANT SELECT ON salaries TO senior_level_access WHERE experience_level = SE;




