Multi-language software development: issues, challenges, and solutions

Name: Multi-language software development: issues, challenges, and solutions
Creator: yang, haoran
Published: 2022-09-04 00:00:00
License: No description available

Figshare. Updated 2022-09-04. Indexed 2026-04-08.

Download link:

https://figshare.com/articles/dataset/posts/19343657/6

Description:

Code and Dataset

[Environment preparation]
1. Python version: 3.6 or later.
2. Dependent libraries: nltk, pandas, etc. Use pip install [lib_name] to install the libraries.

[Code]
1. File list
   Step1 Scrapy tool -- Use the Scrapy tool to crawl multi-language data from Stack Overflow.
   Step2 LDA topic model -- Use LDA to generate development-related topics.
   Step3 Random sampling -- Randomly sample 20% of the posts.
   acceptance-categories.py -- Get the relationship between acceptance and categories.
   accepted answer time.py -- Get how long after a post is published it receives an accepted answer.
   evolution of acceptance.py -- Get the evolution of the per-year percentage of posts with accepted answers.
   view counts.py -- Get the data related to the posts' view counts.
   data analysis.py -- Other data analysis.
2. Command line
   … \step1 Scrapy tool>scrapy crawl -o output.csv getQA -- Start crawling data from Stack Overflow.
3. Command parameters
   scrapy bench -- Run quick benchmark test
   scrapy check -- Check spider contracts
   scrapy crawl -- Run a spider
   scrapy edit -- Edit spider
   scrapy fetch -- Fetch a URL using the Scrapy downloader
   scrapy genspider -- Generate new spider using pre-defined templates
   scrapy list -- List available spiders
   scrapy parse -- Parse URL (using its spider) and print the results
   scrapy runspider -- Run a self-contained spider (without creating a project)
   Use "scrapy -h" to see more info about a command.

[Dataset]
1. Data/dataset_1,113posts.csv: filtering via tags and number of votes.
2. Data/dataset_5,565posts.csv.csv: filtering via topic modeling.
3. Data/dataset_10,444posts.csv: random sampling.
4. Data/dataset_586posts.csv: manual analysis.
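The random sampling step (Step3) could be sketched with pandas roughly as follows. This is a minimal illustration, not the authors' script: the function name sample_posts, the column names post_id and has_accepted_answer, the seed value, and the in-memory stand-in data are all assumptions made for the example. Only the 20% fraction comes from the description above.

```python
import pandas as pd


def sample_posts(posts: pd.DataFrame, fraction: float = 0.2, seed: int = 42) -> pd.DataFrame:
    """Return a reproducible random sample of the rows in `posts`.

    `fraction` is the share of rows to keep (0.2 = 20%).
    `seed` fixes the random state so the same rows are drawn on every run.
    """
    return posts.sample(frac=fraction, random_state=seed)


# Hypothetical stand-in for the crawled posts; the real CSV columns may differ.
posts = pd.DataFrame({
    "post_id": range(1, 101),
    "has_accepted_answer": [i % 2 == 0 for i in range(1, 101)],
})

sampled = sample_posts(posts)
print(len(sampled), "of", len(posts), "posts sampled")
```

To run the same sketch on one of the CSV files, the stand-in DataFrame could be replaced with the result of pd.read_csv on that file. Fixing the seed is a design choice that keeps the sample reproducible across runs.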

Provided by:

yang, haoran

Created:

2022-03-19
