Multi-language software development: issues, challenges, and solutions
Source: Figshare · Updated: 2022-09-04 · Indexed: 2026-04-08
Download link:
https://figshare.com/articles/dataset/posts/19343657/6
Description:
Code and Dataset

[Environment preparation]
1. Python version: 3.6 or later.
2. Dependent libraries: nltk, pandas, etc. Use pip install [lib_name] to install the libraries.

[Code]
1. File list
Step1 Scrapy tool -- Use the Scrapy tool to crawl multi-language data from Stack Overflow.
Step2 LDA topic model -- Use LDA to generate development-related topics.
Step3 Random sampling -- Randomly sample 20% of the posts.
acceptance-categories.py -- Get the relationship between acceptance and categories.
accepted answer time.py -- Get how long after a post is published it receives an accepted answer.
evolution of acceptance.py -- Get the evolution of the per-year percentage of posts with accepted answers.
view counts.py -- Get the data related to the posts' view counts.
data analysis.py -- Other data analysis.

2. Command line
… \step1 Scrapy tool>scrapy crawl -o output.csv getQA -- start crawling data from Stack Overflow

3. Command parameters
scrapy bench -- Run quick benchmark test
scrapy check -- Check spider contracts
scrapy crawl -- Run a spider
scrapy edit -- Edit spider
scrapy fetch -- Fetch a URL using the Scrapy downloader
scrapy genspider -- Generate new spider using pre-defined templates
scrapy list -- List available spiders
scrapy parse -- Parse URL (using its spider) and print the results
scrapy runspider -- Run a self-contained spider (without creating a project)
Use "scrapy -h" to see more info about a command

[Dataset]
1. Data/dataset_1,113posts.csv: filtering via tags and number of votes.
2. Data/dataset_5,565posts.csv.csv: filtering via topic modeling.
3. Data/dataset_10,444posts.csv: random sampling.
4. Data/dataset_586posts.csv: manual analysis.
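The random sampling step and the per-year acceptance analysis listed above can be illustrated with a minimal sketch. It is not the authors' code. The field names year and has_accepted_answer, the helper names sample_posts and accepted_share_by_year, and the toy records are all assumptions made for illustration; the actual CSV columns are not documented here. The sketch uses only the Python standard library so it runs without the dataset or its dependencies.

```python
import random
from collections import defaultdict


def sample_posts(posts, fraction=0.2, seed=0):
    """Return a random sample holding `fraction` of the posts (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    k = round(len(posts) * fraction)
    return rng.sample(posts, k)  # sampling without replacement


def accepted_share_by_year(posts):
    """Map each year to the percentage of its posts that have an accepted answer."""
    totals = defaultdict(int)
    accepted = defaultdict(int)
    for post in posts:
        totals[post["year"]] += 1
        if post["has_accepted_answer"]:
            accepted[post["year"]] += 1
    return {year: 100.0 * accepted[year] / totals[year] for year in sorted(totals)}


# Toy records standing in for rows of a posts CSV (field names are assumed).
posts = [
    {"id": i, "year": 2019 + i % 2, "has_accepted_answer": i % 4 == 0}
    for i in range(20)
]

sample = sample_posts(posts)            # 20% of 20 posts
shares = accepted_share_by_year(posts)  # percentage per year

print(len(sample))
print(shares)
```

With real data, the toy list would be replaced by rows read from one of the CSV files under Data/, after mapping the file's actual column names onto the assumed fields.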
Provided by:
yang, haoran
Created:
2022-03-19



