Demystifying Issues, Challenges, and Solutions for Multilingual Software Development
Source: DataCite Commons. Updated 2024-04-08; indexed 2024-07-29.
Download link:
https://figshare.com/articles/dataset/posts/19343657/9
Resource description:
Code and Dataset

[Environment preparation]
1. Python version: 3.6 or later
2. Required libraries: nltk, pandas, etc.
Install each library with pip install [lib_name].
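Before running the scripts, a short check like the one below can confirm the interpreter version and that the listed libraries are importable. This is a sketch, not part of the published code: REQUIRED contains only the two libraries named above, and the rest of the "etc." has to be filled in from the scripts' own imports.

```python
import importlib.util
import sys

# Libraries named in the environment notes. "etc." means this list is
# incomplete: add the other packages the scripts import (e.g. scrapy for Step1).
REQUIRED = ["nltk", "pandas"]


def missing_modules(names):
    """Return the names in `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_environment(names=REQUIRED):
    """Fail on Python < 3.6 and report any libraries that are not installed."""
    if sys.version_info < (3, 6):
        raise RuntimeError("Python 3.6 or later is required")
    missing = missing_modules(names)
    for name in missing:
        print("missing: {0} (install with: pip install {0})".format(name))
    return missing


if __name__ == "__main__":
    check_environment()
```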

[Code]
1. File list
Step1 Scrapy tool -- Use Scrapy to crawl Stack Overflow data on multi-language development.
Step2 LDA topic model -- Use LDA topic modeling to identify development-related topics.
Step3 Random sampling for Codebook -- Randomly sample 495 posts to build the codebook.
Step4 Random sampling for Coding Process -- Randomly sample 20% of the posts for the coding process.
acceptance-categories.py -- Analyze the relationship between answer acceptance and post categories.
accepted answer time.py -- Measure how long after publication each post receives its accepted answer.
evolution of acceptance.py -- Compute how the per-year percentage of posts with accepted answers evolves.
view counts.py -- Analyze post view counts.
data analysis.py -- Other data analyses.
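Step2 does not say which LDA implementation, number of topics, or hyperparameters were used. To illustrate the technique, the following is a minimal from-scratch sketch of LDA fitted by collapsed Gibbs sampling, in pure Python so it runs without extra dependencies. It is not the authors' script: lda_gibbs, top_words, alpha, beta, the iteration count, and the toy posts are illustrative assumptions, and a real pipeline would more likely use a library implementation such as gensim's LdaModel or scikit-learn's LatentDirichletAllocation.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling.

    docs: list of documents, each a list of tokens.
    Returns (vocab, topic_word), where topic_word[k][w] estimates
    p(word vocab[w] | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    n_dk = [[0] * K for _ in docs]      # tokens in doc d assigned to topic k
    n_kw = [[0] * V for _ in range(K)]  # times word w is assigned to topic k
    n_k = [0] * K                       # total tokens assigned to topic k
    z = []                              # current topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for word in doc:
            k = rng.randrange(K)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][index[word]] += 1
            n_k[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                w, k = index[word], z[d][i]
                # Take the token out of the counts ...
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # ... resample its topic from the full conditional ...
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # ... and put it back under the new topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    topic_word = [
        [(n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in range(V)]
        for k in range(K)
    ]
    return vocab, topic_word


def top_words(vocab, topic_word, n=5):
    """Return the n most probable words of each topic."""
    return [
        [vocab[w] for w in sorted(range(len(vocab)), key=lambda w: -row[w])[:n]]
        for row in topic_word
    ]


if __name__ == "__main__":
    # Toy, pre-tokenized "posts"; real input would be the crawled posts.
    posts = [
        "python call c extension ctypes segfault".split(),
        "java jni native library load error".split(),
        "python ctypes pointer c struct".split(),
        "jni java native method signature".split(),
    ]
    vocab, topic_word = lda_gibbs(posts, num_topics=2, iterations=100)
    for k, words in enumerate(top_words(vocab, topic_word, n=4)):
        print(k, words)
```

Each Gibbs step removes one token from the count tables, resamples its topic in proportion to (n_dk + alpha)(n_kw + beta)/(n_k + V*beta), and adds it back; the topic-word estimates are read off the final counts. With nltk among the listed dependencies, tokenization and stop-word removal would presumably happen before this step; labeling the resulting topics is outside the sketch.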
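Steps 3 and 4 are simple random samples. The scripts' seeds, the rounding of "20%", and whether the two samples may overlap are not stated, so the sketch below assumes sampling without replacement, an optional fixed seed for reproducibility, and rounding the 20% sample size up. sample_posts and sample_percent are hypothetical names.

```python
import random


def sample_posts(posts, size, seed=None):
    """Draw `size` distinct posts uniformly at random (without replacement)."""
    return random.Random(seed).sample(posts, size)


def sample_percent(posts, percent, seed=None):
    """Draw an integer `percent` of the posts, rounding the sample size up."""
    # Ceiling of len(posts) * percent / 100 using integer arithmetic only.
    size = (len(posts) * percent + 99) // 100
    return sample_posts(posts, size, seed)


if __name__ == "__main__":
    post_ids = list(range(1000))  # hypothetical post IDs
    codebook_sample = sample_posts(post_ids, 495, seed=2022)  # Step3
    coding_sample = sample_percent(post_ids, 20, seed=2022)   # Step4
    print(len(codebook_sample), len(coding_sample))  # 495 200
```

Fixing the seed makes the drawn samples reproducible, so other coders can work on exactly the same posts.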

2. Command line
… \step1 Scrapy tool>scrapy crawl -o output.csv getQA -- run the getQA spider to crawl Stack Overflow and save the results to output.csv

3. Scrapy commands
scrapy bench -- Run quick benchmark test
scrapy check -- Check spider contracts
scrapy crawl -- Run a spider
scrapy edit -- Edit spider
scrapy fetch -- Fetch a URL using the Scrapy downloader
scrapy genspider -- Generate new spider using pre-defined templates
scrapy list -- List available spiders
scrapy parse -- Parse URL (using its spider) and print the results
scrapy runspider -- Run a self-contained spider (without creating a project)
Use "scrapy <command> -h" to see more info about a command

[Dataset]
1. Data/dataset_1,113posts.csv: posts filtered by tags and number of votes.
2. Data/dataset_5,565posts.csv.csv: posts filtered via topic modeling.
3. Data/dataset_10,444posts.csv: posts obtained by random sampling.
4. Data/dataset_586posts.csv: posts used for manual analysis.
5. Data/CodeBook.xlsx: key codes used to categorize Stack Overflow (SO) posts on multilingual development issues.
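The analysis scripts accepted answer time.py and evolution of acceptance.py are not reproduced here. As a hedged illustration of the two metrics they describe, the sketch below computes, from post records, the per-year percentage of posts with an accepted answer and the hours from posting to the accepted answer. The column names CreationDate and AcceptedAnswerDate, the timestamp format, and whether "time to accepted answer" is measured to the answer's posting time or to the moment of acceptance are all assumptions; check them against the actual CSV headers before use.

```python
import csv
from collections import Counter
from datetime import datetime

# Hypothetical column names and timestamp format; verify against the CSV header.
CREATED_COL = "CreationDate"
ACCEPTED_COL = "AcceptedAnswerDate"
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def load_posts(lines):
    """Parse CSV rows (an open file or any iterable of lines) into post dicts."""
    posts = []
    for row in csv.DictReader(lines):
        accepted = row.get(ACCEPTED_COL) or None  # empty cell -> no accepted answer
        posts.append({
            "created": datetime.strptime(row[CREATED_COL], TIME_FORMAT),
            "accepted": datetime.strptime(accepted, TIME_FORMAT) if accepted else None,
        })
    return posts


def acceptance_by_year(posts):
    """Percentage of posts created in each year that have an accepted answer."""
    total, accepted = Counter(), Counter()
    for post in posts:
        year = post["created"].year
        total[year] += 1
        if post["accepted"] is not None:
            accepted[year] += 1
    return {year: 100.0 * accepted[year] / total[year] for year in sorted(total)}


def hours_to_acceptance(posts):
    """Hours from posting to the accepted answer, for posts that have one."""
    return [
        (post["accepted"] - post["created"]).total_seconds() / 3600
        for post in posts
        if post["accepted"] is not None
    ]
```

To apply it to one of the files above, open the file with open(path, newline="", encoding="utf-8") and pass the file object to load_posts; statistics.median(hours_to_acceptance(posts)) then gives a summary that is less sensitive to long delays than the mean.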

Provider: figshare
Created: 2022-09-09



