Demystifying Issues, Challenges, and Solutions for Multilingual Software Development

DataCite Commons; updated 2024-04-08; indexed 2024-07-29
Download link:
https://figshare.com/articles/dataset/posts/19343657/9
Resource description:
Code and Dataset

[Environment preparation]
1. Python version: 3.6 or later.
2. Dependent libraries: nltk, pandas, etc. Use pip install [lib_name] to install the libraries.

[Code]
1. File list
Step1 Scrapy tool -- Use the Scrapy tool to crawl multi-language data from Stack Overflow.
Step2 LDA topic model -- Use LDA to generate development-related topics.
Step3 Random sampling for Codebook -- Randomly sample 495 posts.
Step4 Random sampling for Coding Process -- Randomly sample 20% of the posts.
acceptance-categories.py -- Get the relationship between acceptance and categories.
accepted answer time.py -- Get the time between a post's publication and its accepted answer.
evolution of acceptance.py -- Get the per-year percentage of posts with accepted answers and how it evolves.
view counts.py -- Get the data related to the posts' view counts.
data analysis.py -- Other data analysis.

2. Command line
… \step1 Scrapy tool>scrapy crawl -o output.csv getQA -- start crawling data from Stack Overflow

3. Command parameters
scrapy bench -- Run quick benchmark test
scrapy check -- Check spider contracts
scrapy crawl -- Run a spider
scrapy edit -- Edit spider
scrapy fetch -- Fetch a URL using the Scrapy downloader
scrapy genspider -- Generate new spider using pre-defined templates
scrapy list -- List available spiders
scrapy parse -- Parse URL (using its spider) and print the results
scrapy runspider -- Run a self-contained spider (without creating a project)
Use "scrapy -h" to see more info about a command

[Dataset]
1. Data/dataset_1,113posts.csv: filtering via tags and number of votes.
2. Data/dataset_5,565posts.csv.csv: filtering via topic modeling.
3. Data/dataset_10,444posts.csv: random sampling.
4. Data/dataset_586posts.csv: manual analysis.
5. Data/CodeBook.xlsx: key codes used to categorize SO posts on multilingual development issues.
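
The 20% random sampling step and the per-year acceptance percentage described in the file list can be illustrated with a minimal sketch. This is not the authors' code: it uses only the Python standard library rather than pandas, runs on toy data, and the field names `year` and `accepted_answer_id` as well as the helper names are hypothetical, since the actual CSV column names are not given here.

```python
import random
from collections import defaultdict


def sample_fraction(posts, fraction, seed=0):
    """Return a reproducible random sample holding `fraction` of `posts`."""
    rng = random.Random(seed)  # fixed seed so the sample can be regenerated
    k = round(len(posts) * fraction)
    return rng.sample(posts, k)


def acceptance_rate_by_year(posts):
    """Map each year to the percentage of its posts with an accepted answer."""
    totals = defaultdict(int)
    accepted = defaultdict(int)
    for post in posts:
        totals[post["year"]] += 1
        if post["accepted_answer_id"] is not None:
            accepted[post["year"]] += 1
    return {year: 100.0 * accepted[year] / totals[year] for year in sorted(totals)}


# Toy stand-in for the crawled posts; field names are assumptions.
posts = [
    {"id": i, "year": 2015 + i % 5, "accepted_answer_id": i if i % 2 == 0 else None}
    for i in range(100)
]

subset = sample_fraction(posts, 0.20)
rates = acceptance_rate_by_year(posts)
print(len(subset))
print(rates)
```

Fixing the seed is a design choice that makes the sample repeatable, which matters when the sampled posts feed a manual coding process.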
Provider:
figshare
Created:
2022-09-09