SCodeSearcher
Source: DataCite Commons. Updated 2025-11-28. Indexed 2026-04-25.
Download link:
https://figshare.com/articles/dataset/SCodeSearcher/25359841/3
Resource description:
## File path configuration

Before you start training the model, make sure that all file paths are set to the correct locations in your local environment. This includes the training data, the directory where the model is saved, and any associated configuration files.

- **Training data paths**: Check that the training data paths point to the correct location.
- **Model save path**: The `checkpoint` directory is used to save model weights during training. Make sure that this path points to the local directory where you want to save the model weights.

**Important note**: Before running the `run.sh` script, open the script and any related Python files, then check and update the path settings.

## Soft contrastive learning

To run soft contrastive learning, navigate to the project directory and run the following commands:

```sh
cd $Project_Path
bash run.sh
```

## Parameter setting description

- To adjust the weight range of positive samples, modify the softmax operation for `ai` on line 158 of `utils.py`.
- To adjust the weight range of negative samples, adjust `bi` on line 163 of `utils.py`.

## Code Search

The dataset file contains the code retrieval datasets and the code classification datasets. The command below trains the code search model on the CSN-Python data:

```sh
python run.py \
    --output_dir=./python \
    --config_name=/graphcodebert-base \
    --model_name_or_path=/graphcodebert-base \
    --tokenizer_name=/graphcodebert-base \
    --lang=python \
    --do_train \
    --train_data_file=/dataset/CSN-Python/train.jsonl \
    --eval_data_file=/dataset/CSN-Python/test.jsonl \
    --test_data_file=/dataset/CSN-Python/test.jsonl \
    --codebase_file=/dataset/CSN-Python/codebase.jsonl \
    --num_train_epochs 20 \
    --code_length 318 \
    --data_flow_length 64 \
    --nl_length 256 \
    --train_batch_size 32 \
    --eval_batch_size 64 \
    --learning_rate 2e-5 \
    --seed 42
```
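
The path checks described under "File path configuration" can be done by hand, but a small pre-flight script may catch a wrong path before a long training run starts. The following is a minimal sketch and not part of the SCodeSearcher release: the helper names `find_missing_paths` and `report` and the `EXAMPLE_PATHS` mapping are hypothetical, and the path values are simply copied from the `run.py` command, so they should be replaced with your local paths.

```python
import os

# Hypothetical mapping for illustration; the values are copied from the
# run.py command and should be replaced with your local paths.
EXAMPLE_PATHS = {
    "train_data_file": "/dataset/CSN-Python/train.jsonl",
    "eval_data_file": "/dataset/CSN-Python/test.jsonl",
    "test_data_file": "/dataset/CSN-Python/test.jsonl",
    "codebase_file": "/dataset/CSN-Python/codebase.jsonl",
    "model_name_or_path": "/graphcodebert-base",
}


def find_missing_paths(paths):
    """Return the names of all entries whose path does not exist on disk."""
    return [name for name, path in paths.items() if not os.path.exists(path)]


def report(paths):
    """Print every missing entry and return True only if all paths exist."""
    missing = find_missing_paths(paths)
    for name in missing:
        print(f"missing: {name} -> {paths[name]}")
    return not missing
```

Calling `report(EXAMPLE_PATHS)` before `bash run.sh` or `python run.py` prints one line per path that cannot be found. The check covers input paths only. An output location such as the `checkpoint` directory may legitimately not exist yet, so it is left out of the mapping here.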
Provider:
figshare
Created:
2025-11-28



