LiteCoder/LiteCoder_SourceCode
Source: Hugging Face. Updated 2023-08-02; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/LiteCoder/LiteCoder_SourceCode
Resource description:
# LiteCoder Experiment Reproduction Package
- To run the pre-training objectives, use the following scripts:
  - To reproduce LiteCoder with all objectives:
    - Navigate to the `Pre-training` folder, which contains the `LiteCoder.py` file.
    - Then run `python LiteCoder.py --train-tt --train-cs --train-pd`.
    - The pretrained model is released on [Hugging Face](https://huggingface.co/LiteCoder/LiteCoder_pretrained) and is loaded automatically.
  - To run the ablation studies:
    - Ablation 1: `python LiteCoder.py --train-tt`
    - Ablation 2: `python LiteCoder.py --train-tt --train-cs`
    - Ablation 3: `python LiteCoder.py --train-tt --train-cs --train-pd`
- To fine-tune LiteCoder on downstream tasks:
  - Navigate to the `Fine-tuning` folder and then to the `Downstream task` folder:
    - Code Clone Detection:
      - Follow the instructions in the `readme.md` file.
    - Code Translation:
      - Run the `setup.sh` file.
      - Navigate to `scripts/finetune` and run the `translate.sh` file.
- To extract the programming language features (i.e., `token type`, `code sememe`, and `code dependencies`):
  - We used open-source datasets to extract the language features, and we released the extracted datasets on Hugging Face:
    - `LT_Java`: [LiteCoder/LT_Java](https://huggingface.co/datasets/LiteCoder/LT_Java)
    - `LT_Python`: [LiteCoder/LT_Python](https://huggingface.co/datasets/LiteCoder/LT_Python)
    - `LT_Java_Dependency`: [LiteCoder/LT_Java_Dependency](https://huggingface.co/datasets/LiteCoder/LT_Java_Dependency)
  - Navigate to the `utils` directory:
    - Use either the `Java` or the `Python` notebook file to process your dataset.
    - Run the cells for the features you want to extract.
- Dependencies:
  - Feature extraction dependencies:
    ```bash
    pip install ast-comments
    # `ast` is part of the Python standard library, so this install may be unnecessary
    pip install ast
    pip install javalang
    pip install tree-sitter
    ```
  - Model training dependencies:
    ```bash
    pip install transformers
    pip install datasets
    pip install pytorch_lightning
    pip install torch
    ```
  - Or `pip install -r requirements.txt`
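
Before launching feature extraction or training, it can help to confirm that the packages listed above are importable. The following is a minimal sketch, not part of the LiteCoder package. The import names in `REQUIRED_MODULES` are assumptions derived from the pip package names (for example, `ast-comments` is assumed to import as `ast_comments`), and `missing_packages` is a hypothetical helper.

```python
from importlib.util import find_spec

# Assumed mapping from pip package name to top-level import name.
# Only the pip names come from the README; the import names are assumptions.
REQUIRED_MODULES = {
    "ast-comments": "ast_comments",
    "javalang": "javalang",
    "tree-sitter": "tree_sitter",
    "transformers": "transformers",
    "datasets": "datasets",
    "pytorch_lightning": "pytorch_lightning",
    "torch": "torch",
}


def missing_packages(modules):
    """Return the pip names whose import name cannot be found."""
    # find_spec returns None for a top-level module that is not installed.
    return [pkg for pkg, mod in modules.items() if find_spec(mod) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED_MODULES)
    print("missing:", ", ".join(missing) if missing else "none")
```

Any name reported as missing can then be installed with the corresponding `pip install` command.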
Provider:
LiteCoder
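Summary of original information
Dataset overview
Dataset contents
- Pretrained model: released on Hugging Face and loaded automatically at run time.
- Programming language feature extraction datasets:
  - LT_Java: LiteCoder/LT_Java
  - LT_Python: LiteCoder/LT_Python
  - LT_Java_Dependency: LiteCoder/LT_Java_Dependency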
Usage
- Pre-training:
  - Navigate to the `Pre-training` folder and run `python LiteCoder.py --train-tt --train-cs --train-pd`.
- Ablation studies:
  - Ablation 1: `python LiteCoder.py --train-tt`
  - Ablation 2: `python LiteCoder.py --train-tt --train-cs`
  - Ablation 3: `python LiteCoder.py --train-tt --train-cs --train-pd`
- Fine-tuning:
  - Navigate to the `Downstream task` folder inside the `Fine-tuning` folder:
    - Code clone detection: follow the instructions in the `readme.md` file.
    - Code translation:
      - Run the `setup.sh` file.
      - Navigate to `scripts/finetune` and run the `translate.sh` file.
- Feature extraction:
  - Navigate to the `utils` directory:
    - Use the `Java` or `Python` notebook file to process your dataset.
    - Run the cells for the features you want to extract.
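
To give a feel for what a token-type feature looks like, the sketch below labels each token of a Python snippet with its lexical category. It uses the standard-library `tokenize` module as a stand-in and is not the extraction code from the LiteCoder notebooks, whose tooling and label set may differ. The function name `token_types` is hypothetical.

```python
import io
import tokenize

# Structural tokens that carry no source text of interest here.
_SKIPPED = {
    tokenize.NEWLINE,
    tokenize.NL,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENDMARKER,
}


def token_types(source):
    """Return (token text, token type name) pairs for a Python snippet."""
    pairs = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIPPED:
            continue
        # tok_name maps the numeric token type to a readable label.
        pairs.append((tok.string, tokenize.tok_name[tok.type]))
    return pairs


if __name__ == "__main__":
    for text, kind in token_types("x = 1 + y\n"):
        print(f"{text!r}: {kind}")
```

Note that `tokenize` reports keywords such as `def` as `NAME`; a finer-grained labeling would need an extra keyword check.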
Dependencies
- Feature extraction dependencies: `pip install ast-comments`, `pip install ast`, `pip install javalang`, `pip install tree-sitter`
- Model training dependencies: `pip install transformers`, `pip install datasets`, `pip install pytorch_lightning`, `pip install torch`
- Or run: `pip install -r requirements.txt`
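
Once the dependencies are installed, the pre-training run and the three ablations differ only in which of the flags `--train-tt`, `--train-cs`, and `--train-pd` are passed. The sketch below shows one way such boolean flags could be parsed with `argparse`. Only the flag names come from the README; the parser, the `enabled_objectives` helper, and the `ABLATIONS` table are hypothetical, and the actual argument handling in `LiteCoder.py` may differ.

```python
import argparse

OBJECTIVES = ("tt", "cs", "pd")  # suffixes of the flags named in the README


def build_parser():
    # Hypothetical parser: each objective is an opt-in boolean switch.
    parser = argparse.ArgumentParser(prog="LiteCoder.py")
    for name in OBJECTIVES:
        parser.add_argument(
            f"--train-{name}",
            action="store_true",
            help=f"enable the '{name}' pre-training objective",
        )
    return parser


def enabled_objectives(argv):
    """Return the objective suffixes switched on by the given arguments."""
    args = build_parser().parse_args(argv)
    # argparse stores --train-tt as the attribute train_tt, and so on.
    return [name for name in OBJECTIVES if getattr(args, f"train_{name}")]


# Flag sets for the ablations as listed in the README.
ABLATIONS = {
    "Ablation 1": ["--train-tt"],
    "Ablation 2": ["--train-tt", "--train-cs"],
    "Ablation 3": ["--train-tt", "--train-cs", "--train-pd"],
}

if __name__ == "__main__":
    for label, argv in ABLATIONS.items():
        print(label, enabled_objectives(argv))
```

As listed, Ablation 3 passes the same flags as the full reproduction run.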



