
LiteCoder/LiteCoder_SourceCode

Hugging Face · Updated 2023-08-02 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/LiteCoder/LiteCoder_SourceCode
Resource description:

# LiteCoder Experiment Reproducing package

- To run the pre-training objectives, use the following scripts:
  - Reproduce LiteCoder with all objectives:
    - Navigate to the `Pre-training` folder, which contains the `LiteCoder.py` file.
    - Then run `python LiteCoder.py --train-tt --train-cs --train-pd`.
    - The pretrained model is released on [Hugging Face](https://huggingface.co/LiteCoder/LiteCoder_pretrained), so it is loaded automatically.
  - To run the ablation studies:
    - Ablation 1: `python LiteCoder.py --train-tt`
    - Ablation 2: `python LiteCoder.py --train-tt --train-cs`
    - Ablation 3: `python LiteCoder.py --train-tt --train-cs --train-pd`
- To fine-tune LiteCoder on downstream tasks:
  - Navigate to the `Fine-tuning` folder and then to the `Downstream task` folder:
    - Code Clone Detection:
      - Follow the instructions in the `readme.md` file.
    - Code Translation:
      - Run the `setup.sh` file.
      - Navigate to `scripts/finetune` and run the `translate.sh` file.
- To extract the programming language features (i.e., `token type`, `code sememe`, and `code dependencies`):
  - We used open-source datasets to extract the language features. The extracted datasets are released on Hugging Face:
    - `LT_Java` : [LiteCoder/LT_Java](https://huggingface.co/datasets/LiteCoder/LT_Java)
    - `LT_Python` : [LiteCoder/LT_Python](https://huggingface.co/datasets/LiteCoder/LT_Python)
    - `LT_Java_Dependency` : [LiteCoder/LT_Java_Dependency](https://huggingface.co/datasets/LiteCoder/LT_Java_Dependency)
  - Navigate to the `utils` directory:
    - Use either the `Java` or the `Python` notebook file to run over your dataset.
    - Run the cells for the features you want to extract.
- Dependencies:
  - Feature extraction dependencies:
    ```bash
    pip install ast-comments
    pip install ast
    pip install javalang
    pip install tree-sitter
    ```
  - Model training dependencies:
    ```bash
    pip install transformers
    pip install datasets
    pip install pytorch_lightning
    pip install torch
    ```
  - Or `pip install -r requirements.txt`
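
The three ablation commands differ only in which `--train-*` flags are passed. The following is a minimal sketch of how such boolean flags can be combined with `argparse`; it is not the actual `LiteCoder.py` code. Only the flag names come from the README. The function names and the short labels (`tt`, `cs`, `pd`) are hypothetical, and what each flag enables inside the real script is not stated in the source.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: only the three flag names are taken from the README.
    parser = argparse.ArgumentParser(description="Sketch of objective flags")
    parser.add_argument("--train-tt", action="store_true")
    parser.add_argument("--train-cs", action="store_true")
    parser.add_argument("--train-pd", action="store_true")
    return parser


def enabled_objectives(argv):
    """Return the short labels of the objectives switched on by argv."""
    args = build_parser().parse_args(argv)
    labels = []
    if args.train_tt:
        labels.append("tt")
    if args.train_cs:
        labels.append("cs")
    if args.train_pd:
        labels.append("pd")
    return labels


if __name__ == "__main__":
    # Mirrors the flag combination listed as "Ablation 2" in the README.
    print(enabled_objectives(["--train-tt", "--train-cs"]))
```

Using `store_true` flags means an objective is off unless its flag is given, which matches the way the ablation commands add one flag at a time.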
Provided by:
LiteCoder
Summary of original information

Dataset overview

Dataset contents

Usage

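  • Pre-training:
    • Navigate to the Pre-training folder and run python LiteCoder.py --train-tt --train-cs --train-pd
  • Ablation studies:
    • Ablation 1: python LiteCoder.py --train-tt
    • Ablation 2: python LiteCoder.py --train-tt --train-cs
    • Ablation 3: python LiteCoder.py --train-tt --train-cs --train-pd
  • Fine-tuning:
    • Navigate to the Downstream task folder inside the Fine-tuning folder:
      • Code clone detection: follow the instructions in the readme.md file.
      • Code translation:
        • Run the setup.sh file.
        • Navigate to scripts/finetune and run the translate.sh file.
  • Feature extraction:
    • Navigate to the utils directory:
      • Use either the Java or the Python notebook file to run over your dataset.
      • Run the cells needed to extract the features.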
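
To give a rough idea of what a "token type" feature can look like for Python code, here is a small sketch built on the standard-library `tokenize` module. It is an illustration only and is not the procedure used in the LiteCoder notebooks, which are not shown on this page. The function name `python_token_types` is hypothetical.

```python
import io
import tokenize

# Token kinds that only describe layout and carry no code text of interest.
_LAYOUT = {
    tokenize.NEWLINE,
    tokenize.NL,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENDMARKER,
}


def python_token_types(source: str):
    """Return (token_text, token_type_name) pairs for a Python snippet."""
    pairs = []
    reader = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(reader):
        if tok.type in _LAYOUT:
            continue  # skip layout-only tokens
        pairs.append((tok.string, tokenize.tok_name[tok.type]))
    return pairs


if __name__ == "__main__":
    for text, kind in python_token_types("total = price * 2  # doubled\n"):
        print(f"{kind:8} {text}")
```

The standard tokenizer only distinguishes coarse classes such as `NAME`, `OP`, `NUMBER`, and `COMMENT`; finer-grained labels would need a parser such as `ast` or `tree-sitter`, both of which appear in the dependency list.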

Dependencies

  • Feature extraction dependencies:
    • pip install ast-comments
    • pip install ast
    • pip install javalang
    • pip install tree-sitter
  • Model training dependencies:
    • pip install transformers
    • pip install datasets
    • pip install pytorch_lightning
    • pip install torch
  • Or run: pip install -r requirements.txt
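
Before running the scripts, it can help to check which of these packages are already importable. The sketch below uses `importlib.util.find_spec` from the standard library. The mapping from pip names to import names reflects how these packages are usually imported and is an assumption rather than something stated in the README; `ast` is omitted because a module of that name ships with Python itself. The names `REQUIRED` and `missing_packages` are hypothetical.

```python
import importlib.util

# pip package name -> top-level module name (assumed, see lead-in).
REQUIRED = {
    "ast-comments": "ast_comments",
    "javalang": "javalang",
    "tree-sitter": "tree_sitter",
    "transformers": "transformers",
    "datasets": "datasets",
    "pytorch_lightning": "pytorch_lightning",
    "torch": "torch",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose module cannot be found in this environment."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```

`find_spec` only locates a module without importing it, so the check stays fast even for heavy packages such as `torch`.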