ateneoscsl/BUOD_articlescraper
Source: Hugging Face. Updated: 2023-05-01. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/ateneoscsl/BUOD_articlescraper
Description:
---
task_categories:
- summarization
language:
- tl
- en
---
# 📝 BUOD Article Scraper
Authors: [James Esguerra](https://huggingface.co/jamesesguerra), Julia Avila, [Hazielle Bugayong](https://huggingface.co/0xhaz)
- Article scraper for the KAMI-3000 dataset used in the BUOD [distilBART](https://huggingface.co/ateneoscsl/BUOD_distilBART_TM) and [bert2bert](https://huggingface.co/ateneoscsl/BUOD_bert2bert_TM) Transformer models. It was also used for text summarization tasks in the Filipino language.
### Setup
1. Clone the repository.
```sh
# https
git clone https://github.com/avila-bugayong-esguerra/article-scraper.git
# or
# ssh
git clone git@github.com:avila-bugayong-esguerra/article-scraper.git
```
2. Change directory into the project folder. `git clone` names the folder after the repository, so it is `article-scraper` (with a hyphen).
```sh
cd article-scraper
```
3. Create a virtual environment.
```sh
python -m venv venv
```
4. Activate the virtual environment.
```sh
# windows
venv\Scripts\activate
# unix
source venv/bin/activate
```
5. Install the dependencies.
```sh
pip install -r article_scraper/requirements.txt
```
6. Change directory into the Scrapy project.
```sh
cd article_scraper
```
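The six steps above can be sketched as a single script for Unix shells. This is a convenience sketch and not part of the repository: the `run` and `setup_scraper` helpers and the `DRY_RUN` variable are hypothetical names, and the clone directory `article-scraper` is an assumption based on how `git clone` names folders by default. By default the script only prints the commands it would run.

```sh
#!/bin/sh
# Sketch of setup steps 1-6 as one script (Unix shells only).
# By default it only prints the commands; set DRY_RUN=0 to execute them.

REPO_URL="https://github.com/avila-bugayong-esguerra/article-scraper.git"
# Assumption: git names the clone directory after the repository.
REPO_DIR="article-scraper"
# Folder holding requirements.txt and the Scrapy project (from steps 5 and 6).
SCRAPY_DIR="article_scraper"

# Print the command in dry-run mode; otherwise run it in the current shell
# so that `cd` and `.` (source) still affect the steps that follow.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Chain the steps with && so the sequence stops at the first failure.
setup_scraper() {
    run git clone "$REPO_URL" &&                          # step 1
    run cd "$REPO_DIR" &&                                 # step 2
    run python -m venv venv &&                            # step 3
    run . venv/bin/activate &&                            # step 4
    run pip install -r "$SCRAPY_DIR/requirements.txt" &&  # step 5
    run cd "$SCRAPY_DIR"                                  # step 6
}

setup_scraper
```

Saved under a hypothetical name such as `setup.sh`, `sh setup.sh` prints the plan and `DRY_RUN=0 sh setup.sh` executes it. When run this way, the activated environment and the directory change last only for the script's own process, so steps 4 and 6 still need to be repeated in your interactive shell afterwards.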
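Provider:
ateneoscsl
Summary of original information
Dataset overview
Dataset name
- BUOD Article Scraper
Authors
- James Esguerra
- Julia Avila
- Hazielle Bugayong
Dataset use
- Used in the BUOD distilBART and bert2bert Transformer models.
- Used for text summarization tasks in Filipino.
Languages
- Filipino (tl)
- English (en)
Task category
- Text summarization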



