tallesl/quinquilharia
Source: Hugging Face. Updated 2025-02-02. Indexed 2024-12-14.
Download link:
https://hf-mirror.com/datasets/tallesl/quinquilharia
Description:
The Quinquilharia dataset is a collection of text scraped from a range of Brazilian Portuguese forums, covering topics such as sports, arts, electronics, film, photography, technology, and religion. Each forum's scraped data is provided as a CSV file, accompanied by an estimated row count that ranges from a few thousand to hundreds of thousands of lines. The scraping was done with `wget`, and the dataset card describes each option used and its purpose, with the stated aims of keeping the crawl efficient, preserving data integrity and legality, and avoiding excessive load on the forum servers.
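Because only estimated row counts are given for the per-forum CSV files, it can be useful to check a downloaded file directly. The following is a minimal sketch, not part of the dataset's own tooling: the function name `count_rows`, the UTF-8 encoding, and the presence of a header row are all assumptions, and the file name in the usage note is hypothetical.

```python
import csv
from pathlib import Path


def count_rows(csv_path, has_header=True):
    """Count data records in a CSV file without loading it all into memory.

    Assumptions (not confirmed by the dataset description): the file is
    UTF-8 encoded and, by default, starts with a header row.
    """
    # Forum posts can contain line breaks inside quoted fields, so we count
    # parsed CSV records instead of raw text lines.
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        if has_header:
            next(reader, None)  # skip the header row; no error if file is empty
        return sum(1 for _ in reader)
```

For example, `count_rows("some_forum.csv")` would return the number of records in a locally downloaded file of that (hypothetical) name. Very long posts may exceed the `csv` module's default field size limit, which can be raised with `csv.field_size_limit` if needed.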
Provided by:
tallesl



