TheFinAI/gr_general_domain_laws_and_regulations_combined
Hugging Face · Updated 2025-07-08 · Indexed 2025-08-09
Download link:
https://hf-mirror.com/datasets/TheFinAI/gr_general_domain_laws_and_regulations_combined
Description:
This dataset combines several smaller datasets of Greek-language laws and regulations. Each row has three fields: text (the text content), source (the origin of the text), and tokens (the number of tokens). The source field distinguishes where each row comes from, for example laws from the National Printing House of Greece, EU laws from the Multi_Eurlex dataset, and Greek state laws from the Raptarchis archive. During cleaning, lines containing specific types of artifacts were removed from the National Printing House data, and tokens were counted using two tokenizers.
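As a rough illustration of how the three fields fit together, the sketch below totals the tokens column per source and filters rows by origin. The sample rows, the source label strings, and the helper names `tokens_per_source` and `filter_by_source` are assumptions made for this example; the actual label values used in the dataset are not stated here.

```python
from collections import defaultdict

# Hypothetical rows mirroring the three documented fields (text, source, tokens).
# The source labels and token counts are invented for illustration only.
rows = [
    {"text": "Άρθρο 1", "source": "national_printing_house", "tokens": 120},
    {"text": "Κανονισμός", "source": "multi_eurlex", "tokens": 300},
    {"text": "Νόμος", "source": "raptarchis", "tokens": 80},
    {"text": "Άρθρο 2", "source": "national_printing_house", "tokens": 50},
]


def tokens_per_source(rows):
    """Sum the `tokens` field for each distinct `source` value."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["source"]] += row["tokens"]
    return dict(totals)


def filter_by_source(rows, source):
    """Return only the rows whose `source` field equals `source`."""
    return [row for row in rows if row["source"] == source]


totals = tokens_per_source(rows)
raptarchis_rows = filter_by_source(rows, "raptarchis")
```

Rows loaded from the real dataset should have the same dictionary shape, so the same helpers would apply once the actual source labels are substituted.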
Provided by:
TheFinAI



