elenanereiss/german_legal_entity_recognition
Source: Hugging Face · Updated 2024-01-18 · Indexed 2024-05-25
Download link:
https://hf-mirror.com/datasets/elenanereiss/german_legal_entity_recognition
---
annotations_creators:
- expert-generated
language_creators:
- found
language:
- de
license:
- cc-by-4.0
multilinguality:
- monolingual
size_categories:
- n<1K
source_datasets:
- original
task_categories:
- token-classification
task_ids:
- named-entity-recognition
paperswithcode_id: legal-documents-entity-recognition
pretty_name: Legal Documents Entity Recognition
dataset_info:
- config_name: bag
  features:
  - name: id
    dtype: string
  - name: tokens
    sequence: string
  - name: ner_tags
    sequence:
      class_label:
        names:
          '0': B-AN
          '1': B-EUN
          '2': B-GRT
          '3': B-GS
          '4': B-INN
          '5': B-LD
          '6': B-LDS
          '7': B-LIT
          '8': B-MRK
          '9': B-ORG
          '10': B-PER
          '11': B-RR
          '12': B-RS
          '13': B-ST
          '14': B-STR
          '15': B-UN
          '16': B-VO
          '17': B-VS
          '18': B-VT
          '19': I-AN
          '20': I-EUN
          '21': I-GRT
          '22': I-GS
          '23': I-INN
          '24': I-LD
          '25': I-LDS
          '26': I-LIT
          '27': I-MRK
          '28': I-ORG
          '29': I-PER
          '30': I-RR
          '31': I-RS
          '32': I-ST
          '33': I-STR
          '34': I-UN
          '35': I-VO
          '36': I-VS
          '37': I-VT
          '38': O
  splits:
  - name: train
  download_size: 4392913
  dataset_size: 0
- config_name: bfh
  features:
  - name: id
    dtype: string
  - name: tokens
    sequence: string
  - name: ner_tags
    sequence:
      class_label:
        names:
          '0': B-AN
          '1': B-EUN
          '2': B-GRT
          '3': B-GS
          '4': B-INN
          '5': B-LD
          '6': B-LDS
          '7': B-LIT
          '8': B-MRK
          '9': B-ORG
          '10': B-PER
          '11': B-RR
          '12': B-RS
          '13': B-ST
          '14': B-STR
          '15': B-UN
          '16': B-VO
          '17': B-VS
          '18': B-VT
          '19': I-AN
          '20': I-EUN
          '21': I-GRT
          '22': I-GS
          '23': I-INN
          '24': I-LD
          '25': I-LDS
          '26': I-LIT
          '27': I-MRK
          '28': I-ORG
          '29': I-PER
          '30': I-RR
          '31': I-RS
          '32': I-ST
          '33': I-STR
          '34': I-UN
          '35': I-VO
          '36': I-VS
          '37': I-VT
          '38': O
  splits:
  - name: train
  download_size: 4392913
  dataset_size: 0
- config_name: bgh
  features:
  - name: id
    dtype: string
  - name: tokens
    sequence: string
  - name: ner_tags
    sequence:
      class_label:
        names:
          '0': B-AN
          '1': B-EUN
          '2': B-GRT
          '3': B-GS
          '4': B-INN
          '5': B-LD
          '6': B-LDS
          '7': B-LIT
          '8': B-MRK
          '9': B-ORG
          '10': B-PER
          '11': B-RR
          '12': B-RS
          '13': B-ST
          '14': B-STR
          '15': B-UN
          '16': B-VO
          '17': B-VS
          '18': B-VT
          '19': I-AN
          '20': I-EUN
          '21': I-GRT
          '22': I-GS
          '23': I-INN
          '24': I-LD
          '25': I-LDS
          '26': I-LIT
          '27': I-MRK
          '28': I-ORG
          '29': I-PER
          '30': I-RR
          '31': I-RS
          '32': I-ST
          '33': I-STR
          '34': I-UN
          '35': I-VO
          '36': I-VS
          '37': I-VT
          '38': O
  splits:
  - name: train
  download_size: 4392913
  dataset_size: 0
- config_name: bpatg
  features:
  - name: id
    dtype: string
  - name: tokens
    sequence: string
  - name: ner_tags
    sequence:
      class_label:
        names:
          '0': B-AN
          '1': B-EUN
          '2': B-GRT
          '3': B-GS
          '4': B-INN
          '5': B-LD
          '6': B-LDS
          '7': B-LIT
          '8': B-MRK
          '9': B-ORG
          '10': B-PER
          '11': B-RR
          '12': B-RS
          '13': B-ST
          '14': B-STR
          '15': B-UN
          '16': B-VO
          '17': B-VS
          '18': B-VT
          '19': I-AN
          '20': I-EUN
          '21': I-GRT
          '22': I-GS
          '23': I-INN
          '24': I-LD
          '25': I-LDS
          '26': I-LIT
          '27': I-MRK
          '28': I-ORG
          '29': I-PER
          '30': I-RR
          '31': I-RS
          '32': I-ST
          '33': I-STR
          '34': I-UN
          '35': I-VO
          '36': I-VS
          '37': I-VT
          '38': O
  splits:
  - name: train
  download_size: 4392913
  dataset_size: 0
- config_name: bsg
  features:
  - name: id
    dtype: string
  - name: tokens
    sequence: string
  - name: ner_tags
    sequence:
      class_label:
        names:
          '0': B-AN
          '1': B-EUN
          '2': B-GRT
          '3': B-GS
          '4': B-INN
          '5': B-LD
          '6': B-LDS
          '7': B-LIT
          '8': B-MRK
          '9': B-ORG
          '10': B-PER
          '11': B-RR
          '12': B-RS
          '13': B-ST
          '14': B-STR
          '15': B-UN
          '16': B-VO
          '17': B-VS
          '18': B-VT
          '19': I-AN
          '20': I-EUN
          '21': I-GRT
          '22': I-GS
          '23': I-INN
          '24': I-LD
          '25': I-LDS
          '26': I-LIT
          '27': I-MRK
          '28': I-ORG
          '29': I-PER
          '30': I-RR
          '31': I-RS
          '32': I-ST
          '33': I-STR
          '34': I-UN
          '35': I-VO
          '36': I-VS
          '37': I-VT
          '38': O
  splits:
  - name: train
  download_size: 4392913
  dataset_size: 0
- config_name: bverfg
  features:
  - name: id
    dtype: string
  - name: tokens
    sequence: string
  - name: ner_tags
    sequence:
      class_label:
        names:
          '0': B-AN
          '1': B-EUN
          '2': B-GRT
          '3': B-GS
          '4': B-INN
          '5': B-LD
          '6': B-LDS
          '7': B-LIT
          '8': B-MRK
          '9': B-ORG
          '10': B-PER
          '11': B-RR
          '12': B-RS
          '13': B-ST
          '14': B-STR
          '15': B-UN
          '16': B-VO
          '17': B-VS
          '18': B-VT
          '19': I-AN
          '20': I-EUN
          '21': I-GRT
          '22': I-GS
          '23': I-INN
          '24': I-LD
          '25': I-LDS
          '26': I-LIT
          '27': I-MRK
          '28': I-ORG
          '29': I-PER
          '30': I-RR
          '31': I-RS
          '32': I-ST
          '33': I-STR
          '34': I-UN
          '35': I-VO
          '36': I-VS
          '37': I-VT
          '38': O
  splits:
  - name: train
  download_size: 4392913
  dataset_size: 0
- config_name: bverwg
  features:
  - name: id
    dtype: string
  - name: tokens
    sequence: string
  - name: ner_tags
    sequence:
      class_label:
        names:
          '0': B-AN
          '1': B-EUN
          '2': B-GRT
          '3': B-GS
          '4': B-INN
          '5': B-LD
          '6': B-LDS
          '7': B-LIT
          '8': B-MRK
          '9': B-ORG
          '10': B-PER
          '11': B-RR
          '12': B-RS
          '13': B-ST
          '14': B-STR
          '15': B-UN
          '16': B-VO
          '17': B-VS
          '18': B-VT
          '19': I-AN
          '20': I-EUN
          '21': I-GRT
          '22': I-GS
          '23': I-INN
          '24': I-LD
          '25': I-LDS
          '26': I-LIT
          '27': I-MRK
          '28': I-ORG
          '29': I-PER
          '30': I-RR
          '31': I-RS
          '32': I-ST
          '33': I-STR
          '34': I-UN
          '35': I-VO
          '36': I-VS
          '37': I-VT
          '38': O
  splits:
  - name: train
  download_size: 4392913
  dataset_size: 0
- config_name: all
  features:
  - name: id
    dtype: string
  - name: tokens
    sequence: string
  - name: ner_tags
    sequence:
      class_label:
        names:
          '0': B-AN
          '1': B-EUN
          '2': B-GRT
          '3': B-GS
          '4': B-INN
          '5': B-LD
          '6': B-LDS
          '7': B-LIT
          '8': B-MRK
          '9': B-ORG
          '10': B-PER
          '11': B-RR
          '12': B-RS
          '13': B-ST
          '14': B-STR
          '15': B-UN
          '16': B-VO
          '17': B-VS
          '18': B-VT
          '19': I-AN
          '20': I-EUN
          '21': I-GRT
          '22': I-GS
          '23': I-INN
          '24': I-LD
          '25': I-LDS
          '26': I-LIT
          '27': I-MRK
          '28': I-ORG
          '29': I-PER
          '30': I-RR
          '31': I-RS
          '32': I-ST
          '33': I-STR
          '34': I-UN
          '35': I-VO
          '36': I-VS
          '37': I-VT
          '38': O
  splits:
  - name: train
  download_size: 4392913
  dataset_size: 0
---
# Dataset Card for Legal Documents Entity Recognition
## Table of Contents
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)
## Dataset Description
- **Homepage:** https://github.com/elenanereiss/Legal-Entity-Recognition
- **Repository:** None
- **Paper:** https://link.springer.com/chapter/10.1007/978-3-030-33220-4_20
- **Leaderboard:** [If the dataset supports an active leaderboard, add link here]()
- **Point of Contact:** Georg Rehm (georg.rehm@dfki.de)
### Dataset Summary
<div class="course-tip course-tip-orange bg-gradient-to-br dark:bg-gradient-to-r before:border-orange-500 dark:before:border-orange-800 from-orange-50 dark:from-gray-900 to-white dark:to-gray-950 border border-orange-50 text-orange-700 dark:text-gray-400">
<p><b>Deprecated:</b> Dataset "german_legal_entity_recognition" is deprecated and will be deleted. Use <a href="https://huggingface.co/datasets/elenanereiss/german-ler">"elenanereiss/german-ler"</a> instead.</p>
</div>
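Code that still refers to the deprecated name can be redirected to the replacement named in the notice. The following is only a minimal sketch: `REPLACEMENTS`, `resolve_dataset_id`, and `load_ler` are hypothetical helpers, and it assumes the Hugging Face `datasets` library is installed and that `elenanereiss/german-ler` loads with its default configuration.

```python
# Hypothetical mapping from deprecated Hub IDs to the replacement
# given in the deprecation notice.
REPLACEMENTS = {
    "german_legal_entity_recognition": "elenanereiss/german-ler",
    "elenanereiss/german_legal_entity_recognition": "elenanereiss/german-ler",
}


def resolve_dataset_id(dataset_id: str) -> str:
    """Return the replacement ID for a deprecated dataset, else the ID unchanged."""
    return REPLACEMENTS.get(dataset_id, dataset_id)


def load_ler(dataset_id: str = "german_legal_entity_recognition"):
    """Load the dataset from the Hub, following the deprecation redirect.

    Needs network access and `pip install datasets`; the import is done
    lazily so that the ID lookup above works without the library.
    """
    from datasets import load_dataset

    return load_dataset(resolve_dataset_id(dataset_id))
```

Calling `load_ler()` would then fetch `elenanereiss/german-ler` rather than the deprecated dataset; IDs that are not in the mapping pass through untouched.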
### Supported Tasks and Leaderboards
[More Information Needed]
### Languages
German (`de`); the dataset is monolingual.
## Dataset Structure
### Data Instances
[More Information Needed]
### Data Fields
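According to the card metadata, every configuration has the same three fields:

- `id`: a string.
- `tokens`: a sequence of strings.
- `ner_tags`: a sequence of class labels with 39 possible values: `B-` and `I-` tags for 19 entity types (AN, EUN, GRT, GS, INN, LD, LDS, LIT, MRK, ORG, PER, RR, RS, ST, STR, UN, VO, VS, VT), plus `O`.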
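The `ner_tags` field stores integer class IDs, so reading an example means mapping IDs back to label names and grouping `B-`/`I-` runs into entity spans. The sketch below rebuilds the label list in the order given in the card metadata (IDs 0 to 38); `decode_tags` and `extract_spans` are hypothetical helper names, and the sample sentence and its tags are made up for illustration, not taken from the dataset.

```python
# Entity types in the order used by the card metadata.
ENTITY_TYPES = [
    "AN", "EUN", "GRT", "GS", "INN", "LD", "LDS", "LIT", "MRK", "ORG",
    "PER", "RR", "RS", "ST", "STR", "UN", "VO", "VS", "VT",
]
# IDs 0-18 are B- tags, 19-37 are I- tags, 38 is O.
LABELS = [f"B-{t}" for t in ENTITY_TYPES] + [f"I-{t}" for t in ENTITY_TYPES] + ["O"]


def decode_tags(tag_ids):
    """Map integer class IDs to their label names."""
    return [LABELS[i] for i in tag_ids]


def extract_spans(tokens, tag_ids):
    """Group BIO-tagged tokens into (entity_type, text) spans.

    An I- tag that does not continue an open span of the same type
    is treated like O in this sketch.
    """
    spans, current_type, current_tokens = [], None, []
    for token, label in zip(tokens, decode_tags(tag_ids)):
        if label.startswith("B-"):
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = label[2:], [token]
        elif label.startswith("I-") and current_type == label[2:]:
            current_tokens.append(token)
        else:
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


# Made-up example sentence with made-up tags.
tokens = ["Das", "Bundesarbeitsgericht", "entschied", "nach", "§", "1", "BGB", "."]
tags = [38, 2, 38, 38, 3, 22, 22, 38]
print(extract_spans(tokens, tags))
```

When the data is loaded through the `datasets` library, the same names are also available from the feature definition, so a hard-coded list like `LABELS` is only needed when working with the raw IDs.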
### Data Splits
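The card metadata lists eight configurations (`bag`, `bfh`, `bgh`, `bpatg`, `bsg`, `bverfg`, `bverwg`, `all`), each with a single `train` split. No example counts are given.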
## Dataset Creation
### Curation Rationale
[More Information Needed]
### Source Data
[More Information Needed]
#### Initial Data Collection and Normalization
[More Information Needed]
#### Who are the source language producers?
[More Information Needed]
### Annotations
[More Information Needed]
#### Annotation process
[More Information Needed]
#### Who are the annotators?
[More Information Needed]
### Personal and Sensitive Information
[More Information Needed]
## Considerations for Using the Data
### Social Impact of Dataset
[More Information Needed]
### Discussion of Biases
[More Information Needed]
### Other Known Limitations
[More Information Needed]
## Additional Information
### Dataset Curators
[More Information Needed]
### Licensing Information
CC BY 4.0 (`cc-by-4.0`), as stated in the card metadata.
### Citation Information
[More Information Needed]
### Contributions
Thanks to [@abhishekkrthakur](https://github.com/abhishekkrthakur) for adding this dataset.
Provider:
elenanereiss
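## Summary of Original Information
Dataset overview: Legal Documents Entity Recognition
### Basic Information
- Language: German (de)
- License: CC-BY-4.0
- Multilinguality: monolingual
- Size category: n<1K
- Source data: original
- Task category: token classification
- Task: named entity recognition
- PapersWithCode ID: legal-documents-entity-recognition
- Dataset name: Legal Documents Entity Recognition
### Dataset Structure
#### Data Fields
- id: string
- tokens: sequence of strings
- ner_tags: sequence of class labels, with the following values:
  - B-AN, B-EUN, B-GRT, B-GS, B-INN, B-LD, B-LDS, B-LIT, B-MRK, B-ORG, B-PER, B-RR, B-RS, B-ST, B-STR, B-UN, B-VO, B-VS, B-VT
  - I-AN, I-EUN, I-GRT, I-GS, I-INN, I-LD, I-LDS, I-LIT, I-MRK, I-ORG, I-PER, I-RR, I-RS, I-ST, I-STR, I-UN, I-VO, I-VS, I-VT
  - O
#### Data Splits
- train: training set
- download_size: 4392913 bytes
- dataset_size: 0 bytes
### Dataset Creation
#### Annotation Creators
- Type: expert-generated
#### Language Creators
- Type: found
#### Source Data
- Type: original
#### Annotations
- Annotation process: no details provided
- Annotators: no details provided
#### Personal and Sensitive Information
- No details provided
### Considerations for Using the Data
#### Social Impact
- No details provided
#### Discussion of Biases
- No details provided
#### Other Known Limitations
- No details provided
### Additional Information
#### Dataset Curators
- No details provided
#### Licensing Information
- No details provided
#### Citation Information
- No details provided
#### Contributors
- @abhishekkrthakur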



