TEDEUTenders
Source: ModelScope community (魔搭社区). Updated 2025-12-05; indexed 2025-06-21.
Download link:
https://modelscope.cn/datasets/PleIAs/TEDEUTenders
# TED (Tenders Electronic Daily) Dataset Card
## Dataset Overview
**Dataset Name:** TED (Tenders Electronic Daily) EU Tenders Dataset
**Description:**
TED (Tenders Electronic Daily) is the online version of the 'Supplement to the Official Journal' of the EU, dedicated to European public procurement. This dataset comprises a collection of procurement notices published by the EU, organized into Parquet files divided by year. Each file contains detailed information about public tenders, including metadata extracted from XML files.
**Languages Covered:**
- German (DEU): 63,128 documents
- French (FRA): 25,539 documents
- Polish (POL): 20,594 documents
- Spanish (SPA): 15,119 documents
- Dutch (NLD): 12,371 documents
- Czech (CES): 11,293 documents
- Romanian (RON): 10,285 documents
- English (ENG): 10,231 documents
- Swedish (SWE): 9,560 documents
- Italian (ITA): 6,603 documents
- Bulgarian (BUL): 6,281 documents
- Finnish (FIN): 5,812 documents
- Latvian (LAV): 5,160 documents
- Danish (DAN): 3,251 documents
- Lithuanian (LIT): 3,162 documents
- Croatian (HR): 3,155 documents
- Estonian (EST): 2,813 documents
- Hungarian (HUN): 2,167 documents
- Portuguese (POR): 2,080 documents
- Slovenian (SLV): 1,959 documents
- Slovak (SLK): 1,822 documents
- Greek (ELL): 1,606 documents
- Irish (GLE): 1 document
- Unspecified/Empty: 57 documents
**Total Number of Documents:** 224,049
**Total Number of Words:** 279,556,839
**Average Number of Words per Document:** 1,247.75
**Total Number of Characters:** 2,125,102,547
**Average Number of Characters per Document:** 9,484.99
**Total Number of Rows with Empty Identifier:** 57
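The figures above can be cross-checked with a few lines of arithmetic. The sketch below copies the per-language counts and totals from this card and recomputes the document total and the two averages; the dictionary keys are just labels chosen for illustration.

```python
# Per-language document counts, copied from the list above.
language_counts = {
    "DEU": 63_128, "FRA": 25_539, "POL": 20_594, "SPA": 15_119,
    "NLD": 12_371, "CES": 11_293, "RON": 10_285, "ENG": 10_231,
    "SWE": 9_560, "ITA": 6_603, "BUL": 6_281, "FIN": 5_812,
    "LAV": 5_160, "DAN": 3_251, "LIT": 3_162, "HR": 3_155,
    "EST": 2_813, "HUN": 2_167, "POR": 2_080, "SLV": 1_959,
    "SLK": 1_822, "ELL": 1_606, "GLE": 1,
    "unspecified": 57,  # rows with an unspecified/empty language
}

total_words = 279_556_839
total_characters = 2_125_102_547

# The per-language counts should add up to the reported document total.
total_documents = sum(language_counts.values())

# Averages rounded to two decimals, as reported in the card.
avg_words = round(total_words / total_documents, 2)
avg_chars = round(total_characters / total_documents, 2)

print(total_documents, avg_words, avg_chars)
```

The recomputed values agree with the totals and averages stated in the card.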
## Dataset Structure
The dataset is stored in Parquet files organized by year. Each Parquet file contains the following fields for each document:
- `filename`: Name of the original XML file.
- `identifier`: Unique identifier for the document.
- `date`: Publication date of the document in DD/MM/YYYY format.
- `language`: Language of the document.
- `url`: URL to the document on the TED website.
- `text`: Full text content of the document.
- `word_count`: Total number of words in the document.
- `character_count`: Total number of characters in the document.
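As a minimal sketch of how these fields might be read, the example below parses the `DD/MM/YYYY` date format and shows one possible way to load a yearly file with pandas. The helper names, the sample row, and its values are hypothetical, and the actual file names in the repository are not stated in this card, so the path is left as a parameter.

```python
from datetime import date, datetime

# Field names as listed in the card.
COLUMNS = [
    "filename", "identifier", "date", "language",
    "url", "text", "word_count", "character_count",
]

def parse_ted_date(value: str) -> date:
    """Parse a publication date given in DD/MM/YYYY format."""
    return datetime.strptime(value, "%d/%m/%Y").date()

def load_year(path: str) -> list:
    """Read one yearly Parquet file into a list of row dicts.

    Assumes pandas and a Parquet engine such as pyarrow are installed;
    `path` is whatever yearly file you downloaded.
    """
    import pandas as pd  # imported lazily so the rest runs without pandas
    return pd.read_parquet(path, columns=COLUMNS).to_dict("records")

# Hypothetical row, for illustration only (not taken from the dataset).
sample_row = {
    "filename": "example.xml",
    "identifier": "example-id",
    "date": "05/03/2020",
    "language": "DEU",
    "url": "https://example.org/notice",
    "text": "Beispieltext einer Bekanntmachung",
    "word_count": 3,
    "character_count": 32,
}

parsed = parse_ted_date(sample_row["date"])
print(parsed.isoformat())  # 2020-03-05
```

Parsing the date explicitly matters because day-first strings such as `05/03/2020` are easily misread as month-first by default parsers.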
## Usage
The dataset can be used for various purposes including, but not limited to:
- Analyzing public procurement trends across different European countries and languages.
- Studying the distribution of procurement notices over time.
- Developing natural language processing models to analyze the content of procurement notices.
- Extracting insights related to public procurement policies and their implementation across the EU.
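To illustrate the first two uses, here is a small sketch that counts notices per publication year and sums words per language. It works on plain row dictionaries with the fields described in this card; the function names and the sample rows are made up for the example, and it is an assumption that the `language` column stores codes such as `DEU`.

```python
from collections import Counter
from datetime import datetime

def notices_per_year(rows):
    """Count notices by publication year (dates are DD/MM/YYYY)."""
    return Counter(
        datetime.strptime(row["date"], "%d/%m/%Y").year for row in rows
    )

def words_per_language(rows):
    """Sum word counts per language; empty values go to 'unspecified'."""
    totals = Counter()
    for row in rows:
        totals[row["language"] or "unspecified"] += row["word_count"]
    return totals

# Hypothetical rows, for illustration only.
rows = [
    {"date": "05/03/2020", "language": "DEU", "word_count": 1200},
    {"date": "17/11/2020", "language": "FRA", "word_count": 900},
    {"date": "02/01/2021", "language": "DEU", "word_count": 1500},
    {"date": "09/06/2021", "language": "", "word_count": 100},
]

by_year = notices_per_year(rows)
by_language = words_per_language(rows)
print(dict(by_year))
print(dict(by_language))
```

The same functions can be applied to the rows of each yearly Parquet file and the resulting counters added together to cover the full collection.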
## Source
The dataset is collected from the Tenders Electronic Daily (TED) platform of the European Union. TED is the official platform for publishing public procurement notices across Europe.
## Dataset Citation
If you use this dataset in your research, please cite it as follows:
```
@dataset{TED_EU_Tenders_2024,
title={TED (Tenders Electronic Daily) EU Tenders Dataset},
author={Pleias},
year={2024},
description={Collection of EU public procurement notices from the TED platform, organized by year and language, with detailed metadata extracted from XML files.}
}
```
**Note:** The dataset is presented and maintained by Pleias. All rights reserved.
Provider: maas
Created: 2025-06-19
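Background overview
The TEDEUTenders dataset contains public procurement notices from the EU's Tenders Electronic Daily platform in multiple languages (including German, French, and Polish). It totals 224,049 documents, stored as Parquet files grouped by year, and is suited to public procurement trend analysis and natural language processing tasks.
Summary collected and generated by 遇见数据集.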



