
TEDEUTenders

ModelScope Community · Updated 2025-12-05 · Indexed 2025-06-21
Download link:
https://modelscope.cn/datasets/PleIAs/TEDEUTenders
Resource description:
# TED (Tenders Electronic Daily) Dataset Card

## Dataset Overview

**Dataset Name:** TED (Tenders Electronic Daily) EU Tenders Dataset

**Description:** TED (Tenders Electronic Daily) is the online version of the 'Supplement to the Official Journal' of the EU, dedicated to European public procurement. This dataset comprises a collection of procurement notices published by the EU, organized into Parquet files divided by year. Each file contains detailed information about public tenders, including metadata extracted from XML files.

**Languages Covered:**

- German (DEU): 63,128 documents
- French (FRA): 25,539 documents
- Polish (POL): 20,594 documents
- Spanish (SPA): 15,119 documents
- Dutch (NLD): 12,371 documents
- Czech (CES): 11,293 documents
- Romanian (RON): 10,285 documents
- English (ENG): 10,231 documents
- Swedish (SWE): 9,560 documents
- Italian (ITA): 6,603 documents
- Bulgarian (BUL): 6,281 documents
- Finnish (FIN): 5,812 documents
- Latvian (LAV): 5,160 documents
- Danish (DAN): 3,251 documents
- Lithuanian (LIT): 3,162 documents
- Croatian (HR): 3,155 documents
- Estonian (EST): 2,813 documents
- Hungarian (HUN): 2,167 documents
- Portuguese (POR): 2,080 documents
- Slovenian (SLV): 1,959 documents
- Slovak (SLK): 1,822 documents
- Greek (ELL): 1,606 documents
- Irish (GLE): 1 document
- Unspecified/Empty: 57 documents

**Total Number of Documents:** 224,049

**Total Number of Words:** 279,556,839

**Average Number of Words per Document:** 1,247.75

**Total Number of Characters:** 2,125,102,547

**Average Number of Characters per Document:** 9,484.99

**Total Number of Rows with Empty Identifier:** 57

## Dataset Structure

The dataset is stored in Parquet files organized by year. Each Parquet file contains the following metadata for each document:

- `filename`: Name of the original XML file.
- `identifier`: Unique identifier for the document.
- `date`: Publication date of the document in DD/MM/YYYY format.
- `language`: Language of the document.
- `url`: URL to the document on the TED website.
- `text`: Full text content of the document.
- `word_count`: Total number of words in the document.
- `character_count`: Total number of characters in the document.

## Usage

The dataset can be used for various purposes including, but not limited to:

- Analyzing public procurement trends across different European countries and languages.
- Studying the distribution of procurement notices over time.
- Developing natural language processing models to analyze the content of procurement notices.
- Extracting insights related to public procurement policies and their implementation across the EU.

## Source

The dataset is collected from the Tenders Electronic Daily (TED) platform of the European Union. TED is the official platform for publishing public procurement notices across Europe.

## Dataset Citation

If you use this dataset in your research, please cite it as follows:

```
@dataset{TED_EU_Tenders_2024,
  title={TED (Tenders Electronic Daily) EU Tenders Dataset},
  author={Pleias},
  year={2024},
  description={Collection of EU public procurement notices from the TED platform, organized by year and language, with detailed metadata extracted from XML files.}
}
```

**Note:** The dataset is presented and maintained by Pleias. All rights reserved.
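The card reports aggregate statistics but does not show how they were computed. The following is a minimal sketch, assuming each Parquet file exposes exactly the columns listed under Dataset Structure and that `pyarrow` is installed for reading. The helper names (`load_records`, `parse_date`, `summarize`, `META_COLUMNS`) are hypothetical, and the per-year file paths must be supplied by the caller because the card does not name them. The sketch counts documents per language and per publication year, totals and averages the word and character counts, and counts rows with an empty `identifier`.

```python
from collections import Counter
from datetime import date, datetime
from typing import Iterable, Mapping

# Hypothetical choice: read only metadata columns and skip the large `text` column.
META_COLUMNS = ["identifier", "date", "language", "word_count", "character_count"]


def load_records(paths: Iterable[str]) -> list[dict]:
    """Read the yearly Parquet files (paths supplied by the caller) into row dicts."""
    import pyarrow.parquet as pq  # lazy import: the helpers below work without pyarrow

    rows: list[dict] = []
    for path in paths:
        rows.extend(pq.read_table(path, columns=META_COLUMNS).to_pylist())
    return rows


def parse_date(value: str) -> date:
    """Parse the card's DD/MM/YYYY `date` field; raises ValueError on other formats."""
    return datetime.strptime(value, "%d/%m/%Y").date()


def summarize(rows: Iterable[Mapping]) -> dict:
    """Recompute card-style statistics from an iterable of row mappings."""
    per_language: Counter = Counter()
    per_year: Counter = Counter()
    docs = words = chars = empty_ids = 0
    for row in rows:
        docs += 1
        # Missing or empty language codes are grouped under "" ("Unspecified/Empty").
        per_language[row.get("language") or ""] += 1
        words += row.get("word_count") or 0
        chars += row.get("character_count") or 0
        if not row.get("identifier"):
            empty_ids += 1
        if row.get("date"):
            per_year[parse_date(row["date"]).year] += 1
    return {
        "documents": docs,
        "per_language": dict(per_language),
        "per_year": dict(per_year),
        "total_words": words,
        "avg_words": round(words / docs, 2) if docs else 0.0,
        "total_characters": chars,
        "avg_characters": round(chars / docs, 2) if docs else 0.0,
        "empty_identifiers": empty_ids,
    }
```

Reading only the metadata columns is a deliberate choice: the `text` column holds roughly 2.1 billion characters, so skipping it keeps memory use small when only counts are needed. If the card's figures were computed the same way, passing the rows from all yearly files to `summarize` should reproduce them, for instance 224,049 documents, averages of about 1,247.75 words and 9,484.99 characters, and 57 empty identifiers.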

Provider:
maas
Created:
2025-06-19
Collected Summary
Dataset Introduction
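Background and Challenges
Background Overview
The TEDEUTenders dataset contains public procurement notices from the EU's Tenders Electronic Daily platform in many languages (e.g., German, French, Polish), totaling 224,049 documents stored as Parquet files grouped by year. It is suited to public procurement trend analysis and natural language processing tasks.
The above content was collected and summarized by 遇见数据集.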