
An open source framework for metadata exploration and discovery of Polar Data

Source: DataONE; updated 2020-07-17; indexed 2024-06-08
Download link:
https://search.dataone.org/view/doi:10.18739/A2R49G96H
Summary:
This project will deliver an open source framework for metadata exploration, automatic text mining, and information retrieval of polar data, built on Apache Tika. Apache Tika is currently the de facto "babel fish", providing automatic MIME type detection, text extraction, and metadata classification for over 1,200 data formats. The PI will extend Tika to handle polar and other scientific data formats, making polar data more easily available, searchable, and retrievable by all major content management systems. The proposed activity will lay the groundwork for a thorough, automatically generated inventory of polar metadata and data. Extending Tika to handle polar data will also naturally draw the technology and open source communities toward polar use cases, helping to increase understanding of the Arctic. The software produced through this effort will be disseminated to the software and polar communities through the Apache Software Foundation. A computer science graduate student and a postdoc will be exposed to cryosphere and Arctic data, helping to train the next generation of cross-disciplinary data scientists in the domain. The PI's graduate courses at USC, Search Engines (20-40 students enrolled annually) and Software Architecture (30-50 students enrolled annually), will benefit from Arctic cyberinfrastructure use cases disseminated through course projects and lecture material. The PI will also collaborate with NSF-funded projects focused on the archiving, discovery, and access of polar data, such as ACADIS and the Antarctic Master Directory.
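The abstract credits Tika with automatic MIME detection across many formats. As a rough illustration of one ingredient of that idea, magic-number detection, here is a minimal Python sketch. It is not Tika's API (Tika is a Java library), and it is not the project's implementation: the signature table, the MIME labels, the extension hints, and the functions `detect_mime` and `detect_file` are all hypothetical, chosen to show how the leading bytes of scientific formats common in polar archives (NetCDF, HDF, GRIB) can identify a file before any parser runs.

```python
from pathlib import Path
from typing import List, Optional, Tuple

# Hypothetical signature table: leading "magic" bytes -> MIME label.
# Labels are illustrative and may differ from Apache Tika's own registry.
MAGIC_SIGNATURES: List[Tuple[bytes, str]] = [
    (b"\x89HDF\r\n\x1a\n", "application/x-hdf5"),  # HDF5 (NetCDF-4 files use this too)
    (b"\x0e\x03\x13\x01", "application/x-hdf"),     # HDF4
    (b"CDF\x01", "application/x-netcdf"),           # NetCDF classic format
    (b"CDF\x02", "application/x-netcdf"),           # NetCDF 64-bit offset format
    (b"GRIB", "application/x-grib"),                # GRIB edition 1 or 2
    (b"%PDF-", "application/pdf"),                  # PDF documents
]

# Hypothetical filename hints, consulted only when no signature matches.
EXTENSION_HINTS = {
    ".nc": "application/x-netcdf",
    ".h5": "application/x-hdf5",
    ".grb": "application/x-grib",
}

FALLBACK = "application/octet-stream"


def detect_mime(head: bytes, name: Optional[str] = None) -> str:
    """Guess a MIME type from a file's leading bytes, then from its name."""
    # Content evidence first: the first signature that prefixes `head` wins.
    for signature, mime in MAGIC_SIGNATURES:
        if head.startswith(signature):
            return mime
    # Weaker evidence: fall back to the file extension, if a name is known.
    if name is not None:
        hint = EXTENSION_HINTS.get(Path(name).suffix.lower())
        if hint is not None:
            return hint
    return FALLBACK


def detect_file(path: Path, probe_size: int = 64) -> str:
    """Read only the first `probe_size` bytes rather than the whole file."""
    with path.open("rb") as fh:
        return detect_mime(fh.read(probe_size), name=path.name)
```

For example, `detect_mime(b"CDF\x01\x00\x00\x00\x00")` returns `"application/x-netcdf"`, while `detect_mime(b"plain text", name="sea_ice.NC")` falls back to the extension hint. Checking content before names reflects the general point that magic bytes are stronger evidence than a file extension; Tika's real detectors combine these signals in their own way and cover far more formats.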

Created: 2020-07-17