
Den-Intelligente-Patientjournal/Medical_word_embedding_eval

Source: Hugging Face · Updated 2024-11-29 · Indexed 2025-04-26
Download link:
https://hf-mirror.com/datasets/Den-Intelligente-Patientjournal/Medical_word_embedding_eval
Dataset description:
---
license: cc-by-sa-3.0
config_names:
- Abbreviation equality
- Adjective inflection analogy
- Clinical analogy
- Clinical similarity
- Noun inflection analogy
- UMNSRS relatedness
- UMNSRS similarity
- Verb inflection analogy
#dataset_info:
#- config_name: Abbreviation equality
#  features:
#  - name: train
#    dtype: string
configs:
- config_name: Abbreviation equality
  data_files:
  - split: train
    path: Abbreviation equality/train*
- config_name: Adjective inflection analogy
  data_files:
  - split: train
    path: Adjective inflection analogy/train*
- config_name: Clinical analogy
  data_files:
  - split: train
    path: Clinical analogy/train*
- config_name: Clinical similarity
  data_files:
  - split: train
    path: Clinical similarity/train*
- config_name: Noun inflection analogy
  data_files:
  - split: train
    path: Noun inflection analogy/train*
- config_name: UMNSRS relatedness
  data_files:
  - split: train
    path: UMNSRS relatedness/train*
- config_name: UMNSRS similarity
  data_files:
  - split: train
    path: UMNSRS similarity/train*
- config_name: Verb inflection analogy
  data_files:
  - split: train
    path: Verb inflection analogy/train*
---
# Danish medical word embedding evaluation

The development of the dataset is described further in our [paper](https://aclanthology.org/2023.nejlt-1.4/).

### Citing

```
@inproceedings{laursen-etal-2023-benchmark,
    title = "Benchmark for Evaluation of {D}anish Clinical Word Embeddings",
    author = "Laursen, Martin Sundahl and
      Pedersen, Jannik Skyttegaard and
      Vinholt, Pernille Just and
      Hansen, Rasmus S{\o}gaard and
      Savarimuthu, Thiusius Rajeeth",
    editor = "Derczynski, Leon",
    booktitle = "Northern European Journal of Language Technology, Volume 9",
    year = "2023",
    address = {Link{\"o}ping, Sweden},
    publisher = {Link{\"o}ping University Electronic Press},
    url = "https://aclanthology.org/2023.nejlt-1.4",
    doi = "https://doi.org/10.3384/nejlt.2000-1533.2023.4132",
    abstract = "In natural language processing, benchmarks are used to track progress and identify useful models. Currently, no benchmark for Danish clinical word embeddings exists. This paper describes the development of a Danish benchmark for clinical word embeddings. The clinical benchmark consists of ten datasets: eight intrinsic and two extrinsic. Moreover, we evaluate word embeddings trained on text from the clinical domain, general practitioner domain and general domain on the established benchmark. All the intrinsic tasks of the benchmark are publicly available.",
}
```

Provided by: Den-Intelligente-Patientjournal