sapienzanlp-course-materials/hw-mnlp-2026
Hugging Face · Updated 2026-04-01 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/sapienzanlp-course-materials/hw-mnlp-2026
Resource description:
---
dataset_info:
  features:
  - name: wikipedia_title
    dtype: string
  - name: wikidata_id
    dtype: string
  - name: query
    dtype: string
  - name: query_id
    dtype: string
  - name: candidate_chunks
    list: string
  - name: n_candidates
    dtype: int64
  - name: answer
    dtype: string
  - name: answer_pos
    dtype: int64
  - name: short_answer
    list: string
  splits:
  - name: test
    num_bytes: 28269104
    num_examples: 2000
  - name: blind
    num_bytes: 18465398
    num_examples: 1322
  - name: train
    num_bytes: 111968308
    num_examples: 8000
  download_size: 94231375
  dataset_size: 158702810
configs:
- config_name: default
  data_files:
  - split: test
    path: data/test-*
  - split: blind
    path: data/blind-*
  - split: train
    path: data/train-*
task_categories:
- sentence-similarity
- text-generation
- question-answering
language:
- en
size_categories:
- 10K<n<100K
---
# Dataset for Multilingual Natural Language Processing (MNLP) Homeworks
This dataset is used for both **Homework 1** and **Homework 2** of the [Multilingual Natural Language Processing (MNLP) course](https://sapienzanlp.github.io/).
## Homework 1 - Semantic Search
In the first homework, you are asked to build *semantic search* systems. You **must** use **only** the following fields:
* `query`: A single question in natural language.
* `query_id`: The question (query) identifier.
* `candidate_chunks`: List of candidate answers (only one is correct).
* `n_candidates`: Number of candidate answers per query (length of `candidate_chunks`).
* `answer`: The correct answer, an element of `candidate_chunks`.
* `answer_pos`: Position of the correct answer in `candidate_chunks`.
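The fields above are enough to score a retrieval system offline. The sketch below is a minimal illustration, not the course's official baseline or evaluation script. It checks the invariants stated in the field list (`n_candidates == len(candidate_chunks)` and `candidate_chunks[answer_pos] == answer`), ranks each query's candidates with a bag-of-words cosine similarity (a lexical stand-in for the neural sentence encoders a semantic search system would normally use), and reports accuracy@1 and mean reciprocal rank (MRR). It assumes `answer_pos` is a 0-based index; the helper names (`check_record`, `rank_candidates`, `evaluate`) and the toy record are hypothetical.

```python
import math
import re
from collections import Counter


def check_record(rec: dict) -> None:
    """Sanity-check the invariants described in the field list.

    Assumes `answer_pos` is a 0-based index into `candidate_chunks`.
    """
    chunks = rec["candidate_chunks"]
    assert rec["n_candidates"] == len(chunks), "n_candidates != len(candidate_chunks)"
    assert chunks[rec["answer_pos"]] == rec["answer"], "answer not at answer_pos"


def _bow(text: str) -> Counter:
    # Lowercased word counts: a lexical stand-in for a neural sentence encoder.
    return Counter(re.findall(r"\w+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_candidates(query: str, chunks: list[str]) -> list[int]:
    """Return candidate indices sorted from most to least similar to the query."""
    q = _bow(query)
    scores = [_cosine(q, _bow(c)) for c in chunks]
    return sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)


def evaluate(records: list[dict]) -> dict:
    """Compute accuracy@1 and MRR of the gold chunk over a list of records."""
    hits, rr = 0, 0.0
    for rec in records:
        check_record(rec)
        ranking = rank_candidates(rec["query"], rec["candidate_chunks"])
        rank = ranking.index(rec["answer_pos"]) + 1  # 1-based rank of the gold chunk
        hits += int(rank == 1)
        rr += 1.0 / rank
    n = len(records)
    return {"accuracy@1": hits / n, "mrr": rr / n}


# Hypothetical toy record using only the Homework 1 fields.
toy = [
    {
        "query": "What is the capital of France?",
        "query_id": "q1",
        "candidate_chunks": [
            "Berlin is the capital of Germany.",
            "Paris is the capital of France.",
        ],
        "n_candidates": 2,
        "answer": "Paris is the capital of France.",
        "answer_pos": 1,
    },
]

print(evaluate(toy))  # {'accuracy@1': 1.0, 'mrr': 1.0}
```

Replacing `_bow` and `_cosine` with embeddings from a sentence encoder would leave the validation and scoring loop unchanged, which keeps the comparison between lexical and semantic rankers straightforward.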
## Homework 2 - TBA
Stay tuned!
Provided by:
sapienzanlp-course-materials



