ai-education/ruTeacherTalk
Hugging Face | Updated 2024-03-29 | Indexed 2024-06-11
Download link:
https://hf-mirror.com/datasets/ai-education/ruTeacherTalk
Resource description:
---
license: cc-by-nc-4.0
task_categories:
- text-classification
language:
- ru
size_categories:
- 10K<n<100K
---
Here, we present a dataset of lesson transcripts annotated with 19 teacher talk techniques.
The lessons were given in Russian non-selective public schools and correspond to 12 different school subjects.
The dataset is divided into two groups, depending on the perspective (sociological or methodological) from which
the annotation was performed. Each group is in turn divided into two subgroups:
lessons transcribed by a human and lessons transcribed by an ASR model.
The overall number of transcripts is 118, of which 116 are annotated with respect to
the sociological perspective and 100 with respect to the methodological one.
The data are anonymized by replacing real names with the string "ИМЯ".
We plan to upload baseline models trained on this data soon.
Provider:
ai-education
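Summary of original information
Dataset overview
Dataset content
- A dataset of lesson transcripts annotated with 19 teacher talk techniques.
- The data come from lessons in 12 different subjects at Russian non-selective public schools.
Data structure
- The dataset is divided into two groups according to the annotation perspective (sociological or methodological).
- Each group is further divided into two subgroups:
  - lessons transcribed by a human
  - lessons transcribed by an automatic speech recognition (ASR) model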
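The two-by-two grouping described above (annotation perspective crossed with transcription source) can be sketched as a small data structure. This is only an illustration: the class names, the `label` format, and `all_subgroups` are hypothetical and do not reflect the actual file or configuration names used in the dataset.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Perspective(Enum):
    """Annotation perspective named in the dataset description."""
    SOCIOLOGICAL = "sociological"
    METHODOLOGICAL = "methodological"


class Transcription(Enum):
    """How a lesson was transcribed."""
    HUMAN = "human"
    ASR = "asr"


@dataclass(frozen=True)
class Subgroup:
    perspective: Perspective
    transcription: Transcription

    @property
    def label(self) -> str:
        # Hypothetical label format, not the dataset's real naming scheme.
        return f"{self.perspective.value}/{self.transcription.value}"


def all_subgroups() -> list[Subgroup]:
    """Enumerate every perspective/transcription combination."""
    return [Subgroup(p, t) for p, t in product(Perspective, Transcription)]


if __name__ == "__main__":
    for group in all_subgroups():
        print(group.label)
```

Modelling the two axes as separate enums keeps the four subgroups derivable rather than hard-coded.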
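Data volume
- The total number of transcripts is 118.
- Of these, 116 are annotated from the sociological perspective and 100 from the methodological perspective.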
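The description does not say how many transcripts carry both annotations, but the three counts bound that number. The sketch below applies inclusion-exclusion, assuming only that both annotated sets are subsets of the 118 transcripts; the function name `both_annotated_bounds` is hypothetical.

```python
def both_annotated_bounds(total: int, sociological: int, methodological: int) -> tuple[int, int]:
    """Return (lower, upper) bounds on transcripts annotated under both perspectives."""
    # The union of the two annotated sets cannot exceed `total`,
    # so the overlap is at least |A| + |B| - total.
    lower = max(0, sociological + methodological - total)
    # The overlap cannot exceed the smaller of the two sets.
    upper = min(sociological, methodological)
    return lower, upper


if __name__ == "__main__":
    print(both_annotated_bounds(118, 116, 100))  # (98, 100)
```

So, under that assumption, between 98 and 100 transcripts are annotated from both perspectives.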
Data processing
- The data are anonymized: real names are replaced with the string "ИМЯ".
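The source does not describe how the anonymization was carried out. As a rough illustration of the idea only, the sketch below replaces names from a known list with the placeholder "ИМЯ". The function `anonymize` and its `names` argument are hypothetical; a real pipeline for Russian would also need to handle inflected forms (for example "Маше" for "Маша"), which this sketch does not.

```python
import re

PLACEHOLDER = "ИМЯ"  # the placeholder string used in the dataset


def anonymize(text: str, names: list[str]) -> str:
    """Replace whole-word occurrences of the given names with the placeholder."""
    if not names:
        return text
    # Try longer names first so a longer name is matched before a shorter prefix of it.
    ordered = sorted(names, key=len, reverse=True)
    alternatives = "|".join(re.escape(name) for name in ordered)
    # \b is Unicode-aware for str patterns in Python 3, so it works with Cyrillic.
    pattern = re.compile(rf"\b(?:{alternatives})\b")
    return pattern.sub(PLACEHOLDER, text)


if __name__ == "__main__":
    print(anonymize("Маша, иди к доске. Спасибо, Маша!", ["Маша"]))
```

Matching on whole words avoids replacing a name that merely appears inside a longer word, at the cost of missing inflected forms.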
Language
- The dataset language is Russian.
License
- The dataset is released under the CC-BY-NC-4.0 license.



