Embeddings and topic vectors for MOOC lectures dataset

NIAID Data Ecosystem, indexed 2026-03-11

Download link:

https://data.mendeley.com/datasets/xknjp8pxbj

Description:

This dataset comprises word embeddings and document topic distribution vectors generated from the transcripts of 12,032 video lectures from 200 courses collected from the Coursera learning platform. Two well-known natural language processing techniques, Word2Vec and Latent Dirichlet Allocation (LDA), both as implemented in the Gensim package for Python, were used to generate the word embeddings and the topic vectors, respectively.

Created:

2019-12-06
