Code for Improving Video Caption Accuracy with LLMs

Name: Code for Improving Video Caption Accuracy with LLMs
Creator: DaRUS
Published: 2025-12-05 09:58:02
License: No description available

DataCite Commons | Updated: 2025-12-05 | Indexed: 2025-04-17

Download link:

https://darus.uni-stuttgart.de/citation?persistentId=doi:10.18419/DARUS-4776

Description:

As part of the IKILeUS project at the University of Stuttgart, research was conducted to explore how Large Language Models (LLMs) can enhance the accuracy and contextual relevance of automatic speech recognition (ASR)-generated captions. While ASR tools provide a foundation for accessibility, they often produce grammatical errors, misinterpret homophones, and struggle with domain-specific terminology. To address these challenges, experiments were conducted using LLMs such as GPT-3.5 and Llama2-13B to refine and correct captioning errors. The models were evaluated using standard NLP metrics such as Word Error Rate (WER), BLEU, and ROUGE scores, demonstrating notable improvements in caption accuracy. The findings suggest that LLMs can effectively enhance the readability, coherence, and precision of automatically generated captions, offering a promising direction for improving video accessibility for the Deaf and Hard of Hearing (DHH) community.
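For orientation, Word Error Rate is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the evaluation code shipped with this dataset. The function name `word_error_rate`, the lowercasing and whitespace tokenization, and the example sentences are all assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: text is lowercased and split on whitespace; a real
    evaluation would likely also normalize punctuation and numbers.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: an ASR caption with two homophone errors,
# and a corrected caption that matches the reference.
reference = "their results were presented here"
asr_caption = "there results were presented hear"
corrected_caption = "their results were presented here"

print(word_error_rate(reference, asr_caption))        # 2 errors / 5 words
print(word_error_rate(reference, corrected_caption))  # no errors
```

A lower value indicates a caption closer to the reference; because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.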

Provider:

DaRUS

Created:

2025-02-13
