yoad/heb_news_ocr_corpus
Source: Hugging Face | Updated: 2025-02-01 | Indexed: 2025-02-15
Download link:
https://hf-mirror.com/datasets/yoad/heb_news_ocr_corpus
Description:
Each record describes one news article: article ID, beginning position, title, type, page number, text content, OCR text, source file name, newspaper name, and date. The dataset is provided as a training split of approximately 2.73 million examples.
Provider:
yoad
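
Usage sketch:
With roughly 2.73 million examples, it may be convenient to inspect a few records before downloading everything. The following is a minimal sketch, assuming the Hugging Face `datasets` package is installed and that the repository ID matches the path in the download link. The helper names `describe_record`, `peek`, and `main` are hypothetical, and no column names are hard-coded because the listing gives only descriptive field names, not the actual identifiers.

```python
from itertools import islice
from typing import Any, Iterable, Mapping


def describe_record(record: Mapping[str, Any]) -> dict[str, str]:
    """Map each column name in one record to the Python type name of its value."""
    return {name: type(value).__name__ for name, value in record.items()}


def peek(rows: Iterable[Any], n: int = 3) -> list[Any]:
    """Return the first n items of an iterable, reading no more than n."""
    return list(islice(rows, n))


def main() -> None:
    # Imported here so the helpers above work without the `datasets` package.
    from datasets import load_dataset

    # Assumption: repository ID taken from the download link's path.
    # streaming=True iterates over the remote files lazily instead of
    # downloading the whole training split first.
    stream = load_dataset(
        "yoad/heb_news_ocr_corpus", split="train", streaming=True
    )
    for row in peek(stream, n=1):
        # Prints the real column names and value types of the first record.
        print(describe_record(row))
```

Calling `main()` requires network access. The printed column names can then be used to select the OCR text and the reference text for comparison.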



