mysamai/ashaar-tashkeel
Source: Hugging Face | Updated: 2026-04-22 | Indexed: 2026-04-26
Download link:
https://hf-mirror.com/datasets/mysamai/ashaar-tashkeel
Summary:
The Ashaar — Model-Inferred Tashkeel dataset provides model-inferred diacritization (tashkeel) for 6,934,210 Arabic poetry verses. The verses come from the arbml/ashaar dataset and were diacritized with the basharalrfooh/Fine-Tashkeel model. Its primary use is within الشاعر (al-Shaaer), an Arabic poetry AI system, where it bootstraps qafiya (rhyme) rule mining. Limitations: the diacritization is model-inferred and has not been hand-verified, and the error distribution on rare poetic forms, dialectal input, and ambiguous word-final vowels has not been characterized in detail.
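To make the terms concrete, here is a minimal Python sketch of what tashkeel looks like at the character level and why word-final marks matter for rhyme analysis. It is not the dataset authors' pipeline. The helper names `strip_tashkeel` and `final_marks` are hypothetical, and the mark set is an assumption covering the common Arabic harakat (U+064B to U+0652) plus the superscript alef (U+0670).

```python
# Assumed set of tashkeel marks: fathatan through sukun, plus superscript alef.
TASHKEEL = {chr(cp) for cp in range(0x064B, 0x0653)} | {"\u0670"}


def strip_tashkeel(text: str) -> str:
    """Remove diacritic marks, leaving the bare consonantal text."""
    return "".join(ch for ch in text if ch not in TASHKEEL)


def final_marks(word: str) -> str:
    """Return the diacritics attached to the last letter of a word, if any."""
    i = len(word)
    while i > 0 and word[i - 1] in TASHKEEL:
        i -= 1
    return word[i:]


# "kataba" written with a fatha (U+064E) on each of its three letters.
diacritized = "\u0643\u064E\u062A\u064E\u0628\u064E"
bare = strip_tashkeel(diacritized)
ending = final_marks(diacritized)
```

Comparing `strip_tashkeel` output against the original arbml/ashaar text is one way a user might spot-check that the model changed only diacritics, and `final_marks` isolates the word-final vowel that the summary flags as a likely source of error.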
Provider:
mysamai



