
TroglodyteDerivations/Bluesky_Emoji_Extraction

Source: Hugging Face (updated 2024-12-16, indexed 2024-12-21)
Download link:
https://hf-mirror.com/datasets/TroglodyteDerivations/Bluesky_Emoji_Extraction
Description:
This dataset contains emojis extracted from Bluesky social media posts. It consists of 10 JSONL files of 11.2 GB each, for a total of 112 GB. Each line of each file holds an emoji together with its frequency, name, and Unicode code. The dataset is intended for researchers and developers working on natural language processing (NLP), large language models (LLMs), generative AI, and data analysis. Potential applications include training and fine-tuning LLMs to understand and generate text that contains emojis; improving text-generation models so that they insert contextually relevant emojis; analyzing emoji popularity, usage trends over time, and sentiment; and studying how users interact with emojis in different contexts. The dataset may be used for research and commercial purposes, and users are encouraged to cite the source when using it in their work.
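Because each file is roughly 11.2 GB, loading a whole file into memory at once is impractical. The sketch below streams the JSONL files line by line and sums the frequency of each emoji across files. It assumes each line is a JSON object with keys named `emoji` and `frequency`; those key names, like the helper names `iter_records`, `sum_frequencies`, and `total_frequencies`, are hypothetical, since the dataset description does not state the exact field names.

```python
import json
from collections import Counter
from typing import Iterable, Iterator

# Hypothetical key names: the description says each line holds an emoji,
# its frequency, name, and Unicode code, but does not give the exact keys.
EMOJI_KEY = "emoji"
FREQ_KEY = "frequency"


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON object per non-blank line (JSON Lines format)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def sum_frequencies(records: Iterable[dict]) -> Counter:
    """Add up the frequency field per emoji over an iterable of records."""
    totals: Counter = Counter()
    for record in records:
        totals[record[EMOJI_KEY]] += int(record[FREQ_KEY])
    return totals


def total_frequencies(paths: Iterable[str]) -> Counter:
    """Stream each JSONL file and combine per-emoji frequency totals.

    Files are read one line at a time, so memory use depends on the
    number of distinct emojis rather than on the file size.
    """
    totals: Counter = Counter()
    for path in paths:
        with open(path, encoding="utf-8") as f:
            # Counter.update adds counts rather than overwriting them.
            totals.update(sum_frequencies(iter_records(f)))
    return totals
```

For example, `total_frequencies(paths).most_common(10)`, where `paths` is a list of the downloaded JSONL file paths, would return the ten most frequent emojis across those files. Adjust `EMOJI_KEY` and `FREQ_KEY` once the real field names are confirmed from the files themselves.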
Provided by: TroglodyteDerivations