
HSDSLab/RedditMemes

Hugging Face · Updated 2024-07-10 · Indexed 2025-04-26
Download link:
https://hf-mirror.com/datasets/HSDSLab/RedditMemes
Resource description:
---
pretty_name: reddit_memes
size_categories:
- 100K<n<1M
---

## Table of Contents
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)

## Dataset Description

### Dataset Summary

This dataset comprises images collected from Reddit, along with metadata for each post. The data are well suited to analysis of online community interactions and content engagement, particularly where image and text analysis are combined.

### Supported Tasks and Leaderboards

This dataset supports natural language processing and image analysis tasks, including sentiment analysis, trend analysis, and community engagement studies.

### Languages

The text data are primarily in English.

## Dataset Structure

### Data Instances

An example data instance from the dataset might look like this:

```json
{
  "id": "2020-02-02_001",
  "title_raw": "Look at this amazing sunset!",
  "score": 320,
  "num_comments": 45,
  "over18": false,
  "url": "https://reddit.com/r/examplepost",
  "date": "2023-07-04",
  "ocr_raw": "Amazing sunset at the beach",
  "all_text_stemmed": "amaz sunset beach",
  "all_text_processed": "Amazing sunset at the beach",
  "caption": "A beautiful sunset over the ocean",
  "file_name": "a1b2c3.jpg"
}
```

### Data Fields

- `id`: Unique identifier for each post.
- `title_raw`: Original title of the Reddit post.
- `score`: Number of upvotes minus the number of downvotes. Can be negative.
- `num_comments`: Number of comments on the post.
- `over18`: Boolean indicating if the content is for adults only.
- `url`: URL to the original Reddit post.
- `date`: Date the post was uploaded.
- `ocr_raw`: Text extracted from the image using OCR.
- `all_text_stemmed`: Stemmed version of the OCR text.
- `all_text_processed`: Fully processed version of the OCR text.
- `caption`: An image description generated by [BLIP](https://github.com/salesforce/BLIP).
- `file_name`: Name of the file stored locally.

### Data Splits

This dataset does not contain predefined splits such as training, testing, or validation. Users are encouraged to define these splits based on their research needs and methodologies.
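
Because no splits are predefined, one possible approach is to derive them deterministically from each post's `id`, so that the same post always lands in the same split across runs. The sketch below is only an illustration and not part of the dataset card: the function name `assign_split`, the 80/10/10 fractions, and the sample records are all assumptions.

```python
import hashlib


def assign_split(post_id: str, val_frac: float = 0.1, test_frac: float = 0.1) -> str:
    """Map a post id to 'train', 'validation', or 'test' deterministically.

    Hypothetical helper: the fractions are illustrative defaults, not
    values recommended by the dataset authors.
    """
    digest = hashlib.sha256(post_id.encode("utf-8")).digest()
    # Reduce the first 8 bytes of the hash to a bucket in [0, 1) with 1e-4 resolution.
    bucket = (int.from_bytes(digest[:8], "big") % 10_000) / 10_000
    if bucket < test_frac:
        return "test"
    if bucket < test_frac + val_frac:
        return "validation"
    return "train"


# Made-up records that only mimic the shape of the example instance.
records = [{"id": f"2020-02-02_{i:03d}", "score": i} for i in range(1000)]

splits = {"train": [], "validation": [], "test": []}
for rec in records:
    splits[assign_split(rec["id"])].append(rec)

for name, rows in splits.items():
    print(name, len(rows))
```

Hashing the `id` rather than shuffling keeps the assignment stable when records are added or reordered. For trend analysis, a split on the `date` field may be more appropriate, since a hash-based split mixes time periods.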
Provider:
HSDSLab