PHY041/sc4021-travel-opinion-search

Source: Hugging Face. Updated 2026-04-07; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/PHY041/sc4021-travel-opinion-search
Description:

---
license: cc-by-nc-4.0
task_categories:
- text-classification
- feature-extraction
language:
- en
- zh
tags:
- opinion-mining
- sentiment-analysis
- travel
- instagram
- pinterest
- information-retrieval
size_categories:
- 100K<n<1M
---

# SC4021 Travel Opinion Search Engine Dataset

Social media data crawled for the NTU SC4021 Information Retrieval course project: an opinion-aware search engine for content about travel in China.

## Dataset Structure

| Split | Rows | Description |
|-------|------|-------------|
| `ig_posts` | ~105K | Instagram posts with captions, locations, image metadata, quality scores, and VLM classifications |
| `ig_comments` | ~117K | Instagram comments linked to posts |
| `ig_users` | ~121K | Instagram user profiles (username, bio, follower count) |
| `pinterest_pins` | 100K+ | Pinterest pins with image URLs, descriptions, and quality scores |

## Key Fields (ig_posts)

| Field | Type | Description |
|-------|------|-------------|
| `id` | str | Instagram shortcode |
| `caption` / `caption_clean` | str | Original and cleaned caption text |
| `language` | str | Detected language (`en` or `zh`) |
| `province` / `city` | str | Mapped location in China (province and city) |
| `image_category` | str | CLIP zero-shot image category (landscape, food, culture, etc.) |
| `quality_score` | float | VLM image quality assessment |
| `image_description` | str | VLM-generated image description |
| `likes` / `comments_count` | int | Engagement metrics (like and comment counts) |
| `location_lat` / `location_lng` | float | Geolocation (latitude and longitude) |

## Sources

- **Instagram**: travel hashtags (#chinatravel, #travelchina, etc.) and brand accounts
- **Pinterest**: travel photography, fashion, and destination boards

## Usage

```python
from datasets import load_dataset

# Without a split argument, load_dataset returns a DatasetDict keyed by split name.
ds = load_dataset("PHY041/sc4021-travel-opinion-search")
posts = ds["ig_posts"]
comments = ds["ig_comments"]
```

## Citation

NTU SC4021 Information Retrieval Project, AY2025/26 Semester 2.

## License

CC BY-NC 4.0, for academic/research use only.
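The Key Fields table is enough to prototype simple retrieval over `ig_posts`. The following is a minimal sketch, not the project's search engine. It assumes each post is a dict keyed by the documented field names, treats `likes + comments_count` as an engagement score (an assumption; the dataset does not define one), and uses plain case-insensitive substring matching on `caption_clean` instead of a real tokenizer or inverted index. The helper names (`search_posts`, `province_summary`, `engagement`) and the rows in `SAMPLE_POSTS` are hypothetical, not taken from the dataset.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Mapping, Optional, Tuple

Post = Mapping[str, Any]  # one ig_posts row, keyed by the documented field names

# Hypothetical rows for illustration only; not taken from the dataset.
SAMPLE_POSTS: List[Post] = [
    {"id": "a1", "caption_clean": "Sunset over the Great Wall", "language": "en",
     "province": "Beijing", "likes": 120, "comments_count": 8, "quality_score": 0.5},
    {"id": "b2", "caption_clean": "Great Wall hike, crowded but worth it", "language": "en",
     "province": "Beijing", "likes": 300, "comments_count": 25, "quality_score": 0.75},
    {"id": "c3", "caption_clean": "成都火锅太好吃了", "language": "zh",
     "province": "Sichuan", "likes": 50, "comments_count": None, "quality_score": None},
]


def engagement(post: Post) -> int:
    """Assumed engagement score: likes + comments_count, treating missing values as 0."""
    return (post.get("likes") or 0) + (post.get("comments_count") or 0)


def search_posts(posts: Iterable[Post], query: str,
                 province: Optional[str] = None, language: Optional[str] = None,
                 top_k: int = 5) -> List[Post]:
    """Return up to top_k posts whose caption_clean contains every whitespace-separated
    query term (case-insensitive substring match), ranked by engagement."""
    terms = [t.lower() for t in query.split()]
    hits = []
    for post in posts:
        # Optional exact-match filters on the documented metadata fields.
        if province is not None and post.get("province") != province:
            continue
        if language is not None and post.get("language") != language:
            continue
        text = (post.get("caption_clean") or "").lower()
        if all(term in text for term in terms):
            hits.append(post)
    hits.sort(key=engagement, reverse=True)
    return hits[:top_k]


def province_summary(posts: Iterable[Post]) -> Dict[str, Tuple[int, Optional[float]]]:
    """Map each province to (post count, mean quality_score); the mean skips posts
    without a score and is None when no post in that province has one."""
    counts: Dict[str, int] = defaultdict(int)
    score_sum: Dict[str, float] = defaultdict(float)
    scored: Dict[str, int] = defaultdict(int)
    for post in posts:
        prov = post.get("province") or "unknown"
        counts[prov] += 1
        score = post.get("quality_score")
        if score is not None:
            score_sum[prov] += score
            scored[prov] += 1
    return {
        prov: (n, score_sum[prov] / scored[prov] if scored[prov] else None)
        for prov, n in counts.items()
    }


if __name__ == "__main__":
    print([p["id"] for p in search_posts(SAMPLE_POSTS, "great wall")])  # ['b2', 'a1']
    print(province_summary(SAMPLE_POSTS))  # {'Beijing': (2, 0.625), 'Sichuan': (1, None)}
```

With the real data, the same functions can be given `ds["ig_posts"]` from the Usage snippet, since iterating a `datasets.Dataset` yields one dict per row. For ~105K posts this plain-Python scan is easy to read but slow; a real opinion search engine would build an index over `caption_clean` instead. Because Chinese captions are not whitespace-tokenized, a `zh` query here behaves as a single substring.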
Provider:
PHY041