Hausa-English Code-Switched Dataset

Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/3xjyjsf4sb
Description:
Overview

The Hausa-English Code-Switched Dataset contains comments collected from Facebook, Instagram, YouTube, and Twitter. These comments exhibit code-switching between Hausa and English, providing a rich resource for linguistic research, natural language processing (NLP), and machine translation.

Features

- Platform Support: Includes comments from Facebook, Instagram, YouTube, and Twitter.
- Multilingual Data: Captures code-switching between Hausa and English, reflecting real-world multilingual usage.
- Customizable: Adaptable for other language combinations and specific data collection needs.

Data Collection Process

The dataset was collected using a custom scraper designed to gather code-switched comments from social media platforms. Here's a brief overview of the process:

- Platform Integration: Configured to work with Facebook, Instagram, YouTube, and Twitter APIs.
- Multilingual Data Capture: Identified comments with code-switching between Hausa and English.
- Configuration: Set up API keys and platform-specific settings.
- Execution: Ran the scraper on each platform, collecting and aggregating comments.

Applications

The dataset supports various research and application domains:

- Linguistic Analysis: Study code-switching patterns between Hausa and English.
- NLP: Train and evaluate models for tasks like language identification and part-of-speech tagging.
- Machine Translation: Provides parallel data for training translation systems.
- Sociolinguistic Studies: Explore social and cultural factors influencing code-switching on social media.

Dataset Structure

The dataset is organized into a CSV file with the following columns:

- Platform: The social media platform (Facebook, Instagram, YouTube, Twitter).
- Date: The date the comment was posted.
- Time: The time the comment was posted.
- User ID: A unique identifier for the user.
- Comment: The code-switched comment containing Hausa and English text.
- English Translation: The correct English translation of the code-switched comment.

Example Entries

| Platform | Date | Time | User ID | Comment | English Translation |
|---|---|---|---|---|---|
| Facebook | 2023-06-15 | 14:23:45 | user123 | Ina son wannan song, it's really great! | I love this song, it's really great! |
| Twitter | 2023-06-15 | 14:23:45 | user124 | Yau ne zamu je gidan abinci, can't wait! | Today we are going to the restaurant, can't wait! |
| Instagram | 2023-06-15 | 14:23:45 | user125 | Kai, wannan video is so funny! | Wow, this video is so funny! |
| YouTube | 2023-06-15 | 14:23:45 | user126 | Na gode for sharing this, very informative! | Thank you for sharing this, very informative! |

Conclusion

The Hausa-English Code-Switched Dataset is a valuable resource for researchers and practitioners in linguistics, NLP, and machine translation. It provides real-world examples of code-switching, supporting the development of robust models and tools for handling multilingual text in diverse contexts. Explore the dataset and contribute to its ongoing development and application.
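As a rough illustration of how the CSV described above might be read, the following Python sketch parses rows with the six listed columns, counts comments per platform, and extracts (comment, translation) pairs for translation work. It is a minimal sketch, not the authors' tooling: the exact header spelling in the published file is an assumption, the helper names `load_rows` and `translation_pairs` are hypothetical, and the sample rows are the four example entries from the description rather than the real file.

```python
import csv
import io
from collections import Counter

# Sample rows copied from the example entries in the description.
# Assumption: the real file's header matches these column names exactly.
SAMPLE_CSV = '''Platform,Date,Time,User ID,Comment,English Translation
Facebook,2023-06-15,14:23:45,user123,"Ina son wannan song, it's really great!","I love this song, it's really great!"
Twitter,2023-06-15,14:23:45,user124,"Yau ne zamu je gidan abinci, can't wait!","Today we are going to the restaurant, can't wait!"
Instagram,2023-06-15,14:23:45,user125,"Kai, wannan video is so funny!","Wow, this video is so funny!"
YouTube,2023-06-15,14:23:45,user126,"Na gode for sharing this, very informative!","Thank you for sharing this, very informative!"
'''


def load_rows(handle):
    """Read the CSV into a list of dicts keyed by column name (hypothetical helper)."""
    return list(csv.DictReader(handle))


def translation_pairs(rows):
    """Return (code-switched comment, English translation) tuples (hypothetical helper)."""
    return [(row["Comment"], row["English Translation"]) for row in rows]


rows = load_rows(io.StringIO(SAMPLE_CSV))

# Number of comments collected from each platform.
per_platform = Counter(row["Platform"] for row in rows)

# Source/target pairs that a translation model could train on.
pairs = translation_pairs(rows)
```

To run this against the downloaded file instead of the inline sample, pass an opened file to `load_rows`, for example `open(path, newline="", encoding="utf-8")`, where `path` is wherever the CSV was saved. Comments contain commas, so the file needs standard CSV quoting for `csv.DictReader` to split fields correctly.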
Created:
2024-07-19