Hausa-English Code-Switched Dataset
Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/3xjyjsf4sb
Description:
Overview
The Hausa-English Code-Switched Dataset contains comments collected from Facebook, Instagram, YouTube, and Twitter. These comments exhibit code-switching between Hausa and English, providing a rich resource for linguistic research, natural language processing (NLP), and machine translation.
Features
Platform Support: Includes comments from Facebook, Instagram, YouTube, and Twitter.
Multilingual Data: Captures code-switching between Hausa and English, reflecting real-world multilingual usage.
Customizable: The collection approach can be adapted to other language pairs and to specific data collection needs.
Data Collection Process
The dataset was collected using a custom scraper designed to gather code-switched comments from social media platforms. Here’s a brief overview of the process:
Platform Integration: Configured to work with Facebook, Instagram, YouTube, and Twitter APIs.
Multilingual Data Capture: Identified comments with code-switching between Hausa and English.
Configuration: Set up API keys and platform-specific settings.
Execution: Ran the scraper on each platform, collecting and aggregating comments.
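The description does not say how code-switched comments were identified, so the following is only a minimal sketch of one possible filter, not the authors' method. It assumes a comment counts as code-switched when it contains at least one token from each language. The word lists `HAUSA_HINTS` and `ENGLISH_HINTS` and the function `is_code_switched` are hypothetical names introduced for illustration; a real filter would need full lexicons or a trained language identifier.

```python
import re

# Hypothetical, deliberately tiny word lists for illustration only.
HAUSA_HINTS = {
    "ina", "son", "wannan", "yau", "ne", "zamu", "je",
    "gidan", "abinci", "kai", "na", "gode",
}
ENGLISH_HINTS = {
    "song", "it's", "really", "great", "can't", "wait", "video",
    "is", "so", "funny", "for", "sharing", "this", "very", "informative",
}

# Runs of Unicode letters, optionally joined by apostrophes (e.g. "can't").
TOKEN_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def tokenize(comment):
    """Lowercase the comment and split it into word tokens."""
    return TOKEN_RE.findall(comment.lower())


def is_code_switched(comment, min_per_language=1):
    """Return True if the comment has enough tokens from both word lists."""
    tokens = tokenize(comment)
    hausa = sum(token in HAUSA_HINTS for token in tokens)
    english = sum(token in ENGLISH_HINTS for token in tokens)
    return hausa >= min_per_language and english >= min_per_language


if __name__ == "__main__":
    for comment in [
        "Ina son wannan song, it's really great!",
        "This video is so funny!",
    ]:
        print(comment, "->", is_code_switched(comment))
```

A lexicon lookup like this is easy to audit but misses unlisted words and spelling variants, which are common in social media text, so it is best treated as a first-pass filter before manual review.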
Applications
The dataset supports various research and application domains:
Linguistic Analysis: Study code-switching patterns between Hausa and English.
NLP: Train and evaluate models for tasks like language identification and part-of-speech tagging.
Machine Translation: Train translation systems on the paired code-switched comments and English translations.
Sociolinguistic Studies: Explore social and cultural factors influencing code-switching on social media.
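For the machine translation use case, the comment and translation columns can be turned into source-target pairs. The sketch below assumes each row is a dictionary keyed by the column names Comment and English Translation; the helper names `make_parallel_pairs` and `split_pairs`, the 25% test fraction, and the fixed seed are illustrative choices, not part of the dataset.

```python
import random


def make_parallel_pairs(rows):
    """Return (source, target) tuples, skipping rows with an empty side."""
    pairs = []
    for row in rows:
        source = (row.get("Comment") or "").strip()
        target = (row.get("English Translation") or "").strip()
        if source and target:
            pairs.append((source, target))
    return pairs


def split_pairs(pairs, test_fraction=0.25, seed=0):
    """Shuffle reproducibly and split into (train, test) lists."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction)) if shuffled else 0
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    # Rows copied from the example entries in this description.
    rows = [
        {"Comment": "Ina son wannan song, it's really great!",
         "English Translation": "I love this song, it's really great!"},
        {"Comment": "Kai, wannan video is so funny!",
         "English Translation": "Wow, this video is so funny!"},
    ]
    train, test = split_pairs(make_parallel_pairs(rows))
    print(len(train), "training pairs,", len(test), "test pairs")
```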
Dataset Structure
The dataset is organized into a CSV file with the following columns:
Platform: The social media platform (Facebook, Instagram, YouTube, Twitter).
Date: The date the comment was posted.
Time: The time the comment was posted.
User ID: A unique identifier for the user.
Comment: The code-switched comment containing Hausa and English text.
English Translation: The reference English translation of the code-switched comment.
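A file with these columns can be read with Python's standard `csv` module. The sketch below assumes the header row uses the column names exactly as listed and that dates and times follow the `YYYY-MM-DD` and `HH:MM:SS` patterns seen in the example entries. The `posted_at` key and the `load_comments` helper are hypothetical additions, and the inline sample stands in for the real file, whose name is not given here.

```python
import csv
import io
from datetime import datetime

# Sample rows copied from the example entries; in practice, pass an open
# file handle for the downloaded CSV instead of this inline string.
SAMPLE_CSV = """Platform,Date,Time,User ID,Comment,English Translation
Facebook,2023-06-15,14:23:45,user123,"Ina son wannan song, it's really great!","I love this song, it's really great!"
Twitter,2023-06-15,14:23:45,user124,"Yau ne zamu je gidan abinci, can't wait!","Today we are going to the restaurant, can't wait!"
"""


def load_comments(handle):
    """Read rows as dicts and add a combined `posted_at` datetime."""
    rows = []
    for row in csv.DictReader(handle):
        row["posted_at"] = datetime.strptime(
            f'{row["Date"]} {row["Time"]}', "%Y-%m-%d %H:%M:%S"
        )
        rows.append(row)
    return rows


rows = load_comments(io.StringIO(SAMPLE_CSV))
for row in rows:
    print(row["Platform"], row["posted_at"].isoformat(), row["Comment"])
```

Comments contain commas, so the fields must be quoted in the CSV; `csv.DictReader` handles that quoting, which a plain `split(",")` would not.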
Example Entries
| Platform | Date | Time | User ID | Comment | English Translation |
| --- | --- | --- | --- | --- | --- |
| Facebook | 2023-06-15 | 14:23:45 | user123 | Ina son wannan song, it's really great! | I love this song, it's really great! |
| Twitter | 2023-06-15 | 14:23:45 | user124 | Yau ne zamu je gidan abinci, can't wait! | Today we are going to the restaurant, can't wait! |
| Instagram | 2023-06-15 | 14:23:45 | user125 | Kai, wannan video is so funny! | Wow, this video is so funny! |
| YouTube | 2023-06-15 | 14:23:45 | user126 | Na gode for sharing this, very informative! | Thank you for sharing this, very informative! |
Conclusion
The Hausa-English Code-Switched Dataset is a valuable resource for researchers and practitioners in linguistics, NLP, and machine translation. It provides real-world examples of code-switching, supporting the development of robust models and tools for handling multilingual text in diverse contexts. Explore the dataset and contribute to its ongoing development and application.
Created: 2024-07-19



