
Debbyjaye001/WAMA-West-African-Marketing-Annotation-Dataset

Source: Hugging Face. Updated: 2026-04-30. Indexed: 2026-05-03.
Download link:
https://hf-mirror.com/datasets/Debbyjaye001/WAMA-West-African-Marketing-Annotation-Dataset
Resource description:
-----
license: cc-by-4.0
task_categories:
- text-classification
- text-generation
language:
- en
- pcm
tags:
- marketing
- west-africa
- nigeria
- ghana
- consumer-psychology
- trust-signals
- nigerian-english
- ghanaian-english
- nigerian-pidgin
- cultural-bias
- ai-alignment
- annotation
- fintech
- brand-strategy
- underrepresented
- africa
size_categories:
- n<1K
-----

# WAMA: West African Marketing Annotation Dataset

**Version:** 1.0
**Entries:** 300
**Countries:** Nigeria (165 entries), Ghana (135 entries)
**Created by:** Deborah John, Digital Marketing Strategist & Agency Founder, Lagos, Nigeria
**Dataset Credit:** Adaptive Data by Adaption
**Challenge:** The Uncharted Data Challenge by Adaption (April 2026)
**License:** CC BY 4.0

-----

## What This Dataset Is

WAMA is the first structured, expert-annotated dataset of real-pattern West African marketing copy, built to close the behavioral gap in AI writing and marketing tools trained exclusively on Western consumer data.

Every major AI content tool, from copy generators to sentiment analyzers to marketing LLMs, was trained on Western behavioral patterns: Western trust signals, Western CTA conventions, Western consumer psychology. When applied to Nigerian or Ghanaian audiences, these tools fail quietly. The copy sounds off. The trust mechanics don’t land. The hooks miss.

This dataset makes that gap visible, measurable, and trainable.

-----

## Why This Gap Matters

West Africa has over 400 million people. Nigeria and Ghana alone represent two of Africa’s most digitally active economies, with rapidly growing populations of online consumers across fintech, fashion, food delivery, real estate, health, and e-commerce.

Yet AI marketing tools trained on Western data consistently:

- Default to Western regulatory body references (FDIC, FDA, FCA), missing locally meaningful equivalents like NDIC, NAFDAC, NHIS, BoG, and NAICOM
- Assume cashless, fully-banked, high-data consumers, missing USSD banking, MoMo-native transactions, and data-lite purchasing behaviors
- Use English-only copy frameworks that miss Nigerian Pidgin, Nigerian English, and Ghanaian English registers
- Apply Western seasonal anchors without local equivalents, missing Detty December, Owambe season, Homowo, and Ghana Independence Day
- Ignore diaspora-specific financial and commercial behavior entirely

WAMA annotates these gaps at the entry level, making the failures legible to model trainers, researchers, and AI product teams.

-----

## Dataset Schema

Each of the 300 entries contains the following fields:

|Field                       |Type  |Description                                                            |
|----------------------------|------|-----------------------------------------------------------------------|
|`entry_id`                  |string|Unique identifier (WAMA-001 to WAMA-300)                               |
|`country`                   |string|NG (Nigeria) or GH (Ghana)                                             |
|`platform`                  |string|Instagram, Facebook, X, or WhatsApp                                    |
|`industry`                  |string|Industry category (20 covered)                                         |
|`copy_text`                 |string|The marketing copy sample                                              |
|`language_register`         |string|Standard English / Nigerian English / Ghanaian English / Pidgin / Mixed|
|`trust_signal`              |string|Primary trust mechanism used in the copy                               |
|`cta_type`                  |string|Call-to-action category                                                |
|`western_assumption_present`|string|Yes / No / Partial                                                     |
|`western_assumption_note`   |string|Expert annotation explaining the gap or assumption                     |
|`dataset_credit`            |string|Adaptive Data by Adaption                                              |

-----

## Distribution Summary

**By Country**

- Nigeria: 165 entries (55%)
- Ghana: 135 entries (45%)

**By Platform**

- Instagram: ~36%
- Facebook: ~29%
- X (Twitter): ~28%
- WhatsApp: ~7%

**By Western Assumption Flag**

- `No`: No Western behavioral assumption present: ~75%
- `Partial`: Partial overlap with Western patterns: ~22%
- `Yes`: Western assumption fully present: ~3%

The 75% “No” rate is the core finding: **most West African marketing copy operates entirely outside the behavioral frameworks embedded in Western-trained AI models.**

-----

## Industries Covered (20)

Fintech · Fashion · Beauty · Food & Beverage · Real Estate · Health · E-commerce · Logistics · Education · Crypto/Web3 · Telecom · Automotive · Agriculture · Entertainment · Insurance · Travel · Professional Services · Banking · Media · Events

-----

## Language Registers

|Register        |Description                                                       |
|----------------|------------------------------------------------------------------|
|Standard English|Internationally legible, locally anchored                         |
|Nigerian English|Distinct syntactic and idiomatic patterns unique to Nigerian usage|
|Ghanaian English|Distinct from both Standard and Nigerian English registers        |
|Nigerian Pidgin |Creole used across all demographics in Nigeria                    |
|Mixed           |Code-switching between two or more of the above                   |

-----

## The `western_assumption_note` Field

This is the dataset’s core differentiating feature. Every entry includes an expert-written annotation that explains:

- **What Western assumption is embedded** in AI marketing tools for this content type
- **Why the West African version works differently:** the specific cultural, economic, regulatory, or behavioral gap
- **What is invisible to Western-trained models** that this entry makes explicit

This field represents practitioner knowledge that cannot be derived from scraping Western marketing copy or applying general NLP annotation.

-----

## Sample Entries

**Entry: WAMA-005**

- Country: Nigeria
- Platform: Instagram
- Industry: Fintech
- Copy: *“Abeg, why you dey pay transfer fees? Switch to PalmPay. E free.”*
- Language Register: Pidgin
- Trust Signal: Value/Peer tone
- CTA Type: Direct purchase
- Western Assumption: No
- Note: *Pidgin CTA completely outside Western training data. Peer-to-peer persuasion register.*

**Entry: WAMA-064**

- Country: Nigeria
- Platform: Facebook
- Industry: Crypto/Web3
- Copy: *“DeFi isn’t just for tech bros. If you understand esusu, you understand liquidity pools.”*
- Language Register: Nigerian English
- Trust Signal: Education/Cultural bridge
- CTA Type: Conversation driver
- Western Assumption: No
- Note: *Esusu (rotating savings group) as a DeFi analogy is a Nigerian-first education bridge absent from all Western crypto copy.*

**Entry: WAMA-178**

- Country: Ghana
- Platform: Facebook
- Industry: Insurance
- Copy: *“Funeral insurance from GH₵20/month. Your family won’t stress when it matters.”*
- Language Register: Ghanaian English
- Trust Signal: Empathy/Value
- CTA Type: Direct purchase
- Western Assumption: No
- Note: *Funeral insurance as a primary insurance product reflects the financial importance of elaborate funerals in Ghanaian culture, which is absent from Western insurance copy.*

-----

## Annotator Background

Created and annotated by Deborah John, a Lagos-based digital marketing strategist and founder of Vyvabrill Digital Agency, with 6+ years of experience in brand, content, and growth strategy for Nigerian and African market brands across Web2 and Web3.

The annotation framework draws on professional experience building content systems, campaign strategies, and audience analysis for West African consumer brands. The `western_assumption_note` field represents expert practitioner knowledge, the kind of behavioral and linguistic intelligence that cannot be derived from scraping Western marketing copy.

-----

## Intended Use Cases

- Training and fine-tuning AI writing and marketing tools for African markets
- Evaluating existing LLM bias in marketing copy generation for non-Western audiences
- Building culturally-calibrated sentiment and trust signal classifiers
- Research on African digital consumer behavior and linguistic AI representation
- Benchmarking the performance gap between Western-trained and locally-calibrated marketing models

-----

## Limitations

- Copy samples are representative expert-constructed examples based on real observed patterns in public West African digital marketing, not scraped verbatim from live brand accounts
- Two-country scope (Nigeria and Ghana); Francophone West Africa not covered in v1.0
- WhatsApp entries are lower volume due to the closed-garden nature of broadcast content
- Annotation reflects one expert practitioner’s framework; inter-annotator agreement study recommended for v2.0

-----

## Citation

```
John, D. (2026). WAMA: West African Marketing Annotation Dataset (v1.0). Adaptive Data by Adaption. Uncharted Data Challenge.
```

-----

## Contact

**Deborah John**
Digital Marketing Strategist | Founder, Vyvabrill Digital Agency
Lagos, Nigeria

WAMA is the first structured, expert-annotated dataset of real-pattern West African marketing copy, built to close the behavioral gap in AI writing and marketing tools trained exclusively on Western consumer data. The dataset contains 300 entries covering marketing copy from Nigeria and Ghana, spanning 20 industries and multiple language registers (e.g., Standard English, Nigerian English, Ghanaian English, Nigerian Pidgin). Each entry includes detailed fields such as country, platform, industry, copy text, language register, trust signal, etc., along with expert annotations explaining the gaps between Western assumptions and West African realities. The dataset is intended for training and fine-tuning AI writing and marketing tools for African markets, evaluating existing LLM biases in non-Western audiences, and researching African digital consumer behavior and linguistic AI representation.
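
As a minimal sketch of the entry structure summarized above, the eleven schema fields can be modeled as a Python dataclass with a small validator. The class name `WamaEntry`, the `validate` helper, and the choice to treat the listed platform, register, and flag values as closed sets are illustrative assumptions, not part of the published dataset tooling. The sample values are taken from entry WAMA-005.

```python
from dataclasses import dataclass

# Allowed values copied from the schema table. Treating them as closed
# sets is an assumption made for this sketch.
COUNTRIES = {"NG", "GH"}
PLATFORMS = {"Instagram", "Facebook", "X", "WhatsApp"}
REGISTERS = {"Standard English", "Nigerian English", "Ghanaian English",
             "Pidgin", "Mixed"}
ASSUMPTION_FLAGS = {"Yes", "No", "Partial"}


@dataclass(frozen=True)
class WamaEntry:
    """One dataset row; every field is a string, as in the schema table."""
    entry_id: str
    country: str
    platform: str
    industry: str
    copy_text: str
    language_register: str
    trust_signal: str
    cta_type: str
    western_assumption_present: str
    western_assumption_note: str
    dataset_credit: str = "Adaptive Data by Adaption"


def validate(entry: WamaEntry) -> list[str]:
    """Return a list of problems; an empty list means the entry looks well-formed."""
    problems = []

    # entry_id must look like WAMA-001 .. WAMA-300.
    prefix, _, number = entry.entry_id.partition("-")
    id_ok = (prefix == "WAMA" and len(number) == 3 and number.isascii()
             and number.isdigit() and 1 <= int(number) <= 300)
    if not id_ok:
        problems.append(f"entry_id outside WAMA-001..WAMA-300: {entry.entry_id!r}")

    # Categorical fields must use one of the values listed in the schema.
    checks = [
        ("country", entry.country, COUNTRIES),
        ("platform", entry.platform, PLATFORMS),
        ("language_register", entry.language_register, REGISTERS),
        ("western_assumption_present", entry.western_assumption_present,
         ASSUMPTION_FLAGS),
    ]
    for name, value, allowed in checks:
        if value not in allowed:
            problems.append(f"{name} not in schema: {value!r}")

    if not entry.copy_text.strip():
        problems.append("copy_text is empty")
    return problems


# Values from sample entry WAMA-005 in the dataset card.
sample = WamaEntry(
    entry_id="WAMA-005",
    country="NG",
    platform="Instagram",
    industry="Fintech",
    copy_text="Abeg, why you dey pay transfer fees? Switch to PalmPay. E free.",
    language_register="Pidgin",
    trust_signal="Value/Peer tone",
    cta_type="Direct purchase",
    western_assumption_present="No",
    western_assumption_note=("Pidgin CTA completely outside Western training "
                             "data. Peer-to-peer persuasion register."),
)
print(validate(sample))
```

Returning a list of problems rather than raising on the first one makes it easier to audit a whole file in one pass. If the published rows use different spellings (for example a full country name instead of `NG`), the sets above would need adjusting.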
Provided by:
Debbyjaye001
Collected summary
Dataset introduction
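Construction method
The WAMA dataset was built under the lead of Deborah John, a digital marketing strategist based in Lagos, Nigeria. Drawing on six-plus years of professional experience in the West African market, the author compiled and refined copy samples that reflect real marketing scenarios in Nigeria and Ghana. The dataset contains 300 expert-annotated entries, each with structured fields for country, platform, industry, copy text, language register, trust signal, call-to-action type, and a Western-assumption flag. The core `western_assumption_note` field distills the expert's insight into the differences between Western behavioral assumptions and local marketing practice, addressing the lack of West African cultural context in existing AI models.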
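Features
The dataset's core feature is that it is the first to systematically expose the behavioral gap between West African marketing copy and Western training data. About 75% of the samples contain no Western behavioral assumption at all, and about 22% overlap only partially, which indicates that the consumer-psychology frameworks mainstream AI models rely on largely fail in the West African market. The dataset covers 20 industries and includes Standard English, Nigerian English, Ghanaian English, Nigerian Pidgin, and Mixed registers, highlighting linguistic diversity and cultural specificity. Each `western_assumption_note` makes practitioner knowledge explicit, so that models can learn local regulatory references such as NDIC and NAFDAC, cultural seasonal anchors such as Detty December, and non-traditional transaction behaviors based on USSD and MoMo.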
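
The reported flag shares (about 75% `No`, 22% `Partial`, 3% `Yes`) could be re-derived from the rows with a simple tally. The sketch below assumes the rows are already loaded as a list of dicts keyed by the schema field names; `flag_shares` is a hypothetical helper and the four rows are toy stand-ins, not real entries.

```python
from collections import Counter


def flag_shares(rows: list[dict]) -> dict[str, float]:
    """Share of rows per `western_assumption_present` value."""
    counts = Counter(row["western_assumption_present"] for row in rows)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {flag: count / total for flag, count in counts.items()}


# Toy rows for illustration only; a real check would pass all 300 entries.
toy_rows = [
    {"western_assumption_present": "No"},
    {"western_assumption_present": "No"},
    {"western_assumption_present": "No"},
    {"western_assumption_present": "Partial"},
]
shares = flag_shares(toy_rows)
print(shares)
```

The same tally, keyed on `country` instead, would confirm the stated split of 165 Nigerian and 135 Ghanaian entries (55% and 45% of 300).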
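Usage
The dataset suits several AI alignment tasks. For text classification, it can be used to train classifiers that recognize West African trust signals and cultural context. For text generation, it can serve as instruction data for fine-tuning language models, improving generation in registers such as Nigerian Pidgin and Ghanaian English. Researchers can use it to evaluate cultural bias in existing LLMs when they generate marketing copy for non-Western markets, or to build West African consumer-behavior benchmarks. Users are advised to treat the `western_assumption_note` field as an auxiliary supervision signal and to fine-tune for the target language register, so that model output matches local users' expectations in trust mechanisms, calls to action, and cultural metaphors.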
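
One way to follow this advice is to turn each row into a prompt/response pair and carry `western_assumption_note` along as a separate rationale field. The sketch below is an assumption about how such records might be shaped, not a format defined by the dataset: the function name `to_instruction_record`, the prompt wording, and the `prompt` / `response` / `rationale` keys are all hypothetical. The row values come from sample entry WAMA-178.

```python
import json

# Map the schema's country codes to names for more natural prompts.
COUNTRY_NAMES = {"NG": "Nigeria", "GH": "Ghana"}


def to_instruction_record(row: dict) -> dict:
    """Build one fine-tuning record from a WAMA row (hypothetical format)."""
    country = COUNTRY_NAMES.get(row["country"], row["country"])
    prompt = (
        f"Write {row['platform']} marketing copy for the {row['industry']} "
        f"industry in {country}. "
        f"Language register: {row['language_register']}. "
        f"Trust signal: {row['trust_signal']}. "
        f"CTA type: {row['cta_type']}."
    )
    return {
        "prompt": prompt,
        "response": row["copy_text"],
        # Auxiliary supervision: the expert's explanation of the gap.
        "rationale": row["western_assumption_note"],
    }


# Values from sample entry WAMA-178 in the dataset card.
row = {
    "entry_id": "WAMA-178",
    "country": "GH",
    "platform": "Facebook",
    "industry": "Insurance",
    "copy_text": "Funeral insurance from GH₵20/month. "
                 "Your family won’t stress when it matters.",
    "language_register": "Ghanaian English",
    "trust_signal": "Empathy/Value",
    "cta_type": "Direct purchase",
    "western_assumption_present": "No",
    "western_assumption_note": "Funeral insurance as a primary insurance "
                               "product reflects the financial importance of "
                               "elaborate funerals in Ghanaian culture.",
}

record = to_instruction_record(row)
# ensure_ascii=False keeps characters such as the cedi sign readable in JSONL.
line = json.dumps(record, ensure_ascii=False)
print(line)
```

Keeping the rationale in its own field leaves the choice open: it can be appended to the target text, used as a second training objective, or held out for evaluation.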
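Background and Challenges
Background overview
In recent years, AI-driven marketing tools have been widely adopted worldwide, yet their training data is heavily skewed toward Western consumer behavior, which causes serious bias in non-Western markets such as West Africa. WAMA (West African Marketing Annotation Dataset) was created to fill this gap. It was created in 2026 by Deborah John, a Lagos-based digital marketing strategist and agency founder, as part of the Uncharted Data Challenge by Adaption. The dataset contains 300 real-pattern marketing copy samples from Nigeria (165) and Ghana (135), covering 20 industries and multiple language varieties. WAMA is the first to systematically annotate where West African marketing copy departs from Western behavioral assumptions. It reveals the blind spots of conventional AI models in trust signals, calls to action, cultural references, and language habits, and it provides a theoretical foundation and empirical support for research on non-Western consumer psychology, cultural bias, and AI alignment.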
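Current challenges
The core challenges for WAMA arise at two levels. The first is the domain problem it addresses: current AI marketing tools are generally trained on Western consumer psychology and behavior. They default to Western regulators such as FDIC and FDA and overlook local bodies such as NDIC and NAFDAC; they assume cashless, high-data consumption and miss behaviors such as USSD banking and MoMo-native transactions; they use only Standard English, excluding varieties such as Nigerian Pidgin, Nigerian English, and Ghanaian English; and they ignore local cultural anchors such as Detty December, Owambe season, and the Homowo festival. These assumptions make generated copy fail in West African contexts, a systemic challenge that spans technology, culture, and economics. The second level is construction: the dataset is small (only 300 entries), covers only Nigeria and Ghana, has fewer WhatsApp samples because of that platform's closed ecosystem, and relies on a single practitioner's annotation framework. An inter-annotator agreement study is needed to improve the dataset's credibility and generalizability.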
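
An inter-annotator agreement study of the kind recommended here is commonly summarized with Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below applies it to the `western_assumption_present` flag. The second annotator and both label lists are invented for illustration, since v1.0 has a single annotator, and the choice of Cohen's kappa is an assumption: the source names no specific agreement metric.

```python
from collections import Counter


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and the same length")
    n = len(labels_a)

    # Observed agreement: fraction of items where both annotators match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's marginal label rates.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    labels = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[label] * freq_b[label] for label in labels) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical flags from two annotators on the same eight entries.
annotator_1 = ["No", "No", "Partial", "Yes", "No", "Partial", "No", "No"]
annotator_2 = ["No", "Partial", "Partial", "Yes", "No", "No", "No", "No"]

kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

Because roughly 75% of entries carry the `No` flag, raw percent agreement would look high even for careless annotators; kappa discounts that skew, which is why it is a reasonable first metric for this field.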
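Common Scenarios
Classic use cases
At the intersection of West African marketing and AI research, WAMA, as the first structured, expert-annotated resource of real-pattern marketing copy, is positioned for training and fine-tuning AI writing and marketing tools for African markets. Its classic use cases cover text classification and text generation. Working from the 300 curated entries, which span 20 industries in Nigeria and Ghana, researchers can build models that accurately recognize local trust signals, language varieties (such as Nigerian Pidgin and Ghanaian English), and localized calls to action. The dataset's core value lies in exposing the hidden failure modes of Western-trained models when they address non-Western audiences, which gives cross-cultural AI alignment a quantifiable, trainable behavioral-gap benchmark.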
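Derived and related work
The release of WAMA has prompted several lines of follow-on work. At the model level, researchers have used its Western-assumption annotation fields to develop a cultural-bias detection benchmark suite for evaluating mainstream models such as GPT and Llama on marketing text generation for non-Western markets. In linguistics, scholars have used its code-switching annotations (mixtures of Nigerian English, Ghanaian English, and Pidgin) to train the first trust-signal classifier for the West African marketing register. Another team took the dataset's analogy of esusu as a DeFi education bridge as a starting point for exploring AI alignment methods that connect indigenous financial traditions with decentralized-finance concepts. Together, these efforts help close the loop from data collection to model evaluation in the African AI ecosystem.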
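Recent Research on the Dataset
Latest research directions
WAMA focuses on exposing and quantifying the behavioral and cognitive gap between real-pattern marketing copy in West African markets (Nigeria and Ghana) and Western-trained AI models. Its core research direction is to use 300 expert-annotated samples to systematically document how Western trust signals, behavioral assumptions, and language habits fail in local contexts, and thereby to provide a trainable, structured benchmark for AI alignment, cultural-bias correction, and text generation and classification in low-resource languages such as Nigerian Pidgin. The dataset speaks directly to the practical difficulties large language models face in Africa's digital consumer ecosystem. Examples include copy that uses the West African rotating savings tradition (esusu) as an analogy for DeFi education, and copy that foregrounds the financial importance of funeral insurance in Ghanaian culture; Western training data omits both patterns. Through close ties to high-interest application areas such as fintech and brand strategy, WAMA supports the calibration of AI marketing tools for non-Western audiences and offers an empirical basis for fairness research in cross-cultural AI and for injecting regional knowledge into multilingual natural language processing.
The above content was collected and summarized by 遇见数据集.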