
krajnish95/SOQL-dataset

Source: Hugging Face. Updated 2025-12-05; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/krajnish95/SOQL-dataset
Resource description:
# SOQL Mega Dataset Documentation

This document explains the contents of the **15,000‑record SOQL training dataset**. It summarizes **what types of SOQL queries**, **which Salesforce objects**, and **which SOQL features** are included.

---

## 📁 1. Dataset Overview

- **File Name:** `soql_dataset_full_15000.json`
- **Total Records:** 15,000
- **Format for each entry:**

```json
{
  "instruction": "<plain English instruction>",
  "input": "",
  "output": "<SOQL query>"
}
```

- Designed specifically for **LLM training** to convert natural language → SOQL.

---

## 📌 2. Salesforce Clouds Covered

The dataset includes SOQL examples from **all major Salesforce clouds**:

### ✔ Sales Cloud
Account, Contact, Lead, Opportunity, OpportunityLineItem, Product2, Pricebook2, PricebookEntry, Campaign, CampaignMember, Quote, QuoteLineItem, Order, OrderItem, Contract, Asset, User

### ✔ Service Cloud
Case, CaseComment, Solution, EmailMessage, LiveChatTranscript, LiveChatVisitor, Entitlement, EntitlementProcess

### ✔ Field Service
WorkOrder, WorkOrderLineItem, ServiceAppointment, ServiceResource, ServiceTerritory, ServiceTerritoryMember, WorkType, RoutingRule

### ✔ CPQ
SBQQ__Quote__c, SBQQ__QuoteLine__c, ContractedPrice, Subscription

### ✔ Marketing / Pardot
ListEmail, EmailTemplate, Pardot_Prospect__c, MarketingForm__c

### ✔ Commerce Cloud
Cart__c, CartItem__c, InventoryItem__c, ProductCategory__c

### ✔ Experience Cloud
Network, NetworkMember, Site, AuthSession, UserPreference

### ✔ Einstein / AI / Analytics
PredictionResult__c, MlRecommendation__c, EinsteinActivity__c, Dataset__x

### ✔ Content / Knowledge
KnowledgeArticleVersion, Knowledge__kav, ContentVersion, ContentDocument

### ✔ Security & Metadata
PermissionSet, PermissionSetAssignment, AccountShare, ContactShare, OpportunityShare, CustomMetadata__mdt, CustomPermission__c, ApexClass, ApexTrigger, Layout

### ✔ Big Objects
EventArchive__b, CustomerActivity__b, LoginHistoryArchive__b, CaseHistoryArchive__b

### ✔ Custom Objects (examples)
Invoice__c, InvoiceLine__c, CreditMemoLine__c, ConsumptionRate__c, Document__c, ApptBundleAggrDurDnscale__c

### ✔ 200+ Synthetic Platform Objects
To generalize LLM behavior:
- SalesObject1 … SalesObject20
- ServiceObject1 … ServiceObject20
- CPQObject1 … CPQObject20
- etc.

---

## 📌 3. SOQL Query Types Covered

### ✔ Basic SELECT Queries
- Simple equality
- Numeric comparisons
- String filters
- Email filters
- Null checks (`= NULL`, `!= NULL`)

### ✔ WHERE Clause Variants
- `LIKE '%keyword%'`
- `IN (1,2,3)`
- `INCLUDES ('Value')` (multi-select picklists)
- Date filters
- Date literals: `LAST_N_DAYS:30`, `YESTERDAY`, `THIS_YEAR`

---

## 📌 4. Aggregates Included

### ✔ Aggregate Functions
- `COUNT()`
- `COUNT_DISTINCT()`
- `SUM()`
- `AVG()`
- `MAX()`
- `MIN()`

### ✔ GROUP BY / HAVING
Examples include:

```sql
SELECT StageName, COUNT(Id)
FROM Opportunity
GROUP BY StageName

SELECT Industry, SUM(AnnualRevenue)
FROM Account
GROUP BY Industry
HAVING SUM(AnnualRevenue) > 1000000
```

### ✔ ROLLUP & GROUPING
Examples:

```sql
SELECT Industry, Type, SUM(AnnualRevenue)
FROM Account
GROUP BY ROLLUP(Industry, Type)

SELECT Industry, GROUPING(Industry) grp
FROM Account
GROUP BY ROLLUP(Industry)
```

---

## 📌 5. Advanced SOQL Features

### ✔ TYPEOF
Polymorphic queries on the `Who` / `What` relationships (the `WhoId` / `WhatId` fields), for example on `Task`:

```sql
SELECT
  TYPEOF Who
    WHEN Contact THEN LastName, Email
    WHEN Lead THEN Company
  END
FROM Task
```

### ✔ USING SCOPE / ALL ROWS
Soft-deleted record access. In SOQL these are two separate clauses: `ALL ROWS` includes soft-deleted and archived records, while `USING SCOPE` restricts results to a filter scope such as `Mine`.

### ✔ FOR VIEW / FOR REFERENCE / FOR UPDATE
- `FOR VIEW` (updates `LastViewedDate` and recently viewed data)
- `FOR REFERENCE` (updates `LastReferencedDate`)
- `FOR UPDATE` (record locking)

---

## 📌 6. Relationship Queries

### ✔ Parent → Child Subqueries

The subquery uses the child relationship name (`Cases`), not the object name (`Case`):

```sql
SELECT Name, (SELECT Subject FROM Cases WHERE Status = 'New')
FROM Account
```

### ✔ Semi-Join / Anti-Join

```sql
SELECT Id FROM Account
WHERE Id IN (SELECT AccountId FROM Contact)

SELECT Id FROM Account
WHERE Id NOT IN (SELECT AccountId FROM Opportunity)
```

### ✔ Deep Multi-Level Chains (3–5 levels)
Examples include:
- Account → Contact → Case → CaseComment
- Account → Contact → Case → FeedItem → FeedComment
- Account → Opportunity → OpportunityLineItem → Product2

---

## 📌 7. Big Object Queries

```sql
SELECT Id, EventDate__c
FROM EventArchive__b
WHERE EventDate__c > 2021-01-01T00:00:00Z
```

Big objects support only a **restricted subset of SOQL**, which is represented in the dataset.

---

## 📌 8. Query Variations Included
- `ORDER BY`
- `LIMIT`
- `OFFSET`
- `FIELDS(ALL)`
- Currency conversion: `convertCurrency()`
- Timezone conversion: `convertTimezone()`

---

## 📌 9. Dataset Goals

This dataset was intentionally built to help LLMs learn to:

1. Convert plain English instructions → valid SOQL.
2. Understand objects, fields, and query patterns.
3. Handle complex SOQL operations & multi-object reasoning.
4. Work with both standard & custom object patterns.
5. Generalize to unseen objects using synthetic ones.

---

## ✔ Final Notes
- The dataset intentionally mixes real + synthetic objects to maximize generalization.
- All SOQL examples are syntactically plausible and demonstrate Salesforce querying concepts.
- The dataset is intended for fine‑tuning LLMs and as example data for RAG systems.

---

**License:** MIT
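
As a rough illustration of the entry format described in the Dataset Overview, the following Python sketch checks that records have the documented `instruction` / `input` / `output` keys, tags each query with a few of the SOQL features listed above, and formats a record as a training prompt. The sample instructions, the `load_records` helper, the feature keywords, and the prompt template are all assumptions made for this example; only the three keys and the two queries come from this document. It also assumes the file is a single JSON array, which the documentation does not state.

```python
import json

# Hypothetical sample in the documented {"instruction", "input", "output"} shape.
# The instruction texts are invented for illustration; the two queries are
# copied from the examples in this document.
SAMPLE_JSON = """
[
  {"instruction": "Count opportunities by stage.",
   "input": "",
   "output": "SELECT StageName, COUNT(Id) FROM Opportunity GROUP BY StageName"},
  {"instruction": "List accounts that have at least one contact.",
   "input": "",
   "output": "SELECT Id FROM Account WHERE Id IN (SELECT AccountId FROM Contact)"}
]
"""

REQUIRED_KEYS = {"instruction", "input", "output"}

# Coarse keyword checks (an assumption, not a SOQL parser).
FEATURE_KEYWORDS = [
    ("group_by", "GROUP BY"),
    ("having", "HAVING"),
    ("subquery", "(SELECT"),
    ("typeof", "TYPEOF"),
]


def load_records(path):
    """Load records from a file, assuming it holds one JSON array."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def validate_record(record):
    """Return True if the record has exactly the documented string keys
    and its output looks like a SELECT statement."""
    if not isinstance(record, dict) or set(record) != REQUIRED_KEYS:
        return False
    if not all(isinstance(record[key], str) for key in REQUIRED_KEYS):
        return False
    return record["output"].lstrip().upper().startswith("SELECT")


def tag_features(query):
    """Return the names of the keyword-matched features found in the query."""
    upper = query.upper()
    return [name for name, keyword in FEATURE_KEYWORDS if keyword in upper]


def to_prompt(record):
    """Format a record with a hypothetical instruction/response template."""
    return (
        "### Instruction:\n" + record["instruction"]
        + "\n\n### Response:\n" + record["output"]
    )


records = json.loads(SAMPLE_JSON)
for record in records:
    print(validate_record(record), tag_features(record["output"]))
```

To run the same checks on the real file, `load_records("soql_dataset_full_15000.json")` could replace the inline sample, provided the file really is a JSON array. Keyword matching only approximates feature detection; a string literal containing `HAVING`, for instance, would be mis-tagged.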
Provided by:
krajnish95