krajnish95/SOQL-dataset
Hugging Face · Updated 2025-12-05 · Indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/krajnish95/SOQL-dataset
Resource description:
# SOQL Mega Dataset Documentation
This document explains the contents of the **15,000‑record SOQL training dataset**.
It summarizes **which types of SOQL queries**, **which Salesforce objects**, and **which SOQL features** are included.
---
## 📁 1. Dataset Overview
- **File Name:** `soql_dataset_full_15000.json`
- **Total Records:** 15,000
- **Format for each entry:**
```json
{
  "instruction": "<plain English instruction>",
  "input": "",
  "output": "<SOQL query>"
}
```
- Designed specifically for **LLM training** to convert natural language → SOQL.
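
A minimal sketch of loading and sanity-checking the file. It assumes the file is a single top-level JSON array of entries with the three keys shown above; the card does not say whether the file is a JSON array or JSON Lines. The names `load_records` and `check_record` are illustrative and not part of the dataset.

```python
import json

REQUIRED_KEYS = {"instruction", "input", "output"}

def check_record(record):
    """Return a list of problems found in one entry (empty if it looks fine)."""
    if not isinstance(record, dict):
        return ["entry is not a JSON object"]
    problems = []
    missing = REQUIRED_KEYS - set(record)
    if missing:
        problems.append("missing keys: " + ", ".join(sorted(missing)))
    for key in sorted(REQUIRED_KEYS & set(record)):
        if not isinstance(record[key], str):
            problems.append(f"{key} is not a string")
    output = record.get("output")
    # Assumption: every SOQL query in the dataset begins with SELECT.
    if isinstance(output, str) and not output.lstrip().upper().startswith("SELECT"):
        problems.append("output does not start with SELECT")
    return problems

def load_records(path="soql_dataset_full_15000.json"):
    """Load the dataset, assuming a top-level JSON array (an assumption)."""
    with open(path, encoding="utf-8") as handle:
        records = json.load(handle)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    return records
```

Running `check_record` over every loaded entry and counting the non-empty results gives a quick estimate of how many entries deviate from the documented format.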
---
## 📌 2. Salesforce Clouds Covered
The dataset includes SOQL examples from **all major Salesforce clouds**:
### ✔ Sales Cloud
Account, Contact, Lead, Opportunity, OpportunityLineItem
Product2, Pricebook2, PricebookEntry
Campaign, CampaignMember, Quote, QuoteLineItem
Order, OrderItem, Contract, Asset, User
### ✔ Service Cloud
Case, CaseComment, Solution
EmailMessage, LiveChatTranscript, LiveChatVisitor
Entitlement, EntitlementProcess
### ✔ Field Service
WorkOrder, WorkOrderLineItem
ServiceAppointment, ServiceResource
ServiceTerritory, ServiceTerritoryMember
WorkType, RoutingRule
### ✔ CPQ
SBQQ__Quote__c, SBQQ__QuoteLine__c
ContractedPrice, Subscription
### ✔ Marketing / Pardot
ListEmail
EmailTemplate
Pardot_Prospect__c
MarketingForm__c
### ✔ Commerce Cloud
Cart__c, CartItem__c
InventoryItem__c
ProductCategory__c
### ✔ Experience Cloud
Network, NetworkMember
Site, AuthSession
UserPreference
### ✔ Einstein / AI / Analytics
PredictionResult__c
MlRecommendation__c
EinsteinActivity__c
Dataset__x
### ✔ Content / Knowledge
KnowledgeArticleVersion
Knowledge__kav
ContentVersion, ContentDocument
### ✔ Security & Metadata
PermissionSet, PermissionSetAssignment
AccountShare, ContactShare, OpportunityShare
CustomMetadata__mdt, CustomPermission__c
ApexClass, ApexTrigger, Layout
### ✔ Big Objects
EventArchive__b
CustomerActivity__b
LoginHistoryArchive__b
CaseHistoryArchive__b
### ✔ Custom Objects (examples)
Invoice__c
InvoiceLine__c
CreditMemoLine__c
ConsumptionRate__c
Document__c
ApptBundleAggrDurDnscale__c
### ✔ 200+ Synthetic Platform Objects
To generalize LLM behavior:
- SalesObject1 … SalesObject20
- ServiceObject1 … ServiceObject20
- CPQObject1 … CPQObject20
- etc.
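
The naming pattern above can be sketched in a few lines, for example to hold out synthetic objects when measuring generalization. Only the three prefixes named in the list are used here; the prefixes hidden behind "etc." are not known from this card, and the regular expression is a guess at the general pattern.

```python
import re

# Assumption: every synthetic object is named <Prefix>Object<N>.
SYNTHETIC_NAME = re.compile(r"^[A-Za-z]+Object\d+$")

def synthetic_names(prefixes=("Sales", "Service", "CPQ"), count=20):
    """Enumerate names such as SalesObject1 ... SalesObject20 per prefix."""
    return [f"{prefix}Object{n}" for prefix in prefixes for n in range(1, count + 1)]

def is_synthetic(name):
    """Heuristically decide whether an object name follows the synthetic pattern."""
    return bool(SYNTHETIC_NAME.match(name))
```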
---
## 📌 3. SOQL Query Types Covered
### ✔ Basic SELECT Queries
- Simple equality
- Numeric comparisons
- String filters
- Email filters
- Null checks (`= NULL`, `!= NULL`)
### ✔ WHERE Clause Variants
- `LIKE '%keyword%'`
- `IN (1,2,3)`
- `INCLUDES ('Value')` (multi-select picklists)
- Date filters
- Date literals: `LAST_N_DAYS:30`, `YESTERDAY`, `THIS_YEAR`
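
To see how often each of these filter styles occurs in the `output` field, a rough regex-based tagger is enough. This is a heuristic sketch: the feature names are made up for illustration, only the three date literals listed above are recognized, and keywords that appear inside string literals would be counted too.

```python
import re

# Hypothetical feature labels mapped to loose patterns; not a SOQL parser.
WHERE_FEATURES = {
    "like": re.compile(r"\bLIKE\s*'", re.IGNORECASE),
    "in": re.compile(r"\bIN\s*\(", re.IGNORECASE),
    "includes": re.compile(r"\bINCLUDES\s*\(", re.IGNORECASE),
    "null_check": re.compile(r"(?:!=|=)\s*NULL\b", re.IGNORECASE),
    "date_literal": re.compile(
        r"\b(?:LAST_N_DAYS:\d+|YESTERDAY|THIS_YEAR)\b", re.IGNORECASE
    ),
}

def where_features(soql):
    """Return the sorted names of the WHERE-clause features found in a query."""
    return sorted(name for name, pattern in WHERE_FEATURES.items() if pattern.search(soql))
```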
---
## 📌 4. Aggregates Included
### ✔ Aggregate Functions
- `COUNT()`
- `COUNT_DISTINCT()`
- `SUM()`
- `AVG()`
- `MAX()`
- `MIN()`
### ✔ GROUP BY / HAVING
Examples include:
```sql
SELECT StageName, COUNT(Id) FROM Opportunity GROUP BY StageName
SELECT Industry, SUM(AnnualRevenue) FROM Account GROUP BY Industry HAVING SUM(AnnualRevenue) > 1000000
```
### ✔ ROLLUP & GROUPING
Examples:
```sql
SELECT Industry, Type, SUM(AnnualRevenue) FROM Account GROUP BY ROLLUP(Industry, Type)
SELECT Industry, GROUPING(Industry) grp FROM Account GROUP BY ROLLUP(Industry)
```
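
The aggregate examples above all share one shape, which the following sketch assembles from its parts. The helper `aggregate_query` is hypothetical and does no validation or escaping; it only illustrates how `GROUP BY`, `HAVING`, and `ROLLUP` fit together.

```python
def aggregate_query(sobject, group_fields, aggregate, having=None, rollup=False):
    """Build a SOQL aggregate query string from its parts (no validation)."""
    fields = ", ".join(group_fields)
    # ROLLUP wraps the same field list that plain GROUP BY would use.
    grouping = f"ROLLUP({fields})" if rollup else fields
    query = f"SELECT {fields}, {aggregate} FROM {sobject} GROUP BY {grouping}"
    if having:
        query += f" HAVING {having}"
    return query

print(aggregate_query("Opportunity", ["StageName"], "COUNT(Id)"))
print(aggregate_query("Account", ["Industry", "Type"], "SUM(AnnualRevenue)", rollup=True))
```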
---
## 📌 5. Advanced SOQL Features
### ✔ TYPEOF
Polymorphic queries on the `Who` / `What` relationships (the `WhoId` / `WhatId` lookups):
```sql
SELECT Id, TYPEOF Who WHEN Contact THEN LastName, Email WHEN Lead THEN Company END FROM Task
```
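
The `TYPEOF` clause above has a regular structure: one `WHEN <object> THEN <fields>` branch per possible type, an optional `ELSE`, and a closing `END`. A small sketch that assembles it from a mapping (the helper name `typeof_clause` is illustrative):

```python
def typeof_clause(relationship, when_fields, else_fields=None):
    """Build a TYPEOF ... END clause from {object name: [field, ...]}."""
    parts = [f"TYPEOF {relationship}"]
    for sobject, fields in when_fields.items():
        parts.append(f"WHEN {sobject} THEN {', '.join(fields)}")
    if else_fields:
        parts.append(f"ELSE {', '.join(else_fields)}")
    parts.append("END")
    return " ".join(parts)

clause = typeof_clause("Who", {"Contact": ["LastName", "Email"], "Lead": ["Company"]})
# Querying Task is an assumption; Who is also available on Event.
print(f"SELECT Id, {clause} FROM Task")
```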
### ✔ USING SCOPE ALL ROWS
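Access to soft-deleted records. In Salesforce's own SOQL syntax, `ALL ROWS` is a standalone clause (used in Apex queries) rather than a `USING SCOPE` value; `USING SCOPE` takes a filter scope such as `Mine` or `Team`.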
### ✔ FOR VIEW / FOR REFERENCE / FOR UPDATE
- `FOR VIEW` (updates last-viewed information such as `LastViewedDate`)
- `FOR REFERENCE` (updates `LastReferencedDate` to mark the record as referenced)
- `FOR UPDATE` (record locking)
---
## 📌 6. Relationship Queries
### ✔ Parent → Child Subqueries
```sql
SELECT Name, (SELECT Subject FROM Cases WHERE Status = 'New') FROM Account
```
### ✔ Semi-Join / Anti-Join
```sql
SELECT Id FROM Account WHERE Id IN (SELECT AccountId FROM Contact)
SELECT Id FROM Account WHERE Id NOT IN (SELECT AccountId FROM Opportunity)
```
### ✔ Deep Multi-Level Chains (3–5 levels)
Examples include:
- Account → Contact → Case → CaseComment
- Account → Contact → Case → FeedItem → FeedComment
- Account → Opportunity → OpportunityLineItem → Product2
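
One way to check how deep a given query nests is to count parenthesized `SELECT`s. The sketch below is a heuristic and not a SOQL parser: it ignores string literals, and it measures only parent → child subquery nesting, so a chain that uses child → parent dot notation (as `OpportunityLineItem` → `Product2` likely does) is not counted. A chain of N objects expressed purely as nested subqueries has depth N - 1.

```python
import re

# Matches "(SELECT" (a subquery opening), any other "(", or ")".
TOKEN = re.compile(r"\(\s*SELECT\b|\(|\)", re.IGNORECASE)

def subquery_depth(soql):
    """Return the maximum nesting depth of parenthesized SELECT subqueries."""
    depth = 0
    max_depth = 0
    opened = []  # one flag per open paren: True if it started a subquery
    for match in TOKEN.finditer(soql):
        token = match.group(0)
        if token == ")":
            if opened and opened.pop():
                depth -= 1
        elif token == "(":
            opened.append(False)  # function call or value list
        else:
            opened.append(True)
            depth += 1
            max_depth = max(max_depth, depth)
    return max_depth
```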
---
## 📌 7. Big Object Queries
```sql
SELECT Id, EventDate__c FROM EventArchive__b WHERE EventDate__c > 2021-01-01T00:00:00Z
```
Big objects support only a **restricted subset of SOQL**, and the dataset includes queries written for that subset.
---
## 📌 8. Query Variations Included
- `ORDER BY`
- `LIMIT`
- `OFFSET`
- `FIELDS(ALL)`
- Currency conversion: `convertCurrency()`
- Timezone conversion: `convertTimezone()`
---
## 📌 9. Dataset Goals
This dataset was intentionally built to help LLMs learn:
1. Convert plain English instructions → valid SOQL.
2. Understand objects, fields, and query patterns.
3. Handle complex SOQL operations & multi-object reasoning.
4. Work with both standard & custom object patterns.
5. Generalize to unseen objects using synthetic ones.
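
For the first goal, each entry has to be turned into a prompt and a target completion before fine-tuning. The sketch below uses an Alpaca-style template with `### Instruction:` and `### Response:` headers. That template is an assumption chosen because the entry keys match that format; the card does not prescribe one.

```python
def to_prompt_pair(record):
    """Turn one dataset entry into a (prompt, completion) pair."""
    instruction = record["instruction"].strip()
    extra = record.get("input", "").strip()
    prompt = f"### Instruction:\n{instruction}\n\n"
    if extra:
        # The card shows "input" as empty, so this branch may never run.
        prompt += f"### Input:\n{extra}\n\n"
    prompt += "### Response:\n"
    return prompt, record["output"].strip()

example = {
    "instruction": "List all accounts",
    "input": "",
    "output": "SELECT Id FROM Account",
}
prompt, completion = to_prompt_pair(example)
```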
---
## ✔ Final Notes
- The dataset intentionally mixes real + synthetic objects to maximize generalization.
- All SOQL examples are syntactically plausible and demonstrate Salesforce querying concepts.
- The dataset can be used to fine‑tune an LLM or to supply example queries to a RAG system.
---
License: MIT
Provider: krajnish95