Agentforce and Data Cloud accounts for 20% of the total score in the Salesforce Agentforce Specialist Exam. The topic covers data libraries, retrieval augmented generation (RAG), search types, and retrievers.
NOTE
Most of the content in this work was generated with the assistance of AI and carefully reviewed, edited, and curated by the author. If you have found any issues with the content on this page, please do not hesitate to contact me at support@issacc.com.
🤖 Agentforce & Data Cloud
🎯 Learning Objectives
After studying this topic, you should be able to:
- Explain how an Agentforce Data Library improves agent response accuracy and personalization.
- Describe how to create and configure data libraries using the Knowledge base or uploaded files.
- Assign an Agentforce Data Library to an agent via the Agent Builder.
- Explain how Retrieval Augmented Generation (RAG) enhances AI-generated responses.
- Use retrievers to ground prompt templates with relevant Data Cloud information.
📚 Agentforce Data Library Overview
💡 A Data Library acts as a structured knowledge repository for Agentforce agents.
🔍 Purpose | 🧩 Description |
---|---|
Accuracy | Ground AI responses in domain-specific knowledge. |
Personalization | Provide contextually relevant, organization-specific responses. |
Trust | Build user confidence in generative AI output. |
Data libraries can use:
- Salesforce Knowledge base
- Uploaded files (`.txt`, `.html`, `.pdf`)
⚙️ Core Concepts
🧠 Grounding
- Injects domain-specific knowledge and customer context into LLM prompts.
- Leads to more accurate, relevant, and trustworthy responses.
🧩 Chunking
- Breaks data sources into smaller chunks for efficient search and retrieval.
- Works across text, images, and audio.
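In Salesforce, chunking is configured in the search index rather than written by hand, but the idea fits in a few lines. The sketch below is a minimal, hypothetical word-based chunker with overlap. It assumes plain text input and is not Data Cloud's chunking algorithm; `chunk_size` and `overlap` are illustrative parameters.

```python
def chunk_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words, so text near a boundary
    appears in both neighbouring chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far each new chunk starts after the previous one
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reached the end of the text
    return chunks


print(chunk_text("a b c d e f g h", chunk_size=4, overlap=1))
# ['a b c d', 'd e f g', 'g h']
```

Overlap trades a little extra storage for context: a sentence split across two chunks is less likely to lose its meaning in both.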
🗂️ Indexing
- Organizes chunks for structured search and quick access.
- Enables semantic matching via similarity scoring.
🔎 Search
- When a query runs, chunks are compared by similarity score.
- High-scoring chunks are injected into the LLM prompt for context.
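The indexing and search steps can be sketched as similarity scoring followed by a top-k cut. This is a minimal illustration under stated assumptions: cosine similarity is used as the score, the three-dimensional vectors are made up (real embeddings come from an embedding model and have hundreds of dimensions), and `build_grounded_prompt` is a hypothetical helper, not Salesforce's prompt format.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec: list[float], indexed: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Score every (chunk, vector) pair against the query and keep the k best."""
    scored = [(cosine_similarity(query_vec, vec), chunk) for chunk, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]


def build_grounded_prompt(question: str, chunks: list[str]) -> str:
    """Hypothetical prompt layout: retrieved chunks first, then the question."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Made-up 3-dimensional "embeddings" for illustration only.
demo_index = [
    ("Reset your password from the login page.", [0.9, 0.1, 0.0]),
    ("Our office is closed on public holidays.", [0.0, 0.2, 0.9]),
    ("Update login credentials under Settings.", [0.8, 0.3, 0.1]),
]
demo_query = [1.0, 0.2, 0.0]  # pretend embedding of "How do I reset my password?"

best = top_k_chunks(demo_query, demo_index, k=2)
print(build_grounded_prompt("How do I reset my password?", best))
```

Only the two highest-scoring chunks reach the prompt; the unrelated holiday chunk is dropped before the LLM ever sees it.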
📥 Retrievers
- Embedded resources that search and return relevant information from data libraries.
- Define which datasets in Data Cloud are available to AI agents.
🧰 Data Library Setup & Management
🏗️ Creating a Data Library
Can be done via:
- Setup → Agentforce Data Library page, or
- Agent Builder → Knowledge tab
🧾 Data Sources
Two possible sources:
- Knowledge Base
  - Choose Identifying Fields (for locating articles).
  - Choose Content Fields (for enriching responses).
  - Optionally restrict to public articles or filter by categories.
- Uploaded Files
  - Upload `.txt` or `.html` files (≤ 4 MB) and `.pdf` files (≤ 100 MB).
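Because uploads that break these limits are rejected, a quick pre-flight check can save a round trip. The sketch below is a hypothetical helper, not a Salesforce API; Salesforce enforces its own limits. It assumes "MB" means 2^20 bytes and that only the three extensions listed above are accepted.

```python
from pathlib import Path
from typing import Optional

MB = 1024 * 1024  # assumption: "MB" taken as 2**20 bytes

# Per-file size limits as listed in this guide.
MAX_UPLOAD_BYTES = {
    ".txt": 4 * MB,
    ".html": 4 * MB,
    ".pdf": 100 * MB,
}


def check_upload(filename: str, size_bytes: int) -> Optional[str]:
    """Return None if the file fits the listed limits, otherwise a reason."""
    ext = Path(filename).suffix.lower()
    if ext not in MAX_UPLOAD_BYTES:
        return f"unsupported file type: {ext or '(none)'}"
    limit = MAX_UPLOAD_BYTES[ext]
    if size_bytes > limit:
        return f"{filename} exceeds the {limit // MB} MB limit for {ext} files"
    return None


for name, size in [("faq.txt", 1 * MB), ("page.html", 5 * MB), ("manual.pdf", 50 * MB), ("notes.docx", 10)]:
    print(name, check_upload(name, size) or "ok")
```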
🧠 Knowledge Settings
- Define which Knowledge Articles to index.
- Filter by data category or availability.
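The Knowledge options above amount to a filter plus a field mapping. The sketch below models them on in-memory records; the field names (`Title`, `Answer`, `Category`, `IsPublic`) are hypothetical, and this is not how Salesforce stores Knowledge articles or data categories. Splitting each article into an identifier and its content is an assumption that mirrors the Identifying Fields and Content Fields roles.

```python
# Hypothetical article records; real Knowledge field names vary by org.
ARTICLES = [
    {"Title": "Reset your password", "Answer": "Click Forgot Password on the login page.",
     "Category": "Account", "IsPublic": True},
    {"Title": "Internal escalation steps", "Answer": "Page the on-call lead.",
     "Category": "Account", "IsPublic": False},
    {"Title": "Clear a paper jam", "Answer": "Open tray 2 and remove the sheet.",
     "Category": "Printers", "IsPublic": True},
]


def select_articles(articles, public_only=True, categories=None):
    """Keep only the articles the library should index."""
    selected = []
    for article in articles:
        if public_only and not article["IsPublic"]:
            continue  # availability filter
        if categories is not None and article["Category"] not in categories:
            continue  # data category filter
        selected.append(article)
    return selected


def to_index_entry(article, identifying_fields, content_fields):
    """Split an article into an identifier (to locate it) and content (to ground answers)."""
    identifier = " | ".join(str(article[f]) for f in identifying_fields)
    content = "\n".join(str(article[f]) for f in content_fields)
    return identifier, content


for article in select_articles(ARTICLES, public_only=True, categories={"Account"}):
    print(to_index_entry(article, identifying_fields=["Title"], content_fields=["Answer"]))
```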
🧩 Data Space
- Specifies which Data Cloud data space (a logical partition of Data Cloud data) the library uses.
⚡ Automated Configuration
When created, Salesforce automatically:
- Pushes data to Data Cloud
- Builds search index and retriever
- Links agents to that data
🧑‍💻 Assignment & Usage
🔧 Feature | 💬 Function |
---|---|
Agent Builder | Select or create a data library under Knowledge tab |
AI Features Supported | Agentforce Agents, Agentforce Service Agent, Einstein Service Replies |
Restriction | Each feature → only one data library at a time |
Agent Action | Answer Questions with Knowledge (uses assigned library for responses) |
⚠️ Requirements
- Must have Data Cloud enabled
- Data Cloud Admin permissions required
🧠 Retrieval Augmented Generation (RAG)
🌐 Concept Overview
RAG improves LLM output by grounding prompts in accurate, current, and contextually relevant data.
It uses retrievers to pull structured/unstructured information from vector databases in Data Cloud.
⚙️ Process Breakdown
🏗️ Offline Preparation
- Connect unstructured data.
- Create Search Index Configuration (chunk + vectorize).
- Store and manage the search index in Data Cloud.
⚡ Online Usage
- Retriever is called inside a Prompt Template.
- Retrieves relevant info from search index.
- Augments original prompt with retrieved context.
- Sends prompt to LLM → generates response.
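The four online steps can be sketched end to end. This is a minimal sketch under loud assumptions: `simple_retriever` is a keyword-overlap stand-in for a Data Cloud retriever, the template uses Python `str.format` placeholders rather than Prompt Builder syntax, and `llm` is any callable you supply (an echo function here, so the augmented prompt is visible).

```python
import re
from typing import Callable

# Hypothetical prompt template with placeholders for retrieved context and the question.
PROMPT_TEMPLATE = (
    "You are a support agent. Answer using only the context below.\n"
    "Context:\n{context}\n\n"
    "Question: {question}"
)


def words(text: str) -> set[str]:
    """Lowercase word tokens with punctuation removed."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def simple_retriever(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Stand-in retriever: rank chunks by how many question words they share."""
    q = words(question)
    ranked = sorted(chunks, key=lambda c: len(q & words(c)), reverse=True)
    return ranked[:k]


def answer_with_rag(question: str, chunks: list[str], llm: Callable[[str], str], k: int = 2) -> str:
    retrieved = simple_retriever(question, chunks, k)                    # 1-2. call retriever, get relevant chunks
    context = "\n".join(f"- {c}" for c in retrieved)
    prompt = PROMPT_TEMPLATE.format(context=context, question=question)  # 3. augment the prompt
    return llm(prompt)                                                   # 4. send to the LLM


knowledge_chunks = [
    "Passwords can be reset from the login page.",
    "Invoices are emailed monthly.",
    "To reset your password, click Forgot Password.",
]
# Echo "LLM" so the augmented prompt is visible; a real model client would go here.
print(answer_with_rag("How do I reset my password?", knowledge_chunks, llm=lambda p: p))
```

Keeping the LLM behind a plain callable makes the retrieval and augmentation steps testable without a model.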
🔍 Search Index Configuration
Step | Description |
---|---|
1️⃣ Choose Setup Type | Easy Setup, Advanced Setup, or From a Data Kit. |
2️⃣ Select Search Type & DMO | Choose the search type and the data model object (DMO) to index. |
3️⃣ Add Fields & Chunking Strategy | Define how data will be segmented. |
4️⃣ Select Vectorization Strategy | Choose the embedding model that converts chunks into vectors for semantic search. |
5️⃣ Set Related Fields | Add optional filters for targeted retrieval. |
🧮 Search Types
🧭 Vector Search
- Converts text into numerical embeddings to measure semantic similarity.
- Recognizes meaning beyond keywords.
- Example: “How do I reset my password?” ≈ “How can I change my login credentials?”
🔤 Keyword Search
- Focuses on lexical similarity.
- Example: “Model X200 Printer” ≈ “Model X210 Printer” (the shared terms “Model” and “Printer” drive the match)
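Lexical matching can be illustrated with token overlap. The sketch uses Jaccard similarity as a simplified stand-in; production keyword search usually relies on weighted scoring such as BM25, and this guide does not describe how Data Cloud scores keyword matches.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase word tokens; punctuation is dropped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Share of distinct tokens the two texts have in common (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    if not union:
        return 0.0
    return len(ta & tb) / len(union)


print(jaccard("Model X200 Printer", "Model X210 Printer"))  # 0.5: "model" and "printer" match
print(jaccard("Model X200 Printer", "Reset your password"))  # 0.0: no shared terms
```

The first score also shows the limitation: two different printer models look half-identical to a purely lexical matcher, while the password paraphrases in the vector-search example share few exact terms.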
⚗️ Hybrid Search
- Combines semantic (vector) and lexical (keyword) matching, so results reflect both meaning and exact terms.
- Creates both vector index and keyword index within Data Cloud.
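This guide does not say how Data Cloud merges vector and keyword results. Reciprocal rank fusion (RRF) is one widely used method, shown here purely as an assumption: each document earns 1 / (k + rank) from every ranked list it appears in, with k = 60 as a conventional default. The document IDs are hypothetical.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of document IDs into one ranking.

    A document scores sum(1 / (k + rank)) over every list it appears in,
    so items ranked well by both searches rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical results for the query "reset password on Model X200 printer".
vector_ranking = ["reset_guide", "credentials_faq", "x200_manual"]
keyword_ranking = ["x200_manual", "reset_guide", "x210_manual"]
print(reciprocal_rank_fusion([vector_ranking, keyword_ranking]))
# ['reset_guide', 'x200_manual', 'credentials_faq', 'x210_manual']
```

RRF only needs ranks, not raw scores, which sidesteps the problem that cosine similarities and keyword scores live on different scales.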
🧩 Data Preparation
To use retrievers:
- Load, chunk, vectorize, and store content in Data Cloud.
- Associate search index with a Data Space and Data Model Object (DMO).
- Make it searchable via retrievers.
🧠 Retrievers Overview
Type | Description |
---|---|
Default Retriever | Auto-created when a Search Index is configured; not customizable. |
Custom Retriever | Built in Einstein Studio; supports filters and versions. |
Dynamic Retriever | Placeholder defined at runtime (used in standard templates). |
🔍 Filtering
Custom retrievers can use filters to refine results for specific use cases.
🧩 Versioning
Each edit → new version; only one active version at a time.
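Filtering and versioning can be modeled in a few lines. The class below is a hypothetical illustration of the two rules stated above (every edit creates a new version; only one version is active), not the Einstein Studio API. The retriever name and filter fields (`Language`, `Product`) are made up.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RetrieverVersion:
    number: int
    filters: dict[str, str]  # field name -> required value


@dataclass
class CustomRetriever:
    """Hypothetical model of a custom retriever: filters plus version history."""
    name: str
    versions: list[RetrieverVersion] = field(default_factory=list)
    active_version: Optional[int] = None

    def edit(self, filters: dict[str, str]) -> RetrieverVersion:
        """Each edit is saved as a new version; older versions are kept."""
        version = RetrieverVersion(number=len(self.versions) + 1, filters=dict(filters))
        self.versions.append(version)
        return version

    def activate(self, number: int) -> None:
        """Only one version is active at a time, so this replaces the previous one."""
        if not 1 <= number <= len(self.versions):
            raise ValueError(f"{self.name} has no version {number}")
        self.active_version = number

    def search(self, records: list[dict[str, str]]) -> list[dict[str, str]]:
        """Return records that match every filter of the active version."""
        if self.active_version is None:
            raise RuntimeError(f"{self.name} has no active version")
        filters = self.versions[self.active_version - 1].filters
        return [r for r in records if all(r.get(f) == v for f, v in filters.items())]


retriever = CustomRetriever("Support_Retriever")
retriever.edit({"Language": "en"})                     # version 1
retriever.edit({"Language": "en", "Product": "X200"})  # version 2
retriever.activate(2)

records = [
    {"Chunk": "Clear a jam on the X200", "Language": "en", "Product": "X200"},
    {"Chunk": "Clear a jam on the X210", "Language": "en", "Product": "X210"},
    {"Chunk": "Degager un bourrage X200", "Language": "fr", "Product": "X200"},
]
print([r["Chunk"] for r in retriever.search(records)])  # ['Clear a jam on the X200']
```

Keeping old versions makes rollback a matter of activating an earlier number rather than re-entering filters.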
⚙️ Retriever in Prompt Builder
🧭 Resource Field
Displays active retrievers available for a prompt (default + custom).
🧩 Retriever Settings (Side Panel)
Setting | Description |
---|---|
Search Text | The query sent to the retriever: fixed text or merge fields resolved at runtime. |
Output Fields | Select which DMO fields to return in the results. |
Number of Results | Limit how many chunks are injected (e.g., 10). |
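The three settings can be read as "build the query, then trim the answer". The sketch below is hypothetical: `resolve_search_text` uses Python `{name}` placeholders as a stand-in for Prompt Builder merge fields, and the result rows and field names are invented for illustration.

```python
def resolve_search_text(template: str, record: dict[str, str]) -> str:
    """Fill placeholders from the current record (Python {name} syntax,
    used here as a stand-in for Prompt Builder merge fields)."""
    return template.format(**record)


def apply_retriever_settings(results: list[dict], output_fields: list[str], number_of_results: int) -> list[dict]:
    """Keep the top `number_of_results` results and only the chosen output fields."""
    if number_of_results < 1:
        raise ValueError("number_of_results must be at least 1")
    return [
        {f: row[f] for f in output_fields if f in row}
        for row in results[:number_of_results]
    ]


case = {"Subject": "Printer jam", "Description": "X200 paper stuck in tray 2"}
search_text = resolve_search_text("{Subject} {Description}", case)

# Hypothetical ranked results returned by a retriever for `search_text`.
results = [
    {"Chunk": "Open tray 2 and remove the sheet.", "Title": "Clear a paper jam", "Score": 0.91},
    {"Chunk": "Use only 80 gsm paper.", "Title": "Paper specs", "Score": 0.74},
    {"Chunk": "Holiday hours", "Title": "Office hours", "Score": 0.12},
]
print(search_text)
print(apply_retriever_settings(results, output_fields=["Title", "Chunk"], number_of_results=2))
```

A smaller Number of Results keeps the prompt short and focused; a larger one adds context at the cost of tokens and possible noise.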
🧩 Summary Table
Component | Purpose |
---|---|
Data Library | Centralized source of knowledge (Knowledge base or files). |
Grounding | Adds context to LLM for accurate responses. |
Chunking | Splits data for efficient retrieval. |
Indexing | Organizes chunks for fast semantic search. |
Retriever | Fetches relevant data from Data Cloud for grounding. |
Search Index | Stores vectorized data for retrieval. |
RAG Process | Combines retrievers + LLM for contextual, reliable outputs. |
✅ Key Takeaways
- Agentforce Data Libraries and Retrievers together make AI responses more accurate, contextual, and trustworthy.
- Data Libraries provide the ground truth (Knowledge base or files).
- The RAG framework grounds AI outputs in real, organization-specific data.
- Retrievers + Search Indexes enable fast semantic matching.
- Agent Builder links agents directly to their data sources.
- Data Cloud underpins it all — required for setup and operation.
📈 Flow Charts
1) Agentforce Data Library — lifecycle
```mermaid
flowchart LR
    A[Create Data Library] --> B{Choose Data Source}
    B -->|Knowledge| C["Select Identifying & Content Fields"]
    B -->|Uploaded Files| D["Upload TXT/HTML/PDF"]
    C --> E["Chunk & Index in Data Cloud"]
    D --> E
    E --> F[Retriever Created Automatically]
    F --> G[Assign Library in Agent Builder]
    G --> H[Agent Action: Answer Questions with Knowledge]
```
2) RAG — offline vs online usage
```mermaid
flowchart TB
    subgraph offline [OFFLINE Preparation]
        O1[Connect Unstructured Data]
        O2[Create Search Index Configuration]
        O3["Chunk & Vectorize"]
        O4[Store Index in Data Cloud]
        O1 --> O2 --> O3 --> O4
    end
    subgraph online [ONLINE Usage]
        N1[Call Retriever in Prompt Template]
        N2[Retrieve Relevant Chunks]
        N3[Augment Prompt]
        N4[LLM Generates Response]
        N1 --> N2 --> N3 --> N4
    end
    O4 --> N1
```
3) Search index types & flow
```mermaid
flowchart TB
    A[Create Search Index] --> B{Search Type}
    B -->|Vector| C[Embeddings for Semantic Similarity]
    B -->|Keyword| D[Lexical Matching]
    B -->|Hybrid| E["Vector + Keyword"]
    C --> F["Index for DMO/UDMO"]
    D --> F
    E --> F
    F --> G[Supports Retriever Queries]
```
4) Retriever in Prompt Builder — configuration to runtime
```mermaid
flowchart LR
    A[Active Retriever] --> B[Prompt Builder: Resource Field]
    B --> C[Add to Prompt Template]
    C --> D{Configure Settings}
    D -->|Search Text| E[Query or Merge Fields]
    D -->|Output Fields| F[Select DMO Fields]
    D -->|Number of Results| G[Set Max Chunks]
    C --> H["Runtime: Retrieve -> Inject -> Respond"]
```
5) Knowledge vs files — decision mini-flow
```mermaid
flowchart LR
    K[Choose Data Source] --> J{Use Knowledge?}
    J -->|Yes| K1[Select Articles, Categories, Public Only]
    J -->|No| K2[Upload Files: TXT, HTML, PDF]
    K1 --> L["Index -> Retriever -> Assign"]
    K2 --> L
```
📚 Flashcards
What is an Agentforce Data Library?
A structured repository of knowledge that improves accuracy, personalization, and trust in AI responses. It can source data from the Salesforce Knowledge base or uploaded files like text, HTML, and PDFs.
What are the main benefits of using a Data Library?
It enhances AI accuracy, adds personalization, builds trust, and ensures responses are grounded in verified information.
What is grounding in Agentforce?
The process of using data from a Data Library to provide domain-specific and contextual information to an LLM prompt for more accurate and relevant responses.
What is chunking?
The act of breaking data into smaller pieces called chunks to improve search efficiency and relevance.
What is indexing?
Organizing and categorizing data chunks for easier search and retrieval during AI query processing.
What is a retriever?
A component that searches for and returns relevant data from the Data Library to enrich AI responses. Retrievers determine which datasets in Data Cloud are accessible to agents.
Where can a Data Library be created?
In Setup on the Agentforce Data Library page or directly from the Knowledge tab in the Agent Builder.
What are the two possible data sources for a Data Library?
Salesforce Knowledge base or uploaded files (TXT, HTML, PDF).
What fields are configured when using Knowledge as a data source?
Identifying Fields help locate the correct articles, and Content Fields enrich AI responses with relevant details.
What are the size limits for uploaded files?
Up to 4 MB for text or HTML files and 100 MB for PDF files.
How many data libraries can each AI feature use?
Each AI feature, such as an Agentforce Agent or Einstein Service Replies, can use only one data library at a time.
What is required to use Agentforce Data Libraries?
Data Cloud must be enabled, and Data Cloud admin permissions are required for setup.
What happens automatically when a Data Library is created?
Data is pushed to Data Cloud, a search index and retriever are created, and the agent is linked to that data source.
What is Retrieval Augmented Generation (RAG)?
A framework that grounds LLM prompts with relevant information retrieved from Data Cloud, improving accuracy and relevance.
What are the two phases of RAG?
Offline preparation (connect, chunk, vectorize, store) and online usage (retrieve, augment, generate response).
What is a search index in Data Cloud?
A repository of vectorized and chunked data that allows efficient retrieval of semantically relevant information.
What are the three types of search in RAG?
Vector search (semantic), keyword search (lexical), and hybrid search (combines both).
What is the difference between default and custom retrievers?
Default retrievers are created automatically with a search index, while custom retrievers can be created in Einstein Studio and customized with filters and versions.
What are the retriever settings in Prompt Builder?
Search Text (query or merge field), Output Fields (fields returned), and Number of Results (limit of retrieved chunks).
What is a dynamic retriever?
A placeholder retriever specified at runtime, used in standard prompt templates for flexible context retrieval.