Data Cloud in 2026: Identity Resolution, Streaming Transforms, and Semantic Models

Data Cloud no longer evolves on Salesforce's standard three-releases-a-year cycle. Since mid-2024, Salesforce has shipped Data Cloud capabilities on a monthly cadence, sometimes faster. For architects managing Data Cloud implementations, this creates a challenge: the product your team configured six months ago may look materially different from what is available today. Without a structured review process, you will miss capabilities that directly reduce build effort or improve data quality.

This post consolidates the most architecturally significant capabilities that have shipped across the 2025–2026 monthly releases. We focus on what each capability changes architecturally, not just what it is.

Bring Your Own Link for Identity Resolution Matching

Bring Your Own Link (BYOL) for Identity Resolution Matching has the highest architectural impact for organisations that maintain a master identity graph outside Salesforce, whether in Snowflake, Databricks, or a custom MDM platform. Salesforce Help names the feature "Bring Your Own Link for Identity Resolution Matching," not "Bring Your Own Lake." The distinction matters: the feature contributes pre-resolved identity links and has nothing to do with data lake federation.

Previously, Data Cloud's identity resolution ran exclusively inside the Salesforce trust boundary, using rulesets you configure in the Identity Resolution setup. For organisations with an existing enterprise MDM, this created an awkward choice: maintain two identity graphs (one in the MDM, one in Data Cloud) and risk divergence, or abandon the MDM investment and rebuild resolution logic in Data Cloud.

Bring Your Own Link resolves this by allowing you to ingest pre-identified matches — source-system records that have already been linked in your external data — and use those external links as matching criteria in identity resolution, rather than relying solely on Data Cloud's own attribute-matching rules. The external links ensure that records stay linked even when attribute data changes, and they enable matching of profiles that wouldn't normally match based on regular rules — for example, linking an anonymous browser session to an authenticated user profile.

The architectural implication is significant: for organisations with a mature identity governance program, this feature allows Data Cloud to function as the activation layer while your MDM retains authority over identity matching decisions. It eliminates the reconciliation overhead that previously made Data Cloud identity resolution difficult to operate alongside an existing MDM.

Implementation note: your source data must include an ID that identifies the matched records, and that ID must be ingested together with the records themselves. Consult the Salesforce Help article on "Unify Individuals or Accounts with an External Link" for configuration details.
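To make the idea of a pre-identified match concrete, here is a minimal sketch of what an external link amounts to: records from different source systems that share the same MDM-assigned ID belong to the same cluster, whatever their attributes say. The field name `external_link_id`, the record shape, and the sample values are assumptions for illustration only; this is not Data Cloud's schema or API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class SourceRecord(NamedTuple):
    source_system: str     # hypothetical source name, e.g. "crm" or "web"
    record_id: str         # primary key within that source system
    external_link_id: str  # hypothetical field: match ID assigned by the external MDM


def group_by_external_link(records: Iterable[SourceRecord]) -> Dict[str, List[SourceRecord]]:
    """Cluster records that the external MDM has already declared to be one entity."""
    clusters: Dict[str, List[SourceRecord]] = defaultdict(list)
    for record in records:
        clusters[record.external_link_id].append(record)
    return dict(clusters)


records = [
    SourceRecord("crm", "003A", "mdm-1001"),
    SourceRecord("web", "sess-77", "mdm-1001"),  # anonymous session linked by the MDM
    SourceRecord("erp", "C-42", "mdm-2002"),
]
clusters = group_by_external_link(records)
```

Note that no attribute comparison happens anywhere in the sketch: the link survives a changed email address or surname because the grouping key is the external ID alone.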

Streaming Transforms with DMOs as Source and Target

Transforms in Data Cloud allow you to apply SQL-based logic to incoming data before it reaches a Data Model Object (DMO). The original implementation required transforms to target a custom DMO — standard DMOs were not writable via transforms. This restriction has been lifted: transforms can now write directly to standard DMOs, and they can use other DMOs as their source. This capability — confirmed in the Salesforce release notes as "Use DMOs as Source and Target Object in Streaming Transforms" — enables a much cleaner architecture for real-time data preparation.

The streaming variant applies the transform logic in near-real-time as records arrive, rather than in a scheduled batch window. The practical architecture here:

Ingest raw event data (e.g., web behaviour events, IoT readings, transaction records) into a staging DMO with minimal schema constraints.
Apply a streaming transform to cleanse, normalise, and map the staging data to the appropriate standard DMO (e.g., Individual, SalesOrder, EngagementEvent). Unified Individual records are produced by identity resolution rather than written by transforms.
Downstream segments and calculated insights operate on the clean standard DMO data, not the raw staging layer.
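The cleanse-normalise-map step in the middle of that pipeline can be sketched as a pure function over one staging record. In Data Cloud the logic would be expressed in the transform's own SQL; the Python below only illustrates the shape of the work. The raw field names (`userId`, `type`, `ts`) and the output fields are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional, TypedDict


class CleanEvent(TypedDict):
    individual_id: str
    event_type: str
    occurred_at: str  # ISO 8601 timestamp in UTC


def normalise_event(raw: dict) -> Optional[CleanEvent]:
    """Cleanse one staging record; return None when it cannot be mapped."""
    individual_id = str(raw.get("userId", "")).strip()
    event_type = str(raw.get("type", "")).strip().lower()
    timestamp_ms = raw.get("ts")  # assumed to be epoch milliseconds
    if not individual_id or not event_type or not isinstance(timestamp_ms, (int, float)):
        return None  # reject rather than write a partial row downstream
    occurred_at = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
    return {
        "individual_id": individual_id,
        "event_type": event_type,
        "occurred_at": occurred_at.isoformat(),
    }


clean = normalise_event({"userId": " u1 ", "type": "PageView", "ts": 0})
rejected = normalise_event({"type": "PageView"})
```

Keeping the staging layer permissive and putting every rule in one mapping step means a schema change upstream touches a single transform, not every segment that reads the standard DMO.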

This pattern replaces a common workaround that used Apex triggers or Flow to pre-process data before ingestion. The transform approach is lower-latency, more maintainable, and does not consume Salesforce core compute resources. For high-volume event streams, this is a significant architectural improvement.

30-Minute Data Graph Refresh

The Data Graph — Data Cloud's mechanism for packaging a subset of unified profile data for consumption by external systems, including Agentforce agents and Einstein personalisation — previously refreshed on a multi-hour schedule. Salesforce release notes confirm a minimum refresh interval of 30 minutes for Data Graphs configured with the accelerated refresh option.

This matters most for real-time personalisation use cases where segment membership or profile attributes change frequently — for example, a customer's loyalty tier after a purchase, or a student's academic standing after grade submission. A 30-minute refresh still introduces latency relative to a streaming update, but it is adequate for most personalisation scenarios driven by daily or transactional events rather than sub-minute signals.

When configuring accelerated refresh, confirm that your Data Graph does not exceed the row count limit for the accelerated tier. Graphs exceeding this limit fall back to the standard refresh schedule regardless of configuration. Check the current Salesforce Help documentation for the applicable row count threshold, as this may change with platform updates.
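A pre-deployment check along these lines can catch a graph that is about to fall back silently. This is a sketch only: the row limit is passed in as a parameter because the real threshold must come from the current Salesforce documentation, and the 80% warning ratio is an arbitrary assumption.

```python
def check_accelerated_refresh(row_count: int, row_limit: int, warn_ratio: float = 0.8) -> str:
    """Classify a Data Graph against a caller-supplied accelerated-tier row limit."""
    if row_count > row_limit:
        return "fallback_to_standard"    # exceeds the limit: accelerated setting is ignored
    if row_count >= warn_ratio * row_limit:
        return "accelerated_near_limit"  # still accelerated, but growth may tip it over
    return "accelerated"


# Placeholder numbers for illustration; they are not Salesforce limits.
status = check_accelerated_refresh(row_count=900, row_limit=1000)
```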

Tableau Semantics Integration

Data Cloud now integrates with Tableau Semantics, Salesforce's AI-infused semantic layer that translates data into business language. Calculated insights, DMO field labels, and data graph definitions created in Data Cloud can be published to Tableau as a semantic model, making curated Data Cloud metrics available to Tableau analysts without requiring them to understand the underlying DMO schema.

The integration supports consistency across analytics surfaces: Tableau Semantics is designed to provide businesses with consistent, reliable, and trusted data across Agentforce, Tableau Next, Tableau Cloud, and Salesforce consumption layers. For organisations that have invested in Tableau as their analytics layer, this closes a significant gap — analytics teams can define and certify metrics in Tableau, and those same metrics can power Data Cloud segments and activation, creating a single metric definition that serves both analytics and marketing automation.

Calculated Insights Metrics Tracking

Calculated Insights now include a metrics tracking capability that logs the output value of each insight computation over time, creating a historical time series. This is separate from the underlying profile data — it is a record of what each insight produced at each computation run.

For architects, this opens up trend-based segmentation that was previously complex to implement. Rather than building a snapshot comparison flow that stores and compares computed values across runs, you can access the metrics tracking history directly through Data Cloud's query interface to drive segments based on how a metric has changed over time.

This powers segments like "customers whose lifetime value has increased significantly in the last 90 days" — a capability that previously required custom DMO storage and bespoke transform logic. It is now a native segmentation capability within Calculated Insights. Consult the Salesforce Data Cloud documentation for the current query syntax and table names available for metrics tracking queries, as these are subject to change with platform updates.
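The selection logic behind such a segment can be sketched independently of the query syntax. The example below assumes the tracking history can be read as rows of (individual, computation date, value); the `InsightSnapshot` shape, the 90-day window, and the 25% threshold for "increased significantly" are illustrative assumptions, not Data Cloud table or field names.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, List, NamedTuple, Set


class InsightSnapshot(NamedTuple):
    individual_id: str
    computed_on: date
    value: float


def rising_metric_segment(history: Iterable[InsightSnapshot], as_of: date,
                          window_days: int = 90, min_increase: float = 0.25) -> Set[str]:
    """Return individuals whose metric rose by at least min_increase within the window."""
    window_start = as_of - timedelta(days=window_days)
    by_individual: Dict[str, List[InsightSnapshot]] = {}
    for snap in history:
        if window_start <= snap.computed_on <= as_of:
            by_individual.setdefault(snap.individual_id, []).append(snap)
    segment: Set[str] = set()
    for individual_id, snaps in by_individual.items():
        snaps.sort(key=lambda s: s.computed_on)
        first, last = snaps[0], snaps[-1]
        # Compare the earliest and latest computed values inside the window.
        if first.value > 0 and (last.value - first.value) / first.value >= min_increase:
            segment.add(individual_id)
    return segment


history = [
    InsightSnapshot("A", date(2026, 1, 10), 100.0),
    InsightSnapshot("A", date(2026, 3, 20), 140.0),  # +40% inside the window
    InsightSnapshot("B", date(2026, 1, 15), 200.0),
    InsightSnapshot("B", date(2026, 3, 25), 210.0),  # +5%, below the threshold
    InsightSnapshot("C", date(2025, 10, 1), 50.0),   # outside the 90-day window
    InsightSnapshot("C", date(2026, 3, 1), 100.0),
]
segment = rising_metric_segment(history, as_of=date(2026, 3, 31))
```

Individual C shows why the window boundary matters: the doubling happened relative to a value computed before the window opened, so only one snapshot counts and no increase is detected.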

Communication Capping for B2C

Communication capping, confirmed in Salesforce release notes, allows you to set frequency limits on outbound marketing communications at the individual level, across channels and campaigns. A capping rule might read: "no individual should receive more than 3 promotional messages per week across email and push, regardless of how many segments they qualify for."

The capping logic runs at activation time — after segment evaluation, before message dispatch. The architecture consideration is processing order: if a contact qualifies for multiple concurrent activations, capping rules determine which activations execute and which are suppressed for that contact. The suppression is logged against the individual's communication history, which feeds back into future capping calculations.
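The processing-order question is easier to reason about with a small model. The sketch below applies a rolling cap of 3 sends per 7 days per individual and resolves competing activations by scheduled send time. That tie-breaking rule, the data shapes, and the campaign names are all assumptions for illustration; Data Cloud's actual prioritisation is whatever your capping rules configure.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, NamedTuple, Tuple


class PlannedSend(NamedTuple):
    individual_id: str
    campaign: str
    channel: str
    send_at: datetime


def apply_cap(planned: Iterable[PlannedSend], history: Dict[str, List[datetime]],
              max_sends: int = 3, window: timedelta = timedelta(days=7)
              ) -> Tuple[List[PlannedSend], List[PlannedSend]]:
    """Split planned sends into (allowed, suppressed); history is updated in place."""
    allowed: List[PlannedSend] = []
    suppressed: List[PlannedSend] = []
    for send in sorted(planned, key=lambda s: s.send_at):  # assumed order: earliest first
        past = history.get(send.individual_id, [])
        recent = [t for t in past if send.send_at - window < t <= send.send_at]
        if len(recent) < max_sends:
            allowed.append(send)
            # An allowed send counts toward every later capping decision.
            history.setdefault(send.individual_id, []).append(send.send_at)
        else:
            suppressed.append(send)
    return allowed, suppressed


history = {"i1": [datetime(2026, 3, 2, 9), datetime(2026, 3, 3, 9)]}  # two sends this week
planned = [
    PlannedSend("i1", "spring_sale", "email", datetime(2026, 3, 4, 9)),
    PlannedSend("i1", "loyalty_nudge", "push", datetime(2026, 3, 4, 10)),
    PlannedSend("i1", "cart_reminder", "email", datetime(2026, 3, 4, 11)),
]
allowed, suppressed = apply_cap(planned, history)
```

With two messages already delivered this week, only the earliest of the three planned sends goes out. Swapping the sort key for a campaign priority would change which one survives, which is exactly the decision the capping configuration encodes.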

For organisations running parallel email, push, and SMS programs from Data Cloud, communication capping reduces the risk of contact fatigue and improves deliverability scores — both direct effects that justify the configuration investment. Enable capping rules before your next high-volume campaign run, not after.

Vector and Hybrid Search Expansion

Data Cloud's vector search capability — which allows semantic similarity queries against embedded content — has expanded to support hybrid search: combining dense vector similarity with sparse keyword matching in a single query. This significantly improves retrieval quality for use cases where the query may be partially precise (matching specific product codes or identifiers) and partially semantic (matching conceptual intent). The Salesforce release notes confirm the "Expand Vector and Hybrid Search" capability in Data Cloud.

Hybrid search is particularly valuable for product recommendation and content retrieval scenarios where a user's intent can be expressed either as a specific term or as a conceptual description. When configuring hybrid search in Data Cloud, consult the official Salesforce Data Cloud documentation for the current supported query syntax — Salesforce's implementation uses its own SQL dialect and the exact function signatures are subject to change as the capability matures.

The general principle: hybrid search blends two retrieval signals — semantic vector similarity and keyword relevance — weighted according to your use case. For lookup-style queries where exact matching matters most, weight toward keyword matching. For open-ended semantic queries, weight toward vector similarity. The right balance depends on your data and your users' query patterns.
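That weighting can be written down as a simple linear blend. The sketch below is a generic illustration of the principle, not Data Cloud's scoring function: the cosine similarity is standard, the keyword score is a crude term-overlap stand-in for a real sparse relevance measure, and `keyword_weight` is a hypothetical tuning parameter.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dense signal: cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query: str, text: str) -> float:
    """Sparse signal: fraction of query terms that appear verbatim in the text."""
    terms = query.lower().split()
    words = set(text.lower().split())
    return sum(term in words for term in terms) / len(terms) if terms else 0.0


def hybrid_score(vector_sim: float, keyword_rel: float, keyword_weight: float) -> float:
    """Linear blend: keyword_weight near 1 favours exact matches, near 0 favours semantics."""
    return keyword_weight * keyword_rel + (1.0 - keyword_weight) * vector_sim


# Lookup-style query containing a product code: weight toward keyword matching.
lookup = hybrid_score(vector_sim=0.2, keyword_rel=1.0, keyword_weight=0.8)
# Open-ended query with the same signals: weight toward vector similarity.
semantic = hybrid_score(vector_sim=0.2, keyword_rel=1.0, keyword_weight=0.2)
```

The same document scores very differently under the two weightings, which is why the weight should be tuned against real queries from your users rather than set once by intuition.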

Real-Time Data Actions via Automation Flows

Data Cloud activations have historically been push-based — you define a segment, configure an activation, and Data Cloud pushes member updates to a channel or system on a schedule or in near-real-time. A more recent addition enables pull-based automation: a Salesforce Automation Flow can directly query Data Cloud unified profile data and calculated insights at execution time, rather than waiting for a scheduled activation to write data to core Salesforce objects.

This pattern is particularly valuable for service scenarios: when a Service Cloud case is opened, a Flow can query Data Cloud to retrieve the customer's calculated lifetime value, segment memberships, and recent engagement history — surfacing them in the case record in real time, without requiring a scheduled data sync or a pre-populated lookup field. This capability makes Data Cloud profile enrichment available in operational Salesforce workflows, not just in marketing activation contexts.

New Connectors: Google Analytics 4 and Amazon Kafka

Two significant connectors have shipped, both confirmed in Salesforce release notes. The Google Analytics 4 connector (Beta) brings web and app behaviour event data directly into Data Cloud via the GA4 Data API, replacing the previous pattern of exporting GA4 data to BigQuery and then ingesting from BigQuery. If your organisation uses GA4 as the primary digital analytics platform, evaluate whether you can retire the BigQuery intermediate layer — this reduces pipeline complexity and latency.

The Amazon Kafka connector (Beta) enables high-throughput event streaming directly into Data Cloud from Amazon MSK (Managed Streaming for Apache Kafka) or self-managed Kafka clusters. This is the right ingestion path for organisations with event-driven microservices architectures where the canonical event bus is Kafka. Configure the connector with a dedicated consumer group per Data Cloud ingestion stream to avoid interfering with other downstream Kafka consumers.
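The consumer-group advice can be enforced with a naming convention. The sketch below builds one configuration per ingestion stream using the standard Kafka consumer properties `bootstrap.servers` and `group.id`. How the Data Cloud connector exposes these settings is not covered here, and the `datacloud-` prefix, broker address, and stream names are hypothetical.

```python
from typing import Dict, List


def consumer_config(bootstrap_servers: str, stream_name: str) -> Dict[str, str]:
    """Build standard Kafka consumer properties with a group dedicated to one stream."""
    return {
        "bootstrap.servers": bootstrap_servers,
        "group.id": f"datacloud-{stream_name}",  # hypothetical naming convention
    }


def assert_dedicated_groups(configs: List[Dict[str, str]]) -> None:
    """Fail fast if two ingestion streams would share a consumer group."""
    group_ids = [config["group.id"] for config in configs]
    if len(group_ids) != len(set(group_ids)):
        raise ValueError("each ingestion stream needs its own consumer group")


configs = [
    consumer_config("broker-1.example.internal:9092", "web-events"),
    consumer_config("broker-1.example.internal:9092", "order-events"),
]
assert_dedicated_groups(configs)
```

Kafka tracks committed offsets per consumer group, so a dedicated group lets the Data Cloud stream read at its own pace without shifting the position of any other consumer on the same topic.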

Document AI for Unstructured Data

Document AI integration in Data Cloud (Beta), confirmed in Salesforce release notes, allows unstructured documents — PDFs, images, scanned forms — to be processed through Einstein OCR and classification models, with the extracted structured fields written directly to DMOs. For industries that handle significant volumes of semi-structured input (financial services, education, public sector), this eliminates a custom ML pipeline that previously sat outside the Salesforce trust boundary.

The extracted fields are typed and validated before DMO insertion, which means you define a schema for the expected output and the Document AI extraction is constrained to produce values of the correct type. Extraction confidence scores are stored alongside the values — build a human review queue for records below your confidence threshold rather than accepting all extractions unconditionally.
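A review-queue rule of this kind can be as small as the sketch below. The `ExtractedField` shape, the 0.9 threshold, and the field names are assumptions for illustration; how Data Cloud stores the confidence score is not shown here.

```python
from typing import Iterable, List, NamedTuple, Tuple


class ExtractedField(NamedTuple):
    name: str
    value: str
    confidence: float  # assumed range: 0.0 to 1.0


def route_extraction(fields: Iterable[ExtractedField],
                     threshold: float = 0.9) -> Tuple[str, List[str]]:
    """Send a document to human review if any field falls below the threshold."""
    low_confidence = [field.name for field in fields if field.confidence < threshold]
    decision = "human_review" if low_confidence else "auto_accept"
    return decision, low_confidence


decision, flagged = route_extraction([
    ExtractedField("applicant_name", "J. Smith", 0.98),
    ExtractedField("account_number", "12345678", 0.71),  # a reviewer should check this one
])
```

Returning the flagged field names along with the decision lets the review screen highlight only the doubtful values, so the reviewer does not have to re-key the whole document.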

Is your Data Cloud implementation keeping pace with the product?

DISquare's free health check includes a Data Cloud maturity assessment — covering your identity resolution model, segmentation architecture, calculated insights design, and connector inventory. We'll show you which recent capabilities you can activate immediately and which require architectural changes.
