OCSF: Standardizing Security Data for the AI Era

While the cybersecurity industry has spent the last year captivated by the rise of AI copilots and autonomous agents, a more fundamental shift is occurring one layer deeper. Security vendors and enterprises are quietly coalescing around a shared language to describe security data, aiming to end the era of fragmented, proprietary log formats that slow down incident response.

The leading candidate for this role is the Open Cybersecurity Schema Framework (OCSF). By providing a vendor-neutral way to represent security events, findings, and context, OCSF is designed to eliminate the “normalization tax”—the grueling process of rewriting field names and building custom parsers just to get two different security tools to talk to one another.

For security operations centers (SOCs), this isn’t just a technical convenience; it is an operational necessity. In a typical environment, teams must stitch together telemetry from endpoints, identity providers, cloud workloads, and SaaS applications. When these tools use different nesting structures and naming conventions for the same event, correlating a sophisticated attack becomes a manual, time-consuming chore.

Consider a common threat scenario: an employee logs into their laptop from San Francisco at 10 a.m., but two minutes later, a cloud resource is accessed using the same credentials from New York. Detecting this “impossible travel” requires the system to correlate identity logs with cloud access logs. If the identity tool calls the user “UserID” and the cloud tool calls it “user_identity_guid,” an analyst or a SIEM (security information and event management) tool must first translate that data before the alert can even trigger.
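To make the translation step concrete, here is a minimal Python sketch. It assumes each raw record carries an ISO 8601 timestamp and a latitude/longitude already resolved from the login IP. The field names `UserID` and `user_identity_guid` come from the scenario above. The sample records, the shared `user_uid` key, and the 1,000 km/h speed ceiling are hypothetical choices for illustration and are not part of OCSF.

```python
import math
from datetime import datetime

# Hypothetical raw records: the two tools name the same user differently.
identity_log = {"UserID": "u-1001", "time": "2024-05-01T10:00:00",
                "lat": 37.7749, "lon": -122.4194}   # San Francisco
cloud_log = {"user_identity_guid": "u-1001", "time": "2024-05-01T10:02:00",
             "lat": 40.7128, "lon": -74.0060}        # New York

# Hypothetical per-source translation to one shared key. In OCSF the user
# would live in an object such as `user` or `actor.user` instead.
USER_FIELD = {"identity": "UserID", "cloud": "user_identity_guid"}


def normalize(source, record):
    """Rename the source-specific user field and parse the timestamp."""
    return {
        "user_uid": record[USER_FIELD[source]],
        "time": datetime.fromisoformat(record["time"]),
        "lat": record["lat"],
        "lon": record["lon"],
    }


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres, treating Earth as a sphere."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def impossible_travel(a, b, max_kmh=1000.0):
    """Flag two events for the same user whose implied speed exceeds max_kmh."""
    if a["user_uid"] != b["user_uid"]:
        return False  # correlation is only possible once the user keys match
    km = haversine_km(a["lat"], a["lon"], b["lat"], b["lon"])
    hours = abs((b["time"] - a["time"]).total_seconds()) / 3600
    if hours == 0:
        return km > 0  # simultaneous events in two places
    return km / hours > max_kmh


alert = impossible_travel(normalize("identity", identity_log),
                          normalize("cloud", cloud_log))
```

Roughly 4,100 km covered in two minutes implies a speed far beyond any flight, so `alert` comes out true. Skip the renaming step and there is no common key to compare on, which is the translation burden the scenario describes.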

The architecture of a shared security language

At its core, OCSF is an open-source framework for cybersecurity schemas. Unlike previous attempts at standardization, it is deliberately agnostic to how data is stored, collected, or moved. Whether a company uses a data lake, a traditional SIEM, or a streaming pipeline, OCSF provides the consistent structure needed for threat detection and investigation.


The framework allows vendors to map their proprietary schemas into a common model. This means data can move through a security pipeline—from the source to a lake and then to an analytics tool—without requiring expensive and fragile translations at every hop.
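One way to picture this mapping is as a declarative table per source, applied once where the data enters the pipeline. The sketch below assumes that approach. The vendor field names and both mapping tables are hypothetical. The target paths `actor.user.uid`, `src_endpoint.ip`, and `time` mirror OCSF naming, but the correct attributes depend on the OCSF event class being produced and should be checked against the published schema.

```python
# Hypothetical mapping tables: source dotted path -> OCSF-style dotted path.
VENDOR_A_MAP = {
    "UserID": "actor.user.uid",
    "ClientIP": "src_endpoint.ip",
    "EventTime": "time",
}
VENDOR_B_MAP = {
    "identity.user_identity_guid": "actor.user.uid",
    "network.source.address": "src_endpoint.ip",
    "ts": "time",
}


def get_path(record, path):
    """Follow a dotted path through nested dicts; return None if any key is missing."""
    node = record
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def set_path(record, path, value):
    """Create intermediate dicts as needed and set the leaf value."""
    keys = path.split(".")
    node = record
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value


def to_common(record, mapping):
    """Translate one vendor record into the shared nested structure."""
    out = {}
    for src, dst in mapping.items():
        value = get_path(record, src)
        if value is not None:
            set_path(out, dst, value)
    return out


# Hypothetical events from two vendors with different naming and nesting.
vendor_a_event = {"UserID": "u-1001", "ClientIP": "203.0.113.7",
                  "EventTime": 1714557600000}
vendor_b_event = {"identity": {"user_identity_guid": "u-1001"},
                  "network": {"source": {"address": "198.51.100.4"}},
                  "ts": 1714557720000}

a = to_common(vendor_a_event, VENDOR_A_MAP)
b = to_common(vendor_b_event, VENDOR_B_MAP)
same_user = a["actor"]["user"]["uid"] == b["actor"]["user"]["uid"]
```

Keeping the mapping as data rather than code means onboarding a new source is a matter of writing one more table, and every later hop in the pipeline sees the same field names.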

The project’s trajectory has been unusually rapid. Launched in August 2022 by Amazon Web Services (AWS) and Splunk, the initiative began with 17 companies, including industry giants such as CrowdStrike, Palo Alto Networks, IBM, and Okta. By August 2024, the community had grown to more than 200 participating organizations and 800 contributors. In November 2024, the project formalized its governance by joining the Linux Foundation; by then, its contributor count had reached 900.

Figure: The OCSF community has kept up a steady cadence of releases over the last two years.

Impact of OCSF on Security Data Workflows
Workflow Stage	Traditional Approach (Proprietary)	OCSF Approach (Standardized)
Data Ingestion	Custom parsers for every vendor tool	Native OCSF mapping by vendors
Event Correlation	Manual field mapping across datasets	Direct correlation via shared schema
Tool Migration	Rewriting all detection queries	Portable queries across OCSF-ready tools
Analyst Effort	Much of the time spent on data cleaning	More time spent on actual investigation
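The "portable queries" row can be illustrated with a detection written once against OCSF field names. This minimal sketch assumes events arrive as Python dicts already in OCSF shape. The constants reflect the OCSF Authentication class (`class_uid` 3002) and its `status_id` value for failure (2); verify both against the schema version in use. The sample events, product names, and the threshold of five are hypothetical.

```python
from collections import Counter

AUTHENTICATION_CLASS_UID = 3002  # OCSF Authentication class (verify per schema version)
STATUS_FAILURE = 2               # OCSF status_id meaning "Failure"


def failed_logins_by_user(events, threshold=5):
    """Return users with at least `threshold` failed authentication events.

    The rule reads only OCSF-style fields, so it does not care which
    product emitted each event.
    """
    counts = Counter(
        e["user"]["uid"]
        for e in events
        if e.get("class_uid") == AUTHENTICATION_CLASS_UID
        and e.get("status_id") == STATUS_FAILURE
        and "uid" in e.get("user", {})
    )
    return {uid: n for uid, n in counts.items() if n >= threshold}


# Hypothetical events from two different products, already normalized.
idp_fail = {"class_uid": 3002, "status_id": 2, "user": {"uid": "u-1001"},
            "metadata": {"product": {"name": "IdP A"}}}
vpn_fail = {"class_uid": 3002, "status_id": 2, "user": {"uid": "u-1001"},
            "metadata": {"product": {"name": "VPN B"}}}
vpn_ok = {"class_uid": 3002, "status_id": 1, "user": {"uid": "u-2002"},
          "metadata": {"product": {"name": "VPN B"}}}

events = [idp_fail] * 3 + [vpn_fail] * 2 + [vpn_ok]
result = failed_logins_by_user(events)
```

Because the rule depends only on `class_uid`, `status_id`, and `user.uid`, it counts the failures from both products together without any per-vendor branches, and the same rule would keep working if either product were swapped for another OCSF-ready tool.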

From abstract standard to operational plumbing

OCSF has moved past the phase of being a theoretical proposal; it now appears in the actual “plumbing” of major security products. AWS has integrated the framework across several services: Amazon Security Lake converts natively supported logs into OCSF and stores them in Apache Parquet format, while AWS AppFabric and AWS Security Hub also use the schema for normalized audit data and findings.

Other industry leaders have followed suit to reduce friction for their customers. Splunk uses edge and ingest processors to translate incoming data into OCSF, and Cribl can convert streaming data into the schema in flight. CrowdStrike has positioned itself on both ends of the pipeline, translating Falcon data into OCSF for export while allowing its Next-Gen SIEM to ingest and parse OCSF-formatted data from other sources.

Similarly, Palo Alto Networks can now forward Strata Logging Service data directly into Amazon Security Lake using the OCSF standard. This interoperability allows enterprises to swap or add tools without the large-scale data-mapping projects that typically accompany such changes.

The urgency of AI telemetry

The push for a shared data language has gained fresh urgency with the deployment of agentic AI. When a company deploys a large language model (LLM) integrated with model gateways, vector stores, and tool-calling runtimes, it creates a new, complex form of telemetry that often spans multiple product boundaries.

In this environment, the critical question for a security team is no longer just “what did the AI say?” but “what did the AI actually do?” If an AI assistant calls a tool it shouldn’t, retrieves sensitive files from a vector database, or chains together a risky sequence of actions, the resulting security event must be understood across the entire distributed system.

OCSF has evolved specifically to address these AI-driven risks. Versions 1.5.0, 1.6.0, and 1.7.0 introduced updates that help security teams trace an assistant’s tool calls step-by-step and flag unusual behavior. Instead of only seeing the final output of an LLM, investigators can now piece together the full chain of actions that led to a potential breach.
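As a rough illustration of stitching an assistant's actions back together, the sketch below groups tool-call events by session, orders them by time, and applies two simple checks. The flat event shape (`session_uid`, `time`, `tool`), the allow-list, and the "retrieve then send" risky pair are all hypothetical. They are not the attribute names introduced in OCSF 1.5.0 through 1.7.0, which should be taken from the published schema.

```python
from collections import defaultdict

# Hypothetical policy: tools the assistant may call, plus an adjacent pair
# of calls treated as risky (retrieval immediately followed by an outbound message).
ALLOWED_TOOLS = {"search_docs", "vector_search", "summarize", "send_email"}
RISKY_PAIRS = {("vector_search", "send_email")}


def build_chains(events):
    """Group events by session and sort each chain chronologically."""
    chains = defaultdict(list)
    for event in events:
        chains[event["session_uid"]].append(event)
    for chain in chains.values():
        chain.sort(key=lambda e: e["time"])
    return dict(chains)


def flag_chain(chain):
    """Return human-readable findings for one ordered chain of tool calls."""
    findings = []
    for event in chain:
        if event["tool"] not in ALLOWED_TOOLS:
            findings.append(f"disallowed tool {event['tool']} at t={event['time']}")
    for prev, nxt in zip(chain, chain[1:]):
        if (prev["tool"], nxt["tool"]) in RISKY_PAIRS:
            findings.append(f"risky sequence {prev['tool']} -> {nxt['tool']}")
    return findings


# Hypothetical events, deliberately out of order, as they might arrive from
# different components (model gateway, vector store, tool runtime).
events = [
    {"session_uid": "s1", "time": 3, "tool": "send_email"},
    {"session_uid": "s1", "time": 1, "tool": "search_docs"},
    {"session_uid": "s2", "time": 1, "tool": "summarize"},
    {"session_uid": "s1", "time": 2, "tool": "vector_search"},
    {"session_uid": "s2", "time": 2, "tool": "run_shell"},
]

chains = build_chains(events)
report = {sid: flag_chain(chain) for sid, chain in chains.items()}
```

Each individual call in session `s1` is allowed; only the ordered chain reveals the risky retrieve-then-send pattern. That ordering step is only possible when every component labels its events with the same session and time fields.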

Looking toward OCSF 1.8.0

The roadmap for OCSF continues to lean heavily into AI observability. Development for version 1.8.0 aims to provide even deeper visibility into AI interactions. Future updates may allow security teams to see which specific model handled an exchange, which provider supplied it, and how token counts shifted during a conversation.

This level of detail is vital for detecting “prompt injection” or data exfiltration. For instance, a sudden spike in prompt (input) tokens could signal that a bot was fed an unusually large hidden prompt or pulled too much background data from a vector database, increasing the risk of leaking sensitive internal guidance.
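The article does not specify how such a spike would be detected. One simple approach is to compare each exchange's token count with a rolling baseline of the exchanges before it. The sketch below assumes per-exchange token counts have already been extracted from normalized AI telemetry; the window of five, the multiplier of three, the `min_sigma` floor, and the sample counts are hypothetical tuning choices.

```python
from statistics import mean, stdev


def token_spikes(counts, window=5, k=3.0, min_sigma=1.0):
    """Return indices whose count exceeds mean + k * stdev of the previous `window` counts.

    `min_sigma` stops a perfectly flat baseline (stdev 0) from flagging
    tiny fluctuations.
    """
    if window < 2:
        raise ValueError("window must be at least 2 for a sample stdev")
    spikes = []
    for i in range(window, len(counts)):
        baseline = counts[i - window:i]
        threshold = mean(baseline) + k * max(stdev(baseline), min_sigma)
        if counts[i] > threshold:
            spikes.append(i)
    return spikes


# Hypothetical token counts for successive exchanges in one conversation.
token_counts = [410, 395, 402, 420, 405, 398, 4200, 410]
spikes = token_spikes(token_counts)
```

Only the jump to 4,200 tokens at index 6 is flagged. Such a check is only meaningful if the token count sits in the same field no matter which model, provider, or gateway produced the exchange.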

As AI expands the attack surface through new paths of abuse and automated scams, the ability to connect data across systems without losing context is becoming a central defense for the modern SOC. The transition of OCSF from a community project to an industry standard suggests that the market is finally prioritizing the data layer over the tool layer.

The next major milestone for the framework will be the formal release and adoption of the 1.8.0 specifications, which will further define how AI-specific telemetry is standardized across the ecosystem.

Do you think a shared data language will finally end vendor lock-in for security tools? Share your thoughts in the comments.
