What is data analytics? Transforming data into better decisions

May 5, 2026, 07:00

What is data analytics?

Data analytics focuses on gleaning insights from data. It comprises the processes, tools, and techniques of data analysis and management, and its chief aim is to apply statistical analysis and technologies on data to find trends and solve problems. Data analytics has become increasingly important in the enterprise to shape business processes and improve decision-making and business results.

Data analytics draws from a range of disciplines, including computer programming, mathematics, and statistics, to perform analysis on data in an effort to describe, predict, and improve performance. To ensure robust analysis, data analytics teams leverage a range of data management techniques, including data mining, data cleansing, data transformation, data modeling, and more.

What is AI data analytics?

AI data analytics is a rapidly growing specialty within data analytics that applies AI to support, automate, and simplify data analysis. It leverages ML, natural language processing, and data mining, along with foundational models and chat assistance for predictive analytics, sentiment analysis, and AI-enhanced business intelligence. AI tools can be used for data collection and data preparation, while ML models can be trained to extract insights and patterns.

The four types of data analytics

Analytics breaks down broadly into four types: descriptive analytics attempts to describe what has transpired at a particular time; diagnostic analytics assesses why something has happened; predictive analytics ascertains the likelihood of something happening in the future; and prescriptive analytics provides recommended actions to take to achieve a desired outcome.

To explore these more specifically: Descriptive analytics uses historical and current data from multiple sources to describe the present state, or a specified historical state, by identifying trends and patterns; in business, descriptive analytics is the purview of business intelligence (BI). Diagnostic analytics uses data, often generated via descriptive analytics, to discover the factors or reasons behind past performance. Predictive analytics applies techniques such as statistical modeling, forecasting, and ML to the output of descriptive and diagnostic analytics to make predictions about future outcomes; it is often considered a type of advanced analytics and frequently depends on ML and/or deep learning. And prescriptive analytics is another type of advanced analytics that applies testing and other techniques to recommend specific solutions that will deliver desired outcomes; in business, prescriptive analytics uses ML, business rules, and algorithms.

Data analytics methods and techniques

Data analysts use a number of methods and techniques to analyze data. According to Emily Stevens, managing editor at CareerFoundry, seven of the most popular include:

  1. Regression analysis: A set of statistical processes used to estimate the relationships between variables to determine how changes to one or more might affect another, like how social media spending might affect sales.
  2. Monte Carlo simulation: A mathematical technique, frequently used for risk analysis, that relies on repeated random sampling to determine the probability of various outcomes of an event that can’t otherwise be readily predicted due to degrees of uncertainty in its inputs (a minimal sketch follows this list).
  3. Factor analysis: A statistical method for taking a massive data set and reducing it to a smaller, more manageable one to uncover hidden patterns, like when analyzing customer loyalty.
  4. Cohort analysis: A form of analysis in which a dataset is broken into groups that share common characteristics, or cohorts, for analysis like understanding customer segments.
  5. Cluster analysis: A statistical method in which items are classified and organized into clusters in an effort to reveal structures in data. Insurance firms might use cluster analysis to investigate why certain locations are associated with particular insurance claims, for instance.
  6. Time series analysis: A statistical technique in which data in set time periods or intervals is analyzed to identify trends over time, such as weekly sales numbers or quarterly sales forecasting.
  7. Sentiment analysis: A technique that uses natural language processing, text analysis, computational linguistics, and other tools to understand sentiments expressed in data, such as how customers feel about a brand or product based on responses in customer forums. While the previous six methods seek to analyze quantitative or measurable data, sentiment analysis seeks to interpret and classify qualitative data by organizing it all into themes.
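
To make one of these techniques concrete, here is a minimal Monte Carlo sketch in Python. It uses only NumPy, the demand and cost figures are made up for illustration, and it estimates the probability that a product launch loses money when its inputs are uncertain.

```python
import numpy as np

rng = np.random.default_rng(42)
n_trials = 100_000

# Hypothetical uncertain inputs: units sold and unit cost aren't known exactly,
# so each trial samples them from assumed distributions.
units_sold = rng.normal(loc=10_000, scale=2_500, size=n_trials)
unit_cost = rng.uniform(low=8.0, high=12.0, size=n_trials)
unit_price = 14.0
fixed_costs = 30_000.0

profit = units_sold * (unit_price - unit_cost) - fixed_costs

# Repeated random sampling yields the probability of an outcome that would be
# hard to derive analytically from the uncertain inputs.
print(f"P(loss) = {np.mean(profit < 0):.1%}")
print(f"Median profit = ${np.median(profit):,.0f}")
```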

Data analytics tools

Data analysts use a range of tools to help them surface insights from data. Some of the most popular include:

  • Apache Spark: An open source analytics engine that processes big data across computing clusters.
  • AskEnola AI: A conversational analytics tool for business users.
  • Data analysis with ChatGPT: OpenAI’s chatbot can generate code to perform data analysis, transformation, and visualization tasks using Python.
  • dbt: An open source analytics engineering tool for data analysts and engineers.
  • Domo Analytics: A BI SaaS platform to gather and transform data.  
  • Excel: Microsoft’s spreadsheet software for mathematical analysis and tabular reporting. 
  • Julius AI: An AI assistant to analyze spreadsheets and databases.
  • Knime: A free and open source data cleaning and analysis tool for data mining.
  • Looker: Google’s data analytics and BI platform. 
  • MySQL: An open source relational database management system to store application data used in data mining.
  • Observable: A data analysis platform with AI tools for exploratory data analysis and data visualization.
  • Orange: A data mining tool ideal for smaller projects.
  • Power BI: Microsoft’s data visualization and analysis tool to create and distribute reports and dashboards. 
  • Python: An open source programming language popular among data scientists to extract, summarize, and visualize data (a brief sketch follows this list).
  • Qlik: A suite of tools to explore data and create data visualizations. 
  • R: An open source data analytics tool for statistical analysis and graphical modeling. 
  • RapidMiner: A data science platform that includes a visual workflow designer. 
  • SAS: An analytics platform for business intelligence and data mining. 
  • Sisense: A popular self-service BI platform. 
  • Tableau: Data analysis software from Salesforce to create data dashboards and visualizations.
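
To illustrate the extract-summarize-visualize workflow several of these tools support, here is a small pandas and Matplotlib sketch in Python; the CSV file and column names are hypothetical placeholders, not a real dataset.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical sales export; the file path and column names are placeholders.
df = pd.read_csv("sales.csv", parse_dates=["order_date"])

# Summarize: monthly revenue per region.
monthly = (
    df.assign(month=df["order_date"].dt.to_period("M"))
      .groupby(["month", "region"], as_index=False)["revenue"]
      .sum()
)
print(monthly.head())

# Visualize: one revenue line per region over time.
pivot = monthly.pivot(index="month", columns="region", values="revenue")
pivot.plot(kind="line", title="Monthly revenue by region")
plt.tight_layout()
plt.show()
```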

Data analytics vs. data science

Data analytics is a component of data science used to understand what an organization’s data looks like. Generally, the outputs of data analytics are reports and visualizations. Data science takes the output of analytics to study and solve problems. The difference between data analytics and data science is often about timescale: data analytics describes the current or historical state of reality, whereas data science uses that data to predict and/or understand the future.

Data analytics vs. data analysis

While the terms data analytics and data analysis are frequently used interchangeably, data analysis is a subset of data analytics concerned with examining, cleansing, transforming, and modeling data to derive conclusions. Data analytics includes the tools and techniques used to perform data analysis.

Data analytics vs. business analytics

Business analytics is another subset of data analytics. It uses data analytics techniques, including data mining, statistical analysis, and predictive modeling, to drive better business decisions. Gartner defines business analytics as solutions used to build analysis models and simulations to create scenarios, understand realities, and predict future states.

Data analytics examples

Organizations across all industries leverage data analytics to improve operations, increase revenue, and facilitate digital transformations. Here are three examples:

UPS transforms air cargo operations with data, AI: UPS’s Gateway Technology Automation Platform (GTAP) uses AI and digital asset tracking to reduce costs, improve on-time performance, and enhance operational safety at its Worldport air hub.

NFL leverages AI and predictive analytics to reduce injuries: The NFL’s Digital Athlete platform leverages AI and ML to run millions of simulations of in-game scenarios, using video and player tracking data to identify the highest risk of injury during plays, and develop individualized injury prevention courses.

Fresenius Medical Care anticipates complications with predictive analytics: Fresenius Medical Care, which specializes in providing kidney dialysis services, is pioneering the use of a combination of near real-time IoT data and clinical data to predict when kidney dialysis patients might suffer a potentially life-threatening complication called intradialytic hypotension (IDH).

Data analytics salaries

According to data from PayScale, the average annual salary for a data analyst is $70,384, with a reported range from $51,000 to $95,000. Salary data on similar positions include:

Job title | Salary range | Average salary
Analytics manager | $79,000 to $140,000 | $110,581
Business analyst, IT | $58,000 to $114,000 | $80,610
Data scientist | $73,000 to $145,000 | $103,441
Quantitative analyst | $74,000 to $161,000 | $109,421
Senior business analyst | $72,000 to $127,000 | $95,484
Statistician | $61,000 to $139,000 | $97,082

PayScale also identifies cities where data analysts earn salaries that are higher than the national average. These include San Francisco (24.2%), Seattle (10.2%), and New York (9.5%).

SAS puts ‘AI governance’ at the core of its agent strategy

April 30, 2026, 04:37

Enterprises are quickly moving past AI experimentation into real-world operations. But as agentic AI begins to make more decisions, invoke more tools, and operate across fragmented data environments, visibility, governance, and trust can erode.

SAS presented its answer to this problem at its annual event, SAS Innovate. The company unveiled a new family of products, including copilots, agent frameworks, Model Context Protocol (MCP) plugins, and management tools, saying it aims to help enterprises operate AI while retaining control.

“This is the moment of transition from AI that forms to AI that actually acts,” Marinela Profi, SAS’s global AI and generative AI market strategy lead, said at the event. “It is a significant leap that creates new requirements around trust, governance, and accountability.”

Interacting with agents more intuitively

SAS first unveiled SAS Viya Copilot, a conversational AI assistant embedded in its Viya platform. The tool operates under human oversight and integrates with Microsoft Foundry, helping users analyze data, build models, and make decisions using natural language within analytics workflows.

“With an expert-level assistant, users can execute tasks, ask questions, and navigate the full analytics lifecycle more easily,” Profi explained.

Viya Copilot provides Q&A across core applications, generation of explainable and documented AI code, model pipeline guidance (recommendations and next steps), conversational dashboards, and visual analysis using AI-powered search and alert narratives. According to SAS, these capabilities will later extend to data management, model management, and AI infrastructure.

Two copilots will be available initially. Asset and Liability Management (ALM) supports scenario development, execution and interpretation of financial risk workflows, and translation of natural-language inputs into analytic models. Health Clinical Data Discovery supports data analysis, cohort creation, and exploration of research papers and medical documents.

SAS plans to expand Viya Copilot to additional industries, including banking and manufacturing, by the end of this year.

Beyond embedded AI assistants, SAS also provides tools and infrastructure for connecting and governing internal and external agents. The newly announced SAS Viya MCP server standardizes connections so that external agents using various large language models (LLMs) or interfaces, such as Claude, GPT, and Gemini, can securely access SAS tools, data, and models without separate custom integrations.

“The copilot does more than answer questions; it can invoke capabilities across Viya in a more structured way,” Profi said.

SAS also provides code, interfaces, components, and best practices through its Agentic AI Accelerator, enabling teams at different skill levels, from developers to low-code and no-code users, to design, build, deploy, and manage agents in the SAS Viya environment.

Current Viya users can access both the MCP server and the AI Accelerator on GitHub.

Strengthening governance to preserve human judgment

SAS continues to emphasize the importance of oversight, trustworthy AI, and human-in-the-loop controls. As part of this strategy, the company unveiled SAS AI Navigator, a new SaaS-based tool.

The tool helps enterprises inventory their AI models, apply governance, and manage policies consistently. It will be available through the Microsoft Azure Marketplace in the third quarter of 2026 and will provide end-to-end visibility into every AI model and tool in the enterprise, including both in-house and third-party models. With it, enterprises can apply internal policies as well as external regulations and frameworks across their AI use.

“This tool not only gives visibility into AI assets, it also answers the fundamental question of how well we are doing,” said Reggie Townsend, SAS vice president of data governance and ethics.

Townsend stressed that because enterprises juggle factors such as reputation, efficiency, and cost at the same time, they need enough data to see it all at a glance. “Trust is now being recognized as a new business differentiator, even a kind of currency,” he added.

“The key is making responsible AI feel natural to adopt,” he said, explaining that AI governance plays an important role in preserving human judgment amid what he called “tech asymmetry,” the gap between the pace of technological progress and an organization’s ability to adapt.

“The technical capability is there, but enterprises struggle to apply it quickly at scale,” he added. “These capabilities need to be translated into sustainable business advantage.”

As AI capabilities and services expand rapidly, Townsend urged users to build sufficient literacy, approach AI with curiosity, and think critically about how the technology can be applied to business and to their personal lives.

“In an environment changing this fast, we need to set certainty aside for a moment,” he said. “Certainty breeds rigidity, and that can block the nuanced judgment we need right now.” He added, “The next phase of AI is about scaling that judgment, governing at speed, and turning trust into a competitive advantage.”

Trusted data will determine AI success

SAS also pointed to the complexity of enterprise data environments and to trust issues as major challenges. Alyssa Farrell, SAS industry market lead, explained that data is scattered across on-premises systems, legacy infrastructure, and private and public clouds, and that low trust in the data itself is leading to low trust in decisions. She noted that performance constraints are also holding back AI adoption.

To address this, SAS has overhauled SAS Data Management, its cloud-native data management portfolio built on Viya, adding or strengthening AI-ready data management, governance by design, agentic AI and copilot capabilities, and cloud-based analytics acceleration.

“The platform provides data lineage, transparency, and control within the entire workflow where data is accessed, prepared, and used,” Farrell explained.

“Agents and AI need more data than ever,” she said. “Especially when decision automation is introduced, it is critical to get data management right from the very beginning.”

The redesigned platform grounds AI in trusted data and turns raw data into a form ready for AI use. In particular, SpeedyStore, the company’s cloud-native analytical data platform, allows analytics and AI processing to run without moving the data, enabling efficient processing without large-scale data movement.

Enterprises still retain data sovereignty and can control workflows across their various data stores.

“We are giving customers everything they need to meet this moment, along with tools to use and manage their data and derive value from it,” Farrell said. “That lets enterprises confidently operationalize AI at scale.”
dl-ciokorea@foundryco.com

SAS makes AI governance the centerpiece of its agent strategy

April 28, 2026, 22:38

Enterprises are quickly moving from AI experimentation to deployment. However, when agentic AI begins making more decisions, invoking more tools, and operating across fragmented data environments, visibility, governance, and trust can erode.

SAS laid out its answer to that problem at its annual conference, SAS Innovate, introducing a new family of copilots, agent frameworks, Model Context Protocol (MCP) plugins, and management tools to help enterprises operationalize AI without losing control of it.

“What we’re seeing here is really a shift from AI that forms to AI that acts,” Marinela Profi, the company’s global AI and generative AI market strategy lead, said at the event. “This is a significant leap, because it introduces new requirements around trust, around governance, around accountability.”

Interacting with agents more intuitively

To begin with, SAS today announced SAS Viya Copilot, a human-governed, conversational AI assistant embedded in its Viya platform. It integrates with Microsoft Foundry and operates within analytics workflows, helping developers, data scientists, and other users instruct it in natural language to analyze data, build models, and make decisions across workflows.

“You have an expert assistant that allows you to take actions, ask questions, and help you navigate across the full analytical lifecycle,” Profi explained.

Its capabilities include general Q&A across core Viya applications, production of documented and explainable AI-generated code, model pipeline guidance (including recommendations and next steps), conversational dashboarding, and visual investigation with AI-assisted search and alert narratives. Copilot capabilities will eventually extend to data management, model management, and AI infrastructure, according to SAS.

The company is initially launching two Copilots: Asset and Liability Management (ALM), for developing scenarios, executing and interpreting financial risk workflows, and translating natural language inputs into analytic models; and Health Clinical Data Discovery, for analyzing data, creating cohorts, and investigating research papers and other medical documents.

SAS plans to expand Viya Copilot into additional industries, including banking and manufacturing, later this year.

Going beyond embedded AI assistants, SAS is providing tools and infrastructure to connect and govern internal and external agents. The new SAS Viya MCP server standardizes connections so external agents can safely access SAS tools, data, and models, using the large language model (LLM) or interface of their choice (Claude, GPT, Gemini), without having to create custom integrations, duplicate logic, or bypass controls.
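
SAS has not documented the Viya MCP server’s specific tools here, so the following Python sketch only illustrates the generic JSON-RPC message shapes any MCP client exchanges with any MCP server: first discovering the tools a server exposes, then invoking one. The tool name and arguments are hypothetical placeholders, not real SAS endpoints.

```python
import json

# MCP requests are JSON-RPC 2.0 messages. An agent first asks what the server exposes...
list_tools = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# ...then calls a tool by name with structured arguments. The tool name and
# arguments below are placeholders, not documented SAS Viya tools.
call_tool = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "score_model",  # hypothetical
        "arguments": {"model_id": "churn_v3", "table": "CUSTOMERS"},
    },
}

for msg in (list_tools, call_tool):
    print(json.dumps(msg, indent=2))
```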

“The Copilot is not only answering questions for you, it can invoke capabilities across Viya in a more structured way,” Profi said.

In addition, a new Agentic AI Accelerator provides a collection of code, interfaces, components, and best practices that allow teams across skill levels (developers, low-code or no-code users) to design, build, deploy, and manage agents within SAS Viya, she explained.

Current Viya users can access both the MCP server and AI Accelerator via GitHub.

Maintaining human judgment

SAS continues to emphasize the importance of oversight, trustworthy AI, and human-in-the-loop control.

Furthering this mission, the company is introducing SAS AI Navigator. The Software-as-a-Service (SaaS) tool helps enterprises inventory, govern, and apply policies to underlying AI models.

Available in Q3 2026 on Microsoft Azure Marketplace, the platform will offer an end-to-end view of all AI models and tools in use in an enterprise, whether built in-house or provided by third parties. Using it, enterprises will be able to apply internal policies and external regulations and frameworks to AI use cases.

“It’s giving visibility into your AI inventory,” Reggie Townsend, VP of SAS’ data governance and ethics practice, said at today’s event. “But it also answers the really basic question: How are we doing?”

Enterprises want “enough data at a glance” to consider tension points when they’re juggling factors like reputation, efficiency, and cost, he pointed out. They’re also viewing trust as a new business differentiator, even as a currency. 

Navigator started with a really simple idea, he noted: “What happens if we can make being responsible irresistible?” AI governance is one way to preserve human judgment amidst what he called “tech asymmetry.”

Technology unevenness has been a long-standing problem: while there’s strong technical capability, enterprises struggle to adapt to the pace of change at scale. “What folks need to do is try to translate some of these capabilities into a sustainable business advantage,” said Townsend.

As AI capabilities (and offerings) continue to expand, he urged users to gain “sufficient literacy,” approach AI with curiosity, and think critically about how evolving tools can apply to both business and personal life.

“In an emerging landscape like this, we’ve got to suspend certainty,” he said. “Certainty breeds rigidity, and rigidity suspends this idea of nuanced judgment, which we need right now.”

The next chapter of AI is about scaling that judgment, governing at speed, and turning trust into that competitive advantage, he emphasized.

Getting to the right enterprise data

Enterprise data can be fragmented across many different ecosystems (on-prem, in legacy infrastructure, or in private or public clouds), noted SAS industry market lead Alyssa Farrell. Beyond that, she said, “[enterprises] have low trust in the data itself, which is leading to low trust in decisions.” Further, performance constraints can hamper AI progress.

To address these issues, SAS today announced a targeted refresh of SAS Data Management, its cloud-native portfolio built on the Viya platform, adding or expanding its AI-ready data management, governance by design, agentic AI and copilots, and cloud-native analytic acceleration. It provides lineage, transparency, and control capabilities within workflows where data is accessed, prepared, and activated, Farrell explained.

“Agents and AI crave data more than ever before,” she said. “It’s really important that organizations get this right from the beginning, especially if they’re adding automation to that decision process.”

The re-architected platform grounds AI in trusted data, making raw data assets usable for AI. Notably, it brings analytics and AI to the data itself through SpeedyStore, the company’s cloud-native analytical data platform, negating the need to move volumes of data for processing, Farrell explained. Enterprises still retain digital sovereignty and can control workflows across their various data stores.

“We’re making sure our customers have everything they need to meet this moment [and] tools that access the data, manage the data and gain value from it,” Farrell noted. “They can really proceed at scale to operationalize AI with confidence.”

AI won’t fix your data problems. Data engineering will

April 28, 2026, 09:00

Most enterprise AI investments today focus on models, compute and tooling. The assumption is that intelligence is the binding constraint and that a more capable model will produce better outcomes across every dimension that matters. This is a reasonable starting point, but it is also where most initiatives go wrong.

The models organizations are deploying were trained on public data at scale. None of your internal systems, customer schema, pricing logic or support taxonomy appeared in that training.

When a model encounters your internal data, it processes it as best it can, but without the grounding that comes from having been trained on it. Early AI initiatives are struggling not because the models are weak, but because the context they need to operate reliably inside your organization is something they have never seen before.

Data engineering holds the key to this context.

Why context breaks first

Think about what an AI agent handling a support escalation needs to function well: The customer’s support history across time, not just the most recent ticket. Billing records matter too, because the character of a problem often depends on what the customer is paying for and whether anything has changed recently. Product usage data is equally essential, as what a customer reports is frequently explained by how they have been using the product. None of these things live in a single place, as they are scattered across systems that were each built by different teams, on different timelines, with different definitions of what a customer record is supposed to capture.

Human agents work around these gaps through judgment developed over time. They know which system to trust for a particular type of question, they know the usage data runs six hours behind and they know how to weigh conflicting signals based on context that is never written down anywhere. AI systems do not have that judgment. They process whatever they receive and act on it, which means that when the context is inconsistent or incomplete, the output reflects that, not as a visible error but as a subtly wrong decision. The customer notices before anyone on your team does.

When bad data stops being annoying and starts being operational

In the analytics era, data quality problems surfaced as numbers that looked off in dashboards. Analysts were the error-detection layer, and when something looked wrong, they would investigate, find the issue and get it fixed. The feedback loop was slow, but it existed, and it caught most problems before they reached the business in any consequential way.

AI agents making operational decisions do not have that buffer. They have no way of knowing that a schema migration introduced silent gaps or that a pipeline is running four hours late. Refunds go out incorrectly because the billing context was incomplete at the moment of decision.

What an analytics team could absorb as an occasional anomaly in a report becomes a real problem when an automated system acts on degraded context hundreds of times a day before anyone identifies the pattern. The volume is what makes it dangerous, and by the time it surfaces, the damage is already distributed across thousands of interactions.

The role data engineers play now

For the past decade, data engineering meant building pipelines that fed warehouses so analysts could query data and produce dashboards. The work was foundational but treated as background infrastructure, and its value was measured in pipeline reliability, query performance and reporting freshness.

The agent era changes the purpose of that work entirely. When AI systems make operational decisions, the goal is no longer producing data that is queryable. The goal is producing context that is reliable enough for a system to act on, and those are different problems with different requirements. That starts with entity resolution across systems, so that every data source that touches a customer or an account resolves to one consistent, trustworthy answer.

This also means handling late-arriving data explicitly, because agents cannot act on a state of the world that no longer holds. Freshness thresholds need to be calibrated to the decision type, since a personalization recommendation can tolerate six-hour-old usage data in ways that a refund workflow cannot. Lineage needs to survive schema changes and reorganizations, so that the provenance of any piece of context can be traced when something goes wrong.
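
As a minimal sketch of what calibrating freshness to the decision type can look like in code (the budget values and decision types below are illustrative assumptions, not a standard), assume every context record carries a timestamp for when it was last refreshed:

```python
from datetime import datetime, timedelta, timezone

# Maximum tolerable staleness per decision type (illustrative values only).
FRESHNESS_BUDGET = {
    "personalization": timedelta(hours=6),
    "support_escalation": timedelta(minutes=30),
    "refund": timedelta(minutes=5),
}

def context_is_fresh(decision_type: str, last_refreshed: datetime) -> bool:
    """Return True if the context is recent enough for this decision type."""
    age = datetime.now(timezone.utc) - last_refreshed
    return age <= FRESHNESS_BUDGET[decision_type]

# A refund agent should refuse to act on three-hour-old billing context and
# escalate to a human, while a recommender can proceed with the same data.
billing_snapshot = datetime.now(timezone.utc) - timedelta(hours=3)
print(context_is_fresh("refund", billing_snapshot))           # False
print(context_is_fresh("personalization", billing_snapshot))  # True
```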

None of that is a model problem, nor does it yield to prompt engineering. This is data engineering work, and organizations that treat it as anything else will spend a long time debugging production failures that look like AI problems but are infrastructure problems.

Context is only half the problem

Getting the right information to an agent is necessary, but it is not sufficient. There is a second challenge that most organizations have not yet confronted: How do you coordinate, govern and operate dozens or hundreds of autonomous agents making real decisions across your business?

Agent frameworks handle reasoning well. What they do not handle is everything around the agent: Scheduling when it runs, controlling what it is allowed to spend, enforcing who can approve its decisions, managing retries when external systems fail and ensuring that when an agent needs human sign-off, it does not tie up compute for hours while it waits. These are not AI problems. They are operational infrastructure problems, and they are the same class of problems that orchestration platforms have been solving for data pipelines for over a decade.

One agent answering questions in a sandbox is a proof of concept. Fifty agents making operational decisions across finance, compliance and customer operations is a fleet management problem, and it requires the same kind of scheduling, governance, cost controls and auditability that enterprises already demand from their data infrastructure.

Orchestration is typically the one layer that already has visibility across platforms, spanning your warehouse, your transformation layer, your external APIs and your operational databases. That cross-platform vantage point is what makes it possible to build a context layer that is comprehensive rather than siloed.

Governance needs to execute at runtime, not live in documentation. Policies about data access, cost limits and human approval requirements need to be enforced in code as agents run, not described in guidelines that agents cannot read and humans forget to follow.
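
One way that runtime enforcement can look, sketched under assumed policy values and a simplified action model (nothing here is a real product API), is a gate the orchestration layer evaluates before every agent action:

```python
from dataclasses import dataclass

@dataclass
class AgentAction:
    agent: str
    kind: str              # e.g. "refund", "query", "email"
    estimated_cost: float  # compute/API spend for this action, in dollars
    amount: float = 0.0    # business amount involved, if any

# Illustrative policy constants, expressed as code the orchestrator runs
# rather than guidance written in documentation.
DAILY_SPEND_LIMIT = 50.0           # per-agent compute/API budget
REFUND_APPROVAL_THRESHOLD = 200.0  # refunds above this need a human

def authorize(action: AgentAction, spent_today: float) -> str:
    if spent_today + action.estimated_cost > DAILY_SPEND_LIMIT:
        return "deny: daily spend limit exceeded"
    if action.kind == "refund" and action.amount > REFUND_APPROVAL_THRESHOLD:
        # Park the action for sign-off instead of blocking compute while waiting.
        return "hold: route to human approver"
    return "allow"

print(authorize(AgentAction("support-bot", "refund", 0.02, amount=320.0), 12.5))
print(authorize(AgentAction("support-bot", "query", 0.01), 12.5))
```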

What this means going forward

The organizations that deploy AI agents at scale will have invested in two things before those agents reach production.

First, a context layer that gives agents a reliable, cross-platform understanding of the enterprise’s data. This means not just raw access to tables, but semantic knowledge of what the data means, where it comes from and how much to trust it.

Second, an operational layer that governs how agents act, with the scheduling, cost controls, auditability and human-in-the-loop checkpoints that enterprise deployment demands.

These two investments are not independent. They form a flywheel. Better context makes agents more effective, which drives broader adoption, which generates richer operational metadata, which deepens the context layer further.

Data engineers are becoming the people who determine whether automated decisions are trustworthy, not because they control the models but because they control both the context on which those models operate and the infrastructure through which they act. The organizations that understand this early will keep building on it. The ones that keep treating data engineering and orchestration as background infrastructure will keep rediscovering the same production failures, just with different names on the postmortem each time.

This article is published as part of the Foundry Expert Contributor Network.

Converged analytics is the refinery for the age of sovereign AI and data

April 27, 2026, 09:07

“Data is the new oil” is one of the most overused phrases in enterprise technology. Yet it still captures something fundamentally true about the modern enterprise, if we extend the analogy.

Crude oil has limited value until it is refined into the fuels, chemicals, plastics, polymers, synthetic fibers, and industrial materials that power entire societies and permeate nearly every aspect of modern life. Similarly, the real value of data does not lie in its raw accumulation but in its transformation, through systems, into decisions, intelligence, and operational impact.

In this context, converged analytics has emerged as the refinery of the data economy. Organizations that lead will be those with the most effective refining layer. 

Traditional analytics architectures evolved in silos and are no longer compatible with a dynamic AI world

Over the past decade, enterprises have invested heavily in extracting, storing, and moving data. Data lakes, warehouses, streaming platforms, and cloud pipelines have created an unprecedented accumulation of information. And yet only 13% of enterprises globally are successfully achieving ROI from their AI initiatives. 

“Enterprises now sit on massive reserves of structured, semi-structured, and unstructured data generated by applications, devices, and digital interactions. Yet despite this abundance, many CIOs still struggle to translate data into consistent, real-time business value. The issue is not scarcity—it is fragmentation,” says Quais Taraki, CTO of EnterpriseDB (EDB).

The value of data is trapped when it’s siloed and spread across systems and teams. 

Transactional systems were optimized for operational workloads. Analytical systems were built for reporting and historical analysis. Streaming systems handled real-time events. Each requires different infrastructure, tools, and governance models. Data has to be copied, moved, transformed, and reconciled across environments before it can be used. This introduces latency, complexity, duplication, and risk. Insights often arrive too late to influence outcomes, while operational systems remain disconnected from analytical intelligence.

Converged analytics solves the largest challenge for AI-ready data

What makes crude oil valuable is not extraction alone but its combination with the refinery—the integrated industrial system that processes, synthesizes, and upgrades raw hydrocarbons into usable products. 

Comparable in the world of enterprise technology is converged analytics, which addresses data systems fragmentation by unifying capabilities into a single, sovereign architectural paradigm. It brings together transactional processing, analytical processing, and streaming-data handling within a cohesive system. 

“Instead of moving data across multiple specialized platforms, converged analytics enables computation to occur where the data resides, across different workloads and time horizons. This integration collapses latency, reduces duplication, and preserves context, allowing organizations to move from retrospective analysis to real-time decision-making,” says Taraki of EDB.

AI raises the stakes 

While generative AI and now agentic AI have captured executive attention, their effectiveness depends on access to fresh, well-governed, and contextually rich data. Models trained on stale or fragmented datasets deliver limited value. 

Converged analytics provides the foundation for continuous data pipelines, real-time feature engineering, and low-latency inference. It enables architectures such as retrieval-augmented generation and supports ongoing feedback loops that improve model performance over time. In this sense, it is not just complementary to AI; it is a prerequisite for operationalizing it at scale.
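
To ground the retrieval-augmented generation point, here is a deliberately minimal Python sketch of the retrieval step. A production system would use vector embeddings and a governed store on the data platform; keyword overlap and an in-memory list stand in here so the example runs with no dependencies, and the documents and question are invented.

```python
# Fresh, governed records the platform would expose to the model.
documents = [
    "Order 1042 was refunded on 2026-04-20 due to a billing error.",
    "Service tier Gold includes 24/7 support and a 4-hour response SLA.",
    "Customer ACME upgraded from Silver to Gold on 2026-03-02.",
]

def score(doc: str, question: str) -> int:
    """Toy relevance score: shared words; a real system would compare embeddings."""
    return len(set(doc.lower().split()) & set(question.lower().split()))

def retrieve(question: str, k: int = 2) -> list:
    return sorted(documents, key=lambda d: score(d, question), reverse=True)[:k]

question = "What support does the Gold tier include?"
context = "\n".join(retrieve(question))
prompt = (
    "Answer using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {question}"
)
print(prompt)  # in production, this augmented prompt goes to the generative model
```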

AI also intensifies the cost of fragmentation. 

“Every time data must be copied, moved, or reconciled across specialized systems, organizations introduce latency, duplication, and loss of context,” says Taraki. 

Converged analytics reduces that friction by enabling computation closer to where data already resides, allowing decisions to happen in real time rather than after the fact.

Converged analytics offers non-AI and data companies a pathway to increased relevance and value

Unlike point solutions that address isolated parts of the data pipeline, converged analytics platforms sit at the center of the entire data lifecycle. They intersect with storage, compute, networking, and security, making them a natural integration point for a wide range of technologies. 

For hardware vendors, this creates demand for high-performance infrastructure capable of handling mixed workloads with low latency and high throughput. For service providers, it opens the door to long-term engagements around platform design, deployment, optimization, and governance.

Converged analytics workloads are not peripheral use cases; they are core to business performance. Real-time fraud detection, predictive maintenance, personalized customer experiences, and supply chain optimization all depend on the ability to process and act on data as it is generated. These workloads are both compute intensive and mission critical, making converged analytics an especially valuable category for vendors seeking to align with enterprise priorities.

The shift toward hybrid and edge computing environments adds another dimension to the opportunity. As enterprises distribute workloads across cloud, on-premises, and edge locations, the need for consistent analytics capabilities across these environments becomes critical. 

Converged analytics platforms are increasingly designed to operate seamlessly across this spectrum, enabling data to be processed and acted upon wherever it is generated. This creates additional insertion points for both hardware and services vendors, from edge devices and accelerators to orchestration, lifecycle management, and ongoing operational support.

Making it work at enterprise scale

In the early stages of the oil industry, value was concentrated in extraction. Over time, it shifted to refining and distribution, with efficiency, scale, and integration determining competitive advantage. The same transition is now underway in the data economy. Enterprises already possess vast reserves of data; the differentiator will be their ability to refine it rapidly, efficiently, and in context.

Converged analytics represents that refining capability. It is why hardware vendors are optimizing for data-intensive workloads and why services firms are reorganizing around platform engineering. But the practical reality is that this refining layer cannot succeed as software alone. It depends on the hardware, services, support, and operational expertise required to deploy and run it at scale.

For CIOs, this is no longer just a question of architecture. It is a prerequisite for making data a true driver of business value.

Google pitches Agentic Data Cloud to help enterprises turn data into context for AI agents

April 23, 2026, 13:55

Google is recasting its data and analytics portfolio as the Agentic Data Cloud, an architecture it says is aimed at moving enterprise AI from pilot to production by turning fragmented data into a unified semantic layer that agents can reason over and act on more reliably at scale.

The new architecture builds on Google’s existing data platform strategy, bringing together services such as BigQuery, Dataplex, and Vertex AI, and elevating their capabilities in metadata, governance, and cross-cloud interoperability into what the company describes as a shared intelligence layer.

That intelligence layer strategy is underpinned by the new Knowledge Catalog, an evolution of Dataplex Universal Catalog, that the company said uses new capabilities to extend its metadata foundation into a semantic layer mapping business meaning and relationships across data sources.

These capabilities include native support for third-party catalogs, applications such as Salesforce, Palantir, Workday, SAP, and ServiceNow, and the option to move third-party data to Google’s lakehouse, which automatically maps the data to Knowledge Catalog.

To capture business logic more directly for data stored inside Google Cloud, the company is adding tools including a LookML-based agent, currently in preview, that can derive semantics from documentation, and a new feature in BigQuery, also in preview, that allows enterprises to embed that business logic for faster data analysis.

Beyond aggregation, the catalog itself is designed to continuously enrich semantic context by analyzing how data is used across an enterprise, senior Google executives wrote in a blog post.

This includes profiling structured datasets as well as tagging and annotating unstructured content stored in Google Cloud Storage, the executives pointed out, adding that the catalog’s underlying system can also infer missing structure in data by using its Gemini models to generate schemas and identify relationships.

Turning data into business context is the next battleground for AI

For analysts, Google’s focus on semantics targets one of the biggest barriers to production AI for enterprises.

“The hardest AI problem is inconsistent meaning,” said Dion Hinchcliffe, lead of the CIO practice at The Futurum Group, noting that a unified semantic layer could help CIOs establish consistent business context across systems while reducing the need for developers to manually stitch together metadata and lineage.

That focus on semantic context also reflects a broader shift in how hyperscalers are approaching enterprise AI. Microsoft with Fabric IQ and AWS with Nova Forge are pursuing similar strategies, building semantic context layers over enterprise data to make AI systems more consistent and easier to operationalize at scale.

While Microsoft’s approach is to wrap AI applications and agents with business context and semantic intelligence in its Fabric IQ and Work IQ offerings, AWS wants enterprises to blend business context into a foundational LLM by feeding it their proprietary data.

Mike Leone, principal analyst at Moor Insights and Strategy, said Google’s approach, though closer to Microsoft’s, places the data gravity one layer above the lakehouse, within its data catalog and semantic graph capabilities.

“Google and Microsoft are solving the same problem from different angles, Fabric through a unified data foundation and Google through a unified semantic and context layer,” Leone said.

Even data analytics software vendors are converging on the idea of offering a catalog that can map semantic context from a variety of data sources, Leone added, pointing to Databricks’ Unity Catalog and Snowflake’s Horizon Catalog.

Semantic accuracy could pose challenges for CIOs

However, Google’s approach to building an intelligent semantic layer, especially its evolved Knowledge Catalog, comes with its own set of risks for CIOs.

The new catalog’s automated semantic context refinement capability, according to Jim Hare, VP analyst at Gartner, could amplify governance challenges, especially around metadata management: “In complex enterprise domains, errors in inferred relationships or definitions will require ongoing human domain oversight to maintain trust.”

Hare also warned of operational and cost management challenges.

“Agent-driven workflows spanning analytical and operational data, potentially across clouds, will introduce new challenges in observability, debugging, and cost predictability,” he said. “Dynamic agent behavior can generate opaque consumption patterns, requiring chief data and analytics officers (CDAOs) to closely manage cost attribution, usage limits, and operational guardrails as these capabilities mature.”

Adopting Google’s new architectural approach could increase dependence at the orchestration layer, resulting in issues around portability, he warned: “Exiting Google-managed semantics, Gemini agents, or BigQuery abstractions may be harder than migrating data alone.”

Bi-directional federation as strategic play

Even so, the trade-offs may be acceptable for enterprises prioritizing tighter data integration over flexibility.

As part of the new architecture, Google is also offering cross-platform data interoperability via the Apache Iceberg REST Catalog, which it says will allow bi-directional federation, in turn letting enterprises access, query, and govern data across environments such as Databricks, Snowflake, and AWS without moving data or incurring egress fees.
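
For readers unfamiliar with the mechanism, an Iceberg REST catalog is simply an HTTP endpoint that any Iceberg client can point at. The PyIceberg sketch below shows the general shape of that federation; the URI, token, and table name are placeholder assumptions for illustration, not Google’s actual endpoint or property names.

```python
from pyiceberg.catalog import load_catalog

# Placeholder endpoint and credentials; the real REST catalog URI and auth
# properties come from whichever platform hosts the catalog.
catalog = load_catalog(
    "analytics",
    **{
        "type": "rest",
        "uri": "https://example.com/iceberg/rest",
        "token": "REDACTED",
    },
)

# Browse namespaces and open a table's metadata without copying any data.
print(catalog.list_namespaces())
table = catalog.load_table("sales.orders")
print(table.schema())
```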

For Stephanie Walter, practice leader of the AI stack at HyperFRAME Research, this interoperability will be strategically important for enterprises scaling agents in production, especially ones that have heterogenous data environments.

Moor Insights and Strategy’s Leone, though, sees it as a different strategic play to address enterprises’ demand to access Databricks, Snowflake, and hyperscaler environments without costly data movement.

Google’s Agentic Data Cloud architecture also includes a Data Agent Kit, currently in preview, which the company says is designed to help enterprises build, deploy, and manage data-aware AI agents that can interact with governed datasets, apply business logic, and execute workflows across systems.

Robert Kramer, managing partner at KramerERP, said the Data Agent Kit will help data practitioners abstract away daily tasks, in turn lowering the barrier to operationalizing agentic AI across workflows.

However, Gartner’s Hare warned that enterprises should guard against over-delegating critical data management decisions to automated agents without sufficient observability, validation controls, and human review, particularly where downstream AI systems depend on these agents for continuous data operations.

This article first appeared on InfoWorld.

Snowflake offers help to users and builders of AI agents

April 21, 2026, 10:16

Snowflake is enhancing Snowflake Intelligence and Cortex Code to create a unified experience connecting enterprise systems, data sources, and AI models with Snowflake data. It’s part of the company’s vision to become the control plane for the agentic enterprise, enabling enterprises to align data, tools, and workflows with AI agents built on its platform.

With these updates, the company said, Snowflake Intelligence becomes an adaptable personal work agent for business users, and Cortex Code expands as a builder layer for enterprise AI that provides governed, data-native development.

Enhancements to Snowflake Intelligence include automation of routine tasks described in natural language, new Model Context Protocol (MCP) connectors, and reusable artifacts that let users save and share analyses, visualizations, and workflows, all of which will be generally available “soon.” In addition, a new iOS mobile app and multi-step reasoning with deep research, which uses an agentic architecture to reason across data, will soon be in public preview.

The company said all of these updates stemmed from customer feedback, as well as from insights gleaned from Project SnowWork, last month’s preview of an autonomous AI layer for its data cloud.

Cortex Code now supports additional external data sources, including AWS Glue, Databricks, and Postgres, connectivity with other AI agents via MCP and Agent Communication Protocol (ACP), a Claude Code plugin, and a new agent software development kit with support for Python and TypeScript. There are also enhancements to Cortex Code in Snowsight, Snowflake’s web interface, including Plan Mode to allow developers to preview and approve workflows, and Snap & Ask to enable interaction with data artifacts such as charts and tables.

Snowflake also announced the private preview of Cortex Code Sandboxes in Snowsight, a dedicated cloud environment where developers can execute code end-to-end with no setup.

Michael Leone, VP & principal analyst at Moor Insights & Strategy, thinks the roadmap is “ambitious,” noting the number of items announced that are “coming soon” or are in public preview. “These announcements are starting to blur together, with almost every vendor claiming their agents can reason, act, and transform the business,” he said, adding, “What makes this one worth slowing down on, at least for me, is that Snowflake is going after both halves of the enterprise at the same time. Intelligence is built for the business users who want answers and actions without writing SQL, and Cortex Code is built for the builders who actually have to put this into production.”

Most vendors pick one target, users or builders, and come back to the other later, he said, but Snowflake is putting both on the same governed data foundation. “[This] is a harder engineering problem, but I’d argue it’s a cleaner answer to the question enterprises are actually asking, which is how to open AI up to more people without losing control of the data underneath,” he said, noting that Snowflake has changed its approach from “let’s do it inside Snowflake,” to realizing that agentic AI only works if it’s interoperable with the rest of the stack.

Igor Ikonnikov, advisory fellow at Info-Tech Research Group, also sees the control plane play as part of an industry trend. “As always, the devil is in the details: what those platforms are composed of and how they offer to control AI agents,” he said. “Most platforms are built the old-fashioned way: All the controls are coded. Snowflake speaks about reusable analytics through saving the whole solution and reusing complete modules or models. It means that common semantics are still buried inside database models and code.”

All AI vendors are motivated by the same demand from the market, he said: “Move from Copilot-based generic chatbots to business-purpose-specific AI agents that understand business logic and can interact with one another.” With these updates, he sees Snowflake as having caught up with the competition, but not yet surpassing it.

Sanjeev Mohan, principal at SanjMo, said, “The good news for customers is the support for Databricks and AWS Glue. What Snowflake is saying is that even if your data lives in a competitor’s system, Snowflake AI coding agent can be used. And vice versa, the VS Code extension and Claude Code plugin can be used on Snowflake data. In other words, it reduces vendor lock-in fears.”

It’s also the right strategic direction, said Sanchit Vir Gogia, chief analyst at Greyhound Research. “Enterprise AI is moving from generation to orchestration to execution, and Snowflake’s focus on governed data as the foundation for action aligns with that shift,” he said.

“However, becoming the execution layer for enterprise AI requires more than integrating agents and expanding tooling,” he said. It also requires consistent semantics, reliable cross-system execution, strong governance, economic viability, and organisational readiness, as well as overcoming a structural constraint. “Control without ownership of the systems where work is executed introduces dependency that is difficult to fully resolve. This is the central tension in Snowflake’s strategy and will define how far it can realistically extend its influence,” he said. “Snowflake has taken a meaningful step in that direction. It has not yet proven that it can deliver this at scale. At this stage, it is one of the most credible contenders in a race that will be defined not by who builds the smartest AI, but by who can make that AI work reliably inside the enterprise.”

This article first appeared on InfoWorld.

The next-generation observability architecture: Lessons from a decade of event-scale systems

April 14, 2026, 07:00

Revenue dips. Latency spikes. Alerts fire. The dashboards look fine – until they don’t.

Slack explodes. Ten engineers become 20. Queries multiply. Everyone starts scanning raw event data at once. And then the system starts to buckle. Right when you need it most.

Over the past decade, I’ve worked on large-scale, real-time analytics systems for massive, bursty workloads. First in ad tech and more recently in observability. Across very different domains, the same failure pattern tends to emerge. Platforms that perform well under normal, steady-state conditions degrade under investigative load.

In many cases, this isn’t simply a matter of tuning or operational discipline. It reflects architectural assumptions. Most observability platforms were designed for detection-oriented workloads and not the unpredictable, exploratory way humans investigate incidents in real time.

Where the architecture breaks

Many observability platforms are built around a core assumption that queries will follow normal, predictable patterns. Dashboards, alerts and saved searches reflect known questions about the system.

But incidents aren’t predictable.

During an investigation, workloads shift instantly. Queries become exploratory. Time ranges expand. Filters change constantly. Concurrency spikes as multiple teams dig into the same data.

Architectural assumptions that work well in steady state can begin to show strain. Index-centric systems perform well on known paths. Step outside them, and performance drops quickly. Sub-second queries turn into minutes, concurrency falls off and costs rise.

Over time, teams may begin to limit the scope of analysis or to export data to other systems simply to maintain responsiveness.

This dynamic isn’t primarily about features. It reflects a structural mismatch between how many systems are designed and how investigations actually unfold.

What “event-native” actually means

Over the past decade, several large-scale real-time analytics systems — including Apache Druid, something I’ve been intimately engaged with — were designed to handle highly bursty, event-driven workloads.

These environments required a different architectural model.

Rather than optimizing around predefined views or tightly coupled indexing structures, event-native systems treat raw, immutable events as the primary unit of storage and analysis. Every request, error and interaction is preserved as an event and remains available for exploration.

Data is stored in column-oriented formats designed for large-scale scanning and high-cardinality queries. Instead of shaping the data upfront for specific access patterns, the system is built to support evolving questions directly against the event stream.

The difference becomes clear during an incident.

Imagine a latency spike affecting a subset of users. Engineers may need to pivot across user ID, region, service version or request path — combining dimensions that were not anticipated in advance.

In an event-native system, those pivots can occur directly against stored event data without rebuilding indexes or reshaping datasets for each new question. Multiple teams can run these queries concurrently, even across large time ranges, without the system degrading.

That’s the core shift: you’re no longer constrained by how the data was modeled upfront. You can investigate what actually happened, in real time, at scale.
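
As a small illustration of that kind of ad-hoc pivot against column-oriented event storage, here is a hedged DuckDB sketch in Python; the Parquet path, column names, and time window are hypothetical stand-ins for whatever the event store actually contains.

```python
import duckdb

# Slice a latency spike by region and service version directly against raw,
# columnar event files -- no pre-built index or rollup required.
# "events/*.parquet" and the column names are placeholders.
result = duckdb.sql("""
    SELECT region,
           service_version,
           count(*)                        AS requests,
           quantile_cont(latency_ms, 0.99) AS p99_latency_ms
    FROM 'events/*.parquet'
    WHERE event_time >= now() - INTERVAL '30 minutes'
    GROUP BY region, service_version
    ORDER BY p99_latency_ms DESC
    LIMIT 10
""")
print(result)
```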

Cloud economics changed the rules, but architectures stayed the same

Many observability architectures were designed for an era when storage was fixed (and expensive). That’s no longer the case. In the cloud, storage is abundant and cheap. Compute is elastic, which is often the real cost driver. You can store years of event data in object storage at a fraction of the cost of running always-on compute clusters. Yet many observability platforms still tightly couple storage, indexing and query compute as if nothing changed.

What does this mean in practice? You pay peak compute prices just to keep data available and accessible. This forces observability teams into bad trade-offs between cost, retention and performance.

All-in-one observability platforms can be powerful, but they’re also rigid. When storage and compute scale together, you lose control over economics.

Monolithic architectures shine in steady state, but when incidents are triggered, they quickly become painfully expensive, painfully slow or both.

Why observability needs a dedicated data layer, not another all-in-one platform

For years, consolidation has been a common response in observability – one more all-in-one platform promising simplicity.

That approach can reduce surface complexity in the short term. Over time, however, tightly coupled systems can limit flexibility. As scale increases, storage, compute and visualization begin to compete for resources inside the same architecture.

Business intelligence learned this lesson decades ago. What started as tightly coupled stacks separated into a modular architecture where storage, transformation and visualization became independent layers. That separation created leverage, and companies like Snowflake, Databricks, Fivetran and Tableau emerged by focusing on distinct parts of the stack.

Each layer could innovate independently. Storage could scale without changing dashboards or workflow, compute engines could evolve without changing ingestion and visualization tools could compete on experience rather than infrastructure.

Observability is next.

One architectural response is the introduction of a purpose-built data layer that sits beneath existing observability tools such as Splunk, Grafana or Kibana. By separating data storage from interaction and analysis, organizations can retain large volumes of telemetry while scaling compute based on investigative demand.

It means longer retention without constant peak compute costs. It means bursty, investigative workloads don’t collapse the system and multiple teams can dig into the same event stream without stepping on each other. It aligns the architecture with how observability admins and engineers actually work during incidents.

And critically, it treats observability as a data infrastructure problem, not just a tooling problem.

This shift breaks the lock between data and tools

In tightly integrated observability platforms, data is often bound to a specific query engine or user interface. That coupling can simplify adoption, but it also limits long-term flexibility. Storage decisions, retention policies and performance characteristics become tied to a single vendor’s architecture.

When the underlying event data layer is open, durable and scalable, organizations gain optionality. The same telemetry can be analyzed across multiple tools. Retention strategies can evolve independently of dashboards. New query engines or visualization systems can be adopted over time without migrating years of historical data.

That’s why new architectural patterns are emerging in large-scale deployments – systems designed for unpredictable query shapes and deep exploratory analysis, with architectures that separate storage, compute and indexing and treat observability as a data problem first.

When data is stored in open, scalable systems rather than locked inside a single platform, organizations gain flexibility. They can analyze the same data across multiple tools, adopt new technologies over time and avoid being constrained by the limitations or cost structures of any one vendor.

What the next decade of observability will look like

Telemetry volumes will continue to grow. Distributed systems introduce more surface area. AI workloads generate additional signals and amplify data scale. Investigations are becoming more collaborative and more exploratory.

In that environment, the defining characteristic of observability systems will not be the number of features they expose, but the architecture beneath them.

When Slack explodes and dashboards slow down (or stop answering the right questions altogether), the architecture underneath will determine whether teams find the root cause in minutes or watch the system buckle all over again.

This article is published as part of the Foundry Expert Contributor Network.


Investigating multi-vector attacks in Log Explorer

10 de Março de 2026, 10:00

In the world of cybersecurity, a single data point is rarely the whole story. Modern attackers don’t just knock on the front door; they probe your APIs, flood your network with "noise" to distract your team, and attempt to slide through applications and servers using stolen credentials.

To stop these multi-vector attacks, you need the full picture. By using Cloudflare Log Explorer to conduct security forensics, you get 360-degree visibility through the integration of 14 new datasets, covering the full surface of Cloudflare’s Application Services and Cloudflare One product portfolios. By correlating telemetry from application-layer HTTP requests, network-layer DDoS and Firewall logs, and Zero Trust Access events, security analysts can significantly reduce Mean Time to Detect (MTTD) and effectively unmask sophisticated, multi-layered attacks.

Read on to learn more about how Log Explorer gives security teams the ultimate landscape for rapid, deep-dive forensics.

The flight recorder for your entire stack

The contemporary digital landscape requires deep, correlated telemetry to defend against adversaries using multiple attack vectors. Raw logs serve as the "flight recorder" for an application, capturing every single interaction, attack attempt, and performance bottleneck. And because Cloudflare sits at the edge, between your users and your servers, all of these events are logged before the requests even reach your infrastructure. 

Cloudflare Log Explorer centralizes these logs into a unified interface for rapid investigation.

Log Types Supported

Zone-Scoped Logs

Focus: Website traffic, security events, and edge performance.

HTTP Requests

As the most comprehensive dataset, it serves as the "primary record" of all application-layer traffic, enabling the reconstruction of session activity, exploit attempts, and bot patterns.

Firewall Events

Provides critical evidence of blocked or challenged threats, allowing analysts to identify the specific WAF rules, IP reputations, or custom filters that intercepted an attack.

DNS Logs

Identify cache poisoning attempts, domain hijacking, and infrastructure-level reconnaissance by tracking every query resolved at the authoritative edge.

NEL (Network Error Logging) Reports

Distinguish between a coordinated Layer 7 DDoS attack and legitimate network connectivity issues by tracking client-side browser errors.

Spectrum Events

For non-web applications, these logs provide visibility into L4 traffic (TCP/UDP), helping to identify anomalies or brute-force attacks against protocols like SSH, RDP, or custom gaming traffic.

Page Shield

Track and audit unauthorized changes to your site's client-side environment, such as JavaScript dependencies and outbound connections.

Zaraz Events

Examine how third-party tools and trackers are interacting with user data, which is vital for auditing privacy compliance and detecting unauthorized script behaviors.

Account-Scoped Logs

Focus: Internal security, Zero Trust, administrative changes, and network activity.

Access Requests

Tracks identity-based authentication events to determine which users accessed specific internal applications and whether those attempts were authorized.

Audit Logs

Provides a trail of configuration changes within the Cloudflare dashboard to identify unauthorized administrative actions or modifications.

CASB Findings

Identifies security misconfigurations and data risks within SaaS applications (like Google Drive or Microsoft 365) to prevent unauthorized data exposure.

Magic Transit / IPSec Logs

Helps network engineers perform network-level (L3) monitoring, such as reviewing tunnel health and viewing BGP routing changes.

Browser Isolation Logs

Tracks user actions inside an isolated browser session (e.g., copy-paste, print, or file uploads) to prevent data leaks on untrusted sites.

Device Posture Results 

Details the security health and compliance status of devices connecting to your network, helping to identify compromised or non-compliant endpoints.

DEX Application Tests 

Monitors application performance from the user's perspective, which can help distinguish between a security-related outage and a standard performance degradation.

DEX Device State Events

Provides telemetry on the physical state of user devices, useful for correlating hardware or OS-level anomalies with potential security incidents.

DNS Firewall Logs

Tracks DNS queries filtered through the DNS Firewall to identify communication with known malicious domains or command-and-control (C2) servers.

Email Security Alerts

Logs malicious email activity and phishing attempts detected at the gateway to trace the origin of email-based entry vectors.

Gateway DNS

Monitors every DNS query made by users on your network to identify shadow IT, malware callbacks, or domain-generation algorithms (DGAs).

Gateway HTTP

Provides full visibility into encrypted and unencrypted web traffic to detect hidden payloads, malicious file downloads, or unauthorized SaaS usage.

Gateway Network

Tracks L3/L4 network traffic (non-HTTP) to identify unauthorized port usage, protocol anomalies, or lateral movement within the network.

IPSec Logs

Monitors the status and traffic of encrypted site-to-site tunnels to ensure the integrity and availability of secure network connections.

Magic IDS Detections

Surfaces matches against intrusion detection signatures to alert investigators to known exploit patterns or malware behavior traversing the network.

Network Analytics Logs

Provides high-level visibility into packet-level data to identify volumetric DDoS attacks or unusual traffic spikes targeting specific infrastructure.

Sinkhole HTTP Logs

Captures traffic directed to "sinkholed" IP addresses to confirm which internal devices are attempting to communicate with known botnet infrastructure.

WARP Config Changes

Tracks modifications to the WARP client settings on end-user devices to ensure that security agents haven't been tampered with or disabled.

WARP Toggle Changes

Specifically logs when users enable or disable their secure connectivity, helping to identify periods where a device may have been unprotected.

Zero Trust Network Session Logs

Logs the duration and status of authenticated user sessions to map out the complete lifecycle of a user's access within the protected perimeter.

Log Explorer can identify malicious activity at every stage

Get granular application layer visibility with HTTP Requests, Firewall Events, and DNS logs to see exactly how traffic is hitting your public-facing properties. Track internal movement with Access Requests, Gateway logs, and Audit logs: if a credential is compromised, you’ll see where the attacker went. Use Magic IDS and Network Analytics logs to spot volumetric attacks and "East-West" lateral movement within your private network.

Identify the reconnaissance

Attackers use scanners and other tools to look for entry points, hidden directories, or software vulnerabilities. To identify this in Log Explorer, query http_requests for EdgeResponseStatus codes of 401, 403, or 404 coming from a single IP, or for requests to sensitive paths (e.g. /.env, /.git, /wp-admin).
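
As a rough sketch of that query (the LIKE patterns are illustrative, and ClientRequestPath is an assumed field name; adjust to the http_requests schema in your account):

-- Clients generating many error responses or probing sensitive paths
SELECT ClientIP, ClientRequestPath, EdgeResponseStatus, COUNT(*) AS attempts
FROM http_requests
WHERE date = '2026-02-22'
  AND (EdgeResponseStatus IN (401, 403, 404)
       OR ClientRequestPath LIKE '%/.env%'
       OR ClientRequestPath LIKE '%/.git%'
       OR ClientRequestPath LIKE '%/wp-admin%')
GROUP BY ClientIP, ClientRequestPath, EdgeResponseStatus
ORDER BY attempts DESC
LIMIT 100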

The magic_ids_detections logs can also be used to identify scanning at the network layer. These logs provide packet-level visibility into threats targeting your network. Unlike standard HTTP logs, they focus on signature-based detections at the network and transport layers (IP, TCP, UDP). Query for cases where a single SourceIP is triggering multiple unique detections across a wide range of DestinationPort values in a short timeframe; Magic IDS signatures can specifically flag activities like Nmap scans or SYN stealth scans.
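
A hedged sketch of that port-scan pattern, using the SourceIP and DestinationPort fields mentioned above:

-- Sources probing many distinct ports (possible Nmap or SYN stealth scan activity)
SELECT SourceIP,
       COUNT(DISTINCT DestinationPort) AS ports_probed,
       COUNT(*) AS detections
FROM magic_ids_detections
WHERE date = '2026-02-22'
GROUP BY SourceIP
ORDER BY ports_probed DESC
LIMIT 20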

Check for diversions

While the attacker is conducting reconnaissance, they may attempt to disguise this with a simultaneous network flood. Pivot to network_analytics_logs to see if a volumetric attack is being used as a smokescreen.
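
For example, a rough query along these lines can show whether traffic is spiking against a particular target while the scan is underway. The field names here are assumptions, so check the network_analytics_logs schema in your account before running it:

-- Hypothetical field names: look for concentrations of sampled traffic per destination
SELECT DestinationIP, COUNT(*) AS sampled_packets
FROM network_analytics_logs
WHERE date = '2026-02-22'
GROUP BY DestinationIP
ORDER BY sampled_packets DESC
LIMIT 20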

Identify the approach 

Once attackers identify a potential vulnerability, they begin to craft their weapon. The attacker sends malicious payloads (e.g. SQL injection or large/corrupt file uploads) to confirm the vulnerability. Review http_requests and/or fw_events to identify any Cloudflare detection tools that have triggered. Cloudflare logs security signals in these datasets so you can easily identify requests with malicious payloads using fields such as WAFAttackScore, WAFSQLiAttackScore, FraudAttack, ContentScanJobResults, and several more. Review our documentation to get a full understanding of these fields. The fw_events logs can be used to determine whether these requests made it past Cloudflare’s defenses by examining the action, source, and ruleID fields. Cloudflare’s managed rules block many of these payloads by default. Review the Application Security Overview to confirm that your application is protected.
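
As an illustrative starting point (the 20-point cut-off is an assumption for the sketch, not an official threshold), you might pull the lowest-scoring requests first:

-- Requests Cloudflare's detections scored as likely attacks (lower scores are more suspicious)
SELECT RayID, ClientIP, ClientRequestPath, WAFAttackScore, WAFSQLiAttackScore
FROM http_requests
WHERE date = '2026-02-22'
  AND WAFAttackScore < 20
ORDER BY WAFAttackScore ASC
LIMIT 100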

Showing the Managed rules Insight that displays on Security Overview if the current zone does not have Managed Rules enabled

Audit the identity

Did that suspicious IP manage to log in? Use the ClientIP to search access_requests. If you see a "Decision: Allow" for a sensitive internal app, you know you have a compromised account.
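
A minimal sketch of that lookup, following the placeholder convention used elsewhere in this post:

-- Did the suspicious IP successfully authenticate to an internal app?
SELECT Email, IPAddress, Allowed
FROM access_requests
WHERE date = '2026-02-22'
  AND IPAddress = 'INSERT_SUSPICIOUS_IP_HERE'
  AND Allowed = true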

Stop the leak (data exfiltration)

Attackers sometimes use DNS tunneling to bypass firewalls by encoding sensitive data (like passwords or SSH keys) into DNS queries. Instead of a normal request like google.com, the logs will show long, encoded strings. Look for an unusually high volume of queries for unique, long, and high-entropy subdomains by examining these fields: QueryName (strings like h3ldo293js92.example.com), QueryType (often TXT, CNAME, or NULL records carrying the payload), and ClientIP (a single internal host generating thousands of these unique requests).
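
One rough way to surface that pattern is sketched below. The 50-character length cut-off is an arbitrary illustrative threshold, and the source column follows the SrcIP naming used in the gateway_dns examples later in this post:

-- Hosts generating many long, unique subdomain lookups (possible DNS tunneling)
SELECT SrcIP, COUNT(DISTINCT QueryName) AS unique_long_queries
FROM gateway_dns
WHERE date = '2026-02-22'
  AND LENGTH(QueryName) > 50
GROUP BY SrcIP
ORDER BY unique_long_queries DESC
LIMIT 20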

Additionally, attackers may attempt to leak sensitive data by hiding it within non-standard protocols or by using common protocols (like DNS or ICMP) in unusual ways to bypass standard firewalls. Discover this by querying the magic_ids_detections logs to look for signatures that flag protocol anomalies, such as "ICMP tunneling" or "DNS tunneling" detections in the SignatureMessage.

Whether you are investigating a zero-day vulnerability or tracking a sophisticated botnet, the data you need is now at your fingertips.

Correlate across datasets

Investigate malicious activity across multiple datasets by pivoting between multiple concurrent searches. With Log Explorer, you can now work with multiple queries simultaneously with the new Tabs feature. Switch between tabs to query different datasets or Pivot and adjust queries using filtering via your query results.

When you correlate data across multiple Cloudflare log sources, you can detect sophisticated multi-stage attacks that appear benign when viewed in isolation. This cross-dataset analysis allows you to see the full attack chain from reconnaissance to exfiltration.

Session hijacking (token theft)

Scenario: A user authenticates via Cloudflare Access, but their subsequent HTTP_request traffic looks like a bot.

Step 1: Identify high-risk sessions in http_requests.

SELECT RayID, ClientIP, ClientRequestUserAgent, BotScore
FROM http_requests
WHERE date = '2026-02-22' 
  AND BotScore < 20 
LIMIT 100

Step 2: Copy the RayID and search access_requests to see which user account is associated with that suspicious bot activity.


SELECT Email, IPAddress, Allowed
FROM access_requests
WHERE date = '2026-02-22' 
  AND RayID = 'INSERT_RAY_ID_HERE'

Post-phishing C2 beaconing

Scenario: An employee clicked a link in a phishing email which resulted in compromising their workstation. This workstation sends a DNS query for a known malicious domain, then immediately triggers an IDS alert.

Step 1: Find phishing attacks by examining email_security_alerts for violations. 

SELECT Timestamp, Threatcategories, To, Alertreason
FROM email_security_alerts
WHERE date = '2026-02-22' 
  AND Threatcategories LIKE '%phishing%'

Step 2: Use Access logs to correlate the user’s email (To) to their IP Address.

SELECT Email, IPAddress
FROM access_requests
WHERE date = '2026-02-22' 
  AND Email = 'INSERT_EMAIL_FROM_PREVIOUS_QUERY'

Step 3: Find internal IPs querying a specific malicious domain in gateway_dns logs.


SELECT SrcIP, QueryName, DstIP
FROM gateway_dns
WHERE date = '2026-02-22' 
  AND SrcIP = 'INSERT_IP_FROM_PREVIOUS_QUERY'
  AND QueryName LIKE '%malicious_domain_name%'

Lateral movement (Access → network probing)

Scenario: A user logs in via Zero Trust and then tries to scan the internal network.

Step 1: Find successful logins from unexpected locations in access_requests.

SELECT IPAddress, Email, Country
FROM access_requests
WHERE date = '2026-02-22' 
  AND Allowed = true 
  AND Country != 'US' -- Replace with your HQ country

Step 2: Check if that IPAddress is triggering network-level signatures in magic_ids_detections.

SELECT SignatureMessage, DestinationIP, Protocol
FROM magic_ids_detections
WHERE date = '2026-02-22' 
  AND SourceIP = 'INSERT_IP_ADDRESS_HERE'

Opening doors for more data 

From the beginning, Log Explorer was designed with extensibility in mind. Every dataset schema is defined using JSON Schema, a widely-adopted standard for describing the structure and types of JSON data. This design decision has enabled us to easily expand beyond HTTP Requests and Firewall Events to the full breadth of Cloudflare's telemetry. The same schema-driven approach that powered our initial datasets scaled naturally to accommodate Zero Trust logs, network analytics, email security alerts, and everything in between.

More importantly, this standardization opens the door to ingesting data beyond Cloudflare's native telemetry. Because our ingestion pipeline is schema-driven rather than hard-coded, we're positioned to accept any structured data that can be expressed in JSON format. For security teams managing hybrid environments, this means Log Explorer could eventually serve as a single pane of glass, correlating Cloudflare's edge telemetry with logs from third-party sources, all queryable through the same SQL interface. While today's release focuses on completing coverage of Cloudflare's product portfolio, the architectural groundwork is laid for a future where customers can bring their own data sources with custom schemas.

Faster data, faster response: architectural upgrades

To investigate a multi-vector attack effectively, timing is everything. A delay of even a few minutes in the log availability can be the difference between proactive defense and reactive damage control.

That is why we have optimized our ingestion for better speed and resilience. By increasing concurrency in one part of our ingestion path, we have eliminated bottlenecks that could cause “noisy neighbor” issues, ensuring that one client’s data surge doesn’t slow down another’s visibility. This architectural work has reduced our P99 ingestion latency by approximately 55%, and our P50 by 25%, cutting the time it takes for an event at the edge to become available for your SQL queries.

Grafana chart displaying the drop in ingest latency after architectural upgrades

Follow along for more updates

We're just getting started. We're actively working on even more powerful features to further enhance your experience with Log Explorer, including the ability to run these detection queries on a custom-defined schedule.

Design mockup of upcoming Log Explorer Scheduled Queries feature

Subscribe to the blog and keep an eye out for more Log Explorer updates soon in our Change Log.

Get access to Log Explorer

To get access to Log Explorer, you can purchase it self-serve directly from the dashboard or, if you are a contract customer, reach out for a consultation or contact your account manager. Additionally, you can read more in our Developer Documentation.


Unmasking the Unseen: Your Guide to Taming Shadow AI with Cloudflare One

25 de Agosto de 2025, 11:05

The digital landscape of corporate environments has always been a battleground between efficiency and security. For years, this played out in the form of "Shadow IT" — employees using unsanctioned laptops or cloud services to get their jobs done faster. Security teams became masters at hunting these rogue systems, setting up firewalls and policies to bring order to the chaos.

But the new frontier is different, and arguably far more subtle and dangerous.

Imagine a team of engineers, deep into the development of a groundbreaking new product. They're on a tight deadline, and a junior engineer, trying to optimize his workflow, pastes a snippet of a proprietary algorithm into a popular public AI chatbot, asking it to refactor the code for better performance. The tool quickly returns the revised code, and the engineer, pleased with the result, checks it in. What they don't realize is that their query, and the snippet of code, is now part of the AI service’s training data, or perhaps logged and stored by the provider. Without anyone noticing, a critical piece of the company's intellectual property has just been sent outside the organization's control, a silent and unmonitored data leak.

This isn't a hypothetical scenario. It's the new reality. Employees, empowered by these incredibly powerful AI tools, are now using them for everything from summarizing confidential documents to generating marketing copy and, yes, even writing code. The data leaving the company in these interactions is often invisible to traditional security tools, which were never built to understand the nuances of a browser tab interacting with a large language model. This quiet, unmanaged usage is "Shadow AI," and it represents a new, high-stakes security blind spot.

To combat this, we need a new approach—one that provides visibility into this new class of applications and gives security teams the control they need, without impeding the innovation that makes these tools so valuable.

Shadow AI reporting

This is where the Cloudflare Shadow IT Report comes in. It’s not a list of threats to be blocked, but rather a visibility and analytics tool designed to help you understand the problem before it becomes a crisis. Instead of relying on guesswork or trying to manually hunt down every unsanctioned application, Cloudflare One customers can use the insights from their traffic to gain a clear, data-driven picture of their organization's application usage.

The report provides a detailed, categorized view of your application activity, and is easily narrowed down to AI activity. We’ve leveraged our network and threat intelligence capabilities to identify and classify AI services, identifying general-purpose models like ChatGPT, code-generation assistants like GitHub Copilot, and specialized tools used for marketing, data analysis, or other content creation, like Leonardo.ai. This granular view allows security teams to see not just that an employee is using an AI app, but which AI app, and what users are accessing it.

How we built it

Sharp-eyed users may have noticed that we’ve had a shadow IT feature for a while — so what changed? While Cloudflare Gateway, our secure web gateway (SWG), has recorded some of this data for some time, users have wanted deeper insights and reporting into their organization's application usage. Cloudflare Gateway processes hundreds of millions of rows of app usage data for our biggest users daily, and that scale was causing issues with queries into larger time windows. Additionally, the original implementation lacked the filtering and customization capabilities to properly investigate the usage of AI applications. We knew this was information that our customers loved, but we weren’t doing a good enough job of showing it to them.

Solving this was a cross-team effort requiring a complete overhaul by our analytics and reporting engineers. You may have seen our work recently in this July 2025 blog post detailing how we adopted TimescaleDB to support our analytics platform, allowing us to aggregate and compress long-term data and drastically improve query performance. This solves the scale issue we originally faced, letting our biggest customers query their data over long time periods. Our crawler collects the original HTTP traffic data from Gateway, which we store in a TimescaleDB database.

Once the data is in our database, we build specific materialized views around the Shadow IT and AI use case to support analytics for this feature. Whereas the existing HTTP analytics are centered around the HTTP requests on an account, these views are centered around the information relevant to applications, for example: Which of my users are going to unapproved applications? How much bandwidth are they consuming? Is there an end-user in an unexpected geographical location interacting with an unreviewed application? Which devices are using the most bandwidth?
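
To illustrate the idea (this is a hedged sketch, not our actual schema), a TimescaleDB continuous aggregate over per-request Gateway records could look roughly like this, with every table and column name below being a placeholder:

-- Hypothetical continuous aggregate: hourly per-application usage rollup
CREATE MATERIALIZED VIEW app_usage_hourly
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', request_time) AS bucket,
       account_id,
       application_id,
       user_email,
       COUNT(*) AS requests,
       SUM(bytes_total) AS bytes_transferred
FROM gateway_http_traffic
GROUP BY bucket, account_id, application_id, user_email;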

Over the past year, the team has defined a consistent framework for the analytics we surface. Our timeseries and top-N graphs are all filterable by duration and by the data points shown, allowing users to drill down to specific data points and see the details of their corporate traffic. We overhauled Shadow IT by examining the data we had and researching how AI applications were presenting visibility challenges for customers. From there, we leveraged our existing framework and built the Shadow IT dashboard. This delivered the application-level visibility that we know our customers need.

How to use it

1. Proxy your traffic with Gateway

The core of the system is Cloudflare Gateway, an in-line filter and proxy for all your organization's Internet traffic, regardless of where your users are. When an employee tries to access an AI application, their traffic flows through Cloudflare’s global network. Cloudflare can inspect the traffic, including the hostname, and map the traffic to our application definitions. TLS inspection is optional for Gateway customers, but it is required for Shadow IT analytics.

Interactions are logged and tied to user identity, device posture, bandwidth consumed and even the geographic location. This rich context is crucial for understanding who is using which AI tools, when, and from where.

2. Review application use

All this granular data is then presented in the Shadow IT Report within your Cloudflare One dashboard. Simply filter for AI applications so you can:

  • High-Level Overview: Get an immediate sense of your organization's AI adoption. See the top AI applications in use, overall usage trends, and the volume of data being processed. This will help you identify and target your security and governance efforts.

  • Granular Drill-Downs: Need more detail? Click on any AI application to see specific users or groups accessing it, their usage frequency, location, and the amount of data transferred. This detail helps you pinpoint teams using AI around the company, as well as how much data is flowing to those applications.

Shadow IT analytics dashboard

3. Mark application approval statuses

We understand that not all AI tools are created equal, and your organization's comfort level will vary. The Shadow AI Report introduces a flexible framework for Application Approval Status, allowing you to formally categorize each detected AI application:

  • Approved: These are the AI applications that have passed your internal security vetting, comply with your policies, and are officially sanctioned for use. 

  • Unapproved: These are the red-light applications. Perhaps they have concerning data privacy policies, a history of vulnerabilities, or simply don’t align with your business objectives.

  • In Review: For those gray-area applications, or newly discovered tools, this status lets your teams acknowledge their usage while conducting thorough due diligence. It buys you time to make an informed decision without immediate disruption.

Review and mark application statuses in the dashboard

4. Enforce policies

These approval statuses come alive when integrated with Cloudflare Gateway policies. This allows you to automatically enforce your AI decisions at the edge of Cloudflare’s network, ensuring consistent security for every employee, anywhere they work.

Here’s how you can translate your decisions into inline protection:

  • Block unapproved AI: The simplest and most direct action. Create a Gateway HTTP policy that blocks all traffic to any AI application marked as "Unapproved." This immediately shuts down risky data exfiltration.

  • Limit "In Review" exposure: For applications still being assessed, you might not want a hard block, but rather a soft limit on potential risks:

  • Data Loss Prevention (DLP): Cloudflare DLP inspects and analyzes traffic for indicators of sensitive data (e.g., credit card numbers, PII, internal project names, source code) and can then block the transfer. By applying DLP to "In Review" AI applications, you can prevent AI prompts containing this proprietary data from being submitted, as well as notify the user why the prompt was blocked. This could have saved our poor junior engineer from their well-intended mistake.

  • Restrict Specific Actions: Block only file uploads, allowing basic interaction while preventing mass data egress.

  • Isolate Risky Sessions: Route traffic for "In Review" applications through Cloudflare's Browser Isolation. Browser Isolation executes the browser session in a secure, remote container, isolating all data interactions from your corporate network. With it, you can control file uploads and clipboard actions, restrict keyboard input, and more, reducing interaction with the application while you review it.

  • Audit "Approved" usage: Even for AI tools you trust, you might want to log all interactions for compliance auditing or apply specific data handling rules to ensure ongoing adherence to internal policies.

This workflow enables your team to consistently audit your organization’s AI usage and update policies to quickly reduce security risk.

Forensics with Cloudflare Log Explorer

While the Shadow AI Report provides excellent insights, security teams often need to perform deeper forensic investigations. For these advanced scenarios, we offer Cloudflare Log Explorer.

Log Explorer allows you to store and query your Cloudflare logs directly within the Cloudflare dashboard or via API, eliminating the need to send massive log volumes to third-party SIEMs for every investigation. It provides raw, unsampled log data with full context, enabling rapid and detailed analysis.

Log Explorer customers can dive into Shadow AI logs with pre-populated SQL queries from Cloudflare Analytics, enabling deeper investigations into AI usage:

Log Search’s SQL query interface

How to investigate Shadow AI with Log Explorer:

  • Trace Specific User Activity: If the Shadow AI Report flags a user with high activity on an "In Review" or "Unapproved" AI app, you can jump into Log Explorer and query by user, application category, or specific AI services. 

  • Analyze Data Exfiltration Attempts: If you have DLP policies configured, you can search for DLP matches in conjunction with AI application categories. This helps identify attempts to upload sensitive data to AI applications and pinpoint exactly what data was being transmitted.

  • Identify Anomalous AI Usage: The Shadow AI Report might show a spike in usage for a particular AI application. In Log Explorer, you can filter by application status (In Review or Unapproved) for a specific time range. Then, look for unusual patterns, such as a high number of requests from a single source IP address, or unexpected geographic origins, which could indicate compromised accounts or policy evasion attempts, as sketched below.
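
For instance, a hedged sketch against the gateway_http dataset might look like the following. The field names (Email, SrcIP, Host) and the date are assumptions for illustration, so check the dataset schema in Log Explorer before running it:

-- Who is sending the most traffic to a given AI application's domain?
SELECT Email, SrcIP, Host, COUNT(*) AS requests
FROM gateway_http
WHERE date = '2025-08-25'
  AND Host LIKE '%INSERT_AI_APP_DOMAIN%'
GROUP BY Email, SrcIP, Host
ORDER BY requests DESC
LIMIT 50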

If AI visibility is a challenge for your organization, the Shadow AI Report is available now for Cloudflare One customers, as part of our broader shadow IT discovery capabilities. Log in to your dashboard to start regaining visibility and shaping your AI governance strategy today. 

Ready to modernize how you secure access to AI apps? Reach out for a consultation with our Cloudflare One security experts about how to regain visibility and control. 

Or if you’re not ready to talk to someone yet,  nearly every feature in Cloudflare One is available at no cost for up to 50 users. Many of our largest enterprise customers start by exploring the products themselves on our free plan, and you can get started here.

If you’ve got feedback or want to help shape how Cloudflare enhances visibility across shadow AI, please consider joining our user research program


Cloudflare Log Explorer is now GA, providing native observability and forensics

18 de Junho de 2025, 10:00

We are thrilled to announce the General Availability of Cloudflare Log Explorer, a powerful new product designed to bring observability and forensics capabilities directly into your Cloudflare dashboard. Built on the foundation of Cloudflare's vast global network, Log Explorer leverages the unique position of our platform to provide a comprehensive and contextualized view of your environment.

Security teams and developers use Cloudflare to detect and mitigate threats in real-time and to optimize application performance. Over the years, users have asked for additional telemetry with full context to investigate security incidents or troubleshoot application performance issues without having to forward data to third-party log analytics and Security Information and Event Management (SIEM) tools. Besides avoidable costs, forwarding data externally comes with other drawbacks: complex setups, delayed access to crucial data, and a frustrating lack of context that complicates quick mitigation.

Log Explorer has been previewed by several hundred customers over the last year, and they attest to its benefits: 

“Having WAF logs (firewall events) instantly available in Log Explorer with full context — no waiting, no external tools — has completely changed how we manage our firewall rules. I can spot an issue, adjust the rule with a single click, and immediately see the effect. It’s made tuning for false positives faster, cheaper, and far more effective.” 

“While we use Logpush to ingest Cloudflare logs into our SIEM, when our development team needs to analyze logs, it can be more effective to utilize Log Explorer. SIEMs make it difficult for development teams to write their own queries and manipulate the console to see the logs they need. Cloudflare's Log Explorer, on the other hand, makes it much easier for dev teams to look at logs and directly search for the information they need.”

With Log Explorer, customers have access to Cloudflare logs with all the context available within the Cloudflare platform. Compared to external tools, customers benefit from: 

  • Reduced cost and complexity: Drastically reduce the expense and operational overhead associated with forwarding, storing, and analyzing terabytes of log data in external tools.

  • Faster detection and triage: Access Cloudflare-native logs directly, eliminating cumbersome data pipelines and the ingest lags that delay critical security insights.

  • Accelerated investigations with full context: Investigate incidents with Cloudflare's unparalleled contextual data, accelerating your analysis and understanding of "What exactly happened?" and "How did it happen?"

  • Minimal recovery time: Seamlessly transition from investigation to action with direct mitigation capabilities via the Cloudflare platform.

Log Explorer is available as an add-on product for customers on our self-serve or Enterprise plans. Read on to learn how each of the capabilities of Log Explorer can help you detect and diagnose issues more quickly.

Monitor security and performance issues with custom dashboards

Custom dashboards allow you to define the specific metrics you need in order to monitor unusual or unexpected activity in your environment.

Getting started is easy, with the ability to create a chart using natural language. A natural language interface is integrated into the chart create/edit experience, enabling you to describe in your own words the chart you want to create. Similar to the AI Assistant we announced during Security Week 2024, the prompt translates your language to the appropriate chart configuration, which can then be added to a new or existing custom dashboard.

As an example, you can create a dashboard for monitoring the presence of Remote Code Execution (RCE) attacks in your environment. An RCE attack is one in which an attacker compromises a machine in your environment and executes commands. The good news is that RCE is a detection available in Cloudflare WAF. In the dashboard example below, you can not only watch for RCE attacks, but also correlate them with other security events such as malicious content uploads, source IP addresses, and JA3/JA4 fingerprints. Such a scenario could mean one or more machines in your environment are compromised and being used to spread malware — surely a very high-risk incident!

A reliability engineer might want to create a dashboard for monitoring errors. They could use the natural language prompt to enter a query like “Compare HTTP status code ranges over time.” The AI model then decides the most appropriate visualization and constructs their chart configuration.

While you can create custom dashboards from scratch, you could also use an expert-curated dashboard template to jumpstart your security and performance monitoring. 

Available templates include: 

  • Bot monitoring: Identify automated traffic accessing your website

  • API Security: Monitor the data transfer and exceptions of API endpoints within your application

  • API Performance: See timing data for API endpoints in your application, along with error rates

  • Account Takeover: View login attempts, usage of leaked credentials, and identify account takeover attacks

  • Performance Monitoring: Identify slow hosts and paths on your origin server, and view time to first byte (TTFB) metrics over time

  • Security Monitoring: Monitor attack distribution across top hosts and paths, and correlate DDoS traffic with origin response time to understand the impact of DDoS attacks.

Investigate and troubleshoot issues with Log Search 

Continuing with the example from the prior section, after successfully diagnosing that some machines were compromised through the RCE issue, analysts can pivot over to Log Search in order to investigate whether the attacker was able to access and compromise other internal systems. To do that, the analyst could search logs from Zero Trust services, using context, such as compromised IP addresses from the custom dashboard, shown in the screenshot below: 

Log Search is a streamlined experience that includes data type-aware search filters and the ability to switch to a custom SQL interface for more powerful queries. Log searches are also available via a public API.

Save time and collaborate with saved queries

Queries built in Log Search can now be saved for repeated use and are accessible to other Log Explorer users in your account. This makes it easier than ever to investigate issues together. 

Monitor proactively with Custom Alerting (coming soon)

With custom alerting, you can configure custom alert policies in order to proactively monitor the indicators that are important to your business. 

Starting from Log Search, define and test your query. From here you can opt to save and configure a schedule interval and alerting policy. The query will run automatically on the schedule you define.

Tracking error rate for a custom hostname

If you want to monitor the error rate for a particular host, you can use this Log Search query to calculate the error rate per time interval:

SELECT SUBSTRING(EdgeStartTimestamp, 1, 14) || '00:00' AS time_interval,
       COUNT(*) AS total_requests,
       COUNT(CASE WHEN EdgeResponseStatus >= 500 THEN 1 ELSE NULL END) AS error_requests,
       COUNT(CASE WHEN EdgeResponseStatus >= 500 THEN 1 ELSE NULL END) * 100.0 / COUNT(*) AS error_rate_percentage
FROM http_requests
WHERE EdgeStartTimestamp >= '2025-06-09T20:56:58Z'
  AND EdgeStartTimestamp <= '2025-06-10T21:26:58Z'
  AND ClientRequestHost = 'customhostname.com'
GROUP BY time_interval
ORDER BY time_interval ASC;

Running the above query returns the following results. You can see the overall error rate percentage in the far right column of the query results.

Proactively detect malware

We can identify malware in the environment by monitoring logs from Cloudflare Secure Web Gateway. As an example, Katz Stealer is malware-as-a-service designed for stealing credentials. We can monitor DNS queries and HTTP requests from users within the company in order to identify any machines that may be infected with Katz Stealer malware. 
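
As a hedged example, a Gateway DNS query along these lines can surface endpoints resolving indicator domains from your threat intelligence feed; the domain placeholder follows the convention used in our other query examples:

-- Which internal hosts are looking up a known-bad domain?
SELECT SrcIP, QueryName, COUNT(*) AS lookups
FROM gateway_dns
WHERE date = '2025-06-10'
  AND QueryName LIKE '%INSERT_KNOWN_MALICIOUS_DOMAIN%'
GROUP BY SrcIP, QueryName
ORDER BY lookups DESC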

And with custom alerts, you can configure an alert policy so that you can be notified via webhook or PagerDuty.

Maintain audit & compliance with flexible retention (coming soon)

With flexible retention, you can set the precise length of time you want to store your logs, allowing you to meet specific compliance and audit requirements with ease. Other providers require archiving or hot and cold storage, making it difficult to query older logs. Log Explorer is built on top of our R2 storage tier, so historical logs can be queried as easily as current logs. 

How we built Log Explorer to run at Cloudflare scale

With Log Explorer, we have built a scalable log storage platform on top of Cloudflare R2 that lets you efficiently search your Cloudflare logs using familiar SQL queries. In this section, we’ll look into how we did this and how we solved some technical challenges along the way. Log Explorer consists of three components: ingestors, compactors, and queriers. Ingestors are responsible for writing logs from Cloudflare’s data pipeline to R2. Compactors optimize storage files, so they can be queried more efficiently. Queriers execute SQL queries from users by fetching, transforming, and aggregating matching logs from R2.

During ingestion, Log Explorer writes each batch of log records to a Parquet file in R2. Apache Parquet is an open-source columnar storage file format, and it was an obvious choice for us: it’s optimized for efficient data storage and retrieval, such as by embedding metadata like the minimum and maximum values of each column across the file which enables the queriers to quickly locate the data needed to serve the query.

Log Explorer stores logs on a per-customer level, just like Cloudflare D1, so that your data isn't mixed with that of other customers. In Q3 2025, per-customer logs will allow you the flexibility to create your own retention policies and decide in which regions you want to store your data. But how does Log Explorer find those Parquet files when you query your logs? Log Explorer leverages the Delta Lake open table format to provide a database table abstraction atop R2 object storage. A table in Delta Lake pairs data files in Parquet format with a transaction log. The transaction log registers every addition, removal, or modification of a data file for the table – it’s stored right next to the data files in R2.

Given a SQL query for a particular log dataset such as HTTP Requests or Gateway DNS, Log Explorer first has to load the transaction log of the corresponding Delta table from R2. Transaction logs are checkpointed periodically to avoid having to read the entire table history every time a user queries their logs.

Besides listing Parquet files for a table, the transaction log also includes per-column min/max statistics for each Parquet file. This has the benefit that Log Explorer only needs to fetch files from R2 that can possibly satisfy a user query. Finally, queriers use the min/max statistics embedded in each Parquet file to decide which row groups to fetch from the file.

Log Explorer processes SQL queries using Apache DataFusion, a fast, extensible query engine written in Rust, and delta-rs, a community-driven Rust implementation of the Delta Lake protocol. While standing on the shoulders of giants, our team had to solve some unique problems to provide log search at Cloudflare scale.

Log Explorer ingests logs from across Cloudflare’s vast global network, spanning more than 330 cities in over 125 countries. If Log Explorer were to write logs from our servers straight to R2, its storage would quickly fragment into a myriad of small files, rendering log queries prohibitively expensive.

Log Explorer’s strategy to avoid this fragmentation is threefold. First, it leverages Cloudflare’s data pipeline, which collects and batches logs from the edge, ultimately buffering each stream of logs in an internal system named Buftee. Second, log batches ingested from Buftee aren’t immediately committed to the transaction log; rather, Log Explorer stages commits for multiple batches in an intermediate area and “squashes” these commits before they’re written to the transaction log. Third, once log batches have been committed, a process called compaction merges them into larger files in the background.

While the open-source implementation of Delta Lake provides compaction out of the box, we soon encountered an issue when using it for our workloads. Stock compaction merges data files to a desired target size S by sorting the files in reverse order of their size and greedily filling bins of size S with them. By merging logs irrespective of their timestamps, this process distributed ingested batches randomly across merged files, destroying data locality. Despite compaction, a user querying for a specific time frame would still end up fetching hundreds or thousands of files from R2.

For this reason, we wrote a custom compaction algorithm that merges ingested batches in order of their minimum log timestamp, leveraging the min/max statistics mentioned previously. This algorithm reduced the number of overlaps between merged files by two orders of magnitude. As a result, we saw a significant improvement in query performance, with some large queries that had previously taken over a minute completing in just a few seconds.

Follow along for more updates

We're just getting started! We're actively working on even more powerful features to further enhance your experience with Log Explorer. Subscribe to the blog and keep an eye out for more updates to our observability and forensics offering soon in our Change Log.

Get access to Log Explorer

To get started with Log Explorer, sign up here or contact your account manager. You can also read more in our Developer Documentation.


Cloudflare enables native monitoring and forensics with Log Explorer and custom dashboards

18 de Março de 2025, 10:00

In 2024, we announced Log Explorer, giving customers the ability to store and query their HTTP and security event logs natively within the Cloudflare network. Today, we are excited to announce that Log Explorer now supports logs from our Zero Trust product suite. In addition, customers can create custom dashboards to monitor suspicious or unusual activity.

Every day, Cloudflare detects and protects customers against billions of threats, including DDoS attacks, bots, web application exploits, and more. SOC analysts, who are charged with keeping their companies safe from the growing spectre of Internet threats, may want to investigate these threats to gain additional insights on attacker behavior and protect against future attacks. Log Explorer, by collecting logs from various Cloudflare products, provides a single starting point for investigations. As a result, analysts can avoid forwarding logs to other tools, maximizing productivity and minimizing costs. Further, analysts can monitor signals specific to their organizations using custom dashboards.

Zero Trust dataset support in Log Explorer

Log Explorer stores your Cloudflare logs for a 30-day retention period so that you can analyze them natively and in a single interface, within the Cloudflare Dashboard. Cloudflare log data is diverse, reflecting the breadth of capabilities available.  For example, HTTP requests contain information about the client such as their IP address, request method, autonomous system (ASN), request paths, and TLS versions used. Additionally, Cloudflare’s Application Security WAF Detections enrich these HTTP request logs with additional context, such as the WAF attack score, to identify threats.

Today we are announcing that seven additional Cloudflare product datasets are now available in Log Explorer. These seven datasets are the logs generated from our Zero Trust product suite, and include logs from Access, Gateway DNS, Gateway HTTP, Gateway Network, CASB, Zero Trust Network Session, and Device Posture Results. Read on for examples of how to use these logs to identify common threats.

Investigating unauthorized access

By reviewing Access logs and HTTP request logs, we can reveal attempts to access resources or systems without proper permissions, including brute force password attacks, indicating potential security breaches or malicious activity.

Below, we filter Access Logs on the Allowed field, to see activity related to unauthorized access.
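
A minimal sketch of that filter in Log Search's SQL interface, using the access_requests dataset (the date is illustrative, and surfacing repeated denials per user and IP is one reasonable way to spot brute forcing):

-- Denied access attempts that may indicate brute forcing or probing
SELECT Email, IPAddress, Country, COUNT(*) AS denied_attempts
FROM access_requests
WHERE date = '2025-03-18'
  AND Allowed = false
GROUP BY Email, IPAddress, Country
ORDER BY denied_attempts DESC
LIMIT 50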

By then reviewing the HTTP logs for the requests identified in the previous query, we can assess if bot networks are the source of unauthorized activity.

With this information, you can craft targeted Custom Rules to block the offending traffic. 

Detecting malware

Cloudflare's Web Gateway can track which websites users are accessing, allowing administrators to identify and block access to malicious or inappropriate sites. These logs can be used to detect whether a user's machine or account has been compromised by malware. This may become apparent when we look for records showing a rapid succession of attempts to browse known malicious sites, such as hostnames with long strings of seemingly random characters that hide their true destination. In this example, we can query logs looking for requests to a spoofed YouTube URL.
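
As a rough illustration of that hunt against the Gateway DNS dataset (the LIKE patterns are deliberately loose and will need tuning to your environment):

-- Lookups for hostnames that contain "youtube" but are not youtube.com or its subdomains
SELECT SrcIP, QueryName, COUNT(*) AS lookups
FROM gateway_dns
WHERE date = '2025-03-18'
  AND QueryName LIKE '%youtube%'
  AND QueryName NOT LIKE 'youtube.com'
  AND QueryName NOT LIKE '%.youtube.com'
GROUP BY SrcIP, QueryName
ORDER BY lookups DESC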

Monitoring what matters using custom dashboards

Security monitoring is not one size fits all. For instance, companies in the retail or financial industries worry about fraud, while every company is concerned about the exfiltration of data like trade secrets. And any form of personally identifiable information (PII) is a target for data breaches or ransomware attacks.

While log exploration helps you react to threats, our new custom dashboards allow you to define the specific metrics you need in order to monitor threats you are concerned about. 

Getting started is easy, with the ability to create a chart using natural language. A natural language interface is integrated into the chart create/edit experience, enabling you to describe in your own words the chart you want to create. Similar to the AI Assistant we announced during Security Week 2024, the prompt translates your language to the appropriate chart configuration, which can then be added to a new or existing custom dashboard.

  • Use a prompt: Enter a query like “Compare status code ranges over time”. The AI model decides the most appropriate visualization and constructs your chart configuration.

  • Customize your chart: Select the chart elements manually, including the chart type, title, dataset to query, metrics, and filters. This option gives you full control over your chart’s structure. 


Video: entering a natural language description of the desired metric (“compare status code ranges over time”), previewing the resulting time series grouped by status code range, and selecting “Add chart” to save it to the dashboard.

For more help getting started, we have some pre-built templates that you can use for monitoring specific uses. Available templates currently include: 

  • Bot monitoring: Identify automated traffic accessing your website

  • API Security: Monitor the data transfer and exceptions of API endpoints within your application

  • API Performance: See timing data for API endpoints in your application, along with error rates

  • Account Takeover: View login attempts, usage of leaked credentials, and identify account takeover attacks

  • Performance Monitoring: Identify slow hosts and paths on your origin server, and view time to first byte (TTFB) metrics over time

Templates provide a good starting point, and once you create your dashboard, you can add or remove individual charts using the same natural language chart creator. 


Video shows editing chart from an existing dashboard and moving individual charts via drag and drop.

Example use cases

Custom dashboards can be used to monitor for suspicious activity, or to keep an eye on performance and errors for your domains. Let’s explore some examples of suspicious activity that we can monitor using custom dashboards.

Take, for example, our use case from above: investigating unauthorized access. With custom dashboards, you can create a dashboard using the Account takeover template to monitor for suspicious login activity related to your domain.

As another example, spikes in requests or errors are common indicators that something is wrong, and they can sometimes be signals of suspicious activity. With the Performance Monitoring template, you can view origin response time and time to first byte metrics as well as monitor for common errors. For example, in this chart, the spikes in 404 errors could be an indication of an unauthorized scan of your endpoints.

Seamlessly integrated into the Cloudflare platform

When using custom dashboards, if you observe a traffic pattern or a spike in errors that you would like to investigate further, you can click “View in Security Analytics” to drill down into the data and craft custom WAF rules to mitigate the threat.
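For example, once the investigation narrows the activity down to a specific source, a targeted custom rule can be deployed via the Rulesets API. The snippet below is a sketch only: the phase entrypoint endpoint is assumed, a PUT replaces the existing rules in that phase (so in practice you would merge with your current rules), and the IP address and path are placeholders.

```python
import os
import requests

ZONE_ID = os.environ["CF_ZONE_ID"]
API_TOKEN = os.environ["CF_API_TOKEN"]

# Placeholder values from a hypothetical investigation.
suspicious_ip = "203.0.113.45"

rule = {
    "description": "Block scanner identified via Security Analytics",
    "action": "block",
    # Cloudflare Rules language expression: match the offending source IP
    # probing paths under /api/.
    "expression": f'(ip.src eq {suspicious_ip} and starts_with(http.request.uri.path, "/api/"))',
}

# Assumed endpoint: the zone entrypoint ruleset for the custom firewall phase.
# A PUT here replaces the phase's rules, so merge with existing rules in real use.
resp = requests.put(
    f"https://api.cloudflare.com/client/v4/zones/{ZONE_ID}"
    "/rulesets/phases/http_request_firewall_custom/entrypoint",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json={"rules": [rule]},
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["result"]["id"])
```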

These tools, seamlessly integrated into the Cloudflare platform, will enable users to discover, investigate, and mitigate threats all in one place, reducing time to resolution and overall cost of ownership by eliminating the need to forward logs to third-party security analysis tools. And because these tools are a native part of Cloudflare, you can immediately use the data from your investigation to craft targeted rules that will block these threats. 

What’s next

Stay tuned as we continue to develop more capabilities in the areas of observability and forensics, with additional features including: 

  • Custom alerts: create alerts based on specific metrics or anomalies

  • Scheduled query detections: craft log queries and run them on a schedule to detect malicious activity

  • More integration: further streamlining the journey from detection to investigation to mitigation across the full Cloudflare platform

How to get it

Current Log Explorer beta users get immediate access to the new custom dashboards feature. Pricing will be made available to everyone during Q2 2025. Until then, these features remain available at no cost.

Let us know if you are interested in joining our Beta program by completing this form, and a member of our team will contact you.

Watch on Cloudflare TV


Hacking the mind: Why psychology matters to cybersecurity

February 6, 2025, 11:00

In cybersecurity, too often, the emphasis is placed on advanced technology meant to shield digital infrastructure from external threats. Yet, an equally crucial — and underestimated — factor lies at the heart of all digital interactions: the human mind. Behind every breach is a calculated manipulation, and behind every defense, a strategic response. The psychology of cyber crime, the resilience of security professionals and the behaviors of everyday users combine to form the human element of cybersecurity. Arguably, it’s the most unpredictable and influential variable in our digital defenses.

To truly understand cybersecurity is to understand the human mind — both as a weapon and as a shield.

Peering into the mind of a cyber criminal

At the core of every cyberattack is a human, driven not just by code but by complex motivations and psychological impulses. Cyber criminals aren’t merely technologists. They are people with intentions, convictions, emotions and specific psychological profiles that drive their actions. Financial gain remains a primary incentive to launch attacks like ransomware. But some are also driven by ideological motives, or they relish the chance to outsmart advanced defenses so they can later brag about it in dark web forums.

Many cyber criminals share distinct personality traits: an inclination for risk-taking, problem-solving prowess and an indifference to ethical boundaries. Furthermore, the physical and digital distance inherent in online crime can create a psychological disconnect, minimizing the moral weight of their actions. This environment enables cyber criminals to justify their behavior in ways they might not if they had to face their victims in person. Equipped with these psychological “advantages,” cyber criminals excel in social engineering tactics. They manipulate people instead of systems to gain unauthorized access.

Exploiting the human factor with social engineering

One of the most powerful weapons in a cyber criminal’s arsenal isn’t high-tech malware but the vulnerability of the human mind. Social engineering attacks, like phishing, vishing (voice phishing) and smishing (SMS phishing), exploit non-technological human factors like trust, fear, urgency and curiosity. And these tactics are alarmingly effective. A recent report from Verizon found that the human element factored into 68% of data breaches, underscoring the vulnerability of human interactions.

Phishing attacks, for instance, are designed to create a sense of urgency, fear or curiosity. Attackers manipulate users into clicking malicious links or revealing sensitive information. The success of these attacks depends on creating a false sense of trust and authority, preying on our innate tendencies. Understanding these methods is not only crucial for developing technical countermeasures but also for educating users to resist psychological manipulation.

The mental fortitude of cyber professionals

Defending against cyber threats requires more than solid technical skills; it demands resilience, ethical conviction and a keen understanding of human behavior. Cyber professionals operate in a high-stakes environment and face unrelenting pressure. Mental resilience enables them to rapidly respond to breaches, restore security and learn from the incident.

Creativity and adaptability are also indispensable in cybersecurity. As cyber criminals constantly refine their tactics, security professionals need to anticipate these moves. They, too, must innovate by developing new countermeasures before an attack even occurs. Like a chess match, staying ahead of intruders requires ingenuity that goes beyond technical skills. The best security teams have the ability to see beyond conventional approaches and the courage to pioneer novel defenses.

Finally, ethics play a defining role, particularly as security professionals are entrusted with sensitive data and powerful tools. Through misuse or negligence, these secrets and tools could cause substantial harm. Adherence to a strong ethical code serves as a psychological anchor, helping cyber pros to navigate the moral complexities of their work while prioritizing user privacy and security.

In a nutshell, working as a cybersecurity professional is one of the hardest jobs on earth.


Building a psychologically aware cybersecurity strategy

A truly effective cybersecurity strategy doesn’t just block attacks; it anticipates and adapts to human behavior. Therefore, aligning security measures with natural human tendencies can elevate an organization’s defenses significantly. This works better than relying on users to remember overly complex protocols.

For instance, training and awareness programs that incorporate psychological insights are far more impactful than traditional “box-ticking” sessions. The principles of Nudge Theory, which employs subtle prompts to influence behavior, offer a potent alternative. Well-designed programs make secure behaviors easy, attractive and timely. This guides employees toward safer practices without the punitive undertones that can breed resentment and resistance.

Creating a culture of psychological safety within an organization can also encourage employees to address security concerns proactively. When people feel safe discussing potential threats and even mistakes, the early identification of risks and a collective commitment to security become second nature. This “human firewall” effect, where individuals collectively protect digital assets, strengthens organizational resilience.

Behavioral analytics: The fusion of psychology and technology

User behavior analytics is where technology meets psychology in a powerful way. By analyzing behavioral patterns and detecting deviations, organizations can preemptively identify potential threats. This approach operates on the principle that individuals, even in digital spaces, follow predictable patterns. Behavioral analytics can detect anomalous behaviors — such as a sudden attempt to access restricted files or logins at unusual times — signaling a potential breach.
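As a toy illustration of the underlying idea (not a description of any particular product), a defender might baseline each user's typical login hours and flag logins that fall far outside that window:

```python
from collections import defaultdict
from datetime import datetime

# Toy behavioral-analytics sketch: learn each user's usual login hours,
# then flag logins outside that window. Illustrative only; real UBA
# systems model many more signals (location, device, access patterns).

history = [
    ("alice", "2025-01-06T09:12:00"), ("alice", "2025-01-07T08:55:00"),
    ("alice", "2025-01-08T09:30:00"), ("bob", "2025-01-06T22:10:00"),
    ("bob", "2025-01-07T23:02:00"),
]

usual_hours = defaultdict(set)
for user, ts in history:
    usual_hours[user].add(datetime.fromisoformat(ts).hour)

def is_anomalous(user: str, ts: str, tolerance: int = 2) -> bool:
    """Flag a login whose hour is more than `tolerance` hours from any usual hour."""
    hour = datetime.fromisoformat(ts).hour
    hours = usual_hours.get(user)
    if not hours:
        return True  # no baseline yet: treat as worth reviewing
    # Circular distance so 23:00 and 01:00 count as two hours apart.
    return min(min(abs(hour - h), 24 - abs(hour - h)) for h in hours) > tolerance

print(is_anomalous("alice", "2025-01-09T03:41:00"))  # True: 3 AM is unusual for alice
print(is_anomalous("bob", "2025-01-09T23:30:00"))    # False: late nights are normal for bob
```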

This combination of psychology and technology allows for dynamic, adaptive security measures that can catch threats early, often before they escalate into full-fledged incidents. By weaving human insight into the fabric of digital security, behavioral analytics represents a major step forward in cybersecurity defenses.

Rethinking the rhetoric of cybersecurity

The cybersecurity industry has long relied on fear-driven messaging to encourage secure behavior. However, experts argue that this approach, while effective in the short term, may actually discourage engagement in the long run. By using dramatic language to describe threats, the industry may be creating a sense of helplessness among the general public. Portraying cybersecurity as a field too complex and overwhelming for ordinary people to understand only reinforces that helplessness.

Instead, fostering a sense of civic responsibility can empower anyone to participate in cybersecurity efforts. When people understand that their actions contribute to a safer online community, they’re more likely to engage in secure practices. Reframing cybersecurity as a shared responsibility rather than a source of fear can transform public engagement with online security.

Bridging technology and psychology for a secure future

Today, cybersecurity is no longer solely a technical issue — it is a fundamentally human one. Security strategies must weave technology and psychology together to create a comprehensive defense that accounts for both system vulnerabilities and human behavior. Cyber criminals leverage psychological tactics to manipulate individuals, and a deeper understanding of those tactics makes defenses stronger. Meanwhile, cybersecurity professionals rely on their mental resilience, creativity and ethical fortitude to counter these threats.

From training programs based on psychological principles to implementing behavioral analytics, incorporating human insights into cybersecurity strategies leads to a more adaptive and robust defense. By embracing psychology alongside technological advancements, we can transform cybersecurity from a reactive discipline into a proactive, resilient force.

The post Hacking the mind: Why psychology matters to cybersecurity appeared first on Security Intelligence.
