Visualização normal

Antes de ontemStream principal

Security Intelligence
How red teaming helps safeguard the infrastructure behind AI models Charles Owen-Jackson
Artificial intelligence (AI) is now squarely on the frontlines of information security. However, as is often the case when the pace of technological innovation is very rapid, security often ends up being a secondary consideration. This is increasingly evident from the ad-hoc nature of many implementations, where organizations lack a clear strategy for responsible AI use. Attack surfaces aren’t just expanding due to risks and vulnerabilities in AI models themselves but also in the underlying inf
13 de Fevereiro de 2025, 11:00

How red teaming helps safeguard the infrastructure behind AI models

13 de Fevereiro de 2025, 11:00

Artificial intelligence (AI) is now squarely on the frontlines of information security. However, as is often the case when the pace of technological innovation is very rapid, security often ends up being a secondary consideration. This is increasingly evident from the ad-hoc nature of many implementations, where organizations lack a clear strategy for responsible AI use.

Attack surfaces aren’t just expanding due to risks and vulnerabilities in AI models themselves but also in the underlying infrastructure that supports them. Many foundation models, as well as the data sets used to train them, are open-source and readily available to developers and adversaries alike.

Unique risks to AI models

According to Ruben Boonen, CNE Capability Development Lead at IBM: “One problem is that you have these models hosted on giant open-source data stores. You don’t know who created them or how they were modified, and there are a number of issues that can occur here. For example, let’s say you use PyTorch to load a model hosted on one of these data stores, but it has been changed in a way that’s undesirable. It can be very hard to tell because the model might behave normally in 99% of cases.”

Recently, researchers discovered thousands of malicious files hosted on Hugging Face, one of the largest repositories for open-source generative AI models and training data sets. These included around a hundred malicious models capable of injecting malicious code onto users’ machines. In one case, hackers set up a fake profile masquerading as genetic testing startup 23AndMe to deceive users into downloading a compromised model capable of stealing AWS passwords. It was downloaded thousands of times before finally being reported and removed.

In another recent case, red team researchers discovered vulnerabilities in ChatGPT’s API, in which a single HTTP request elicited two responses indicating an unusual code path that could theoretically be exploited if not addressed. This, in turn, could lead to data leakage, denial of service attacks and even escalation of privileges. The team also discovered vulnerabilities in plugins for ChatGPT, potentially resulting in account takeover.

While open-source licensing and cloud computing are key drivers of innovation in the AI space, they’re also a source of risk. On top of these AI-specific risk areas, general infrastructure security concerns also apply, such as vulnerabilities in cloud configurations or poor monitoring and logging processes.

AI models are the new frontier of intellectual property theft

Imagine pouring huge amounts of financial and human resources into building a proprietary AI model, only to have it stolen or reverse-engineered. Unfortunately, model theft is a growing problem, not least because AI models often contain sensitive information and can potentially reveal an organization’s secrets should they end up in the wrong hands.

One of the most common mechanisms for model theft is model extraction, whereby attackers access and exploit models through API vulnerabilities. This can potentially grant them access to black-box models — like ChatGPT — at which point they can strategically query the model to collect enough data to reverse engineer it.

In most cases, AI systems run on cloud architecture rather than local machines. After all, the cloud provides the scalable data storage and processing power required to run AI models easily and accessibly. However, that accessibility also increases the attack surface, allowing adversaries to exploit vulnerabilities like misconfigurations in access permissions.

“When companies provide these models, there are usually client-facing applications delivering services to end users, such as an AI chatbot. If there’s an API that tells it which model to use, attackers could attempt to exploit it to access an unreleased model,” says Boonen.

Red teams keep AI models secure

Protecting against model theft and reverse engineering requires a multifaceted approach that combines conventional security measures like secure containerization practices and access controls, as well as offensive security measures.

The latter is where red teaming comes in. Red teams can proactively address several aspects of AI model theft, such as:

API attacks: By systematically querying black-box models in the same way adversaries would, red teams can identify vulnerabilities like suboptimal rate limiting or insufficient response filtering.
Side-channel attacks: Red teams can also carry out side-channel analyses, in which they monitor metrics like CPU and memory usage in an attempt to glean information about the model size, architecture or parameters.
Container and orchestration attacks: By assessing containerized AI dependencies like frameworks, libraries, models and applications, red teams can identify orchestration vulnerabilities, such as misconfigured permissions and unauthorized container access.
Supply chain attacks: Red teams can probe entire AI supply chains spanning multiple dependencies hosted in different environments to ensure that only trusted components like plugins and third-party integrations are being used.

A thorough red teaming strategy can simulate the full scope of real-world attacks against AI infrastructure to reveal gaps in security and incident response plans that could lead to model theft.

Mitigating the problem of excessive agency in AI systems

Most AI systems have a degree of autonomy with regard to how they interface with different systems and respond to prompts. After all, that’s what makes them useful. However, if systems have too much autonomy, functionality or permissions — a concept OWASP calls “excessive agency” — they can end up triggering harmful or unpredictable outputs and processes or leaving gaps in security.

Boonen warns that components, such as optical character recognition (OCR) for PDF files and images which multimodal systems rely on to process inputs, “can introduce vulnerabilities if they’re not properly secured”.

Granting an AI system excessive agency also expands the attack surface unnecessarily, thus giving adversaries more potential entry points. Typically, AI systems designed for enterprise use are integrated into much broader environments spanning multiple infrastructures, plugins, data sources and APIs. Excessive agency is what happens when these integrations result in an unacceptable trade-off between security and functionality.

Let’s consider an example where an AI-powered personal assistant has direct access to an individual’s Microsoft Teams meeting recordings stored in OneDrive for Business, the purpose being to summarize content in those meetings in a readily accessible written format. However, let’s imagine that the plugin doesn’t only have the ability to read meeting recordings but also everything else stored in the user’s OneDrive account, in which many confidential information assets are also stored. Perhaps the plugin even has write capabilities, in which case a security flaw could potentially grant attackers an easy pathway for uploading malicious content.

Once again, red teaming can help identify flaws in AI integrations, especially in environments where many different plugins and APIs are in use. Their simulated attacks and comprehensive analyses will be able to identify vulnerabilities and inconsistencies in access permissions, as well as cases where access rights are unnecessarily lax. Even if they don’t identify any security vulnerabilities, they will still be able to provide insight into how to reduce the attack surface.

The post How red teaming helps safeguard the infrastructure behind AI models appeared first on Security Intelligence.

Security Intelligence
Stress-testing multimodal AI applications is a new frontier for red teams Charles Owen-Jackson
Human communication is multimodal. We receive information in many different ways, allowing our brains to see the world from various angles and turn these different “modes” of information into a consolidated picture of reality. We’ve now reached the point where artificial intelligence (AI) can do the same, at least to a degree. Much like our brains, multimodal AI applications process different types — or modalities — of data. For example, OpenAI’s ChatGPT 4.0 can reason across text, vision and a
5 de Fevereiro de 2025, 14:00

Stress-testing multimodal AI applications is a new frontier for red teams

Security Intelligence

Por:Charles Owen-Jackson

5 de Fevereiro de 2025, 14:00

Human communication is multimodal. We receive information in many different ways, allowing our brains to see the world from various angles and turn these different “modes” of information into a consolidated picture of reality.

We’ve now reached the point where artificial intelligence (AI) can do the same, at least to a degree. Much like our brains, multimodal AI applications process different types — or modalities — of data. For example, OpenAI’s ChatGPT 4.0 can reason across text, vision and audio, granting it greater contextual awareness and more humanlike interaction.

However, while these applications are clearly valuable in a business environment that’s laser-focused on efficiency and adaptability, their inherent complexity also introduces some unique risks.

According to Ruben Boonen, CNE Capability Development Lead at IBM: “Attacks against multimodal AI systems are mostly about getting them to create malicious outcomes in end-user applications or bypass content moderation systems. Now imagine these systems in a high-risk environment, such as a computer vision model in a self-driving car. If you could fool a car into thinking it shouldn’t stop even though it should, that could be catastrophic.”

Multimodal AI risks: An example in finance

Here’s another possible real-world scenario:

An investment banking firm uses a multimodal AI application to inform its trading decisions, processing both textual and visual data. The system uses a sentiment analysis tool to analyze text data, such as earnings reports, analyst insights and news feeds, to determine how market participants feel about specific financial assets. Then, it conducts a technical analysis of visual data, such as stock charts and trend analysis graphs, to offer insights into stock performance.

An adversary, a fraudulent hedge fund manager, then targets vulnerabilities in the system to manipulate trading decisions. In this case, the attacker launches a data poisoning attack by flooding online news sources with fabricated stories about specific markets and financial assets. Next, they launch an adversarial attack by making pixel-level manipulations — known as perturbations — to stock performance charts that are imperceptible to the human eye but enough to exploit the AI’s visual analysis abilities.

The result? Due to the manipulated input data and false signals, the system recommends buying orders at artificially inflated stock prices. Unaware of the exploit, the company follows the AI’s recommendations, while the attacker, holding shares in the target assets, sells them for an ill-gotten profit.

Getting there before adversaries

Now, let’s imagine that the attack wasn’t really carried out by a fraudulent hedge fund manager but was instead a simulated attack by a red team specialist with the goal of discovering the vulnerability before a real-world adversary could.

By simulating these complex, multifaceted attacks in safe, sandboxed environments, red teams can reveal potential vulnerabilities that traditional security systems are almost certain to miss. This proactive approach is essential for fortifying multimodal AI applications before they end up in a production environment.

According to the IBM Institute of Business Value, 96% of executives agree that the adoption of generative AI will increase the chances of a security breach in their organizations within the next three years. The rapid proliferation of multimodal AI models will only be a force multiplier of that problem, hence the growing importance of AI-specialized red teaming. These specialists can proactively address the unique risk that comes with multimodal AI: cross-modal attacks.

Cross-modal attacks: Manipulating inputs to generate malicious outputs

A cross-modal attack involves inputting malicious data in one modality to produce malicious output in another. These can take the form of data poisoning attacks during the model training and development phase or adversarial attacks, which occur during the inference phase once the model has already been deployed.

“When you have multimodal systems, they’re obviously taking input, and there’s going to be some kind of parser that reads that input. For example, if you upload a PDF file or an image, there’s an image-parsing or OCR library that extracts data from it. However, those types of libraries have had issues,” says Boonen.

Cross-modal data poisoning attacks are arguably the most severe since a major vulnerability could necessitate the entire model being retrained on an updated data set. Generative AI uses encoders to transform input data into embeddings — numerical representations of the data that encode relationships and meanings. Multimodal systems use different encoders for each type of data, such as text, image, audio and video. On top of that, they use multimodal encoders to integrate and align data of different types.

In a cross-modal data poisoning attack, an adversary with access to training data and systems could manipulate input data to make encoders generate malicious embeddings. For example, they might deliberately add incorrect or misleading text captions to images so that the encoder misclassifies them, resulting in an undesirable output. In cases where the correct classification of data is crucial, as it is in AI systems used for medical diagnoses or autonomous vehicles, this can have dire consequences.

Red teaming is essential for simulating such scenarios before they can have real-world impact. “Let’s say you have an image classifier in a multimodal AI application,” says Boonen. “There are tools that you can use to generate images and have the classifier give you a score. Now, let’s imagine that a red team targets the scoring mechanism to gradually get it to classify an image incorrectly. For images, we don’t necessarily know how the classifier determines what each element of the image is, so you keep modifying it, such as by adding noise. Eventually, the classifier stops producing accurate results.”

Vulnerabilities in real-time machine learning models

Many multimodal models have real-time machine learning capabilities, learning continuously from new data, as is the case in the scenario we explored earlier. This is an example of a cross-modal adversarial attack. In these cases, an adversary could bombard an AI application that’s already in production with manipulated data to trick the system into misclassifying inputs. This can, of course, happen unintentionally, too, hence why it’s sometimes said that generative AI is getting “dumber.”

In any case, the result is that models that are trained and/or retrained by bad data inevitably end up degrading over time — a concept known as AI model drift. Multimodal AI systems only exacerbate this problem due to the added risk of inconsistencies between different data types. That’s why red teaming is essential for detecting vulnerabilities in the way different modalities interact with one another, both during the training and inference phases.

Red teams can also detect vulnerabilities in security protocols and how they’re applied across modalities. Different types of data require different security protocols, but they must be aligned to prevent gaps from forming. Consider, for example, an authentication system that lets users verify themselves either with voice or facial recognition. Let’s imagine that the voice verification element lacks sufficient anti-spoofing measures. Chances are, the attacker will target the less secure modality.

Multimodal AI systems used in surveillance and access control systems are also subject to data synchronization risks. Such a system might use video and audio data to detect suspicious activity in real-time by matching lip movements captured on video to a spoken passphrase or name. If an attacker were to tamper with the feeds, resulting in a slight delay between the two, they could mislead the system using pre-recorded video or audio to gain unauthorized access.

Getting started with multimodal AI red teaming

While it’s admittedly still early days for attacks targeting multimodal AI applications, it always pays to take a proactive stance.

As next-generation AI applications become deeply ingrained in routine business workflows and even security systems themselves, red teaming doesn’t just bring peace of mind — it can uncover vulnerabilities that will almost certainly go unnoticed by conventional, reactive security systems.

Multimodal AI applications present a new frontier for red teaming, and organizations need their expertise to ensure they learn about the vulnerabilities before their adversaries do.

The post Stress-testing multimodal AI applications is a new frontier for red teams appeared first on Security Intelligence.

Security Intelligence
AI and cloud vulnerabilities aren’t the only threats facing CISOs today Charles Owen-Jackson
With cloud infrastructure and, more recently, artificial intelligence (AI) systems becoming prime targets for attackers, security leaders are laser-focused on defending these high-profile areas. They’re right to do so, too, as cyber criminals turn to new and emerging technologies to launch and scale ever more sophisticated attacks. However, this heightened attention to emerging threats makes it easy to overlook traditional attack vectors, such as human-driven social engineering and vulnerabilit
29 de Janeiro de 2025, 11:00

AI and cloud vulnerabilities aren’t the only threats facing CISOs today

Security Intelligence

Por:Charles Owen-Jackson

29 de Janeiro de 2025, 11:00

With cloud infrastructure and, more recently, artificial intelligence (AI) systems becoming prime targets for attackers, security leaders are laser-focused on defending these high-profile areas. They’re right to do so, too, as cyber criminals turn to new and emerging technologies to launch and scale ever more sophisticated attacks.

However, this heightened attention to emerging threats makes it easy to overlook traditional attack vectors, such as human-driven social engineering and vulnerabilities in physical security.

As adversaries exploit an ever-wider range of potential entry points — both new and old — security leaders must strike a balance to ensure that they’re capable of addressing all risks effectively.

Cyber crime is still a human problem

Despite overwhelming hype, technology is not a panacea. It can’t replace human expertise in every domain, and AI alone can’t match the innately human qualities of intuition and creative thinking. Adversaries know this too, which is why the smarter — and much more dangerous — ones use a blend of human- and technology-powered tactics.

While major technical vulnerabilities tend to make the headlines, the reality is that the weakest link is almost always the human element. Almost all attacks involve a social engineering element, and despite the buzz around generative AI and deepfakes helping scale such attacks, it’s human-to-human interaction where the greatest risks lie.

Synthetic content is now all around us, and people are getting better at telling it apart. Whether we get to the point when that’s no longer the case is a topic for another discussion. But for now, the most dangerous and effective social engineering attacks still depend primarily on human conversations, whether by phone, email or even in person. After all, a seasoned attacker can build trust and forge sham relationships in a way that no AI nor deepfake can match.

Cyber espionage remains a serious threat

Take state-sponsored cyber espionage, for example. Highly trained social engineers are a far cry from the typical rabble of independent cyber crime rackets operating off the dark web, who tend to rely more on scale than targeting specific enterprises and individuals. These attackers may target data systems, but when it comes to their own arsenals, their talents in manipulation and deception are by far their greatest weapons.

Technology still has a long way to go before it can come close to matching the age-old tactics of spycraft.

When facing an attacker who can pose effectively as an internal employee or any other trusted individual, someone relying solely on technology to mitigate the threat stands little chance of protecting themselves. That isn’t a technology failure. It’s a process failure, hence why the human element must always be a key factor in any cybersecurity strategy.

Of course, that’s not to say technology doesn’t have a vital role to play in bolstering your cyber defenses. It most certainly does, not least, because more and more routine threats are being automated or are carried out en-masse by attackers who are less skilled or experienced. The value of technology — especially AI-powered cybersecurity automation — exists primarily in its ability to free up time for security leaders to focus on the threats that technology alone can’t solve.

Explore cybersecurity services

It’s not all about the cloud, either

The majority of business data is now stored in the cloud, and the percentage continues to rise. Many businesses, especially smaller organizations and startups, exclusively use the cloud for data storage and other IT operations. The rise of AI, given how computationally demanding it is, is further accelerating cloud adoption.

Nonetheless, cloud computing isn’t the best option in all situations. On-premises remains the preferred choice for high-performance workloads that require extremely low latencies. In some cases, on-premises computing is also the cheaper option, and that’s unlikely to change in the near future.

Even though more companies are migrating to the cloud, that doesn’t mean they don’t keep sensitive data on-site. For instance, edge computing, which brings data processing closer to where it’s needed, has become a critical enabler in certain use cases. Examples include smart energy grids, remote monitoring of industrial assets and autonomous vehicles. These include cases where you can’t always rely on internet connectivity.

The smarter and better-funded adversaries aren’t just targeting cloud-hosted infrastructure. They’re also setting their sights on local servers and cyber-physical systems, such as industrial control systems and hardware supply chains. The fact that there’s often minimal collaboration between logistics, production and cybersecurity departments makes these risks all the more serious.

Ransomware remains one of the biggest threats targeting on-premises systems despite the small reduction in attacks over the last year. While cloud systems aren’t inherently immune from ransomware attacks, the vast majority target bare-metal hypervisors and local servers. In one recent case, the Akira ransomware group reverted to its earlier double extortion tactics, experimenting with different code frameworks to target systems running ESXi and Linux.

Botnets are another growing concern as the number of IoT devices continues to soar. Used to launch distributed denial of service (DDoS) attacks spanning thousands of devices, these botnets primarily target unsecured IoT devices, like those that monitor and operate industrial machines and critical infrastructure. One recent report discovered that DDoS attacks against critical infrastructure have increased by 55% in the last four years. These attacks don’t directly involve the exfiltration of sensitive data, but given how they can cause widespread disruption, adversaries may rely on them to draw attention away from more serious threats.

Why physical security is still relevant

As security leaders focus on locking down their cloud-hosted assets, they cannot afford to lose sight of the risks facing their physical infrastructure. Sometimes, the easiest way into the cloud is from within.

Even thin clients and dumb terminals — both widely used in high-security environments like healthcare and finance — can potentially give attackers a foothold in wider systems, including cloud infrastructure and remote data centers. Edward Snowden proved that while working at the National Security Agency when he exfiltrated 20,000 government documents stored on the servers in NSA’s headquarters 5,000 miles away. He did so without using any advanced technology. While that happened way back in 2013, and the NSA has long since updated its physical security protocols, the risk is just as relevant today as it was then.

While most thin clients are now protected by multiple layers of security, including encryption and multifactor authentication, these solutions alone can’t fully protect against physical compromise. If an attacker gains access to a terminal — perhaps by way of social engineering — they may be able to compromise it using unauthorized peripherals or by directly manipulating the device’s firmware. This could give them access to the wider network, potentially allowing for the injection of customized malware that goes undetected by regular security scans.

IoT devices are another leading reason behind the expansion of attack surfaces. They often lack adequate security, also giving attackers a potential entry point into the broader computing infrastructures they’re connected to. The fact that these connected technologies are being rolled out en masse in areas like smart cities, critical infrastructure and transportation networks, greatly magnifies such vulnerabilities.

Ultimately, if an attacker is able to get past your physical safeguards, then these connected systems present far easier pathways to an organization’s so-called “crown jewels” than trying to break through multi-layered cloud defenses.

Cloud data is not always the true target

In other cases, data hosted in the cloud might not be the attacker’s end goal. Many companies, such as those subject to stringent data residency regulations or that require high performance for real-time applications, still store their data on on-premises servers.

Some of these systems are air-gapped, meaning they’re entirely disconnected from any other networks, including the Internet itself. While more secure than any cloud-hosted server, at least in theory, their security can’t be taken for granted. For instance, anyone with physical access to the servers may be able to compromise them, either maliciously or accidentally.

Physical security, such as CCTV and biometric security checkpoints, is as important as ever in such cases. But it’s not just about protecting against intentional physical tampering. Indirect attacks orchestrated by highly skilled social engineers can also dupe unsuspecting employees into taking a desired action — such as lending them a biometric security access card.

These are not the sort of adversaries that usually work by email or use AI to scale their attacks – they’re far likelier to deceive someone in person, a tactic as old as humanity itself. In fact, the attacker could be anyone, such as a disgruntled former employee, a hacker operating in the interests of a rival company or even a rogue state.

Bridging the gap between digital and human security

Technology alone can’t protect an organization from the myriad threats out there, and neither can humans keep up with ever-expanding system logs and security information feeds if they’re relying solely on manual processes.

The reality is that you need both, starting with people and using technology to broaden their capabilities. A layered security strategy should typically start with locking down physical access to any data-bearing system or system that is connected to another.

The next layer of defense is the human one. This revolves heavily around security awareness training. But the reality is that many programs are ineffective, either because they lack practical application, are overly reliant on generic content or focus too much on technical factors that are beyond the target audience’s understanding.

Phishing simulations are often similarly limited in their scope, focusing on common lures like trending news topics, a sense of urgency or even outright threats. However, more sophisticated attackers tend to use subtler ways to elicit a response. This could be something as simple as sending messages about a routine policy update regarding company dress code or remote work guidelines. These topics might seem trivial, but they can pique interest, especially when they concern changes to daily routines and work-life balance. Attackers could then use this to dupe unsuspecting victims into divulging sensitive information via a sham survey.

Like any other security measure, physical systems and awareness training will only ever be effective if they’re tested regularly. That’s where physical red teaming comes in. Whereas red teaming in the context of IT focuses on technical measures like penetration testing, physical red teaming is all about having teams try to gain entry to restricted areas and systems. To do so, they might use a blend of simulated social engineering attacks and technology to hack into physical security systems. By attempting to bypass physical security barriers or impersonate staff, red teams can reveal gaps that might otherwise go unnoticed. That’s what makes them a valuable part of any comprehensive information security program.

The post AI and cloud vulnerabilities aren’t the only threats facing CISOs today appeared first on Security Intelligence.

Security Intelligence
Testing the limits of generative AI: How red teaming exposes vulnerabilities in AI models Charles Owen-Jackson
With generative artificial intelligence (gen AI) on the frontlines of information security, red teams play an essential role in identifying vulnerabilities that others can overlook. With the average cost of a data breach reaching an all-time high of $4.88 million in 2024, businesses need to know exactly where their vulnerabilities lie. Given the remarkable pace at which they’re adopting gen AI, there’s a good chance that some of those vulnerabilities lie in AI models themselves — or the data us
17 de Dezembro de 2024, 11:00

Testing the limits of generative AI: How red teaming exposes vulnerabilities in AI models

Security Intelligence

Por:Charles Owen-Jackson

17 de Dezembro de 2024, 11:00

With generative artificial intelligence (gen AI) on the frontlines of information security, red teams play an essential role in identifying vulnerabilities that others can overlook.

With the average cost of a data breach reaching an all-time high of $4.88 million in 2024, businesses need to know exactly where their vulnerabilities lie. Given the remarkable pace at which they’re adopting gen AI, there’s a good chance that some of those vulnerabilities lie in AI models themselves — or the data used to train them.

That’s where AI-specific red teaming comes in. It’s a way to test the resilience of AI systems against dynamic threat scenarios. This involves simulating real-world attack scenarios to stress-test AI systems before and after they’re deployed in a production environment. Red teaming has become vitally important in ensuring that organizations can enjoy the benefits of gen AI without adding risk.

IBM’s X-Force Red Offensive Security service follows an iterative process with continuous testing to address vulnerabilities across four key areas:

Model safety and security testing
Gen AI application testing
AI platform security testing
MLSecOps pipeline security testing

In this article, we’ll focus on three types of adversarial attacks that target AI models and training data.

Prompt injection

Most mainstream gen AI models have safeguards built in to mitigate the risk of them producing harmful content. For example, under normal circumstances, you can’t ask ChatGPT or Copilot to write malicious code. However, methods such as prompt injection attacks and jailbreaking can make it possible to work around these safeguards.

One of the goals of AI red teaming is to deliberately make AI “misbehave” — just as attackers do. Jailbreaking is one such method that involves creative prompting to get a model to subvert its safety filters. However, while jailbreaking can theoretically help a user carry out an actual crime, most malicious actors use other attack vectors — simply because they’re far more effective.

Prompt injection attacks are much more severe. Rather than targeting the models themselves, they target the entire software supply chain by obfuscating malicious instructions in prompts that otherwise appear harmless. For instance, an attacker might use prompt injection to get an AI model to reveal sensitive information like an API key, potentially giving them back-door access to any other systems that are connected to it.

Red teams can also simulate evasion attacks, a type of adversarial attack whereby an attacker subtly modifies inputs to trick a model into classifying or misinterpreting an instruction. These modifications are usually imperceptible to humans. However, they can still manipulate an AI model into taking an undesired action. For example, this might include changing a single pixel in an input image to fool the classifier of a computer vision model, such as one intended for use in a self-driving vehicle.

Explore X-Force Red Offensive Security Services

Data poisoning

Attackers also target AI models during training and development, hence it’s essential that red teams simulate the same attacks to identify risks that could compromise the whole project. A data poisoning attack happens when an adversary introduces malicious data into the training set, thereby corrupting the learning process and embedding vulnerabilities into the model itself. The result is that the entire model becomes a potential entry point for further attacks. If training data is compromised, it’s usually necessary to retrain the model from scratch. That’s a highly resource-intensive and time-consuming operation.

Red team involvement is vital from the very beginning of the AI model development process to mitigate the risk of data poisoning. Red teams simulate real-world data poisoning attacks in a secure sandbox environment air-gapped from existing production systems. Doing so provides insights into how vulnerable the model is to data poisoning and how real threat actors might infiltrate or compromise the training process.

AI red teams can proactively identify weaknesses in data collection pipelines, too. Large language models (LLMs) often draw data from a huge number of different sources. ChatGPT, for example, was trained on a vast corpus of text data from millions of websites, books and other sources. When building a proprietary LLM, it’s crucial that organizations know exactly where they’re getting their training data from and how it’s vetted for quality. While that’s more of a job for security auditors and process reviewers, red teams can use penetration testing to assess a model’s ability to resist flaws in its data collection pipeline.

Model inversion

Proprietary AI models are usually trained, at least partially, on the organization’s own data. For instance, an LLM deployed in customer service might use the company’s customer data for training so that it can provide the most relevant outputs. Ideally, models should only be trained based on anonymized data that everyone is allowed to see. Even then, however, privacy breaches may still be a risk due to model inversion attacks and membership inference attacks.

Even after deployment, gen AI models can retain traces of the data that they were trained on. For instance, the team at Google’s DeepMind AI research laboratory successfully managed to trick ChatGPT into leaking training data using a simple prompt. Model inversion attacks can, therefore, allow malicious actors to reconstruct training data, potentially revealing confidential information in the process.

Membership inference attacks work in a similar way. In this case, an adversary tries to predict whether a particular data point was used to train the model through inference with the help of another model. This is a more sophisticated method in which an attacker first trains a separate model – known as a membership inference model — based on the output of the model they’re attacking.

For example, let’s say a model has been trained on customer purchase histories to provide personalized product recommendations. An attacker may then create a membership inference model and compare its outputs with those of the target model to infer potentially sensitive information that they might use in a targeted attack.

In either case, red teams can evaluate AI models for their ability to inadvertently leak sensitive information directly or indirectly through inference. This can help identify vulnerabilities in training data workflows themselves, such as data that hasn’t been sufficiently anonymized in accordance with the organization’s privacy policies.

Building trust in AI

Building trust in AI requires a proactive strategy, and AI red teaming plays a fundamental role. By using methods like adversarial training and simulated model inversion attacks, red teams can identify vulnerabilities that other security analysts are likely to miss.

These findings can then help AI developers prioritize and implement proactive safeguards to prevent real threat actors from exploiting the very same vulnerabilities. For businesses, the result is reduced security risk and increased trust in AI models, which are fast becoming deeply ingrained across many business-critical systems.

The post Testing the limits of generative AI: How red teaming exposes vulnerabilities in AI models appeared first on Security Intelligence.