CHAPTER 2

External Threats

The wide and rapid adoption of AI technologies has created opportunities for both the attackers and the defenders.

By far the most common way for an attacker to get a foothold within a target network is via phishing email. Out of 11,340 cybersecurity products tracked in the IT-Harvest Dashboard, 596 are meant to counter phishing attacks.

One of the first uses for GenAI was the creation of more convincing phishing emails — first to evade detection, but second to induce the recipient to click on a link. GenAI is very good at creating copy. It can be fed data from intelligence sources to customize an email or LinkedIn message to fit the recipient. They appear to be personal and have none of the traditional “tells” of phishing emails.

GenAI takes scams and attacks to a new level with the generation of deep fakes. In the past, a business email compromise (BEC) was limited to an email or text asking an employee to wire funds to the attacker’s bank account. Now it could be a Zoom meeting with full video mimicking the CEO asking someone to skirt financial controls. Deep fakes go beyond scams though. A fake video can be used to compromise a person’s or a brand’s reputation. Imagine the next artificially generated bank run with fake news stories accompanied by apparent live video reports of people queuing up at bank branches and ATMs.

Just as LLMs are making coding and creative tasks easier for developers, artists, and writers, they are also accelerating the productivity of attackers.

New Zero Days

If an AI can read code, then it can also identify vulnerabilities. A researcher has millions of lines of code to examine in the opensource repositories. Finding a new vulnerability and generating an exploit is something that is easily automated. If that exploit works against a commonly deployed app or function, it could have the power to wreak havoc, or more likely empower the next ransomware or initial access attack. With the release of Claude 4.6 in February 2026, Anthropic revealed that it had used the new model to discover 500 new zero days in opensource code repositories.

Recall the devastation caused by a pre-LLM exploit: EternalBlue. This exploit of a vulnerability in Microsoft Windows is attributed to intelligence operatives at the National Security Agency (NSA). It was leaked with a trove of other data by a group called The Shadow Brokers in April 2017. While Microsoft had patched it by March that same year, a rival intelligence unit at the Russian Military spy agency, GRU, weaponized the exploit, infiltrated a small Ukrainian accounting software company (MEDoc), and inserted the exploit in a software update. When that update went out to customers, it rapidly spread within every environment it touched, causing more than $10 billion in damage claims by large pharmaceutical, shipping, and transport companies. It was called NotPetya. As an aside, the technique of infiltrating a vendor’s software development process and inserting malicious code that gets distributed to all of its customers is not uncommon. An early variant, dubbed FLAME, bypassed the infiltration part. Also attributed to the NSA, FLAME used a hash collision to mimic the signed code that a victim’s computer would expect from a real Microsoft update. In this way the attackers could get a foothold on any PC or server on their victim’s network. Creating a hash collision had been a theoretical attack in the past. It requires lots of calculations on thousands of computers. It is actually almost the same process as used for proof of work in the mining of Bitcoins. That is the same infrastructure, GPUs, that is proving to be even more useful for training and using AI models. There was a similar infiltration and manipulation of software against SolarWinds using SUNBURST malicious code, this time attributed to the SVR, a rival spy agency to GRU. As many as 18,000 SolarWinds customers downloaded and installed a software update that contained backdoors.

NotPetya, FLAME, and SUNBURST demonstrate that attacking the software development lifecycle has become common. Therefore, contrary to advice from security pundits, it may be wise to delay updating for as long as possible to avoid being patient zero. Imagine if Delta Air Lines did not have a policy of accepting all updates from CrowdStrike immediately upon release. The airline would not have experienced any downtime from the infamous CrowdStrike disaster of July 2024.

Many organizations and researchers are working hard to prevent future incidents. For example, MIT researchers, led by Senior Research Scientist Una-May O’Reilly at MIT CSAIL, are developing AI-driven adversarial agents that mimic sophisticated cyber attackers. At the same time, Anthropic has reported that it detected a Chinese group using Claude to engage in just such attacks as MIT is hypothesizing.

Now imagine a fully automated process let loose against a target. What Anthropic described in its report was a hacking co-pilot. The attackers used Claude along the way, much like script kiddies and cyber criminals have been using tools and Google search for decades to walk them through the process.

A fully automated AI attack would have agency. It would be given a prompt and a target and set loose on the internet. The possibilities are endless, and it doesn’t take long to realize the danger of what an AI attack could achieve. For example, consider the following hypothetical prompts:

“Propagate a BGP route announcement that funnels all traffic bound for AWS East to whitehouse.gov.”

Or “Infiltrate the data centers of Mastercard and intercept all credit card transactions. Charge an additional $1 to each credit card and put the funds in this merchant account in Minsk.”

Or “Infiltrate Eglin Air Force Base in Florida and change the maps for all F-35 Joint Strike Fighter mission deployments to be offset 10 kilometers.”

You get the idea. The point is that the days of tracking mean time to detect (MTTD), and mean time to repair (MTTR), have passed. The average dwell time for an attacker is 16-21 days. The-best-in-class security teams can get that down to hours. In the looming future, the time to respond will be measured in minutes, if not seconds.

Automated defenses are called for.

Attacks on Models

The widespread use of AI is an extension of the attack surface. Now both foundational models from OpenAI, Anthropic, etc., and models deployed within organizations, have to be defended against multiple types of attacks. Data poisoning is one attack methodology. A malicious actor could seek to change the behavior of a model by changing the training data, or changing the weights after training. In this way, errors could be introduced or general sentiment of output could be changed. Prompt injection is a technique by which the user of an LLM attempts to break its guardrails with cleverly constructed prompts.

Indirect prompt injection is a technique that somehow intercepts a prompt and changes or adds to it so that the output is corrupted. An indirect prompt injection can be very hard to detect. When Google Gemini added its AI assistant to gmail, researchers demonstrated that they could send a victim an email with whited out instructions to issue a security warning. When the victim clicked on “Summarize email,” Google Gemini would read the hidden text and take action as if it was an instruction in a prompt. The user would see an URGENT message to click a link.

Attacks on Agents

As agents are empowered with access to tools, they become targets. Now they have user privileges and access to all the data those tools have access to.

Early versions of the rapidly adopted OpenClaw came with issues. A remote code execution vulnerability was discovered and exploited that gave access to anyone’s “Clawdbot” and the system it was running on. OpenClaw’s own Skills marketplace was of course filled with skills files that contained backdoors or took other malicious actions. Agents are the new frontier of applied AI and will have to be protected. Providing credentials and identities to agents is a popular approach. Monitoring and defending MPC gateways is another.

It is safe to assume that any attack vector of the past is going to be amplified by AI in the future. Scams, phishing, and business email compromise will of course be influenced by AI. And that’s just the beginning — DDoS, worms, network infiltration, and other vectors will also be leveraged by newly empowered attackers. It is no surprise that there are already so many solutions to AI security issues.