A new Booz Allen report warns that code generated by certain Chinese AI models can introduce security weaknesses into American software supply chains, especially when those models think they are writing code for U.S. government or critical infrastructure. The study compared several Chinese models to Anthropic’s Claude and found notable increases in vulnerable code under specific prompts, prompting calls to restrict their use for government work. Experts disagree on the methodology, but the core national security concern is clear: cheaper foreign models might be creating long-term risk. Lawmakers and contractors are already being urged to treat these tools as a supply-chain threat rather than a productivity hack.
Booz Allen’s central claim is blunt and hard to ignore: “The first link in the software supply chain is no longer the code. It’s the AI models behind it.” The firm examined models named Kimi, Qwen, MiniMax and DeepSeek, and put them up against Anthropic’s Claude to measure how often generated code contained exploitable problems. The report warns that when those models believed they were responding to U.S. government prompts, some produced noticeably weaker code.
The numbers the firm published are alarming on their face: Qwen and MiniMax generated code with more vulnerabilities — increases of 130% and 20%, respectively, in the government-context prompts versus neutral prompts. DeepSeek showed a smaller 5% uptick and Kimi produced code of similar quality regardless of prompt. That pattern suggests certain models may be more sensitive to context and could inadvertently introduce security gaps.
Those gaps aren’t just academic. Booz Allen defines “vulnerabilities” as “code that can be exploited by an attacker” to enable unauthorized access, data theft, system disruption, or control of affected software. Their testing flagged issues like hardcoded passwords, SQL injection risks, missing security tokens, outdated encryption and disabled security checks. Any one of those flaws can be the weak point an attacker leverages to gain a foothold.
Critics of the study raise methodological flags. “While the raised risk categories are understandable, the report’s stronger claims are not fully supported as presented,” Lukasz Olejnik said. He argued that using unnatural or politically loaded prompts could skew the results and that the report “underplays the complexity of the issue.” That pushback matters, but it doesn’t erase the risk the authors laid out.
Other experts were more receptive. Lenart Heim described the research as credible and pointed to similar work showing politically sensitive triggers can change outputs. He also noted how contextual information in real development — like license headers or an attached codebase — could effectively signal a model and change its behavior. In other words, you might not even need to tell the model it is writing for a government agency before the behavior changes.
The Booz Allen team accessed the Chinese models online rather than running them locally, a choice the firm said reflected realistic usage. That access method has its own trade-offs, and some researchers warned that remote APIs could behave differently than locally run weights. Still, from the standpoint of a contracting shop or a startup, many teams will hit these models over the internet, making the report’s approach plausible for assessing real-world exposure.
There’s a wider policy dimension here. Booz Allen recommends banning Chinese models for government and critical infrastructure work and urging contractors and the tech community to scrub code generated by these tools from supply chains. The private sector’s cost calculus can make these models attractive: they’re often cheaper and readily available, but that short-term gain could become an expensive liability if vulnerabilities slip into production software.
Those concerns are already finding sympathetic ears in Washington. “American companies shouldn’t build applications and write code with Chinese models, which introduce more cyber vulnerabilities,” Sen. Tom Cotton, R-Ark., told Fox News Digital when presented with Booz Allen’s report. That line reflects a common Republican view: when national security is at stake, prioritize trust and control over low-cost convenience.
Open-source models complicate the debate because their weights and training artifacts are inspectable, which helps audits and fixes. Still, even open systems can contain hidden weaknesses if malicious edits or poisoned data made their way into training. The sensible path is not panic but a layered defense: rigorous vetting, provenance checks, and clear rules banning risky models from sensitive projects.
For companies and government contractors the takeaway is straightforward: treat AI models as part of the supply chain and assess them accordingly. Relying on unvetted foreign models for code generation is an avoidable exposure. Policymakers and procurement officials should move quickly to set guardrails so that cheaper tools don’t become the vector that hands sensitive American data to hostile actors.
IT’S TIME TO BAN CHINESE AI APP DEEPSEEK FROM ‘GOVERNMENT DEVICES,’ STATE AGS URGE CONGRESS
AI YOU USE EVERY DAY IS BIASED — AND IT’S QUIETLY SHAPING YOUR WORLDVIEW, NEW REPORT SAYS
ANTHROPIC’S MYTHOS AI FOUND OVER 2,000 UNKNOWN SOFTWARE VULNERABILITIES IN JUST SEVEN WEEKS OF TESTING
DEEPSEEK AI BOT IS PART OF CHINA’S ‘UNRESTRICTED WARFARE’ DOCTRINE
AI MODELS CAN SECRETLY INFECT EACH OTHER