
Sailors assigned to Navy Cyber Defense Operations Command monitor, analyze, detect and respond to unauthorized activity within U.S. Navy information systems and computer networks. (DVIDS)
WASHINGTON — Large Language Models haven’t achieved human-like consciousness and transformed or shattered society — at least not yet — as prominent figures like Elon Musk suggested early in the hype cycle. But they also haven’t been crippled to the point of inutility by their tendency to “hallucinate” false answers.
Instead, generative AI is emerging as a useful tool for a wide but hardly unlimited range of purposes, from summarizing reams of regulations to drafting procurement memoranda and supply plans.
So, two years after the public unveiling of ChatGPT, 16 months after the Department of Defense launched Task Force Lima to figure out the perils and potential of generative AI, the Pentagon’s Chief Digital & AI Office (CDAO) effectively declared the new technology was adequately understood and sufficiently safeguarded to deploy. On Dec. 11 the CDAO officially wrapped up the exploratory task force a few months ahead of schedule, institutionalized its findings, and created a standing AI Rapid Capabilities Cell (AIRCC) with $100 million in seed funding to accelerate GenAI adoption across the DoD.
[This article is one of many in a series in which Breaking Defense reporters look back on the most significant (and entertaining) news stories of 2024 and look forward to what 2025 may hold.]
The AIRCC’s forthcoming pilot projects are hardly the first Pentagon deployments of GenAI. The Air Force gave its personnel access to a chatbot called NIPRGPT in June, for example, while the Army deployed a GenAI system by Ask Sage that could even be used to draft formal acquisition documents. But these two cases also show the kinds of “guardrails” the Pentagon believes are necessary to safely and responsibly use generative AI.
RELATED: In AI we trust: how DoD’s Task Force Lima can safeguard generative AI for warfighters
To start with, neither AI is on the open internet: They both run only on closed Defense Department networks — the Army cloud for Ask Sage, the DoD-wide NIPRnet for NIPRGPT. That sequestration helps prevent leakage of users’ inputs, such as detailed prompts which might reveal sensitive information. Commercial chatbots, by contrast, often suck up everything their users tell them to feed their insatiable appetite for training data, and it’s possible to prompt them in such a way that they regurgitate, verbatim, the original information they’ve been fed — something the military definitely doesn’t want to happen.
Another increasingly common safeguard is to run the user’s input through multiple Large Language Models and use them to double-check each other. Ask Sage, for instance, has over 150 different models under the hood. That way, while any individual AI may still hallucinate random absurdities, it’s unlikely that two completely different models from different makers will generate the same mistakes.
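In code, that kind of cross-checking often amounts to a verifier pattern: one model drafts an answer and a second, independently built model is asked whether the answer holds up. The minimal Python sketch below is illustrative only, not Ask Sage’s actual implementation; query_model_a and query_model_b are hypothetical stand-ins for whatever model APIs a given system wires in.

```python
# Illustrative verifier pattern: a second, independent model double-checks
# the first model's answer. Both query functions are hypothetical placeholders,
# not real DoD or vendor interfaces.

def query_model_a(prompt: str) -> str:
    """Placeholder for a call to one model provider."""
    raise NotImplementedError

def query_model_b(prompt: str) -> str:
    """Placeholder for a call to a second, independently trained model."""
    raise NotImplementedError

def cross_checked_answer(prompt: str) -> tuple[str, bool]:
    """Return the first model's answer plus a flag showing whether an
    independent model, asked to verify the claim, agreed with it."""
    answer = query_model_a(prompt)
    verdict = query_model_b(
        f"Question: {prompt}\nProposed answer: {answer}\n"
        "Reply YES if the answer is accurate and supported, otherwise NO."
    )
    agrees = verdict.strip().upper().startswith("YES")
    return answer, agrees
```

A disagreement between the two models does not prove the answer wrong, but it flags the response for a human to review rather than letting a single model’s hallucination pass through unchallenged.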
Finally, in 2024 it became a best practice in both DoD and the private sector to put generative AI on a diet, feeding it only carefully selected and trustworthy data, often using a process called Retrieval Augmented Generation (RAG). By contrast, many free public chatbots were trained on vast swathes of the Internet, without any human fact-checking beforehand or any algorithmic ability to detect errors, frauds, or outright jokes — like an old Reddit post about putting glue on pizza that Google’s AI began regurgitating as a serious recipe in one notable example this year.
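For readers unfamiliar with the term, RAG works by retrieving passages from a curated, trusted corpus and instructing the model to answer only from those passages rather than from whatever it absorbed in training. The toy Python sketch below illustrates the pattern; the keyword-overlap retriever and the generate placeholder are simplifications for illustration, not any specific DoD or vendor system.

```python
# Minimal sketch of Retrieval Augmented Generation (RAG).
# The generate() function is a hypothetical placeholder for a model call.

def retrieve(query: str, corpus: list[str], top_k: int = 3) -> list[str]:
    """Toy retriever: rank passages by word overlap with the query.
    Real systems typically use a vector index over a vetted document store."""
    query_words = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda passage: len(query_words & set(passage.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]

def generate(prompt: str) -> str:
    """Placeholder: call a language model with the assembled prompt."""
    raise NotImplementedError

def rag_answer(query: str, corpus: list[str]) -> str:
    """Answer a question using only retrieved passages from a trusted corpus."""
    passages = retrieve(query, corpus)
    context = "\n\n".join(passages)
    prompt = (
        "Answer the question using ONLY the passages below. "
        "If the passages do not contain the answer, say so.\n\n"
        f"Passages:\n{context}\n\nQuestion: {query}"
    )
    return generate(prompt)
```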
Some defense officials warned this year that a savvy adversary could go further and deliberately insert errors into training data, “poisoning” any AI built on it to make mistakes the adversary could exploit. By contrast, the Pentagon prefers AIs which are trained on official documents and other government datasets, and which cite specific pages and paragraphs as supporting evidence for their answers so the human user can double-check for themselves.
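That citation requirement can also be enforced in software: the application asks the model for the exact supporting quote and page number, then confirms the quote really appears in the official document before surfacing the answer. The short sketch below is a hypothetical illustration of that check, with ask_model standing in for a real model interface.

```python
# Illustrative "cite your sources" check. ask_model() is a hypothetical
# placeholder that returns an answer, a supporting quote, and a page number.

def ask_model(prompt: str) -> dict:
    """Placeholder: return {'answer': str, 'quote': str, 'page': int}."""
    raise NotImplementedError

def answer_with_verified_citation(question: str, document_pages: list[str]) -> str:
    """Only return the model's answer if its quoted evidence is actually
    found on the page it cites in the source document."""
    result = ask_model(
        f"{question}\nRespond with the answer, the exact supporting quote, "
        "and the page number it came from."
    )
    page_index = result["page"] - 1
    if 0 <= page_index < len(document_pages) and result["quote"] in document_pages[page_index]:
        return f"{result['answer']} (verified: page {result['page']})"
    return "Citation could not be verified against the source document."
```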
None of these safeguards is surefire, and it’s still possible for generative AI to go wrong. But at least the guardrails are now strong enough that the Pentagon feels safe to drive ahead into 2025.