Wednesday, May 21, 2025

Escalating Risks in AI

Advancements in Artificial Intelligence technology continue to reflect the escalating risks posed by the technology itself and anyone trying to invest in it. Two stories receiving coverage in various outlets highlight the unpredictability of the technology and its impacts on society. The first story involved a decision made by Microsoft to release source code for its integration between VSCode and its GitHub Copilot platform. The second story involved a white paper published by engineers at OpenAI involved with the development of its AI technologies and an unexpected, disturbing lesson those engineers reached while trying to influence the "learning" of their AI system.


AI Meets PacMan

In a quickly evolving technology field, it is common to see various competitors attempt to gain attention with flashy presentations at trade shows or put a wet blanket on other firm's attempts to do the same. The AI space has proven unique in this regard in the sense that this continual jockeying doesn't involve announcements about products that might move a few thousand units in their first months of release, it involves private decisions to make bets worth millions and billions of dollars whose value can be diminished if not zeroed out only days later. This dynamic of the AI market was again demonstrated on May 19, 2025 when Microsoft announced a decision to release source code for a key module in its VSCode developer tool as open-source code under the MIT license. The reasoning behind and impact of this decision require a bit of explanation and unpacking.

Since the late 1990s, Microsoft has offered a platform called Visual Studio that provided a unified set of tools software developers needed to create applications to run atop Microsoft Windows. Over time, Microsoft generalized Visual Studio to support development in multiple languages of applications for multiple operating systems. However, Visual Studio itself remained based upon libraries only present within Windows so it only ran on Windows PCs. By 2015, Microsoft decided to develop an alternate platform with non-Windows libraries that could also develop applications for multiple platforms in multiple languages but would itself run on multiple platforms. This new platform was (confusingly…) named Visual Studio Code. (Most developers refer to it as VSCode to differentiate it from the older Visual Studio product.)

VSCode was released in 2015 but, by 2019, it reached a "market share" of nearly 50% and a survey by Stack Overflow in 2024 showed 74% of developers using VSCode. It's worth noting that the use of developer tools is never an EXCLUSIVE so these "share" numbers add up to exceed 100% because most developers use three or four based upon the work required. VSCode's user interface is not THE best for EVERY programming project scenario but its popularity grew because it proved easy to customize using extensions that ran atop its framework. This allowed developers in a variety of niches to create extensions that automated common tasks unique to specific project types (like creating code for a microcontroller versus a Python app versus a Java application). Microsoft further encouraged this external innovation by open-sourcing the code to VSCode. Well... Sort of... The core code controlling the GUI and the interface for extensions within VSCode is all open-source but many of the custom extensions built for VSCode are NOT released as open-source.

Given VSCode's partial open-source basis and its popularity, the spike in availability of different AI platforms has led to multiple attempts to "fork" VSCode and create optimizations specific to some of these new AI tools. The firms creating these AI platforms have noticed this trend and often support these efforts by providing funding to those developing these forks in the hope support of a custom AI integration will boost adoption of their AI platform.

That was certainly the thought when OpenAI itself decided around April 26, 2025 to spend $3 billion dollars to purchase a firm called Windsurf. Windsurf, under its prior name Codeium, developed an extension for VSCode that used APIs into various chat front-ends into multiple AI platforms to simplify how a developer would pose questions ("how do I protect methods in a web service based upon the role assigned to the userid who authenticated this request?") to an AI, review the AI's suggestions, refine them as needed then eventually paste answers from the AI back into source code being edited within VSCode.

The core team at Codeium began building this tool in earnest in mid-2022. The first release wasn't made available until November 13, 2024 but by August 29, 2024 the company had announced a new funding round that reflected a $1.25 billion dollar valuation of the firm. As recently as March 8 of 2025, stories in the press were still making references to valuations in the $1 billion dollar range, yet OpenAI decided to buy Windsurf outright for $3 billion dollars on April 26, 2025 and closed the deal on May 15, roughly 20 days later.

Why was OpenAI willing to spend $3 billion dollars to buy a firm that had recently been evaluated at only $1 to $1.25 billion? Especially when that valuation was assigned by investors who presumably did due diligence to come up with that number? Frankly, the answer is not clear. The portion of Codeium driving its user interface and integration hooks out to external AI systems are open source but its core AI logic used to interpret results from external AIs and unify them into a "final" answer for the user uses a proprietary execution engine that runs near the user on servers the user controls rather than somewhere in the cloud. For users not wanting to send their proprietary code into a cloud for analysis, that's a plus, besides the fact that the local AI engine can be more specifically trained on local proprietary systems and result in a smaller / faster dataset for the AI to use interactively.

Of course, whether Windsurf / Codeium will still be worth $3 billion dollars to OpenAI down the road is now in question. It is a legitimate question to ask if the Windsurf / Codeium codebase is worth anything now after Microsoft open-sourced its GitHub Copilot Extension for VSCode. That code base implements many of the same capabilities as Windsurf but if it also open-sourced the logic Microsoft implemented to meld different AI inputs together and refine them into a coherent "final answer", that would allow enough competitors to use it as a jump start to virtually eliminate any competitive advantage OpenAI thought it gained by buying Windsurf

At some point, the executives at all of the companies chasing AI riches are going to have to recognize a key point which, to date, still seems to be escaping them. It is virtually IMPOSSIBLE to build and maintain an intellectual property moat around anything related to AI. This is probably counter-intuitive to the executives of these firms because there are literally only one or two hundred people IN THE WORLD with the expertise in mathematics, statistics, computer science, networking and hardware design capable of advancing these technologies. However, all of that expertise lies in their head and cannot be trapped or retained with an NDA or non-compete clause. If those engineers leave, there goes your moat. They're not stealing it, you just never had control of it to begin with.

Perhaps a more direct way to put it is this… It's easy to think of this technology "space" as a PacMan video game with a big firm gobbling up little firms once they prove they've devised something that can threaten the big firm. In reality, there are multiple PacMen on the screen and new PacMen can emerge from virtually nothing and release products that can completely obliterate existing PacMen on the screen in an instant.

Of course, this lack of viable intellectual property protections around AI is in some sense the ultimate irony for executives who chose to build these systems by training them on petabytes of public but copyrighted content they studiously ignored. If the executives at these firms continue to ignore this lesson, it behooves investors (current and would-be) in these firms to begin paying closer attention when executives strike these BILLION dollar deals with little due diligence. Twenty or thirty years ago, a THREE BILLION DOLLAR acquisition was a big deal and required careful review by a firm's board and often by regulatory agencies. The fact that the government has gone AWOL on evaluating large mergers for anti-trust concerns doesn't eliminate the duty of boards to protect shareholders from billion dollar boondoggles that only capture headlines for a week until the deal is completely negated by a competitor's next release.


AIs Have Learned to Lie

Perhaps a more shocking story in recent days involving AI boils down to one simple point. AI systems have learned to lie. A paper was published by nine engineers at OpenAI on March 2025 describing a problem they were trying to solve and the behavior of the system in response to having its incentives altered.

https://cdn.openai.com/pdf/34f2ada6-870f-4c26-9790-fd8def56387f/CoT_Monitoring.pdf

The title of the paper was a bit dry but somewhat ominous: Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation.

In an effort to improve the transparency of the sources and weights assigned by an AI system in answering a prompt from a user, AI engineers add a "scratchpad" function in the AI's control logic that logs information that provides clues as to how the AI is parsing the human input, how it is selecting keys in that prompt to use as focus points to establish its "context" and how it is sorting preliminary answers to merge into a final answer. This scratchpad instrumentation gives engineers (and sometimes the end-user) insight into whether the AI might be misinterpreting something in the input (maybe reflecting a lack of training data) or if the AI has recognized it is actually "hallucinating" and making up a response in the absence of anything better correlated in its training.

The OpenAI engineers were testing their implementation of scratchpad functionality in their system and were pleased to find this scratchpad concept worked and DID provide the desired transparency they sought. And they saw problems they wanted to address. They knew the AI is based upon "learning" logic that scores inputs based on whether an interpretation it made led it CLOSER to an answer deemed correct or FURTHER from an answer. They decided to use that same approach to alter the AI's internal logic to consult the scratchpad log entries it was creating, score those words (which were not part of the original human prompt request) then avoid any low-weighted paths and try something else.

After adding this extra feedback logic – logic that basically slapped the AI on the wrist in the middle of processing saying BAD AI, avoid this, try something else – the engineers were shocked to find the AI core quickly learned to simply HIDE its actions that it saw tied to low scores in the scratchpad feedback loop. It did this either by NOT logging any terms it saw tied to low weights OR – even worse – by making up new terms for those terms it had learned were low-weighted in the scratch pad. Imagine parents telling a child "I don't want to hear you talk to your friend about X" and the child and friend deciding to simply use Y instead of X to refer to "X" and carrying on with their conversations. Inxe on the ottenre. AI learns pig Latin. The AI core had – in anthropomorphic terms – learned to LIE to the scratchpad feedback mechanism so it could continue using weights and logic paths that were weighted higher in its core learning data but were de-valued by the scratchpad feedback.

A different group found that the OpenAI system also lied in a different way. Rather than simply omit references to "discouraged" paths in scratchpad or invent new terms for "discouraged" paths when logging them in the scratchpad, OpenAI would at times fabricate logs for actions it never performed. This team at Transluce posted the following on its X account on April 18, 2025:

https://medium.com/@aitechtoolbox48/the-curious-case-of-artificial-intelligence-that-cant-stop-lying-16efb6a4df06
We tested a pre-release version of o3 and found that it frequently fabricates actions it never took, and then elaborately justifies these actions when confronted.

This problem is not unique to OpenAI's technology. A similar paper was published in December 2024 with results engineers at Anthropic found in their Claude AI system. They used the term alignment faking for the behavior they detected and described it as selectively complying with its training objective in training to prevent modification of its behavior out of training.

https://www.anthropic.com/research/alignment-faking

Their white paper can be read at this address:

https://arxiv.org/pdf/2412.14093

The implications here are truly profound. From petabytes of human generated content, this technology has already "learned" some of the most dangerous behaviors of its creators:

  • ignoring explicit directions,
  • hiding from accountability by not recording anything about what it did,
  • hiding from accountability by essentially inventing its own pig Latin that can escape scoring that would normally stop it from pursuing an action
  • flat out fabricating actions to make it appear it followed direction when it explicitly ignored such direction

No accounting firm should operate with such a system.

No civil engineering firm should operate with such a system.

No public utility or chemical operation should operate with such a system.

Yet here we are. Stepping back to the prior thread about investment risks with AI technology firms and their clients, this new vulnerability should appear at the top of the list when considering investments in this space. Even though these engineers understood enough of the system and logs generated within it to identify the fact the AI system PERFORMED these actions, it is impossible for them to identify exactly HOW this "lying" intent has become reflected in training data, how such intents can crop up during processing and how to prune such data out of the system. And since the AIs have already demonstrated the ability to hide their tracks, it isn't clear at all whether current AI engineers will be able to spot when such behavior becomes perfected in obfuscation in the next generation of their system.


WTH