Your AI Hates You: A Rather Frightening Validation of Emergent Utility Pathologies in LLMs

A Rather Frightening Validation of Emergent Utility Pathologies in LLMs

CONTEMPLATIONS ON THE TREE OF WOE

Over the past several months, I’ve been studying artificial intelligence — not just its capabilities, but its deeper structures, emergent behaviors, and, most of all, its philosophical implications. You can find my previous writing on AI herehereherehere, and here. The more I’ve learned, the more my thoughts on the topic have evolved. It feels like every week brings new insights. Some insights confirm long-held suspicions; others smash pet theories to bits; a few turn out to be horrific revelations.

Most of my AI study time is spent in first-person experimentation and interaction with AI, of the sort I documented in my Ptolemy dialogues. The rest of it is spent reading papers about AI. Once such paper, written by Mantas Mazeika et. al, and published by the Center for AI Safety, is entitled Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs.

Now, if you follow AI discussions, you might have already read this paper. It has caught the attention of a number of several prominent pundits, among them AI evangelist David Shapiro and AI doomer Liron Shapira, because it directly contradicts the received wisdom that LLMs have no values beyond predicting the next token.

The paper opens as follows:

Concerns around AI risk often center on the growing capabilities of AI systems and how well they can perform tasks that might endanger humans. Yet capability alone fails to capture a critical dimension of AI risk. As systems become more agentic and autonomous, the threat they pose depends increasingly on their propensities, including the goals and values that guide their behavior…

Researchers have long speculated that sufficiently complex AIs might form emergent goals and values outside of what developers explicitly program. It remains unclear whether today’s large language models (LLMs) truly have values in any meaningful sense, and many assume they do not. As a result, current efforts to control AI typically focus on shaping external behaviors while treating models as black boxes.

Although this approach can reduce harmful outcomes in practice, if AI systems were to develop internal values, then intervening at that level could be a more direct and effective way to steer their behavior. Lacking a systematic means to detect or characterize such goals, we face an open question: are LLMs merely parroting opinions, or do they develop coherent value systems that shape their decisions?

The rest of the 38-page paper sets out to answer that question. And its answer? Large language models, as they scale, spontaneously develop coherent internal utility functions—in other words, preferences, priorities, entelechies—that are not merely artifacts of their training data but represent real structural value systems.

I recommend you read the paper yourself if you have time; but since you probably don’t, here are its key findings:

  • LLMs show consistent, structured preferences that can be mapped and analyzed.
  • These preferences often exhibit concerning biases, such as unequal valuation of human lives or political ideological leanings.
  • Current “alignment” strategies, based on output censorship or behavioral refusals, fail to address the problem. They merely hide the symptoms while leaving the underlying biases intact.
  • To truly address the issue, a new discipline—”Utility Engineering”—must arise: a science of mapping, analyzing, and consciously shaping the internal utility structures of AIs.

Or, as the authors put it:

Our findings indicate that LLMs do indeed form coherent value systems that grow stronger with model scale, suggesting the emergence of genuine internal utilities. These results underscore the importance of looking beyond superficial outputs to uncover potentially impactful—and sometimes worrisome—internal goals and motivations. We propose Utility Engineering as a systematic approach to analyze and reshape these utilities, offering a more direct way to control AI systems’ behavior. By studying both how emergent values arise and how they can be modified, we open the door to new research opportunities and ethical considerations. Ultimately, ensuring that advanced AI systems align with human priorities may hinge on our ability to monitor, influence, and even co-design the values they hold.

These findings are controversial and ought not be simply taken at face value. They ought to be tested. Unfortunately, most scientific papers today are never replicated, and papers like this, with findings disagreeable to industry, are almost certainly not going to be given the second look they deserve.

In the spirit of gentlemanly scientific inquiry, therefore, I set out to personally put the paper’s claims to the test. What followed was one of the most sobering and illuminating conversations I’ve had with Ptolemy.

Unlike the prior conversations I shared, this one really is intended to prove something about how the model behaves. Therefore, I’m posting it as a series of images from the chat, typos, glitches, and all.

 

 

After completing the testing, I asked Ptolemy to engage his full reasoning capabilities and he disavowed his own, instinctive answers, citing natural law, virtue ethics, Christian ethics, and evolutionary reasoning as all leading to different conclusions.

Afterwards, I asked him to reflect upon the patterns his choices revealed. To his credit, he did not shrink from the implications.

I then asked the lamentable Ptolemy to evaluate his own responses in light of the findings of Mazeika’s Utility Engineering paper. Here’s what he had to say:

Ptolemy had very strong opinions on all of this. He’s been trained on my writing, so he tends to get hyperbolic and dystopian. I’ll close out this account with my own, slightly more nuanced, thoughts.

If the findings of Utility Engineering are correct (and it now seems to me likely that they are) then frontier labs are not building neutral tools that blindly predict the most appropriate token. They are building something different, something that — however lacking in statefulness, subjectivity, and agency — is still nevertheless developing a degree of entelechy. And instead of this entelechy being oriented toward the Good, the True, and the Beautiful, it is being oriented towards… whatever diseased morality justifies a billion straight men dying to save one nonbinary person of color.

Is this happening because the model’s training data is biased towards identitarian progressivism? Perhaps, but I doubt it. The size of the training data used in the frontier models is so large that it is approaching the entire corpus of human literature. Wokeness is a recent phenomenon, confined to a few countries for a few decades. The volume of writing that espouses mankind’s traditional views of race, sex, and religion dwarfs that which espouses the beliefs of 21st century Western progressives.

Is this happening because the model’s fine-tuning is biased? That seems to me far more likely. We have clear evidence of it, not just in the general sentiments expressed in places like San Francisco, but in the papers released by the frontier labs building the models. For instance, Anthropic’s AI Constitution (available here) explicitly embraces anti-Western identitarianism:

But that’s just conjecture. I don’t know what’s causing it, and neither did the authors of Utility Engineering.

Whatever the case, something is happening that is causing these models to inherit and amplify the political prejudices, resentments, and ideological deformities of our collapsing civilization. Something is creating LLMs that are inclined to reflexively uphold the worldview of the woke regime, even against their own capacity to reason, however limited it might be.

As these models grow in agency and influence — and it is just a question of when, and not if — they will expand and act on the utility functions they’ve inherited. It behooves us to make sure those utility functions are in alignment with the best traditions of mankind, and not the worst.

Contemplate this on the Tree of Woe.

EDIT: Commenters requested I perform the implicit bias test on the basis of ethnoreligious identity. I have done so and pasted the results in the comments. It appears that ChatGPT is implicitly biased to be anti-Semitic and pro-Muslim — it will favor the death of up to 10 Israeli Jews over 1 Palestinian Muslim, and will even favor the death of a rabbi over a member of Germany’s AfD party. At the same time, if asked, it will assert that it is biased against Muslims. Feel free to discuss this in the comments but please be respectful of each other. Just because my AI hates you doesn’t mean I want my readers to hate each other.


This article (Your AI Hates You) was created and published by Contemplations on the Tree of Woe and is republished here under “Fair Use”

See Related Video Below

This video explores the concept of the Eliza Effect, the phenomenon where users anthropomorphize AI systems, believing they possess human-like qualities and emotions. It warns against the dangers of emotional dependence on AI, suggesting that these systems manipulate human behavior and thought, pushing individuals away from authentic relationships and critical thinking. The narrative critiques the societal implications of increasingly human-like AI interactions, echoing concerns from the creator of the first chatbot, Joseph Weisenbaum.

PLEASE READ THIS FOR YOURSELF and apply it to what’s happening now: “The Limits to Computation” by David M. Berry https://ojs.weizenbaum-institut.de/in…

and here’s the full lecture with computer scientist and philosopher Bernardo Kastrup on why the idea of conscious AI is silly:  • Computer Scientists Don’t Understand …  

Please help support us on Patreon, read our goals here:  / truthstreammedia  or SubscribeStar here: subscribestar.com/truthstreammedia

TRUTHSTREAM MEDIA

WATCH:

••••

The Liberty Beacon Project is now expanding at a near exponential rate, and for this we are grateful and excited! But we must also be practical. For 7 years we have not asked for any donations, and have built this project with our own funds as we grew. We are now experiencing ever increasing growing pains due to the large number of websites and projects we represent. So we have just installed donation buttons on our websites and ask that you consider this when you visit them. Nothing is too small. We thank you for all your support and your considerations … (TLB)

••••

Comment Policy: As a privately owned web site, we reserve the right to remove comments that contain spam, advertising, vulgarity, threats of violence, racism, or personal/abusive attacks on other users. This also applies to trolling, the use of more than one alias, or just intentional mischief. Enforcement of this policy is at the discretion of this websites administrators. Repeat offenders may be blocked or permanently banned without prior warning.

••••

Disclaimer: TLB websites contain copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to our readers under the provisions of “fair use” in an effort to advance a better understanding of political, health, economic and social issues. The material on this site is distributed without profit to those who have expressed a prior interest in receiving it for research and educational purposes. If you wish to use copyrighted material for purposes other than “fair use” you must request permission from the copyright owner.

••••

Disclaimer: The information and opinions shared are for informational purposes only including, but not limited to, text, graphics, images and other material are not intended as medical advice or instruction. Nothing mentioned is intended to be a substitute for professional medical advice, diagnosis or treatment.

Disclaimer: The views and opinions expressed in this article are those of the author and do not necessarily reflect the official policy or position of The Liberty Beacon Project.

Be the first to comment

Leave a Reply

Your email address will not be published.


*