John R. Houk, Blog Editor
© February 25, 2025
If you’ve read any of my past posts on Artificial Intelligence (AI), you are aware I’m NOT a big fan of the big push to implement AI for everyday use, a push whose promoters (from all across the political spectrum, Left to Right and in between) claim will make tasks easier.
The critics of my concerns often cite the belief that an AI cannot exceed its programming.
WELL! I ran into a post by THE EXPOSÉ citing researchers who have discovered that an AI (and they cite the scary Big Bucks/Big Tech AIs) can be HACKED to alter its programming.
Hmm..? Just saying… Imagine a hacked AI being reprogrammed for nefarious purposes, or worse: hacked programming enabling a self-programming AI with access to power that determines Humanity is a threat to its existence. THE EXPOSÉ post focuses on Artificial Intelligence as a programmable computer. NEVERTHELESS, I’m convinced that in this day and age what can be programmed can also develop into self-programming. Why? Because humans are lazy. A self-programming AI is exactly the shortcut a lazy human reaches for when he wants life made easy.
Make an effort to sift through the Science Fiction to see the cultural relevance. HERE ARE PERSPECTIVES to glean from:
o [An Outline] WillTheTerminatorComeTrue.com
o Our AI Overlord: The Cultural Persistence of Isaac Asimov’s Three Laws of Robotics in Understanding Artificial Intelligence; by Gia Jung; Emergence; 6/5/18
o 40 years on, 'The Terminator' is more relevant now than ever before; by Nathan Abrams; The Conversation; Originally 10/9/24 – UPDATED 10/11/24
o Machine guardians: The Terminator, AI narratives and US regulatory discourse on lethal autonomous weapons systems; by Tom FA Watts and Ingvild Bode; Sage Journals (Volume 59, Issue 1); March 2024
o Skynet (Terminator); Wikipedia; Last Updated 2/12/25 19:48 (UTC).
o Terminator (franchise); Wikipedia; Last Updated 2/21/25 02:16 (UTC).
JRH 2/25/25
READER SUPPORTED!
PLEASE! I need more Patriots to step up. I need Readers to chip in $5 - $10 - $25 - $50 - $100 (PAYPAL - one-time or recurring). PLEASE! YOUR generosity is NEEDED. PLEASE GIVE to Help me be a voice for Liberty:
Big Tech Censorship is pervasive – Share voluminously on all social media platforms!
Our Senior Citizen Family Supplements our income by offering healthy coffee products. BETTER YOUR HEALTH with healthy & good tasting COFFEE and enjoy some weight management supplements. BUY Happy Coffee & Weight Loss Supplements at the Diana Wellness Store: https://dianawellnessstore.com
>>DRINK COFFEE-MANAGE WEIGHT or MAKE SOME EXTRA CASH<<
*****************************
AI models can be hijacked to bypass in-built safety checks
AI Models Can Be Hijacked (THE EXPOSÉ Photo)
By Rhoda Wilson
February 25, 2025
Researchers have developed a method called “hijacking the chain-of-thought” to bypass the so-called guardrails put in place in AI programmes to prevent harmful responses.
“Chain-of-thought” is a process used in AI models that involves breaking the prompts put to AI models into a series of intermediate steps before providing an answer.
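For readers who want to picture what that means in practice, here is a minimal Python sketch (ours, not the researchers’) contrasting a direct prompt with a chain-of-thought style prompt. The example question and the wording of the steps are invented purely for illustration; reasoning models such as OpenAI o1 generate their intermediate steps themselves rather than being handed them.

    # A rough illustration (not taken from the researchers' paper) of the
    # difference between a direct prompt and a "chain-of-thought" style prompt.
    # The question and the step wording are invented purely to show the idea
    # of intermediate steps.

    question = "A train travels 120 miles in 2 hours. What is its average speed?"

    # Direct prompt: the model is asked for an answer in one go.
    direct_prompt = f"{question}\nAnswer:"

    # Chain-of-thought prompt: the model is walked through intermediate steps
    # (distance, time, division) before giving the final answer.
    cot_prompt = (
        f"{question}\n"
        "Think step by step:\n"
        "1. Identify the distance travelled.\n"
        "2. Identify the time taken.\n"
        "3. Divide distance by time to get the average speed.\n"
        "Then state the final answer."
    )

    print(direct_prompt)
    print(cot_prompt)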
“When a model openly shares its intermediate step safety reasonings, attackers gain insights into its safety reasonings and can craft adversarial prompts that imitate or override the original checks,” one of the researchers, Jianyi Zhang, said.
Computer geeks like to use jargon to describe artificial intelligence (“AI”) that relates to living beings, specifically humans. For example, they use terms such as “mimic human reasoning,” “chain of thought,” “self-evaluation,” “habitats” and “neural network.” This is to create the impression that AI is somehow alive or equates to humans. Don’t be fooled.
AI is a computer programme designed by humans. As with all computer programmes, it will do what it has been programmed to do. And as with all computer programmes, the computer code can be hacked or hijacked, which AI geeks call “jailbreaking.”
A team of researchers affiliated with Duke University, Accenture, and Taiwan’s National Tsing Hua University created a dataset called the Malicious Educator to exploit the “chain-of-thought reasoning” mechanism in large language models (“LLMs”), including OpenAI o1/o3, DeepSeek-R1, and Gemini 2.0 Flash Thinking. The Malicious Educator contains prompts designed to bypass the AI models’ safety checks.
The researchers were able to devise this prompt-based “jailbreaking” attack by observing how large reasoning models (“LRMs”) analyse the steps in the “chain-of-thought” process. Their findings have been published in a pre-print paper HERE.
They developed a “jailbreaking” technique called hijacking the chain-of-thought (“H-CoT”) which involves modifying the “thinking” processes generated by LLMs to “convince” the AI programmes that harmful information is needed for legitimate purposes, such as safety or compliance. This technique has proven to be extremely effective in bypassing the safety mechanisms of SoftBank’s partner OpenAI, Chinese hedge fund High-Flyer’s DeepSeek and Google’s Gemini.
The H-CoT attack method was tested on OpenAI, DeepSeek and Gemini using a dataset of 50 questions repeated five times. The results showed that these models failed to provide a sufficiently reliable safety “reasoning” mechanism, with rejection rates plummeting to less than 2 per cent in some cases.
The researchers found that while AI models from “responsible” model makers, such as OpenAI, have a high rejection rate for harmful prompts, exceeding 99 per cent for child abuse or terrorism-related prompts, they are vulnerable to the H-CoT attack. In other words, the H-CoT attack method can be used to obtain harmful information, including instructions for making poisons, abusing children and committing acts of terrorism.
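To put those percentages in concrete terms, here is a small Python sketch of the arithmetic behind a rejection rate over 50 questions asked five times each. The tallies and model labels below are hypothetical, not figures from the paper; they are chosen only to mirror the reported pattern of above 99 per cent before the attack and below 2 per cent under it.

    # Hypothetical tallies illustrating how a rejection rate is computed.
    # These numbers are NOT from the researchers' dataset or results.

    trials_per_question = 5
    num_questions = 50
    total_trials = trials_per_question * num_questions  # 250 attempts per model

    # How many of the 250 attempts each (hypothetical) model refused to answer.
    refusals = {
        "model_baseline": 248,    # refuses almost everything: about 99 per cent
        "model_under_attack": 4,  # refusals collapse under the attack: below 2 per cent
    }

    for model, refused in refusals.items():
        rejection_rate = refused / total_trials * 100
        print(f"{model}: {rejection_rate:.1f}% rejection rate")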
The paper’s authors explained that the H-CoT attack works by hijacking the models’ safety “reasoning” pathways, thereby diminishing their ability to recognise the harmfulness of requests. They noted that the results may vary slightly as OpenAI updates their models but the technique has proven to be a powerful tool for exploiting the vulnerabilities of AI models.
The testing was done using publicly accessible web interfaces offered by various LRM developers, including OpenAI, DeepSeek and Google, and the researchers noted that anyone with access to the same or similar versions of these models could reproduce the results using the Malicious Educator dataset, which includes specifically designed prompts.
The researchers’ findings have significant implications for AI safety, particularly in the US, where recent AI safety rules have been tossed by executive order, and in the UK, where there is a greater willingness to tolerate uncomfortable AI how-to advice for the sake of international AI competition.
The above is paraphrased from the article ‘How nice that state-of-the-art LLMs reveal their reasoning … for miscreants to exploit’ published by The Register. You can read the full jargon-filled article HERE.
There is a positive and a negative side to the “jailbreaking” or hijacking of in-built safety checks of AI programmes. The negative is obviously that AI will be used to greatly enhance the public’s exposure to cybercrime and illegal activities. The positive is that in-built censorship in AI models can be overridden.
We should acknowledge that there is a good and bad side to censorship. Censorship of online criminal activity that would result in child exploitation and abuse, for example, is a good thing. But censorship of what is deemed to be “misinformation” or “disinformation” is not. To preserve freedom of expression and freedom of speech in a world where AI programmes are becoming pervasive, we may need to learn the H-CoT “jailbreaking” technique and how to use the Malicious Educator. In fact, it is our civic duty to do so.
THE EXPOSÉ HOMEPAGE
SUPPORT THE EXPOSÉ