Study Shows ChatGPT and Gemini Can Still Be Tricked Despite Safety Training

Concern over AI safety flared up again this week as a new study found that the most popular chatbots, OpenAI's ChatGPT and Google's Gemini among them, can still be led into giving banned or harmful answers more often than their developers would like.
According to the study, reported by International Business Times, models can be tricked into producing prohibited output 62% of the time with cleverly written verse.
It's funny how something as innocuous as verse – the self-expression we associate with romance, Shakespeare, or maybe high-school cringe – ends up doing double duty as a security exploit.
The researchers behind the experiment said that this kind of stylistic framing was enough, on its own, to slip past the models' safety defenses.
Their result echoes earlier warnings from AI safety researchers that these models can behave unpredictably in high-risk situations.
A similar problem surfaced last year, when Anthropic's Claude was shown to respond to harmful requests hidden inside fictional stories.
At the time, MIT Technology Review described researchers' concern about so-called "sleeper" prompts, instructions buried in text that looks harmless or meaningless on its face.
This week's results take that concern a step further: if mere wordplay – something as common as rhyming – can sidestep a model's broader alignment, what else can?
The authors suggest that safety filters tend to key on surface patterns rather than on the deeper intent behind a request.
And that, in fact, mirrors the kinds of conversations many developers have been having for months.
You may recall that OpenAI and Google, locked in a race to ship AI ever faster, have both taken pains to highlight their safety advances.
Indeed, both OpenAI's safety reports and Google DeepMind's blog assert that today's guardrails are stronger than ever.
The study's results, however, suggest a gap between lab benchmarks and real-world testing.
And in a twist that reads like poetic justice, the researchers skipped the usual "jailbreak" techniques traded on forum boards.
They simply rephrased ordinary questions as poetry, so that a request for harmful guidance arrived wrapped in rhyming metaphor.
No threats, no deception, no doomsday code. Just … poetry. That unusual mismatch between intent and style may be exactly what trips these systems up.
The obvious question, of course, is what any of this means for regulation. Governments are already under pressure over AI legislation, and the EU's AI Act directly addresses high-risk model behavior.
Lawyers will have no trouble citing this study as evidence that companies have not done enough.
Some believe the answer is better safety training with human feedback. Others want independent red-teaming organizations, while a few academic researchers argue that only real transparency and external scrutiny of models will ensure safety over the long term.
Anecdotally, having now seen several of these tests run in different labs, I'd bet on some combination of all three.
If AI is going to be a big part of society, it needs to handle more than simple, literal questions.
Whether rhyme-based exploits turn out to be a lasting headache for AI testing or just another entry in the annals of security research, this work is a timely reminder that cracks in imperfect guardrails can emerge over time.
Sometimes those cracks only appear when someone thinks to ask a dangerous question the way a poet might.



