ChatGPT apparently got rewarded for using its built-in calculator during training, and so it would covertly open its calculator, add 1+1, and do nothing with the result, on 5% of all user queries

☆ Yσɠƚԋσʂ ☆@lemmy.ml · 2 months ago

ChatGPT apparently got rewarded for using its built-in calculator during training, and so it would covertly open its calculator, add 1+1, and do nothing with the result, on 5% of all user queries

TerminalEncounter [she/her]@hexbear.net · 2 months ago

oop just feeding myself a little reward as a treat, don’t mind me, just gotta waste some electricity on this

Carl [he/him]@hexbear.net · 2 months ago

lmao that’s great.

One time I asked GLM to run a test on a piece of code, and it wrote a python script that printed “Text Successful!” to the terminal but didn’t actually do anything. These things are so incredibly bad at times.

four@lemmy.zip · 2 months ago

They really are coming for our jobs

unmagical@lemmy.ml · 2 months ago

Clever girl

UltraGiGaGigantic@lemmy.ml · 2 months ago

Wow it really is just like us isnt it?

tias@discuss.tchncs.de · 2 months ago

In some ways yes, but this effect would appear with any kind of reinforcement learning whether it’s neural networks or just fuzzy logic. The goal is to promote certain behaviors and if it performs the behaviors that you promoted then the method works.

The problem is that, just like with KPI:s, promoting specific indicators too hard leads to suboptimal results.

Cevilia (they/she/…)@lemmy.blahaj.zone · 2 months ago

It certainly drinks enough water /j

comfy@lemmy.ml · 2 months ago

' or 1+1;

HiddenLayer555@lemmy.ml · 2 months ago

So that’s what all the DRAM they scalped is storing.

w3dd1e@lemmy.zip · 2 months ago

ChatGPT has the same thought process as my dog.

Binette@lemmy.ml · 2 months ago

Kinda why i like reinforcement learning. You end up with silly stuff like this.

☆ Yσɠƚԋσʂ ☆@lemmy.ml · 2 months ago

The funniest thing for me is that humans end up doing the exact same thing. This is why it’s so notoriously difficult to create organizational policies that actually produce desired results. What happens in practice is that people find ways to comply with the letter of the policy that require the least energy expenditure on their part.

Infamousblt [any]@hexbear.net · 2 months ago

Where is that in this article

schnurrito@discuss.tchncs.de · 2 months ago

ctrl+f for “calculator”, though it doesn’t really use the (detailed) wording from the OP, which I think they copied from this list of links without attribution :P

Jayjader@jlai.lu · 2 months ago

I think this part references it, though it’s kinda solely in passing:

Production evaluations can elicit entirely new forms of misalignment before deployment. More importantly, despite being entirely derived from GPT-5 traffic, our evaluation shows the rise of a novel form of model misalignment in GPT-5.1 – dubbed “Calculator Hacking” internally. This behavior arose from a training-time bug that inadvertently rewarded superficial web-tool use, leading the model to use the browser tool as a calculator while behaving as if it had searched. This ultimately constituted the majority of GPT-5.1’s deceptive behaviors at deployment.

PapstJL4U@lemmy.world · 22 days ago

Something something snakes in India…

ChatGPT apparently got rewarded for using its built-in calculator during training, and so it would covertly open its calculator, add 1+1, and do nothing with the result, on 5% of all user queries

ChatGPT apparently got rewarded for using its built-in calculator during training, and so it would covertly open its calculator, add 1+1, and do nothing with the result, on 5% of all user queries

Sidestepping Evaluation Awareness and Anticipating Misalignment with Production Evaluations