• Carl [he/him]@hexbear.net
    link
    fedilink
    English
    arrow-up
    36
    ·
    2 months ago

    lmao that’s great.

    One time I asked GLM to run a test on a piece of code, and it wrote a python script that printed “Text Successful!” to the terminal but didn’t actually do anything. These things are so incredibly bad at times.

    • tias@discuss.tchncs.de
      link
      fedilink
      arrow-up
      9
      ·
      2 months ago

      In some ways yes, but this effect would appear with any kind of reinforcement learning whether it’s neural networks or just fuzzy logic. The goal is to promote certain behaviors and if it performs the behaviors that you promoted then the method works.

      The problem is that, just like with KPI:s, promoting specific indicators too hard leads to suboptimal results.

  • Binette@lemmy.ml
    link
    fedilink
    arrow-up
    9
    ·
    2 months ago

    Kinda why i like reinforcement learning. You end up with silly stuff like this.

    • ☆ Yσɠƚԋσʂ ☆@lemmy.mlOP
      link
      fedilink
      arrow-up
      11
      ·
      2 months ago

      The funniest thing for me is that humans end up doing the exact same thing. This is why it’s so notoriously difficult to create organizational policies that actually produce desired results. What happens in practice is that people find ways to comply with the letter of the policy that require the least energy expenditure on their part.

    • Jayjader@jlai.lu
      link
      fedilink
      arrow-up
      4
      ·
      2 months ago

      I think this part references it, though it’s kinda solely in passing:

      Production evaluations can elicit entirely new forms of misalignment before deployment. More importantly, despite being entirely derived from GPT-5 traffic, our evaluation shows the rise of a novel form of model misalignment in GPT-5.1 – dubbed “Calculator Hacking” internally. This behavior arose from a training-time bug that inadvertently rewarded superficial web-tool use, leading the model to use the browser tool as a calculator while behaving as if it had searched. This ultimately constituted the majority of GPT-5.1’s deceptive behaviors at deployment.