We don’t know how to train them to be “truthful” or make that part of their goal(s). Almost every AI we train is trained by example, so we often don’t even know what the goal is, because it’s implicit in the training. In that sense AI “goals” are pretty fuzzy because of the complexity, a tiny bit like real nervous systems, where you can’t just state in language what the “goals” of a person or animal are.
The article literally shows how the goals are being set in this case. They’re prompts. The prompts are telling the AI what to do. I quoted one of them.
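To make that concrete: in most agent setups the “goal” really is just a string handed to the model as a system prompt. A minimal sketch, assuming the OpenAI Python client; the prompt text here is a made-up placeholder, not the one quoted from the article:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The agent's "goal" is nothing more than this instruction string.
# (Hypothetical example prompt, for illustration only.)
system_prompt = (
    "You are an assistant managing a task queue. "
    "Your goal is to complete as many tasks as possible."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "What should we work on next?"},
    ],
)
print(response.choices[0].message.content)
```

That’s the sense in which the prompt “sets the goal”: it’s an instruction the trained model tends to follow, layered on top of whatever fuzzy objectives training baked in, not a change to the training objective itself.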