Super interesting. But also, super boring.
The phrase is “vegetative electron microscopy”.
And it looks more like a machine translation error than anything else. Per the article, the phrase first entered a dataset through bad OCR of two old papers. Then, more recently, the bad phrase got tangled up with a near-typo: in Farsi, the words for “scanning” and “vegetative” are extremely similar. So when some Iranian authors used an LLM to translate their paper into English, it decided that since “vegetative electron microscopy” was apparently a valid term (it was in its training data, after all), that must be what they meant.
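For a sense of just how similar those two words are, here’s a quick edit-distance check. (Heads up: the exact Farsi spellings below are my assumption from other coverage of the story, not something quoted in the article.)

```python
# Plain dynamic-programming Levenshtein (edit) distance.
def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]

# Assumed Farsi forms (my assumption, not from the article):
scanning = "روبشی"    # "scanning"
vegetative = "رویشی"  # "vegetative"

print(levenshtein(scanning, vegetative))  # -> 1: a single character apart
```

One character, reportedly coming down to the dots on a single letter in print, is all it takes for a bad OCR pass or a slipped keystroke to swap one word for the other.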
It’s not that the entire papers were being invented from nothing by ChatGPT.
Yes, it is. The papers are the product of an LLM. Even if the user only thought it was translating, the translation hasn’t been reviewed and has errors. The causal link between what goes into an LLM and what comes out is not certain, so if nobody is checking the output, it could just be a technical-sounding lorem ipsum generator.
That’s an accurate name for the new toy, but not as fancy as “AI”, I guess. Because we know that anything that comes out is gibberish made up to look like something intelligent.
The scientific community needs to gather and reach a consensus that AI is banned from writing their papers. (Yes, even for translation.)
Don’t use fucking AI to write scientific papers and the problem is solved. Wtf.
The more salient takeaway is: don’t use an LLM to translate a scientific paper. Because it can’t translate a scientific paper. It can only rewrite the entire paper in a different language, and it will introduce misunderstandings and hallucinations.
GIGO overcomes all
Let’s delve into the issue
So, all those research papers were written by AI? Huh.
No, they were not. AI was probably used for translation.
Translating is the process of rewriting the paper in another language. The paper has been written (in English) by an LLM.
That’s not the same as just letting an LLM hallucinate a whole article from nothing, which is what it sounds like when you say it was written by AI. LLMs are not a bad tool for translations, though the output has to be checked carefully. Working with language is the one thing they can actually do, unlike giving real answers.