Chat GPT appears to hallucinate or outright lie about everything

Buttflapper@lemmy.world · 9 months ago

Chat GPT appears to hallucinate or outright lie about everything

cheddar@programming.dev · 9 months ago

It’s incorrect to ask chatgpt such questions in the first place. I thought we’ve figured that out 18 or so months ago.

ABCDE@lemmy.world · 9 months ago

Why? It actually answered the question properly, just not to the OP’s satisfaction.

ramirezmike@programming.dev · 9 months ago

because it could have just as easily confidentiality said something incorrect. You only know it’s correct by going through the process of verifying it yourself, which is why it doesn’t make sense to ask it anything like this in the first place.

ABCDE@lemmy.world · 9 months ago

I mean… I guess? But the question was answered correctly, I was playing Beat Saber on my 1060 with my Vive and Quest 2.

ramirezmike@programming.dev · 9 months ago

It doesn’t matter that it was correct. There isn’t anything that verifies what it’s saying, which is why it’s not recommended to ask it questions like that. You’re taking a risk if you’re counting on the information it gives you.

aberrate_junior_beatnik@lemmy.world · 9 months ago

ChatGPT does not “hallucinate” or “lie”. It does not perceive, so it can’t hallucinate. It has no intent, so it can’t lie. It generates text without any regard to whether said text is true or false.

GetOffMyLan@programming.dev · 9 months ago

Hallucinating is the term for when ai generate incorrect information.

aberrate_junior_beatnik@lemmy.world · 9 months ago

I know, but it’s a ridiculous term. It’s so bad it must have been invented or chosen to mislead and make people think it has a mind, which seems to have been successful, as evidenced by the OP

GetOffMyLan@programming.dev · edit-2 9 months ago

At no point does OP imply it can actually think and as far as I can see they only use the term once and use it correctly.

If you are talking about the use of “lie” that’s just a simplification of explaining it creates false information.

From the context there is nothing that implies OP thinks it has a real mind.

You’re essentially arguing semantics even though it’s perfectly clear what they mean.

aberrate_junior_beatnik@lemmy.world · 9 months ago

OP clearly expects LLMs to exhibit mind-like behaviors. Lying absolutely implies agency, but even if you don’t agree, OP is confused that

It did not simply analyze the best type of graphics card for the situation

The whole point of the post is that OP is upset that LLMs are generating falsehoods and parroting input back into its output. No one with a basic understanding of LLMs would be surprised by this. If someone said their phone’s autocorrect was “lying”, you’d be correct in assuming they didn’t understand the basics of what autocorrect is, and would be completely justified in pointing out that that’s nonsense.

helenslunch@feddit.nl · 9 months ago

Well, you’re wrong. Its right a lot of the time.

You have a fundamental misunderstanding of how LLMs are supposed to work. They’re mostly just text generation machines.

In the case of more useful ones like Bing or Perplexity, they’re more like advanced search engines. You can get really fast answers instead of personally trawling the links it provides and trying to find the necessary information. Of course, if it’s something important, you need to verify the answers they provide, which is why they provide links to the sources they used.

conciselyverbose@sh.itjust.works · 9 months ago

Except they also aren’t reliable at parsing and summarizing links, so it’s irresponsible to use their summary of a link without actually going to the link and seeing for yourself.

It’s a search engine with confabulation and extra steps.

helenslunch@feddit.nl · 9 months ago

Except they also aren’t reliable at parsing and summarizing links

Probably 90%+ of the time they are.

so it’s irresponsible to use their summary

You missed this part:

if it’s something important

hamms@lemmy.world · 9 months ago

I think this article does a good job of exploring and explaining how LLM attempts at text summarization could be more accurately described as “text shortening”; a subtle but critical distinction.

conciselyverbose@sh.itjust.works · 9 months ago

90% reliability is not anywhere remotely in the neighborhood of acceptable, let alone good.

No, I didn’t miss anything. All misinformation makes you dumber. Filling your head with bullshit that may or may not have any basis in reality is always bad, no matter how low the stakes.

helenslunch@feddit.nl · 9 months ago

Agree to disagree, I suppose.

conciselyverbose@sh.itjust.works · edit-2 9 months ago

You can’t just handwave away your deliberate participation in making humanity dumber by shoveling known bullshit as a valid source of truth.

helenslunch@feddit.nl · 9 months ago

I guess it’s a good thing I’m not doing that, then.

conciselyverbose@sh.itjust.works · 9 months ago

Wasting a ridiculous amount of energy for the sole purpose of making yourself dumber is literally all you’re doing every single time you use an LLM as a search engine.

ABCDE@lemmy.world · 9 months ago

Perplexity has been great for my ADHD brain and researching for my master’s.

mozz@mbin.grits.dev · 9 months ago

May I offer you a fairly convincing explanation

jeeva@lemmy.world · 9 months ago

I enjoyed reading this, thank you.

subignition@piefed.social · 9 months ago

This is the best article I’ve seen yet on the topic. It does mention the “how” in brief, but this analogy really explains the “why” Gonna bookmark this in case I ever need to try to save another friend or family member from drinking the Flavor-Aid

leftzero@lemmynsfw.com · 9 months ago

So, they’ve basically accidentally (or intentionally) made Eliza with extra steps (and many orders of magnitude more energy consumption).

mozz@mbin.grits.dev · 9 months ago

I mean, it’s clearly doing something which is impressive and useful. It’s just that the thing that it’s doing is not intelligence, and dressing it up convincingly imitate intelligence may not have been good for anyone involved in the whole operation.

leftzero@lemmynsfw.com · 9 months ago

Impressive how…? It’s just statistics-based very slightly fancier autocomplete…

And useful…? It’s utterly useless for anything that requires the text it generates to be reliable and trustworthy… the most it can be somewhat reliably used for is as a somewhat more accurate autocomplete (yet with a higher chance for its mistakes to go unnoticed) and possibly, if trained on a custom dataset, as a non-quest-essential dialogue generator for NPCs in games… in any other use case it’ll inevitably cause more harm than good… and in those two cases the added costs aren’t remotely worth the slight benefits.

It’s just a fancy extremely expensive toy with no real practical uses worth its cost.

The only people it’s useful to are snake oil salesmen and similar scammers (and even then only in the short run, until model collapse makes it even more useless).

All it will have achieved in the end is an increase in enshittification, global warming, and distrust in any future real AI research.

ABCDE@lemmy.world · 9 months ago

Yes and no. 1060 is fine for basic VR stuff. I used my Vive and Quest 2 on one.

breadsmasher@lemmy.world · 9 months ago

I have some vague memory of lyrics, which I am trying to find the song title theyre from. I am pretty certain of the band. Google was of no use.

I asked ChatGPT. It gave me a song title. Wasn’t correct. It apologised and gave me a different one - again, incorrect. I asked it to provide the lyrics to the song it had suggested. It gave me the correct lyrics for the song it had suggested, but inserted the lyrics I had provided, randomly into the song.

I said it was wrong - it apologised, and tried again. Rinse repeat.

I feel part of the issue is LLMs feel they have to provide an answer, and can’t say it doesn’t know the answer. Which highlights a huge limitation of these systems - they can’t know if something is right or wrong. Where these systems suggest can index and parse vast amounts of data and suggest you can ask it questions about that data, fundamentally (imo) it needs to be able to say “I dont have the data to provide that answer”

JackGreenEarth@lemm.ee · 9 months ago

It all depends on the training data and preprompt. With the right combination of those, it will admit when it doesn’t know an answer most of the time.

ThePowerOfGeek@lemmy.world · 9 months ago

I’ve had a similar experience. Except in my case I used lyrics for a really obscure song where I knew the writer. I asked Chat GPT, and it gave me completely the wrong artist. When I corrected it, it apologized profusely and agreed with exactly what I had said. Of course, it didn’t remember that correct answer, because it can’t add to it update its data source.

SlopppyEngineer@lemmy.world · 9 months ago

they have to provide an answer

Indeed. That’s the G in chatGPT. It stands for generative. It looks at all the previous words and “predicts” the most likely next word. You could see this very clearly with chatGPT-2. It just generated good looking nonsense based on a few words.

Then you have the P in chatGPT, pre-trained. If it happens to have received training data on what you’re asking, that data is shown. It it’s not trained on that data, it just uses what is more likely to appear and generates something that looks good enough for the prompt. It appears to hallucinate, lie, make stuff up.

It’s just how the thing works. There is serious research to fix this and a recent paper claimed to have a solution so the LLM knows it doesn’t know.

subignition@piefed.social · edit-2 9 months ago

~~The “P” is for predictive, not pre-trained. Generative Predictive Text~~

Edit: Nope I was wrong.

explore_broaden@midwest.social · 9 months ago

That’s not right, it’s generative pre-trained transformer.

subignition@piefed.social · 9 months ago

Well today I learned, thanks for the correction.

hperrin@lemmy.world · 9 months ago

It’s trained on internet discussions and people on the internet rarely say, “I don’t know”.

bungleofjoy@programming.dev · 9 months ago

LLMs don’t “feel”, “know”, or “understand” anything. They spit out statistically most significant answer from it’s data-set, that is all they do.

NuXCOM_90Percent@lemmy.zip · 9 months ago

The issue is: What is right and what is wrong?

"mondegreen"s are so ubiquitous that there are multiple websites dedicated to it. Is it “wrong” to tell someone that the song where Jimi Hendrix talked about kissing a guy is Purple Haze? And even pointing out where in the song that happens has value.

In general, I would prefer it if all AI Search Engines provided references. Even a top two or three pages. But that gets messy when said reference is telling someone they misunderstood a movie plot or whatever. “The movie where Anthony Hopkins pays Brad Pitt for eternal life using his daughter is Meet Joe Black. Also you completely missed the point of that movie” is a surefired way to make customers incredibly angry because we live in bubbles where everything we do or say (or what influencers do or say and we pretend we agree with…) is reinforced, truth or not.

And while it deeply annoys me when I am trying to figure out how to do something in Gitlab CI or whatever and get complete nonsense based on a single feature proposal from five years ago? That… isn’t much better than asking for help in a message board where people are going to just ignore the prompt and say whatever they Believe.

In a lot of ways, the backlash against the LLMs reminds me a lot of when people get angry at self checkout lines. People have this memory of a time that never was where cashiers were amazingly quick baggers and NEVER had to ask for help to figure out if something was an Anaheim or Poblano pepper or have trouble scanning something or so forth. Same with this idea of when search (for anything non-trivial) was super duper easy and perfect and how everyone always got exactly the answer they wanted when they posted on a message board rather than complete nonsense (if they weren’t outright berated for not searching for a post from ten years ago that is irrelevant).

Dnb@lemmy.dbzer0.com · 9 months ago

While I’d generally agree thst they are wrong or make up incorrect info on this case it was correct.

It gave you the min specs for vr the first time and updated specs for no man’s sky the second time when you asked a more specific question.

It used your prompt of a 3070 and gave a similar perf amd card.

It doesn’t know the answer, it can’t run the game in vr to test. It relies on information sourced and isn’t magic.

Dasus@lemmy.world · 9 months ago

“Converted what I said into the truth”

Now I’m not against the point you’re making in any way, I think the bots are hardcore yes men.

Buut… I have a 1060 and I got it around when No Man’s Sky came out, and I did try it on my 4k LED TV. It did run, but it also stuttered quite a bit.

Now I’m currently thinking of updating my card, as I’ve updated the rest of the PC last year. A 3070 is basically what I’m considering, unless I can find a nice 4000 series with good VRAM.

My point here being that this isn’t the best example you could have given, as I’ve basically had that conversation several times in real life, exactly like that, as “it runs” is somewhat subjective.

LLM’s obviously have trouble with subjective things, as we humans do too.

But again, I agree with the point you’re trying to make. You can get these bots to say anything. It amused me that the blocks are much more easily circumvented just by telling them to ignore something or by talking hypothetically. Idk but at least very strong text based erotica was easy to get out of them last year, which I think should not have been the case, probably.

linearchaos@lemmy.world · 9 months ago

I don’t want to sound like an AI fanboy but it was right. It gave you minimum requirements for most VR games.

No man Sky’s minimum requirements are at 1060 and 8 gigs of system RAM.

If you tell it it’s wrong when it’s not, it will wake s*** up to satisfy your statement. Earlier versions of the AI argued with people and it became a rather sketchy situation.

Now if you tell it it’s wrong when it’s wrong, It has a pretty good chance of coming back with information as to why it was wrong and the correct answer.

VinS@sh.itjust.works · 9 months ago

Well I asked some questions yesterday about classes of DAoC game to help me choose a starter class. It totally failed there attributing skills to wrong class. When poking it with this error it said : you are right, class x don’t do Mezz, it’s the speciality of class Z.

But class Z don’t do Mezz either… I wanted to gain some time. Finally I had to do the job myself because I could not trust anything it said.

linearchaos@lemmy.world · 9 months ago

God I loved DAoC, Play the hell of it back in it’s Hey Day.

I can’t help but think it would have low confidence on it though, there’s going to be an extremely limited amount of training data that’s still out there. I’d be interested in seeing how well it fares on world of Warcraft or one of the newer final fantasies.

The problem is there’s as much confirmation bias positive is negative. We can probably sit here all day and I can tell you all the things that it picks up really well for me and you can tell me all the things that it picks up like crap for you and we can make guesses but there’s no way we’ll ever actually know.

VinS@sh.itjust.works · 9 months ago

I like it for brainstorming while debbuging, finding funny names, creating stories “where you are the hero” for the kids or things that don’t impact if it’s hallucinating . I don’t trust it for much more unfortunately. I’d like to know your uses cases where it works. It could open my mind on things I haven’t done yet.

DAoC is fun, playing on some freeshard (eden actually, started one week ago, good community)

Oka@sopuli.xyz · 9 months ago

If I narrow down the scope, or ask the same question a different way, there’s a good chance I reach the answer I’m looking for.

https://chatgpt.com/share/ca367284-2e67-40bd-bff5-2e1e629fd3c0

Brkdncr@lemmy.world · 9 months ago

TIL ChatGPT is taking notes off my ex.

filister@lemmy.world · edit-2 9 months ago

And you as an analytics engineer should know that already? I am using some LLMs on almost a daily basis, Gemini, OpenAI, Mistral, etc. and I know for sure that if you ask it a question about a niche topic, the chances for the LLM to hallucinate are much higher. But also to avoid hallucinating, you can use different prompt engineering techniques and ask a better question.

Another very good question to ask an LLM is what is heavier one kilogram of iron or one kilogram of feathers. A lot of LLMs are really struggling with this question and start hallucinating and invent their own weird logical process by generating completely credibly sounding but factually wrong answers.

I still think that LLMs aren’t the silver bullet for everything, but they really excel in certain tasks. And we are still in the honeymoon period of AIs, similar to self-driving cars, I think at some point most of the people will realise that even this new technology has its limitations and hopefully will learn how to use it more responsibly.

bane_killgrind@slrpnk.net · 9 months ago

They seem to give the average answer, not the correct answer. If you can bound your prompt to the range of the correct answer, great

If you can’t bind the prompt it’s worse than useless, it’s misleading.

ipkpjersi@lemmy.ml · 9 months ago

Don’t use them for facts, use them for assisting you with menial tasks like data entry.

maniclucky@lemmy.world · 9 months ago

Best use I’ve had for them (data engineer here) is things that don’t have a specific answer. Need a cover letter? Perfect. Script for a presentation? Gets 95% of the work done. I never ask for information since it has no capability to retain a fact.

paraphrand@lemmy.world · 9 months ago

Those first set of specs it quoted are actually the original min specs that Oculus and Valve promoted for the Rift and Vive when they were new.

Ever since then there have not been new “official” min specs. But it’s true that higher spec if better and that newer headsets are higher res and could use higher spec stuff.

Also, a “well actually” on this would be that those are the revised min specs that were put out a few years after the initial specs. It use to be a GTX 970 was min spec. But they changed that to the 1060.

What is failing here is the model actually being smart. If it was smart it would have reasoned that time moves on and it would have considered better mins pecs for current hardware. But instead it just regurgitated the min specs that were once commonly quoted by Oculus/Meta and Valve.

🇰 🌀 🇱 🇦 🇳 🇦 🇰 🇮 🏆@yiffit.net · edit-2 9 months ago

Imagine text gen AI as just a big hat filled with slips of paper and when you ask it for something, it’s just grabbing random shit out of the hat and arranging it so it looks like a normal sentence.

Even if you filled it with only good information, it will still cross those things together to form an entirely new and novel response, which would invariably be wrong as it mixes info about multiple subjects together even if all the information individually was technically accurate.

They are not intelligent. They aren’t even better than similar systems that existed before LLMs!