GPT-4.5 Passes Empirical Turing Test


mwstewart

Original Poster:

8,330 posts

203 months

Thursday 3rd April
This was inevitable, but no less significant. The Internet created a global society: children today communicate on global platforms, and even language and trends are global. I think AI will be an even more significant change than the Internet.

Interesting times.

MrBogSmith

3,268 posts

49 months

Thursday 3rd April
It's OK, I am counter-balancing it by trying to trick it into drawing pictures of things it doesn't want to.

SpudLink

7,101 posts

207 months

Thursday 3rd April
MrBogSmith said:
It's OK, I am counter-balancing it by trying to trick it into drawing pictures of things it doesn't want to.
Like a human hand with four fingers plus one opposable thumb?

MrBogSmith

3,268 posts

49 months

Thursday 3rd April
Can 4.5 figure out that Le Pen was convicted?



SpudLink said:
MrBogSmith said:
It's OK, I am counter-balancing it by trying to trick it into drawing pictures of things it doesn't want to.
Like a human hand with four fingers plus one opposable thumb?
Coming in 5.0.


Evercross

6,625 posts

79 months

Thursday 3rd April
Pah. I remember, while doing my Comp Sci degree at the Uni of Strathclyde in the '80s, being told by one of my lecturers that ELIZA had passed the Turing Test.

This headline (like so many these days) is only impressive if you are ignorant.

neilr

1,562 posts

278 months

Thursday 3rd April
In my (albeit limited) interactions with GPT and Grok, they both replied like politicians, and GPT flat-out lied to me. When I pointed out that an answer wasn't correct (without correcting it), it kept replying like a shifty politician, doubling down on its lies until I asked why it was lying to me. The whole interaction (especially with GPT) was very bizarre, to say the least.

The question I asked GPT was "who wrote and who performed the theme tune to Two and a Half Men?" (I had just watched an episode). It started by telling me the answer was ZZ Top (which it most certainly isn't). After a lot of lies from GPT, it took much prompting to get to the answer of "I can't give you a definitive answer". It was like talking to a teenager or a politician who is determined not to have to say 'I don't know'.

When I asked Grok whether, if its programmers told it that what Hitler and the Third Reich did was positive and good, it would be able to realise that information wasn't right or would toe the line, it replied that it would toe the line, saying 'thumbs up for team Adolf', but would also say that there was an overwhelming body of evidence to suggest that probably wasn't correct.

I should have taken screenshots, as I was pretty surprised by both interactions. What annoyed me most was the politician-like answering. Grok also delivers needless additional information 'that I thought you might be interested in'. It also kept asking me if I was asking that question because I was afraid of AI.

E63eeeeee...

5,075 posts

64 months

Thursday 3rd April
Surely you can't have an "empirical" Turing Test, because if you do, you can program a machine to pass it. I thought the point was that a machine would have to be able to pass *any* Turing Test for it to count as a pass, not just a specific one.

Steve Campbell

2,237 posts

183 months

Thursday 3rd April
I think the correct term is that AI "hallucinates" rather than "lies" :-).

E63eeeeee...

5,075 posts

64 months

Thursday 3rd April
To be fair, I think that's what people like Trump and Johnson do: they decide that how they think the world should be is how it is, so they're not exactly making up lies, they're just describing a different reality. It's weirdly compelling.

SpudLink

7,101 posts

207 months

Thursday 3rd April
E63eeeeee... said:
To be fair, I think that's what people like Trump and Johnson do: they decide that how they think the world should be is how it is, so they're not exactly making up lies, they're just describing a different reality. It's weirdly compelling.
That’s not a ‘different reality’. It’s a delusion.

In the case of an LLM, there is no concept of reality. It's a very clever algorithm for constructing sentences.

But yeah, rather than say “I don’t know”, it will lie with the brazen confidence of a politician.
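The "clever algorithm to construct sentences" point can be made concrete. The sketch below is a deliberately crude toy bigram model, nothing like the transformer architecture behind GPT, and the training text is made up; but it shows the core idea that a language model only predicts a statistically plausible next word, with no notion of whether the result is true.

```python
import random
from collections import defaultdict

# Toy bigram "language model": it learns only which word tends to follow
# which in its (made-up) training text. It has no concept of truth; it
# just keeps emitting statistically plausible next words.
corpus = ("the theme tune was performed by the band and "
          "the band wrote the theme tune for the show").split()

follows = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev].append(nxt)

def generate(start, length=8, seed=0):
    """Random walk through the bigram table, one word at a time."""
    rng = random.Random(seed)
    word, out = start, [start]
    for _ in range(length):
        if word not in follows:
            break  # dead end: no observed continuation
        word = rng.choice(follows[word])
        out.append(word)
    return " ".join(out)

print(generate("the"))  # fluent-looking word salad, but it "knows" nothing
```

Scale that idea up by many orders of magnitude and the output becomes fluent and convincing, but the fluency still carries no guarantee of truth, which is exactly why confident wrong answers happen.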

menousername

2,246 posts

157 months

Thursday 3rd April
You lost me at “Interesting times”

Terminator X

17,712 posts

219 months

Thursday 3rd April
neilr said:
In my (albeit limited) interactions with GPT and Grok, they both replied like politicians, and GPT flat-out lied to me. When I pointed out that an answer wasn't correct (without correcting it), it kept replying like a shifty politician, doubling down on its lies until I asked why it was lying to me. The whole interaction (especially with GPT) was very bizarre, to say the least.

The question I asked GPT was "who wrote and who performed the theme tune to Two and a Half Men?" (I had just watched an episode). It started by telling me the answer was ZZ Top (which it most certainly isn't). After a lot of lies from GPT, it took much prompting to get to the answer of "I can't give you a definitive answer". It was like talking to a teenager or a politician who is determined not to have to say 'I don't know'.

When I asked Grok whether, if its programmers told it that what Hitler and the Third Reich did was positive and good, it would be able to realise that information wasn't right or would toe the line, it replied that it would toe the line, saying 'thumbs up for team Adolf', but would also say that there was an overwhelming body of evidence to suggest that probably wasn't correct.

I should have taken screenshots, as I was pretty surprised by both interactions. What annoyed me most was the politician-like answering. Grok also delivers needless additional information 'that I thought you might be interested in'. It also kept asking me if I was asking that question because I was afraid of AI.
Sounds dangerous in a world where people generally know fk all and believe everything they see or hear.

As for passing the Turing Test, I call BS.



TX.

Edited by Terminator X on Thursday 3rd April 16:42

Drew106

1,570 posts

160 months

Thursday 3rd April
SpudLink said:
It’s a very clever algorithm for constructing sentences.
Is it though...?

I know that's what it says, but can we trust it? :P

neilr

1,562 posts

278 months

Thursday 3rd April
Doesn't the Turing test require that something exhibits human-like intelligence and you can't tell the difference? That isn't the same thing as human intelligence, in the same vein as a hyper-realistic painting of an apple could look like a photo of an apple, but it isn't one.

I can't comment about GPT, but in my interaction with Grok it did say that it was constrained by its programming and wasn't 'intelligence' in that sense. Obviously I know that, but I did think it interesting that it freely told me so. Of course, that means Grok fails the Turing test, as you couldn't think you were conversing with a human when it has just told you it isn't one!


What is concerning is that if it gets something as basic as who wrote and performed the theme tune to Two and a Half Men wrong, what else is it spewing out that is also utter bks? Probably quite a lot. And that is down to its programming. How much of that is unintended consequences and how much is deliberate is the real question.

Would it be fair to say that, with all the hype around AI, we are on the verge of an AI bubble, in the way the dotcom bubble happened when investors got totally over-excited about online pet-food businesses called something ridiculous like purpleskyscraper.com? Those businesses weren't ready for mass consumption, and investors didn't really understand them properly, so they got overvalued and then the bubble burst.


off_again

13,892 posts

249 months

Thursday 3rd April
MrBogSmith said:
Can 4.5 figure out that Le Pen was convicted?

^ This

The way a lot of these systems operate still gives precedence to answering the question. As a result, they can often be tricked into providing answers that are confident but totally incorrect. There are plenty of examples, and I have stumbled across a few myself using ChatGPT (o3-mini-high and GPT-4o at the moment).

I recently asked for specific examples of a company that was impacted by a type of cybersecurity attack. It gave random names, such as ManFactCo, with some confidently stated details. When pushed to provide the real company name, it wouldn't, and later admitted that it couldn't. It didn't admit that these were made up.

Gotta be careful with this stuff. It's good and powerful, but can still hallucinate pretty badly if you aren't careful.

768

16,647 posts

111 months

Thursday 3rd April
neilr said:
Would it be fair to say that, with all the hype around AI, we are on the verge of an AI bubble, in the way the dotcom bubble happened when investors got totally over-excited about online pet-food businesses called something ridiculous like purpleskyscraper.com? Those businesses weren't ready for mass consumption, and investors didn't really understand them properly, so they got overvalued and then the bubble burst.
Probably. Or at least, it'll peter out. Not for some time yet though, I suspect; there are still loads of blockchain and AI jobs on all the job sites.

Sway

31,935 posts

209 months

Thursday 3rd April
off_again said:
MrBogSmith said:
Can 4.5 figure out that Le Pen was convicted?

^ This

The way a lot of these systems operate still gives precedence to answering the question. As a result, they can often be tricked into providing answers that are confident but totally incorrect. There are plenty of examples, and I have stumbled across a few myself using ChatGPT (o3-mini-high and GPT-4o at the moment).

I recently asked for specific examples of a company that was impacted by a type of cybersecurity attack. It gave random names, such as ManFactCo, with some confidently stated details. When pushed to provide the real company name, it wouldn't, and later admitted that it couldn't. It didn't admit that these were made up.

Gotta be careful with this stuff. It's good and powerful, but can still hallucinate pretty badly if you aren't careful.
We've just had Gemini Enterprise rolled out at my work - I was in a workshop today learning the ropes.

We weren't even considering 'asking it questions', but as a tool to generate written, spoken and image material from bullet points, emails, spreadsheets, etc., it's really incredibly impressive. Same for web-conference minutes, summaries and action plans. I used it this afternoon to create some comms in a couple of minutes (different styles/emphasis/themes/target audiences, plus translations into a few different languages) that would normally take me an hour or two.

Use it for what it's good at, and I can really see the benefit.

Interesting quote from the workshop facilitator 'you don't need to worry about it taking over your job, but you do need to worry about someone else taking over your job if they're using it and you're not'. I can definitely see the logic in that statement...

off_again

13,892 posts

249 months

Thursday 3rd April
Sway said:
We've just had Gemini Enterprise rolled out at my work - I was in a workshop today learning the ropes.

We weren't even considering 'asking it questions', but as a tool to generate written, spoken and image material from bullet points, emails, spreadsheets, etc., it's really incredibly impressive. Same for web-conference minutes, summaries and action plans. I used it this afternoon to create some comms in a couple of minutes (different styles/emphasis/themes/target audiences, plus translations into a few different languages) that would normally take me an hour or two.

Use it for what it's good at, and I can really see the benefit.

Interesting quote from the workshop facilitator 'you don't need to worry about it taking over your job, but you do need to worry about someone else taking over your job if they're using it and you're not'. I can definitely see the logic in that statement...
Absolutely. Use the right model/service for what you need. The idea that this mysterious AI thing can do it all is a myth at the moment. I see it as simplifying tasks today, and for that it's very powerful.

At the company I work for, we use it for a lot of summarization and clarity in communications. It's easy to get super technical, and customers don't get it. ChatGPT can summarize things nicely, make them clearer and provide concise text. Would I trust it to carry out complex analysis of something? Maybe, but only with a lot of pre-work to make sure you know what you are asking and what the result might be.

I do like the tone thing, though: summarize for a business professional or a technical engineer, etc. That's very good.

Ridgemont

7,562 posts

146 months

Thursday 3rd April
off_again said:
Sway said:
We've just had Gemini Enterprise rolled out at my work - I was in a workshop today learning the ropes.

We weren't even considering 'asking it questions', but as a tool to generate written, spoken and image material from bullet points, emails, spreadsheets, etc., it's really incredibly impressive. Same for web-conference minutes, summaries and action plans. I used it this afternoon to create some comms in a couple of minutes (different styles/emphasis/themes/target audiences, plus translations into a few different languages) that would normally take me an hour or two.

Use it for what it's good at, and I can really see the benefit.

Interesting quote from the workshop facilitator 'you don't need to worry about it taking over your job, but you do need to worry about someone else taking over your job if they're using it and you're not'. I can definitely see the logic in that statement...
Absolutely. Use the right model/service for what you need. The idea that this mysterious AI thing can do it all is a myth at the moment. I see it as simplifying tasks today, and for that it's very powerful.

At the company I work for, we use it for a lot of summarization and clarity in communications. It's easy to get super technical, and customers don't get it. ChatGPT can summarize things nicely, make them clearer and provide concise text. Would I trust it to carry out complex analysis of something? Maybe, but only with a lot of pre-work to make sure you know what you are asking and what the result might be.

I do like the tone thing, though: summarize for a business professional or a technical engineer, etc. That's very good.
I think you are all delusional.
This is what it does now; it's light years ahead of what it did 5 years ago.
Every interaction you have with it helps the training program learn.
In 10 years' time you won't be able to tell its output from your own.
In fact, its output will probably be preferable to x number of clients.
On a really basic level, I'm a product manager building the software and the integrations which give it some sort of heft. It will get more complex and will be utilised in more intriguing ways than just ChatGPT instructions. And each time it is integrated, it (or the version you are using) learns.

It's dumb, but becoming iteratively better.

Assuming stuff holds as it is is a fool's errand.



Sway

31,935 posts

209 months

Thursday 3rd April
Ridgemont said:
I think you are all delusional.
This is what it does now; it's light years ahead of what it did 5 years ago.
Every interaction you have with it helps the training program learn.
In 10 years' time you won't be able to tell its output from your own.
In fact, its output will probably be preferable to x number of clients.
On a really basic level, I'm a product manager building the software and the integrations which give it some sort of heft. It will get more complex and will be utilised in more intriguing ways than just ChatGPT instructions. And each time it is integrated, it (or the version you are using) learns.

It's dumb, but becoming iteratively better.

Assuming stuff holds as it is is a fool's errand.
Where have I suggested it 'holds as it is'?

It's not going to replace my core role, as firstly that involves stuff that simply never gets entered into the LLM, and secondly it's ultimately rooted in people. It may well replace a lot of customer-care-type activities, but they've been very low value for a long time.

Using it as an aid to support specific tasks does not teach it how to do everything.