I'm not aware that the Turing test has been tried in its original formulation, where both a human and a machine try to convince another human that they are the human and the other is the machine. Machines may still be able to pass that, but probably not in their commercial embodiments, as the simplest trick for the human would be to start talking about topics the commercial models aren't allowed to mention. This is of course not a limitation of the technology itself. Still, the original version of the test is harder than the versions I've seen, where a human has a one-on-one conversation with an unknown entity and rates it as human or machine afterwards.
There has been quite a bit of discussion about the validity of the original Turing test, and whether it can truly determine if the one you are communicating with is a human or a machine.
The Turing test was of course created by Alan Turing, and can be simplified as:
Three players:
- A: A man (or computer trying to act as a human)
- B: A woman (basically a 'human' control)
- C: An interrogator (of any gender) that was isolated in a separate room
The interrogator C would communicate with A and B through text only, to avoid audio, visual, or other sensory clues.
C would ask questions for five minutes to determine which of A and B was the human and which was the machine.
Then the critical part that Turing stated:
"
I believe that in about fifty years’ time it will be possible to programme computers, with a storage capacity of about 10^9, to make them play the imitation game so well that an average interrogator will not have more than 70 percent chance of making the right identification after five minutes of questioning."
This gives the machine an, IMHO unfairly artificial, advantage: to pass the test it only has to fool the interrogator in 30% or more of the cases.
In 2014 there was the first, very controversial, 'pass' of this test with a 33% score, where the machine (the chatbot Eugene Goostman) was specifically designed to simulate a 13-year-old Ukrainian boy to mask grammatical errors and gaps such as lacking general Western knowledge.
Then there is a study from May 2024 that is often cited as the first robust modern pass, with a score of 54%, putting it above the 50% chance baseline.
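The difference between Turing's 1950 criterion and the stricter chance baseline can be made concrete with a small sketch. The function names and thresholds below are my own framing of the numbers mentioned here, not from either study:

```python
# Turing's 1950 criterion vs. the stricter chance baseline.
# "fooled_rate" = fraction of interrogators who judged the machine to be human.

def passes_turing_1950(fooled_rate: float) -> bool:
    # Turing: the average interrogator has no more than a 70% chance of a
    # correct identification, i.e. the machine fools at least 30% of judges.
    return fooled_rate >= 0.30

def beats_chance(fooled_rate: float) -> bool:
    # Stricter reading: the machine must be judged human more often
    # than a coin flip would predict.
    return fooled_rate > 0.50

print(passes_turing_1950(0.33), beats_chance(0.33))  # 2014 score: True False
print(passes_turing_1950(0.54), beats_chance(0.54))  # 2024 score: True True
```

So the 2014 result only clears Turing's original, lenient bar, while the 2024 result clears both.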
After that the debate heated up again, with many stating that the original Turing test lacks too many important aspects to be a valid test of the original idea.
So now there are more modern tests, like the Lovelace test in its several iterations, which focus more on the ability to create original and complex works like stories or music, proving that the machine can create something novel beyond its original programming.
Anyways, to me the identification of speech production with intelligence is bizarre. I think it's just a manifestation of the typical mind fallacy and anthropomorphization, not that different from attributing intentions to a storm.
As I see it, a prerequisite for intelligence, shared by life at all points of the intelligence spectrum, is a set of values/goals to pursue and maximize, some very common ones being self-preservation and reproduction. In that sense, it could be argued that a simple thermostat is closer to intelligence than an LLM, as it acts upon the world to pursue and preserve a certain, "desirable" state. The attempts to make LLMs closer to this are lackluster, post-hoc, bolted-on approaches that essentially fail. I don't think language is a good medium for core intelligence; it's a bad foundation, and merely enables already intelligent beings to reach further levels of intelligence. With all their limitations, and acknowledging that they're far from complex intelligence, I think reinforcement learning systems are often much closer to actual, animal intelligence, and a much more solid foundation.
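The thermostat comparison can be made concrete as a minimal feedback loop: sense the world, compare it to a goal state, act to restore it. The class name, thresholds, and values below are purely illustrative, not any real device's logic:

```python
# A thermostat "pursues" a goal state by sensing the world and acting
# to keep it near a setpoint. All names and values are illustrative.

class Thermostat:
    def __init__(self, setpoint: float, hysteresis: float = 0.5):
        self.setpoint = setpoint      # the "desirable" state it preserves
        self.hysteresis = hysteresis  # dead band to avoid rapid toggling
        self.heating = False

    def step(self, measured_temp: float) -> bool:
        # Sense -> compare to goal -> act: the whole value-pursuit loop.
        if measured_temp < self.setpoint - self.hysteresis:
            self.heating = True
        elif measured_temp > self.setpoint + self.hysteresis:
            self.heating = False
        return self.heating

t = Thermostat(setpoint=20.0)
print(t.step(18.0))  # well below setpoint -> True (heating on)
print(t.step(20.2))  # inside the dead band -> stays True
print(t.step(21.0))  # above the dead band -> False (heating off)
```

Trivial as it is, this loop has the closed sense-act structure the argument points at, whereas a bare next-token predictor has no such loop unless one is bolted on around it.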
Hopefully, once the bubble pops and the hype moves on to another thing, LLMs will be studied more rigorously by actual scientists without the goal of attracting even more investment. They are a very interesting technology that essentially solved natural language processing, and they deserve to be studied seriously.
Yes, that's a very valid view of this topic as well.
And you can already see the first cracks in the corporate world, where applied science is pretty much unable to deliver the much-hyped fully functional A.I. agents that keep performing reliably over time.
That, however, does not stop the corporate world from doing the basic business risk calculation: an A.I. performing 80% of the functionality of your average human is more than enough to replace the humans for better profit maximization. A good example where this happened pretty rapidly is customer support, where those sh*tty chat bots replaced humans as the first line of contact.
Then again, I use A.I. pretty much daily as a kind of 'junior assistant' to perform boilerplate tasks for me, where it's fairly easy for me to validate the results. And just like with Google search, it surprises me how often people struggle to get good enough results from LLMs: they give them way too ambiguous or complex tasks, or, my personal favorite, they start to argue with the A.I. about the results not being to their liking, then slide into explaining why, getting more and more angry in the process at the forever-polite responses from the LLMs.
