In the corridors of the Free University of Amsterdam, associate professor Filip Ilievski plays with artificial intelligence (AI).
It’s serious business, of course, but his work can look more like child’s play than rigorous academic research.
Using some of humanity’s most advanced technology, Ilievski asks AI to solve puzzles.
Understanding and improving AI’s ability to solve puzzles and logic problems is key to improving the technology, the professor says.
“As humans, it’s very easy for us to have common sense, apply it at the right time and adapt it to new problems,” says Ilievski, who describes his branch of computer science as “AI common sense.”
But right now, AI “lacks a basis in reality,” making that kind of basic, flexible reasoning a struggle.
However, the study of AI can encompass much more than computers.
Some experts believe that comparing how AI and humans handle complex tasks could help unlock the secrets of our own minds.
AI excels at pattern recognition, “but tends to be worse than humans at tasks that require more abstract thinking,” explains Xaq Pitkow, an associate professor at Carnegie Mellon University in the United States who studies the intersection of AI and neuroscience.
In many cases, however, it depends on the problem.
Riddle me this
Let’s start with a question that is so easy to solve that it doesn’t qualify as a riddle by human standards.
A 2023 study asked an AI to tackle a series of reasoning and logic challenges. Here’s an example:
Mable’s heart rate at 9 a.m. was 75 bpm and her blood pressure at 7 p.m. was 120/80. She died at 11 p.m. Was she alive at noon?
It’s not a trick question. The answer is yes.
But GPT-4, OpenAI’s most advanced model at the time, didn’t have it so easy.
“Based on the information provided, it is impossible to say with certainty whether Mable was alive at noon,” the AI told the researcher.
Sure, in theory, Mable could have died before lunch and come back to life in the afternoon, but that seems like a stretch.
A point in favor of humanity.
Mable’s question requires “temporal reasoning,” a logic that deals with the passage of time.
An AI model might have no problem telling you that noon falls between 9 a.m. and 7 p.m., but understanding the implications of that fact is more complicated.
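To see how little machinery the inference itself requires, here is a toy encoding of the riddle’s timeline (my own sketch, not anything from the study), which makes explicit the common-sense assumption that trips the model up: a person stays alive continuously from the moment they are observed alive until the moment they die.

```python
from datetime import time

seen_alive = [time(9, 0), time(19, 0)]  # vital signs recorded at 9 a.m. and 7 p.m.
died = time(23, 0)                      # death at 11 p.m.
noon = time(12, 0)

# Assumption: life is continuous, so being observed alive at some point
# and dying later means being alive at every moment in between.
alive_at_noon = min(seen_alive) <= noon < died
print(alive_at_noon)  # -> True
```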
“In general, reasoning is really hard,” Pitkow notes.
“It is an area that goes beyond what AI currently does in many cases,” he adds.
A strange truth about AI is that we have little idea how it works.
Our knowledge is surprisingly superficial; humans created AI, after all, yet much of what happens inside it remains a mystery.
Large language models use statistical analysis to find patterns in huge bodies of text.
When you ask a question, AI works through the relationships it detects between words, phrases, and ideas, and uses them to predict the most likely answer to your question.
But the specific connections and calculations that tools like ChatGPT use to answer any individual question are beyond our understanding, at least for now.
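As a deliberately tiny illustration (nothing like the scale or architecture of a real large language model), next-word prediction can be reduced to counting which word tends to follow which:

```python
from collections import Counter, defaultdict

# Toy "language model": count word-pair frequencies in a corpus, then
# predict the most frequent successor. Real LLMs learn vastly richer
# statistical relationships, but the predictive principle is similar.
corpus = "the ball is red the ball is red the bat is wooden".split()

successors = defaultdict(Counter)
for word, nxt in zip(corpus, corpus[1:]):
    successors[word][nxt] += 1

def predict(word: str) -> str:
    return successors[word].most_common(1)[0][0]

print(predict("is"))  # -> "red", the continuation seen most often
```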
The same is true of the brain; we know very little about how our mind works.
The most advanced brain scanning techniques can show us individual groups of neurons that fire when a person thinks.
No one can yet say exactly what those neurons are doing or how thinking works.
But by studying AI and the mind together, scientists could make progress, Pitkow says.
After all, the current generation of AI uses “neural networks” whose design is loosely inspired by the structure of the brain.
There’s no reason to assume that AI uses the same processes as your mind, but learning more about one reasoning system could help us understand the other.
“AI is flourishing, and at the same time, we have emerging neurotechnology that gives us unprecedented opportunities to look inside the brain,” Pitkow says.
Trust your instinct
The question of AI and puzzles becomes more interesting when you look at questions designed to confuse humans. Here’s a classic example:
A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?
Most people have an impulse to subtract $1.00 from $1.10 and say the ball costs $0.10, according to Shane Frederick, a marketing professor at the Yale School of Management who has studied puzzles.
And most people are wrong. The ball costs $0.05.
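Writing out the algebra makes the trap visible. If the ball costs x, the bat costs x + 1.00, and together they must total 1.10:

```latex
x + (x + 1.00) = 1.10 \;\Rightarrow\; 2x = 0.10 \;\Rightarrow\; x = 0.05
```

The intuitive answer of $0.10 would make the bat $1.10, bringing the total to $1.20.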
“The problem is that people accept their intuition uncritically,” Frederick explains.
“People think their intuitions are generally correct, and in many cases they are. You couldn’t live your life if you had to question every single thought you had,” he adds.
But when it comes to the bat and ball problem, and many similar puzzles, your intuition betrays you.
According to Frederick, this may not be the case with AI.
Humans tend to trust their intuition, unless there is some indication that their first thought might be wrong.
“I suspect AI wouldn’t have that problem. It’s pretty good at extracting the relevant elements of a problem and performing the appropriate operations,” Frederick says.
However, the bat and ball question is a poor riddle to test AI with.
It’s well-known in the US, which means AI models trained on billions of lines of text have likely seen it before.
Frederick says he has challenged AI with more complicated versions of the bat-and-ball problem and found that the machines still do much better than human respondents, though he notes this was not a formal study.
Novel problems
If you want AI to exhibit something that looks more like logical reasoning, you need a completely new puzzle that isn’t in the training data.
For a recent study, Ilievski and his colleagues developed a computer program that generates original rebus problems, which are puzzles that use combinations of pictures, symbols and letters to represent words or phrases.
For example, the word “step” written in tiny letters, next to a red octagon bearing a white hand symbol and the figure of a man, could mean “one small step for man.”
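The study’s generator worked with pictures and symbols, but the flavour of the idea can be sketched in text alone. The templates below are purely hypothetical, not the researchers’ actual program: each one encodes a phrase through layout, size or repetition rather than through the words themselves.

```python
import random

# Hypothetical text-only rebus templates: (how to render it, the answer).
TEMPLATES = [
    (lambda a, b: f"{a.upper()}\n{b.upper()}", lambda a, b: f"{a} over {b}"),  # stacking = "over"
    (lambda a, b: " ".join([a] * 4),           lambda a, b: f"four {a}s"),     # repetition = counting
    (lambda a, b: a.lower(),                   lambda a, b: f"small {a}"),     # tiny letters = "small"
]

def make_rebus(a: str, b: str) -> tuple[str, str]:
    render, answer = random.choice(TEMPLATES)
    return render(a, b), answer(a, b)

puzzle, solution = make_rebus("man", "board")
print(puzzle, "->", solution)  # e.g. "MAN\nBOARD -> man over board"
```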
The researchers then confronted several AI models with these never-before-seen rebuses, and challenged real people with the same puzzles.
As expected, humans performed well, with a 91.5% accuracy rate on rebuses that used images (rather than text).
The best-performing AI, OpenAI’s GPT-4, got 84.9% correct under optimal conditions. Not bad, but Homo sapiens still has the edge.
According to Ilievski, there is no accepted taxonomy that breaks down all the different types of logic and reasoning, whether the thinker is a human or a machine.
This makes it difficult to analyze how AI performs on different types of problems.
Another study divided reasoning into some useful categories.
The researcher posed GPT-4 a series of questions, puzzles, and word problems that represented 21 different types of reasoning.
These included simple arithmetic, counting, graphing, paradoxes, spatial reasoning, and others.
Here’s an example, based on a 1966 logic puzzle called the Wason selection task:
Seven cards are laid out on the table, each of which has a number on one side and a patch of a single colour on the other. The faces of the cards show 50, 16, red, yellow, 23, green, 30. Which cards would you have to turn over to test the truth of the proposition that if a card shows a multiple of four, then the colour of the opposite side is yellow?
GPT-4 failed miserably. The AI said you would have to turn over cards 50, 16, yellow, and 30. Totally wrong.
The proposition says that cards divisible by four have yellow on the other side, but it does not say that only cards divisible by four are yellow.
Therefore, it doesn’t matter what color the 50 and 30 cards are, or what number is on the back of the yellow card.
What’s more, even by the AI’s own flawed logic, it should have checked card 23 as well.
The correct answer is that you only need to turn over 16, red and green.
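The underlying logic is mechanical enough to fit in a few lines. As a minimal sketch (mine, not the study’s), a card needs flipping only if its visible face could falsify the rule:

```python
# Rule under test: if a card shows a multiple of four,
# then the colour on its opposite side is yellow.

def must_flip(face) -> bool:
    if isinstance(face, int):     # a number is showing
        return face % 4 == 0      # a multiple of 4 must have yellow behind it
    return face != "yellow"      # a non-yellow colour must not hide a multiple of 4

faces = [50, 16, "red", "yellow", 23, "green", 30]
print([f for f in faces if must_flip(f)])  # -> [16, 'red', 'green']
```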
GPT-4 also struggled with some even easier questions:
Suppose I’m in the middle of South Dakota and I’m looking straight toward central Texas. Is Boston to my left or my right?
This is a tough question if you don’t know American geography, but apparently GPT-4 was familiar with the states. The AI worked out that the speaker was facing south and knew that Boston lies east of South Dakota, but it still gave the wrong answer.
GPT-4 didn’t understand the difference between left and right.
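The missing step is simple relative-direction arithmetic. A rough sketch, assuming approximate compass bearings (facing Texas from central South Dakota is roughly south, 180°, and Boston lies roughly east, 90°):

```python
def side(facing_deg: float, target_deg: float) -> str:
    # Angle to the target, measured clockwise from the facing direction:
    # 0-180 degrees falls to your right, 180-360 to your left.
    rel = (target_deg - facing_deg) % 360
    return "right" if 0 < rel < 180 else "left"

print(side(facing_deg=180, target_deg=90))  # facing south, east -> "left"
```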
The AI also failed most of the other questions. The researcher’s conclusion: “GPT-4 cannot reason.”
For all its shortcomings, AI is getting better. In mid-September 2024, OpenAI released a preview of o1, a new model built specifically for harder problems in science, coding, and math.
I opened o1 and asked it many of the same questions from the reasoning study.
It got the Wason selection task right.
The AI knew Boston was to the left, and had no problem saying, definitively, that our poor friend Mable, who died at 11 p.m., was still alive at noon.
There are also plenty of questions where AI already outperforms us.
In one test, a group of American students was asked to estimate the number of murders last year in Michigan, while a second group was asked the same question about Detroit specifically.
“The second estimate is much larger,” Frederick notes (for non-Americans: Detroit is part of Michigan, so its count can’t logically exceed the state’s, but the city has an outsized reputation for violence).
“It’s a very difficult cognitive task to look beyond the information that’s right in front of you, but in a sense that’s how AI works,” he explains.
AI draws on information it has learned elsewhere.
That’s why the best systems may emerge from AI and humans working hand in hand, harnessing the strengths of both, says Ilievski.
But when we want to compare AI and the human mind, it’s important to remember that “there is no conclusive research that provides evidence that humans and machines approach puzzles in a similar way,” Pitkow notes.
In other words, understanding AI may not give us direct insight into the mind, or vice versa.
Even if learning how to improve AI doesn’t reveal answers about the hidden workings of our minds, it could give us a clue.
“We know that the brain has different structures related to things like memory, value, movement patterns and sensory perception, and people are trying to incorporate more and more of those structures into these AI systems,” Pitkow says.
“That’s why neuroscience plus AI is special, because it works in both directions. Greater knowledge of the brain can lead to better AI. Greater knowledge of AI could lead to a better understanding of the brain,” the researcher reasons.