As a programmer, I often use ChatGPT, Grok, and Claude in my work. These tools, created by OpenAI, xAI, and Anthropic respectively, have become indispensable in my workflow: they help me with deployments and environment configuration, or write a quick function for automation.
I like to check how well my brain is holding up against age with various brain trainers, including IQ tests. I wondered whether AI could develop in the same way, so I decided to torture the models a bit and see if they can already think as abstractly as humans. To investigate, I tested three leading AI models on two types of IQ questions: number sequence tasks and visual logic puzzles. The results were striking, exposing both the strengths and weaknesses of current artificial intelligence, and offering a glimpse of where it might be headed and whether AI can outsmart humans (or not, or not yet).
The first challenge was a numerical sequence test, sourced from iq-global-test.com, a platform known for its rigorous IQ assessments. The task was straightforward: identify the next number in the sequence 2, 4, 8, 16, 32, ?. This type of question tests pattern recognition, a cornerstone of human intelligence and a common feature in IQ tests.
I asked ChatGPT, Grok, and Claude this question to see how each of them would handle it. ChatGPT answered with characteristic speed and accuracy: “The next number is 64”. It found the answer in seconds, determining that each number doubles the previous one (2 × 2 = 4, 4 × 2 = 8, and so on). No fuss, no hesitation, just the right answer, delivered with the efficiency I’ve come to expect from the latest GPT models.
Grok also answered correctly, predicting 64. But it took a more expansive approach, breaking down the pattern step by step: “Each term is obtained by multiplying the previous term by 2: 2 × 2 = 4, 4 × 2 = 8, 8 × 2 = 16, 16 × 2 = 32. Thus, 32 × 2 = 64.” As a programmer, I appreciated the clarity of Grok’s reasoning; this is exactly the kind of explanation I would write in a code comment to ensure transparency.
Claude similarly identified 64 as the next number, offering a concise explanation: “The sequence doubles each time, so 32 × 2 = 64.” Its response was accurate and to the point, striking a balance between ChatGPT’s brevity and Grok’s thoroughness.
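In fact, the rule all three models identified is trivial to express in code, which is a good hint as to why they handled it so easily. Here is a minimal Python sketch of that check (the function name and structure are my own illustration, not anything the models produced): it verifies a constant ratio across the sequence, exactly as Grok spelled out term by term, and then extends it.

```python
def next_term(seq):
    """Predict the next term of a geometric sequence.

    Assumes the pattern is 'multiply by a constant ratio',
    which is the rule all three models identified.
    """
    ratio = seq[1] / seq[0]
    # Verify the ratio holds for every consecutive pair,
    # just as Grok's step-by-step explanation did.
    for prev, curr in zip(seq, seq[1:]):
        if curr != prev * ratio:
            raise ValueError("sequence is not geometric")
    return seq[-1] * ratio

print(next_term([2, 4, 8, 16, 32]))  # -> 64.0
```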
All three AIs aced the numerical sequence test, demonstrating their ability to handle abstract patterns with ease. This wasn’t surprising—numerical sequences align closely with the kind of rule-based logic AI excels at, especially when trained on vast datasets of mathematical problems. But would they perform as well on a more complex, visual task?
The second test was a visual logic puzzle, also from iq-global-test.com. The question presented a 3×3 grid of shapes in different colors and asked which option completed the pattern. Questions like these are notoriously difficult because they require recognizing relationships between shapes, positions, and colors, skills that draw on human intuition and spatial reasoning. I chose a question that wouldn’t seem very hard to a human, but with so many different graphic elements, I wondered how my “assistants” would cope with it.
I was optimistic, given how easily AI handles image recognition in other contexts, such as describing photos or generating artwork. But the results turned out to be disappointing.
ChatGPT, despite its confidence, completely misread the logic of the grid. It described the shapes and colors accurately but failed to identify the underlying pattern, offering a wrong answer with a conviction that seemed almost human in its overconfidence.
Grok failed too. It provided a detailed (and, in places, inaccurate) description of the grid, but never identified the logic driving the sequence, and arrived at an incorrect answer.
Claude followed suit: it interpreted the visual elements correctly but couldn’t connect them into a coherent pattern.
As a programmer, I’m used to AI parsing complex data structures, but these models, despite their advanced architectures, couldn’t replicate the human ability to “see” the logic in a visual puzzle. It’s one thing to describe an image; it’s another to analyze its abstract relationships. This limitation highlights a critical gap in modern artificial intelligence: it excels at processing explicit rules and patterns, but struggles with the intuitive, holistic thinking humans apply to visual tasks.
Everyone can draw their own conclusions, but I’ll add a few of my own. For programmers like me, AI is a true gift: it can automate tasks that require logical reasoning, from optimizing algorithms to solving mathematically complex problems. The visual logic puzzle, however, revealed a real weakness. Despite describing images in detail, none of the models could replicate the human ability to synthesize visual information into abstract patterns. This suggests that while AI can mimic certain aspects of intelligence, it lacks the flexible, intuitive reasoning humans apply to ambiguous or novel tasks. It’s a reminder that intelligence, as measured by IQ tests, is not just computation but also perception, creativity, and adaptability.
These insights have implications for the future. First, they highlight the need for better multimodal reasoning in AI, integrating visual and logical processing in a way that mimics human cognition; progress here could lead to breakthroughs in fields like robotics, where understanding complex environments is critical. Second, they underscore the importance of human-AI collaboration. As a programmer, I see AI as a powerful tool, not a replacement for human intuition. By combining AI’s computational power with our own intuitive strengths, we can solve problems that neither could tackle alone.
Finally, these tests raise questions about how we measure intelligence—both human and artificial. IQ tests, designed for humans, may not fully capture the capabilities or limitations of AI. As AI continues to evolve, we’ll need new benchmarks that reflect its unique strengths, such as processing speed and data synthesis, while accounting for gaps in intuition and creativity.
ChatGPT, Grok, and Claude are amazing tools, and with competition rising, we watch them improve every day. Perhaps soon we’ll be putting these questions to the technology itself. Who knows.