The AI arms race shows no signs of slowing down, with Anthropic being the latest major player to make waves with their new AI model, Claude 3.
Founded by former OpenAI researchers including siblings Daniela and Dario Amodei, this startup has garnered significant funding and attention for its mission: to develop safe and aligned artificial intelligence that benefits society.
Claude started out as an early chatbot challenger to the front-runner, ChatGPT. Now in its third generation, it has evolved into a multimodal powerhouse that Anthropic claims surpasses GPT-4 and Google’s Gemini across various benchmarks. Claude 3 launched in March 2024.
Our team at Tech Pilot took Claude 3 for a test drive on different tasks to see how it performs in practical use cases such as copywriting, marketing copy, business brainstorming and more. Read on to find out what we know so far:
At the core of Claude 3’s capabilities lies its multimodal nature – the ability to analyze not just text but also images, charts, graphics, and technical diagrams. This visual understanding represents a significant step forward, opening up new possibilities for sectors like healthcare, engineering, and data analysis that heavily rely on non-textual information.
Underlying Claude 3 is Anthropic’s “Constitutional AI” approach, which trains the model against a written set of ethical principles so that safeguards are built in from the ground up. This safety-first mindset could prove invaluable as AI systems become increasingly sophisticated and ubiquitous. However, critics question whether such measures can truly prevent all potential misuse or unintended consequences. The company’s structure reflects the same priorities: Anthropic is a Public Benefit Corporation (PBC) with ties to the effective altruism movement.
According to Anthropic, incorrect refusals of harmless tasks have also been greatly reduced.
The Claude 3 family comprises three distinct models, Haiku, Sonnet and Opus, each catering to different needs and budgets.
To substantiate its claims, Anthropic has put Claude 3 through a gauntlet of industry-standard benchmarks evaluating reasoning, mathematical aptitude, and general knowledge. The results, according to the company, show Opus outperforming GPT-4 and Gemini in domains like graduate-level reasoning and grade school mathematics.
Independent verification of these claims is still lacking, as Anthropic’s benchmarks have yet to be rigorously scrutinized by third parties. Given the high stakes and rapid pace of AI development, impartial testing and validation are urgently needed to establish transparency and build public trust.
One particular incident that stirred debate was an internal Anthropic test in which Claude 3 Opus appeared to demonstrate a form of “metacognition” or self-awareness. During a “needle-in-a-haystack” evaluation meant to test recall, Opus not only located the target sentence but also recognized it as artificially inserted and out of place. This sparked curiosity among some experts, and skepticism among others who attributed it to pattern matching rather than true self-awareness.
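For readers curious about how such a recall test is put together, here is a minimal sketch in Python. It is not Anthropic’s actual harness: the filler text, the pizza “needle”, and the `mock_model` function are all made up for illustration, and a real evaluation would replace `mock_model` with a call to the model under test and use far longer documents.

```python
import random


def build_haystack(filler, needle, depth, seed=0):
    """Return (context, index) with `needle` placed at relative `depth` (0.0 to 1.0)."""
    rng = random.Random(seed)      # fixed seed keeps the run repeatable
    sentences = list(filler)
    rng.shuffle(sentences)         # shuffle so filler order carries no hint
    index = round(depth * len(sentences))
    sentences.insert(index, needle)
    return " ".join(sentences), index


def recalled(answer, key_phrase):
    """Score one answer: did the model surface the planted fact?"""
    return key_phrase.lower() in answer.lower()


def mock_model(context, question):
    """Hypothetical stand-in for a real model call; it just scans for a keyword."""
    for sentence in context.split(". "):
        if "pizza" in sentence.lower():
            return sentence
    return "I could not find that."


# Assumed toy data: ten filler sentences and one out-of-place fact.
filler = [f"Filler sentence number {i}." for i in range(10)]
needle = "The best pizza topping is figs."
context, index = build_haystack(filler, needle, depth=0.5)
answer = mock_model(context, "What is the best pizza topping?")
print(index, recalled(answer, "figs"))
```

Repeating this with different `depth` values and document lengths shows where in a long context a model starts to lose track of the planted sentence.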
In my own experience testing Claude 3, I found its performance impressive yet inconsistent. While it excelled at certain tasks like image, diagram and data analysis, there were instances where it struggled with factual accuracy or gave incoherent responses, reminding me that current AI is still far from human-level general intelligence. Generally, though, when used for content writing, its tone is less robotic than GPT-4 or Gemini.
Mihai from Tech Pilot
For this part, we prompted Claude 3 Sonnet to create part of the content for this article, and here is what it came up with:
Tech Pilot Team
- Key takeaway: Even the most advanced LLMs are still prone to hallucinations, and it seems Claude 3 is getting ahead of itself. Or perhaps it is predicting the future with that GPT-5 claim?
- Key takeaway #2: Always double-check the output, as we do at Tech Pilot. We use AI to help with ideation and crafting content, yet our writers remain deeply engaged in the creative process.
Claude 3 in Action: Practical Applications
While benchmarks provide insightful data points, the true test of an AI assistant lies in its real-world utility across diverse business scenarios. Claude 3’s multimodal capabilities open up myriad potential use cases:
For users eager to experience Claude 3 first-hand, access is currently available through the Anthropic website and Claude API, spanning 159 countries at launch. Opus can be accessed through a paid Claude Pro subscription, while Sonnet powers the free Claude AI experience.
Claude 3 might be able to beat GPT-4 and Gemini in a few technical benchmarks. Yet it still has a way to go before it can dethrone ChatGPT as the leading LLM on the market. Here are a few reasons:
First-mover advantage – ChatGPT has captured the collective imagination, and its constant upgrades keep bringing something new to the LLM space: plugins, custom GPTs, web browsing. And probably soon GPT-5.
No web browsing – Not having access to the most up-to-date, accurate data might be a deal-breaker for some people. Everything boils down to the use case the LLM is deployed for.
Claude is built on Anthropic’s Constitutional AI framework, which in our view makes it the best option for sensitive use cases such as healthcare, finance, psychotherapy, consulting and pretty much anything dealing with sensitive information.
Moreover, the new multimodal ability to recognize images and diagrams cements Claude’s position as a strong option on the healthcare front, since it can recognize image patterns and make sense of data (e.g. disease recognition from MRI imagery).
It is still too early to fully understand the capabilities of the Claude 3 family, but I am a strong believer that AI applications are limited only by human imagination.