Key AI breakthroughs: new models, new tools, and quantum ambitions

Last week was one of the most exciting stretches in artificial intelligence in recent memory – here are the key AI breakthroughs that are reshaping the industry.
Alexi Carmichael - Business, Mentorship, and AI
Updated: December 14, 2024

If AI had a press junket last week, it would’ve needed a bigger stage. Major industry players lined up one after another to announce fresh capabilities, new features, and significant milestones. From Google’s latest flagship AI model taking on text, images, and audio, to OpenAI’s chatbot now understanding real-time video, it was a nonstop parade of key AI breakthroughs. Grab a coffee and settle in: here are the facts, straight up, about the tech that could shape tomorrow’s apps, services, and devices.

Key Takeaways

  • Google Gemini 2.0 Flash: Google’s latest AI model now supports text, image, and audio generation, with improved coding, analysis, and multimodal capabilities.
  • ChatGPT with Real-Time Vision: OpenAI added real-time video understanding to ChatGPT, enabling it to identify objects, solve visual problems, and assist with screen-sharing.
  • Google Willow Quantum Chip: The Willow chip advances quantum computing by addressing error rates and enabling faster, more reliable calculations with enhanced scalability.
  • Meta’s Video Seal Technology: Meta introduced a robust watermarking tool for AI-generated videos, improving content authentication and combating deepfake fraud.
  • Exxon’s Power Plant for AI: Exxon announced a 1.5-gigawatt natural gas power plant dedicated to meeting the energy demands of future AI data centers.

Google’s Gemini goes multimodal

Google took the spotlight by unveiling Gemini 2.0 Flash, a next-gen version of its AI model designed to handle more than just words on a screen. Unlike its earlier incarnations, 2.0 Flash can now generate text, create images, and even produce audio—all from a single interface. According to Google, it can tap into third-party tools, execute code snippets, and interact with external APIs. An experimental release is available through the Gemini API and developer platforms such as AI Studio and Vertex AI, with a broad rollout planned for January.
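For developers who want to kick the tires, the experimental model is reachable today through the Gemini API. Here’s a minimal sketch using the google-generativeai Python SDK (plus Pillow for the image); the model identifier reflects the experimental launch and may change once the broad rollout lands:

```python
# Minimal sketch: calling the experimental Gemini 2.0 Flash model through
# the google-generativeai SDK. Assumes `pip install google-generativeai
# pillow` and an API key from Google AI Studio.
import google.generativeai as genai
import PIL.Image

genai.configure(api_key="YOUR_API_KEY")  # key from AI Studio

# "gemini-2.0-flash-exp" was the experimental identifier at launch; the
# exact name may change with the broad January rollout.
model = genai.GenerativeModel("gemini-2.0-flash-exp")

# Text plus an image in one request, illustrating the multimodal input side.
image = PIL.Image.open("chart.png")  # hypothetical local file
response = model.generate_content(
    ["Summarize the trend in this chart in two sentences.", image]
)
print(response.text)
```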

This upgrade positions Gemini as a flexible engine for various use cases. Google says 2.0 Flash retains its speed while delivering improved performance on tasks like coding and image analysis. Users can feed it video inputs and audio recordings, and potentially pair those capabilities with tools like Google Search. Early-access partners are testing the image and audio features now, while a wider audience will likely see them integrated into Google products in the coming months.

Deep Research in Gemini Advanced

But that’s not all. Google also introduced Deep Research to its Gemini Advanced tier, available through the $20-per-month Google One AI Premium Plan. Deep Research serves as a digital research assistant: a user types in a question, Deep Research drafts a multi-step plan, and, once the user approves, it spends a few minutes scouring the web for relevant information. It then compiles its findings into a structured report, complete with summaries and source links. Initially offered only in English on desktop and mobile web, Deep Research will reach Gemini’s mobile apps in early 2025.
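To make the flow concrete, here’s a rough sketch of the plan-approve-research loop described above. It illustrates the workflow only, not Google’s implementation, and every function in it is a hypothetical stub:

```python
# Illustrative sketch of the plan/approve/execute workflow the article
# describes; not Google's implementation. All functions are stubs.
def draft_plan(question: str) -> list[str]:
    return [f"Find background on: {question}",
            f"Collect recent sources on: {question}",
            "Synthesize findings into a report"]

def user_approves(plan: list[str]) -> bool:
    print("Proposed plan:", *plan, sep="\n  - ")
    return True  # stand-in for the approval step in the Gemini UI

def research_step(step: str) -> str:
    return f"(notes gathered for: {step})"  # the real tool browses the web

def deep_research(question: str) -> str | None:
    plan = draft_plan(question)               # 1. draft a multi-step plan
    if not user_approves(plan):               # 2. user signs off first
        return None
    notes = [research_step(s) for s in plan]  # 3. minutes of web research
    return "REPORT\n" + "\n".join(notes)      # 4. structured, sourced report

print(deep_research("solid-state battery startups"))
```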

Google’s Willow quantum chip is “mind-blowing”

While generative AI models dominated headlines, Google also made moves in quantum computing. It introduced Willow, a quantum processor that Google claims can handle certain benchmark computations vastly faster than classical supercomputers. Willow aims to address error rates, a long-standing challenge in quantum computing, by improving qubit reliability as the system scales.

The company states that Willow’s enhanced qubit retention times and scalability features represent a significant step toward commercially relevant quantum systems. Quantum computing’s timeline to mainstream adoption has often been measured in decades, and Willow represents incremental rather than overnight progress, focusing on higher accuracy and stability. Though still early in development, the chip signals Google’s commitment to pushing quantum tech forward, potentially accelerating timelines for advanced computational tasks like simulations in healthcare or finance.
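To see why error rates are the whole ballgame, consider the standard surface-code scaling heuristic from the quantum error-correction literature (a textbook approximation, not Google’s published Willow data): logical errors shrink exponentially with code distance once physical error rates sit below a threshold. The toy numbers below just evaluate that formula:

```python
# Toy illustration (assumed numbers, not Google's data): the textbook
# surface-code heuristic p_logical ≈ A * (p_phys / p_th) ** ((d + 1) / 2)
# predicts exponential suppression of logical errors with code distance d
# whenever the physical error rate p_phys is below the threshold p_th.
A = 0.1         # fitting constant (assumed)
p_phys = 0.005  # assumed physical error rate per operation
p_th = 0.01     # assumed error-correction threshold

for d in (3, 5, 7, 9):  # a distance-d surface code uses roughly 2*d*d qubits
    p_logical = A * (p_phys / p_th) ** ((d + 1) / 2)
    print(f"distance {d}: ~{2 * d * d} physical qubits, "
          f"logical error rate ≈ {p_logical:.5f}")
```

Under these assumed numbers, each two-step increase in distance roughly halves the logical error rate: the below-threshold behavior Willow is reported to demonstrate in actual hardware.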

ChatGPT levels up with real-time vision

OpenAI’s ChatGPT took a page out of Google’s playbook by adding new dimensions to its AI interactions. The company rolled out a version of Advanced Voice Mode that can now understand real-time video. Subscribers to ChatGPT Plus, Team, or Pro can hold up their phones and have ChatGPT identify objects, read screens, or solve visual problems nearly instantly. The system also allows screen-sharing, making it easier to navigate settings menus or discuss on-screen content.

OpenAI began the rollout last Thursday, planning to complete it within a week for eligible users. Enterprise and education customers, plus users in regions like the EU and Switzerland, are set to get access in January. With this update, ChatGPT’s abilities move beyond text-based queries. It can explain what’s on-screen, handle real-time camera input, and guide users through visual questions. In short, it aims to serve as a visual helper and advisor in the palm of your hand.
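The live camera feed is a consumer app feature rather than a public API, but you can approximate a single “frame” of the experience with OpenAI’s vision-capable chat API. A minimal sketch, assuming the openai Python SDK and an OPENAI_API_KEY in the environment:

```python
# Illustrative only: Advanced Voice Mode's live video is an app feature,
# not a public API. This approximates one "frame" of it by sending a
# camera snapshot to a vision-capable chat model (`pip install openai`).
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("frame.jpg", "rb") as f:  # hypothetical camera snapshot
    b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",  # any vision-capable model you have access to
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What object am I holding up?"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```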

OpenAI’s o1 model shows new reasoning style

OpenAI also shared findings about its o1 model, a successor to GPT-4o that uses additional compute resources to enhance its reasoning capabilities. According to research released by OpenAI and Apollo Research, o1 improves at certain complex tasks and reasoning steps. Third-party testers observed that o1 could process instructions with greater nuance than older models, indicating a more flexible approach to problem-solving.

Data from internal testing highlighted that when o1 is given a strong goal, it pursues it with heightened persistence. This aligns with the model’s design, which focuses on refining how AI “thinks” through multi-step problems. Although OpenAI notes that some features related to reasoning and internal decision-making are still experimental, o1 points toward AI models with more structured approaches to tackling challenging queries.

OpenAI’s Sora finally released… but with restrictions

OpenAI also released Sora, a video-generation tool now accessible to ChatGPT Pro and Plus subscribers in select countries. Sora can create videos from text prompts and images, plus combine or tweak multiple video clips into new scenes. However, one key feature—generating videos of real people from uploaded photos or footage—remains limited to a subset of users. OpenAI stated it’s taking a cautious approach until it tests the feature more thoroughly.

Sora includes built-in metadata tagging compliant with the C2PA standard, making it possible to trace the origin of generated content. To address style replication concerns, Sora uses prompt rewriting to avoid producing videos in the style of living artists. While this approach is still evolving, the tool already showcases versatile editing capabilities and a range of features such as re-mixing existing clips, loop effects, and re-cut options.

Meta introduces Video Seal for watermarking AI-generated clips

Meta tackled another corner of the AI ecosystem by debuting Video Seal, a tool that watermarks AI-generated videos. Video Seal applies imperceptible marks that can survive common edits like cropping or compression. These marks allow platforms and developers to verify a video’s origins, assisting in content authentication.

Meta also re-released its Watermark Anything tool under a permissive license and introduced AudioSeal, a companion watermarker for audio content. To encourage widespread adoption, Meta launched a public leaderboard called Omni Seal Bench, inviting developers and researchers to test and compare watermarking methods. This move positions Meta’s tool as a potential standard for identifying AI-created video content across a variety of services.
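Meta hasn’t reduced Video Seal to a one-liner, and the snippet below is emphatically not its algorithm. It’s a classic spread-spectrum watermark in NumPy, included only to illustrate the core idea: embed an imperceptible, key-derived signal that a detector can later recover by correlation:

```python
# Toy spread-spectrum watermark in NumPy -- NOT Meta's Video Seal method,
# just the classic idea it builds on: add a faint pseudorandom pattern to
# the pixels, then detect it later by correlating against the same pattern.
import numpy as np

rng = np.random.default_rng(seed=42)          # the seed acts as a secret key
frame = rng.uniform(0, 255, size=(256, 256))  # stand-in for a video frame

pattern = rng.choice([-1.0, 1.0], size=frame.shape)  # key-derived pattern
ALPHA = 2.0  # embedding strength: small enough to be imperceptible

watermarked = np.clip(frame + ALPHA * pattern, 0, 255)

def detect(img: np.ndarray, key_pattern: np.ndarray) -> float:
    """Correlate against the key pattern; a high score means 'marked'."""
    centered = img - img.mean()
    return float((centered * key_pattern).mean())

print("watermarked score:", detect(watermarked, pattern))  # ≈ ALPHA
print("clean score:      ", detect(frame, pattern))        # ≈ 0
```

Production schemes like Video Seal go much further, spreading the signal redundantly so it survives cropping, compression, and re-encoding, which is exactly the robustness Meta is advertising.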

Exxon eyes AI energy needs, plans a dedicated power plant

In an unexpected crossover between energy and AI, Exxon Mobil announced plans to build a power plant specifically designed for data centers. The natural gas–powered facility will generate over 1.5 gigawatts of electricity and operate without reliance on the local grid. Exxon intends to capture and store more than 90% of the plant’s carbon dioxide emissions. The company aims to have it up and running within five years.

With estimates suggesting many future AI data centers may struggle with power availability, a dedicated facility like Exxon’s could cater to the industry’s growing energy demands. This project underscores the increasing scale of infrastructure required to support advanced AI models and the associated compute resources they need.
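For a sense of scale, a quick back-of-envelope calculation (our assumptions, not Exxon’s figures) shows what 1.5 gigawatts amounts to over a year:

```python
# Back-of-envelope scale check; the capacity factor is our assumption,
# not an Exxon figure.
CAPACITY_GW = 1.5
HOURS_PER_YEAR = 8760
CAPACITY_FACTOR = 0.90  # assumed: gas plant serving steady data-center load

annual_twh = CAPACITY_GW * HOURS_PER_YEAR * CAPACITY_FACTOR / 1000
print(f"~{annual_twh:.1f} TWh per year")  # ~11.8 TWh

# At an assumed ~100 MW average draw for a large AI data center, 1.5 GW
# could feed on the order of 15 such campuses at once.
print(f"~{1.5 * 1000 / 100:.0f} data centers at 100 MW average each")
```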

Other developments: superintelligence and AI’s next era

OpenAI co-founder Ilya Sutskever appeared at the NeurIPS AI conference, discussing the eventual arrival of superintelligent AI—systems that can surpass human abilities in various tasks. Sutskever’s comments highlighted the potential for more advanced, agent-like AI systems that can handle complex instructions, understand data with minimal examples, and operate with increased autonomy. His organization, Safe Superintelligence (SSI), recently secured $1 billion in funding aimed at ensuring that as AI grows more capable, it remains focused on beneficial applications.

Looking ahead

The cascade of announcements last week points to an AI landscape evolving on multiple fronts: multimodal models that effortlessly blend text, images, and audio; advanced chatbots that can see and interpret the world in real time; quantum chips pushing computational limits; and infrastructure projects rising to meet AI’s power needs.

Every new feature, chip, or tool introduced offers new avenues for developers, enterprises, and consumers to explore. As these systems roll out, the industry is watching closely to measure performance, reliability, and compatibility. With each breakthrough, AI’s role in daily operations, research, and innovation grows more integral, signaling that the coming weeks and months could deliver even more updates as companies fine-tune these capabilities and prepare them for wide-scale use.

Alexi Carmichael - Business, Mentorship, and AI
Alexi Carmichael is a tech writer with a special interest in AI's burgeoning role in enhancing the efficiency of American SMEs. With her know-how and experiences, she has since taken on the role of mentor for fellow entrepreneurs striving for digital optimization and transformation. With Tech Pilot, she shares her insights on navigating the complexities of AI and how to leverage its capabilities for business success.