If AI had a press junket last week, it would’ve needed a bigger stage. Major industry players lined up one after another to announce fresh capabilities, new features, and significant milestones. From Google’s latest flagship AI model taking on text, images, and audio, to OpenAI’s chatbot now understanding real-time video, it’s been a nonstop parade of AI breakthroughs. Grab a coffee and settle in: here are the facts, straight up, about the tech that could shape tomorrow’s apps, services, and devices.
Google took the spotlight by unveiling Gemini 2.0 Flash, a next-gen version of its AI model designed to handle more than just words on a screen. Unlike its earlier incarnations, 2.0 Flash can now generate text, create images, and even produce audio—all from a single interface. According to Google, it can tap into third-party tools, execute code snippets, and interact with external APIs. An experimental release is available through the Gemini API and developer platforms such as AI Studio and Vertex AI, with a broad rollout planned for January.
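For developers who want to kick the tires, the experimental release is reachable today through the Gemini API. Here’s a minimal sketch, assuming Google’s google-generativeai Python SDK and the gemini-2.0-flash-exp model identifier used for the experimental launch; treat both as assumptions that may shift as the broad rollout lands in January.

```python
# Minimal sketch of calling the experimental Gemini 2.0 Flash model.
# Assumptions: the google-generativeai SDK and the "gemini-2.0-flash-exp"
# model name, which may change as the broad rollout arrives in January.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # key issued via AI Studio

model = genai.GenerativeModel("gemini-2.0-flash-exp")

# Plain text generation works as it did with earlier Gemini versions;
# the multimodal image and audio outputs are gated to early-access partners.
response = model.generate_content("Summarize last week's AI announcements.")
print(response.text)
```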
This upgrade positions Gemini as a flexible engine for a wide range of use cases. Google says 2.0 Flash retains its predecessor’s speed while delivering improved performance on tasks like coding and image analysis. Users can feed it video inputs and audio recordings, and potentially pair those capabilities with tools like Google Search. Early-access partners are testing the image and audio features now, and a wider audience will likely see them integrated into Google products in the coming months.
But that’s not all. Google also introduced Deep Research to its Gemini Advanced tier, available through a $20-per-month Google One AI Premium Plan. Deep Research can serve as a digital research assistant. A user types in a question, Deep Research drafts a multi-step plan, and—after the user approves—spends a few minutes scouring the web for relevant information. It compiles findings into a structured report, complete with summaries and source links. Initially offered only in English on desktop and mobile web, Deep Research will reach Gemini’s mobile apps in early 2025.
While generative AI models dominated headlines, Google also made moves in quantum computing. It introduced Willow, a quantum processor that Google claims can complete certain benchmark computations vastly faster than classical supercomputers. Willow aims to address error rates, a long-standing challenge in quantum computing, by improving qubit reliability.
The company states that Willow’s enhanced qubit retention times and scalability features represent a significant step toward commercially relevant quantum systems. Quantum computing’s timeline to mainstream adoption has often been measured in decades, but Willow’s design suggests incremental progress, focusing on higher accuracy and stability. Though still early in development, this chip signals Google’s commitment to pushing quantum tech forward, potentially accelerating timelines for advanced computational tasks like simulations in healthcare or finance.
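For readers who want the single formula behind all this talk of error rates, the textbook surface-code scaling law (a standard result in the field, not a figure Google published here) captures why qubit reliability is the whole game: once the physical error rate p sits below a threshold p_th, the logical error rate falls exponentially as the code distance d grows.

```latex
% Textbook surface-code error suppression, not a Willow-specific number:
% below threshold, each increase in code distance d multiplies reliability.
\[
  \epsilon_d \approx A \left( \frac{p}{p_{\mathrm{th}}} \right)^{(d+1)/2}
\]
```

Above threshold, adding qubits only adds noise; below it, every step up in distance buys exponential suppression, which is why retention times and stability matter more than raw qubit counts.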
OpenAI’s ChatGPT took a page out of Google’s playbook by adding new dimensions to its AI interactions. The company rolled out a version of Advanced Voice Mode that can now understand real-time video. Subscribers to ChatGPT Plus, Team, or Pro can hold up their phones and have ChatGPT identify objects, read screens, or solve visual problems nearly instantly. The system also allows screen-sharing, making it easier to navigate settings menus or discuss on-screen content.
OpenAI began the rollout last Thursday, planning to complete it within a week for eligible users. Enterprise and education customers, plus users in regions like the EU and Switzerland, are set to get access in January. With this update, ChatGPT’s abilities move beyond text-based queries. It can explain what’s on-screen, handle real-time camera input, and guide users through visual questions. In short, it aims to serve as a visual helper and advisor in the palm of your hand.
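The in-app camera mode isn’t itself a developer API, but the mechanics are easy to approximate with OpenAI’s existing vision input, one frame at a time. A hedged sketch, assuming the standard openai Python SDK and a gpt-4o-class vision model; the frame file and prompt are placeholders:

```python
# Approximating "point your camera and ask" with the public vision API.
# This is a one-frame sketch, not the streaming pipeline ChatGPT's app uses.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("camera_frame.jpg", "rb") as f:  # placeholder: one captured frame
    frame_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",  # assumption: any vision-capable model works here
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What object am I holding up?"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{frame_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```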
OpenAI also shared findings about its o1 model, a successor to GPT-4o that spends additional compute at inference time to strengthen its reasoning. According to research released by OpenAI and Apollo Research, o1 performs better on certain complex tasks and multi-step reasoning. Third-party testers observed that o1 could process instructions with greater nuance than older models, indicating a more flexible approach to problem-solving.
Data from internal testing highlighted that when o1 is given a strong goal, it pursues it with heightened persistence. This aligns with the model’s design, which focuses on refining how AI “thinks” through multi-step problems. Although OpenAI noted that some features related to reasoning and internal decision-making remain experimental, o1 points toward AI models taking more structured approaches to challenging queries.
OpenAI also released Sora, a video-generation tool now accessible to ChatGPT Pro and Plus subscribers in select countries. Sora can create videos from text prompts and images, plus combine or tweak multiple video clips into new scenes. However, one key feature—generating videos of real people from uploaded photos or footage—remains limited to a subset of users. OpenAI stated it’s taking a cautious approach until it tests the feature more thoroughly.
Sora includes built-in metadata tagging compliant with the C2PA standard, making it possible to trace the origin of generated content. To address style replication concerns, Sora uses prompt rewriting to avoid producing videos in the style of living artists. While this approach is still evolving, the tool already showcases versatile editing capabilities and a range of features such as re-mixing existing clips, loop effects, and re-cut options.
Meta tackled another corner of the AI ecosystem by debuting Video Seal, a tool that watermarks AI-generated videos. Video Seal applies imperceptible marks that can survive common edits like cropping or compression. These marks allow platforms and developers to verify a video’s origins, assisting in content authentication.
Meta also re-released its Watermark Anything tool under a permissive license and introduced Audio Seal for sound-based content. To encourage widespread adoption, Meta launched a public leaderboard called Omni Seal Bench, inviting developers and researchers to test and compare watermarking methods. The move positions Meta’s tools as a potential standard for identifying AI-created content across a variety of services.
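Meta hasn’t detailed Video Seal’s internals here, but the core idea of an imperceptible, edit-resistant mark is easy to illustrate with a toy spread-spectrum scheme: add a faint pseudorandom pattern to each frame, then detect it later by correlating against that secret pattern. To be clear, this is a conceptual sketch, not Meta’s algorithm, which is built to survive edits this toy version would not:

```python
# Toy spread-spectrum watermark: a conceptual stand-in for systems like
# Video Seal, NOT Meta's method. Embeds a faint +/-1 pattern in one frame
# and detects it by correlation against the secret pattern.
import numpy as np

rng = np.random.default_rng(42)  # the seed doubles as the secret key

def embed(frame, alpha=3.0):
    """Add a low-amplitude pattern; alpha trades robustness for visibility."""
    pattern = rng.choice([-1.0, 1.0], size=frame.shape)
    return np.clip(frame + alpha * pattern, 0, 255), pattern

def detect(frame, pattern):
    """Score a (possibly edited) frame by correlation with the pattern."""
    return float(np.mean((frame - frame.mean()) * pattern))

frame = rng.integers(0, 256, size=(128, 128)).astype(float)  # stand-in frame
marked, pattern = embed(frame)
print("marked score:", detect(marked, pattern))  # ~alpha: mark present
print("clean score: ", detect(frame, pattern))   # near zero: no mark
```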
In an unexpected crossover between energy and AI, Exxon Mobil announced plans to build a power plant specifically designed for data centers. The natural gas–powered facility will generate over 1.5 gigawatts of electricity and operate without reliance on the local grid. Exxon intends to capture and store more than 90% of the plant’s carbon dioxide emissions. The company aims to have it up and running within five years.
With estimates suggesting many future AI data centers may struggle with power availability, a dedicated facility like Exxon’s could cater to the industry’s growing energy demands. This project underscores the increasing scale of infrastructure required to support advanced AI models and the associated compute resources they need.
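To put 1.5 gigawatts in perspective, a quick back-of-the-envelope on the stated figure (assuming continuous full output, which no real plant sustains):

```python
# Back-of-the-envelope: annual energy from 1.5 GW of continuous output.
capacity_gw = 1.5
hours_per_year = 24 * 365                 # 8,760 hours
annual_twh = capacity_gw * hours_per_year / 1000
print(f"~{annual_twh:.1f} TWh per year")  # ~13.1 TWh at uninterrupted full output
```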
OpenAI co-founder Ilya Sutskever appeared at the NeurIPS AI conference, discussing the eventual arrival of superintelligent AI—systems that can surpass human abilities in various tasks. Sutskever’s comments highlighted the potential for more advanced, agent-like AI systems that can handle complex instructions, understand data with minimal examples, and operate with increased autonomy. His organization, Safe Superintelligence (SSI), recently secured $1 billion in funding aimed at ensuring that as AI grows more capable, it remains focused on beneficial applications.
The cascade of announcements last week points to an AI landscape evolving on multiple fronts: multimodal models that effortlessly blend text, images, and audio; advanced chatbots that can see and interpret the world in real time; quantum chips pushing computational limits; and infrastructure projects rising to meet AI’s power needs.
Every new feature, chip, or tool introduced opens new avenues for developers, enterprises, and consumers to explore. As these systems roll out, the industry is watching closely to measure performance, reliability, and compatibility. With each breakthrough, AI’s role in daily operations, research, and innovation grows more integral. If last week is any indication, the coming weeks and months will deliver even more updates as companies fine-tune these capabilities and prepare them for wide-scale use.