The relentless pursuit of more powerful artificial intelligence isn’t simply about building “smarter” models, according to Michael Gerstenhaber, product VP at Google Cloud and head of the Vertex AI platform. He argues that the field is currently navigating three distinct, and equally important, frontiers: raw intelligence, speed of response, and, crucially, cost-effective scalability. This framework, he suggests, is essential for anyone hoping to deploy AI at a truly impactful scale.
Gerstenhaber, who previously held a role at Anthropic, sees Google’s unique position – controlling everything from data centers and chip design to model development and the user interface – as a significant advantage in tackling these challenges. He believes this vertical integration allows for a more holistic approach to AI development and deployment, something he didn’t find elsewhere. The company’s Vertex AI platform is at the center of this effort, providing engineers with the tools to build and deploy their own AI applications, leveraging the “smartest models in the world.” But simply having powerful models isn’t enough; they must also be fast and affordable to run.
The Three Frontiers of AI Capability
Gerstenhaber breaks down the challenges facing AI development into three key areas. The first, and perhaps most intuitively understood, is raw intelligence. For tasks like code generation, the priority is simply the best possible outcome, even if it takes considerable time. “You just want the best code you can get,” he explained, “doesn’t matter if it takes 45 minutes, because I have to maintain it, I have to put it in production.” However, this isn’t universally true.
The second frontier is latency – the time it takes for a model to respond. In customer service scenarios, for example, a highly accurate answer is useless if it arrives after the customer has lost patience. “More intelligence no longer matters once that person gets bored and hangs up the phone,” Gerstenhaber noted. Finding the right balance between intelligence and speed is critical for real-time applications.
The third, and perhaps most novel, frontier is cost. Companies like Reddit and Meta, tasked with moderating vast amounts of online content, require models that can operate at scale without breaking the bank. They need to find the highest level of intelligence they can afford while ensuring the system can handle an unpredictable volume of data. This is where cost-effectiveness becomes paramount. As Gerstenhaber put it, they need to avoid “enterprise risk” by ensuring the model can scale to an “infinite number of subjects” without exceeding budgetary constraints.
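One way to picture the three frontiers together is as a selection problem: filter out the models that are too slow or too expensive for the workload, then take the most capable one that remains. The short Python sketch below illustrates that reading. It is not something Gerstenhaber or Google described; the model labels, quality scores, latencies, and prices are all invented for the example, and `pick_model` is a hypothetical helper rather than part of any Vertex AI API.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ModelOption:
    name: str                 # hypothetical label, not a real product
    quality: float            # invented relative "intelligence" score, higher is better
    latency_s: float          # invented typical response time, in seconds
    cost_per_1k_items: float  # invented price in dollars per 1,000 items processed


def pick_model(
    options: Sequence[ModelOption],
    max_latency_s: float,
    budget: float,
    expected_items: int,
) -> Optional[ModelOption]:
    """Return the highest-quality option that fits the latency and cost limits."""
    feasible = [
        m for m in options
        if m.latency_s <= max_latency_s
        and m.cost_per_1k_items * expected_items / 1000 <= budget
    ]
    # default=None covers the case where nothing fits the constraints.
    return max(feasible, key=lambda m: m.quality, default=None)


# Illustrative catalogue; every number here is made up.
OPTIONS = [
    ModelOption("large", quality=0.95, latency_s=40.0, cost_per_1k_items=30.0),
    ModelOption("medium", quality=0.85, latency_s=3.0, cost_per_1k_items=4.0),
    ModelOption("small", quality=0.70, latency_s=0.5, cost_per_1k_items=0.5),
]

# Code generation: latency barely matters, so quality wins.
coding = pick_model(OPTIONS, max_latency_s=3600, budget=100, expected_items=1000)

# Customer service: the answer must arrive within a few seconds.
support = pick_model(OPTIONS, max_latency_s=5, budget=100, expected_items=1000)

# Content moderation: huge volume, so cost per item dominates.
moderation = pick_model(OPTIONS, max_latency_s=5, budget=1000, expected_items=1_000_000)

for label, choice in [("coding", coding), ("support", support), ("moderation", moderation)]:
    print(label, "->", choice.name if choice else "no model fits")
```

With these invented numbers, the same catalogue yields a different choice for each workload, which is the point of treating intelligence, latency, and cost as separate frontiers rather than a single ranking.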
The Slow Rollout of Agentic AI
Despite the impressive capabilities of current AI models, the widespread adoption of “agentic systems” – AI systems that can autonomously perform tasks – has been slower than many expected. While demonstrations have been promising, the transition to real-world applications has proven challenging. Gerstenhaber attributes this to the relative youth of the technology and the lack of established infrastructure.
“This technology is basically two years old, and there’s still a lot of missing infrastructure,” he said. “We don’t have patterns for auditing what the agents are doing. We don’t have patterns for authorization of data to an agent.” These missing pieces are essential for ensuring responsible and reliable AI deployment. He emphasized that production implementation always lags behind technological capability, and that’s where many organizations are currently struggling.
However, Gerstenhaber is optimistic, pointing to the rapid progress in software engineering as a model for other professions. Google’s internal processes, which require multiple code reviews before deployment, provide a low-risk environment for innovation. “We have a lot of those human-in-the-loop processes that make the implementation exceptionally low-risk,” he explained. The challenge now is to replicate these patterns in other fields.
Expanding Vertex AI Capabilities
Google is actively working to address these challenges through its Vertex AI platform. Recent updates, announced on February 18, 2026, focus on improving provisioned throughput (PT), which guarantees capacity and predictable performance for AI agents. According to a Google Cloud blog post, these improvements include expanded model diversity, multimodal innovation (processing text, images, and video), and greater operational flexibility.
The platform is also expanding its support for third-party models, including Anthropic’s Claude and popular open-source options like Llama 4, Qwen3, GLM-4.7, and DeepSeek-OCR. This broader model portfolio, coupled with unified governance tools, aims to simplify capacity management for engineering teams. Google is also expanding Vertex AI with Claude Opus 4.6, further enhancing its capabilities.
The ongoing development of Vertex AI reflects Google’s commitment to pushing the boundaries of AI, not just in terms of raw intelligence, but also in terms of practicality, scalability, and cost-effectiveness. The company’s vertically integrated approach, combined with its focus on building robust infrastructure, positions it as a key player in shaping the future of AI.
Looking ahead, the focus will remain on building out the necessary infrastructure and establishing best practices for deploying agentic systems responsibly and reliably. The next step for Google, and the industry as a whole, is to translate the promise of AI into tangible benefits for businesses and individuals alike.
