Cartwheel: Solving AI Animation’s Control Problem with 3D Assets

by Sofia Alvarez

For most creators, current generative AI animation feels less like a tool and more like a slot machine. You input a prompt, pull the lever, and hope the resulting video doesn’t feature “wonky feet” or surreal anatomical glitches. When the output misses the mark, the only recourse is to tweak the text and try again—a “black box” process that offers plenty of randomness but very little creative agency.

Cartwheel, a new 3D animation startup, is attempting to dismantle this approach. By shifting the focus from generating flat pixels to creating manipulable 3D assets, the company aims to make it easier to tell open-ended stories in which the AI handles the technical drudgery while the artist retains absolute control over the performance.

The venture is led by Andrew Carr and Jonathan Jarvis, veterans with professional roots at OpenAI and Google, respectively. Their goal is to move beyond the “one-and-done” nature of current AI video generators, replacing the prompt-and-pray method with a sophisticated “control layer” that functions as a power tool for animators rather than a replacement for them.

The shift from flat images to 3D assets is what gives animators the control they have been missing in the AI era. (Image credit: Cartwheel)

Solving the 3D Data Scarcity Problem

The primary challenge facing the next generation of animation AI is a lack of raw material. While large language models have had access to nearly the entire written history of the internet, 3D motion data is remarkably scarce. Most AI models are trained on text, audio, and 2D images because those patterns are abundant and easier to identify.

“If you look at all the big tech companies, they’ve built their models on written language, audio, image, [and] video because there’s just so much of it, so finding those patterns is much easier,” Jarvis said. He noted that acquiring the necessary 3D data proved to be “harder than we thought by probably a factor of 10 or 100.”

To overcome this, Cartwheel has spent years mapping the intricacies of human movement. Rather than predicting what a pixel should look like in the next frame, their models are designed to understand the biomechanics of a performance. This allows the system to take a simple 2D video—such as someone dancing in a backyard—and translate it into a precise, realistic 3D skeleton.
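To make the idea concrete, here is a toy sketch of "lifting" 2D video keypoints into a 3D skeleton. This is not Cartwheel's actual model: real systems learn per-joint depth from motion data, whereas this illustration simply assumes the depths are given and back-projects them through a standard pinhole camera model.

```python
from dataclasses import dataclass

@dataclass
class Joint3D:
    name: str
    x: float
    y: float
    z: float

def lift_keypoints(keypoints_2d, depths, focal=1000.0, cx=640.0, cy=360.0):
    """Back-project 2D pixel keypoints into 3D camera-space joints.

    keypoints_2d: {joint_name: (u, v)} pixel coordinates from a video frame
    depths:       {joint_name: z} distance from the camera (hypothetical
                  input; in practice a learned model would predict this)
    focal/cx/cy:  assumed camera intrinsics for a 1280x720 frame
    """
    skeleton = []
    for name, (u, v) in keypoints_2d.items():
        z = depths[name]
        # Standard pinhole back-projection: X = (u - cx) * Z / f
        x = (u - cx) * z / focal
        y = (v - cy) * z / focal
        skeleton.append(Joint3D(name, x, y, z))
    return skeleton

# One frame of a dancer: hip at image center, knee below and to the right.
frame = {"hip": (640.0, 360.0), "knee": (660.0, 500.0)}
depths = {"hip": 3.0, "knee": 3.1}
for joint in lift_keypoints(frame, depths):
    print(joint)
```

The hard part, and the part Cartwheel spent years on, is the depth dictionary this sketch takes for granted: recovering plausible biomechanics from flat footage is exactly where scarce 3D training data bites.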

Cartwheel has spent years tackling the difficult task of mapping how humans actually move. (Image credit: Cartwheel)

Combating the “Sameness” of Generative AI

A recurring criticism of generative AI in the arts is the emergence of a detectable “AI style”—a certain aesthetic sameness that occurs when creators use the same underlying models. Cartwheel argues that this sameness is not a failure of the AI’s imagination, but a byproduct of a lack of user control.

When a user cannot edit the output, they are forced to accept the model’s default “taste.” Cartwheel’s architecture is designed specifically to be touched and manipulated. By generating 3D data instead of a flat video file, the creator can adjust lighting, shift camera angles, or refine a character’s pose after the initial AI generation is complete.
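The difference is easy to see in data terms. A rendered video is a wall of pixels; structured 3D output keeps every generated parameter as a named field the artist can change afterward. The schema below is hypothetical (not Cartwheel's actual file format), a minimal sketch of why such output stays editable.

```python
from dataclasses import dataclass

# Hypothetical scene schema: every value the generator produced remains
# a named, editable field rather than being baked into pixels.

@dataclass
class Camera:
    position: tuple       # (x, y, z) in scene units
    fov_degrees: float

@dataclass
class Scene:
    character_pose: dict  # joint name -> rotation in degrees
    camera: Camera
    light_intensity: float

# Pretend this came out of an AI generation step.
scene = Scene(
    character_pose={"left_arm": 45.0, "head": 0.0},
    camera=Camera(position=(0.0, 1.6, 4.0), fov_degrees=50.0),
    light_intensity=1.0,
)

# The artist refines the result after generation completes:
scene.character_pose["left_arm"] = 90.0   # push the performance further
scene.camera.fov_degrees = 35.0           # tighter framing
scene.light_intensity = 0.7               # moodier lighting
```

With a flat video file, each of those three adjustments would require re-prompting the model and hoping; here they are direct edits.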

“The output of our system is designed for people to edit. It’s designed for people to touch and manipulate, and we don’t want someone to type something in and then have it shuffle through to a finished animation,” Carr said. “That’s not the point of it. That’s boring, who’s going to watch that?”

According to Carr, providing this level of granularity removes the sameness problem because the AI output is merely a starting point. By pushing or pulling the performance and placing characters in unique environments, the human artist reintroduces the individuality that is often lost in prompt-based generation.

Founder Andrew Carr said one of his core scientific hypotheses is that movement and motion is a fundamental data type. (Image credit: Cartwheel)

The Shift Toward Open-Ended World-Building

The long-term vision for Cartwheel extends beyond traditional film or clip production. The company is targeting “open-ended storytelling” and “open-ended world-building,” specifically for the demands of modern gaming and social media, where the volume of required content far exceeds what manual animation can provide.

Instead of choreographing a finite set of animations, Cartwheel envisions characters powered by motion models that allow them to react and perform in real time. This approach treats the AI as a digital actor that understands the intent of a scene, allowing creators to “rehearse” with the character rather than meticulously plotting every frame.
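The "digital actor" idea can be sketched as a loop: rather than playing back a fixed clip, the character queries a motion model every tick with the scene's current intent. Everything below is an illustrative stand-in (the stub `motion_model` is not a real learned model), showing only the control flow the article describes.

```python
import math

def motion_model(intent, t):
    """Stand-in for a learned motion model: maps an intent and a time
    to the character's next pose (here, a single joint-angle dict)."""
    if intent == "wave":
        # Oscillate the arm to fake a waving motion.
        return {"right_arm": 90.0 + 30.0 * math.sin(t * 2.0)}
    return {"right_arm": 0.0}  # idle

def rehearse(intent_fn, fps=4, seconds=1):
    """Drive the character tick by tick; intent can change mid-scene,
    which is what lets a creator 'rehearse' rather than pre-plot frames."""
    frames = []
    for i in range(fps * seconds):
        t = i / fps
        intent = intent_fn(t)  # supplied live by the creator or game logic
        frames.append(motion_model(intent, t))
    return frames

# Creator direction: idle for the first half-second, then wave.
frames = rehearse(lambda t: "wave" if t >= 0.5 else "idle")
```

The point of the structure is that `intent_fn` is evaluated inside the loop: a game engine or a director can redirect the performance at any moment without regenerating anything upstream.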

This strategy focuses on the “layer below the pixels,” bridging the gap between a 2D vision and 3D execution. Carr believes that within the next three years, 3D will become the primary workspace for creators, regardless of whether the final output is a 3D game or a 2D video.

By automating the biomechanics and file exports, the technology aims to lower the barrier to entry for storytelling while ensuring that human taste, timing, and heart remain the final arbiters of the work. The objective is to transform AI from a black-box generator into a transparent, professional-grade instrument for the animation industry.

As the company continues to refine its motion models and expand its toolset, the next phase of development will likely focus on the integration of these 3D assets into real-time engines. Further updates on their technical milestones and potential public beta access are expected as the company moves toward its three-year goal of universal 3D authoring.

