Netflix has released a new AI tool that can effectively erase objects from a video scene while logically simulating how the environment would behave if that object had never existed. The model, dubbed VOID, moves beyond simple visual masking to address one of the most expensive hurdles in film production: the need for costly reshoots when a specific element in a shot becomes an unwanted distraction.
Developed through a collaboration between Netflix researchers and Sofia University, VOID takes its name from “Video Object and Interaction Deletion.” Unlike traditional video inpainting tools that merely fill a hole with static background imagery, VOID can predict and alter the physical interactions within a scene. When an object is removed, the AI doesn’t just hide it; it also removes the ripple effects, splashes, or collisions that the object caused.
The model has been made available for public use on Hugging Face, meaning its utility extends beyond Netflix’s own internal productions to the broader community of creators and developers.
Beyond Inpainting: The Logic of Interaction Deletion
To understand the technical leap VOID represents, it is helpful to distinguish it from standard AI inpainting. Most current tools treat video frames as a series of images. When an object is removed, the AI looks at the surrounding pixels to “guess” what should be behind it. However, these tools often struggle with “interactions,” meaning the way a moving object changes its environment.
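To make that limitation concrete, here is a minimal toy sketch in Python of frame-by-frame inpainting. It is not how VOID, Runway, ProPainter, or DiffuEraser work internally. The function names, the row-average fill rule, and the tiny grayscale "video" are all assumptions made for illustration. Each masked pixel is filled from the known pixels in its row, and nothing outside the mask is ever touched.

```python
from typing import List, Set, Tuple

Frame = List[List[float]]      # one grayscale frame as rows of pixel values
Mask = Set[Tuple[int, int]]    # (row, col) positions covered by the object


def inpaint_frame(frame: Frame, mask: Mask) -> Frame:
    """Toy fill rule (an assumption): replace each masked pixel with the
    mean of the unmasked pixels in the same row."""
    out = [row[:] for row in frame]  # copy so the input frame is not modified
    for r, row in enumerate(frame):
        known = [v for c, v in enumerate(row) if (r, c) not in mask]
        if not known:
            continue  # whole row masked: this toy leaves it unchanged
        fill = sum(known) / len(known)
        for c in range(len(row)):
            if (r, c) in mask:
                out[r][c] = fill
    return out


def inpaint_video(frames: List[Frame], masks: List[Mask]) -> List[Frame]:
    """Process every frame independently, as a series of still images."""
    return [inpaint_frame(f, m) for f, m in zip(frames, masks)]


# Background = 10, object = 200 (masked), "splash" = 90 (outside the mask).
frame = [
    [10, 10, 10, 10],
    [10, 200, 200, 10],
    [10, 10, 10, 90],
]
mask = {(1, 1), (1, 2)}

result = inpaint_video([frame], [mask])[0]
print(result[1])     # the object's row is now filled with background values
print(result[2][3])  # the splash pixel is untouched
```

In this sketch the object disappears, but the bright "splash" pixel it caused survives because it lies outside the mask. That leftover side effect is the kind of interaction the article describes standard tools struggling with.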
VOID operates as a vision-language system. Instead of relying solely on visual data, it takes both the video and a natural language description of the object to be removed as inputs. This allows the model to understand the context of the interaction it is deleting.
For example, in a scene where two vehicles collide, a standard tool might remove one car but leave behind the smoke, fire, and debris caused by the impact. VOID, however, can remove the vehicle and simultaneously generate footage where the remaining car continues down the road undisturbed, replacing the wreckage with a clean road surface.
Another demonstration involves a person jumping into a swimming pool. While a traditional editor would be left with a lingering splash and displaced water, VOID can remove the person and render the pool surface as if it had remained untouched, eliminating the splash entirely.
Performance and Industry Benchmarks
The researchers tested VOID against several established industry tools, including Runway, ProPainter, and DiffuEraser. In a user survey involving 25 participants across various complex scenarios, VOID emerged as the clear favorite for its ability to handle dynamic movements.
| AI Model | Preference Percentage |
|---|---|
| VOID | 64.8% |
| Runway | 18.4% |
| Other Tools | 16.8% |
In the project’s preprint paper, the authors noted that “VOID excels at modeling complex dynamics which can follow on from object removal.” This ability to handle “complex dynamics” is what separates a believable edit from one that looks like a digital smudge.
Academic Context and Implementation
While the results are promising, the research is currently in the preprint stage and has not yet undergone formal peer review. The paper was authored by a team of Netflix researchers—Saman Motamed, William Harvey, Benjamin Klein, Zhuoning Yuan, and Ta-Ying Cheng—alongside Luc Van Gool from Sofia University.
From a production standpoint, the implications for post-production editing are significant. The ability to remove a misplaced prop, a stray crew member, or an unwanted piece of equipment without needing to return to a location for a reshoot could save studios millions of dollars in logistics and labor.
Despite the public release of the model on Hugging Face, Netflix has not yet announced plans to integrate VOID into its official production pipelines or existing consumer-facing products. For now, it remains a powerful research contribution to the field of synthetic video generation and visual effects (VFX).
The next step for the model will likely be the peer-review process, which will provide a more rigorous validation of its performance claims and the efficiency of its vision-language architecture.
