Reinforcement Learning Breakthrough Accelerates Model-Free Training of Optical AI Systems
A new framework utilizing proximal policy optimization allows optical processors to learn and adapt without relying on pre-programmed physical models, paving the way for faster, more efficient AI hardware.
Optical computing is rapidly emerging as a promising solution for high-speed, energy-efficient information processing. Diffractive optical networks, which leverage the principles of light propagation and structured phase masks, offer the potential for large-scale parallel computation. However, an important hurdle has long plagued the field: systems meticulously trained in simulated environments often falter when deployed in real-world scenarios, due to the inherent difficulty of accounting for misalignments, noise, and inaccuracies in existing models.
Researchers at the University of California, Los Angeles (UCLA) have unveiled a novel approach to overcome this challenge. Published in Light: Science & Applications on January 3, 2026, their work details a model-free in situ training framework for diffractive optical processors, powered by proximal policy optimization (PPO), a reinforcement learning algorithm known for its stability and data efficiency. The system learns directly from real optical measurements, dynamically optimizing its diffractive features on the hardware itself and eliminating the need for a “digital twin” or detailed prior knowledge of the physical system.
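To make the idea concrete, here is a minimal sketch of what a model-free in situ training loop looks like in code. This is not the authors’ implementation: the hardware interface (set_phase_mask, read_camera), the reward function, and the single score-function update are all illustrative placeholders, with stubs standing in for the real optics.

```python
import numpy as np

# --- Hypothetical hardware interface: placeholders, not the authors' API ---
def set_phase_mask(phases):
    """Upload a phase pattern to the spatial light modulator (stub)."""

def read_camera():
    """Return a measured intensity frame (stub: random noise stands in)."""
    return np.random.rand(64, 64)

def reward_from_frame(frame):
    """Task-specific scalar reward, e.g. energy concentrated at a target pixel."""
    return frame[32, 32] / frame.sum()

# --- Model-free in situ loop: act on hardware, score real measurements ---
rng = np.random.default_rng(0)
mean = np.zeros(256)   # trainable phase-mask parameters (Gaussian policy mean)
sigma = 0.1            # fixed exploration noise of the stochastic policy

for step in range(1000):
    action = mean + sigma * rng.standard_normal(mean.shape)  # sample a mask
    set_phase_mask(action)                      # apply it to the real optics
    reward = reward_from_frame(read_camera())   # measure, never simulate
    # PPO would buffer (action, reward) pairs and reuse each batch for
    # several clipped updates; a single score-function step is shown here.
    mean += 1e-2 * reward * (action - mean) / sigma**2
```

A full PPO variant would buffer many (action, reward) samples and reuse each batch for several clipped policy updates, which is where the data-efficiency gains described below come from.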
“Instead of trying to simulate complex optical behavior perfectly, we allow the device to learn from experience or experiments,” explained Aydogan Ozcan, Chancellor’s Professor of Electrical and Computer Engineering at UCLA and the study’s lead author. “PPO makes this in situ process fast, stable, and scalable to realistic experimental conditions.”
To demonstrate the efficacy of their approach, the UCLA team conducted extensive experimental tests across a range of optical tasks. The system successfully learned to focus optical energy through a random, unknown diffuser more quickly than traditional policy-gradient optimization methods, showcasing its ability to navigate the optical parameter space efficiently. The framework was also applied to hologram generation and aberration correction.
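The diffuser experiment has a convenient toy analogue: replace the unknown scattering medium with a random complex transmission matrix and reward the fraction of energy delivered to one target output mode. Everything below (the matrix T, the reward definition) is an illustrative assumption, not the published experimental setup.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 64                                    # controllable phase pixels
# Unknown complex transmission matrix standing in for the random diffuser.
T = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2 * N)

def focus_reward(phases):
    """Fraction of transmitted energy landing on one target output mode.

    Plays the role of a camera reading behind the diffuser: the learner
    never sees T, only this scalar feedback after each trial mask.
    """
    field_out = T @ np.exp(1j * phases)   # field after the scattering medium
    return np.abs(field_out[0]) ** 2 / np.sum(np.abs(field_out) ** 2)

print(focus_reward(np.zeros(N)))          # flat mask: energy spread at random
```

An agent that only ever sees focus_reward as feedback, never T itself, faces the same model-free problem as the hardware experiment; the optimizer’s job is to raise that scalar as quickly as possible.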
In a particularly compelling demonstration, a diffractive processor was trained directly on the optical hardware to classify handwritten digits using only real-time measurements. As the in situ training progressed, the output patterns became increasingly clear and distinct for each input digit, achieving accurate classification without any digital post-processing.
The advantages of PPO stem from its ability to reuse measured data for multiple update steps while carefully controlling shifts in the system’s “policy,” or decision-making process. This significantly reduces the amount of experimental data required and prevents instability during training, a critical benefit in noisy optical environments. Importantly, the methodology is not confined to diffractive optics; it holds potential for application across a broad spectrum of physical systems capable of providing feedback and real-time adjustments.
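The “carefully controlled shift” referred to here is, in standard PPO (Schulman et al., 2017), enforced by a clipped surrogate objective. The formula below is the textbook form; whether the paper uses it verbatim or a task-specific variant is an assumption on our part.

```latex
L^{\mathrm{CLIP}}(\theta)
  = \mathbb{E}_t\!\left[\min\!\Big(r_t(\theta)\,\hat{A}_t,\;
      \operatorname{clip}\!\big(r_t(\theta),\,1-\epsilon,\,1+\epsilon\big)\,\hat{A}_t\Big)\right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}
```

Here r_t(θ) is the probability ratio between the updated policy and the one that collected the measurements, and Â_t is an advantage estimate; clipping the ratio to [1 − ε, 1 + ε] is what lets each batch of optical measurements be reused for several gradient steps without destabilizing training.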
“This work represents a step toward intelligent physical systems that autonomously learn, adapt, and compute without requiring detailed physical models of an experimental setup,” Ozcan stated. “The approach could expand to photonic accelerators, nanophotonic processors, adaptive imaging systems, and real-time optical AI hardware.”
Further details on the research can be found in the paper, “Model-free optical processors using in situ reinforcement learning with proximal policy optimization,” by Yuhang Li et al., published in Light: Science & Applications (DOI: 10.1038
