TornadoVM 2.0: Java Gets GPU Acceleration & LLM Support

by priyanka.patel tech editor

TornadoVM 2.0 Accelerates Java Applications on Diverse Hardware, Boosts LLM Inference

The latest release of TornadoVM, an open-source project, promises notable performance gains for Java applications, particularly those powering large language models (LLMs).

TornadoVM has reached version 2.0, marking a major milestone in its mission to provide a heterogeneous hardware runtime for Java. This release is poised to be especially impactful for teams developing LLM solutions on the Java Virtual Machine (JVM).

The project distinguishes itself by automatically accelerating Java programs across multi-core CPUs, GPUs, and FPGAs. TornadoVM does not aim to replace conventional JVMs but rather augments their capabilities. It offloads Java code to specialized hardware, manages memory transfers between Java and these accelerators, and executes the core computational kernels. According to project documentation, this functionality is a critical component for modern cloud and machine learning workloads.

InfoQ previously covered TornadoVM in both 2020 and 2022, highlighting its evolving capabilities.

At its core, TornadoVM functions as a just-in-time (JIT) compiler, translating Java bytecode at runtime into code for one of three backends: OpenCL C, NVIDIA CUDA PTX, or SPIR-V binary. Developers retain the flexibility to choose which backends to install and use, tailoring the system to their specific hardware configurations.

However, not all Java computations are equally suited to TornadoVM’s acceleration. The project excels with workloads featuring for-loops that lack inter-iteration dependencies, which allows the iterations to be computed in parallel. “Workloads with independent loop iterations are very good candidates,” a senior official stated.
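As a rough illustration of that distinction, the plain-Java sketch below contrasts a loop whose iterations are independent with one that carries a dependency from each iteration to the next. It deliberately does not use the TornadoVM API: the class and method names are hypothetical, and a parallel stream stands in for an accelerator only to show that the first loop tolerates any execution order.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical demo class; not part of TornadoVM.
final class LoopDependencyDemo {

    // Each iteration reads a[i] and b[i] and writes only c[i].
    // No iteration depends on another, so they may run in any order or in parallel.
    static float[] vectorAdd(float[] a, float[] b) {
        float[] c = new float[a.length];
        IntStream.range(0, a.length).parallel().forEach(i -> c[i] = a[i] + b[i]);
        return c;
    }

    // Iteration i reads out[i - 1], which the previous iteration wrote.
    // This loop-carried dependency means the loop cannot be naively parallelized.
    static float[] prefixSum(float[] a) {
        float[] out = new float[a.length];
        if (a.length == 0) {
            return out;
        }
        out[0] = a[0];
        for (int i = 1; i < a.length; i++) {
            out[i] = out[i - 1] + a[i];
        }
        return out;
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f, 4f};
        float[] b = {10f, 20f, 30f, 40f};
        System.out.println(Arrays.toString(vectorAdd(a, b))); // [11.0, 22.0, 33.0, 44.0]
        System.out.println(Arrays.toString(prefixSum(a)));    // [1.0, 3.0, 6.0, 10.0]
    }
}
```

The first method is the shape of loop the article describes as a good candidate; the second would need to be restructured before it could be spread across many hardware threads.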

Matrix-based applications, prevalent in machine learning and deep learning, are particularly well-suited. Other promising use cases include physics simulations – such as N-body particle computations – financial modeling like Black-Scholes, and a broad spectrum of applications within computational science.
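To see why matrix workloads fit this model, consider a matrix-vector product, a core operation in LLM inference. The sketch below is plain Java with hypothetical names and an assumed row-major flat layout; it is not TornadoVM code. Each output element is computed by its own row, so the outer loop has no inter-iteration dependencies, while the inner loop is a reduction private to that row.

```java
import java.util.Arrays;

// Hypothetical demo class; not part of TornadoVM.
final class MatVecDemo {

    // Computes y = W * x, where W is stored row-major in a flat array
    // of rows * cols floats (an assumed layout for this sketch).
    static float[] matVec(float[] w, float[] x, int rows, int cols) {
        float[] y = new float[rows];
        // Outer loop: iteration r writes only y[r], so rows are independent.
        for (int r = 0; r < rows; r++) {
            float sum = 0f;
            // Inner loop: a running sum that stays local to row r.
            for (int c = 0; c < cols; c++) {
                sum += w[r * cols + c] * x[c];
            }
            y[r] = sum;
        }
        return y;
    }

    public static void main(String[] args) {
        float[] w = {1f, 2f, 3f,
                     4f, 5f, 6f}; // 2 x 3 matrix
        float[] x = {1f, 0f, -1f};
        System.out.println(Arrays.toString(matVec(w, x, 2, 3))); // [-2.0, -2.0]
    }
}
```

With thousands of rows per weight matrix, the outer loop alone exposes plenty of independent work for a GPU.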

Key improvements in TornadoVM 2.0 include:

  • New TornadoVM SDKs that eliminate complex GPU configuration requirements.
  • Support for NVIDIA PTX and OpenCL, plus initial compatibility with Apple silicon.
  • Improved integration with the Quarkus framework.
  • Seamless integration with LangChain4j.

Currently, GPULlama3.java, a Java LLM inference project built on TornadoVM, supports a range of FP16 (16-bit floating point) and 8-bit quantized models, with parameter counts ranging from under one billion up to the single-digit billions:

  • Llama 3.2 (1B) – FP16
  • Llama 3.2 (3B) – FP16
  • Llama 3 (8B) – FP16
  • Mistral (7B) – FP16
  • Qwen3 (0.6B) – FP16
  • Qwen3 (1.7B) – FP16
  • Qwen3 (4B) – FP16
  • Qwen3 (8B) – FP16
  • Phi-3-mini-4k – FP16
  • Qwen2.5 (0.5B)
  • Qwen2.5 (1.5B)
  • DeepSeek-R1-Distill-Qwen (1.5B)
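The choice of numeric format matters mainly because it determines how much accelerator memory the weights occupy. The back-of-the-envelope sketch below uses the nominal parameter counts from the list and assumes 2 bytes per FP16 weight and 1 byte per 8-bit weight. It is a simplification with hypothetical names: real parameter counts differ slightly from the nominal figures, quantized formats carry extra scale metadata, and inference also needs memory for activations and caches.

```java
import java.util.Locale;

// Hypothetical helper for rough sizing; not part of TornadoVM or GPULlama3.java.
final class WeightFootprint {

    static final int FP16_BYTES = 2; // 16-bit floating point
    static final int INT8_BYTES = 1; // 8-bit quantized, ignoring scale metadata

    // Approximate bytes needed to hold the weights alone.
    static long weightBytes(long parameterCount, int bytesPerParameter) {
        return parameterCount * bytesPerParameter;
    }

    // Converts bytes to decimal gigabytes (1 GB = 10^9 bytes).
    static double toGigabytes(long bytes) {
        return bytes / 1e9;
    }

    public static void main(String[] args) {
        long llama8b = 8_000_000_000L; // nominal count for an 8B model
        long llama1b = 1_000_000_000L; // nominal count for a 1B model
        System.out.println(String.format(Locale.ROOT, "8B @ FP16:  %.1f GB",
                toGigabytes(weightBytes(llama8b, FP16_BYTES))));
        System.out.println(String.format(Locale.ROOT, "8B @ 8-bit: %.1f GB",
                toGigabytes(weightBytes(llama8b, INT8_BYTES))));
        System.out.println(String.format(Locale.ROOT, "1B @ FP16:  %.1f GB",
                toGigabytes(weightBytes(llama1b, FP16_BYTES))));
    }
}
```

Under these assumptions an 8B model needs roughly 16 GB for FP16 weights and about half that at 8 bits, which is why the smaller and quantized entries are the ones more likely to fit on consumer GPUs.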

The project is led by the Beehive Lab, part of the Advanced Processor Technologies Group at the University of Manchester, which specializes in the co-design of integrated hardware and software solutions.

To further streamline the developer experience, the team has also developed TornadoInsight, an IntelliJ IDEA plugin designed to enhance workflows with TornadoVM.

Looking ahead, the roadmap includes making TornadoVM available through SDKMAN! and migrating the codebase’s JNI components to the newer Foreign Function & Memory (FFM) API.
