This repository contains demo applications showcasing how to run LLaMA 3 models on the GPU via LangChain4j, using GPULlama3.java integrated with TornadoVM.
The examples demonstrate:
- Basic chat interaction
- Memory-enabled conversations
- Streaming responses
- Multi-turn conversational agents
Prerequisites:
- Java 21+
- Maven (to build the project)
- TornadoVM installed and configured on your system
- A GPU with sufficient VRAM (20 GB recommended)
Follow the TornadoVM installation guide to set up TornadoVM on your system, and make sure the `tornado` command is available on your PATH.
Run the following to build the project and generate the classpath file (`cp.txt`):

```shell
mvn clean package && mvn dependency:build-classpath -Dmdep.outputFile=cp.txt
```

This produces:
- `target/langchain4j-gpullama3-demo-1.0-SNAPSHOT.jar`
- `cp.txt` (classpath file used in the run commands)
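The run commands below splice the Maven-generated dependency list onto the demo jar with `$(cat cp.txt)`. A minimal sketch of that idiom, using throwaway `/tmp` paths instead of the real artifacts:

```shell
# Sketch of how the run commands assemble the classpath.
# The /tmp paths stand in for the real jar and cp.txt; the separator is ':' on Linux/macOS (';' on Windows).
printf '%s' '/tmp/dep1.jar:/tmp/dep2.jar' > /tmp/cp.txt
CP="target/langchain4j-gpullama3-demo-1.0-SNAPSHOT.jar:$(cat /tmp/cp.txt)"
echo "$CP"
# prints target/langchain4j-gpullama3-demo-1.0-SNAPSHOT.jar:/tmp/dep1.jar:/tmp/dep2.jar
```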
All commands below assume you are in the project root and have a valid `cp.txt` (classpath dependencies file).
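The plain `java` commands below read their JVM flags from an `options.txt` argument file (`@options.txt` is the standard JDK argument-file syntax). The exact contents depend on your setup; as a sketch, mirroring the flags passed to the tornado CLI (these lines are assumptions, not the repository's actual file):

```
-Dorg.slf4j.simpleLogger.defaultLogLevel=off
-Dtornado.device.memory=20GB
-XX:MaxDirectMemorySize=20G
```

A real `options.txt` would likely also carry the TornadoVM module-path and upgrade-module-path flags that the `tornado` launcher normally injects for you.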
Chat example

tornado CLI:

```shell
tornado --jvm="-Dorg.slf4j.simpleLogger.defaultLogLevel=off \
  -Dtornado.device.memory=20GB -XX:MaxDirectMemorySize=20G" \
  -cp "target/langchain4j-gpullama3-demo-1.0-SNAPSHOT.jar:$(cat cp.txt)" \
  org.example._1_ChatExample
```

java:

```shell
java @options.txt -cp "target/langchain4j-gpullama3-demo-1.0-SNAPSHOT.jar:$(cat cp.txt)" org.example._1_ChatExample
```

Streaming example

tornado CLI:

```shell
tornado --jvm="-Dorg.slf4j.simpleLogger.defaultLogLevel=off \
  -Dtornado.device.memory=20GB -XX:MaxDirectMemorySize=20G" \
  -cp "target/langchain4j-gpullama3-demo-1.0-SNAPSHOT.jar:$(cat cp.txt)" \
  org.example._2_StreamingExample
```

java:

```shell
java @options.txt -cp "target/langchain4j-gpullama3-demo-1.0-SNAPSHOT.jar:$(cat cp.txt)" org.example._2_StreamingExample
```

Chat memory example

tornado CLI:

```shell
tornado --jvm="-Dorg.slf4j.simpleLogger.defaultLogLevel=off \
  -Dtornado.device.memory=20GB -XX:MaxDirectMemorySize=20G" \
  -cp "target/langchain4j-gpullama3-demo-1.0-SNAPSHOT.jar:$(cat cp.txt)" \
  org.example._3_ChatMemoryExample
```

java:

```shell
java @options.txt -cp "target/langchain4j-gpullama3-demo-1.0-SNAPSHOT.jar:$(cat cp.txt)" org.example._3_ChatMemoryExample
```

!!! Port of Agentic Tutorial for GPULlama3.java !!!

Basic agent example

tornado CLI:

```shell
tornado --jvm="-Dorg.slf4j.simpleLogger.defaultLogLevel=off \
  -Dtornado.device.memory=20GB -XX:MaxDirectMemorySize=20G" \
  -cp "target/langchain4j-gpullama3-demo-1.0-SNAPSHOT.jar:$(cat cp.txt)" \
  org.example.agents._1_basic_agent._1a_Basic_Agent_Example <GPU|CPU>
```

java:

```shell
java @options.txt -cp "target/langchain4j-gpullama3-demo-1.0-SNAPSHOT.jar:$(cat cp.txt)" org.example.agents._1_basic_agent._1a_Basic_Agent_Example GPU
```

Basic agent example (structured output)

tornado CLI:

```shell
tornado --jvm="-Dorg.slf4j.simpleLogger.defaultLogLevel=off \
  -Dtornado.device.memory=20GB -XX:MaxDirectMemorySize=20G" \
  -cp "target/langchain4j-gpullama3-demo-1.0-SNAPSHOT.jar:$(cat cp.txt)" \
  org.example.agents._1_basic_agent._1b_Basic_Agent_Example_Structured <GPU|CPU>
```

java:

```shell
java @options.txt -cp "target/langchain4j-gpullama3-demo-1.0-SNAPSHOT.jar:$(cat cp.txt)" org.example.agents._1_basic_agent._1b_Basic_Agent_Example_Structured GPU
```

Sequential workflow example

tornado CLI:

```shell
tornado --jvm="-Dorg.slf4j.simpleLogger.defaultLogLevel=off \
  -Dtornado.device.memory=20GB -XX:MaxDirectMemorySize=20G" \
  -cp "target/langchain4j-gpullama3-demo-1.0-SNAPSHOT.jar:$(cat cp.txt)" \
  org.example.agents._2_sequential_workflow._2a_Sequential_Agent_Example <GPU|CPU>
```

java:

```shell
java @options.txt -cp "target/langchain4j-gpullama3-demo-1.0-SNAPSHOT.jar:$(cat cp.txt)" org.example.agents._2_sequential_workflow._2a_Sequential_Agent_Example GPU
```

Tic-tac-toe agentic game

java:

```shell
java @options.txt -cp "target/langchain4j-gpullama3-demo-1.0-SNAPSHOT.jar:$(cat cp.txt)" org.example.agents.tic_tac_toe.TicTacToeAgenticGame
```

The plots compare GPULlama3.java running on two different execution engines:
- GPU Engine (TornadoVM OpenCL) on Nvidia 5090 – 24 GB
- CPU Engine (llama3.java)¹ on Intel® Core™ Ultra 9 275HX × 24, 64 GB
The bar charts show GPU speedups over CPU for different tasks and models.
- Across all benchmarks, the GPU engine consistently outperforms the CPU engine, with speedups ranging from ~3.5× to nearly 5× depending on the model and workload.
- The task-level plots highlight where GPU acceleration provides the largest gains, while the average speedup plots summarize overall performance advantages per model.
These results demonstrate the significant benefit of running LLaMA 3 models through GPULlama3.java on TornadoVM's GPU engine rather than on the CPU engine.
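The speedup figures above are simply CPU wall-clock time divided by GPU wall-clock time for the same task and model. With illustrative (not measured) timings:

```shell
# Speedup = CPU time / GPU time for the same prompt and model.
# The timings here are illustrative placeholders, not benchmark results.
cpu_ms=4200
gpu_ms=1000
awk -v c="$cpu_ms" -v g="$gpu_ms" 'BEGIN { printf "%.1fx\n", c / g }'
# prints 4.2x
```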
¹ Developed from commit https://github.com/mukel/llama3.java/commit/5fc76c665c349456e1a6a458339531bf3abab308