Cactus: Mobile-First Inference Engine

Definition

An inference engine built from scratch for mobile devices, wearables, and custom hardware. Cactus is designed to run models like Needle efficiently on consumer devices with constraints on compute, memory, and power. It enables deployment of agentic models on phones, watches, and glasses.

Examples in the Wild

  • Example 1: Running Needle (26M parameters) at 6000 tok/s prefill on consumer phones
  • Example 2: Deploying tool-calling agents on smartwatches
  • Example 3: Custom hardware optimization for edge inference
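
To put the prefill figure above in perspective, a quick back-of-envelope sketch: at the 6000 tok/s prefill rate quoted for Needle, prompt processing time scales linearly with prompt length. The function and prompt sizes below are illustrative, not part of the Cactus API.

```python
# Illustrative estimate only: prefill latency at a fixed throughput.
# The 6000 tok/s figure is taken from the example above; prompt
# lengths are hypothetical.

def prefill_latency_ms(prompt_tokens: int, prefill_tok_per_s: float = 6000.0) -> float:
    """Time (ms) to process the prompt before the first generated token."""
    return prompt_tokens / prefill_tok_per_s * 1000.0

for n in (128, 512, 2048):
    print(f"{n:>5} prompt tokens -> {prefill_latency_ms(n):.1f} ms prefill")
```

Even a 2048-token prompt stays well under half a second at this rate, which is what makes interactive agentic use on a phone plausible.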