Cactus: Mobile-First Inference Engine

Definition

An inference engine built from scratch for mobile devices, wearables, and custom hardware. Cactus is designed to run models like Needle efficiently on consumer devices with constraints on compute, memory, and power. It enables deployment of agentic models on phones, watches, and glasses.

Examples in the Wild

  • Example 1: Running Needle (26M parameters) at 6000 tok/s prefill on consumer phones
  • Example 2: Deploying tool-calling agents on smartwatches
  • Example 3: Custom hardware optimization for edge inference
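
To put the prefill figure above in perspective, a quick back-of-envelope sketch: at the 6000 tok/s prefill rate quoted for Needle, prompt processing time scales linearly with prompt length. The function and prompt sizes below are illustrative, not part of the Cactus API.

```python
# Illustrative estimate only: prefill latency at a fixed throughput.
# The 6000 tok/s figure is taken from the example above; prompt
# lengths are hypothetical.

def prefill_latency_ms(prompt_tokens: int, prefill_tok_per_s: float = 6000.0) -> float:
    """Time (ms) to process the prompt before the first generated token."""
    return prompt_tokens / prefill_tok_per_s * 1000.0

for n in (128, 512, 2048):
    print(f"{n:>5} prompt tokens -> {prefill_latency_ms(n):.1f} ms prefill")
```

Even a 2048-token prompt stays well under half a second at this rate, which is what makes interactive agentic use on a phone plausible.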