vllm-mlx
OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX backend, 400+ tok/s. Works with Claude Code.
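Since the server exposes an OpenAI-compatible API, a standard `/v1/chat/completions` request should work against it. The sketch below builds such a payload; the base URL, port, and model identifier are assumptions for illustration, not values documented by vllm-mlx.

```python
import json

# Assumed endpoint: OpenAI-compatible servers conventionally listen on a
# /v1 base path. Host, port, and model id below are illustrative only.
BASE_URL = "http://localhost:8000/v1"

payload = {
    "model": "mlx-community/Llama-3.2-3B-Instruct-4bit",  # hypothetical model id
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 64,
}

# With the `openai` package installed, the request would be sent as:
#   from openai import OpenAI
#   client = OpenAI(base_url=BASE_URL, api_key="not-needed")
#   resp = client.chat.completions.create(**payload)
body = json.dumps(payload)
print(body)
```

Because the API shape matches OpenAI's, existing OpenAI SDK clients can be pointed at the local server by changing only the base URL.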
Stars: 419
Forks: 50
Created: February 24, 2026
Install this Skill
Add this capability to your agent instantly using the CLI.
$ npx @agent/tresor install vllm-mlx