vllm-mlx
OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX backend, 400+ tok/s. Works with Claude Code.
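Since the server exposes an OpenAI-compatible API, a standard `/v1/chat/completions` request should work against it. The sketch below builds such a payload; the base URL, port, and model identifier are assumptions for illustration, not values documented by vllm-mlx.

```python
import json

# Assumed endpoint: OpenAI-compatible servers conventionally listen on a
# /v1 base path. Host, port, and model id below are illustrative only.
BASE_URL = "http://localhost:8000/v1"

payload = {
    "model": "mlx-community/Llama-3.2-3B-Instruct-4bit",  # hypothetical model id
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 64,
}

# With the `openai` package installed, the request would be sent as:
#   from openai import OpenAI
#   client = OpenAI(base_url=BASE_URL, api_key="not-needed")
#   resp = client.chat.completions.create(**payload)
body = json.dumps(payload)
print(body)
```

Because the API shape matches OpenAI's, existing OpenAI SDK clients can be pointed at the local server by changing only the base URL.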
Stars: 419
Forks: 50
Created: February 24, 2026
Install this Skill
Add this capability to your agent instantly using the CLI.
$ npx @agent/tresor install vllm-mlx