DEFINITION
Simple Attention Networks
Attention-Only Architecture (No FFN)
Definition
A neural network architecture composed entirely of attention and gating mechanisms without Feed-Forward Network (FFN) layers. Needle uses this design for tool calling, based on the insight that FFN parameters are wasted when the model has access to external structured knowledge (tools, RAG context). The architecture is optimized for retrieval-and-assembly tasks rather than fact memorization.
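Below is a minimal sketch of what such a block can look like, assuming a PyTorch-style module. The class and parameter names (e.g. AttentionOnlyBlock) are illustrative, not Needle's actual code: each layer is just normalization, self-attention, and a learned gate on the residual path, with no MLP/FFN sub-layer.

```python
# Illustrative sketch of an attention-only block (no FFN); names are hypothetical.
import torch
import torch.nn as nn


class AttentionOnlyBlock(nn.Module):
    """Transformer-style block with self-attention and a learned gate, no MLP/FFN."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # A per-channel gate on the attention output stands in for the usual FFN sub-layer.
        self.gate = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor, attn_mask=None) -> torch.Tensor:
        h = self.norm(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=attn_mask, need_weights=False)
        # Gated residual update: the sigmoid gate scales how much of the attention output is added.
        return x + torch.sigmoid(self.gate(h)) * attn_out


# Usage: stack a few blocks to form an attention-only backbone.
backbone = nn.Sequential(*[AttentionOnlyBlock(256, 4) for _ in range(6)])
```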
Examples in the Wild
- Example 1: Needle's 26M parameter model uses only attention and gating, no MLPs
- Example 2: Effective for RAG (retrieval-augmented generation) where facts come from external sources
- Example 3: Suitable for tool-calling where tool definitions are provided in context (see the sketch after this list)
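As a hedged illustration of Example 3, the snippet below shows one way tool definitions might be serialized into the model's context, so the network can retrieve and assemble a call rather than memorize the tools. The schema layout and the build_context helper are assumptions for illustration, not a documented Needle API.

```python
# Hypothetical sketch: providing tool definitions in-context for a tool-calling model.
import json

TOOLS = [
    {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {"city": "string"},
    },
]


def build_context(user_message: str) -> str:
    """Concatenate serialized tool schemas with the user request into one context string."""
    tool_block = "\n".join(json.dumps(t) for t in TOOLS)
    return f"TOOLS:\n{tool_block}\n\nUSER:\n{user_message}"


print(build_context("What's the weather in Oslo?"))
```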