DEFINITION
YOLO Model
You Only Look Once
Definition
A real-time object detection neural network architecture. In agent automation, fine-tuned YOLO models detect and localize UI elements (text, buttons, inputs) in screenshots. These models can run locally on a CPU with ONNX for fast inference.
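To make "detect and localize" concrete, here is a minimal sketch of the post-processing step that commonly follows a YOLO-style forward pass: dropping low-confidence candidates, suppressing near-duplicate boxes (greedy non-maximum suppression, which many but not all YOLO variants rely on), and turning each surviving box into a click target. It assumes the model output has already been decoded into pixel-space boxes. The `Detection` type, the thresholds, and the sample values are hypothetical and not taken from any particular model or tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One candidate UI element: pixel box corners, class label, confidence."""
    x1: float
    y1: float
    x2: float
    y2: float
    label: str
    score: float


def iou(a: Detection, b: Detection) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    inter_h = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = inter_w * inter_h
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union > 0 else 0.0


def filter_detections(candidates, score_threshold=0.25, iou_threshold=0.5):
    """Drop low-confidence boxes, then apply greedy per-class NMS."""
    kept = []
    # Visit candidates from most to least confident.
    for det in sorted(candidates, key=lambda d: d.score, reverse=True):
        if det.score < score_threshold:
            break  # sorted, so every remaining candidate is also too weak
        # Keep the box unless it heavily overlaps a kept box of the same class.
        if all(k.label != det.label or iou(k, det) <= iou_threshold for k in kept):
            kept.append(det)
    return kept


def click_point(det: Detection):
    """Center of the box, a natural target for an agent's click."""
    return ((det.x1 + det.x2) / 2, (det.y1 + det.y2) / 2)


# Hypothetical raw detections for one screenshot (made-up values).
raw = [
    Detection(100, 200, 220, 240, "button", 0.91),
    Detection(104, 202, 222, 243, "button", 0.62),  # near-duplicate of the first
    Detection(100, 260, 400, 300, "input", 0.80),
    Detection(10, 10, 60, 30, "text", 0.12),        # below the score threshold
]

elements = filter_detections(raw)
for det in elements:
    print(det.label, det.score, click_point(det))
```

With these sample values, the second button box is suppressed as a duplicate and the low-confidence text box is discarded, leaving one button and one input for the agent to act on. A real pipeline would obtain the candidates from the model's output tensor, whose layout depends on the YOLO version and export settings.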
Examples in the Wild
- Example 1: SoMatic's fine-tuned YOLO for identifying text and interactable elements in native OS UIs
- Example 2: OmniParser v2 approach using YOLO-inspired detection