LLM in C++
Run multi-billion-parameter language models on consumer-grade hardware with zero dependencies.
The GGUF Standard
EdgeAI automatically converts Llama, Mistral, and Falcon models to optimized GGUF files with tiered quantization (Q4_0, Q5_K_M, Q8_0).
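The tier choice trades file size against fidelity. As a rough illustration of what a quantization tier does under the hood, here is a minimal sketch of a Q8_0-style scheme (blocks of 32 weights, one shared scale, 8-bit quants); the struct and function names are illustrative, not EdgeAI's API, and real GGUF stores the scale as fp16 rather than a full float:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Q8_0-style block: 32 weights stored as int8 plus one scale.
// (Hypothetical layout for illustration; GGUF uses an fp16 scale.)
struct BlockQ8 {
    float d;        // scale: max |weight| / 127
    int8_t qs[32];  // quantized weights
};

// Quantize one block of 32 floats. Illustrative helper, not EdgeAI's API.
BlockQ8 quantize_q8_0(const float* x) {
    float amax = 0.0f;
    for (int i = 0; i < 32; ++i) amax = std::max(amax, std::fabs(x[i]));
    BlockQ8 b;
    b.d = amax / 127.0f;
    const float id = (b.d != 0.0f) ? 1.0f / b.d : 0.0f;
    for (int i = 0; i < 32; ++i)
        b.qs[i] = static_cast<int8_t>(std::lround(x[i] * id));
    return b;
}

// Reconstruct the weights (lossy): w ≈ q * d.
std::vector<float> dequantize_q8_0(const BlockQ8& b) {
    std::vector<float> out(32);
    for (int i = 0; i < 32; ++i) out[i] = b.qs[i] * b.d;
    return out;
}
```

Each 32-weight block shrinks from 128 bytes of fp32 to 34 bytes on disk (32 int8 values plus an fp16 scale), which is where the roughly 4x size reduction of an 8-bit tier comes from; the 4- and 5-bit tiers push the same idea further.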
CPU Offloading
Run entirely on the CPU on devices without a dedicated GPU, or split execution between CPU and GPU: layers that fit in VRAM are offloaded, and the rest stay in system RAM.
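Deciding how many layers land on the GPU is essentially a budget calculation: offload layers until VRAM runs out, keep the remainder on the CPU. A minimal sketch of that split, where the function name and parameters are illustrative rather than EdgeAI's internals:

```cpp
#include <cstddef>

// Decide how many transformer layers fit in VRAM; the rest run on the CPU.
// Illustrative helper, not EdgeAI's API.
int layers_for_gpu(std::size_t vram_bytes, std::size_t bytes_per_layer,
                   int total_layers) {
    if (bytes_per_layer == 0) return total_layers;  // nothing to budget
    std::size_t fit = vram_bytes / bytes_per_layer;
    return fit >= static_cast<std::size_t>(total_layers)
               ? total_layers
               : static_cast<int>(fit);
}
```

On a machine with no dedicated GPU the VRAM budget is simply zero, so every layer stays on the CPU and inference still runs.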
Apple Silicon (Metal)
Native Metal integration delivers 40+ tokens per second even on base-model M-series MacBooks.
Convert & Quantize
edge-ai export --target gguf --quant Q4_K_M
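The same export command covers the other tiers listed above; a smaller quant trades quality for footprint. A sketch of the three invocations, assuming the flags work as in the example:

```shell
# Highest fidelity, largest file
edge-ai export --target gguf --quant Q8_0

# Middle ground
edge-ai export --target gguf --quant Q5_K_M

# Smallest footprint
edge-ai export --target gguf --quant Q4_0
```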
Self-contained Binary