LLM in C++
Run multi-billion-parameter language models on consumer-grade hardware with zero dependencies.
The GGUF Standard
EdgeAI automatically converts Llama, Mistral, and Falcon models to optimized GGUF files with tiered quantization (Q4_0, Q5_K_M, Q8_0).
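The tier choice trades file size against fidelity. As a rough illustration of what a quantization tier does under the hood, here is a minimal sketch of a Q8_0-style scheme (blocks of 32 weights, one shared scale, 8-bit quants); the struct and function names are illustrative, not EdgeAI's API, and real GGUF stores the scale as fp16 rather than a full float:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Q8_0-style block: 32 weights stored as int8 plus one scale.
// (Hypothetical layout for illustration; GGUF uses an fp16 scale.)
struct BlockQ8 {
    float d;        // scale: max |weight| / 127
    int8_t qs[32];  // quantized weights
};

// Quantize one block of 32 floats. Illustrative helper, not EdgeAI's API.
BlockQ8 quantize_q8_0(const float* x) {
    float amax = 0.0f;
    for (int i = 0; i < 32; ++i) amax = std::max(amax, std::fabs(x[i]));
    BlockQ8 b;
    b.d = amax / 127.0f;
    const float id = (b.d != 0.0f) ? 1.0f / b.d : 0.0f;
    for (int i = 0; i < 32; ++i)
        b.qs[i] = static_cast<int8_t>(std::lround(x[i] * id));
    return b;
}

// Reconstruct the weights (lossy): w ≈ q * d.
std::vector<float> dequantize_q8_0(const BlockQ8& b) {
    std::vector<float> out(32);
    for (int i = 0; i < 32; ++i) out[i] = b.qs[i] * b.d;
    return out;
}
```

Each 32-weight block shrinks from 128 bytes of fp32 to 34 bytes on disk (32 int8 values plus an fp16 scale), which is where the roughly 4x size reduction of an 8-bit tier comes from; the 4- and 5-bit tiers push the same idea further.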
CPU Offloading
Run entirely on the CPU on devices without a dedicated GPU, or split execution between CPU and GPU: layers that fit in VRAM are offloaded, and the rest stay in system RAM.
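Deciding how many layers land on the GPU is essentially a budget calculation: offload layers until VRAM runs out, keep the remainder on the CPU. A minimal sketch of that split, where the function name and parameters are illustrative rather than EdgeAI's internals:

```cpp
#include <cstddef>

// Decide how many transformer layers fit in VRAM; the rest run on the CPU.
// Illustrative helper, not EdgeAI's API.
int layers_for_gpu(std::size_t vram_bytes, std::size_t bytes_per_layer,
                   int total_layers) {
    if (bytes_per_layer == 0) return total_layers;  // nothing to budget
    std::size_t fit = vram_bytes / bytes_per_layer;
    return fit >= static_cast<std::size_t>(total_layers)
               ? total_layers
               : static_cast<int>(fit);
}
```

On a machine with no dedicated GPU the VRAM budget is simply zero, so every layer stays on the CPU and inference still runs.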
Apple Silicon (Metal)
Native Metal integration delivers 40+ tokens per second even on base-model M-series MacBooks.
Convert & Quantize
edge-ai export --target gguf --quant Q4_K_M
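The same export command covers the other tiers listed above; a smaller quant trades quality for footprint. A sketch of the three invocations, assuming the flags work as in the example:

```shell
# Highest fidelity, largest file
edge-ai export --target gguf --quant Q8_0

# Middle ground
edge-ai export --target gguf --quant Q5_K_M

# Smallest footprint
edge-ai export --target gguf --quant Q4_0
```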
Self-contained Binary