LLM in C++

Run large language models on consumer-grade hardware with zero dependencies.

The GGUF Standard

EdgeAI automatically converts Llama, Mistral, and Falcon models to optimized GGUF files with tiered quantization (Q4_0, Q4_K_M, Q5_K_M, Q8_0).

CPU Offloading

Perfect for devices without a dedicated GPU. GGUF models can run entirely on the CPU, or split execution between system RAM and GPU VRAM, offloading as many layers as fit for maximum efficiency.

Apple Silicon (Metal)

Native Metal integration delivers 40+ tokens per second on base-model M-series MacBooks.

Convert & Quantize

edge-ai export --target gguf --quant Q4_K_M
Self-contained Binary