INT4 Mixed Precision
Push the boundaries of model compression by mixing 4-bit and 8-bit precision, reserving the higher precision for sensitive layers.
Why Mixed Precision?
Not all layers are created equal. While some layers can be quantized to 4-bit with negligible accuracy loss, others (such as attention heads in Transformers) require higher precision. Our optimizer automatically identifies these sensitive layers.
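As an illustration of the idea, the minimal Python sketch below maps each layer to a bit width from a precomputed sensitivity score. It is not the edge-ai internals; the layer names and the 0.1 threshold are assumptions.

def assign_bit_widths(sensitivity, threshold=0.1):
    """Keep sensitive layers at 8-bit, push everything else down to 4-bit."""
    return {name: 8 if score > threshold else 4
            for name, score in sensitivity.items()}

# Attention projections typically score as more sensitive than MLP weights.
sensitivity = {
    "layers.0.attn.q_proj": 0.32,
    "layers.0.attn.k_proj": 0.27,
    "layers.0.mlp.up_proj": 0.04,
    "layers.0.mlp.down_proj": 0.06,
}

print(assign_bit_widths(sensitivity))
# {'layers.0.attn.q_proj': 8, 'layers.0.attn.k_proj': 8,
#  'layers.0.mlp.up_proj': 4, 'layers.0.mlp.down_proj': 4}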
Layer-wise Optimization
Automatic sensitivity analysis across 1,000+ layers in real time.
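One way such a per-layer score could be computed is sketched below: quantize each weight matrix to 4-bit symmetric integers and record the relative reconstruction error. This is an assumed, simplified approach for illustration, not necessarily the analysis edge-ai performs.

import numpy as np

def quantize(w, bits=4):
    """Symmetric uniform quantization of a weight matrix to the given bit width."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

def sensitivity(w, bits=4):
    """Relative reconstruction error after quantization; higher means more sensitive."""
    return np.linalg.norm(w - quantize(w, bits)) / np.linalg.norm(w)

rng = np.random.default_rng(0)
weights = {
    "attn.q_proj": rng.normal(size=(64, 64)),
    "mlp.up_proj": rng.normal(size=(64, 256)),
}
print({name: round(sensitivity(w), 4) for name, w in weights.items()})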
Compression Command
Enable mixed precision by selecting the mixed-precision method and passing the --auto-bit flag:
edge-ai compress ./llama-2.pt --method mixed-precision --auto-bit
LLM Focus
Perfect for large language models where VRAM is the primary constraint: 7B-parameter models can run on devices with 6 GB of VRAM.
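As a back-of-envelope check of that claim (assuming weight memory dominates and an average of roughly 4.5 bits per weight once a few layers are kept at 8-bit):

params = 7e9
avg_bits = 4.5                             # assumed mix of mostly 4-bit with some 8-bit layers
weight_gb = params * avg_bits / 8 / 1e9
print(f"~{weight_gb:.1f} GB of weights")   # ~3.9 GB, leaving headroom on a 6 GB device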
Vision Pipelines
Ideal for high-throughput video streams, where 4-bit weights free up memory for much larger inference batches.
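A rough illustration of that batch-size headroom, with every number assumed for the example (GPU size, backbone size, per-frame activation footprint):

vram_gb = 8.0                   # hypothetical edge GPU
weights_8bit_gb = 4.0           # hypothetical vision backbone stored at 8-bit
weights_4bit_gb = weights_8bit_gb / 2
act_per_frame_gb = 0.05         # assumed activation footprint per frame

def max_batch(weights_gb):
    """Frames that fit once weights are loaded (activations only, no safety margin)."""
    return int((vram_gb - weights_gb) / act_per_frame_gb)

print(max_batch(weights_8bit_gb), "frames per batch at 8-bit")   # 80
print(max_batch(weights_4bit_gb), "frames per batch at 4-bit")   # 120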