Show HN: ARIA – P2P distributed inference protocol for 1-bit LLMs on CPU

  • Posted 10 hours ago by anthonymu
  • 1 point
https://github.com/spmfrance-cloud/aria-protocol
ARIA is a peer-to-peer protocol for running 1-bit quantized LLMs (ternary weights: -1, 0, +1) on ordinary CPUs. No GPU needed: with ternary weights, matrix products reduce to additions and subtractions (kernel sketch below).

We benchmarked on a Ryzen 9, all on CPU, at ~28 mJ/token (99.5% less energy than GPU inference):

  • 0.7B params: 89.65 tokens/s
  • 2.4B params: 36.94 tokens/s
  • 8B params: 15.03 tokens/s

Key design choices:

  • WebSocket-based P2P with pipeline parallelism for model sharding across nodes (sketch below).
  • A provenance ledger records every inference immutably (sketch below).
  • Proof of Useful Work replaces wasteful hash mining: the "mining" is the inference itself.
  • Consent contracts ensure no resource is used without explicit permission (sketch below).
  • Drop-in OpenAI-compatible API (usage example below).

~5,800 lines of Python, MIT licensed, 102 tests passing.
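
Why ternary weights make CPU inference cheap: when every weight is -1, 0, or +1, a matrix-vector product needs no multiplies at all, only adds and subtracts. A minimal illustrative sketch (not code from the repo):

    import numpy as np

    def ternary_matvec(w: np.ndarray, x: np.ndarray) -> np.ndarray:
        # w has entries in {-1, 0, +1}; each output element is a sum of
        # selected inputs minus a sum of others -- no multiplies needed.
        out = np.empty(w.shape[0], dtype=x.dtype)
        for i, row in enumerate(w):
            out[i] = x[row == 1].sum() - x[row == -1].sum()
        return out

    w = np.random.choice([-1, 0, 1], size=(4, 8)).astype(np.int8)
    x = np.random.randn(8).astype(np.float32)
    assert np.allclose(ternary_matvec(w, x), w @ x, atol=1e-5)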
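
Pipeline parallelism over WebSockets, in the abstract: each node holds one shard (a contiguous slice of layers), applies it to incoming activations, and forwards the result downstream. A hedged sketch using the third-party websockets library; the address, JSON message format, and layer objects are my assumptions, not ARIA's actual wire protocol:

    import asyncio, json
    import numpy as np
    import websockets  # pip install websockets

    # Hypothetical shard: this node's slice of the model, as callables.
    LAYERS = [lambda x: np.maximum(x, 0.0)]
    NEXT_NODE = "ws://next-node.example:8765"  # assumed downstream address

    async def handle(ws, path=None):  # path kept for older websockets versions
        async for msg in ws:
            x = np.asarray(json.loads(msg))       # activations from upstream
            for layer in LAYERS:                  # run this node's shard
                x = layer(x)
            async with websockets.connect(NEXT_NODE) as nxt:
                await nxt.send(json.dumps(x.tolist()))  # hand off downstream

    async def main():
        async with websockets.serve(handle, "0.0.0.0", 8765):
            await asyncio.Future()  # run forever

    asyncio.run(main())

One appeal of this layout is memory: each node only keeps its own slice's weights in RAM, which is what makes 8B-class models plausible on commodity machines.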
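
A provenance ledger can be as simple as a hash chain: each record commits to its predecessor, so tampering with any past inference breaks every later hash. A sketch under that assumption; the field names are illustrative, not ARIA's schema:

    import hashlib, json, time

    class ProvenanceLedger:
        """Append-only hash chain: every entry commits to the previous one,
        so rewriting history invalidates all later hashes."""

        def __init__(self):
            self.entries = []

        def record(self, node_id, prompt_hash, output_hash):
            prev = self.entries[-1]["hash"] if self.entries else "0" * 64
            body = {"node": node_id, "prompt": prompt_hash,
                    "output": output_hash, "ts": time.time(), "prev": prev}
            digest = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()).hexdigest()
            entry = dict(body, hash=digest)
            self.entries.append(entry)
            return entry

        def verify(self):
            prev = "0" * 64
            for e in self.entries:
                body = {k: v for k, v in e.items() if k != "hash"}
                digest = hashlib.sha256(
                    json.dumps(body, sort_keys=True).encode()).hexdigest()
                if e["prev"] != prev or e["hash"] != digest:
                    return False
                prev = e["hash"]
            return True

Proof of Useful Work fits the same structure: instead of burning hashes, a node earns its place in the chain by producing a verifiable inference record.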
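
Consent contracts, as I read the description, are explicit resource grants a scheduler must check before dispatching work to a node. A speculative sketch; all fields and names are hypothetical:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class ConsentContract:
        # Hypothetical terms a node owner explicitly grants
        max_threads: int
        allowed_models: tuple

    def may_schedule(c: ConsentContract, model: str, threads: int) -> bool:
        # Work runs only if it fits the owner's signed grant
        return model in c.allowed_models and threads <= c.max_threads

    grant = ConsentContract(max_threads=4, allowed_models=("bitnet-0.7b",))
    assert may_schedule(grant, "bitnet-0.7b", 2)
    assert not may_schedule(grant, "bitnet-8b", 2)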
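
Since the API is OpenAI-compatible, the stock openai Python client should work pointed at a node; the port, route prefix, and model id below are guesses, not documented values:

    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8080/v1",  # assumed node address
                    api_key="unused")                     # no key needed locally

    resp = client.chat.completions.create(
        model="aria-2.4b",  # hypothetical model id
        messages=[{"role": "user", "content": "Hello from the swarm"}],
    )
    print(resp.choices[0].message.content)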

0 comments