Two honest answers in one tool. Section A: pick your Mac chip and RAM, and get a source-cited shortlist of the local-AI runtimes and model sizes your machine can actually run, before you download 40 GB of weights. Section B: compare a 3-year on-prem cost with cloud API and per-seat SaaS pricing at your real token volume. Every number carries a visible evidence label (measured, vendor-claimed, or engineering judgment) and a link to its primary source. Figures are as of 2026-06-16. Want the model lineup itself? Use the AI Model Recommender.
Pick your chip, RAM, and macOS version. The advisor budgets memory (OS reserve plus KV-cache headroom), checks each model against a Q4 size table, and matches runtimes to your technical comfort level. It refuses to recommend a model that does not actually fit.
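The fit check can be sketched as a simple inequality: usable memory (RAM minus an OS reserve) must cover the Q4 weights plus KV-cache headroom, with no rounding up. This is a minimal illustration, not the advisor's actual code. The reserve size, the headroom fraction, and the model names and sizes in the table are all hypothetical placeholders.

```python
from dataclasses import dataclass

# Hypothetical budget parameters; the advisor's real values are not shown here.
OS_RESERVE_GB = 6.0           # assumed memory kept back for macOS and apps
KV_HEADROOM_FRACTION = 0.15   # assumed KV-cache headroom, as a fraction of weight size


@dataclass(frozen=True)
class Model:
    name: str
    q4_size_gb: float  # size of the Q4-quantized weights


def usable_memory_gb(ram_gb: float, os_reserve_gb: float = OS_RESERVE_GB) -> float:
    """Unified memory left for a model after the OS reserve."""
    return max(ram_gb - os_reserve_gb, 0.0)


def required_memory_gb(model: Model, kv_fraction: float = KV_HEADROOM_FRACTION) -> float:
    """Q4 weights plus KV-cache headroom."""
    return model.q4_size_gb * (1.0 + kv_fraction)


def shortlist(ram_gb: float, models: list[Model]) -> list[str]:
    """Return only the models that fit; anything over budget is refused, not rounded."""
    budget = usable_memory_gb(ram_gb)
    return [m.name for m in models if required_memory_gb(m) <= budget]


# Hypothetical size table for illustration only.
MODELS = [
    Model("small-3b", 2.0),
    Model("mid-8b", 5.0),
    Model("large-32b", 20.0),
    Model("moe-109b", 58.0),
]

if __name__ == "__main__":
    for ram in (16, 64, 96):
        print(ram, "GB ->", shortlist(ram, MODELS))
```

With these placeholder numbers, a 16 GB machine gets only the two small models, and the 58 GB entry needs more than 64 GB once headroom is added. Keeping the reserve and headroom as separate parameters makes it easy to swap in sourced values per chip or macOS version.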
Pick a cloud model tier and a rough monthly token volume. The advisor computes a cloud API bill, lines it up against per-seat SaaS and a 3-year on-prem TCO (hardware plus electricity), and shows the honest break-even point, plus rows for privacy and speed. All prices are list rates as of 2026-06-16.
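The arithmetic behind that comparison can be sketched as follows, assuming cloud pricing is quoted per million input and output tokens and that on-prem TCO is hardware cost plus electricity over 36 months, as described above. Break-even is the hardware cost divided by the monthly amount saved (cloud bill minus electricity). All function names and the prices in the usage example are hypothetical placeholders, not the advisor's list rates.

```python
import math
from dataclasses import dataclass

MONTHS = 36             # 3-year horizon
HOURS_PER_MONTH = 730   # average hours per month (8760 / 12)


@dataclass(frozen=True)
class CloudTier:
    input_usd_per_mtok: float   # price per million input tokens
    output_usd_per_mtok: float  # price per million output tokens


def cloud_monthly_usd(tier: CloudTier, input_mtok: float, output_mtok: float) -> float:
    """Monthly API bill for a given volume, in millions of tokens."""
    return tier.input_usd_per_mtok * input_mtok + tier.output_usd_per_mtok * output_mtok


def saas_monthly_usd(seats: int, usd_per_seat: float) -> float:
    """Per-seat SaaS subscription cost per month."""
    return seats * usd_per_seat


def onprem_monthly_power_usd(avg_watts: float, usd_per_kwh: float) -> float:
    """Electricity cost per month at an assumed average draw."""
    return avg_watts / 1000 * HOURS_PER_MONTH * usd_per_kwh


def onprem_tco_usd(hardware_usd: float, avg_watts: float, usd_per_kwh: float) -> float:
    """3-year on-prem TCO: hardware plus electricity only."""
    return hardware_usd + onprem_monthly_power_usd(avg_watts, usd_per_kwh) * MONTHS


def break_even_months(hardware_usd: float, cloud_monthly: float, power_monthly: float):
    """Months until the hardware pays for itself versus the cloud bill, or None if never."""
    savings = cloud_monthly - power_monthly
    if savings <= 0:
        return None
    return hardware_usd / savings


if __name__ == "__main__":
    tier = CloudTier(input_usd_per_mtok=3.0, output_usd_per_mtok=15.0)  # hypothetical rates
    cloud = cloud_monthly_usd(tier, input_mtok=20, output_mtok=5)
    power = onprem_monthly_power_usd(avg_watts=100, usd_per_kwh=0.20)
    print("cloud, 3 yr:", round(cloud * MONTHS, 2))
    print("SaaS, 3 yr:", round(saas_monthly_usd(5, 30.0) * MONTHS, 2))
    print("on-prem, 3 yr:", round(onprem_tco_usd(4000, 100, 0.20), 2))
    be = break_even_months(4000, cloud, power)
    print("break-even months:", None if be is None else math.ceil(be))
```

With these made-up inputs the cloud bill is $135 a month, electricity about $14.60, and a $4,000 machine breaks even in roughly 34 months, just inside the 3-year window. Returning None when electricity exceeds the cloud bill keeps the "never breaks even" case explicit instead of producing a negative month count.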
Most "can my Mac run X" guides bake in vendor marketing or recycle the "20% to 30% MLX speedup" line as if it were a constant. This advisor ships only figures that trace to a primary source, labels every number's evidence class, and refuses to recommend models that do not fit (Llama 4 Scout at Q4 needs roughly 55 to 60 GB, not the "30B-class" framing some posts use). It also surfaces three assets people miss: Apple's free, roughly 3B-parameter on-device Foundation Model on macOS 26+, the Ollama 0.19 MLX switch (March 30, 2026), and the ANE-accelerated speech stack (WhisperKit plus FluidAudio).
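A back-of-envelope check shows where a figure like "roughly 55 to 60 GB" comes from: weight memory is about total parameters times bits per weight divided by 8. This sketch assumes Llama 4 Scout's commonly cited 109B total parameters (17B active per token) and an effective 4.0 to 4.5 bits per weight for Q4-family formats once quantization scales are included; both are assumptions for illustration. Because a mixture-of-experts model keeps every expert resident, memory follows the total count, while the active count mainly governs per-token compute.

```python
def weights_gb(total_params_billion: float, bits_per_weight: float) -> float:
    """Approximate quantized weight size in GB (1 GB = 1e9 bytes), excluding KV cache."""
    return total_params_billion * bits_per_weight / 8


# Assumed figures for Llama 4 Scout: ~109B total parameters, ~17B active per token.
SCOUT_TOTAL_B = 109
SCOUT_ACTIVE_B = 17

if __name__ == "__main__":
    for bpw in (4.0, 4.5):  # assumed effective bits/weight for Q4-family formats
        print(f"total params @ {bpw} bpw: {weights_gb(SCOUT_TOTAL_B, bpw):.1f} GB")
    # Sizing from active parameters alone badly understates what must fit in memory.
    print(f"active params only @ 4.5 bpw: {weights_gb(SCOUT_ACTIVE_B, 4.5):.1f} GB")
```

This prints about 54.5 GB and 61.3 GB for the total-parameter estimates, bracketing the quoted range before any KV-cache headroom, versus under 10 GB if only active parameters were counted.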