Lessons from the Edge: Turning the M4 Mac Mini into a Local LLM Workhorse
Introduction
I promised to document my experience—here it is. Over the past month I ran a base-spec M4 Mac Mini (512 GB internal SSD, 30 GB unified memory*¹) with a 4 TB external SSD as both a desktop and an OpenAI-compatible inference server. Spoiler: it’s quiet, frugal, and strong enough for day-to-day private AI workloads.
*¹ Apple hasn’t published granular GPU limits; Activity Monitor shows ~22 GB addressable by the GPU. Your mileage may vary.
1 Why Local LLMs?
- Data sovereignty & GDPR – nothing leaves the building
- Cost certainty – no usage-based billing
- Air-gap option – disable outbound traffic entirely
Telemetry caveat: Ollama and LM Studio collect anonymous stats unless you disable them (`OLLAMA_DISABLE_TELEMETRY=1`; LM Studio → Settings → Privacy).
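If you script the server start-up, the opt-out can be baked into the process environment. A minimal sketch, assuming Ollama is installed and on your PATH (the variable is the one named above):

```python
import os
import subprocess

# Copy the current environment and add the Ollama telemetry opt-out.
env = os.environ.copy()
env["OLLAMA_DISABLE_TELEMETRY"] = "1"

# Start the Ollama server as a child process with the modified environment.
subprocess.Popen(["ollama", "serve"], env=env)
```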
2 Boot Options: Internal vs External macOS
I installed macOS 14.5 on a 4 TB USB4 SSD so I could hot-swap configurations.
- Pros: instant roll-back, sandboxed experiments
- Cons: macOS updates can block booting from the drive if Secure Boot is set to Full.
- Fix: Reboot → hold ⌘R → Startup Security Utility → Reduced Security + Allow booting from external media.
Booting internally avoids this hurdle entirely.
3 Toolchain
Runner | Why I Use It | Key Tweaks |
---|---|---|
Ollama | Fast CLI, good model library | Watch for mismatches when you change the context size, even with re-configured models. |
LM Studio | GUI + OpenAI API in one app, good model library | Edit `~/.cache/lm-studio/.internal/http-server-config.json` and set `"networkInterface": "0.0.0.0"` to bind to all NICs. |
llama.cpp | Lowest overhead, script-friendly | Compile with `make LLAMA_METAL=1 LLAMA_METAL_EMBED=1`. Logs are minimal by default; run with `-v` for more detail. |
For experimentation, testing, and both interactive and server use, LM Studio was my preferred choice.
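A quick way to confirm the LM Studio server answers from another machine is to list its models over the OpenAI-compatible API. A minimal sketch, assuming the default port 1234 and the `requests` package:

```python
import requests

# LM Studio's local server defaults to port 1234; adjust host/port to your setup.
BASE_URL = "http://brainbox.local:1234/v1"

# List the models the server currently exposes.
resp = requests.get(f"{BASE_URL}/models", timeout=5)
resp.raise_for_status()
for model in resp.json().get("data", []):
    print(model["id"])
```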
4 Networking Cheat-Sheet
- Assign the Mac Mini (`brainbox.local`) a static IP (e.g., `192.168.0.42`).
- Add it on clients: `192.168.0.42 brainbox.local` in `/etc/hosts`.
- Bonjour mostly works within the same subnet, but a static mapping survives VLANs & VPNs; a quick reachability check is sketched below.
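To verify the mapping from a client, a small Python check (host and port are assumptions; 1234 is LM Studio’s default, Ollama listens on 11434):

```python
import socket

# Resolve the hostname (consults /etc/hosts first, then DNS/Bonjour) ...
HOST, PORT = "brainbox.local", 1234
addr = socket.gethostbyname(HOST)
print(f"{HOST} resolves to {addr}")

# ... and confirm the inference server's TCP port actually answers.
with socket.create_connection((addr, PORT), timeout=3):
    print(f"Port {PORT} is reachable")
```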
5 Performance Snapshot (measured with `llama-bench`)
Model | Quant. | VRAM (GB) | Tokens / s | Power (W) |
---|---|---|---|---|
Mistral-7B-Instruct | Q4_K_M | ≈ 10 | 54 | 28 |
Qwen-14B-Chat | Q5_K_S | ≈ 19 | 25 | 29 |
Llama-3-24B | Q4_K_M | ≈ 22 | 12 | 30 |

(Batch = 1, 4096 ctx, Metal backend)
Above 24B parameters, the context window or swap spills into host RAM and responsiveness drops sharply.
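For a rough cross-check without `llama-bench`, you can time a completion over the OpenAI-compatible API. A sketch under assumptions (host, port, model id, and the `usage` field depend on your server):

```python
import time
import requests

# Time one completion and derive a rough tokens/s figure (not a llama-bench replacement).
URL = "http://brainbox.local:1234/v1/chat/completions"
payload = {
    "model": "mistral-7b-instruct",  # placeholder: use the model id your server reports
    "messages": [{"role": "user", "content": "Explain unified memory in 200 words."}],
    "max_tokens": 256,
}

start = time.time()
resp = requests.post(URL, json=payload, timeout=300).json()
elapsed = time.time() - start

tokens = resp["usage"]["completion_tokens"]
print(f"{tokens} tokens in {elapsed:.1f} s -> {tokens / elapsed:.1f} tok/s")
```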
6 System Characteristics
- Power: 6 W idle → 30 W sustained load
- Noise: ≤ 20 dB(A); the fan stays below 2000 RPM
- Footprint: 19 cm square—fits under a monitor stand
7 Typical In-House Use Cases
- Private chat sessions with AnythingLLM and RAG
- Virtual assistant “Agent Kim” replying to e-mails (an agentic system tasked via [local] e-mail and a watched folder)
- Batch embedding for a local semantic-search index (sketched below)
- Document conversion & watermarking
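For the batch-embedding case, a minimal sketch against the local `/v1/embeddings` endpoint (URL and embedding-model id are placeholders for whatever you have loaded):

```python
import requests

# Embed a small batch of documents via the local OpenAI-compatible endpoint.
URL = "http://brainbox.local:1234/v1/embeddings"
docs = ["First paragraph to index.", "Second paragraph to index."]

resp = requests.post(URL, json={"model": "nomic-embed-text", "input": docs}, timeout=60)
resp.raise_for_status()
vectors = [item["embedding"] for item in resp.json()["data"]]
print(len(vectors), "vectors of dimension", len(vectors[0]))
```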
8 CrewAI Configuration Example for Local Processing (most OpenAI-compatible clients work the same way)
```python
import os

# Point CrewAI's LiteLLM backend at the local LM Studio server.
os.environ["OPENAI_API_BASE"] = "http://brainbox:1234/v1"
os.environ["OPENAI_MODEL_NAME"] = "openai/qwen2.5-coder-7b-instruct"
os.environ["OPENAI_API_KEY"] = "lmstudio_placeholder"  # dummy; the local server ignores it
os.environ["LITELLM_PROVIDER"] = "openai-chat"
```
Swap `brainbox` for your Mini’s IP or hostname.
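With those variables set, a minimal crew can run entirely against the local endpoint. A sketch assuming a recent CrewAI release (field names may vary slightly between versions):

```python
from crewai import Agent, Task, Crew

# One agent, one task; all completions go to the locally hosted model configured above.
writer = Agent(
    role="Technical writer",
    goal="Summarize text clearly and concisely",
    backstory="You write short, precise summaries for engineers.",
)

task = Task(
    description="Summarize why local LLM inference helps with data sovereignty.",
    expected_output="A three-sentence summary.",
    agent=writer,
)

crew = Crew(agents=[writer], tasks=[task])
print(crew.kickoff())
```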
9 Takeaways
The base M4 Mac Mini isn’t a GPU monster, but for 7B-to-13B models it feels like a dedicated inference appliance, drawing less power than many laptop chargers. If your data can’t leave the premises (or you’re simply done paying per token), this little box earns a spot on the desk for under €1,500 in hardware.