
This is the scrapbook.

Here is where random information ends up. Most entries are little snippets of information, stories, quick fixes, or references to further scrapbook-style material – short notes that were helpful to me and might be useful to someone else. Of course, this is only meant as a suggestion, without any guarantee or assurance of function or feasibility. If you would like my professional, technical support, please contact me at https://c-7.de.


Lessons from the Edge: Turning the M4 Mac Mini into a Local LLM Workhorse

May 17, 2025 | Tech Corner

Introduction

I promised to document my experience—here it is. Over the past month I ran a base-spec M4 Mac Mini (512 GB internal SSD, 30 GB unified memory¹) with a 4 TB external SSD as both a desktop and an OpenAI-compatible inference server. Spoiler: it’s quiet, frugal, and strong enough for day-to-day private AI workloads.

¹ Apple hasn’t published granular GPU limits; Activity Monitor shows ~22 GB addressable by the GPU. Your mileage may vary.
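
If you want to inspect, or cautiously raise, that GPU wired-memory limit, recent macOS releases expose it as a sysctl; treat the key name and the example value below as assumptions and check your own macOS version first:

sysctl iogpu.wired_limit_mb              # read the current limit (0 usually means the macOS default)
sudo sysctl iogpu.wired_limit_mb=27648   # example: allow ~27 GB for the GPU until the next reboot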


1 Why Local LLMs?

  • Data sovereignty & GDPR – nothing leaves the building
  • Cost certainty – no usage-based billing
  • Air-gap option – disable outbound traffic entirely

Telemetry caveat: Ollama and LM Studio collect anonymous stats unless you disable them (OLLAMA_DISABLE_TELEMETRY=1, LM Studio → Settings → Privacy).
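
To make the opt-outs stick, a minimal sketch using exactly the settings mentioned above (the Ollama variable is the one named in the caveat; LM Studio's switch lives in its GUI):

# persist the Ollama telemetry opt-out for future shells
echo 'export OLLAMA_DISABLE_TELEMETRY=1' >> ~/.zshrc
# LM Studio: Settings → Privacy (GUI toggle, no config file needed)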


2 Boot Options: Internal vs External macOS

I installed macOS 14.5 on a USB-4 4 TB SSD to hot-swap configurations.

  • Pros: instant roll-back, sandboxed experiments
  • Cons: macOS updates can block the drive if Secure Boot is on Full.
  • Fix: Reboot → hold ⌘R → Startup Security Utility → Reduced Security + Allow booting from external media.

Booting internally avoids this hurdle entirely.


3 Toolchain

  • Ollama – fast CLI, good model library. Key tweaks (zsh):
    export OLLAMA_MODELS=/Volumes/LLMRepo
    export OLLAMA_HOST=0.0.0.0   # expose to LAN
    Watch for silent num_ctx mismatches when you change the context size, even with re-configured models.
  • LM Studio – GUI + OpenAI API in one app, good model library. Key tweak: edit ~/.cache/lm-studio/.internal/http-server-config.json and set "networkInterface": "0.0.0.0" to bind to all NICs.
  • llama.cpp – lowest overhead, script-friendly. Key tweaks: compile with make LLAMA_METAL=1 LLAMA_METAL_EMBED=1; logging is minimal by default, so run with -v for more detail.

For experimentation, testing, and both interactive and server use, LM Studio was my preferred choice.
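
Once one of these servers is bound to 0.0.0.0, any machine on the LAN can talk to the OpenAI-compatible endpoint. A minimal smoke test, assuming LM Studio's default port 1234 and the model name reused in the CrewAI example further down:

# one-shot chat completion against the local server
curl http://brainbox.local:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen2.5-coder-7b-instruct",
       "messages": [{"role": "user", "content": "Say hello in one short sentence."}]}'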

4 Networking Cheat-Sheet

  • Assign the Mac Mini (brainbox.local) a static IP (e.g., 192.168.0.42).
  • Add it on clients: 192.168.0.42 brainbox.local in /etc/hosts.
  • Bonjour mostly works only within the same subnet, while a static mapping survives VLANs & VPNs.
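
A quick client-side check that the mapping and the API port actually work (port 1234 is LM Studio's default; Ollama listens on 11434):

ping -c 1 brainbox.local     # does the static name resolve and answer?
nc -vz brainbox.local 1234   # is the OpenAI-compatible API port reachable?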

5 Performance Snapshot (measured with llama-bench)

Model                 Quant.   VRAM (GB)   Tokens/s   Power (W)
Mistral-7B-Instruct   Q4_K_M   ≈ 10        54         28
Qwen-14B-Chat         Q5_K_S   ≈ 19        25         29
Llama-3–24B           Q4_K_M   ≈ 22        12         30
(Batch = 1, 4096 ctx, “metal” backend)

Above 24B, the model and its context window spill into host RAM (or swap) and responsiveness drops sharply.
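
For reference, an invocation roughly matching how such numbers are gathered with llama-bench (the model path below is hypothetical; flag names follow the llama.cpp tool, so confirm them with llama-bench --help on your build):

# single-batch benchmark of a quantised model stored on the external SSD
./llama-bench -m /Volumes/LLMRepo/mistral-7b-instruct-q4_k_m.gguf -p 512 -n 128 -b 1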


6 System Characteristics

  • Power: 6 W idle → 30 W sustained load
  • Noise: ≤ 20 dB(A); the fan stays below 2000 RPM
  • Footprint: 19 cm square—fits under a monitor stand

7 Typical In-House Use Cases

  1. Private chat sessions with AnythingLLM and RAG
  2. Virtual assistant “Agent Kim” replying to e-mails (an agentic system tasked via [local] e-mail and a watched folder)
  3. Batch embedding for a local semantic-search index (see the sketch after this list)
  4. Document conversion & watermarking
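
For the embedding case, a hedged sketch against the same local OpenAI-compatible endpoint (the /v1/embeddings path follows the OpenAI API; the model name is a placeholder for whatever embedding model you have loaded):

# batch-embed two text chunks for the semantic-search index
curl http://brainbox.local:1234/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "<your-embedding-model>",
       "input": ["first document chunk", "second document chunk"]}'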

8 CrewAI Configuration Example (most OpenAI-compatible clients work) for local processing

import os
os.environ["OPENAI_API_BASE"]  = "http://brainbox:1234/v1"
os.environ["OPENAI_MODEL_NAME"] = "openai/qwen2.5-coder-7b-instruct"
os.environ["OPENAI_API_KEY"]    = "lmstudio_placeholder"  # dummy
os.environ["LITELLM_PROVIDER"]  = "openai-chat"

Swap brainbox for your Mini’s IP/hostname.


9 Takeaways

The base M4 Mac Mini isn’t a GPU monster, but for 7B-to-13B models it feels like a dedicated inference appliance—drawing less power than many laptop chargers. If your data can’t leave the premises (or you’re simply done paying per token), this little box is worth a spot on the desk, at under 1500 € for the hardware.