Date: June 21st, 2026 9:55 PM
Author: Harry Hand
- Qwen3.6-27-Q8 for tool calling and reasoning, though it has vision as well. The is the main model that runs my agentic harness
- glm-ocr for ocr. Don't even bother using anything else
- translationgemma:27b for translations
- granite4.1:8b to compress my context window. It's so fast
- nomic-embed if I'm turning anything into vector databases
All of these have to be available at any given time. They live in separate ollama containers, except that Qwen runs on llama.cpp
(http://www.autoadmit.com/thread.php?thread_id=5876301&forum_id=2Reputation#49954437)