
# Ollama Setup

Ollama is one of the easiest ways to run large language models locally. It provides a simple CLI and an HTTP API, with support for many popular models.

Install Ollama for your platform:

**macOS**

```sh
brew install ollama
```

**Linux**

```sh
curl -fsSL https://ollama.com/install.sh | sh
```

**Windows**

Download the installer from ollama.com.

Start the Ollama server:

```sh
ollama serve
```

Pull the models you want to use:

```sh
ollama pull llama3.2
ollama pull mistral
ollama pull codellama
```

Run a model interactively:

```sh
ollama run llama3.2
```
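`ollama run` also accepts a prompt as a command-line argument for one-shot, non-interactive use (the prompt text below is just an illustration):

```sh
# Send a single prompt and print the response to stdout
ollama run llama3.2 "Explain what a context window is in one sentence."
```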

Ollama runs a local API server by default at http://localhost:11434.
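You can exercise the API directly with curl. This sketch calls the /api/generate endpoint; setting "stream": false returns a single JSON object instead of a stream of chunks:

```sh
# Request a one-shot completion from the local server
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```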

In your Evonic AI configuration:

```yaml
model:
  provider: ollama
  endpoint: http://localhost:11434
  model_name: llama3.2
```
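Before starting Evonic AI, it is worth confirming that the configured model is actually installed. One quick check is the /api/tags endpoint, which returns the locally available models as JSON:

```sh
# model_name from the config above must appear in this list
curl http://localhost:11434/api/tags
```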

Browse available models at ollama.com/library or list the ones installed locally:

```sh
ollama list
```

Remove a model you no longer need:

```sh
ollama rm llama3.2
```

Copy a model, for example as a base for customization:

```sh
ollama cp llama3.2 llama3.2-custom
```

Ollama automatically uses the GPU if one is available. To force CPU-only inference:

```sh
OLLAMA_NUM_GPU=0 ollama serve
```
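To confirm where a model is actually executing, `ollama ps` lists the models currently loaded in memory along with the processor they are using (run it while a model is loaded):

```sh
# The PROCESSOR column reports, e.g., "100% GPU" or "100% CPU"
ollama ps
```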
To bind the server to a different address or port, for example to accept connections from other machines, set OLLAMA_HOST:

```sh
OLLAMA_HOST=0.0.0.0:11435 ollama serve
```
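If you change the bind address, update the endpoint in your Evonic AI configuration to match, and verify the server is reachable from the client machine (192.0.2.10 below is a placeholder for your server's actual address):

```sh
# Substitute your server's real address and port
curl http://192.0.2.10:11435/api/tags
```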

Create a Modelfile for custom parameters:

```
FROM llama3.2
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
SYSTEM """You are a helpful assistant."""
```

Build the custom model:

```sh
ollama create my-model -f Modelfile
```
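Then run it like any other model; a minimal check that the system prompt and parameters took effect:

```sh
ollama run my-model "Who are you?"
```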
## Troubleshooting

**Model fails to load**

- Check GPU memory availability
- Try a smaller model or tag (e.g., llama3.2:1b instead of a 70b-class model)
- Ensure Ollama is running: `ollama list`

**Slow responses**

- Enable GPU acceleration
- Reduce the `num_ctx` parameter
- Close other GPU-intensive applications

**Connection refused**

- Verify Ollama is running: `curl http://localhost:11434`
- Check firewall settings
- Ensure the endpoint in your configuration is correct