
# Ollama Setup

Ollama is one of the easiest ways to run large language models locally. It provides a simple CLI and an HTTP API, with support for many popular models.

Install Ollama for your platform:

**macOS**

```sh
brew install ollama
```

**Linux**

```sh
curl -fsSL https://ollama.com/install.sh | sh
```

**Windows**

Download the installer from ollama.com.

Start the Ollama server:

```sh
ollama serve
```

Pull the models you want to use:

```sh
ollama pull llama3.2
ollama pull mistral
ollama pull codellama
```

Run a model interactively:

```sh
ollama run llama3.2
```
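`ollama run` also accepts a prompt as a command-line argument for one-shot, non-interactive use (the prompt text below is just an illustration):

```sh
# Send a single prompt and print the response to stdout
ollama run llama3.2 "Explain what a context window is in one sentence."
```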

Ollama runs a local API server by default at http://localhost:11434.
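You can exercise the API directly with curl. This sketch calls the /api/generate endpoint; setting "stream": false returns a single JSON object instead of a stream of chunks:

```sh
# Request a one-shot completion from the local server
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```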

In your Evonic AI configuration:

```yaml
model:
  provider: ollama
  endpoint: http://localhost:11434
  model_name: llama3.2
```
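Before starting Evonic AI, it is worth confirming that the configured model is actually installed. One quick check is the /api/tags endpoint, which returns the locally available models as JSON:

```sh
# model_name from the config above must appear in this list
curl http://localhost:11434/api/tags
```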

Browse available models at ollama.com/library or list the ones installed locally:

```sh
ollama list
```

Remove a model you no longer need:

```sh
ollama rm llama3.2
```

Copy a model, for example as a base for customization:

```sh
ollama cp llama3.2 llama3.2-custom
```

Ollama automatically uses the GPU if one is available. To force CPU-only inference:

```sh
OLLAMA_NUM_GPU=0 ollama serve
```
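To confirm where a model is actually executing, `ollama ps` lists the models currently loaded in memory along with the processor they are using (run it while a model is loaded):

```sh
# The PROCESSOR column reports, e.g., "100% GPU" or "100% CPU"
ollama ps
```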
To bind the server to a different address or port, for example to accept connections from other machines, set OLLAMA_HOST:

```sh
OLLAMA_HOST=0.0.0.0:11435 ollama serve
```
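If you change the bind address, update the endpoint in your Evonic AI configuration to match, and verify the server is reachable from the client machine (192.0.2.10 below is a placeholder for your server's actual address):

```sh
# Substitute your server's real address and port
curl http://192.0.2.10:11435/api/tags
```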

Create a Modelfile for custom parameters:

```
FROM llama3.2
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
SYSTEM """You are a helpful assistant."""
```

Build the custom model:

```sh
ollama create my-model -f Modelfile
```
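Then run it like any other model; a minimal check that the system prompt and parameters took effect:

```sh
ollama run my-model "Who are you?"
```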
## Troubleshooting

**Model fails to load**

- Check GPU memory availability
- Try a smaller model or tag (e.g., llama3.2:1b instead of a 70b-class model)
- Ensure Ollama is running: `ollama list`

**Slow responses**

- Enable GPU acceleration
- Reduce the `num_ctx` parameter
- Close other GPU-intensive applications

**Connection refused**

- Verify Ollama is running: `curl http://localhost:11434`
- Check firewall settings
- Ensure the endpoint in your configuration is correct