# Ollama Setup

Ollama is one of the easiest ways to run large language models locally. It provides a simple CLI and HTTP API, with support for many popular models.
## Installation

### macOS

```bash
brew install ollama
```

### Linux

```bash
curl -fsSL https://ollama.com/install.sh | sh
```

### Windows

Download the installer from [ollama.com](https://ollama.com).
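To confirm the install succeeded, check that the CLI is on your PATH:

```bash
ollama --version
```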
## Getting Started

### 1. Start Ollama
```bash
ollama serve
```
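If you would rather not keep `ollama serve` running in a foreground terminal, both common install methods can run it as a background service. The commands below assume a Homebrew install on macOS and the default systemd unit created by the Linux install script:

```bash
# macOS (Homebrew): run Ollama as a background service
brew services start ollama

# Linux (install script): enable and start the systemd unit
sudo systemctl enable --now ollama
```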
### 2. Pull a Model

```bash
ollama pull llama3.2
ollama pull mistral
ollama pull codellama
```
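Model sizes and quantizations are published as tags (for example `llama3.2:1b`), and on recent Ollama releases `ollama show` prints a pulled model's metadata, which is useful for checking its parameter count and default context length:

```bash
# Pull a specific size by tag
ollama pull llama3.2:1b

# Inspect an installed model (parameters, template, license)
ollama show llama3.2
```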
### 3. Test the Model

```bash
ollama run llama3.2
```

## Configuration with Evonic AI

### API Endpoint

Ollama runs a local API server, by default at http://localhost:11434.
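You can verify the endpoint directly with `curl` before wiring it into Evonic AI; `/api/tags` lists the models the server can see, and `/api/generate` runs a one-off prompt:

```bash
# List models available to the server
curl http://localhost:11434/api/tags

# Generate a single (non-streamed) response
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Say hello in one sentence.",
  "stream": false
}'
```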
### Configuration

In your Evonic AI configuration:

```yaml
model:
  provider: ollama
  endpoint: http://localhost:11434
  model_name: llama3.2
```
## Available Models

Browse available models at [ollama.com/library](https://ollama.com/library), or list the models you already have installed:

```bash
ollama list
```
## Model Management

### List Installed Models

```bash
ollama list
```

### Remove a Model

```bash
ollama rm llama3.2
```

### Copy a Model

```bash
ollama cp llama3.2 llama3.2-custom
```
## Advanced Configuration

### GPU Acceleration

Ollama automatically uses a GPU if one is available. To force CPU-only inference:

```bash
OLLAMA_NUM_GPU=0 ollama serve
```
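On recent Ollama releases you can check whether a loaded model ended up on the GPU or the CPU with `ollama ps`:

```bash
# Run a quick prompt so the model is loaded, then check the PROCESSOR column
ollama run llama3.2 "hello"
ollama ps
```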
### Custom Port

```bash
OLLAMA_HOST=0.0.0.0:11435 ollama serve
```
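The same `OLLAMA_HOST` variable tells the CLI where to connect, so you can point it (and `curl`) at the non-default port to verify the server. Remember to update the `endpoint` in your Evonic AI configuration to match:

```bash
# Point the CLI at the custom port
OLLAMA_HOST=127.0.0.1:11435 ollama list

# Verify the API on the custom port
curl http://localhost:11435/api/tags
```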
### Model Parameters

Create a Modelfile for custom parameters:

```
FROM llama3.2
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
SYSTEM """You are a helpful assistant."""
```

Build the custom model:

```bash
ollama create my-model -f Modelfile
```
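Once built, the custom model behaves like any other local model; run it directly, or reference it from your Evonic AI configuration by setting `model_name: my-model`:

```bash
ollama run my-model
```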
## Troubleshooting

### Model Not Loading

- Check GPU memory availability
- Try a smaller model (e.g., `llama3.2:1b` instead of `llama3.2:70b`)
- Ensure Ollama is running: `ollama list`
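If a model still fails to load, the server log usually names the cause (for example, insufficient free VRAM). On a systemd-based Linux install the log is available through `journalctl`; with a foreground `ollama serve`, errors are printed straight to the terminal:

```bash
# Linux (systemd install): show the most recent server log lines
journalctl -u ollama --no-pager | tail -n 50
```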
### Slow Inference

- Enable GPU acceleration
- Reduce the `num_ctx` parameter (see the example after this list)
- Close other GPU-intensive applications
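If you do not want to rebuild a model just to shrink its context window, `num_ctx` can also be set per request through the API's `options` field:

```bash
# Request a smaller context window for this request only
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Summarize Ollama in one sentence.",
  "stream": false,
  "options": { "num_ctx": 2048 }
}'
```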
### Connection Refused

- Verify Ollama is running: `curl http://localhost:11434`
- Check firewall settings
- Ensure the endpoint in your Evonic AI configuration matches the address Ollama is listening on