# Local Models Overview
The Evonic AI is designed with a local-first philosophy. By running AI models locally, you gain full control over your data, privacy, and inference costs — while still accessing the latest open-source models.

## Why Local-First?

### Privacy & Data Security

- Your data never leaves your infrastructure
- No third-party API calls or data sharing
- Full compliance with data protection regulations

### Cost Efficiency

- No per-token or per-request fees
- Predictable infrastructure costs
- Scale without worrying about API rate limits

### Flexibility & Control

- Choose the model that best fits your use case
- Fine-tune and customize models for your domain
- Run models on your own hardware or cloud instances

### Offline Capability

- Operate without internet connectivity
- Reliable in air-gapped environments
- No dependency on external services

## Supported Local Model Runners

The Evonic AI supports multiple local model runners:
| Runner | Best For | Hardware Requirements |
|---|---|---|
| Ollama | Quick setup, multi-model support | Moderate (CPU/GPU) |
| llama.cpp | Maximum portability, edge devices | Low to Moderate |
| vLLM | High-throughput production workloads | High (GPU recommended) |
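
All three runners expose an OpenAI-compatible `/v1` HTTP endpoint that the Evonic AI can connect to. As a rough sketch of how each is started (these are the runners' own entry points; model names and file paths are placeholders, and default ports may vary by version):

```bash
# Ollama: serves http://localhost:11434/v1 by default
ollama serve

# llama.cpp: llama-server exposes an OpenAI-compatible API on the given port
llama-server -m ./models/model.gguf --port 8080

# vLLM: serves http://localhost:8000/v1 by default
vllm serve meta-llama/Llama-3.1-8B-Instruct
```

The resulting base URL (for example `http://localhost:8080/v1`) is what you later register with `evonic model add`.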

## Use Cases

- Internal Knowledge Bots — Connect your team’s documentation to AI agents
- Customer Support — Deploy agents on-premise for sensitive customer data
- Research & Analysis — Run models on proprietary datasets without sharing
- Edge Deployment — Run agents on devices with limited connectivity
- Development & Testing — Iterate quickly with local models before production

## Managing Models via CLI

### List Models

View all configured LLM models:

```bash
evonic model list
```

Output:

```
ID                                    Name              Provider    Status
--------------------------------------------------------------------------
cf4cbe3b-1e2f-4ce7-811d-bb0a24ac09aa  Gemma4-local      llama.cpp   enabled
e1e18b95-dbe0-4b94-bd1a-39c40ab40268  Grok-4.1-Fast     openrouter  enabled
603b799f-c203-44ad-871a-0bb7394f0aa3  Kimi-K2-Thinking  openrouter  enabled
```

### Get Model Details

View detailed information about a specific model:

```bash
evonic model get 603b799f-c203-44ad-871a-0bb7394f0aa3
```

Output:

```
ID: 603b799f-c203-44ad-871a-0bb7394f0aa3
Name: Kimi-K2-Thinking
Type: remote
Provider: openrouter
Model Name: moonshotai/kimi-k2-thinking
Base URL: https://openrouter.ai/api/v1
API Key: ***afb76a
Max Tokens: 32768
Timeout: 60
Temperature: None
Thinking: yes
Enabled: yes
Default: no
```

### Add a Model

Add a new LLM model configuration:

```bash
# Add OpenAI model
evonic model add gpt4o --name "GPT-4o" --provider openai --api-key "sk-..." --base-url "https://api.openai.com/v1"

# Add local llama.cpp model
evonic model add local_llama --name "Local Llama 3" --provider llama.cpp --base-url "http://localhost:8080/v1"
```

Options:
| Flag | Required | Description |
|---|---|---|
| `--name` | Yes | Display name for the model |
| `--provider` | Yes | Provider (e.g. `openai`, `anthropic`, `groq`, `openrouter`, `llama.cpp`) |
| `--api-key` | No | API key for the provider |
| `--base-url` | No | Base URL for the API endpoint |
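
Ollama serves an OpenAI-compatible API on port 11434 by default, so a locally pulled model can likely be registered the same way. This is a sketch, not a confirmed recipe: `ollama` as a `--provider` value is an assumption (only the providers listed above appear in this page's examples), and the model name is a placeholder.

```bash
# Pull a model locally, then register Ollama's OpenAI-compatible endpoint.
# NOTE: "--provider ollama" is an assumed value, not confirmed by this page.
ollama pull llama3
evonic model add local_ollama --name "Llama 3 (Ollama)" --provider ollama --base-url "http://localhost:11434/v1"
```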

### Remove a Model

Remove a model configuration. Requires interactive confirmation:

```bash
evonic model rm gpt4o
```

Output:

```
Model to remove:
  ID: gpt4o
  Name: GPT-4o
  Provider: openai
  Status: enabled
Are you sure? [y/N]: y
Model removed: gpt4o
```

## Getting Started
- Choose your model runner (Ollama, llama.cpp, or vLLM)
- Install and configure the runner
- Select a model suitable for your hardware
- Configure the Evonic AI to connect to your local model
- Start building your agents!
For detailed setup instructions, see the individual runner guides.
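
As a compact end-to-end sketch tying these steps together with llama.cpp (the binary name, model file, and port are assumptions; the `evonic` commands use only the flags documented above):

```bash
# 1. Start a local llama.cpp server (model path is a placeholder)
llama-server -m ./models/llama-3-8b-instruct.Q4_K_M.gguf --port 8080

# 2. Verify the OpenAI-compatible endpoint responds
curl http://localhost:8080/v1/models

# 3. Register the endpoint with the Evonic AI
evonic model add local_llama --name "Local Llama 3" --provider llama.cpp --base-url "http://localhost:8080/v1"

# 4. Confirm the model is listed and enabled
evonic model list
```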