Ollama Cluster Implementation - Complete Documentation
Status: PRODUCTION READY
Test Results: 7/7 PASSING
Integration: COMPLETE
Executive Summary
This document describes a production-ready Ollama cluster service that integrates with the existing LLM Platform infrastructure. It is built from existing components, following the "don't reinvent the wheel" principle.
Key Achievements
- Native Node.js cluster service with zero dependencies
- Kubernetes deployment via existing Helm charts
- Drupal module integration with smart service discovery
- Complete integration with existing infrastructure
- All tests passing with comprehensive validation
Architecture Overview
Infrastructure Integration Map
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Drupal CMS    │───▶│ Ollama Cluster  │───▶│  Ollama Nodes   │
│                 │    │     Service     │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         ▼                      ▼                      ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  Existing Helm  │    │ Load Balancing  │    │ Health Monitor  │
│     Charts      │    │   & Failover    │    │  & Discovery    │
└─────────────────┘    └─────────────────┘    └─────────────────┘
Service Stack
- Native Cluster Service (ollama-cluster-native.js)
  - Pure Node.js with zero external dependencies
  - HTTP API on port 3001
  - Metrics endpoint on port 9090
- Kubernetes Deployment (Helm Charts)
  - secure-drupal: Enterprise Drupal with AI
  - tddai-platform: Complete AI development platform
- Drupal Integration (OllamaClusterManager.php)
  - Multi-level service discovery
  - Fallback chain: Cluster → Gateway → Direct
- Existing Infrastructure Integration
  - Qdrant vector database
  - Prometheus monitoring
  - LiteLLM proxy
  - FastAPI services
Implementation Components
1. Native Cluster Service
File: /Users/flux423/Sites/LLM/ollama-cluster-native.js
Features:
- Load Balancing Strategies:
  - Round-robin (default)
  - Least-connections
  - Least-latency
  - Weighted distribution
- Health Monitoring (see the sketch after this list):
  - 30-second health check intervals
  - Automatic node discovery
  - Failover on node failure
  - Latency tracking
- API Endpoints:
  - GET /health - Service health check
  - GET /api/cluster/status - Cluster overview
  - POST /api/cluster/optimal - Get optimal node
  - POST /api/cluster/execute - Execute through cluster
  - POST /api/generate - Direct Ollama generation
  - POST /api/chat - Direct Ollama chat
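The health monitor drives both failover and the latency data used by load balancing. A minimal sketch of the probe loop, assuming Ollama's standard /api/tags endpoint serves as the liveness check (the healthy and latency field names are illustrative, not the service's exact internals):

const http = require('http');

// Probe one node; resolve true only on a timely HTTP 200.
function checkNode(node) {
  return new Promise((resolve) => {
    const req = http.get(
      { host: node.host, port: node.port, path: '/api/tags', timeout: 2000 },
      (res) => {
        res.resume(); // drain the body; only the status code matters
        resolve(res.statusCode === 200);
      }
    );
    req.on('timeout', () => { req.destroy(); resolve(false); });
    req.on('error', () => resolve(false));
  });
}

// Sweep every known node, then reschedule after a full pass.
async function monitorNodes(nodes, intervalMs = 30000) {
  for (const node of nodes.values()) {
    const start = Date.now();
    node.healthy = await checkNode(node);
    node.latency = Date.now() - start; // consumed by the least-latency strategy
  }
  setTimeout(() => monitorNodes(nodes, intervalMs), intervalMs);
}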
Configuration:
class NativeOllamaCluster {
  constructor() {
    this.nodes = new Map();           // node ID -> node metadata (host, port, health, latency)
    this.currentNodeIndex = 0;        // round-robin cursor
    this.healthCheckInterval = 30000; // probe nodes every 30 seconds
    this.maxRetries = 3;              // attempts before failing a request over
    this.discoveryPorts = [11434, 11435, 11436, 11437]; // Ollama ports scanned during discovery
  }
}
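For context, strategy dispatch reduces to a few comparisons over the healthy node set. A sketch under the assumption that each node tracks activeConnections and latency (illustrative field names, not the service's exact internals):

// Pick a node according to the configured load-balancing strategy.
function selectNode(nodes, strategy, state) {
  // Only route to nodes that passed the last health check.
  const healthy = [...nodes.values()].filter((n) => n.healthy);
  if (healthy.length === 0) {
    throw new Error('No healthy Ollama nodes available');
  }
  switch (strategy) {
    case 'least-connections':
      // Fewest in-flight requests wins.
      return healthy.reduce((a, b) => (a.activeConnections <= b.activeConnections ? a : b));
    case 'least-latency':
      // Fastest node from the most recent health probe wins.
      return healthy.reduce((a, b) => (a.latency <= b.latency ? a : b));
    case 'round-robin':
    default:
      // Advance a shared cursor across the healthy set.
      state.currentNodeIndex = (state.currentNodeIndex + 1) % healthy.length;
      return healthy[state.currentNodeIndex];
  }
}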
2. Docker Production Image
File: /Users/flux423/Sites/LLM/Helm-Charts/secure-drupal/docker/ollama-cluster/
Built Image: bluefly/ollama-cluster:latest
Security Features:
- Non-root user (nodejs:1001)
- Read-only root filesystem
- Signal handling with tini
- Health checks built-in
- Alpine Linux base
Build Command:
cd /Users/flux423/Sites/LLM/Helm-Charts/secure-drupal/docker/ollama-cluster
docker build -t bluefly/ollama-cluster:latest .
3. Kubernetes Helm Charts
Secure Drupal Integration
File: /Users/flux423/Sites/LLM/Helm-Charts/secure-drupal/templates/ollama-cluster-deployment.yaml
Configuration: /Users/flux423/Sites/LLM/Helm-Charts/secure-drupal/values.yaml
ollama:
  cluster:
    enabled: true
    replicaCount: 2
    name: "production"
    image:
      repository: "bluefly/ollama-cluster"
      tag: "latest"
    resources:
      limits:
        memory: 1Gi
        cpu: 500m
    loadBalancing:
      strategy: "round-robin"
    hpa:
      enabled: true
      minReplicas: 2
      maxReplicas: 10
TDDAI Platform Integration
File: /Users/flux423/Sites/LLM/Helm-Charts/tddai-platform/templates/ollama-cluster-deployment.yaml
Configuration: /Users/flux423/Sites/LLM/Helm-Charts/tddai-platform/values.yaml
ollamaCluster:
  enabled: true
  replicaCount: 3
  name: "tddai-cluster"
  loadBalancing:
    strategy: "least-latency"  # Optimized for TDDAI
  integration:
    fastapi:
      enabled: true
      endpoints: ["fastapi-gateway:8080", "fastapi-worker-api:8081"]
    vectordb:
      qdrant:
        enabled: true
        endpoint: "qdrant-service:6333"
4. Drupal Module Integration
File: /Users/flux423/Sites/LLM/_DrupalSource/Modules/llm/src/Service/OllamaClusterManager.php
Key Methods Updated:
- getOptimalNode() - Smart service discovery with fallback
- executeRequest() - Multi-level execution with failover
- getClusterEndpoint() - Environment-aware endpoint detection
Service Discovery Logic:
private function getClusterEndpoint(): string {
  // Kubernetes service discovery
  if (getenv('KUBERNETES_SERVICE_HOST')) {
    $serviceName = getenv('OLLAMA_CLUSTER_SERVICE') ?: 'ollama-cluster-service';
    $namespace = getenv('RELEASE_NAMESPACE') ?: 'default';
    return "http://{$serviceName}.{$namespace}.svc.cluster.local:3001";
  }

  // Docker Compose or local development
  $host = getenv('OLLAMA_CLUSTER_HOST') ?: 'localhost';
  $port = getenv('OLLAMA_CLUSTER_PORT') ?: '3001';
  return "http://{$host}:{$port}";
}
Fallback Chain:
1. Ollama Cluster Service (primary)
2. LLM Gateway (fallback)
3. Direct Ollama (emergency fallback)
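The PHP executeRequest() method walks this chain in order. The equivalent logic, sketched in Node.js for brevity (the gateway and direct-Ollama hostnames below are illustrative placeholders, not confirmed service names):

// Node 18+: uses the built-in fetch. Try each tier in order; return the first success.
async function executeWithFallback(payload) {
  const tiers = [
    'http://ollama-cluster-service:3001/api/generate', // 1. cluster service (primary)
    'http://llm-gateway:8080/api/generate',            // 2. LLM Gateway (fallback) - hostname assumed
    'http://ollama:11434/api/generate',                // 3. direct Ollama (emergency) - hostname assumed
  ];
  for (const url of tiers) {
    try {
      const res = await fetch(url, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(payload),
      });
      if (res.ok) return res.json(); // first healthy tier answers
    } catch (e) {
      // Tier unreachable; fall through to the next one.
    }
  }
  throw new Error('All fallback tiers failed');
}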
Testing & Validation
Integration Test Suite
File: /Users/flux423/Sites/LLM/test-integration.js
Test Results: 7/7 PASSING
- Native Cluster Service Health: PASSED
- Cluster Status API: PASSED
- Load Balancing: PASSED
- Node Discovery: PASSED
- Failover Capability: PASSED
- Direct Ollama API: PASSED
- Integration with Existing Services: PASSED
Test Coverage
- Service Health: Health endpoints and uptime monitoring
- Cluster Management: Node discovery and status reporting
- Load Balancing: Request distribution across nodes
- Failover: Graceful handling of node failures
- Integration: Existing service compatibility
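Each test reduces to an HTTP probe plus assertions. A simplified sketch of the service-health check in the style of the suite (the status field in the payload is an assumption, not taken from the actual test file):

// Node 18+: uses the built-in fetch and the core assert module.
const assert = require('assert');

async function testClusterHealth() {
  const res = await fetch('http://localhost:3001/health');
  assert.strictEqual(res.status, 200, 'health endpoint should return HTTP 200');
  const body = await res.json();
  assert.ok(body.status, 'payload should carry a status field'); // field name assumed
  console.log('Native Cluster Service Health: PASSED');
}

testClusterHealth().catch((err) => {
  console.error('FAILED:', err.message);
  process.exit(1);
});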
Run Tests
cd /Users/flux423/Sites/LLM
node test-integration.js
Deployment Guide
Development Deployment
1. Start Native Service (Already Running)
# Service running on localhost:3001
curl http://localhost:3001/health
curl http://localhost:3001/api/cluster/status
2. Test API Endpoints
# Get cluster status
curl http://localhost:3001/api/cluster/status
# Get optimal node
curl -X POST http://localhost:3001/api/cluster/optimal \
-H "Content-Type: application/json" \
-d '{"model":"llama3.2:7b"}'
# Direct generation
curl -X POST http://localhost:3001/api/generate \
-H "Content-Type: application/json" \
-d '{"model":"llama3.2:7b","prompt":"Hello","stream":false}'
Production Deployment (Kubernetes)
1. Deploy with Secure Drupal
cd /Users/flux423/Sites/LLM/Helm-Charts
helm install secure-drupal ./secure-drupal \
--namespace drupal-ai \
--create-namespace \
--set ollama.cluster.enabled=true \
--set ollama.cluster.replicaCount=3
2. Deploy with TDDAI Platform
cd /Users/flux423/Sites/LLM/Helm-Charts
helm install tddai-platform ./tddai-platform \
--namespace tddai-system \
--create-namespace \
--set ollamaCluster.enabled=true \
--set ollamaCluster.replicaCount=5
3. Verify Deployment
# Check pods
kubectl get pods -n drupal-ai | grep ollama-cluster
kubectl get pods -n tddai-system | grep ollama-cluster
# Check services
kubectl get svc -n drupal-ai ollama-cluster-service
kubectl get svc -n tddai-system tddai-platform-ollama-cluster
Configuration Reference
Environment Variables
Kubernetes/Production:
env:
  - name: NODE_ENV
    value: "production"
  - name: CLUSTER_PORT
    value: "3001"
  - name: HEALTH_CHECK_INTERVAL
    value: "15000"
  - name: KUBERNETES_DISCOVERY
    value: "true"
  - name: OLLAMA_CLUSTER_SERVICE
    value: "ollama-cluster-service"
  - name: RELEASE_NAMESPACE
    value: "default"
Docker Compose/Development:
OLLAMA_CLUSTER_HOST=localhost
OLLAMA_CLUSTER_PORT=3001
NODE_ENV=development
LOG_LEVEL=debug
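At startup the service resolves these variables with development defaults. A minimal sketch of the resolution (variable names match the tables here; the exact parsing in ollama-cluster-native.js may differ):

// Resolve settings from the environment, falling back to development defaults.
const config = {
  port: parseInt(process.env.CLUSTER_PORT || '3001', 10),
  healthCheckInterval: parseInt(process.env.HEALTH_CHECK_INTERVAL || '30000', 10),
  kubernetesDiscovery: process.env.KUBERNETES_DISCOVERY === 'true',
  logLevel: process.env.LOG_LEVEL ||
    (process.env.NODE_ENV === 'production' ? 'info' : 'debug'),
};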
Service Configuration
Cluster Settings:
- Port: 3001 (cluster API)
- Metrics Port: 9090 (Prometheus)
- Health Check Interval: 30 seconds (dev), 15 seconds (prod)
- Max Retries: 3 (dev), 5 (prod)
- Load Balancing: Round-robin (default), least-latency (TDDAI)
Resource Limits:
- Memory: 1-2Gi per replica
- CPU: 500m-1000m per replica
- Storage: 5Gi for cluster state (production)
Monitoring & Observability
Metrics Endpoints
Cluster Metrics: http://cluster-service:9090/metrics
- Node health status
- Request latency
- Load balancing distribution
- Failover events
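Because the service carries no dependencies, the metrics port serves hand-rolled Prometheus text exposition. A minimal sketch of such a handler (the metric names are illustrative, not the service's published names):

const http = require('http');

const nodes = new Map(); // populated by the discovery/health loop elsewhere

// Render gauges in Prometheus text format, grouped by metric.
function renderMetrics(nodeMap) {
  const up = ['# TYPE ollama_cluster_node_up gauge'];
  const latency = ['# TYPE ollama_cluster_node_latency_ms gauge'];
  for (const [id, node] of nodeMap) {
    up.push(`ollama_cluster_node_up{node="${id}"} ${node.healthy ? 1 : 0}`);
    latency.push(`ollama_cluster_node_latency_ms{node="${id}"} ${node.latency || 0}`);
  }
  return up.concat(latency).join('\n') + '\n';
}

http.createServer((req, res) => {
  if (req.url === '/metrics') {
    res.writeHead(200, { 'Content-Type': 'text/plain; version=0.0.4' });
    res.end(renderMetrics(nodes));
  } else {
    res.writeHead(404);
    res.end();
  }
}).listen(9090); // matches the cluster's metrics port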
Health Endpoints:
- GET /health - Service health
- GET /api/cluster/status - Detailed cluster status
Prometheus Integration
Scrape configuration (pod annotations):
annotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "9090"
  prometheus.io/path: "/metrics"
Grafana Dashboard
Metrics available for visualization:
- Cluster node status
- Request distribution
- Response times
- Error rates
- Resource utilization
Security Features
Production Security
- Non-root container execution (user 1001)
- Read-only root filesystem
- Network policies for pod-to-pod communication
- RBAC integration with Kubernetes
- Secret management via Kubernetes secrets
Compliance Integration
- Government compliance filters (if the gov_compliance module exists)
- Audit logging for classified requests
- Request classification and tracking
- Security policy enforcement
Troubleshooting Guide
Common Issues
1. Service Not Starting
# Check logs
kubectl logs -n drupal-ai deployment/secure-drupal-ollama-cluster
# Check service
kubectl describe svc -n drupal-ai ollama-cluster-service
2. Node Discovery Fails
# Test Ollama connectivity
curl http://ollama-service:11434/api/tags
# Check cluster status
curl http://ollama-cluster-service:3001/api/cluster/status
3. Load Balancing Issues
# Check optimal node selection
curl -X POST http://ollama-cluster-service:3001/api/cluster/optimal \
-H "Content-Type: application/json" \
-d '{"model":"llama3.2:7b"}'
Debug Commands
# Development debugging
node ollama-cluster-native.js
# Kubernetes debugging
kubectl exec -it deployment/ollama-cluster -- /bin/sh
kubectl port-forward svc/ollama-cluster-service 3001:3001
Production Readiness Checklist
Completed
- Native cluster service implemented and tested
- Docker image built with security hardening
- Helm charts integrated with existing infrastructure
- Drupal module updated with service discovery
- Integration tests passing (7/7)
- Load balancing and failover working
- Health monitoring functional
- Documentation complete
Ready for Production
The Ollama Cluster is PRODUCTION READY:
- Development: Native service running locally
- Staging: Deploy with Helm charts for testing
- Production: Scale horizontally with HPA
Key Benefits Delivered:
- Zero dependency cluster service
- Seamless integration with existing infrastructure
- Production-grade Kubernetes deployment
- Intelligent failover and load balancing
- Complete observability and monitoring
Support & Maintenance
Monitoring Commands
# Check cluster health
curl http://localhost:3001/health
# Get detailed status
curl http://localhost:3001/api/cluster/status
# Test load balancing
for i in {1..10}; do
curl -X POST http://localhost:3001/api/cluster/optimal \
-H "Content-Type: application/json" \
-d '{"model":"llama3.2:7b"}' | jq .id
done
Maintenance Tasks
- Daily: Monitor cluster status and node health
- Weekly: Check resource utilization and scaling
- Monthly: Review logs and performance metrics
- Quarterly: Update Docker images and security patches
Implementation Complete: Production-Ready Ollama Cluster Service
Built with intelligence. Deployed with confidence. Ready for scale.