# 0005 - ClickHouse Stack: Installation
## Summary
Install and configure ClickHouse on the k8s-lab cluster to provide a high-performance columnar database for analytics and metrics storage. ClickHouse will leverage the existing local storage infrastructure and integrate with the GitOps workflow via ArgoCD.
## Problem Statement
The k8s-lab cluster currently lacks a dedicated analytics database for time-series data and metrics storage. While Prometheus exists for metrics collection, there is no long-term, queryable analytics store suitable for:
- Historical metrics retention beyond Prometheus’s retention window
- Ad-hoc analytics queries on operational data
- Log aggregation and analysis
- Custom business metrics and event storage
ClickHouse is well-suited for this role due to its columnar storage design, excellent compression for time-series data, and SQL interface for analytics queries.
## Goals
- Install ClickHouse on the k8s-lab cluster using the existing component patterns (Kustomize + Helm)
- Integrate with local storage using the `local-path` StorageClass provided by k8s-lab for persistent data
- GitOps deployment via the existing ArgoCD/kustomization workflow
- Basic time-series configuration with appropriate table engines and retention policies for analytics workloads
- Expose ClickHouse via the existing Traefik ingress with TLS termination
- Cluster accessibility - HTTP interface for queries and native protocol for high-performance ingestion
## Non-Goals
- ClickHouse cluster mode - Single-node deployment is sufficient for lab environment; clustering is out of scope
- Data migration - No existing data to migrate; this is a fresh installation
- Application integration - Configuring specific applications to use ClickHouse (e.g., Prometheus remote write) is separate work
- High availability - Lab environment does not require HA configuration
- Custom materialized views - Specific analytics schemas will be defined in follow-up work
- Backup automation - Manual snapshots are sufficient for lab; automated backup is out of scope
- Authentication/RBAC - Basic password authentication is sufficient; advanced RBAC is out of scope
## Technical Approach
### Component Structure
Follow the existing k8s-lab component pattern:
```
components/clickhouse/
├── kustomization.yaml   # Kustomize configuration with Helm chart
├── namespace.yaml       # Dedicated namespace
├── values.yaml          # Helm values for ClickHouse
└── ingress.yaml         # Traefik IngressRoute (if not in values)
```
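As a sketch, the kustomization.yaml in this layout would tie the pieces together along these lines (field values are assumptions to verify against the existing components):

```yaml
# components/clickhouse/kustomization.yaml (sketch; verify against existing components)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - namespace.yaml
  - ingress.yaml
helmCharts:
  - name: clickhouse
    repo: https://charts.bitnami.com/bitnami
    version: <latest-stable> # pin explicitly before deploying
    releaseName: clickhouse
    namespace: clickhouse
    valuesFile: values.yaml
```

Note that inflating Helm charts this way requires running Kustomize with `--enable-helm` (or equivalent ArgoCD configuration), matching however the existing Helm-based components are handled.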
### Helm Chart Selection
Use the Altinity ClickHouse Operator or Bitnami ClickHouse chart. The Bitnami chart is recommended for simplicity in a single-node lab environment:
```yaml
helmCharts:
  - name: clickhouse
    repo: https://charts.bitnami.com/bitnami
    version: <latest-stable>
    releaseName: clickhouse
    namespace: clickhouse
    valuesFile: values.yaml
```

### Storage Configuration
Leverage the existing `local-path` storage:
```yaml
persistence:
  enabled: true
  storageClass: local-path # zfs-nfs
  size: 50Gi # Initial size, expandable
```

Benefits from ZFS:
- LZ4 compression (ClickHouse data compresses well with columnar storage)
- Data integrity via checksums
- Snapshot capability for backups
### Network Configuration
- Internal access: ClusterIP service for internal cluster communication
- External access: Traefik IngressRoute at `clickhouse.lab.local.ctoaas.co`
- Ports:
  - 8123: HTTP interface (queries, health checks)
  - 9000: native TCP protocol (high-performance client connections)
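Assuming the cluster's Traefik CRDs and a cert-manager-issued TLS secret, the IngressRoute for the HTTP interface might take this shape (entryPoint, service, and secret names are assumptions):

```yaml
# ingress.yaml (sketch; names are assumptions to match the existing Traefik setup)
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: clickhouse
  namespace: clickhouse
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`clickhouse.lab.local.ctoaas.co`)
      kind: Rule
      services:
        - name: clickhouse
          port: 8123
  tls:
    secretName: clickhouse-tls # assumption: issued via the existing LetsEncrypt issuer
```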
### Basic Configuration
Configure ClickHouse for time-series analytics workloads:
```xml
<clickhouse>
    <!-- Optimize for analytics queries -->
    <max_memory_usage>4000000000</max_memory_usage>
    <max_threads>4</max_threads>
    <!-- Default MergeTree settings for time-series -->
    <merge_tree>
        <min_bytes_for_wide_part>0</min_bytes_for_wide_part>
    </merge_tree>
</clickhouse>
```

### Registration in Components
Add to `components/kustomization.yaml`:
```yaml
resources:
  # ... existing resources
  - clickhouse/
```

## Implementation Tasks
### Phase 1: Basic Installation
- Create the `components/clickhouse/` directory structure
- Create namespace.yaml for the `clickhouse` namespace
- Configure kustomization.yaml with the Helm chart reference
- Create values.yaml with:
  - `local-path` storage class configuration
  - Resource limits appropriate for the lab (2-4 CPU, 4-8Gi memory)
  - Basic authentication (password from central-secret-store)
- Add the component to `components/kustomization.yaml`
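A values.yaml sketch covering the Phase 1 points (key names follow common Bitnami chart conventions but are assumptions; verify against the pinned chart's values schema):

```yaml
# values.yaml (sketch; keys are assumptions, check the chart's schema)
auth:
  existingSecret: clickhouse-credentials # assumption: synced from central-secret-store
persistence:
  enabled: true
  storageClass: local-path
  size: 50Gi
resources:
  requests:
    cpu: "2"
    memory: 4Gi
  limits:
    cpu: "4"
    memory: 8Gi
```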
### Phase 2: Ingress and Access
- Configure Traefik IngressRoute for HTTP interface
- Set up TLS via existing LetsEncrypt issuer
- Verify external access at `clickhouse.lab.local.ctoaas.co`
### Phase 3: Verification
- Deploy and verify pod is running
- Confirm PVC is created with local-path storage class
- Test HTTP interface connectivity
- Run basic SQL queries to verify functionality
- Create a sample time-series table to validate configuration
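The sample table for the last step could be an illustrative MergeTree schema with a TTL-based retention policy (database, table, and column names are made up for the check; 90 days is an example retention window):

```sql
-- Illustrative time-series table to validate the MergeTree configuration
CREATE DATABASE IF NOT EXISTS lab;

CREATE TABLE IF NOT EXISTS lab.metrics
(
    ts    DateTime,
    name  LowCardinality(String),
    value Float64
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(ts)
ORDER BY (name, ts)
TTL ts + INTERVAL 90 DAY;
```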
## Success Criteria
- Deployment: ClickHouse pod running in the `clickhouse` namespace
- Storage: PVC bound to a local PV via the `local-path` storage class
- Connectivity: HTTP interface accessible at `clickhouse.lab.local.ctoaas.co`
- Functionality: Can create tables, insert data, and run queries
- GitOps: Component deploys cleanly via `kubectl apply -k components/` or ArgoCD sync
- Documentation: README.md in the component directory with usage instructions
## Dependencies
| Dependency | Status | Notes |
|---|---|---|
| Traefik ingress | Complete | Ingress controller running |
| LetsEncrypt issuer | Complete | TLS certificates available |
| central-secret-store | Complete | Can store ClickHouse credentials |
| ArgoCD | Complete | GitOps deployment mechanism |
## Risks
### Risk 1: Storage Performance over NFS (applies only if NFS/ZFS backing is chosen)
Risk: NFS overhead may impact ClickHouse write performance for high-volume ingestion.
Mitigation:
- Lab environment has modest ingestion requirements
- ZFS compression reduces write amplification
- Can tune NFS mount options (already configured with `nfsvers=4.2,noatime`)
- If needed, can evaluate iSCSI or local-path specifically for ClickHouse in the future
Likelihood: Low | Impact: Medium
### Risk 2: Resource Contention
Risk: ClickHouse analytics queries can be resource-intensive, potentially impacting other cluster workloads.
Mitigation:
- Configure appropriate resource limits (requests/limits)
- Set ClickHouse `max_memory_usage` and `max_threads` constraints
- Lab cluster has sufficient resources for a moderate analytics workload
Likelihood: Low | Impact: Low
### Risk 3: Helm Chart Compatibility
Risk: Chosen Helm chart may not support all desired configuration options or may have breaking changes.
Mitigation:
- Pin Helm chart version explicitly
- Review chart values schema before implementation
- Bitnami charts are well-maintained and documented
- Can switch to Altinity operator if Bitnami proves insufficient
Likelihood: Low | Impact: Low
## Open Questions
- Chart selection: Bitnami vs Altinity operator - need to evaluate which better fits single-node lab use case
- Initial storage size: 50Gi proposed; should this be larger for anticipated use cases?
- Integration priority: Which systems should integrate with ClickHouse first (Prometheus remote write, log aggregation, custom metrics)?