0005 - ClickHouse Stack: Installation

Summary

Install and configure ClickHouse on the k8s-lab cluster to provide a high-performance columnar database for analytics and metrics storage. ClickHouse will leverage the existing local storage infrastructure and integrate with the GitOps workflow via ArgoCD.

Problem Statement

The k8s-lab cluster currently lacks a dedicated analytics database for time-series data and metrics storage. While Prometheus exists for metrics collection, there is no long-term, queryable analytics store suitable for:

  • Historical metrics retention beyond Prometheus’s retention window
  • Ad-hoc analytics queries on operational data
  • Log aggregation and analysis
  • Custom business metrics and event storage

ClickHouse is well-suited for this role due to its columnar storage design, excellent compression for time-series data, and SQL interface for analytics queries.

Goals

  1. Install ClickHouse on the k8s-lab cluster using the existing component patterns (Kustomize + Helm)
  2. Integrate with local storage using the local-path StorageClass provided by the k8s-lab cluster for persistent data
  3. GitOps deployment via the existing ArgoCD/kustomization workflow
  4. Basic time-series configuration with appropriate table engines and retention policies for analytics workloads
  5. Expose ClickHouse via the existing Traefik ingress with TLS termination
  6. Cluster accessibility - HTTP interface for queries and native protocol for high-performance ingestion

Non-Goals

  1. ClickHouse cluster mode - Single-node deployment is sufficient for lab environment; clustering is out of scope
  2. Data migration - No existing data to migrate; this is a fresh installation
  3. Application integration - Configuring specific applications to use ClickHouse (e.g., Prometheus remote write) is separate work
  4. High availability - Lab environment does not require HA configuration
  5. Custom materialized views - Specific analytics schemas will be defined in follow-up work
  6. Backup automation - Manual snapshots are sufficient for lab; automated backup is out of scope
  7. Authentication/RBAC - Basic password authentication is sufficient; advanced RBAC is out of scope

Technical Approach

Component Structure

Follow the existing k8s-lab component pattern:

components/clickhouse/
├── kustomization.yaml     # Kustomize configuration with Helm chart
├── namespace.yaml         # Dedicated namespace
├── values.yaml            # Helm values for ClickHouse
└── ingress.yaml           # Traefik IngressRoute (if not in values)

Helm Chart Selection

Use either the Altinity ClickHouse Operator or the Bitnami ClickHouse chart. The Bitnami chart is recommended for its simplicity in a single-node lab environment. Note that the kustomize helmCharts field requires building with --enable-helm, which ArgoCD must be configured to allow:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - namespace.yaml
  - ingress.yaml   # if not handled via chart values
helmCharts:
  - name: clickhouse
    repo: https://charts.bitnami.com/bitnami
    version: <latest-stable>
    releaseName: clickhouse
    namespace: clickhouse
    valuesFile: values.yaml

Storage Configuration

Leverage the existing local-path storage:

persistence:
  enabled: true
  storageClass: local-path  # alternative: zfs-nfs
  size: 50Gi  # initial size, expandable

If the zfs-nfs backend is chosen instead, ZFS additionally provides:

  • LZ4 compression (ClickHouse data compresses well with columnar storage)
  • Data integrity via checksums
  • Snapshot capability for backups

Network Configuration

  1. Internal access: ClusterIP service for internal cluster communication
  2. External access: Traefik IngressRoute at clickhouse.lab.local.ctoaas.co
  3. Ports:
    • 8123: HTTP interface (queries, healthcheck)
    • 9000: Native TCP protocol (high-performance client connections)
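
A sketch of the Traefik IngressRoute for the HTTP interface. The Service name clickhouse, the TLS secret name, and the traefik.io API group are assumptions to verify against the deployed chart and the cluster's Traefik version:

```yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: clickhouse
  namespace: clickhouse
spec:
  entryPoints:
    - websecure                # TLS entrypoint
  routes:
    - match: Host(`clickhouse.lab.local.ctoaas.co`)
      kind: Rule
      services:
        - name: clickhouse     # assumed Service name created by the chart
          port: 8123           # HTTP interface
  tls:
    secretName: clickhouse-tls # assumed; issued via the existing LetsEncrypt setup
```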

Basic Configuration

Configure ClickHouse for time-series analytics workloads:

<clickhouse>
  <!-- Optimize for analytics queries -->
  <max_memory_usage>4000000000</max_memory_usage>  <!-- ~4 GB per query -->
  <max_threads>4</max_threads>

  <!-- Default MergeTree settings for time-series -->
  <merge_tree>
    <min_bytes_for_wide_part>0</min_bytes_for_wide_part>  <!-- always store parts in wide format -->
  </merge_tree>
</clickhouse>
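
To make the time-series intent concrete, an illustrative metrics table using a MergeTree engine with a TTL-based retention policy might look like the following. This is a sketch only; actual schemas are follow-up work per the non-goals:

```sql
-- Illustrative schema only; real schemas will be defined later.
CREATE TABLE metrics
(
    ts     DateTime,
    name   LowCardinality(String),  -- metric name (few distinct values)
    labels Map(String, String),     -- label key/value pairs
    value  Float64
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(ts)           -- monthly partitions ease retention and drops
ORDER BY (name, ts)                 -- suits per-metric time-range scans
TTL ts + INTERVAL 12 MONTH DELETE;  -- example retention window
```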

Registration in Components

Add to components/kustomization.yaml:

resources:
  # ... existing resources
  - clickhouse/

Implementation Tasks

Phase 1: Basic Installation

  1. Create components/clickhouse/ directory structure
  2. Create namespace.yaml for clickhouse namespace
  3. Configure kustomization.yaml with Helm chart reference
  4. Create values.yaml with:
    • local-path storage class configuration
    • Resource limits appropriate for lab (2-4 CPU, 4-8Gi memory)
    • Basic authentication (password from central-secret-store)
  5. Add component to components/kustomization.yaml
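
A hedged sketch of the values.yaml covering the items above. Key names follow the Bitnami chart's values schema and must be verified against the pinned chart version; the secret name and key are assumptions:

```yaml
shards: 1               # single node (clustering is a non-goal)
replicaCount: 1
zookeeper:
  enabled: false        # no coordination needed for one replica
auth:
  username: default
  existingSecret: clickhouse-credentials  # assumed secret synced from central-secret-store
  existingSecretKey: admin-password       # assumed key name
resources:
  requests:
    cpu: "2"
    memory: 4Gi
  limits:
    cpu: "4"
    memory: 8Gi
persistence:
  enabled: true
  storageClass: local-path
  size: 50Gi
```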

Phase 2: Ingress and Access

  1. Configure Traefik IngressRoute for HTTP interface
  2. Set up TLS via existing LetsEncrypt issuer
  3. Verify external access at clickhouse.lab.local.ctoaas.co

Phase 3: Verification

  1. Deploy and verify pod is running
  2. Confirm PVC is created with local-path storage class
  3. Test HTTP interface connectivity
  4. Run basic SQL queries to verify functionality
  5. Create a sample time-series table to validate configuration
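
The verification steps above can be exercised with a short smoke test via clickhouse-client or the HTTP interface:

```sql
SELECT version();                    -- confirms connectivity and server version

CREATE TABLE smoke_test (ts DateTime, v Float64)
ENGINE = MergeTree ORDER BY ts;      -- validates table creation and storage

INSERT INTO smoke_test VALUES (now(), 1.0);
SELECT count() FROM smoke_test;      -- expect 1

DROP TABLE smoke_test;
```

Over the HTTP interface, the same statements can be POSTed to the root endpoint; GET /ping returns Ok. and is suitable as a health check.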

Success Criteria

  1. Deployment: ClickHouse pod running in clickhouse namespace
  2. Storage: PVC bound to local PV via local-path storage class
  3. Connectivity: HTTP interface accessible at clickhouse.lab.local.ctoaas.co
  4. Functionality: Can create tables, insert data, and run queries
  5. GitOps: Component deploys cleanly via kustomize build --enable-helm components/ piped to kubectl apply, or via ArgoCD sync
  6. Documentation: README.md in component directory with usage instructions

Dependencies

| Dependency           | Status   | Notes                             |
|----------------------|----------|-----------------------------------|
| Traefik ingress      | Complete | Ingress controller running        |
| LetsEncrypt issuer   | Complete | TLS certificates available        |
| central-secret-store | Complete | Can store ClickHouse credentials  |
| ArgoCD               | Complete | GitOps deployment mechanism       |

Risks

Risk 1: Storage Performance over NFS (applies only if the zfs-nfs StorageClass is chosen over local-path)

Risk: NFS overhead may impact ClickHouse write performance for high-volume ingestion.

Mitigation:

  • Lab environment has modest ingestion requirements
  • ZFS compression reduces write amplification
  • Can tune NFS mount options (already configured with nfsvers=4.2,noatime)
  • If needed, can evaluate iSCSI or local-path for ClickHouse specifically in future

Likelihood: Low | Impact: Medium

Risk 2: Resource Contention

Risk: ClickHouse analytics queries can be resource-intensive, potentially impacting other cluster workloads.

Mitigation:

  • Configure appropriate resource limits (requests/limits)
  • Set ClickHouse max_memory_usage and max_threads constraints
  • Lab cluster has sufficient resources for moderate analytics workload

Likelihood: Low | Impact: Low

Risk 3: Helm Chart Compatibility

Risk: Chosen Helm chart may not support all desired configuration options or may have breaking changes.

Mitigation:

  • Pin Helm chart version explicitly
  • Review chart values schema before implementation
  • Bitnami charts are well-maintained and documented
  • Can switch to Altinity operator if Bitnami proves insufficient

Likelihood: Low | Impact: Low

Open Questions

  1. Chart selection: Bitnami vs Altinity operator - need to evaluate which better fits single-node lab use case
  2. Initial storage size: 50Gi proposed; should this be larger for anticipated use cases?
  3. Integration priority: Which systems should integrate with ClickHouse first (Prometheus remote write, log aggregation, custom metrics)?

References