0005 - ClickHouse Stack: Installation

Summary

Install and configure ClickHouse on the k8s-lab cluster to provide a high-performance columnar database for analytics and metrics storage. ClickHouse will leverage the existing local storage infrastructure and integrate with the GitOps workflow via ArgoCD.

Problem Statement

The k8s-lab cluster currently lacks a dedicated analytics database for time-series data and metrics storage. While Prometheus exists for metrics collection, there is no long-term, queryable analytics store suitable for:

  • Historical metrics retention beyond Prometheus’s retention window
  • Ad-hoc analytics queries on operational data
  • Log aggregation and analysis
  • Custom business metrics and event storage

ClickHouse is well-suited for this role due to its columnar storage design, excellent compression for time-series data, and SQL interface for analytics queries.

Goals

  1. Install ClickHouse on the k8s-lab cluster using the existing component patterns (Kustomize + Helm)
  2. Integrate with local storage using the local-path StorageClass provided by the k8s-lab cluster for persistent data
  3. GitOps deployment via the existing ArgoCD/kustomization workflow
  4. Basic time-series configuration with appropriate table engines and retention policies for analytics workloads
  5. Expose ClickHouse via the existing Traefik ingress with TLS termination
  6. Cluster accessibility - HTTP interface for queries and native protocol for high-performance ingestion

Non-Goals

  1. ClickHouse cluster mode - Single-node deployment is sufficient for lab environment; clustering is out of scope
  2. Data migration - No existing data to migrate; this is a fresh installation
  3. Application integration - Configuring specific applications to use ClickHouse (e.g., Prometheus remote write) is separate work
  4. High availability - Lab environment does not require HA configuration
  5. Custom materialized views - Specific analytics schemas will be defined in follow-up work
  6. Backup automation - Manual snapshots are sufficient for lab; automated backup is out of scope
  7. Authentication/RBAC - Basic password authentication is sufficient; advanced RBAC is out of scope

Technical Approach

Component Structure

Follow the existing k8s-lab component pattern:

components/clickhouse/
├── kustomization.yaml     # Kustomize configuration with Helm chart
├── namespace.yaml         # Dedicated namespace
├── values.yaml            # Helm values for ClickHouse
└── ingress.yaml           # Traefik IngressRoute (if not in values)

Helm Chart Selection

Use either the Altinity ClickHouse Operator or the Bitnami ClickHouse chart. The Bitnami chart is recommended for its simplicity in a single-node lab environment. Note that the kustomize helmCharts field requires building with --enable-helm, which ArgoCD must be configured to allow:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - namespace.yaml
  - ingress.yaml   # if not handled via chart values
helmCharts:
  - name: clickhouse
    repo: https://charts.bitnami.com/bitnami
    version: <latest-stable>
    releaseName: clickhouse
    namespace: clickhouse
    valuesFile: values.yaml

Storage Configuration

Leverage the existing local-path storage:

persistence:
  enabled: true
  storageClass: local-path  # alternative: zfs-nfs
  size: 50Gi  # initial size, expandable

If the zfs-nfs backend is chosen instead, ZFS additionally provides:

  • LZ4 compression (ClickHouse data compresses well with columnar storage)
  • Data integrity via checksums
  • Snapshot capability for backups

Network Configuration

  1. Internal access: ClusterIP service for internal cluster communication
  2. External access: Traefik IngressRoute at clickhouse.lab.local.ctoaas.co
  3. Ports:
    • 8123: HTTP interface (queries, healthcheck)
    • 9000: Native TCP protocol (high-performance client connections)
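
A sketch of the Traefik IngressRoute for the HTTP interface. The Service name clickhouse, the TLS secret name, and the traefik.io API group are assumptions to verify against the deployed chart and the cluster's Traefik version:

```yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: clickhouse
  namespace: clickhouse
spec:
  entryPoints:
    - websecure                # TLS entrypoint
  routes:
    - match: Host(`clickhouse.lab.local.ctoaas.co`)
      kind: Rule
      services:
        - name: clickhouse     # assumed Service name created by the chart
          port: 8123           # HTTP interface
  tls:
    secretName: clickhouse-tls # assumed; issued via the existing LetsEncrypt setup
```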

Basic Configuration

Configure ClickHouse for time-series analytics workloads:

<clickhouse>
  <!-- Optimize for analytics queries -->
  <max_memory_usage>4000000000</max_memory_usage>  <!-- ~4 GB per query -->
  <max_threads>4</max_threads>

  <!-- Default MergeTree settings for time-series -->
  <merge_tree>
    <min_bytes_for_wide_part>0</min_bytes_for_wide_part>  <!-- always store parts in wide format -->
  </merge_tree>
</clickhouse>
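
To make the time-series intent concrete, an illustrative metrics table using a MergeTree engine with a TTL-based retention policy might look like the following. This is a sketch only; actual schemas are follow-up work per the non-goals:

```sql
-- Illustrative schema only; real schemas will be defined later.
CREATE TABLE metrics
(
    ts     DateTime,
    name   LowCardinality(String),  -- metric name (few distinct values)
    labels Map(String, String),     -- label key/value pairs
    value  Float64
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(ts)           -- monthly partitions ease retention and drops
ORDER BY (name, ts)                 -- suits per-metric time-range scans
TTL ts + INTERVAL 12 MONTH DELETE;  -- example retention window
```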

Registration in Components

Add to components/kustomization.yaml:

resources:
  # ... existing resources
  - clickhouse/

Implementation Tasks

Phase 1: Basic Installation

  1. Create components/clickhouse/ directory structure
  2. Create namespace.yaml for clickhouse namespace
  3. Configure kustomization.yaml with Helm chart reference
  4. Create values.yaml with:
    • local-path storage class configuration
    • Resource limits appropriate for lab (2-4 CPU, 4-8Gi memory)
    • Basic authentication (password from central-secret-store)
  5. Add component to components/kustomization.yaml
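
A hedged sketch of the values.yaml covering the items above. Key names follow the Bitnami chart's values schema and must be verified against the pinned chart version; the secret name and key are assumptions:

```yaml
shards: 1               # single node (clustering is a non-goal)
replicaCount: 1
zookeeper:
  enabled: false        # no coordination needed for one replica
auth:
  username: default
  existingSecret: clickhouse-credentials  # assumed secret synced from central-secret-store
  existingSecretKey: admin-password       # assumed key name
resources:
  requests:
    cpu: "2"
    memory: 4Gi
  limits:
    cpu: "4"
    memory: 8Gi
persistence:
  enabled: true
  storageClass: local-path
  size: 50Gi
```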

Phase 2: Ingress and Access

  1. Configure Traefik IngressRoute for HTTP interface
  2. Set up TLS via existing LetsEncrypt issuer
  3. Verify external access at clickhouse.lab.local.ctoaas.co

Phase 3: Verification

  1. Deploy and verify pod is running
  2. Confirm PVC is created with local-path storage class
  3. Test HTTP interface connectivity
  4. Run basic SQL queries to verify functionality
  5. Create a sample time-series table to validate configuration
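
The verification steps above can be exercised with a short smoke test via clickhouse-client or the HTTP interface:

```sql
SELECT version();                    -- confirms connectivity and server version

CREATE TABLE smoke_test (ts DateTime, v Float64)
ENGINE = MergeTree ORDER BY ts;      -- validates table creation and storage

INSERT INTO smoke_test VALUES (now(), 1.0);
SELECT count() FROM smoke_test;      -- expect 1

DROP TABLE smoke_test;
```

Over the HTTP interface, the same statements can be POSTed to the root endpoint; GET /ping returns Ok. and is suitable as a health check.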

Success Criteria

  1. Deployment: ClickHouse pod running in clickhouse namespace
  2. Storage: PVC bound to local PV via local-path storage class
  3. Connectivity: HTTP interface accessible at clickhouse.lab.local.ctoaas.co
  4. Functionality: Can create tables, insert data, and run queries
  5. GitOps: Component deploys cleanly via kustomize build --enable-helm components/ piped to kubectl apply, or via ArgoCD sync
  6. Documentation: README.md in component directory with usage instructions

Dependencies

| Dependency           | Status   | Notes                             |
|----------------------|----------|-----------------------------------|
| Traefik ingress      | Complete | Ingress controller running        |
| LetsEncrypt issuer   | Complete | TLS certificates available        |
| central-secret-store | Complete | Can store ClickHouse credentials  |
| ArgoCD               | Complete | GitOps deployment mechanism       |

Risks

Risk 1: Storage Performance over NFS (applies only if the zfs-nfs StorageClass is chosen over local-path)

Risk: NFS overhead may impact ClickHouse write performance for high-volume ingestion.

Mitigation:

  • Lab environment has modest ingestion requirements
  • ZFS compression reduces write amplification
  • Can tune NFS mount options (already configured with nfsvers=4.2,noatime)
  • If needed, can evaluate iSCSI or local-path for ClickHouse specifically in future

Likelihood: Low | Impact: Medium

Risk 2: Resource Contention

Risk: ClickHouse analytics queries can be resource-intensive, potentially impacting other cluster workloads.

Mitigation:

  • Configure appropriate resource limits (requests/limits)
  • Set ClickHouse max_memory_usage and max_threads constraints
  • Lab cluster has sufficient resources for moderate analytics workload

Likelihood: Low | Impact: Low

Risk 3: Helm Chart Compatibility

Risk: Chosen Helm chart may not support all desired configuration options or may have breaking changes.

Mitigation:

  • Pin Helm chart version explicitly
  • Review chart values schema before implementation
  • Bitnami charts are well-maintained and documented
  • Can switch to Altinity operator if Bitnami proves insufficient

Likelihood: Low | Impact: Low

Open Questions

  1. Chart selection: Bitnami vs Altinity operator - need to evaluate which better fits single-node lab use case
  2. Initial storage size: 50Gi proposed; should this be larger for anticipated use cases?
  3. Integration priority: Which systems should integrate with ClickHouse first (Prometheus remote write, log aggregation, custom metrics)?

References