Design Document

Overview

This design implements ZFS-backed persistent storage for the k8s-lab cluster using democratic-csi with NFS protocol. The Proxmox host exposes a ZFS dataset via NFS, which democratic-csi uses to dynamically provision Kubernetes PersistentVolumes with ZFS features (compression, snapshots, quotas).

Architecture

High-Level Architecture

┌─────────────────────────────────────────────────────────────┐
│ Proxmox Host (k8s-lab)                                      │
│                                                             │
│  ┌──────────────────────────────────────────────────────┐  │
│  │ ZFS Pool: rpool                                      │  │
│  │                                                      │  │
│  │  ├── rpool/data/vm-100-disk-0 (Talos VM - 10.3G)   │  │
│  │  └── rpool/data/k8s-storage (NFS Export)            │  │
│  │       ├── pvc-code-server-storage                   │  │
│  │       ├── pvc-postgres-data                         │  │
│  │       └── pvc-...                                   │  │
│  └──────────────────────────────────────────────────────┘  │
│                          │                                  │
│                          │ NFS Export                       │
│                          │ (192.168.32.59)                  │
└──────────────────────────┼──────────────────────────────────┘
                           │
                           │ NFS Mount
                           ▼
┌─────────────────────────────────────────────────────────────┐
│ Talos Kubernetes Cluster (192.168.32.59)                   │
│                                                             │
│  ┌──────────────────────────────────────────────────────┐  │
│  │ democratic-csi (NFS Driver)                          │  │
│  │  - Controller: Provisions PVs                        │  │
│  │  - Node: Mounts NFS to pods                          │  │
│  └──────────────────────────────────────────────────────┘  │
│                          │                                  │
│                          │ Creates/Manages                  │
│                          ▼                                  │
│  ┌──────────────────────────────────────────────────────┐  │
│  │ StorageClass: zfs-nfs                                │  │
│  │  - Provisioner: org.democratic-csi.nfs               │  │
│  │  - ReclaimPolicy: Retain                             │  │
│  │  - VolumeBindingMode: Immediate                      │  │
│  └──────────────────────────────────────────────────────┘  │
│                          │                                  │
│                          │ Provisions                       │
│                          ▼                                  │
│  ┌──────────────────────────────────────────────────────┐  │
│  │ PersistentVolumes (PVs)                              │  │
│  │  - Backed by ZFS datasets                            │  │
│  │  - NFS mounted to pods                               │  │
│  └──────────────────────────────────────────────────────┘  │
│                          │                                  │
│                          │ Claimed by                       │
│                          ▼                                  │
│  ┌──────────────────────────────────────────────────────┐  │
│  │ Workloads (code-server, codev, etc.)                 │  │
│  │  - Mount PVCs as volumes                             │  │
│  │  - Read/write to ZFS-backed storage                  │  │
│  └──────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘

Component Responsibilities

Proxmox Host

  • ZFS Pool Management: Maintains rpool with datasets
  • NFS Server: Exports rpool/data/k8s-storage via NFS
  • ZFS Features: Provides compression, checksums, snapshots at the storage layer

democratic-csi

  • Controller: Provisions/deletes PVs by creating/destroying ZFS datasets
  • Node Driver: Mounts NFS shares to pods
  • StorageClass: Defines provisioning parameters

Kubernetes

  • PVC: Workloads request storage via PersistentVolumeClaims
  • PV: Bound to ZFS datasets, mounted via NFS
  • Pods: Access storage through volume mounts

Design Decisions

Decision 1: democratic-csi with NFS vs OpenEBS ZFS LocalPV

Chosen: democratic-csi with NFS

Rationale:

  • Talos Compatibility: NFS support is built into Talos kubelet, no custom images or extensions required
  • Simplicity: No need to manage ZFS on Talos nodes (immutable OS)
  • Proven: Well-documented with Talos Linux
  • Flexibility: Can easily reconfigure or migrate storage
  • ZFS Benefits: Still get ZFS features (compression, snapshots) on Proxmox side

Alternatives Considered:

  • OpenEBS ZFS LocalPV: Requires Talos ZFS extension, custom image builds, managing zpools via special pods
  • Proxmox CSI Plugin: Designed for VMs as storage, not for exposing host ZFS to k8s
  • Local-path provisioner: Already in use, but doesn’t leverage ZFS features

Decision 2: NFS vs iSCSI

Chosen: NFS

Rationale:

  • Built-in Support: Talos has NFS support by default
  • Simpler Setup: No iSCSI initiator configuration needed
  • Sufficient Performance: For development workloads (code-server), NFS overhead is acceptable
  • Easier Debugging: NFS mounts are easier to troubleshoot

Alternatives Considered:

  • iSCSI: Better performance but requires Talos iscsi-tools extension

Decision 3: Retain Reclaim Policy

Chosen: Retain

Rationale:

  • Data Safety: PVs are not automatically deleted when PVCs are removed
  • Manual Cleanup: Operator must explicitly delete ZFS datasets
  • Recovery: Can recover data if PVC is accidentally deleted
  • Lab Environment: Safer for experimentation

Alternatives Considered:

  • Delete: Automatic cleanup but risk of data loss

Decision 4: ZFS Dataset Structure

Chosen: rpool/data/k8s-storage as parent dataset

Rationale:

  • Isolation: Separate from VM storage (rpool/data/vm-*)
  • Organization: All k8s PVCs under one parent
  • ZFS Properties: Can set properties on parent that inherit to children
  • Snapshots: Can snapshot entire k8s storage tree

Structure:

rpool/data
├── vm-100-disk-0 (Talos VM)
└── k8s-storage (k8s PVCs)
    ├── pvc-abc123
    ├── pvc-def456
    └── pvc-ghi789

Implementation Details

Proxmox Configuration

ZFS Dataset Setup

# Create k8s storage dataset
zfs create rpool/data/k8s-storage
 
# Set optimal properties
zfs set compression=lz4 rpool/data/k8s-storage
zfs set atime=off rpool/data/k8s-storage
zfs set recordsize=128k rpool/data/k8s-storage

Property Explanations:

  • compression=lz4: Fast compression, good space savings
  • atime=off: Don’t update access times, improves performance
  • recordsize=128k: Good default for mixed workloads (files + databases)
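Once set, the values can be confirmed in one call; this sketch assumes it runs as root on the Proxmox host:

```shell
# Confirm the three properties applied above; SOURCE should read "local"
zfs get -o property,value,source compression,atime,recordsize rpool/data/k8s-storage
```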

NFS Export Configuration

# Install NFS server
apt update && apt install nfs-kernel-server -y
 
# Add to /etc/exports (crossmnt is needed because each PVC becomes its
# own ZFS filesystem mounted under the exported parent)
echo "/rpool/data/k8s-storage 192.168.32.59(rw,sync,no_subtree_check,no_root_squash,crossmnt)" >> /etc/exports

# Reload exports
exportfs -ra

# Verify
exportfs -v
showmount -e localhost

NFS Options:

  • rw: Read-write access
  • sync: Synchronous writes (data safety)
  • no_subtree_check: Performance optimization
  • no_root_squash: Allow root access from k8s nodes (required for democratic-csi)
  • crossmnt: Allow the client to traverse into child ZFS datasets under the export
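Before wiring up the CSI driver, the export can be spot-checked from a host the export admits (the entry above only allows 192.168.32.59); the mount point /mnt/nfs-test is illustrative:

```shell
# Quick manual check of the export from the permitted client
mkdir -p /mnt/nfs-test
mount -t nfs 192.168.32.1:/rpool/data/k8s-storage /mnt/nfs-test
touch /mnt/nfs-test/.write-test && rm /mnt/nfs-test/.write-test   # confirms rw + no_root_squash
umount /mnt/nfs-test
```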

Kubernetes Configuration

democratic-csi Installation

Namespace:

apiVersion: v1
kind: Namespace
metadata:
  name: democratic-csi
  labels:
    pod-security.kubernetes.io/enforce: privileged

Helm Values (democratic-csi-nfs-values.yaml):

csiDriver:
  name: "org.democratic-csi.nfs"
 
storageClasses:
- name: zfs-nfs
  defaultClass: false
  reclaimPolicy: Retain
  volumeBindingMode: Immediate
  allowVolumeExpansion: true
  parameters:
    fsType: nfs
 
volumeSnapshotClasses:
- name: zfs-nfs-snapshot
  deletionPolicy: Delete
  parameters: {}
 
driver:
  config:
    # zfs-generic-nfs manages datasets over SSH; the freenas-nfs driver (and its
    # httpConnection block) targets the TrueNAS API, which Proxmox does not provide
    driver: zfs-generic-nfs
    instance_id:
    sshConnection:
      host: 192.168.32.1
      port: 22
      username: root
      privateKey: |
        -----BEGIN OPENSSH PRIVATE KEY-----
        ...
        -----END OPENSSH PRIVATE KEY-----
    zfs:
      datasetParentName: rpool/data/k8s-storage
      detachedSnapshotsDatasetParentName: rpool/data/k8s-storage/snapshots
      datasetEnableQuotas: true
      datasetEnableReservation: false
      datasetPermissionsMode: "0777"
      datasetPermissionsUser: 0
      datasetPermissionsGroup: 0
    nfs:
      shareHost: 192.168.32.1
      # each PVC dataset is exported individually via its sharenfs property
      shareStrategy: "setDatasetProperties"
      shareStrategySetDatasetProperties:
        properties:
          sharenfs: "rw,no_subtree_check,no_root_squash"

Installation Command:

helm repo add democratic-csi https://democratic-csi.github.io/charts/
helm repo update
 
helm upgrade --install \
  --namespace democratic-csi \
  --create-namespace \
  --values democratic-csi-nfs-values.yaml \
  zfs-nfs democratic-csi/democratic-csi
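After the release settles, a quick sanity pass is worthwhile; the commands below only assume the driver and class names from the values file:

```shell
# Verify the CSI driver registered and its pods are healthy
kubectl -n democratic-csi get pods -o wide
kubectl get csidriver org.democratic-csi.nfs
kubectl get storageclass zfs-nfs
```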

StorageClass

The Helm values above render the following StorageClass; it is shown here for reference and does not need to be applied separately.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: zfs-nfs
provisioner: org.democratic-csi.nfs
reclaimPolicy: Retain
volumeBindingMode: Immediate
allowVolumeExpansion: true
parameters:
  fsType: nfs

Code-Server Integration

Updated PVC

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: code-server-storage
  namespace: code-server
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: zfs-nfs  # Changed from local-path
  resources:
    requests:
      storage: 10Gi

Deployment Updates

No changes required to code-server or codev deployments - they already reference the PVC by name.
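The switch away from local-path can be confirmed after redeploying; names follow the manifest above:

```shell
# Confirm the claim bound against the new class
kubectl -n code-server get pvc code-server-storage \
  -o custom-columns=NAME:.metadata.name,STATUS:.status.phase,CLASS:.spec.storageClassName
```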

Data Flow

PVC Creation Flow

  1. User creates PVC with storageClassName: zfs-nfs
  2. democratic-csi controller receives provisioning request
  3. Controller creates ZFS dataset: rpool/data/k8s-storage/pvc-<uuid>
  4. Controller sets ZFS properties (quota, compression, etc.)
  5. Controller creates PV object in Kubernetes
  6. PVC binds to PV
  7. Pod scheduled, democratic-csi node driver mounts NFS share
  8. Pod can read/write to ZFS-backed storage
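The flow can be exercised end to end with a throwaway claim; the PVC name is illustrative, and the last step assumes root SSH access to the Proxmox host:

```shell
# Steps 1-6: create a claim and wait for it to bind
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: flow-check
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: zfs-nfs
  resources:
    requests:
      storage: 1Gi
EOF
kubectl wait --for=jsonpath='{.status.phase}'=Bound pvc/flow-check --timeout=120s

# Step 3 from the Proxmox side: the backing dataset should now exist
ssh root@192.168.32.1 zfs list -r rpool/data/k8s-storage
```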

Volume Mount Flow

Pod → PVC → PV → democratic-csi node driver → NFS mount → ZFS dataset

Write Path

Application write → NFS client (in pod) → NFS server (Proxmox) → ZFS dataset → Disk
                                                                      ↓
                                                                 Compression
                                                                 Checksums
                                                                 COW

Testing Strategy

Unit Tests

Not applicable - this is infrastructure configuration.

Integration Tests

  1. PVC Provisioning Test

    • Create PVC with zfs-nfs storage class
    • Verify PV is created and bound
    • Verify ZFS dataset exists on Proxmox
    • Delete PVC, verify PV is retained
  2. Volume Mount Test

    • Create pod with PVC
    • Write file to mounted volume
    • Verify file exists in ZFS dataset on Proxmox
    • Delete pod, recreate, verify file persists
  3. Shared Workspace Test

    • Deploy code-server and codev with shared PVC
    • Verify both pods can read/write same files
    • Verify pods are scheduled on same node (RWO requirement)
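Test 2 above (write, recreate, verify persistence) can be scripted roughly as follows; the claim name test-pvc is a placeholder for any Bound zfs-nfs claim:

```shell
# Pod appends a marker and prints the file; re-running after a delete
# should show one line per run if the data persists
cat <<'EOF' >mount-test.yaml
apiVersion: v1
kind: Pod
metadata:
  name: mount-test
spec:
  restartPolicy: Never
  containers:
  - name: tester
    image: alpine
    command: ["sh", "-c", "echo persisted >> /data/marker && cat /data/marker"]
    volumeMounts:
    - name: vol
      mountPath: /data
  volumes:
  - name: vol
    persistentVolumeClaim:
      claimName: test-pvc   # placeholder
EOF
kubectl apply -f mount-test.yaml
kubectl wait --for=jsonpath='{.status.phase}'=Succeeded pod/mount-test --timeout=120s
kubectl logs mount-test
```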

Acceptance Tests

  1. Code-Server Workspace Test

    • Deploy code-server with ZFS-backed PVC
    • Create files in workspace
    • Restart pod, verify files persist
    • Access from both code-server and codev pods
  2. Performance Test

    • Measure file I/O performance
    • Compare to local-path baseline
    • Verify acceptable latency for development workloads
  3. Snapshot Test (Future)

    • Create VolumeSnapshot
    • Verify ZFS snapshot exists
    • Restore from snapshot

Monitoring and Observability

Metrics

  • PV/PVC Status: Monitor via kubectl get pv,pvc
  • ZFS Dataset Usage: Monitor via zfs list -o space
  • NFS Mount Status: Check pod events and logs

Logging

  • democratic-csi logs: kubectl logs -n democratic-csi -l app=democratic-csi
  • Pod mount errors: kubectl describe pod <pod-name>
  • NFS server logs: /var/log/syslog on Proxmox

Alerts

  • PVC provisioning failures
  • NFS mount failures
  • ZFS pool capacity warnings (>80%)
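The capacity alert can be driven by a small parser over `zpool list` output; the 80% threshold matches the rule above, and wiring it into cron or mail is left open:

```shell
# check_capacity: reads "name<TAB>capacity" lines, as emitted by
# `zpool list -H -o name,capacity`, on stdin and prints pools above
# the threshold (default 80). On the Proxmox host:
#   zpool list -H -o name,capacity | check_capacity 80
check_capacity() {
  awk -F'\t' -v limit="${1:-80}" '
    {
      cap = $2
      gsub(/%/, "", cap)                      # strip the trailing percent sign
      if (cap + 0 > limit)
        printf "%s pool above %d%%: %s%%\n", $1, limit, cap
    }'
}
```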

Security Considerations

NFS Security

  • Network Isolation: NFS export restricted to k8s node IP (192.168.32.59)
  • no_root_squash: Required so democratic-csi can set ownership and permissions on provisioned datasets; the risk is contained by restricting the export to the trusted k8s network
  • Firewall: Ensure NFS ports (2049, 111) only accessible from k8s network
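One way to enforce the port restriction on the Proxmox host, assuming plain iptables with no other firewall manager in play:

```shell
# Allow NFS (2049) and portmapper (111) only from the k8s node, drop the rest
iptables -A INPUT -s 192.168.32.59 -p tcp -m multiport --dports 111,2049 -j ACCEPT
iptables -A INPUT -s 192.168.32.59 -p udp -m multiport --dports 111,2049 -j ACCEPT
iptables -A INPUT -p tcp -m multiport --dports 111,2049 -j DROP
iptables -A INPUT -p udp -m multiport --dports 111,2049 -j DROP
```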

Access Control

  • PVC Namespaces: PVCs isolated by namespace
  • RBAC: democratic-csi service account has minimal required permissions
  • Pod Security: Pods run as non-root where possible

Operational Procedures

Adding Storage Capacity

# ZFS automatically uses available pool space
# No action needed unless pool is full
zfs list -o space rpool/data/k8s-storage

Backup Strategy

# Snapshot entire k8s storage
zfs snapshot -r rpool/data/k8s-storage@backup-$(date +%Y%m%d)
 
# Send to backup location
zfs send -R rpool/data/k8s-storage@backup-YYYYMMDD | \
  ssh backup-host zfs recv backup-pool/k8s-storage

Disaster Recovery

# List available snapshots
zfs list -t snapshot -r rpool/data/k8s-storage
 
# Rollback to a snapshot (only the most recent; add -r to destroy any newer snapshots first)
zfs rollback rpool/data/k8s-storage/pvc-abc123@snapshot-name
 
# Clone snapshot to new dataset
zfs clone rpool/data/k8s-storage/pvc-abc123@snapshot-name \
  rpool/data/k8s-storage/pvc-abc123-restored

Cleanup Retained PVs

# List retained PVs
kubectl get pv | grep Released
 
# Delete PV from Kubernetes
kubectl delete pv <pv-name>
 
# Delete ZFS dataset on Proxmox
zfs destroy rpool/data/k8s-storage/pvc-<uuid>

Migration Path

From local-path to zfs-nfs

  1. Create new PVC with zfs-nfs storage class
  2. Copy data from old PVC to new PVC (using a migration pod)
  3. Update deployments to reference new PVC
  4. Verify functionality
  5. Delete old PVC

Migration Pod Example:

apiVersion: v1
kind: Pod
metadata:
  name: pvc-migrator
spec:
  containers:
  - name: migrator
    image: alpine
    command: ["sh", "-c", "cp -a /old/. /new/ && sync"]  # /old/. (not /old/*) also copies dotfiles
    volumeMounts:
    - name: old-storage
      mountPath: /old
    - name: new-storage
      mountPath: /new
  volumes:
  - name: old-storage
    persistentVolumeClaim:
      claimName: code-server-storage-old
  - name: new-storage
    persistentVolumeClaim:
      claimName: code-server-storage
  restartPolicy: Never

Future Enhancements

  1. Volume Snapshots: Implement VolumeSnapshot support for backups
  2. Capacity Management: Automated alerts for storage usage
  3. Performance Tuning: Optimize ZFS recordsize per workload type
  4. Multi-node Support: Evaluate ReadWriteMany for shared workloads
  5. Encryption: Enable ZFS encryption for sensitive data
