Design Document
Overview
This design implements ZFS-backed persistent storage for the k8s-lab cluster using democratic-csi with NFS protocol. The Proxmox host exposes a ZFS dataset via NFS, which democratic-csi uses to dynamically provision Kubernetes PersistentVolumes with ZFS features (compression, snapshots, quotas).
Architecture
High-Level Architecture
┌─────────────────────────────────────────────────────────────┐
│                   Proxmox Host (k8s-lab)                    │
│                                                             │
│  ┌──────────────────────────────────────────────────────┐   │
│  │ ZFS Pool: rpool                                      │   │
│  │                                                      │   │
│  │ ├── rpool/data/vm-100-disk-0 (Talos VM - 10.3G)      │   │
│  │ └── rpool/data/k8s-storage (NFS Export)              │   │
│  │     ├── pvc-code-server-storage                      │   │
│  │     ├── pvc-postgres-data                            │   │
│  │     └── pvc-...                                      │   │
│  └──────────────────────────────────────────────────────┘   │
│                            │                                │
│                            │ NFS Export                     │
│                            │ (to 192.168.32.59)             │
└────────────────────────────┼────────────────────────────────┘
                             │
                             │ NFS Mount
                             ▼
┌─────────────────────────────────────────────────────────────┐
│          Talos Kubernetes Cluster (192.168.32.59)           │
│                                                             │
│  ┌──────────────────────────────────────────────────────┐   │
│  │ democratic-csi (NFS Driver)                          │   │
│  │ - Controller: Provisions PVs                         │   │
│  │ - Node: Mounts NFS to pods                           │   │
│  └──────────────────────────────────────────────────────┘   │
│                            │                                │
│                            │ Creates/Manages                │
│                            ▼                                │
│  ┌──────────────────────────────────────────────────────┐   │
│  │ StorageClass: zfs-nfs                                │   │
│  │ - Provisioner: org.democratic-csi.nfs                │   │
│  │ - ReclaimPolicy: Retain                              │   │
│  │ - VolumeBindingMode: Immediate                       │   │
│  └──────────────────────────────────────────────────────┘   │
│                            │                                │
│                            │ Provisions                     │
│                            ▼                                │
│  ┌──────────────────────────────────────────────────────┐   │
│  │ PersistentVolumes (PVs)                              │   │
│  │ - Backed by ZFS datasets                             │   │
│  │ - NFS mounted to pods                                │   │
│  └──────────────────────────────────────────────────────┘   │
│                            │                                │
│                            │ Claimed by                     │
│                            ▼                                │
│  ┌──────────────────────────────────────────────────────┐   │
│  │ Workloads (code-server, codev, etc.)                 │   │
│  │ - Mount PVCs as volumes                              │   │
│  │ - Read/write to ZFS-backed storage                   │   │
│  └──────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
Component Responsibilities
Proxmox Host
- ZFS Pool Management: Maintains rpool with its datasets
- NFS Server: Exports rpool/data/k8s-storage via NFS
- ZFS Features: Provides compression, checksums, and snapshots at the storage layer
democratic-csi
- Controller: Provisions/deletes PVs by creating/destroying ZFS datasets
- Node Driver: Mounts NFS shares to pods
- StorageClass: Defines provisioning parameters
Kubernetes
- PVC: Workloads request storage via PersistentVolumeClaims
- PV: Bound to ZFS datasets, mounted via NFS
- Pods: Access storage through volume mounts
Design Decisions
Decision 1: democratic-csi with NFS vs OpenEBS ZFS LocalPV
Chosen: democratic-csi with NFS
Rationale:
- Talos Compatibility: NFS support is built into Talos kubelet, no custom images or extensions required
- Simplicity: No need to manage ZFS on Talos nodes (immutable OS)
- Proven: Well-documented with Talos Linux
- Flexibility: Can easily reconfigure or migrate storage
- ZFS Benefits: Still get ZFS features (compression, snapshots) on Proxmox side
Alternatives Considered:
- OpenEBS ZFS LocalPV: Requires Talos ZFS extension, custom image builds, managing zpools via special pods
- Proxmox CSI Plugin: Designed for VMs as storage, not for exposing host ZFS to k8s
- Local-path provisioner: Already in use, but doesn’t leverage ZFS features
Decision 2: NFS vs iSCSI
Chosen: NFS
Rationale:
- Built-in Support: Talos has NFS support by default
- Simpler Setup: No iSCSI initiator configuration needed
- Sufficient Performance: For development workloads (code-server), NFS overhead is acceptable
- Easier Debugging: NFS mounts are easier to troubleshoot
Alternatives Considered:
- iSCSI: Better performance but requires the Talos iscsi-tools extension
Decision 3: Retain Reclaim Policy
Chosen: Retain
Rationale:
- Data Safety: PVs are not automatically deleted when PVCs are removed
- Manual Cleanup: Operator must explicitly delete ZFS datasets
- Recovery: Can recover data if PVC is accidentally deleted
- Lab Environment: Safer for experimentation
Alternatives Considered:
- Delete: Automatic cleanup but risk of data loss
Decision 4: ZFS Dataset Structure
Chosen: rpool/data/k8s-storage as parent dataset
Rationale:
- Isolation: Separate from VM storage (rpool/data/vm-*)
- Organization: All k8s PVCs live under one parent
- ZFS Properties: Can set properties on parent that inherit to children
- Snapshots: Can snapshot entire k8s storage tree
Structure:
rpool/data
├── vm-100-disk-0 (Talos VM)
└── k8s-storage (k8s PVCs)
    ├── pvc-abc123
    ├── pvc-def456
    └── pvc-ghi789
Implementation Details
Proxmox Configuration
ZFS Dataset Setup
# Create k8s storage dataset
zfs create rpool/data/k8s-storage
# Set optimal properties
zfs set compression=lz4 rpool/data/k8s-storage
zfs set atime=off rpool/data/k8s-storage
zfs set recordsize=128k rpool/data/k8s-storage
Property Explanations:
- compression=lz4: Fast compression with good space savings
- atime=off: Skip access-time updates, improving performance
- recordsize=128k: Good default for mixed workloads (files + databases)
NFS Export Configuration
# Install NFS server
apt update && apt install nfs-kernel-server -y
# Add to /etc/exports
echo "/rpool/data/k8s-storage 192.168.32.59(rw,sync,no_subtree_check,no_root_squash)" >> /etc/exports
# Reload exports
exportfs -ra
# Verify
exportfs -v
showmount -e localhost
NFS Options:
- rw: Read-write access
- sync: Synchronous writes (data safety)
- no_subtree_check: Performance optimization
- no_root_squash: Allow root access from k8s nodes (required for democratic-csi)
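If additional worker nodes join the cluster later, each needs its own entry in /etc/exports. A small sketch (the export_entry helper is hypothetical) that reproduces the options above for any client IP:

```shell
#!/bin/sh
# Hypothetical helper: emit an /etc/exports entry for a client IP, using
# the same options as the export configured above.
export_entry() {
  printf '/rpool/data/k8s-storage %s(rw,sync,no_subtree_check,no_root_squash)\n' "$1"
}

export_entry 192.168.32.59
```

Append the output to /etc/exports and re-run exportfs -ra, as above.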
Kubernetes Configuration
democratic-csi Installation
Namespace:
apiVersion: v1
kind: Namespace
metadata:
  name: democratic-csi
  labels:
    pod-security.kubernetes.io/enforce: privileged
Helm Values (democratic-csi-nfs-values.yaml):
csiDriver:
  name: "org.democratic-csi.nfs"
storageClasses:
  - name: zfs-nfs
    defaultClass: false
    reclaimPolicy: Retain
    volumeBindingMode: Immediate
    allowVolumeExpansion: true
    parameters:
      fsType: nfs
volumeSnapshotClasses:
  - name: zfs-nfs-snapshot
    deletionPolicy: Delete
    parameters: {}
driver:
  config:
    driver: freenas-nfs
    instance_id:
    httpConnection:
      protocol: http
      host: 192.168.32.1
      port: 80
      apiKey:
    zfs:
      datasetParentName: rpool/data/k8s-storage
      detachedSnapshotsDatasetParentName: rpool/data/k8s-storage/snapshots
      datasetEnableQuotas: true
      datasetEnableReservation: false
      datasetPermissionsMode: "0777"
      datasetPermissionsUser: 0
      datasetPermissionsGroup: 0
    nfs:
      shareHost: 192.168.32.1
      shareAlldirs: false
      shareAllowedHosts: []
      shareAllowedNetworks: []
      shareMaprootUser: root
      shareMaprootGroup: root
      shareMapallUser: ""
      shareMapallGroup: ""
Installation Command:
helm repo add democratic-csi https://democratic-csi.github.io/charts/
helm repo update
helm upgrade --install \
--namespace democratic-csi \
--create-namespace \
--values democratic-csi-nfs-values.yaml \
zfs-nfs democratic-csi/democratic-csi
StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: zfs-nfs
provisioner: org.democratic-csi.nfs
reclaimPolicy: Retain
volumeBindingMode: Immediate
allowVolumeExpansion: true
parameters:
  fsType: nfs
Code-Server Integration
Updated PVC
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: code-server-storage
  namespace: code-server
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: zfs-nfs  # Changed from local-path
  resources:
    requests:
      storage: 10Gi
Deployment Updates
No changes required to code-server or codev deployments - they already reference the PVC by name.
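Since the zfs-nfs StorageClass sets allowVolumeExpansion: true, the claim can later be grown in place; democratic-csi should raise the dataset quota to match (quotas are enabled in the driver config). A sketch, where the helper name is illustrative:

```shell
#!/bin/sh
# Build the JSON patch that grows a PVC; apply it with e.g.
#   kubectl patch pvc code-server-storage -n code-server \
#     --type merge -p "$(pvc_resize_patch 20Gi)"
pvc_resize_patch() {
  printf '{"spec":{"resources":{"requests":{"storage":"%s"}}}}' "$1"
}

pvc_resize_patch 20Gi
```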
Data Flow
PVC Creation Flow
- User creates a PVC with storageClassName: zfs-nfs
- democratic-csi controller receives the provisioning request
- Controller creates the ZFS dataset rpool/data/k8s-storage/pvc-<uuid>
- Controller sets ZFS properties (quota, compression, etc.)
- Controller creates PV object in Kubernetes
- PVC binds to PV
- Pod scheduled, democratic-csi node driver mounts NFS share
- Pod can read/write to ZFS-backed storage
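The flow above implies a stable naming convention: the dataset is named after the PV. A sketch for cross-checking a volume against the Proxmox side (the pv_dataset helper is hypothetical and assumes democratic-csi keeps this convention):

```shell
#!/bin/sh
# Map a PV name to the ZFS dataset that backs it, assuming datasets are
# created as rpool/data/k8s-storage/pvc-<uuid> per the flow above.
pv_dataset() {
  printf 'rpool/data/k8s-storage/%s' "$1"
}

# On Proxmox, verify the dataset exists, e.g.:
#   zfs list "$(pv_dataset pvc-abc123)"
pv_dataset pvc-abc123
```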
Volume Mount Flow
Pod → PVC → PV → democratic-csi node driver → NFS mount → ZFS dataset
Write Path
Application write → NFS client (in pod) → NFS server (Proxmox) → ZFS dataset → Disk
↓
Compression
Checksums
COW
Testing Strategy
Unit Tests
Not applicable - this is infrastructure configuration.
Integration Tests
- PVC Provisioning Test
  - Create a PVC with the zfs-nfs storage class
  - Verify the PV is created and bound
  - Verify the ZFS dataset exists on Proxmox
  - Delete the PVC and verify the PV is retained
- Volume Mount Test
  - Create a pod with the PVC
  - Write a file to the mounted volume
  - Verify the file exists in the ZFS dataset on Proxmox
  - Delete the pod, recreate it, and verify the file persists
- Shared Workspace Test
  - Deploy code-server and codev with a shared PVC
  - Verify both pods can read/write the same files
  - Verify the pods are scheduled on the same node (RWO requirement)
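The volume mount test can be driven by a throwaway pod; a sketch, where the pod and claim names are hypothetical and the PVC must already exist with the zfs-nfs class:

```yaml
# Throwaway test pod: writes a marker file to the claimed volume so it
# can be checked in the ZFS dataset on Proxmox. Names are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: zfs-nfs-mount-test
spec:
  containers:
    - name: writer
      image: alpine
      command: ["sh", "-c", "echo hello > /data/marker && sleep 3600"]
      volumeMounts:
        - name: test-vol
          mountPath: /data
  volumes:
    - name: test-vol
      persistentVolumeClaim:
        claimName: zfs-nfs-test-claim
  restartPolicy: Never
```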
Acceptance Tests
- Code-Server Workspace Test
  - Deploy code-server with a ZFS-backed PVC
  - Create files in the workspace
  - Restart the pod and verify the files persist
  - Access from both code-server and codev pods
- Performance Test
  - Measure file I/O performance
  - Compare to the local-path baseline
  - Verify acceptable latency for development workloads
- Snapshot Test (Future)
  - Create a VolumeSnapshot
  - Verify the ZFS snapshot exists
  - Restore from the snapshot
Monitoring and Observability
Metrics
- PV/PVC Status: Monitor via kubectl get pv,pvc
- ZFS Dataset Usage: Monitor via zfs list -o space
- NFS Mount Status: Check pod events and logs
Logging
- democratic-csi logs: kubectl logs -n democratic-csi -l app=democratic-csi
- Pod mount errors: kubectl describe pod <pod-name>
- NFS server logs: /var/log/syslog on Proxmox
Alerts
- PVC provisioning failures
- NFS mount failures
- ZFS pool capacity warnings (>80%)
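The capacity warning can be scripted with plain shell arithmetic; a sketch, assuming byte counts from zfs list -Hpo used,avail rpool (the -p flag yields exact numbers):

```shell
#!/bin/sh
# Warn when pool usage crosses the 80% threshold. On Proxmox, feed it
# real figures, e.g.:
#   zfs list -Hpo used,avail rpool | { read used avail; pool_usage_pct "$used" "$avail"; }
pool_usage_pct() {
  echo $(( $1 * 100 / ($1 + $2) ))
}

pct=$(pool_usage_pct 900 100)   # sample figures standing in for zfs output
[ "$pct" -gt 80 ] && echo "WARNING: pool at ${pct}% capacity"
```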
Security Considerations
NFS Security
- Network Isolation: NFS export restricted to k8s node IP (192.168.32.59)
- no_root_squash: Required by democratic-csi; the risk is mitigated by restricting the export to the trusted network
- Firewall: Ensure NFS ports (2049, 111) only accessible from k8s network
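The firewall point can be made concrete with iptables on the Proxmox host; a sketch that prints candidate rules for review rather than applying them (rule order and existing chains vary, nftables syntax differs, and NFSv3 over UDP would need matching udp rules):

```shell
#!/bin/sh
# Print candidate iptables rules that limit NFS (2049) and rpcbind (111)
# to the k8s node; review before applying on the Proxmox host.
nfs_firewall_rules() {
  for port in 2049 111; do
    echo "iptables -A INPUT -p tcp --dport $port -s 192.168.32.59 -j ACCEPT"
    echo "iptables -A INPUT -p tcp --dport $port -j DROP"
  done
}

nfs_firewall_rules
```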
Access Control
- PVC Namespaces: PVCs isolated by namespace
- RBAC: democratic-csi service account has minimal required permissions
- Pod Security: Pods run as non-root where possible
Operational Procedures
Adding Storage Capacity
# ZFS automatically uses available pool space
# No action needed unless pool is full
zfs list -o space rpool/data/k8s-storage
Backup Strategy
# Snapshot entire k8s storage
zfs snapshot -r rpool/data/k8s-storage@backup-$(date +%Y%m%d)
# Send to backup location
zfs send -R rpool/data/k8s-storage@backup-YYYYMMDD | \
ssh backup-host zfs recv backup-pool/k8s-storage
Disaster Recovery
# List available snapshots
zfs list -t snapshot -r rpool/data/k8s-storage
# Rollback to snapshot
zfs rollback rpool/data/k8s-storage/pvc-abc123@snapshot-name
# Clone snapshot to new dataset
zfs clone rpool/data/k8s-storage/pvc-abc123@snapshot-name \
rpool/data/k8s-storage/pvc-abc123-restored
Cleanup Retained PVs
# List retained PVs
kubectl get pv | grep Released
# Delete PV from Kubernetes
kubectl delete pv <pv-name>
# Delete ZFS dataset on Proxmox
zfs destroy rpool/data/k8s-storage/pvc-<uuid>
Migration Path
From local-path to zfs-nfs
- Create a new PVC with the zfs-nfs storage class
- Copy data from the old PVC to the new PVC (using a migration pod)
- Update deployments to reference new PVC
- Verify functionality
- Delete old PVC
Migration Pod Example:
apiVersion: v1
kind: Pod
metadata:
  name: pvc-migrator
spec:
  containers:
    - name: migrator
      image: alpine
      command: ["sh", "-c", "cp -a /old/* /new/ && sync"]
      volumeMounts:
        - name: old-storage
          mountPath: /old
        - name: new-storage
          mountPath: /new
  volumes:
    - name: old-storage
      persistentVolumeClaim:
        claimName: code-server-storage-old
    - name: new-storage
      persistentVolumeClaim:
        claimName: code-server-storage
  restartPolicy: Never
Future Enhancements
- Volume Snapshots: Implement VolumeSnapshot support for backups
- Capacity Management: Automated alerts for storage usage
- Performance Tuning: Optimize ZFS recordsize per workload type
- Multi-node Support: Evaluate ReadWriteMany for shared workloads
- Encryption: Enable ZFS encryption for sensitive data