Troubleshooting Guide
This guide helps diagnose and resolve common PVC Chonker issues.
Common Issues
PVC Not Expanding
Symptoms
- PVC usage above threshold but no expansion occurs
- No expansion events in PVC description
- Controller logs show no activity for the PVC
Diagnosis Steps
# 1. Check PVC annotations
kubectl get pvc your-pvc -o yaml | grep -A 10 annotations
# 2. Check PVC events
kubectl describe pvc your-pvc
# 3. Check controller logs
kubectl logs -n pvc-chonker-system deployment/controller-manager --tail=50
# 4. Check storage class
kubectl get storageclass your-storage-class -o yaml
Common Causes & Solutions
PVC not enabled:
# Solution: Add enable annotation
metadata:
annotations:
pvc-chonker.io/enabled: "true"
Storage class doesn't support expansion:
# Check allowVolumeExpansion
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: your-storage-class
allowVolumeExpansion: true # Must be true
PVC in cooldown period:
# Check last expansion time
kubectl get pvc your-pvc -o jsonpath='{.metadata.annotations.pvc-chonker\.io/last-expansion}'
# Solution: Wait for cooldown or reduce cooldown period
Maximum size reached:
# Check current size vs max-size annotation
kubectl get pvc your-pvc -o jsonpath='{.spec.resources.requests.storage}'
kubectl get pvc your-pvc -o jsonpath='{.metadata.annotations.pvc-chonker\.io/max-size}'
Controller Not Starting
Symptoms
- Controller pods in CrashLoopBackOff
- Controller pods not ready
- No controller logs
Diagnosis Steps
# Check pod status
kubectl get pods -n pvc-chonker-system
# Check pod events
kubectl describe pod -n pvc-chonker-system controller-manager-xxx
# Check pod logs
kubectl logs -n pvc-chonker-system controller-manager-xxx
Common Causes & Solutions
RBAC permissions missing:
# Check ClusterRole exists
kubectl get clusterrole pvc-chonker-manager-role
# Check ClusterRoleBinding
kubectl get clusterrolebinding pvc-chonker-manager-rolebinding
CRDs not installed:
# Check CRDs exist
kubectl get crd | grep pvc-chonker
# Install missing CRDs
kubectl apply -f config/crd/bases/
Image pull issues:
# Check image pull policy and availability
kubectl describe pod -n pvc-chonker-system controller-manager-xxx | grep -A 5 Events
Metrics Not Available
Symptoms
- Kubelet metrics endpoint returns 404
- Controller logs show "metrics not found" errors
- PVCs not expanding despite being above threshold
Diagnosis Steps
# 1. Check kubelet metrics endpoint
kubectl get nodes -o wide
# Then test: curl http://NODE-IP:10255/metrics
# 2. Check alternative kubelet port
curl -k https://NODE-IP:10250/metrics
# 3. Check from within cluster
kubectl run debug --image=curlimages/curl -it --rm -- curl http://NODE-IP:10255/metrics
Solutions
Enable kubelet metrics (self-managed clusters):
# kubelet-config.yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
serverTLSBootstrap: true
authentication:
webhook:
enabled: true
authorization:
mode: Webhook
Configure alternative kubelet URL:
# Use environment variable or flag
--kubelet-url="https://NODE-IP:10250"
Webhook Issues (PVCGroup)
Symptoms
- PVC creation fails with webhook errors
- Webhook timeout errors
- PVCGroup annotations not applied
Diagnosis Steps
# Check webhook configuration
kubectl get mutatingwebhookconfiguration pvc-chonker-mutating-webhook-configuration
# Check webhook service
kubectl get svc -n pvc-chonker-system pvc-chonker-webhook-service
# Check webhook certificates
kubectl get secret -n pvc-chonker-system pvc-chonker-webhook-server-cert
# Check controller logs for webhook errors
kubectl logs -n pvc-chonker-system deployment/controller-manager | grep webhook
Solutions
Webhook not enabled:
# Enable webhook in controller
--enable-webhook=true
# Or environment variable
PVC_CHONKER_ENABLE_WEBHOOK=true
Certificate issues:
# Regenerate webhook certificates
./hack/generate-webhook-certs.sh
kubectl apply -f config/webhook/
Service connectivity:
# Check service endpoints
kubectl get endpoints -n pvc-chonker-system pvc-chonker-webhook-service
# Test webhook connectivity
kubectl run debug --image=curlimages/curl -it --rm -- \
curl -k https://pvc-chonker-webhook-service.pvc-chonker-system.svc:443/mutate--v1-persistentvolumeclaim
Performance Issues
Slow Expansion Detection
Symptoms
- Long delays between threshold breach and expansion
- High controller CPU usage
- Many PVCs not being processed
Diagnosis
# Check reconciliation interval
kubectl logs -n pvc-chonker-system deployment/controller-manager | grep "watch-interval"
# Check controller resource usage
kubectl top pod -n pvc-chonker-system
# Check number of managed PVCs
kubectl get pvc --all-namespaces -o json | jq '.items | map(select(.metadata.annotations."pvc-chonker.io/enabled" == "true")) | length'
Solutions
Reduce watch interval:
# Faster reconciliation (higher CPU usage)
--watch-interval=30s
Increase controller resources:
resources:
limits:
cpu: 1000m
memory: 1Gi
requests:
cpu: 200m
memory: 256Mi
Reduce managed PVCs:
# Disable expansion for unused PVCs
kubectl annotate pvc unused-pvc pvc-chonker.io/enabled=false
High Memory Usage
Symptoms
- Controller OOMKilled
- High memory usage in metrics
- Slow API responses
Solutions
Increase memory limits:
resources:
limits:
memory: 2Gi
requests:
memory: 512Mi
Reduce concurrent operations:
--max-parallel=2 # Default is 4
Configuration Issues
Policy Not Applied
Symptoms
- PVC matches policy selector but settings not applied
- Policy exists but PVC uses different configuration
Diagnosis
# Check policy selector matches PVC labels
kubectl get pvcpolicy your-policy -o yaml
kubectl get pvc your-pvc -o yaml | grep -A 10 labels
# Check policy namespace matches PVC namespace
kubectl get pvcpolicy -n your-namespace
# Check for annotation overrides
kubectl get pvc your-pvc -o yaml | grep -A 20 annotations
Solutions
Fix label matching:
# Ensure PVC labels match policy selector
metadata:
labels:
workload: database # Must match policy selector
Check namespace:
# PVCPolicy must be in same namespace as PVC
kubectl get pvcpolicy -n correct-namespace
Group Coordination Issues
Symptoms
- PVCs in group have different sizes
- Group coordination not working
- Webhook not applying group settings
Diagnosis
# Check PVCGroup status
kubectl get pvcgroup your-group -o yaml
# Check group member PVCs
kubectl get pvc -l app=your-app -o custom-columns=NAME:.metadata.name,SIZE:.spec.resources.requests.storage
# Check webhook logs
kubectl logs -n pvc-chonker-system deployment/controller-manager | grep "webhook\|group"
Solutions
Verify webhook is enabled:
# Check webhook configuration
kubectl get mutatingwebhookconfiguration pvc-chonker-mutating-webhook-configuration
# Enable webhook if missing
helm upgrade pvc-chonker logiciq/pvc-chonker --set webhook.enabled=true
Check group selector:
# Ensure PVC labels match group selector
spec:
selector:
matchLabels:
app: elasticsearch # PVCs must have this label
Debugging Commands
Comprehensive Status Check
#!/bin/bash
echo "=== PVC Chonker Status ==="
echo "Controller Pods:"
kubectl get pods -n pvc-chonker-system
echo -e "\nController Logs (last 20 lines):"
kubectl logs -n pvc-chonker-system deployment/controller-manager --tail=20
echo -e "\nManaged PVCs:"
kubectl get pvc --all-namespaces -o json | \
jq -r '.items[] | select(.metadata.annotations."pvc-chonker.io/enabled" == "true") | "\(.metadata.namespace)/\(.metadata.name)"'
echo -e "\nPVCPolicies:"
kubectl get pvcpolicy --all-namespaces
echo -e "\nPVCGroups:"
kubectl get pvcgroup --all-namespaces
echo -e "\nWebhook Configuration:"
kubectl get mutatingwebhookconfiguration pvc-chonker-mutating-webhook-configuration 2>/dev/null || echo "Webhook not configured"
PVC Expansion History
#!/bin/bash
PVC_NAME=$1
NAMESPACE=${2:-default}
echo "=== PVC Expansion History: $NAMESPACE/$PVC_NAME ==="
kubectl get events --field-selector involvedObject.name=$PVC_NAME -n $NAMESPACE --sort-by='.firstTimestamp'
echo -e "\nCurrent PVC Status:"
kubectl get pvc $PVC_NAME -n $NAMESPACE -o yaml | grep -A 20 -B 5 "pvc-chonker"
echo -e "\nController Logs for this PVC:"
kubectl logs -n pvc-chonker-system deployment/controller-manager | grep $PVC_NAME
Metrics Validation
#!/bin/bash
echo "=== Kubelet Metrics Validation ==="
NODE_NAME=$(kubectl get nodes -o jsonpath='{.items[0].metadata.name}')
echo "Testing node: $NODE_NAME"
# Test metrics endpoint
kubectl get --raw /api/v1/nodes/$NODE_NAME/proxy/metrics | grep kubelet_volume_stats | head -5 || echo "No volume metrics found"
echo -e "\nPVC Chonker Metrics:"
kubectl port-forward -n pvc-chonker-system svc/pvc-chonker-metrics 8080:8080 &
PF_PID=$!
sleep 2
curl -s http://localhost:8080/metrics | grep pvcchonker | head -10 || echo "No PVC Chonker metrics found"
kill $PF_PID 2>/dev/null
Getting Help
Collecting Debug Information
#!/bin/bash
echo "Collecting PVC Chonker debug information..."
mkdir -p pvc-chonker-debug
# Controller information
kubectl get pods -n pvc-chonker-system -o yaml > pvc-chonker-debug/controller-pods.yaml
kubectl logs -n pvc-chonker-system deployment/controller-manager > pvc-chonker-debug/controller-logs.txt
# Configuration
kubectl get pvcpolicy --all-namespaces -o yaml > pvc-chonker-debug/pvcpolicies.yaml
kubectl get pvcgroup --all-namespaces -o yaml > pvc-chonker-debug/pvcgroups.yaml
kubectl get mutatingwebhookconfiguration pvc-chonker-mutating-webhook-configuration -o yaml > pvc-chonker-debug/webhook-config.yaml 2>/dev/null
# Managed PVCs
kubectl get pvc --all-namespaces -o yaml | \
yq eval 'select(.metadata.annotations."pvc-chonker.io/enabled" == "true")' > pvc-chonker-debug/managed-pvcs.yaml
# System information
kubectl version > pvc-chonker-debug/cluster-version.txt
kubectl get nodes -o yaml > pvc-chonker-debug/nodes.yaml
kubectl get storageclass -o yaml > pvc-chonker-debug/storageclasses.yaml
echo "Debug information collected in pvc-chonker-debug/"
echo "Please attach this directory when reporting issues."
Support Channels
- GitHub Issues: Report bugs
- GitHub Discussions: Community support
- Documentation: Project docs