Suyog Maid

Article · 2026-01-20

Building Production-Ready EKS Clusters with AWS CDK

#kubernetes #eks #aws #cdk #devops #infrastructure-as-code


Amazon Elastic Kubernetes Service (EKS) has become the de facto standard for running Kubernetes workloads on AWS. However, deploying a production-ready EKS cluster involves much more than just clicking a button in the console. In this post, I'll walk you through building a robust, scalable, and cost-optimized EKS cluster using AWS CDK (Cloud Development Kit).

Why AWS CDK for EKS?

Infrastructure as Code (IaC) is essential for modern DevOps practices. While CloudFormation and Terraform are popular choices, AWS CDK offers several advantages:

  • Type Safety: Write infrastructure code in TypeScript, Python, or Java with full IDE support
  • Constructs Library: High-level abstractions for complex AWS resources
  • Reusability: Create and share custom constructs across projects
  • Native AWS Integration: First-class support for all AWS services
  • Testability: Unit test your infrastructure code before deployment

Architecture Overview

Our production EKS cluster follows AWS Well-Architected Framework principles:

Network Architecture

VPC (10.0.0.0/16)
├── Public Subnets (3 AZs)
│   ├── 10.0.1.0/24 (us-east-1a)
│   ├── 10.0.2.0/24 (us-east-1b)
│   └── 10.0.3.0/24 (us-east-1c)
│
├── Private Subnets (3 AZs)
│   ├── 10.0.11.0/24 (us-east-1a)
│   ├── 10.0.12.0/24 (us-east-1b)
│   └── 10.0.13.0/24 (us-east-1c)
│
└── NAT Gateway (single, for cost optimization)

EKS Components

  1. Control Plane: Managed by AWS in a Multi-AZ configuration
  2. Worker Nodes: EC2 instances in private subnets across 3 AZs
  3. Add-ons: VPC CNI, CoreDNS, kube-proxy, EBS CSI Driver
  4. Observability: CloudWatch Container Insights, Prometheus, Grafana
  5. Ingress: AWS Load Balancer Controller for ALB/NLB integration

Implementation with AWS CDK

Setting Up the CDK Project

// Initialize CDK project
import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as eks from 'aws-cdk-lib/aws-eks';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as iam from 'aws-cdk-lib/aws-iam';

export class EksClusterStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);
    
    // VPC with optimized NAT Gateway configuration
    const vpc = new ec2.Vpc(this, 'EksVpc', {
      maxAzs: 3,
      natGateways: 1, // Cost optimization
      subnetConfiguration: [
        {
          name: 'Public',
          subnetType: ec2.SubnetType.PUBLIC,
          cidrMask: 24,
        },
        {
          name: 'Private',
          subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS,
          cidrMask: 24,
        },
      ],
    });
    
    // EKS Cluster
    const cluster = new eks.Cluster(this, 'EksCluster', {
      version: eks.KubernetesVersion.V1_34,
      vpc,
      vpcSubnets: [{ subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS }],
      defaultCapacity: 0, // We'll add managed node groups separately
    });
  }
}

Node Group Configuration

// Multi-AZ Managed Node Group
cluster.addNodegroupCapacity('ManagedNodeGroup', {
  instanceTypes: [new ec2.InstanceType('t3.medium')],
  minSize: 2,
  maxSize: 10,
  desiredSize: 3,
  subnets: { subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS },
  labels: {
    role: 'general-purpose',
  },
  tags: {
    'k8s.io/cluster-autoscaler/enabled': 'true',
    'k8s.io/cluster-autoscaler/cluster-name': cluster.clusterName,
  },
});

Essential Add-ons and Controllers

1. AWS Load Balancer Controller

The AWS Load Balancer Controller enables Kubernetes Ingress resources to provision Application Load Balancers automatically.

// Abridged IAM policy for the Load Balancer Controller. Production
// deployments should use the full iam_policy.json published in the
// kubernetes-sigs/aws-load-balancer-controller repo and attach it to
// the controller's IRSA service account.
const albControllerPolicy = new iam.PolicyDocument({
  statements: [
    new iam.PolicyStatement({
      actions: [
        'ec2:CreateTags',
        'ec2:DeleteTags',
        'elasticloadbalancing:*',
        'ec2:AuthorizeSecurityGroupIngress',
        'ec2:RevokeSecurityGroupIngress',
        'ec2:DeleteSecurityGroup',
      ],
      resources: ['*'],
    }),
  ],
});

// Install AWS Load Balancer Controller via Helm
cluster.addHelmChart('AwsLoadBalancerController', {
  chart: 'aws-load-balancer-controller',
  repository: 'https://aws.github.io/eks-charts',
  namespace: 'kube-system',
  values: {
    clusterName: cluster.clusterName,
    serviceAccount: {
      create: true,
      name: 'aws-load-balancer-controller',
    },
  },
});
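With the controller running, provisioning an ALB is just a matter of creating an annotated Ingress. Here is a minimal sketch of such a manifest as a plain object, ready to be registered through the same `cluster.addManifest` mechanism used elsewhere in this post; the `my-app` service name and port are placeholders for your own workload:

```typescript
// Ingress manifest that the AWS Load Balancer Controller reconciles into
// an internet-facing Application Load Balancer. 'my-app' is a placeholder.
const appIngress = {
  apiVersion: 'networking.k8s.io/v1',
  kind: 'Ingress',
  metadata: {
    name: 'my-app',
    namespace: 'default',
    annotations: {
      'alb.ingress.kubernetes.io/scheme': 'internet-facing',
      // 'ip' mode targets pods directly instead of node ports
      'alb.ingress.kubernetes.io/target-type': 'ip',
    },
  },
  spec: {
    ingressClassName: 'alb',
    rules: [
      {
        http: {
          paths: [
            {
              path: '/',
              pathType: 'Prefix',
              backend: {
                service: { name: 'my-app', port: { number: 80 } },
              },
            },
          ],
        },
      },
    ],
  },
};

// Register it with the cluster:
// cluster.addManifest('AppIngress', appIngress);
```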

2. EBS CSI Driver

For persistent storage, the EBS CSI Driver is essential:

// Enable the EBS CSI Driver. The legacy enableVolume* chart flags have
// been removed in current chart versions; volume resizing and snapshot
// support are built into the driver itself.
cluster.addHelmChart('EbsCsiDriver', {
  chart: 'aws-ebs-csi-driver',
  repository: 'https://kubernetes-sigs.github.io/aws-ebs-csi-driver',
  namespace: 'kube-system',
});
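Once the driver is in place, workloads need a StorageClass that points at it. A sketch of a gp3-backed class (the class name and the default-class annotation are choices, not requirements), again as a manifest object for `cluster.addManifest`:

```typescript
// StorageClass backed by the EBS CSI driver. gp3 volumes are cheaper per
// GB than gp2 and let IOPS/throughput be tuned independently of size.
const gp3StorageClass = {
  apiVersion: 'storage.k8s.io/v1',
  kind: 'StorageClass',
  metadata: {
    name: 'gp3',
    annotations: {
      'storageclass.kubernetes.io/is-default-class': 'true',
    },
  },
  provisioner: 'ebs.csi.aws.com',
  parameters: {
    type: 'gp3',
    encrypted: 'true',
  },
  // Delay volume creation until a pod is scheduled, so the EBS volume
  // lands in the same AZ as the consuming node.
  volumeBindingMode: 'WaitForFirstConsumer',
  allowVolumeExpansion: true,
};

// cluster.addManifest('Gp3StorageClass', gp3StorageClass);
```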

3. Cluster Autoscaler

Automatic node scaling based on pod resource requirements. Note that in production the autoscaler pod also needs an IRSA-backed service account with Auto Scaling API permissions, omitted below for brevity:

// Cluster Autoscaler Deployment
const clusterAutoscaler = cluster.addManifest('ClusterAutoscaler', {
  apiVersion: 'apps/v1',
  kind: 'Deployment',
  metadata: {
    name: 'cluster-autoscaler',
    namespace: 'kube-system',
  },
  spec: {
    selector: {
      matchLabels: {
        app: 'cluster-autoscaler',
      },
    },
    template: {
      metadata: {
        labels: {
          app: 'cluster-autoscaler',
        },
      },
      spec: {
        containers: [
          {
            name: 'cluster-autoscaler',
            // Use registry.k8s.io (k8s.gcr.io is deprecated) and keep the
            // tag aligned with the cluster's Kubernetes minor version.
            image: 'registry.k8s.io/autoscaling/cluster-autoscaler:v1.34.0',
            command: [
              './cluster-autoscaler',
              `--cluster-name=${cluster.clusterName}`,
              '--aws-region=us-east-1',
              '--balance-similar-node-groups',
              '--skip-nodes-with-system-pods=false',
            ],
          },
        ],
      },
    },
  },
});

Security Best Practices

1. IAM Roles for Service Accounts (IRSA)

IRSA provides fine-grained IAM permissions to Kubernetes pods:

const s3AccessServiceAccount = cluster.addServiceAccount('S3AccessServiceAccount', {
  name: 's3-access',
  namespace: 'default',
});

s3AccessServiceAccount.addToPrincipalPolicy(
  new iam.PolicyStatement({
    actions: ['s3:GetObject', 's3:PutObject'],
    resources: ['arn:aws:s3:::my-bucket/*'],
  })
);
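Pods pick up these permissions simply by running under the service account: the EKS pod identity webhook injects the web identity token and role ARN, and the AWS SDK uses them automatically. A minimal sketch of a pod that would exercise the S3 policy above (the image and command are illustrative):

```typescript
// Pod that assumes the IRSA role created above. No access keys are
// mounted anywhere; credentials come from the injected identity token.
const s3ReaderPod = {
  apiVersion: 'v1',
  kind: 'Pod',
  metadata: { name: 's3-reader', namespace: 'default' },
  spec: {
    // Must match the service account name registered with the cluster
    serviceAccountName: 's3-access',
    containers: [
      {
        name: 'app',
        image: 'amazon/aws-cli:latest',
        command: ['aws', 's3', 'ls', 's3://my-bucket'],
      },
    ],
  },
};

// cluster.addManifest('S3ReaderPod', s3ReaderPod);
```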

2. Network Policies

Implement network segmentation using Calico or Cilium:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-ingress
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  ingress: []
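A default-deny policy blocks all incoming traffic, so it is normally paired with explicit allow rules. As one sketch, expressed as a manifest object in the same `cluster.addManifest` style used throughout this post, this policy re-admits traffic between pods within the `production` namespace:

```typescript
// Allow ingress only from pods in the same namespace; intended to sit
// alongside a default deny-all-ingress policy.
const allowSameNamespace = {
  apiVersion: 'networking.k8s.io/v1',
  kind: 'NetworkPolicy',
  metadata: { name: 'allow-same-namespace', namespace: 'production' },
  spec: {
    podSelector: {}, // applies to every pod in the namespace
    policyTypes: ['Ingress'],
    ingress: [
      // An empty podSelector in "from" matches all pods in this namespace
      { from: [{ podSelector: {} }] },
    ],
  },
};

// cluster.addManifest('AllowSameNamespace', allowSameNamespace);
```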

3. Pod Security Standards

Enforce pod security policies:

apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted

Monitoring and Observability

CloudWatch Container Insights

Enable comprehensive cluster monitoring:

// Enable CloudWatch Container Insights
cluster.addHelmChart('CloudWatchInsights', {
  chart: 'aws-cloudwatch-metrics',
  repository: 'https://aws.github.io/eks-charts',
  namespace: 'amazon-cloudwatch',
  values: {
    clusterName: cluster.clusterName,
  },
});

Prometheus and Grafana Stack

Deploy the kube-prometheus-stack for advanced monitoring:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --values prometheus-values.yaml

Cost Optimization Strategies

1. Spot Instances for Non-Critical Workloads

cluster.addNodegroupCapacity('SpotNodeGroup', {
  instanceTypes: [
    new ec2.InstanceType('t3.medium'),
    new ec2.InstanceType('t3.large'),
  ],
  capacityType: eks.CapacityType.SPOT,
  minSize: 0,
  maxSize: 5,
  labels: {
    role: 'spot',
  },
  taints: [
    {
      key: 'spot',
      value: 'true',
      effect: eks.TaintEffect.NO_SCHEDULE,
    },
  ],
});
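Because of the NO_SCHEDULE taint, only workloads that explicitly opt in land on these nodes. A sketch of the scheduling stanza such a workload's pod template would carry (the surrounding Deployment is omitted); it tolerates the taint and pins itself to spot nodes via the `role: spot` label set on the node group above:

```typescript
// Pod template fragment for a spot-tolerant workload. The toleration
// matches the taint on the spot node group; the nodeSelector ensures the
// pod only runs on spot capacity rather than merely being allowed to.
const spotScheduling = {
  nodeSelector: { role: 'spot' },
  tolerations: [
    {
      key: 'spot',
      operator: 'Equal',
      value: 'true',
      effect: 'NoSchedule',
    },
  ],
};
```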

2. Right-Sizing with Vertical Pod Autoscaler

Install VPA to automatically adjust resource requests:

# Install VPA from the kubernetes/autoscaler repository
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

3. Automated Scaling During Off-Hours

Use CronJobs or external schedulers to scale down during non-business hours:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-down-evening
spec:
  schedule: "0 18 * * 1-5"  # 6 PM weekdays
  jobTemplate:
    spec:
      template:
        spec:
          # A ServiceAccount bound to a Role allowing "patch" on
          # deployments/scale is required (RBAC omitted for brevity).
          serviceAccountName: scale-manager
          restartPolicy: OnFailure
          containers:
          - name: kubectl
            image: bitnami/kubectl:latest
            command:
            - /bin/sh
            - -c
            - kubectl scale deployment myapp --replicas=1

Deployment Pipeline Integration

GitOps with ArgoCD

Implement GitOps for declarative cluster management:

# Install ArgoCD
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

# Access ArgoCD UI
kubectl port-forward svc/argocd-server -n argocd 8080:443

Helm Chart Deployment

Example application deployment with Helm:

# values.yaml
replicaCount: 3

image:
  repository: my-registry/my-app
  tag: v1.0.0
  pullPolicy: IfNotPresent

resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 250m
    memory: 256Mi

autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70

Disaster Recovery and Backup

Velero for Cluster Backup

# Install Velero
velero install \
  --provider aws \
  --bucket eks-backup-bucket \
  --secret-file ./credentials-velero \
  --backup-location-config region=us-east-1 \
  --snapshot-location-config region=us-east-1

# Create scheduled backup
velero schedule create daily-backup --schedule="0 2 * * *"

Key Takeaways

  1. Infrastructure as Code: Use AWS CDK for reproducible, version-controlled infrastructure
  2. Multi-AZ Design: Ensure high availability across availability zones
  3. Security First: Implement IRSA, network policies, and pod security standards
  4. Cost Optimization: Leverage spot instances, autoscaling, and right-sizing
  5. Observability: Deploy comprehensive monitoring with CloudWatch and Prometheus
  6. GitOps: Adopt declarative configuration management with ArgoCD
  7. Automation: Automate everything from deployment to scaling and backup

Conclusion

Building production-ready EKS clusters requires careful planning and implementation of best practices across security, scalability, cost, and operations. AWS CDK provides an excellent framework for codifying these practices and ensuring consistent, reliable deployments.

The investment in proper EKS architecture pays dividends in:

  • Reduced operational overhead through automation
  • Improved reliability with multi-AZ deployments
  • Cost savings through optimization strategies
  • Enhanced security with a defense-in-depth approach
  • Better developer experience with self-service capabilities

Ready to build your own EKS cluster? Start with the CDK examples above and customize them for your specific requirements!


Questions or feedback? Feel free to reach out through the contact form. I'd love to hear about your EKS journey!
