Advanced · 30 min read · 2025-06-05

AWS Infrastructure Design

Scalable AWS architectures with EC2, ECS, RDS and Lambda: best practices and cost optimization for enterprise cloud infrastructure.

AWS · Cloud Architecture · Infrastructure · Enterprise

🟠 AWS Fundamentals

Amazon Web Services (AWS) is the world's leading cloud computing platform, offering more than 200 services. A well-designed AWS architecture delivers scalability, high availability and cost efficiency.

🌍 Global

33 Regions, 105 Availability Zones

⚡ Scalable

Elastic resource scaling

💰 Pay-as-you-go

Pay only for the resources you actually use

🌐 VPC & Networking

VPC Design Example

# VPC with Terraform
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true
  
  tags = {
    Name        = "production-vpc"
    Environment = "prod"
  }
}

# Public subnets for the load balancer
resource "aws_subnet" "public" {
  count             = length(var.availability_zones)
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.${count.index + 1}.0/24"
  availability_zone = var.availability_zones[count.index]
  
  map_public_ip_on_launch = true
  
  tags = {
    Name = "public-subnet-${count.index + 1}"
    Type = "Public"
  }
}

# Private subnets for the application layer
resource "aws_subnet" "private" {
  count             = length(var.availability_zones)
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.${count.index + 10}.0/24"
  availability_zone = var.availability_zones[count.index]
  
  tags = {
    Name = "private-subnet-${count.index + 1}"
    Type = "Private"
  }
}

# Database Subnets
resource "aws_subnet" "database" {
  count             = length(var.availability_zones)
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.${count.index + 20}.0/24"
  availability_zone = var.availability_zones[count.index]
  
  tags = {
    Name = "database-subnet-${count.index + 1}"
    Type = "Database"
  }
}
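The subnet resources above assume a variable `var.availability_zones` that the snippet does not declare. A minimal sketch of how it could be declared, and how Terraform's built-in `cidrsubnet()` function can derive the same /24 ranges instead of building CIDR strings by hand. The zone names and the `local.*_cidrs` names are illustrative assumptions:

```hcl
# Hypothetical variable: the AZs to spread each subnet tier across (example values)
variable "availability_zones" {
  type    = list(string)
  default = ["eu-central-1a", "eu-central-1b", "eu-central-1c"]
}

locals {
  az_indexes = range(length(var.availability_zones))

  # cidrsubnet("10.0.0.0/16", 8, n) returns "10.0.n.0/24", i.e. the same
  # ranges as the string templates used in the subnet resources above
  public_cidrs   = [for i in local.az_indexes : cidrsubnet(aws_vpc.main.cidr_block, 8, i + 1)]
  private_cidrs  = [for i in local.az_indexes : cidrsubnet(aws_vpc.main.cidr_block, 8, i + 10)]
  database_cidrs = [for i in local.az_indexes : cidrsubnet(aws_vpc.main.cidr_block, 8, i + 20)]
}
```

A subnet would then use, for example, `cidr_block = local.private_cidrs[count.index]`. With offsets 1, 10 and 20 in a /16, each tier has room for nine AZs before it collides with the next tier's range.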

Networking Components

  • Internet Gateway - internet access for public subnets
  • NAT Gateway - outbound-only internet access for private subnets
  • Route Tables - rules that decide where traffic is routed
  • Security Groups - stateful firewall at the instance/network-interface level
  • NACLs - stateless access control at the subnet level
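The VPC snippet creates subnets but none of the components listed above, so on its own the "public" subnets have no route to the internet. A minimal sketch of the missing pieces, assuming one NAT Gateway per AZ and AWS provider v5 (where `aws_eip` takes `domain = "vpc"`); the resource names are illustrative:

```hcl
# Internet Gateway: gives the public subnets a route to the internet
resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id
}

# One Elastic IP and NAT Gateway per AZ keeps outbound traffic AZ-local
resource "aws_eip" "nat" {
  count  = length(var.availability_zones)
  domain = "vpc"
}

resource "aws_nat_gateway" "main" {
  count         = length(var.availability_zones)
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id

  depends_on = [aws_internet_gateway.main]
}

# Public route table: default route to the Internet Gateway
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }
}

resource "aws_route_table_association" "public" {
  count          = length(var.availability_zones)
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}

# Private route tables: default route to the NAT Gateway in the same AZ
resource "aws_route_table" "private" {
  count  = length(var.availability_zones)
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main[count.index].id
  }
}

resource "aws_route_table_association" "private" {
  count          = length(var.availability_zones)
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private[count.index].id
}
```

The database subnets are deliberately left unassociated, so they fall back to the VPC's main route table, which by default only contains the local route. A single shared NAT Gateway would be cheaper, but then outbound traffic from every AZ depends on one AZ.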

Best Practices

  • Multi-AZ design - high availability across Availability Zones
  • Subnet segmentation - separate public, private and database tiers
  • CIDR planning - IP ranges large enough for future growth
  • Security groups - principle of least privilege
  • VPC endpoints - private access to AWS services without traversing the internet
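As an illustration of the VPC endpoint practice, a sketch of an S3 gateway endpoint. It assumes the private subnets use route tables defined as `aws_route_table.private` with `count` (a hypothetical name, since the article does not define route tables) and the `var.aws_region` variable that the ECS example also uses:

```hcl
# Gateway endpoint: S3 traffic from the private subnets stays on the AWS
# network and no longer passes through (or is billed on) the NAT Gateway
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.${var.aws_region}.s3"
  vpc_endpoint_type = "Gateway"

  # Adds a route for the S3 prefix list to each private route table
  route_table_ids = aws_route_table.private[*].id
}
```

Gateway endpoints (available for S3 and DynamoDB) have no hourly charge; most other services need interface endpoints, which are billed per hour and per GB.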

🖥️ EC2 Instances

Auto Scaling Group Setup

# Launch Template
resource "aws_launch_template" "web_server" {
  name_prefix   = "web-server-"
  image_id      = data.aws_ami.amazon_linux.id
  instance_type = "t3.medium"
  
  vpc_security_group_ids = [aws_security_group.web_server.id]
  
  user_data = base64encode(templatefile("${path.module}/userdata.sh", {
    app_version = var.app_version
  }))
  
  tag_specifications {
    resource_type = "instance"
    tags = {
      Name        = "web-server"
      Environment = var.environment
    }
  }
  
  lifecycle {
    create_before_destroy = true
  }
}

# Auto Scaling Group
resource "aws_autoscaling_group" "web_servers" {
  name                = "web-servers-asg"
  vpc_zone_identifier = aws_subnet.private[*].id
  target_group_arns   = [aws_lb_target_group.web_servers.arn]
  health_check_type   = "ELB"
  health_check_grace_period = 300
  
  min_size         = 2
  max_size         = 10
  desired_capacity = 3
  
  launch_template {
    id      = aws_launch_template.web_server.id
    version = "$Latest"
  }
  
  tag {
    key                 = "Name"
    value               = "web-server-asg"
    propagate_at_launch = true
  }
}
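The Auto Scaling group above only defines `min_size`/`max_size`; without a scaling policy it simply holds `desired_capacity` and replaces unhealthy instances. A sketch of a target tracking policy that lets the group scale between those bounds, assuming average CPU is a suitable signal for this workload (the 50 % target is an example value):

```hcl
# Target tracking: the ASG adds or removes instances to keep the group's
# average CPU utilization close to the target value
resource "aws_autoscaling_policy" "web_cpu" {
  name                   = "web-servers-cpu-target"
  autoscaling_group_name = aws_autoscaling_group.web_servers.name
  policy_type            = "TargetTrackingScaling"

  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ASGAverageCPUUtilization"
    }
    target_value = 50.0
  }
}
```

AWS creates and manages the underlying CloudWatch alarms for target tracking policies, so they do not need to be defined separately.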

Instance Types

  • t3/t4g - Burstable Performance
  • m5/m6i - General Purpose
  • c5/c6i - Compute Optimized
  • r5/r6i - Memory Optimized

Cost Optimization

  • Reserved Instances - 1- or 3-year commitment
  • Spot Instances - up to 90% savings
  • Savings Plans - flexible spend commitments
  • Right-sizing - matching instance size to the actual load
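Spot savings are usually captured through a mixed instances policy that blends On-Demand and Spot capacity in one group. A minimal sketch of a second, hypothetical Auto Scaling group (`workers`) for interruption-tolerant background jobs, reusing the launch template from above; the instance types, capacities and percentages are illustrative assumptions:

```hcl
resource "aws_autoscaling_group" "workers" {
  name                = "workers-asg"
  vpc_zone_identifier = aws_subnet.private[*].id

  min_size         = 1
  max_size         = 10
  desired_capacity = 2

  mixed_instances_policy {
    instances_distribution {
      # Always keep one On-Demand instance; above that, 25 % On-Demand, 75 % Spot
      on_demand_base_capacity                  = 1
      on_demand_percentage_above_base_capacity = 25
      spot_allocation_strategy                 = "price-capacity-optimized"
    }

    launch_template {
      launch_template_specification {
        launch_template_id = aws_launch_template.web_server.id
        version            = "$Latest"
      }

      # Several similar instance types give Spot more capacity pools to draw from
      override {
        instance_type = "t3.medium"
      }
      override {
        instance_type = "t3a.medium"
      }
    }
  }
}
```

Spot instances can be reclaimed with a two-minute warning, so this pattern fits stateless or restartable work rather than the stateful parts of the stack.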

Monitoring

  • CloudWatch - Metrics & Alarms
  • SSM Agent - Patch Management
  • Inspector - Security Assessment
  • Systems Manager - Operations
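A sketch of the CloudWatch side of this list: an alarm on the Auto Scaling group's average CPU that notifies an SNS topic. The topic name and the 80 % / 15-minute thresholds are illustrative assumptions:

```hcl
# Hypothetical SNS topic that receives alarm notifications
resource "aws_sns_topic" "ops_alerts" {
  name = "ops-alerts"
}

# Fires when the group's average CPU stays above 80 % for 3 x 5 minutes
resource "aws_cloudwatch_metric_alarm" "web_cpu_high" {
  alarm_name          = "web-servers-cpu-high"
  namespace           = "AWS/EC2"
  metric_name         = "CPUUtilization"
  statistic           = "Average"
  period              = 300
  evaluation_periods  = 3
  threshold           = 80
  comparison_operator = "GreaterThanThreshold"

  dimensions = {
    AutoScalingGroupName = aws_autoscaling_group.web_servers.name
  }

  alarm_actions = [aws_sns_topic.ops_alerts.arn]
}
```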

🐳 ECS & Fargate

ECS Service with Fargate

# ECS Cluster
resource "aws_ecs_cluster" "main" {
  name = "production-cluster"
  
  setting {
    name  = "containerInsights"
    value = "enabled"
  }
}

# Task Definition
resource "aws_ecs_task_definition" "web_app" {
  family                   = "web-app"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = "512"
  memory                   = "1024"
  execution_role_arn       = aws_iam_role.ecs_execution.arn
  task_role_arn           = aws_iam_role.ecs_task.arn
  
  container_definitions = jsonencode([
    {
      name  = "web-app"
      # Pin an immutable image tag instead of "latest" so deployments are reproducible
      image = "${var.ecr_repository_url}:${var.app_version}"
      
      portMappings = [
        {
          containerPort = 3000
          protocol      = "tcp"
        }
      ]
      
      environment = [
        {
          name  = "NODE_ENV"
          value = "production"
        }
      ]
      
      secrets = [
        {
          name      = "DATABASE_URL"
          valueFrom = aws_ssm_parameter.database_url.arn
        }
      ]
      
      logConfiguration = {
        logDriver = "awslogs"
        options = {
          awslogs-group         = aws_cloudwatch_log_group.app.name
          awslogs-region        = var.aws_region
          awslogs-stream-prefix = "ecs"
        }
      }
      
      healthCheck = {
        command     = ["CMD-SHELL", "curl -f http://localhost:3000/health || exit 1"]
        interval    = 30
        timeout     = 5
        retries     = 3
        startPeriod = 60
      }
    }
  ])
}

# ECS Service
resource "aws_ecs_service" "web_app" {
  name            = "web-app-service"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.web_app.arn
  desired_count   = 3
  launch_type     = "FARGATE"
  
  network_configuration {
    subnets          = aws_subnet.private[*].id
    security_groups  = [aws_security_group.ecs_service.id]
    assign_public_ip = false
  }
  
  load_balancer {
    target_group_arn = aws_lb_target_group.web_app.arn
    container_name   = "web-app"
    container_port   = 3000
  }
  
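  # Rolling deployment: start new tasks before stopping the old ones
  deployment_maximum_percent         = 200
  deployment_minimum_healthy_percent = 100

  # Roll back automatically if the new tasks never become healthy
  deployment_circuit_breaker {
    enable   = true
    rollback = true
  }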
  
  depends_on = [aws_lb_listener.web_app]
}
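The ECS service runs a fixed `desired_count = 3`. A sketch of Application Auto Scaling for the service, assuming CPU-based target tracking suits this workload; the 2 to 10 task range and the 60 % target are example values:

```hcl
# Register the service's DesiredCount as a scalable target (2-10 tasks)
resource "aws_appautoscaling_target" "web_app" {
  service_namespace  = "ecs"
  resource_id        = "service/${aws_ecs_cluster.main.name}/${aws_ecs_service.web_app.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  min_capacity       = 2
  max_capacity       = 10
}

# Target tracking on the average CPU utilization of the service's tasks
resource "aws_appautoscaling_policy" "web_app_cpu" {
  name               = "web-app-cpu-target"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.web_app.service_namespace
  resource_id        = aws_appautoscaling_target.web_app.resource_id
  scalable_dimension = aws_appautoscaling_target.web_app.scalable_dimension

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
    target_value = 60.0
  }
}
```

When scaling is handled this way, adding `lifecycle { ignore_changes = [desired_count] }` to `aws_ecs_service` prevents the next `terraform apply` from resetting the scaled task count back to 3.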

🗄️ RDS & Aurora

Aurora PostgreSQL Cluster

# Aurora Subnet Group
resource "aws_db_subnet_group" "aurora" {
  name       = "aurora-subnet-group"
  subnet_ids = aws_subnet.database[*].id
  
  tags = {
    Name = "Aurora DB subnet group"
  }
}

# Aurora Cluster
resource "aws_rds_cluster" "aurora_postgresql" {
  cluster_identifier = "aurora-postgresql-cluster"
  engine             = "aurora-postgresql"
  # Pin a version that is still supported; list the available ones with
  # aws rds describe-db-engine-versions --engine aurora-postgresql
  engine_version     = "13.7"
  database_name      = var.database_name
  master_username    = var.database_username

  # Let RDS generate the master password and store it in Secrets Manager
  # instead of passing it through Terraform variables and state
  manage_master_user_password = true
  
  vpc_security_group_ids = [aws_security_group.aurora.id]
  db_subnet_group_name   = aws_db_subnet_group.aurora.name
  
  # Automated backups; point-in-time recovery is possible within this retention period
  backup_retention_period = 30
  preferred_backup_window = "03:00-04:00"
  
  # Maintenance (must not overlap the backup window)
  preferred_maintenance_window = "sun:04:00-sun:05:00"
  
  # Encryption at rest
  storage_encrypted = true
  kms_key_id        = aws_kms_key.aurora.arn
  
  # Export PostgreSQL logs to CloudWatch Logs
  enabled_cloudwatch_logs_exports = ["postgresql"]
  
  # Snapshot tagging and protection against accidental deletion
  copy_tags_to_snapshot     = true
  deletion_protection       = true
  final_snapshot_identifier = "aurora-postgresql-final"
  
  tags = {
    Name        = "Aurora PostgreSQL Cluster"
    Environment = var.environment
  }
}

# Aurora Instances
resource "aws_rds_cluster_instance" "aurora_instances" {
  count              = 2
  identifier         = "aurora-instance-${count.index + 1}"
  cluster_identifier = aws_rds_cluster.aurora_postgresql.id
  instance_class     = "db.r6g.large"
  engine             = aws_rds_cluster.aurora_postgresql.engine
  engine_version     = aws_rds_cluster.aurora_postgresql.engine_version
  
  performance_insights_enabled = true
  monitoring_interval          = 60
  monitoring_role_arn         = aws_iam_role.rds_monitoring.arn
  
  tags = {
    Name = "Aurora Instance ${count.index + 1}"
  }
}
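The two instances above form one writer and one reader. A sketch of Aurora Auto Scaling for additional readers, plus outputs for the writer and reader endpoints; the 1 to 4 replica range and the 70 % CPU target are illustrative assumptions:

```hcl
# Scale the number of Aurora reader instances between 1 and 4
resource "aws_appautoscaling_target" "aurora_readers" {
  service_namespace  = "rds"
  resource_id        = "cluster:${aws_rds_cluster.aurora_postgresql.cluster_identifier}"
  scalable_dimension = "rds:cluster:ReadReplicaCount"
  min_capacity       = 1
  max_capacity       = 4
}

# Target tracking on the average CPU utilization of the reader instances
resource "aws_appautoscaling_policy" "aurora_readers_cpu" {
  name               = "aurora-readers-cpu-target"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.aurora_readers.service_namespace
  resource_id        = aws_appautoscaling_target.aurora_readers.resource_id
  scalable_dimension = aws_appautoscaling_target.aurora_readers.scalable_dimension

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "RDSReaderAverageCPUUtilization"
    }
    target_value = 70.0
  }
}

# Writes go to the cluster endpoint; read-only queries can use the reader endpoint
output "aurora_writer_endpoint" {
  value = aws_rds_cluster.aurora_postgresql.endpoint
}

output "aurora_reader_endpoint" {
  value = aws_rds_cluster.aurora_postgresql.reader_endpoint
}
```

Reader instances added this way are created by AWS rather than Terraform, so they do not appear in the Terraform state.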

✅ Aurora Advantages

Performance

Up to 3x the throughput of standard PostgreSQL (according to AWS)

Scalability

Automatic storage scaling up to 128 TiB

Availability

99.99% SLA for Multi-AZ deployments

Backup

Automatic, continuous backups to Amazon S3

🔒 Security & IAM

IAM Roles for ECS Tasks

# ECS Task Execution Role
resource "aws_iam_role" "ecs_execution" {
  name = "ecs-execution-role"
  
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "ecs-tasks.amazonaws.com"
        }
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "ecs_execution" {
  role       = aws_iam_role.ecs_execution.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}

# Custom policy for secrets access
resource "aws_iam_role_policy" "ecs_secrets" {
  name = "ecs-secrets-policy"
  role = aws_iam_role.ecs_execution.id
  
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "ssm:GetParameters",
          "secretsmanager:GetSecretValue",
          "kms:Decrypt"
        ]
        Resource = [
          "arn:aws:ssm:*:*:parameter/myapp/*",
          "arn:aws:secretsmanager:*:*:secret:myapp/*",
          aws_kms_key.app.arn
        ]
      }
    ]
  })
}

# ECS Task Role (Runtime Permissions)
resource "aws_iam_role" "ecs_task" {
  name = "ecs-task-role"
  
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "ecs-tasks.amazonaws.com"
        }
      }
    ]
  })
}

resource "aws_iam_role_policy" "ecs_task_s3" {
  name = "ecs-task-s3-policy"
  role = aws_iam_role.ecs_task.id
  
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "s3:GetObject",
          "s3:PutObject",
          "s3:DeleteObject"
        ]
        Resource = [
          "${aws_s3_bucket.app_uploads.arn}/*"
        ]
      }
    ]
  })
}
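The task definition reads `DATABASE_URL` from `aws_ssm_parameter.database_url`, which the article does not define. A sketch of that parameter as a KMS-encrypted SecureString, assuming a hypothetical sensitive input variable `database_url` and the `aws_kms_key.app` key already referenced in the execution role policy:

```hcl
# Hypothetical input: the connection string, kept out of the code
variable "database_url" {
  type      = string
  sensitive = true
}

# SecureString encrypted with the app's KMS key; the name matches the
# "parameter/myapp/*" resource pattern in the execution role policy
resource "aws_ssm_parameter" "database_url" {
  name   = "/myapp/database-url"
  type   = "SecureString"
  key_id = aws_kms_key.app.arn
  value  = var.database_url
}
```

Note that Terraform stores `value` in plain text in its state file, so the state backend itself must be encrypted and access-controlled.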

🛡️ Security Best Practices

  • Least privilege - grant only the permissions that are actually required
  • Role-based access - IAM roles instead of long-lived access keys
  • Secrets management - AWS Secrets Manager / SSM Parameter Store
  • Encryption - at rest and in transit
  • VPC security - private subnets, security groups, NACLs
  • CloudTrail - API logging for auditing and compliance
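The examples reference `aws_security_group.ecs_service` and `aws_security_group.aurora` without defining them. A sketch of least-privilege security groups that chain the tiers together, where each tier only accepts traffic from the group in front of it; the `alb` group and its HTTPS-only ingress are assumptions:

```hcl
# Load balancer: HTTPS from the internet
resource "aws_security_group" "alb" {
  name   = "alb-sg"
  vpc_id = aws_vpc.main.id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# ECS tasks: container port 3000 only from the load balancer
resource "aws_security_group" "ecs_service" {
  name   = "ecs-service-sg"
  vpc_id = aws_vpc.main.id

  ingress {
    from_port       = 3000
    to_port         = 3000
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
  }

  # Outbound access for ECR image pulls, SSM and the database
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# Aurora: PostgreSQL port only from the ECS tasks
resource "aws_security_group" "aurora" {
  name   = "aurora-sg"
  vpc_id = aws_vpc.main.id

  ingress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.ecs_service.id]
  }
}
```

Referencing security groups instead of CIDR ranges keeps the rules valid when task IP addresses change. Because security groups are stateful, the Aurora group needs no egress rule to answer incoming connections.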

💰 Cost Optimization

Cost Optimization Strategies

  • Reserved Instances - 1- or 3-year commitment for up to 72% savings compared to On-Demand
  • Spot Instances - up to 90% cheaper than On-Demand for interruption-tolerant workloads
  • Auto Scaling - capacity automatically follows demand
  • S3 Intelligent-Tiering - objects move automatically between access tiers
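A sketch of S3 Intelligent-Tiering applied through a lifecycle rule, using the `aws_s3_bucket.app_uploads` bucket referenced in the IAM section; moving every object on day 0 is an illustrative choice:

```hcl
# Transition all objects in the uploads bucket to Intelligent-Tiering
# immediately; S3 then moves them between access tiers based on usage
resource "aws_s3_bucket_lifecycle_configuration" "app_uploads" {
  bucket = aws_s3_bucket.app_uploads.id

  rule {
    id     = "intelligent-tiering"
    status = "Enabled"

    # Empty filter: the rule applies to every object in the bucket
    filter {}

    transition {
      days          = 0
      storage_class = "INTELLIGENT_TIERING"
    }
  }
}
```

Objects smaller than 128 KB are not auto-tiered and are always billed at the Frequent Access rate, so the savings depend on the object size distribution.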

Monitoring & Alerting

  • AWS Cost Explorer - detailed cost analysis
  • Budgets & Alerts - proactive cost control
  • Trusted Advisor - cost optimization recommendations
  • Resource Tagging - cost allocation by project/team
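For consistent resource tagging, the AWS provider's `default_tags` block applies a tag set to every taggable resource it creates, so individual resources no longer need to repeat them. A sketch with hypothetical tag values:

```hcl
provider "aws" {
  region = var.aws_region

  # Added to every taggable resource created through this provider
  default_tags {
    tags = {
      Project     = "webshop"  # hypothetical values
      Team        = "platform"
      CostCenter  = "cc-1234"
      Environment = var.environment
    }
  }
}
```

User-defined tags only show up in Cost Explorer after they are activated as cost allocation tags in the Billing console.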

Cost Budget with Terraform

resource "aws_budgets_budget" "monthly_cost" {
  name         = "monthly-cost-budget"
  budget_type  = "COST"
  limit_amount = "1000"
  limit_unit   = "USD"
  time_unit    = "MONTHLY"
  
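  # Limit the budget to EC2 compute costs; remove this filter
  # to track total account spend instead
  cost_filter {
    name   = "Service"
    values = ["Amazon Elastic Compute Cloud - Compute"]
  }
  
  # Alert when actual spend exceeds 80 % of the limit
  notification {
    comparison_operator        = "GREATER_THAN"
    threshold                  = 80
    threshold_type             = "PERCENTAGE"
    notification_type          = "ACTUAL"
    subscriber_email_addresses = [var.admin_email]
  }
  
  # Alert when the forecast for the month exceeds the limit
  notification {
    comparison_operator        = "GREATER_THAN"
    threshold                  = 100
    threshold_type             = "PERCENTAGE"
    notification_type          = "FORECASTED"
    subscriber_email_addresses = [var.admin_email]
  }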
}

🎯 Summary

A well-designed AWS infrastructure is the foundation for scalable, secure and cost-efficient cloud applications. With the right strategies you achieve:

Goals Achieved

  • ✅ Highly available multi-AZ architecture
  • ✅ Automatic scaling and recovery
  • ✅ Enterprise-grade security
  • ✅ Cost-optimized resource usage

Next Steps

  • 🔄 CI/CD Pipeline Integration
  • 📊 Advanced Monitoring & Alerting
  • 🏗️ Infrastructure as Code (Terraform)
  • 🔧 Performance Optimization
