Intra-Cloud App Disruption Risks

Automating application deployments into the ‘cloud’ is not always as simple as it should be. Depending on how you approach the problem, you may need to delegate access to components in a way that increases the risk of unauthorised changes. If you’re doing this in Amazon Web Services (AWS) you may have heard of CodeDeploy, one of the methods AWS provides to push application code into their cloud environment. AWS has a number of mechanisms to control and limit what actions can be performed by administrators, and by compute instances themselves. Unfortunately, not all of these systems allow granular control, and some may leave your applications exposed.

Auto Scaling is one of the AWS subsystems that can be used to assist with automating application deployments. Unfortunately, Auto Scaling’s access control mechanisms do not allow granular resource restriction. The risk this introduces is that if you delegate Auto Scaling permissions to AWS resources, those resources can change ALL of your Auto Scaling settings across your entire account.

The rest of this article will cover the following:

  • How CodeDeploy works;
  • What AWS Identity and Access Management (IAM) configurations are required;
  • How to deploy apps into AWS with CodeDeploy;
  • How to integrate load balancers into your deployment;
  • The risks this introduces; and,
  • How to manage these issues.

Take the following for example. Say you want to use CodeDeploy to automagically bundle up an application on your developer workstation, push it to EC2 (compute) instances, and have those instances automatically enrol into an AWS load balancer (also known as an Elastic Load Balancer, or ELB). Let’s also assume that the management and scaling of those EC2 instances is wrapped up in an Auto Scaling configuration. There are a lot of moving parts in this model, so refer to the diagram below for an overview.

[Figure: overview of the CodeDeploy, Auto Scaling and ELB deployment flow]

It’s handy that the documentation outlines patterns to achieve this, as it’s a fairly common requirement in the CodeDeploy world. A central configuration requirement for this to work is to set up AWS’ Identity and Access Management, or IAM, correctly. First, you need an IAM Role (let’s call it APP-DEPLOY-ROLE) that has permissions to describe instances and to execute certain Auto Scaling actions. This role is attached to your CodeDeploy deployment group so that it can see which instances are available for deployment, and place instances in and out of Auto Scaling states. From the doco, the permissions are defined as:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "autoscaling:PutLifecycleHook",
        "autoscaling:DeleteLifecycleHook",
        "autoscaling:RecordLifecycleActionHeartbeat",
        "autoscaling:CompleteLifecycleAction",
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:PutInstanceInStandby",
        "autoscaling:PutInstanceInService",
        "ec2:Describe*"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}
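
For completeness, creating this role with the CLI might look like the following sketch. The trust policy principal and file names here are assumptions for illustration, not taken from the article:

```shell
# Hypothetical: create APP-DEPLOY-ROLE with a trust policy that lets the
# CodeDeploy service assume it, then attach the permissions shown above
# (saved locally as deploy-permissions.json).
cat > codedeploy-trust.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "codedeploy.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

aws iam create-role \
    --role-name APP-DEPLOY-ROLE \
    --assume-role-policy-document file://codedeploy-trust.json

aws iam put-role-policy \
    --role-name APP-DEPLOY-ROLE \
    --policy-name APP-DEPLOY-PERMISSIONS \
    --policy-document file://deploy-permissions.json
```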

The second IAM role is an EC2 instance profile that gets assigned to instances when they’re created and spun up. For the moment, let’s just call this role APP-EC2-ROLE. The default configuration in the documentation primarily defines permissions to allow the instances to access your S3 application code bucket (AppCodeBucket), plus the default buckets where the CodeDeploy agent lives. For example:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:Get*",
        "s3:List*"
      ],
      "Resource": [
        "arn:aws:s3:::AppCodeBucket/*",
        "arn:aws:s3:::aws-codedeploy-us-west-2/*",
        "arn:aws:s3:::aws-codedeploy-us-east-1/*"        
      ]
    }
  ]
}

Now, for a moment ignore the Auto Scaling groups and load balancers, and assume you have a few instances already spun up with the name “APPSERVER” and the CodeDeploy agent installed; a simple CodeDeploy process looks like this (if you use the AWS CLI):

  1. Create a CodeDeploy application

    # aws deploy create-application --application-name MYAPP

  2. Create a CodeDeploy deployment group associated with the app from #1 and the name of the instances

    # aws deploy create-deployment-group --application-name MYAPP --deployment-group-name MYAPP-DEP-GRP --service-role-arn arn:aws:iam::123456789:role/APP-DEPLOY-ROLE --ec2-tag-filters Key=Name,Value=APPSERVER,Type=KEY_AND_VALUE

  3. Push the app’s code from your current working directory into S3, and allocate it to the app from #1

    # aws deploy push --application-name MYAPP --s3-location s3://AppCodeBucket/APP.zip ...plus the rest of the configuration

  4. Create a deployment against the deployment group from #2 using the code uploaded in #3 – this will actually tell the instances to pull the code from S3 and start executing various scripts to prepare the application, and to start the various services etc

    # aws deploy create-deployment --application-name MYAPP --deployment-group-name MYAPP-DEP-GRP --s3-location bucket=AppCodeBucket,key=APP.zip,bundleType=zip,eTag="tag" ...
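
Note that step 4 is asynchronous: create-deployment returns immediately with a deployment ID. A hedged sketch of waiting for the outcome, reusing the placeholder names from above:

```shell
# Capture the deployment ID from step 4 and wait for the rollout to finish.
DEPLOYMENT_ID=$(aws deploy create-deployment \
    --application-name MYAPP \
    --deployment-group-name MYAPP-DEP-GRP \
    --s3-location bucket=AppCodeBucket,key=APP.zip,bundleType=zip \
    --query 'deploymentId' --output text)

# Blocks until the deployment succeeds, or exits non-zero on failure.
aws deploy wait deployment-successful --deployment-id "$DEPLOYMENT_ID"

# Confirm the final status.
aws deploy get-deployment --deployment-id "$DEPLOYMENT_ID" \
    --query 'deploymentInfo.status' --output text
```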

As touched on in step 4, when a deployment is created, instances defined in the deployment group will retrieve the code from S3 and then execute scripts from within the uploaded bundle. An example of this is documented, but the idea is that you place a ‘scripts’ folder in your application’s code containing various bash scripts, and tie them together with an appspec.yml file in the root folder of your app. An example appspec.yml file looks like this:

version: 0.0
os: linux
files:
  - source: /
    destination: /var/www/my_app
hooks:
  BeforeInstall:
    - location: scripts/delete_content.sh
      timeout: 300
      runas: root
  AfterInstall:
    - location: scripts/bundle.sh
      timeout: 400
      runas: root
  ApplicationStart:
    - location: scripts/start_server.sh
      timeout: 300
      runas: root
  ApplicationStop:
    - location: scripts/stop_server.sh
      timeout: 300
      runas: root

After the instance has fetched the code, it will execute the logic defined in ApplicationStop, then BeforeInstall, then AfterInstall, and finally ApplicationStart. This allows you to do things like install necessary packages, change file permissions, copy configuration files, configure Apache, and so on.
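
For illustration, a minimal BeforeInstall-style hook in the spirit of the delete_content.sh referenced above might look like this. The function name and the fallback path are assumptions, chosen so the sketch runs without root:

```shell
#!/bin/bash
# Hypothetical sketch of a BeforeInstall hook: remove any files left over
# from the previous revision so the new bundle lands in a clean directory.
clear_deploy_dir() {
    local dir="$1"
    mkdir -p "$dir"
    rm -rf "${dir:?}"/*    # ${dir:?} aborts if empty, guarding against 'rm -rf /*'
    echo "Cleared $dir"
}

# In the real hook this would be the appspec destination (/var/www/my_app);
# a temporary path is used here so the sketch is runnable unprivileged.
clear_deploy_dir "${1:-/tmp/my_app_stage}"
```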

Integrating the process with Auto Scaling and Elastic Load Balancing requires a few more steps, and a modification of the IAM permissions applied to the instances. These requirements are captured in the AWS documentation. The key IAM change is to add elasticloadbalancing:* and autoscaling:* permissions to APP-EC2-ROLE, so it now looks like this:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:Get*",
        "s3:List*"
      ],
      "Resource": [
        "arn:aws:s3:::AppCodeBucket/*",
        "arn:aws:s3:::aws-codedeploy-us-west-2/*",
        "arn:aws:s3:::aws-codedeploy-us-east-1/*"        
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "elasticloadbalancing:*",
        "autoscaling:*"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}

The second change is to include additional scripts to be executed through the CodeDeploy lifecycle. These scripts are available on GitHub. You then need to modify your appspec.yml file to include their register and deregister scripts:

version: 0.0
os: linux
files:
  - source: /
    destination: /var/www/my_app
hooks:
  BeforeInstall:
    - location: scripts/delete_content.sh
      timeout: 300
      runas: root
  AfterInstall:
    - location: scripts/bundle.sh
      timeout: 400
      runas: root
  ApplicationStart:
    - location: scripts/start_server.sh
      timeout: 300
      runas: root
    - location: scripts/register_with_elb.sh
      timeout: 300
      runas: root
  ApplicationStop:
    - location: scripts/deregister_from_elb.sh
      timeout: 300
      runas: root
    - location: scripts/stop_server.sh
      timeout: 300
      runas: root

With these in place, the deployment process is slightly different too, as we now include the creation of the ELB and the Auto Scaling configuration and groups:

  1. Create your load balancer

    # aws elb create-load-balancer --load-balancer-name APP-LB ...plus the rest of the configuration

  2. Create an Auto Scaling launch configuration referring to the EC2 instance profile defined above

    # aws autoscaling create-launch-configuration --launch-configuration-name APP-AS-CFG --iam-instance-profile APP-EC2-ROLE ...plus the rest of the configuration

  3. Create an Auto Scaling Group that refers to the launch configuration from #2 and the load balancer from #1

    # aws autoscaling create-auto-scaling-group --auto-scaling-group-name APP-AS-GRP --launch-configuration-name APP-AS-CFG --load-balancer-names APP-LB --min-size 1 --max-size 1 --desired-capacity 1 ...plus the rest of the configuration

    • At this point, new instances will be spun up and added to the load balancer, but as there’s no running web services the load balancer’s health checks will fail.
  4. Create a CodeDeploy application

    # aws deploy create-application --application-name MYAPP

  5. Create a CodeDeploy deployment group associated with the app from #4, this is slightly different to before. Instead of defining a filter for instances, you associate it with your Auto Scaling group

    # aws deploy create-deployment-group --application-name MYAPP --auto-scaling-groups APP-AS-GRP --deployment-group-name MYAPP-DEP-GRP --service-role-arn arn:aws:iam::123456789:role/APP-DEPLOY-ROLE ...

  6. Push the app’s code from your current working directory into S3, and allocate it to the app from #4

    # aws deploy push --application-name MYAPP --s3-location s3://AppCodeBucket/APP.zip ...

  7. Create a deployment against the deployment group from #5 using the code uploaded in #6 – this will actually tell the instances to pull the code from S3 and start executing various scripts to prepare the application, and to start the various services etc

    # aws deploy create-deployment --application-name MYAPP --deployment-group-name MYAPP-DEP-GRP --s3-location bucket=AppCodeBucket,key=APP.zip,bundleType=zip,eTag="tag" ...

This time, after the application code is installed and starting, the instance will start its web services and then run the register_with_elb.sh script. As this configuration uses Auto Scaling groups, the script uses the appropriate Auto Scaling actions to shift the instance’s lifecycle state in and out of Standby. An instance in Standby is not associated with an ELB; as soon as it exits Standby, it is associated with the ELB again. This can be seen in the following extracts from register_with_elb.sh:

msg "Checking if instance $INSTANCE_ID is part of an AutoScaling group"
asg=$(autoscaling_group_name $INSTANCE_ID)
if [ $? == 0 -a -n "$asg" ]; then
    msg "Found AutoScaling group for instance $INSTANCE_ID: $asg"

    msg "Attempting to move instance out of Standby"
    autoscaling_exit_standby $INSTANCE_ID $asg # Will set $? to 0 if successful
    if [ $? != 0 ]; then
        error_exit "Failed to move instance out of standby"
    else
        msg "Instance is no longer in Standby"
        exit 0
    fi
fi

common_functions.sh defines the ‘autoscaling_group_name’ and ‘autoscaling_exit_standby’ functions; the latter includes the following logic:

# Usage: autoscaling_exit_standby <EC2 instance ID> <ASG name>
#
#   Attempts to move instance <EC2 instance ID> out of Standby and into
#   InService. Returns 0 if successful.
autoscaling_exit_standby() {
    local instance_id=$1
    local asg_name=$2

    msg "Checking if this instance has already been moved out of Standby state"
    local instance_state=$(get_instance_state_asg $instance_id)
   
    if [ "$instance_state" == "InService" ]; then
        msg "Instance is already InService; nothing to do."
        return 0
    fi

    if [ "$instance_state" == "Pending:Wait" ]; then
        msg "Instance is Pending:Wait; nothing to do."
        return 0
    fi

    msg "Moving instance $instance_id out of Standby"
    $AWS_CLI autoscaling exit-standby \
        --instance-ids $instance_id \
        --auto-scaling-group-name $asg_name
    if [ $? != 0 ]; then
        msg "Failed to put instance $instance_id back into InService for ASG $asg_name."
        return 1
    fi

    msg "Waiting for exit-standby to finish"
    wait_for_state "autoscaling" $instance_id "InService"
    if [ $? != 0 ]; then
        local wait_timeout=$(($WAITER_INTERVAL * $WAITER_ATTEMPTS))
        msg "Instance $instance_id did not make it to InService after $wait_timeout seconds"
        return 1
    fi

    return 0
}

The ‘$AWS_CLI autoscaling exit-standby’ line above is where the instance runs the AWS CLI itself, executing the autoscaling exit-standby command against itself.
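
The wait_for_state call in autoscaling_exit_standby polls until the instance reaches the requested lifecycle state. A self-contained sketch of that polling pattern follows; get_state is a stub standing in for the real get_instance_state_asg, and the interval and attempt counts are assumptions rather than the script’s actual values:

```shell
#!/bin/bash
# Generic retry loop in the style of wait_for_state: keep checking a
# state-reporting function until it returns the expected value, or give up.
WAITER_INTERVAL=${WAITER_INTERVAL:-1}
WAITER_ATTEMPTS=${WAITER_ATTEMPTS:-3}

wait_for_state() {
    local expected=$1 attempt state
    for ((attempt = 1; attempt <= WAITER_ATTEMPTS; attempt++)); do
        state=$(get_state)    # stub for get_instance_state_asg in the real script
        if [ "$state" == "$expected" ]; then
            return 0
        fi
        sleep "$WAITER_INTERVAL"
    done
    return 1
}

# Stub for demonstration: pretend the instance is already InService.
get_state() { echo "InService"; }

if wait_for_state "InService"; then
    echo "Instance reached InService"
fi
```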

Now, recall the changes made to the EC2 instance profile above: we’ve granted these instances permission to ALL the autoscaling commands, via “autoscaling:*”. This allows an instance to make changes (including deletion) to not only its own Auto Scaling group, but any other Auto Scaling group that may exist within your account.

The implication is that if this server is compromised, or someone gains unauthorised access through any other means, not only can they tamper with the server itself, they can make changes to the entire Auto Scaling group, such as deleting it, or reducing the number of instances to zero. This will effectively kill that application.

It doesn’t stop there, though. Because Auto Scaling permissions can’t be constrained through the use of Amazon Resource Names, a compromise of a single server that has been given Auto Scaling permissions allows similar unauthorised changes to any other Auto Scaling group in the account. This limitation is captured in the AWS documentation, which states: “When writing an IAM policy to control access to Auto Scaling actions, you must use “*” as the resource. There are no supported Amazon Resource Names (ARNs) for Auto Scaling resources.”
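
To illustrate the constraint: even the tightest policy you can write for this deployment pattern narrows only the action list, while the resource has to stay wide open. A sketch of what that minimised statement would look like:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:EnterStandby",
        "autoscaling:ExitStandby"
      ],
      "Resource": "*"
    }
  ]
}
```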

Even if you wanted to constrain which Auto Scaling groups a particular instance could modify, you can’t. You can constrain the actions, but for this particular deployment method you need to grant the DescribeAutoScalingInstances, EnterStandby and ExitStandby actions. With those three actions, you can see all the instances associated with Auto Scaling groups, including their instance IDs and the Auto Scaling group names. And with that information, you can command all instances in all Auto Scaling groups to enter Standby, pulling them out of their load balancers and once again disabling the application entirely. Demonstrated in a simple diagram:

[Figure: a compromised instance using its Auto Scaling permissions to pull instances in other Auto Scaling groups out of their load balancers]
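
The enumeration-and-disruption path described above can be sketched with the AWS CLI. This is a hypothetical attacker’s view using only the three permitted actions; obviously, do not run it against an account you care about:

```shell
# Hypothetical abuse sketch: list every instance in every Auto Scaling group
# in the account, then push each one into Standby, detaching it from its ELB.
aws autoscaling describe-auto-scaling-instances \
    --query 'AutoScalingInstances[].[InstanceId,AutoScalingGroupName]' \
    --output text |
while read -r instance_id asg_name; do
    aws autoscaling enter-standby \
        --instance-ids "$instance_id" \
        --auto-scaling-group-name "$asg_name" \
        --should-decrement-desired-capacity
done
```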

In this particular instance, there’s not much more you can do except tighten your general IAM policies and access settings, and always validate the security of the running applications themselves through source-code review, penetration testing and so on. Monitoring also helps, and is critical for providing any sort of incident response capability in these circumstances.

Ideally AWS should provide a way to limit Auto Scaling permissions with more granular ARNs, or other conditional checks. In a perfect world, EC2 instance profiles should only be able to make changes to their own instance, and not other instances, or other Auto Scaling groups. We had a few discussions with AWS Security and they provided the following feedback:

“Please note, the security concern that you have reported is specific to a customer application and / or how an AWS customer has chosen to use an AWS product or service. To be clear, the security concern you have reported cannot be resolved by AWS but must be addressed by the customer, who may not be aware of or be following our recommended security best practices.”

Just remember to layer your defences as much as you can, and be vigilant with auditing your IAM configuration. Where possible, use NAT instances and control back-end access to your EC2 instances as well.
