Background

One day I found that we had lost several days of monitoring data for some instances, so I logged into one of them and ran the CloudWatch monitoring script manually to send some data to AWS. However, I got this message back from AWS:

The security token included in the request is expired

Security credentials are required to use AWS services and resources, and managing them through IAM roles is very convenient, which is how we have always done it.

The credentials delivered through an IAM role are temporary and valid only for a limited time. AWS rotates them automatically and makes new credentials available at least 5 minutes before the old ones expire.

Locate the Issue

I checked the security credentials through the instance metadata endpoint:

curl http://169.254.169.254/latest/meta-data/iam/security-credentials/<role_name>

As it turned out, the response showed that both Expiration and LastUpdated were already days in the past, so the credentials had not been refreshed.
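The same check can be scripted. The following is a minimal sketch, assuming the body returned by the curl command above has been captured as a string. The function name `credential_status`, the three status labels, and the sample timestamps are made up for illustration. Only the `Expiration` field and the 5-minute refresh margin come from the text above.

```python
import json
from datetime import datetime, timedelta, timezone

# Timestamp format used in the response, e.g. "2020-09-01T08:00:00Z".
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_timestamp(value):
    """Parse a metadata timestamp into a timezone-aware UTC datetime."""
    return datetime.strptime(value, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)


def credential_status(response_body, now):
    """Classify role credentials as 'fresh', 'refresh overdue', or 'expired'.

    response_body: JSON text returned by the security-credentials endpoint.
    now: timezone-aware datetime to compare against.
    """
    expiration = parse_timestamp(json.loads(response_body)["Expiration"])
    if now >= expiration:
        return "expired"
    # New credentials should appear at least five minutes before expiry,
    # so anything closer to expiry than that suggests a missed refresh.
    if expiration - now < timedelta(minutes=5):
        return "refresh overdue"
    return "fresh"


# Made-up sample response (a real one also carries the key material).
sample = json.dumps({
    "Code": "Success",
    "LastUpdated": "2020-09-01T02:00:00Z",
    "Expiration": "2020-09-01T08:00:00Z",
})
print(credential_status(sample, datetime(2020, 9, 9, tzinfo=timezone.utc)))  # prints "expired"
```

Passing `now` explicitly keeps the check easy to test. In a real script it would be `datetime.now(timezone.utc)`.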

Something’s wrong with the IAM role.

Next, I tried to detach the role from the instance.

image2020-9-9_10-41-29.png

image2020-9-9_10-40-32.png

AWS reported that the detachment was successful.

Then I tried to attach a new role to my instance, and I got:

The association <AssociationId> is not the active association.

Let’s check the IAM instance profile associations using:

aws ec2 describe-iam-instance-profile-associations

The response showed the state of the corresponding association: "State": "disassociating".

So the association was stuck in the disassociating state.
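When an account has many associations, it helps to filter the output for the instance in question. Here is a minimal sketch, assuming the JSON output of the describe command above has been saved to a string. The function name `transitional_associations` and the placeholder IDs in the sample are hypothetical. The code only flags associations in a transitional state. Whether one is actually stuck still depends on how long it stays there.

```python
import json

# States in which an association is still changing. The settled states are
# "associated" and "disassociated".
TRANSITIONAL_STATES = {"associating", "disassociating"}


def transitional_associations(cli_output, instance_id):
    """Return (AssociationId, State) pairs for one instance that are not settled."""
    associations = json.loads(cli_output)["IamInstanceProfileAssociations"]
    return [
        (a["AssociationId"], a["State"])
        for a in associations
        if a["InstanceId"] == instance_id and a["State"] in TRANSITIONAL_STATES
    ]


# Made-up sample output with placeholder IDs.
sample_output = json.dumps({
    "IamInstanceProfileAssociations": [
        {"AssociationId": "iip-assoc-old", "InstanceId": "i-example",
         "State": "disassociating"},
        {"AssociationId": "iip-assoc-other", "InstanceId": "i-other",
         "State": "associated"},
    ]
})
print(transitional_associations(sample_output, "i-example"))
```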

What about associating a new role regardless, using:

aws ec2 associate-iam-instance-profile 

The response was OK.

I checked the IAM instance profile associations again:

aws ec2 describe-iam-instance-profile-associations

The response showed that the state of the new association was associating, and it stayed that way for hours. It was stuck as well.

I also tried to replace the stuck association using:
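Rather than re-running the describe command by hand for hours, the wait can be bounded with a timeout. This is a minimal sketch, not something I ran at the time: `wait_for_state` is a hypothetical helper, and `fetch_state` stands for whatever callable returns the current state string (for example, one that runs the describe command and reads "State"). The clock and sleep functions are parameters only so that the example can run instantly.

```python
import time


def wait_for_state(fetch_state, target, timeout_s, interval_s=15.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll fetch_state() until it returns target or timeout_s elapses.

    Returns (reached, last_state).
    """
    deadline = clock() + timeout_s
    state = fetch_state()
    while state != target and clock() < deadline:
        sleep(interval_s)
        state = fetch_state()
    return state == target, state


# Demonstration with a stub that never leaves "associating" and a fake clock,
# so the example finishes immediately instead of really waiting.
fake_time = iter(range(1000))
reached, last = wait_for_state(
    fetch_state=lambda: "associating",
    target="associated",
    timeout_s=5,
    interval_s=1,
    clock=lambda: next(fake_time),
    sleep=lambda seconds: None,
)
print(reached, last)  # prints "False associating"
```

If the helper returns False after a generous timeout, the association can be treated as stuck and it is time to move on to another approach.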

aws ec2 replace-iam-instance-profile-association

I got:

The association <AssociationId> is not the active association.

WTF?

The remaining option seemed to be replacing the instance with a working one.

So I tried to create an image from the instance I was about to replace.

image2020-9-9_11-30-59.png

Unfortunately, in the list of AMIs, the status of my newly created image was stuck in pending, and it stayed that way after a few retries.

It seemed reasonable to assume that the same IAM problem was causing all of these failures.

Solution

Since creating an image from the instance itself did not work, let’s try creating a snapshot of its EBS volume instead.

  1. In EBS Volumes, select the corresponding volume and choose Create Snapshot.

    image2020-9-9_11-54-34.png

    Check that the volume belongs to the right instance ID.

    image2020-9-9_11-56-8.png

    image2020-9-9_11-44-57.png

  2. Go to Snapshots, select the snapshot just created and create an image from it.

    image2020-9-9_11-57-28.png

    Check that the snapshot shows the corresponding volume ID.

    image2020-9-9_11-58-57.png

    image2020-9-9_11-47-13.png

    Once the new image has been created, launch a new instance from it.
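I did the steps above in the console, but they map roughly onto three AWS CLI calls. The sketch below only builds the argument lists and does not run anything. The helper name, the image name, the root device name /dev/xvda, the x86_64 architecture, the hvm virtualization type, and the instance type are all assumptions that must be adjusted to match the original instance. In practice the snapshot ID and image ID are not known in advance: each one comes from the output of the previous command.

```python
def snapshot_rebuild_commands(volume_id, snapshot_id, image_id,
                              image_name="recovered-instance",
                              root_device="/dev/xvda",
                              instance_type="t3.micro"):
    """Return the three AWS CLI invocations as argument lists.

    All default values are illustrative assumptions, not values from the
    original incident.
    """
    # Shorthand block device mapping that points the root device at the snapshot.
    mapping = "DeviceName={},Ebs={{SnapshotId={}}}".format(root_device, snapshot_id)
    return [
        # Step 1: snapshot the EBS volume of the broken instance.
        ["aws", "ec2", "create-snapshot", "--volume-id", volume_id,
         "--description", "Snapshot of " + volume_id],
        # Step 2: register an image backed by that snapshot.
        ["aws", "ec2", "register-image", "--name", image_name,
         "--architecture", "x86_64", "--virtualization-type", "hvm",
         "--root-device-name", root_device,
         "--block-device-mappings", mapping],
        # Step 3: launch a replacement instance from the new image.
        ["aws", "ec2", "run-instances", "--image-id", image_id,
         "--instance-type", instance_type],
    ]


# Placeholder IDs for illustration only.
for command in snapshot_rebuild_commands("vol-example", "snap-example", "ami-example"):
    print(" ".join(command))
```

Each list can be handed to `subprocess.run` as-is, which avoids shell quoting problems with the braces in the block device mapping.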
