Verifying S3 uploads


I just archived a bunch of old photo stuff from Aperture into S3/Glacier, and I wanted to make sure everything uploaded correctly before I deleted my local copy.

Here’s a script that compares the checksum of every file in the current directory with the ETag of the matching object in S3. It reports missing files and mismatched checksums. Pass it the bucket name and the key prefix to look under as parameters.

You will need:

  • s3md5 from antespi/s3md5
  • awscli, with valid AWS credentials somewhere it can find them
  • jq, which the script uses to pull the ETag out of the awscli response
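
The script below relies on three external commands (aws, jq and s3md5), so it can help to confirm they are all on your PATH first. This is a minimal sketch of such a check; the function name check_deps is my own invention for illustration and is not part of the script.

```bash
#!/bin/bash
# Sketch: report every required command that is not on PATH.
# check_deps is a hypothetical helper name, not part of compare-bucket.sh.
check_deps() {
        local missing=0 cmd
        for cmd in "$@"; do
                if ! command -v "$cmd" >/dev/null 2>&1; then
                        echo "missing: $cmd" >&2
                        missing=1
                fi
        done
        # Return 0 only if every command was found
        return "$missing"
}
```

Calling check_deps aws jq s3md5 returns non-zero and names the missing tools if anything is absent. It says nothing about credentials; running aws sts get-caller-identity is one way to confirm awscli can actually find them.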

Script:

#!/bin/bash

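missing_s3md5() {
        # Report on stderr so the message is not mixed into normal output
        echo "Can't find/exec s3md5" >&2
        echo "Get it from https://github.com/antespi/s3md5 and put it in your PATH, or export S3MD5 to point at it." >&2
        exit 1
}

# Honour a user-supplied S3MD5, otherwise look it up on PATH
S3MD5=${S3MD5:-$(command -v s3md5)}

[ -x "$S3MD5" ] || missing_s3md5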

if [[ $# -ne 2 ]]; then
        echo "Compare local files by checksum in cwd to a folder in S3"
        echo "usage: $0 BUCKET_NAME PREFIX"
        exit 1
fi

BUCKET=$1
# Strip a trailing slash so "photos/" and "photos" build the same key
PREFIX=${2%/}

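for i in *; do
        # Skip directories and anything else that is not a regular file
        [ -f "$i" ] || continue
        echo -n "Checking $i..."
        # The ETag comes back wrapped in double quotes; strip them
        REMOTE_SUM=$(aws s3api head-object --bucket "$BUCKET" --key "$PREFIX/$i" 2>/dev/null | jq -r '.ETag' | tr -d '"')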
        if [ -z "$REMOTE_SUM" ]; then
                echo "  does not exist in S3 (or the lookup failed)"
                continue
        fi
        # 10 is the multipart chunk size in MB; it should match the part size used for the upload
        LOCAL_SUM=$("$S3MD5" 10 "$i")
        if [[ $LOCAL_SUM = "$REMOTE_SUM" ]]; then
                echo ok
        else
                echo mismatch
        fi
done

e.g. ./compare-bucket.sh my-archive-bucket aperture_photos
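
For the curious, here is a rough sketch of the kind of checksum being compared. It assumes the widely observed (but, as far as I know, not formally guaranteed) behaviour that a multipart ETag is the MD5 of the concatenated binary MD5s of each part, followed by a dash and the part count, and that a file no larger than one part was uploaded in a single PUT and so has a plain MD5 ETag. It also assumes GNU md5sum is available and that the object is not encrypted with SSE-KMS or SSE-C. The function name s3_etag is mine, and this is not a copy of how s3md5 is implemented.

```bash
#!/bin/bash
# Sketch: reproduce an S3-style ETag for a local file.
# Assumptions: fixed part size in MB, GNU md5sum on PATH,
# files no larger than one part were uploaded in a single PUT.
s3_etag() {
        local part_mb=$1 file=$2
        local part_bytes=$(( part_mb * 1024 * 1024 ))
        local size parts i hex digests=""

        # Arithmetic expansion strips any padding some wc builds emit
        size=$(( $(wc -c < "$file") ))

        if (( size <= part_bytes )); then
                # Single-part case: plain MD5 of the whole file
                md5sum < "$file" | cut -d' ' -f1
                return
        fi

        # Number of parts, rounded up
        parts=$(( (size + part_bytes - 1) / part_bytes ))

        for (( i = 0; i < parts; i++ )); do
                # Hex MD5 of part i (the last part may be short)
                hex=$(dd if="$file" bs="$part_bytes" skip="$i" count=1 2>/dev/null | md5sum | cut -d' ' -f1)
                digests+=$hex
        done

        # Turn the hex digests back into raw bytes, hash those, append the part count
        hex=$(printf "$(sed 's/../\\x&/g' <<< "$digests")" | md5sum | cut -d' ' -f1)
        echo "$hex-$parts"
}
```

Something like s3_etag 10 somefile should then print a value in the same format as the ETag that head-object returns, provided the part size matches the one used for the upload.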