Integrating S3 with Lambda

S3 in 60 seconds

Simple Storage Service (S3) is the service that allows you to store files cheaply and durably on AWS. In AWS parlance, these files are called objects. You store these objects in containers called buckets.

Unlike disks attached to a server, S3 buckets are not filesystems, so you can’t install an operating system on them, format them, or mount them as partitions on your servers. They are durable: Amazon offers 11 nines of durability (99.999999999%). They are cheap: you can get started storing files at that level of durability for $0.023 per GB per month.

People use it for hosting static websites, to put deployment artefacts for their deployment pipelines, to save logs, or even to share private content.

S3 and Lambda

If you wanted a way to react to changes to these files, you would probably think about polling for changes on a regular schedule and running a function to take action. This could work but it wouldn’t scale. First, you may have to sift through a lot of files across many buckets. You may also be polling needlessly when no files are being changed. Ideally, you would want to execute your functions only when some interesting event happens on S3, and only for objects that you care about.

Thankfully, AWS provides a way to react to object events at a fairly granular level, and this is what I’m going to focus on in this video. You’re going to learn how to react to events on all objects, but also on only those following certain patterns. You’re going to learn how to avoid certain mistakes I made while making both services work together. Finally, I’ll cover a way to automate all of this (typical LambdaTV style). Sounds good?

Main points

We’ll cover the following:

  1. Setting up notifications

  2. Demo 1: Logging all supported events into a table

  3. Demo 2: Tagging images with detected celebrities and objects

  4. Automating this setup with CloudFormation

  5. Things to keep in mind when working with S3 and Lambda

  6. Use cases

Overview of events

Events we can listen to

Currently, we can listen to 3 types of events:

  1. Objects created. They can be created through several types of API calls (e.g. PUT, POST, COPY).

  2. Objects removed.

  3. Objects lost in RRS (Reduced Redundancy Storage). As a reminder, Amazon offers RRS as a cheaper storage class with a lower level of durability of 4 nines (99.99%). If such objects are lost, AWS can notify you.
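
As a rough illustration of these three families: in a notification configuration they are written with an `s3:` prefix (for example `s3:ObjectCreated:*`), while the `eventName` field of a delivered record drops the prefix (for example `ObjectCreated:Put`). The helper name `event_family` is made up for this sketch.

```python
# Event types as written in a bucket notification configuration.
CONFIG_EVENT_TYPES = [
    "s3:ObjectCreated:*",
    "s3:ObjectRemoved:*",
    "s3:ReducedRedundancyLostObject",
]


def event_family(event_name):
    """Map an event name (e.g. 'ObjectCreated:Put') to one of the three families."""
    # Accept both the prefixed (config) and unprefixed (record) spellings.
    name = event_name[3:] if event_name.startswith("s3:") else event_name
    if name.startswith("ObjectCreated:"):
        return "created"
    if name.startswith("ObjectRemoved:"):
        return "removed"
    if name == "ReducedRedundancyLostObject":
        return "lost"
    return "other"
```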

Choosing files to monitor

You can choose to call a function for any object on any of the above events. You can also narrow this down to just a few objects by specifying object name patterns. This is called Object Key Name Filtering. For example, you can select png files by specifying that you want only object names that have a suffix of .png. Similarly, you can select log files stored under a folder by specifying that you want only object names that have a prefix of "logs/".
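
To make the prefix and suffix rules concrete, here is a small local emulation of how a key is matched. S3 does this matching itself; the function `key_matches` and the sample keys are invented for illustration. Matching is on the literal, case-sensitive key.

```python
def key_matches(key, prefix="", suffix=""):
    """Return True when an object key passes a prefix/suffix filter.

    Empty strings mean "no restriction", mirroring an unset filter rule.
    """
    return key.startswith(prefix) and key.endswith(suffix)


keys = ["logs/2018-01-01.txt", "images/cat.png", "logs/chart.png"]
png_keys = [k for k in keys if key_matches(k, suffix=".png")]
log_keys = [k for k in keys if key_matches(k, prefix="logs/")]
```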

Enabling notifications

You must configure a bucket to send notifications to a destination of your choice. You can have S3 send them to SNS, SQS or Lambda. In this video, I’m going to focus on Lambda (since this channel is called LambdaTV).

Link to doc:

https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
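
The demos below use the console, but the same setting can be applied with boto3's `put_bucket_notification_configuration`. This is a minimal sketch: the helper names and the ARN are placeholders, not part of the original setup.

```python
def lambda_notification(function_arn, events, prefix=None, suffix=None):
    """Build the NotificationConfiguration dict for one Lambda target."""
    config = {"LambdaFunctionArn": function_arn, "Events": list(events)}
    rules = []
    if prefix:
        rules.append({"Name": "prefix", "Value": prefix})
    if suffix:
        rules.append({"Name": "suffix", "Value": suffix})
    if rules:
        config["Filter"] = {"Key": {"FilterRules": rules}}
    return {"LambdaFunctionConfigurations": [config]}


def enable_notifications(bucket, notification):
    """Apply the configuration; this replaces the bucket's existing one."""
    import boto3  # imported lazily so the builder above works without the SDK

    boto3.client("s3").put_bucket_notification_configuration(
        Bucket=bucket, NotificationConfiguration=notification
    )


# Hypothetical ARN, for illustration only.
example = lambda_notification(
    "arn:aws:lambda:us-east-1:123456789012:function:my-function",
    ["s3:ObjectCreated:*"],
    suffix=".png",
)
```

When you do this outside the console, the function also needs a resource policy that allows S3 to invoke it; the console adds that permission for you.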

Demo 1: Logging all events into a table

Suppose I want an easy way to glance at the important events happening on my S3 bucket.

Let me start by showing a simple function that listens to all (supported) events happening in a certain bucket.

Setup steps

Create the DynamoDB table

Create the function and also its IAM role

Use the following IAM policy (note that I’m using a fairly restrictive IAM policy that includes an explicit account number, but you could leave it out):

https://github.com/jeshan/lambdatv/blob/master/s3/example-event-log-iam-policy.json

Use the following code:
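
For readers without the repository at hand, a handler along these lines would do the job. It is a sketch, not the code from the video: the table name, the `id` key attribute and the item layout are all assumptions.

```python
import os
from urllib.parse import unquote_plus

# Hypothetical table name; the real one comes from your own setup.
TABLE_NAME = os.environ.get("TABLE_NAME", "s3-event-log")


def to_items(event):
    """Flatten an S3 notification event into one dict per record."""
    items = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Keys arrive URL-encoded (spaces become '+').
        key = unquote_plus(s3["object"]["key"])
        items.append({
            # Assumes a table keyed on 'id'; adjust to your own key schema.
            "id": "{}#{}".format(record["eventTime"], key),
            "event_name": record["eventName"],
            "event_time": record["eventTime"],
            "bucket": s3["bucket"]["name"],
            "key": key,
        })
    return items


def handler(event, context):
    import boto3  # available by default in the Lambda Python runtime

    table = boto3.resource("dynamodb").Table(TABLE_NAME)
    items = to_items(event)
    for item in items:
        table.put_item(Item=item)
    return len(items)
```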

Create the S3 bucket

Set up the trigger

Set up the event notification on the function page:

Repeat the above for other events you want to listen to, e.g. object removed:

Finally, save it:

Confirm that the save was successful:

Uploading files

Back to the S3 bucket. Let’s upload a couple of files:

Let’s go to the DynamoDB table we created. By the time you get there, the function should already have been triggered:

Check the full contents of the record:

This confirms that our triggers are being executed successfully.

Let’s try overwriting the file. Do you know how S3 treats overwrites? Will it get picked up by our function?

Yes, it will. You will see a PUT on the same object name:

How about renaming a file? How will S3 handle a rename?

S3 will make a copy of the file with a new name and then delete the old one:

By the way, note that S3 supports object versioning, which would let us keep both versions, but that’s outside the scope of this video.

Moving on to an example that’s a bit less boring. How about tagging an image with celebrities and objects?

Demo 2: Tagging images with detected celebrities and objects

In this demo, I am going to label images as they are uploaded. To do this, I’m going to use the Rekognition API. It helps us identify objects and celebrities in images, amongst several other features.

This time I don’t need the function to act on all events; just on objects whose names end with png, jpg or jpeg. Finally, the function will tag the objects with the outputs returned by the Rekognition API.

Setup steps

The function

Create the function and its IAM role

Use the following policy:

https://github.com/jeshan/lambdatv/blob/master/s3/example-image-tag-iam-policy.json

Use this function code:
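
As a rough idea of what such a function can look like (a sketch, not the code from the video), the handler below calls Rekognition's `detect_labels` and `recognize_celebrities` and writes the results with `put_object_tagging`. The tag layout, the `MaxLabels` value and the helper `build_tag_set` are my own assumptions.

```python
from urllib.parse import unquote_plus

MAX_TAGS = 10  # S3 allows at most 10 tags per object


def build_tag_set(labels, celebrities):
    """Turn Rekognition results into an S3 TagSet, celebrities first."""
    tags = []
    for celeb in celebrities:
        tags.append({"Key": celeb["Name"], "Value": "celebrity"})
    for label in labels:
        tags.append({"Key": label["Name"], "Value": "label"})
    # Tag keys must be unique, so drop repeats, then cap at the S3 limit.
    # Names with characters that S3 tags do not allow are not handled here.
    seen, unique = set(), []
    for tag in tags:
        if tag["Key"] not in seen:
            seen.add(tag["Key"])
            unique.append(tag)
    return unique[:MAX_TAGS]


def handler(event, context):
    import boto3  # available by default in the Lambda Python runtime

    s3 = boto3.client("s3")
    rekognition = boto3.client("rekognition")
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])
        image = {"S3Object": {"Bucket": bucket, "Name": key}}
        labels = rekognition.detect_labels(Image=image, MaxLabels=5)["Labels"]
        celebs = rekognition.recognize_celebrities(Image=image)["CelebrityFaces"]
        s3.put_object_tagging(
            Bucket=bucket,
            Key=key,
            Tagging={"TagSet": build_tag_set(labels, celebs)},
        )
```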

Create the bucket

Next, create the bucket. Note that if you’re using my example policy above, then make the name match what’s in there:

Set up the trigger

Next, back in the Lambda console, add the S3 events. Note that this time, I’ll specify a few suffixes so that the function is only triggered for files with certain image extensions.

Also, take the opportunity to increase the resources allocated to this function, in particular its timeout. The 2 Rekognition API calls that we need take more than the default 3 seconds to run:

Make sure to save everything afterwards:

Uploading files

Next, I’m going to upload this image from Unsplash:

Then this image from HuffingtonPost showing a few leaders:

Source: https://www.huffingtonpost.com/2013/06/20/mulberry-bags-g8-leadersn3473341.html

After a few seconds, check the properties tab for each image. You’ll see that tags have been added:

First the leaders one:

And also the mountain image from Unsplash:

That concludes this example. What I’ve achieved here is that every image uploaded to this bucket is run through the Rekognition API and tagged with its output.

Honourable mention: CloudWatch Events

All this time, I’ve been talking about object events. But how about bucket events, e.g. creating or deleting buckets? What I’ve shown you so far only works with objects. To listen to many more API calls, you can use CloudWatch Events. I’m showing it quickly here, and I’m keeping the full CloudWatch Events discussion for a future episode:

But keep in mind that it relies on CloudTrail, which logs the API calls in your account, so you need to have that enabled first.
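
For reference, an event pattern for bucket-level calls typically looks like the one below. This is a sketch under the assumption that you only care about bucket creation and deletion; the rule name is whatever you choose, and the rule still needs a target (for example a Lambda function) before anything runs.

```python
import json

# Event pattern for bucket-level API calls recorded by CloudTrail.
BUCKET_EVENTS_PATTERN = {
    "source": ["aws.s3"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["s3.amazonaws.com"],
        "eventName": ["CreateBucket", "DeleteBucket"],
    },
}


def create_rule(rule_name):
    """Create (or update) a CloudWatch Events rule using the pattern above."""
    import boto3  # imported lazily so the pattern can be inspected without the SDK

    return boto3.client("events").put_rule(
        Name=rule_name, EventPattern=json.dumps(BUCKET_EVENTS_PATTERN)
    )
```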

Automating all of this

That was a lot of manual steps to get the 2 examples working. In typical LambdaTV style, I’m going to provide a lazy way of setting all of this up. How about a one-step way to fully set up either demo? To achieve this, I’m providing a CloudFormation template for each demo. CloudFormation will set up everything that we need in the right order.

Showing the one for the event log demo here:

Link on GitHub:

https://github.com/jeshan/lambdatv/blob/master/s3/

You’ll find the source code for the functions in the templates there as well.

To deploy this template from the s3 directory on GitHub:

Then upload some files. The CloudFormation stack will append an account number to the bucket names to reduce the likelihood of them colliding with another user’s buckets.
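
One way to do this from Python is sketched below using boto3's `create_stack`. The stack name and template path are placeholders you supply yourself, and `CAPABILITY_IAM` assumes the template creates IAM roles without custom names.

```python
def stack_request(stack_name, template_body):
    """Build the arguments for CloudFormation's CreateStack call."""
    return {
        "StackName": stack_name,
        "TemplateBody": template_body,
        # Assumption: the template creates IAM roles for the functions.
        "Capabilities": ["CAPABILITY_IAM"],
    }


def deploy(stack_name, template_path):
    """Create the stack from a local template file and wait for completion."""
    import boto3  # imported lazily so stack_request works without the SDK

    with open(template_path) as f:
        body = f.read()
    cloudformation = boto3.client("cloudformation")
    cloudformation.create_stack(**stack_request(stack_name, body))
    # Block until the stack has finished creating.
    cloudformation.get_waiter("stack_create_complete").wait(StackName=stack_name)
```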

There’s also the bucket events example at:

https://github.com/jeshan/lambdatv/blob/master/s3/s3-bucket-events.yaml

Use cases

Here are some use cases I can think of for using both services together. Use them as inspiration to come up with your own ideas.

  1. I showed you how to call Rekognition. You could also integrate with other AI services available:

For example, how about uploading a text file and having Amazon Polly read it in a natural voice?

Check out this example on using Polly to convert blogs into podcasts:

https://github.com/aws-samples/amazon-polly-sample

  2. You could use Lambda and S3 for your testing needs. Upload a load test file and run JMeter on it. Or run your UI tests massively in parallel. Check out this blog post on how a devops team ran their UI tests:

https://aws.amazon.com/blogs/devops/ui-testing-at-scale-with-aws-lambda/

  3. Send people some files. When your file is uploaded, your function could generate a download link and send it to your recipient via email using Amazon SES. To learn how to work with emails, check the last episode of LambdaTV.
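
The file-sharing idea could be sketched as follows, using a presigned URL from S3 and SES's `send_email`. The subject line, the one-hour expiry and the helper names are illustrative choices, and the sender address would need to be verified in SES.

```python
def build_email(sender, recipient, download_url):
    """Build the arguments for SES's SendEmail call."""
    return {
        "Source": sender,
        "Destination": {"ToAddresses": [recipient]},
        "Message": {
            "Subject": {"Data": "Your file is ready"},
            "Body": {"Text": {"Data": "Download it here: " + download_url}},
        },
    }


def share(bucket, key, sender, recipient, expires_in=3600):
    """Email a time-limited download link for one object."""
    import boto3  # imported lazily so build_email works without the SDK

    url = boto3.client("s3").generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,  # seconds the link stays valid
    )
    boto3.client("ses").send_email(**build_email(sender, recipient, url))
    return url
```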

Things to keep in mind

  1. S3 sends a given event directly to 1 target. Use the other integrations (SNS or SQS) to run multiple functions, or have one function that fans out to other functions.

  2. Events are not fired for failed operations. So if an upload to S3 fails, the trigger won’t fire.

  3. There are certain rules to follow when filtering for objects, e.g. you can’t have overlapping filtering rules targeting different functions for the same event type on a bucket. Check the documentation for the rules.

  4. Beware of recursion! If your target function creates a file in the same bucket, then this file will trigger the function again, which will cause an infinite loop (and a huge AWS bill!).

I made a similar recursion mistake once and it cost me $200. This guy fared almost as badly as I did:

https://sourcebox.be/blog/2017/08/07/serverless-a-lesson-learned-the-hard-way/

To counteract this, decrease the number of concurrent invocations allowed for the function and also set billing alerts.
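
A minimal sketch of the concurrency cap, using Lambda's `put_function_concurrency`, is shown below. The prefix check is an extra precaution of my own rather than something covered above: it assumes the function writes its output under a dedicated prefix (here the made-up `processed/`) and ignores events for those keys.

```python
OUTPUT_PREFIX = "processed/"  # hypothetical prefix the function writes under


def should_process(key, output_prefix=OUTPUT_PREFIX):
    """Skip objects the function wrote itself, so it cannot re-trigger forever."""
    return not key.startswith(output_prefix)


def cap_concurrency(function_name, limit=1):
    """Reserve a small concurrency limit so a runaway loop stays cheap."""
    import boto3  # imported lazily so should_process works without the SDK

    boto3.client("lambda").put_function_concurrency(
        FunctionName=function_name, ReservedConcurrentExecutions=limit
    )
```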

Show me the code

All the code discussed is on GitHub, as I mentioned:

https://github.com/jeshan/lambdatv/blob/master/s3/

Ending

If you like such automations on AWS using Lambda, then I invite you to subscribe. I’ll be posting many more of them soon. Thanks for stopping by!
