Python Extension Modules in AWS Lambda

At AWS re:Invent 2015, AWS Lambda was featured during the main stage keynote, where Dr. Vogels highlighted a number of new features for the service.

All of these features are great and significantly expand what AWS Lambda is capable of. Of course, the lingering question is why Python 2.7 and not 3.5 support? But I'll leave that one for the product team.

Modules

One of the questions that immediately bubbled up in the community was around 3rd party modules. What's the best way to deploy & manage them?

The documentation has the basics down for most modules:

  1. create a virtualenv locally
  2. install the module inside the virtualenv (a command sketch follows the file structure below)
  3. copy the module over to your project
  4. import the module locally
  5. bundle the local module with your Python code
# local file structure
my_project
   my_project.py
   3rd_party_module_A
      3rd_party_module_A.py
   3rd_party_module_B
      3rd_party_module_B.py
   ...
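In practice, the first few steps usually look something like this; the module name and paths here are just placeholders:

# create and activate a virtualenv, then install the module into it
virtualenv ~/envs/my_project
source ~/envs/my_project/bin/activate
pip install some_module

# or, install the module directly into your project folder in one step
pip install some_module -t ~/my_project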

But what about extension modules that need to be compiled for the local system?

Extension Modules

As a quick reminder, an extension module is a module written in C or C++ that can either extend Python or call C or C++ libraries.

Extension modules are very handy when you have specific high-performance requirements or when you want to leverage an existing C/C++ library. There are a ton of high-quality, high-performance libraries already out there, so why not take advantage of them in Python?
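If you want to see what that looks like on disk, a package like lxml (used here purely as an example, installed into the placeholder virtualenv from earlier) ships compiled shared objects alongside its regular Python files:

# compiled extension modules show up as .so files inside the installed package
find ~/envs/my_project/lib/python2.7/site-packages/lxml -name "*.so"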

Building Extension Modules

The challenge with extension modules is that they are compiled specifically for the platform they're built on. You need a build chain available, in addition to pip/Setuptools, to get them configured in your Python environment.

The AWS Lambda team has published the execution environment details in the documentation. So we know that our functions are executed on an Amazon Linux instance with Boto 3 available in addition to the standard library for Python 2.7.

In order to work in AWS Lambda, extension modules need to be compiled for Amazon Linux x86_64.

So if you're building your project on an EC2 instance running Amazon Linux, you're all set. Your code will work on AWS Lambda (probably ;-) ). But if you're developing locally on your MacBook or Windows machine, that's another story (and extension modules on Windows truly are another story altogether).

While you can develop locally with confidence that your Python code will continue to work, when it comes time to deploy to AWS Lambda you're going to have to do some additional work.
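A quick sanity check before you deploy is to look at the compiled pieces of your bundle; the path below is just a placeholder, but anything destined for Lambda should report as a 64-bit Linux ELF binary rather than an OS X Mach-O or Windows DLL:

# inspect any compiled shared objects in the bundle
file my_project/3rd_party_module_A/*.so

# built for Amazon Linux x86_64, you should see something like:
#   ELF 64-bit LSB shared object, x86-64, ...
# a module built locally on OS X will report as a Mach-O binary instead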

Build Pipeline

I like to think of AWS Lambda as a build target. I develop locally and then when it comes time to deploy, I need to ensure that I've built my project for the target environment.

When your project contains an extension module, you're going to have to ensure that you have a version compiled for Amazon Linux x86_64 before you deploy it to the service.

Typically this is going to involve:

  1. Spin up a new Amazon Linux EC2 instance
  2. Run a setup script with admin privileges to install a build chain and pip (this works nicely as a user-data script); a sketch follows this list
  3. Install any modules we need using the syntax:
pip install ______  -t ~/______
  4. Export the modules (from the folder specified after '-t') so you can bundle them with your code
  5. Shut down the instance
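A minimal version of that setup script, assuming all you need is gcc, the Python 2.7 headers, and pip on a stock Amazon Linux AMI, might look like this (add whatever other -devel packages your modules require):

#!/bin/bash
# install a build chain and the Python 2.7 headers on Amazon Linux
yum -y update
yum -y install gcc gcc-c++ python27-devel python27-pip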

This process works for the majority of extension modules where any required libraries are installed on Amazon Linux by default. If the libraries required for the extension module aren't available by default, you're going to have to package them as well.

Once you copy the exported module and any supporting libraries into your code bundle, you can zip it, and use it in AWS Lambda.
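With everything copied into one folder, the last two steps are a zip and an upload; the function and file names here are placeholders, and you could just as easily upload the zip through the console:

# bundle the code and its dependencies (files need to sit at the root of the zip)
cd my_project
zip -r ../my_project.zip .

# push the new package to an existing function
aws lambda update-function-code --function-name my_function --zip-file fileb://../my_project.zip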

AWS Lambda Limits

This is a great time to bring up AWS Lambda limits. For example, if you were to try and deploy the SciPy stack to AWS Lambda, the total size compiled for Amazon Linux x86_64 is roughly 221 MB.

Here are the default limits, ruthlessly copied directly from the documentation:

Lambda function deployment package size (.zip/.jar file): 50 MB
Size of code/dependencies that you can zip into a deployment package (uncompressed zip/jar size): 250 MB
Total size of all the deployment packages that can be uploaded per region: 75 GB

If your deployment packages are regularly approaching these limits, it's time to review your approach to function decomposition.
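It's easy to check where a package stands against those limits before uploading; again, the file name here is a placeholder:

# compressed size (counts against the 50 MB deployment package limit)
ls -lh my_project.zip

# uncompressed size (counts against the 250 MB limit); the total in the last line is in bytes
unzip -l my_project.zip | tail -1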

On my Lambda wish list is support for shared libraries among functions. That would make it a lot simpler to manage a codebase and also make it easier to stay under the current (soon to be higher?) limits.

Designing For AWS Lambda

While AWS Lambda is capable of hosting an entire process or workflow in a single function, that's not a good design choice.

Circling back to the SciPy stack, odds are we'll want our functions leveraging pandas (data structuring & analysis) separate from those functions leveraging matplotlib (visualization).

The key to success in Lambda is ensuring that your application is broken down into small, efficient functions. Remember you can always tie them together using the Amazon API Gateway.

AWS Lambda is a new method for delivering asynchronous, event-driven computation. Look for a lot more soon on the best approaches for various problem sets.

In the meantime, if you have any questions or comments, I'm @marknca on Twitter and GitHub :-)