Can We Improve How Station X Processed Genomics Data on AWS in 2017?
In late 2017, Station X did an AWS “This is My Architecture” video. The video talks about how they built out a genomics processing pipeline on AWS.
Now, a few years later, I react to that video to see what has stood the test of time, what could be done more simply with today’s technology, and how the design holds up against the AWS Well-Architected Framework.
The AWS Well-Architected Framework
The AWS Well-Architected Framework is designed to help you and your team make informed trade-offs while building in the AWS Cloud. It’s built on five pillars:
- Operational Excellence
- Security
- Reliability
- Performance Efficiency
- Cost Optimization
These pillars cover the primary concerns of building and running any solution. And as much as we’d all love to maximize every one of them at once, that’s just not possible.
…enter the framework.
It’ll help you strike the right balance for your goals to make sure that your build is the best it can be now and moving forward.
I often get asked why I talk about building in the cloud and architectural choices so often…aren’t I a security person?
Yes, I do focus on security, and architecture is a critical part of that.
There are really two types of security design work. The first is when you’re handed something and need to make sure the risks of that technology match the risk appetite of its users.
The second type is when you’re building the technology. This is where making choices informed by security early in the process can have profound effects. You’re no longer bolting security on but building it in by design.
That’s why I talk about architecture and building so much. It’s where we all can have the largest possible security impact!
This video—and the ones that will come after—looks at a specific set of design decisions and how they balance the concerns of the AWS Well-Architected Framework…where security is one of the five pillars.
Station X’s Design
Station X (now defunct) built out a very simple (in concept) data pipeline to process genomics data. The customer’s sequencing equipment uploads the data directly to S3. A fleet of EC2 instances then cleans and enriches that data and re-formats it to optimize for analysis.
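The post doesn’t say how the EC2 fleet learned about new uploads. One common pattern, assumed for this sketch, is to enable S3 event notifications on the upload bucket (delivered, for example, to an SQS queue the workers poll) and have each worker parse them. The `Records`, `eventName`, `s3.bucket.name`, and `s3.object.key` fields follow S3’s event notification message format; `UploadedObject`, `objects_from_s3_event`, and the bucket and key names are hypothetical.

```python
import json
from dataclasses import dataclass
from urllib.parse import unquote_plus


@dataclass(frozen=True)
class UploadedObject:
    """One newly uploaded object that a worker should process (hypothetical type)."""
    bucket: str
    key: str
    size: int


def objects_from_s3_event(event: dict) -> list[UploadedObject]:
    """Extract bucket/key pairs for ObjectCreated records in an S3 event notification."""
    objects = []
    # Messages without "Records" (e.g. S3's s3:TestEvent) yield nothing.
    for record in event.get("Records", []):
        # Only react to new uploads; ignore deletes and other event types.
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        s3 = record["s3"]
        objects.append(
            UploadedObject(
                bucket=s3["bucket"]["name"],
                # Object keys arrive URL-encoded (spaces become '+').
                key=unquote_plus(s3["object"]["key"]),
                size=s3["object"].get("size", 0),
            )
        )
    return objects


if __name__ == "__main__":
    # Hypothetical notification body, as a worker might receive it from a queue.
    sample = json.loads("""
    {"Records": [
      {"eventName": "ObjectCreated:Put",
       "s3": {"bucket": {"name": "example-sequencer-uploads"},
              "object": {"key": "run-42/sample+A.fastq.gz", "size": 1048576}}},
      {"eventName": "ObjectRemoved:Delete",
       "s3": {"bucket": {"name": "example-sequencer-uploads"},
              "object": {"key": "run-41/old.fastq.gz"}}}
    ]}
    """)
    for obj in objects_from_s3_event(sample):
        print(obj.bucket, obj.key, obj.size)
```

The same parser works whether the events are polled from a queue by long-running EC2 workers or handed directly to a Lambda function, which is one way the processing fleet might be simplified today.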
That analysis happens in Amazon EMR, AWS’s managed Hadoop service. Station X then built a custom analysis front end running on another fleet of EC2 instances.
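The hand-off between the EC2 cleaning fleet and the Hadoop analysis layer is where the “clean, enrich, and re-format” work pays off. The post doesn’t describe the actual file formats, so this sketch assumes FASTQ input with Phred+33 quality scores and emits flat tab-separated rows, a shape that line-oriented Hadoop tools can split and process in parallel. The function names, the derived columns, and the default quality threshold of 20 are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

PHRED_OFFSET = 33  # assumption: Sanger / Illumina 1.8+ quality encoding


@dataclass(frozen=True)
class Read:
    read_id: str
    sequence: str
    quality: str


def parse_fastq(lines: Iterable[str]) -> Iterator[Read]:
    """Parse strict 4-line FASTQ records: @header, sequence, +, quality."""
    rows = [line.rstrip("\r\n") for line in lines]
    while rows and not rows[-1]:
        rows.pop()  # drop trailing blank lines only; inner lines are positional
    if len(rows) % 4 != 0:
        raise ValueError("FASTQ input must contain 4 lines per record")
    for i in range(0, len(rows), 4):
        header, sequence, plus, quality = rows[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record #{i // 4 + 1}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in record #{i // 4 + 1}")
        # Keep only the first whitespace-separated token of the header as the ID.
        read_id = (header[1:].split() or [""])[0]
        yield Read(read_id, sequence.upper(), quality)


def mean_quality(quality: str) -> float:
    """Average Phred score of a read (0.0 for an empty read)."""
    if not quality:
        return 0.0
    return sum(ord(c) - PHRED_OFFSET for c in quality) / len(quality)


def gc_fraction(sequence: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty read)."""
    if not sequence:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def clean_and_reformat(lines: Iterable[str], min_mean_quality: float = 20.0) -> list[str]:
    """Drop low-quality reads, add derived columns, and emit tab-separated rows."""
    rows = []
    for read in parse_fastq(lines):
        q = mean_quality(read.quality)
        if q < min_mean_quality:
            continue  # cleaning: discard reads below the quality threshold
        # Enrichment: derived columns that downstream queries can filter on.
        rows.append("\t".join([
            read.read_id,
            read.sequence,
            str(len(read.sequence)),
            f"{gc_fraction(read.sequence):.3f}",
            f"{q:.1f}",
        ]))
    return rows


if __name__ == "__main__":
    sample = "@r1 lane=1\nACGT\n+\nIIII\n@r2\nAAAA\n+\n####\n@r3\nggcc\n+\n5555\n"
    for row in clean_and_reformat(sample.splitlines()):
        print(row)
```

Parsing records strictly by position matters here: a FASTQ quality line can itself begin with `@`, so scanning for header lines by their first character would misread some records.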
Learn more in the reaction video 👆.
Btw, I’ve updated my course, “Mastering The AWS Well-Architected Framework,” on A Cloud Guru. If you want a solid walkthrough of the ideas behind the framework and how to apply them to your work in the AWS Cloud, check it out!