Can We Improve How Civitas Learning Batch Processed PII in 2017?
In late 2017, Civitas Learning did an AWS “This is My Architecture” video. It was one of the first. The video walks through how they handled processing personally identifiable information at scale using batch processing.
Now, a few years later, I react to that video to see what’s stood the test of time, what could be done more simply with today’s technology, and how the design measures up against the AWS Well-Architected Framework.
The AWS Well-Architected Framework
The AWS Well-Architected Framework is designed to help you and your team make informed trade-offs while building in the AWS Cloud. It’s built on five pillars:
- Operational Excellence
- Security
- Reliability
- Performance Efficiency
- Cost Optimization
These pillars cover the primary concerns of building and running any solution. And as much as we’d all love to have everything, that’s just not possible.
…enter the framework.
It’ll help you strike the right balance for your goals to make sure that your build is the best it can be now and moving forward.
I often get asked why I talk about building in the cloud and architectural choices so often…aren’t I a security person?
Yes, I do focus on security, and architecture is a critical part of that.
There are really two types of security design work. The first is when you’re handed something and need to make sure the risks of that technology match the risk appetite of the users.
The second type is when you’re building the technology. This is where making choices informed by security early in the process can have profound effects. You’re no longer bolting security on but building it in by design.
That’s why I talk about architecture and building so much. It’s where we all can have the largest possible security impact!
This video—and the ones that will come after—looks at a specific set of design decisions and how they balance the concerns of the AWS Well-Architected Framework…where security is one of the five pillars.
Civitas Learning’s Design
Civitas Learning processes study and course data from dozens and dozens of universities in order to help provide learning analytics. This information is both substantial in quantity and sensitive, because it contains personally identifiable information (PII).
The twist on this challenge is that the data is processed overnight in batches.
This opens up the options to the team. They don’t need real-time processing, so a big data approach is a bit more practical.
They choose Amazon EMR and Amazon Redshift alongside Amazon S3. In order to keep the PII separate, they use individual Redshift clusters for each school. After that sensitive processing is complete, they upload the aggregate metrics and anonymized datapoints to EMR and a different Redshift cluster.
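To make the isolation idea concrete, here is a minimal sketch in plain Python of what a per-school step could look like. It is not Civitas Learning’s actual code and it makes no AWS calls. The function name `anonymize_school_batch`, the record fields (`student_id`, `name`, `email`, `course`, `grade`), and the per-school key are all hypothetical, chosen only for illustration. The point is that raw PII goes in, and only aggregate metrics and pseudonymized data points come out.

```python
import hashlib
import hmac
from statistics import mean


def anonymize_school_batch(school_id, records, secret_key):
    """Process one school's raw records in isolation.

    Returns (aggregates, data_points). Neither contains name or email,
    and student IDs are replaced with a keyed hash. All field names are
    hypothetical and exist only for this sketch.
    """
    data_points = []
    for rec in records:
        # Keyed hash (HMAC-SHA256): the same student always maps to the same
        # pseudonym within a school, but it can't be recomputed without the key.
        pseudonym = hmac.new(
            secret_key, rec["student_id"].encode("utf-8"), hashlib.sha256
        ).hexdigest()
        # Copy only the non-identifying fields forward.
        data_points.append(
            {"student": pseudonym, "course": rec["course"], "grade": rec["grade"]}
        )

    grades = [p["grade"] for p in data_points]
    aggregates = {
        "school": school_id,
        "students": len({p["student"] for p in data_points}),
        "mean_grade": mean(grades) if grades else None,
    }
    return aggregates, data_points
```

Only the two return values would ever leave the school’s dedicated environment for the shared one. Keep in mind that a keyed hash is pseudonymization rather than full anonymization, so a real pipeline would likely need more than this.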
This isolated approach to sensitive processing is a strong pattern that balances performance and safety. Because it’s a batch processing system, the dedicated clusters are only active when required, which keeps the overall cost low.
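A quick back-of-the-envelope calculation shows why only running the clusters when required matters for cost. The numbers below are made up for illustration (a 4-node cluster and a 3-hour nightly batch window); they are not figures from the video, and the sketch compares node-hours rather than dollars so it doesn’t depend on any particular price.

```python
def monthly_node_hours(nodes, hours_per_day, days=30):
    """Total node-hours consumed in a month by a cluster of `nodes` nodes."""
    return nodes * hours_per_day * days


# Hypothetical cluster size and batch window, for illustration only.
always_on = monthly_node_hours(nodes=4, hours_per_day=24)
nightly_only = monthly_node_hours(nodes=4, hours_per_day=3)

# Fraction of node-hours avoided by running only during the batch window.
savings = 1 - nightly_only / always_on

print(f"always on: {always_on} node-hours")
print(f"nightly:   {nightly_only} node-hours")
print(f"reduction: {savings:.1%}")
```

Under those assumptions, the same cluster uses a small fraction of the node-hours, and that reduction is multiplied across every school that gets its own dedicated cluster.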
It’s a strong design. The pattern still holds up years later, but the components can be modernized, and tools like AWS Lake Formation would significantly reduce the operational overhead and overall costs. Watch the video 👆 for more of the details!
Btw, I’ve updated my course, “Mastering The AWS Well-Architected Framework” on A Cloud Guru. If you want a solid walkthrough of the ideas behind the framework and how to apply it to your work in the AWS Cloud, check it out!