Watch on-demand
This talk (API302) is now available to watch on-demand.
Abstract
More and more companies are looking at event-driven architecture and Amazon EventBridge to help with application modernization.
Amazon EventBridge is helping them build loosely coupled applications that scale independently and make it easier to integrate cloud-based applications and services. But where do you start?
This session covers event-driven design concepts, outlines how to deal with Amazon EventBridge event bus topologies and multi-account patterns, and presents recommended practices for building scalable event-driven applications using Amazon EventBridge.
Speaker
Stephen Liedig is a Sr. Serverless Specialist Solutions Architect at AWS.
My Breakdown
Event routing is a critical aspect of any #serverless design. Amazon EventBridge plays a critical role in scaling your serverless apps while maintaining a strong design.
I'm hoping this talk looks at some design patterns and recommends solutions for getting the most out of EventBridge. I've provided some more detailed thoughts on session API302 below.
Slides
The agenda looks promising. There seems to be a strong focus on how to get the most out of Amazon EventBridge.
For a 300 level talk, this is the best approach. Builders can check out a 200 level talk to get the...um...101 level view 😉
The required quick recap: EventBridge is a serverless event bus service for events generated by AWS services, your own builds, and APN partner SaaS applications.
The logic flow of the service is very straightforward:
- Event sources generate events that are sent to...
- An event bus within EventBridge...
- Where it's processed by a set of rules and sent to a...
- Target that wants to process the event
For example: a message is published to an SNS topic, which generates a "New Message" event (#1). That event is received by the AWS services event bus (#2), and a rule you've configured (#3) sends it to an AWS Lambda function (#4) you've written to process SNS messages.
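To make the four steps concrete, here's a toy, in-memory sketch of the flow. It is not the EventBridge API: `ToyBus`, `matches`, and the `source` and `detail-type` strings are all invented for illustration and simply mirror the SNS example above. Only flat, exact-value matching is modelled.

```python
# Toy, in-memory model of the source -> bus -> rule -> target flow.
# NOT the EventBridge API; every name here is made up for illustration.


def matches(pattern, event):
    """Return True if, for every key in the pattern, the event's value is listed."""
    return all(event.get(key) in allowed for key, allowed in pattern.items())


class ToyBus:
    def __init__(self):
        self.rules = []  # list of (pattern, target) pairs

    def add_rule(self, pattern, target):
        self.rules.append((pattern, target))

    def put_event(self, event):
        # Hand the event to the target of every rule whose pattern matches.
        for pattern, target in self.rules:
            if matches(pattern, event):
                target(event)


received = []  # stands in for the Lambda function (#4)

bus = ToyBus()  # the event bus (#2)
bus.add_rule(  # the rule (#3)
    {"source": ["aws.sns"], "detail-type": ["New Message"]},
    received.append,
)

# The event source (#1) emits two events; only the first matches the rule.
bus.put_event({"source": "aws.sns", "detail-type": "New Message", "detail": {"body": "hi"}})
bus.put_event({"source": "aws.s3", "detail-type": "Object Created", "detail": {}})
```

After both calls, `received` holds only the "New Message" event. The second event is dropped because no rule matches it.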
Addressing the "WTF are all these events?" problem, EventBridge has the ability to auto-discover event schemas. You can define your own if you want to but the auto-discover makes it a lot easier.
Furthermore, EventBridge can provide bindings for these schemas in your code. Instead of parsing through a bunch of JSON, you can work with native objects in your preferred coding language. Very handy
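As a rough idea of what "native objects instead of JSON" feels like, here's a hand-written stand-in for a binding. This is not the code EventBridge generates. The `OrderSubmitted` type, its fields, and the `com.example.orders` source are assumptions made up for this sketch. Only `source`, `detail-type`, and `detail` are real envelope field names.

```python
import json
from dataclasses import dataclass


@dataclass
class OrderSubmitted:
    """Hypothetical payload type; field names are invented for this sketch."""
    order_id: str
    total: float


@dataclass
class Envelope:
    """A few of the standard envelope fields plus the typed detail."""
    source: str
    detail_type: str
    detail: OrderSubmitted


def parse(raw: str) -> Envelope:
    """Turn the raw JSON event into typed objects in one place."""
    doc = json.loads(raw)
    d = doc["detail"]
    return Envelope(
        source=doc["source"],
        detail_type=doc["detail-type"],
        detail=OrderSubmitted(order_id=d["orderId"], total=float(d["total"])),
    )


raw_event = json.dumps({
    "source": "com.example.orders",  # hypothetical source name
    "detail-type": "Order Submitted",
    "detail": {"orderId": "o-123", "total": 42.5},
})
event = parse(raw_event)
```

The rest of the code can then use `event.detail.order_id` rather than digging through nested dictionaries.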
The biggest benefit of Amazon EventBridge is the ability to break your build down into behaviours. This falls under the "loose coupling" moniker but really, when you step back, it's the ability to design around what's happening in your build...not the decomposition you've chosen.
That deserves a bit of expansion.
Typically, when you're designing an application, you're going to map out the customer requirements and start to associate those with objects and services within your mental model of the solution to the problem. This is a core tenet of object-oriented programming and it's worked reasonably well since the late 1950s.
You may have seen talk of functional programming over the past couple of years...this isn't that either 😉
What this service enables is a way to reduce the abstractions between what your business wants to happen and how you make that happen in code.
As a short example, say an order has been submitted and needs to go to shipping. Your order service generates an "Order Submitted" event that EventBridge handles and routes to whatever services have subscribed to that event. In this case, probably shipping, finance, and auditing.
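Here's a minimal sketch of what the order service might hand to EventBridge. The entry uses the `Source`, `DetailType`, `Detail`, and `EventBusName` fields of a PutEvents entry. The bus name, source string, and payload fields are assumptions invented for this example.

```python
import json


def order_submitted_entry(order_id: str, total: float) -> dict:
    """Build one PutEvents entry describing what happened, not who should react."""
    return {
        "EventBusName": "orders-bus",    # hypothetical custom bus
        "Source": "com.example.orders",  # hypothetical source name
        "DetailType": "Order Submitted",
        # Detail must be a JSON string, not a nested dict.
        "Detail": json.dumps({"orderId": order_id, "total": total}),
    }


entry = order_submitted_entry("o-123", 42.5)
# With boto3 this could be sent as: events_client.put_events(Entries=[entry])
```

Notice the order service never mentions shipping, finance, or auditing. Each of those subscribes with its own rule.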
It's more than just enabling microservices design and definitely more than my poor explanation is illustrating. I'm going to have to step back and dig deeper into how to explain what I'm thinking of here...but it's exciting...at least in my own head 🤣
Back to Stephen's presentation and perspective, I like this prompting. This slide recognizes the questions that teams ask when they start to use EventBridge. The majority of the rest of the talk is dedicated to answering these questions
Event bus patterns
Stephen sets up a simple example. Three teams each building their own service. They each generate events and consume them as required. This is the example that's going to be used to demonstrate different design patterns.
Stephen stresses repeatedly that teams should focus on the logical architecture of what they're trying to build. Don't let physical or organizational constraints influence this early stage thinking
The big question first.
The speaker aptly demonstrates the pros and cons of taking a centralized or distributed approach. More importantly, Stephen repeatedly calls out that there's no "this is the way" approach here. There are trade-offs. Know them. Make the ones that make sense for your team.
Single-bus, single-account pattern
The most straightforward. With multiple event buses (AWS events, custom apps, partner SaaS) all in one account's Amazon EventBridge, this is the simplest design.
Everything is in one place, very clean.
Stephen emphasizes that this pattern avoids IAM complexities and cross-account role troubles. This is true and highlights the challenges IAM is currently facing around usability 🤦
To get that simplicity, you're giving up some flexibility and potentially running into shared ownership challenges between teams
Single-bus, multi-account pattern
This one is a bit of a fallacy. Some triggers cannot yet be activated cross-account. This means your "single" event bus is actually two or more event buses. One in the originating account, the other in the receiving.
This actually has some benefits to offset the added complexity. With event buses forwarding to each other, the receiving bus runs the actual filter rules, or at least the "last mile." This can avoid some challenging shared-ownership issues between teams.
EventBridge Resource Policies have greatly improved over the past year. The new level of granularity makes it a lot easier to maintain a clear, simplified resource policy structure.
Here's an example of a PutRule statement that only allows rules to be created if they match the Purple team's E2 event.
This resource policy shows a PutEvents declaration again using a condition. In this case, the condition restricts the source of the events allowed.
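The slide itself isn't reproduced here, so this is only a sketch of what a statement like that might look like, built as a Python dict and serialized to JSON. The account IDs, region, bus name, and source value are all placeholders I made up. The `events:source` condition key is the part that restricts which events the publisher may send.

```python
import json

# Hypothetical account IDs, region, and bus name; substitute your own.
BUS_ARN = "arn:aws:events:us-east-1:111111111111:event-bus/central-bus"
PUBLISHER_ACCOUNT = "222222222222"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowPublisherPutEvents",
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{PUBLISHER_ACCOUNT}:root"},
            "Action": "events:PutEvents",
            "Resource": BUS_ARN,
            # Only events claiming this source are accepted from the publisher.
            "Condition": {"StringEquals": {"events:source": "com.example.orders"}},
        }
    ],
}

policy_json = json.dumps(policy, indent=2)
```

The resulting `policy_json` string is the kind of document you would attach to the bus as its resource policy.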
The final example shows how to use tags as a condition. This is a very flexible approach to managing and scaling access control.
You can learn more about this approach in, "AWS identity: Next-generation permission management"
Multi-bus, single account pattern
This approach uses multiple custom buses to help address ownership challenges within the team. Within the same account, there aren't a lot of technical reasons to go with this design outside of reducing blast radius with regionality.
Multi-bus, multi-account pattern
This pattern leverages both the "AWS account as an access boundary" and "distributed ownership" attributes.
Each account has its own EventBridge bus, and that is what subscribes to various events. This allows each account full control over its own event routing and triggers.
It's important to remember, especially in the multi-bus, multi-account pattern, that Amazon EventBridge does not support transitive routing.
That's where an event is forwarded to another instance of EventBridge which then forwards that event to yet another instance of EventBridge.
There are a few reasons for this but the main one is to reduce the likelihood of creating routing loops (A > B > A > B > A ...).
If you do want this and try to work around it with a couple of clever AWS Lambda functions, remember that the event envelope metadata isn't going to be consistent. After the second EventBridge forwards the event, the source information will be lost unless you take steps to preserve it in the event itself.
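One way to "preserve it in the event itself" is to copy the original envelope fields into the detail before re-publishing. This is a sketch under my own assumptions, not something shown in the talk. The `origin` key and the `com.example.forwarder` source are invented names.

```python
import json

# Envelope fields worth carrying along; they are rewritten when an event is re-published.
ORIGIN_FIELDS = ("id", "source", "account", "region", "time")


def wrap_for_forwarding(event: dict) -> dict:
    """Build a PutEvents-style entry that keeps the original envelope inside the detail."""
    origin = {k: event[k] for k in ORIGIN_FIELDS if k in event}
    detail = dict(event.get("detail", {}))  # shallow copy so the input is untouched
    detail["origin"] = origin               # hypothetical key name
    return {
        "Source": "com.example.forwarder",  # hypothetical forwarder identity
        "DetailType": event["detail-type"],
        "Detail": json.dumps(detail),
    }


incoming = {
    "id": "abc-123",
    "source": "com.example.orders",
    "account": "111111111111",
    "region": "us-east-1",
    "time": "2020-12-01T00:00:00Z",
    "detail-type": "Order Submitted",
    "detail": {"orderId": "o-123"},
}
forwarded = wrap_for_forwarding(incoming)
```

Downstream consumers can then read `detail.origin.source` to see where the event really came from.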
Recommended practices
Use dead letter queues (DLQ) to capture events that can't be properly processed at the time they are generated.
This has two primary benefits. The first is that you aren't going to lose events that didn't process. The second is that you'll actually be able to troubleshoot event routing failures!
You can configure DLQs per EventBridge target via the PutTargets API. How many queues you use will depend on a number of factors in your build. There's no easy rule of thumb here.
Start with one and expand as needed for volume or to make troubleshooting easier.
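A minimal sketch of the per-target setup: a PutTargets request where the target carries its own `DeadLetterConfig` and, optionally, a `RetryPolicy`. The rule name, bus name, ARNs, and retry numbers are placeholders I picked, not recommendations from the talk.

```python
def target_with_dlq(target_id: str, target_arn: str, dlq_arn: str) -> dict:
    """Describe one rule target that sends undeliverable events to an SQS queue."""
    return {
        "Id": target_id,
        "Arn": target_arn,
        "DeadLetterConfig": {"Arn": dlq_arn},
        # Optional: how long and how often to retry before giving up.
        "RetryPolicy": {"MaximumRetryAttempts": 3, "MaximumEventAgeInSeconds": 3600},
    }


# All names and ARNs below are hypothetical.
put_targets_args = {
    "Rule": "order-submitted-to-shipping",
    "EventBusName": "orders-bus",
    "Targets": [
        target_with_dlq(
            "shipping-fn",
            "arn:aws:lambda:us-east-1:111111111111:function:shipping",
            "arn:aws:sqs:us-east-1:111111111111:shipping-dlq",
        )
    ],
}
# With boto3 this could be applied as: events_client.put_targets(**put_targets_args)
```

Starting with one queue just means passing the same `dlq_arn` to every target until you have a reason to split them.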
EventBridge adds the ERROR_CODE, ERROR_MESSAGE, RULE_ARN, and TARGET_ARN attributes to each message it sends to the dead letter queue to make troubleshooting easier.
👆 This is a really nice, well thought out feature
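Here's a sketch of how you might pull those fields out while troubleshooting. It assumes they arrive as SQS message attributes in the shape boto3's `receive_message` returns, with the original event as the message body. The sample message and its values are made up.

```python
import json

FIELDS = ("ERROR_CODE", "ERROR_MESSAGE", "RULE_ARN", "TARGET_ARN")


def summarize(message: dict) -> dict:
    """Collect the troubleshooting attributes plus the failed event's detail-type."""
    attrs = message.get("MessageAttributes", {})
    summary = {name: attrs.get(name, {}).get("StringValue") for name in FIELDS}
    summary["detail_type"] = json.loads(message["Body"]).get("detail-type")
    return summary


# Made-up sample; the values are placeholders, not real error codes or ARNs.
sample_message = {
    "Body": json.dumps({"detail-type": "Order Submitted", "detail": {"orderId": "o-123"}}),
    "MessageAttributes": {
        "ERROR_CODE": {"DataType": "String", "StringValue": "SAMPLE_ERROR"},
        "ERROR_MESSAGE": {"DataType": "String", "StringValue": "sample failure text"},
        "RULE_ARN": {"DataType": "String", "StringValue": "arn:aws:events:us-east-1:111111111111:rule/orders-bus/sample"},
    },
}
summary = summarize(sample_message)
```

A missing attribute shows up as `None` instead of raising, which is handy when you're scanning a queue of mixed failures.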
I ❤️ this recommendation. It's simple and straightforward: "Avoid using the default event bus for customer application events"
Stephen didn't have any slides detailing this recommendation and delivered it with the simple advice, "Just follow this advice. Don't do it!"
The final recommendation is to create "subscriptions" with one target per rule...even though you could have up to five.
The idea here is to simplify the overall approach with consistency while at the same time reducing the blast radius for changes. This also prevents accidental coupling of separate event types.
This slide lays out what the subscription model looks like. Yes, you're configuring more filters but this approach greatly simplifies your overall deployment.
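To show the shape of the "one target per rule" idea, here's a small sketch that expands a list of subscriptions into one rule definition each. The subscriber names, ARNs, and naming scheme are my own assumptions. The output is plain data, not a call to any AWS API.

```python
import json


def subscription_rules(subscriptions):
    """Return one rule definition per (subscriber, target ARN, detail-type) tuple."""
    rules = []
    for subscriber, target_arn, detail_type in subscriptions:
        slug = detail_type.lower().replace(" ", "-")
        rules.append({
            "Name": f"{subscriber}-{slug}",
            "EventPattern": json.dumps({"detail-type": [detail_type]}),
            "Targets": [{"Id": subscriber, "Arn": target_arn}],  # exactly one target
        })
    return rules


# Hypothetical subscribers for the "Order Submitted" event.
rules = subscription_rules([
    ("shipping", "arn:aws:lambda:us-east-1:111111111111:function:shipping", "Order Submitted"),
    ("finance", "arn:aws:lambda:us-east-1:111111111111:function:finance", "Order Submitted"),
    ("auditing", "arn:aws:lambda:us-east-1:111111111111:function:auditing", "Order Submitted"),
])
```

Three subscribers means three rules with the same pattern. That's the "more filters" cost, but removing finance later touches only `finance-order-submitted`.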
Excellent closing advice from [Stephen](https://twitter.com/sliedigaws): focus on logical architectures first.
That's going to help put you on the best path early on and avoid a lot of hard lessons learned.