Road to re:Invent - Amazon DynamoDB Redemption

Amazon DynamoDB is an exciting database service. It’s a cloud-native, NoSQL offering that is lightning quick and scales seamlessly. However, if you treat it like a traditional RDBMS, you’re not going to see any of those benefits. In this live stream, I explore some mistakes I’ve made and how to fix them by moving to a DynamoDB-friendly approach.

Here are the slides that I used during the live stream.

References

From relational DB to single DynamoDB table: a step-by-step exploration by Trek10
NoSQL Workbench for Amazon DynamoDB – Available in Preview from the AWS blog
Best Practices for Managing Many-to-Many Relationships from the Amazon DynamoDB docs

Reasonably Accurate 🤖🧠 Transcript

Good morning, everybody. How are you doing today? Thanks for jumping back on this. Yeah, I did the database episode the other day and had some challenges with one very definitive service that we’re going to dive into today: Amazon DynamoDB. So before we get started, some simple logistics. We’re broadcasting live on LinkedIn.

We are broadcasting live on Periscope for Twitter, taking questions as we go, so please don’t hesitate to hit me up here on LinkedIn, also on Twitter. My camera seems to be having a real heck of a time focusing on me today. Let me fix that before everybody goes crazy, and let me give you something better to look at while I’m doing that.

So one of the things we talked about on the last episode was the breakdown, sort of the discussion on what Amazon database services there are now. There’s several of them, broken up. We’re going to review that while I fix the camera. So the Amazon database services basically get broken up into a set of structures, right? So we have relational databases.

That’s Aurora, that’s Redshift, that’s RDS, along with the Database Migration Service to help you move between them. And we have DocumentDB, which is a MongoDB-compatible solution. And we also have the graph database, which is Neptune, which I really like, but I very much wish that Neptune was a serverless offering.

There we go. Now it can still focus, as long as I don’t move too much. Then the ledger database, the database we shall not talk about, the Quantum Ledger one. Still having that problem for you. Hold on, let me fix it so you don’t go insane. Marcos, love the attitude. Trying to fix it. There.

We’ve got the time series database with Timestream. Timestream’s very, very cool, but it’s also very early days for Timestream even though it’s been out for a while. And we have our in-memory databases with ElastiCache, so Memcached and Redis. And then we have our key-value store, which is DynamoDB. Now, DynamoDB is a beast.

It is an absolute beast. Okay, it is a workhorse. The vast majority of people are going to be leveraging it at some point. It is an absolute game-changing, groundbreaking database from AWS. They’ve written a few papers about it. There has been a lot, very good. Let me get me in focus now.

Yikes, that is way too much me. It’s just one of those days with the camera, people, one of those days with the camera. So they’ve written a few papers on it, and they’ve had some really fantastic success. Obviously, it is a phenomenal service.

What is going on with this camera? We are going to switch cameras. Wild service, lots of great scale, and I messed it up, to be 100% honest. I messed it up last time when I tried to show it to you. Like, just a colossal flop.

So let’s fix that right now. Let us share. Let’s go split screen for a second. I want to go back to this. I’m going to add in my camera. There we go. We talked about relational databases, so a quick recap of Wednesday. With relational databases, you have a table. So in this case, I’ve got like a name table with names in it.

I’ve got names-to-buses and bus routes. This was me trying to model where buses go and who rides them. So that’s a traditional sort of relational database, a very simple, very straightforward design. And if you guys are in the comments on LinkedIn, please check out what Andrew just posted: an absolutely awesome Pokémon interpretation of Amazon Neptune.

So here’s a relational database structure, pretty straightforward. And we have this monstrously horrible but standard SQL query, where we talked about selecting everything from names, which we call n, and we join it with another table, names-to-buses, which we call n2b. We match up the IDs. We join with the buses table again, which we call b. We make sure that the bus ID lines up, and that will actually give us, you know, Mark rode the bus from here to there. Then Neptune lets you graph it out, which gives you this much simpler sort of logical structure.
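The join described above can be sketched with Python’s built-in `sqlite3` module. This is only an illustration: the table names (`names`, `names_to_buses`, `buses`), the columns, and the sample rows are assumptions based on the description, not the schema from the slides, and the query selects two columns rather than everything to keep the result easy to read.

```python
import sqlite3

# Hypothetical schema and sample rows, assumed for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE names (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE buses (id INTEGER PRIMARY KEY, route TEXT);
    CREATE TABLE names_to_buses (name_id INTEGER, bus_id INTEGER);

    INSERT INTO names VALUES (1, 'Mark', 'Example'), (2, 'Ana', 'Sample');
    INSERT INTO buses VALUES (1, '2'), (2, '7');
    INSERT INTO names_to_buses VALUES (1, 1), (1, 2), (2, 1);
    """
)


def riders_and_routes(connection):
    """Return (first_name, route) pairs by joining through the link table."""
    query = """
        SELECT n.first_name, b.route
        FROM names AS n
        JOIN names_to_buses AS n2b ON n2b.name_id = n.id
        JOIN buses AS b ON b.id = n2b.bus_id
        ORDER BY n.first_name, b.route
    """
    return connection.execute(query).fetchall()


print(riders_and_routes(conn))
```

Three tables and two joins are needed just to answer "who rode which bus", which is the shape the rest of the stream moves away from.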

I think this definitely is how most people would visualize the data in their head, and then you can query it out with a very straightforward query. Now, this is a graph query, but you can query with a different structure as well. And this is a great way to do it, but DynamoDB is far more comfortable for a lot of people, despite what I showed you at the end of Wednesday. And I made a classic, classic mistake in what I tried to do the other day on stream, and you can watch the disaster of it near the end of the last episode.

It was that I tried to replicate this database structure, the relational database structure, in DynamoDB. Bad. Hundred percent bad. I fell into the number one common pitfall of people trying to use new services in the AWS Cloud, which is they take what they know and they try to shove it in as is. That would have worked.

Absolutely, you could set this database structure up in DynamoDB. You are going to be paying way more money than you need to be, but this would a hundred percent work. But that’s not the point. You would have a bunch of challenges at scale. It would start to slow down.

It would really grind to a halt because it’s clunky, old-school thinking. Now, if you shove this database structure into Aurora, it’ll work like a charm. That’ll scale to billions of rows easily because this is a traditional database design. But that’s not what Dynamo is. Dynamo is a key-value store with a lot of cool stuff behind it.

So what we’re going to do is show you how this same setup, the same data, would look in DynamoDB, and this would all be in one big table. So I would have multiple object types sitting in the same table. Now, that is amazingly uncomfortable and abnormal for a lot of people, right? Database principles 101 is that you put one data type per table. But we’re not dealing with a traditional database. What you actually want to do is set up multiple entities in one table, because there’s a whole bunch of advantages

there, not to mention the fact that Dynamo won’t even miss a beat when it’s doing this. Yeah, Andrew in the LinkedIn comments. You guys, if you’re not following Andrew Brown, CEO over at ExamPro, you absolutely should. Fantastic guy, knows his stuff inside and out, as well as a great community member.

Definitely follow him, and follow his videos and his streams when he’s doing them, and check out all the work that he’s done for the community. But great, great point: a key-value and document-ish store. Very true, document-ish. If you have a ton of information in each of these entities that you’re seeing here, so we have a customer, a bus, and a ride, if we had a ton of information,

we may want to consider actually using DocumentDB instead of DynamoDB, but we don’t have a huge amount. Now, the reason why I say huge amount is because DynamoDB is, as Andrew said very appropriately, a flat and dumb database. It is flat. Spot on. I’m totally stealing that, Andrew, and I appreciate it.

It is a flat and dumb database. The beauty of it, though, is it is fast as all get-out and it scales ridiculously big, but it is limited to 400 kilobytes per entry, or per entity. So if you put something into a table, you can have as many attributes as you like (well, there are limits, but you can have a ton of attributes associated) and a bunch of data in those attributes, as long as it all totals up to 400 kilobytes or less. If you need over 400 KB, go to DocumentDB.
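A rough way to sanity-check an item against that 400 KB ceiling is sketched below. It assumes string-only attributes and simply adds up the UTF-8 length of each attribute name and value; the function names are hypothetical, and DynamoDB’s real accounting for numbers, sets, and nested types is more involved, so treat this as an estimate rather than the service’s exact calculation.

```python
MAX_ITEM_BYTES = 400 * 1024  # the 400 KB per-item limit mentioned above


def approx_item_size(item):
    """Estimate size as UTF-8 bytes of names plus values (string attributes only)."""
    return sum(
        len(name.encode("utf-8")) + len(value.encode("utf-8"))
        for name, value in item.items()
    )


def fits_in_one_item(item):
    """True if the estimated size is within the per-item limit."""
    return approx_item_size(item) <= MAX_ITEM_BYTES


small = {"ObjectId": "customer-mark-example-1", "FirstName": "Mark"}
huge = {"ObjectId": "customer-mark-example-1", "Notes": "x" * MAX_ITEM_BYTES}

print(approx_item_size(small), fits_in_one_item(small), fits_in_one_item(huge))
```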

So in this case, what we’ve got is a customer. Very simply, right now we just have a first name and last name. That’s replacing our names table. And we’re going to create an object ID. The key for this is going to be customer, then first-last, dash, and then a number. So most of our customers are going to be one, because they’ll have unique names. Believe it or not,

there is another Mark Nunnikhoven in the world, a distant cousin who lives in the States. So if he was also a bus rider in this system, he’d be two. All right, then we have a bus entity, and this bus entity has an object ID of bus and then the route number, with a start location, a finish location, and then the stops, right? And you could add a schedule at some point

if you wanted. Then we have a ride entity, and this ride entity is really designed to connect these two, but also to provide additional information: first bus stop, last bus stop, the bus that you’re taking, the customer that took it, the time, things like that. And now the object ID I started with was ride with a timestamp and then a bus number, right? So you’d say ride,

2019-11-01, you know, 080501, to say someone took the bus at 8:05 in the morning, and then the bus, and then an index number for that. There’s a better way to do that. Breaking out the timestamp as a partition key or sort key will help you filter those faster, but for this point it’s going to be enough.
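The three object ID shapes described above could be built with small helpers like these. The exact separators, lower-casing, and function names are assumptions made for the sketch; the stream only describes the general pattern of entity name, then identifying parts, then a number.

```python
from datetime import datetime


def customer_id(first, last, number=1):
    """customer-<first>-<last>-<n>; n distinguishes people who share a name."""
    return f"customer-{first.lower()}-{last.lower()}-{number}"


def bus_id(route):
    """bus-<route number>."""
    return f"bus-{route}"


def ride_id(taken_at, route, index=1):
    """ride-<YYYYMMDD>-<HHMMSS>-bus<route>-<index> (assumed layout)."""
    stamp = taken_at.strftime("%Y%m%d-%H%M%S")
    return f"ride-{stamp}-bus{route}-{index}"


print(customer_id("Mark", "Example", 2))
print(bus_id(2))
print(ride_id(datetime(2019, 11, 1, 8, 5, 1), route=2))
```

Leading every ID with the entity name is what later makes a begins-with filter on the ID enough to pull out one kind of object.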

So we’re going to switch over to Chrome and I am going to show you what this looks like. So again, I’ve already dropped a link in the LinkedIn comments so you can fire off there to see the previous streams and all. So I’m giving a couple of talks at re:Invent, and there’s a link up to the guide. But what I wanted to show you was that in the Dynamo console I’ve recreated some of what we’re talking about here.

So let me scale that a little bit so you guys can see it better. They do need to make better stream tools to resize windows. So I’ve created exactly what we talked about. I have an object that’s a customer, Mark, and you see here I have a first name and last name and then the object ID.

I also have our buses. I’ve entered two buses. So this is bus number 2. Bus route 2 starts at the airport and ends at the convention center, and I could add additional stops, so it stops at the apartment building in between these two areas, so on and so forth. Now, here’s a ride.

Now you notice the ride object ID is ride, to tell me what type of object it is, then the timestamp, which I said we could break out, which might be better, then bus and the bus number for the bus ID, and then a ride number. So we can actually make this a little more efficient at scale by pulling the timestamp out and making that into a partition key.

But we also want to make sure that this object ID is unique. It has to be unique if we are going to index it. So we want to make sure that that’s a consideration moving forward. Now, you can’t change a key attribute once it’s there. You have to actually duplicate the object, change it, and then delete the original object. Little quirk from Dynamo.
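That copy, change, delete sequence can be sketched against a plain dictionary standing in for the table. Everything here is a stand-in: `rekey_item` and the `ObjectId` attribute name are hypothetical, and against the real service the same steps would typically be a put of the new item followed by a delete of the old one.

```python
def rekey_item(table, old_id, new_id):
    """Give an item a new key by copying it, then removing the original."""
    if new_id in table:
        raise ValueError(f"{new_id!r} already exists; object IDs must stay unique")
    item = dict(table[old_id])   # duplicate the object
    item["ObjectId"] = new_id    # change the key on the copy
    table[new_id] = item         # write the copy under the new key
    del table[old_id]            # delete the original


# In-memory stand-in for the table, keyed by object ID.
table = {
    "ride-20191101-080501-bus2-1": {
        "ObjectId": "ride-20191101-080501-bus2-1",
        "Customer": "customer-mark-example-1",
    }
}

rekey_item(table, "ride-20191101-080501-bus2-1", "ride-bus2-1")
print(sorted(table))
```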

But I’m going to switch to another AWS tool. It’s another AWS tool that you might not actually be aware of, and it’s called the NoSQL Workbench for Amazon DynamoDB, in preview. Let’s all just take a pause and admit that AWS has a serious naming problem.

Corey Quinn, good friend of mine, is all over this all the time. They just really can’t name squat. Now, this is really going to be hard to see, and I want to see if we can actually look. Does this look good? Is it going to be too tricky to see? I think it’s all right.

You can almost see it. Yes, Andrew was correct, NoSQL Workbench. So you can see it already has our bus stop table, or bus system table. So if we open that up, we can see the metadata. That just gives us the table ARN, which is super important if you’re using it

through the CLI or programmatically. But what I wanted to actually do is query something here. So bear with me, because I can’t make this any bigger because of the way... actually, I can. Awesome. They are totally just making it a web thing underneath. There we go, now we can see it.

Okay. So what I’m doing is I’m actually going to make a query. If I go to build operations, I’m going to query. That’s the downside of a desktop app that’s really just a web browser. So my partition key is my object ID.

What I’m looking for is a first name string that contains Mark, and if I execute that you’ll see it doesn’t actually return anything. But if I come here and say first name equals, string, and then the value is Mark, it’s actually going to pull that customer record up. I can do the same thing with Nunnikhoven. Now the interesting thing here in the background... of course, it’s not that. Last name.

Come on, there we go. So you can query by these attribute values. We only had one customer in our database, which, you know, it’s fine. That’s fine until you have hundreds of thousands of customers, depending on how successful your bus system was. But you notice when you query, a lot of these objects actually don’t have first name and last name attributes.

So what Dynamo does automatically is it just ignores those objects that don’t have the attribute that you’re querying. So if you’re looking for first name, you are not going to get a whole bunch of null records out of bus, and you’re not getting any ride information. You are just going to get the customer objects. You could obviously filter your key by the entity name, which is why it’s a good practice to have your entity name in your object ID to make it simple. But we could also sit there and go, if we go object ID,

string, bus. So if we go begins with bus, there we go. So we filter the object IDs beginning with bus and we get to see all of our buses. We do the same thing with ride and we get to see our ride. And you can see here, this is a one-table solution instead of multiple tables, and that’s the key for DynamoDB: having one monster table and letting that take care of everything. And that’s a really weird way of thinking of things. Like, trust me, that is weird.
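The two behaviours shown in the workbench, items without the attribute simply dropping out and a begins-with filter on the object ID, can be imitated over a list of dictionaries. The item contents and function names below are made up for the sketch; with boto3 the equivalent conditions would be written with `Attr("FirstName").eq("Mark")` and `Attr("ObjectId").begins_with("bus")` from `boto3.dynamodb.conditions`.

```python
# One "table" holding three entity types side by side (sample data is assumed).
TABLE = [
    {"ObjectId": "customer-mark-example-1", "FirstName": "Mark", "LastName": "Example"},
    {"ObjectId": "bus-2", "Start": "Airport", "Finish": "Convention Center"},
    {"ObjectId": "bus-7", "Start": "Downtown", "Finish": "Airport"},
    {"ObjectId": "ride-20191101-080501-bus2-1", "Customer": "customer-mark-example-1"},
]


def where_attribute_equals(items, name, value):
    """Items lacking the attribute are skipped rather than returned as nulls."""
    return [item for item in items if name in item and item[name] == value]


def where_id_begins_with(items, prefix):
    """Select one entity type by the prefix of its object ID."""
    return [item for item in items if item["ObjectId"].startswith(prefix)]


print([i["ObjectId"] for i in where_attribute_equals(TABLE, "FirstName", "Mark")])
print([i["ObjectId"] for i in where_id_begins_with(TABLE, "bus")])
```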

And that’s a common mistake that people make. I made that mistake on the stream on Wednesday, right? I fumbled around trying to get a traditional database solution shoved into Dynamo. And what I want to call out is a post from the good folks at Trek10.

This is Forrest Brazeal, a good friend of mine, he co-hosts ServerlessConf, and he put this up, or at least his team did, and they quoted him. And this tweet really sums it up, and I will post this right now in the LinkedIn chat, because you guys should definitely check this blog post out.

So this is from last year. This is from DAT401, which is Advanced Design Patterns for DynamoDB. I will read this to you just in case. So Forrest says: for the first 45 minutes of this re:Invent session I was nodding my head like, yeah, that’s how I think about DynamoDB, then Rick morphed into some kind of NoSQL wizard from outer space and my mind exploded. Absolute must-watch. And that is 100% correct, because this talk really walks you through it, and Rick is one of the key people behind Dynamo and the other NoSQL services from AWS.

Yes, and this talk really walks you through: okay, hey, here’s how you think about data, just like I did on Wednesday. Here’s why that won’t work well in Dynamo, even though you can kind of punch it into place. Here’s how you can start to morph and adjust these things to really take off. And you’re right, Marcos.

This post is pure gold, and there’s a couple of others. So the pattern that we used today in our... that’s not the post I want. Where is it? Here, many-to-many relationships. This is one of the patterns we used to get the many-to-many relationships, which is a rider can have many rides, a bus can have many riders, and riders can be on many buses. And we used the adjacency list.

Posting this as we go, this is another key doc. So this is a really interesting way of breaking down your primary keys. And if I can recommend anything when it comes to DynamoDB, and I know I don’t have too many legs to stand on because of Wednesday’s debacle, but hopefully we redeemed ourselves a little bit by showing how we can build it out as one table, it’s that you really need to think about your primary keys. So the example that they use here... and unfortunately the AWS docs are not really written with human interaction in mind, very much engineer to engineer, which is great

if you’re in that mindset, which is why I pulled up the Trek10 post, and I think the Trek10 post is far better in its language. And I’m just posting another link here. There you go. There’s the AWS NoSQL query tool. So here they’re talking about this scheme of how they set up Dynamo with the idea of invoices and bills: a bill can be on many invoices and an invoice can have many bills.

So they’ve got a partition key, which in this case is invoice and then an ID, and the sort key and the global secondary index, the GSI. For the sort key they’ve got, in this case, the invoice ID, but then they’ve got bill IDs underneath it as well.

So they’re using a partition key and a sort key together to figure out what this specific entry is. Now, that helps them create these different relationships, because the bills can stand on their own, the bills can reference out to the invoices, and the invoices can reference the bills. But the important thing is you do this all in one table, and that is a structure that happens again and again: almost every single database design when it comes to DynamoDB should be in one table. Just one. If you’re doing multiple tables, you’re doing it wrong.
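A minimal in-memory sketch of that adjacency list idea follows. The item IDs and the `PK`/`SK` attribute names are assumptions for illustration: each invoice has a row for itself plus one row per bill on it, and looking the same rows up by sort key plays the role of the inverted global secondary index.

```python
# Every row lives in the same "table"; PK/SK values are hypothetical.
ITEMS = [
    {"PK": "invoice-1", "SK": "invoice-1", "Total": "100"},  # the invoice itself
    {"PK": "invoice-1", "SK": "bill-1"},                     # bill-1 is on invoice-1
    {"PK": "invoice-1", "SK": "bill-2"},
    {"PK": "invoice-2", "SK": "invoice-2", "Total": "40"},
    {"PK": "invoice-2", "SK": "bill-2"},                     # bill-2 is on two invoices
    {"PK": "bill-1", "SK": "bill-1", "Amount": "60"},        # bills stand on their own too
    {"PK": "bill-2", "SK": "bill-2", "Amount": "40"},
]


def query(items, pk, sk_prefix=""):
    """Table-style lookup: exact partition key, optional sort key prefix."""
    return sorted(i["SK"] for i in items if i["PK"] == pk and i["SK"].startswith(sk_prefix))


def query_inverted(items, sk, pk_prefix=""):
    """GSI-style lookup with the roles of partition key and sort key swapped."""
    return sorted(i["PK"] for i in items if i["SK"] == sk and i["PK"].startswith(pk_prefix))


print(query(ITEMS, "invoice-1", "bill-"))           # bills on one invoice
print(query_inverted(ITEMS, "bill-2", "invoice-"))  # invoices that carry one bill
```

Both directions of the many-to-many relationship are answered from one set of rows, which is why the pattern does not need a second table.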

That’s why I’m doing this stream today, to make up for me doing it wrong on Wednesday. And again, wrong in the standard way of adopting AWS services: when you first start, it’s going to work. You’re going to forklift traditional views and values over, and it may work for a little while.

But at some point it will die a horrible fiery death. There will be just non-stop dumpster fires that you have to clean up. So Andrew also brought up another good point right here in the LinkedIn feed, which is that a huge pain in Dynamo is choosing a string format for your dates. 100%, man. Hundred percent.

So there’s two ways to tackle this, and actually there’s multiple ways, but the best way to do it is to step back for a second. And actually the team at Trek10 call this out in their post, and it’s also based on AWS advice: step one, define the access patterns you think you will need.

This advice will save you so, so much time and pain. If you think about how you want to get data out (we talked about that on Wednesday across all the database services), if you look at how you’re going to pull data out of Dynamo, that’s going to help you figure out how you’re going to set up your keys.

But also how you’re going to set up your timestamps. So, common ways to break them down. Timestamps should always be... and you know what, I won’t even apologize. I was going to apologize to my American friends, but no, flat out wrong. Americans do this wrong. Dates need to be sequential.

So whether you do year, month, day or day, month, year, keep it in order of magnitude, man. Not month, day, year. That makes no sense. So if you’re creating a date key, keep it sequential. I normally go year, month, day because that’s a logical way of drilling down on stuff.

So if I was looking back at our bus example here, when we look at the ride key, you’ve got year, month, then day, then hour, then minute. This makes it easy to do those begins-with queries. So if I want to get all rides that were in November, I begin with 2019-11, right? If I want to know everything that happened today, 2019-11-01.
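Why the ordering matters can be checked with plain strings. In this sketch the ride times and the dash-separated key layout are assumptions: with year first, sorting the strings sorts the rides chronologically and a prefix selects a month or a day, while a month-day-year layout loses that property.

```python
from datetime import datetime

# Sample ride times, listed in chronological order (assumed data).
RIDE_TIMES = [
    datetime(2019, 10, 31, 23, 59, 0),
    datetime(2019, 11, 1, 8, 5, 1),
    datetime(2019, 11, 15, 17, 30, 0),
    datetime(2020, 1, 2, 8, 0, 0),
]

iso_keys = [t.strftime("%Y-%m-%d-%H-%M-%S") for t in RIDE_TIMES]  # year first
us_keys = [t.strftime("%m-%d-%Y-%H-%M-%S") for t in RIDE_TIMES]   # month first


def begins_with(keys, prefix):
    """Keep the keys that start with the given prefix."""
    return [k for k in keys if k.startswith(prefix)]


print(begins_with(iso_keys, "2019-11"))     # every ride in November 2019
print(begins_with(iso_keys, "2019-11-01"))  # every ride on one day
print(sorted(iso_keys) == iso_keys)         # string order matches time order
print(sorted(us_keys) == us_keys)           # month-first keys do not sort by time
```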

So it lets you query down that way. The biggest question a lot of people have, and it depends on the scale of your data and the way you’re tackling it, is whether that timestamp includes more than just the date or if you need to add the time in there as well. And that really depends on your data and your access patterns. If you don’t need the time, you can remove it. But also, in general, just as a sort of, you know, too-many-years-doing-this experience thing,

it’s always easier to use granular data in a less granular way than it is to try to add granularity later. So if you know the time something happened, just add it in. With a limit of like 400 KB a record, adding a couple of additional bytes to track the time in a timestamp is not the end of the world.

So I strongly recommend going year, month, day, hour, minute, second if you have it, because then you can filter those with the begins with, right, which I find really, really useful, and you can also do ends with to calculate based on time. So that’s another nice thing: if you have the timestamp as its own deal, you can start by saying, you know, begins with (I’m putting this in the comments) YYYYMM

DD. You can also do ends with HHMM to tell me everything that happened at 8 a.m., in that 8 o’clock hour in the morning. So if you’re looking at our bus system example, that might be really, I think, prime time. You can say I want to know everything that happened at 8 a.m.

regardless of the day, because I want to see how prime time ebbs and flows for that hour in my city, right? So you can do that with the key because you’ve already got it set up, and if that key is in your index, which it should be (and indexing in Dynamo is a whole other thing, indexes speed up those queries), then you’re all set.
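One caveat worth checking against the current DynamoDB documentation before relying on a suffix match: the expression functions available for strings are `begins_with` and `contains`, and there is no ends-with function. A common workaround, sketched here under assumed attribute names (`RideTime`, `Hour`) and an assumed timestamp layout, is to store the hour as its own attribute when the item is written so the "everything at 8 a.m. regardless of day" question becomes a simple equality match.

```python
from datetime import datetime

KEY_FORMAT = "%Y-%m-%d-%H-%M-%S"  # assumed timestamp layout


def with_hour(item):
    """Return a copy of the item with the hour of day broken out as its own attribute."""
    taken_at = datetime.strptime(item["RideTime"], KEY_FORMAT)
    return {**item, "Hour": taken_at.strftime("%H")}


def rides_in_hour(items, hour):
    """All rides in a given hour of the day, whatever the date."""
    return [i["RideTime"] for i in items if i.get("Hour") == hour]


rides = [
    with_hour({"RideTime": stamp})
    for stamp in ("2019-11-01-08-05-01", "2019-11-01-17-30-00", "2019-11-02-08-45-10")
]

print(rides_in_hour(rides, "08"))
```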

Right, so you can manipulate that timestamp using those begins and ends if you have year, month, day, hour, minute, second. Now, that’s why I also said in the intro, when we were looking at our data model (and let me just flip back to that), when we were looking at our general data model, that the timestamp in the ride ID might be more useful pulled out

as its own key, as a sub key, right, as a sort key. So that might be far more useful because then we can do those manipulations. But you can see, just even us talking this through, there’s a ton of things that you really need to kind of look at to make sure that you’ve got your data model set correctly.

Now the good news is if you mess this up... I meant to follow through the Trek10 post here, you know, talking about the partition key, the sort key, and how to set this up, and keeping the items right in the one table is absolutely critical. But what I’m getting at here is that, with design mistakes, the beauty of Dynamo is that it can be so fast that you can compensate for some poor design decisions. But you can also always just move everything from one table to another. It’s easy enough to go through them and set them up, and I’m relatively certain that there are actually some tools in Dynamo to help make that simpler, to kind of cut things over if you did

completely mess things up and you need to break up a key into its own individual field. So, lots to think about with Dynamo, but the key takeaway, and the reason why I wanted to redo this part of the stream and spend more time on it, is that Dynamo scales like wildfire, right? It is, as Andrew Brown put it here on LinkedIn, dumb and flat, but it’s fast as all get-out, right, at very low cost as well. You’re supposed to put auto scaling in place. Dynamo is priced based on the number of reads and writes allocated, whether you use them or not.

So auto scaling lets you go up and down in a far more sensible way now. But it’s also serverless. It’s pure cloud-native, but you need to design in a cloud-native way. On Wednesday, I made the very common mistake of taking what I knew, what I was familiar with, and then just dumping it into Dynamo. And would that work? Absolutely. Would it be inefficient? Yeah, hundred percent. But Dynamo is strong enough to overcome that inefficiency, which actually can be a problem, because I could churn away with that design for quite a while when it’s not the right design. Flipping it over and putting everything into one table is the way to go. One table.

It rules the world for Dynamo. Definitely read the Trek10 post and check out the Dynamo talk from re:Invent. That is the way to go. Really, spend 50 minutes watching that from Rick. It’s phenomenal. Augusto just put a good comment in the LinkedIn chat, a question about sharing these links. Absolutely. One of the things that I do with these streams: so they went live on LinkedIn,

we went live on Periscope, and I post these to my YouTube channel, but also to my website. And as part of that, on my website I will actually put the extra links that I followed and any slides that I used. So I put that link there again. You’ll see that it’ll be up in probably half an hour or so.

I run it through transcription and we’re all good, off to the races. So I appreciate you guys spending the time with me today. I’m sorry about the camera FUBAR a little bit earlier. I was messing around with some settings, trying to get ready for some other streaming I’m doing next week.

So I’ll continue to stream Road to re:Invent leading up to re:Invent, but next week I am actually starting a new series with Trend Micro. I’m going to be doing a talk show, Let’s Talk Cloud, with Trend Micro, bringing in various guests. First one kicks off Monday, 10 a.m.

Eastern. Look on the Trend Micro channels for more about that. But for this one, like I said, come back and check out the website, that’s markn.ca. You’ll see the main link there, and I will put up all the stuff we talked about. Thank you for letting me redeem myself after an absolute cluster you-know-what on Wednesday at the end of the stream.

The rest of that stream’s great, by the way. We talked about the pros and cons of each database service, but when I started fumbling around in Dynamo, I felt really bad because it was horrible and you guys deserve better. So this was the redemption. Thanks for tuning in. I hope you guys are set up for a phenomenal weekend, and let’s keep the conversation going. We’ll talk to you soon.