
Facebook Data Downloads

You upload a ton of data to Facebook and in turn, Facebook generates a ton of data about you. I built a tool to take a look at Facebook's view of you.

Watch this episode on YouTube.

Reasonably Accurate Transcript

Morning. How's everyone doing today? It is Monday, which is always a fun day. At least that's what they say. It never ends up actually being true. I'm just gonna double check that we're streaming up here on all channels.

Yeah, everything is waking up. I'm gonna get that echo out of here. There we go. All right. So, on Friday, I talked about the CSE here in Canada pushing out some metrics that were kind of BS, and I was fully intending on diving into that throughout Friday, and I did for a while.

But then a couple of articles started popping up on Saturday and Sunday from various media outlets in relation to Facebook because, you know, we can't get enough about Facebook. And what these articles were about is that people are actually downloading their data from Facebook, which is something Facebook has enabled for a long time.

You can go up to your settings and click on a couple of links, basically. Three clicks later, you wait about 10 minutes, you get an email saying your download is ready, and then you get this big zip file that contains your timeline, a whole bunch of the metadata that they've built around you, some low-res versions of your photos, stuff like that.

Among these articles there's one on CNN, and there were a couple of others that I encountered as well. There's a little link down here to the CNN article, because it was interesting.

It basically said, hey, look at the advertisers that are looking at my information. And that's interesting, but there's way more in that data file than I think people realize. And so I started to pull together a little tool, a couple of scripts in Python, that start to map out what is going on.

In fact, let me just click over and see if I remember. Here we go. All right. So let me share this up to you. I don't think anything else is terrifying on my screen at the moment.

Perfect. Yeah. So here, let me zoom in on this. So I'm starting the post, obviously. I renamed it. But you can see here all of these pins on the map. The pins on the map indicate locations that Facebook has inferred from when I've logged in.

So I haven't posted anything, I haven't released my location information at all. This is just from logging into Facebook. They do a whole bunch of correlation in the back end to infer a location, and this is just what it found for me for 2017.

So there's a lot going on there. And that's going to be the basis of a post that hopefully I get up today, along with some code, because there's a bunch more location data in that data dump, and some of it is explicit, some of it is implicit.

So it's one thing if you do a status post and you check in on that status post and pin it and say, hey, I am at the Hard Rock Cafe or I'm at the movie theater. That's you explicitly saying, share my location. But there's a lot of stuff that happens in the background, like this location inferred from login, that people aren't aware of, as well as all the photos.
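As a rough illustration of the kind of script mentioned here, the following Python sketch pulls IP addresses out of a downloaded HTML page and counts how often each one appears. It is not the actual tool. The sample markup, the `TextExtractor` class, and the `login_ips` function are assumptions made up for this example; the real layout of the pages in the Facebook archive is not described in this episode.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text content of an HTML page, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


# Four dot-separated groups of 1-3 digits; the range check happens below.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def login_ips(html):
    """Return a Counter of the valid IPv4 addresses found in the page text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    candidates = IPV4.findall(text)
    # Keep only addresses whose four parts are all in the 0-255 range.
    valid = [ip for ip in candidates
             if all(int(part) <= 255 for part in ip.split("."))]
    return Counter(valid)


# Hypothetical markup using documentation-only IP ranges, not real data.
sample = (
    "<html><body><ul>"
    "<li>IP Address: 203.0.113.7</li>"
    "<li>IP Address: 198.51.100.23</li>"
    "<li>IP Address: 203.0.113.7</li>"
    "</ul></body></html>"
)
counts = login_ips(sample)
print(counts.most_common())
```

Everything here runs locally with the standard library only. Turning an IP address into a pin on a map would need a separate geolocation lookup, which this sketch leaves out.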

So when you take a photo on your camera, most of the time people have it enabled to geotag it, which is totally great, because in an application like Photos on my Mac, I can pull up a map that shows me where all my photos have been taken.

I can zoom in on a location to see the photos that have been taken there. For me, that's great. That's interesting. But Facebook uses that to get a whole bunch of extra information. And I think what's really fascinating, and I will put this in the post, is this.

What was really fascinating was the fact that when you ask for your data back from Facebook, they give you the geography, like the coordinates of the photos, but they strip a whole bunch of other information out of the photos and don't correlate that back for you.

So they actually don't even give you the timestamp of the photo back, even though they know it. If you look at the post on Facebook, it says you posted this picture, you know, March 27th at 2:32 p.m.

When you download your data from Facebook, that information is stripped out. So even when you get your data from Facebook, they're not giving you everything. But what they have provided is a wonderful insight into just how much they know.

And I think it will be surprising to a lot of people, because up until these last couple of weeks, the whole "Facebook knows a lot about me and that's OK" view has been people assuming what Facebook knew and saying, here's what I've explicitly shared, they are not going to know any more than what I have explicitly shared. And that is not at all correct. Facebook is very smart at correlating data from multiple sources that they get on their platform, but also from third parties.

So there's a really easy way for them to figure out where I am when I log in from an IP here at home. Even if I'm not explicitly sharing my location, they get much better GeoIP targeting. Traditionally, when people say, hey, this IP is at this location, that's generally really crappy data. Facebook has refined it quite well because they have supplementary data, where somebody in the neighborhood here has posted a photo while being checked into this neighborhood. That gives them the GPS coordinates with an IP address, so they can say, hey, the location for this IP address is far more accurate. It's not just in this city, it's in this neighborhood or on this street.

So it's kind of spooky what they've been able to infer without anything. And I think the real reason that map I just showed you hits home, or will hit home for a lot of people, is that it was done simply from logging in.
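The refinement idea described above, where geotagged posts sharpen a coarse IP location, can be sketched roughly as follows. This is only an illustration of the general idea, not a description of how Facebook actually does it. The function names, the coordinates, and the IP addresses are all made up for the example, and a plain average of latitude and longitude is only a reasonable stand-in for points that sit close together.

```python
from collections import defaultdict


def refine_ip_locations(observations):
    """Average the GPS coordinates seen with each IP address.

    observations: iterable of (ip, latitude, longitude) tuples, for example
    taken from geotagged posts made while connected from that IP.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # lat total, lon total, count
    for ip, lat, lon in observations:
        entry = sums[ip]
        entry[0] += lat
        entry[1] += lon
        entry[2] += 1
    return {ip: (s[0] / s[2], s[1] / s[2]) for ip, s in sums.items()}


def locate(ip, refined, coarse):
    """Prefer the refined location; fall back to the coarse GeoIP guess."""
    return refined.get(ip, coarse.get(ip))


# Hypothetical city-level GeoIP guesses (documentation-only IP ranges).
coarse = {
    "203.0.113.7": (45.0, -75.0),
    "198.51.100.23": (43.7, -79.4),
}

# Hypothetical geotagged posts observed from the first IP only.
observations = [
    ("203.0.113.7", 45.40, -75.70),
    ("203.0.113.7", 45.42, -75.68),
]

refined = refine_ip_locations(observations)
print(locate("203.0.113.7", refined, coarse))    # refined, near the posts
print(locate("198.51.100.23", refined, coarse))  # unchanged coarse guess
```

The point of the sketch is that a login from the first IP now maps to the neighborhood of the geotagged posts, even though the login itself shared no location.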

So no other action other than entering your user name and password into a device, or pulling up the web page if you're already logged in. That's what that map was generated from. That's it. So there's a lot of data out there. People are right to be mad, people are right to be concerned. We have done this to ourselves in large part. But Facebook is doing a lot of stuff behind the scenes, which is totally understandable given their mission, but people don't realize the implications of it.

So hopefully this post, when it's done, will help enlighten some people. And I'm gonna make the tool and share it out on GitHub so that people can run it against their own data dump as well. It doesn't call any third-party services. It is all just local. It's just a bunch of scripts to strip out the information from the web pages that you download from Facebook and then display it to you in a way that you can make sense of, because when you see it in the data dump, it seems somewhat innocuous. When you add it together on a map like I did there, you start to get the picture.

So that's what I'm working on today. I'm still following up on the BS numbers from the billion attacks from Friday as well. And we'll see what else happens, because it's Monday. There's always going to be something else. Hope you guys have a great day. As always, hit me up, marknca, or down here in the comments below. I look forward to talking to you all and hearing your thoughts on these issues. So have a great one.

Talk soon.
