Django Chat

Performance

Episode Summary

Is premature optimization the root of all evil? We discuss! And also cover both common and uncommon techniques for improving performance in a Django application.

Episode Notes

Episode Transcription

Will Vincent  0:06  

Hello, and welcome to another episode of Django Chat, a weekly podcast on the Django web framework. I'm Will Vincent, joined, as always, by Carlton Gibson. Hi, Carlton. Hello, Will. And this week we're going to talk about performance. So performance matters because it is probably the most important part of the user experience. Google punishes slow sites with SEO these days. And even something like Amazon with e-commerce has done studies showing that just a 100 millisecond slowdown can cost a percentage of sales. And 100 milliseconds, that's like the blink of an eye. What's the, actually, as we get into it, what's the default these days? Is it 300 milliseconds where a user can't tell the difference? But then after that, every hundred milliseconds, I think,

 

Carlton Gibson  0:49  

Oh, well, I don't know, but there was something about the iPhone, back in the day when I was playing with those sorts of things. There was something about responsiveness. There was a 300 millisecond

 

Will Vincent  1:01  

Yeah, landed on

 

Carlton Gibson  1:01  

delay on clicking in a web view. And it used to drive users mad, because people would be building web apps, not native apps, but using web views, and there'd be this noticeable delay, yeah, versus a native button, which just went off straight away. And I think the delay was 300 milliseconds. And there was some "Can we get rid of it? No, we can't get rid of it," because they...

 

Will Vincent  1:27  

That was like a glitch in the system. Yeah. And it just gets super frustrating. I think 300 milliseconds definitely is noticeable. Yes.

 

Carlton Gibson  1:34  

Yes. I don't know. Why is it?

 

Will Vincent  1:36  

So we're going to go through a whole set of tools and approaches. Yeah,

 

Carlton Gibson  1:41  

go ahead. Well, no, the web performance metric they talked about was less than one second to glass: if your full load time to interactive, on the glass, was less than a second, then users considered that fast, and that's like the gold standard. That's having your HTML delivered, your JavaScript on there, your CSS in place, so at least your first layout done, even if you're still pulling in images and things like that, but also your page clickable. "To glass" means that it's rendered and responsive in under a second. And that's considered, you know, quick. And that's talking about on mobile,

 

Will Vincent  2:23  

right. I mean, on a website, I'd say it's less than that. I think over time, people have come to expect better. But that's

 

Carlton Gibson  2:32  

no, but I think that's pretty good. Even at your desktop, even with a decent connection, you open your desktop browser and type in any site that's loading any kind of JavaScript, and like half of them will be slower than a second. So anyway, that one-second-to-glass idea is kind of a benchmark. And if you think about how long it takes to load JavaScript, and for all the assets to arrive, you've got network latency, then you've got the loading time, then you've got the rendering time. It doesn't give your Django web application very much time to respond, right, if you're going to meet that.

 

Will Vincent  3:10  

Right, exactly. And last point before we get into all this, I do want to mention Donald Knuth, if I said his name right. "Ka-nooth"? "Nooth"?

 

Carlton Gibson  3:19  

Yes, as per Wikipedia. I always thought it was "Nooth". I said "Nooth" for years. And then I looked it up, and it's "Ka-nooth".

 

Will Vincent  3:24  

So Knuth, he's in CS at Stanford. You know, he looks at email, I think, like once every six months and thinks all these deep thoughts, and he has this incredible series on computer science. Anyways, here's his quote on performance before we get into it, which is, quote, "The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming." So you've probably heard the second half of that, but I think in context it makes a lot of sense as we get into all these things. Basically, think about what you're doing. Don't just blindly whack every performance inefficiency that pops up, because, yeah, those will be infinite.

 

Carlton Gibson  4:03  

But also, what it turns out, as a sort of matter of fact, just from how programs behave in the wild, is that most of your performance issues will come from a very small number of places, and you won't be able to predict where they are in advance. So what you need to do is build it, just build it as simply and sanely as you can. Don't make optimizations, don't worry, you know, just don't spend time optimizing it at all. And then profile it, and see where the three performance bottlenecks are which are taking 70% of the time, and optimize those. And then, for a fraction of the effort of micro-optimizing everything as you went along, you've got a more performant web application.
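
Sketch (not from the episode): the "build it simply, then profile it" workflow Carlton describes, using Python's standard-library cProfile. The functions `slow_step`, `fast_step`, and `handle_request` are hypothetical stand-ins for the steps of a request; a real Django project would more likely rely on Django Debug Toolbar or an APM service, as discussed later in the episode.

```python
import cProfile
import io
import pstats


def slow_step():
    # Deliberately expensive: stands in for the one real bottleneck.
    return sum(i * i for i in range(200_000))


def fast_step():
    # Cheap work that is not worth optimizing.
    return sum(range(1_000))


def handle_request():
    # Hypothetical request handler made of one slow and one fast step.
    return slow_step() + fast_step()


profiler = cProfile.Profile()
result = profiler.runcall(handle_request)

# Sort by cumulative time so the functions that dominate float to the top.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

The report lists `handle_request` and `slow_step` near the top, which is the signal to optimize there and leave `fast_step` alone.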

 

Will Vincent  4:46  

Yes. But it is tempting to try to do it locally. So

 

Carlton Gibson  4:49  

yeah, well, it's interesting, isn't it? Oh, you know, if I use exists versus count, do I gain four milliseconds?

 

Will Vincent  4:58  

Yeah, let's get to that. So how do you baby-step up to it? So the very first thing I would say is, you have to have Django Debug Toolbar, which is a third-party package. It gives you configurable panels, so you can see the request-response cycle of a page. Basically, it shows you how many queries there are and how long it takes to load locally. So this isn't a proxy for production, but it gives you a quick look at it. And the two big ones (again, do this in production), the two big things you're going to want to look at for queries are select_related and prefetch_related. Do you want to take a stab at those, Carlton?

 

Carlton Gibson  5:35  

Okay, so, as well as Django Debug Toolbar, which we use locally, there are things called APM. What does the P stand for? Application something monitoring, I can never remember. Let's pretend it's process. It could be performance. Anyway, APM. So there used to be one called Opbeat, which got bought up by Elastic, and then there are other ones out there, right? New Relic is one. I always think Sentry should have one, but I don't think they do, I think, yeah,

 

Will Vincent  6:05  

that's for errors. But I think there's Datadog, I think. Right, okay. But anyway, there's

 

Carlton Gibson  6:11  

there's loads of these, and they just wrap something around your application which monitors how long execution takes. And what you're going to find, if you've got any old normal application, is that your biggest hit is the database. The time to fetch data from the database is most of your response time from your Django application. So what select_related does is, when there's a foreign key, you can say, hey, can we just join those two together with a SQL join, and can we get them all in one database hit rather than two or more? Well, you know

 

Will Vincent  6:43  

right, because it's one larger query instead of

 

Carlton Gibson  6:46  

one for each related object. And then there's the other option, prefetch_related, which is for many-to-many or many-to-one relationships, where you want to fetch, I don't know, all the authors and all the books together. A book can have many authors. So that's fine. And what that'll do is it'll fetch the authors, and then it'll fetch all the books that are related to the authors. It can't do it in one query, because it can't do the join, but it will do it in two database queries rather than, you know, perhaps potentially

 

Will Vincent  7:21  

hundreds. And I think, in the history of Django, select_related was always there, and then I believe prefetch_related was added later. I'd have to go and look

 

Carlton Gibson  7:29  

to be honest, but yeah,
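
Sketch (not from the episode): what the three access patterns look like at the SQL level. It uses the standard library's sqlite3 rather than the Django ORM so that it runs on its own; the `author` and `book` tables, their rows, and the helper names are made up, and the example uses a plain foreign key from book to author for simplicity. The comments name the QuerySet calls that, as far as I understand, produce each query shape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (
        id INTEGER PRIMARY KEY,
        title TEXT,
        author_id INTEGER REFERENCES author(id)
    );
    INSERT INTO author VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    INSERT INTO book VALUES (1, 'A1', 1), (2, 'A2', 1), (3, 'G1', 2), (4, 'L1', 3);
    """
)

statements = []
conn.set_trace_callback(statements.append)  # record every statement SQLite runs


def count_selects(fn):
    """Run fn and return (result, number of SELECT statements it issued)."""
    statements.clear()
    result = fn()
    selects = [s for s in statements if s.lstrip().upper().startswith("SELECT")]
    return result, len(selects)


def naive():
    # Like: for book in Book.objects.all(): book.author.name
    # One query for the books, then one more per book for its author.
    books = conn.execute("SELECT title, author_id FROM book ORDER BY id").fetchall()
    pairs = []
    for title, author_id in books:
        (name,) = conn.execute(
            "SELECT name FROM author WHERE id = ?", (author_id,)
        ).fetchone()
        pairs.append((title, name))
    return pairs


def joined():
    # Like: Book.objects.select_related("author"), a single JOIN.
    return conn.execute(
        "SELECT book.title, author.name FROM book "
        "JOIN author ON author.id = book.author_id ORDER BY book.id"
    ).fetchall()


def prefetched():
    # Like: Author.objects.prefetch_related("book_set"), two queries
    # whose results are stitched together in Python.
    authors = conn.execute("SELECT id, name FROM author ORDER BY id").fetchall()
    ids = [author_id for author_id, _ in authors]
    marks = ",".join("?" * len(ids))
    rows = conn.execute(
        f"SELECT author_id, title FROM book WHERE author_id IN ({marks}) ORDER BY id",
        ids,
    ).fetchall()
    titles = {}
    for author_id, title in rows:
        titles.setdefault(author_id, []).append(title)
    return {name: titles.get(author_id, []) for author_id, name in authors}


naive_rows, naive_count = count_selects(naive)
joined_rows, joined_count = count_selects(joined)
prefetched_map, prefetched_count = count_selects(prefetched)
print(naive_count, joined_count, prefetched_count)
```

The naive loop issues one query per book on top of the first one, so it grows with the data; the joined and prefetched versions stay at one and two queries however many rows there are.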

 

Will Vincent  7:31  

so these are, these are the two hammers that you're gonna want to use as a first step. Generally speaking, when you see a page that's loading slowly, and

 

Carlton Gibson  7:39  

Django's test suite has a really cool tool called assertNumQueries. If you write a unit test fetching the data you want, you can assert that only one query was made when you use select_related to fetch your data.

 

Will Vincent  7:54  

And so it wasn't kind of something Carlton sometimes look at that. Well, I just

 

Carlton Gibson  7:58  

found it in the Django tests. That's quite exciting. But yeah, so you can write a unit test fetching your data. You know, say you've got a convenience method which wraps up all the data you need for your view and returns it nicely, so you keep that logic out of the main line of the view. You can test that method with assertNumQueries and say, look, I'm expecting this to make two queries, because I'm using prefetch_related: I want one for the authors and one for the books, and I don't want any more queries. So that when you, you know, iterate through your list in your test, it says, yeah, I did fetch all of the objects here in two queries, rather than one for the author and then one for each of the books as I traverse the relationship. Right? Yeah.

 

Will Vincent  8:43  

Yeah, I like that. Okay, I'm gonna have to use that. So what else?

 

Carlton Gibson  8:48  

So yeah, reduce the number of queries, right, that's the first thing. And then make sure your queries are efficient, so that's the second thing, and that's indexes. So take a quick look at the queries you're making in your views. Then, you can use EXPLAIN in the database, and QuerySets now, from 2.2, have an explain() method, which is kind of nice. It saves you having to extract the query from the queryset using queryset.query and then putting that into your database shell to explain it. You can just call explain(), and it'll send that off to the database and ask it to explain what it's doing. And you have to do that a few times and read them, but it'll say, look, now I'm doing a row scan. And what a row scan means is, I'm going through every single row of the database table to see what the matches are. And you don't want that. That's where you want an index, because you want it to just go and look up in the index and get the matching values from the index, which is a much quicker operation. So reduce the number of queries and make sure you're using indexes correctly. That's my big one or two. And then you were just about to say,
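
Sketch (not from the episode): the row scan versus index lookup Carlton describes, shown with SQLite's `EXPLAIN QUERY PLAN` through the standard library's sqlite3 so that it runs on its own. The `book` table and the index name are made up. In a Django project the equivalent step would be calling `.explain()` on a QuerySet (which, as far as I can tell from the release notes, was added in Django 2.1); the plan text differs between database backends and versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, pages INTEGER)")
conn.executemany(
    "INSERT INTO book (title, pages) VALUES (?, ?)",
    [(f"Book {i}", 100 + i) for i in range(1000)],
)

QUERY = "SELECT id, title, pages FROM book WHERE title = ?"


def query_plan(sql, params):
    """Return SQLite's human-readable plan for the given query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the plan description.
    return " | ".join(row[-1] for row in rows)


# Without an index SQLite has to walk every row of the table.
plan_before = query_plan(QUERY, ("Book 42",))

# With an index on the filtered column it can look the value up directly.
conn.execute("CREATE INDEX idx_book_title ON book (title)")
plan_after = query_plan(QUERY, ("Book 42",))

print("before:", plan_before)
print("after: ", plan_after)
```

The first plan reports a scan of the table and the second a search using `idx_book_title`, which is the change to look for when reading explain output.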

 

Will Vincent  9:49  

caching. Yes. Which we have a whole episode on, and I talked about it briefly before. Yeah. So cache all the things.

 

Carlton Gibson  10:00  

Well, yeah, but, you know, if it took a long time to come out of the database, and you need it all the time, cache it. I worked on a site, which was an API, which was serving social media data. And there was a competitor, one of these...

 

I can't even remember

 

so long ago. But anyway, it was a social media data-mining nonsense. And they had clients that were making lots and lots and lots of API requests all the time. And on every request, they had to check the API key. So you don't want to go fetching all the API keys from the database on every single request just to check whether the API key matched. So we would fetch it once an hour or whatever, and we would then check against the cache whether the API key was correct, rather than against the database, because,

 

Will Vincent  10:51  

well, that's quicker, right? But the key thing is you're doing this on real live production data, because again, I'll say this again to folks don't waste time doing this locally. It's so tempting.
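
Sketch (not from the episode): the "fetch the API keys once an hour, then check against the cache" idea as a cache-aside lookup. `TTLCache` is a tiny in-process stand-in written for this example so that it runs without a cache server; its `get` and `set(key, value, timeout)` methods are shaped like the low-level calls on Django's `django.core.cache.cache` object. `load_api_keys_from_db` and the key values are hypothetical.

```python
import time


class TTLCache:
    """Minimal in-process cache with per-key expiry (illustration only)."""

    def __init__(self, clock=time.monotonic):
        self._store = {}
        self._clock = clock

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: drop it so the caller goes back to the database.
            del self._store[key]
            return default
        return value

    def set(self, key, value, timeout):
        self._store[key] = (value, self._clock() + timeout)


db_hits = 0


def load_api_keys_from_db():
    """Hypothetical expensive query returning every valid API key."""
    global db_hits
    db_hits += 1
    return frozenset({"key-alice", "key-bob"})


cache = TTLCache()
ONE_HOUR = 60 * 60


def is_valid_api_key(api_key):
    keys = cache.get("api_keys")
    if keys is None:
        # Cache miss: hit the database once, then reuse the result for an hour.
        keys = load_api_keys_from_db()
        cache.set("api_keys", keys, timeout=ONE_HOUR)
    return api_key in keys


checks = [is_valid_api_key(k) for k in ("key-alice", "key-bob", "nope", "key-alice")]
print(checks, "database hits:", db_hits)
```

Four requests are checked with a single trip to the database. The trade-off is staleness: a revoked key can stay valid until the cached set expires.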

 

Carlton Gibson  11:02  

Yeah, but the code path has to be the same, right? So Django gives you a dummy cache backend, which is great for local development, because it exposes the cache API, but it doesn't do anything. So you can say, is this in the cache? No, it's not, because the dummy cache never caches anything. And then you can go and hit the database. So in development, you just use the dummy cache backend. It's a bit like using the console email backend, you know?
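
Sketch (not from the episode): a settings fragment that keeps the code path identical while swapping the cache backend. The dotted backend paths are Django's built-in ones as I understand them; the helper name `cache_settings`, the Memcached address, and the choice of Memcached for production are assumptions for illustration.

```python
def cache_settings(debug):
    """Return a CACHES setting for local development or for production."""
    if debug:
        # Accepts every cache call but never stores anything.
        return {
            "default": {
                "BACKEND": "django.core.cache.backends.dummy.DummyCache",
            }
        }
    return {
        "default": {
            "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
            "LOCATION": "127.0.0.1:11211",  # assumed local Memcached instance
        }
    }


DEBUG = True
CACHES = cache_settings(DEBUG)
# The same idea applies to email: print messages to the console in development.
EMAIL_BACKEND = (
    "django.core.mail.backends.console.EmailBackend"
    if DEBUG
    else "django.core.mail.backends.smtp.EmailBackend"
)
```

Because views call the same cache API either way, nothing in the application code needs to branch on the environment.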

 

Will Vincent  11:29  

Right. Right. Yeah. Indexes, I want to quickly note. So one thing which I think is cool is that, starting with 1.11, you can do this in the Meta class on your models, instead of adding db_index to a field. I personally find that doing it through Meta is a little more readable, and I can put more things in there. But indexes can be abused. Like, what's the downside of just indexing everything?

 

Carlton Gibson  11:54  

Right, well, time and space. So if you've got an indexed field, an indexed column in the database table, it will take longer to write that record to the database, because it has to index it at the same time. So that's time: write performance is impaired if you've got an index in place. But also space, because, you know, it's like a phone book, right? The classic example of an index is a phone book, where I want to look someone's phone number up by their name. Okay, so instead of going through the list one by one, I can just go to the right place in the alphabet and get the number. But phone books are big and fat, right? They take up a lot of space on the coffee table and in the hallway under the telephone. So that's the same problem for every index that you add.

 

Will Vincent  12:49  

Right. They're not costless.

 

Carlton Gibson  12:50  

Yeah, they're not costless. But to be honest, on balance, are you making actual queries on this data? If so, you probably want an index in place. But before you build your application, you probably haven't designed it well enough to know which

 

Will Vincent  13:08  

columns you'll actually be querying. I think that's the key thing: your schema can and will change. And especially once you start indexing schema 1.0, that's the wrong approach. I've done that.

 

Carlton Gibson  13:19  

I don't so I don't quite like

 

Will Vincent  13:21  

if you start with your basic schema, yeah, and you go hog wild with indexes, and then you find out that you want to change the schema around because your data is different, right, or you add new features, but then you've got the indexes, and it's too much. It's premature optimization.

 

Carlton Gibson  13:37  

Yeah, I mean, this happens in NoSQL land quite a lot. Because, for instance,

 

Will Vincent  13:45  

you have to be

 

Carlton Gibson  13:46  

you have to create these views, which are essentially indexes, and you have to specify them up front. And so you think, right, I'll create this view, and it goes and processes them all, and then you realize that was totally wrong. This happens with Elasticsearch as well, because you think, I'm going to search on this, so you create an index for searching on these fields, and then you have to reindex it all when you realize that isn't right. It's quite difficult.

 

Will Vincent  14:10  

But, well, this is the thing with NoSQL: it needs to be used appropriately. Because when you first start using it, you're like, this is amazing, I don't need to... But yeah,

 

Carlton Gibson  14:20  

yeah, but so anyway, use Django Debug Toolbar. When you've got your application running and you think, I'm going to deploy this, okay, go through locally and see what the actual queries are, and use explain too. Are these queries sensible? Does putting an index in improve them? And whilst local isn't a proxy for production, it kind of is: it won't tell you the exact numbers, but it will tell you the relative scale.

 

Will Vincent  14:46  

Yeah, I mean, look at it for sure. Yeah. And the fourth big area, I would say, would be the front-end assets, which as a Django developer you probably don't have as much control of. But, for example, you can use django-compressor, a third-party package that Carlton maintains, well, you want to see what your help with. You can use a CDN. You know, there's a whole, actually, I'll link to it: Addy Osmani, who's at Google, has a whole free web book on images, which is really fascinating. I mean, for example, if you haven't thought about it, you know, you can use easy-thumbnails, which is a package, so that rather than showing the two-megabyte version of, let's say, a user profile photo: someone can upload a photo, but when you show it on the screen, it's a tiny little thing. Well, you can have a thumbnail version of it and the full version of it. These are sort of basic steps that are really performant. So just front-end assets in general, and especially if you look at Google PageSpeed. There are other ones; all the major browsers have tools to evaluate site speed. They will help you, especially with the front-end assets, to see, like, your JavaScript is way too big, or you have

 

Carlton Gibson  15:57  

Is your web server configured to cache these things, to send the right caching headers to say, look, you know, this CSS file, cache it indefinitely? And one thing that django-compressor will give you is a nice concatenated file, but it has a silly hash in the file name. So you can cache that forever, because if you change your CSS, that hash is different, and so the file name is different. And so you can configure your web server to tell the browsers, and to tell the proxy caches out there, cache this forever.
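
Sketch (not from the episode): why a hash in the file name makes "cache this forever" safe. This illustrates the general content-hash technique with the standard library; it is not django-compressor's actual naming scheme, and `hashed_name`, the digest length, and the header value are choices made for the example.

```python
import hashlib
from pathlib import PurePosixPath

# A typical "cache forever" policy: one year, and never revalidate.
FAR_FUTURE_CACHE_CONTROL = "public, max-age=31536000, immutable"


def hashed_name(filename, content, digest_len=12):
    """Insert a short digest of the file's content before its extension."""
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


v1 = hashed_name("css/site.css", b"body { color: black; }")
v2 = hashed_name("css/site.css", b"body { color: navy; }")
print(v1)
print(v2)
```

Editing the CSS changes the digest and therefore the URL, so browsers and proxy caches fetch the new file while the old one can stay cached indefinitely.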

 

Will Vincent  16:28  

Ah, it's fun to do performance stuff, it's just never-ending. The last thing I would say, and you can say what you'd like, Carlton, is I find that the django-extensions third-party package is very helpful, because it's got a whole bunch of things, but specifically shell_plus, which will auto-load models into your shell when you need to drop in. It also has runserver_plus. It's sort of a Swiss Army knife of tools. I find myself using it all the time, because whenever I go into the shell, I want the models loaded, and I just can't live without django-extensions.

 

Carlton Gibson  17:01  

Yeah, I love it, big fan of it. It's got this, I can't remember the exact command, but it's got this ability to output a picture, a .dot file, which is a graphics format, of your models. And you can either view that in, what's it called, I probably can't remember, or you drag it into OmniGraffle. And then you've got a nice diagram of all your models and the relationships between them. And I hate it, you know?

 

Will Vincent  17:31  

Yeah, well, I mean, the hard thing with all these favorite packages and tips is just figuring out the priority and the curation of them. This is why, for example, so awesome-django is a repo I maintain. There's a whole bunch of third-party packages, and I appreciate lots of PRs and issues people put in there, but I don't want it to be 1000 packages long. I'm trying to keep it curated. But when Carlton and I mention Django Debug Toolbar and django-extensions, I would say almost every Django site should use those.

 

Carlton Gibson  18:02  

Yeah, I don't have a problem saying that it's amongst the packages that I pip install without really having any concerns.

 

Will Vincent  18:08  

Yeah, that actually would be a cool thing to do: what are your top five, top ten third-party must-haves? If we surveyed some talking heads, that might be a fun thing to do. Yeah, I should do that. Any last things on performance? We've really hit kind of the high points, but

 

Carlton Gibson  18:26  

Django probably isn't your problem, as long as you're not making 200 database requests for a single thing. You know, if you do the basics, Django probably isn't your bottleneck. For most web apps, it's your JavaScript and your front-end stuff that will have more of an effect. But if you're pushing it to its limits, if your server is doing something intensive and it's your Django application that's driving it, then use select_related, prefetch_related, and indexing, then caching. Things like serialization can cost time, you know, REST framework, if you're using REST framework. If you're really pushing that to the limit, serialization is like template rendering, it's an expensive process. So there are alternative serializing options, which you might go for if you were really driving it hard. Make sure your middlewares are optimized. You know, the episode. Yeah, but, you know, premature optimization is the root of all evil. Chances are, you will get the throughput you need by dealing with the two or three things which are eating all the time, rather than worrying about, oh, should I spend a week changing my serializer layer to

 

Will Vincent  19:42  

Yeah, I mean, 20% of the effort gets you 80% of the way there. But, I feel like I've been saying this a lot recently, it is so tempting to dive into these small little micro changes that will have an impact, and ignore talking to users, you know, changing more important things around. Personally, it feels better to endlessly optimize performance, so I have to watch myself with that. It's like the equivalent of, I mean, some people, their busywork is answering email, right? For some

 

Carlton Gibson  20:12  

Yeah, it's doing a little, "I'm doing a performance optimization. Look, I've got 5% more throughput." Yes, on something called once by a background worker, right? Totally pointless.

 

Will Vincent  20:23  

Anyway, okay, that's the high points. As ever, we are at djangochat.com, or @chatdjango on Twitter, for the dyslexic folks out there. And we'll see you all next week. Bye bye. All right. Take care. Bye bye.