In this second excerpt from an internal programming discussion on Deacom’s ERP software, Jay Deakins talks about the impact that retrieving data out of the database has on performance. The mailbox example he provides is a great analogy for a way to improve the structure of data retrieval within the system.
The full transcription follows:
So we talked about some of the things we learned during that performance effort… Deacom’s a little bit different from some; there are different types of programming that you might do. If you’re programming video games, performance is going to be about refreshing the screen.
In the Deacom world, screen refreshes are free. They essentially cost… it’s not free, but the time it takes to redraw the screen is so minuscule that you just don’t have to worry about it.
The second thing is, again in contrast with Deacom, if you’re running a company doing, say, DNA research, the number of lines of code you have to run becomes very significant. You might be running billions of lines of code, which is going to take a while to process.
In the Deacom world, lines of code are essentially free. It doesn’t matter how many lines of code you have; they are essentially going to run instantaneously. We actually tested this back in 2002: we created a little loop program and said, OK, run ten million lines of code, how long does it take? And it’s not a long time.
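A rough modern-day re-run of that 2002 experiment might look something like the sketch below (this is illustrative, not the original test program): time how long ten million trivial “lines” take to execute.

```python
import time

def run_loop(iterations: int = 10_000_000) -> float:
    """Time `iterations` passes through a trivial loop body."""
    start = time.perf_counter()
    total = 0
    for _ in range(iterations):
        total += 1  # one near-free "line of code"
    elapsed = time.perf_counter() - start
    assert total == iterations
    return elapsed

if __name__ == "__main__":
    print(f"10,000,000 iterations: {run_loop():.2f}s")
```

On typical hardware this finishes in well under a few seconds, which is the point: raw line count is rarely where the time goes.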
And so Deacom is not code intensive per se… we’re logic intensive, and the last part is something we have to worry about: we are very data intensive in Deacom. The thing that is your performance hindrance is generally going to be data.
Now it’s interesting, we did a lot of testing around this and the number of rows in a table… if I’m going to find one record and I have a properly indexed table, the number of rows in the table does not significantly impact performance. I was actually pretty surprised to find that. We actually wrote programs to load millions and millions of records into a table, and if I’m going to do a seek and find one record, whether I have 10,000 records in the table, or 10,000,000, or a hundred million, there’s not a significant performance difference.
So when I’m getting one record, it should be free. If you’re seeing a problem there, it means you have an indexing problem. But again, whether I’m getting one record or inserting one record, it’s essentially instantaneous.
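The single-record claim is easy to demonstrate with SQLite (the schema here is made up for illustration, not Deacom’s): with a properly indexed column, seeking one record is a B-tree lookup, so its cost barely moves as the table grows.

```python
import sqlite3
import time

def seek_time(row_count: int) -> float:
    """Load `row_count` rows, then time a single indexed seek."""
    conn = sqlite3.connect(":memory:")
    # INTEGER PRIMARY KEY gives us an indexed lookup path for free.
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, note TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        ((i, f"order {i}") for i in range(row_count)),
    )
    start = time.perf_counter()
    row = conn.execute(
        "SELECT id, note FROM orders WHERE id = ?", (row_count // 2,)
    ).fetchone()
    elapsed = time.perf_counter() - start
    conn.close()
    assert row is not None
    return elapsed
```

Running `seek_time(10_000)` and `seek_time(1_000_000)` gives times of the same tiny order of magnitude; if a single-record fetch is slow in practice, suspect a missing index rather than table size.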
So what does take time, and what is volume dependent, is grabbing data out of the database. And what’s interesting is, we did a lot of experimenting with this… we don’t get to control the number of rows that are returned. So think about that: if you have a customer that does a lot of sales orders, they’re going to have lots of rows in their sales order table, and we can’t really control that. There are some ways to control what we pull back, but essentially we can’t control how many rows they put in.
If you have 500,000 sales orders a year you’re going to have 500,000 rows in your database. So we can’t control that.
What we can control is the width. There are two things that impact that performance: the first is how many rows we pull back, and the second, which is equally important, is the width. And performance is pretty linearly calculated as X times Y: the number of rows times the width. So one thing you’ll see as you look in Deacom is that we have very narrow tables as a general rule, or as narrow as they can be. Things like the General Ledger table are pretty narrow, and the result set you pull back is critically, critically important.
You guys are probably all familiar with how we do the grid searches: we only pull back what they actually need. That’s something we came up with in the late ’90s, and it radically increased the performance of the system.
Again, we couldn’t impact how many rows… if you say, I want to see all the sales order lines I shipped this year, I’ve got to show you all the sales order lines. But we could impact what we brought back. So the width of that result set is very impactful, and there’s a direct correlation between that and performance. If you bring back twice as wide a table, it’s going to take twice as long, and that slows things down pretty dramatically.
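The rows-times-width point can be made concrete with a small sketch (the schema is hypothetical): the volume a query moves is roughly row count times row width, so a grid that selects only the columns it actually displays moves far less data than `SELECT *`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, customer TEXT, "
    "item TEXT, qty INTEGER, notes TEXT)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    ((i, f"cust{i}", f"item{i}", i, "x" * 200) for i in range(1_000)),
)

def payload_chars(query: str) -> int:
    # Crude proxy for result-set size: total characters pulled back.
    return sum(len(str(v)) for row in conn.execute(query) for v in row)

wide = payload_chars("SELECT * FROM sales")          # every column
narrow = payload_chars("SELECT id, qty FROM sales")  # just what the grid shows
```

Here the same 1,000 rows cost wildly different amounts depending on width alone, because the fat `notes` column dominates the wide result set.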
So the most important thing in performance is the size of the results you’re bringing back. One of the concepts I always talk about is how and when you do this, and how you distribute load. I talk about the mailbox example. At my house I have about a 100-yard driveway, and there’s a mailbox at the end. If the mailman comes and puts a bunch of letters in there, and I go out and get one letter and walk back to the house, then walk back to the mailbox, get a letter, walk back to the house, it’s going to take me a long time.
Now, by changing the structure of that, by grabbing them all together, bringing them back, and sorting them out, I can speed that up. And where I can speed it up even more is this: if I have a thousand letters out there, it might take me a long time to haul them all up. But if I can throw things out at the curb, tossing the junk right into the trash, it’s going to make me faster.
So how we do things and that set theory of how we bring things back is actually very, very, very important.
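In query terms, the mailbox analogy might look like the sketch below (the schema is invented). Each `execute()` call stands in for one walk down the driveway: one trip per letter means a thousand round trips, while a single set-based query is one trip, and its WHERE clause throws the junk out at the curb instead of back at the house.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mail (id INTEGER PRIMARY KEY, is_junk INTEGER)")
conn.executemany(
    "INSERT INTO mail VALUES (?, ?)",
    ((i, 1 if i % 3 == 0 else 0) for i in range(1_000)),
)

# One trip per letter: 1,000 separate round trips to the database.
one_by_one = [
    conn.execute("SELECT id FROM mail WHERE id = ?", (i,)).fetchone()
    for i in range(1_000)
]

# One trip for the whole batch, with the junk discarded at the source.
keepers = conn.execute("SELECT id FROM mail WHERE is_junk = 0").fetchall()
```

Both approaches end up with the same keepers, but the set-based version pays the round-trip cost once and never carries the junk at all.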