Quick comments on scaling an application up

Unfortunately I cannot find the original Usenet post, so here’s a paraphrased summary:

Two programmers are discussing what to do with a slow program, and the junior of the two laments, “If only there was a way to make the computer run faster.” The senior replies, “You cannot make the computer run faster, but you can make it do less.” Here is the gist of that advice, explained from my own experience.

Caching

With some exceptions, it doesn’t really matter what language you choose to implement a program or application in…as long as it is fast enough. Instead you need to look at what your application is spending most of its time doing, and I don’t mean just a cursory look but really digging in. In almost every case, the primary obstacle to scaling out is going to be whatever you are using for a data backend.

If you’re fetching a User credential or profile record from the database on every request, you’ve locked the speed of your entire application to the max number of connections (not queries) your database can handle. For MySQL that’s about 150-180/second (or 220-250/second if you have a full-time DBA). If you get more than 250 user requests per second to your web stack, your application is locked up solid. So it should be obvious that the solution is to cache anything and everything needed from the database that won’t be changing too often.

My preferred solution for the above is to use memcache with as much RAM as you can throw at it: at minimum 2 GB, but I’ve worked on 128 GB categorized arrays before. Memcache can be summarized as an unreliable key/value data store. You might put a key/value pair in, and it might still be there for the next minute or so, which means your application must always be able to fall back to the database when the key is gone.
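As a rough illustration of that pattern, here is a minimal Python sketch of a read that tries the cache first and only touches the database on a miss. It is not a real memcache client: UnreliableCache is a hypothetical in-process stand-in with a time-to-live, and get_profile and fake_load_from_db are made-up names for this example.

```python
import time


class UnreliableCache:
    """In-process stand-in for memcache: entries expire after a TTL."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired, so treat it as a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (self._clock() + self._ttl, value)


def get_profile(user_id, cache, load_from_db):
    """Try the cache first; fall back to the database on a miss."""
    key = f"profile:{user_id}"
    profile = cache.get(key)
    if profile is None:
        profile = load_from_db(user_id)  # the expensive call
        cache.set(key, profile)
    return profile


db_calls = []


def fake_load_from_db(user_id):
    # Hypothetical loader; records each call so database hits can be counted.
    db_calls.append(user_id)
    return {"id": user_id, "name": f"user{user_id}"}


cache = UnreliableCache(ttl_seconds=60.0)
for _ in range(10):
    get_profile(42, cache, fake_load_from_db)
print(len(db_calls))  # ten requests, one database call
```

The important property is that a missing or expired key is never an error, just a slower request.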

By implementing caching in your application, you’re making it do less. Instead of a 1 to 1 relationship between user requests and database connections, the ratio might go up to 10 to 1.
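To put rough numbers on that ratio, here is a back-of-the-envelope calculation in Python. It assumes every cache miss costs exactly one database connection and reuses the 250/second ceiling quoted earlier; both function names are made up for this sketch.

```python
def request_ceiling(db_connections_per_second, requests_per_db_connection):
    """User requests per second the stack can take before the database saturates."""
    return db_connections_per_second * requests_per_db_connection


def hit_ratio(requests_per_db_connection):
    """Fraction of requests that must be served from cache to reach that ratio."""
    return 1 - 1 / requests_per_db_connection


print(request_ceiling(250, 1))   # 250, no caching
print(request_ceiling(250, 10))  # 2500, at 10 to 1
print(f"{hit_ratio(10):.0%}")    # 90%
```

In other words, a 10 to 1 ratio means nine out of ten requests never reach the database at all.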

Division of concerns

This usually catches almost all junior and mid-level developers off guard. If your application serves static content from a Python or Ruby script, you’re burning capacity. A better plan is to split your application into two subprojects: Application and Application Content. From the outside looking in, these are http://derpCorp.com/application/url and http://static.derpCorp/staticContent/. Generally nginx or lighttpd can trounce almost anything else for serving static content. This isn’t applicable to everyone, but the cost of infrastructure will lean heavily towards new application servers and not your content servers… so by dividing the two now, while you can, you set yourself up for investing wisely vs. just throwing money at the problem.
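On the application side, the split can be as small as a helper that points every asset link at the static host, so the application servers never see those requests. A minimal Python sketch, assuming the example static URL above; static_url is a hypothetical helper name.

```python
from urllib.parse import quote

# Base URL of the content servers, taken from the example above.
STATIC_BASE = "http://static.derpCorp/staticContent/"


def static_url(asset_path):
    """Build a URL on the static host for a path such as 'css/site.css'."""
    # quote() escapes spaces and similar characters but leaves '/' alone.
    return STATIC_BASE + quote(asset_path.lstrip("/"))


print(static_url("css/site.css"))
# http://static.derpCorp/staticContent/css/site.css
```

Templates then call the helper instead of hard-coding paths, and moving the content later only means changing one constant.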

Divide and conquer

The minute one piece of an application becomes a critical component, the door to unending misery begins to open. That one critical piece is going to reliably fail at every investor presentation, at 4am on Saturday, and about ten minutes after you hit evening rush-hour traffic. Usually the critical component is the database, and almost always the first solution is to throw more memory and disks at it, hoping the beast will be sated forever and ever. That should be a sign that something needs to change, but sometimes the sign isn’t heeded. Instead of scaling up, the proven winning solution is to scale out. If you have two or more schemas on the same server, it might be time to separate them. Does User A need to cohabitate with User B’s data?
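One way to stop User A and User B from sharing a server is to pick the database host from the user ID. The Python sketch below is only one possible scheme (a stable hash modulo the number of servers), and the host names in SHARDS are invented for the example.

```python
import hashlib

# Hypothetical database hosts; substitute your own.
SHARDS = ["db0.internal", "db1.internal", "db2.internal"]


def shard_for_user(user_id, shards=SHARDS):
    """Map a user ID to one database host, the same way on every call."""
    # Use a stable hash: Python's built-in hash() of a string is salted
    # per process, so it would not agree across application servers.
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


for user_id in ("userA", "userB", 1001):
    print(user_id, "->", shard_for_user(user_id))
```

Be aware that with plain modulo, adding or removing a server remaps most users, so a lookup table or consistent hashing may be a better fit if the server list changes often.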

Don’t ignore your problems

Usually there is a small clan of people clustered around an application; it provides money and stability for them. Sometimes this clan sacrifices their youth, sanity, and credit ratings for the application like it’s some sort of messed up deity. Unfortunately your application is stupider than the bacteria growing in your kitchen sink, and though throwing money and time at a half-assed solution may seem to correlate with resolution, correlation does not equal causation…especially with software. If half of the application randomly goes belly up every week at the same time… don’t ignore that problem or, worse, try to bury it. Pick someone on your team and send them off on a mission to find the problem and fix it. Otherwise what was once a problem may end up being your clan’s apocalypse.