The US Federal government cannot be trusted.

This is my one and only political, totally off subject post, and it’s damn important.

In my youth I was a heavily proactive individual: when the bridges in my neighborhood got washed out, I made up maps to help wayward tourists find their way; I’ve donated my time to Habitat for Humanity, built community parks, worked as a special-needs teacher’s assistant, volunteered for Al Gore’s 2000 presidential bid, and then did a stint in the US military. To some extent I am disillusioned with my government, specifically I think Capitol Hill should be renamed Imperial Hill, but at the same time the US democratic foundation isn’t completely dead yet.

After the attacks of 9/11, opportunistic and deeply self-serving people set the US on a dead-end course to destruction. One of the things done while the country was numb and fearful was to deploy the Patriot Act, a bill purported to make America safer and prevent future terrorist attacks. The truth of how the Patriot Act has actually been used isn’t even close ( http://www.aclu.org/national-security/aclu-releases-comprehensive-report-patriot-act-abuses ); instead it has eroded constitutional rights and made it easier for the greedy to ransack the country under the guise of national security.

For anyone naive enough to believe the latest anti-piracy internet bill won’t be abused, well, I’d like to talk to you about a fantastic real estate prospect for a bridge connecting Manhattan island. Please contact your representative ( if you are a US Citizen ). Those who sacrifice their freedom for security deserve neither.

Lose a car window, lose a day

Yesterday morning I walked out to my carport to discover someone had bashed in my wife’s back window and one of her quarter panel windows on her Volvo. So began a fun-filled journey to hunt down replacement glass ( Safelite had the back window but not the quarter; a specialty shop had the quarter but at a steep price ). The truly fascinating part is that my car, a Volkswagen GTI with a $36,000 price tag, was left unscathed sitting next to her old Volvo.

Eh, going to see if I can finish up the Veterans Guide today and spend a little time on the security service. In the interim, I think I’ve found a bug in txmongo, but it’s proving to be a doozy to narrow down WHERE exactly the problem lies and how to write a unit test that reproduces it.

Two startups for half the sleep!!!

This weekend I enrolled in Startup Weekend Denver #3 AND also started a personal project for the LinkedIn Veterans hackday.

For SWDenver, the project is a penetration testing service. Thanks to my experiences with EC2, I’m relatively confident I can make it marginally profitable by implementing an intelligent booking/scheduling system and the power of Twisted. At this point I’ve had a couple Venture Capitalist minions peek in on my team, and they’ve walked away with a gleam in their eye. Who knows?

For LinkedIn Vet’s day, I’ve been building a national Veteran Events and Services directory using Django and MongoDB. My goal is to run this as a charity/public service to my peers ( USAF Senior Airman Dev Dave 2001-2005 IYAAYAS ). That said, I’ve got the models done, the view scaffolding is in place, and I’m working on setting up a UserAuth submission pipeline.
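To give a feel for the data layer, here’s a rough sketch of what the directory models might look like in plain Django ORM syntax. The field names are illustrative, not the project’s actual schema, and since the post mentions MongoDB the real project may well sit on a Mongo-backed ORM ( django-nonrel or MongoEngine ) instead:

from django.db import models
from django.contrib.auth.models import User

class Organization(models.Model):
    # Hypothetical fields -- illustrative only, not the actual VetHackday schema.
    name = models.CharField(max_length=255)
    city = models.CharField(max_length=100)
    state = models.CharField(max_length=2)
    website = models.URLField(blank=True)

class VeteranEvent(models.Model):
    organization = models.ForeignKey(Organization)
    title = models.CharField(max_length=255)
    starts_at = models.DateTimeField()
    description = models.TextField(blank=True)
    submitted_by = models.ForeignKey(User, null=True, blank=True)  # filled in by the UserAuth pipeline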

As it stands, it looks like I will have the SaaS for SWDenver on its way out the door by tomorrow afternoon and the VetDay directory by Monday morning. Nice to flex my RockStar muscles and kick out a distributed SOA project and a national directory in the space of a weekend.

The mess that is the Veterans hack day project is here @ https://github.com/devdave/VetHackday minus 6-7 pushes.

For Hire

Despite the occasional major butchering of English grammar, I think this blog shows that I am at least competent and well versed in software development. That said, for US citizens, I’m willing to pay out a $500 bounty to anyone who can point me in the direction of a Python contract or full-time remote position that I get placed into.

contact me at fromthewordress at ominian dot net or dot com.

Philosophy on getting it done

I think a few times now I’ve lamented my experiences with PHP. I’ve been accused by a few people of being two-faced in this regard: openly saying I don’t want to work with PHP as a tool set any more, while pointing clients who seek my advice on what platform to use towards PHP. The truth is really simple: I want to get my client up and going, or back on the road. Sure, Ruby on Rails and Django are superior products in many ways, but they are also more expensive in professional resources to maintain. A good example is one of my clients, the IT department for a major corporation with tens of thousands of customers and just as many computers to maintain. My client is well versed in system administration but not in software engineering, so I wrote up a Kohana v3 application that uses mod_php and jQuery. If I get hit by a bus tomorrow, they will still be alright and happy.

I don’t think I’m explaining myself too well just yet, so a much better written tale from the industry is in order via the DailyWTF @ ( article ). Yes, PHP is not sexy, it’s almost downright ugly, but it does what the client needs and is mostly reliable in that regard. If tomorrow someone said they needed an asynchronous event handler and they wanted it in PHP5, I’d probably laugh in their face, then proceed to start crying when I realized they were serious. Still, if tomorrow someone said they needed a simple low-traffic inventory management system and they already have people on hand competent in LAMP, then I’d lean towards PHP.

Why I’m like this, always going for the simplest, most reliable solution, originates with some of my more desperate past clients.

The super website constructor… of doom

Directory websites are usually fairly simple MVC applications that parse browser GET requests and dump out a nice simple web structure on demand. This client’s system did all of that work up front by ingesting a hierarchical data structure and generating a root page, regional pages, sub-pages, and detail pages in one go into a static 4.7GB structure. The plus was that the sales/marketing people could make rudimentary hand changes to pages and they would work… but the MAJOR downside was that structural changes were not possible.

The complexity required to build such a builder was unbelievable, and as such it was not very future-proof. In fact it was generating 1-2 gigabytes of error/warning messages a day to syslog. On reflection, I think the project was built the way it was for two reasons: first, my predecessors’ lack of experience/knowledge, and second, more so, sheer boredom. They made this because it was challenging to implement. Once it was implemented and the stack went into a maintenance-and-feature-addition state, they left.

I hate PHP so I shall do something else.

Another client horror story was a scenario where the company Rockstar got bored with the languages currently in use. Having read an extensive amount of this Rockstar’s code, I could clearly see they were a very brilliant individual, but at the same time they were shooting their client in the foot with their perpetual need to be brilliant. I would often lament in private conversations with my peers that the Rockstar, while brilliant, always leaned towards the path less taken, producing code that was harder to maintain, or sometimes to understand at all without a long analysis period.

The final straw for this Rockstar was when they started writing backend services in a language no one else in the company knew. I think the grand total was half a megabyte of code that was super critical to the company: very pretty, but also seriously fragile. As expected, the Rockstar grew bored and moved on. A few months later came the somewhat horrifying moment when one of these almost forgotten gems broke under unexpected circumstances and brought an entire application array of 30 servers to a dead stop. The culprit was a UTF-8 character that broke a high-volume data extraction script and caused the producer to block, waiting for the stdout pipe to clear up some space.
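That blocking mechanic is easy to reproduce in any language. Here’s a minimal Python sketch ( not the actual system, which was written in something else entirely ): the parent spawns a producer, never drains its stdout, and once the OS pipe buffer fills ( typically around 64KB ) both sides are stuck.

import subprocess
import sys

if sys.argv[1:] == ["child"]:
    # Child: a producer writing far more than the pipe buffer can hold.
    for i in range(100000):
        sys.stdout.write("x" * 80 + "\n")  # blocks once the pipe is full
        sys.stdout.flush()
    sys.exit(0)

# Parent: starts the producer but never reads its stdout.
proc = subprocess.Popen([sys.executable, __file__, "child"],
                        stdout=subprocess.PIPE)
proc.wait()  # deadlock: the child is stuck in write(), the parent in wait()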

Fixing that took all of the King’s men, some of the horses, the company CTO, and me. I won’t go into specifics but it turned out to be a combination of weak code and a bug in the target language.

Summary

I don’t resent these people for getting bored; rather, I resent that they got bored and didn’t realize it until they had abused an unspoken trust between professional and employer. Code monkeys write code to make their client money, not to entertain themselves. Whenever I get bored, I go on a serendipity hike with my personal time to see what I can pull off. Ultimately a lot of these pet projects end up going nowhere… but it’s more about the journey than the destination for me. Each one of my pet projects has ended up teaching me more about things I might not work with professionally than any book or blog post could. Just because you are a professional and work in your vocation doesn’t mean you are going to wake up one day and be a master in that profession. Just like martial arts or any other vocation, professional or hobby, it takes a continual investment of time and energy, striving for harder and more unusual problems to solve, to become a master.

Adventures in SSH: Agent authentication

As mentioned previously, I’m working on hacking/implementing agent support into Twisted.conch. Fortunately, Exarkun pointed me in the direction of twisted.conch.ssh.agent.SSHAgentClient, which implements the wire protocol logic for communicating with the agent, but there is still a gaping hole to fill in.
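To make the client side concrete: a rough, untested sketch of using SSHAgentClient against a locally running agent ( found via the usual SSH_AUTH_SOCK environment variable, an OpenSSH convention ) to request its identities might look like this:

import os
from twisted.internet import reactor, protocol
from twisted.conch.ssh.agent import SSHAgentClient

def got_identities(identities):
    # identities is a list of (public key blob, comment) pairs
    for blob, comment in identities:
        print(comment)
    reactor.stop()

def connected(agent):
    return agent.requestIdentities().addCallback(got_identities)

cc = protocol.ClientCreator(reactor, SSHAgentClient)
cc.connectUNIX(os.environ["SSH_AUTH_SOCK"]).addCallback(connected)
reactor.run()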

Briefly, when a user configures their ssh client to allow agent forwarding, almost immediately after userauth completes the client sends a session request for ‘auth-agent-req@openssh.com’. For OpenSSH, the service then kicks off a process of creating a named socket that usually resides in /tmp/, announces the agent’s presence in the user’s shell environment, and then binds a specialized TCP-port-forwarding-style channel from the named socket back to the client on a channel called “auth agent”. When a service-local ssh client then begins its own authentication process, it finds this special socket and sends a request-identities or sign-request agent protocol message down the wire. Ideally the response will be a correctly counter-signed value and the user can progress.

The last point can be found in the session.c file of OpenSSH as:

nc = channel_new("auth socket",
    SSH_CHANNEL_AUTH_SOCKET, sock, sock, -1,
    CHAN_X11_WINDOW_DEFAULT, CHAN_X11_PACKET_DEFAULT,
    0, "auth socket", 1);

Unfortunately I haven’t hunted down what the global constant SSH_CHANNEL_AUTH_SOCKET corresponds to on the Python side. I believe argument 1, “auth socket”, is equivalent to the name class attribute in channel.SSHChannel. So the skeleton for an “auth socket” channel might look like:


from twisted.conch.ssh.forwarding import SSHListenForwardingFactory

class AuthAgentChannel(SSHListenForwardingFactory):
    name = "auth agent"

Alas, I haven’t had time to test. I’m debating hacking up some sort of SSH/twisted.conch testing platform to let me execute arbitrary calls; that would probably make this exercise a tad easier to figure out.

All you need to know about SSH, you can learn from session.c

For a secret squirrel project, I’ve been diving fairly deep into SSH land. While implementing my own SSH service via Twisted.Conch, I ran into the problem of figuring out how to support agent forwarding.

While tracing through an SSH connection, I saw the session request name ‘auth-agent-req@openssh.com’, and after grepping over the OpenSSH code, sure enough there’s a check for that exact request type.

Will update when/if I can figure out how to translate to Python/Twisted. In the interim, session.c can be viewed here http://anoncvs.mindrot.org/index.cgi/openssh/session.c?view=markup. In passing, I have to say this is some of the most immaculate C code I have ever seen in my life.

Amazon EC2, HBase, and IMHO benefits of ephemeral over EBS

Damn you hbase, damn you to hell

All of the other core services I’ve dealt with in Hadoop play by the system rules: if I populate fake DNS values in /etc/hosts, by golly the services are going to believe it. Well, all except for HBase, which didn’t seem to play fair with /etc/resolv.conf or /etc/hosts and did fairly low-level reverse DNS lookups against the network DNS, which in this case was provided by Amazon. I do so love those super descriptive ip-101-202-303-404.internal addresses.

Still, once you abandon the long-term untenable idea of using /etc/hosts and just get into the habit of memorizing IP/internal DNS addresses, it’s not so bad. Otherwise, a stable arrangement was Debian Squeeze with Cloudera CDH3 Update 2; the stability improvements were painfully obvious as HBase stopped murdering its own HDFS entries and became performant.

Last bit: for small clusters it makes sense to use EBS-backed volumes for the datanodes, but generally I felt that the ephemeral volumes were slightly faster in seek times and throughput. This became especially important under very high-load HDFS scenarios, where an EBS array on a datanode is collectively capped at 1GB/s but ephemeral storage can go higher.

Still on the pro-ephemeral side: the reality is that you’ve lost the game if a single datanode has more than 250GB of JBOD volumes, and it’s going to get expensive quickly if you have multiple terabytes of EBS-backed data ( $0.10 USD per gigabyte and $0.10 USD per million I/O ops ). With 2 or 3 levels of HDFS replication, something downright catastrophic would need to occur to take all of your datanodes completely down. Plus, with S3 being right next door to EC2, it’s hard to find an excuse not to make vital backups.
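To put a rough number on that, here’s a back-of-the-envelope calculation in Python using the storage rate quoted above ( the per-million-request I/O charge is ignored since it depends entirely on workload, and the cluster size is a made-up example ):

# Rough monthly EBS storage cost at the $0.10 per GB-month rate quoted above.
terabytes = 4        # hypothetical cluster payload
replication = 3      # HDFS replication factor
gb_stored = terabytes * 1024 * replication
monthly_storage = gb_stored * 0.10
print("%d GB on EBS ~= $%.2f/month before I/O charges" % (gb_stored, monthly_storage))
# -> 12288 GB on EBS ~= $1228.80/month before I/O charges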

SSH SOCKS proxy and Amazon EC2 to the rescue

I’m currently somewhere in the process of building a Hadoop cluster in EC2 for one of my clients, and one of the most important parts of keeping my sanity is the ability to access all of the nodes’ web interfaces ( jobtracker, namenode, tasktrackers, datanodes, etc. ). If you aren’t abs(crazy), all of these machines are jailed inside a locked-down security group, a micro walled garden.

ssh -D 8080 someMachine.amazon-publicDNS.com

That will set up a SOCKS proxy between your machine and some instance that should be in the same security group as the Hadoop cluster… now, unless you are a sadist and like to write dozens of hosts-file entries, the SOCKS proxy on its own is useless.

But wait! Proxy auto-configuration to the rescue! All you really need to get started is here at Wikipedia ( http://en.wikipedia.org/wiki/Proxy_auto-config ), but to be fair, a dirt-simple proxy config might look like:

hadoop_cluster.pac

function FindProxyForURL(url, host) {
    if (shExpMatch(host, "*.secret.squirrel.com")) {
        return "SOCKS5 127.0.0.1:8080";
    }
    if (shExpMatch(host, "*.internal")) {
        return "SOCKS5 127.0.0.1:8080";
    }
    return "DIRECT";
}

Save this to your hard drive, then work out the correct “file:///path/2/hadoop_cluster.pac” URL. From there, go into your browser’s proxy configuration dialog and paste that URL into the proxy auto-configuration box. After that, going to http://ip-1-2-3-4.amazon.internal in a web browser will automatically go through the SSH proxy into Amazon EC2 cloud space, resolve against Amazon’s DNS servers, and voilà, you’re connected.

NOTE: Windows users

It shouldn’t be a surprise that Microsoft has partially fucked up the beauty that is the PAC. Fortunately, they provide directions for resolving the issue here ( http://support.microsoft.com/kb/271361 ).

tl;dr – Microsoft’s network stack caches the results of the PAC script instead of checking it for every request. If your proxy goes down or you edit the PAC file, those changes can take some time to actually come into play. Fortunately Firefox has a nifty “reload” button on its dialog box, but Microsoft Internet Explorer and the default Chrome for Windows trust Microsoft’s network stack.

Quick comments on scaling an application up

Unfortunately I cannot find the original Usenet post, so here’s a paraphrased summary:

Two programmers are discussing what to do with a slow program. The junior of the two laments, “If only there was a way to make the computer run faster,” to which the senior replies, “You cannot make the computer run faster, but you can make it do less.” The gist of this I can explain from my own experience.

Caching

With some exceptions, it generally doesn’t matter what language you choose to implement a program or application in… as long as it is fast enough. Instead, you need to look at what your application is spending most of its time doing, and I don’t mean just a cursory look; really dig in there. In almost every case, the primary obstacle to scaling out is going to be whatever you are using for a data backend.

If you’re fetching a user credential or profile record from the database on every request, you’ve suddenly locked the speed of your entire application to the maximum number of connections ( not queries ) your database can handle. For MySQL that’s about 150-180/second ( or 220-250/second if you have a full-time DBA ). If you get more than 250 user requests into your web stack, your application is locked up solid. So it should be obvious that the solution is to cache anything and everything needed from the databases that won’t be changing too often.

My preferred solution for the above is memcache with as much RAM as you can throw at it, at minimum 2GB, though I’ve worked on 128GB categorized arrays before. Memcache can be summarized as an unreliable key/value data store: you might put a key/value pair in, and it might be there for the next minute or so.
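The basic cache-aside pattern is short enough to sketch. Here’s a rough example using the python-memcached client; the key scheme and the load_profile_from_database helper are made up for illustration, and the five-minute expiry is just an arbitrary “changes infrequently” window:

import memcache

mc = memcache.Client(["127.0.0.1:11211"])

def get_user_profile(user_id):
    # Try the cache first; only fall back to the database on a miss.
    key = "user_profile:%d" % user_id
    profile = mc.get(key)
    if profile is None:
        profile = load_profile_from_database(user_id)  # hypothetical DB call
        mc.set(key, profile, time=300)  # cache for 5 minutes; slightly stale data is fine here
    return profile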

By implementing caching in your application, you’re making it do less. So instead of a 1-to-1 relationship between user requests and database connections, it might go up to 10 to 1.

Division of concerns

This usually catches almost all junior and mid-level developers off guard. If your application serves static content from a Python or Ruby script, you’re burning up capacity. A better plan is to split your application into two subprojects, Application and Application Content: from the outside looking in, http://derpCorp.com/application/url and http://static.derpCorp/staticContent/. Generally nginx or lighttpd can trounce almost anything else for serving static content. Again, not applicable to everyone, but the cost of infrastructure will lean heavily towards new application servers and not your content servers… so by dividing the two now, while you can, you set yourself up for investing wisely vs. just throwing money at the problem.
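In a Django project, for example, the split can start as small as pointing the static prefix at a separate host ( the derpCorp hostnames are the made-up ones from above; nginx or lighttpd would serve everything under that URL ):

# settings.py -- application pages come from the app servers,
# everything under STATIC_URL comes from nginx/lighttpd on its own host.
STATIC_ROOT = "/var/www/staticContent/"
STATIC_URL = "http://static.derpCorp.com/staticContent/"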

Divide and conquer

The minute one piece of an application becomes a critical component, the door to unending misery begins to open. That one critical piece is going to reliably fail at every investor presentation, at 4am on a Saturday, and about ten minutes after evening rush hour traffic hits. Usually the critical component is the database, and almost always the first solution is to throw more memory and disks at it, hoping the beast will be sated forever and ever. This should be a sign that something needs to change, but sometimes it isn’t heard. Instead of scaling up, the proven winning solution is to scale out. If you have two or more schemas on the same server, it might be time to separate them. Does User A’s data really need to cohabitate with User B’s?

Don’t ignore your problems

Usually there is a small clan of people clustered around an application; it provides money and stability for them. Sometimes this clan sacrifices their youth, sanity, and credit ratings for the application like it’s some sort of messed-up deity. Unfortunately, your application is stupider than the bacteria growing in your kitchen sink, and though throwing money and time at a half-assed solution may seem to correlate with resolution, correlation does not equal causation… especially with software. If half of the application randomly goes belly up every week at the same time, don’t ignore that problem or, worse, try to bury it; pick someone on your team and send them off on a mission to find the problem and fix it. Otherwise, what was once a problem may end up being your clan’s apocalypse.