Category Archives: subjective

Two startups for half the sleep!!!

This weekend I enrolled in Startup Weekend Denver #3 AND also started a personal project for the LinkedIn Veteran’s hackday.

For SWDenver, the project is a penetration testing service. Thanks to my experiences with EC2, I’m relatively confident I can make it marginally profitable by implementing an intelligent booking/scheduling system and using the power of Twisted. At this point I’ve had a couple of Venture Capitalist minions peek in on my team and they’ve walked away with a gleam in their eye. Who knows?

For LinkedIn Vet’s day, I’ve been building a national directory of Veteran Events and Services using Django and MongoDB. My goal is to run this as a charity/public service to my peers ( USAF Senior Airman Dev Dave 2001-2005 IYAAYAS ). That said, I’ve got the models done, the view scaffolding is in place, and I’m working on setting up a UserAuth submission pipeline.

As it stands, it looks like I will have the SaaS for SWDenver on its way out the door by tomorrow afternoon and the VetDay directory by Monday morning. Nice to flex my RockStar muscles and kick out a distributed SOA project and a national directory in the space of a weekend.

The mess that is the Veterans hack day project is here @ https://github.com/devdave/VetHackday minus 6-7 pushes.

For Hire

Despite sometimes major butchering of English grammar, I think this blog shows that I am at least competent and well versed in software development. That said, for US citizens I’m willing to pay out a $500 bounty to anyone who can point me in the direction of a Python contract or full-time remote position that I get placed into.

contact me at fromthewordress at ominian dot net or dot com.

Philosophy on getting it done

I think a few times now I’ve lamented my experiences with PHP. I’ve been accused by a few people of being two-faced in this regard: openly saying I don’t want to work with PHP any more as a tool set, while pointing clients towards PHP when they ask my advice on what platform to use. The truth is really simple: I want to get my client up and going or back on the road. Sure, Ruby on Rails or Django are superior products in many ways, but they are also more expensive in professional resources to maintain. A good example is one of my clients, the IT department for a major corporation with tens of thousands of customers and just as many computers to maintain. My client is well versed in system administration but not in software engineering. So I wrote up a Kohana v3 application that uses mod_php and jQuery. If I get hit by a bus tomorrow they will still be alright and happy.

I don’t think I’m explaining myself too well just yet, so a much better written tale from the industry is in order via the DailyWTF @ ( article ). Yes, PHP is not sexy, it’s almost downright ugly, but it does what the client needs and is mostly reliable in that regard. If tomorrow someone said they needed an asynchronous event handler and they wanted it in PHP5, I’d probably laugh in their face, then start crying when I realized they were serious. Still, if tomorrow someone said they needed a simple low-traffic inventory management system and they already had people on hand competent in LAMP, then I’d lean towards PHP.

Why I’m like this, always going for the simplest, most reliable solution, originates with some of my more desperate past clients.

The super website constructor… of doom

Directory websites are usually fairly simple MVC applications that parse browser GET requests and dump out a nice simple web structure on demand. This client’s system did all of that work upfront by ingesting a hierarchical data structure and generating a root page, regional pages, sub-pages, and detail pages in one go into a static 4.7GB structure. The plus was that the sales/marketing people could make rudimentary hand changes to pages and they would work… but the MAJOR downside was that structural changes were not possible.

The complexity required to build such a builder was unbelievable, and as such it was not very future safe. In fact it was generating 1-2 GigaBytes of error/warning messages a day to syslog. In reflection I think the project was built the way it was for two reasons: first, my predecessors’ lack of experience/knowledge and second, even more so, sheer boredom. They made this because it was challenging to implement. Once it was implemented and the stack went into a maintain-and-add-features state, they left.

I hate PHP so I shall do something else.

Another client horror story was a scenario where the company Rockstar got bored with the languages currently in use. Having read an extensive amount of this Rockstar’s code, I could clearly see they were a brilliant individual, but at the same time they were shooting their client in the foot with their perpetual need to be brilliant. I would often lament in private conversation with my peers that Rockstar, while brilliant, always leaned towards the path less taken, making code that was harder to maintain or sometimes to understand without a long analysis period.

The final straw for this Rockstar was when they started writing backend services in a language no one else in the company knew. I think the grand total was half a megabyte of code that was super critical to the company, very pretty, but also seriously fragile. As expected, the Rockstar grew bored and moved on. A few months later it was a somewhat horrifying moment when one of these almost forgotten gems broke under unexpected circumstances and brought an entire application array of 30 servers to a dead stop. The culprit was a UTF-8 character that broke a high volume data extraction script and caused the producer to block, waiting for the stdout pipe to clear up some space.

Fixing that took all of the King’s men, some of the horses, the company CTO, and me. I won’t go into specifics but it turned out to be a combination of weak code and a bug in the target language.
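The failure mode in that story, a reader that chokes on one bad byte while the writer keeps filling the pipe, can be sketched in Python. To be clear, this is not the original script (I’m not naming that language); `drain`, `run_producer` and the sample producer command are hypothetical names made up for illustration. The idea is simply that the reader decodes with `errors="replace"`, so a malformed UTF-8 sequence turns into U+FFFD instead of raising, and the pipe keeps getting emptied until EOF.

```python
import subprocess
import sys


def drain(stream):
    """Yield decoded lines from an iterable of byte lines, never raising on bad bytes."""
    for raw in stream:
        # errors="replace" turns an invalid UTF-8 sequence into U+FFFD
        # instead of raising UnicodeDecodeError and abandoning the pipe.
        yield raw.decode("utf-8", errors="replace").rstrip("\n")


def run_producer(cmd):
    """Run cmd and collect every line it prints; the pipe is read until EOF."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        lines = list(drain(proc.stdout))
    return proc.returncode, lines


if __name__ == "__main__":
    # Hypothetical producer: one good line, then one with a stray 0xFF byte.
    producer = [
        sys.executable,
        "-c",
        "import sys; sys.stdout.buffer.write(b'ok\\n\\xffbroken\\n')",
    ]
    print(run_producer(producer))
```

Whether replacing the bad byte is acceptable depends on the data; the point is only that the consumer has to keep reading (or fail loudly and kill the producer) rather than quietly stop.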

Summary

I don’t resent these people for getting bored; more so I resent that they got bored and didn’t realize it until they had abused an unspoken trust between professional and employer. Code monkeys write code to make their client money, not to entertain themselves. Whenever I get bored I will go on a serendipity hike in my personal time to see what I can pull off. Ultimately a lot of these pet projects end up going nowhere… but it’s more about the journey than the destination for me. Each one of my pets has ended up teaching me more about things I might not work with professionally than any book or blog post could. Just because you are a professional and work in your vocation doesn’t mean you are going to wake up one day and be a master in that profession. Just like martial arts and any other vocation, profession or hobby, becoming a master takes a continual investment of time and energy, striving for harder and more unusual problems to solve.

Amazon EC2, HBase, and IMHO benefits of ephemeral over EBS

Damn you hbase, damn you to hell

All of the other core services I’ve dealt with in Hadoop play by the system rules: if I populate fake DNS values in /etc/hosts, by golly the services are going to believe it. Well, all except for HBase, which didn’t seem to play fair with /etc/resolv.conf or /etc/hosts and did fairly low level reverse DNS lookups against the network DNS, which in this case was provided by Amazon. I so do love those super descriptive ip-101-202-303-404.internal addresses.
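A quick way to see whether a node’s forward and reverse DNS agree is a few lines of Python. This is a sketch, not anything HBase ships: `reverse_matches_forward` is a made-up helper, and it goes through the operating system’s resolver (which normally does honour /etc/hosts), so it shows what the OS believes and not necessarily what HBase’s own lookups saw.

```python
import socket


def reverse_matches_forward(ip, reverse=socket.gethostbyaddr,
                            forward=socket.gethostbyname):
    """Return (hostname, ok); ok is True when reverse DNS for ip maps back to ip."""
    try:
        hostname = reverse(ip)[0]      # reverse lookup: IP -> name
        resolved = forward(hostname)   # forward lookup: name -> IP
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses.
        return None, False
    return hostname, resolved == ip
```

The two lookup functions are parameters only so the check can be exercised without a network; on a real node you would call `reverse_matches_forward("10.0.0.5")` with that node’s private address.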

Still, once you abandon the long term untenable idea of using /etc/hosts and just get into the habit of memorizing IP/internal DNS addresses, it’s not so bad. Otherwise a stable arrangement was Debian Squeeze with Cloudera CDH3 Update 2; the stability improvements were painfully obvious as HBase stopped murdering its own HDFS entries and became performant.

Last bit: for small clusters it makes sense to use EBS backed volumes for the datanodes, but generally I felt that the ephemeral volumes were slightly faster in seek times and throughput. This became especially important under very high load HDFS scenarios, where an EBS array on a datanode is capped collectively to 1GB/s but ephemeral can go higher.

Still focusing on pro-ephemeral nodes, the reality is that you’ve lost the game if a single datanode has more than 250GB of JBOD volumes, and it’s going to quickly become expensive if you have multiple terabytes of EBS backed data ( .10 USD a GigaByte and .10 USD per million I/O ops ). Instead, the reality is that with 2 or 3 levels of HDFS replication, something downright catastrophic would need to occur to take all of your datanodes completely down. Plus with S3 being right next door to EC2, it’s hard to find an excuse not to make vital backups.
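To put rough numbers on "quickly become expensive", here is a back-of-the-envelope calculator using the two prices quoted above. Treating the storage price as per GigaByte per month is my assumption (the figure above doesn’t state a period), and `ebs_cost` and `raw_gb_needed` are hypothetical helper names.

```python
from decimal import Decimal

# Prices as quoted in the post. Reading the storage price as "per GB per
# month" is an assumption; the post only says ".10 USD a GigaByte".
EBS_USD_PER_GB = Decimal("0.10")
EBS_USD_PER_MILLION_IO = Decimal("0.10")


def raw_gb_needed(dataset_gb, replication=3):
    """HDFS stores `replication` copies of every block."""
    return dataset_gb * replication


def ebs_cost(total_gb, million_io_ops):
    """Estimated EBS bill in USD for the given storage and I/O volume."""
    storage = EBS_USD_PER_GB * total_gb
    io = EBS_USD_PER_MILLION_IO * million_io_ops
    return storage + io
```

For example, 2 terabytes of data at 3x replication is 6000GB of raw volume, so `ebs_cost(raw_gb_needed(2000), 500)` comes to 650 USD under these assumptions, before paying for the instances themselves.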

Jumping off the couch and heading to MongoDB

It sucks to admit when you’ve made a bad decision, but step one is admitting you have a problem. My problem specifically was the gap between what I wanted to accomplish and what CouchDB had to offer. To be clear, the problem was with me and not CouchDB; it’s a great tool and resource for someone out there, but not for me.

Context

One of my unpublished pets is a delicious clone ( live as of last weekend ) that was designed from month two to be a single user affair ( delicious data, custom spiders, reports, and a reddit cross-analysis ranking thing ). Delicious is/was a bookmarking website where you could apply arbitrary tags to bookmarks, then retrieve all bookmarks related to a tag.

The data could be modeled as a URL has many tags and a tag has many urls. In SQL you could do something like

table BookMark:
url: text(255)
name: text(255)
date_created: text(255)

table Tags:
name: text(255)
date_created: text(255)

table BookMarks2Tags:
tag_id: int
bookmark_id: int
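For comparison with what follows, here is a minimal runnable version of that schema using Python’s built-in sqlite3 module. It is a sketch: the column types, the `id` primary keys and the helper names `add_bookmark` and `urls_for_tag` are my assumptions, not part of the original project. It shows the "all bookmarks for a tag" lookup as a plain two-join query.

```python
import sqlite3

SCHEMA = """
CREATE TABLE bookmark (
    id INTEGER PRIMARY KEY,
    url TEXT NOT NULL UNIQUE,
    name TEXT,
    date_created TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE tag (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    date_created TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE bookmark2tag (
    tag_id INTEGER NOT NULL REFERENCES tag(id),
    bookmark_id INTEGER NOT NULL REFERENCES bookmark(id),
    PRIMARY KEY (tag_id, bookmark_id)
);
"""


def add_bookmark(db, url, name, tags):
    """Insert a bookmark and link it to each tag, creating tags as needed."""
    bookmark_id = db.execute(
        "INSERT INTO bookmark (url, name) VALUES (?, ?)", (url, name)
    ).lastrowid
    for tag in tags:
        db.execute("INSERT OR IGNORE INTO tag (name) VALUES (?)", (tag,))
        tag_id = db.execute(
            "SELECT id FROM tag WHERE name = ?", (tag,)
        ).fetchone()[0]
        db.execute(
            "INSERT INTO bookmark2tag (tag_id, bookmark_id) VALUES (?, ?)",
            (tag_id, bookmark_id),
        )


def urls_for_tag(db, tag):
    """All bookmark URLs carrying the given tag, sorted by URL."""
    rows = db.execute(
        """
        SELECT b.url FROM bookmark b
        JOIN bookmark2tag bt ON bt.bookmark_id = b.id
        JOIN tag t ON t.id = bt.tag_id
        WHERE t.name = ?
        ORDER BY b.url
        """,
        (tag,),
    )
    return [url for (url,) in rows]


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
add_bookmark(db, "https://www.python.org", "Python", ["python", "docs"])
add_bookmark(db, "https://www.djangoproject.com", "Django", ["python", "web"])
```

With the two sample rows loaded, `urls_for_tag(db, "python")` returns both URLs and `urls_for_tag(db, "web")` returns only the Django one.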

Odd data, odd results

I can’t find the map logic I used, but the gist of it was that the results I was getting back were less than ideal. It was easy to aggregate tag counts, but to grab all bookmarks that had a specific tag was somewhat contorted.
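Since the original map logic is lost, the following is only an emulation of the general map/reduce idea in plain Python, with no CouchDB or CouchDBKit calls involved. The document shape (a `url` plus a list of `tags`) and the names `map_tags`, `build_view` and `tag_counts` are all assumptions. It shows why tag counts fall out easily: the map step emits one row per tag, and counting rows per key is the whole reduce.

```python
from collections import defaultdict


def map_tags(doc):
    """Map step: emit one (key, value) row per tag on a bookmark document."""
    for tag in doc.get("tags", []):
        yield tag, doc["url"]


def build_view(docs, map_fn):
    """Run map_fn over every document and group emitted values by key."""
    view = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            view[key].append(value)
    return view


def tag_counts(view):
    """Reduce step: number of bookmarks per tag."""
    return {tag: len(urls) for tag, urls in view.items()}


docs = [
    {"url": "https://www.python.org", "tags": ["python", "docs"]},
    {"url": "https://www.djangoproject.com", "tags": ["python", "web"]},
]
view = build_view(docs, map_tags)
```

In this toy version, fetching every bookmark for a tag is just `view["python"]`; in a real view it means querying the emitted rows by key, which is the part I found contorted at the time.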

Lack of straight forward documentation

Rechecking CouchDB’s documentation website, I really hate information overload style docs. In the beginning I don’t care how something does what it does; just show me well documented examples of accomplishing the basics: create, replace, update, delete. Probably immediately after that I’ll need to know how to relate two entities of separate types and do CRUD on that. Rinse and repeat until I’ve made something so goddamn complicated that maybe it’s time to figure out how the whole mess works.

Error logging from hell

This could be the fault of the Ubuntu package manager for CouchDB or just my cluelessness, but I absolutely hate 5-10 page long exception traces that include a lot of stuff I don’t give a crap about… just tell me I’m an idiot and there’s a runtime syntax error on line two, or the round peg doesn’t go in the square hole.

Lack of python support

To be fair, CouchDBKit rocks and did take some of the sting out of learning a new technology, but in late 2010 and early 2011 I found the Python CouchDB view interpreter left a lot to be desired ( partially due to CouchDB’s excessive error vomit traces ). Never mind that typing whitespace sensitive code into a textarea field for adhoc query testing was entertaining.

Alright, I think I’m done ranting. Does this mean I’m going to completely swear off CouchDB? No. I keep proclaiming I’m never going to do any more PHP contracts, and then the next thing you know I’m staring at an IDE full of PHP 5.0 ( for non PHP people, PHP 5.0 was as good as MySQL 5.0… for non MySQL people, MySQL 5.0 was scary ).

Stupid until proven otherwise

It feels like a lifetime ago that I was in the architect’s seat for a proposed multi-million dollar project. From my perspective I made several really damning design choices that I can only argue originated from my lack of experience. Regardless of these mistakes, the team lead and the other code monkeys got the project off the floor and handling several thousand concurrent requests/second without too much pain and suffering. So in the grand scheme I suppose the whole thing could be checked in the plus column as a success.

Still, years later, I think my biggest mistake was abusing the crap out of data hiding in the form of protected/private parameters, guarded methods, and such… all because I didn’t trust the junior and mid-level devs to get it right. Because of this somewhat draconian design, the natural result was that the code monkeys worked around my restrictions at the cost of precious CPU cycles, extra memory, and some truly horrendous hacks. Sure, as the architect I should have caught all of these things, but by that time I had become overbooked as lead on more critical projects while also being the acting DBA and sys. admin for everything ( except the obligatory Exchange server, which I successfully pretended didn’t exist ).
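A toy illustration of that dynamic, translated into Python purely for the sake of a short example: the class names and the report scenario are invented, not taken from the project. `LockedDownReport` hides everything, so the only way to change its behaviour is to poke at the mangled attribute; `OpenReport` does the same job but exposes one documented hook.

```python
class LockedDownReport:
    """Everything name-mangled: subclasses cannot adjust the rows politely."""

    def __init__(self, rows):
        self.__rows = rows  # actually stored as _LockedDownReport__rows

    def total(self):
        return sum(self.__rows)


class OpenReport:
    """Same behaviour, but with a documented extension point."""

    def __init__(self, rows):
        self._rows = rows  # single underscore: internal by convention only

    def filter_rows(self, rows):
        """Hook for subclasses; the default keeps every row."""
        return rows

    def total(self):
        return sum(self.filter_rows(self._rows))


class PositiveOnly(OpenReport):
    """Uses the hook instead of reaching into internals."""

    def filter_rows(self, rows):
        return [r for r in rows if r > 0]


# The kind of workaround the locked-down version invites:
hacked = LockedDownReport([5, -2, 7])
hacked._LockedDownReport__rows = [
    r for r in hacked._LockedDownReport__rows if r > 0
]
```

Both `PositiveOnly([5, -2, 7]).total()` and `hacked.total()` give 12, but only one of them survives the next refactor of the base class.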

Thinking over how I would have handled things better, I think the over-arching gist of Alex Martelli’s presentation on API Anti-patterns from PyCon 2011 covers it – http://vodpod.com/watch/5757194-pycon-2011-api-design-anti-patterns

Escape from PHP

I am going to be honest with you. I… hate… this place, this zoo, this prison, this reality—whatever you want to call it, I can’t stand it any longer. It’s the smell, if there is such a thing. I feel saturated by it. I can taste your stink, And every time I do, I fear that I’ve somehow been infected by it; it’s repulsive, isn’t it? I must get out of here. I must get free.

– Agent Smith, The Matrix

To be quite honest, even though I am a master PHP programmer, I pretty much despise the language and to an extent the culture around PHP.

“Hyuck, why use an open source framework when we can write our own” – 3 different clients last year said that

“We can’t do unit-testing, it will cost too much to implement.” – Pretty much every client, which is my fault because I didn’t properly sell them on TDD or just unit-testing.

Or hell, this isn’t fair to PHP’s culture, but I’ve lost count of how often MySQL is used like a gigantic dump truck. “Normalization? Fuck that, MySQL can handle it… ‘Can’t ya ol’girl?'” At which point the master MySQL server either catches on fire or gridlocks itself into oblivion.

PrototypeJS, a fond memory

Four or five years ago there was a crossroad that, in reflection, I screwed up on. I was so sure that PrototypeJS was the right library to invest in; it made working with multi-browser JavaScript a breeze & had a plethora of fantastically useful helper functions. Now, present day, it’s been almost three years of non-stop jQuery and I’m kind of fine with that. Though a mere shadow of the power that is ExtJS, jQuery UI is a semi-decent library for widgets and aesthetics, jQuery itself does what I need, and I don’t get looks of WTF when I mention I’m using jQuery over Prototype at the local developer meetups.