False positive blocking calls in twisted with Chrome

I have a simple txWeb service with endpoints:

index
say
hear

Index prints

"hello world %s" % time.time()

Say pushes

"hello %s" % time.time()

onto a 0MQ pub/sub socket

Hear waits for a message to appear on the 0MQ pub/sub socket and then publishes it.

Now say I’ve got 4 browser windows open: 1 called index, 2 are blocking on hear, and finally I call say.

I expected both hears to end their blocking state and print the same “hello 1234” message BUT instead the first hear returns while the second one stays blocked.

This took me a bit to debug BUT what happens is that the first /hear blocks on its call to the server while the second is QUEUED INSIDE CHROME and never calls the server. It’s only after the first one completes ( timeout or success ) that the second one calls the server.
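Under the hood, say/hear is just a publish/wait-for-next-message pattern. Stripped of the 0MQ and Twisted machinery, here is a stdlib-only sketch of the behavior I expected ( the Broadcaster class and all names here are mine, not part of the service ):

```python
import threading
import time

class Broadcaster:
    """Stand-in for the 0MQ pub/sub socket: say() publishes a message,
    hear() blocks until the next message arrives."""

    def __init__(self):
        self._cond = threading.Condition()
        self._msg = None
        self._seq = 0

    def say(self, msg):
        with self._cond:
            self._msg = msg
            self._seq += 1
            self._cond.notify_all()  # wake every blocked hear()

    def hear(self, timeout=None):
        with self._cond:
            seen = self._seq
            self._cond.wait_for(lambda: self._seq > seen, timeout)
            return self._msg

hub = Broadcaster()
results = []
listeners = [threading.Thread(target=lambda: results.append(hub.hear(timeout=5)))
             for _ in range(2)]
for t in listeners:
    t.start()
time.sleep(0.2)           # let both listeners block, like the two browser tabs
hub.say("hello 1234")
for t in listeners:
    t.join()
print(results)            # both listeners get the same message
```

With real browser clients the two listeners never block concurrently, because Chrome holds back the duplicate in-flight request; the workaround people usually report is making each /hear URL unique ( e.g. a timestamp query parameter ).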

Project notes

http://www.stevecoursen.com/209/stackless-python-meets-twisted-matrix/
https://github.com/smira/txZMQ
https://pypi.python.org/pypi/Twistless/1.0.0

goals: evaluate speed over memory for PyPy using stackless/coroutines with ZMQ and Twistless ( if it’s even possible without rebuilding txZMQ ).

Initial goal would be to take github.com/devdave/pyfella and let two stacks fight it out for an hour ( assuming a 10-deep Othello minimax ).

Interesting way to safely debug multiprocessing python systems

I have one particular “job” that has 3 subprocesses moving as fast as humanly possible to build a report. The main slowdown is an external data source, which isn’t outright terrible but isn’t great either. The worst possible outcome is when this thing hangs or misses available work, which it was prone to do a lot.

Various kill signals usually failed to give me an idea of where the workers were getting hung up, and I wasn’t really excited about putting tracer log messages everywhere. Fortunately I have a dbgp-enabled IDE, and I found this answer on SO: http://stackoverflow.com/a/133384/9908

Taking that I modified it to look like this:

import sys
import signal
import traceback

#classdef FeedUserlistWorker which is managed by a custom multiprocessing.Pool implementation.

    @classmethod
    def Create(cls, feed, year_month=None):

        # Install the panic handler in the child process so a
        # `kill -USR1 <pid>` drops us into the debugger below.
        signal.signal(signal.SIGUSR1, FeedUserlistWorker._PANIC)

        try:
            return cls(year_month=year_month, feed=feed).run()
        except Exception:
            # print_exc() reports the exception currently being handled;
            # it does not take the exception as an argument.
            traceback.print_exc()
            sys.stderr.flush()
            sys.stdout.flush()

The print_exc is there because there isn’t a reliable bridge to carry exceptions from child to parent. The flushes are there because stdout/stderr are buffered between the worker and the parent pool manager.

    @classmethod
    def _PANIC(cls, sig, frame):
        # Bundle the interrupted frame plus its globals/locals so they are
        # handy to poke at once the debugger attaches.
        d = {'_frame': frame}
        d.update(frame.f_globals)
        d.update(frame.f_locals)

        from dbgp.client import brk; brk("192.168.1.2", 9090)

The only thing that matters is the call to dbgp. Using that tool, I was able to step up the call stack, fire ad hoc commands to inspect variables in each stack frame, and find the exact blocking call, which turned out to be the validation/authentication part of boto’s S3 support. That was a weird one, as I had assumed the busy loop/block was in my own code ( e.g. while True: never break ). Fortunately it has an easy fix https://groups.google.com/forum/#!msg/boto-users/0osmP0cUl5Y/5NZBfokIyoUJ which resolved the problem, since my Pool manager doesn’t mark tasks complete and failures only cause the lost task to be resumed from the last point of success.
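The core of that SO trick, minus the dbgp hook, is just a signal handler that formats the interrupted frame’s stack. A minimal runnable version ( busy_worker and last_dump are illustrative names of my own ):

```python
import os
import signal
import traceback

last_dump = []

def dump_stack(sig, frame):
    # Capture the stack of whatever the process was doing when signalled.
    last_dump.append("".join(traceback.format_stack(frame)))

signal.signal(signal.SIGUSR1, dump_stack)

def busy_worker():
    # Stand-in for a hung loop; in real life you'd run `kill -USR1 <pid>`
    # from another shell instead of self-signalling.
    os.kill(os.getpid(), signal.SIGUSR1)
    for _ in range(1000):   # give the interpreter a bytecode boundary
        pass                # on which to run the handler

busy_worker()
print(last_dump[0])   # the formatted stack points straight at busy_worker
```

Swap the format_stack call for the dbgp brk() and you get a live debugger instead of a printout.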

Cassandra 1.2.x – Wide rows are freaky on AWS m1.xlarge

CQL3 is a very nice abstraction over Cassandra, but it’s important to pay attention to what it is doing.

In SQL land, 1 record == 1 row. In Cassandra 1 record == 1 row too, but 2+ records can ALSO live on the same storage row. This comes down to CQL’s partition and clustering keys. The partition key decides which storage row a record lands on, while the clustering key decides where within that row the record sits. If your primary key is a single column, 1 record == 1 row, but with a composite PRIMARY KEY ( partition key, clustering key ), every record sharing the same partition key goes onto the same row.
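As a hypothetical schema ( table and column names made up for illustration ):

```sql
-- One storage row per sensor_id; every reading with the same sensor_id
-- is a clustered cell inside that row.  A chatty sensor_id = a wide row.
CREATE TABLE readings (
    sensor_id text,
    read_at   timestamp,
    value     double,
    PRIMARY KEY (sensor_id, read_at)   -- (partition key, clustering key)
);
```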

I had a few rows that were ~30GB in size, which put stress on m1.xlarge nodes ( 8GB heap, 300MB new-generation size ) with epic compaction cycles of doom.

quick setup Datastax Cassandra for Ubuntu 12.04

curl -L http://debian.datastax.com/debian/repo_key | sudo apt-key add -
sudo sh -c 'echo "deb http://debian.datastax.com/community/ stable main" >> /etc/apt/sources.list.d/datastax.list'

sudo apt-get update
sudo apt-get install cassandra

Really wish there was a cassandra-core and cassandra-server divide, as more often than not I just need the C* tools and libraries and not so much the server itself.

Also 1.2.5 is TOXIC as all hell for high read/write environments.

AWS + python + new datastax driver for python

These directions will set up OpenVPN ( you need that unless you’re developing inside AWS ):
http://sysadminandnetworking.blogspot.com/2012/12/openvpn-on-ec2aws.html

New Python cassandra driver is here https://github.com/datastax/python-driver
It’s really new so it has sharp edges.

Howto: Make a HTML desktop app with PySide

Given a project structure like:

my_project
 main.py
 www/
    index.html
    scripts/ 
        jquery-1.9.1.min.js

main.py looks like:

import sys
from os.path import dirname, join

#from PySide.QtCore import QApplication
from PySide.QtCore import QObject, Slot, Signal
from PySide.QtGui import QApplication
from PySide.QtWebKit import QWebView, QWebSettings
from PySide.QtNetwork import QNetworkRequest



web     = None
myPage  = None
myFrame = None

class Hub(QObject):

    def __init__(self):
        super(Hub, self).__init__()


    @Slot(str)
    def connect(self, config):
        print config
        self.on_client_event.emit("Howdy!")

    @Slot(str)
    def disconnect(self, config):
        print config

    on_client_event = Signal(str)
    on_actor_event = Signal(str)
    on_connect = Signal(str)
    on_disconnect = Signal(str)


myHub = Hub()

class HTMLApplication(object):

    def show(self):
        #It is IMPERATIVE that all backslashes are converted to forward slashes, otherwise
        # QtWebKit seems to be easily confused.
        kickOffHTML = join(dirname(__file__), "www/index.html").replace('\\', '/')

        #This is basically a browser instance
        self.web = QWebView()

        #Unlikely to matter but prefer to be waiting for callback then try to catch
        # it in time.
        self.web.loadFinished.connect(self.onLoad)
        self.web.load(kickOffHTML)

        self.web.show()

    def onLoad(self):
        # Grab the module-level hub the first time through.
        if getattr(self, "myHub", None) is None:
            self.myHub = myHub

        #This is the body of a web browser tab
        self.myPage = self.web.page()
        self.myPage.settings().setAttribute(QWebSettings.DeveloperExtrasEnabled, True)

        #This is the actual context/frame a webpage is running in.
        # Other frames could include iframes or such.
        self.myFrame = self.myPage.mainFrame()

        # ATTENTION here's the magic that sets up a bridge between Python and HTML
        self.myFrame.addToJavaScriptWindowObject("my_hub", self.myHub)

        #Tell the HTML side we are open for business
        self.myFrame.evaluateJavaScript("ApplicationIsReady()")


#Kickoff the QT environment
app = QApplication(sys.argv)

myWebApp = HTMLApplication()
myWebApp.show()

sys.exit(app.exec_())

and index.html looks something like this ( the element ids and the JSON payload are just illustrative ):

<!DOCTYPE html>
<html>
    <head>
        <script src="scripts/jquery-1.9.1.min.js"></script>
        <script>
            // Called from Python via evaluateJavaScript once the bridge is in place.
            function ApplicationIsReady() {
                my_hub.on_client_event.connect(function(message) {
                    $("#log").append("<li>" + message + "</li>");
                });

                $("#connect").click(function() {
                    my_hub.connect(JSON.stringify({host: "localhost"}));
                });
            }
        </script>
    </head>
    <body>
        <button id="connect">Tell the hub to connect</button>
        <ul id="log"></ul>
    </body>
</html>
Basically, class Hub is a bridge/interface between Python and the page’s JavaScript engine. It’s probably best to pass things between the two as JSON, since that’s a format both sides are readily prepared to deal with.

Stuff I didn’t deal with yet: the main window title ( self.web.setWindowTitle ) and setting a window icon ( no clue on this one ).

Otherwise this gives me an OS-agnostic UI, I can leverage my HTML skill set without having to learn QML or Qt Designer’s UI interface, and I can hopefully recycle some logic. Additionally, I can go full-fledged with the bridge logic, split it between the two sides, or shove all the complexity into JS and expose only basic Python endpoints.

Second thoughts on the ZMQInline logic

Initially I was going to set the experiment with ZMQInline aside but I am starting to realize that it does have some merits.

Below is a burnout test I did using dotGraph ( current name candidate )

    @ZMQInline()
    def do_kickstart(self):
        self.log.debug("do_kickstart")
        for i in range(1, 100):
            try:
                try:
                    self.log.debug("Process Loop {0}".format(i))
                    response = yield Request(self.control, RequestKickstart(), timeout=5000)
                except TimedoutError:
                    self.log.exception("Timedout")
                except Exception:
                    self.log.exception("Unexpected exception")
                else:
                    self.log.debug("Got " + repr(response))

                    if RespondWithKickstart.Matches(response):
                        self.log.debug("Got Kickstart!")
            except Exception as e:
                self.log.error("Caught an exception leak {0}".format(e))

        self.loop.stop()

In DotGraph, Kickstart is the first task both clients and services need to accomplish, and it’s basically “Hello, I am here, who do I talk to?”. Depending on the Kickstart response, clients and services might be sent to a completely different host/port set. The idea is that if DotGraph needs to grow, it can have an array of slave hubs and then delegate new clients to the most underbooked one.

One thing I realized is that the above structure would be perfect for doing back-off logic ( attempt, time out… wait, attempt again N seconds later ). Of course it raises the question of WHY the other party hasn’t responded in time ( are they down or just backlogged? ). The other problem is how to unit-test it cleanly. Still, I haven’t thrown it away yet, so maybe I will find a use for it in DotGraph.
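The pattern @ZMQInline is driving ( yield a request, get the reply sent back in, or a TimedoutError thrown in ) can be sketched with a plain generator trampoline. All names below are stand-ins of mine, not the actual DotGraph/ZMQInline API:

```python
class TimedoutError(Exception):
    pass

def run_inline(gen, transport):
    """Drive a generator that yields requests: feed replies back in with
    send(), and deliver timeouts with throw(), inlineCallbacks-style."""
    try:
        request = next(gen)
        while True:
            try:
                reply = transport(request)
            except TimedoutError as e:
                request = gen.throw(e)   # resumes inside the except clause
            else:
                request = gen.send(reply)
    except StopIteration:
        pass

def kickstart(log):
    # Back-off shape: attempt, absorb the timeout, attempt again.
    for attempt in range(3):
        try:
            reply = yield "REQUEST_KICKSTART"
        except TimedoutError:
            log.append("timeout on attempt %d" % attempt)
        else:
            log.append("got %s" % reply)
            return

calls = {"n": 0}
def flaky_transport(request):
    calls["n"] += 1
    if calls["n"] < 2:           # first attempt times out
        raise TimedoutError()
    return "KICKSTART_OK"

log = []
run_inline(kickstart(log), flaky_transport)
print(log)   # ['timeout on attempt 0', 'got KICKSTART_OK']
```

Unit-testing also stops being painful once the transport is injectable like this: the fake transport above is the whole test harness.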

FYI on pyzmq and IOLoop

A funny little quirk has gotten me twice now ( hopefully it won’t be three times ).

In Twisted, once the reactor has been imported ( even inadvertently ) it becomes the de facto instance.

In PyZMQ, calling ioloop.IOLoop() creates a brand-new loop that knows nothing about the existing one… The correct approach is to always fetch it via

    ioloop.IOLoop.instance()

I don’t want to talk about how much time I wasted working out how to make a pyzmq like interface similar to Twisted’s inlineCallback, but it was enough!
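The trap is easy to reproduce with a toy class: the constructor always hands back a fresh loop, while the classmethod hands back the shared one ( toy code of mine, not the actual pyzmq implementation ):

```python
class ToyIOLoop:
    _instance = None

    @classmethod
    def instance(cls):
        # Lazily create, then always reuse, one shared loop.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

shared_a = ToyIOLoop.instance()
shared_b = ToyIOLoop.instance()
fresh = ToyIOLoop()            # a brand-new, unrelated loop

print(shared_a is shared_b)    # True  -- same shared loop
print(fresh is shared_a)       # False -- handlers registered on `fresh`
                               # never run on the shared loop
```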

On a lighter note: how to make my co-worker do XXX

From stack exchange – http://programmers.stackexchange.com/questions/185923/how-can-i-deal-with-a-team-member-who-dislikes-making-comments-in-code/185926#185926

I think anyone who has written software alongside anyone else has had at least one grievance with their peers’ work. Sometimes it can be benign, like

if(foo == bar): do_stuff

versus

if(foo == bar) {
   do_stuff
}

And other times it can be a tad more annoying, like

class Foo { function Foo(arg1,arg2){ this.arg1=arg1, this.arg2=arg2} }

To get a co-worker to change that style, from what I’ve seen, it’s not going to happen by politely asking them to use more XXX and formatting. I worked with someone like this and they immediately got defensive and said “I just don’t like wasting space,” which caught me off guard as it didn’t really make any sense. In that case I was able to pull rank and kick back their commits, which probably isn’t the best solution. I’ve seen other solutions, like a CI pipeline whose first stage is an automatic style checker… which is basically what I did but automated, and probably a worse solution, as sometimes you need to write something really dirty. Maybe a better solution is code reviews, but it is fairly rare to see a company institute those. 9 times out of 11[SIC], the guy writing the checks for the software engineers isn’t an engineer themselves, and in the short term they’re going to see that one of their senior engineers isn’t directly doing work that correlates to profits. In that last case, when it comes up in a review, the lead can justify the expense by pointing out that they’re minimizing the time other staff waste just trying to comprehend something.

Finally, at the end of the day, while it’s important that a person finds reward in their job… it’s still a job and not your private playpen. It’s more like software survivor island: the majority has to work together to keep the company/product going, and if one goofball insists on writing software in a conflicting manner, it doesn’t really matter whether they are “right”; they need to conform or GTFO.

For scenarios where you ARE that goofball, you need to get political, and I don’t mean selling the highest non-technical person on your island, but demonstrating to your team the merits of an idea and working with their feedback. For a real-world example, I joined a consultancy as the goofball and started with the company architect and development manager. I didn’t sell them on the idea that I was right; I sold them on the idea until it became their “right” idea. From there I prodded momentum along until one day the idea stopped being a minority opinion and a coup was staged. A week later the company made a fairly dramatic shift in how it did development. That’s the happy-ending scenario, but it doesn’t always happen. Twice I failed to sell the virtues of unit-testing ( in one case they assigned me the task of writing unit tests, which was just a means of getting me to shut up about them… and completely negated the idea ), and twice I walked away because I got tired of last-minute “oh my god everything is fucked up” hack sessions to fix prod. That’s life: sometimes you lose and sometimes you win.