Author Archives: David

About David

A mostly professional code monkey, to contact me translate the following "direct from blog to email" at the website ominian.net. Replace spaces with underscores or dashes, whatever is valid. A script will pick up your email, scrub it against a white list, and if your not a spammer I will get an email.

Filtering and ordering by date with SQLAlchemy

You want the extract command which is documented here -> https://docs.sqlalchemy.org/en/14/core/sqlelement.html#sqlalchemy.sql.expression.extract

The list of options are generally the same regardless of dialect/SQL server so a reference of types can be seen here for SQLite3 https://github.com/sqlalchemy/sqlalchemy/blob/main/lib/sqlalchemy/dialects/sqlite/base.py#L1229

A common base extract type arguments to SQL arguments is here ->https://github.com/sqlalchemy/sqlalchemy/blob/main/lib/sqlalchemy/sql/compiler.py#L299

Extract() is transformed for SQLite3 into `strftime` -> https://github.com/sqlalchemy/sqlalchemy/blob/main/lib/sqlalchemy/dialects/sqlite/base.py#L1272

The base visitor/transformer for extract is here https://github.com/sqlalchemy/sqlalchemy/blob/main/lib/sqlalchemy/sql/compiler.py#L2124

Finally, a basic example might be something like MyTable.query.filter(extract('year', MyTable.date_field) == 2022) which would produce something like SELECT ...hell of a lot of columns... FROM MyTable WHERE STRFTIME("%y", MyTable.date_field); 

Process finished with exit code -1073740791 (0xC0000409) – PySide2 & PyQT5

I am working on a WinAmp clone called PySongMan(ager) and kept getting a stackoverflow bug. Drilling my code downward, simplifying it as I went, I got a script like:

Which kept throwing a 0x0…409 error which is a stack overflow error. Finally, somewhat by accident, I figured out my mistake. The good code looks like:

and runs without issue. So the problem is that before any QT widget/window can be created, QApplication must be called first.

Flask class routing example

I’ve written several web frameworks in my life and while I don’t have the desire to keep up with current technology trends, I still like to dabble. book.py is the best example of what this does versus page.py which shows more advanced use cases.

Portable/reusable flask app skeleton

I have a lot of Flask apps running in the background on my home server to do various tasks (home wiki, some CRM stuff, etc) and I end up making the same structure over and over so I figured I would simplify the process and make a repo with just the skeleton of an app.

A few benefits:

  1. as long as you use relative imports . and .. (eg from .. import app) your web application is name agnostic.
  2. The flask application instance of Flask() can be accessed from anywhere in the web application without a risk of circular import problems.
  3. It’s entirely possible to copy and paste web application modules (eg models) into another web application and it will mostly just work (baring configuration needs).

https://github.com/devdave/skeleton_flask

import logging
import sys
from flask import Flask

app:Flask = Flask(__name__)
log:logging.Logger = None

def create_app(config=None)->Flask:
    global app, log
    from . import conf

    log = logging.getLogger(__name__)
    fmt = logging.Formatter(app.config['APP_LOGGING_FMT'])
    hndl = app.config['APP_LOGGING_HANDLER']  # type: logging.Handler
    hndl.setFormatter(fmt)
    hndl.setLevel(app.config["APP_LOGGING_LEVEL"])
    log.propagate = False
    log.handlers.clear() # This removes flask's default handler
    log.addHandler(hndl)
    log.debug(f"{__name__} loading components")



    from . import lib
    from . import models
    from . import views
    from . import settings

    return app

https://github.com/devdave/skeleton_flask/blob/master/init.py

This is the __init__.py file in the base of the web app. To use it with flask you would do something like this on the commandline

#>set FLASK_RUN_PORT=1234
#>set FLASK_APP = "webapp:create_app()"
#>set FLASK_ENV = development
#>python -m flask run
OR 
#>flask run

The FLASK_APP environment variable is documented here https://flask.palletsprojects.com/en/1.1.x/cli/#application-discovery and it’s pretty straight forward module:function_name() where function_name is defined in module/__init__.py

The reason for having the imports for lib models views settings in create_app is to prevent a circular import and allow sub modules like views to do from .. import app to access the Flask application instance.

Data migration with SQLAlchemy and Alembic

I needed to optimize an unruly table filled with floats but I also didn’t want to lose my data. Unfortunately the documentation on the alembic website doesn’t mention anything or give any hints on how to do a data migration versus just a schema migration.

Fortunately I was able to run a symbolic debugger against alembic and figured out that all of the op.<method>`calls are atomic. If you have an add_column call, it adds the column when it executes that method. So that opened the door to data migrations.

One note before I pasted the code. You don’t need to specify all of the columns of the source table when used in a data migration scope. This makes your code a lot cleaner as the working model code is specific to what data you plan on using.

Alright, no more babbling, here is the example code.

A while back I downloaded my google location and history data and ran into these strange lat7 and long7 columns (paraphrasing as I don’t remember their exact names). The data were these large integer numbers that I couldn’t figure out how to decode. Suddenly it became obvious when I noticed all of the latitude fields started with 35 and the longitude started with -104. 35, -104 is approximately a few hundred miles from where I live. By doing lat7 / 10000000 (10e7 or 10**7) I was able to get floating point GPS coordinates.

Since then, when it comes time to optimize database schemas I’ve always started with figuring out if I can shift the percentage out and use integers instead. If using sqlite3, a Float is actually a varchar and that’s huge in comparison to using a byte or two of signed integers. Throw a million records on and it can get up to 30-40% of wasted diskspace.

Anyway where was I. Since I wanted to get rid of all of the floats and replace the real fields with @hybrid_propertyand @hybrid_property.expression I renamed latitude to _latitude, shifted out the percent, and used the aforementioned decorators to transform the integers back to floats on demand.

Non-blocking python subprocess

I am working on a pet project to compress a terabyte of video into a slimmer format. While I have been able to automate working with ffmpeg, I didn’t like the fact that I couldn’t follow along with the subprocess running ffmpeg.

I tried a few different ideas of how to watch ffmpeg but also avoid the script from blocking because I wanted to be able to time and monitor it’s progress

import subprocess

process = subprocess.pOpen

stdout, stderr = process communicate blocks until the process is finished

subprocess.stdout.readline() and subprocess.stderr.readline() will both block until there is sufficient data. In ffmpeg’s case there is never stdout output so it will block indefinitely.

https://gist.github.com/devdave/9b8553d63e24ef19eea7e56f7cb95c78

By using threading Queue and constantly polling the process, I can watch the output as fast as it can come in but not worry about the main process blocking, just the threads.

A further improvement on the idea would be to have two threads (for stdout and stderr respectively) with the queue items put with sentinels like queue.put((STDERR, line_from_stderr)) and a sentinel for STDOUT.

To use - 

r = Runner(["some_long_running_process", "-arg1", "arg1 value"])

for stdout, stderr in r.start():
    print("STDOUT", stdout)
    print("STDERR", stderr)

Slimmed down twisted compatible reloader script

Working on txweb again, I decided to give it a real flask/django style reloader script. Directly inspired by <a href=”
https://blog.elsdoerfer.name/2010/03/09/twisted-twistd-autoreload/ “>this</a> blog post I decided to cut down on the extraneous bits and also change what files it watched.

https://gist.github.com/devdave/05de2ed2fa2aa0a09ba931db36314e3e

DCDB post-mortem

I went into writing DCDB with little or no plan besides building it around dataclasses. The result is a bit rough and precarious.

That said I think I am going to progress onward with making a DCDB2 library that will change a few things. The first would be to completely separate the DCDB tables themselves from the SQL processing logic in a way similar to sqlalchemy’s session system. I do have some other changes in mind, notably a better separation between the ORM domain classes and business logic as well as changes to how relationship’s work.

On the subject of relationship handling. That one would be a bit more complicated as the DCDB2 design idea I had was to use placeholders for the relationship (what does it connect too and in what way), then have the real instrumented handlers created and assigned to a constructed domain class. That last sentence is a bit painful to read which tells me I need to mull that one over a bit more. Regardless, the hack I put together in DCDB was just way too fragile.

Unit-testing sqlalchemy with pytest

@pytest.fixture(scope="function")
def conn(request):
    ts = int(time.time())
    db_path_name = "db"
    db_name = f"{request.function.__name__}.sqlite3"
    filepath = pathlib.Path(__file__).parent / db_path_name / db_name

    LOG.debug(f"Test DB @ {filepath}")
    engine = create_engine(f"sqlite:///{filepath}")
    connection = engine.connect()

    sal2.Base.metadata.drop_all(bind=engine)
    sal2.Base.metadata.create_all(bind=engine)
    factory = scoped_session(sessionmaker(bind=engine))
    sal2.Base.query = factory.query_property()
    session = factory()

    yield ConnResult(connection, session, engine)

    connection.close()
    engine.dispose()

Inside of my “tests” directory I added a “db” directory. Given the logic above, it spawns an entire new database for each test function so that I can go back and verify my database. For someone elses code, you just need to swap out “sal2” with the module name holding your sqlalchemy base and associated model classes. The only thing I wonder about is the issue with create_all. I remember there is a way to bind the metadata object without create_all but damn if I can remember it right now.

Python dataclass database (DCDB)

Why

While I do use sqlalchemy and to some extent peewee for my projects, I slowly got tired of having to relearn how to write SQL when I’ve known SQL since the mid-90’s.

DCDB’s design is also aiming for simplicity and minimal behind the scenes automagical behaviors.   Instead complexity should be added voluntarily and in such a way that it can be traced back.   

Example

import dataclasses as dcs
import dcdb 

@dcs.dataclass()
class Foo:
    name:str
    age:int

db = dcdb.DBConnection(":memory:") # alternatively this can be a file path
db.bind(Foo)
"""
   Bind doesn't change Foo in the local scope but instead
   it creates a new class DCDB_Foo which is stored to the DBConnection in it's 
   table registry.

   Behind the scenes, a table `Foo` is created to the connected database.  No changes to the name are made (eg pluralization). How you wrote your bound dataclasses is almost exactly how it is stored in the sqlite database.

   An exception is that a .id instance property along with DB methods like: update/save, Create, Get, and Select are added to the class definition.

"""
record = db.t.Foo(name="Bob", age="44")
assert record.name == "Bob"
same_record = db.t.Foo.Get("name=?", "Bob")
assert record.age == 44
assert record.id == same_record.id

record.age = 32
record.save()

same_record = db.t.Foo.Get("age=?", 32)
assert record.id == same_record.id
assert same_record.age == 32

same_record.delete()

"""
Note it is important to notice that currently same_record and 
record have the same .id # property but they are different 
instances and copies of the same record with no shared reference.   
Changes to one copy will not reflect with the other.

"""

Github DCDB