Tag Archives: sqlalchemy

Data migration with SQLAlchemy and Alembic

I needed to optimize an unruly table filled with floats but I also didn’t want to lose my data. Unfortunately the documentation on the alembic website doesn’t mention anything or give any hints on how to do a data migration versus just a schema migration.

Fortunately I was able to run a symbolic debugger against alembic and figured out that all of the op.<method>`calls are atomic. If you have an add_column call, it adds the column when it executes that method. So that opened the door to data migrations.

One note before I pasted the code. You don’t need to specify all of the columns of the source table when used in a data migration scope. This makes your code a lot cleaner as the working model code is specific to what data you plan on using.

Alright, no more babbling, here is the example code.

A while back I downloaded my google location and history data and ran into these strange lat7 and long7 columns (paraphrasing as I don’t remember their exact names). The data were these large integer numbers that I couldn’t figure out how to decode. Suddenly it became obvious when I noticed all of the latitude fields started with 35 and the longitude started with -104. 35, -104 is approximately a few hundred miles from where I live. By doing lat7 / 10000000 (10e7 or 10**7) I was able to get floating point GPS coordinates.

Since then, when it comes time to optimize database schemas I’ve always started with figuring out if I can shift the percentage out and use integers instead. If using sqlite3, a Float is actually a varchar and that’s huge in comparison to using a byte or two of signed integers. Throw a million records on and it can get up to 30-40% of wasted diskspace.

Anyway where was I. Since I wanted to get rid of all of the floats and replace the real fields with @hybrid_propertyand @hybrid_property.expression I renamed latitude to _latitude, shifted out the percent, and used the aforementioned decorators to transform the integers back to floats on demand.

Unit-testing sqlalchemy with pytest

@pytest.fixture(scope="function")
def conn(request):
    ts = int(time.time())
    db_path_name = "db"
    db_name = f"{request.function.__name__}.sqlite3"
    filepath = pathlib.Path(__file__).parent / db_path_name / db_name

    LOG.debug(f"Test DB @ {filepath}")
    engine = create_engine(f"sqlite:///{filepath}")
    connection = engine.connect()

    sal2.Base.metadata.drop_all(bind=engine)
    sal2.Base.metadata.create_all(bind=engine)
    factory = scoped_session(sessionmaker(bind=engine))
    sal2.Base.query = factory.query_property()
    session = factory()

    yield ConnResult(connection, session, engine)

    connection.close()
    engine.dispose()

Inside of my “tests” directory I added a “db” directory. Given the logic above, it spawns an entire new database for each test function so that I can go back and verify my database. For someone elses code, you just need to swap out “sal2” with the module name holding your sqlalchemy base and associated model classes. The only thing I wonder about is the issue with create_all. I remember there is a way to bind the metadata object without create_all but damn if I can remember it right now.

Flask CRUD with sqlalchemy and jinja2 contextfilters

Quick disclaimer, the Flask CRUD thing is not public domain yet and is very volatile.

The project is here
https://github.com/devdave/wfmastery/tree/revamp_1/wfmastery
And the outline for the crud thing is in this commit https://github.com/devdave/wfmastery/commit/e249895ddc53c0696f59d3def5718e76855af5b9

https://github.com/devdave/wfmastery/blob/revamp_1/wfmastery/crud.py
https://github.com/devdave/wfmastery/blob/revamp_1/wfmastery/views.py
https://github.com/devdave/wfmastery/blob/revamp_1/wfmastery/templates/equipment_list.j2.html

First is how the crud is currently constructed

class Equipment(CrudAPI):

    def populate(self):

        self.record_cls = db.Equipment
        self.identity = "equipment"

        self.template_form = "equipment_form.j2.html"
        self.template_list = "equipment_list.j2.html"

        self._listColumn("id")
        self._listColumn("hidden")
        self._listColumn("name", magic_field="magic-string")
        self._listColumn("pretty_name", magic_field="magic-string")

        self._addRelationship("category", "name", magic_field="magic-filter")
        self._addRelationship("subcategory", "name", magic_field="magic-filter")

both vars “template_form” and “template_list” are going to be preset once I am certain that the templates can stand on their own with the context vars provided. The “magic-” params and their use are very much magic (eg really toxic) and would recommend ignoring them.

From there the CrudAPI takes over. Skipping ahead to how this relates to context filters. I had this tag mess here in the template

-{%-      for column_name in origin.list_columns -%}
 -{%-          if column_name in origin.magic_columns -%}
 -        {{ cell("", column_name|title, classes=origin.magic_columns[column_name]) -}}
 -{%-          else -%}
 -        {{ cell("", column_name|title) -}}
 -{%          endif %}
 -{%-      endfor %}

and was really not happy with it. So I dived into Flask and Jinja2’s documentation and code to figure out if I could apply Python code inline.

The answer is yes via jinja2’s contextfilters which are not exposed to Flask but can still be used.

@App.template_filter("render_header")
def render_header(context, column_name, value="", **kwargs):
    result = ""
    if column_name in context['origin'].magic_columns:
        result = context['cell'](value, column_name.capitalize(), classes=context['origin'].magic_columns[column_name])
    else:
        result = context['cell'](value, column_name.capitalize())


    return result

render_header.contextfilter=True

The trick to going from filter to contextfilter is just applying `my_func.contextfilter = True` outside of your functions scope. From there you have access to almost everything (if not everything). The var “origin” is the CrudAPI’s instance passed to the template.

This has opened a lot more opportunities to do clean up. Taking


{% macro data_attributes(data_map, prefix="data-") -%}

    {%- for name, value in data_map.items() -%}
    {{" "}}{{prefix}}{{name}}="{{value}}"
    {%- endfor -%}
{%- endmacro %}
  

{% macro cell(name, value, classes=None, data_attrs={}) %}
        
      {{- caller() if caller else value -}}
  {%- endmacro -%}

and condensing it down to

{% macro cell(name, value, classes=None, data_attrs={}) %}
        
    {{- caller() if caller else value -}}
{%- endmacro -%}

via a simple non-context filter

@App.template_filter("dict2attrs")
def dict_to_attributes(attributes, prefix=None):
    results = []
    name2dash = lambda *x: "-".join(x)
    format_str = "%s-{}=\"{}\"" % prefix if prefix else "{}=\"{}\""

    for key, value in attributes.items():
        results.append(format_str.format(key, value))

    #TODO disable autoescape
    return " ".join(results)

Just note that at the moment output is still managed by Jinja’s autoescape and I’d rather not shut that off so calls MUST be suffixed with “|safe” as used above.

As for the Crud API, I feel like that is coming along nicely.