Tuesday, February 14, 2017

SQLAlchemy - DISTINCT, LIMIT, or OFFSET Causing Subqueries

This section at the bottom of The Zen of Eager loading is really important:
When using joined eager loading, if the query contains a modifier that impacts the rows returned externally to the joins, such as when using DISTINCT, LIMIT, OFFSET or equivalent, the completed statement is first wrapped inside a subquery, and the joins used specifically for joined eager loading are applied to the subquery. SQLAlchemy’s joined eager loading goes the extra mile, and then ten miles further, to absolutely ensure that it does not affect the end result of the query, only the way collections and related objects are loaded, no matter what the format of the query is.
I made an example to illustrate this here: https://gist.github.com/pawl/cb57e0ddbdd0b2e64b75e94116873367

On MySQL this can be responsible for some really poor query performance, because it can cause it to use temporary tables and filesort.

The best way I've found to prevent the subqueries is by first querying for the ids only, then running another query that includes all relations. For example:
ids = session.query(Product.id).limit(20)
Product.query.filter(Product.id.in_(ids))

Sunday, February 12, 2017

SQLAlchemy - "Base.query = db_session.query_property()"

I just saw this line in the SQLAlchemy flask patterns example:
Base.query = db_session.query_property()
I went searching for what it does and found this: https://eoyilmaz.blogspot.com/2014/01/i-feel-so-much-hungry-about-posting.html

Now I know how Flask-SQLAlchemy made queries work with Model.query instead of session.query(Model).

Saturday, February 11, 2017

SQLAlchemy - Lost connection to MySQL server during query

Here's was my situation:
  • The database was set up behind behind an AWS ELB and HAProxy. 
  • The idle connection timeout on the ELB was set to 60 mins.
  • All the relevant timeouts on HAProxy seemed to be set to 60 mins too. 
  • The pool_recycle in SQLAlchemy was set to 30 mins.
  • I was still seeing the occasional "Lost connection to MySQL server during query" when small queries were running after the connection had some time to sit around.
The solution ended up being setting my pool_recycle down to 5 mins, but I'm still not sure what was causing connections to time out after 5 mins.

There are definitely other things that can cause this problem too. For example, it can happen if your data exceeds max_allowed_packet. See this page for more details: https://dev.mysql.com/doc/refman/5.7/en/error-lost-connection.html

Most of this also applies for "MySQL server has gone away".

You should also make sure your pool_recycle is set lower than your 'interactive_timeout' and 'wait_timeout' properties in the mysql config file to the values you need.