The Robustness Principle

14 Mar 2017 I learned about this at a talk called "Implementing Evolvable APIs" at SXSW: https://en.wikipedia.org/wiki/Robustness_principle

For example, making an API that throws errors when an unexpected parameter is provided is a bad idea. What if you need to make changes to the client to add the new parameter? You will need to make sure you deploy the code on the server side first, otherwise it will cause errors.

SQLAlchemy - DISTINCT, LIMIT, or OFFSET Causing Subqueries

15 Feb 2017 This section at the bottom of The Zen of Eager loading is really important:

When using joined eager loading, if the query contains a modifier that impacts the rows returned externally to the joins, such as when using DISTINCT, LIMIT, OFFSET or equivalent, the completed statement is first wrapped inside a subquery, and the joins used specifically for joined eager loading are applied to the subquery. SQLAlchemy’s joined eager loading goes the extra mile, and then ten miles further, to absolutely ensure that it does not affect the end result of the query, only the way collections and related objects are loaded, no matter what the format of the query is.

I made an example to illustrate this here: https://gist.github.com/pawl/cb57e0ddbdd0b2e64b75e94116873367

On MySQL this can be responsible for some really poor query performance, because it can cause it to use temporary tables and filesort.

The best way I've found to prevent the subqueries is by first querying for the ids only, then running another query that includes all relations. For example:

ids = session.query(Product.id).limit(20)
Product.query.filter(Product.id.in_(ids))

SQLAlchemy - "Base.query = db_session.query_property()"

13 Feb 2017 I just saw this line in the SQLAlchemy flask patterns example:

Base.query = db_session.query_property()

I went searching for what it does and found this: https://eoyilmaz.blogspot.com/2014/01/i-feel-so-much-hungry-about-posting.html

Now I know how Flask-SQLAlchemy made queries work with Model.query instead of session.query(Model).

SQLAlchemy - Lost connection to MySQL server during query

11 Feb 2017

Here's was my situation:

The database was set up behind behind an AWS ELB and HAProxy.
The idle connection timeout on the ELB was set to 60 mins.
All the relevant timeouts on HAProxy seemed to be set to 60 mins too.
The pool_recycle in SQLAlchemy was set to 30 mins.
I was still seeing the occasional "Lost connection to MySQL server during query" when small queries were running after the connection had some time to sit around.

The solution ended up being setting my pool_recycle down to 5 mins, but I'm still not sure what was causing connections to time out after 5 mins.

There are definitely other things that can cause this problem too. For example, it can happen if your data exceeds max_allowed_packet. See this page for more details: https://dev.mysql.com/doc/refman/5.7/en/error-lost-connection.html

Most of this also applies for "MySQL server has gone away".

You should also make sure your pool_recycle is set lower than your 'interactive_timeout' and 'wait_timeout' properties in the mysql config file to the values you need.

SQLAlchemy - Unexpected Lazy Loading

27 Jan 2017 If you're seeing unexpected lazy loading on a lazy="joined" relationship in SQLAlchemy, it might be because you're accessing those relationships after you've already run session.commit(). By default, session.commit() will expire the data on your relationships, meaning it will try to fetch it again next time you try to access those attributes.

The relevant section of the docs for session.commit():

By default, the Session also expires all database loaded state on all ORM-managed attributes after transaction commit. This so that subsequent operations load the most recent data from the database. This behavior can be disabled using the expire_on_commit=False option to sessionmaker or the Session constructor.

Python - David Beazley's "Understanding The Python GIL"

20 Jan 2017 This talk explains a lot about Python's multithreading: http://www.dabeaz.com/python/UnderstandingGIL.pdf

In Python 2.7, it knows to switch to different thread whenever it starts waiting on IO. And, it will check if it needs to switch threads every 100 bytecode instructions.

SQLAlchemy - Is is necessary to commit after session.execute?

17 Jan 2017

The docs for session.execute say:

Execute a SQL expression construct or string statement within the current transaction.

So, the answer is yes, you need to issue a commit after running session.execute().

Here’s a code example along with the output from “echo=True” showing it begins the session but doesn’t end it unless you run session.commit():

from sqlalchemy import create_engine from sqlalchemy.orm import scoped_session, sessionmaker engine = create_engine('mysql://root@localhost/test') db_session = scoped_session(sessionmaker(bind=engine)) db_session.execute(''' INSERT INTO user (first_name, last_name, username, email) VALUES ("blah", "blah", "blah", "blah@blah.com"); ''')

	from sqlalchemy import create_engine
	from sqlalchemy.orm import scoped_session, sessionmaker

	engine = create_engine('mysql://root@localhost/test')
	db_session = scoped_session(sessionmaker(bind=engine))

	db_session.execute('''
	INSERT INTO user (first_name, last_name, username, email)
	VALUES ("blah", "blah", "blah", "blah@blah.com");
	''')

view raw 1_sqlalchemy_session_execute.py hosted with ❤ by GitHub

	2017-01-16 22:54:11,788 INFO sqlalchemy.engine.base.Engine SHOW VARIABLES LIKE 'sql_mode'
	2017-01-16 22:54:11,788 INFO sqlalchemy.engine.base.Engine ()
	2017-01-16 22:54:11,790 INFO sqlalchemy.engine.base.Engine SELECT DATABASE()
	2017-01-16 22:54:11,790 INFO sqlalchemy.engine.base.Engine ()
	2017-01-16 22:54:11,791 INFO sqlalchemy.engine.base.Engine show collation where `Charset` = 'utf8' and `Collation` = 'utf8_bin'
	2017-01-16 22:54:11,791 INFO sqlalchemy.engine.base.Engine ()
	2017-01-16 22:54:11,792 INFO sqlalchemy.engine.base.Engine SELECT CAST('test plain returns' AS CHAR(60)) AS anon_1
	2017-01-16 22:54:11,792 INFO sqlalchemy.engine.base.Engine ()
	2017-01-16 22:54:11,793 INFO sqlalchemy.engine.base.Engine SELECT CAST('test unicode returns' AS CHAR(60)) AS anon_1
	2017-01-16 22:54:11,793 INFO sqlalchemy.engine.base.Engine ()
	2017-01-16 22:54:11,794 INFO sqlalchemy.engine.base.Engine SELECT CAST('test collated returns' AS CHAR CHARACTER SET utf8) COLLATE utf8_bin AS anon_1
	2017-01-16 22:54:11,794 INFO sqlalchemy.engine.base.Engine ()
	2017-01-16 22:54:11,795 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
	2017-01-16 22:54:11,796 INFO sqlalchemy.engine.base.Engine
	INSERT INTO user (first_name, last_name, username, email)
	VALUES ("blah", "blah", "blah", "blah@blah.com");

	2017-01-16 22:54:11,796 INFO sqlalchemy.engine.base.Engine ()

view raw 2_output.txt hosted with ❤ by GitHub

Global .gitignore

27 Nov 2016 https://help.github.com/articles/ignoring-files/#create-a-global-gitignore

SQLAlchemy 1.1.4's New "server_side_cursors" Option

18 Nov 2016 Before SQLAlchemy 1.1.4, if you wanted to stream your MySQL query using server side cursors, you would need to run your query using "execution_options(stream_results=True)" and pass SSCursor into create_engine's connect_args. Now, all you need to do is pass "server_side_cursors=True" into create_engine and it will automatically stream the results for your select queries.

http://docs.sqlalchemy.org/en/latest/dialects/mysql.html#server-side-cursors

Is it necessary to run SQLAlchemy's session.remove()?

23 Oct 2016 If you're using flask-sqlalchemy, then the answer is no - it's already doing it for you when the request finishes: https://github.com/mitsuhiko/flask-sqlalchemy/blob/2.1/flask_sqlalchemy/__init__.py#L823

If you're using scoped_session and not explicitly running session.remove(), then your connections will only be returned to the pool after your thread finishes and garbage collection occurs. It's not a good idea to leave this to garbage collection, because you can't guarantee your connections will be returned to the pool when your application is busy. This will often lead to errors like this: "TimeoutError: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30"

The solution is to make sure session.remove() runs when the work finishes (like flask-sqlalchemy does), as described here: http://docs.sqlalchemy.org/en/latest/orm/contextual.html#using-thread-local-scope-with-web-applications

Another alternative is not making the session global and using a context manager, as described at the bottom of this section: http://docs.sqlalchemy.org/en/latest/orm/session_basics.html#when-do-i-construct-a-session-when-do-i-commit-it-and-when-do-i-close-it However, this can make things difficult when you need to access the query results outside of the context manager. If you access the query results outside of the context manager without running session.expunge_all(), you'll see all kinds of errors like this: "DetachedInstanceError: Instance <ReportingJob at 0xa41cd8c> is not bound to a Session; attribute refresh operation cannot proceed"

Another option is setting autocommit to True, but that has quite a few gotchas. Turning on autocommit will acquire connections from the engine on an as-needed basis and return them immediately after their use. You will have to explicitly start transactions with session.begin() if you enable it. It seems like this setting should be called "autotransaction" instead. It's also not very efficient, because it needs to get and return a connection each time you run a query.