Thursday, November 17, 2016

SQLAlchemy 1.1.4's New "server_side_cursors" Option

Before SQLAlchemy 1.1.4, if you wanted to stream your MySQL query using server side cursors, you would need to run your query using "execution_options(stream_results=True)" and pass SSCursor into create_engine's connect_args. Now, all you need to do is pass "server_side_cursors=True" into create_engine and it will automatically stream the results for your select queries.

http://docs.sqlalchemy.org/en/latest/dialects/mysql.html#server-side-cursors

Sunday, October 23, 2016

Is it necessary to run SQLAlchemy's session.remove()?

If you're using flask-sqlalchemy, then the answer is no - it's already doing it for you when the request finishes: https://github.com/mitsuhiko/flask-sqlalchemy/blob/2.1/flask_sqlalchemy/__init__.py#L823

If you're using scoped_session and not explicitly running session.remove(), then your connections will only be returned to the pool after your thread finishes and garbage collection occurs. It's not a good idea to leave this to garbage collection, because you can't guarantee your connections will be returned to the pool when your application is busy. This will often lead to errors like this: "TimeoutError: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30"

The solution is to make sure session.remove() runs when the work finishes (like flask-sqlalchemy does), as described here: http://docs.sqlalchemy.org/en/latest/orm/contextual.html#using-thread-local-scope-with-web-applications

Another alternative is not making the session global and using a context manager, as described at the bottom of this section: http://docs.sqlalchemy.org/en/latest/orm/session_basics.html#when-do-i-construct-a-session-when-do-i-commit-it-and-when-do-i-close-it However, this can make things difficult when you need to access the query results outside of the context manager. If you access the query results outside of the context manager without running session.expunge_all(), you'll see all kinds of errors like this: "DetachedInstanceError: Instance <ReportingJob at 0xa41cd8c> is not bound to a Session; attribute refresh operation cannot proceed"

Another option is setting autocommit to True, but that has quite a few gotchas. Turning on autocommit will acquire connections from the engine on an as-needed basis and return them immediately after their use. You will have to explicitly start transactions with session.begin() if you enable it. It seems like this setting should be called "autotransaction" instead. It's also not very efficient, because it needs to get and return a connection each time you run a query.

Friday, September 2, 2016

PHP - max_execution_time doesn't work for socket operations

In new relic transaction logs, I was seeing some PHP requests last as long as an hour. It turns out that time spent waiting on sockets doesn't apply toward max_execution_time, and loops that involved waiting on sockets could end up taking a really long time without timing out.

PHP docs have this to say about max_execution_time:
"The set_time_limit() function and the configuration directive max_execution_time only affect the execution time of the script itself. Any time spent on activity that happens outside the execution of the script such as system calls using system(), stream operations, database queries, etc. is not included when determining the maximum time that the script has been running. This is not true on Windows where the measured time is real."

The solution ended up being setting request_terminate_timeout in php-fpm.

Saturday, August 27, 2016

Python - Cachetools LRUCache KeyError

If your application is threaded and you're getting a "KeyError" while using the non-decorator version of cachetool's LRUCache, then you need to put whatever is manipulating the cache object inside of a lock. Also, since LRUCache is modified when values are gotten from it, you will also need to make sure you're locking when you get values from cache too. If you can use the decorator version of LRUCache, that's preferred since it has built-in locking.

Here's an example of the error:

And an example of the fix: https://bitbucket.org/zzzeek/dogpile.cache/pull-requests/32/add-a-cachetools-lru-lfu-in-memory-backend/diff#comment-22242704

Tuesday, June 21, 2016

Javascript's Round vs PHP's Round

Today I learned that by default PHP rounds differently than javascript.

PHP (using the default PHP_ROUND_HALF_UP)

php > echo round(-1.5);
-2
"Round val up to precision decimal places away from zero, when it is half way there. Making 1.5 into 2 and -1.5 into -2."

Javascript

Math.round(-1.5);
-1
"For negative numbers, if the decimal portion is exactly -0.5, the return value is the smallest integer that is greater than the number."
https://msdn.microsoft.com/en-us/library/5cza0web(v=vs.94).aspx

Saturday, May 21, 2016

Django-Moderation

It would be nice if there were a sqlalchemy/flask equivalent to this: https://github.com/dominno/django-moderation

The equivalent might use sqlalchemy_utils for its generic relationships: https://sqlalchemy-utils.readthedocs.io/en/latest/generic_relationship.html

Thursday, May 19, 2016

Modified Preorder Tree Traversal

This was a great explanation of Modified Preorder Tree Traversal (or MPTT): https://www.sitepoint.com/hierarchical-data-database-2/

A common use-case for MPTT is categorization data, like with the way django-oscar uses django-mptt: https://github.com/django-mptt/django-mptt

Sunday, May 8, 2016

Datatables - Uncaught TypeError: Cannot read property 'style' of undefined

If you see this error in datatables: "Uncaught TypeError: Cannot read property 'style' of undefined"

It's likely that you're missing a header column in your table. Make sure you have the same number of <th> elements as items in the "columns" section of your datatables initialization.

Select2 Replacement

This selectize.js library looks like a good replacement for Select2: https://selectize.github.io/selectize.js/

Wednesday, May 4, 2016

Javascript Data Tables


Sunday, April 10, 2016

SQLAlchemy - JSON not changing on commit

If your changes to a JSON, JSONB, or HSTORE field aren't saving after you commit, then you need to wrap your field in the MutableDict shown here: http://docs.sqlalchemy.org/en/latest/dialects/postgresql.html#sqlalchemy.dialects.postgresql.HSTORE

Saturday, April 9, 2016

SQLAlchemy - AttributeError: type object 'JSONB' has no attribute 'lower'

This error happened because I mistyped "db.column" instead of "db.Column" when I was creating my database model using SQLAlchemy.

This might be your problem if db.create_all() isn't creating all the columns you expected.

Saturday, March 19, 2016

PHP - Problems With Long Running Processes

Great article about the problems with using PHP for long running processes: https://software-gunslinger.tumblr.com/post/48215406921/php-is-meant-to-die-continued

Describes the main reason why I started using Python for long-running jobs.

Thursday, March 3, 2016

Ansible Vault - storing secrets in repos

When others are deploying a project for you, it's easy for mistakes to be made when secrets must be updated in environmental variables. Ansible-vault takes a different approach and encrypts the secrets - allowing you to store the secrets in your repo.

To encrypt a file: ansible-vault encrypt secrets.py
To decrypt a file: ansible-vault decrypt secrets.py

More documentation is available here: http://docs.ansible.com/ansible/playbooks_vault.html

Python - isort, useful tool for sorting python imports

http://timothycrosley.github.io/isort/

I saw it in the django coding style documentation: https://docs.djangoproject.com/en/1.9/internals/contributing/writing-code/coding-style/

To make an entire project's imports easier to read, all you need to do is:

  • pip install isort
  • isort -rc .

There's also a sublime plugin for it here: https://github.com/thijsdezoete/sublime-text-isort-plugin#install

Thursday, February 18, 2016

git commit --amend -C HEAD

Ever committed something and needed to make changes later? Just rebase and squash it, right? Or maybe "git reset --soft HEAD~1" and commit again?

There's an even better solution: "git commit --amend -C HEAD"

It will add the combine your commit with your last commit. If you already pushed, you will need to force push your change.

pip install -e .

I've wasted a lot of time running "python setup.py install" before testing my changes to flask-admin. It turns out you can pip install a project as "editable", which points the install toward your local directory instead of copying where the rest of your python modules are.

To install a project as editable, navigate to the repo and run "pip install -e .".

Saturday, February 13, 2016

Python - Dropping Into pdb From Nose

This document describes how to drop into pdb when your Python nose tests fail: http://nose.readthedocs.org/en/latest/plugins/debug.html

It just requires running "nosetests --pdb".

Very helpful if you're debugging why a test failed.

Great Hadoop + Spark Tutorial

This great tutorial takes you through installing hadoop and spark to analyze Reddit comment data: http://blog.insightdatalabs.com/spark-cluster-step-by-step/

Hadoop Namenode Not Starting

My namenode was not starting because I had the wrong host configured in yarn-site.xml, mapred-site.xml, and core-site.xml.

When you're running start-dfs.sh on your namenode, ensure the line that says "starting namenode" shows "/usr/local/hadoop/logs/hadoop-ubuntu-namenode-<your namenode's hostname>.out" in the output. This is how you know your configuration is correct.

You can check your server's hostname on Ubuntu by running "echo $(hostname)".