Paul's Programming Notes

Django - RelatedManager.set not removing models

Python - Pipenv to pip-tools

I've been using Pipenv for the last few months and my biggest issue is that "--keep-outdated" has been broken in the latest release (2018.11.26) for a while. I've needed to install Pipenv from the master branch to make it functional. However, the last time I used "--keep-outdated" from the master branch, it wouldn't automatically update the hash of the dependency being updated.

Updating specific requirements is something I need to do pretty often, and it's not fun to explain all the Pipenv quirks to the team.

Pip-tools looks like it does everything I need and has fewer quirks, so I ended up making the switch.

Pipenv uses pip-tools under the hood, so the migration to pip-tools was very smooth. The migration process was:
  1. Copy the dev-packages and packages sections of the Pipfile to their own requirements.in files.
  2. Run pip-compile.
  3. Copy over the specific versions and hashes from the Pipfile.lock to the generated requirements.txt.

I did have a small issue where updating a specific package with pip-tools unexpectedly removed a bunch of dependencies from the requirements.txt, but running pip-compile with "--rebuild" fixed it.
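
Steps 1 and 3 of the migration can be scripted rather than done by hand. The helpers below are only a rough sketch, not what I actually ran: the function names are made up for illustration, they assume the Pipfile has already been parsed into a dict, and they only handle plain version strings and the version/extras table form (git, path, and editable entries are not covered).

```python
import json


def requirement_line(name, spec):
    """Turn one Pipfile entry into a requirements.in line."""
    if isinstance(spec, str):
        # "*" means "any version" in a Pipfile, so emit the bare name.
        return name if spec == "*" else f"{name}{spec}"
    # Table form, e.g. {version = ">=2.0", extras = ["security"]}.
    extras = spec.get("extras", [])
    version = spec.get("version", "*")
    line = name + (f"[{','.join(extras)}]" if extras else "")
    return line if version == "*" else f"{line}{version}"


def section_to_requirements(section):
    """Convert a parsed [packages] or [dev-packages] table to sorted lines."""
    return sorted(requirement_line(name, spec) for name, spec in section.items())


def locked_pins(lock_text, section="default"):
    """Read exact pins (name==version) from the text of a Pipfile.lock.

    Pipfile.lock is JSON; "default" holds packages, "develop" holds dev-packages.
    """
    lock = json.loads(lock_text)
    return {
        name: f"{name}{entry['version']}"
        for name, entry in lock.get(section, {}).items()
        if "version" in entry
    }
```

On Python 3.11+, tomllib.load() should be able to parse the Pipfile into the dict that section_to_requirements() expects. The pins from locked_pins() are what you would carry over into the generated requirements.txt in step 3; hashes are left out of this sketch.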

AWS - redirecting domain to url using a 302 redirect (without running a server)

I wanted to make a domain name (heckingoodboys.com) redirect to a multisubreddit for dog pictures, but I didn't want to run a web server for it.

Here's what I did:
  1. Purchase the domain using Route 53.
  2. Create two public S3 buckets (www.heckingoodboys.com and heckingoodboys.com).
  3. Enable "Static website hosting" on www.heckingoodboys.com and redirect to heckingoodboys.com.
  4. Enable "Static website hosting" on heckingoodboys.com, select "use this bucket to host this website", and use routing rules similar to this:
    <RoutingRules>
      <RoutingRule>
        <Redirect>
          <Protocol>https</Protocol>
          <HostName>www.reddit.com</HostName>
          <HttpRedirectCode>302</HttpRedirectCode>
          <ReplaceKeyPrefixWith>user/heckingoodboys/m/heckingoodboys/</ReplaceKeyPrefixWith>
        </Redirect>
      </RoutingRule>
    </RoutingRules>
  5. Back in Route 53, create an A record for both www.heckingoodboys.com and heckingoodboys.com, each using an alias to its respective bucket. (The bucket will be the first option in autocomplete.)
For more details: https://medium.com/@P_Lessing/single-page-apps-on-aws-part-1-hosting-a-website-on-s3-3c9871f126

Why not just use a CNAME from www.heckingoodboys.com to heckingoodboys.com? AWS says they don't charge for queries to aliases that point at AWS resources, but they do charge for CNAME queries. So, I used an alias to a bucket instead.
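
The two "Static website hosting" settings from steps 3 and 4 can also be applied with boto3 instead of the console. This is a sketch I have not run against these buckets: the helper names are hypothetical, it assumes boto3 is installed and AWS credentials are configured, and the index.html suffix is only there because the API wants an index document when routing rules are used.

```python
def redirect_all_config(target_host):
    """Website config for the www bucket: send every request to target_host."""
    return {"RedirectAllRequestsTo": {"HostName": target_host}}


def routing_rule_config(host, key_prefix, code="302", protocol="https"):
    """Website config for the apex bucket, mirroring the XML routing rule."""
    return {
        # Required alongside RoutingRules; the object does not need to exist.
        "IndexDocument": {"Suffix": "index.html"},
        "RoutingRules": [
            {
                "Redirect": {
                    "Protocol": protocol,
                    "HostName": host,
                    "HttpRedirectCode": code,
                    "ReplaceKeyPrefixWith": key_prefix,
                }
            }
        ],
    }


def apply_website_config(bucket, config):
    """Push a website config to a bucket (needs boto3 and AWS credentials)."""
    import boto3  # third-party; imported here so the helpers above stay stdlib-only

    boto3.client("s3").put_bucket_website(
        Bucket=bucket, WebsiteConfiguration=config
    )
```

Usage would look something like apply_website_config("www.heckingoodboys.com", redirect_all_config("heckingoodboys.com")) followed by apply_website_config("heckingoodboys.com", routing_rule_config("www.reddit.com", "user/heckingoodboys/m/heckingoodboys/")).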

Sphinx Search - Lessons Learned

Here are a few things I've learned while working on a project that uses Sphinx search:
  • It's important to know the difference between fields and attributes. Attributes are basically unindexed columns and you should try to avoid filtering only on these columns. Fields support full text search. 
  • It supports its own custom binary protocol and the MySQL protocol (they also recently added an HTTP API). When you see "listen = localhost:9306:mysql41" in the config, that means it's listening for MySQL protocol traffic on port 9306.
  • https://github.com/a1tus/sphinxapi-py3 appears to be the best Python client for the binary API at the moment. It doesn't support INSERTing things into the index (you'll need to use the MySQL protocol for that).
  • The version of sphinxapi-py3 on PyPI is a fork with just a few minor fixes and appears to be safe.
  • It does not match partial words by default. Turning on partial matching can also increase the size of your index dramatically. You can limit the fields that support partial matching with the "infix_fields" and "prefix_fields" settings.
  • Stemmers aren't turned on by default. So, searching for "dog" will not match "dogs".
  • Most special characters ($, @, &, etc.) are ignored by default. You will need to add them to charset_table if you want them to be searchable.
  • Ruby's thinking-sphinx looks much more battle-tested than all of the Python binary API clients: https://github.com/pat/thinking-sphinx
  • You will need to use a real-time index if you want to INSERT/DELETE records immediately.
  • If you're using a real-time index, you will probably need to increase "rt_mem_limit" from its default of 128 MB. If this limit is too low, you'll see a high number of "disk chunks" when you run the "SHOW INDEX rtindex STATUS" query. More info: http://sphinxsearch.com/blog/2014/02/12/rt_performance_basics/
  • You have to use a special dialect if you want to use SQLAlchemy with Sphinx: https://github.com/conversant/sqlalchemy-sphinx
  • This appears to be the best Dockerfile for Sphinx: https://github.com/leodido/dockerfiles
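
Since Sphinx speaks the MySQL protocol on port 9306, a regular MySQL client library can talk to it. Here's a rough, untested sketch of that idea. It assumes PyMySQL is installed and behaves with Sphinx, that the index is called "rtindex" like in the status query above, and that the status output includes a "disk_chunks" row. The helper names are made up for illustration.

```python
def build_match_query(index, limit=10):
    """Build a SphinxQL full-text query with a placeholder for the search text."""
    # Index names can't be bound as parameters, so validate before formatting.
    if not index.isidentifier():
        raise ValueError(f"unexpected index name: {index!r}")
    return f"SELECT id FROM {index} WHERE MATCH(%s) LIMIT {int(limit)}"


def disk_chunk_count(status_rows):
    """Pull disk_chunks out of the (name, value) rows of SHOW INDEX ... STATUS."""
    status = dict(status_rows)
    return int(status.get("disk_chunks", 0))


def search(index, text, host="127.0.0.1", port=9306):
    """Run a MATCH() query over the MySQL protocol listener (needs PyMySQL)."""
    import pymysql  # third-party; imported here so the helpers above stay stdlib-only

    conn = pymysql.connect(host=host, port=port, user="")
    try:
        with conn.cursor() as cur:
            cur.execute(build_match_query(index), (text,))
            return [row[0] for row in cur.fetchall()]
    finally:
        conn.close()
```

With a real-time index you could run "SHOW INDEX rtindex STATUS" through the same kind of cursor and hand the fetched rows to disk_chunk_count() to keep an eye on whether "rt_mem_limit" needs raising.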
I probably won't be using Sphinx search for any new projects. Elasticsearch seems preferable these days.