What can you do with custom storage systems?
"Django abstracts file storage using storage backends, from simple filesystem storage to things like S3. This can be used for processing file uploads, storing static assets, and more." - https://tartarus.org/james/diary/2013/07/18/fun-with-django-storage-backends
- In Sphinx, it's important to know the difference between fields and attributes. Attributes are basically unindexed columns, so avoid filtering only on them. Fields support full-text search.
- Sphinx supports its own custom binary protocol and the MySQL protocol (recently they also added an HTTP API). When you see "listen = localhost:9306:mysql41" in the config, that means it's listening for MySQL protocol traffic on port 9306.
- https://github.com/a1tus/sphinxapi-py3 appears to be the best Python client for the binary API at the moment. It doesn't support INSERTing things into the index (you'll need to use the MySQL protocol for that; see the sketch after this list).
- The version of sphinxapi-py3 on PyPI is a fork with just a few minor fixes and appears to be safe.
- Sphinx does not match partial words by default. Turning on partial matching can increase the size of your index dramatically. You can limit the fields that support partial matching with the "infix_fields" and "prefix_fields" settings.
- Stemming isn't turned on by default, so searching for "dog" will not match "dogs".
- Most special characters ($, @, &, etc) are ignored by default. You will need to add them to charset_table if you want them to be searchable.
- Ruby's thinking-sphinx looks much more battle-tested than all of the Python binary API clients: https://github.com/pat/thinking-sphinx
- You will need to use a real-time index if you want to INSERT/DELETE records immediately.
- If you're using a real-time index, you will probably need to increase the "rt_mem_limit" from its default of 128MB. If this limit is too low, you'll see a high number of "disk chunks" when you run the "SHOW INDEX rtindex STATUS" query. More info: http://sphinxsearch.com/blog/2014/02/12/rt_performance_basics/
- You have to use a special dialect if you want to use SQLAlchemy with Sphinx: https://github.com/conversant/sqlalchemy-sphinx
- This appears to be the best Dockerfile for Sphinx: https://github.com/leodido/dockerfiles
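Here's a minimal sketch of talking to Sphinx over the MySQL protocol, assuming the pymysql package, searchd listening on port 9306, and a real-time index named "rtindex" with a "title" field (those names are assumptions for illustration):

```python
# Connect to searchd's MySQL protocol listener with an ordinary
# MySQL client library (pymysql here); Sphinx ignores credentials.
import pymysql

conn = pymysql.connect(host="127.0.0.1", port=9306, user="", autocommit=True)
with conn.cursor() as cur:
    # INSERT/DELETE against a real-time index happens over this protocol:
    cur.execute("INSERT INTO rtindex (id, title) VALUES (1, 'hello world')")
    # Full-text search uses the MATCH() operator:
    cur.execute("SELECT id FROM rtindex WHERE MATCH('hello')")
    print(cur.fetchall())
```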
I probably won't be using Sphinx search for any new projects. Elasticsearch seems preferable these days.
- "name" (without trailing whitespace)
- "name " (with trailing whitespace)
To my surprise, I got a duplicate error on that 2nd insert. It turns out that MySQL ignores that trailing whitespace when it makes comparisons.
The MySQL docs say this: "All MySQL collations are of type PAD SPACE. This means that all CHAR, VARCHAR, and TEXT values are compared without regard to any trailing spaces. “Comparison” in this context does not include the LIKE pattern-matching operator, for which trailing spaces are significant." (https://dev.mysql.com/doc/refman/5.7/en/char.html)
The solution? You should probably be trimming trailing whitespace in your API endpoints and on your front-end.
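For reference, here's a minimal way to reproduce the behavior, assuming a local MySQL server and the pymysql package (the table name and credentials are placeholders):

```python
import pymysql

conn = pymysql.connect(host="localhost", user="root", password="",
                       database="test", autocommit=True)
with conn.cursor() as cur:
    cur.execute("CREATE TABLE t (name VARCHAR(50) UNIQUE)")
    cur.execute("INSERT INTO t (name) VALUES (%s)", ("name",))
    # Raises an IntegrityError (duplicate entry): the PAD SPACE
    # collation ignores the trailing space during comparison.
    cur.execute("INSERT INTO t (name) VALUES (%s)", ("name ",))
```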
I made a script to test how much SSL certificate verification (verify=True) slows down python requests, with and without gevent: https://gist.github.com/pawl/56100a4ef958374a433840be8037b11b
Here are the results:
verify=True took: 40.3454630375 secs
verify=False took: 39.3803040981 secs
gevent verify=True took: 2.23735189438 secs
gevent verify=False took: 1.58263015747 secs
I suspect gevent has trouble running pyopenssl's verification concurrently, because the work happens inside a C library where gevent can't switch greenlets.
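The gist linked above has the real test; here's a minimal sketch of the same idea, assuming the requests and gevent packages (the URL and request count are placeholders):

```python
import time

import gevent.monkey
gevent.monkey.patch_all()  # patch sockets before importing requests

import gevent
import requests


def fetch(verify):
    requests.get("https://example.com", verify=verify)


def timed(verify, count=50):
    start = time.time()
    # Spawn all requests as greenlets and wait for them to finish.
    gevent.joinall([gevent.spawn(fetch, verify) for _ in range(count)])
    print(f"gevent verify={verify} took: {time.time() - start} secs")


timed(True)
timed(False)
```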
While migrating data from one memcached server to another, the first thing I came across was this "memcached-tool" script, which has a dump command: https://github.com/memcached/memcached/blob/master/scripts/memcached-tool
There's another article that mentions using memdump and memcat: http://www.dctrwatson.com/2010/12/how-to-dump-memcache-keyvalue-pairs-fast/
Unfortunately, those methods only dumped a few MB of data. This post explains why: https://stackoverflow.com/a/13941700 (you can only dump one page per slab class, i.e. 1MB of data).
So, I ended up writing a script that loops through the expected cache keys, gets each value from the old cache, and sets it on the new cache server (sketched below).
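A minimal sketch of that loop, assuming the pymemcache package; the server addresses and key list are placeholders:

```python
from pymemcache.client.base import Client

old_cache = Client(("old-cache.example.com", 11211))
new_cache = Client(("new-cache.example.com", 11211))

# The application knows which keys it expects, so loop over those
# instead of trying to dump everything from the server.
expected_keys = ["user:1", "user:2"]  # hypothetical keys

for key in expected_keys:
    value = old_cache.get(key)
    if value is not None:
        new_cache.set(key, value)
```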
[error] 10#0: *14843 connect() to unix:/tmp/gunicorn.sock failed (11: Resource temporarily unavailable) while connecting to upstream, client: 220.127.116.11, server: , request: "GET / HTTP/1.0", upstream: "http://unix:/tmp/gunicorn.sock:/", host: "18.104.22.168"

I ended up making an example Dockerfile with nginx + gunicorn + flask to reproduce this problem: https://github.com/pawl/somaxconn_test
Bumping the "net.core.somaxconn" setting ended up fixing it.
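On Linux, that means raising the kernel's listen backlog limit: `sysctl -w net.core.somaxconn=1024` applies it immediately (1024 is just an example value), and adding `net.core.somaxconn=1024` to /etc/sysctl.conf persists it across reboots. Note that nginx and gunicorn also have their own backlog settings, which may need raising to actually take advantage of the higher limit.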
Before SQLAlchemy 1.2.0, calling in_() with an empty list emits some crazy SQL that will query your entire table. The best solution is probably to upgrade to SQLAlchemy 1.2.0.
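A minimal sketch of the pitfall, assuming a typical declarative model (the User model and in-memory SQLite database here are hypothetical):

```python
from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import Session

Base = declarative_base()


class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = Session(engine)

ids = []  # e.g. an empty result from an earlier query
# On SQLAlchemy < 1.2.0 this warns and renders SQL along the lines of
# "WHERE users.id != users.id", which matches nothing but still scans
# the whole table; 1.2.0+ renders a simple always-false expression.
users = session.query(User).filter(User.id.in_(ids)).all()
```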
For example, making an API that throws errors when an unexpected parameter is provided is a bad idea. What happens when you need to add a new parameter to the client? You'll have to deploy the server-side change first, otherwise the new parameter will cause errors.
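A minimal Flask sketch of the tolerant alternative (the endpoint and parameter names are hypothetical):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route("/widgets")
def list_widgets():
    # Read only the parameters this version of the server knows about
    # and silently ignore the rest, so a newer client that sends extra
    # parameters still works against an older server.
    color = request.args.get("color")
    return jsonify({"color": color})
```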