Paul's Programming Notes     Archive     Feed     Github

Linux - Forgetting Exec

Today, I came across a bug where a Celery worker wasn’t shutting down gracefully, which caused odd “Connection Refused” errors from requests made inside the task the worker was running. The worker was also exiting before it could report errors to Rollbar/Sentry, so the team didn’t know there was anything to fix.

This was happening because the entrypoint had a script that effectively did this:

echo "starting celery worker"
celery -A tasks worker

The problem with that entrypoint: it doesn’t use exec to run the child process. Without exec, the shell remains the main process and does not forward signals like SIGTERM to the child it’s waiting on, so the Celery worker never learns that it needs to shut down gracefully.

The fixed example would look like this:

echo "starting celery worker"
exec celery -A tasks worker
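
For reference, here’s a minimal sketch of how an entrypoint script like this is typically wired into an image (the file names and base image are my own, not from the original setup):

FROM python:3.9-slim
COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh
# docker stop sends SIGTERM to PID 1 (this script); because the script
# uses exec, celery replaces the shell and receives the signal directly
ENTRYPOINT ["/entrypoint.sh"]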

Linux - Dig's +short Option

I learned that dig on Linux has a “+short” option that returns only the IP address for a hostname:

$ dig +short google.com
142.251.32.174
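
The +short option works for other record types too, for example (your output will vary):

$ dig +short google.com MX
10 smtp.google.com.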

Linux - Ping Not Showing Lost Packets

I was recently troubleshooting some packet loss with ping on Linux, and I noticed that by default ping won’t explicitly show lost packets:

$ ping x.x.x.x
64 bytes from x.x.x.x: icmp_seq=8 ttl=52 time=18.1 ms
64 bytes from x.x.x.x: icmp_seq=11 ttl=52 time=21.2 ms

(notice the jump from icmp_seq=8 to icmp_seq=11; packets 9 and 10 were lost)

To fix this, you can run ping with the -O option:

$ ping -O x.x.x.x
64 bytes from x.x.x.x: icmp_seq=11 ttl=52 time=19.0 ms
no answer yet for icmp_seq=12
no answer yet for icmp_seq=13
64 bytes from x.x.x.x: icmp_seq=14 ttl=52 time=18.9 ms

Related askubuntu.com Thread

Linux - Improving Traceroute Output

Usually when I use traceroute without any options, it gets stuck showing output like this:

 14   *  *  * 
 15   *  *  * 
 16   *  *  * 

For more info about why this is happening: Link

To improve this output, try adding -I (traceroute -I) to make it send ICMP ECHO probes instead of UDP probes. Note that -I typically requires root privileges.

If you want better statistics about packet loss, you can also use a tool called mtr, which combines traceroute and ping. If you’re on Ubuntu, you can install it with sudo apt-get install mtr.
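
For example, to get a one-shot report with per-hop packet loss percentages (these are standard mtr flags):

# -r: report mode (non-interactive), -w: wide/full hostnames, -c 100: send 100 probes per hop
$ mtr -rwc 100 example.com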

Docker - WORKDIR Creates Directories

Which user do you think owns the src/ directory in this Dockerfile example?

FROM alpine:3.13.2
RUN adduser -D wendy
USER wendy
WORKDIR src/
COPY --chown=wendy . src/

The answer: root.

This happens because WORKDIR creates any directories that don’t exist, and it creates them as root, even when a different USER is active.

What if you wanted wendy to be the owner of that src/ directory?

You would need to COPY the src/ directory into place before setting it as the WORKDIR. For example:

FROM alpine:3.13.2
RUN adduser -D wendy
USER wendy
COPY --chown=wendy . src/
WORKDIR src/
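
You can verify the difference with a quick build and ls (the image tag here is arbitrary):

$ docker build -t workdir-test .
$ docker run --rm workdir-test ls -ld /src   # owner should now be wendy instead of root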

More info: Closed GitHub Issue & Docker Code

DIY NAS

If you have an old computer with spare hard drives, it might be useful to turn it into a NAS (network-attached storage) to share files with the other computers on your network. There are several popular DIY NAS options.

I ended up going with OpenMediaVault due to it being Debian based and super popular. To set it up, I followed this incredibly thorough guide: link

I had a couple of issues that caused it to stop working after a restart.

The first issue was related to the disk being encrypted: OpenMediaVault started in emergency mode because it couldn’t access the encrypted disk. Emergency mode unfortunately doesn’t allow SSH access, so you need to plug a monitor and keyboard into the machine. To prevent this, make sure nofail is in the options section of the /etc/fstab lines starting with /dev/disk. For the mergerfs lines starting with /srv/dev-disk-by-uuid, you also need to add nonempty and remove x-systemd.requires=<disk> from the options section. I’ve had to redo the x-systemd.requires removal every time I apply settings from the web UI. More information about this: 1 2 3 4 5 6 7
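
As a rough sketch, the relevant /etc/fstab entries ended up looking something like this (the UUIDs, filesystems, and mount points below are placeholders, not my real values):

# physical disk: nofail lets the boot continue if the disk is missing or still locked
/dev/disk/by-uuid/xxxx-xxxx  /srv/dev-disk-by-uuid-xxxx-xxxx  ext4  defaults,nofail  0  2
# mergerfs pool: add nonempty, and drop any x-systemd.requires=<disk> option
/srv/dev-disk-by-uuid-aaaa:/srv/dev-disk-by-uuid-bbbb  /srv/mergerfs-pool  fuse.mergerfs  defaults,nonempty  0  0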

The second issue involved losing the network adapter configuration after restarting. I had to run omv-firstaid and configure the network adapter with a static IP to resolve this. This may have something to do with an empty configuration from the web UI overwriting the existing working configuration.

Docker - ufw Rules Ignored

I recently learned that Docker bypasses ufw (Uncomplicated Firewall) rules by default, because it manages its own iptables rules. This means containers can still expose ports that ufw is supposed to block.

The fix involved adding this to /etc/docker/daemon.json:

{
    "iptables": false,
    "ip6tables": false
}

Then I restarted the docker daemon with sudo systemctl restart docker.
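
To sanity-check the change, you can publish a port from a container and confirm that ufw now actually applies to it (the container and port here are arbitrary examples):

$ docker run --rm -d -p 8080:80 nginx
$ sudo ufw status   # with no allow rule for 8080, the port should now be blocked from other hosts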

More details

Raspberry Pi Zero Not Connecting to UniFi AP

I recently had a lot of trouble getting my Raspberry Pi Zero 2 W to connect to WiFi on my Ubiquiti UniFi AP AC Lite after a firmware upgrade to 5.43.46. It’s a 2.4GHz-only device, and in fact none of my 2.4GHz-only devices would connect.

Apparently something I did had set PMF (Protected Management Frames) to “Required”, and the Pi Zero 2 W doesn’t support that. In theory, this setting helps prevent clients from being disconnected by de-auth packets from an attacker; in practice, not all devices support it.

To fix the issue, you need to find the PMF setting under Settings -> Wifi -> <your network> -> Advanced -> Security and change it to “Optional”.

Fixing Memory Leaks In Popular Python Libraries

I was recently able to make a minimal example that reproduced a Celery memory leak. The memory leak would happen on the main Celery worker process that’s forked to make child processes, which makes the leak especially bad. Issue #4843 has been around for 3+ years and has 140+ comments, so this one has been causing a lot of problems for Celery users for a while.

The memory leak has been causing a lot of issues at my work too, and I was able to get some help resolving it during a work hackathon. My coworker Michael Lazar was able to find the root cause and make a pull request to fix it in py-amqp (a Celery dependency when using RabbitMQ as a broker). The code with the issue was 10 years old!

Here’s what the bug looks like:

try:
    sock.shutdown(socket.SHUT_RDWR)
    # if shutdown() raises OSError, close() below never runs
    sock.close()
except OSError:
    pass

The problem occurs when sock.shutdown() raises an OSError: execution jumps straight to the except block, so sock.close() never runs, the socket isn’t cleaned up, and garbage collection can’t release the memory used for it. The OSError on shutdown can occur when the remote side closes the connection first.

The fixed example (with separate try/except blocks):

try:
    sock.shutdown(socket.SHUT_RDWR)
except OSError:
    pass

try:
    sock.close()
except OSError:
    pass

I was able to make the same fix in a few other popular Python libraries too.

I also found another way to reduce memory usage of Connections in py-amqp and librabbitmq by changing how active channel IDs are stored.
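
The rough idea behind that change (a sketch of the concept, not the actual py-amqp code): instead of pre-allocating every possible channel ID per connection up front, track only the IDs currently in use and find a free one on demand.

from array import array

# Before (sketch): each connection pre-allocates all possible channel IDs
# (channel_max can be as high as 65535), even if it only ever uses a few.
class ConnectionBefore:
    def __init__(self, channel_max=65535):
        self._avail_channel_ids = array('H', range(channel_max, 0, -1))

    def next_channel_id(self):
        return self._avail_channel_ids.pop()

# After (sketch): only the IDs actually in use are stored, so an idle
# connection holds almost no extra memory.
class ConnectionAfter:
    def __init__(self, channel_max=65535):
        self.channel_max = channel_max
        self._used_channel_ids = set()

    def next_channel_id(self):
        for channel_id in range(1, self.channel_max + 1):
            if channel_id not in self._used_channel_ids:
                self._used_channel_ids.add(channel_id)
                return channel_id
        raise RuntimeError('no free channel ids')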

Update 12/20: Hacker News user js2 pointed out that Python will automatically close the socket when all the references to the socket are gone.

Update 12/23: I got a pull request merged into Kombu with the same memory usage reduction fix I made to py-amqp and librabbitmq. I also opened another pull request to Kombu that should fix a memory leak issue when using Celery with Redis.

Update 12/25: I wrote a section for the Celery docs about optimizing memory usage. I also fixed another leak in Celery that happens when connection errors occur on a prefork worker.