I got to work on a Python code base over the last few days and have been struggling to get to a stable development environment. I learned some lessons along the way. Here is one of them, as it seems to be rather common and causes headaches for many people (a simple Google search returns a whopping half a million hits).
ModuleNotFoundError: No module named '_lzma'
I found this error at the bottom of a very long stack trace after running a unit test. That was surprising, as the tests had worked fine before. I stashed all current changes, but the error persisted. I recognized that the lzma module is part of the standard library. I could not find the module _lzma, though. It turned out that the problem could be reproduced with nothing more than a simple import statement.
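A minimal sketch of such a reproduction, assuming the statement in question imports the public lzma module. The helper name `can_import` is made up for illustration; it catches the error instead of crashing so that both module names can be checked in one run:

```python
import importlib


def can_import(name: str) -> bool:
    """Return True if the module imports cleanly, False if it is missing."""
    try:
        importlib.import_module(name)
    except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
        print(f"{name}: {exc}")
        return False
    return True


# On a build without the extension, both calls print: No module named '_lzma'
print("lzma importable: ", can_import("lzma"))
print("_lzma importable:", can_import("_lzma"))
```

On a healthy installation both lines should report True.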
So somehow the Python installation was defective.🤔
It turns out that there is a hint in the name of the module: it starts with an underscore, which seems weird at first. That is no coincidence, as PEP 8 states:
When an extension module written in C or C++ has an accompanying Python module that provides a higher level (e.g. more object oriented) interface, the C/C++ module has a leading underscore (e.g. _socket).
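This pairing can be made visible from within Python. The following is only a sketch: the helper `origin` is a hypothetical name, and it relies on `importlib.util.find_spec`, which locates a top-level module without importing it.

```python
import importlib.util


def origin(name: str) -> str:
    """Describe where a module would be loaded from, without importing it."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return "not found"
    return spec.origin or "unknown"


# Typically a .py file for the wrapper, and a compiled file (.so / .pyd)
# or "built-in" for the module with the leading underscore.
for name in ("socket", "_socket", "lzma", "_lzma"):
    print(f"{name:8} -> {origin(name)}")
```

On an installation with the problem described here, the last line would be expected to read "not found".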
So the Python installation was missing an extension module written in C. I had used pyenv to install and manage Python versions, which seems to be a popular choice. But it turns out there is a gotcha to be found in the documentation:
pyenv will try its best to download and compile the wanted Python version, but sometimes compilation fails because of unmet system dependencies, or compilation succeeds but the new Python version exhibits weird failures at runtime.
The documentation then lists the recommended build environment for different platforms. These are the recommended packages for my system of choice (Debian/Ubuntu):
sudo apt-get update
sudo apt-get install make build-essential libssl-dev zlib1g-dev \
  libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm \
  libncursesw5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev libffi-dev liblzma-dev
Quite a big list, and while some of those dependencies are pretty common, others are rather exotic. If any of them is missing when Python is compiled, you might be in for those weird failures at runtime.
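To find out which pieces a given interpreter lacks, one could probe for the extension modules that depend on those packages. The sketch below is an illustration only: the mapping from module to Debian/Ubuntu package is an assumption derived from the package names, not something taken from the pyenv documentation, and `missing_extensions` is a hypothetical helper.

```python
import importlib.util

# Extension module -> Debian/Ubuntu package with the headers it is built against.
# This mapping is an assumption for illustration and is not exhaustive.
EXTENSION_PACKAGES = {
    "_lzma": "liblzma-dev",
    "_bz2": "libbz2-dev",
    "zlib": "zlib1g-dev",
    "_ssl": "libssl-dev",
    "_sqlite3": "libsqlite3-dev",
    "_ctypes": "libffi-dev",
    "readline": "libreadline-dev",
    "_curses": "libncursesw5-dev",
    "_tkinter": "tk-dev",
}


def missing_extensions(candidates=EXTENSION_PACKAGES):
    """Return {module: package} for every module the interpreter cannot find."""
    return {
        module: package
        for module, package in candidates.items()
        if importlib.util.find_spec(module) is None
    }


for module, package in missing_extensions().items():
    print(f"{module} is missing, probably needs {package}")
```

Note that this only checks whether the module can be located. It says nothing about whether the module would load successfully.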
The solution involves two steps:
Find the missing system dependencies.
Recompile whatever was built without them.
So in the case of a Python installation through pyenv, this means removing the faulty installation, installing the dependencies above, and then reinstalling (that is, recompiling) Python.
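Spelled out as commands, that sequence might look like the sketch below. The version number is a placeholder, the function names are hypothetical, and only liblzma-dev is installed here for brevity (the full package list works just as well). By default the script only prints the commands; actually running them assumes that `pyenv` and `sudo` are available on the PATH.

```python
import shlex
import subprocess


def rebuild_commands(version: str) -> list[list[str]]:
    """Commands to replace a faulty pyenv build of `version` with a fresh one."""
    return [
        ["pyenv", "uninstall", "-f", version],  # -f: do not ask for confirmation
        ["sudo", "apt-get", "install", "-y", "liblzma-dev"],  # the missing piece here
        ["pyenv", "install", version],  # compiles Python from source again
    ]


def rebuild(version: str, dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it."""
    for command in rebuild_commands(version):
        print(shlex.join(command))
        if not dry_run:
            subprocess.run(command, check=True)


rebuild("3.11.4")  # placeholder version; the dry run only prints the commands
```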
As a side note, I think this is part of why Conda became quite popular: since it ships many (binary) dependencies, problems like this one are largely avoided. It should also be said that Python's ability to provide easy interfaces to libraries written mainly in C/C++ is a strength that gives it a lot of reach, and part of the reason it became so big in data science and ML.