Snake bites: Beware malicious Python libraries

Malware posing as Python libraries is routinely showing up on PyPI, Python’s official package index

Earlier this week, two Python libraries containing malicious code were removed from the Python Package Index (PyPI), Python’s official repository for third-party packages.

It’s the latest incarnation of a problem faced by many modern software development communities, raising an important question for all developers who rely on open source software: How can you make it possible for people to contribute their own code to a common repository for re-use, without those repos becoming vectors for attacks?

By and large, the official third-party package repositories for languages that are developed as open source projects, such as Python, are safe. But a malicious version of a library can spread quickly if it goes unchecked. And because most such repositories are overseen by volunteers, only so many eyes are on the lookout, and contributions don’t always get the scrutiny they need.

The two malicious packages removed from PyPI this week used a trick called “typo squatting”: choosing names that are similar enough to commonly used packages to escape notice, so that someone who mistypes or misremembers the intended name installs them by accident. Masquerading as the dateutil and jellyfish packages (used for manipulating Python datetime objects and performing approximate matches on strings, respectively), the malicious packages were named python3-dateutil and jeIlyfish. The first imitates python-dateutil, the name under which dateutil is published on PyPI; the second uses an uppercase I in place of the first lowercase L.
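To make the trick concrete, here is a minimal sketch of how a look-alike name might be flagged automatically. It is an illustration only, not a description of how PyPI or any security team screens uploads. The list of known packages, the table of confusable characters, the function names, and the similarity cutoff are all assumptions chosen for this example; python-dateutil appears in the list because that is the name under which the genuine dateutil library is published on PyPI.

```python
import difflib
import re

# Hypothetical allow-list of well-known package names. A real check would
# use a far larger list, such as the most-downloaded projects on PyPI.
KNOWN_PACKAGES = ["python-dateutil", "jellyfish", "django", "requests"]

# Characters that are easily mistaken for one another in many fonts.
# This table is illustrative, not exhaustive.
CONFUSABLES = str.maketrans({"I": "l", "1": "l", "0": "o"})


def normalize(name):
    """Lowercase the name and collapse runs of '-', '_' and '.' into '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def skeleton(name):
    """Fold look-alike characters first (case matters here), then normalize."""
    return normalize(name.translate(CONFUSABLES))


def find_lookalike(candidate, known=KNOWN_PACKAGES, cutoff=0.85):
    """Return the known package that `candidate` resembles, or None.

    A candidate whose normalized name equals a known name is treated as
    the genuine package and is not flagged.
    """
    if normalize(candidate) in {normalize(k) for k in known}:
        return None
    skeletons = {skeleton(k): k for k in known}
    cand = skeleton(candidate)
    if cand in skeletons:
        # Identical once confusable characters are folded (e.g. I vs. l).
        return skeletons[cand]
    # Otherwise fall back to a fuzzy string comparison (assumed cutoff).
    close = difflib.get_close_matches(cand, list(skeletons), n=1, cutoff=cutoff)
    return skeletons[close[0]] if close else None


print(find_lookalike("jeIlyfish"))         # jellyfish
print(find_lookalike("python3-dateutil"))  # python-dateutil
print(find_lookalike("jellyfish"))         # None
```

The two-step comparison reflects the two flavors of the trick: a swapped look-alike character is caught by folding confusable characters, while an added or dropped character is caught by the fuzzy match. A check like this will produce false positives on legitimately similar names, so it is better suited to prompting human review than to blocking an upload outright.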

When installed, python3-dateutil and jeIlyfish behaved exactly like the originals, except that they also attempted to steal personal data from the developer. Paul Ganssle, a developer on the dateutil team, told ZDNet that the likely reason for the attack was to figure out what projects the victim worked on, in order to launch later attacks on those projects.

Python libraries generally fall into two camps—the modules that make up the standard library shipped with the Python runtime, and third-party packages hosted on PyPI. Whereas the modules in the standard library are closely inspected and rigorously vetted, PyPI is far more open by design, allowing the community of Python users to freely contribute packages for re-use.

Malicious projects have been found on PyPI before. In one case, malicious packages typo squatted the Django framework, a staple of web development in Python. But the problem seems to be growing more urgent.

“As a member of the Python security team (PSRT) I’m getting reports about typo squatting or malicious packages every week,” said Christian Heimes, a core Python developer, in Python’s official development discussion forum. “(Fun fact: There were four email threads about malicious content on PyPI this month and today is just Dec 4.)”

The Python Software Foundation has plans on the table for protecting PyPI against abuse, but they will take time to fully roll out. Earlier this year, the Python team rolled out two-factor authentication as an option for PyPI users who upload packages. That provides a layer of protection for developers who upload to PyPI, making it harder to hijack their accounts and upload malware in their name. But it doesn’t address typo squatting or other abuses of the commons.

Other initiatives include looking at ways to offset those problems with automation. The working group within the Python Software Foundation that handles packaging has received a grant from Facebook Research to create more advanced PyPI security features, such as cryptographic signing of PyPI packages, and automated detection of malicious uploads (rather than labor-intensive manual screening).

Third parties offer some protection as well. ReversingLabs, an independent security firm, discovered a PyPI-based attack after conducting a scan of the entire repository for suspicious file formats. But the company admits that such scans aren’t a replacement for internal vetting. “To greatly reduce the possibility of hosting malware,” the company wrote, “such repositories would all benefit from continuous processing and a better review process.”

The best solution, as Python’s own developers are aware, must come from within.