This is a proof of concept of using `[external]` metadata (i.e., metadata for Python packages of build and runtime dependencies on non-Python packages, see PEP 725) plus a "name mapping mechanism" to build wheels from source in clean Docker containers with a plain `pip install <package-name> --no-binary <package-name>`.
The purpose of the name mapping mechanism is to translate `[external]` metadata, which uses package URLs (PURLs) plus PURL-like "virtual dependencies" for more abstract requirements like "a C++ compiler", into package names specific to each system package manager.
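A minimal sketch of what such a mapping could look like, assuming a simple in-memory table keyed by PURL or virtual dependency. The table layout, the package-manager keys and the helper `map_to_system_names` are hypothetical; the package names are real ones on Fedora, Arch Linux and conda-forge, but the project's actual mapping data may choose differently.

```python
# Hypothetical mapping table: the real project's mapping data and format may
# differ. Keys are PURLs / virtual dependencies; values map a package-manager
# name to the distro-specific package name(s).
NAME_MAPPING: dict[str, dict[str, list[str]]] = {
    "virtual:compiler/c": {
        "dnf": ["gcc"],
        "pacman": ["gcc"],
        "micromamba": ["c-compiler"],
    },
    "virtual:compiler/cpp": {
        "dnf": ["gcc-c++"],
        "pacman": ["gcc"],
        "micromamba": ["cxx-compiler"],
    },
    "pkg:generic/openblas": {
        "dnf": ["openblas-devel"],
        "pacman": ["openblas"],
        "micromamba": ["openblas"],
    },
}


def map_to_system_names(requirements: list[str], manager: str) -> list[str]:
    """Translate [external] requirements into names for one package manager.

    Raises KeyError if a requirement or package manager has no mapping.
    """
    names: list[str] = []
    for req in requirements:
        for name in NAME_MAPPING[req][manager]:
            if name not in names:  # avoid duplicates, e.g. gcc for C and C++
                names.append(name)
    return names


print(map_to_system_names(["virtual:compiler/c", "virtual:compiler/cpp"], "pacman"))
# ['gcc']
print(map_to_system_names(["virtual:compiler/cpp", "pkg:generic/openblas"], "dnf"))
# ['gcc-c++', 'openblas-devel']
```

Keeping the mapping as data rather than code means supporting another distro is a matter of adding one more key per entry.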
The name mapping mechanism is exposed through the `py-show` CLI tool. It can also show install commands specific to a given system package manager, which is potentially useful for end users.
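Once names are mapped, producing an install command is mostly string assembly. The sketch below assumes hypothetical per-manager command templates using each tool's standard non-interactive flags (`-y` for `dnf` and `micromamba`, `--noconfirm` for `pacman`); it is not `py-show`'s actual output format.

```python
import shlex

# Hypothetical command templates; py-show's real output may differ.
INSTALL_COMMANDS: dict[str, list[str]] = {
    "dnf": ["dnf", "install", "-y"],
    "pacman": ["pacman", "-S", "--noconfirm"],
    "micromamba": ["micromamba", "install", "-y", "-c", "conda-forge"],
}


def install_command(manager: str, packages: list[str]) -> str:
    """Build a shell-ready install command for already-mapped package names."""
    argv = INSTALL_COMMANDS[manager] + packages  # new list, template untouched
    return shlex.join(argv)  # quotes any argument that needs it


print(install_command("dnf", ["gcc-c++", "openblas-devel"]))
```

For example, the call above yields `dnf install -y gcc-c++ openblas-devel`. Using `shlex.join` keeps the result safe to paste into a shell even if a package name contained unusual characters.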
Note: all of this is currently experimental, and under the hood doesn't look anything like a production-ready version would. Please don't use this for anything beyond experimenting.
The scripts, CI setup and results in the repo basically do the following:

1. Determine which of the top 150 most downloaded packages (current monthly downloads, data from hugovk/top-pypi-packages) have platform-specific wheels on PyPI.
2. For each such package, determine its external dependencies and write those into a `package_name.toml` file.
3. In a matrix of CI jobs, build each package separately from source in a clean Docker container, with the external dependencies being installed with a "system" package manager. This is currently done for three package managers and distros: `dnf` (Fedora), `pacman` (Arch Linux), and `micromamba` (conda-forge). The CI jobs do roughly the following:
   - Install `python` with the system package manager.
   - Add `[external]` metadata at the end of `pyproject.toml` in the package's sdist (for packages without a `pyproject.toml`, inject a basic one with `setuptools.build_meta` as the build backend).
   - Use the `py-show` tool to read the `[external]` metadata and generate an install command for the system package manager, then run it.
   - Build the amended sdist with a plain `pip install amended_sdist.tar.gz` (no custom options).
   - Run an `import pkg_import_name` check.
4. Analyze the results: successful package builds yes/no, duration, dependencies used.
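Step 1 reduces to inspecting wheel filenames. A sketch, assuming the filenames for each project have already been collected (for example from the `filename` fields in PyPI's JSON API); the helper names are hypothetical. It relies on the wheel filename convention, where the platform tag is the last dash-separated field and `any` marks a pure-Python wheel.

```python
def is_platform_specific(wheel_filename: str) -> bool:
    """Return True if a wheel filename has a platform tag other than 'any'.

    Wheel filenames follow {name}-{version}[-{build}]-{python}-{abi}-{platform}.whl
    and dashes inside the name/version are escaped to underscores, so the
    platform tag is the last dash-separated field. It may be a compressed
    tag set joined with dots (e.g. two manylinux tags).
    """
    if not wheel_filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {wheel_filename}")
    stem = wheel_filename[: -len(".whl")]
    platform_tags = stem.split("-")[-1].split(".")
    return any(tag != "any" for tag in platform_tags)


def has_platform_wheels(wheel_filenames: list[str]) -> bool:
    """A project qualifies if at least one of its wheels is platform-specific."""
    return any(is_platform_specific(name) for name in wheel_filenames)


# Example filenames (illustrative):
print(is_platform_specific(
    "numpy-1.26.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"))
print(is_platform_specific("requests-2.31.0-py3-none-any.whl"))
```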
These are the main results as of 19 Oct 2023.
Overall number of successful builds per distro:
distro | successful builds |
---|---|
Arch | 35/37 |
Fedora | 33/37 |
conda-forge | 33/37 |
Average CI job duration per package for the heaviest builds:
package | duration |
---|---|
scipy | 13m 39s |
scikit-learn | 13m 5s |
grpcio-tools | 9m 17s |
pandas | 7m 55s |
pyarrow | 5m 44s |
numpy | 5m 16s |
pynacl | 4m 20s |
pydantic-core | 4m 6s |
matplotlib | 3m 41s |
cryptography | 2m 26s |
pillow | 1m 56s |
sqlalchemy | 1m 41s |
Per-package build results (✔️ = success, ❌ = failure):
package | Fedora | Arch | conda-forge |
---|---|---|---|
charset-normalizer | ✔️ | ✔️ | ✔️ |
cryptography | ✔️ | ✔️ | ✔️ |
pyyaml | ✔️ | ✔️ | ✔️ |
numpy | ✔️ | ✔️ | ✔️ |
protobuf | ✔️ | ✔️ | ✔️ |
pandas | ✔️ | ✔️ | ✔️ |
markupsafe | ✔️ | ✔️ | ✔️ |
cffi | ✔️ | ✔️ | ✔️ |
psutil | ✔️ | ✔️ | ✔️ |
lxml | ❌ | ❌ | ❌ |
sqlalchemy | ✔️ | ✔️ | ✔️ |
aiohttp | ❌ | ✔️ | ❌ |
grpcio | ❌ | ❌ | ❌ |
pyarrow | ✔️ | ✔️ | ✔️ |
wrapt | ✔️ | ✔️ | ✔️ |
frozenlist | ✔️ | ✔️ | ✔️ |
coverage | ✔️ | ✔️ | ✔️ |
pillow | ✔️ | ✔️ | ✔️ |
greenlet | ✔️ | ✔️ | ✔️ |
yarl | ✔️ | ✔️ | ✔️ |
multidict | ✔️ | ✔️ | ✔️ |
scipy | ❌ | ✔️ | ✔️ |
httptools | ✔️ | ✔️ | ✔️ |
pynacl | ✔️ | ✔️ | ✔️ |
psycopg2-binary | ✔️ | ✔️ | ✔️ |
rpds-py | ✔️ | ✔️ | ✔️ |
bcrypt | ✔️ | ✔️ | ✔️ |
scikit-learn | ✔️ | ✔️ | ✔️ |
msgpack | ✔️ | ✔️ | ✔️ |
matplotlib | ✔️ | ✔️ | ❌ |
regex | ✔️ | ✔️ | ✔️ |
kiwisolver | ✔️ | ✔️ | ✔️ |
pydantic-core | ✔️ | ✔️ | ✔️ |
pyrsistent | ✔️ | ✔️ | ✔️ |
grpcio-tools | ✔️ | ✔️ | ✔️ |
pycryptodomex | ✔️ | ✔️ | ✔️ |
google-crc32c | ✔️ | ✔️ | ✔️ |
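As a consistency check, the per-distro totals in the summary table can be re-derived from this per-package table. The sketch transcribes only the ❌ cells and treats every other cell as a success; the names `FAILURES` and `success_counts` are just for illustration.

```python
# Failed builds transcribed from the per-package table above; every other
# cell in that table is a success.
FAILURES: dict[str, set[str]] = {
    "lxml": {"Fedora", "Arch", "conda-forge"},
    "aiohttp": {"Fedora", "conda-forge"},
    "grpcio": {"Fedora", "Arch", "conda-forge"},
    "scipy": {"Fedora"},
    "matplotlib": {"conda-forge"},
}
TOTAL_PACKAGES = 37  # number of rows in the per-package table
DISTROS = ("Arch", "Fedora", "conda-forge")


def success_counts() -> dict[str, int]:
    """Number of successful builds per distro, derived from FAILURES."""
    return {
        distro: TOTAL_PACKAGES - sum(distro in failed for failed in FAILURES.values())
        for distro in DISTROS
    }


# Packages that failed on all three distros
failed_everywhere = sorted(
    pkg for pkg, failed in FAILURES.items() if failed == set(DISTROS)
)

for distro, ok in success_counts().items():
    print(f"{distro} | {ok}/{TOTAL_PACKAGES} |")
print(failed_everywhere)
```

This reproduces the 35/37 (Arch), 33/37 (Fedora) and 33/37 (conda-forge) figures, and shows that lxml and grpcio are the only packages that failed everywhere.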