APACHE-2.0 License
Experiments for evaluating the compilability of a large corpus of programs in various compilation environments.
Tuscan uses a fork of the libEAR project as a submodule. You will thus
need to pass the --recursive switch to the git clone command when
cloning Tuscan, or else run git submodule init; git submodule update
after you have cloned it.
On the first run, Tuscan will attempt to download a copy of every
official Arch Linux binary package, as well as their corresponding
sources. These files are placed in the mirror and sources directories,
respectively. During the first run, Tuscan will also create and commit
an Arch Linux Docker image called tuscan_base_image, whose software
databases are in sync with the downloaded sources and binaries. Do not
remove this image from your system; if you do, you will need to
re-create it and also re-download up-to-date versions of the sources and
binaries.
Occasionally, downloads of a source or binary may fail; when this
happens for a particular package, a $PACKAGE_NAME.log file will be left
in the sources or mirror directory, containing information on the
failure.
For binaries, you may wish to download the binary yourself and place
it in the mirror directory. An archive of all Arch Linux binaries is
hosted at the Arch Archive; make sure that the version number that
you're downloading matches the version described in the .log file.
A failed download of a source file usually indicates that the upstream source is broken. There is not much that can be done about this locally: Tuscan will fail to build that package and any packages that depend on it. It may be appropriate to file a bug about the broken source on the Arch Linux bug tracker.
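To see which downloads failed, the .log files described above can be listed with a short script. The following is a minimal Python 3 sketch and is not part of Tuscan; the function name failed_downloads is hypothetical, and it assumes that each failure leaves a file named $PACKAGE_NAME.log directly inside the sources or mirror directory.

```python
from pathlib import Path


def failed_downloads(directory):
    """Return the package names that have a .log file in `directory`."""
    # Assumption: each failed download leaves one file named
    # $PACKAGE_NAME.log directly inside the directory.
    return sorted(path.stem for path in Path(directory).glob("*.log"))


if __name__ == "__main__":
    # Directory names are taken from the text above; run this from the
    # Tuscan checkout so that the relative paths resolve.
    for directory in ("sources", "mirror"):
        if Path(directory).is_dir():
            for name in failed_downloads(directory):
                print(f"{directory}: {name}")
```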
Binaries are downloaded from an Arch Linux mirror. If you don't live in
the United States, you may wish to replace "United States" in the
stages/create_base_image/main.sh script with the name of your own
country. The invocation of reflector in that script finds the fastest
and most up-to-date Arch mirrors from the specified country.
Generating data for a toolchain:
./tuscan.py build TOOLCHAIN
Each toolchain is named after a subdirectory of toolchains/. Currently,
the only toolchain is vanilla, which builds Arch Linux packages using
the default compiler and standard libraries.
Post-processing the data from a build:
./tuscan.py post
The resulting JSON files are dumped in the output/post/TOOLCHAIN
directory, one file per package. The schema for the resulting JSON file
is described by the post_processed_schema structure in
tuscan/schemata.py.
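The per-package files can be loaded for ad-hoc analysis with a few lines of Python 3. This is a sketch, not Tuscan code: the function name load_post_processed is hypothetical, and it assumes that the files carry a .json extension and that the file stem identifies the package. It makes no assumption about the fields inside each file; consult post_processed_schema for those.

```python
import json
from pathlib import Path


def load_post_processed(toolchain, root="output/post"):
    """Map each JSON file's stem to its parsed contents."""
    results = {}
    # Assumption: one *.json file per package in output/post/TOOLCHAIN.
    for path in sorted(Path(root, toolchain).glob("*.json")):
        with path.open() as handle:
            results[path.stem] = json.load(handle)
    return results


if __name__ == "__main__":
    data = load_post_processed("vanilla")
    print(f"loaded {len(data)} post-processed files")
```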
Generating an HTML report from post-processed data:
./tuscan.py html
The resulting HTML pages are dumped in the output/html/TOOLCHAIN
directory, one page per package.
Generating figures from post-processed data:
./tuscan.py figures
The resulting figures are dumped in the output/figures directory.
Tuscan is structured as a set of stages under the stages/ directory.
These are scripts that are run inside Arch Linux containers, performing
tasks on Arch Linux packages. These tasks include dependency resolution
and setting up a build environment for the chosen toolchain.
The final outcome is that every package in the official Arch Linux repository is built with a particular toolchain, in reverse dependency order. Packages that have been built successfully are added to a local package repository, so that packages that depend on them can install them later.
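Building in reverse dependency order means that a package is built only after everything it depends on. In Tuscan this ordering is left to ninja; the sketch below only illustrates the idea with a plain topological sort (Kahn's algorithm) in Python 3. The function name build_order and the example package names are hypothetical.

```python
from collections import deque


def build_order(depends):
    """Order packages so that each comes after everything it depends on.

    `depends` maps a package name to the names it depends on.
    """
    packages = set(depends)
    for deps in depends.values():
        packages.update(deps)

    # Dependencies still to be built for each package.
    remaining = {pkg: set(depends.get(pkg, ())) for pkg in packages}
    # Inverse mapping: which packages are waiting on a given package.
    dependents = {pkg: set() for pkg in packages}
    for pkg, deps in remaining.items():
        for dep in deps:
            dependents[dep].add(pkg)

    ready = deque(sorted(pkg for pkg, deps in remaining.items() if not deps))
    order = []
    while ready:
        pkg = ready.popleft()
        order.append(pkg)
        for user in sorted(dependents[pkg]):
            remaining[user].discard(pkg)
            if not remaining[user]:
                ready.append(user)

    if len(order) != len(packages):
        raise ValueError("dependency cycle detected")
    return order


if __name__ == "__main__":
    example = {"app": ["libfoo", "libbar"], "libfoo": ["libbar"], "libbar": []}
    print(build_order(example))
```

A cycle leaves some packages permanently waiting, which is why the sketch raises an error in that case instead of returning a partial order.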
The stages write a makepkg.ninja file that describes dependency
relationships between Arch Linux packages.
Some stages depend on other stages having been run before they
themselves are run. Some stages also depend on data-only containers
having been created. For each stage, these dependencies are described
in the file stages/$STAGE_NAME/deps.yaml.
Software:
docker (you should add your user to the 'docker' group)
gnuplot
Tuscan also requires several Python packages. These can be installed
with pip. Tuscan code that runs on your host machine is written in
Python 2, so if your operating system uses Python 3 by default you may
need to install these packages with pip2.
ninja
PyYAML
Jinja2
docker-py
Circular dependencies
If the ninja build for package foo fails because the bar.pkg.tar.xz
file could not be found, this is likely due to a circular dependency
(foo makedepends on, and is makedepended on by, bar). This can be
confirmed by reading the PKGBUILDs for foo and bar: search for these
packages on www.archlinux.org/packages, and then browse to the PKGBUILD
by clicking on the Source Files link.
This issue can be fixed by adding foo and bar to the
circular_dependency_breakers array in get_base_package_names.
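If the makedepends of each package are available as a mapping, pairs of packages that makedepend on each other can be found mechanically. The Python 3 sketch below is an illustration only; the function name mutual_makedepends and the input format are assumptions, and it detects only direct two-package cycles like the foo and bar case above, not longer ones.

```python
def mutual_makedepends(makedepends):
    """Return sorted (a, b) pairs, a < b, that makedepend on each other.

    `makedepends` maps a package name to the names it makedepends on.
    """
    pairs = set()
    for pkg, deps in makedepends.items():
        for dep in deps:
            # A pair is circular when the dependency points back at us.
            if dep != pkg and pkg in makedepends.get(dep, ()):
                pairs.add(tuple(sorted((pkg, dep))))
    return sorted(pairs)


if __name__ == "__main__":
    example = {"foo": ["bar", "gcc"], "bar": ["foo"], "gcc": []}
    print(mutual_makedepends(example))
```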
Provider packages
Sometimes foo will depend on bar, but deps_to_ninja will not find any
package called bar in the Arch Build System (ABS). More precisely,
there will be no PKGBUILD in the ABS such that the PKGBUILD contains
bar in its pkgname array, which would seem to indicate that package bar
does not get built by any PKGBUILD.
In fact, this could be because bar is a virtual package, that is, a
package provided by another package baz. An example is sh: many
packages depend on sh, but in practice this package is provided by
bash. More precisely, the PKGBUILD for bash will contain
provides=(... "sh" ...) somewhere in the PKGBUILD.
The current solution to this is to add "sh" : "bash" to the provides
hash in provides.py whenever this causes a problem. These instances are
hard to detect automatically because they require parsing the PKGBUILD
(a bash script), since the provides array might be hidden inside a
package_ function.
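The lookup that such a provides hash enables can be sketched as follows. This is a Python 3 illustration under stated assumptions, not the code in provides.py: the function name resolve_dependency and the built_packages set are hypothetical, and the only detail taken from the text is that the hash maps a provided name such as "sh" to its provider "bash".

```python
def resolve_dependency(name, built_packages, provides):
    """Return the package that satisfies `name`, or None if unknown."""
    # A real package with that name takes precedence.
    if name in built_packages:
        return name
    # Otherwise fall back to the hand-maintained provides mapping.
    provider = provides.get(name)
    if provider in built_packages:
        return provider
    return None


if __name__ == "__main__":
    provides = {"sh": "bash"}
    built = {"bash", "gcc"}
    for dep in ("sh", "gcc", "bar"):
        print(dep, "->", resolve_dependency(dep, built, provides))
```

A None result corresponds to the situation described above, where a new entry would have to be added to the hash by hand.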