Journey into gentoo eclass
I spent some days writing a portage eclass for gentoo. I want to share my experience.
Why
The rust build experience is quite complete and powerful, particularly thanks to cargo and rustup. While cargo is a well integrated project and easy to script rustup is not the best choice, mainly because is not integrated in a package/build system. Portage is wonderful project and is a pity to keep rust out of it!
Rustup has some features that I want to have into portage:
- toolchain management
- component
- directory sugar
The toolchain support is already in tree with multilib and multitarget support. The user can easily handle that with a rust-eselect module that sets the default toolchain and some tools (rustc, rust-gdb, rust-doc, rust-lldb).
gentoo-test ~ # eselect rust list
Available Rust versions:
[1] rust-1.26.2
[2] rust-1.27.1 *
[3] rust-9999
Usually rust projects are statically compiled, this is very handy when it
stands to dependencies management, but some projects have differente needs.
For example rustfmt, clippy, bindgen, rls and some others tools, usually
distributed as components in rustup, depend on libsyntax
.
I dedicated some time to look for a good solution to integrate this pattern in gentoo, this brought me to an interesting journey in gentoo build ecosystem.
Introduction to portage
Portage is a very powerful ports collection system that thanks to its long time history can handle quite every corner needs. In this context the requirements are quite straightforward and easily paired with portage features.
feature | gentoo |
---|---|
multiple package version | slotting |
language-specific behaviour | eclass |
consistent configuration system | eselect |
Setup test environment
Portage is very resilient to complex configuration and is very common also as user to use the layman tool to handle external overlays. Overlays works intuitively one on each other and portage can looks for a package in every overlay. But because I have to work on tree and I want to keep my env as clean as possible I decided to maintain my own fork in a VM to work with. This is not necessary and quite useless for package development but this project have to touch a lot of stuff like profile and eclass.
First I have to configure my fork:
gentoo-test ~ # echo "[DEFAULT]
main-repo = gentoo
[gentoo]
location = /usr/portage
sync-type = git
sync-uri = https://github.com/gibix/gentoo.git
auto-sync = true
" > /etc/portage/repos.conf/gentoo.conf
gentoo-test ~ # emerge --sync
This will take a bit because portage has to rebuild all its cache but now the environment can be easily keep in sync with the public repository or also with rsync on the laptop.
First draft
Let’s start to code from the eclass, looking at existing projects I ended up watching the python eclass and is a rich source of inspiration.
What is the core workflow of the eclass:
- keep track of the available rust implementation
- look to supported implementation in the cargo ebuild
- export some tool for ebuild to execute a work on multiple target
Looking at the requirements the eclass need at least an external variable that will be filled by the eclass consumer.
# @ECLASS-VARIABLE: RUST_COMPAT
# @DESCRIPTION:
# This variable contains a list of Rust implementations the package
# supports. It must be set before the `inherit' call. It has to be
# an array.
#
# Example:
# @CODE
# RUST_COMPAT=( rust1_26, rust1_27 )
# @CODE
#
The portage syntax is very clear and minimal, but why I need to setup the
variable before the inherit
function call in the consumer ebuild? This is
because the portage syntax is nothing more than bash with some sugar. Bash
like the most of the scripting languages is a top-down interpreter that
expands functions while are encountered. In particular the inherit function
used in portage calls source
on each eclass passed as argument. Because
of that every consumer-defined component that needs to be used inside the
eclass must be configured before.
Now the eclass need some way to have a list of the targets that have to be
used. But how the targets will be defined by the user, we need also a way
to setup global defaults for that. I don’t want to specify a stable target
for every rust ebuild, could be quite crazy to maintain! Fortunately
portage have some sugar for that! USE_EXPAND
are some particular use
flags that have a name that can be used as reference for global
configuration or as prefix in local flag. What does it means? Let’s have a
look.
First the USE_EXPAND
has to be defined in the profile used by portage,
the best place for a generic one like rust is in base/make.conf
. Now I
can easily add a line to my make.conf
for test a global configuration:
RUST_TARGETS="rust1_27"
But how is viewed by our eclass? Is an array of elements like
rust_targets_rust1_27
. Thats good.
Let’s define a variable to keep track of available implementations and configure it.
# @FUNCTION: _rust_set_impls
# @INTERNAL
# @DESCRIPTION:
# Check RUST_COMPAT for well-formedness and validity, if RUST_COMPAT
# is not used looks to _RUST_ALL_IMPLS then set two global variables:
#
# - _RUST_SUPPORTED_IMPLS containing valid implementations supported
# by the ebuild (RUST_COMPAT),
#
# - and _RUST_UNSUPPORTED_IMPLS containing valid implementations that
# are not supported by the ebuild.
#
_rust_set_impls() {
local i supp=() unsupp=()
if ! declare -p RUST_COMPAT &>/dev/null; then
for i in "${_RUST_ALL_IMPLS[@]}"; do
supp+=( "${i}" )
done
else
if [[ $(declare -p RUST_COMPAT) != "declare -a"* ]]; then
die 'RUST_COMPAT must be an array.'
fi
for i in "${RUST_COMPAT[@]}"; do
# trigger validity checks
_rust_impl_supported "${i}"
done
for i in "${_RUST_ALL_IMPLS[@]}"; do
if has "${i}" "${RUST_COMPAT[@]}"; then
supp+=( "${i}" )
else
unsupp+=( "${i}" )
fi
done
fi
[...]
}
# @ECLASS-VARIABLE: _RUST_ALL_IMPLS
# @INTERNAL
_RUST_ALL_IMPLS=(
rust1_25
rust1_26
rust1_27
)
readonly _RUST_ALL_IMPLS
# @FUNCTION: _rust_impl_supported
# @USAGE: <impl>
# @INTERNAL
_rust_impl_supported() {
local impl=${1}
case "${impl}" in
"${_RUST_ALL_IMPLS[@]}")
return 0
;;
esac
return 1
}
We reached a good point, all the major functions to support multitarget are defined, but all this functions are never invoked! So…
_rust_set_globals() {
local deps
_rust_set_impls
local requse=""
[ ${#flags[@]} -gt 0 ] && requse="|| ( ${flags[*]} )"
local flags=( "${_RUST_SUPPORTED_IMPLS[@]/#/rust_targets_}" )
REQUIRED_USE=${requse}
IUSE=${flags[*]}
}
_rust_set_globals
unset -f _rust_set_globals
The targets is not yet used, but could be at least recognized.
gentoo-test ~ # emerge -p rustfmt
These are the packages that would be merged, in order:
Calculating dependencies... done!
[ebuild N ~] dev-util/rustfmt-0.10.0 USE="-debug -fetch-crates" RUST_TARGETS="-rust1_27 -rust1_26"
gentoo-test ~ # RUST_TARGETS="rust1_27 rust1_26" emerge -p rustfmt
These are the packages that would be merged, in order:
Calculating dependencies... done!
[ebuild N ~] dev-util/rustfmt-0.10.0 USE="-debug -fetch-crates" RUST_TARGETS="rust1_26 rust1_27"
Good! Portage is working as expected!
Add dependency
The situation now is quite defined, the rust eclass can handle the targets
and is quite easy to maintain, the only absolute definition of the target
is in _RUST_ALL_IMPLS
. But could be quite a pain to maintain all this
dependency in the ebuild, could be better to have the eclass to keep track
of all this stuff for the ebuild.
Again, what are the needs:
- an ebuild that inherit from the rust eclass should automatically resolve the toolchain dependencies based on the rust targets.
- an easy way to retrive the rust ebuild in tree from a rust target
First let’s look how dependencies are defined in portage. For a complete reference look at the complete gentoo devmanual.
In the current context the dependencies are binded to a use flag. Portage makes available a simple interface to define USE-conditional dependencies:
(!)USEFLAG? ( DEPS )
in our case we want something like:
rust_targets_rust1_27? (=virtual/rust-1.27.2) ...
For having this work is necessary to generate the deps:
rust_package_dep() {
case ${1} in
rust1_25)
echo "=virtual/rust-1.25*"
;;
rust1_26)
echo "=virtual/rust-1.26*"
;;
rust1_27)
echo "=virtual/rust-1.27*"
;;
*)
die "Invalid implementation: ${impl}"
esac
}
Now is just necessary to add some lines in the set_globals
function:
_rust_set_globals() {
local deps
[...]
for i in "${_RUST_SUPPORTED_IMPLS[@]}"; do
deps+="rust_targets_${i}? ( $(rust_package_dep ${i}) ) "
done
[...]
RDEPEND=${deps}
[...]
}
Is time to test if the implementation is working correctly. Is simply necessary to append rust as inherited eclass in a test ebuild. In my case rustfmt.
gentoo-test ~ # RUST_TARGETS="rust1_27 rust1_26" emerge -p rustfmt
These are the packages that would be merged, in order:
Calculating dependencies... done!
[ebuild N ~] virtual/rust-1.27.1-r1
[ebuild N ~] virtual/rust-1.26.2-r1
[ebuild N ~] dev-util/rustfmt-0.10.0 USE="-debug -fetch-crates" RUST_TARGETS="rust1_26 rust1_27"
It’s time to build
All the tools are available for enforce multitarget build of a rust project.
Some breadcrumbs:
- rely on cargo as high level eclass
- cargo should use some high-level function to trigger the build for every defined target
- cargo must have two different behaviour: one in case of single-target (more common) and one for multitarget
Gentoo has a very nice eclass for this! Multibuild.
Let’s write this part with a top-down approach, starting from the cargo. In order to have multibuild work properly is necessary to setup variants and some environment variables to run the build with the correct toolchain. This feature will be privided by the cargo eclass, just assume that is already available.
_cargo_run_foreach_impl() {
MULTIBUILD_VARIANTS=${RUST_TARGETS[@]}
rust_build_foreach_variant "${@}"
}
Whit this change is necessary to adapt all the other cargo functions to use
the brand new foreach_impl()
.
cargo_src_install() {
debug-print-function ${FUNCNAME} "$@"
_cargo_run_foreach_impl cargo_install
}
Ok, now all the cargo step in the pipeline are executed within multibuild.
It’s time to come back to our loved rust eclass and write
rust_build_foreach_variant
.
rust_build_foreach_variant() {
debug-print-function ${FUNCNAME} "${@}"
local MULTIBUILD_VARIANTS
_rust_obtain_impls
multibuild_foreach_variant _rust_multibuild_wrapper "${@}"
}
_rust_multibuild_wrapper() {
rust_export ${MULTIBUILD_VARIANT}
"${@}"
}
Multibuild runs a command for each variant in MULTIBUILD_VARAIANTS
, we are
lucky because we already have SUPPORTED_IMPLS
filled with the proper
information, but not all the implementations in there must be triggered. Is
necessary to check witch of them are triggered by the configuration.
_rust_obtain_impls() {
MULTIBUILD_VARIANTS=()
local impl
for impl in "${_RUST_SUPPORTED_IMPLS[@]}"; do
use "rust_targets_${impl}" && MULTIBUILD_VARIANTS+=( "${impl}" )
done
}
The only missing part now is export
function that have to setup RUSTC
environment variable. This is read by cargo for use a specific compiler.
rust_export() {
local impl
case "${1}" in
rust*)
impl=${1/rust/rustc-}
impl=${impl/_/.}
shift
;;
*)
die "rust export called without a valid rust implementation"
;;
esac
}
All done? Not exactly, thinking a bit about the workflow is clear that there is a big incomplete configuration. If a project is build with more than a rust implementation all the binaries will be installed in the same path, with the same name!
Here there are multiple possible solutions from my point of view.
Have all the binaries installed normally in /usr/bin
with a postfix, for
example my-binary-rust-1.27
and a my-binary
symlink to it. But this
could stand to a quit messy situation. Why not have a bin
directory in
/usr/lib/rust_${version}
? Let’s try this solution.
Because ${version} has to be configured for each implementation we know were to touch:
rust_export() {
[...]
if [ ${#MULTIBUILD_VARIANTS[*]} -gt 1 ]; then
export RUST_ROOT="/usr/$(get_libdir)/$(ls /usr/$(get_libdir) | grep ${impl/rustc/rust})"
else
export RUST_ROOT="/usr"
fi
}
and finally.
cargo_install() {
cargo install -j $(makeopts_jobs) --root="${D}${RUST_ROOT}" \
$(usex debug --debug "") \
|| die "cargo install failed"
rm -f "${D}${RUST_ROOT}.crates.toml"
}
Enforce the work done
Usually if a project is build with multiple target is because has runtime
linking dependency to rustc libraries like libsyntax
so is important that
if my-project-binary
is called in that moment there is a proper library
path configured. This is done correctly via eselect-rust
module.
gentoo-test ~ # grep LDPATH /etc/env.d/50rust-*
/etc/env.d/50rust-1.26.2:LDPATH="/usr/lib64/rust-1.26.2"
/etc/env.d/50rust-1.27.1:LDPATH="/usr/lib64/rust-1.27.1"
/etc/env.d/50rust-9999:LDPATH="/usr/lib64/rust-9999"
Or can be overridden at runtime with LD_LIBRARY_PATH
:
LD_LIBRARY_PATH=/usr/lib64/rust-1.26.2 rustfmt
Furthermore the same system is used by eselect-rust
to handle rust
versions, can we use the same system without reinventing the wheel?
A clear solution is to have a *binaries-${target}
file in
/etc/env.d/rust
for every project that have multiple variants compiled.
The eselect-rust
patch is quite simple and I will not describe the code
in details. Can be found here.
But let’s look at the last fix that we have to bring to the cargo.eclass
.
cargo_install() {
[...]
if [ ${#MULTIBUILD_VARIANTS[*]} -gt 1 ]; then
env_file="${PN}-binaries-$(basename ${RUST_ROOT})"
for binary in ${D}${RUST_ROOT}/bin/*; do
echo /usr/bin/$(basename $binary) >> "${T}/${env_file}"
done
dodir /etc/env.d/rust
insinto /etc/env.d/rust
doins "${T}/${env_file}"
fi
}
Conclusion
This work has been realized as part of my Google summer of code project, required some time to investigate also alternative solutions that I decided to drop during the time.
The code can find at this pull request.