autopkgtest support in Debian: a more optimistic view

Yesterday I posted about the history, in numbers, of the support for autopkgtest in the Debian archive. I had analyzed the presence of a Testsuite: field in source packages, from wheezy to trixie, and noticed a slowdown in the growth rate of autopkgtest support, in proportional terms. In each new release, the percentage of packages declaring a test suite grew less than in the previous release, for the last 4 releases.

A night of sleep and a rainy morning later, I come back with a more optimistic view, and present to you the following data, expanded from the raw data:

Release year  Release   With tests  Without tests  Total  Δ With tests  Δ Without tests  Δ Total
2013          wheezy             5          17170  17175            --               --       --
2015          jessie          1112          19484  20596          1107             2314     3421
2017          stretch         5110          19735  24845          3998              251     4249
2019          buster          9966          18535  28501          4856            -1200     3656
2021          bullseye       13949          16994  30943          3983            -1541     2442
2023          bookworm       17868          16473  34341          3919             -521     3398
2025          trixie         21527          16143  37670          3659             -330     3329
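The Δ columns are just differences between consecutive rows. As a minimal sketch (the function name `deltas` and the whitespace-separated input format are assumptions made for this illustration, not part of the original analysis), they can be recomputed with awk:

```sh
#!/bin/sh
set -eu

# deltas: read "release with_tests without_tests" lines on stdin and print,
# for every release after the first, the change in each count and in the
# total since the previous line.
deltas() {
  awk '
    NR > 1 { print $1, $2 - with, $3 - without, ($2 + $3) - (with + without) }
    { with = $2; without = $3 }
  '
}

deltas <<EOF
wheezy 5 17170
jessie 1112 19484
stretch 5110 19735
EOF
```

With the three rows above, this prints `jessie 1107 2314 3421` and `stretch 3998 251 4249`, matching the table.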

A few observations:

  • Since stretch, we have been consistently adding autopkgtest support to close to 4,000 packages on each release, on average.
  • Since buster, the number of packages without autopkgtest support has been decreasing on each release, by between 330 and 1,541 packages.
  • On average, each release has 3,400 packages more than the previous, while also bringing 4,000 extra packages with autopkgtest support. I have the following hypotheses for this:
    1. a large part of new packages are added already with autopkgtests;
    2. a smaller but reasonably large number of existing packages get autopkgtests added on each release.

All in all, I think this data shows that Debian maintainers recognize the usefulness of automated testing and are engaged in improving our QA process.

Past halfway there: history of autopkgtest support in Debian

The release of Debian 13 ("Trixie") last month marked another milestone in the effort to provide automated test support for Debian packages in their installed form. We have reached the mark of 57% of the source packages in the archive declaring support for autopkgtest.

Release   Packages with tests  Total number of packages  % of packages with tests
wheezy                      5                     17175                        0%
jessie                   1112                     20596                        5%
stretch                  5110                     24845                       20%
buster                   9966                     28501                       34%
bullseye                13949                     30943                       45%
bookworm                17868                     34341                       52%
trixie                  21527                     37670                       57%

The code that generated this table is provided at the bottom.

The growth rate has been consistently decreasing at each release after stretch. That probably means that the low-hanging fruit -- adding support en masse for large numbers of similar packages, such as team-maintained packages for a given programming language -- has been picked, and from now on the work gets slightly harder. Perhaps there is a significant long tail of packages that will never get autopkgtest support.
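To make the slowdown concrete, here is a small sketch that derives the growth in percentage points from the table above. The function name `growth` and the input format are assumptions of this example. Percentages are truncated, like in the table.

```sh
#!/bin/sh
set -eu

# growth: read "release with_tests total" lines on stdin; for each release
# after the first, print the truncated percentage of packages with tests and
# the change, in percentage points, since the previous release.
growth() {
  awk '
    { pct = int(100 * $2 / $3) }
    NR > 1 { printf "%s %d%% %+d\n", $1, pct, pct - prev }
    { prev = pct }
  '
}

growth <<EOF
wheezy 5 17175
jessie 1112 20596
stretch 5110 24845
buster 9966 28501
bullseye 13949 30943
bookworm 17868 34341
trixie 21527 37670
EOF
```

For these numbers the changes are +5, +15, +14, +11, +7 and +5 percentage points, from jessie to trixie.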

Looking for common prefixes among the packages missing a Testsuite: field gives us the largest groups of packages missing autopkgtest support:

$ grep-dctrl -v -F Testsuite --regex -s Package -n . trixie | cut -d - -f 1 | uniq -c | sort -n | tail -20
     50 apertium
     50 kodi
     51 lomiri
     53 maven
     55 libjs
     57 globus
     66 cl
     67 pd
     72 lua
     79 php
     88 puppet
     91 r
    111 gnome
    124 ruby
    140 ocaml
    152 rust
    178 golang
    341 fonts
    557 python
   1072 haskell

There seems to be a fair amount of Haskell and Python. If someone could figure out a way of testing installed fonts in a meaningful way, this would be a good niche where we could cover 300+ packages.

There is another analysis that could be made, which I didn't do: what percentage of the new packages introduced in a given release declare autopkgtest support? My data only counts the totals, so we start with the technical debt of almost all of the 17,000 packages with no tests in wheezy, which was the stable release at the time I started Debian CI. How many of those got tests since then?
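A rough sketch of how that analysis could start, assuming the Sources files for each release are available locally under the release name. The helper names `packages`, `tested` and `count_new` are made up for this illustration. The first two need grep-dctrl, and `count_new` only needs sorted lists of names.

```sh
#!/bin/sh
set -eu

# All source package names in a Sources file, sorted for comm(1).
packages() {
  grep-dctrl -n -s Package -F Package --regex . "$1" | LC_ALL=C sort -u
}

# Names of the source packages that declare a Testsuite: field.
tested() {
  grep-dctrl -n -s Package -F Testsuite --regex . "$1" | LC_ALL=C sort -u
}

# count_new PREVIOUS_ALL CURRENT_ALL CURRENT_TESTED
# Each argument is a file with one sorted package name per line.
# Prints "<new packages> <new packages with tests>".
count_new() {
  new_list="$(mktemp)"
  LC_ALL=C comm -13 "$1" "$2" > "${new_list}"   # only in the current release
  new="$(wc -l < "${new_list}")"
  new_tested="$(LC_ALL=C comm -12 "${new_list}" "$3" | wc -l)"
  rm -f "${new_list}"
  echo "$((new)) $((new_tested))"
}

# Possible usage:
#   packages bookworm > bookworm.all
#   packages trixie > trixie.all
#   tested trixie > trixie.tested
#   count_new bookworm.all trixie.all trixie.tested
```

This only looks at packages that are new in a release. Answering the last question (how many of the wheezy packages without tests gained them later) would need a similar comparison between the untested names in wheezy and the tested names in a later release.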

Note that not supporting autopkgtest does not mean that a package is not tested at all: it can run build-time tests, which are also useful. Not supporting autopkgtest, though, means that its binaries in the archive can't be automatically tested in their installed form. But then there is an entire horde of volunteers running testing and unstable on a daily basis who test Debian and report bugs.

This is the script that produced the table at the beginning of this post:

#!/bin/sh

set -eu

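# Download (if needed) the Sources index for a release and print one table row.
extract() {
  local release
  local url
  release="$1"
  url="$2"

  # Fetch and unpack the index only when there is no local copy yet.
  if [ ! -f "${release}" ]; then
    rm -f "${release}.gz"
    curl --silent --fail -o "${release}.gz" "${url}"
    gunzip "${release}.gz"
  fi

  local with_tests
  local total
  # Count the stanzas with a non-empty Testsuite: field, and all stanzas.
  with_tests="$(grep-dctrl -c -F Testsuite --regex . "${release}")"
  total="$(grep-dctrl -c -F Package --regex . "${release}")"

  # Integer division: the percentage is truncated, not rounded.
  echo "| ${release} | ${with_tests} | ${total} | $((100*with_tests/total))% |"
}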

echo "| **Release** | **Packages with tests** | **Total number of packages** | **% of packages with tests** |"
echo "|-------------|-------------------------|------------------------------|------------------------------|"
for release in wheezy jessie stretch buster; do
  extract "${release}" "http://archive.debian.org/debian/dists/${release}/main/source/Sources.gz"
done
for release in bullseye bookworm trixie; do
  extract "${release}" "http://ftp.br.debian.org/debian/dists/${release}/main/source/Sources.gz"
done

gotcha: using ccache in Debian package builds

Before I upload packages to Debian, I always do a full build from source under sbuild. This ensures that the package can build from source on a clean environment, implying that the set of build dependencies is complete.

But when iterating on a non-trivial package locally, I will usually build the package directly on my Debian testing system, and I want to take advantage of ccache to cache native (C/C++) code compilation to speed things up. In Debian, the easiest way to enable ccache is to add /usr/lib/ccache to your $PATH. I do this by doing something similar to the following in my ~/.bashrc:

export PATH=/usr/lib/ccache:$PATH

I noticed, however, that my Debian package builds were not using the cache. When building the same small package manually using make, the cache was used, but not when the build was wrapped with dpkg-buildpackage.

I tracked it down to the fact that in compatibility level 13+, debhelper will set $HOME to a temporary directory. For what it's worth, I think that's a good thing: you don't want package builds reaching for your home directory as that makes it harder to make builds reproducible, among other things.

This behavior, however, breaks ccache. The default cache directory is $HOME/.ccache, but that only gets resolved when ccache is actually used. So we end up starting with an empty cache on each build, get a 100% cache miss rate, and still pay for the overhead of populating the cache.

The fix is to explicitly set $CCACHE_DIR upfront, so that by the time $HOME gets overridden, it doesn't matter anymore for ccache. I did this in my ~/.bashrc:

export CCACHE_DIR=$HOME/.ccache

This way, $HOME will be expanded right there when the shell starts, and by the time ccache is called, it will use the persistent cache in my home directory even though $HOME will, at that point, refer to a temporary directory.
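The difference between the two expansion times can be seen without ccache at all. A minimal sketch, using made-up paths and a subshell so that the real environment is left alone:

```sh
#!/bin/sh
set -eu

# Simulate the sequence of events: the shell starts, CCACHE_DIR is set from
# $HOME, and later the build tooling points $HOME somewhere else.
simulate() {
  (
    HOME=/home/user                 # hypothetical real home directory
    CCACHE_DIR="$HOME/.ccache"      # expanded now, stored as a plain string
    HOME=/tmp/fake-build-home       # what a build with an overridden HOME sees
    echo "explicit: $CCACHE_DIR"
    echo "implicit: $HOME/.ccache"  # what a default derived from $HOME gives
  )
}

simulate
```

The first line still points at /home/user/.ccache, while the second one follows the overridden $HOME.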

Debian CI: 10 years later

It was 2013, and I was on a break from work between Christmas and New Year. I had been working at Linaro for well over a year, on the LAVA project. I was living and breathing automated testing infrastructure, mostly for testing low-level components such as kernels and bootloaders, on real hardware.

At this point I had also been a Debian contributor for quite some years, and had become an official project member two years prior. Most of my involvement was in the Ruby team, where we were already consistently running upstream test suites during package builds.

During that break, I put these two contexts together, and came to the conclusion that Debian needed a dedicated service that would test the contents of the Debian archive. I was aware of the existence of autopkgtest, and started working on a very simple service that would later become Debian CI.

In January 2014, debci was announced in that month's Misc Developer News, and later uploaded to Debian. It has been continuously developed for the last 10 years: it evolved from a single shell script running tests in a loop into a distributed system with 47 geographically distributed machines as of this writing, became part of the official Debian release process by gating migrations to testing, had 5 Summer of Code and Outreachy interns working on it, and processed more than 40 million test runs.

In these years, Debian CI has received contributions from a lot of people, but I would like to give special credit to the following:

  • Ian Jackson - created autopkgtest.
  • Martin Pitt - was the maintainer of autopkgtest when Debian CI launched and helped a lot for some time.
  • Paul Gevers - decided that he wanted Debian CI test runs to control testing migration. While at it, became a member of the Debian Release Team and the other half of the permanent Debian CI team together with me.
  • Lucas Kanashiro - Google Summer of Code intern, 2014.
  • Brandon Fairchild - Google Summer of Code intern, 2014.
  • Candy Tsai - Outreachy intern, 2019.
  • Pavit Kaur - Google Summer of Code intern, 2021.
  • Abiola Ajadi - Outreachy intern, December 2021-2022.

Triaging Debian build failure logs with collab-qa-tools

The Ruby team is now working on transitioning to ruby 3.0. Even though most packages will work just fine, there is a substantial number of packages that require some work to adapt. We have been doing test rebuilds for a while during transitions, but usually triaged the problems manually.

This time I decided to try collab-qa-tools, a set of scripts Lucas Nussbaum uses when he does archive-wide rebuilds. I'm really glad that I did, because those tools save a lot of time when processing a large number of build failures. In this post, I will go through how to triage a set of build logs using collab-qa-tools.

I have made some improvements to the code. Given that my last merge request is very new and has not been merged yet, a few of the things I mention here may apply only to my own ruby3.0 branch.

collab-qa-tools also contains a few tools to perform the builds in the cloud, but since we already had the builds done, I will not be mentioning that part and will write exclusively about the triaging tools.

Installing collab-qa-tools

The first step is to clone the git repository. Make sure you have the dependencies from debian/control installed (a few Ruby libraries).

One of the patches I sent, which has already been accepted, adds the ability to run the tools without installing them:

source /path/to/collab-qa-tools/activate.sh

This will add the tools to your $PATH.

Preparation

The first thing you need to do is get all your build logs into a directory. The tools assume a .log file extension, and the files can be named ${PACKAGE}_*.log or just ${PACKAGE}.log.
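To make the naming convention concrete, here is a small sketch of how a package name can be recovered from such file names with plain shell parameter expansion. The function `package_of` is an illustration only, not how collab-qa-tools does it internally.

```sh
#!/bin/sh
set -eu

# package_of: strip the directory, the .log extension and everything from the
# first underscore on, leaving only the package name.
package_of() {
  name="${1##*/}"      # drop any leading directory
  name="${name%.log}"  # drop the extension
  echo "${name%%_*}"   # drop "_version_arch", if present
}

package_of ruby-cocaine_0.5.8-1.1+rebuild1633376733_amd64.log
package_of builds/ruby-cocaine.log
```

Both calls print ruby-cocaine. This relies on Debian package names never containing an underscore.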

Creating a TODO file

cqa-scanlogs | grep -v OK | sed -e 's/$/ -- TODO/' > todo

todo will contain one line for each log, with a summary of the failure if cqa-scanlogs is able to find one. collab-qa-tools has a large set of regular expressions for finding errors in the build logs.

It's a good idea to split the TODO file into multiple ones. This can easily be done with split(1), and can be used to delimit triaging sessions and/or to divide the triaging between multiple people. For example, this will split todo into todo00, todo01, ..., each containing 30 lines:

split --lines=30 --numeric-suffixes todo todo
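To see what the sed and split steps do without running a real scan, here is a sketch that works on made-up input in a temporary directory. The lines fed to sed are placeholders, not the actual cqa-scanlogs output format, and the sketch assumes GNU split (for --numeric-suffixes).

```sh
#!/bin/sh
set -eu

workdir="$(mktemp -d)"
cd "$workdir"

# 65 placeholder lines standing in for the filtered scan output, each one
# getting the " -- TODO" marker appended.
seq 1 65 | sed -e 's/^/pkg/' -e 's/$/ some failure summary/' \
  | sed -e 's/$/ -- TODO/' > todo

# Split into chunks of 30 lines: todo00, todo01 and todo02.
split --lines=30 --numeric-suffixes todo todo

head -n 1 todo00   # pkg1 some failure summary -- TODO
wc -l todo0*       # 30, 30 and 5 lines, plus the total
```

The original todo file is left in place, so the chunks can be regenerated if needed.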

Triaging

You can now do the triaging. Let's say we split the TODO files, and will start with todo01.

The first step is calling cqa-fetchbugs (it does what it says on the tin):

cqa-fetchbugs --TODO=todo01

Then, cqa-annotate will guide you through the logs and allow you to report bugs:

cqa-annotate --TODO=todo01

I wrote myself a process.sh wrapper script for cqa-fetchbugs and cqa-annotate that looks like this:

#!/bin/sh

set -eu

for todo in "$@"; do
  # force downloading bugs: remove the .bugs.* file named after the first
  # field of each line
  awk '{print(".bugs." $1)}' "${todo}" | xargs rm -f
  cqa-fetchbugs --TODO="${todo}"

  cqa-annotate \
    --template=template.txt.jinja2 \
    --TODO="${todo}"
done

The --template option is a recent contribution of mine. It points to a template for the bug reports you will be sending. It uses Liquid templates, which are very similar to Jinja2 for Python. You will notice that I am even pretending it is Jinja2 to trick vim into doing syntax highlighting for me. The template I'm using looks like this:

From: {{ fullname }} <{{ email }}>
To: submit@bugs.debian.org
Subject: {{ package }}: FTBFS with ruby3.0: {{ summary }}

Source: {{ package }}
Version: {{ version | split:'+rebuild' | first }}
Severity: serious
Justification: FTBFS
Tags: bookworm sid ftbfs
User: debian-ruby@lists.debian.org
Usertags: ruby3.0

Hi,

We are about to enable building against ruby3.0 on unstable. During a test
rebuild, {{ package }} was found to fail to build in that situation.

To reproduce this locally, you need to install ruby-all-dev from experimental
on an unstable system or build chroot.

Relevant part (hopefully):
{% for line in extract %}> {{ line }}
{% endfor %}

The full build log is available at
https://people.debian.org/~kanashiro/ruby3.0/round2/builds/3/{{ package }}/{{ filename | replace:".log",".build.txt" }}
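The two filters used in the template are plain string manipulation. As an illustration only (this is shell, not Liquid, and the function names are made up for this example), the same transformations look like this:

```sh
#!/bin/sh
set -eu

# Like {{ version | split:'+rebuild' | first }}:
# keep only what comes before the first "+rebuild".
strip_rebuild() {
  echo "${1%%+rebuild*}"
}

# Like {{ filename | replace:".log",".build.txt" }}:
# replace every ".log" with ".build.txt".
log_to_build_txt() {
  echo "$1" | sed -e 's/\.log/.build.txt/g'
}

strip_rebuild "0.5.8-1.1+rebuild1633376733"
log_to_build_txt "ruby-cocaine_0.5.8-1.1+rebuild1633376733_amd64.log"
```

So the bug is filed against the version in the archive, 0.5.8-1.1, and the link points at the .build.txt copy of the log.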

The cqa-annotate loop

cqa-annotate will parse each log file, display an extract of what it found as possibly being the relevant part, and wait for your input:

######## ruby-cocaine_0.5.8-1.1+rebuild1633376733_amd64.log ########
--------- Error:
     Failure/Error: undef_method :exitstatus

     FrozenError:
       can't modify frozen object: pid 2351759 exit 0
     # ./spec/support/unsetting_exitstatus.rb:4:in `undef_method'
     # ./spec/support/unsetting_exitstatus.rb:4:in `singleton class'
     # ./spec/support/unsetting_exitstatus.rb:3:in `assuming_no_processes_have_been_run'
     # ./spec/cocaine/errors_spec.rb:55:in `block (2 levels) in <top (required)>'

Deprecation Warnings:

Using `should` from rspec-expectations' old `:should` syntax without explicitly enabling the syntax is deprecated. Use the new `:expect` syntax or explicitly enable `:should` with `config.expect_with(:rspec) { |c| c.syntax = :should }` instead. Called from /<<PKGBUILDDIR>>/spec/cocaine/command_line/runners/backticks_runner_spec.rb:19:in `block (2 levels) in <top (required)>'.


If you need more of the backtrace for any of these deprecations to
identify where to make the necessary changes, you can configure
`config.raise_errors_for_deprecations!`, and it will turn the
deprecation warnings into errors, giving you the full backtrace.

1 deprecation warning total

Finished in 6.87 seconds (files took 2.68 seconds to load)
67 examples, 1 failure

Failed examples:

rspec ./spec/cocaine/errors_spec.rb:54 # When an error happens does not blow up if running the command errored before execution

/usr/bin/ruby3.0 -I/usr/share/rubygems-integration/all/gems/rspec-support-3.9.3/lib:/usr/share/rubygems-integration/all/gems/rspec-core-3.9.2/lib /usr/share/rubygems-integration/all/gems/rspec-core-3.9.2/exe/rspec --pattern ./spec/\*\*/\*_spec.rb --format documentation failed
ERROR: Test "ruby3.0" failed:
----------------
ERROR: Test "ruby3.0" failed:      Failure/Error: undef_method :exitstatus
----------------
package: ruby-cocaine
lines: 30
------------------------------------------------------------------------
s: skip
i: ignore this package permanently
r: report new bug
f: view full log
------------------------------------------------------------------------
Action [s|i|r|f]:

You can then choose one of the options:

  • s - skip this package and do nothing. You can run cqa-annotate again later and come back to it.
  • i - ignore this package completely. New runs of cqa-annotate won't ask about it again.

    This is useful if the package only fails in your rebuilds due to another package, and would just work when that other package gets fixed. In the Ruby transition this happens when A depends on B, while B builds a C extension and failed to build against the new Ruby. So once B is fixed, A should just work (in principle). Even if A has problems of its own, we can't really know until B is fixed and we can retry A.

  • r - report a bug. cqa-annotate will expand the template with the data from the current log, and feed it to mutt. This is currently a limitation: you have to use mutt to report bugs.

    After you report the bug, cqa-annotate will ask if it should edit the TODO file. In my opinion it's best to not do this, and annotate the package with a bug number when you have one (see below).

  • f - view the full log. This is useful when the extract displayed doesn't have enough info, or you want to inspect something that happened earlier (or later) during the build.

When there are existing bugs in the package, cqa-annotate will list them among the options. If you choose a bug number, the TODO file will be annotated with that bug number and new runs of cqa-annotate will not ask about that package anymore. For example, after I reported a bug for ruby-cocaine for the issue listed above, I aborted with Ctrl-C, and when I ran my process.sh script again I got this prompt:

----------------
ERROR: Test "ruby3.0" failed:      Failure/Error: undef_method :exitstatus
----------------
package: ruby-cocaine
lines: 30
------------------------------------------------------------------------
s: skip
i: ignore this package permanently
1: 996206 serious ruby-cocaine: FTBFS with ruby3.0: ERROR: Test "ruby3.0" failed:      Failure/Error: undef_method :exitstatus ||
r: report new bug
f: view full log
------------------------------------------------------------------------
Action [s|i|1|r|f]:

Choosing 1 will annotate the TODO file with the bug number, and I'm done with this package. Only a few hundred more to go.

