Rafa de la Torre & Miguel Angel García, Test FW Engineers
The Test Advisor is a tool that lets you know what set of tests you should execute given a set of changes in your source code. It is automatic and does not require you to tag all of your test cases.
A Bit of Background
The story of CI has been a tough one here at Tuenti. At one point, our CI system had to handle a huge load of tests of different types for every branch. We are talking about something around 20K test cases, a 2.4h build time, and Jenkins acting as a bottleneck for the development process. From a developer's perspective, we track a measure of their suffering, called FPTR (short for "from push to test results"). It peaked at 8h.
Therefore, something had to be done with the tests. There were multiple options: add more hardware, optimize the tests, optimize the platform, delete the tests... or a combination of these options. When considering solutions, you need to keep in mind that products evolve, and an evolving product tends to produce more tests, which worsens the problem. In the long run, the best solution will always somehow involve test case management and process control, but that's another story for a future post...
After adding more hardware, implementing several performance improvements, and managing slow and unstable tests, we concluded that we had to improve the situation in a sustainable way.
One solution we thought of was to identify the tests relevant to the changes being tested and execute only those, instead of executing the full regression. The idea is not novel and may sound pretty obvious, but implementing it is not obvious at all.
In short, the idea is to identify the changes in the system and the pieces that depend upon those changes, and then execute all of the related tests. Google implemented this on top of their build system. However, we don't have such a build system, one in which dependencies and tests are explicitly specified; our starting point was a rather monolithic system.
The first approach we took in that direction was to identify the components of the system and annotate each test with the component it aims to test. The theory (sketched below) was: "If I modify component X, then run all the tests annotated with @group X". It didn't work well: the list of components is alive, evolving with the system and requiring maintenance; tests had to be annotated manually; keeping everything in sync required a lot of effort; and there was no obvious way to check the accuracy of the annotations.
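For illustration only, the scheme looked roughly like the following sketch. The path-to-component map is hypothetical, and it is precisely the artifact that had to be curated by hand:

```python
# Hypothetical sketch of the annotation-based approach we abandoned.
# The manual map below is the weak point: it rots as the system evolves.
import subprocess

COMPONENT_BY_PATH = {          # maintained by hand
    'src/messaging/': 'Messaging',
    'src/photos/': 'Photos',
}

def affected_components(changed_files):
    return {component
            for prefix, component in COMPONENT_BY_PATH.items()
            for path in changed_files
            if path.startswith(prefix)}

for group in affected_components(['src/photos/resize.php']):
    # phpunit's --group flag runs only the tests annotated with @group.
    subprocess.check_call(['phpunit', '--group', group])
```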
A different approach is to gather coverage information and exploit it to relate changes in source files to the tests covering them. Getting global coverage is not easy with our setup; we still use a custom solution based on phpunit+xdebug. That approach still has some problems, though, mostly affecting end-to-end and integration tests: it is hard to relate the source files to the files finally deployed to the test servers, partly due to the way our build script works. It is easier for unit tests, but those are not really a problem, since they are really fast. Additionally, we did not want to restrict our solution strictly to PHP code.
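Still, the coverage idea is the heart of the tool. Here is a minimal sketch of it, with names and data shapes that are ours rather than the actual implementation: invert per-test coverage into a file-to-tests index, then query the index for a changeset.

```python
# Minimal sketch: invert per-test coverage (test -> files it touched)
# into a file -> tests index, then advise which tests to run for a
# given set of changed files. Illustrative only.
from collections import defaultdict

def build_index(coverage):
    """coverage: dict mapping test id -> iterable of covered file paths."""
    tests_by_file = defaultdict(set)
    for test, files in coverage.items():
        for path in files:
            tests_by_file[path].add(test)
    return tests_by_file

def advise(changed_files, tests_by_file, all_tests):
    """Pick the tests covering the changed files. If a changed file has
    no coverage data at all, fall back to the full regression: missing
    data must never silently skip a relevant test."""
    selected = set()
    for path in changed_files:
        if path not in tests_by_file:
            return set(all_tests)  # conservative fallback
        selected |= tests_by_file[path]
    return selected

index = build_index({'PhotoTest::testResize': ['src/photos/resize.php']})
print(advise(['src/photos/resize.php'], index,
             all_tests=['PhotoTest::testResize']))
```

The conservative fallback is deliberate: advising too few tests is the failure mode that would destroy trust in the tool.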
What it Is
The Test Advisor is a service that gathers "pseudo-coverage" information to be used later on, essentially to determine which test cases are relevant for a given changeset.

When it was proposed, some of the advantages that we envisaged were:
- Reduced regression times in a sustainable way: build time would no longer be limited by the size of the full regression
- Improved feedback cycles
- No need for manual maintenance (as opposed to annotating tests)
How It Works
Most of our product stack runs on Debian Linux, so we decided to use the inotify kernel subsystem to track filesystem events. This way, we can get that pseudo-coverage at the file level, independent of the language, the test framework, and even the test runner.

We developed it using Python as the main language, which is a good fit for building a quick prototype and putting all the pieces together, along with the pyinotify bindings.
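A minimal sketch of that collection mechanism with pyinotify, assuming an illustrative watched path and a stand-in for the test execution itself:

```python
# Sketch: record every file opened under the deployed tree while a test
# runs. The real collector is a service driven by the test runner's
# start/end notifications; this is just the core idea.
import time
import pyinotify

WATCHED_TREE = '/srv/deployed-code'  # assumption: deployed code root

class AccessCollector(pyinotify.ProcessEvent):
    """Accumulates the paths of files opened under the watched tree."""
    def my_init(self, touched=None):
        self.touched = touched

    def process_IN_OPEN(self, event):
        self.touched.add(event.pathname)

def run_one_test():
    """Stand-in for executing one test scenario against the SUT."""
    open(WATCHED_TREE + '/index.php').close()

touched = set()
wm = pyinotify.WatchManager()
notifier = pyinotify.ThreadedNotifier(wm, AccessCollector(touched=touched))
wm.add_watch(WATCHED_TREE, pyinotify.IN_OPEN, rec=True, auto_add=True)

notifier.start()   # "test started" notification from the runner
run_one_test()
time.sleep(0.2)    # let the notifier thread drain pending events
notifier.stop()    # "test ended": 'touched' is this test's pseudo-coverage
```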
The Test Advisor is conceptually composed of three main parts:
- A service for information storage
- A service for collecting coverage information. It is notified by the test runner when a test starts and ends. It has the responsibility of setting up and retrieving information about the files being accessed to complete the test scenario.
- A client to retrieve and exploit information from the TestAdvisor.
![The three components of the Test Advisor, exercised by unit, integration and E2E browser tests]()
As you can see from the figure, the TestAdvisor can take advantage of any test type being executed (unit, integration or E2E browser tests). It is also independent from the language used to implement the tests and the SUT.
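On the client side (the third component above), the flow could look like the following sketch. The /tests-for endpoint, its URL, and the JSON response shape are assumptions made for the example, not the actual TestAdvisor API:

```python
# Sketch of a client: ask git for the changeset, then ask the advisor
# which tests cover it. Endpoint and response shape are assumptions.
import json
import subprocess
import urllib.parse
import urllib.request

def changed_files(base='origin/master'):
    # Files modified on this branch, straight from git.
    out = subprocess.check_output(['git', 'diff', '--name-only', base])
    return out.decode().splitlines()

def advised_tests(files, advisor='http://testadvisor.internal'):
    query = urllib.parse.urlencode({'files': ','.join(files)})
    with urllib.request.urlopen(f'{advisor}/tests-for?{query}') as resp:
        return json.load(resp)

print(advised_tests(changed_files()))
```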
Problems Along the Way
So we were easily able to get the files covered by a particular test case, but those files are not the actual source files. They go through a rather complex build process in which files are not only moved around, but also versioned, stripped, compressed, etc., and some of them are even generated dynamically to fulfill HTTP requests in production. We tried to hook into the build process to get a mapping between source and deployed files, but it turned out to be too complex. We finally decided to use the "development mode" of the build, which basically links files and directories.

Another problem was file caching. The HipHop interpreter (among others) caches the PHP files being processed and decides whether a file has changed based on its access time, and inotify does not provide a way of monitoring stat operations on files. We researched a few possibilities for getting around this:
- Overriding the stat system call and its variants by means of LD_PRELOAD. Not an easy one to get fully working (and it also made us feel dirty messing around there).
- Instrumenting the kernel with SystemTap, kprobes or similar. A bit cleaner, but one mistake and your precious machine may freeze. We also got the feeling that we were trying to re-implement inotify.
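The underlying limitation is easy to reproduce. In this small experiment (the file path is arbitrary), a stat() generates no inotify event at all, while an open() does:

```python
# Demonstrates the problem described above: inotify reports open(),
# but has no event type for stat(), which is exactly what the
# interpreter's cache-validation check performs.
import os
import time
import pyinotify

class Recorder(pyinotify.ProcessEvent):
    def my_init(self, log=None):
        self.log = log

    def process_default(self, event):
        self.log.append(event.maskname)

path = '/tmp/watched.php'
open(path, 'w').close()  # make sure the file exists

log = []
wm = pyinotify.WatchManager()
notifier = pyinotify.ThreadedNotifier(wm, Recorder(log=log))
wm.add_watch(path, pyinotify.IN_ALL_EVENTS)
notifier.start()

os.stat(path)         # invisible to inotify: no event is generated
open(path).close()    # visible: IN_OPEN / IN_CLOSE_NOWRITE

time.sleep(0.5)       # let the notifier thread drain the event queue
notifier.stop()
print(log)            # only the open/close events appear
```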
Work that Lies Ahead
Right now, the TestAdvisor is used as a developer tool. We've achieved great improvements in the CI by other means (most of them process-related changes). However, we are still eager to integrate the TestAdvisor into our development and release cycle within our CI setup.

Developers will probably use a pipeline for their branches, consisting of three distinct phases: Unit Testing, Advised Testing and (optionally) Pre-Integration. An "advised build" in green can be considered trustworthy enough to qualify for integration into the mainline.
This approach has been applied to the core of our products, the server-side code, and the desktop and mobile browser clients. We expect it to also be applied in the CI of the mobile apps at some point.