User:Tflink/AutoQA nose pytest comparison

= Introduction =
There is currently a desire to add more internal test coverage to AutoQA but we need to make a decision on the test tools to use. The following is my comparison of nose and py.test in the context of finding the better solution for AutoQA.

To make sure that there isn't a misunderstanding, this proposal is for AutoQA code only and not for any packages or projects that may be tested by AutoQA. This is not an attempt to standardize on a single method of testing for outside projects.

I have tried to make it clear which parts are my opinion and which are plain comparison (YMMV). Please let me know if I missed an important aspect or got something wrong (or just edit the page).

= Source Code =
What good is a comparison without code to go with it?


 * pytest proof of concept
 * nose proof of concept

= Comparison =
While nose and py.test are similar tools, they do have their differences and we want to make the best choice for us. Based on the two proofs of concept I did, I will outline what I see as the advantages and disadvantages of both tools. For the purposes of this comparison, I will be talking about pytest 2.0.1 and nose 1.0.0.

== Documentation ==
From what I saw, the documentation for pytest is far superior to the documentation for nose. The nose documentation lacks examples and did not have enough detail for me to get some functions working (I had problems getting @with_setup to behave). Pytest, on the other hand, has more detailed documentation, examples and pointers to specific blog entries that cover some non-standard functions.

== Test Detection ==
AutoQA is related to testing and we have some classes and functions that have “test” in the name. Since the default naming convention for tests in Python is anything with “test” in it, there are some false positives for tests in our code base. Both nose and pytest use test discovery, but the two approaches have different effects.

=== Test Detection in nose ===
Nose uses regular expressions in order to determine what is and is not a test. The exact regular expression used is easily configured from either the command line or a configuration file. This single regular expression is used for file, class, module and function detection. Short of writing custom plugins, this seems to be the only way to change the test detection mechanism.
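As a rough illustration of how one expression drives every decision, the sketch below applies a pattern modelled on nose's default `testMatch` setting to a handful of names. The exact default may differ between nose versions, and the non-test names (`run_test`, `latest_build`) are hypothetical, chosen only to show how a helper can be picked up as a false positive.

```python
import re

# Pattern modelled on nose's default testMatch (assumption: the exact
# default can vary with the nose version and the platform path separator).
TEST_MATCH = re.compile(r"(?:^|[\b_./-])[Tt]est")


def looks_like_test(name):
    """Return True if the single pattern would treat `name` as a test."""
    return TEST_MATCH.search(name) is not None


# The same pattern is applied to files, classes and functions alike.
names = ["test_koji_utils", "TestGetNvrRpms", "run_test", "latest_build"]
for name in names:
    print(name, looks_like_test(name))
```

A helper called `run_test` matches because of the `_test` fragment, while `latest_build` does not, since its "test" is not preceded by a separator.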

When differentiating between unit tests and functional tests, it is pretty easy to tag tests with decorators and use command line options to specify which tagged tests should be run.
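The tagging idea can be sketched with the standard library alone. nose ships a comparable decorator in its attrib plugin (`nose.plugins.attrib.attr`) and selects on it with the `-a` option, for example `nosetests -a kind=unit`. The `attr` and `select` functions below are simplified stand-ins written for this page, not nose's implementation, and the attribute name `kind` and both test names are hypothetical.

```python
def attr(**tags):
    """Stand-in tagging decorator: store each tag as a function attribute."""
    def decorate(func):
        for key, value in tags.items():
            setattr(func, key, value)
        return func
    return decorate


@attr(kind="unit")
def test_parse_nvr():
    assert "foo-1.0-1".rsplit("-", 2) == ["foo", "1.0", "1"]


@attr(kind="functional")
def test_download_rpms():
    pass  # a real functional test would talk to an external service here


def select(tests, **wanted):
    """Keep only the tests whose attributes match every requested tag."""
    return [t for t in tests
            if all(getattr(t, k, None) == v for k, v in wanted.items())]


unit_tests = select([test_parse_nvr, test_download_rpms], kind="unit")
print([t.__name__ for t in unit_tests])
```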

=== Test Detection in pytest ===
From a user perspective, pytest relies on multiple glob statements instead of a single regular expression when determining what is and is not a test. There are separate configuration options for detecting files, classes and functions. This makes it easier to change the naming convention for functions and gives finer granularity for eliminating false positives, without having to resort to complicated regular expressions and/or strict naming conventions.
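The effect of having one pattern per kind of object can be sketched as follows. This is an emulation with the standard library, not pytest's own collection code; in pytest the corresponding settings are, as far as I know, the `python_files`, `python_classes` and `python_functions` ini options. The `should_*` convention mirrors the function name used in the pytest proof of concept, and the helper name `is_collected` is hypothetical.

```python
from fnmatch import fnmatchcase

# One pattern per kind of object (hypothetical naming convention).
PATTERNS = {
    "file": "test_*.py",
    "class": "Test*",
    "function": "should_*",
}


def is_collected(kind, name):
    """Return True if `name` matches the pattern configured for `kind`."""
    return fnmatchcase(name, PATTERNS[kind])


print(is_collected("file", "test_koji_utils.py"))
print(is_collected("class", "TestGetNvrRpms"))
print(is_collected("function", "should_return_filename"))
print(is_collected("function", "run_test"))
```

Because functions have their own pattern, a helper such as `run_test` is no longer collected, and no change to the file or class conventions is needed to achieve that.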

It is not difficult to modify pytest to differentiate between unit and functional tests but the easiest way to do so still sets up all of the tests even if they aren't executed. This makes the test run as a whole very slow. I was able to get around the slowdown by using a different method for excluding tests based on filename but it is something to keep in mind.

== Integration with unittest and doctest ==
This isn't a huge issue for us seeing as we don't have a whole lot of existing tests but historically, nose has had better integration with unittest than pytest had. There has been an effort to improve this in the newest release and pytest now claims to have unittest support equivalent to nose.

As a side note, both nose and py.test have plugins for detecting and running doctests embedded inside code.

== Test Isolation ==
Both nose and pytest have at least some facilities for reverting in-test changes to sys modules. Pytest has better integration for doing and undoing changes to arbitrary modules on a per-test basis through the monkeypatch plugin.

Neither nose nor pytest would have any issues integrating with virtualenv and both are in PyPI and thus installable through pip with no additional work on our part.

Pytest does have better support for temp directory management than nose does. This would likely have a greater effect on functional testing than on unit testing, but it lends itself well to package downloads and any logfiles or output generated by external tools.
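For comparison, this is roughly the bookkeeping a test has to do by hand when the framework offers no temp directory support. It uses only the standard library; pytest's built-in `tmpdir` function argument is meant to hand each test a fresh directory so that this setup and teardown code is not needed. The class name, file name and log line are hypothetical.

```python
import os
import shutil
import tempfile
import unittest


class LogfileTest(unittest.TestCase):
    def setUp(self):
        # Create a private scratch directory for every test.
        self.workdir = tempfile.mkdtemp(prefix="autoqa-test-")

    def tearDown(self):
        # Remove the directory and anything the test left behind.
        shutil.rmtree(self.workdir)

    def test_writes_logfile(self):
        logfile = os.path.join(self.workdir, "output.log")
        with open(logfile, "w") as handle:
            handle.write("downloaded 1 package\n")
        self.assertTrue(os.path.exists(logfile))
```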

== Customization and Extension ==
Both pytest and nose have good facilities for writing plugins so customization wouldn't be a huge issue for either.

Pytest does have several well-defined hooks to override some of its default behavior without writing plugins.

== Output For Test Failures ==
I introduced the same failure to both proofs of concept in order to demonstrate the output on test failure. Note the extra details provided by the output from py.test with no extra code in the test itself: it doesn't just report the failure but shows a difference in the lists used inside the assert statement.

=== nose ===
<pre>
(test_env)[tflink@localhost autoqa-devel]$ nosetests lib/python/tests/
...F...........
======================================================================
FAIL: test_koji_utils.TestGetNvrRpms.test_should_return_filename
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/srv/code/autoqa-devel/test_env/lib/python2.7/site-packages/nose/case.py", line 187, in runTest
    self.test(*self.arg)
  File "/srv/code/autoqa-devel/lib/python/tests/test_koji_utils.py", line 83, in test_should_return_filename
    assert test_filename == [self.ref_filename]
AssertionError

----------------------------------------------------------------------
Ran 15 tests in 6.387s

FAILED (failures=1)
</pre>

=== pytest ===
<pre>
(test_env)[tflink@localhost autoqa-devel]$ py.test lib/python/tests/
=================== test session starts ===============================
platform linux2 -- Python 2.7.0 -- pytest-2.0.1
collected 5 items

lib/python/tests/test_koji_utils.py ...F.

======================== FAILURES =====================================
______________________ TestGetNvrRpms.should_return_filename ______________________

self =  monkeypatch = <_pytest.monkeypatch.monkeypatch instance at 0x135e758>

    def should_return_filename(self, monkeypatch):
        monkeypatch.setattr(self.testkoji, 'nvr_to_urls', self.nvr_to_urls)
        monkeypatch.setattr(self.testkoji, 'pkgurl',
                'http://koji.fedoraproject.org/packages')
        test_filename = self.testkoji.get_nvr_rpms(self.test_nvr, self.ref_dir)
>       assert test_filename == [self.ref_filename]
E       assert ['makeitfail-.../rpmdir/rpm1'] == ['/tmp/rpmdir/rpm1']
E         At index 0 diff: 'makeitfail-0.1-2.noarch' != '/tmp/rpmdir/rpm1'
E         Left contains more items, first extra item: '/tmp/rpmdir/rpm1'

lib/python/tests/test_koji_utils.py:96: AssertionError
=========== 1 failed, 4 passed in 0.10 seconds ========================
</pre>

=== A Note On The Extra Detail in py.test ===
In order to get the extra detail from failed assertions, py.test does use some introspection and metaprogramming. My personal thoughts on the matter are that while I might hesitate to use some of these methods normally, I don't have a problem using them wisely inside testing tools in order to gain better test output or a better testing environment. There is such a thing as too much "magic" in any kind of code, though.

From the py.test FAQ:

Around 2007 (version 0.8) some people claimed that py.test was using too much “magic”. It has been refactored a lot. Thrown out old code. Deprecated unused approaches and code. And it is today probably one of the smallest, most universally runnable and most customizable testing frameworks for Python. It’s true that py.test uses metaprogramming techniques, i.e. it views test code similar to how compilers view programs, using a somewhat abstract internal model.

It’s also true that the no-boilerplate testing is implemented by making use of the Python assert statement through “re-interpretation”: When an assert statement fails, py.test re-interprets the expression to show intermediate values if a test fails. If your expression has side effects the intermediate values may not be the same, obfuscating the initial error (this is also explained at the command line if it happens). py.test --no-assert turns off assert re-interpretation. Sidenote: it is good practise to avoid asserts with side effects.
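The side effect caveat is easy to picture with a small simulation. The sketch below does not use py.test; it simply evaluates the same expression twice, the way a re-interpretation of a failed assert would, to show why the reported value can differ from the one that caused the failure. The function `take_next` and the queue contents are hypothetical.

```python
def take_next(queue):
    """Remove and return the first item: calling this has a side effect."""
    return queue.pop(0)


# Risky form: `assert take_next(queue) == 2` would call take_next() once
# when the assert runs and again if the expression is re-evaluated.
queue = [1, 2, 3]
seen_by_assert = take_next(queue)      # first evaluation
seen_by_report = take_next(queue)      # simulated re-evaluation
print(seen_by_assert, seen_by_report)  # the two values differ

# Safer form: perform the side effect once, then assert on the stored value.
fresh_queue = [1, 2, 3]
value = take_next(fresh_queue)
assert value == 1
```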

== Other Testing Features ==
None of these features were used in the proofs of concept but I thought that they were interesting and potentially useful enough to warrant mention. Both of these features are from pytest; while I'm sure you can do similar things with nose, as far as I know it would involve custom code and/or plugins.

=== Test Parameterization ===
Py.test has an interesting feature called Test Parameterization where you can change some of the test inputs programmatically. The example they use is to swap out databases for different testing but we might be able to use it for package sources or something similar.

=== Application Specific Test Fixtures ===
Another built-in pytest feature of note is the ability to have fixtures specific to an application. The advantage to this is that you can consolidate complex application specific setup code in one place instead of duplicating it across multiple test classes. I can see this also being useful in working with test resources (koji, bodhi, repositories etc.).

== Package Availability ==
Neither pytest 2.0.1 nor nose 1.0.0 is currently available in the Fedora repositories. One potential difficulty with pytest is related to its recent split from pylib. Since pytest is currently part of the pylib package in the Fedora repositories, any further upgrade would require that package to be split into two separate packages instead of a simple upgrade. I imagine that this will happen in the future but it is one thing to consider.

== Local Experience ==
Local is a relative term here, since most of us are in different parts of the world. As an example, Anaconda's test suite seems to be mostly written with unittest, using nose as a runner. While they seem to have used a slightly different strategy than what is proposed here, it would be easier to leverage their experience with testing frameworks if we used the same system they do.

= Conclusion =
After a detailed comparison of the two tools, I have come to the conclusion that both nose and py.test are capable of fulfilling our needs for self-testing in AutoQA. However, the point of this comparison was to select one of the two frameworks.

Considering the comparison above, I think that py.test would be the better choice for AutoQA. Py.test has better documentation, more detailed output on test failure, more customizability without resorting to custom plugins and better support for test isolation. While we would not be able to leverage as much local experience with py.test, better documentation should lead us towards finding solutions in that documentation instead of having to rely on the experience of others to find those solutions.