Features/StaticAnalysisOfCPythonExtensions

From FedoraProject

Static Analysis of CPython Extensions

Summary

I'm working on a static analysis tool that can detect common mistakes made in Python extension modules written in C. We'll run it on all such code in Fedora, fixing any problems we find, and send the patches to the appropriate upstream projects.

Owner

Current status

The code works, but only for checking Python's argument parsing API. This can detect real bugs, but the signal:noise ratio isn't great yet.
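To make the argument-parsing check concrete, here is a minimal sketch in Python of the kind of comparison involved: each format code in a `PyArg_ParseTuple` format string is matched against the C type of the pointer passed for it. This is a toy illustration, not the checker's actual implementation; the `check_format` helper and the `EXPECTED_C_TYPES` table are hypothetical names, and only a handful of single-character format codes are handled.

```python
# Toy illustration only; this is not how cpychecker is implemented.
# Maps a few single-character PyArg_ParseTuple format codes to the C type
# of the pointer argument that must be passed for each one.
EXPECTED_C_TYPES = {
    "i": "int *",
    "l": "long *",
    "f": "float *",
    "d": "double *",
    "s": "const char **",
    "O": "PyObject **",
}


def check_format(fmt, arg_types):
    """Return a list of messages describing format/argument mismatches."""
    # Drop the optional ":function_name" or ";error message" suffix.
    codes = fmt.split(":")[0].split(";")[0]
    # "|" only marks the start of optional arguments; it consumes no argument.
    codes = [code for code in codes if code != "|"]

    if len(codes) != len(arg_types):
        return [
            f"format {fmt!r} expects {len(codes)} argument(s) "
            f"but {len(arg_types)} were passed"
        ]

    problems = []
    for index, (code, actual) in enumerate(zip(codes, arg_types), start=1):
        expected = EXPECTED_C_TYPES.get(code)
        if expected is None:
            problems.append(
                f"argument {index}: code {code!r} is not handled by this sketch"
            )
        elif actual != expected:
            problems.append(
                f"argument {index}: code {code!r} expects {expected!r}, got {actual!r}"
            )
    return problems


# Models: PyArg_ParseTuple(args, "il:example", &count, &size)
# where both variables were declared as "int".
for message in check_format("il:example", ["int *", "int *"]):
    print(message)
```

The example call reports a mismatch for the second argument. A mistake like this can go unnoticed on platforms where `int` and `long` happen to be the same size, which is one reason such bugs tend to surface only on some architectures.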

Having said that, automating reference-count checking is the really compelling aspect of this feature, and that part isn't yet ready. It works on various small examples, but there are plenty of examples of real code where it either crashes or gives misleading results. I want to work on fixing this, but at this stage I'm not going to get it into a shape where it's meaningful for 3rd-party testing by the 2011-07-26 deadline.

Given that, it may be worth either reducing the scope of this feature to the stuff that works, or postponing it to Fedora 17.

My preference is to postpone it to Fedora 17.

See the Fedora 17 continuation of this work: Static Analysis of Python Reference Counts


Detailed Description

Python makes it relatively easy to write wrapper code for C and C++ libraries, acting as a "glue" from which programs can be created.

Unfortunately, there are various mistakes that are commonly made in such wrapper code, and these mistakes can lead to /usr/bin/python leaking memory or segfaulting. There are other mistakes that only manifest as bugs when run on less common CPU architectures.

I'm working on static analysis code for C that detects common errors in C extension modules for Python. The plan is to integrate this with Fedora's packaging by adding hooks to the python-devel and python3-devel packages, so that all C extension modules packaged for Python 2 and Python 3 can be guaranteed free of such errors. When the tool reports problems, we can also send fixes to the upstream projects as needed.

For this to be viable, we'll need the tool to achieve a good signal:noise ratio. Part of this means having "good" error messages that spell out how the problem occurs, what the impact is, and how to fix it.

This will also benefit PyPy. PyPy has its own implementation of the CPython extension API, and certain bugs in extension code can lead to more severe symptoms with PyPy than with CPython. Specifically, some reference-counting bugs that are harmless on CPython can lead to segfaults on PyPy. So by fixing these kinds of bugs, we also help PyPy.
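As a rough picture of what a reference-counting check looks for, the following Python sketch tallies how many references a function owns to each object along a single path and flags any imbalance at exit. It is a deliberately simplified model under stated assumptions: the event names and the `find_refcount_problems` helper are hypothetical, and a real analysis has to follow every path through the C code rather than a hand-written list of events.

```python
# Toy model only; the real analysis has to reason about C control flow.
# Each event is (action, variable_name) along one path through a function.
OWNERSHIP_CHANGE = {
    "new": +1,     # e.g. PyList_New(): the function receives a new reference
    "incref": +1,  # Py_INCREF
    "borrow": 0,   # e.g. PyTuple_GetItem(): borrowed, nothing is owned
    "decref": -1,  # Py_DECREF
    "return": -1,  # returning an object hands one reference to the caller
}


def find_refcount_problems(events):
    """Report variables whose owned-reference tally is not zero at exit."""
    owned = {}
    for action, name in events:
        if action not in OWNERSHIP_CHANGE:
            raise ValueError(f"unknown action: {action!r}")
        owned[name] = owned.get(name, 0) + OWNERSHIP_CHANGE[action]

    problems = []
    for name, count in owned.items():
        if count > 0:
            problems.append(f"{name}: leaks {count} reference(s)")
        elif count < 0:
            problems.append(f"{name}: releases {-count} reference(s) it does not own")
    return problems


# Error path that bails out without releasing the list it created.
leaky_path = [("new", "result_list"), ("new", "item"), ("decref", "item")]

# Path that returns a borrowed reference without calling Py_INCREF first.
borrowed_return = [("borrow", "item"), ("return", "item")]

for path in (leaky_path, borrowed_return):
    for message in find_refcount_problems(path):
        print(message)
```

The first path models a memory leak; the second models handing the caller a reference the function never owned, which is the kind of bug that can later crash the interpreter.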

Benefit to Fedora

Fedora is already a great environment for doing Python development. Having a good-quality static analysis tool integrated into Fedora's build system for Python extension modules will make Fedora even more compelling for Python developers. (Naturally the tool will be Free Software, and thus usable on other platforms; but we'll have it first.)

The presence of the tool should also make it easier to fix certain awkward bugs, and make it easier to support secondary CPU architectures.

Scope

This involves:

The bugs I intend for the tool to detect are:

There are two approaches to integrating it:

"all in": turning it on by default, by adding the relevant compilation flags to sysconfig/distutils: -fplugin=python2 -fplugin-arg-python2-script=PATH_TO_/cpychecker.py so that all compilation using python-devel and python3-devel uses it, and providing flags to turn it off where it causes problems.

"gcc-with-cpychecker": package it, leaving it optional, providing a /usr/bin/gcc-with-cpychecker wrapper script, to be invoked in place of gcc, so that people can opt in to using it.

In both cases, I plan to run all of the C Python extension code in Fedora 16 through it.
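The difference between the two approaches comes down to whether the plugin flags are added to the compiler command line automatically. The sketch below builds such a command line in Python using the two flags quoted above. The `cpychecker_gcc_command` helper and its `enabled` switch are hypothetical, the script path is left as the placeholder from the text, and a real build would also need the usual include paths and other compiler options.

```python
import shlex


def cpychecker_gcc_command(sources, checker_script, enabled=True, plugin="python2"):
    """Build a gcc argument list, optionally loading the checker plugin.

    Hypothetical helper for illustration; the flag spellings follow the
    ones quoted in the "all in" approach.
    """
    command = ["gcc"]
    if enabled:
        command += [
            f"-fplugin={plugin}",
            f"-fplugin-arg-{plugin}-script={checker_script}",
        ]
    # "-c" compiles without linking; include paths are omitted in this sketch.
    command += ["-c", *sources]
    return command


# The script path is kept as the placeholder used in the text.
argv = cpychecker_gcc_command(["example.c"], "PATH_TO_/cpychecker.py")
print(" ".join(shlex.quote(argument) for argument in argv))
```

With `enabled=False` the helper returns a plain gcc invocation, which roughly corresponds to opting out in the "all in" approach or to not using the wrapper script in the opt-in approach.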

How To Test

Exactly how to test will depend on which of the two approaches we go with (see "Scope" above).

Try to compile C Python extension code.

I'll provide an example of buggy extension code within the documentation part of the package, to make it easy to verify that GCC detects the bugs.

User Experience

Non-technical end-users of Fedora should see no difference (other than a more robust operating system).

Python users/developers should see additional warnings/errors when building Python extension modules that contain bugs. The exact experience will depend on how sure we can be that an issue is a real problem; we don't want to impair people's ability to do automated buildouts from PyPI.

For examples of the output from the checker, see: http://dmalcolm.livejournal.com/6560.html

Dependencies

I'm planning to do this via a GCC plugin that embeds Python, so that I can write the checker in Python itself.

FWIW I also investigated a few other approaches to doing this:

Contingency Plan

There can be various levels of fallback:

I'm not yet sure what the structures of opt-in/opt-out and per-test/per-file/per-build should be.

Documentation

Upstream documentation: http://readthedocs.org/docs/gcc-python-plugin/en/latest/cpychecker.html

Release Notes

Fedora now ships with a gcc-with-cpychecker variant of GCC, which performs additional compile-time checks on Python extension modules written in C, detecting various common problems (e.g. reference counting mistakes). The checker is itself written in Python.

Comments and Discussion