Python in Fedora 13
Software developers building applications which require compatibility with future releases of Python will find two significant enhancements in Fedora 13's Python stack.
Tell us about yourself a bit.
Hi, I'm David Malcolm. I got interested in Linux maybe 10 years ago, working on various things in the GNOME community. In my day job I work at Red Hat, and am in the fortunate position of being paid to work on Free Software (yay!). I learned Python a few years ago, and it quickly became my favourite programming language. Red Hat is now paying me to work on making Python better.
What is it that you like about Python?
It fits my mental model of programming very well: things that should be simple are simple, but it can scale up to handle the complicated things as well, without introducing needless complexity as it does so. I can write a simple script to Get Stuff Done, and it can potentially develop into something larger.
There are three Python-related features coming out in F13. Let's start with parallel installation of Python 2 and Python 3. What is this? Why is it cool?
Python 3 fixes a lot of long-standing issues in the language, but the cost of doing that is that a lot of things change from Python 2 to Python 3; in some ways you can think of them as different languages.
When we talk about installing Python, there are three things: the core "runtime", the "standard library", and a host of other 3rd-party modules on top. The standard library is often described as "batteries included" as it does a lot, but even so there's a need for the 3rd-party modules. There are hundreds if not thousands of modules, some of which need other modules, and they're all at some point on a spectrum of support for Python 3.
So this Python 2 vs Python 3 decision is something that a lot of Python developers will be facing soon - "Does the Python 2 or Python 3 universe give me the modules with the functionality I need?"
Python provides a tool called "2to3" which can automatically convert much Python 2 code to Python 3, provided the code follows some rules. Unfortunately it's often not clear which modules have been ported yet, and whether they need conversion. Some new modules were written directly for Python 3; others are pre-existing modules that already support it; still others have only just begun porting.
And Fedora's answer is "well, we'll have both."
For Fedora 13, we provide two Python stacks, the Python 2 one, and the Python 3 one.
And you can use both Python 2 and Python 3 simultaneously - you don't have to pick one or the other.
I'm not sure how many [packages] we have for Python 2 in Fedora, but there are a lot.
A note for our readers who may be Python developers - Python 2 is the existing Python stack in Fedora, so if you've been creating or running Python code in Fedora, Python 2 is what you have been using.
For the Python 3 one, we've tried to provide RPM packages of Python code known to work with Python 3. One approach we could have followed was to simply run "2to3" on everything, but doing that you have no guarantee that the end-result actually does what it's meant to.
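Ed. note: a minimal sketch of why a mechanical conversion is no guarantee. To our knowledge the stock "2to3" fixers leave the "/" operator alone, yet it behaves differently on integers in the two languages. The function names below are hypothetical, chosen only for illustration.

```python
# Hypothetical helpers, written for illustration only.
def midpoint(low, high):
    # Python 2: "/" on two ints floors the result, so midpoint(0, 7) == 3.
    # Python 3: "/" is always true division, so the same line returns 3.5.
    return (low + high) / 2


def midpoint_ported(low, high):
    # "//" floors in both Python 2 and Python 3, preserving the old behaviour.
    return (low + high) // 2


if __name__ == "__main__":
    print(midpoint(0, 7))         # true division under Python 3
    print(midpoint_ported(0, 7))  # floor division under either version
```

The source text of midpoint() is valid in both languages, so a converter has nothing to flag; only testing shows that the result changed.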
So these packages in F13 have been tested to work with Python 3?
Yes. If you see a "python3-foo" RPM in Fedora 13, you know that it should actually work. We're not just throwing them over the wall; we've gone through various modules, picked the ones that are known to work (possibly requiring steps to make them do so, e.g. running "2to3"), and tested them.
And we're doing this in part because we need the Python 3 stuff ourselves.
We use Python 2 extensively within Fedora. Much of Fedora's web infrastructure is written in Python, as are system tools like the updater ("yum"), the installer ("anaconda"), and a slew of graphical config tools ("system-config-*"). My hope is that for Fedora 14 we can start cutting over some of our tools to Python 3.
What kind of development had to take place in order to make this possible?
We had to make some cleanups to RPM to support multiple Python stacks; I added some tests to the "rpmlint" tool for this. I helped port RPM's Python bindings so that they can work with Python 3 (this is in rpm-4.8.0 IIRC).
One other thing I did was write a tool to help people port their C extension modules. One nice thing about Python is that it makes it very easy to write wrapper code that bridges between Python and C, and there's a lot of this code around. Unfortunately it needs some changes between Python 2 and Python 3. I ran into this porting RPM's python bindings. Half the work requires thought, but the other half is fairly mindless, once you get the hang of it.
So I wrote a tool to help with the mindless parts, which I called 2to3c, in homage to the 2to3 tool. John Palmieri used this to help him port the DBus python bindings.
Nice. I see the download/usage instructions - looks like it's a pretty new project that's looking for testers/feedback/help.
Yes, it's rather bleeding edge right now. Help would be most welcome!
So hopefully we now have an excellent Python 3 platform in Fedora 13: I believe we have a well-tuned build of Python 3, and a good selection of add-on modules available via RPM. This should be useful to people looking to port their code or to learn the language; arguably Python 3 is easier to learn than Python 2; a lot of unnecessary complexity was removed.
What's the best set of instructions for people going "cool, how do I start?"
https://fedoraproject.org/wiki/Features/Python3F13#How_To_Test. I think that section could be improved.
We'll mark those as needing attention, and move on. If anyone would like to help with our Python 3 documentation, please feel free to edit the page! David, any last comments on parallel-installable?
It's something that people have wanted for a while. There have been a few proposals on the matter on the mailing list.
Ensuring that it was independent of the Python 2 stack was the most important detail, so that we can be sure we don't break it. "Don't cross the streams!" (How is this looking?)
You made a Ghostbusters reference. We're all good. Moving on to SystemTap probes! So... I've done a bit of Python development, but I look at the feature description for this and I'm confused. What is this?
So, SystemTap is a tracing/probing/monitoring tool. The idea is (metaphor alert!) that you can stick probes under the hood of the engine and see what's going on. In the past, most of the places where you could probe were in the kernel. For Fedora 13, I've added places to the Python 2 and Python 3 runtimes that you can monitor - specifically, Python function calls. So you can write scripts that watch for calls to a particular module, or watch for calls of a particular Python function, across the whole computer, or just in a given process.
Can you give some examples?
As examples, I provide precanned scripts. I've written a "top"-like tool that shows you all Python function calls per second across the whole system, and [another that] shows you the function call and return hierarchy for all Python that's running. These ought to be useful as is, and people can write their own scripts using SystemTap's mini-language.
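Ed. note: the sketch below is not SystemTap and does not use the new probes. It is an in-process analogue built on Python's own sys.setprofile hook, meant only to show the kind of output described above: per-function call counts and an indented call/return hierarchy. The names trace_calls, square and sum_of_squares are hypothetical.

```python
import sys
from collections import Counter


def trace_calls(func, *args, **kwargs):
    """Run func and return (result, log lines, per-function call counts)."""
    log = []
    counts = Counter()
    depth = 0

    def profiler(frame, event, arg):
        nonlocal depth
        if event == "call":            # a Python function is being entered
            name = frame.f_code.co_name
            counts[name] += 1
            log.append("  " * depth + "=> " + name)
            depth += 1
        elif event == "return":        # a Python function is returning
            depth -= 1
            log.append("  " * depth + "<= " + frame.f_code.co_name)
        # c_call / c_return events (calls into C code) are ignored here.

    sys.setprofile(profiler)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.setprofile(None)           # always remove the hook again
    return result, log, counts


def square(x):
    return x * x


def sum_of_squares(values):
    total = 0
    for v in values:
        total += square(v)
    return total


if __name__ == "__main__":
    result, log, counts = trace_calls(sum_of_squares, [1, 2, 3])
    print("\n".join(log))
    print(dict(counts))
```

Unlike this sketch, which only sees the process it runs in and requires changing the program, the SystemTap scripts can watch unmodified Python processes across the whole system.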
What sort of Python programmers might care most immediately about this? Are there particular types of projects that this is good for?
I showed [my scripts] to Paul Frields (Ed. note: Paul is a relatively new Python programmer) running on a program that he wrote and his eyes lit up. It's a great teaching tool: you can see what your code is doing, directly.
So it's something that's made to be helpful for programming novices.
One other use case: a busy Python-based website could use this for profiling, see what parts are getting used a lot.
Are there any other technical details we should know?
I should mention that this relates to work done by Sun/Apple on DTrace, which is an analogue to SystemTap. There have been some patches to add this support to Python floating about on the upstream bug tracker for a while - for DTrace. Mark Wielaard added some partial DTrace compatibility to SystemTap, so it looks (during the Python build) like we're running DTrace, but actually it's all shimming into SystemTap.
I'm still trying to figure out how a Normal Python Programmer would get started with this coolness.
I think a pair of screencasts is the way to go, showing rather than telling.
Ok - we'll make a note to make those screencasts. On to debugging?
Tell us about "Easier Python Debugging." What does that mean?
One of the great things about Python is how easy it is to wrap external libraries (e.g. written in C).
What this means is that if you have some code that's written in another language - C is a common example - that you want to interface with in the Python code you're writing - Python makes it easy to do that. You can have your C code and your Python code "talk" to each other by writing a little bit of Python code to go around the C.
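Ed. note: a minimal sketch of Python code "going around the C", using the standard ctypes module to call strlen() from the C library. ctypes is only one of several ways to wrap C and is not necessarily the one David has in mind; the wrapper name c_strlen is hypothetical, and the sketch assumes a POSIX system such as Fedora.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX system. find_library("c") names the C library; if it
# returns None, CDLL(None) falls back to the symbols already loaded into
# the process, which on Linux include libc.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text):
    """Thin Python wrapper around C's strlen()."""
    return libc.strlen(text.encode("utf-8"))


if __name__ == "__main__":
    print(c_strlen("Fedora"))
```

The Python side is only a few lines, but everything inside strlen() runs as plain C, outside the interpreter's exception handling.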
The downside of this is that if one of these libraries has a bug, then that bug takes out the whole of the Python process, without giving you a nice Exception/traceback.
I found an example of a... not-nice Exception/traceback from when this kind of thing happens.
Since we added the ABRT tool, I see a lot of Python crashes - which typically aren't crashes in Python itself, they're crashes in the libraries. I've spent a lot of time debugging these things, and I wanted to make my life easier.
For example, in Fedora 12 (I believe), we shipped GTK-2.18, which contained Alex Larsson's big rewrite of how GTK writes stuff to the screen, greatly reducing on-screen flicker. But the downside is that a few applications broke. An example turned out to be the "istanbul" screencast-recording tool; figuring that out was "fun."
Python has long had a set of macros - small libraries - for gdb, the GNU debugger, that let you connect to a running (or dying) Python process and debug what's going on, but they're fiddly to use and they assume the process is only "lightly broken." For example, they add a "pyo" command, for printing Python objects. In theory, it's equivalent to "print" in Python on that object, but if the object is internally corrupt and you run it, you'll merely get another crash.
The other big problem is that the macros really assume you're proficient with gdb and know your way around the insides of Python. So I started looking for a better way of doing this.
In Fedora 12 (I believe), Fedora gained a shiny new version of gdb. Various people worked on improving C++ debugging, but one of the by-products of that was that gdb 7 now has the ability to be extended using Python. A bunch of Red Hatters added this; it's now possible to write Python code that hooks into the debugger, to pretty-print data types.
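Ed. note: a minimal sketch of what hooking Python into gdb to pretty-print a data type can look like. It is not David's code: it targets a hypothetical C type, struct point { int x; int y; }, and the class and function names are made up for illustration. The "gdb" module exists only inside a gdb process, so the import is guarded.

```python
try:
    import gdb  # only importable inside a gdb built with Python support
except ImportError:
    gdb = None  # lets the file be imported outside gdb, e.g. for testing


class PointPrinter:
    """Pretty-printer for the hypothetical C type struct point."""

    def __init__(self, val):
        self.val = val

    def to_string(self):
        # Inside gdb, val is a gdb.Value; fields are looked up by name
        # and int() converts them to Python integers.
        return "point(x=%d, y=%d)" % (int(self.val["x"]), int(self.val["y"]))


def lookup_point(val):
    """Return a printer for struct point values, or None for anything else."""
    if val.type.strip_typedefs().tag == "point":
        return PointPrinter(val)
    return None


if gdb is not None:
    # gdb consults each function in this list when it prints a value.
    gdb.pretty_printers.append(lookup_point)
```

Once a file like this is loaded into gdb, printing a struct point variable should show something like point(x=1, y=2) instead of raw fields.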
What I did was use this to write Python code that knows about the insides of Python itself, so you now have Python code running inside the gdb process, which knows how to scrape data out of another dying process. The practical upshot is that it's now possible to attach to an already-running Python process with gdb and type commands that show you the Python source code that's currently running, show you a Python-level backtrace, and take you up and down the call stack.
And when you print data, it will tell you what the data is, in a meaningful way. So rather than being told the hexadecimal address of where the object is stored in RAM, gdb should tell you that e.g. you have a [1, 2, 3]. Plus, now if ABRT, the Automatic Bug Reporting Tool, detects a crash of a Python process, the report should automatically include the file/line information at the Python level and the values of all of the Python vars, rather than just hexadecimal noise.
Sounds like another getting-started screencast we should make.
The caveat is that it works well on i686, but less well on x86_64; it ought to work on Python 3, but I think there are some bugs there. I've set it up so that if you install python-debuginfo, it should all Just Work. I think I still have some testing to do on Python 3 for this, so I'd recommend trying it out on Python 2, with i686.
Please file bug reports against "python" and "python3" as appropriate - this stuff lives in the -debuginfo subpackages of those src.rpms. If you see a Python traceback inside gdb, then that's likely a bug in my code; please file a bug if you do see this. The code tries to be robust in the face of arbitrary breakage of the process being debugged - we are trying to debug crashes, after all!
Now, this feature is something that was originally made for Fedora - this is the first place it's come out?
Yes. I also recently got this code into upstream, into Python's SVN repository, and it's likely to be in Python 2.7 when that comes out, though it works fine with 2.6.
In other words, the Python community liked the work you were doing so much they decided to make it part of the Python language itself. This is a nice example of Fedora being a place where innovation happens in free software, then goes upstream to benefit the rest of the open source ecosystem.
I believe Debian and Ubuntu have a version of my patch, though I believe their version of gdb doesn't have all of the patches needed to fully support all the extension commands (though the prettyprinting should work for them).
I'm guessing that testing and feedback are the most helpful things people can do for this feature.
Yes. Please test. I've tried to make it robust, but there are plenty of surprising ways in which a complicated program + libraries can fail - so if you see Python tracebacks inside gdb, please do file bugs. Also, suggestions for ways of making Python easier to debug would be good. For Fedora 14 I want to take this further, e.g. maybe adding python-level breakpoints to gdb.
One nice thing about this feature is that although it's quite "low-level", the code is written in Python, so a Python developer with an idea for making this better may well be able to do so directly. I have a very keen, not-at-all-vested interest in making Python easier to debug!
Thanks, David. By the way, what do you do when you're not hacking?
Hanging out with my wife and cat, puttering about in our garden.
Sounds like the good life.
When it's not raining!
Thanks for taking the time to talk with us, David!