GSOC 2012/Student Application Alexandermezin/Java API changes checker

=Java API changes checker=

Overview
Original idea: Summer_coding_ideas_for_2012

''Libraries written in Java add, remove and modify their public interfaces from time to time. This is normal, but currently it is very hard to guess effect an update of library to new version will have on rest of the system. What is needed is a tool that would be able to tell us that "With update of package java-library to version 2.0, function X(b) has been removed. This function is used in package java-app" ''

I will create a tool that takes a set of .jar archives (and/or .class files) and tries to resolve the dependencies between them, much as a C/C++ linker does. Checking whether a new version of a library breaks anything will then be easy: replace the library with the new version and check the dependencies again. If the dependency check fails, the tool must output all information that could help. All dependency information must be generated automatically from the .jar/.class files. Since an RPM is just a specially structured archive, it will be easy to read .jar and .class files from it. Old dependency information must be kept somewhere, so that error reports contain not only the message "Class/method xxx required by yyy not found" but also "Xxx was provided by zzz".
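
The check described above can be sketched as follows. This is only an illustration, assuming each archive has already been reduced to sets of symbol strings; the names `UpgradeCheck` and `Artifact` and the symbol format `Class.method(descriptor)` are hypothetical and not part of any existing tool.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class UpgradeCheck {
    /** Hypothetical model: an archive reduced to what it defines and what it needs. */
    static final class Artifact {
        final String name;
        final Set<String> defined;
        final Set<String> required;

        Artifact(String name, Set<String> defined, Set<String> required) {
            this.name = name;
            this.defined = defined;
            this.required = required;
        }
    }

    /**
     * Reports every symbol that an artifact in newSet requires and that no
     * artifact in newSet defines. The old set is consulted only to say which
     * artifact used to provide the missing symbol.
     */
    static List<String> check(List<Artifact> newSet, List<Artifact> oldSet) {
        Set<String> available = new HashSet<>();
        for (Artifact a : newSet) {
            available.addAll(a.defined);
        }
        List<String> errors = new ArrayList<>();
        for (Artifact user : newSet) {
            // TreeSet gives a stable, sorted report order.
            for (String symbol : new TreeSet<>(user.required)) {
                if (available.contains(symbol)) {
                    continue;
                }
                String message = symbol + " required by " + user.name + " not found";
                for (Artifact old : oldSet) {
                    if (old.defined.contains(symbol)) {
                        message += "; it was provided by " + old.name;
                        break;
                    }
                }
                errors.add(message);
            }
        }
        return errors;
    }
}
```

In a real run the symbols provided by the Java runtime itself would also have to be part of the set, otherwise every reference to a standard class would be reported as missing.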

I also think another feature would be useful: the tool will be able to find all dependencies of a specific jar/package automatically.

The need you believe it fulfills
Java typically searches for a class or method at runtime, when the currently executing code requests it. If you launch a Java application and it reports no errors at startup, you still can't be sure that it won't crash later because of unsatisfied dependencies.
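
A small illustration of this late failure, under stated assumptions. It uses an explicit `Class.forName` lookup as a stand-in, because reproducing a genuine link error (such as `NoSuchMethodError`) needs two separately compiled versions of a library. The class name `com.example.pdf.PdfWriter` is made up and assumed to be absent from the classpath.

```java
final class LazyLookupDemo {
    /** Start-up path: touches no optional classes, so it succeeds. */
    static String start() {
        return "started";
    }

    /** Rarely used path: the missing class is only noticed when this runs. */
    static String export() {
        try {
            // Hypothetical class name, assumed not to exist on the classpath.
            Class<?> writer = Class.forName("com.example.pdf.PdfWriter");
            return "exported with " + writer.getName();
        } catch (ClassNotFoundException e) {
            return "export failed: class not found";
        }
    }
}
```

Calling `start()` gives no hint of a problem; the missing dependency only shows up once `export()` is reached.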

The proposed tool will be able to ensure that all dependencies can be resolved without launching, or even installing or unpacking, Java applications. It will make updates of Java applications and libraries easier and safer, and will make life easier for Java package maintainers.

Any relevant experience you have
I have a lot of experience writing code in Java, because my university uses Java in more than half of its programming courses. I am also familiar with the packaging system. I haven't created a new package yet, but I am able to add a patch or rebuild the kernel package with my own configuration.

How do you intend to implement your proposal
I will write a class library and a set of command-line utilities that will:

1. Parse Java .class files and generate lists of the methods they define and of the methods and classes referenced by the code inside each class. Parsing will be done with a third-party library, possibly Apache Commons BCEL.

Class files in .jar archives can easily be read using Java's class library; .rpm files can be opened using jRPM.
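
To make the "referenced" half of step 1 concrete, here is a rough sketch that reads only the constant pool of a class file. It deliberately uses a small hand-written reader based on the JDK alone instead of BCEL, so it is a substitute for illustration rather than the planned implementation. The class name `ClassRefScanner` is hypothetical. Defined methods live in the methods table that follows the constant pool, which this sketch does not read.

```java
import java.io.DataInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.Set;
import java.util.TreeSet;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

final class ClassRefScanner {
    /** Internal class names such as "java/lang/Object" (includes the class itself and array types). */
    final Set<String> classes = new TreeSet<>();
    /** Method references in the form owner.name(descriptor). */
    final Set<String> methods = new TreeSet<>();

    /** Adds every class and method named in the constant pool of one class file. */
    void scan(InputStream classFile) throws IOException {
        DataInputStream in = new DataInputStream(classFile);
        if (in.readInt() != 0xCAFEBABE) {
            throw new IOException("not a class file");
        }
        in.readUnsignedShort(); // minor version
        in.readUnsignedShort(); // major version
        int count = in.readUnsignedShort();
        int[] tag = new int[count];
        int[] first = new int[count];   // first index stored in an entry
        int[] second = new int[count];  // second index stored in an entry
        String[] text = new String[count];
        for (int i = 1; i < count; i++) {
            tag[i] = in.readUnsignedByte();
            switch (tag[i]) {
                case 1: text[i] = in.readUTF(); break;             // Utf8
                case 3: case 4: in.readInt(); break;               // Integer, Float
                case 5: case 6: in.readLong(); i++; break;         // Long, Double use two slots
                case 7: case 8: case 16: case 19: case 20:         // Class, String, MethodType, Module, Package
                    first[i] = in.readUnsignedShort(); break;
                case 9: case 10: case 11: case 12: case 17: case 18: // refs, NameAndType, (Invoke)Dynamic
                    first[i] = in.readUnsignedShort();
                    second[i] = in.readUnsignedShort();
                    break;
                case 15: in.readUnsignedByte(); in.readUnsignedShort(); break; // MethodHandle
                default: throw new IOException("unknown constant pool tag " + tag[i]);
            }
        }
        for (int i = 1; i < count; i++) {
            if (tag[i] == 7) {
                classes.add(text[first[i]]);
            } else if (tag[i] == 10 || tag[i] == 11) { // Methodref, InterfaceMethodref
                String owner = text[first[first[i]]];
                int nameAndType = second[i];
                methods.add(owner + "." + text[first[nameAndType]] + text[second[nameAndType]]);
            }
        }
    }

    /** Scans every .class entry of a .jar archive using java.util.jar. */
    static ClassRefScanner scanJar(File jarFile) throws IOException {
        ClassRefScanner result = new ClassRefScanner();
        try (JarFile jar = new JarFile(jarFile)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (entry.isDirectory() || !entry.getName().endsWith(".class")) {
                    continue;
                }
                try (InputStream in = jar.getInputStream(entry)) {
                    result.scan(in);
                }
            }
        }
        return result;
    }
}
```

A call such as `ClassRefScanner.scanJar(new File("some-library.jar"))` (the file name is a placeholder) would collect the references of a whole archive.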

2. Merge the lists generated in the previous step and produce a combined list of defined methods/classes and of still-required methods/classes (unresolved dependencies).

This can be used to generate lists of defined/required methods and classes for a .jar file or even an .rpm package.

API changes can also be viewed by running diff on the lists generated from different versions of a library/package.
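
The merge in step 2 and the comparison of two versions come down to set operations. A minimal sketch, assuming symbols are plain strings; the class `SymbolLists` and its method names are hypothetical.

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

final class SymbolLists {
    final Set<String> defined;
    final Set<String> required;

    SymbolLists(Collection<String> defined, Collection<String> required) {
        this.defined = new TreeSet<>(defined);
        this.required = new TreeSet<>(required);
    }

    /**
     * Combines per-class lists into one list for a whole archive.
     * Anything required by one part and defined by another is resolved
     * internally, so only the still-unresolved symbols stay in "required".
     */
    static SymbolLists merge(Collection<SymbolLists> parts) {
        Set<String> defined = new TreeSet<>();
        Set<String> required = new TreeSet<>();
        for (SymbolLists part : parts) {
            defined.addAll(part.defined);
            required.addAll(part.required);
        }
        required.removeAll(defined);
        return new SymbolLists(defined, required);
    }

    /** Symbols defined by the old version but not by the new one (removed API). */
    static Set<String> removed(Set<String> oldDefined, Set<String> newDefined) {
        Set<String> result = new TreeSet<>(oldDefined);
        result.removeAll(newDefined);
        return result;
    }

    /** Symbols defined by the new version but not by the old one (added API). */
    static Set<String> added(Set<String> oldDefined, Set<String> newDefined) {
        return removed(newDefined, oldDefined);
    }
}
```

Because the sets are sorted, writing them out one symbol per line also gives text files that an ordinary diff can compare.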

3. Record defined classes/methods in a database, together with the package/library name and version, and provide search by class name/method signature on this database.

To check a set of .jars/.rpms, lists of classes/methods should be generated for every class inside them, and all these lists should be merged. After that, every element of the list of still-required classes/methods can be looked up in the database.
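
The lookup in step 3 can be sketched with an in-memory map standing in for the database. The storage choice, the class `ProviderIndex` and the `Provider` record layout are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ProviderIndex {
    /** A package or library version that defines some symbol. */
    static final class Provider {
        final String name;
        final String version;

        Provider(String name, String version) {
            this.name = name;
            this.version = version;
        }

        @Override
        public String toString() {
            return name + "-" + version;
        }
    }

    // Stand-in for the database table: symbol -> everything that defines it.
    private final Map<String, List<Provider>> bySymbol = new HashMap<>();

    /** Records all symbols defined by one version of a package or library. */
    void record(String name, String version, Collection<String> definedSymbols) {
        Provider provider = new Provider(name, version);
        for (String symbol : definedSymbols) {
            bySymbol.computeIfAbsent(symbol, key -> new ArrayList<>()).add(provider);
        }
    }

    /** Returns every recorded provider of a class name or method signature. */
    List<Provider> search(String symbol) {
        List<Provider> found = bySymbol.get(symbol);
        return found == null
                ? Collections.<Provider>emptyList()
                : Collections.unmodifiableList(found);
    }

    /** Looks up each still-required symbol; an empty list means nothing provides it. */
    Map<String, List<Provider>> resolve(Collection<String> stillRequired) {
        Map<String, List<Provider>> result = new TreeMap<>();
        for (String symbol : stillRequired) {
            result.put(symbol, search(symbol));
        }
        return result;
    }
}
```

Keeping the version in each record is what allows a report to say that a symbol missing from the new version was provided by an older one.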

Such a set of utilities won't be restricted to checking whether an update breaks something. For example, it will be able to generate a list of dependencies automatically. It will even be possible to generate .spec files from the output of these utilities.
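
As a rough illustration of the dependency-list idea, the sketch below maps still-required symbols to the packages that provide them and prints them as `Requires:` lines of the kind used in .spec files. The class `DependencyLister` and the symbol-to-package map are hypothetical; a real tool would take this mapping from the database described in step 3.

```java
import java.util.Collection;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

final class DependencyLister {
    /**
     * Returns the packages needed to satisfy the required symbols.
     * Symbols without a known provider are skipped here; a real tool
     * would report them as unresolved.
     */
    static SortedSet<String> dependencies(Collection<String> required,
                                          Map<String, String> providerOf) {
        SortedSet<String> packages = new TreeSet<>();
        for (String symbol : required) {
            String provider = providerOf.get(symbol);
            if (provider != null) {
                packages.add(provider);
            }
        }
        return packages;
    }

    /** Formats package names as one "Requires:" line each. */
    static String requiresLines(Collection<String> packages) {
        StringBuilder lines = new StringBuilder();
        for (String name : packages) {
            lines.append("Requires: ").append(name).append('\n');
        }
        return lines.toString();
    }
}
```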

I think it will be useful to also implement the command-line utilities as Ant tasks.

It may seem that I am trying to implement too many features, but I think that Apache BCEL will make development significantly easier, and the full source code will not exceed 3000-4000 lines.

Final deliverable of the proposal at the end of the period

 * Command-line Java utilities and a set of Ant tasks.
 * RPM packages of these utilities.
 * Possibly separate packages for dependencies.

A rough timeline for your progress

 * April 24 - May 15 - Discuss additional details of the project's implementation and future usage. I should have done this earlier, but I saw this project idea too late.
 * May 15 - June 1 - Implement a utility that generates lists of defined and required methods/classes for a set of .class files. With Apache BCEL this should take only a few lines of code.
 * June 1 - July 1 - Add support for .jar archives and .rpm packages, and write the Ant tasks. This gives me a fully usable part of the project before the mid-term evaluations; at the least, it will be able to show API differences between library versions.
 * July 1 - August 1 - Implement a database for storing information about defined classes/methods, and add search support for it to the code written earlier.
 * August 1 - August 13 - Make RPM packages for the applications/libraries developed and, possibly, for dependencies that aren't in the Fedora repositories yet.

I have exams in June, so most of the work will be done in July.