GSOC 2012/Student Application syst3mw0rm/HyperKitty


Proposal Description

Objective:
One of the projects that the Fedora Infrastructure Team is undertaking this year is a new archive interface for mailing lists, named HyperKitty. My Summer of Code work would add some of the planned features to HyperKitty. https://fedoraproject.org/wiki/Fedora_Engineering/FY13_Plan#Mailing_List_Improvement_Application

Why HyperKitty?
Mailman needs a replacement for its default pipermail archiver. Pipermail is over 10 years old; users’ expectations have changed, and their requirements are more sophisticated than the current archiver can deliver on. Mailman3 is currently under active development and offers a pluggable architecture in which multiple archivers can be plugged into the core without too much pain.

Some of the drawbacks of pipermail:
 * It does not support stable URLs.
 * It has scalability issues (it is not suitable for organizations that handle hundreds of thousands of messages per day, e.g. Launchpad).
 * The web interface is dated: it does not output standards-compliant HTML, nor does it take advantage of newer technologies such as AJAX.

The HyperKitty archiver addresses most of the drawbacks of pipermail.

Overview:

After a few discussions about archivers, I found that around 80% of the mailing lists currently served by Mailman could be served simply by a maildir with a small database (preferably SQLite) behind a REST architecture. For bigger organizations, Grackle is worth considering. Grackle is still raw and not in production, however, so in my opinion we should concentrate on the major Mailman audience for the time being. I am willing to revisit Grackle integration later.
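As a rough illustration of the "maildir plus small database" idea, the sketch below stores full messages in a Maildir and keeps a small SQLite index for lookups by Message-ID. It uses only the Python standard library. The function names, the table layout, and the file names (`maildir`, `index.sqlite`) are assumptions made for this example; they are not taken from Mailman or HyperKitty. It also assumes every message carries `Message-ID` and `Subject` headers.

```python
import mailbox
import sqlite3
import tempfile
from email.message import EmailMessage
from pathlib import Path


def open_archive(root):
    """Open (or create) a Maildir and a small SQLite index next to it."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    box = mailbox.Maildir(str(root / "maildir"), create=True)
    db = sqlite3.connect(str(root / "index.sqlite"))
    # Hypothetical index table: one row per archived message.
    db.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        " message_id TEXT PRIMARY KEY,"
        " maildir_key TEXT NOT NULL,"
        " subject TEXT)"
    )
    return box, db


def archive_message(box, db, msg):
    """Store the full message in the Maildir and index its headers."""
    key = box.add(msg)
    db.execute(
        "INSERT INTO messages (message_id, maildir_key, subject)"
        " VALUES (?, ?, ?)",
        (str(msg["Message-ID"]), key, str(msg["Subject"])),
    )
    db.commit()
    return key


def fetch_message(box, db, message_id):
    """Find a message through the index, then read it from the Maildir."""
    row = db.execute(
        "SELECT maildir_key FROM messages WHERE message_id = ?",
        (message_id,),
    ).fetchone()
    return box[row[0]] if row else None


def demo():
    """Archive one message in a temporary directory and read it back."""
    with tempfile.TemporaryDirectory() as root:
        box, db = open_archive(root)
        msg = EmailMessage()
        msg["Message-ID"] = "<1@example.org>"
        msg["Subject"] = "Hello"
        msg.set_content("First post.")
        archive_message(box, db, msg)
        found = fetch_message(box, db, "<1@example.org>")
        subject = found["Subject"]
        db.close()
        return subject
```

A REST layer would sit on top of `fetch_message`, mapping a URL to a Message-ID; that part is left out here.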

The options worth considering are:

1. Grackle Archive framework : The framework is being developed by the Launchpad team to solve their scalability problems. They are planning to use Cassandra to store mails at the backend. Apache Cassandra provides scalability with high availability. It is suitable for organizations where hundreds of thousands of messages are sent daily. For most organizations, using Cassandra would be overkill (it is Java-based). Also, they have not yet figured out how to implement full-text search through the archives.

2. HyperKitty : It uses MongoDB to store the mails at the moment. The project is under active development and a demo is available here : http://mm3test.fedoraproject.org/. The project homepage can be found here : https://fedorahosted.org/hyperkitty/. I’m not sure whether it can solve the scalability problem faced by large organizations, but it can certainly replace the pipermail archiver for the 80% of mailing lists where scalability is not an issue.

3. New Archiver : Build a new archiver from scratch with a UI as good as HyperKitty’s, good search functionality, and easy scaling. Since I have no previous experience of working with archivers, I expect building such an archiver would be too difficult for me.

I believe I should devote my time this summer to working on the HyperKitty project as part of GSoC. My reasons include:
 * Avoiding duplication of effort.
 * I will be more likely to achieve my goals with the support of a large, active upstream community.
 * The HyperKitty project is already under active development, so more man power will be available overall and we will be much more likely to release a fully polished archiver by the end of summer.

Project Overview:
HyperKitty is a Mailman3 archiver that aims to address the issues listed at ModernArchiving. It is being developed by people from the Fedora Project. The UI mockups for HyperKitty look very ambitious. Máirín has posted several ideas on her blog at ideas 1 to 16 and ideas 17 to 32.

During GSoC, I’m planning to add the following functionality to HyperKitty, in no particular order:

1. Login mechanism : A login mechanism already exists on the Mailman admin side, and the same has to be implemented on the archiver side. Once logged in, a user will be able to upvote/downvote posts and use any other feature that needs user authentication.

2. Promoting good posts : This feature will let logged-in users upvote/downvote a post based on its relevance and content. It will help Mailman users get to the best posts easily. This will be a standalone feature.

3. User Profile : To begin with, the profile should display the posts the user has previously made. We can also have a concept of ‘karma’ in user profiles, which will show the user’s activity on the mailing list.

4. In thread survey : This will let users give feedback on a particular thread after going through it. While you’re reading a thread, you might be thinking to yourself, “ugh, this thread sucks!” or “wow, this is funny, tee hee.” This could help warn others.

Optional:
I am not sure how much time each feature will take to implement. Also, because HyperKitty is under active development, some of the features may already be implemented by then, which would leave me with free time. In that case, I would also like to implement the following features, as explained in mizmo’s blog post, in the following order:

1. List summary page : Useful for newcomers, who can skim this page to get a rough idea of what the list is about. It will be almost the same as the current recent page, though polishing it will take time.

2. #22 Mentioned in thread refs.

3. #2 Embedded keyword highlights

4. #9 Read via Thread Timeline

5. #14 Keyword-Based Thread Browse

6. #10 Top 10 threads

Future:
I will stay involved with HyperKitty development after my GSoC project is complete. I would like to rethink the backend structure of HyperKitty, which currently uses MongoDB for its data storage. I would like to lead a discussion with mailman-developers about the best backend storage for HyperKitty, and I will implement changes if they seem necessary.

Project Schedule :
[Now - Before Coding Period Starts]
 * Set up a working development environment for Mailman by configuring the MTA on my local machine into a working state that receives incoming mail and delivers it to mailman scripts.
 * Get HyperKitty running on my own instance.
 * Familiarize myself with the archiver structure and how it is going to be plugged into Mailman3.
 * Get familiar with the current code and fix a couple of bugs.
 * Investigate the Postorius UI and interfaces.
 * Lead discussions about each aspect of ModernArchiving with mailman-developers.

[2 weeks] [Login mechanism] Task : Login mechanism for users will be implemented.


 * Understand the Mailman core and get in touch with mailman-developers to learn more about how to implement the login mechanism (Terri and Florian did it during PyCon, IIRC).
 * Login mechanism is to be built on top of mailman core. It’s already there in mailman admin panel, so I will have to explore the code from there.
 * Implement it in archiver.
 * Test to make sure it works.

[2-3 weeks] [Promote good posts] Task : Functionality to let logged-in users promote good posts.


 * Discuss the changes in database.
 * Change the database structure to accommodate the +/- field used for voting.
 * Build the feature in the archiver UI.
 * Solve any unexpected problems encountered.
 * Test to make sure everything works.
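One possible shape for the +/- voting field is sketched below. HyperKitty currently stores mail in MongoDB; SQLite is used here only so the example is self-contained, and the actual schema would be agreed with mailman-developers first. The table name, column names, and helper functions are all hypothetical. The sketch assumes each logged-in user gets at most one vote per post and that voting again replaces the earlier vote.

```python
import sqlite3

# Hypothetical schema: one row per (post, voter) pair, value is +1 or -1.
SCHEMA = """
CREATE TABLE IF NOT EXISTS votes (
    message_id TEXT NOT NULL,
    user_email TEXT NOT NULL,
    value INTEGER NOT NULL CHECK (value IN (-1, 1)),
    PRIMARY KEY (message_id, user_email)
)
"""


def open_votes_db(path=":memory:"):
    """Open a database and make sure the votes table exists."""
    db = sqlite3.connect(path)
    db.execute(SCHEMA)
    return db


def cast_vote(db, message_id, user_email, value):
    """Record an upvote (+1) or downvote (-1), replacing any earlier vote."""
    db.execute(
        "INSERT OR REPLACE INTO votes (message_id, user_email, value)"
        " VALUES (?, ?, ?)",
        (message_id, user_email, value),
    )
    db.commit()


def score(db, message_id):
    """Return upvotes minus downvotes for a post (0 if nobody voted)."""
    row = db.execute(
        "SELECT COALESCE(SUM(value), 0) FROM votes WHERE message_id = ?",
        (message_id,),
    ).fetchone()
    return row[0]
```

The composite primary key is what enforces one vote per user per post, and the `CHECK` constraint rejects any value other than +1 or -1 with `sqlite3.IntegrityError`.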

[2 weeks] [Basic User Profile] Task : Basic working user profile.


 * Build a basic user profile that displays basic information about the user.
 * Make the user profile editable.
 * Test to make sure everything works.

[2 weeks] [first phase of testing + fix bugs + Buffer period] Task : write down test cases for features implemented so far. This time is also reserved for any unexpected delays and fixing bugs as they arise.


 * Write documentation for the code written so far.
 * Handle any unexpected problems that have come up.
 * Begin rigorous testing by writing out detailed test cases based on expected outcomes.

[Mid Term Evaluation]

[1-2 weeks] [Enhance user profiles]


 * Discuss the concept of ‘karma’ with mailman-developers and get feedback.
 * Introduce the karma concept in user profiles.
 * Save user preferences in the user profile.
 * Come up with an algorithm/strategy to calculate karma.
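The karma strategy is still to be discussed, so the following is only a placeholder to make the idea concrete. It assumes karma combines how much a user posts with how their posts were voted on. The function name and both weights are hypothetical and would be tuned after feedback from mailman-developers.

```python
def karma(post_scores, post_weight=1, vote_weight=2):
    """Hypothetical karma: reward activity plus the net votes received.

    post_scores holds one net vote total (upvotes minus downvotes)
    for each post the user has made.
    """
    activity = post_weight * len(post_scores)
    reputation = vote_weight * sum(post_scores)
    return activity + reputation
```

For example, a user with three posts scored 3, -1 and 0 would have a karma of `1 * 3 + 2 * 2 = 7` under these default weights.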

[3 weeks] [In thread survey] Task : In thread survey.


 * Discuss the design in depth with mizmo.
 * Change the database structure accordingly.
 * Integrate the survey into the UI.

[1-2 weeks] [Buffer time + bug fixing + documentation + final wrap up]


 * Buffer Time for any unpredicted delay.
 * Work on any unexpected/related things that come up
 * Continue testing and fixing bugs as a result of testing
 * Wrap up testing/bug fixing

[Final evaluation]

Biography :

I am a 3rd-year undergraduate student in Computer Science at IIT Roorkee. I am an open source lover. I like to contribute to open source software because it gives me experience and technical expertise, and it has taught me the spirit of teamwork. Last year I was part of the KDE Summer of Code, working with the ownCloud project. I’m quite comfortable with Git, and bzr is not much different in basic functionality. I have working knowledge of Python and Django. I have worked with PostgreSQL and MySQL before, but I do not have much experience with MongoDB.

Communication

Email will be my preferred method of communication. I will send mails about problems encountered and progress made to mailman-developers. I will also try to post a blog entry about my progress every week. I am available 24x7 on IRC through a quassel client-core setup, so I won’t miss any IRC discussions.

Have you communicated with a potential mentor? If so, who?
Yes. I am in touch with Toshio Kuratomi (toshio@fedoraproject.org, a.badger@gmail.com). I have also contacted Duffy (duffy@redhat.com), who might be willing to act as co-mentor. Pingou can also help me during the course of the project as a potential co-mentor.