GSOC 2017/Student Application pravinkc

From FedoraProject

Contact Information

Name: Pravin Chaudhary
Email: pravin.chaudhary.me@gmail.com
IRC: pravinkc


About Me

I have tinkered with computers for a very long time and have been fascinated by them for as long as I can remember. My journey with computer science has reinforced one continuous principle: going down the levels of abstraction.

When I first started using MS Paint, I did not know (or care) about operating systems, programs, or hardware. All I had was the MS Paint interface to draw pictures in, and it was fascinating. Later, I went down one layer of abstraction and realized that MS Paint runs on top of MS Windows. I was astonished to learn that Windows is not the only operating system out there; there are many more. I still remember the first time I tried GNU/Linux (it was Ubuntu). Later, I realized that there are many distros and Ubuntu is only one of them.

I stumbled upon Fedora and have stuck with it ever since. I have always wanted to contribute to the project, and I am glad to finally have this opportunity to do so.

The Project

The project I wish to work on is the Centralized Metrics generation application.

I chose this project because of its importance to the overall objectives of the Fedora Infrastructure team. Managing the tens of live apps within Fedora is no easy task, and the fedmsg project solves this problem to a large extent. We also have tools like datagrepper, which can query past fedmsg messages directly and can be combined with scripts to generate reports. However, querying datagrepper directly without caching its output is slow and inefficient, and building apps that consume specific metrics, such as all references to a single user, is difficult when using datagrepper directly.
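To illustrate why caching matters, here is a minimal Python sketch of memoizing datagrepper queries. The `/raw` endpoint and its `user` and `delta` query parameters belong to datagrepper's HTTP API; `CachedDatagrepper`, `build_query_url` and `http_fetch` are hypothetical helpers, and the five-minute TTL is an arbitrary assumption.

```python
"""Sketch: cache datagrepper query results so identical queries are not refetched.

CachedDatagrepper, build_query_url and http_fetch are hypothetical helpers;
only the /raw endpoint and its query parameters come from datagrepper.
"""
import json
import time
from typing import Callable, Dict, Tuple
from urllib.parse import urlencode
from urllib.request import urlopen

DATAGREPPER_RAW = "https://apps.fedoraproject.org/datagrepper/raw"


def build_query_url(**params) -> str:
    """Build a /raw URL; sorting the params makes equal queries give equal URLs."""
    return DATAGREPPER_RAW + "?" + urlencode(sorted(params.items()))


def http_fetch(url: str) -> dict:
    """Plain HTTP GET returning the decoded JSON body (no error handling)."""
    with urlopen(url) as response:
        return json.load(response)


class CachedDatagrepper:
    """Memoize responses per URL for `ttl` seconds."""

    def __init__(self, fetch: Callable[[str], dict] = http_fetch, ttl: float = 300.0):
        self.fetch = fetch  # injectable, so tests need no network access
        self.ttl = ttl
        self._cache: Dict[str, Tuple[float, dict]] = {}

    def query(self, **params) -> dict:
        url = build_query_url(**params)
        now = time.monotonic()
        hit = self._cache.get(url)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # fresh cached copy: no round trip to datagrepper
        result = self.fetch(url)
        self._cache[url] = (now, result)
        return result
```

For example, `CachedDatagrepper().query(user="pravinkc", delta=86400)` would hit datagrepper once and serve repeats of that query from memory until the TTL expires.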

This problem is solved by statscache: it is a plugin for the fedmsg-hub that listens to messages on the bus, much like datagrepper. Where it shines is in its:

  • extensibility: it supports plugins that subscribe to specific "topics" on the fedmsg bus
  • fault tolerance: if statscache goes down, it falls back on querying datagrepper for the missed messages and passes them to the plugins, so no data is lost
  • REST API: it also has a Flask front end that can be queried for the data each plugin collects
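The extensibility and fault-tolerance points above can be sketched as follows. This is not statscache's real plugin API: `Plugin`, `CountPlugin`, `dispatch` and `backfill` are hypothetical names, and `fetch_since` stands in for a datagrepper query for messages after a given timestamp. Each plugin records the timestamp of the newest message it has handled, so a replay after downtime delivers only what it missed.

```python
"""Sketch of topic-based plugin dispatch with a backfill after downtime.

All names are hypothetical illustrations, not statscache's actual API.
Messages are plain dicts shaped like fedmsg messages:
{"topic": ..., "timestamp": ..., "msg": {...}}.
"""
from typing import Callable, Dict, Iterable, List


class Plugin:
    """A plugin subscribes to topic prefixes and keeps its own data."""

    topics: List[str] = []

    def __init__(self):
        self.last_seen = 0.0  # timestamp of the newest message handled

    def wants(self, topic: str) -> bool:
        return any(topic.startswith(prefix) for prefix in self.topics)

    def handle(self, message: dict) -> None:
        raise NotImplementedError


class CountPlugin(Plugin):
    """Example plugin: counts FAS messages per topic."""

    topics = ["org.fedoraproject.prod.fas."]

    def __init__(self):
        super().__init__()
        self.counts: Dict[str, int] = {}

    def handle(self, message: dict) -> None:
        topic = message["topic"]
        self.counts[topic] = self.counts.get(topic, 0) + 1


def dispatch(plugins: Iterable[Plugin], message: dict) -> None:
    """Deliver one message to every subscribed plugin that has not seen it yet."""
    for plugin in plugins:
        if plugin.wants(message["topic"]) and message["timestamp"] > plugin.last_seen:
            plugin.handle(message)
            plugin.last_seen = message["timestamp"]


def backfill(plugins: List[Plugin], fetch_since: Callable[[float], Iterable[dict]]) -> None:
    """Replay messages newer than the oldest plugin checkpoint, in chronological order."""
    if not plugins:
        return
    oldest = min(p.last_seen for p in plugins)
    for message in fetch_since(oldest):
        dispatch(plugins, message)
```

Comparing timestamps keeps the sketch short; a real implementation would rather deduplicate on each message's `msg_id`, since two messages can share a timestamp.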

The idea is officially described as:

   Right now, metrics collection in CommOps is not very efficient and requires a lot of manual work. Metrics for various events/FAS groups/users are collected using scripts which query datagrepper and return results. This process is very time consuming and writing scripts each time is a very tedious process. Also, querying the datagrepper to get data every time is redundant and time-consuming.


I will be working on the following two broad areas:

I) Refactoring Statscache itself

1) extend the REST API exposed by statscache by adding more endpoints and refactoring existing ones
2) integrate a dashboard framework within statscache

  • which can be used to view the data from various loaded plugins (currently, the data can be viewed as JSON only)
  • each plugin would not need custom graphics logic; it would just focus on gathering and persisting data
  • statscache core would query the plugin data and generate the graphics

3) add more tests to statscache, comment existing and new code, and write documentation
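To make the "plugins gather, the core draws" split concrete, a rough sketch: the core takes any plugin's `{label: number}` data and renders a standalone SVG bar chart, so no plugin carries drawing code. `render_bar_chart` and its layout constants are hypothetical, and a real dashboard would more likely build on an existing charting library.

```python
"""Sketch: the core renders any plugin's {label: number} data as an SVG bar chart.

render_bar_chart is a hypothetical helper, not an existing statscache function.
"""
from typing import Mapping
from xml.sax.saxutils import escape


def render_bar_chart(data: Mapping[str, float], width: int = 400, bar_height: int = 20) -> str:
    """Return a standalone SVG document with one horizontal bar per label."""
    peak = max(data.values(), default=0) or 1  # avoid dividing by zero
    label_space = 120                          # pixels reserved for the labels
    rows = []
    for i, (label, value) in enumerate(data.items()):
        y = i * (bar_height + 5)
        bar = (width - label_space) * value / peak  # scale bars to the largest value
        rows.append(
            f'<text x="0" y="{y + bar_height - 5}">{escape(str(label))}</text>'
            f'<rect x="{label_space}" y="{y}" width="{bar:.1f}" height="{bar_height}"/>'
        )
    height = max(len(rows), 1) * (bar_height + 5)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        + "".join(rows)
        + "</svg>"
    )
```

A plugin would then only need to expose something like `{"fas": 10, "buildsys": 5}`, and the same renderer serves every plugin on the dashboard.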

II) Writing plugins for statscache

1) integrate fedora-stats-tool into statscache

  • fedora-stats-tool is a command-line tool used to prepare SVG charts by querying datagrepper
  • this tool could be better remodeled as a statscache plugin, making it more efficient since it would no longer have to rely on loose scripts that query datagrepper

2) integrate thisweekinfedora into statscache

  • this tool is a CLI as well and relies on querying datagrepper.
  • can be refactored to be a plugin to statscache

3) write a reporting plugin

  • this plugin would capture weekly/monthly/quarterly/yearly reports on any metric as required
  • examples include activity by a certain user in the last year, or build failures of a certain app in the last week
  • the reports would be generated in various formats (SVG, HTML, possibly PDF) by the statscache core using its dashboard framework (see above)

4) write a newcomer plugin

  • this plugin would focus on data about newcomers to the Fedora community, by tracking all references to them on the fedmsg bus
  • it can be used to track their progress in the community, the badges they earn, their activity, etc.
  • this can improve the onboarding experience

The final list of plugins to be built will be decided in discussion with the mentors and the community: https://pagure.io/fedora-commops/issue/105

5) explore the possibility of using the fedmsg bus for static analysis results and, if feasible, write a plugin for them

  • currently, the static analyzer team uses XML to transfer static scan findings
  • it would be better to use fedmsg instead, as this would bring the group in line with the standard practice followed by the community
  • this would also facilitate building of a metrics dashboard within statscache for the static analyzer results
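For the reporting plugin (item 3 above), the heart of a weekly/monthly/quarterly/yearly report is bucketing message timestamps. A minimal sketch, assuming each message is a dict with a UTC epoch `timestamp` (as fedmsg messages carry); `bucket_key` and `report` are hypothetical names. Weeks use ISO 8601 numbering, so early January can fall in the previous year's last week.

```python
"""Sketch: aggregate messages into weekly/monthly/quarterly/yearly report buckets.

bucket_key and report are hypothetical helpers; messages are assumed to be
dicts with a UTC epoch "timestamp" field.
"""
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable


def bucket_key(timestamp: float, period: str) -> str:
    """Map a UTC epoch timestamp to a report bucket label."""
    day = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    if period == "week":
        year, week, _ = day.isocalendar()  # ISO year may differ from calendar year
        return f"{year}-W{week:02d}"
    if period == "month":
        return f"{day.year}-{day.month:02d}"
    if period == "quarter":
        return f"{day.year}-Q{(day.month - 1) // 3 + 1}"
    if period == "year":
        return str(day.year)
    raise ValueError(f"unknown period: {period!r}")


def report(messages: Iterable[dict], period: str) -> Counter:
    """Count messages per bucket, e.g. a user's activity per month."""
    return Counter(bucket_key(m["timestamp"], period) for m in messages)
```

Feeding `report` the messages that mention one user, or the build failures of one package, would give the per-period counts that the core could then render.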


I have been in contact with the mentors, skamath and bee2502, and discussed my ideas with them. I also wrote a plugin for statscache that reads a list of users to track from an environment variable and shows all references to them on the fedmsg bus.
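The user-tracking plugin mentioned above, which is also the basis of the proposed newcomer plugin, boils down to two steps: parse the users to track from the environment, then keep the messages that mention them. A sketch under stated assumptions: the variable name `TRACKED_USERS` and the precomputed `usernames` key on each message are hypothetical; with fedmsg, the referenced users would typically be derived with `fedmsg.meta.msg2usernames`.

```python
"""Sketch: track references to users listed in an environment variable.

TRACKED_USERS and the per-message "usernames" key are hypothetical; with
fedmsg the referenced users would come from fedmsg.meta.msg2usernames().
"""
import os
from typing import Dict, Iterable, List, Set


def tracked_users(var: str = "TRACKED_USERS") -> Set[str]:
    """Parse a comma-separated list such as "alice, bob" into {"alice", "bob"}."""
    raw = os.environ.get(var, "")
    return {name.strip() for name in raw.split(",") if name.strip()}


def references(messages: Iterable[dict], users: Set[str]) -> Dict[str, List[dict]]:
    """Group messages under each tracked user they mention, keeping bus order."""
    hits: Dict[str, List[dict]] = {user: [] for user in users}
    for message in messages:
        for user in users & set(message.get("usernames", ())):
            hits[user].append(message)
    return hits
```

Starting the hub with, say, `TRACKED_USERS="pravinkc, skamath"` would then collect every message that references either user.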

On successful completion of this project, we would have a central metrics generation system for all of Fedora. The following projects would be rewritten as statscache plugins: fedora-stats-tools, thisweekinfedora, gsoc-stats and cardsite. This would make them more robust (tolerant of outages), more efficient (they would not have to query datagrepper directly), and give them a unified UI. Currently, each has its own custom logic for generating SVGs, PDFs, etc. This custom logic can be removed from the plugins so that they focus only on gathering data, while the statscache core provides the dashboard/graphics framework that handles the graphics logic.


Project Timeline and Workflow

Timeline

May 4 - June 15
  Details:
  • Refactor the statscache code, add documentation, and write tests
  • Focus on adding more comments and making the code PEP 8 compliant
  Deliverables:
  • A refactored statscache core with better-commented code, more tests and better documentation

June 15 - June 26
  Details:
  • Work on designing the graphics framework within the statscache core
  • Expose more REST API endpoints in statscache for querying the plugins
  Deliverables:
  • A detailed layout for the graphics framework
  • A dashboard with basic graphics, such as the count of entries captured by each plugin

June 26: END OF PHASE 1
  The first phase focuses on the statscache core. Next, I will complete the graphics framework and add plugins.

June 26 - July 28
  Details:
  • Finish the graphics framework and write tests for it
  • Start moving the datagrepper CLI tools into the statscache ecosystem as plugins
  Deliverables:
  • fedora-stats-tools plugin
  • thisweekinfedora plugin
  • gsoc-stats plugin

July 28 - August 29
  Details:
  • Add more plugins (cardsite), clear backlogs (if any), and fix any inconsistencies in the project
  • Write additional tests for the entire ecosystem (statscache + plugins)
  • Write extensive documentation
  Deliverables:
  • Finished project, ready for deployment


Why Me

I want to continue my journey of diving down layers of abstraction by understanding the internals of a large project like Fedora. I am a quick learner and have a track record of completing what I propose and delivering more on top of it. I will continue to work on Fedora after the summer, and I hope this GSoC will be only the start of a lifelong relationship.

Prior Commitments and Plans During the Summer

I have no commitments other than GSoC this summer. I can happily put in over 50 hours per week on this project and will be available on IRC, email and Skype throughout the project period.