GSOC 2015/Student Application Gulic

Contact Information

Your name: Jovanka Gulicoska
FAS Account: Gulic
Fedora userpage: https://fedoraproject.org/wiki/User:Gulic
Email Address: jovanka.gulicoska AT gmail.com
Blog URL: www.gulic.wordpress.com
Freenode IRC Nick: gulic

NOTE: We require all students to blog about the progress of their project. You are strongly encouraged to register on the Freenode network and participate in our IRC channels. For more information and other instructions contact Org Admins.

Please answer the following questions.

Why do you want to work with the Fedora Project?

I have been using Fedora for a long time. I used it as my development system while working on various GNOME projects, so contributing to the Fedora Project makes perfect sense.

Do you have any past involvement with the Fedora project or any other open source project as a contributor?

I took part in the GNOME Outreach Program for Women from December 2011 to March 2012, working on Empathy. During the program I worked on design, implemented new features and fixed bugs, mostly related to IRC. I also refactored some of the code following the move from GTK2 to GTK3.

In 2012 I was accepted to Google Summer of Code to work on GNOME Boxes, implementing saving and loading (import/export) of VMs. Most of my work was in the libvirt-glib package, where I implemented most of the bindings needed to build the import/export functionality in GNOME Boxes.

In 2013 I took part in GSoC again, working on CERNVM-Online. My task was to create a marketplace for publicly available contexts, so that users can reuse a context definition that someone else has already created instead of writing the same one again.

This year I received an HP Helion OpenStack Scholarship for women. I am currently working with the openstack-infra team and am starting to work on Storyboard, implementing file upload.

I am the president of the Free Software Macedonia organization and a member of KIKA, the local hackerspace in Skopje, where I am involved in the development of many internal projects.

Did you participate with the past GSoC programs, if so which years, which organizations?

Yes:

 - GSoC 2012: GNOME, GNOME Boxes
 - GSoC 2013: CERN, CERNVM-Online

Will you continue contributing/ supporting the Fedora project after the GSoC 2014 program, if yes, which team(s), you are interested with?

Definitely yes. Since I am involved in the free software community in my country, I will continue to contribute to and support Fedora. As I have done so far, I will keep promoting the project and try to get more people involved. I see myself helping and working with the Fedora Infrastructure team.

Why should we choose you over other applicants?

I chose Shumgrepper/summershum because I already have a lot of experience with Python, Flask and SQLAlchemy, and with web development in general. I have worked mostly as a web developer for the past 4 years, so my experience is a good fit for this project. I enjoy learning new things and applying what I already know to a project, and I am not afraid of the challenges I encounter.

Project Details

Overview/Goal

Shumgrepper is a web interface for summershum. It queries summershum's database, which collects the md5sum, sha1sum and sha256sum of every file in every package in Fedora, and so lets the user find files that are duplicated across multiple packages. Shumgrepper was started last year and already has an API that can query by sha1sum, sha256sum, md5sum and tar_sum, find the files bundled within a package, and compare files and tar files. There is still a list of things to work on before Shumgrepper can be deployed and used: design changes, changes to existing functionality, new functionality and improvements. Using datagrepper we can request the data we need from summershum's database. The end goal of my proposal is a working, deployed version of Shumgrepper.
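As a rough illustration of the duplicate detection described above, the sketch below groups file records by hash and keeps the hashes that appear in more than one package. The record layout, the sample rows and the `find_duplicates` helper are assumptions made for this example, not the actual summershum schema or Shumgrepper API.

```python
from collections import defaultdict

# Hypothetical sample rows: (package, filename, sha256sum). Real data would
# come from summershum's database; the hashes here are short placeholders.
ROWS = [
    ("foo", "COPYING", "aaa111"),
    ("bar", "LICENSE", "aaa111"),
    ("bar", "src/util.c", "bbb222"),
    ("baz", "lib/util.c", "bbb222"),
    ("baz", "README", "ccc333"),
    ("foo", "docs/COPYING", "aaa111"),
]


def find_duplicates(rows):
    """Map each hash to its files, keeping only hashes seen in 2+ packages."""
    by_hash = defaultdict(list)
    for package, filename, digest in rows:
        by_hash[digest].append((package, filename))
    return {
        digest: files
        for digest, files in by_hash.items()
        if len({package for package, _ in files}) > 1
    }


duplicates = find_duplicates(ROWS)
for digest, files in sorted(duplicates.items()):
    print(digest, files)
```

A hash that occurs several times inside a single package is deliberately not reported, since the goal stated above is to find files shared between packages.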

Project Description and Functionality

To structure my proposal, I divide the work into 2 sections: Design and Functionality/Improvements.

Design:

Overall, the page needs a complete design to give a better user experience. I have provided some mockups of the design as an example.

design a logo
index page

   Information about Shumgrepper. Maybe also display latest packages versions

documentation page

   Needs changes, since querying will be done differently

query by hashes page

   Checkbox for md5sum, sha1sum, sha256sum, tar_sum and a text field

package list page

   Improve pagination and display of the list of packages

single package page

   Display basic information about package and maintainers, link to Fedora Packages, versions of package

files in package page

   Better representation of files in package

files of package versions page

   Better representation of files in package

query by filename page

   Better representation of file details matching this filename

page for comparing two or more tar_files

   Selecting packages, selecting checkbox common or different, change design for displaying results

package history page

   Better representation of common and different files, so the user can distinguish them; design of the table

Some rough mockups for the design:

https://www.dropbox.com/s/l1jme61bvyf2lgc/Screenshot%20from%202015-03-27%2016%3A27%3A09.png?dl=0
https://www.dropbox.com/s/knyolzekif1jcs5/Screenshot%20from%202015-03-27%2016%3A41%3A16.png?dl=0

I'll work on the mockups in more detail in the period before coding officially starts.

Functionality/Improvements

The new design requires changes to how the already implemented functions are called. For instance, searching by hash will use checkboxes and a text field, so we will need a new function that redirects to the corresponding hash information page. The same goes for query by filename, files in package, and comparing 2 or more files.
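The hash search dispatch could look roughly like the sketch below: it takes the selected hash type and the text field value, validates the value, and returns the path to redirect to. The `hash_result_path` name and the `/<hash_type>/<value>` path layout are hypothetical, and treating tar_sum as an MD5-length digest is an assumption of this example.

```python
import re

# Expected hex-digest lengths. Treating tar_sum as an MD5 digest of the
# tarball is an assumption made for this sketch.
HASH_LENGTHS = {"md5sum": 32, "sha1sum": 40, "sha256sum": 64, "tar_sum": 32}


def hash_result_path(hash_type, value):
    """Return the (hypothetical) result-page path for a hash search.

    Raises ValueError when the hash type is unknown or the value is not a
    hex digest of the expected length.
    """
    if hash_type not in HASH_LENGTHS:
        raise ValueError("unknown hash type: %r" % hash_type)
    value = value.strip().lower()
    if not re.fullmatch(r"[0-9a-f]{%d}" % HASH_LENGTHS[hash_type], value):
        raise ValueError("not a valid %s: %r" % (hash_type, value))
    return "/%s/%s" % (hash_type, value)


print(hash_result_path("md5sum", "d41d8cd98f00b204e9800998ecf8427e"))
```

In a Flask view, the returned path would be passed to a redirect, and a `ValueError` could be turned into an error message shown next to the form.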

When a query returns JSON, the result should be displayed within the page instead of redirecting, with the raw JSON still available if the user needs it.

Counting the number of different copies of the GPL license needs to be implemented. This requires methods that query data from summershum, check the license and return the information we need. It will also provide a way to find out which packages are shipped under the GPL license, and it will require a database modification to store the information. Querying can be done as follows:

 - select a GPL license and get all packages that have that license
 - select a package and get its license
 - check the license in different versions of a package
 - query by sha1sum, sha256sum, md5sum or tar_sum and find the GPL license
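One possible way to count distinct license copies is to group license-named files by their hash, as sketched below. The example uses an in-memory SQLite table from the Python standard library; the table layout, the sample rows, the list of license filenames and the `license_copies` helper are all assumptions for illustration. Deciding which hash corresponds to which GPL version would additionally need a table of known license hashes, which is not shown here.

```python
import sqlite3

# Hypothetical, simplified table; the hashes are readable placeholders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (package TEXT, filename TEXT, sha256sum TEXT)")
conn.executemany("INSERT INTO files VALUES (?, ?, ?)", [
    ("foo", "COPYING", "gpl2-copy-a"),
    ("bar", "COPYING", "gpl2-copy-a"),
    ("baz", "COPYING", "gpl2-copy-b"),
    ("baz", "main.c", "other"),
    ("qux", "LICENSE", "gpl3-copy-a"),
])

# Filenames assumed to hold license texts.
LICENSE_NAMES = ("COPYING", "LICENSE")


def license_copies(conn):
    """Return {sha256sum: sorted package names} for license-named files."""
    placeholders = ", ".join("?" for _ in LICENSE_NAMES)
    query = (
        "SELECT sha256sum, package FROM files "
        "WHERE filename IN (%s) ORDER BY sha256sum, package" % placeholders
    )
    copies = {}
    for digest, package in conn.execute(query, LICENSE_NAMES):
        copies.setdefault(digest, []).append(package)
    return copies


copies = license_copies(conn)
print(len(copies), "distinct license copies")
for digest, packages in copies.items():
    print(digest, packages)
```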

By requesting this data from summershum, we can also use it to find which packages are using the old FSF address.

Add functionality to request a file that has a particular tarball md5sum and a particular filename.
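A minimal sketch of such a lookup, assuming a simplified table with `tar_sum`, `filename` and `sha256sum` columns (the real summershum schema may differ). The `find_file` helper is hypothetical, and SQLite from the standard library stands in for the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (tar_sum TEXT, filename TEXT, sha256sum TEXT)")
conn.executemany("INSERT INTO files VALUES (?, ?, ?)", [
    ("tar-aaa", "README", "hash-1"),
    ("tar-aaa", "setup.py", "hash-2"),
    ("tar-bbb", "README", "hash-3"),
])


def find_file(conn, tar_sum, filename):
    """Return the file matching both tar_sum and filename, or None."""
    row = conn.execute(
        "SELECT tar_sum, filename, sha256sum FROM files "
        "WHERE tar_sum = ? AND filename = ?",
        (tar_sum, filename),
    ).fetchone()
    if row is None:
        return None
    return dict(zip(("tar_sum", "filename", "sha256sum"), row))


print(find_file(conn, "tar-aaa", "README"))
print(find_file(conn, "tar-bbb", "setup.py"))
```

The query uses bound parameters rather than string formatting, so values typed into a search form cannot alter the SQL.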

For the single package page, we can pull some basic information about the package, such as its description and package administrators, from the Fedora Packages page and provide a link to Fedora Packages.

When displaying changes between package versions, or when comparing common/different files, we can use a git-like view or a chart so the user can easily see the difference between two versions. Functions will be written according to the design.
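The comparison behind such a view could be sketched as follows, assuming each version is available as a mapping from filename to hash. The `compare_versions` and `render` helpers, the marker characters and the sample data are illustrative assumptions rather than existing Shumgrepper code.

```python
def compare_versions(old, new):
    """Classify files between two versions given {filename: hash} maps."""
    old_names, new_names = set(old), set(new)
    shared = old_names & new_names
    return {
        "added": sorted(new_names - old_names),
        "removed": sorted(old_names - new_names),
        "changed": sorted(n for n in shared if old[n] != new[n]),
        "unchanged": sorted(n for n in shared if old[n] == new[n]),
    }


def render(diff):
    """Render git-like status lines: '+' added, '-' removed, 'M' changed."""
    markers = (("added", "+"), ("removed", "-"), ("changed", "M"))
    return ["%s %s" % (mark, name) for key, mark in markers for name in diff[key]]


# Hypothetical file lists for two versions of one package.
v1 = {"README": "a1", "src/main.c": "b1", "COPYING": "c1"}
v2 = {"README": "a2", "src/main.c": "b1", "NEWS": "d1"}

diff = compare_versions(v1, v2)
for line in render(diff):
    print(line)
```

Unchanged files are computed but left out of the rendered lines, which keeps the output focused on what actually differs; a chart could use the four counts directly.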

Improve database performance by investigating the effect of adding Postgres indexes. Indexes enhance database performance by allowing the server to find specific rows much faster, but they also add overhead to the database. Postgres offers several index types, such as B-tree, Hash, GiST and GIN. I will work on finding the best-suited index for the database in order to improve speed, and the database models will change accordingly. A fast response from the application is very important, since the data behind the website will grow over time.
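The effect of an index on a hash lookup can be illustrated with a small experiment. The sketch below uses SQLite from the Python standard library as a stand-in for Postgres, so the plan text and the available index types differ from what Postgres would report; the table, the index name and the `query_plan` helper are assumptions for this example. Against Postgres, the same investigation would use `CREATE INDEX` and `EXPLAIN`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (filename TEXT, sha1sum TEXT)")
# Fill the table with placeholder rows; the "hashes" are zero-padded numbers.
conn.executemany(
    "INSERT INTO files VALUES (?, ?)",
    (("file%d" % i, "%040x" % i) for i in range(1000)),
)

QUERY = "SELECT filename FROM files WHERE sha1sum = ?"


def query_plan(conn):
    """Return SQLite's plan description for the hash lookup as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("%040x" % 42,)).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " ".join(row[-1] for row in rows)


plan_before = query_plan(conn)
conn.execute("CREATE INDEX idx_files_sha1sum ON files (sha1sum)")
plan_after = query_plan(conn)

print("before:", plan_before)
print("after: ", plan_after)
```

Before the index exists the plan describes a scan of the whole table; afterwards it names `idx_files_sha1sum`, showing that the lookup goes through the index.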

The history page displays how a package changed across versions, but there is currently no way of determining which files changed and which stayed the same. This can be improved by displaying the changes in a chart or in a git-like style, so the user can more easily determine what has changed.

For a better user experience, instead of redirecting the user between pages, more information about a package (files, versions, etc.) can be combined and displayed on one page.

Unit tests

Deployment of project

Testing

Timeline

Period	Task
April 28 - May 25	Community bonding period: work on design mockups, communicate with mentors and community
May 25	Official program start
May 25 - June 14	Start work on the project: investigate adding Postgres indexes, work on the design and begin implementing it
June 15 - June 25	Design implementation, changes to functions for the design, continue work on adding indexes, implement query by md5 and filename
June 26 - July 3	Mid-term evaluations, testing and improving the application
July 4 - July 19	Work on querying the GPL license, implement design changes
July 20 - August 2	Design changes, unit tests, bug fixing
August 3 - August 16	Fixing bugs, final changes, documentation
August 17 - August 21	Pencils down
August 28	Final evaluation deadline

NOTE: During the internship I will communicate constantly with the mentors and the community, and the timeline will be adjusted if needed according to what has to be done. Important tasks, improvements or implementations that are not covered here will also be implemented.
