From Fedora Project Wiki
 
(72 intermediate revisions by the same user not shown)
Line 30: Line 30:
'''Great Learning Opportunity'''
'''Great Learning Opportunity'''


Due to the flat hierarchy in Fedora, I have already collaborated with or worked under some of the long term contributors and important figures in Fedora community. This experience has been a great learning opportunity in many different ways and I look forward to many such chances in the future.
Due to the flat hierarchy in Fedora, I have already collaborated with or worked under some of the long term contributors and important figures in the Fedora community. This experience has been a great learning opportunity in many different ways and I look forward to many such chances in the future.


I look forward to working with and being involved in Fedora. I aim to stick around and become a long-term contributor in the Fedora community.
I look forward to working with and being involved in Fedora. I aim to stick around and become a long-term contributor in the Fedora community.
Line 38: Line 38:
Yes, I have been involved with the Community Operations team for the past six months. Some of my past contributions include:
Yes, I have been involved with the Community Operations team for the past six months. Some of my past contributions include:


* Collaborated with [[User:Jflory7|Justin Flory]] on [https://communityblog.fedoraproject.org/women-in-computing-and-fedora/ Women in Computing and Fedora article].
==== Statistics related Contributions ====


* Helped [[User:Jkurik|Jan Kurik]] organize ''' F23 Elections''' ! Also compiled the post-election metrics. Read more about the F23 elections on the CommOps retrospective [https://communityblog.fedoraproject.org/commops-2015-elections-retrospective/ here].
* Data Analytics to understand impact of FOSDEM : [https://networksfordata.wordpress.com/2016/03/08/fedora-at-fosdem/ read here] and [https://github.com/fedora-infra/fedora-stats-tools/blob/develop/event-activity.py code here]
* Year in Review metrics for Fedora CommOps : read the report with information about API queries, analysis and data visualizations [https://networksfordata.wordpress.com/2016/01/22/2015-in-numbers-fedora-commops/ here]
* Community Blog statistics : read the report with information about API queries, analysis and data visualizations [https://networksfordata.wordpress.com/2016/01/22/2015-in-numbers-fedora-community-blog/ here]
* Outreachy Impact metrics : [https://communityblog.fedoraproject.org/women-in-computing-and-fedora/ read here] and [https://apps.fedoraproject.org/datagrepper/charts/line?user=charul&user=pjha&user=riecatnor&user=ktnode&user=housewifehacker&user=smanuel16&user=marija&user=keekri&user=bee2502&user=dhrish20&user=devyani7  related API query here]
* F23 Dec/Jan Election related metrics : [https://communityblog.fedoraproject.org/commops-2015-elections-retrospective/ read here] and related statistics  [https://admin.fedoraproject.org/voting/results/famsco-nov-dec-2015 here], [https://admin.fedoraproject.org/voting/results/fesco-nov-dec-2015 here] and [https://admin.fedoraproject.org/voting/results/council-nov-dec-2015 here]
* IRC metrics using fedmsg activity and datagrepper : [https://communityblog.fedoraproject.org/meetbot-data-analytics-peek-fedora-irc-meetings/ read here] and [https://github.com/fedora-infra/fedora-stats-tools/blob/develop/scripts/meetbot_stats.py code here]
* Spammer Activity in Fedora - some graphs [https://lists.fedoraproject.org/archives/list/commops@lists.fedoraproject.org/message/4WREEL7EBU7X26Y33GEINLD5P3RWLM7S/ ML thread here] and related API query [https://apps.fedoraproject.org/datagrepper/charts/line?topic=org.fedoraproject.prod.fas.group.member.apply&delta=2592000 here] and [https://apps.fedoraproject.org/datagrepper/charts/line?topic=org.fedoraproject.prod.fas.group.member.apply&delta=31536000 here]


* Compiled Fedora IRC metrics [https://communityblog.fedoraproject.org/meetbot-data-analytics-peek-fedora-irc-meetings/ here]
==== Other Technical Contributions ====


* Fedora Badges post for Newcomers : 'How to get started with Fedora Badges?' [https://networksfordata.wordpress.com/2015/10/19/fedorabadges/ here].
* Contributed to Fedora Hubs for gaining technical knowledge of the codebase which would be helpful in developing metrics related widgets.
Issues I have fixed include :


* Some bug fixes for fedora-infra repo along with contributing to Community Blog and Fedora Magazine.
[https://pagure.io/fedora-hubs/issue/106 https://pagure.io/fedora-hubs/issue/106]


* Community Blog statistics [https://networksfordata.wordpress.com/2016/01/22/2015-in-numbers-fedora-community-blog/ link here]
[https://pagure.io/fedora-hubs/issue/96 https://pagure.io/fedora-hubs/issue/96].


I am also a member of Fedora Women and recently started contributing to Fedora Hubs development too.
You can see my closed PR's [https://pagure.io/fedora-hubs/pull-requests?status=False&author=bee2502 here]


Apart from Fedora, I have done an Open Source Data Analytics project for Measurement Lab
My Hubs related fedmsg activity [https://apps.fedoraproject.org/datagrepper/raw?category=pagure&user=bee2502 here]


===Did you participate with the past GSoC programs, if so which years, which organizations?===
==== Other Contributions ====


No
* Collaborated with [[User:Jflory7|Justin Flory]] on [https://communityblog.fedoraproject.org/women-in-computing-and-fedora/ Women in Computing and Fedora article].


===Will you continue contributing/ supporting the Fedora project after the GSoC 2016 program, if yes, which team(s), are you interested with?===
* Helped [[User:Jkurik|Jan Kurik]] organize the '''F23 Elections'''! They had the fourth-highest participation of any Fedora election. Read more about the F23 elections on the CommOps retrospective [https://communityblog.fedoraproject.org/commops-2015-elections-retrospective/ here].


I will, of course. I'll continue with the CommOps team and Hubs development. I am also interested in being an Ambassador (but that's for a bit later).
* Completed the [https://fedoraproject.org/wiki/CommOps/Join CommOps Join Process]. Wrote a Fedora Badges post to aid Newcomers : 'How to get started with Fedora Badges?' [https://networksfordata.wordpress.com/2015/10/19/fedorabadges/ read it here].


===Why am I the best fit for this project idea?===
* Helping in diversity and women outreach efforts of Fedora by being an active member in Fedora women community.


I am really passionate about Data Analytics. With data, I want to understand and impact the community by bringing critical issues to light and identifying our strengths and weaknesses, to help the leadership make informed decisions. My work in the Community Operations team at Fedora has revolved around these areas and I couldn't be more grateful for this wonderful experience and the awesome community.
* I have also contributed to the Fedora Community Blog (see my work [https://communityblog.fedoraproject.org/author/bee2502/ here]) and to Fedora Magazine (see my work [https://fedoramagazine.org/fedora-looks-back-ahead-women-computing/ here]).


Apart from that,
* I have a good knowledge of the wiki, Trac, IRC and mailing lists and I am comfortable using them to communicate effectively. I have communicated and interacted with my mentors and other team members on IRC and the mailing lists and, having been involved with CommOps, I understand the ethics and values that make up the Fedora Community.


* I'm really passionate about open source, love the CommOps and Fedora community and I will continue to contribute to Fedora and CommOps even when the project ends.
* You can see my overall contribution activity via fedmsg [https://apps.fedoraproject.org/datagrepper/raw?user=bee2502 here] and [https://apps.fedoraproject.org/datagrepper/charts/line?user=bee2502 here]
So, Choose me ! Choose me ! Choose me !


// something about watching CommOps grow
Apart from Fedora, I have done an Open Source Data Analytics project for Measurement Lab
// Fedora community here


* "Bee has been a founding member of the CommOps team since October 2015. In her time contributing to CommOps, she has helped with F23 elections (which was the fourth most participated in election in Fedora history), generated metrics analyzing impact at the FOSDEM conference and telling the story of Fedora's Ambassadors in quantifiable terms (and being featured on the Fedora Magazine for it), and added her unique perspective and wisdom into the decision-making behind many CommOps decisions. Bee has been an integral part of helping CommOps succeed." --[[User:Jflory7|Jflory7]] ([[User talk:Jflory7|talk]]) 14:35, 16 March 2016 (UTC)
===Did you participate with the past GSoC programs, if so which years, which organizations?===


==Project Proposal==
No


===Overview===
===Will you continue contributing/ supporting the Fedora project after the GSoC 2016 program, if yes, which team(s), are you interested with?===


CommOps
I will, of course. I'll continue with the CommOps team and Hubs development. I am also interested in being an Ambassador (but that's for a bit later).


Metrics
===Why am I the best fit for this project idea?===
Overall Goals for these Tasks include :


-Learn more about Fedora Users as well as Contributors.
I am really passionate about Data Analytics. With data, I want to understand and impact the community by bringing critical issues to light and identifying our strengths and weaknesses, to help the leadership make informed decisions. My proposal for the Community Operations slot for Fedora in GSoC revolves around this idea. Along with the sound technical skills required to implement this proposal, I feel that I also have the non-technical skills needed for effective open source contributions.
-Improve Contributor Experience as a whole.
-Along with onboarding strategies, Improve contributing experience for newcomers.
-Improve Engagement of existing Fedora Contributors(through badges series' and events)
-Suggest strategies for making passive/stagnant contributors more active.
-Observe and Record Fedora Contributor behavior.
-Identify common trouble spots for Fedora Users and suggest Improvements.  
* '''Develop metrics to learn more about Fedora Users as well as Contributors, and use these metrics to generate strategies.'''


We can also use data analytics to find triggers which caused contributors to become passive ?
Some relevant points include :
-Can we relate these patterns to newcomers to estimate their longevity and suppress such triggers?
-How to help passive contributors make a comeback ? How to increase such numbers ?
-Are there any such previous comeback cases ?


Impact
* I'm really passionate about open source, love the CommOps and Fedora community and I will continue to contribute to Fedora and CommOps even when the project ends.
* I am comfortable with coding in Python, C++ and R, and can write queries in SQL. I also have intermediate knowledge of HTML and CSS.
* I have working knowledge of fedmsg system and datagrepper queries and have done related data analytics projects before [https://fedoraproject.org/wiki/GSOC_2016/Student_Application_bee2502#Statistics_related_Contributions link here]
* I also know Machine Learning and NLP and I am interested in using these techniques to understand Fedora community better.
* I am learning Data Visualization techniques like d3.js so that I can develop interactive visualizations from data.
* I have also contributed to Fedora Hubs development in the past and have knowledge of the codebase. Issues I have fixed include:


[https://pagure.io/fedora-hubs/issue/106 https://pagure.io/fedora-hubs/issue/106]


=== Tasks List ===
[https://pagure.io/fedora-hubs/issue/96 https://pagure.io/fedora-hubs/issue/96].


==== POSSIBLE METRICS IDEAS ====
You can see my closed PRs [https://pagure.io/fedora-hubs/pull-requests?status=False&author=bee2502 here]. My Hubs-related fedmsg activity is [https://apps.fedoraproject.org/datagrepper/raw?category=pagure&user=bee2502 here]. Being familiar with the codebase, I can help with CommOps-related tasks for Hubs development.


===== STACKOVERFLOW SURVEY DATA ANALYSIS =====
* My contributions to Fedora have not been limited to technical work. To gain a deeper understanding of the Fedora Project, I have contributed in diverse areas: helping [[User:Jkurik|Jan Kurik]] organize the F23 elections (which had the fourth-highest participation in Fedora history), writing a Fedora Badges article to help newcomers ([https://networksfordata.wordpress.com/2015/10/19/fedorabadges/ link here]), contributing to the Community Blog (see my work [https://communityblog.fedoraproject.org/author/bee2502/ here]) and to Fedora Magazine (see my work [https://fedoramagazine.org/fedora-looks-back-ahead-women-computing/ here]), and helping Fedora's diversity and women's outreach efforts as an active member of the Fedora Women community.


* Demographical analysis of data (country, age, gender) and experience and employment
* I blog regularly, and I believe this will help me develop interesting and well laid-out documentation as well as data analytics reports for the project.
* Analysis of teens and college students to identify OS preferences and measure the impact of the University Involvement initiative.
* I have a good knowledge of the wiki,Trac,IRC and mailing-lists and I am comfortable with using them to communicate effectively.
ML Discussion [https://lists.fedoraproject.org/archives/list/commops@lists.fedoraproject.org/message/YRP5VAA44TYXQPWRJKBUXECI6EKQ66XV/ here]
* I have communicated and interacted with my mentors and other team members in IRC and ML and having been involved with CommOps, I understand the ethics and values that make up the Fedora Community.


===== FEDORA EVENT STATISTICS =====
* '''If you haven't guessed it by now, I really love CommOps and contributing to Fedora and GSoC offers me a great opportunity to do so over the summer !!! ''' Additionally, I get to work on statistics and Machine Learning - what more could I ask ?!


* Analyse the participation in past Fedora events including conferences like FOSDEM , FLOCK
==Project Proposal==
* Develop metrics for contribution activity of participants
* Analyse the short term and long term impact of the events on Fedora Community


=====  fedmsg STATISTICS  =====
===Overview===


* Generate Fedora Wiki Monthwise/ Weekwise/ Daywise topic related metrics
Fedora Community Operations(CommOps) : Statistical Simulation and Data Analytics for Fedora Infrastructure Message Bus Activity
* Timewise metrics for the past year for all topics, if possible
* Identify active projects from fedmsg statistics
* Which build has most contributions ?
* How many contributors are just long-tail from packaging one thing?


===== FEDORA BADGES STATISTICS=====
[https://fedoraproject.org/wiki/CommOps Community Operations], a.k.a. CommOps, aims to address the area of community infrastructure by providing the tools, resources, and utilities for the different subgroups of Fedora to increase communication across the Project.


* Compile an overall Badges Metrics Report to find fedoraproject.org Improvement Areas
Because of the fedmsg stack, Fedora has very detailed raw data on Fedora contributor activity. My proposal revolves around programmatically querying Datagrepper API for data collection to build automated tools using Statistical and Machine Learning techniques for data analysis and visualization for different parameters.
(eg : Only 8.1 % of overall contributors have earned "Baby Badger" ! -> 91.9% of contributors have never logged into their Fedora Badges Account -> Are these passive/active ? Need to promote Fedora Badges more in fedoraproject.org community)


* Monthwise Trends in Badges Collection . By badge type, ideally :)
=== GOALS ===


===== MAILING LISTS STATISTICS =====
====  STATISTICAL TOOL FOR FEDORA EVENT ANALYTICS====


* Mailing list activity (Daywise, Weekwise, Monthwise)
* Develop automated tools for data collection using tahrir API in conjunction with Fedora Infrastructure Message Bus activity  
* Overall metrics such as average thread size, number of people in a thread, and trends within a thread, where possible
* Develop statistical tools and algorithms for real-time data analysis using numpy and pandas python libraries.
* Per-list metrics: which lists are most active (get more posts), levels of traffic, traffic over time, length of discussions
* Also, identify and code suitable clustering algorithms for demographic analysis using the scipy and scikit-learn Python libraries
* Also answer questions like: Where are posters from? Red Hat vs. non-Red Hat (can be misleading, because mattdm uses an fp.o address, but you get the idea)
* Provide real-time interactive data visualizations using suitable tools such as matplotlib or d3.js
* ML Discussion [https://lists.fedoraproject.org/archives/list/commops@lists.fedoraproject.org/thread/W2OMDI5MO3BN7SFHPIJ3DZD6VN5R63YU/#VK5PRO4HW5J7UKZFSZBR6IFR7YF6D5KO here] and [https://lists.fedoraproject.org/archives/list/commops@lists.fedoraproject.org/thread/GJCVSS6M4KOT4CZBDFQECFVXWPRJ4BAO/#ZPR2XMYHU4BMZP5ESLVQJJIJEPYFYJNH here]


===== BUGZILLA and GITHUB STATISTICS =====


* Timewise statistics of bugs/issues like Bug turnaround/Ticket Turnaround.
==== STATISTICAL TOOL FOR FEDORA INFRASTRUCTURE MESSAGE BUS ACTIVITY ANALYTICS  ====
* Identifying repositories/bugs which need most help.
* If possible ,system to identify bugs/issues suitable for newcomers. (Need more discussion on this ).


=====FEDORA CONTRIBUTOR STATISTICS =====
* Develop automated tools for data collection using Datagrepper API queries for Fedora Infrastructure Message Bus activity.
* Develop statistical tools and algorithms for real-time data analysis using the numpy and pandas Python libraries, generating sub-project-wise and contributor-wise metrics such as mean contributor age (by Fedora activity) and contributor retention rate.
* Write Python scripts for time series modelling of the data using the scikit-learn and scipy libraries.
* Also, identify and implement suitable Machine Learning algorithms (like Temporal Clustering) to find similarity patterns in sub-project activity, build contributions, etc.
* Provide real-time interactive data visualizations using suitable tools such as matplotlib or d3.js.
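
A minimal sketch of the data collection step, under stated assumptions: the <code>user</code>, <code>category</code> and <code>delta</code> filters are the ones that appear in the Datagrepper links on this page, while the <code>page</code> and <code>rows_per_page</code> parameters and the epoch-seconds <code>timestamp</code> field on each message are assumptions about the API that should be checked against the Datagrepper documentation. The helper names are hypothetical and no network request is made here.

```python
from collections import Counter
from datetime import datetime, timezone
from urllib.parse import urlencode

DATAGREPPER_RAW = "https://apps.fedoraproject.org/datagrepper/raw"


def build_query_url(user=None, category=None, delta=None, page=1, rows_per_page=100):
    """Build a Datagrepper /raw query URL from optional filters.

    page and rows_per_page are assumed pagination parameters.
    """
    params = [("page", page), ("rows_per_page", rows_per_page)]
    if user:
        params.append(("user", user))
    if category:
        params.append(("category", category))
    if delta:
        params.append(("delta", delta))  # look-back window in seconds
    return DATAGREPPER_RAW + "?" + urlencode(params)


def messages_per_day(messages):
    """Count messages per UTC calendar day.

    Assumes each message is a dict with a 'timestamp' in epoch seconds.
    """
    counts = Counter()
    for message in messages:
        day = datetime.fromtimestamp(message["timestamp"], tz=timezone.utc).date()
        counts[day.isoformat()] += 1
    return dict(counts)
```

The per-day counts could then be loaded into pandas for the time series work described above.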


* Red Hat vs Non Red Hat Contributors
* Develop an automated tool for the fedmsg statistics for quarterly report for Fedora Project.
* Area wise top contributors
* Develop statistical tools to identify long-tail patterns in contribution activity(How many contributors are just long-tail from packaging one thing?)
* Average Contributor Age(Fedora Activity wise) , Retention  Rate of Contributors
* Use Machine Learning algorithms like Logistic Regression, SVM or Neural Networks to distinguish Red Hat vs. non-Red Hat contributors on lists and conduct a statistical analysis.
* Does contribution activity over time follow a long-tail pattern?
* Develop a Temporal Clustering based tool to identify similarity in contribution patterns for long-time contributors (Do successful/old contributors have diverse contributions? Are their contributions in bursts or continuous over a period of time?) (optional)
* Alternatively, statistics tools could also be implemented as [https://github.com/fedora-infra/statscache statscache] plugins instead of automated python scripts, depending on feasibility.
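
One possible way to quantify the long-tail question, as a rough sketch rather than the final tool: count how many contributors appear exactly once and how much of the total activity comes from the most active contributors. The function name, the input shape (one contributor name per recorded event) and the 10% cut-off are illustrative assumptions.

```python
from collections import Counter


def long_tail_summary(contributor_per_event, top_fraction=0.1):
    """Summarize how concentrated contribution activity is.

    contributor_per_event: one contributor name per recorded event.
    """
    counts = Counter(contributor_per_event)
    ranked = sorted(counts.values(), reverse=True)
    total = sum(ranked)
    if total == 0:
        return {"contributors": 0, "one_off_share": 0.0, "top_share": 0.0}
    one_off = sum(1 for count in ranked if count == 1)
    top_n = max(1, int(len(ranked) * top_fraction))
    return {
        "contributors": len(ranked),
        # share of contributors with exactly one event (the "tail")
        "one_off_share": one_off / len(ranked),
        # share of all events produced by the most active contributors
        "top_share": sum(ranked[:top_n]) / total,
    }
```

A high <code>one_off_share</code> together with a high <code>top_share</code> would suggest the long-tail pattern asked about above.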


* Identify patterns in contribution behavior for successful contributors (Do successful/old contributors have diverse contributions? Are their contributions in bursts or continuous over a period of time?)
* ML Discussion [https://lists.fedoraproject.org/archives/list/commops@lists.fedoraproject.org/message/ZXD2YGW2UREARMNGOUJRMW5YLFG7NCAR/ here], [https://lists.fedoraproject.org/archives/list/commops@lists.fedoraproject.org/message/GZ5HB7KYZ4AF53NGAOSQXATVCSKKJ5PJ/ here] and [https://lists.fedoraproject.org/archives/list/commops@lists.fedoraproject.org/message/ZXD2YGW2UREARMNGOUJRMW5YLFG7NCAR/ here], and related tickets on the CommOps Trac instance [https://fedorahosted.org/fedora-commops/ticket/32 here] and [https://fedorahosted.org/fedora-commops/ticket/31 here]


* Do people care about things outside of their own packages? (How many developers have a significant number of content badges and vice versa?)
==== STATISTICAL TOOL FOR MAILMAN/HYPERKITTY ACTIVITY ANALYTICS  ====


* '''Data Analytics for Newcomer Retention and improving contribution activity of community'''
* Develop automated tools for data collection using HyperKitty API in conjunction with Fedora Infrastructure Message Bus activity
* Analyse Fedora Badges Activity of Newcomers to identify suitable tasks
* Develop statistical tools and algorithms for real-time data analysis using the numpy and pandas Python libraries to generate statistics like mean/median size of an ML thread, number of people in a thread, mean length of discussions, and Red Hat vs. non-Red Hat activity.
* Identify impact of Attending Fedora Events on Contribution Activity
* Generate programmatic python scripts for Time Series Modelling for ML activity data using scikit-learn and scipy libraries in python to identify activity patterns(bursts/highs and lows).
* ML Discussion [https://lists.fedoraproject.org/archives/list/commops@lists.fedoraproject.org/message/ZXD2YGW2UREARMNGOUJRMW5YLFG7NCAR/ here]
* Also, identify and implement suitable Clustering algorithms to find activity-wise and trend-wise similarity patterns in lists.
* Provide real - time interactive data visualizations using suitable tools from matplotlib or d3.js
* Alternatively, statistics tools could also be implemented as [https://github.com/fedora-infra/statscache statscache] plugins instead of automated python scripts, depending on feasibility.
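
As an illustration of the thread statistics listed above, here is a small sketch that assumes the posts have already been fetched and reduced to <code>(thread_id, sender_email)</code> pairs; that input shape and the function names are hypothetical and not part of the HyperKitty API. Matching on the e-mail domain is only a rough proxy for Red Hat vs. non-Red Hat activity, since some contributors post from other addresses.

```python
from statistics import mean, median


def thread_stats(posts):
    """Compute thread size and participation statistics.

    posts: iterable of (thread_id, sender_email) pairs.
    """
    sizes = {}
    participants = {}
    for thread_id, sender in posts:
        sizes[thread_id] = sizes.get(thread_id, 0) + 1
        participants.setdefault(thread_id, set()).add(sender.lower())
    if not sizes:
        return None
    return {
        "threads": len(sizes),
        "mean_size": mean(sizes.values()),
        "median_size": median(sizes.values()),
        "mean_participants": mean(len(p) for p in participants.values()),
    }


def share_from_domain(posts, domain="redhat.com"):
    """Fraction of distinct senders whose address ends in @domain."""
    senders = {sender.lower() for _, sender in posts}
    if not senders:
        return 0.0
    matching = sum(1 for sender in senders if sender.endswith("@" + domain))
    return matching / len(senders)
```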


===== FEDORA ELECTION METRICS =====
* ML Discussion [https://lists.fedoraproject.org/archives/list/commops@lists.fedoraproject.org/thread/W2OMDI5MO3BN7SFHPIJ3DZD6VN5R63YU/#VK5PRO4HW5J7UKZFSZBR6IFR7YF6D5KO here] and [https://lists.fedoraproject.org/archives/list/commops@lists.fedoraproject.org/thread/GJCVSS6M4KOT4CZBDFQECFVXWPRJ4BAO/#ZPR2XMYHU4BMZP5ESLVQJJIJEPYFYJNH here]. Related tickets on CommOps Trac instance [https://fedorahosted.org/fedora-commops/ticket/42 here] and [https://fedorahosted.org/fedora-commops/ticket/26 here]


* Work with the Fedora Community Operations Team in improving Voter Turnout.
==== STATISTICAL TOOL FOR BUGZILLA ANALYTICS ====
* Do people feel elections are meaningful? Do they not vote because they are confident, or not confident?
* Do the loud people who get elected really represent the base? Is there a vast silent majority?
* Do people get elected on incumbency?
* Was bundling really such a big issue?
* Agenda and Campaign Analysis of Candidates, if possible


=====Collaborate with other teams in Fedora Community on metrics related tasks wherever required.=====
* Develop automated tools for data collection using Bugzilla API in conjunction with Fedora Infrastructure Message Bus activity
* Develop statistical tools and algorithms for real-time data analysis using numpy and pandas python libraries.
* Generate programmatic python scripts for Time Series Modelling for data using scikit-learn and scipy libraries in python to identify activity patterns(bursts/highs and lows/mean Bug turnaround time).
* Also, identify and implement suitable Clustering algorithms to find activity-wise and trend-wise similarity patterns.
* Identify relevant algorithms and develop Machine Learning based tool to identify easy-fix or most relevant bugs (Optional)
* Provide real - time interactive data visualizations using suitable tools from matplotlib or d3.js
* Alternatively, statistics tools could also be implemented as [https://github.com/fedora-infra/statscache statscache] plugins instead of automated python scripts, depending on feasibility.
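
The mean bug turnaround time mentioned above could be computed roughly as follows. This sketch assumes each bug has already been reduced to a dict with ISO 8601 <code>opened</code> and <code>closed</code> strings; those field names are illustrative and are not the Bugzilla API's own.

```python
from datetime import datetime
from statistics import mean


def mean_turnaround_days(bugs):
    """Mean days between opening and closing, ignoring open bugs.

    bugs: iterable of dicts with ISO 8601 'opened' and optional 'closed'.
    Returns None when no bug has been closed yet.
    """
    durations = []
    for bug in bugs:
        if not bug.get("closed"):
            continue  # still open, no turnaround to measure
        opened = datetime.fromisoformat(bug["opened"])
        closed = datetime.fromisoformat(bug["closed"])
        durations.append((closed - opened).total_seconds() / 86400)
    return mean(durations) if durations else None
```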


* Work with Fedora Diversity and Inclusion Advisor to  programmatically create, deploy, and most importantly, analyze the  Contributor Demographics Survey
=== STRETCH GOALS ===
* Collaborate with other teams like Design team, Infra Team and Marketing Team on Metrics Related Tasks


==== AUTOMATE METRICS related TASKS ====
==== STATISTICAL TOOL FOR FEDORA BADGES ====


* Automate the fedmsg statistics for quarterly report for Fedora Project. ML Discussion [https://lists.fedoraproject.org/archives/list/commops@lists.fedoraproject.org/message/GZ5HB7KYZ4AF53NGAOSQXATVCSKKJ5PJ/ here]
* Develop automated tools for data collection using tahrir API in conjunction with Fedora Infrastructure Message Bus activity
 
* Develop statistical tools and algorithms for real-time data analysis of badge collection activity using numpy and pandas python libraries.
* Automate Elections related metrics Tasks
* Also, identify and code suitable clustering algorithms for demographic analysis using the scipy and scikit-learn Python libraries
 
* Generate programmatic python scripts for Temporal Analysis for data using scikit-learn and scipy libraries in python to identify activity patterns
==== COMMOPS TOOLBOX  ====
* Provide real - time interactive data visualizations using suitable tools from matplotlib or d3.js
 
* Develop Tools for CommOps Toolbox


==== FEDORA HUBS WIDGETS ====
==== FEDORA HUBS WIDGETS ====


* Componentization of CommOps deliverables into Fedora Hubs Widgets.
* Componentization of CommOps deliverables into Fedora Hubs Widgets.
* Develop metrics related widgets for Fedora Hubs
* Develop metrics and statistics related widgets for Fedora Hubs


==== Some other cool Ideas bee2502 would like to work on ====
==== Some other cool Ideas bee2502 would like to work on ====


* NLP analysis to find the expertise of Fedora Contributors
* '''Automated NLP-based tool to find the expertise of Fedora Contributors'''


To answer the question "Who can best solve my doubt?" OR "Who is the most qualified person for this task?" using meeting logs from IRC meetings
Develop an automated tool using NLP techniques to find the expertise of Fedora contributors using meetbot logs of IRC meetings.
We are looking to answer questions like "Who can best solve my doubt?" OR "Who is the most qualified person for this task?"  
NLP libraries for python like gensim or nltk will be used for tool development.


* Badge Recommendation Engine widget for Hubs
* '''Badge Recommendation Engine widget for Hubs'''
    
    
Much along the lines of Stack Overflow Badge Recommendations : "You are 50% of the way to earning the 'Master Editor Badge' "
Much along the lines of Stack Overflow Badge Recommendations : "You are 50% of the way to earning the 'Master Editor Badge' "
Provide recommendations like "80% of contributors who last collected 'White Rabbit Badge' went on to collect 'Origin Badge' next "
Provide recommendations like "80% of contributors who last collected 'White Rabbit Badge' went on to collect 'Origin Badge' next "
Develop an automated tool with backend using Tahrir API to fetch data.
Identify suitable Recommendation Algorithms like Collaborative Filtering and develop the engine using them


This could be especially helpful for newcomers to explore different areas of Fedora Project
This could be especially helpful for newcomers to explore different areas of Fedora Project
Some related representations by mizmo : https://fedoraproject.org/wiki/Fedora_RPG_OLD
Some related representations by mizmo : https://fedoraproject.org/wiki/Fedora_RPG_OLD
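
A recommendation such as "80% of contributors who last collected one badge went on to collect another" can be approximated with a simple transition count. The sketch below is that simpler technique, not Collaborative Filtering, and it assumes each contributor's badges are already available as a list ordered by award time; fetching them through the Tahrir API is not shown. The rate is computed only over contributors who earned some later badge.

```python
from collections import Counter, defaultdict


def next_badge_counts(histories):
    """Count which badge follows which across all contributors.

    histories: list of badge-name lists, each ordered by award time.
    """
    transitions = defaultdict(Counter)
    for badges in histories:
        for current, following in zip(badges, badges[1:]):
            transitions[current][following] += 1
    return transitions


def recommend(transitions, last_badge):
    """Return (most common next badge, its share) or None if unseen."""
    following = transitions.get(last_badge)
    if not following:
        return None
    badge, count = following.most_common(1)[0]
    return badge, count / sum(following.values())
```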


==== Other CommOps related stuff ====
* '''Automated Tool to publish IRC meetings word clouds to social media like Twitter'''


* Publish Wordclouds based on IRC Meetings to twitter and CommBlog.
Generate wordclouds from meetbot data using NLP techniques or wordcloud tools/libraries for python.
Develop a tool using Twitter API to publish these wordclouds to Fedora handles on social media.
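
The input to a word cloud is essentially a table of word frequencies. A minimal sketch of that step, assuming the meetbot log has already been split into plain text lines; the stopword list is a tiny illustrative sample, and rendering the cloud and posting it through the Twitter API are left out.

```python
import re
from collections import Counter

# Illustrative sample only; a real tool would use a fuller stopword list.
STOPWORDS = frozenset(
    {"the", "a", "an", "and", "to", "of", "in", "is", "for", "on", "we", "it"}
)


def word_frequencies(lines, top_n=50):
    """Return the top_n (word, count) pairs, skipping stopwords."""
    counts = Counter()
    for line in lines:
        # words of two or more characters, starting with a letter
        for word in re.findall(r"[a-z][a-z0-9'-]+", line.lower()):
            if word not in STOPWORDS:
                counts[word] += 1
    return counts.most_common(top_n)
```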


* Onboarding Series Badges
==== STATISTICAL TOOL FOR GITHUB ANALYTICS ====


* Update fedmsg Documentation. Ongoing work [https://github.com/fedora-infra/fedmsg_meta_fedora_infrastructure/pull/338#issuecomment-152013863 here]
* Develop automated tools for data collection using Github API in conjunction with Fedora Infrastructure Message Bus activity
* Develop statistical tools and algorithms for real-time data analysis using numpy and pandas python libraries.
* Generate programmatic python scripts for Time Series Modelling for data using scikit-learn and scipy libraries in python to identify activity patterns(bursts/highs and lows/mean issue turnaround time).
* Also, identify and implement suitable Clustering algorithms to find activity-wise and trend-wise similarity patterns.
* Identify relevant algorithms and develop Machine Learning based tool to identify easy-fix or most relevant issues (Optional)
* Provide real - time interactive data visualizations using suitable tools from matplotlib or d3.js


* Selecting and building the Delegation of Subproject membership within the CommOps team. More info [http://decausemaker.org/posts/proposal-commops-for-fedora.html here]
=== Final Deliverables ===


* Editing Wiki Pages/ Wiki Gardening tasks whenever possible
* Automated statistics tools(scripts in python,preferably) for data collection,analysis and visualizations to be committed to the [https://github.com/fedora-infra/fedora-stats-tools fedora-stats-tools github repo] and/or [https://fedoraproject.org/wiki/CommOps#Toolbox Community Operations Toolbox] whichever appropriate. Alternatively, statistics tools could also be implemented as [https://github.com/fedora-infra/statscache statscache] plugins instead of automated python scripts, depending on feasibility.


* Assist Diversity Advisor on various diversity related issues. Actively participate in Fedora Community and be involved with projects like Fedora Women
* Data files(.csv) to be committed to the [https://github.com/fedora-infra/fedora-stats-tools fedora-stats-tools github repo]


=== Final Deliverables ===
* Analysis Reports and documentation of work to be published to community via Mailing lists, Fedora Planet and/or Community Blog posts whichever appropriate.


* Report back weekly on Community Operations to Mailing Lists, Community Blog, and other channels when appropriate.
* Report back weekly on Community Operations to Mailing Lists, Community Blog, and other channels when appropriate.
* Metrics deliverables would include data files(.csv), automated python scripts used to generate the statistics(committed to the [https://github.com/fedora-infra/fedora-stats-tools fedora-stats-tools github repo]) along with data visualizations and reports published to community via Mailing lists, Fedora Planet and/or Community Blog posts whichever appropriate.
'''Must Deliverables '''
* fedmsg Statistics
* Mailing List Statistics
* Stack Overflow
* Event Statistics
* Automating Quarterly Metrics
* Fedora Contributor Statistics
* Work with Fedora Diversity and Inclusion Advisor to programmatically create, deploy, and most importantly, analyze the Contributor Demographics Survey
''' Optional Deliverables(Time and Priority wise)'''
* Metrics Widgets for Fedora Hubs
* Fedora Badges statistics
* Bugzilla and Github statistics
* Automating Election related tasks
=== Related Initial Contributions during the Application Period ===
* I have learnt to communicate on mailing lists and IRC by subscribing to the Fedora summer-coding and Fedora developers mailing lists and joining the related IRC channels.
* I have communicated with the mentors Remy Decausemaker (decause), Corey Sheldon (linux modder) and Justin Flory (jflory7) via mailing lists and IRC, and learned more about the project and the technical skills I need to master to develop it.


=== Timeline ===
=== Timeline ===


I would like to start having a look and master the technical stuff that I need to fulfill even before the Community bonding period starts.
====Up to the start of Community Bonding Period (25th of March - 22nd of April)====
* Automating Quarterly metrics script for fedmsg
* Stack Overflow Data Analytics


====Community bonding period (22nd of April - 25th of May)====


* Generalizing Fedora Event Statistics (FLOCK and other events)
* Work on improving the technical skills needed for the project (especially data visualizations)
* Stack Overflow Data Analytics
* Understand the bugzilla API
* Fedora Contributor Statistics
* Discuss and finalize the Machine Learning Algorithms necessary for FEDORA INFRASTRUCTURE MESSAGE BUS ACTIVITY statistical analysis tool
* Discuss project specifications with mentor.
* Develop an automated tool for the fedmsg statistics for quarterly report for Fedora Project.


====Work Period until mid-term evaluations (25th of May – 20th of June)====


* Fedora Contributor Statistics
* Work on STATISTICAL TOOL FOR FEDORA EVENT ANALYTICS
* fedmsg statistics
* Work on STATISTICAL TOOL FOR FEDORA INFRASTRUCTURE MESSAGE BUS ACTIVITY ANALYTICS
* Automate Election Related Tasks
* Communicate to the team regarding weekly status
* Update personal blog posts to be syndicated on Fedora Planet as per the progress


====Period of submitting mid-term evaluations (20th of June - 27th of June)====


* Election Metrics
* Clean Code and test for bugs.
* Fix related bugs and write documentation.
* Code review by mentor.
* Submitting and completing midterm evaluations.
* Update personal blog posts for weekly status, to be syndicated on Fedora Planet


====Work Period (27th of June – 15th of August)====


* Mailing List Statistics
* Work on STATISTICAL TOOL FOR MAILMAN/HYPERKITTY ACTIVITY ANALYTICS
* Fedora Badges statistics
* Work on STATISTICAL TOOL FOR BUGZILLA ANALYTICS
* Communicate to the team regarding weekly status
* Update personal blog posts to be syndicated on Fedora Planet as per the progress
* Also attend FLOCK :)


==== Final Week(15th August - 23rd August) ====


Wrap up and Complete tasks
* Clean Code and test for bugs.
* Fix related bugs and write documentation.
* Code review by mentor.
* Submitting and completing midterm evaluations.
* Update personal blog posts for weekly status, to be syndicated on Fedora Planet

==== Other Tasks with timeline to be finalized as needed ====

* Work with Fedora Diversity and Inclusion Advisor to programmatically create, deploy, and most importantly, analyze the Contributor Demographics Survey
* Publish Wordclouds based on IRC Meetings to twitter and CommBlog.
* Onboarding Series Badges


===Potential Mentors===

Latest revision as of 18:33, 25 March 2016

Contact Information


Why do you want to work with the Fedora Project?

I love Fedora OS

While Fedora isn't the first Linux distribution I have used, it is surely one which I have used the longest and am most comfortable with.

I love the Fedora Community

The Fedora community is very warm and welcoming. I especially like that CommOps encourages contributors to work in diverse areas and to try out new stuff, with the Fedora community always ready to help you out if stuck.

I love Fedora CommOps

I love the work. I love the team and I want to continue contributing and helping improve Fedora. Period.

High Impact

Even as a newcomer, I have had the opportunity to work on high impact projects like organizing elections or working on metrics which affect strategic decisions. The huge impact your work can have on millions of Fedora users and contributors is something which motivates me to contribute to Fedora.

Great Learning Opportunity

Due to the flat hierarchy in Fedora, I have already collaborated with or worked under some of the long term contributors and important figures in the Fedora community. This experience has been a great learning opportunity in many different ways and I look forward to many such chances in the future.

I look forward to working with and staying involved in Fedora. I aim to stick around and become a long-term contributor in the Fedora community.

Do you have any past involvement with the Fedora project or any other open source project as a contributor?

Yes, I have been involved with the Community Operations team for the past six months. Some of my past contributions include:

Statistics related Contributions

  • Data Analytics to understand impact of FOSDEM : read here and code here
  • Year in Review metrics for Fedora CommOps : read the report with information about API queries, analysis and data visualizations here
  • Community Blog statistics : read the report with information about API queries, analysis and data visualizations here
  • Outreachy Impact metrics : read here and related API query here
  • F23 Dec/Jan Election related metrics : read here and related statistics here, here and here
  • IRC metrics using fedmsg activity and datagrepper : read here and code here
  • Spammer activity in Fedora : some graphs in the ML thread here and related API queries here and here

Other Technical Contributions

  • Contributed to Fedora Hubs for gaining technical knowledge of the codebase which would be helpful in developing metrics related widgets.

Issues I have fixed include :

https://pagure.io/fedora-hubs/issue/106

https://pagure.io/fedora-hubs/issue/96.

You can see my closed PR's here

My Hubs related fedmsg activity here

Other Contributions

  • Helped Jan Kurik organize the F23 Elections! They had the fourth-highest participation of all time. Read more about the F23 elections on the CommOps retrospective here.
  • Helping with Fedora's diversity and women's outreach efforts as an active member of the Fedora women community.
  • I have also contributed to the Fedora Community Blog (see my works here) and to Fedora Magazine (see my works here).
  • I have a good knowledge of the wiki, Trac, IRC and mailing lists, and I am comfortable using them to communicate effectively. I have communicated and interacted with my mentors and other team members on IRC and the mailing lists and, having been involved with CommOps, I understand the ethics and values that make up the Fedora Community.
  • You can see my overall contribution activity via fedmsg here and here

Apart from Fedora, I have done an Open Source Data Analytics project for Measurement Lab

Did you participate with the past GSoC programs, if so which years, which organizations?

No

Will you continue contributing/ supporting the Fedora project after the GSoC 2016 program, if yes, which team(s), are you interested with?

I will, of course. I'll continue with the CommOps team and Hubs development. I am also interested in being an Ambassador (but that's for a bit later).

Why am I the best fit for this project idea?

I am really passionate about Data Analytics. With data, I want to understand and impact the community by bringing critical issues to light and identifying our strengths and weaknesses, to help the leadership make informed decisions. My proposal for the Community Operations slot for Fedora in GSoC revolves around this idea. Along with the sound technical skills required to implement this proposal, I also feel that I have the non-technical skills needed for effective open source contributions.

Some relevant points include :

  • I'm really passionate about open source, love the CommOps and Fedora community and I will continue to contribute to Fedora and CommOps even when the project ends.
  • I am comfortable coding in Python, C++ and R, and can write queries in SQL. I also have intermediate knowledge of HTML and CSS.
  • I have working knowledge of the fedmsg system and datagrepper queries and have done related data analytics projects before (link here).
  • I also know Machine Learning and NLP and I am interested in using these techniques to understand Fedora community better.
  • I am learning Data Visualization techniques like d3.js so that I can develop interactive visualizations from data.
  • I have also contributed to Fedora Hubs development in the past and have knowledge of the codebase. Issues I have fixed include:

https://pagure.io/fedora-hubs/issue/106

https://pagure.io/fedora-hubs/issue/96.

You can see my closed PRs here and my Hubs related fedmsg activity here. Being familiar with the codebase, I can help in CommOps related tasks for Hubs development.

  • My contributions to Fedora have not been limited to technical aspects. To gain a deeper understanding of the Fedora Project, I have tried to contribute in diverse areas, including helping Jan Kurik organize the F23 elections (which had the fourth-highest participation in Fedora history), writing a Fedora Badges article to help newcomers (link here), contributing to the Community Blog (see my works here) and to Fedora Magazine (see my works here), and helping with Fedora's diversity and women's outreach efforts as an active member of the Fedora women community.
  • I blog regularly, and I believe this will help me develop interesting and well laid-out documentation as well as data analytics reports for the project.
  • I have a good knowledge of the wiki, Trac, IRC and mailing lists, and I am comfortable using them to communicate effectively.
  • I have communicated and interacted with my mentors and other team members on IRC and the mailing lists and, having been involved with CommOps, I understand the ethics and values that make up the Fedora Community.
  • If you haven't guessed it by now, I really love CommOps and contributing to Fedora, and GSoC offers me a great opportunity to do so over the summer! Additionally, I get to work on statistics and Machine Learning. What more could I ask?!

Project Proposal

Overview

Fedora Community Operations(CommOps) : Statistical Simulation and Data Analytics for Fedora Infrastructure Message Bus Activity

Community Operations, a.k.a. CommOps, aims to address the area of community infrastructure by providing the tools, resources, and utilities for the different subgroups of Fedora to increase communication across the Project.

Because of the fedmsg stack, Fedora has very detailed raw data on Fedora contributor activity. My proposal revolves around programmatically querying Datagrepper API for data collection to build automated tools using Statistical and Machine Learning techniques for data analysis and visualization for different parameters.
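As a rough illustration of the data collection step, the sketch below builds a Datagrepper query URL and counts messages per topic in one page of results. The endpoint URL, the parameter names (`user`, `category`, `delta`, `page`, `rows_per_page`) and the `raw_messages` key reflect my understanding of the public Datagrepper API and should be checked against its documentation; the helper names and the sample page are hypothetical. No network request is made here.

```python
from urllib.parse import urlencode

# Assumed public Datagrepper endpoint; adjust if the deployment differs.
DATAGREPPER_RAW = "https://apps.fedoraproject.org/datagrepper/raw"


def build_query_url(user=None, category=None, delta=None, page=1, rows_per_page=100):
    """Build a Datagrepper /raw query URL, skipping filters that are unset."""
    params = {"page": page, "rows_per_page": rows_per_page}
    if user:
        params["user"] = user
    if category:
        params["category"] = category
    if delta:
        params["delta"] = delta  # look-back window in seconds
    # Sort the parameters so the URL is stable across runs.
    return DATAGREPPER_RAW + "?" + urlencode(sorted(params.items()))


def count_by_topic(response):
    """Count messages per topic in one decoded JSON page of results."""
    counts = {}
    for message in response.get("raw_messages", []):
        topic = message.get("topic", "unknown")
        counts[topic] = counts.get(topic, 0) + 1
    return counts


# Hypothetical page of results, shaped like a decoded Datagrepper response.
sample_page = {
    "raw_messages": [
        {"topic": "org.fedoraproject.prod.irc.karma"},
        {"topic": "org.fedoraproject.prod.wiki.article.edit"},
        {"topic": "org.fedoraproject.prod.irc.karma"},
    ]
}

url = build_query_url(user="bee2502", delta=90 * 24 * 3600)  # last 90 days
topic_counts = count_by_topic(sample_page)
```

A real collection script would fetch each page of the URL in turn and feed the decoded JSON to `count_by_topic`, or load the messages into a pandas DataFrame for further analysis.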

GOALS

STATISTICAL TOOL FOR FEDORA EVENT ANALYTICS

  • Develop automated tools for data collection using tahrir API in conjunction with Fedora Infrastructure Message Bus activity
  • Develop statistical tools and algorithms for real-time data analysis using numpy and pandas python libraries.
  • Also, identify and code suitable Clustering algorithms for demographic analysis using the scipy and scikit-learn Python libraries
  • Provide real-time interactive data visualizations using suitable tools such as matplotlib or d3.js


STATISTICAL TOOL FOR FEDORA INFRASTRUCTURE MESSAGE BUS ACTIVITY ANALYTICS

  • Develop automated tools for data collection using Datagrepper API queries for Fedora Infrastructure Message Bus activity.
  • Develop statistical tools and algorithms for real-time data analysis using the numpy and pandas Python libraries to generate per-sub-project metrics and per-contributor metrics such as mean contributor age (by Fedora activity) and contributor retention rate.
  • Write Python scripts for time series modelling of the data using the scikit-learn and scipy libraries.
  • Also, identify and implement suitable Machine Learning algorithms (like Temporal Clustering) to find similarity patterns in sub-project activity, build contributions, etc.
  • Provide real-time interactive data visualizations using suitable tools such as matplotlib or d3.js
  • Develop an automated tool to produce the fedmsg statistics for the Fedora Project quarterly report.
  • Develop statistical tools to identify long-tail patterns in contribution activity (how many contributors are in the long tail just from packaging one thing?)
  • Use Machine Learning algorithms like Logistic Regression, SVM or Neural Networks to distinguish Red Hat vs. non-Red Hat contributors on lists and conduct a statistical analysis.
  • Develop a Temporal Clustering based tool to identify similarity in contribution patterns for long-time contributors (Do successful/old contributors have diverse contributions? Are their contributions in bursts or continuous over a period of time?) (optional)
  • Alternatively, statistics tools could also be implemented as statscache plugins instead of automated python scripts, depending on feasibility.
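To make two of the contributor metrics above concrete, here is a minimal sketch using only the Python standard library. It assumes that "retention rate" means the share of contributors active in one quarter who are also active in the next, and that "contributor age" means days since a contributor's first recorded activity. Both definitions, the function names and the sample log are my own assumptions for illustration.

```python
from datetime import date
from statistics import mean

# Hypothetical activity log: (contributor, date of a fedmsg message).
activity = [
    ("alice", date(2015, 10, 3)),
    ("alice", date(2016, 2, 14)),
    ("bob", date(2015, 11, 20)),
    ("carol", date(2015, 12, 1)),
    ("carol", date(2016, 1, 9)),
    ("dave", date(2016, 3, 2)),
]


def active_between(log, start, end):
    """Return the set of contributors with activity in [start, end)."""
    return {who for who, day in log if start <= day < end}


def retention_rate(log, first, second):
    """Share of contributors active in `first` who are also active in `second`."""
    earlier = active_between(log, *first)
    later = active_between(log, *second)
    return len(earlier & later) / len(earlier) if earlier else 0.0


def mean_contributor_age_days(log, as_of):
    """Mean number of days since each contributor's first recorded activity."""
    first_seen = {}
    for who, day in log:
        first_seen[who] = min(day, first_seen.get(who, day))
    return mean((as_of - day).days for day in first_seen.values())


q4_2015 = (date(2015, 10, 1), date(2016, 1, 1))
q1_2016 = (date(2016, 1, 1), date(2016, 4, 1))

rate = retention_rate(activity, q4_2015, q1_2016)
age = mean_contributor_age_days(activity, date(2016, 4, 1))
```

In this sample, alice, bob and carol are active in the first quarter and alice and carol return in the second, so the retention rate is two thirds. The same logic translates directly to a pandas groupby once the log comes from real fedmsg data.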

STATISTICAL TOOL FOR MAILMAN/HYPERKITTY ACTIVITY ANALYTICS

  • Develop automated tools for data collection using HyperKitty API in conjunction with Fedora Infrastructure Message Bus activity
  • Develop statistical tools and algorithms for real-time data analysis using the numpy and pandas Python libraries to generate statistics like mean/median size of an ML thread, number of people in a thread, mean length of discussions, and Red Hat vs. non-Red Hat activity.
  • Write Python scripts for time series modelling of ML activity data using the scikit-learn and scipy libraries to identify activity patterns (bursts/highs and lows).
  • Also, identify and implement suitable Clustering algorithms to find activity-wise and trend-wise similarity patterns in lists.
  • Provide real-time interactive data visualizations using suitable tools such as matplotlib or d3.js
  • Alternatively, statistics tools could also be implemented as statscache plugins instead of automated python scripts, depending on feasibility.
  • ML Discussion here and here. Related tickets on CommOps Trac instance here and here
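The thread-level statistics listed above could be computed along the following lines. This is a sketch over hypothetical data using only the standard library. A real tool would build the `threads` mapping from HyperKitty or fedmsg data, whose exact structure is not assumed here.

```python
from statistics import mean, median

# Hypothetical threads: each maps a thread subject to the senders of its posts.
threads = {
    "F24 schedule": ["alice", "bob", "alice", "carol"],
    "CommOps meeting minutes": ["dave"],
    "Badges proposal": ["bob", "erin", "bob"],
}


def thread_statistics(threads_by_subject):
    """Summarise thread sizes and participant counts for one mailing list."""
    sizes = [len(posts) for posts in threads_by_subject.values()]
    participants = [len(set(posts)) for posts in threads_by_subject.values()]
    return {
        "threads": len(sizes),
        "mean_size": mean(sizes),
        "median_size": median(sizes),
        "mean_participants": mean(participants),
    }


stats = thread_statistics(threads)
```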

STATISTICAL TOOL FOR BUGZILLA ANALYTICS

  • Develop automated tools for data collection using Bugzilla API in conjunction with Fedora Infrastructure Message Bus activity
  • Develop statistical tools and algorithms for real-time data analysis using numpy and pandas python libraries.
  • Write Python scripts for time series modelling of the data using the scikit-learn and scipy libraries to identify activity patterns (bursts, highs and lows, mean bug turnaround time).
  • Also, identify and implement suitable Clustering algorithms to find activity-wise and trend-wise similarity patterns.
  • Identify relevant algorithms and develop Machine Learning based tool to identify easy-fix or most relevant bugs (Optional)
  • Provide real-time interactive data visualizations using suitable tools such as matplotlib or d3.js
  • Alternatively, statistics tools could also be implemented as statscache plugins instead of automated python scripts, depending on feasibility.
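As an example of one metric named above, mean bug turnaround time could be computed as sketched below. The record layout (`opened`/`closed` date strings) is a hypothetical simplification and not the actual Bugzilla API response format.

```python
from datetime import datetime
from statistics import mean

# Hypothetical bug records; a real run would take these from the Bugzilla API.
bugs = [
    {"id": 1, "opened": "2016-01-04", "closed": "2016-01-10"},
    {"id": 2, "opened": "2016-01-15", "closed": "2016-02-14"},
    {"id": 3, "opened": "2016-02-01", "closed": None},  # still open
]


def turnaround_days(bug):
    """Days between opening and closing a bug, or None if it is still open."""
    if bug["closed"] is None:
        return None
    opened = datetime.strptime(bug["opened"], "%Y-%m-%d")
    closed = datetime.strptime(bug["closed"], "%Y-%m-%d")
    return (closed - opened).days


def mean_turnaround(bug_list):
    """Mean turnaround in days over closed bugs only."""
    durations = [d for d in map(turnaround_days, bug_list) if d is not None]
    return mean(durations) if durations else None


average = mean_turnaround(bugs)
```

Open bugs are excluded rather than counted as zero, since including them would understate the turnaround time.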

STRETCH GOALS

STATISTICAL TOOL FOR FEDORA BADGES

  • Develop automated tools for data collection using tahrir API in conjunction with Fedora Infrastructure Message Bus activity
  • Develop statistical tools and algorithms for real-time data analysis of badge collection activity using numpy and pandas python libraries.
  • Also, identify and code suitable Clustering algorithms for demographic analysis using the scipy and scikit-learn Python libraries
  • Write Python scripts for temporal analysis of the data using the scikit-learn and scipy libraries to identify activity patterns
  • Provide real-time interactive data visualizations using suitable tools such as matplotlib or d3.js

FEDORA HUBS WIDGETS

  • Componentization of CommOps deliverables into Fedora Hubs Widgets.
  • Develop metrics and statistics related widgets for Fedora Hubs

Some other cool Ideas bee2502 would like to work on

  • Automated NLP-based tool to find the expertise of Fedora contributors

Develop an automated tool using NLP techniques to find the expertise of Fedora contributors using meetbot logs of IRC meetings. We are looking to answer questions like "Who can best solve my doubt?" OR "Who is the most qualified person for this task?" NLP libraries for python like gensim or nltk will be used for tool development.

  • Badge Recommendation Engine widget for Hubs

Much along the lines of Stack Overflow Badge Recommendations: "You are 50% of the way to earning the 'Master Editor Badge'". Provide recommendations like "80% of contributors who last collected 'White Rabbit Badge' went on to collect 'Origin Badge' next".

Develop an automated tool with backend using Tahrir API to fetch data. Identify suitable Recommendation Algorithms like Collaborative Filtering and develop the engine using them

This could be especially helpful for newcomers exploring different areas of the Fedora Project. Some related representations by mizmo : https://fedoraproject.org/wiki/Fedora_RPG_OLD
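A recommendation of the "80% went on to collect X next" kind can be prototyped with a simple successor count, as sketched below. Note that this is a frequency baseline, not Collaborative Filtering, and the badge histories are hypothetical data rather than output of the Tahrir API.

```python
from collections import Counter, defaultdict

# Hypothetical badge histories, in the order each contributor earned them.
histories = {
    "alice": ["White Rabbit", "Origin", "Involvement"],
    "bob": ["White Rabbit", "Origin"],
    "carol": ["White Rabbit", "Involvement"],
    "dave": ["Involvement", "White Rabbit", "Origin"],
    "erin": ["White Rabbit", "Origin"],
}


def next_badge_counts(badge_histories):
    """Count, for every badge, which badge contributors earned right after it."""
    followers = defaultdict(Counter)
    for badges in badge_histories.values():
        for current, following in zip(badges, badges[1:]):
            followers[current][following] += 1
    return followers


def recommend(badge_histories, last_badge):
    """Return (badge, share) for the most common successor of `last_badge`."""
    counts = next_badge_counts(badge_histories).get(last_badge)
    if not counts:
        return None
    badge, hits = counts.most_common(1)[0]
    return badge, hits / sum(counts.values())


suggestion = recommend(histories, "White Rabbit")
```

With this sample, four of the five contributors who earned 'White Rabbit' earned 'Origin' next, so the suggestion is 'Origin' with a share of 0.8. A baseline like this is also a useful yardstick for judging whether a more elaborate recommendation algorithm is worth the extra complexity.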

  • Automated Tool to publish IRC meetings word clouds to social media like Twitter

Generate wordclouds from meetbot data using NLP techniques or wordcloud tools/libraries for python. Develop a tool using Twitter API to publish these wordclouds to Fedora handles on social media.
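The word frequencies behind such a word cloud could be computed as in the sketch below. The "nick: message" line format is a simplified, hypothetical stand-in for real meetbot logs, and the stop-word list is deliberately tiny. Rendering the image and posting it to Twitter are left out.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real tool would use a fuller one.
STOP_WORDS = {"the", "a", "to", "and", "of", "for", "is", "in", "on", "we"}

# Hypothetical excerpt of a meetbot log: "nick: message" per line.
log_lines = [
    "decause: the metrics report for the elections is ready",
    "jflory7: we need metrics for the community blog",
    "bee2502: elections metrics and badges stats to follow",
]


def word_frequencies(lines, top=5):
    """Return the `top` most frequent non-stop-words spoken in the log."""
    counts = Counter()
    for line in lines:
        # Drop the speaker's nick so it does not dominate the counts.
        _, _, message = line.partition(": ")
        words = re.findall(r"[a-z0-9']+", message.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top)


frequencies = word_frequencies(log_lines)
```

The resulting (word, count) pairs can be handed to a word cloud library or to NLP tools such as nltk for stemming before rendering.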

STATISTICAL TOOL FOR GITHUB ANALYTICS

  • Develop automated tools for data collection using Github API in conjunction with Fedora Infrastructure Message Bus activity
  • Develop statistical tools and algorithms for real-time data analysis using numpy and pandas python libraries.
  • Write Python scripts for time series modelling of the data using the scikit-learn and scipy libraries to identify activity patterns (bursts, highs and lows, mean issue turnaround time).
  • Also, identify and implement suitable Clustering algorithms to find activity-wise and trend-wise similarity patterns.
  • Identify relevant algorithms and develop Machine Learning based tool to identify easy-fix or most relevant issues (Optional)
  • Provide real-time interactive data visualizations using suitable tools such as matplotlib or d3.js

Final Deliverables

  • Automated statistics tools (preferably Python scripts) for data collection, analysis and visualization, to be committed to the fedora-stats-tools github repo and/or the Community Operations Toolbox, whichever is appropriate. Alternatively, statistics tools could also be implemented as statscache plugins instead of automated Python scripts, depending on feasibility.
  • Analysis Reports and documentation of work to be published to community via Mailing lists, Fedora Planet and/or Community Blog posts whichever appropriate.
  • Report back weekly on Community Operations to Mailing Lists, Community Blog, and other channels when appropriate.

Timeline

I would like to start studying and mastering the technical skills I need even before the Community Bonding period starts.

Community bonding period (22nd of April - 25th of May)

  • Work on improving the technical skills needed for the project (especially data visualizations)
  • Understand the bugzilla API
  • Discuss and finalize the Machine Learning Algorithms necessary for FEDORA INFRASTRUCTURE MESSAGE BUS ACTIVITY statistical analysis tool
  • Discuss project specifications with mentor.
  • Develop an automated tool for the fedmsg statistics for quarterly report for Fedora Project.

Work Period until mid-term evaluations (25th of May – 20th of June)

  • Work on STATISTICAL TOOL FOR FEDORA EVENT ANALYTICS
  • Work on STATISTICAL TOOL FOR FEDORA INFRASTRUCTURE MESSAGE BUS ACTIVITY ANALYTICS
  • Communicate to the team regarding weekly status
  • Update personal blog posts to be syndicated on Fedora Planet as per the progress

Period of submitting mid-term evaluations (20th of June - 27th of June)

  • Clean Code and test for bugs.
  • Fix related bugs and write documentation.
  • Code review by mentor.
  • Submitting and completing midterm evaluations.
  • Update personal blog posts for weekly status, to be syndicated on Fedora Planet

Work Period (27th of June – 15th of August)

  • Work on STATISTICAL TOOL FOR MAILMAN/HYPERKITTY ACTIVITY ANALYTICS
  • Work on STATISTICAL TOOL FOR BUGZILLA ANALYTICS
  • Communicate to the team regarding weekly status
  • Update personal blog posts to be syndicated on Fedora Planet as per the progress
  • Also attend FLOCK :)

Final Week(15th August - 23rd August)

  • Clean Code and test for bugs.
  • Fix related bugs and write documentation.
  • Code review by mentor.
  • Submitting and completing midterm evaluations.
  • Update personal blog posts for weekly status, to be syndicated on Fedora Planet
  • Wrap up and Complete tasks

Potential Mentors

Remy Decausemaker (decause), Corey Sheldon (linux modder) and Justin Flory (jflory7)