From Fedora Project Wiki

Questions Looking to Answer

  • What is the first thing a user does with their FAS acocunt?
  • What is the last thing a user does with their FAS acocunt?
  • What applications does a new user access in their first 30 days?

Blueprint for POC

  1. Backup and Restore Datanommer2 database - http://infrastructure.fedoraproject.org/infra/db-dumps
  2. Identify a SQL dashboard solution that may make sense to present queries

Draft Proposal

Problem Space

Contributor metrics and analytics are essential for understanding the contributor journey within the Fedora Project. Similar to a customer journey, we need to identify key milestones, track the progression of new contributors, and determine the last activities performed before contributors leave. Currently, we lack a comprehensive system to collect and analyze this data, hindering our ability to optimize the contributor experience and address retention challenges effectively.

Proposed Solution

We propose leveraging the Fedora Project Message Bus data to build contributor metrics and analytics. By utilizing the Datanommer PostgreSQL database and message history, we can construct a robust data set and normalize it into a contributor pathway. As an interim step, we suggest starting with a proof of concept using the Badges system data. Badges provide a quick way to identify milestones since they are earned based on Message Bus data. This approach allows us to efficiently analyze contributor engagement and progression.

Scope

The proposed solution will impact the following areas:

  • Data collection and storage: Utilizing the Datanommer PostgreSQL database to store and manage the Message Bus data.
  • Analytics and reporting: Developing tools and processes to analyze the collected data and generate meaningful contributor metrics.
  • Pathway mapping: Creating a pathway map for each contributor role within the Fedora Project, considering the unique requirements and milestones of different contributor paths (e.g., Documentation, Packaging).

Active Ingredients

To implement the solution, the following components are required:

Component 1: Data Collection and Storage

  • Set up a pipeline to collect and store Fedora Project Message Bus data from the Datanommer PostgreSQL database.
  • Develop mechanisms to ensure data integrity, privacy, and compliance with relevant data protection regulations.

Component 2: Analytics and Reporting

  • Design and implement analytical tools to process and extract insights from the collected data.
  • Develop customizable reports and visualizations to present contributor metrics and trends effectively.

Component 3: Pathway Mapping

  • Collaborate with each contributor role (e.g., Documentation, Packaging) to define and refine the ideal pathway and milestones.
  • Create a comprehensive map for each contributor role, capturing the sequence of activities, milestones, and badges earned.

Discussion Points

We would like to discuss the following aspects of the proposal with the community:

  • Privacy Concerns:
    • Address any potential concerns regarding the analytics and usage of contributor data.
    • Validate our process would follow existing privacy policies and practices to ensure the protection of contributor information.
  • Technology:
    • Evaluate existing systems such as Grimoire Labs and Apache Kibble for their suitability in fulfilling our requirements.
    • Determine the feasibility of utilizing these systems instead of reinventing the wheel.
  • Pathway Setup:
    • Seek input from the community on the ideal pathways and milestones for each contributor role.
    • Determine the variations in pathways based on different roles and identify common milestones.


Example Queries from Tahir

WITH ranked_badges AS (
 SELECT
   person_id,
   badge_id,
   issued_on,
   ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY issued_on) AS badge_rank
 FROM
   assertions
 where badge_id in (select badge_id from assertions group by badge_id having count(*) > 100)
), commonpaths as (
SELECT
 person_id,
 STRING_AGG(badge_id::VARCHAR, '|' ORDER BY badge_rank) AS badge_sequence,
 COUNT(*) AS sequence_count
FROM
 ranked_badges
GROUP BY
 person_id
HAVING
 COUNT(*) > 1
ORDER BY
 sequence_count DESC)
 select badge_sequence, count(*) from commonpaths group by badge_sequence order by 2 desc