Talk:Statistics 2.0

From FedoraProject

Please use the "+" button at the top of the page to add your thoughts. This will split each one into a section so that discussion can follow.

Sign your comments with --~~~~!

Jsmidt's suggestion

Sorry to list what seems like "everything", but I believe these statistics are all important to various groups of people. As for use cases, each of these could be important for marketing: "Look, we fix a high percentage of bugs", "We have lots of new packages coming in", "Documentation is really improving", etc. Furthermore, each thing I list helps establish the health of the project in each area: "Are we doing enough to encourage translators to contribute?", "Are we fixing bugs well?", etc.

--Jsmidt 01:16, 18 June 2009 (UTC)

Thanks for the wonderful suggestions, I'll get these integrated into the main use cases page shortly.
And just so you know — it never feels like time is linear ;) --Ian Weller 01:22, 18 June 2009 (UTC)

Statistic about the distribution itself

I don't know if this really is in alignment with this effort but there are some metrics that could be used to monitor the "health" and development of the distribution or various parts of it:

If you don't mind me asking, what purpose would this use case serve the Fedora community? --Ian Weller 14:31, 25 June 2009 (UTC)
It would give an overview of how many packages the different application domains contain and how they develop over time. It would also show whether the comps groups grow together with the distribution (or not). --Ffesti 14:21, 30 June 2009 (UTC)
I think that might be outside the scope of this project, but I'll definitely give it some thought. --Ian Weller 14:35, 25 June 2009 (UTC)
Figuring out the details and implementing them might be a bit too much for this project. Anyway, if you like the idea and know how to hook in a "statistic module" drop me a note. I might do that as a nice side project. --Ffesti 17:03, 30 June 2009 (UTC)

--Ffesti 10:04, 18 June 2009 (UTC)

talk/action ratios

[[User:Mchua|Mel Chua]] 18:41, 18 June 2009 (UTC)

Package information

-- Preceding unsigned comment added by Mmcgrath (talk • contribs)

Marketing Use Cases

To go along with Project FooBar.

Where are our visitors coming from?

What types of information are they consuming most?

Do they prefer one content type over another? e.g. audio over video

More fine grained metrics about news-related posts, i.e. does this have broad reach

A way to judge international uptake of all types of posts, and which languages should we focus translations on?

Does posting certain types of content lead to attrition? (attrition=drop in people coming to the site)

Maybe something about our rate of new people viewing material or visiting sites?

Activity Cycles, i.e. are there periods of time that are more conducive to posting content?

This list goes on...

Jack Aboutboul on 2009.06.30

Mapping events to the master timeline

For all of the stats that track against time, it would be useful to map real world events on a timeline for comparison and analysis.

Examples include regular, irregular, and rare events:

--Quaid 09:01, 6 July 2009 (UTC)

Automagic next-gen interpretation and analysis

Similar to the 'talk to action' measurement mentioned elsewhere in this Talk page.

What could we learn if we used tools such as natural language analysis? Or even really clever regular expression matching?

What if we could cross-compare activities of users in lists/IRC/blog posts with the skills and experiences of that user as captured in an opt-in database?

There is a level where we can seriously map based on our experiences with community building. For example, when the level of people not @redhat.com participating on a project list originally sparked by Red Hat reaches a certain level (~40%?), we can see that the project is more controlled and influenced by the wider community than just being a pet project of Red Hat.
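A minimal sketch of how that participation share could be computed, assuming we already have the sender addresses of the posts on a project list. The function name `outside_share`, the sample addresses, and the use of 40% as a cut-off are all illustrative assumptions; the ~40% figure above is a guess, not a measured threshold.

```python
def outside_share(senders, home_domain="redhat.com"):
    """Fraction of distinct posters whose address is not at home_domain.

    Only an exact domain match counts as "inside"; subdomains would need
    extra handling.
    """
    posters = {s.strip().lower() for s in senders if "@" in s}
    if not posters:
        return 0.0
    outside = [p for p in posters if p.rsplit("@", 1)[1] != home_domain]
    return len(outside) / len(posters)


# Hypothetical sender addresses from one month of a project list.
senders = [
    "alice@redhat.com",
    "bob@example.org",
    "carol@example.net",
    "alice@redhat.com",
    "dave@redhat.com",
    "erin@example.com",
]

share = outside_share(senders)
print(f"{share:.0%} of distinct posters are outside redhat.com")

# Assumed cut-off, taken from the rough ~40% guess in the comment above.
community_led = share >= 0.40
```

Counting distinct posters rather than messages is a design choice here; counting messages instead would weight the result toward the most talkative people.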

For example, if we had these items ...

... we could determine something otherwise arbitrary, such as, "FAS user 'juansmith' is a Linux sysadmin expert, has tons of experience with GFS/LVM, and is a member of the bug triage and IRC helper teams. By analyzing IRC logs with natural language and pattern matching tools, we can determine that 'juansmith' asks and answers a large number of questions about LVM and far fewer about GFS in #fedora."
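As a rough illustration of the pattern matching half of that idea, here is a sketch that counts how often each nick mentions a topic. It assumes a simple log format of `<nick> message` per line and a hand-picked topic list (`lvm`, `gfs`); real IRC logs and real topic detection would need more than this, and it does not distinguish questions from answers.

```python
import re
from collections import Counter, defaultdict

# Assumed log format: "<nick> message text", one message per line.
LINE_RE = re.compile(r"^<(?P<nick>[^>]+)>\s+(?P<text>.*)$")
# Hypothetical topic list; whole-word match, case-insensitive.
TOPIC_RE = re.compile(r"\b(lvm|gfs)\b", re.IGNORECASE)


def topic_counts(lines):
    """Map each nick to a Counter of topics mentioned in its messages."""
    counts = defaultdict(Counter)
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip joins, parts, and anything else we cannot parse
        # Count each topic at most once per message.
        topics = {t.lower() for t in TOPIC_RE.findall(match.group("text"))}
        for topic in topics:
            counts[match.group("nick")][topic] += 1
    return counts


# Made-up log lines for illustration only.
log = [
    "<juansmith> try lvextend, LVM handles that online",
    "<newbie> how do I resize an lvm volume?",
    "<juansmith> for GFS you need the cluster up first",
    "<juansmith> pvs will show the LVM physical volumes",
]

counts = topic_counts(log)
print(dict(counts["juansmith"]))
```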

In the past, people were interested in or scared of such data because it looks too much like performance analysis, as in, "Let's give 'juansmith' a t-shirt for answering 100 questions in #fedora!!!11!!!1!1!"

I can see other uses for this data. We can get an idea of where we have expertise in the community for helping each other, and where we do not. What kinds of problems plague our users the most, beyond the stories we tell and what is captured in the wiki? Who and what type of users are asking for what kind of help, how much are they getting helped, do they seem to be coming back for more such help, is the help on topic for the channel, etc.

--Quaid 23:55, 6 July 2009 (UTC)

Package review quotient

It's occasionally useful to know the number of reviews done per review submitted (or the reciprocal, since most folks review far fewer than they submit).

It's just two bugzilla queries: component->"Package Review", reporter->address gives you submitted reviews and assignee->address gives you reviewed tickets. There are some corner cases but by and large that's close enough.
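A small sketch of the arithmetic, assuming the results of the "Package Review" component query have already been fetched into plain dictionaries. The `reporter` and `assignee` keys, the sample tickets, and the function name `review_quotient` are hypothetical stand-ins, not the Bugzilla API.

```python
def review_quotient(bugs, address):
    """Reviews done per review submitted for one address.

    Returns None when the address has submitted nothing, since the
    quotient is undefined in that case.
    """
    submitted = sum(1 for bug in bugs if bug["reporter"] == address)
    reviewed = sum(1 for bug in bugs if bug["assignee"] == address)
    if submitted == 0:
        return None
    return reviewed / submitted


# Made-up "Package Review" tickets for illustration only.
bugs = [
    {"reporter": "packager@example.org", "assignee": "reviewer@example.org"},
    {"reporter": "packager@example.org", "assignee": "other@example.org"},
    {"reporter": "other@example.org", "assignee": "packager@example.org"},
    {"reporter": "reviewer@example.org", "assignee": "other@example.org"},
]

print(review_quotient(bugs, "packager@example.org"))
```

The corner cases mentioned above (for instance tickets that were never assigned, or were reassigned) are not handled here.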

Tibbs 20:06, 11 July 2009 (UTC)

exposing some statistics on the distribution lists

I can see from the "would like" list published on the Statistics_2.0 page that there is interest in making some mailing list statistics available:

'''Mailing lists'''
List activity
Popular threads
Most active posters
Number of subscriptions/unsubs over time

I can think of some useful visualizations of this data if it was available in some sanitised format.
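To make "sanitised format" a little more concrete, here is a sketch that assumes each message is exported as a pseudonymous poster id, a date, and a subject line. That export shape and the sample data are assumptions for illustration; from it, list activity, popular threads, and most active posters are simple counts.

```python
import re
from collections import Counter
from datetime import date

# Hypothetical sanitised export: (pseudonymous poster id, date, subject).
messages = [
    ("poster-a", date(2009, 6, 2), "Statistics 2.0 use cases"),
    ("poster-b", date(2009, 6, 3), "Re: Statistics 2.0 use cases"),
    ("poster-a", date(2009, 6, 20), "Re: Statistics 2.0 use cases"),
    ("poster-c", date(2009, 7, 1), "Timeline of events"),
    ("poster-a", date(2009, 7, 4), "Re: Timeline of events"),
]


def thread_key(subject):
    """Group replies with their thread by stripping leading 'Re:' prefixes."""
    stripped = re.sub(r"^(\s*re:\s*)+", "", subject, flags=re.IGNORECASE)
    return stripped.strip().lower()


# Most active posters: messages per poster id.
posters = Counter(sender for sender, _, _ in messages)
# List activity: messages per calendar month.
by_month = Counter(day.strftime("%Y-%m") for _, day, _ in messages)
# Popular threads: messages per normalised subject.
threads = Counter(thread_key(subject) for _, _, subject in messages)

print(posters.most_common(3))
print(sorted(by_month.items()))
print(threads.most_common(2))
```

Grouping threads by subject is a crude approximation; message headers such as In-Reply-To would be more reliable if the export kept them.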