DNF Better Counting
Right now, we estimate installed Fedora systems by counting unique IP addresses which show up in our updates mirror statistics. We need better data than that. There are some proposals for more complicated systems, but this is a quick thing we can do now to greatly improve what we have without a gigantic new infrastructure.
This is an update of a previous proposal to use a UUID to distinguish unique systems, as openSUSE does (see https://metrics.opensuse.org/). See also this previous Fedora Council discussion and this devel list thread.
- Name: Matthew Miller
- Email: mattdm
- Release notes owner:
- Targeted release: Fedora 30
- Last updated: 2019-01-16
- Tracker bug: <will be assigned by the Wrangler>
- A. Currently, we can only count Fedora OS use by observing IP addresses. This is subject to undercounting due to NAT — and to overcounting due to short DHCP leases and laptops moving between work or school and home or coffee shop.
- B. We can count what releases are observed, but we can’t distinguish variants.
- C. We can’t count quickly because various logs are copied back to a central server and data is not consistent for several days.
- The Fedora community cares about privacy and is averse to tracking measures. We don't want to track; just count.
- For this reason, we don’t want to use any identifier like /etc/machine-id which may be used for other purposes, or in fact any UUID at all.
- And, also for that reason, there needs to be a relatively easy way to opt out.
- This needs to work with Yum/DNF, MicroDNF, PackageKit, Cockpit, rpm-ostree, GNOME Software, Muon, and software update mechanisms used in other spins.
- We need to be able to distinguish between short-lived instances (like temporary containers or test machines) and actual installations.
- We don’t want to track users, just count systems.
- Except for distinguishing temporary installations from “real” use, we don’t need to track systems over time. We just want a daily or weekly moment-in-time count.
- Being able to see how systems are upgraded over time might be interesting but isn’t as important as privacy concerns.
- Add VARIANT_ID (see Changes/Label Our Variants) to the string reported when metadata is requested from Fedora update servers
- Current requests include machine architecture and Fedora OS version as part of the path; we may want to also put those in a standard format for easy processing (implementation detail)
- Add a new "countme" variable. This variable will:
- Start as a "true" value,
- Reset to a "false" value the first time the client successfully makes a request to Fedora mirror servers, and
- Be reset to a "true" value after seven days.
This way, rather than filtering by unique IP addresses, we can count only the "true" requests, so each active machine is counted once per seven-day window and no more than once.
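The three rules above can be sketched as a small client-side state machine. This is only an illustration: the proposal does not say how or where the client stores this state, so the `CountmeState` class, its `last_counted` field, and the method names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# "Be reset to a true value after seven days."
WINDOW = timedelta(days=7)


@dataclass
class CountmeState:
    """Hypothetical client-side state; storage is not specified by the proposal."""

    last_counted: Optional[datetime] = None  # None means "never counted yet"

    def should_send(self, now: datetime) -> bool:
        """Return True when the next metadata request should carry the flag."""
        return self.last_counted is None or now - self.last_counted >= WINDOW

    def record_success(self, now: datetime) -> None:
        """Call after a flagged request succeeded; the flag then reads false
        until the seven-day window has passed."""
        if self.should_send(now):
            self.last_counted = now
```

Note that this sketch keeps only a single timestamp and no identifier, which is in line with the goal of counting systems without tracking them.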
Options for "true" values
Rather than a simple boolean, we'd like the "countme" variable to act as an increment-counter. That is, it would be "1" the first week, "2" the second week, "3" the third week, and so on. This will let us sort out short-lived test or CI infrastructure machines and get a better picture of how systems are used over time, without tracking individual systems. Optionally, we could have a cap on the maximum value to mitigate risk of uniqueness for systems which have been running for a very long time (it may be that there are only a few systems running for exactly 327 weeks, for example). As the supported lifetime of a Fedora release is about 13 months, a logical cutoff would be around 60 weeks: the counter could go from "59" to "old".
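A minimal sketch of the capped counter described above, assuming the value is advanced once each time the seven-day window resets. The function name `next_countme` and the constant `CAP_WEEKS` are hypothetical, and the cap of 60 is just the "around 60 weeks" suggestion, not a decided value.

```python
CAP_WEEKS = 60  # assumed cutoff; the proposal only suggests "around 60 weeks"


def next_countme(value: str) -> str:
    """Advance the weekly counter by one step.

    "1" -> "2" -> ... -> "59" -> "old"; once "old", it stays "old".
    """
    if value == "old":
        return "old"
    week = int(value) + 1
    return str(week) if week < CAP_WEEKS else "old"
```

With this shape, every system older than the cap reports the same value, so long-running machines cannot be singled out by an unusual week number.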
Benefit to Fedora
- Better metrics overall
- Public stats page updated automatically
- Better knowledge of relative use of different variants
- Insight into Fedora's use in short-lived test systems and temporary containers vs. longer-term installations
- Proposal owners: work with DNF team and infrastructure to implement the countme feature and corresponding backend data collection
- DNF team: feature work
- Maintainers of other package management tools: make sure feature works in these cases as well
- Other developers: Spin maintainers should make sure that VARIANT_ID is being set in /etc/os-release
- Release engineering: #Releng issue number (a check of an impact with Release Engineering is needed): may need changes to fedora-repos package
- List of deliverables: affects all deliverables
- Policies and guidelines: none
- Trademark approval: none
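For the spin-maintainer item in the list above, a quick way to check that VARIANT_ID is present is to parse the os-release content. The sketch below is an assumption-laden illustration: the helper name `parse_os_release` and the sample text are made up for the example, and it works on a string rather than reading /etc/os-release directly so that it can be tried anywhere.

```python
import shlex


def parse_os_release(text: str) -> dict:
    """Parse os-release style KEY=value lines into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, raw = line.partition("=")
        parts = shlex.split(raw)  # strips shell-style quoting around the value
        fields[key] = parts[0] if parts else ""
    return fields


# Illustrative sample only; real files contain more fields.
SAMPLE = (
    "NAME=Fedora\n"
    "VERSION_ID=30\n"
    'VARIANT="Workstation Edition"\n'
    "VARIANT_ID=workstation\n"
)

info = parse_os_release(SAMPLE)
if "VARIANT_ID" not in info:
    print("VARIANT_ID is missing; this variant cannot be told apart")
```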
Older versions will not have countme-based counting enabled; we will keep collecting stats in the traditional way for those systems.
How To Test
Once the system is in place, we will see data collected.
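As a rough idea of what checking the collected data could look like, the sketch below tallies flagged requests per variant and counter value. The request format is not decided by this proposal: the `countme` and `variant` query parameters are hypothetical, as is the `tally` function, and the sample requests are invented for illustration.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def tally(request_urls):
    """Count flagged requests per (variant, countme value).

    Requests without a countme parameter are skipped, so each system
    contributes at most one entry per counting window.
    """
    counts = Counter()
    for url in request_urls:
        query = parse_qs(urlsplit(url).query)
        countme = query.get("countme", [None])[0]
        if countme is None:
            continue  # unflagged request: already counted this window
        variant = query.get("variant", ["unknown"])[0]
        counts[(variant, countme)] += 1
    return counts


# Invented sample requests; parameter names are assumptions.
sample = [
    "/metalink?repo=fedora-30&arch=x86_64&countme=1&variant=workstation",
    "/metalink?repo=fedora-30&arch=x86_64",
    "/metalink?repo=fedora-30&arch=x86_64&countme=1&variant=workstation",
    "/metalink?repo=fedora-30&arch=aarch64&countme=old&variant=server",
]
result = tally(sample)
```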
User experience will not change. Users who wish to opt out of counting will have an easy way to do so.
- Contingency mechanism: continue counting the old way
- Contingency deadline: does not block release; we can ship with the feature incomplete, although it would certainly be most useful to have it available at GA
- Blocks release? No
- Blocks product? No
Release notes need to be written, along with documentation describing how to opt out.
This needs to be written but depends on exact implementation.