Fedora Contributor Activity Statistics
======================================
Purpose
-------
To understand quantitatively how contributor activity has changed over the years, and to
support the Fedora Project Strategy 2028 guiding star of doubling the number of
contributors who are active every week, it is important to have a service that tracks
contributor statistics. This measurement would make the strategy goal meaningful and
help the Fedora Council and related bodies understand how far they have progressed
towards it and identify the specific problems that stand in the way of this objective.
Background
----------
There was a `Fedora Council `_ Face To
Face 2023 Hackfest organized in Frankfurt, Germany that was attended by the Fedora
Council members, `Akashdeep Dhar `_,
`Alexandra Fedorova `_, `Ben Cottom
`_, `David Cantrell
`_, `Justin W. Flory
`_, `Matthew Miller
`_, `Sumantro Mukherjee
`_ and `Vipul Siddharth
`_. Among the strategy goals discussed
and decided upon there, the core driving goal for the five-year strategy plan was to
facilitate a community environment in which the number of contributors who are active
every week doubles.
This was previously proposed as a Fedora Infrastructure `ticket
`_ by `Michal Konecny
`_ on Matthew Miller's request and
addressed by Akashdeep Dhar in the project called `Fedora User Activity Statistics
`_. During the `Community Platform Engineering
`_ `Face To Face Meeting 2023
`_
in Barcelona, Spain - the scope of the project was revisited by Akashdeep Dhar, `Adam
Saleh `_, `David Kirwan
`_, `Kevin Fenzi
`_ and Matthew Miller which led to the
refinement of the project's purpose and an increase in the deliverable requirements.
Following the expanded scope of the project, the previously provided solution no longer
addressed the updated set of requirements. Adam Saleh and Akashdeep Dhar had a
discussion about efficient methods of extracting information from the Datanommer
service. The project was proposed to be an initiative in `this ticket
`_ by `Aoife Moloney
`_. The project was then scoped for
ARC investigation for the period of Q2 2023, after which it would be handed over for
implementation to the initiative team assigned to the project.
Functional requirements
-----------------------
The following section details the requirements for the project in two tiers: the bare
minimum outcome needed to call the project a success, and the list of nice-to-have
wishes that constitute the maximum outcome. Please note that these requirements should
be taken as recommendations; changing them during the implementation phase of the
project, when circumstances require it, is acceptable.
Minimal
~~~~~~~
- Processing - A collector service for legitimate human-owned/run accounts
- Output - Statistical information created in JSON format
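
As a rough illustration of the minimal outcome, the sketch below filters activity
records down to accounts assumed to be human-run and serialises the per-user counts as
JSON. The record layout, the ``KNOWN_BOTS`` set and the ``summarise`` helper are all
hypothetical and are not taken from any existing Fedora service; a real collector would
need a more reliable way of identifying legitimate accounts.

.. code-block:: python

    import json
    from collections import Counter

    # Hypothetical denylist of automated accounts (assumption, not a real list).
    KNOWN_BOTS = {"examplebot", "ci-runner"}


    def summarise(records):
        """Count activity per non-bot account and return it as a JSON string."""
        counts = Counter(
            record["username"]
            for record in records
            if record["username"] not in KNOWN_BOTS
        )
        report = {"total": sum(counts.values()), "per_user": dict(counts)}
        return json.dumps(report, sort_keys=True, indent=2)


    if __name__ == "__main__":
        sample = [
            {"username": "alice"},
            {"username": "examplebot"},
            {"username": "alice"},
            {"username": "bob"},
        ]
        print(summarise(sample))
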
Maximal
~~~~~~~
- Processing - Analyzing activity from meetbot logs
- Output - Report automatically being generated on a weekly basis
Resources
---------
- `Fedora User Activity Statistics `_
- `Datagrepper `_
- `Monitor Dashboard
`_
- `Datanommer `_
- `Original Fedora Infrastructure ticket
`_
- `Renewed Initiative Proposal ticket
`_
Index
-----
.. toctree::
    :maxdepth: 2

    creation_workflow
    creation_gram
    creation_fail
    solution_datanote
    solution_dataeplt
    solution_examples
    solution_probntec
    solution_techtool
Conclusions
-----------
After establishing how the project can help the Fedora Council achieve its strategic
objective of doubling the number of contributors active over a given period of time, the
options for making the service as useful as possible were explored. It was concluded
that the historical data collected by Datanommer from the Fedora Messaging bus would
indeed be helpful for tracking contribution activity and reporting contribution
statistics, and that it should be theoretically possible for the team to implement such
a service.
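
To make the tracked quantity concrete, the following sketch counts distinct active
contributors per ISO week from a list of timestamped events. It assumes the historical
messages have already been reduced to ``(username, timestamp)`` pairs; that layout and
the ``weekly_active`` helper are illustrative assumptions, not part of Datanommer.

.. code-block:: python

    from datetime import datetime


    def weekly_active(events):
        """Map each ISO (year, week) to the number of distinct users active in it."""
        users_by_week = {}
        for username, timestamp in events:
            iso = timestamp.isocalendar()  # (ISO year, ISO week, ISO weekday)
            users_by_week.setdefault((iso[0], iso[1]), set()).add(username)
        return {week: len(users) for week, users in sorted(users_by_week.items())}


    if __name__ == "__main__":
        sample = [
            ("alice", datetime(2023, 5, 1)),
            ("bob", datetime(2023, 5, 2)),
            ("alice", datetime(2023, 5, 3)),
            ("alice", datetime(2023, 5, 8)),
        ]
        print(weekly_active(sample))

Comparing these weekly counts against a chosen baseline week is one possible way to
judge progress towards the doubling goal.
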
Roadmap
-------
- **Step 1** - Connect with the data scientists to understand which data elements need
  to be focused on
- **Step 2** - Author codebase to obtain details on human-run and human-owned legitimate
  accounts
- **Step 3** - Author SQL queries for obtaining historical contribution statistics per
  username
- **Step 4** - Author SQL queries for obtaining historical contribution statistics per
  service
- **Step 5** - Adapt the queries to create a service to obtain current and future
  statistics
- **Step 6** - Expose necessary endpoints or integrations on the dashboard for the
  analytics
- **Step 7** - Set up the staging environment for the dashboard in a limited testing
  environment for inspection
- **Step 8** - Deploy to the production environment after ironing out the edge cases
  for statistics and... PROFIT?
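
Steps 3 and 4 could look roughly like the sketch below, which runs the two kinds of
aggregate query against an in-memory SQLite database. The ``activity`` table and its
columns are a simplified, hypothetical stand-in and do not reflect the actual Datanommer
schema; the real queries would have to be written against that schema.

.. code-block:: python

    import sqlite3

    # Hypothetical, simplified table; NOT the real Datanommer schema.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE activity (username TEXT, service TEXT, sent_at TEXT)")
    conn.executemany(
        "INSERT INTO activity VALUES (?, ?, ?)",
        [
            ("alice", "pagure", "2023-05-01"),
            ("alice", "bodhi", "2023-05-02"),
            ("bob", "pagure", "2023-05-02"),
        ],
    )

    # Step 3: number of recorded contributions per username.
    per_user = conn.execute(
        "SELECT username, COUNT(*) FROM activity "
        "GROUP BY username ORDER BY username"
    ).fetchall()

    # Step 4: number of distinct contributors per service.
    per_service = conn.execute(
        "SELECT service, COUNT(DISTINCT username) FROM activity "
        "GROUP BY service ORDER BY service"
    ).fetchall()

    print(per_user)
    print(per_service)
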
Estimate of work
----------------
As this service makes active use of technologies that are already created and
maintained, such as Fedora Messaging, Datagrepper, Datanommer and FASJSON, and assuming
that the team that works on this down the road has people who are experienced in these
technologies, the service should take no longer than two quarters to reach the staging
environment and one more quarter to reach production.