Fedora Contributor Activity Statistics
To gain a quantitative understanding of how contributor activity has changed over the years, and to lay the groundwork for the guiding star of the Fedora Project Strategy 2028 (doubling the number of contributors who are active every week), it is important to have a service that tracks contributor statistics. These measurements would make the strategic goal meaningful. They would also help the Fedora Council and related bodies gauge how much progress has been made towards it, and identify the specific problems that stand in the way of achieving it.
The Fedora Council Face To Face 2023 Hackfest, held in Frankfurt, Germany, was attended by the Fedora Council members Akashdeep Dhar, Alexandra Fedorova, Ben Cotton, David Cantrell, Justin W. Flory, Matthew Miller, Sumantro Mukherjee and Vipul Siddharth. Among the many strategy goals discussed and agreed upon there, the core driving goal of the five-year strategy plan was to foster a community environment in which the number of contributors active every week doubles.
This was previously proposed as a Fedora Infrastructure ticket by Michal Konecny at Matthew Miller's request, and was addressed by Akashdeep Dhar in the project called Fedora User Activity Statistics. During the Community Platform Engineering Face To Face Meeting 2023 in Barcelona, Spain, Akashdeep Dhar, Adam Saleh, David Kirwan, Kevin Fenzi and Matthew Miller revisited the scope of the project, which led to a refinement of the project's purpose and an expansion of its deliverable requirements.
With the expanded scope, the previously provided solution no longer met the updated set of requirements. Adam Saleh and Akashdeep Dhar discussed efficient methods of extracting information from the Datanommer service. Aoife Moloney then proposed the project as an initiative in a ticket. The project was scoped for ARC investigation during Q2 2023, before being handed over for implementation to the initiative team assigned to it.
The following section details the requirements for the project at two levels: the bare minimum outcome needed for the project to be considered a success, and a list of nice-to-have wishes that make up the maximum possible outcome. Please note that these requirements should be taken as recommendations; changes made to them during the implementation phase are acceptable when circumstances require it.
Processing - A collector service for legitimate human-owned and human-run accounts
Output - Statistical information generated in JSON format
Processing - Analyzing activity from meetbot logs
Output - Reports generated automatically on a weekly basis
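As an illustration of the JSON output requirement, here is a minimal Python sketch that aggregates activity records into a per-week JSON report. It assumes the collector has already reduced each activity to a `(service, timestamp)` pair; the function name `weekly_report_json` and the report fields `week`, `activities` and `by_service` are hypothetical, not a format settled by this investigation.

```python
import json
from collections import Counter, defaultdict
from datetime import datetime


def weekly_report_json(activities):
    """Serialise per-week activity counts as JSON.

    `activities` is an iterable of (service, timestamp) pairs, where
    timestamp is a datetime. The report field names are hypothetical.
    """
    # Bucket activities by ISO (year, week), counting them per service
    per_week = defaultdict(Counter)
    for service, timestamp in activities:
        year, week, _ = timestamp.isocalendar()
        per_week[(year, week)][service] += 1

    # One report entry per week, in chronological order
    report = [
        {
            "week": f"{year}-W{week:02d}",
            "activities": sum(services.values()),
            "by_service": dict(sorted(services.items())),
        }
        for (year, week), services in sorted(per_week.items())
    ]
    return json.dumps(report, indent=2)


if __name__ == "__main__":
    sample = [
        ("bodhi", datetime(2023, 5, 2, 10, 0)),
        ("pagure", datetime(2023, 5, 3, 12, 30)),
        ("pagure", datetime(2023, 5, 10, 9, 15)),
    ]
    print(weekly_report_json(sample))
```

Running such a function once a week from a scheduler would be one possible way to cover the automatic weekly report wish; ISO weeks are used here only so that week boundaries are unambiguous.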
- Existing solution
- Diagrams of Existing Solution
- Why does the existing solution not work?
- Planned Statistics Sources
- Data Exploration and Significance
- Activity Entry from Datanommer
- Username of the “subject”
- Username of the “object”
- Datetime data of a specific contribution activity
- Datetime data of a grouped contribution activity
- Service where a specific contribution activity happened
- Service where a grouped contribution activity happened
- Activity trends per username
- Examples of Contributor Activities
- Conundrum of Tracking of Non-Technical Contributions
- Technologies suggested
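The data elements listed above (subject, object, datetime and service of an activity) could be modelled as a small record type. The sketch below is hypothetical throughout: it assumes a simplified message dictionary with `topic`, `agent`, `object` and `timestamp` keys, which is not Datanommer's actual schema, and it assumes topics of the form `org.fedoraproject.prod.<service>.<event>`, taking the fourth dot-separated component as the service name. One plausible reading of "subject" and "object" is the account that performed an action and the account it was directed at, for example a commenter and the author of the update being commented on.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class ActivityEntry:
    """One contribution activity (hypothetical shape)."""
    subject_username: str            # account that performed the action
    object_username: Optional[str]   # account the action was directed at, if any
    happened_at: datetime            # when the activity happened
    service: str                     # service that emitted the message


def entry_from_message(message):
    """Convert a simplified, hypothetical message dict into an ActivityEntry.

    Assumes topics look like "org.fedoraproject.prod.<service>.<event>",
    so the fourth dot-separated component names the service.
    """
    parts = message["topic"].split(".")
    if len(parts) < 4:
        raise ValueError(f"unexpected topic: {message['topic']}")
    return ActivityEntry(
        subject_username=message["agent"],
        object_username=message.get("object"),  # None when there is no object
        happened_at=datetime.fromisoformat(message["timestamp"]),
        service=parts[3],
    )
```

Keeping the object optional reflects that many activities, such as pushing a commit, may not be directed at another account.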
Having established how effective the project could be in helping the Fedora Council achieve its strategic objective of doubling the number of contributors active every week, the options for making the service as useful as possible were explored. It was concluded that the historical data collected by Datanommer from the Fedora Messaging bus would indeed be helpful for tracking contribution activities and producing contribution statistics, and that it should theoretically be possible for the team to implement such a service.
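To make the guiding star measurable, one could count the distinct usernames seen in each ISO week and compare a given week against a baseline week; a ratio of 2.0 would mean the weekly count has doubled. The Python sketch below assumes activities have already been reduced to `(username, date)` pairs for legitimate human accounts. The function names are hypothetical, and both the use of ISO weeks and the single-baseline comparison are illustrative choices rather than an agreed definition.

```python
from collections import defaultdict
from datetime import date


def weekly_active_contributors(activities):
    """Map ISO (year, week) to the number of distinct usernames active that week."""
    users_per_week = defaultdict(set)
    for username, day in activities:
        year, week, _ = day.isocalendar()
        users_per_week[(year, week)].add(username)
    return {week: len(users) for week, users in sorted(users_per_week.items())}


def progress_towards_doubling(counts, baseline_week, current_week):
    """Return the current/baseline ratio; 2.0 means the weekly count has doubled."""
    baseline = counts.get(baseline_week, 0)
    if baseline == 0:
        raise ValueError("baseline week has no recorded activity")
    return counts.get(current_week, 0) / baseline
```

Counting distinct usernames rather than raw events keeps a single very active account from inflating the figure, which matches the goal's focus on the number of contributors.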
Step 1 - Connect with the data scientists to understand which data elements need to be focused on
Step 2 - Write code to identify legitimate human-run and human-owned accounts and obtain their details
Step 3 - Author SQL queries for obtaining historical contribution statistics per username
Step 4 - Author SQL queries for obtaining historical contribution statistics per service
Step 5 - Adapt the queries to create a service to obtain current and future statistics
Step 6 - Expose the necessary endpoints or integrations on the dashboard for the analytics
Step 7 - Set up a staging deployment of the dashboard in a limited testing environment for inspection
Step 8 - Deploy to the production environment after ironing out the edge cases in the statistics and… PROFIT?
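Steps 2 to 4 could start from queries like the ones below. This is a self-contained sketch using SQLite and a hypothetical, simplified `activity` table, with a hypothetical `bots` table standing in for the human-account filtering of Step 2. Datanommer itself runs on PostgreSQL with a different schema, so the table and column names are assumptions, and the date handling (SQLite's `strftime`) would need adapting, for example to `date_trunc` in PostgreSQL.

```python
import sqlite3

# Hypothetical, simplified schema: one row per contribution activity.
SCHEMA = """
CREATE TABLE activity (username TEXT, service TEXT, happened_at TEXT);
CREATE TABLE bots (username TEXT PRIMARY KEY);
"""

# Step 3: historical activity per username and year, excluding known bot accounts.
PER_USERNAME = """
SELECT username, strftime('%Y', happened_at) AS year, COUNT(*) AS activities
FROM activity
WHERE username NOT IN (SELECT username FROM bots)
GROUP BY username, strftime('%Y', happened_at)
ORDER BY username, year
"""

# Step 4: historical activity per service, with the number of distinct contributors.
PER_SERVICE = """
SELECT service, COUNT(*) AS activities, COUNT(DISTINCT username) AS contributors
FROM activity
WHERE username NOT IN (SELECT username FROM bots)
GROUP BY service
ORDER BY service
"""


def run_queries(rows, bots):
    """Load rows into an in-memory database and return (per_username, per_service)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        conn.executemany("INSERT INTO activity VALUES (?, ?, ?)", rows)
        conn.executemany("INSERT INTO bots VALUES (?)", [(b,) for b in bots])
        return (conn.execute(PER_USERNAME).fetchall(),
                conn.execute(PER_SERVICE).fetchall())
    finally:
        conn.close()
```

For Step 5, the same queries could be restricted with a `WHERE happened_at >= ?` clause so that only new activity is processed on each run.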
Estimate of work
As this service builds on existing, actively maintained technologies such as Fedora Messaging, Datagrepper, Datanommer and FASJSON, and assuming that the team that eventually works on it includes people experienced with these technologies, the service should take no longer than two quarters to reach the staging environment and one more quarter to reach production.