Fedora Contributor Activity Statistics
======================================

Purpose
-------

To understand quantitatively how contributor activity has changed over the years, and to support the guiding star of the Fedora Project Strategy 2028 of doubling the number of contributors who are active every week, it is important to have a service that tracks contributor statistics. This measurement would make the strategy goal meaningful. It would also help the Fedora Council and related bodies understand how far they have progressed towards the goal and identify the specific problems that stand in the way of achieving it.

Background
----------

A `Fedora Council `_ Face To Face 2023 Hackfest was held in Frankfurt, Germany and attended by the Fedora Council members `Akashdeep Dhar `_, `Alexandra Fedorova `_, `Ben Cotton `_, `David Cantrell `_, `Justin W. Flory `_, `Matthew Miller `_, `Sumantro Mukherjee `_ and `Vipul Siddharth `_. Among the strategy goals discussed and decided upon there, the core driving goal for the five-year strategy plan was to foster a community environment in which the number of contributors active every week doubles.

Such tracking was previously proposed as a Fedora Infrastructure `ticket `_ by `Michal Konecny `_ on Matthew Miller's request and addressed by Akashdeep Dhar in the project called `Fedora User Activity Statistics `_.

During the `Community Platform Engineering `_ `Face To Face Meeting 2023 `_ in Barcelona, Spain, the scope of the project was revisited by Akashdeep Dhar, `Adam Saleh `_, `David Kirwan `_, `Kevin Fenzi `_ and Matthew Miller, which led to a refinement of the project's purpose and an increase in the deliverable requirements. With the scope of the project expanded, the previously provided solution no longer addressed the updated set of requirements.
Adam Saleh and Akashdeep Dhar discussed efficient methods of extracting information from the Datanommer service. The project was proposed as an initiative in `this ticket `_ by `Aoife Moloney `_. It was then scoped for an ARC investigation during Q2 2023, before being handed over for implementation to the initiative team assigned to it.

Functional requirements
-----------------------

This section details the requirements for the project in two respects: the bare minimum outcome needed to call the project a success, and the list of nice-to-have wishes that make up the maximum outcome. Please note that these requirements should be taken as recommendations. Changing them during the implementation phase of the project is acceptable when circumstances demand it.

Minimal
~~~~~~~

- Processing

  - A collector service for legitimate human-owned/run accounts

- Output

  - Statistical information created in JSON format

Maximal
~~~~~~~

- Processing

  - Analyzing activity from meetbot logs

- Output

  - Report automatically being generated on a weekly basis

Resources
---------

- `Fedora User Activity Statistics `_
- `Datagrepper `_
- `Monitor Dashboard `_
- `Datanommer `_
- `Original Fedora Infrastructure ticket `_
- `Renewed Initiative Proposal ticket `_

Index
-----

.. toctree::
   :maxdepth: 2

   creation_workflow
   creation_gram
   creation_fail
   solution_datanote
   solution_dataeplt
   solution_examples
   solution_probntec
   solution_techtool

Conclusions
-----------

Having established how the project can help the Fedora Council achieve its strategic objective of doubling the number of contributors active every week, the investigation explored the options for making the service as useful as possible.
It was concluded that the historical data collected by Datanommer from the Fedora Messaging bus would indeed be helpful in tracking contribution activity and producing contribution statistics, and that it should be feasible, at least in theory, for the team to implement such a service.

Roadmap
-------

- **Step 1** - Connect with the data scientists to understand which data elements need to be focused on
- **Step 2** - Author codebase to obtain details on human-run and human-owned legitimate accounts
- **Step 3** - Author SQL queries for obtaining historical contribution statistics per username
- **Step 4** - Author SQL queries for obtaining historical contribution statistics per service
- **Step 5** - Adapt the queries to create a service to obtain current and future statistics
- **Step 6** - Expose necessary endpoints or integrations on the dashboard for the analytics
- **Step 7** - Set up the staging environment for the dashboard as a limited testing environment for inspection
- **Step 8** - Deploy to the production environment after ironing out the corner cases for statistics and... PROFIT?

Estimate of work
----------------

This service relies on technologies that already exist and are maintained, such as Fedora Messaging, Datagrepper, Datanommer and FASJSON. Assuming that the team that works on it includes people experienced in these technologies, the service should take no longer than two quarters to reach the staging environment and one more quarter to reach production.
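
To make the minimal output requirement (statistical information in JSON format) and the weekly activity measure more concrete, the following is a minimal sketch of how activity records might be grouped into weekly active contributor counts. It is an illustration only and not the implementation chosen by the project. The record layout (``username`` and ``timestamp`` fields), the sample data and the function names are all assumptions made for this example and do not reflect the actual Datanommer schema.

```python
import json
from collections import defaultdict
from datetime import datetime

# Hypothetical activity records. The field names are assumptions made for
# illustration and are not taken from the Datanommer schema.
RECORDS = [
    {"username": "alice", "timestamp": "2023-04-03T10:00:00"},
    {"username": "bob", "timestamp": "2023-04-05T12:30:00"},
    {"username": "alice", "timestamp": "2023-04-06T09:15:00"},
    {"username": "alice", "timestamp": "2023-04-11T08:00:00"},
]


def weekly_active_contributors(records):
    """Map each ISO week label (such as "2023-W14") to the set of usernames seen in it."""
    weeks = defaultdict(set)
    for record in records:
        moment = datetime.fromisoformat(record["timestamp"])
        iso = moment.isocalendar()  # (ISO year, ISO week number, ISO weekday)
        weeks[f"{iso[0]}-W{iso[1]:02d}"].add(record["username"])
    return weeks


def build_report(records):
    """Build a JSON-serialisable report with one entry per week, in week order."""
    weeks = weekly_active_contributors(records)
    return {
        week: {"active_contributors": len(users), "usernames": sorted(users)}
        for week, users in sorted(weeks.items())
    }


if __name__ == "__main__":
    print(json.dumps(build_report(RECORDS), indent=2))
```

Each contributor is counted once per week regardless of how many records they produced, which matches the idea of counting active contributors rather than events. A real service would also need to filter out accounts that are not human-owned or human-run before counting, as the roadmap notes.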