Existing solution

The existing solution was developed to address the previous ticket, which can be found here. The project repository is located at Fedora User Activity Statistics.

How does it work?

The project consists of two main functional units: the namelist unit and the actvlist unit. The namelist unit lets the service runner retrieve usernames from the FASJSON service and store them in a file, while the actvlist unit checks the activity status of the usernames listed in that file through Datagrepper. Both units are executed as automated cronjobs scheduled to run at specific intervals, which ensures that the service maintains an up-to-date list of usernames and a count of active users. The service’s behavior is controlled by a configuration file, allowing administrators to tailor it to their specific needs.

Usage

Usage: fuas [OPTIONS] COMMAND [ARGS]...

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  activity  Fetch a list of active usernames from Datagrepper
  namelist  Fetch a list of usernames on the Fedora Account System

Configuration file

A sample configuration file can be found here. Users can copy it and edit the copy to tailor the service to their requirements.

The following is an exhaustive list of the variables that users are expected to customize.

  1. daysqant (Default - 90) - Number of days for which the activity record is requested

  2. pagerows (Default - 1) - Number of rows to be displayed on a page when requesting data from Datagrepper

  3. minactqt (Default - 5) - Minimum number of activities required for a user to count as “active”

  4. services (Default - [“Pagure”]) - Services to probe into for activity records pertaining to the users

  5. jsonloca (Default - “https://fasjson.fedoraproject.org”) - Location where the FASJSON service is hosted

  6. dgprlink (Default - “https://apps.fedoraproject.org/datagrepper/v2/search”) - Location where the Datagrepper service is hosted

  7. useriden (Default - “t0xic0der@FEDORAPROJECT.ORG”) - User to masquerade as for probing into the FASJSON records

  8. listlink (Default - “https://raw.githubusercontent.com/t0xic0der/fuas/main/data/namefile”) - Location where the list of available users is present

  9. namefile (Default - “/var/tmp/namefile”) - Location where the list of available users is to be stored locally

  10. actvfile (Default - “/var/tmp/actvfile”) - Location where the list of active users is to be stored locally

  11. acqtfile (Default - “/var/tmp/acqtfile”) - Location where the count of active users is to be stored locally

The configuration file also contains a list of computing variables of global scope. These variables are intended only for developers.

  1. dfltsize (Default - 1000) - Size of iterable pages for all entities present in the FASJSON service
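The variables and defaults listed above can be pictured as a single settings object. The following is only a sketch: the `FuasConfig` class is a hypothetical container introduced for illustration and is not taken from the project, whose actual configuration file format may differ. Only the variable names and default values come from the lists above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FuasConfig:
    """Hypothetical container mirroring the documented variables and defaults."""

    # Variables intended for users
    daysqant: int = 90
    pagerows: int = 1
    minactqt: int = 5
    services: List[str] = field(default_factory=lambda: ["Pagure"])
    jsonloca: str = "https://fasjson.fedoraproject.org"
    dgprlink: str = "https://apps.fedoraproject.org/datagrepper/v2/search"
    useriden: str = "t0xic0der@FEDORAPROJECT.ORG"
    listlink: str = "https://raw.githubusercontent.com/t0xic0der/fuas/main/data/namefile"
    namefile: str = "/var/tmp/namefile"
    actvfile: str = "/var/tmp/actvfile"
    acqtfile: str = "/var/tmp/acqtfile"

    # Variable intended for developers
    dfltsize: int = 1000


# Example: a stricter definition of "active" over a shorter period
default_config = FuasConfig()
custom_config = FuasConfig(daysqant=30, minactqt=10)
```

Overriding only daysqant and minactqt, as in the last line, leaves every other variable at its documented default.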

The namelist unit

The service unit reads configuration variables such as the username and password of the user to masquerade as while probing the FASJSON service, jsonloca for the location where the FASJSON service is hosted, and namefile for the file in which the received usernames are stored. Using a session created as the masquerading user, the unit queries the FASJSON service for the list of all available users and stores them in the file specified by namefile.
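The fetch-then-store flow of the namelist unit can be sketched roughly as follows. This is an illustration under assumptions, not the project's code: `collect_usernames`, `write_namefile` and the `fetch_page` callable are hypothetical names, and the authenticated FASJSON request is replaced by a stand-in function so that the example runs without network access or Kerberos. The page size plays the role of dfltsize.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def collect_usernames(fetch_page: Callable[[int, int], List[str]], page_size: int) -> List[str]:
    """Collect usernames page by page until a short (or empty) page is returned."""
    names: List[str] = []
    page = 1
    while True:
        batch = fetch_page(page, page_size)
        names.extend(batch)
        if len(batch) < page_size:
            break
        page += 1
    return names


def write_namefile(names: List[str], path: Path) -> None:
    """Write one username per line, only after the whole fetch has finished."""
    Path(path).write_text("\n".join(names) + "\n")


# Stand-in for the real, authenticated FASJSON query (assumption for the demo)
FAKE_USERS = [f"user{i:04d}" for i in range(2500)]


def fake_fetch_page(page: int, page_size: int) -> List[str]:
    start = (page - 1) * page_size
    return FAKE_USERS[start:start + page_size]


# The demo writes to a temporary directory instead of /var/tmp/namefile
with tempfile.TemporaryDirectory() as tmpdir:
    namefile = Path(tmpdir) / "namefile"
    names = collect_usernames(fake_fetch_page, page_size=1000)
    write_namefile(names, namefile)
    stored = namefile.read_text().splitlines()
```

Because the file is written only once the loop has finished, an interrupted run in this sketch leaves nothing on disk.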

The aforementioned session is created using the krb5 packages, with the username and password passed through the standard input of the console. While this works for smaller-scale runs where the service unit runs in ephemeral containers, the approach is highly discouraged; a session created using a keytab file is recommended instead. In addition, a set of workarounds must be placed in the default krb5 configuration file to allow for seamless authentication.

Because this unit runs for a long time and makes performance-intensive queries, it is strongly recommended to run it no more than once or twice in any 24-hour period. It is also essential that the internet connection is reliable and that the device stays powered on while the unit is running, because the unit is non-resumable and writes to disk only when the fetch is complete.

To let the service unit run without interruptions, it is run as a workflow on GitHub Actions. The workflow file, found at https://github.com/t0xic0der/fuas/blob/main/.github/workflows/main.yml, sets up the environment for the service unit, fetches the list of usernames and then commits it back to the same repository, making the list publicly available for consumption. The time limit for running a workflow on GitHub Actions is, however, 6 hours, which might in some cases lead to timeouts and incomplete runs.

The actvlist unit

The service unit reads the following configuration variables: listlink to locate the file containing the list of all users registered on the Fedora Account System, daysqant to limit the activity queries to the given number of days, minactqt for the minimum number of activities required for a user to be counted as “active”, services for the services whose records are searched for activities, dgprlink for the location where the Datagrepper service is hosted, and actvfile and acqtfile for storing the names and the count of the active users respectively.
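As a rough illustration of how these variables might combine into a single Datagrepper request, the sketch below builds a query URL without sending it. The `build_activity_query` helper is hypothetical, and the mapping of variables to the query parameters `user`, `delta` (a period in seconds), `rows_per_page` and `category` is an assumption based on Datagrepper's commonly documented search parameters, not something confirmed by the project.

```python
from typing import List
from urllib.parse import urlencode

SECONDS_PER_DAY = 86400


def build_activity_query(dgprlink: str, username: str, daysqant: int,
                         pagerows: int, services: List[str]) -> str:
    """Build a Datagrepper search URL for one user (parameter names are assumed)."""
    params = [
        ("user", username),
        ("delta", daysqant * SECONDS_PER_DAY),  # period expressed in seconds
        ("rows_per_page", pagerows),
    ]
    # One "category" parameter per service, lowercased (assumption)
    params += [("category", service.lower()) for service in services]
    return f"{dgprlink}?{urlencode(params)}"


query_url = build_activity_query(
    dgprlink="https://apps.fedoraproject.org/datagrepper/v2/search",
    username="t0xic0der",
    daysqant=90,
    pagerows=1,
    services=["Pagure"],
)
```

A small pagerows value keeps each response short, which fits a unit that needs only the number of matching activities rather than the activities themselves.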

The service unit fetches the list of users from the location given by listlink and iterates through it to find the activities pertaining to each user. The query period is set to daysqant days, and if the count of activities within that period is greater than or equal to minactqt, the user is considered “active”. Their username is added to the list of active users and the count of active users is incremented accordingly. Both are stored in the files specified by actvfile and acqtfile.
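The classification rule described above can be sketched as follows. The `find_active_users` function and the `count_activities` callable are hypothetical names used for illustration; in the real unit the per-user count would come from Datagrepper, whereas here it comes from a small in-memory table.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def find_active_users(usernames: Iterable[str],
                      count_activities: Callable[[str], int],
                      minactqt: int) -> Tuple[List[str], int]:
    """Return the active usernames and their count.

    A user is active when the activity count within the period is
    greater than or equal to minactqt.
    """
    active = [name for name in usernames if count_activities(name) >= minactqt]
    return active, len(active)


# Stand-in activity counts (assumption for the demo)
sample_counts: Dict[str, int] = {"alice": 12, "bob": 5, "carol": 4, "dave": 0}

active_names, active_count = find_active_users(
    usernames=sample_counts.keys(),
    count_activities=lambda name: sample_counts[name],
    minactqt=5,
)
```

With a threshold of 5, "bob" is counted as active because the comparison is inclusive, while "carol" with 4 activities is not.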

Because this unit runs for a long time and makes performance-intensive queries, it is strongly recommended to run it no more than once or twice in any 24-hour period. It is also essential that the internet connection is reliable and that the device stays powered on while the unit is running, because the unit is non-resumable and writes to disk only when the fetch is complete. On average, this service unit takes at least 4 to 6 times as long as the namelist unit.

To let the service unit run without interruptions, it is run as a workflow on GitHub Actions. The workflow file, found at https://github.com/t0xic0der/fuas/blob/main/.github/workflows/actv.yml, sets up the environment for the service unit, fetches the list of active usernames as well as their count and then commits them back to the repository, making both the list and the count publicly available for consumption. The time limit for running a workflow on GitHub Actions is, however, 6 hours, which might in some cases lead to timeouts and incomplete runs.