Existing solution
The existing solution was written to address the previous ticket on this problem statement. The project repository is Fedora User Activity Statistics (fuas).
How does it work?
The project consists of two main functional units: the namelist unit and the actvlist unit. The namelist unit retrieves the usernames from the FASJSON service and stores them in a file, while the actvlist unit checks the activity status of the names listed in that file through Datagrepper. Both units are executed as automated cronjobs, scheduled to run at specific intervals. This ensures that the service maintains an up-to-date list of usernames and a count of active users. The behavior of the service is controlled by a configuration file, allowing administrators to tailor it to their specific needs.
Usage
Usage: fuas [OPTIONS] COMMAND [ARGS]...

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  activity  Fetch a list of active usernames from Datagrepper
  namelist  Fetch a list of usernames on the Fedora Account System
Configuration file
A sample configuration file is provided with the project. Users can make a copy of it and edit the copy to tailor the service to their requirements.
The following is an exhaustive list of the variables that users are expected to customize.
daysqant (Default - 90) - Number of days for which the activity record is requested
pagerows (Default - 1) - Number of rows to be displayed on a page when requesting data from Datagrepper
minactqt (Default - 5) - Minimum number of activities required to count a user as “active”
services (Default - [“Pagure”]) - Services to probe into for activity records pertaining to the users
jsonloca (Default - “https://fasjson.fedoraproject.org”) - Location where the FASJSON service is hosted
dgprlink (Default - “https://apps.fedoraproject.org/datagrepper/v2/search”) - Location where the Datagrepper service is hosted
useriden (Default - “t0xic0der@FEDORAPROJECT.ORG”) - User to masquerade as for probing into the FASJSON records
listlink (Default - “https://raw.githubusercontent.com/t0xic0der/fuas/main/data/namefile”) - Location where the list of available users is present
namefile (Default - “/var/tmp/namefile”) - Location where the list of available users is to be stored locally
actvfile (Default - “/var/tmp/actvfile”) - Location where the list of active users is to be stored locally
acqtfile (Default - “/var/tmp/acqtfile”) - Location where the count of active users is to be stored locally
The configuration file also contains a list of global-scope computing variables. These variables are intended for developers only.
dfltsize (Default - 1000) - Size of iterable pages for all entities present in the FASJSON service
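As a rough illustration, the documented variables and their defaults could be gathered into a single structure. The sketch below is not the project's actual configuration file, whose format is not shown here. The FuasConfig class name and the use of a dataclass are assumptions made for the example; only the field names and default values come from the lists above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FuasConfig:
    """Hypothetical container for the documented configuration variables."""

    # Variables meant to be customized by users
    daysqant: int = 90
    pagerows: int = 1
    minactqt: int = 5
    services: List[str] = field(default_factory=lambda: ["Pagure"])
    jsonloca: str = "https://fasjson.fedoraproject.org"
    dgprlink: str = "https://apps.fedoraproject.org/datagrepper/v2/search"
    useriden: str = "t0xic0der@FEDORAPROJECT.ORG"
    listlink: str = "https://raw.githubusercontent.com/t0xic0der/fuas/main/data/namefile"
    namefile: str = "/var/tmp/namefile"
    actvfile: str = "/var/tmp/actvfile"
    acqtfile: str = "/var/tmp/acqtfile"
    # Variable meant for developers only
    dfltsize: int = 1000


# Override only the values that differ from the documented defaults
custom = FuasConfig(daysqant=30, minactqt=10)
```

Here a stricter definition of “active” is configured (at least 10 activities in the last 30 days) while every other variable keeps its documented default.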
The namelist unit
The service unit takes configuration variables such as username and password for the user to masquerade as while probing into the FASJSON service, jsonloca for the location where the FASJSON service is hosted, and namefile for the file in which the received list of usernames is stored. Using a session created as the masquerading user, the unit queries the FASJSON service for the list of all available users and stores it in the file specified in namefile.
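The fetch-then-store flow described above can be sketched roughly as follows. This is not the project's code: the function names, the page-based callback and the one-username-per-line file layout are all assumptions made for illustration, and the authenticated FASJSON request is replaced by a stand-in so the example runs on its own.

```python
from pathlib import Path
from typing import Callable, List


def collect_usernames(
    fetch_page: Callable[[int, int], List[str]],
    page_size: int = 1000,
) -> List[str]:
    """Request pages one by one until an empty page signals the end."""
    usernames: List[str] = []
    page_number = 1
    while True:
        page = fetch_page(page_number, page_size)
        if not page:
            break
        usernames.extend(page)
        page_number += 1
    return usernames


def store_usernames(usernames: List[str], namefile: str) -> None:
    """Write one username per line (assumed layout) after the fetch is done."""
    Path(namefile).write_text("\n".join(usernames) + "\n", encoding="utf-8")


def fake_fetch_page(page_number: int, page_size: int) -> List[str]:
    """Stand-in for the authenticated FASJSON request (hypothetical data)."""
    everyone = ["alice", "bob", "carol", "dave", "erin"]
    start = (page_number - 1) * page_size
    return everyone[start:start + page_size]


names = collect_usernames(fake_fetch_page, page_size=2)
```

Nothing is written until collect_usernames returns, so an interruption in the middle of the loop loses every page fetched so far. A real run would pass a callback that performs the request and then call store_usernames with the path from namefile.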
The aforementioned session is created using the krb5 packages, and the username and password are passed through the standard input of the console. While this works for smaller-scale runs where the service unit is run in ephemeral containers, this approach is highly discouraged and a session created using a keytab file is recommended instead. Also, a set of workarounds must be placed in the default krb5 configuration file to allow for seamless authentication.
As this unit runs for a long time and makes performance-intensive queries, it is strongly recommended to run it no more than once or twice in a span of 24 hours. It is also essential to ensure that the internet connection is reliable and that the device is not turned off while the unit is running, because the unit is non-resumable and writes to disk only when the fetch is complete.
To let the service unit run without interruptions, it is run as a workflow on GitHub Actions. The workflow file, available at https://github.com/t0xic0der/fuas/blob/main/.github/workflows/main.yml, sets up the environment for the service unit, fetches the list of usernames and then commits it back to the same repository, making the list publicly available for consumption. However, the time limit for running a workflow on GitHub Actions is 6 hours, which might in some cases lead to timeouts and incomplete runs.
The actvlist unit
The service unit takes configuration variables such as listlink for locating the file containing the list of all users registered on the Fedora Account System, daysqant for limiting the activity queries to a given number of days, minactqt for the minimum number of activities required for a user to be counted as “active”, services for the services whose records are searched for activities, dgprlink for the location where the Datagrepper service is hosted, and actvfile and acqtfile for storing the names and the count of the active users respectively.
The service unit fetches the list of users from the location given in listlink and iterates through it to find the activities pertaining to each user. The query is limited to the configured period, and if the count of activities within that period is greater than or equal to the configured minimum number of activities, that user is considered to be “active”. Their username is added to the list of active users and the count of active users is incremented accordingly. Both of these are stored in the files specified in the configuration variables.
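The threshold rule described above can be sketched as follows. This is an illustration only: find_active_users and its count_activities callback are hypothetical names, and the Datagrepper query that would return the number of activities of a user within daysqant days is replaced by sample data.

```python
from typing import Callable, Iterable, List, Tuple


def find_active_users(
    usernames: Iterable[str],
    count_activities: Callable[[str], int],
    minactqt: int = 5,
) -> Tuple[List[str], int]:
    """Return the users with at least minactqt activities and their count."""
    active = [name for name in usernames if count_activities(name) >= minactqt]
    return active, len(active)


# Stand-in for per-user activity counts from Datagrepper (hypothetical data)
sample_counts = {"alice": 12, "bob": 5, "carol": 4, "dave": 0}

active_names, active_count = find_active_users(
    sample_counts, lambda name: sample_counts[name], minactqt=5
)
```

With this sample data, bob sits exactly on the threshold and is counted as active, while carol falls one activity short. The two return values correspond to what would be stored in actvfile and acqtfile.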
As this unit runs for a long time and makes performance-intensive queries, it is strongly recommended to run it no more than once or twice in a span of 24 hours. It is also essential to ensure that the internet connection is reliable and that the device is not turned off while the unit is running, because the unit is non-resumable and writes to disk only when the fetch is complete. On average, this service unit takes at least 4-6 times as long as the namelist unit.
To let the service unit run without interruptions, it is run as a workflow on GitHub Actions. The workflow file, available at https://github.com/t0xic0der/fuas/blob/main/.github/workflows/actv.yml, sets up the environment for the service unit, fetches the list of active usernames as well as their count and then commits them back to the repository, making both the list and the count publicly available for consumption. However, the time limit for running a workflow on GitHub Actions is 6 hours, which might in some cases lead to timeouts and incomplete runs.