4K-Soft Ltd. | Does it make sense to build metrics in open-source projects?

Does it make sense to build metrics in open-source projects?

December 09, 2022

Maintainers of open-source projects often wonder whether they should add a metrics system. The purpose of collecting usage data is to show how important a particular feature or application is to the user base. There is no definitive answer as to whether it is worth it, but we will try to explain the pros and cons of collecting data about users and their actions in the application.

The good news is that some metrics are already available and can, at least in part, tell the project's maintainers how many users there are and what they do.

Collecting metrics

When someone visits a website, it is very easy for the site's administrators to track the visit using the data from the request to the web server. Downloads can be counted the same way; a good example of a download counter is the one associated with older versions of the Firefox browser, which used to have this kind of functionality. However, the download count alone should not be relied on: such a counter registers every incoming request, so the final number is not always accurate.
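To illustrate why a raw request counter overstates downloads, here is a minimal sketch that compares the number of requests for a file with the number of distinct clients that actually received it. The log lines, IP addresses, and the `/downloads/app.zip` path are invented for the example, and the regular expression assumes the server writes the Common Log Format; a real server may log differently.

```python
import re

# Hypothetical access-log lines in Common Log Format; IPs and paths are made up.
LOG_LINES = [
    '203.0.113.5 - - [09/Dec/2022:10:00:00 +0000] "GET /downloads/app.zip HTTP/1.1" 200 1048576',
    '203.0.113.5 - - [09/Dec/2022:10:00:07 +0000] "GET /downloads/app.zip HTTP/1.1" 200 1048576',
    '198.51.100.7 - - [09/Dec/2022:10:05:00 +0000] "GET /downloads/app.zip HTTP/1.1" 200 1048576',
    '198.51.100.9 - - [09/Dec/2022:10:06:00 +0000] "GET /downloads/app.zip HTTP/1.1" 404 0',
    '198.51.100.7 - - [09/Dec/2022:10:07:00 +0000] "GET /index.html HTTP/1.1" 200 512',
]

# Captures client IP, HTTP method, request path, and status code.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def count_downloads(lines, path):
    """Return (raw request count, distinct clients with a 200 response)."""
    raw_requests = 0
    successful_clients = set()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None or match.group("path") != path:
            continue
        raw_requests += 1  # what a naive counter would report
        if match.group("status") == "200":
            successful_clients.add(match.group("ip"))
    return raw_requests, len(successful_clients)


raw, unique = count_downloads(LOG_LINES, "/downloads/app.zip")
print(f"raw requests: {raw}, distinct successful clients: {unique}")
```

On this sample the naive counter reports four requests, while only two distinct clients completed a download. Deduplicating by IP address is itself only an approximation, since several users can share one address.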

For open-source projects, you can analyze downloads across several channels, such as:

1. The project website;
2. Package managers (npm, PyPI, Maven);
3. Repositories (GitHub, Gitee, GitLab);
4. Repository clones;
5. Source archive downloads.

Source code download statistics are even less reliable than binary download statistics. Suppose a developer wants to always use the latest version of your code and has configured their build to clone the repository every time it runs. If the build fails and is retried, the repository is cloned again and again, which prevents you from getting reliable data.
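The effect of a build that clones on every run can be sketched with a few lines of Python. The event list and client names below are hypothetical; real hosting platforms expose clone data in their own formats and usually do not identify individual clients this precisely.

```python
from collections import Counter

# Hypothetical clone events: (client identifier, ISO date).
CLONE_EVENTS = [
    ("ci-runner-1", "2022-12-08"),
    ("ci-runner-1", "2022-12-08"),
    ("ci-runner-1", "2022-12-08"),
    ("alice", "2022-12-08"),
    ("ci-runner-1", "2022-12-09"),
    ("bob", "2022-12-09"),
]


def clone_summary(events):
    """Compare the raw clone count with deduplicated views of the same data."""
    per_client = Counter(client for client, _day in events)
    return {
        "raw_clones": len(events),
        # Count each client at most once per day.
        "unique_client_days": len(set(events)),
        "unique_clients": len(per_client),
        # The client responsible for the most clones, with its count.
        "heaviest_client": per_client.most_common(1)[0],
    }


print(clone_summary(CLONE_EVENTS))
```

In this sample a single automated client accounts for four of the six clones, so the raw number says more about one build configuration than about how many people use the code.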

What problems can you run into?

Most users do not put extra load on the code or the server; when it happens, it is usually caused by bugs or by errors on the server side. As a rule of thumb, for every 10,000 users there are about 100 people who find a problem in the code and one person who fixes it. These ratios may vary depending on the type of user.
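The rule of thumb above is simple arithmetic, and it can be written down as a small helper. The function name and the default ratios are only an illustration of the 10,000 : 100 : 1 figure quoted in this article; they are not measured values.

```python
def estimate_contributors(users, users_per_reporter=100, users_per_fixer=10_000):
    """Apply the rough 10,000 : 100 : 1 ratio to a user count.

    Returns (people likely to find a problem, people likely to fix one).
    Integer division is used, so small user bases round down to zero.
    """
    if users < 0:
        raise ValueError("users must be non-negative")
    return users // users_per_reporter, users // users_per_fixer


for users in (5_000, 10_000, 250_000):
    reporters, fixers = estimate_contributors(users)
    print(f"{users} users: about {reporters} reporters, about {fixers} fixers")
```

Because the ratios depend on the type of user, both divisors are parameters and can be adjusted for a particular project.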

In conclusion, collecting download data is good for determining application usage trends. We cannot accurately determine how the download count relates to actual application usage, but it is a good metric for tracking the growth trend of an app or the performance of an advertising campaign.
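As a closing illustration of using downloads as a trend indicator rather than an absolute figure, the sketch below computes period-over-period growth from a series of download counts. The monthly numbers are invented for the example.

```python
def growth_rates(counts):
    """Return percentage growth between consecutive periods.

    A period that follows a zero count yields None, because relative
    growth from zero is undefined.
    """
    rates = []
    for previous, current in zip(counts, counts[1:]):
        if previous == 0:
            rates.append(None)
        else:
            rates.append(round((current - previous) / previous * 100, 1))
    return rates


# Hypothetical monthly download counts.
monthly_downloads = [1000, 1200, 1500, 1500]
print(growth_rates(monthly_downloads))
```

Even if each absolute count is inflated by repeated requests, the inflation tends to affect every period in a similar way, so the relative change between periods is usually more informative than any single number.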