Software Archeology @ RIT

[ar·che·ol·o·gy] n. the study of people by way of their artifacts
Week 7 - Solidifying Metrics and a Look at New Code

17 Mar 2014

On the Topic of Metrics

Our team has continued this week to collaborate on what has recently become a very handy makeshift discussion forum: our GitHub issues page! (Although we do still put software development-related issues there, as you would expect.) We have continued to hold thought-provoking discussions and to generate additional potential metrics. As a preview of metrics you may see later, here is a sampling of questions we are looking into answering (please note these are all still under consideration):

  • How many issues has a reviewer reviewed in the same area of the file system as the file they are currently reviewing?
  • What is the average percent participation in code reviews?
  • What is the average amount of churn (lines added/removed) in a review?
  • How many reviewers are commonly on a review?
  • How often do people jump into a review they aren’t specifically invited to?
  • Is a file reviewed by someone with more expertise in that area less likely to have a vulnerability later?
  • Are reviewers more likely to contribute in a review with fewer participants?
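To make one of these concrete, the average-churn question above could be computed roughly as follows. This is a minimal sketch, assuming each review is available as a simple hash with total lines added and removed (the data shape here is hypothetical, not our actual schema):

```ruby
# Hypothetical review data: lines added/removed per code review.
reviews = [
  { added: 120, removed: 30 },
  { added: 10,  removed: 5  },
  { added: 60,  removed: 25 },
]

# Churn for one review = lines added + lines removed.
churns = reviews.map { |r| r[:added] + r[:removed] }

# Average churn across all reviews.
average_churn = churns.sum.to_f / churns.size
```

The same map-then-aggregate pattern would work for most of the per-review metrics listed above.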

This week our team also came to a key decision while discussing metrics. Although we have the potential to gather valuable metrics on many interesting topics, including security and transfer of knowledge, we have decided to prioritize producing security metrics over performing social network analysis. There are many reasons for this; the primary one at this point is that we have more information on security/CVEs than on developer relationships. We are, however, still very much interested in looking into developer expertise, because that will help us generate interesting security-related metrics (such as the second-to-last bullet point above). The general procedure for getting this information would be to gather information on the types of files a developer usually contributes to or reviews, define the developer as a sort of expert in that field, and use that information in our metrics later.
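The expertise procedure described above could be sketched like this: tally how often each developer touches files under each top-level directory, then call the most-touched directory that developer's "field." The developer names, file paths, and the choice of top-level directory as the unit of expertise are all hypothetical simplifications:

```ruby
# Hypothetical contribution records: which developer touched which file.
contributions = [
  { dev: "alice", file: "net/socket/tcp_socket.cc" },
  { dev: "alice", file: "net/http/http_cache.cc"   },
  { dev: "alice", file: "ui/views/view.cc"         },
  { dev: "bob",   file: "ui/views/widget.cc"       },
]

def expertise(contributions)
  # Nested counter: counts[developer][area] => number of touches.
  counts = Hash.new { |h, k| h[k] = Hash.new(0) }
  contributions.each do |c|
    area = c[:file].split("/").first # top-level directory as the "field"
    counts[c[:dev]][area] += 1
  end
  # Each developer's expertise = the area they touch most often.
  counts.transform_values { |areas| areas.max_by { |_, n| n }.first }
end

experts = expertise(contributions)
# experts["alice"] => "net", experts["bob"] => "ui"
```

A real version would weight by recency or review count, but the shape of the heuristic is the same.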

Recent Code

Some interesting functionality our team has generated recently includes:

  1. Writing a method to get the maximum amount of churn on a code review (by computing the churn of each associated patch set and taking the max)
  2. Modifications to a CVE loader, which loads CVEs from a Google Doc we have
  3. Addition of a rake task to run statistics - this currently generates histograms on various attributes of a code review (e.g. messages per code review)
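The maximum-churn method from item 1 boils down to a one-liner over the review's patch sets. Here is a rough sketch, assuming a review carries a list of patch sets with added/removed line counts (the field names are hypothetical, not our actual model):

```ruby
# Hypothetical code review with its patch sets' added/removed line counts.
review = {
  patch_sets: [
    { added: 40, removed: 10 }, # churn 50
    { added: 90, removed: 35 }, # churn 125
    { added: 12, removed: 2  }, # churn 14
  ]
}

# Max churn = the largest (added + removed) across all patch sets.
max_churn = review[:patch_sets].map { |ps| ps[:added] + ps[:removed] }.max
# => 125
```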

Overall, our team has done a great job proposing a lot of new metrics and producing exciting new functionality.

« Home