Software Archeology @ RIT

[ar·che·ol·o·gy] n. the study of people by way of their artifacts
Week 9 - Code Ownership and NLP

24 Oct 2014

Team Work Summary

Over the past few weeks, our team has continued to focus on natural language processing and code ownership. We have read a couple of Chris Bird’s papers to work out how we are going to distinguish major contributors from minor contributors within the project.

Code Ownership and NLP

This past week, Alvaro and I spent most of our time figuring out how Chris Bird’s research on ownership could be used in our project. Chris Bird defines a major contributor as a developer who has at least 5% ownership of a component. He measures ownership as the number of commits a developer has made to a component relative to the total number of commits to that component. However, we feel that raw commit counts aren’t quite good enough, because a one-line fix counts the same as a large change; we want to omit all trivial commits when we determine ownership. So, our plan is to aggregate the churn data from non-trivial commits only, and use each developer’s share of that churn to determine who is a major contributor and who is a minor contributor.
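To make the plan concrete, here is a rough Python sketch of a churn-based ownership calculation. The commit tuple layout, the `trivial_churn` cutoff of 5 changed lines, and the function name are all assumptions for illustration; we have not yet settled on what counts as a trivial commit.

```python
from collections import defaultdict

MAJOR_THRESHOLD = 0.05  # the 5% ownership cutoff for a major contributor
TRIVIAL_CHURN = 5       # hypothetical cutoff: commits below this churn are skipped


def classify_contributors(commits, trivial_churn=TRIVIAL_CHURN):
    """Label each developer of one component as "major" or "minor".

    `commits` is an iterable of (author, lines_added, lines_deleted)
    tuples for a single component (an assumed input format).
    """
    churn_by_dev = defaultdict(int)
    for author, added, deleted in commits:
        churn = added + deleted
        if churn < trivial_churn:
            continue  # ignore trivial commits entirely
        churn_by_dev[author] += churn

    total = sum(churn_by_dev.values())
    if total == 0:
        return {}  # no non-trivial commits, so nobody owns anything

    return {
        dev: "major" if churn / total >= MAJOR_THRESHOLD else "minor"
        for dev, churn in churn_by_dev.items()
    }


if __name__ == "__main__":
    sample = [
        ("alice", 900, 50),
        ("bob", 30, 10),
        ("carol", 10, 0),
        ("dave", 1, 1),  # trivial under the assumed cutoff, so dropped
    ]
    print(classify_contributors(sample))
```

Note that a developer whose commits are all trivial does not appear in the result at all under this sketch; whether such developers should instead be counted as minor contributors is an open design question.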

As for NLP, I am going to be helping Brian out with some language processing on code reviews. The first step is to strip things like links, quoted text, and possibly pasted lines of code from our data, since they are not part of what the reviewer actually wrote and would only add noise to the analysis. Once I have a better understanding of how NLP works, I am going to focus on technical words within the code reviews for the next few weeks.
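A minimal sketch of what that first cleanup step could look like in Python. It assumes quoted text is marked with a leading ">" and treats any line that is indented by four spaces or a tab, or that ends in ";", "{", or "}", as pasted code. These heuristics and the function name are guesses for illustration, not our actual pipeline, and they will miss some code and drop some prose.

```python
import re

URL_RE = re.compile(r"https?://\S+")
# Assumed heuristic: a line ending in ; { or } is probably pasted code.
CODE_LINE_RE = re.compile(r"[;{}]\s*$")


def clean_review_comment(text):
    """Remove quoted lines, likely code lines, and links from a comment."""
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(">"):
            continue  # quoted text from an earlier comment
        if line.startswith(("    ", "\t")) or CODE_LINE_RE.search(stripped):
            continue  # likely a pasted line of code
        without_links = URL_RE.sub("", stripped)
        normalized = " ".join(without_links.split())  # collapse leftover gaps
        if normalized:
            kept.append(normalized)
    return "\n".join(kept)


if __name__ == "__main__":
    comment = (
        "> Why is this null check here?\n"
        "See http://example.com/issue for background.\n"
        "    if (x == null) {\n"
        "foo();\n"
        "Looks good overall."
    )
    print(clean_review_comment(comment))
```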
