Software Archeology @ RIT

[ar·che·ol·o·gy] n. the study of people by way of their artifacts
Week 6

04 Oct 2013

Last Week

Last week, we looked at how we might model our data. We drew ER (Entity-Relationship) diagrams of what the database might look like. Because of its ease of use and our familiarity with it, we decided to use ActiveRecord as the ORM, and Chris is looking into how to set that up.
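
Since the schema is still being worked out, here is a minimal sketch of what the models might look like under ActiveRecord; the Issue and PatchSet names, and the one-to-many association between them, are placeholders based on our ER diagram rather than the final design.

```ruby
require 'active_record'

# Hypothetical models mirroring the ER diagram: one issue has many
# patchsets. Table and column names are placeholders until the
# schema is settled.
class Issue < ActiveRecord::Base
  has_many :patch_sets
end

class PatchSet < ActiveRecord::Base
  belongs_to :issue
end
```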

This Week

During our meeting, we further discussed how we would go about collecting our data. Because of the large overhead of pulling all of the data we need over JSON/HTTP, we are considering contacting the Chromium development team to see if they have any suggestions.

Prof. Meneely, Danielle, and I got together for about two and a half hours on Thursday and worked on the data-fetching script. We improved it to calculate the average and standard deviation of the time to download an issue, and of the number of patches per issue, since both numbers are important for estimating how long grabbing all the data will take.
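
To give an idea of what that measurement involves, here is a rough sketch along those lines (not our actual script). The codereview.chromium.org/api endpoint shape, the 'patchsets' field in the issue JSON, and the issue ids are assumptions for illustration:

```ruby
require 'net/http'
require 'json'

BASE = 'https://codereview.chromium.org/api' # assumed endpoint shape

# Fetch one issue as JSON, returning the parsed hash and the wall time
# the round trip took.
def fetch_issue(id)
  started = Time.now
  issue = JSON.parse(Net::HTTP.get(URI("#{BASE}/#{id}")))
  [issue, Time.now - started]
end

# Mean and (population) standard deviation of a list of numbers.
def stats(xs)
  mean = xs.inject(:+).to_f / xs.size
  [mean, Math.sqrt(xs.map { |x| (x - mean)**2 }.inject(:+) / xs.size)]
end

ids = [23_456_789, 23_456_790, 23_456_791] # placeholder issue ids

samples = ids.map { |id| fetch_issue(id) }
times   = samples.map { |_, secs| secs }
counts  = samples.map { |issue, _| (issue['patchsets'] || []).size } # assumed field

puts 'download time: mean %.2fs, stddev %.2fs' % stats(times)
puts 'patches/issue: mean %.2f, stddev %.2f' % stats(counts)
```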

We ran the script over a set of 1000 randomly chosen API IDs. Later, I went back on my own and fetched all the related patchsets for those issues. This sample set will be used in prototyping our database.
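
The sampling step might look roughly like the sketch below; the upper bound on issue ids and the per-patchset endpoint are guesses for illustration, and the real script stores the results rather than discarding them:

```ruby
require 'net/http'
require 'json'

BASE   = 'https://codereview.chromium.org/api' # assumed endpoint shape
MAX_ID = 30_000_000                            # assumed upper bound on issue ids

# Draw 1000 random issue ids; many will not exist, so failed lookups
# are simply skipped.
sample_ids = Array.new(1000) { rand(1..MAX_ID) }

sample_ids.each do |id|
  issue = JSON.parse(Net::HTTP.get(URI("#{BASE}/#{id}"))) rescue next

  # 'patchsets' is assumed to be a list of patchset ids in the issue JSON.
  (issue['patchsets'] || []).each do |ps_id|
    patchset = JSON.parse(Net::HTTP.get(URI("#{BASE}/#{id}/#{ps_id}")))
    # ...persist issue and patchset for the database prototype
  end
end
```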

Next Week

Next week, we will decide, based on the results of the scraping benchmarks, whether or not to contact the Chromium team. We will also look at the roles of Sheriffs and Gardeners in the Chromium community, and how they play into what data we might want to collect.
