We’ve made a lot of progress over the last few weeks. Danielle described some of our configuration problems in her last post, but since then we have sorted them out. Our switch to PostgreSQL has been successful, as has our decision to use Rails and ActiveRecord. Our progress since then has focused on a few key areas:
The scraper is finally finished! Its objective is to gather all of the publicly available code review data from the Chromium website, which we receive in the form of JSON files. We have already scraped about a thousand code reviews to use as test data.
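To give a feel for what the scraped data looks like, here is a minimal sketch of parsing one code review record in Ruby. The field names and values below are purely illustrative, not the actual schema the Chromium review site returns.

```ruby
require 'json'

# A made-up example of one scraped code review record; the real
# fields from the Chromium review site may differ.
raw = <<~JSON
  {
    "issue": 12345,
    "owner": "example@chromium.org",
    "created": "2012-08-14 22:10:45",
    "messages": [
      { "sender": "reviewer@chromium.org", "approval": true }
    ]
  }
JSON

review = JSON.parse(raw)
review["issue"]             # the review's numeric id
review["messages"].length   # number of review messages attached
```

Once a review is parsed into a plain Ruby hash like this, the loaders described below can map its fields onto our database schema.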
One of our main goals these last few weeks has been creating “loaders”, which take the JSON data collected by the scraper and process it into our database schema. So far we have loaders for:
Data integrity testing is important, especially because we decided not to use foreign keys in our tables, so we need to be sure that all of our data is consistent. One example test checks the consistency of CVE numbers: any CVE number recorded in the code review table should also be recorded in the CVE table. The integrity tests verify this for every case and report back any abnormalities they catch in the data.
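The CVE check above can be sketched in a few lines of Ruby. This uses plain arrays in place of our real database tables, and the CVE numbers shown are illustrative examples, not results from our data.

```ruby
# CVE numbers referenced from the code review table, and the
# contents of the CVE table itself (both illustrative).
code_review_cves = ["CVE-2012-0001", "CVE-2012-0002", "CVE-2012-0003"]
cve_table        = ["CVE-2012-0001", "CVE-2012-0002"]

# Every CVE referenced by a code review must exist in the CVE table;
# anything left over is an integrity violation to report.
missing = code_review_cves - cve_table

unless missing.empty?
  puts "CVEs referenced but not recorded: #{missing.inspect}"
end
```

In the real test the two arrays would be pulled from the database (for example with ActiveRecord `pluck` queries), but the set-difference logic is the same.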
This week we started a Goal/Question/Metric spreadsheet. You can read more about this strategy here, but basically it’s a way to take the main questions of what you’re researching or developing and break them down into specific metrics that you will need to attain in order to meet your goals. It’s a great way to come up with solid facts that back up the point of your paper. This has been working really well for us so far, and we will continue to update this document throughout the project.