Data Science Methods in Software Engineering

SWEN 789-01 (Graduate Special Topics)

Pradeep K. Murukannaiah

Email: pkmvse at rit-domain
Office hours: MW 2:00–3:00 PM, and by email
Office: Golisano 70-1521


Semester Project

This is the main project for this class; it counts for 30% of the course grade. In keeping with the theme of this course, the project should combine data science and software engineering.

Proposal	March 25
Intermediate progress report	April 15
Final report	April 29

Implementation Project (Solo or Group of Two)

You will implement a set of data science methods to solve a specific software engineering (SE) problem. You will need a dataset to get started. You can choose a publicly available dataset or collect the data yourself. Data collection can be nontrivial, so if you plan to collect data yourself, please discuss it with me before you start. The resources section below points to several collections of publicly available datasets, along with specific problems you can address with them.

Important: The effort involved must be nontrivial. You are not required to implement a novel method, although I encourage it. However, your work must involve significant effort in preprocessing the data, implementing an algorithm, or building a pipeline that combines multiple methods. It is not acceptable to take a publicly available dataset and simply run it through an off-the-shelf data science method.

Software

You must implement the project in Java or Python.
You can build on existing tools and libraries.

Proposal

Treat your proposal as a working document for your final deliverable. Ideally, the proposal should include the following details.

A working title (required)
Team members (required)
Problem description (required)
Motivation
Datasets to be used (required)
Tools and techniques to be used
Relevant prior literature
Preliminary results

Deliverables

A project report in PDF format describing the problem your implementation addresses and the techniques it includes.
The complete source code of your project.
A README file providing instructions on how to run your project.

Resources

MSR 2019 Mining Challenge (unfortunately, the submission deadline has passed; also look at the Mining Challenges from previous years)
Open Research Datasets in SE
The Promise repository
A curated repository of MSR datasets