This is the main project for this class and it contributes 30% toward the course grade. In line with the theme of this course, the project should involve both data science and software engineering.
| Deliverable | Due date |
| --- | --- |
| Proposal | March 25 |
| Intermediate progress report | April 15 |
| Final report | April 29 |
You will implement a set of data science methods to solve a specific SE problem. To get started, you will need a dataset. You can choose a publicly available dataset or collect the data yourself. Data collection can be nontrivial, so if you choose to collect data yourself, please discuss it with me before you start. In the resources section below, I point to many publicly available datasets and specific problems you can address with those datasets.
Important: The effort involved must be nontrivial. You are not required to implement a novel method, although I encourage it. However, your work must involve significant effort in preprocessing the data, implementing an algorithm, or building a pipeline that combines multiple methods. It is not acceptable to take a publicly available dataset and simply run it through an off-the-shelf data science method.
Treat your proposal as a working document for your final deliverable. Ideally, the proposal should include the following details.