This is the main project for this class and it contributes 30% toward the course grade. In line with the theme of this course, the project should involve both data science and software engineering.
You can choose to do a literature survey or an implementation.
|Milestone|Due date|
|---|---|
|Identify project type and teammate|March 20|
|Intermediate progress report|April 10|
|Final report|May 1|
A literature survey organizes existing knowledge on a specific topic. The survey should not merely summarize papers; instead, you must spend significant effort organizing them. You should also offer your own perspective on the topic, going beyond what the existing works provide. Choose a topic on which you can find at least 10 published papers.
I suggest that you stick to one of the following themes:
Important: Ideally, you should study a topic for which no recent survey exists. This gives you an opportunity to publish your work in a peer-reviewed conference or journal in the future.
For an implementation project, you will implement a set of data science methods to solve a specific software engineering (SE) problem. You will need a dataset to get started. You can choose a publicly available dataset or collect the data yourself. Because data collection can be nontrivial, please discuss your plans with me before you start if you choose to collect data yourself. In the resources section below, I point to many publicly available datasets and specific problems you can address with them.
Important: The effort involved must be nontrivial. You are not required to implement a novel method, although I encourage it. However, your work must involve significant effort in preprocessing the data, implementing an algorithm, or building a pipeline that combines multiple methods. It is not acceptable to take a publicly available dataset and simply run it through an off-the-shelf data science method.