Data Science Methods in Software Engineering

SWEN 789-01 (Graduate Special Topics)

Pradeep K. Murukannaiah

Email: pkmvse at rit-domain
Office hours: MW 2:00–3:00 PM, and by email
Office: Golisano 70-1521




Books (Optional)

[Bird 2015]
Christian Bird, Tim Menzies and Thomas Zimmermann. The Art and Science of Analyzing Software Data. Morgan Kaufmann, 2015. RIT e-book.
[Carrington 2005]
Peter J. Carrington, John Scott, and Stanley Wasserman, eds. Models and methods in social network analysis. Vol. 28. Cambridge University Press, 2005.
[Manning 2008]
Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to information retrieval. Cambridge University Press, 2008.
[Menzies 2016]
Tim Menzies, Laurie Williams, and Thomas Zimmermann. Perspectives on Data Science for Software Engineering. Morgan Kaufmann, 2016. RIT e-book.
[Tan 2006]
Pang‐Ning Tan, Michael Steinbach, and Vipin Kumar. Introduction to Data Mining. Addison-Wesley Longman, Boston, 2006.

Research Papers (Required)

[Bhattacharya 2012]
Pamela Bhattacharya, Marios Iliofotou, Iulian Neamtiu, and Michalis Faloutsos. 2012. Graph-based analysis and prediction for software evolution. In Proceedings of the 34th International Conference on Software Engineering (ICSE '12). 419--429.
[Ciurumelea 2017]
Adelina Ciurumelea, Andreas Schaufelbühl, Sebastiano Panichella and Harald C. Gall. 2017. Analyzing reviews and code of mobile apps for better release planning. IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER '17). 91--102.
[Han 2009]
Sangmok Han, David R. Wallace, and Robert C. Miller. 2009. Code Completion from Abbreviated Input. In Proceedings of the 2009 IEEE/ACM International Conference on Automated Software Engineering (ASE '09). 332--343.
[Jiang 2008]
Zhen Ming Jiang, Ahmed E. Hassan, Gilbert Hamann and Parminder Flora. 2008. Automatic identification of load testing problems. IEEE International Conference on Software Maintenance (ICSM 2008). 307--316.
[Laurent 2007]
Paula Laurent, Jane Cleland-Huang, and Chuan Duan. 2007. Towards automated requirements triage. In Proceedings of the 15th IEEE International Requirements Engineering Conference (RE '07). 131--140.
[Lo 2009]
David Lo, Hong Cheng, Jiawei Han, Siau-Cheng Khoo, and Chengnian Sun. 2009. Classification of software behaviors for failure detection: a discriminative pattern mining approach. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '09). 557--566.
[Maalej 2015]
Walid Maalej and Hadeer Nabil. 2015. Bug report, feature request, or simply praise? on automatically classifying app reviews. In Proceedings of the IEEE 23rd International Requirements Engineering Conference (RE '15). 116--125.
[McCabe 1976]
Thomas J. McCabe. 1976. A complexity measure. IEEE Transactions on Software Engineering. 2(4):308--320.
[Meneely 2008]
Andrew Meneely, Laurie Williams, Will Snipes, and Jason Osborne. 2008. Predicting failures with developer networks and social network analysis. In Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE '08). 13--23.
[Mirarab 2007]
Siavash Mirarab, Alaa Hassouna, and Ladan Tahvildari. 2007. Using Bayesian Belief Networks to Predict Change Propagation in Software Systems. In Proceedings of the 15th IEEE International Conference on Program Comprehension (ICPC '07). 177--188.
[Pandita 2012]
Rahul Pandita, Xusheng Xiao, Hao Zhong, Tao Xie, Stephen Oney, and Amit Paradkar. 2012. Inferring method specifications from natural language API descriptions. In Proceedings of the 34th International Conference on Software Engineering (ICSE '12). 815--825.
[Schröter 2006]
Adrian Schröter, Thomas Zimmermann, and Andreas Zeller. 2006. Predicting component failures at design time. In Proceedings of the 2006 ACM/IEEE International Symposium on Empirical Software Engineering (ISESE '06). 18--27.
[Stewart 2002]
B. Stewart. 2002. Predicting project delivery rates using the Naive–Bayes classifier. Journal of Software Maintenance and Evolution: Research and Practice. 14(3):161--179.
[Thummalapenta 2009]
Suresh Thummalapenta and Tao Xie. 2009. Mining exception-handling rules as sequence association rules. In Proceedings of the 31st International Conference on Software Engineering (ICSE '09). 496--506.
[Tufano 2015]
Michele Tufano, Fabio Palomba, Gabriele Bavota, Rocco Oliveto, Massimiliano Di Penta, Andrea De Lucia, and Denys Poshyvanyk. 2015. When and why your code starts to smell bad. In Proceedings of the 37th International Conference on Software Engineering - Volume 1 (ICSE '15). 403--414.
[Zanetti 2013]
Marcelo Serrano Zanetti, Ingo Scholtes, Claudio Juan Tessone, and Frank Schweitzer. 2013. Categorizing bugs with social networks: A case study on four open source software communities. In Proceedings of the 2013 International Conference on Software Engineering (ICSE '13). 1032--1041.

Research Papers (Optional)

[Bowring 2004]
James F. Bowring, James M. Rehg, and Mary Jean Harrold. 2004. Active learning for automatic classification of software behavior. In Proceedings of the 2004 ACM SIGSOFT international symposium on Software testing and analysis (ISSTA '04). 195--205.
[Cheung 2008]
Leslie Cheung, Roshanak Roshandel, Nenad Medvidovic, and Leana Golubchik. 2008. Early prediction of software component reliability. In Proceedings of the 30th international conference on Software engineering (ICSE '08). 111--120.
[Rabiner 1990]
Lawrence R. Rabiner. 1990. A tutorial on hidden Markov models and selected applications in speech recognition. In Readings in speech recognition, Alex Waibel and Kai-Fu Lee (Eds.). Morgan Kaufmann Publishers Inc., San Francisco. 267--296.
[Shepperd 1988]
Martin Shepperd. 1988. A critique of cyclomatic complexity as a software metric. Software Engineering Journal. 3(2):30--36.
[Singh 2011]
Param Vir Singh, Yong Tan, and Nara Youn. 2011. A Hidden Markov Model of Developer Learning Dynamics in Open Source Software Projects. Information Systems Research. 22(4):790--807.
[Turhan 2009]
Burak Turhan and Ayse Bener. 2009. Analysis of Naive Bayes' assumptions on software fault data: An empirical study. Data & Knowledge Engineering. 68(2):278--290.