The course aims to develop model-building and data-analysis skills. The focus is on the use of modern "statistical learning" techniques for prediction and classification in business and economics. The course is application-oriented: theory is developed only as far as needed to understand the practical aspects of applications. As working with data is essential, lab activities complement the material discussed in class.
I am convinced that knowledge of modern statistical techniques, and especially of state-of-the-art tools for their practical implementation, is a key asset for professional success. A manager who masters sophisticated data-analysis methods has a deeper understanding of the processes going on in an organization and a clearer view of its future development.
Because data-analysis methods are learned most effectively by applying them to real-data case studies, practical laboratories are an essential ingredient of this course.
At the end of the course, students will be able to:
1. build models for analyzing a multivariate data set, testing hypotheses of interest and making predictions;
2. use the software needed for non-trivial practical implementation of the techniques.
Basic knowledge of statistics (descriptive statistics, statistical inference, simple linear regression) at the level of, e.g., Newbold, P., Carlson, W. and Thorne, B. (2010), Statistics for Business and Economics, 7th edition, Prentice Hall.
The Data Analysis and Forecasting exam is a compulsory prerequisite.
Statistical learning. Linear regression (multiple models, models with interactions, models with qualitative variables). Classification methods (logistic regression, K-nearest neighbors, linear and quadratic discriminant analysis). Linear model selection (ridge regression, lasso, subset selection methods). Non-linear models (regression splines, generalized additive models). Tree-based methods. Cluster analysis.
Classroom lectures and laboratory workshops aimed at the analysis of real datasets. The R software will be used for data analysis.
Students will also carry out group work: analysing, and possibly presenting and discussing in class, case studies based on business data.
Evaluation of learning
The final mark M is obtained as M = 0.7*F + 0.3*P, where F is the mark of the final written exam and P is the mark obtained in the group presentation. The final exam consists of open questions and/or exercises about the theory. The group presentation takes the form of a written handout on a real data analysis project carried out with the R software, possibly accompanied by a presentation in class.
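As an illustration of the weighting above, the short R sketch below computes M for one student. The function name `final_mark` and the marks F = 26 and P = 30 are hypothetical, chosen only to show the arithmetic; the marking scale itself is not assumed.

```r
# Minimal sketch of the final-mark formula M = 0.7*F + 0.3*P.
# final_mark() is a hypothetical helper, not part of any course material.
final_mark <- function(f, p, w_exam = 0.7) {
  # w_exam is the weight of the written exam; the group presentation
  # receives the remaining weight (0.3 with the default).
  stopifnot(is.numeric(f), is.numeric(p), w_exam >= 0, w_exam <= 1)
  w_exam * f + (1 - w_exam) * p
}

# Hypothetical marks: F = 26 in the written exam, P = 30 in the group presentation.
m <- final_mark(f = 26, p = 30)
print(m)  # 0.7 * 26 + 0.3 * 30 = 18.2 + 9 = 27.2
```

Because the written exam carries 70% of the weight, a one-point change in F moves M by 0.7, while a one-point change in P moves it by 0.3.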
G. James, D. Witten, T. Hastie, R. Tibshirani (2013). An Introduction to Statistical Learning with Applications in R. Springer.
J. Ledolter (2013). Data Mining and Business Analytics with R. Wiley.
Applying the methods developed in class effectively requires appropriate software. Both lectures and labs are based on R, an open-source environment for statistical computing and graphics, freely available for download at www.R-project.org.