Data science with R: Applied Predictive Modelling
Compendium for STAT623
2022-05-30 15:00:35
Course overview
Learning outcomes and objectives
The course presents various advanced methods in data science for predictive modelling and their use in R. It covers methods for regression, including non-linear regression and generalized additive models, and methods for classification, including trees, boosting and support vector machines. The course focuses on practical use in R, without going into the details of the mathematical theory behind the methods.
On completion of the course, the student should have the following learning outcomes:
Knowledge
- Knows the basic ideas underpinning various methods in data science/predictive modelling
Skills
- Can implement various models within data science/predictive modelling in R
- Can use data science methods on real data sets and make predictions
General competence
- Has an overview of how data science methods can be used to analyze larger data sets
Lecture overview
Lecture | Subject | Exercises | DataCamp |
---|---|---|---|
1 | Introduction, short recap of R and data preprocessing | Recap of R | Introduction to Regression in R (ch 1-2) |
2 | Overfitting, model tuning, model selection and evaluation, and multiple regression | Multiple regression | Supervised Learning in R: Regression (ch 1-2) |
3 | Non-linear regression | GAMs | Nonlinear Modeling in R with GAMs (ch 1-2) |
4 | Classification methods | | Supervised Learning in R: Classification (ch 1-3) |
5 | Decision trees and bagged trees | | Machine Learning with Tree-Based Models in R (ch 1-3) |
6 | Random forests and boosting | xgboost | Machine Learning with Tree-Based Models in R (ch 3-4) |
7 | Support vector machines | | Support Vector Machines in R (ch 1-2) |
8 | Neural networks | | Introduction to TensorFlow in R (ch 1-3) |
9 | Feature selection | | Supervised Learning in R: Classification (ch 3, video on automatic feature selection) |
Literature
We will draw on many different sources when teaching applied predictive modelling. Kuhn and Johnson (2016) and James et al. (2013) are the main references.
References
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning. 1st ed. Springer.
Kuhn, Max, and Kjell Johnson. 2016. Applied Predictive Modeling. 5th ed. Springer.