Course identification code: 054062
FIRST PART: MODEL IDENTIFICATION
Models in engineering and science
Model accuracy and complexity. Estimation from experimental observations. Models for classification, prediction, control, simulation and management. Data processing techniques.
Stochastic dynamic models, spectral analysis and prediction
Stochastic processes. Input/output models for time series and cause / effect relationships (continuous and discrete time models, AR, MA, ARMA, ARX, ARMAX, Box-Jenkins models). Correlation analysis and spectral analysis. Kolmogorov-Wiener prediction theory.
Identification of input/output models
The problem of model identification starting from simple experimental tests. The Prediction Error Minimization (PEM) paradigm. The Least-Squares (LS) and Maximum Likelihood (ML) methods for the identification of AR, ARX, ARMA and ARMAX models. Asymptotic analysis of PEM identification. Choice of model complexity: FPE, AIC, MDL indicators. Spectral estimation.
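As an illustration of the LS approach to ARX identification, the following is a minimal sketch, not taken from the course material. It assumes a first-order model y(t) = a*y(t-1) + b*u(t-1) + e(t) with a white-noise input; the function names `simulate_arx` and `ls_arx` and all numerical values are hypothetical choices made for the example.

```python
import random

def simulate_arx(a, b, n, noise_std=0.1, seed=0):
    """Simulate y(t) = a*y(t-1) + b*u(t-1) + e(t) with a white-noise input u."""
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [0.0] * n
    for t in range(1, n):
        y[t] = a * y[t - 1] + b * u[t - 1] + rng.gauss(0.0, noise_std)
    return u, y

def ls_arx(u, y):
    """Least-squares estimate of (a, b) obtained by solving the 2x2 normal equations."""
    # Regressor phi(t) = [y(t-1), u(t-1)]; accumulate sum(phi*phi^T) and sum(phi*y).
    s_yy = s_yu = s_uu = r_y = r_u = 0.0
    for t in range(1, len(y)):
        p1, p2 = y[t - 1], u[t - 1]
        s_yy += p1 * p1
        s_yu += p1 * p2
        s_uu += p2 * p2
        r_y += p1 * y[t]
        r_u += p2 * y[t]
    # Cramer's rule on [[s_yy, s_yu], [s_yu, s_uu]] [a, b]^T = [r_y, r_u]^T
    det = s_yy * s_uu - s_yu * s_yu
    a_hat = (r_y * s_uu - s_yu * r_u) / det
    b_hat = (s_yy * r_u - s_yu * r_y) / det
    return a_hat, b_hat

# Assumed "true" parameters, used only to generate synthetic data.
u, y = simulate_arx(a=0.7, b=0.5, n=2000)
a_hat, b_hat = ls_arx(u, y)
print(f"a_hat={a_hat:.3f}, b_hat={b_hat:.3f}")
```

For ARX models the one-step prediction error is linear in the parameters, so minimizing its sum of squares reduces to the normal equations solved above. ARMA and ARMAX models do not have this property and generally require iterative minimization.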
Kalman filtering and prediction
Stochastic state-space models. Filtering, prediction and smoothing. Kalman filter. Steady-state Kalman filter. Extended Kalman filter. Use of the Kalman filter for model identification.
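The filtering and prediction steps can be sketched for the simplest case, a scalar state-space model. This is a minimal example under stated assumptions: the model is x(t+1) = a*x(t) + v(t), y(t) = x(t) + w(t), with known noise variances q and r; the function name `kalman_1d` and all numerical values are hypothetical.

```python
import random

def kalman_1d(ys, a, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter for x(t+1) = a*x(t) + v(t), y(t) = x(t) + w(t).

    q and r are the variances of v and w; x0 and p0 are the initial
    one-step prediction and its error variance.
    """
    x_pred, p_pred = x0, p0
    estimates, gains = [], []
    for y in ys:
        # Measurement update (filtering)
        k = p_pred / (p_pred + r)
        x_filt = x_pred + k * (y - x_pred)
        p_filt = (1.0 - k) * p_pred
        estimates.append(x_filt)
        gains.append(k)
        # Time update (one-step prediction)
        x_pred = a * x_filt
        p_pred = a * a * p_filt + q
    return estimates, gains

# Synthetic data generated from the assumed model.
rng = random.Random(1)
a, q, r, n = 0.95, 0.01, 1.0, 500
xs, ys, x = [], [], 0.0
for _ in range(n):
    xs.append(x)
    ys.append(x + rng.gauss(0.0, r ** 0.5))
    x = a * x + rng.gauss(0.0, q ** 0.5)

estimates, gains = kalman_1d(ys, a, q, r)
mse_meas = sum((yv - xv) ** 2 for yv, xv in zip(ys, xs)) / n
mse_filt = sum((ev - xv) ** 2 for ev, xv in zip(estimates, xs)) / n
print(f"raw measurement MSE={mse_meas:.3f}, filtered MSE={mse_filt:.3f}")
```

In this time-invariant setting the gain sequence settles to a constant value, which is the gain a steady-state Kalman filter would use from the start.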
SECOND PART: MACHINE LEARNING
Introduction to Machine Learning
Motivations of machine learning. Machine learning, artificial intelligence and big data. Machine learning applications. Representation of input data. Machine learning process.
Exploratory data analysis
Data validation and cleansing: outlier identification and missing-value detection. Data transformation. Data reduction. Sampling. Feature selection. Feature extraction by filtering. Principal component analysis. Data discretization. Univariate analysis: graphical analysis, measures of central tendency, dispersion, relative position, heterogeneity, empirical density analysis. Bivariate analysis: graphical analysis, correlation, contingency tables. Multivariate analysis: graphical analysis, correlation indices.
Supervised learning: classification and regression
Taxonomy of supervised methods. Evaluation of classification models: holdout, cross-validation, confusion matrices, ROC curves, cumulative gain and lift. Treatment of categorical attributes. Nearest neighbor. Classification and regression trees: splitting, stopping and pruning. Bayesian methods: naive Bayes, Bayesian networks. Logistic regression. Neural networks: Rosenblatt perceptron, multilayer feed-forward networks. Support vector machines: structural risk minimization; maximum-margin hyperplanes for linear separation; nonlinear separation. Simple and multiple linear regression. Assumptions on the residuals. Least-squares regression: normality and independence of residuals, significance of coefficients, analysis of variance, coefficients of determination and linear correlation, multicollinearity, confidence and prediction intervals. Selection of predictive variables. Ridge regression. Generalized linear regression.
Association rules
Motivation and evaluation of association rules. Single-dimension association rules. Apriori algorithm: generation of frequent itemsets; generation of strong rules. Other association rules.
Clustering
Taxonomy of clustering models. Similarity measures. Partitioning methods: K-means, K-medoids. Hierarchical methods: agglomerative and divisive methods. Evaluation of clustering models.
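As an example of a partitioning method, K-means can be sketched as an alternation between an assignment step and a centroid update step (Lloyd's algorithm). The code below is a minimal illustration, not a reference implementation: the function name `kmeans`, the random initialization from the data points, and the toy data set are all assumptions made for the example.

```python
import math
import random

def kmeans(points, k, n_iter=100, seed=0):
    """Plain K-means (Lloyd's algorithm) on a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid becomes the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return centroids, labels

# Toy data: two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
centroids, labels = kmeans(points, k=2)
print(sorted(centroids), labels)
```

K-medoids follows the same alternation but restricts each cluster representative to be one of the observed points, which makes it usable with arbitrary dissimilarity measures.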
Applications and use cases
Relational marketing applications. Web mining. Social market analysis. Speech recognition. Text mining. Fraud and anomaly detection. Bioinformatics.