**Course identification code**: 054062

**Brief syllabus**:

**FIRST PART: MODEL IDENTIFICATION**

**Models in engineering and science**

Model accuracy and complexity. Estimation from experimental observations. Models for classification, prediction, control, simulation and management. Data processing techniques.

**Stochastic dynamic models, spectral analysis and prediction**

Stochastic processes. Input/output models for time series and cause-effect relationships (continuous- and discrete-time models; AR, MA, ARMA, ARX, ARMAX and Box-Jenkins models). Correlation analysis and spectral analysis. Kolmogorov-Wiener prediction theory.

**Identification of input/output models**

The problem of model identification starting from simple experimental tests. The Prediction Error Minimization (PEM) paradigm. Least-Squares (LS) and Maximum Likelihood (ML) methods for the identification of AR, ARX, ARMA and ARMAX models. Asymptotic analysis of PEM identification. Choice of model complexity: FPE, AIC and MDL indicators. Spectrum estimation.
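As an illustration of the least-squares idea in its simplest setting, the following minimal sketch estimates the coefficient of an AR(1) model y(t) = a·y(t-1) + e(t) by minimizing the sum of squared one-step prediction errors. The function names (`simulate_ar1`, `ls_ar1`), the true value a = 0.7 and the sample size are assumptions chosen for the example, not part of the course material.

```python
import random

def simulate_ar1(a, n, noise_std=1.0, seed=0):
    """Simulate y(t) = a*y(t-1) + e(t), with e(t) Gaussian white noise."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n - 1):
        y.append(a * y[-1] + rng.gauss(0.0, noise_std))
    return y

def ls_ar1(y):
    """Least-squares estimate of a.

    Minimizing sum_t (y(t) - a*y(t-1))^2 over a gives the closed form
    a_hat = sum_t y(t)*y(t-1) / sum_t y(t-1)^2.
    """
    num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
    return num / den

# Hypothetical experiment: data generated with a = 0.7
y = simulate_ar1(a=0.7, n=5000)
a_hat = ls_ar1(y)
print(f"estimated a = {a_hat:.3f}")
```

With a long enough record the estimate should fall close to the value used to generate the data, which is the behaviour that the asymptotic analysis of PEM methods makes precise.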

**Kalman filtering and prediction**

Stochastic state-space models. Filtering, prediction and regularization. Kalman filter. Steady-state Kalman filter. Extended Kalman filter. Use of the Kalman filter for model identification.
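A minimal sketch of the Kalman filter recursion for a scalar state-space model may help fix ideas. It assumes the model x(t+1) = a·x(t) + v(t), y(t) = x(t) + w(t), with noise variances q and r. The function name `kalman_filter_scalar`, the measurement values and the default initial conditions are illustrative assumptions.

```python
def kalman_filter_scalar(ys, a, q, r, x0=0.0, p0=1.0):
    """Filter for x(t+1) = a*x(t) + v(t), y(t) = x(t) + w(t).

    q = Var[v(t)], r = Var[w(t)]; x0 and p0 are the prior mean and variance.
    Returns the filtered state estimates and the sequence of Kalman gains.
    """
    x_pred, p_pred = x0, p0
    estimates, gains = [], []
    for y in ys:
        # Measurement update: correct the prediction with the innovation
        k = p_pred / (p_pred + r)
        x_filt = x_pred + k * (y - x_pred)
        p_filt = (1.0 - k) * p_pred
        estimates.append(x_filt)
        gains.append(k)
        # Time update: one-step-ahead prediction
        x_pred = a * x_filt
        p_pred = a * a * p_filt + q
    return estimates, gains

# Hypothetical measurements of a random-walk state (a = 1)
ys = [1.2, 0.9, 1.1, 1.0] * 10
estimates, gains = kalman_filter_scalar(ys, a=1.0, q=1.0, r=1.0)
print(f"last gain = {gains[-1]:.6f}")
```

In this example the gain settles to a constant after a few steps; that limiting value is the gain of the steady-state Kalman filter for the assumed model.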

**SECOND PART: MACHINE LEARNING**

**Introduction to Machine Learning**

Motivations of machine learning. Machine learning, artificial intelligence and big data. Machine learning applications. Representation of input data. Machine learning process.

**Exploratory data analysis**

Data validation and cleansing, outlier identification and missing-value detection. Data transformation. Data reduction. Sampling. Feature selection. Feature extraction by filtering. Principal component analysis. Data discretization. Univariate analysis: graphical analysis, measures of central tendency, dispersion, relative position, heterogeneity, empirical density analysis. Bivariate analysis: graphical analysis, correlation, contingency tables. Multivariate analysis: graphical analysis, correlation indices.

**Supervised learning: classification and regression**

Taxonomy of supervised methods. Evaluation of classification models: holdout, cross-validation, confusion matrices, ROC curves, cumulative gain and lift. Treatment of categorical attributes. Nearest neighbor. Classification and regression trees: splitting, stopping and pruning. Bayesian methods: naive Bayes, Bayesian networks. Logistic regression. Neural networks: Rosenblatt perceptron, multilayer feed-forward networks. Support vector machines: structural risk minimization; maximum-margin hyperplanes for linear separation; nonlinear separation. Simple and multiple linear regression. Assumptions on the residuals. Least-squares regression: normality and independence of residuals, significance of coefficients, analysis of variance, coefficients of determination and linear correlation, multicollinearity, confidence and prediction limits. Selection of predictive variables. Ridge regression. Generalized linear regression.

**Association rules**

Motivation and evaluation of association rules. Single-dimension association rules. Apriori algorithm: generation of frequent itemsets; generation of strong rules. Other association rules.
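The two phases named above (frequent itemsets first, strong rules second) can be sketched in a few lines. This is a minimal, unoptimized illustration under stated assumptions: the function names `apriori` and `strong_rules`, the toy baskets and the thresholds are all hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    current = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    k = 1
    while current:
        # Keep the candidates of size k that reach the minimum support
        level = {c: s for c in current if (s := support(c)) >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        current = {c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent

def strong_rules(frequent, min_confidence):
    """Return (body, head, confidence) for rules with confidence >= min_confidence."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for body in combinations(sorted(itemset), r):
                body = frozenset(body)
                conf = supp / frequent[body]  # subsets of a frequent itemset are frequent
                if conf >= min_confidence:
                    rules.append((set(body), set(itemset - body), conf))
    return rules

# Hypothetical market baskets
baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
frequent = apriori(baskets, min_support=0.5)
rules = strong_rules(frequent, min_confidence=0.9)
print(rules)
```

The prune step relies on the fact that an itemset can be frequent only if all of its subsets are frequent, which is what keeps the number of candidates manageable.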

**Clustering**

Taxonomy of clustering models. Affinity measures. Partitioning methods: K-means, K-medoids. Hierarchical methods: agglomerative and divisive methods. Evaluation of clustering models.
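As an example of a partitioning method, here is a minimal sketch of K-means in its standard form (Lloyd's iteration), using squared Euclidean distance as the affinity measure. The function names, the optional `init` parameter and the toy points are assumptions made for the illustration.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, init=None, n_iter=100, seed=0):
    """Alternate assignment and centroid update until the centroids stop moving."""
    rng = random.Random(seed)
    centroids = list(init) if init is not None else rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[j].append(p)
        # Update step: each centroid becomes the mean of its cluster
        # (an empty cluster keeps its previous centroid)
        new_centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical data: two well-separated groups in the plane
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
```

The result generally depends on the initial centroids, so in practice the procedure is often repeated from several random starts and the partition with the smallest within-cluster sum of squares is kept.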

**Applications and use cases**

Relational marketing applications. Web mining. Social market analysis. Speech recognition. Text mining. Fraud and anomaly detection. Bioinformatics.