Course code: 052487
Introduction to Machine Learning
Motivations for machine learning. Machine learning, artificial intelligence and big data. Applications of machine learning. Representation of input data. The machine learning process.
Exploratory data analysis
Data validation and cleansing; detection of outliers and missing values. Data transformation. Data reduction. Sampling. Feature selection. Feature extraction by filtering. Principal component analysis. Data discretization. Univariate analysis: graphical analysis; measures of central tendency, dispersion, relative location and heterogeneity; analysis of the empirical density. Bivariate analysis: graphical analysis, measures of correlation, contingency tables. Multivariate analysis: graphical analysis, measures of correlation.
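The univariate measures of central tendency and dispersion, together with outlier detection, can be sketched with Python's standard `statistics` module. This is an illustrative sketch only. The sample values are made up, and the 1.5 × IQR rule (Tukey fences) is one common convention for flagging outliers, assumed here rather than taken from the course material.

```python
import statistics

def describe(values):
    """Univariate summary: central tendency, dispersion and Tukey-fence outliers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles (default 'exclusive' method)
    iqr = q3 - q1                                   # interquartile range
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr      # assumed 1.5 * IQR fences
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),          # sample standard deviation
        "iqr": iqr,
        "outliers": [v for v in values if v < low or v > high],
    }

# Made-up sample containing one suspicious value
data = [12, 15, 14, 10, 13, 95, 11, 14]
summary = describe(data)
print(summary["mean"], summary["median"], summary["outliers"])
```

The single extreme value pulls the mean well above the median. For this reason, robust measures such as the median and the IQR are often preferred when outliers may be present.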
Supervised learning: classification and regression
Taxonomy of supervised methods. Evaluation of classification models: holdout, cross-validation, confusion matrix and derived metrics, ROC curve, cumulative gain and lift. Treatment of categorical attributes. Nearest neighbor. Classification and regression trees: splitting, stopping and pruning. Bayesian methods: naive Bayes, Bayesian networks. Logistic regression. Neural networks: Rosenblatt perceptron, multilayer feed-forward networks. Support vector machines: structural risk minimization, maximal margin hyperplane for linear separation, nonlinear separation. Simple and multiple linear regression. Assumptions on residuals. Least squares regression: normality and independence of residuals, significance of coefficients, analysis of variance, coefficients of determination and linear correlation, multicollinearity, confidence and prediction limits. Selection of predictive variables. Ridge regression. Generalized linear regression.
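Several of the metrics derived from a binary confusion matrix can be shown in a short pure-Python sketch. The function names and the labels below are hypothetical and made up for illustration. Undefined ratios are reported as 0.0, which is an assumed convention. In practice, scikit-learn provides `sklearn.metrics.confusion_matrix` and related scoring functions for the same purpose.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def derived_metrics(tp, fp, fn, tn):
    """Metrics commonly derived from a 2x2 confusion matrix (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # sensitivity, true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0     # true negative rate
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }

# Made-up true and predicted labels for a holdout test set
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 1, 1, 0]
counts = confusion_counts(y_true, y_pred)
scores = derived_metrics(*counts)
print(counts, scores)
```

Recall (true positive rate) and 1 − specificity (false positive rate) are the coordinates of one point on the ROC curve. Sweeping the classifier's decision threshold traces out the whole curve.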
Association rules
Motivation and evaluation of association rules. Single-dimensional association rules. The Apriori algorithm: generation of frequent itemsets, generation of strong rules. General association rules.
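The two phases of Apriori, frequent itemset generation followed by strong rule generation, can be sketched in plain Python. This is a minimal, unoptimized version that rescans the transactions for every candidate. The function names, the baskets and the thresholds are hypothetical and chosen for illustration. Support is the fraction of transactions that contain an itemset. The confidence of X → Y is count(X ∪ Y) / count(X).

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise search; returns {frozenset itemset: support count} of frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def count(itemset):
        return sum(1 for t in transactions if itemset <= t)

    frequent = {}
    level = {frozenset([item]) for t in transactions for item in t}  # candidate 1-itemsets
    k = 1
    while level:
        # Keep candidates whose relative support reaches the threshold
        current = {}
        for candidate in level:
            support_count = count(candidate)
            if support_count / n >= min_support:
                current[candidate] = support_count
        frequent.update(current)
        k += 1
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets
        joined = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must itself be frequent
        level = {c for c in joined
                 if all(frozenset(s) in current for s in combinations(c, k - 1))}
    return frequent

def strong_rules(frequent, min_confidence):
    """Rules X -> Y built from each frequent itemset, kept if confidence >= threshold."""
    rules = []
    for itemset, itemset_count in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                # Subsets of a frequent itemset are frequent, so the lookup succeeds
                confidence = itemset_count / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules

# Small made-up market basket
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent = apriori(baskets, min_support=0.6)
for x, y, conf in strong_rules(frequent, min_confidence=0.7):
    print(sorted(x), "->", sorted(y), round(conf, 2))
```

The prune step is where Apriori saves work. In this example, the candidate {bread, diapers, beer} is discarded without being counted because its subset {bread, beer} is not frequent.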
Clustering
Taxonomy of clustering methods. Affinity measures. Partitioning methods: K-means, K-medoids. Hierarchical methods: agglomerative and divisive methods. Evaluation of clustering models.
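K-means as a partitioning method can be sketched with Lloyd's iteration, which alternates an assignment step and a centroid update step. The sketch also computes the within-cluster sum of squared errors (SSE), one simple internal measure for evaluating a clustering. The points and the hand-picked initial centroids are made up. Fixed initialization is an assumption made here to keep the example deterministic, since practical implementations usually use random or k-means++ initialization. scikit-learn's `sklearn.cluster.KMeans` is the library counterpart.

```python
import math

def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's K-means: alternate assignment and update steps until centroids stop moving."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid (Euclidean distance) for each point
        labels = [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its assigned points
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(old)  # leave an empty cluster's centroid unchanged
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids

def within_cluster_sse(points, labels, centroids):
    """Sum of squared distances to the assigned centroid (lower means more compact clusters)."""
    return sum(math.dist(p, centroids[j]) ** 2 for p, j in zip(points, labels))

# Made-up 2-D points forming two visible groups; initial centroids picked by hand
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, initial_centroids=[points[0], points[3]])
print(labels, centroids, within_cluster_sse(points, labels, centroids))
```

K-medoids follows the same alternating scheme. The difference is that each cluster representative must be one of the data points instead of a mean, which makes the method less sensitive to outliers.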
Applications and use cases
Introduction to the Python programming language and its main machine learning libraries (scikit-learn, Keras). Applications in relational marketing using Python: lifetime value analysis, acquisition, retention, cross-selling, market basket analysis. Web mining. Social market analysis. Speech recognition. Text mining. Fraud and anomaly detection. Bioinformatics.