Understanding Data and Machine Training
Organisation
- Lecture (Prof. Dr. Ralf Möller, Dr. Marcel Gehrke)
- Seminar (Malte Luttermann, M.Sc.)
Content
- Introduction (pptx, pdf)
- Classification vs. regression, parametric and non-parametric supervised learning, regularisation to avoid overfitting, minimum description length (pptx, pdf)
- Frequency analysis, market basket analysis, recommendations (pptx, pdf)
- Statistical fundamentals: samples, optimal estimators, distribution, density, cumulative distribution, scales of measurement (nominal, ordinal, interval, ratio), hypothesis tests, confidence intervals (pptx, pdf)
- Probabilistic fundamentals: probabilities, random variables, conditional probabilities, independence, distributions, Bayesian networks for specifying distributions by factorisation, plate notation, queries, query answering algorithms, learning methods for complete data, regularisation from a probabilistic perspective (pptx, pdf)
- Inductive learning: version space, entropy, decision trees, rule learning (pptx, pdf)
- Ensemble methods: bagging (random forests), boosting (XGBoost) (pptx, pdf)
- Clustering: k-means, k-medoids, DBSCAN, BFR, CURE, BIRCH, analysis of variance (ANOVA), t-test, linear discriminant analysis (pptx, pdf)
- Community Analysis (pptx, pdf)
Practical work
- Python programming language
- Introduction (pdf, pdf without animations)
- Basics (pdf, pdf without animations)
- Advanced (pdf, pdf without animations)
- Markup languages (LaTeX, Markdown) (pdf, pdf without animations)
- Development environments (pdf, pdf without animations)
- Version control (Git) (pdf, pdf without animations)
- Scientific computing (NumPy, SciPy) (pdf, pdf without animations)
- Data processing and visualisation (Pandas, matplotlib, NLTK) (pdf, pdf without animations)
- Machine learning with Python (scikit-learn) (pdf, pdf without animations)