Perception: Natural Language Processing and Computer Vision
Organisation
- Lecture (Prof. Dr. Ralf Möller)
- Seminar (Malte Luttermann, M.Sc.)
Content
- Overview & Perception Landscape: From sensory data to symbolic events
- Natural Language Processing: Language models (LDA, transformers), fine-tuning, distillation
- Visual Data Processing: Vision transformers, object recognition (YOLO), semantic segmentation
- Multimodal Fusion Paradigms: Early vs. late vs. hybrid fusion; cross‑modal attention, vision‑language pre‑training (VLP): CLIP, ALIGN, Flamingo, BLIP‑2
- Audio & Speech as Perceptual Channels: Spectrograms, wav2vec, multimodal speech‑vision models
- Predictive Perception I: Video forecasting – ConvLSTM, predictive coding, diffusion‑based video generation
- Predictive Perception II: Language generation & planning – Seq2Seq, Transformer‑XL, GPT‑style planning
- Trajectory & Intent Prediction – Social‑GAN, CVAE, graph transformers for pedestrian forecasting
- Probabilistic & Symbolic Reasoning: Training Bayesian networks, expectation maximization (EM), dynamic Bayesian networks, Markov networks (Markov random fields)
- Abduction Fundamentals: Inverse reasoning, explanation generation, event recognition via abductive multimodal fusion: From low‑level cues to high‑level narratives