Overview
This course explores in depth how the Backpropagation algorithm operates in Convolutional Neural Networks (CNNs) and Transformers, covering how gradient computation adapts to convolutions, attention mechanisms, and modern deep architectures. The focus is on mathematical understanding, gradient flow through complex computational graphs, and the practical implications for efficient training, numerical stability, and performance in computer vision, natural language, and multimodal tasks.
Course Content
Module 1: Backpropagation Review and Computational Graphs
- Computational graphs in deep learning
- Automatic differentiation fundamentals
- Reverse-mode differentiation
- Gradient flow analysis
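The topics above can be made concrete with a minimal reverse-mode automatic differentiation sketch: each node stores its value and the local derivatives to its parents, and gradients are propagated in reverse topological order via the chain rule. The `Var` class and its methods are illustrative names, not from any particular framework.

```python
# Minimal reverse-mode autodiff on a scalar computational graph
# (illustrative sketch, not a real framework API).

class Var:
    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        self._parents = parents  # pairs (parent_var, local_derivative)

    def __add__(self, other):
        return Var(self.value + other.value,
                   parents=((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   parents=((self, other.value), (other, self.value)))

    def backward(self):
        # Build a topological order, then apply the chain rule in reverse.
        topo, seen = [], set()
        def build(v):
            if v not in seen:
                seen.add(v)
                for p, _ in v._parents:
                    build(p)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for node in reversed(topo):
            for parent, local in node._parents:
                parent.grad += local * node.grad

x = Var(2.0)
y = Var(3.0)
z = x * y + x        # z = xy + x, so dz/dx = y + 1 = 4, dz/dy = x = 2
z.backward()
print(x.grad, y.grad)  # 4.0 2.0
```

The same two-phase structure (forward evaluation, reverse sweep over the graph) underlies every example in the later modules.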
Module 2: Backpropagation in Convolutional Neural Networks
- Convolution operation and parameter sharing
- Gradient computation for convolutional filters
- Backpropagation through stride and padding
- Bias gradients in CNN layers
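A key identity in this module is that the filter gradient of a convolution is itself a correlation: for a 1-D "valid" cross-correlation out[i] = Σₖ w[k]·x[i+k], the gradient is dL/dw[k] = Σᵢ g[i]·x[i+k], where g = dL/dout. The sketch below (NumPy, illustrative function names) verifies this against finite differences.

```python
import numpy as np

def conv1d(x, w):
    # "valid" cross-correlation, the convention used in deep learning
    n = len(x) - len(w) + 1
    return np.array([np.dot(w, x[i:i + len(w)]) for i in range(n)])

def conv1d_filter_grad(x, g):
    # dL/dw[k] = sum_i g[i] * x[i+k]: correlate input with upstream grad
    return np.array([np.dot(g, x[k:k + len(g)])
                     for k in range(len(x) - len(g) + 1)])

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
w = rng.standard_normal(3)
g = rng.standard_normal(6)          # upstream gradient, output-shaped

analytic = conv1d_filter_grad(x, g)

# Finite-difference check of dL/dw where L = g . conv1d(x, w)
eps = 1e-6
numeric = np.array([
    (np.dot(g, conv1d(x, w + eps * np.eye(3)[k])) -
     np.dot(g, conv1d(x, w - eps * np.eye(3)[k]))) / (2 * eps)
    for k in range(3)
])
assert np.allclose(analytic, numeric, atol=1e-4)
```

Parameter sharing shows up directly here: each w[k] appears in every output position, so its gradient sums contributions over all of them.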
Module 3: Pooling and Normalization Layers
- Backpropagation through max pooling
- Average pooling gradients
- Batch Normalization forward and backward pass
- Layer and Group Normalization
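Max pooling has the simplest backward rule in this module: the gradient is routed entirely to the element that achieved the maximum in each window, and all other inputs receive zero. A 1-D sketch with non-overlapping windows (illustrative helper names):

```python
import numpy as np

def maxpool1d_forward(x, k):
    # Non-overlapping windows of size k; record each window's argmax
    # so the backward pass knows where to route the gradient.
    xw = x.reshape(-1, k)
    idx = xw.argmax(axis=1)
    return xw.max(axis=1), idx

def maxpool1d_backward(g, idx, k):
    # Only the max element in each window receives gradient.
    dx = np.zeros((len(g), k))
    dx[np.arange(len(g)), idx] = g
    return dx.ravel()

x = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 0.0])
out, idx = maxpool1d_forward(x, 2)                     # out = [3., 5., 4.]
dx = maxpool1d_backward(np.array([10., 20., 30.]), idx, 2)
# dx = [0., 10., 0., 20., 30., 0.]
```

Average pooling, by contrast, spreads the upstream gradient uniformly (g/k to each input in the window), which is why it produces smoother gradient fields than max pooling.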
Module 4: Deep CNN Architectures
- Backpropagation in deep convolutional stacks
- Residual connections and gradient flow
- Skip connections and identity mappings
- Gradient behavior in very deep CNNs
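The effect of residual connections on gradient flow can be seen with a scalar caricature: stacking L blocks multiplies the gradient by each block's local derivative, so a plain stack scales it by f′(x)ᴸ while a residual stack y = x + f(x) scales it by (1 + f′(x))ᴸ, where the identity term keeps gradients from collapsing. The numbers below are illustrative, not from any real network.

```python
# Scalar caricature of gradient flow through depth.
f_prime = 0.1          # illustrative local derivative of each block
L = 30                 # depth

plain = f_prime ** L               # plain stack: product of f'
residual = (1.0 + f_prime) ** L    # residual stack: product of (1 + f')

print(plain)     # ~1e-30: vanishes with depth
print(residual)  # ~17.4: identity path keeps the gradient alive
```

Real Jacobians are matrices, not scalars, but the same mechanism (an additive identity term in the layer Jacobian) is what makes very deep CNNs trainable.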
Module 5: Attention Mechanism Fundamentals
- Scaled dot-product attention
- Query, key and value projections
- Softmax gradients
- Attention weight backpropagation
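The softmax gradient is the workhorse of attention backpropagation: for p = softmax(z) and upstream gradient g = dL/dp, the Jacobian-vector product simplifies to dL/dz = p ⊙ (g − p·g). The sketch below checks this closed form against finite differences (NumPy, illustrative names).

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())       # shift for numerical stability
    return e / e.sum()

def softmax_backward(p, g):
    # Jacobian-vector product of softmax: dL/dz = p * (g - p.g)
    return p * (g - np.dot(p, g))

z = np.array([0.5, -1.0, 2.0])    # attention logits (illustrative)
g = np.array([1.0, 2.0, 3.0])     # upstream gradient dL/dp
p = softmax(z)
analytic = softmax_backward(p, g)

eps = 1e-6
numeric = np.array([
    (np.dot(g, softmax(z + eps * np.eye(3)[i])) -
     np.dot(g, softmax(z - eps * np.eye(3)[i]))) / (2 * eps)
    for i in range(3)
])
assert np.allclose(analytic, numeric, atol=1e-5)
```

In scaled dot-product attention the logits are z = QKᵀ/√d, so this rule is applied row by row before gradients continue into the query and key projections.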
Module 6: Backpropagation in Transformers
- Transformer encoder backpropagation
- Transformer decoder backpropagation
- Multi-head attention gradients
- Feedforward blocks and residual paths
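Putting the pieces together, the full backward pass through one (single-head) scaled dot-product attention block follows from the chain rule: dV = AᵀdOut, dA = dOut·Vᵀ, then the row-wise softmax rule gives dS, and finally dQ = dS·K/√d and dK = dSᵀ·Q/√d. A NumPy sketch with a finite-difference check (all shapes and names illustrative):

```python
import numpy as np

def softmax_rows(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def attention_forward(Q, K, V):
    d = Q.shape[1]
    A = softmax_rows(Q @ K.T / np.sqrt(d))   # attention weights
    return A @ V, A

def attention_backward(Q, K, V, A, dOut):
    d = Q.shape[1]
    dV = A.T @ dOut
    dA = dOut @ V.T
    # row-wise softmax backward: dS = A * (dA - sum(A*dA, axis=1))
    dS = A * (dA - (A * dA).sum(axis=1, keepdims=True))
    dQ = dS @ K / np.sqrt(d)
    dK = dS.T @ Q / np.sqrt(d)
    return dQ, dK, dV

rng = np.random.default_rng(1)
Q, K, V, dOut = (rng.standard_normal((4, 3)) for _ in range(4))

out, A = attention_forward(Q, K, V)
dQ, dK, dV = attention_backward(Q, K, V, A, dOut)

# Finite-difference check on one entry of Q, with L = sum(out * dOut)
eps = 1e-6
E = np.zeros_like(Q); E[0, 0] = eps
num = ((attention_forward(Q + E, K, V)[0] * dOut).sum() -
       (attention_forward(Q - E, K, V)[0] * dOut).sum()) / (2 * eps)
assert abs(num - dQ[0, 0]) < 1e-4
```

Multi-head attention repeats this per head on sliced projections and concatenates; the residual path around the block simply adds dOut to the block's input gradient.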
Module 7: Positional Encoding and Embeddings
- Gradient flow through embeddings
- Learnable vs fixed positional encodings
- Backpropagation in token representations
- Stability considerations
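The learnable-vs-fixed distinction has a concrete backward-pass consequence: fixed sinusoidal encodings have no parameters and accumulate no gradient, while a learnable embedding table receives, for each row, the sum of upstream gradients of every position that used that token id (a scatter-add). A NumPy sketch with illustrative sizes:

```python
import numpy as np

vocab, dim = 5, 4
tokens = np.array([2, 0, 2, 3])       # token ids in a sequence
g = np.ones((4, dim))                 # upstream grad, one row per position

# Embedding backward: scatter-add upstream grads into the table's grad.
dE = np.zeros((vocab, dim))
np.add.at(dE, tokens, g)              # repeated ids accumulate

# Row 2 was used twice, rows 0 and 3 once, rows 1 and 4 never.
print(dE[:, 0])   # [1. 0. 2. 1. 0.]
```

This sparsity (most rows get zero gradient each step) is also why embedding layers interact specially with optimizers that track per-parameter statistics.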
Module 8: Optimization Challenges in CNNs and Transformers
- Memory and computational complexity
- Gradient checkpointing
- Mixed precision training
- Numerical stability in large models
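One numerical-stability idea from this module, loss scaling in mixed precision, can be shown in a few lines: gradients smaller than float16's smallest subnormal underflow to zero, so the loss is multiplied by a scale S before backprop and the resulting gradients are divided by S afterwards. The magnitudes below are illustrative.

```python
import numpy as np

true_grad = 1e-8                       # below float16's subnormal range
underflowed = np.float16(true_grad)    # rounds to 0.0 in half precision

S = 2.0 ** 16                          # loss scale (illustrative choice)
scaled = np.float16(true_grad * S)     # now representable in float16
recovered = float(scaled) / S          # unscale in float32/float64

print(underflowed)   # 0.0
print(recovered)     # ~1e-8, recovered to within float16 precision
```

Dynamic loss scaling extends this by raising S when no overflow occurs and lowering it when gradients produce infs or NaNs.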
Module 9: Practical Implementation and Debugging
- Visualizing gradients in CNNs
- Debugging attention-related gradient issues
- Performance profiling
- Case studies with real-world architectures
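A simple but effective debugging habit from this module is monitoring per-layer gradient norms to catch vanishing or exploding gradients early. The sketch below uses synthetic gradients as stand-ins for what a framework hook would report; the layer names and thresholds are illustrative.

```python
import numpy as np

# Synthetic per-layer gradients (a real setup would collect these
# from backward hooks in the training framework).
grads = {
    "conv1": np.full(100, 1e-9),      # suspiciously small
    "conv2": np.full(100, 1e-2),      # healthy range
    "attn":  np.full(100, 1e3),       # suspiciously large
}

for name, g in grads.items():
    norm = np.linalg.norm(g)
    flag = (" <-- vanishing?" if norm < 1e-7
            else " <-- exploding?" if norm > 1e2
            else "")
    print(f"{name}: {norm:.3e}{flag}")
```

Logging these norms per step (or plotting them per layer depth) is often the fastest way to localize which block of a CNN or Transformer is starving or destabilizing training.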