Overview
This course, Numerical Optimization and Stability in Large-Scale Models, covers in depth the fundamentals and advanced techniques of numerical optimization and computational stability as applied to large-scale models such as deep neural networks, Transformers, and foundation models. It focuses on the mathematics behind optimization methods, the numerical behavior of algorithms at scale, and the strategies used to keep training stable, efficient, and reliable in high-dimensional settings with large volumes of data.
Course Content
Module 1: Foundations of Numerical Optimization
- Optimization problem formulation
- Continuous optimization basics
- High-dimensional optimization challenges
- Loss landscape intuition
Module 2: First-Order Optimization Methods
- Gradient descent revisited
- Stochastic gradient descent
- Momentum methods
- Convergence analysis
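As an illustration of the momentum methods listed in Module 2, here is a minimal sketch of heavy-ball momentum on a small ill-conditioned quadratic. The function names, the test problem, and the hyperparameters (lr=0.02, beta=0.9) are assumptions chosen for the example, not course material.

```python
def gradient_descent_momentum(grad, x0, lr=0.1, beta=0.9, steps=200):
    """Heavy-ball momentum: v <- beta * v + grad(x); x <- x - lr * v."""
    x = list(x0)
    v = [0.0] * len(x)
    for _ in range(steps):
        g = grad(x)
        v = [beta * vi + gi for vi, gi in zip(v, g)]
        x = [xi - lr * vi for xi, vi in zip(x, v)]
    return x


def quad_grad(x):
    """Gradient of the illustrative objective f(x) = 0.5 * (x0^2 + 25 * x1^2)."""
    return [x[0], 25.0 * x[1]]


# The minimizer is the origin; curvature differs by a factor of 25 across axes.
x_min = gradient_descent_momentum(quad_grad, [5.0, 5.0], lr=0.02, beta=0.9, steps=500)
```

Setting beta=0.0 reduces the loop to plain gradient descent, which makes it easy to compare how quickly each variant approaches the origin on the low-curvature axis.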
Module 3: Second-Order and Quasi-Newton Methods
- Hessian matrix interpretation
- Newton’s method
- Quasi-Newton methods
- Practical limitations in large-scale models
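To make the Newton's method topic in Module 3 concrete, the following is a minimal one-dimensional sketch. The objective f(x) = x^4/4 - x and the helper name newton_1d are assumptions made for illustration. In n dimensions the division below becomes a linear solve against an n-by-n Hessian, which is where the memory and compute limitations for large models come from.

```python
def newton_1d(grad, hess, x0, steps=20, tol=1e-12):
    """Newton iteration x <- x - f'(x) / f''(x) for a scalar objective."""
    x = x0
    for _ in range(steps):
        step = grad(x) / hess(x)
        x -= step
        if abs(step) < tol:  # stop once the update is negligible
            break
    return x


# Illustrative objective: f(x) = x^4 / 4 - x, so f'(x) = x^3 - 1 and
# f''(x) = 3 x^2. The minimizer is x = 1.
x_star = newton_1d(lambda x: x ** 3 - 1.0, lambda x: 3.0 * x ** 2, x0=2.0)
```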
Module 4: Optimization in Large-Scale Deep Learning
- Optimization dynamics in deep networks
- Scale effects on convergence
- Mini-batch size and stability
- Sharp vs flat minima
Module 5: Adaptive Optimization Algorithms
- AdaGrad
- RMSProp
- Adam and variants
- Stability trade-offs in adaptive methods
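The Adam update named in Module 5 can be sketched for a single scalar parameter as follows. The defaults shown (beta1=0.9, beta2=0.999, eps=1e-8) are the commonly cited ones, while the toy objective f(x) = x^2 and the learning rate of 0.01 are assumptions for this example.

```python
import math


def adam_step(x, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter x with gradient g at step t >= 1."""
    m = beta1 * m + (1 - beta1) * g        # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * g * g    # second-moment estimate
    m_hat = m / (1 - beta1 ** t)           # bias correction
    v_hat = v / (1 - beta2 ** t)
    x = x - lr * m_hat / (math.sqrt(v_hat) + eps)
    return x, m, v


# Toy run on f(x) = x^2, whose gradient is 2x.
x, m, v = 1.0, 0.0, 0.0
for t in range(1, 1001):
    x, m, v = adam_step(x, 2.0 * x, m, v, t, lr=0.01)
```

With bias correction, the first update has magnitude close to lr regardless of the gradient's scale. This is part of what makes the method robust to gradient scaling, and it is also why eps and the second-moment estimate matter for stability when gradients become very small.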
Module 6: Numerical Precision and Stability
- Floating point arithmetic
- Rounding errors and accumulation
- Mixed precision training
- Loss scaling techniques
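Two of the Module 6 topics, rounding-error accumulation and loss scaling, can be illustrated with the standard library alone. The sketch below uses the struct module's "e" format to round values to IEEE 754 half precision. The gradient value 1e-8 and the static scale of 1024 are assumptions chosen so that the effect is visible.

```python
import math
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


# Rounding errors accumulate in naive summation; math.fsum compensates for them.
naive_total = sum([0.1] * 10)
exact_total = math.fsum([0.1] * 10)

# Loss scaling: a tiny gradient underflows to zero in fp16 unless it is
# scaled up before the cast and unscaled afterwards in higher precision.
grad = 1e-8                      # illustrative small gradient
scale = 1024.0                   # hypothetical static loss scale
unscaled = to_fp16(grad)         # flushed to zero
scaled = to_fp16(grad * scale)   # survives as a subnormal fp16 value
recovered = scaled / scale       # unscaled in double precision
```

Here unscaled is exactly zero, while recovered stays within about one percent of the original gradient.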
Module 7: Gradient Stability Techniques
- Vanishing and exploding gradients
- Gradient clipping
- Normalization effects on gradients
- Regularization and stability
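Gradient clipping, listed in Module 7, is often applied to the global norm of all gradients at once. The following is a minimal sketch over a flat list of floats; the function name clip_by_global_norm and the threshold are assumptions for illustration, not a specific framework's API.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale grads so their L2 norm is at most max_norm.

    Returns the (possibly rescaled) gradients and the norm before clipping.
    """
    total = math.sqrt(sum(g * g for g in grads))
    if total <= max_norm or total == 0.0:
        return list(grads), total
    factor = max_norm / total
    return [g * factor for g in grads], total


# A gradient of norm 5 is rescaled to norm 1; its direction is unchanged.
clipped, norm_before = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
```

Logging the pre-clipping norm over time is a common way to spot exploding gradients before they destabilize a run.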
Module 8: Large-Scale Training Strategies
- Distributed optimization challenges
- Synchronous vs asynchronous training
- Gradient aggregation and communication
- Checkpointing and fault tolerance
Module 9: Practical Diagnostics and Case Studies
- Monitoring optimization metrics
- Debugging unstable training runs
- Performance profiling
- Case studies with large-scale models