Overview
This course covers the main quantization techniques used in Large Language Models (LLMs), deep learning models, and generative AI applications. Participants will learn how to reduce memory consumption, increase inference speed, and lower compute costs by using lower-precision numerical representations. The course explores methods such as Post-Training Quantization (PTQ), Quantization-Aware Training (QAT), GPTQ, and AWQ, tooling such as the GGUF format and the bitsandbytes library, and other modern approaches used in enterprise and production environments.
Course Content
Module 1: Introduction to Model Quantization
- Fundamentals of model optimization
- Challenges of large-scale AI models
- Quantization concepts and objectives
- Benefits and trade-offs
- Enterprise use cases
- Quantization ecosystem overview
Module 2: Numerical Representations and Precision
- Floating-point representations
- FP32, FP16 and BF16 formats
- Integer representations
- Precision and accuracy concepts
- Numerical stability considerations
- Hardware implications
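As a rough illustration of why the formats in this module matter, the sketch below estimates weight storage for a model at different precisions. The 7-billion-parameter count is a hypothetical example, the function name is illustrative, and the estimate ignores activations, KV cache, and per-group quantization metadata.

```python
# Bytes needed to store one parameter in each format.
# INT4 is counted as half a byte because two 4-bit values pack into one byte.
BYTES_PER_PARAM = {"FP32": 4.0, "FP16": 2.0, "BF16": 2.0, "INT8": 1.0, "INT4": 0.5}


def weight_memory_gib(num_params: int, fmt: str) -> float:
    """Approximate weight storage in GiB (weights only, no metadata)."""
    return num_params * BYTES_PER_PARAM[fmt] / 2**30


if __name__ == "__main__":
    num_params = 7_000_000_000  # hypothetical model size, for illustration only
    for fmt in BYTES_PER_PARAM:
        print(f"{fmt}: {weight_memory_gib(num_params, fmt):.1f} GiB")
```

Real quantized checkpoints are usually somewhat larger than this estimate, since scales and zero-points are stored alongside the weights.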
Module 3: Foundations of Quantization
- Quantization theory
- Scale and zero-point concepts
- Uniform and non-uniform quantization
- Static and dynamic quantization
- Error analysis techniques
- Quantization performance metrics
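The scale and zero-point concepts in this module can be sketched with a minimal uniform (affine) quantizer. This is an illustrative pure-Python version assuming unsigned 8-bit integers and a known value range; the function names are hypothetical and do not come from any particular library.

```python
def quant_params(x_min: float, x_max: float, num_bits: int = 8):
    """Derive scale and zero-point mapping [x_min, x_max] onto [0, 2^bits - 1]."""
    qmin, qmax = 0, 2**num_bits - 1
    # Widen the range to include 0.0 so that zero is exactly representable.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    if scale == 0.0:  # degenerate range: every value is zero
        scale = 1.0
    zero_point = round(qmin - x_min / scale)
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(values, scale, zero_point, num_bits: int = 8):
    """Map floats to integers: q = clamp(round(x / scale) + zero_point)."""
    qmin, qmax = 0, 2**num_bits - 1
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point):
    """Map integers back to approximate floats: x ~ (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in q_values]


if __name__ == "__main__":
    data = [-1.0, 0.0, 0.5, 3.0]
    scale, zp = quant_params(min(data), max(data))
    q = quantize(data, scale, zp)
    restored = dequantize(q, scale, zp)
    # The per-value difference is the quantization error studied in error analysis.
    for x, xr in zip(data, restored):
        print(f"{x:+.4f} -> {xr:+.4f} (error {abs(x - xr):.5f})")
```

For values inside the calibrated range, the round-trip error stays within half a quantization step (scale / 2), which is one simple way to reason about the error introduced by uniform quantization.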
Module 4: Post-Training Quantization (PTQ)
- PTQ fundamentals
- Calibration datasets
- Weight quantization
- Activation quantization
- Accuracy preservation strategies
- PTQ implementation workflows
Module 5: Quantization-Aware Training (QAT)
- QAT architecture
- Simulated quantization during training
- Training optimization strategies
- Accuracy improvement techniques
- Fine-tuning quantized models
- QAT implementation practices
Module 6: Quantization for Large Language Models
- LLM-specific quantization challenges
- Memory optimization strategies
- Quantization of transformer architectures
- Attention layer considerations
- Inference optimization
- Enterprise deployment scenarios
Module 7: Modern Quantization Methods
- GPTQ fundamentals
- AWQ concepts
- SmoothQuant techniques
- Activation-aware quantization
- Advanced quantization approaches
- Comparative analysis of methods
Module 8: Low-Bit Quantization Techniques
- INT8 quantization
- INT4 quantization
- 8-bit and 4-bit inference
- Mixed-precision techniques
- Extreme quantization approaches
- Performance trade-offs
Module 9: Quantization Tooling and Frameworks
- bitsandbytes overview
- GGUF format fundamentals
- Quantization libraries
- Model conversion workflows
- Open-source tooling ecosystem
- Integration best practices
Module 10: Deployment and Performance Optimization
- Quantized model serving
- CPU and GPU optimization
- Edge AI deployment
- Throughput and latency tuning
- Cost optimization strategies
- Production readiness validation
Module 11: Governance, Security and Operational Considerations
- AI governance requirements
- Validation and quality controls
- Model lifecycle management
- Monitoring quantized models
- Security considerations
- Responsible AI practices
Module 12: Quantization Workshop
- PTQ implementation exercises
- QAT laboratory
- GPTQ and AWQ experimentation
- LLM quantization projects
- Performance benchmarking activities
- Final enterprise quantization optimization project