DataOps for IT Professionals Course
16 hours
Overview
This DataOps for IT Professionals training course teaches participants how to raise the quality of their data, increasing the effectiveness of the analytical work that supports organizational decisions. Participants learn how to incorporate practical plans and technical assistance throughout the entire data lifecycle, including data acquisition, storage, processing, and consumption.
Objectives
After completing this DataOps for IT Professionals course, you will be able to:
- Understand the challenges of enterprise data processing and IT systems
- Correct "bad" input data
- Perform data cleansing
- Handle missing and duplicate data
- Enforce data consistency
- Implement data governance
Prerequisites
- Hands-on experience with data processing.
Course Outline
DataOps Introduction
- DataOps Enterprise Data Technologies
- Enterprise Data Processing Challenges and IT Systems' Woes:
- Data Quality
- What Makes Information Systems Cluttered and Myopic
- Fragmented Data Sources
- Different Data Formats
- System Interoperability
- Maintenance Issues
- Data-Related Roles
- Data Engineering
- What is DataOps?
- The DataOps Technology and Methodology Stack
- The DataOps Manifesto
- Agile Development
- DevOps
- The Lean Manufacturing Methodology
- Key Components of a DataOps Platform
- Overview of DataOps Tools and Services
- Overview of DataOps Platforms
Data Quality
- Data Quality Definitions
- Dimensions of Data Quality
- Defining "Bad" Data
- Missing Data
- Wrong/Incorrect Data or Data Format
- Inconsistent Data
- Outdated (Stale) Information
- Unverifiable Data
- Withheld Data
- Common Causes for "Bad" Data
- Human Factor
- Infrastructure- and Network-Related Issues
- Software Defects
- Using the Wrong Tool for the Job
- Using Untrusted Data
- Aggregation of Data from Disparate Data Sources that have Impedance Mismatch
- Wrong QoS Settings of Queueing Systems
- Wrong Caching System Settings, e.g. TTL
- Not Using the "Ground Truth" Data
- Differently Configured Development/UAT/Production Systems
- Confusing Big-Endian and Little-Endian Byte Order
- Ensuring Data Quality
- Ensuring Integrity of Datasets
- Dealing with "Bad" Input Data
- DDL-Enforced Schema vs. Schema-on-Read (Schema-on-Demand)
- SQL Constraints as Rules for Column-Level and Table-Wide Data
- XML Schema Definition (XSD) for XML Documents
- Validating JSON Documents
- Regular Expressions
- Data Cleansing of Data at Rest
- Controlling Integrity of Data-in-Transit
- Database Normalization
- Using Assertions in Applications
- Operationalizing Input Data Validation
- Data Consistency and Availability
- Dealing with Duplicate Data
- Dealing with Missing (NaN) Data
- Master (Authoritative) Data Management
- Enforcing Data Consistency with the scikit-learn LabelEncoder Class
- Data Provenance
- The Event Sourcing Pattern
- Adopting the Culture of Automation
- On-going Auditing
- Monitoring and Alerting
- UiPath
- Workflow (Pipeline) Orchestration Systems
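The "Dealing with 'Bad' Input Data" topics above include regular expressions as a validation tool. A minimal sketch of operationalized input validation using only the Python standard library; the record fields (`email`, `signup_date`) and patterns are illustrative assumptions, not part of the course material:

```python
import re

# Illustrative validation patterns (assumed field formats)
EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$")
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")  # ISO 8601 date

def validate_record(record: dict) -> list:
    """Return a list of validation errors; an empty list means the record passed."""
    errors = []
    if not EMAIL_RE.match(record.get("email", "")):
        errors.append("invalid email")
    if not DATE_RE.match(record.get("signup_date", "")):
        errors.append("invalid signup_date")
    return errors

print(validate_record({"email": "a@b.com", "signup_date": "2024-01-31"}))
print(validate_record({"email": "not-an-email", "signup_date": "31/01/2024"}))
```

In a pipeline, records that fail validation would typically be routed to a quarantine table for review rather than silently dropped.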
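Several of the cleansing topics above (duplicates, missing values, and consistent label encoding with the scikit-learn `LabelEncoder` class) can be sketched in a few lines. This is a minimal illustration assuming pandas and scikit-learn are available; the sample columns are invented for the example:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Sample input exhibiting typical "bad" data: an exact duplicate row
# and a missing (NaN) value. Column names are illustrative.
df = pd.DataFrame({
    "customer": ["Alice", "Bob", "Bob", "Carol"],
    "country":  ["US", "UK", "UK", None],
})

df = df.drop_duplicates()                        # remove duplicate rows
df["country"] = df["country"].fillna("UNKNOWN")  # impute missing values

# Enforce a consistent integer encoding of category labels
encoder = LabelEncoder()
df["country_code"] = encoder.fit_transform(df["country"])

print(df)
```

The fitted encoder can be reused (`encoder.transform`) on later batches so the same label always maps to the same code, which is the consistency property the course topic refers to.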
How to Lead with Data
- Enterprise Architecture Components
- Business Architecture
- Information Architecture
- Application Architecture
- Technology Architecture
- DataOps Functional Architecture
- The Snowflake Data Cloud
- Cloud Design for System Resiliency
- New Data Architecture:
- Data Ownership
- Shared Environment Security Controls
Data Governance (Optional)
- The Need for Data Governance
- Controlling the Decision-Making Process
- Controlling "Agile IT"
- Types of Requirements
- Product
- Process
- Scoping Requirements
- Governance Gotchas
- Governance Best Practices