Master NVIDIA AI Infrastructure & Pass NCA-AIIO
Your Guide to Understanding NVIDIA-Powered AI Infrastructure - From Fundamentals to Certification Success
What you'll learn
Comprehend GPU Architecture and Use Cases - Learn about GPU architecture and its role in accelerating AI workloads across various industries.
Navigate the NVIDIA Technology Stack - Learn CUDA, GPU cores, DGX, NVLink, InfiniBand, DCGM, GPUDirect, and key tools for AI data center operations.
Prepare for NVIDIA NCA-AIIO Certification - Gain the knowledge and skills needed to pass the NVIDIA Certified Associate: AI Infrastructure and Operations (NCA-AIIO) exam.
Course content
6 sections • 72 lectures • 4h 40m total length
Introduction
1 lecture • 2min
Certification Details
2 lectures • 5min
Module 1 - Fundamentals
5 lectures • 16min
Module 2 - Inside an AI-centric Data Center
13 lectures • 1hr 1min
Module 3 - NVIDIA Technology Stack
41 lectures • 2hr 34min
Module 4 - AI Workflows
10 lectures • 44min
Module 1
- Drivers of AI evolution
- AI use cases across industries
- AI, ML, DL, Gen AI
- Analogy for AI, ML, DL, Gen AI
- Transformer Model
- Inside an AI-centric Data Center
- Power Usage Effectiveness (PUE)
- The Compute Power
- CPU and GPU
- CPU vs. GPU - Architectural difference
- Beyond Moore's law
- Data Processing Unit (DPU)
- Network inside an AI-centric Data Center
- Network fabric
- Ethernet vs. InfiniBand
- Converged Ethernet (CE)
- Storage inside an AI-centric Data Center
- Cloud vs. On-Prem
- NVIDIA: Powering AI GPU Innovation
- NVIDIA Technology Stack
- Layer 1 - Physical Layer
- GPU on a Graphic Card
- DGX Platform
- DGX SuperPOD
- ConnectX
- BlueField DPUs
- NVIDIA Reference Architectures
- Understanding GPU Cores
- Comparing GPU Cores
- NVIDIA DGX Platform - Timeline
- DGX Platform - Deployment Options
- DGX A100 vs. H100
- Layer 2: Data Movement and I/O Acceleration
- NVLink
- InfiniBand
- InfiniBand vs. Ethernet
- DMA and RDMA
- GPUDirect RDMA
- GPUDirect Storage
- Quick Comparison
- Layer 3: OS, Driver and Virtualization
- GPU Drivers
- GPU Virtualization
- vGPU vs. MIG - Part 1
- vGPU vs. MIG - Part 2
- Layer 4: Core Libraries
- Compute Unified Device Architecture (CUDA)
- Installing CUDA
- NVIDIA Collective Communications Library (NCCL)
- NVLink, NVSwitch, PCIe, RDMA vs. NCCL
- Layer 5: Monitoring and Management
- NVIDIA-SMI
- Data Center GPU Manager (DCGM)
- Base Command Manager
- Which one to use?
- Layer 6: Applications & Vertical Solutions
- Summary
- NVIDIA AI Enterprise
- NVIDIA AI Factory
- AI Workflows
- ML Frameworks
- The NVIDIA differentiator
- Model Training vs. Model Inference
- Job Scheduling vs. Container Orchestration
- Slurm vs. Kubernetes
- NVIDIA Integration
- ML Ops - Analogy
- Why ML Ops?
- NVIDIA Tools supporting ML Ops
No prior AI infrastructure experience is required; this course is suitable for beginners. Basic understanding of IT concepts, data centers, or enterprise computing is helpful but not mandatory.
Familiarity with general IT hardware and networking concepts is useful.
Description
Embark on a transformative journey into the world of AI infrastructure with this comprehensive course designed to prepare you for the NVIDIA Certified Associate: AI Infrastructure and Operations (NCA-AIIO) certification. Whether you're an IT professional, system administrator, or DevOps engineer, this course equips you with the foundational knowledge and practical skills needed to manage and optimize AI workloads in data center environments.
What You'll Learn:
AI Fundamentals: Understand the core concepts of Artificial Intelligence, Machine Learning, and Deep Learning, and their applications in modern computing.
NVIDIA Hardware & Software: Gain proficiency in NVIDIA's data center GPU generations, including A100 (Ampere), H100 (Hopper), and B200 (Blackwell), and explore essential software tools like CUDA, DCGM, and the NGC Catalog.
Infrastructure Design: Learn about data center components, networking technologies such as NVLink and InfiniBand, and how to design scalable AI infrastructure.
AI Operations: Master the deployment, monitoring, and optimization of AI workloads in an enterprise data center, using tools like DCGM, Slurm, and Kubernetes.
Exam Preparation: Prepare thoroughly for the NCA-AIIO exam with detailed study guides, practice questions, and real-world scenarios. Gain a clear understanding of the exam objectives, learn tips to maximize your performance, and build confidence to pass the certification on your first attempt, validating your expertise in AI infrastructure operations.
Who this course is for:
This course is for IT professionals, beginners, and anyone preparing for the NVIDIA NCA-AIIO certification. Learn AI infrastructure, NVIDIA GPUs, software, and data center operations from the ground up.