Greetings Greyhawk friends! Yes, it's already time for a new episode of my short story Under. If you are new to this story, follow the...
More like this
How to Build a Document Processing Pipeline for RAG with Nemotron
What if your AI agent could instantly parse complex PDFs, extract nested tables, and "see" data within charts as easily as reading a text file?...
More like this
Accelerating Long-Context Model Training in JAX and XLA
Large language models (LLMs) are rapidly expanding their context windows, with recent models supporting sequences of 128K tokens, 256K tokens, and beyond.... Large language models...
More like this
Optimizing Communication for Mixture-of-Experts Training with Hybrid Expert Parallel
In LLM training, Expert Parallel (EP) communication for hyperscale mixture-of-experts (MoE) models is challenging. EP communication is essentially all-to-all,... In LLM training, Expert Parallel (EP)...
More like this
Advancing GPU Programming with the CUDA Tile IR Backend for OpenAI Triton
NVIDIA CUDA Tile is a GPU-based programming model that targets portability for NVIDIA Tensor Cores, unlocking peak GPU performance. One of the great things... NVIDIA...
