What if your AI agent could instantly parse complex PDFs, extract nested tables, and "see" data within charts as easily as reading a text file?...
Accelerating Long-Context Model Training in JAX and XLA
Large language models (LLMs) are rapidly expanding their context windows, with recent models supporting sequences of 128K tokens, 256K tokens, and beyond.
Optimizing Communication for Mixture-of-Experts Training with Hybrid Expert Parallel
In LLM training, Expert Parallel (EP) communication for hyperscale mixture-of-experts (MoE) models is challenging. EP communication is essentially all-to-all...
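To see why EP dispatch is an all-to-all: each device holds token buffers destined for every expert, so the exchange amounts to transposing a device-by-expert grid of buffers. A minimal NumPy sketch of that pattern (the array names and the one-expert-per-device layout are illustrative assumptions, not taken from the post):

```python
import numpy as np

n_dev = 4           # devices, assuming one expert per device (illustrative)
tokens_per_pair = 2  # tokens each device routes to each expert

# send_buf[d, e] = tokens on device d destined for expert/device e
send_buf = np.arange(n_dev * n_dev * tokens_per_pair).reshape(
    n_dev, n_dev, tokens_per_pair
)

# The all-to-all exchange swaps the source-device and destination axes:
# after it, device e holds the tokens every device routed to expert e.
recv_buf = np.swapaxes(send_buf, 0, 1)
```

The same transposition is what a collective like `jax.lax.all_to_all` performs across real devices, which is why its cost scales with the full device-to-device traffic rather than with neighbor exchanges.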
Advancing GPU Programming with the CUDA Tile IR Backend for OpenAI Triton
NVIDIA CUDA Tile is a GPU-based programming model that targets portability for NVIDIA Tensor Cores, unlocking peak GPU performance. One of the great things...
Establishing a Scalable Sparse Ecosystem with the Universal Sparse Tensor
Sparse tensors are vectors, matrices, and higher-dimensional generalizations with many zeros. They are crucial in various fields such as scientific computing...
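To ground the definition: a sparse tensor stores only its nonzero values together with their coordinates, rather than the full dense grid. A minimal coordinate-format (COO) sketch in plain NumPy (this illustrates the storage idea only, not the Universal Sparse Tensor API itself):

```python
import numpy as np

# A mostly-zero matrix: 2 nonzeros out of 9 entries.
dense = np.array([[0.0, 3.0, 0.0],
                  [0.0, 0.0, 0.0],
                  [5.0, 0.0, 0.0]])

# COO storage: one coordinate row per nonzero, plus the values.
coords = np.argwhere(dense != 0)   # shape (nnz, ndim)
values = dense[dense != 0]         # shape (nnz,)

# Reconstruct the dense tensor by scattering the values back.
recon = np.zeros_like(dense)
recon[tuple(coords.T)] = values
```

For a 3-billion-entry tensor with a few million nonzeros, this is the difference between storing the whole grid and storing only the coordinate/value pairs.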
