This presentation describes how data flows through a typical LLM training sequence and how that flow relates to system and GPU memory. We will examine a simple system and how it scales to larger, more complex systems. We will also discuss the need for, and the differences between, scale-up and scale-out networks.