Complete technical documentation for the Pipeline framework in NVIDIA recsys-examples, covering initialization, the forward split, the context lifecycle, and the runtime timeline.
examples/commons/pipeline/

- How _rewrite_model uses torch.fx to trace and replace ShardedModule
- fill_pipeline prefill logic (Base vs. Prefetch comparison)
- PipelinedForward vs PrefetchPipelinedForward
- progress() steady-state loop
- How ShardedModule.forward() is split into input_dist + compute_and_output_dist
- ArgInfo argument-tracking mechanism
- BaseForward → PipelinedForward → PrefetchPipelinedForward inheritance chain
- How torch.fx.Tracer graph tracing works
- PipelinedPostproc wrapper
- TrainPipelineContext class hierarchy (class diagram + TorchRec types)
- In-depth analysis of Awaitable, Multistreamable, and ShardedModule
- Visualization of progress() with iterations unrolled
- wait_stream synchronization arrows
- PipelineTask + PipelinePlan
- DeclaredIO external side-effect declarations, with a standalone example
- _GraftGrad gradient grafting
- _SubmissionSequencer order preservation across threads
- TaskProfiler exposed-time measurement
- IterContext, DeclaredIO, PipelineTask, TaskSchedule, PipelinePlan
- intra_iter_deps / inter_iter_deps synchronization mechanism and scheduling algorithm
- Formula relating period / iter / stage
- PipelinePlan definition
- Assignment of intra_iter_deps, stage, and stream

- examples/commons/pipeline/train_pipeline.py: TrainPipelineSparseDist, PrefetchTrainPipelineSparseDist
- examples/commons/pipeline/utils.py: TrainPipelineContext, _rewrite_model, BaseForward, PipelinedForward
- examples/commons/pipeline/sw_pipeline.py: SWPipeline, PipelineTask, PipelinePlan, DeclaredIO, TaskProfiler
- examples/commons/pipeline/sw_train_pipeline.py: SWSerialTrainPipeline