Training is when the model learns, it reads tons of text and adjusts itself to get better at predicting words. It’s slow, expensive, and happens just once in a while.
Inference is when the model responds, it uses what it already learned to generate answers. Fast, cheap, and happens every time you chat.
