Redesigning the model
TimesFM is a patched decoder: it tokenizes every 32 contiguous timepoints (a patch) into an input token and applies a transformer stack on top of the sequence of input tokens to generate output tokens. It then applies a shared multilayer perceptron (MLP) to translate each output token back into a time series of 128 timepoints.
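As a rough illustration of that shape flow, here is a minimal sketch. The patch length (32) and output patch length (128) come from the description above; the model width, the stand-in "transformer stack", and the random projection matrices are placeholders for this sketch, not the actual TimesFM implementation.

```python
import numpy as np

INPUT_PATCH_LEN = 32    # timepoints per input token (from the text)
OUTPUT_PATCH_LEN = 128  # timepoints decoded from each output token (from the text)
MODEL_DIM = 1280        # hypothetical model width, for illustration only

def patch(series: np.ndarray) -> np.ndarray:
    """Split a 1-D series into contiguous patches of 32 timepoints."""
    n_patches = len(series) // INPUT_PATCH_LEN
    return series[: n_patches * INPUT_PATCH_LEN].reshape(n_patches, INPUT_PATCH_LEN)

history = np.random.randn(512)     # 512 timepoints of forecast history
input_tokens = patch(history)      # shape (16, 32): one token per patch

# Stand-ins for the learned components: embed each patch, run the decoder
# stack, then map every output token back to 128 future timepoints.
embed = lambda x: x @ np.random.randn(INPUT_PATCH_LEN, MODEL_DIM)
transformer_stack = lambda h: h    # placeholder for the causal self-attention layers
output_mlp = lambda h: h @ np.random.randn(MODEL_DIM, OUTPUT_PATCH_LEN)

hidden = transformer_stack(embed(input_tokens))   # (16, MODEL_DIM)
forecast_patches = output_mlp(hidden)             # (16, 128)
print(forecast_patches.shape)
```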
To create TimesFM-ICF (In-Context Fine-tuning), we start with the base TimesFM model and continue pre-training with a new kind of context: the forecast history plus all in-context examples. The first step is to make sure the model doesn't confuse or conflate the forecast history with the in-context examples. Imagine you're giving the model a list of numbers that represent a few different things, maybe sunglasses sales figures from one store, then umbrella sales figures from another. If you simply merge all these numbers together, the model might get confused and treat them as one continuous stream of data. For example, if the first store's sales were going up and the second store's sales were going down, the model might incorrectly see a single up-and-down pattern, rather than two separate, simple trends.
To fix this, we put a special, learnable "common separator token", like a digital "stop sign" or "new paragraph" symbol, after each set of numbers. With these separators in place, as soon as the model attends to the separator token of an example it has seen before, it won't mix that example up with the data it is currently trying to predict. This, in principle, lets the model learn from patterns in those past examples and apply that knowledge to the current forecast. For instance, the model might learn that "all the store sales are showing consistent, directional trends lately, so I should predict an upward trend for my new store's sunscreen sales."
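The sketch below shows one way such a context could be assembled: each in-context example is patched, a separator token is appended after it, and the forecast history comes last. The function name, the NaN stand-in for the learned separator embedding, and the specific ordering are assumptions for illustration, not the actual TimesFM-ICF code.

```python
import numpy as np

PATCH_LEN = 32
# Stand-in boundary marker; in the real model this is a learned embedding,
# not a NaN-filled patch.
SEPARATOR = np.full(PATCH_LEN, np.nan)

def build_context(history: np.ndarray, examples: list[np.ndarray]) -> np.ndarray:
    """Concatenate in-context examples and the forecast history, inserting a
    separator token after each example so the model can tell where one
    series ends and the next begins."""
    pieces = []
    for ex in examples:
        pieces.append(ex.reshape(-1, PATCH_LEN))  # patches of one example series
        pieces.append(SEPARATOR[None, :])         # boundary after the example
    pieces.append(history.reshape(-1, PATCH_LEN)) # the series we actually forecast
    return np.concatenate(pieces, axis=0)

sunglasses = np.random.randn(128)  # in-context example 1 (4 patches)
umbrellas = np.random.randn(128)   # in-context example 2 (4 patches)
my_store = np.random.randn(256)    # forecast history (8 patches)

tokens = build_context(my_store, [sunglasses, umbrellas])
print(tokens.shape)  # (4 + 1 + 4 + 1 + 8, 32) = (18, 32)
```

Because the separator patches are distinct, learnable tokens rather than ordinary data, attention over the sequence can treat each example as a self-contained block instead of one long, merged series.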