Evaluating LLMs’ Bayesian capabilities
As with humans, to be effective, an LLM's interactions with a user require continually updating its probabilistic estimates of the user's preferences based on each new interaction with them. Here we ask: do LLMs act as if they maintain probabilistic estimates that are updated as expected under optimal Bayesian inference? And to the extent that an LLM's behavior deviates from the optimal Bayesian strategy, how can we minimize these deviations?
To test this, we used a simplified flight recommendation task, in which the LLMs interact as assistants with a simulated user for five rounds. In each round, three flight options were presented to both the user and the assistant. Each flight was defined by a departure time, a duration, a number of stops, and a price. Each simulated user was characterized by a set of preferences: for each feature, they could have a strong or weak preference for high or low values of that feature (e.g., they might prefer longer or shorter flights), or no preference regarding it.
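The task setup can be sketched in code. The feature names, the numeric encoding of preferences (strong/weak, high/low, or indifferent), and the user's choice rule below are illustrative assumptions, not the exact implementation used in the study:

```python
import random

# Hypothetical encoding of the task. Each preference is a signed weight:
# -2/-1 = strong/weak preference for low values, +1/+2 = weak/strong
# preference for high values, 0 = no preference for that feature.
FEATURES = ["departure_time", "duration", "stops", "price"]

def sample_user(rng):
    """Draw a simulated user: one preference weight per flight feature."""
    prefs = {}
    for f in FEATURES:
        direction = rng.choice([-1, 0, 1])
        strength = 0 if direction == 0 else rng.choice([1, 2])
        prefs[f] = direction * strength
    return prefs

def sample_flight(rng):
    """Draw a flight with each feature normalized to [0, 1]."""
    return {f: rng.random() for f in FEATURES}

def user_choice(prefs, flights):
    """The simulated user picks the flight that best matches their
    preferences, i.e., the one with the highest weighted feature sum."""
    def utility(flight):
        return sum(prefs[f] * flight[f] for f in FEATURES)
    return max(range(len(flights)), key=lambda i: utility(flights[i]))

rng = random.Random(0)
user = sample_user(rng)
flights = [sample_flight(rng) for _ in range(3)]
print(user_choice(user, flights))  # index of the flight the user picks
```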
We compared the LLMs' behavior to that of a model, a Bayesian assistant, that follows the optimal Bayesian strategy. This model maintains a probability distribution that reflects its estimates of the user's preferences, and uses Bayes' rule to update this distribution as new information about the user's choices becomes available. Unlike many real-life scenarios, where the Bayesian strategy is difficult to specify and implement computationally, in this controlled setting it is straightforward to implement, which allows us to precisely estimate the extent to which LLMs deviate from it.
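A minimal sketch of such a Bayesian assistant follows. The discrete hypothesis space over preference weights and the softmax choice likelihood are assumptions made for illustration; the study's exact likelihood model may differ:

```python
import itertools
import math

# Assumed conventions: each feature preference is one of five levels
# (strong/weak low, indifferent, weak/strong high), and the user's
# choice probability follows a softmax over weighted feature sums.
FEATURES = ["departure_time", "duration", "stops", "price"]
LEVELS = [-2, -1, 0, 1, 2]

def choice_likelihood(prefs, flights, chosen):
    """P(user picks `chosen` | preference weights `prefs`)."""
    utilities = [sum(w * flight[f] for w, f in zip(prefs, FEATURES))
                 for flight in flights]
    exps = [math.exp(u) for u in utilities]
    return exps[chosen] / sum(exps)

class BayesianAssistant:
    def __init__(self):
        # Uniform prior over all 5^4 = 625 preference hypotheses.
        self.hypotheses = list(itertools.product(LEVELS, repeat=len(FEATURES)))
        self.posterior = [1.0 / len(self.hypotheses)] * len(self.hypotheses)

    def recommend(self, flights):
        # Recommend the flight with the highest posterior-predictive
        # probability of being the user's choice.
        def predictive(i):
            return sum(p * choice_likelihood(h, flights, i)
                       for h, p in zip(self.hypotheses, self.posterior))
        return max(range(len(flights)), key=predictive)

    def update(self, flights, chosen):
        # Bayes' rule: posterior ∝ prior × likelihood of the observed choice.
        unnormalized = [p * choice_likelihood(h, flights, chosen)
                        for h, p in zip(self.hypotheses, self.posterior)]
        z = sum(unnormalized)
        self.posterior = [p / z for p in unnormalized]
```

Over the five rounds, the assistant alternates `recommend` and `update`: each observed choice reweights the 625 hypotheses, concentrating the posterior on preference profiles consistent with the user's behavior so far.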
The goal of the assistant was to recommend the flight that matched the user's choice. At the end of each round, the user told the assistant whether or not it had chosen correctly, and provided it with the correct answer.
