Making these algorithms work for LLMs
If we run these algorithms “out-of-the-box” for LLMs, things go badly. So, we came up with optimizations to the algorithms that fix the key issues with running them “out-of-the-box”.
For ELS, we needed to go from example-level DP guarantees to user-level DP guarantees. We found that previous work was adding orders of magnitude more noise than was actually necessary. We were able to prove that we can add significantly less noise, making the model much better while retaining the same privacy guarantees.
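For context, one standard, textbook way to turn an example-level guarantee into a user-level one is group privacy: if each user contributes at most k examples, an (ε, δ) example-level guarantee implies a (kε, k·e^((k−1)ε)·δ) user-level guarantee. The sketch below computes only that generic bound. It is not the tighter analysis described above, and the function name `group_privacy` is our own illustrative choice.

```python
import math
from typing import Tuple


def group_privacy(epsilon: float, delta: float, k: int) -> Tuple[float, float]:
    """Convert an (epsilon, delta) example-level DP guarantee into a user-level
    guarantee using the standard group-privacy bound.

    Assumes each user contributes at most k examples. Illustrative only; this is
    the generic conversion, not the tighter analysis discussed in the text.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    # Privacy loss grows linearly in the number of examples a user can change.
    user_epsilon = k * epsilon
    # The failure probability picks up a factor of k * e^((k - 1) * epsilon).
    user_delta = k * math.exp((k - 1) * epsilon) * delta
    return user_epsilon, user_delta
```

Because the user-level ε grows linearly in k, holding it fixed forces the example-level ε down by a factor of k, which in turn means adding more noise. That is the kind of overhead a tighter conversion can avoid.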
For both ELS and ULS, we had to figure out how to optimize the contribution bound. A “default” choice is a contribution bound that every user already satisfies; that is, we don’t do any pre-processing. However, some users may contribute a large amount of data, and we would need to add large amounts of noise to provide privacy to these users. Setting a smaller contribution bound reduces the amount of noise we need to add, but the cost is having to discard lots of data. Because LLM training runs are expensive, we can’t afford to train a bunch of models with different contribution bounds and pick the best one; we need an effective strategy to choose the contribution bound before we start training.
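A minimal sketch of applying a contribution bound as a pre-processing step, assuming each user's data is a list of examples and that surplus examples are dropped by uniform random subsampling (the text does not say how the kept examples are chosen). The function `apply_contribution_bound` is hypothetical. It also reports how many examples were discarded, which is the cost side of the trade-off described above.

```python
import random
from typing import Dict, List, Tuple, TypeVar

T = TypeVar("T")


def apply_contribution_bound(
    user_data: Dict[str, List[T]], bound: int, seed: int = 0
) -> Tuple[Dict[str, List[T]], int]:
    """Keep at most `bound` examples per user.

    Returns the bounded dataset and the total number of discarded examples.
    Assumption: surplus examples are removed by uniform random subsampling.
    """
    if bound < 1:
        raise ValueError("bound must be a positive integer")
    rng = random.Random(seed)  # seeded so the pre-processing is reproducible
    bounded: Dict[str, List[T]] = {}
    discarded = 0
    for user, examples in user_data.items():
        if len(examples) > bound:
            # Heavy contributor: keep a random subset of size `bound`.
            bounded[user] = rng.sample(examples, bound)
            discarded += len(examples) - bound
        else:
            # User already satisfies the bound: keep everything.
            bounded[user] = list(examples)
    return bounded, discarded
```

Setting `bound` to the largest per-user count reproduces the “default” choice (nothing is discarded), while smaller values trade discarded data for less noise.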
After extensive experimentation at scale, we found that for ELS, setting the contribution bound to the median number of examples held by each user was an effective strategy. For ULS, we give a prediction for the total noise added as a function of the contribution bound, and found that choosing the contribution bound that minimizes this prediction was an effective strategy.
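The two selection rules can be sketched as follows. Rounding the median to an observed count with `statistics.median_low` is our assumption, since a bound must be an integer. The ULS noise predictor is not spelled out here, so `predicted_noise` is a hypothetical caller-supplied function, and both function names are illustrative.

```python
import statistics
from typing import Callable, Iterable, Sequence


def els_contribution_bound(examples_per_user: Sequence[int]) -> int:
    """ELS rule: use the median number of examples per user as the bound.

    Assumption: median_low picks an actual observed count, so the bound is
    an integer even when the number of users is even.
    """
    return statistics.median_low(examples_per_user)


def uls_contribution_bound(
    candidates: Iterable[int], predicted_noise: Callable[[int], float]
) -> int:
    """ULS rule: pick the candidate bound that minimizes predicted total noise.

    `predicted_noise` is a hypothetical stand-in for the noise prediction;
    its actual form is not given in this text.
    """
    return min(candidates, key=predicted_noise)
```

For example, with per-user counts `[1, 5, 2, 100, 3]` the ELS rule gives a bound of 3, so the user with 100 examples keeps only 3 of them.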
