
Yesterday afternoon I joined a presentation by Mark Russinovich, CTO of Microsoft Azure, hosted by Stanford’s Institute for Human-Centered AI. One of the topics he discussed was “unlearning” – techniques for removing information that an AI model may have inadvertently learned during training.
This struck me as one of those concepts that makes complete sense as soon as you hear it – in a “duh, why hadn’t I thought about this before?” way. Other techniques exist for correcting a model’s output after the fact, but the approach Mark discussed was new to me. I look forward to learning more about these unlearning techniques!
He also touched on how inference processing could be split into two phases: the initial prompt handling, followed by the token generation that produces the actual answer. This would allow less powerful processors to handle that second phase while reserving the highest-performance processors for the heavy lifting of the first. Again, a logical idea and one I can see being quite useful in scaling inference workloads.
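
To make the two-phase idea concrete for myself, I put together a toy sketch in Python. To be clear, this is my own illustration and not anything Mark showed or how Azure implements it: the names `Worker`, `prefill`, `decode`, and `serve` are hypothetical, the "cache" is just a list of words standing in for the state a real model builds from the prompt, and the generated tokens are placeholders. All it shows is the hand-off: one pool of workers processes the prompt, and a different pool generates the answer from the state that was handed over.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class PrefillState:
    """Hypothetical stand-in for the state a model builds from the prompt."""
    request_id: int
    cache: List[str]


@dataclass
class Worker:
    """Hypothetical worker; 'handled' records which requests it touched."""
    name: str
    handled: List[int] = field(default_factory=list)


def prefill(worker: Worker, request_id: int, prompt: str) -> PrefillState:
    # Phase 1: the whole prompt is processed in one pass.
    worker.handled.append(request_id)
    return PrefillState(request_id, prompt.split())


def decode(worker: Worker, state: PrefillState, max_new_tokens: int) -> List[str]:
    # Phase 2: tokens are produced one at a time, reusing the handed-over state.
    worker.handled.append(state.request_id)
    output = []
    for step in range(max_new_tokens):
        token = f"tok{step}"  # placeholder; a real model would sample a token here
        state.cache.append(token)
        output.append(token)
    return output


def serve(prompts: List[str], max_new_tokens: int = 3) -> Tuple[Dict[int, List[str]], Worker, Worker]:
    # Assumption: one high-end worker for prompts, one modest worker for generation.
    prompt_worker = Worker("high-end")
    generation_worker = Worker("modest")
    answers = {}
    for request_id, prompt in enumerate(prompts):
        state = prefill(prompt_worker, request_id, prompt)
        answers[request_id] = decode(generation_worker, state, max_new_tokens)
    return answers, prompt_worker, generation_worker
```

Calling `serve(["hello world", "a b c"], max_new_tokens=2)` routes both prompts through the high-end worker first and then hands each one to the modest worker for generation. In a real system the interesting (and hard) part is presumably moving that intermediate state between machines efficiently, which this sketch glosses over entirely.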