Sanmi's research provides guidance for building models that optimize arbitrary metrics defined on the confusion matrix.
"Initially we work[ed out] linear weighted combinations. Eventually, we got to ratios of linear things, which captures things like F-measure. Now we're at the point where we can pretty much do any function of the confusion matrix."
Domain Experts and Metric Elicitation
Having developed a framework for optimizing classifiers against complex performance metrics, Sanmi turned to the next question (because it was the next question asked of him): which metric should you choose for a particular problem? This is where metric elicitation comes in.
The idea is to flip the question around: rather than assuming a metric up front, interact with experts or users to determine which of the metrics we can now optimize best approximates how they trade off the various types of predictions and classification errors.
For example, a doctor understands the costs associated with diagnosing or misdiagnosing someone with a disease. The trade-off factors could include treatment costs or side effects, factors that can be compressed into the pros and cons of predicting a diagnosis or not. Building a trade-off function for these decisions by hand is difficult. Metric elicitation allows us to identify doctors' preferences through a series of interactions with them, and to identify the trade-offs that correspond to those preferences. Once we know these trade-offs, we can build a metric that captures them, which lets us optimize for those preferences directly in our models using the techniques Sanmi developed earlier.
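As a toy illustration of how an elicited trade-off becomes an optimizable metric, the sketch below assumes a clinician has expressed that a missed diagnosis (false negative) is four times as costly as an unnecessary treatment (false positive); the cost values, data, and helper name are purely hypothetical.

```python
import numpy as np

# Hypothetical trade-off a clinician might express: a missed diagnosis
# (false negative) is judged four times as costly as an unnecessary
# treatment (false positive). These numbers are purely illustrative.
COST_FN = 4.0
COST_FP = 1.0

def elicited_cost_metric(y_true, y_score, threshold):
    """Expected cost of a thresholded classifier under the elicited trade-off."""
    y_pred = (y_score >= threshold).astype(int)
    fn_rate = np.mean((y_true == 1) & (y_pred == 0))
    fp_rate = np.mean((y_true == 0) & (y_pred == 1))
    return COST_FN * fn_rate + COST_FP * fp_rate

# With the trade-off captured as a metric, we can optimize against it
# directly, e.g., by sweeping the decision threshold on toy data.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)
y_score = np.clip(0.6 * y_true + 0.4 * rng.random(1000), 0, 1)

thresholds = np.linspace(0.05, 0.95, 19)
best_t = min(thresholds, key=lambda t: elicited_cost_metric(y_true, y_score, t))
print("threshold minimizing elicited cost:", round(best_t, 2))
```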
In research developed with Gaurush Hiranandani and other colleagues at the University of Illinois, the paper Performance Metric Elicitation from Pairwise Classifier Comparisons proposes a system that asks experts to choose between pairs of classifiers, kind of like an eye exam for machine learning metrics.
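The following is a minimal sketch of the pairwise-query idea, not the paper's actual algorithm: a synthetic "expert" with a hidden recall/specificity weighting answers which of two candidate classifiers it prefers, and the elicitation loop narrows in on that hidden trade-off using only those answers. The frontier_classifier and expert_prefers helpers are hypothetical stand-ins for real classifiers and a real human.

```python
import numpy as np

def frontier_classifier(beta):
    """Hypothetical classifier tuned for trade-off weight beta: the
    (recall, specificity) point on a concave toy frontier r^2 + s^2 = 1
    that maximizes beta*recall + (1 - beta)*specificity."""
    norm = np.hypot(beta, 1 - beta)
    return beta / norm, (1 - beta) / norm

def expert_prefers(rs_a, rs_b, hidden_beta=0.7):
    """Stand-in for the expert's answer to "which classifier do you prefer?".
    A hidden recall/specificity weighting plays the expert here; in real
    elicitation this answer comes from a person, not a formula."""
    score = lambda rs: hidden_beta * rs[0] + (1 - hidden_beta) * rs[1]
    return score(rs_a) > score(rs_b)

def elicit_tradeoff(n_queries=20):
    """Recover the expert's hidden trade-off weight using only pairwise
    answers, by repeatedly narrowing an interval of candidate weights."""
    lo, hi = 0.0, 1.0
    for _ in range(n_queries):
        a, b = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if expert_prefers(frontier_classifier(a), frontier_classifier(b)):
            hi = b  # preferred trade-off lies toward the lower candidate
        else:
            lo = a
    return (lo + hi) / 2

print("elicited trade-off weight:", round(elicit_tradeoff(), 3))  # ~0.7
```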
Metric Elicitation and Inverse Reinforcement Learning
Sanmi notes that learning metrics in this manner is similar to inverse reinforcement learning, where reward functions are learned, often through interaction with humans. The fields differ, however, in that inverse RL is typically more focused on replicating behavior than on getting the reward function correct, while metric elicitation aims to recover the same decision-making reward function as the human expert. Matching the expert's reward function, as opposed to their behavior, has the benefit of greater generalizability, yielding metrics that are agnostic to the data distribution and the specific learner you're using.
Sanmi mentions another interesting application area, fairness and bias, where different measures of fairness correspond to different notions of trade-offs. Upcoming research focuses on finding "elicitation procedures that build context-specific notions of metrics or statistics" that should be normalized across groups to reach a fairness goal in a given setting.
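As one concrete example of the kind of group-normalized statistics such a procedure might target, the sketch below computes per-group false positive and false negative rates and the gap between them on illustrative data; the measure shown (an FPR disparity) is a standard fairness statistic used here purely for illustration, not the specifics of the upcoming work.

```python
import numpy as np

def group_rates(y_true, y_pred, groups):
    """Per-group false positive and false negative rates: statistics that a
    fairness-oriented elicitation procedure might seek to equalize across groups."""
    rates = {}
    for g in np.unique(groups):
        m = groups == g
        fpr = np.mean(y_pred[m][y_true[m] == 0])      # P(pred=1 | y=0, group=g)
        fnr = np.mean(1 - y_pred[m][y_true[m] == 1])  # P(pred=0 | y=1, group=g)
        rates[g] = {"fpr": fpr, "fnr": fnr}
    return rates

# Toy example with two groups (illustrative data only).
rng = np.random.default_rng(1)
groups = rng.integers(0, 2, size=2000)
y_true = rng.integers(0, 2, size=2000)
y_pred = (rng.random(2000) < 0.5 + 0.1 * groups).astype(int)

rates = group_rates(y_true, y_pred, groups)
fpr_gap = abs(rates[0]["fpr"] - rates[1]["fpr"])
print(rates, "FPR disparity:", round(fpr_gap, 3))
```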
Robust Distributed Learning
This interview also covers Sanmi's research into robust distributed learning, which aims to harden distributed machine learning systems against adversarial attacks.
Be sure to check out the full interview for the interesting discussion Sam and Sanmi had on both metric elicitation and robust distributed learning. The latter discussion starts about 33 minutes into the interview.