Responsible AI: The research collaboration behind new open-source tools offered by Microsoft

Published

Flowchart showing how responsible AI tools are used together for targeted debugging of machine learning models: the Responsible AI Dashboard for the identification of failures; followed by the Responsible AI Dashboard and Mitigations Library for the diagnosis of failures; then the Responsible AI Mitigations Library for mitigating failures; and lastly the Responsible AI Tracker for tracking, comparing, and validating mitigation techniques from which an arrow points back to the identification phase of the cycle  to indicate the repetition of the process as models and data continue to evolve during the ML lifecycle.

As computing and AI advancements spanning decades are enabling incredible opportunities for people and society, they’re also raising questions about responsible development and deployment. For example, the machine learning models powering AI systems may not perform the same for everyone or every condition, potentially leading to harms related to safety, reliability, and fairness. Single metrics often used to represent model capability, such as overall accuracy, do little to demonstrate under which circumstances or for whom failure is more likely; meanwhile, common approaches to addressing failures, like adding more data and compute or increasing model size, don’t get to the root of the problem. Plus, these blanket trial-and-error approaches can be resource intensive and financially costly.

Through its Responsible AI Toolbox (opens in new tab), a collection of tools and functionalities designed to help practitioners maximize the benefits of AI systems while mitigating harms, and other efforts for responsible AI, Microsoft offers an alternative: a principled approach to AI development centered around targeted model improvement. Improving models through targeting methods aims to identify solutions tailored to the causes of specific failures. This is a critical part of a model improvement life cycle that not only includes the identification, diagnosis, and mitigation of failures but also the tracking, comparison, and validation of mitigation options. The approach supports practitioners in better addressing failures without introducing new ones or eroding other aspects of model performance.

“With targeted model improvement, we’re trying to encourage a more systematic process for improving machine learning in research and practice,” says Besmira Nushi, a Microsoft Principal Researcher (opens in new tab) involved with the development of tools for supporting responsible AI. She is a member of the research team behind the toolbox’s newest additions (opens in new tab): the Responsible AI Mitigations Library (opens in new tab), which enables practitioners to more easily experiment with different techniques for addressing failures, and the Responsible AI Tracker (opens in new tab), which uses visualizations to show the effectiveness of the different techniques for more informed decision-making.

Targeted model improvement: From identification to validation

The tools in the Responsible AI Toolbox, (opens in new tab) available in open source and through the Azure Machine Learning (opens in new tab) platform offered by Microsoft, have been designed with each stage of the model improvement life cycle in mind, informing targeted model improvement through error analysis, fairness assessment, data exploration, and interpretability.

For example, the new mitigations library bolsters mitigation by offering a means of managing failures that occur in data preprocessing, such as those caused by a lack of data or lower-quality data for a particular subset. For tracking, comparison, and validation, the new tracker brings model, code, visualizations, and other development components together for easy-to-follow documentation of mitigation efforts. The tracker’s main feature is disaggregated model evaluation and comparison, which breaks down model performance by data subset to present a clearer picture of a mitigation’s effects on the intended subset, as well as other subsets, helping to uncover hidden performance declines before models are deployed and used by individuals and organizations. Additionally, the tracker allows practitioners to look at performance for subsets of data across iterations of a model to help practitioners determine the most appropriate model for deployment.

photo of Besmira Nushi smiling for the camera

“Data scientists could build many of the functionalities that we offer with these tools; they could build their own infrastructure,” says Nushi. “But to do that for every project requires a lot of effort and time. The benefit of these tools is scale. Here, they can accelerate their work with tools that apply to multiple scenarios, freeing them up to focus on the work of building more reliable, trustworthy models.”

Besmira Nushi, Microsoft Principal Researcher

Building tools for responsible AI that are intuitive, effective, and valuable can help practitioners consider potential harms and their mitigation from the beginning when developing a new model. The result can be more confidence that the work they’re doing is supporting AI that is safer, fairer, and more reliable because it was designed that way, says Nushi. The benefits of using these tools can be far-reaching—from contributing to AI systems that more fairly assess candidates for loans by having comparable accuracy across demographic groups to traffic sign detectors in self-driving cars that can perform better across conditions like sun, snow, and rain.

Converting research into tools for responsible AI

Creating tools that can have the impact researchers like Nushi envision often begins with a research question and involves converting the resulting work into something people and teams can readily and confidently incorporate in their workflows.

“Making that jump from a research paper’s code on GitHub to something that is usable involves a lot more process in terms of understanding what is the interaction that the data scientist would need, what would make them more productive,” says Nushi. “In research, we come up with many ideas. Some of them are too fancy, so fancy that they cannot be used in the real world because they cannot be operationalized.”

Multidisciplinary research teams consisting of user experience researchers, designers, and machine learning and front-end engineers have helped ground the process as have the contributions of those who specialize in all things responsible AI. Microsoft Research works closely with the incubation team of Aether (opens in new tab), the advisory body for Microsoft leadership on AI ethics and effects, to create tools based on the research. Equally important has been partnership with product teams whose mission is to operationalize AI responsibly, says Nushi. For Microsoft Research, that is often Azure Machine Learning (opens in new tab), the Microsoft platform for end-to-end ML model development. Through this relationship, Azure Machine Learning can offer what Microsoft Principal PM Manager Mehrnoosh Sameki (opens in new tab) refers to as customer “signals,” essentially a reliable stream of practitioner wants and needs directly from practitioners on the ground. And, Azure Machine Learning is just as excited to leverage what Microsoft Research and Aether have to offer: cutting-edge science. The relationship has been fruitful.

As the current Azure Machine Learning platform made its debut five years ago, it was clear tooling for responsible AI was going to be necessary. In addition to aligning with the Microsoft vision for AI development, customers were seeking out such resources. They approached the Azure Machine Learning team with requests for explainability and interpretability features, robust model validation methods, and fairness assessment tools, recounts Sameki, who leads the Azure Machine Learning team in charge of tooling for responsible AI. Microsoft Research, Aether, and Azure Machine Learning teamed up to integrate tools for responsible AI into the platform, including InterpretML (opens in new tab) for understanding model behavior, Error Analysis (opens in new tab) for identifying data subsets for which failures are more likely, and Fairlearn (opens in new tab) for assessing and mitigating fairness-related issues. InterpretML and Fairlearn are independent community-driven projects that power several Responsible AI Toolbox functionalities.

Before long, Azure Machine Learning approached Microsoft Research with another signal: customers wanted to use the tools together, in one interface. The research team responded with an approach that enabled interoperability, allowing the tools to exchange data and insights, facilitating a seamless ML debugging experience. Over the course of two to three months, the teams met weekly to conceptualize and design “a single pane of glass” from which practitioners could use the tools collectively. As Azure Machine Learning developed the project, Microsoft Research stayed involved, from providing design expertise to contributing to how the story and capabilities of what had become Responsible AI dashboard (opens in new tab) would be communicated to customers.

After the release, the teams dived into the next open challenge: enabling practitioners to better mitigate failures. Enter the Responsible AI Mitigations Library and the Responsible AI Tracker, which were developed by Microsoft Research in collaboration with Aether. Microsoft Research was well-equipped with the resources and expertise to figure out the most effective visualizations for doing disaggregated model comparison (there was very little previous work available on it) and navigating the proper abstractions for the complexities of applying different mitigations to different subsets of data with a flexible, easy-to-use interface. Throughout the process, the Azure team provided insight into how the new tools fit into the existing infrastructure.

With the Azure team bringing practitioner needs and the platform to the table and research bringing the latest in model evaluation, responsible testing, and the like, it is the perfect fit, says Sameki.

An open-source approach to tooling for responsible AI

While making these tools available through Azure Machine Learning supports customers in bringing their products and services to market responsibly, making these tools open source is important to cultivating an even larger landscape of responsibly developed AI. When release ready, these tools for responsible AI are made open source and then integrated into the Azure Machine Learning platform. The reasons for going with an open-source-first approach are numerous, say Nushi and Sameki:

  • freely available tools for responsible AI are an educational resource for learning and teaching the practice of responsible AI;
  • more contributors, both internal to Microsoft and external, add quality, longevity, and excitement to the work and topic; and
  • the ability to integrate them into any platform or infrastructure encourages more widespread use.

The decision also represents one of the Microsoft AI principles in action—transparency.

photo of Mehrnoosh Sameki smiling for the camera

“In the space of responsible AI, being as open as possible is the way to go, and there are multiple reasons for that,” says Sameki. “The main reason is for building trust with the users and with the consumers of these tools. In my opinion, no one would trust a machine learning evaluation technique or an unfairness mitigation algorithm that is unclear and close source. Also, this field is very new. Innovating in the open nurtures better collaborations in the field.”

Mehrnoosh Sameki, Microsoft Principal PM Manager

Looking ahead

AI capabilities are only advancing. The larger research community, practitioners, the tech industry, government, and other institutions are working in different ways to steer these advancements in a direction in which AI is contributing value and its potential harms are minimized. Practices for responsible AI will need to continue to evolve with AI advancements to support these efforts.

For Microsoft researchers like Nushi and product managers like Sameki, that means fostering cross-company, multidisciplinary collaborations in their continued development of tools that encourage targeted model improvement guided by the step-by-step process of identification, diagnosis, mitigation, and comparison and validation—wherever those advances lead.

“As we get better in this, I hope we move toward a more systematic process to understand what data is actually useful, even for the large models; what is harmful that really shouldn’t be included in those; and what is the data that has a lot of ethical issues if you include it,” says Nushi. “Building AI responsibly is crosscutting, requiring perspectives and contributions from internal teams and external practitioners. Our growing collection of tools shows that effective collaboration has the potential to impact—for the better—how we create the new generation of AI systems.”

Continue reading

See all blog posts