How we created a data-driven, multi-lingual AI bot to support a Hispanic conference

Gustavo Cordido
Microsoft Azure
Mar 7, 2024


SHPE AI Assistant application

Rarely do we have the chance to speak in front of an audience that embraces our work and reflects our own background and community. So, you can imagine my delight when I received an invitation to speak at the Society of Hispanic Professional Engineers (SHPE) National Convention, representing Microsoft and discussing the topic I work on the most: Artificial Intelligence. With both my love for AI and my passion for community outreach intersecting, I became determined to develop a project that would not only showcase the potential of AI but also provide support to the Hispanic and Latinx community attending the event.

I collaborated on this project with my coworker and fellow engineer, David Israwi, an expert in front-end development. Our goal was to approach the presentation in a unique manner, shedding light on current advancements while emphasizing the importance of keeping up with the rapid pace of new developments. Generative AI, in particular, has emerged as the latest wave in computing, and it is crucial for developers to adapt and integrate it into their daily work. Therefore, we decided to walk through the evolution of AI up until November 2023, showcasing a simple chat application built with tech that many attendees were already familiar with. We would then surprise them with an updated version that demonstrated newer, more advanced capabilities that had only recently emerged, its key features being access to data and multi-language voice support.

Brainstorming the application

The application is centered around a chat-like interface hosted within a React web application written in TypeScript. Users can input text or speech, which is then sent to the model. For requests about conference-related information, such as attending companies, the areas these companies specialize in, the roles they are hiring for, or tips on how to best prepare to speak with them, the model can access the provided data to formulate an appropriate answer.

Early sketch of the Application

For this, we leveraged an older version of a similar application and split the work in two: David would take care of developing and improving the React front-end, and I would manage the back end and Azure services.

The finished application can be found in this GitHub repository: https://github.com/gcordido/shpe-demo, alongside instructions on how to install its dependencies, create the necessary services, and run it both locally and through GitHub Codespaces.

Accessing the data

To provide a model with data we needed, well, data. For this we reached out to the University of Florida’s SHPE chapter, one of the largest chapters within the organization. Members of this chapter had compiled a comprehensive list of the companies participating in the conference, including detailed descriptions, career information, tips on applying and more, designed to help their members navigate and gain insights about each company. We explained our plan to use this data to power an application for a talk, and the chapter was kind enough to share it with us.

The next step was to prepare the data, which initially existed as a PDF (Portable Document Format) file, making it necessary to convert it into a format that could be easily accessed by the model and hosted in an Azure Cognitive Search index. For this, we parsed the document into plain text and generated embeddings using OpenAI’s text-embedding-ada-002 model, which we accessed through Azure. Once the embeddings were created, we stored them in an Azure Storage container as a JSON file, from which we then created an Azure Cognitive Search index using the vectorized data. You can follow the steps of creating the storage and index here. With the index created, we can now direct the model to use it as the data source to generate the proper completion.
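To make this step more concrete, here is a minimal sketch of how the embedding pass could look with the Azure OpenAI client library for JavaScript. The deployment name, environment variables, and file names are placeholders for illustration, not the exact ones from our repository.

```typescript
import { OpenAIClient, AzureKeyCredential } from "@azure/openai";
import { readFile, writeFile } from "fs/promises";

// Assumed environment variables and deployment name, for illustration only.
const endpoint = process.env.AZURE_OPENAI_ENDPOINT!;
const apiKey = process.env.AZURE_OPENAI_KEY!;
const embeddingDeployment = "text-embedding-ada-002";

async function embedChunks(): Promise<void> {
  const client = new OpenAIClient(endpoint, new AzureKeyCredential(apiKey));

  // Plain-text chunks parsed from the original PDF (hypothetical file name).
  const chunks: string[] = JSON.parse(await readFile("company-chunks.json", "utf-8"));

  // Generate one embedding vector per chunk.
  const result = await client.getEmbeddings(embeddingDeployment, chunks);

  // Shape the output so it can be uploaded to Blob Storage and later ingested
  // as documents in an Azure Cognitive Search vector index.
  const documents = result.data.map((item, i) => ({
    id: `chunk-${i}`,
    content: chunks[i],
    contentVector: item.embedding,
  }));

  await writeFile("company-index.json", JSON.stringify(documents, null, 2));
}

embedChunks().catch(console.error);
```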

Interacting with the AI Model

Typically, Azure-hosted models like GPT-3.5 Turbo (the model used in our application) can be accessed through a REST API, facilitating smooth communication between the application and the models in Azure. However, when it comes to OpenAI models and web development, we have the advantage of utilizing the Azure OpenAI client library for JavaScript. This library enables us to incorporate SDK-like components into our backend, simplifying the process of establishing communication between the user and the model.

The way this process works is quite straightforward: the user enters a prompt via the user interface, and the backend transmits this prompt to the Azure OpenAI Service using one of the methods provided by the library. The service then generates a response, which we send back to the front-end and display to the user.
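As a rough sketch of that round trip (assuming an Express backend; the route, deployment name, and system prompt are placeholders rather than the exact ones in our code), the flow could look like this:

```typescript
import express from "express";
import { OpenAIClient, AzureKeyCredential } from "@azure/openai";

const app = express();
app.use(express.json());

// Client for the Azure OpenAI Service, configured from assumed environment variables.
const client = new OpenAIClient(
  process.env.AZURE_OPENAI_ENDPOINT!,
  new AzureKeyCredential(process.env.AZURE_OPENAI_KEY!)
);
const deployment = "gpt-35-turbo"; // assumed deployment name

// Hypothetical route: receives the user's prompt and returns the model's reply.
app.post("/api/chat", async (req, res) => {
  const { prompt } = req.body as { prompt: string };

  const result = await client.getChatCompletions(deployment, [
    { role: "system", content: "You are a helpful assistant for SHPE convention attendees." },
    { role: "user", content: prompt },
  ]);

  res.json({ reply: result.choices[0].message?.content ?? "" });
});

app.listen(3001);
```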

Implementing Functions

Considering that our application has access to extremely specific data, we do not want to trigger a search for every interaction with the model, as not every prompt is related to the data in question. As such, we need to incorporate a few additional steps to address this. The first and crucial step is to implement OpenAI Functions. This functionality lets the model decide, based on the user’s prompt, that a function we have defined should be called; our backend then executes it. In our scenario, any prompt related to conference data triggers a function we previously defined, which obtains a new response by first querying the search index and incorporating the relevant data.

query_companies function. Uses the OpenAI library to get a completion from the model, with the cognitive search extension enabled for the prompt.
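Since the screenshot is not reproduced here, the sketch below approximates what a query_companies-style helper can look like, reusing the client and deployment from the earlier sketch. Note that the exact option names for the Cognitive Search extension have changed across preview versions of the @azure/openai library, so treat the shape as approximate.

```typescript
// Sketch of a query_companies-style helper: asks the model again, this time with
// the Azure Cognitive Search index attached as a data source.
// Option names are approximate; they vary across preview versions of @azure/openai.
async function queryCompanies(prompt: string): Promise<string> {
  const result = await client.getChatCompletions(
    deployment,
    [{ role: "user", content: prompt }],
    {
      azureExtensionOptions: {
        extensions: [
          {
            type: "AzureCognitiveSearch",
            endpoint: process.env.AZURE_SEARCH_ENDPOINT!,
            key: process.env.AZURE_SEARCH_KEY!,
            indexName: "shpe-companies", // hypothetical index name
          },
        ],
      },
    }
  );
  return result.choices[0].message?.content ?? "";
}
```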

We utilize Functions to avoid triggering an index search for every new prompt. Instead, we limit the search to only those prompts where access to data is essential.

The query_companies function definition as part of the main requests. It instructs the model to call the function ONLY when there is a prompt related to company data.
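Again as a hedged sketch (the definition text and dispatch logic are illustrative, and reuse the helpers from the previous snippets), the function definition and the check for the model’s function call could look roughly like this:

```typescript
// Illustrative function definition: the description tells the model to call it
// only for prompts about conference or company data.
const functions = [
  {
    name: "query_companies",
    description:
      "Look up information about companies attending the SHPE convention. " +
      "Call this ONLY when the user asks about conference or company data.",
    parameters: {
      type: "object",
      properties: {
        query: { type: "string", description: "The user's question about companies" },
      },
      required: ["query"],
    },
  },
];

async function handlePrompt(prompt: string): Promise<string> {
  // First call: the model either answers directly or asks for query_companies.
  const result = await client.getChatCompletions(
    deployment,
    [{ role: "user", content: prompt }],
    { functions }
  );

  const message = result.choices[0].message;
  if (message?.functionCall?.name === "query_companies") {
    // Only now do we hit the search index, via the data-grounded call above.
    const { query } = JSON.parse(message.functionCall.arguments ?? "{}");
    return queryCompanies(query);
  }
  return message?.content ?? "";
}
```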

Once we have our responses, we simply send them back to the front-end, and display them in the chat-like interface of the application.

Introducing multi-lingual speech

With the thought that our audience would primarily consist of members of the Hispanic and Latino/a/x community, we aimed to create an inclusive application that would resonate with our own Latino heritage. As a result, we made the choice to leverage Azure AI Speech’s ability to detect languages beyond English, specifically Spanish and Portuguese. This allowed us to showcase the chatbot’s responses in all three languages, providing a more inclusive approach to an assistant and showcasing some of the newest voices available in the AI Speech service.

Similar to the Azure OpenAI Service, Azure AI Speech provides an easy-to-use SDK for JavaScript. Through the SDK, we capture live microphone audio, recognize speech, and synthesize speech from text. David had the brilliant idea of adding an interface option for language, which allowed us to prepare the AI Speech service to detect the appropriate language, as well as switch to the corresponding synthetic voice.
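For illustration, a condensed sketch of that flow with the Speech SDK for JavaScript could look like the following; the language codes and voice names are examples, and the real front-end wires the results into React state.

```typescript
import * as speechSdk from "microsoft-cognitiveservices-speech-sdk";

// Assumed environment variables; language and voice choices are illustrative.
const speechConfig = speechSdk.SpeechConfig.fromSubscription(
  process.env.SPEECH_KEY!,
  process.env.SPEECH_REGION!
);

// Recognize one utterance from the default microphone in the selected language.
function recognizeOnce(language: string): Promise<string> {
  speechConfig.speechRecognitionLanguage = language; // e.g. "en-US", "es-MX", "pt-BR"
  const audioConfig = speechSdk.AudioConfig.fromDefaultMicrophoneInput();
  const recognizer = new speechSdk.SpeechRecognizer(speechConfig, audioConfig);

  return new Promise((resolve, reject) => {
    recognizer.recognizeOnceAsync(
      (result) => resolve(result.text),
      (error) => reject(error)
    );
  });
}

// Read the model's response aloud with a voice matching the chosen language.
function speak(text: string, voiceName: string): void {
  speechConfig.speechSynthesisVoiceName = voiceName; // e.g. "es-MX-DaliaNeural"
  const synthesizer = new speechSdk.SpeechSynthesizer(speechConfig);
  synthesizer.speakTextAsync(text, () => synthesizer.close(), () => synthesizer.close());
}
```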

Demonstration of Spanish and English language support (I can’t speak Portuguese so bear with me!)

Presenting our application and attendee impressions

David and I showcasing the application during our talk “The Future of AI” at the 2023 SHPE National Convention in Salt Lake City, Utah.

We moved on to prepare a presentation with the application at its core. We covered some of the history of AI and its development and showcased a simpler version of the application, which did not include voice or data support, as a way of showing where most people thought AI stood at the time. We then decided to discuss the future, and for this, we had the idea of showcasing how the future is already happening, as new developments occur at an exponential rate. This is where our fully fledged application came to shine.

David brought up the application on his screen and revealed that we actually had two versions. The audience had only seen a very basic chatbot. He then decided to speak into the microphone rather than typing and got a perfect transcription of his prompt and an equally accurate live reading of the model’s response. Moreover, we proceeded to ask questions regarding the conference sponsors, the companies attending, what these companies were hiring for, and even how to best prepare a resume for one of said companies! All this thanks to the RAG (retrieval-augmented generation) architecture and the voice support we implemented.

However, we wanted to take it further. I asked for a Spanish-speaking member of the audience, expecting two or three hands. About 85% of the room raised theirs (which, in retrospect, I should have expected given where I was), and I picked one person to come on stage and ask the assistant bot a question in Spanish. Another perfect transcription, regardless of accent, and another live read, this time with a different voice in Spanish. This is where the crowd went wild. Many of the attendees had not seen such representation of their native language in a talk and were excited to see it happening live, with many sharing this impression with us after the talk was over. Lastly, we invited another audience member to ask a question, but this time in Portuguese. And in their words (as our Portuguese is extremely basic and we could not 100% confirm the results were accurate), it worked perfectly as well.

We closed the talk to resounding applause, feeling deeply humbled by the experience. Throughout the rest of the conference, attendees approached both David and me with questions and a genuine interest in our work, the technology we used, and our life stories. Many of them were students eager to learn how to reach a similar position. We received numerous comments about the importance of representation and the excitement of seeing individuals like us presenting on a topic they aspired to work on, and I personally cannot wait to go back and bring more talks around AI to this wonderful community.

Closing thoughts

As much as this project was a learning experience for me, I believe it can also be a learning experience for you, the reader. There are many areas to cover, and while I would like to think my writing is sufficient, I truly believe that firsthand testing is the best way to learn in technology. Therefore, I highly encourage you to check out the repository of the application, fork it, and try running it or playing with it on your own. This will give you a deeper understanding of the services I mentioned and most likely lead you to discover numerous ways to improve our application. The repository can be found here: https://github.com/gcordido/shpe-demo.

Gustavo Cordido
Cloud Advocate in Artificial Intelligence @ Microsoft. Venezuelan