
Improving the performance of an AI-powered chatbot by 1900%

Spren is a bleeding-edge fitness tech startup from North Carolina backed by investors such as Boston Seed Capital and Drive by DraftKings. Their platform uses computer vision and machine learning to turn users’ smartphone cameras into powerful body composition scanners, measuring body fat percentage, lean mass, and other markers of cardio-metabolic health. Most importantly, it turns that data into actionable insights that help users optimize their health, performance, and longevity.

Vivek

Co-Founder

Spren

Filip

AI Tech Leader

/ Agency /

United States

AI

Copilot
Pinecone
LangChain
GPT-4
GPT-3.5

Engage smarter, understand better, perform faster

Spren saw potential in using generative AI to answer users’ questions, provide recommendations, and increase engagement. Their idea was to integrate a GPT model into their in-app chat, where it would combine user context with an industry-specific knowledge base to give users reliable answers and recommendations.

The goal was to boost user engagement, measured by time spent in the app and frequency of use, through hyper-personalized recommendations. To make performance data more accessible, Spren planned to present it conversationally through a virtual assistant, helping users understand their metrics and receive tailored recommendations. Additionally, Spren sought to cut the model's roughly 40-second response time to minimize user friction and frustration.

Optimizing chatbot engagement with fast and reliable data

The project faced three main challenges: long response times, unstructured data, and GPT hallucinations. Initially, the chatbot required approximately 40 seconds to process each user request via the GPT API, risking user disengagement. The knowledge base comprised articles and video transcripts of varying lengths, styles, and tones, complicating consistent and reliable information retrieval. Additionally, ensuring the accuracy of health-related answers was crucial, necessitating a strategy to minimize the model’s hallucinations.

A smart solution with a tailored AI model

We employed a dual-model strategy and vector embeddings to enhance performance and cost efficiency. For large-context requests and text generation, we used GPT-4, while GPT-3.5 handled simpler tasks like user context extraction. We converted all inputs, including user queries, context, and the knowledge base, into vectors stored in JSON files and a Pinecone vector database. This approach improved performance and enabled the use of relevance scores for greater accuracy.
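Below is a minimal sketch of how such a dual-model split and knowledge-base embedding might look, assuming the OpenAI and Pinecone Python clients. The model names, index name, JSON layout, and helper functions are illustrative assumptions rather than Spren's actual implementation.

```python
# Hypothetical sketch of the dual-model split and knowledge-base embedding.
# Model names, index name, and JSON structure are illustrative assumptions.
import json
from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI()                       # reads OPENAI_API_KEY from the environment
pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")
index = pc.Index("spren-knowledge-base")       # assumed index name

EMBED_MODEL = "text-embedding-ada-002"

def embed(texts: list[str]) -> list[list[float]]:
    """Convert text chunks (articles, transcripts, user queries) into vectors."""
    response = openai_client.embeddings.create(model=EMBED_MODEL, input=texts)
    return [item.embedding for item in response.data]

def index_knowledge_base(path: str) -> None:
    """Load pre-chunked knowledge-base entries from a JSON file and upsert them."""
    with open(path) as f:
        chunks = json.load(f)                  # e.g. [{"id": "...", "text": "..."}, ...]
    vectors = embed([c["text"] for c in chunks])
    index.upsert(vectors=[
        (c["id"], v, {"text": c["text"]}) for c, v in zip(chunks, vectors)
    ])

def extract_user_context(message: str) -> str:
    """Cheaper, faster model for the simple task of extracting user context."""
    response = openai_client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Summarize the user's goal and fitness context in one sentence."},
            {"role": "user", "content": message},
        ],
    )
    return response.choices[0].message.content

def generate_answer(message: str, retrieved_context: str) -> str:
    """Larger model for the large-context answer-generation step."""
    response = openai_client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Answer using only the provided knowledge-base context."},
            {"role": "user", "content": f"Context:\n{retrieved_context}\n\nQuestion: {message}"},
        ],
    )
    return response.choices[0].message.content
```

The point of the split is cost and latency: the cheap model handles routine extraction on every request, while the expensive large-context model is reserved for the final answer.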

Filtering information for an accurate and cost-effective AI solution

To match the right piece of information from the knowledge base with the user's request, we used the relevance score, which indicates how similar their content is. Anything below a predefined threshold was automatically filtered out and never sent to the GPT API. This improved the accuracy of the answers and reduced the cost of processing each request. We used LangChain as the framework to connect all the elements (GPT, the Pinecone vector database, search, etc.), which simplified the architecture and allowed for faster implementation.
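A hedged sketch of that filtering step using LangChain's Pinecone integration follows; package names vary between LangChain versions, and the 0.8 threshold, index name, and fallback message are assumptions for illustration only.

```python
# Sketch of threshold-based relevance filtering before calling the GPT API.
# The 0.8 threshold and index name are assumed values, not Spren's actual settings.
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_pinecone import PineconeVectorStore

embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")
vectorstore = PineconeVectorStore(index_name="spren-knowledge-base", embedding=embeddings)

RELEVANCE_THRESHOLD = 0.8   # chunks scoring below this never reach the GPT API

def retrieve_relevant_chunks(query: str, k: int = 5) -> list[str]:
    """Return only the knowledge-base chunks whose relevance score clears the threshold."""
    scored = vectorstore.similarity_search_with_score(query, k=k)
    return [doc.page_content for doc, score in scored if score >= RELEVANCE_THRESHOLD]

def answer_with_filtered_context(query: str) -> str:
    chunks = retrieve_relevant_chunks(query)
    if not chunks:
        # Refusing to answer without grounded context helps limit hallucinations.
        return "I don't have reliable information on that yet."
    llm = ChatOpenAI(model="gpt-4", temperature=0)
    prompt = (
        "Answer the question using only the context below.\n\n"
        "Context:\n" + "\n---\n".join(chunks) + f"\n\nQuestion: {query}"
    )
    return llm.invoke(prompt).content
```

Filtering before generation serves both goals at once: low-relevance chunks never inflate the prompt (and the bill), and the model is only ever asked to answer from material that actually matches the question.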

Have a great project in mind?

We've identified 100+ world-class agencies. Drop us a line if you'd like to work with them.