Empowering Codebase Understanding with ChatGPT

Harnessing the Power of ChatGPT

If you're already acquainted with ChatGPT, you're likely aware of its remarkable ability to process large volumes of information quickly. However, its usefulness isn't limited to answering quick queries; it can also help untangle the complex web of your codebase.

Imagine you're maintaining a system of many microservices that communicate over protocols such as gRPC and Thrift, and you're bombarded with complex questions like "How do these services operate in conjunction?" or "Why isn't a particular piece of information in the response?" These are especially challenging when you're not entirely familiar with the system. I faced exactly this situation, and answering such questions struck me as one of the most difficult parts of the job, even though I worked with the codebase every day. So I decided to experiment with ChatGPT to see whether it could make sense of this complexity, and I was pleasantly surprised. Here is my experience:

A Personal Case Study

In my case, I was working with a sizable system of more than 15 APIs, and I had never looked at the code of more than half of them. Yet the functionality of these APIs was integral to serving user requests. My idea was to obtain a trace or log that recorded the request and response payload of every API call triggered by a single user request. Such a trace is relatively easy to obtain if your system is built around a microservice architecture, because every call between services is a discrete request/response that can be logged.

Consider a user loading a feed, as on Instagram or Twitter. The request might pass through a load balancer, a presentation-layer service, a mixer that merges posts from different sources, a scoring service, an ads service, and perhaps many more. Each of these subcomponents records its request and response in a distributed trace (for example, with Jaeger) or logs them under the same request_id.
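As a minimal sketch of the logging side, assuming each service emits records that carry a shared request_id, the per-request view can be rebuilt by grouping records first by request_id and then by API. The `LogRecord` schema and `group_by_request` helper are hypothetical illustrations, not Jaeger's API or the author's code:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LogRecord:
    """One request/response entry emitted by a service (hypothetical schema)."""
    request_id: str
    api: str
    request: dict
    response: dict


def group_by_request(records: list[LogRecord]) -> dict[str, dict[str, list[LogRecord]]]:
    """Group records by request_id, then by the API that emitted them."""
    grouped: dict[str, dict[str, list[LogRecord]]] = defaultdict(lambda: defaultdict(list))
    for record in records:
        grouped[record.request_id][record.api].append(record)
    # Convert the nested defaultdicts to plain dicts so missing keys raise KeyError.
    return {rid: dict(per_api) for rid, per_api in grouped.items()}


records = [
    LogRecord("req-1", "presentation", {"user": "u1"}, {"posts": 3}),
    LogRecord("req-1", "scoring", {"post_ids": [1, 2, 3]}, {"scores": [0.9, 0.5, 0.1]}),
    LogRecord("req-2", "presentation", {"user": "u2"}, {"posts": 0}),
]
trace = group_by_request(records)
print(sorted(trace["req-1"]))  # ['presentation', 'scoring']
```

Grouping by API as well as by request_id is what later lets each API's payload be handed to a separate agent.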

Leveraging ChatGPT for API-Level Insights

The idea was simple: use ChatGPT to examine these traces and answer any API-level question. Executing it isn't straightforward, though, because a trace can be enormous: if each API returns many fields, the trace can easily exceed ChatGPT's token window.
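To get a feel for when a trace outgrows the window, here is a minimal sketch. It assumes the trace is a JSON-serializable dict and uses the rough rule of thumb of about four characters per token, so treat the results as ballpark figures; an exact count needs the model's own tokenizer. `estimate_tokens`, `fits_in_context`, and the 2,000-token reserve for the reply are illustrative choices, not part of the original setup; 32,768 tokens is the context window of gpt-4-32k.

```python
import json

CHARS_PER_TOKEN = 4         # rough heuristic for English text; JSON may tokenize differently
GPT4_32K_CONTEXT = 32_768   # context window of gpt-4-32k, in tokens


def estimate_tokens(payload: object) -> int:
    """Very rough token estimate for a JSON-serializable payload."""
    text = json.dumps(payload, separators=(",", ":"))
    return len(text) // CHARS_PER_TOKEN + 1


def fits_in_context(trace: dict, reserved_for_answer: int = 2_000,
                    context_window: int = GPT4_32K_CONTEXT) -> bool:
    """True if the serialized trace leaves room for the question and the reply."""
    return estimate_tokens(trace) + reserved_for_answer <= context_window


small_trace = {"scoring": {"request": {"post_ids": [1, 2]}, "response": {"scores": [0.9, 0.1]}}}
big_trace = {f"api_{i}": {"response": {"items": list(range(5000))}} for i in range(15)}
print(fits_in_context(small_trace))  # True
print(fits_in_context(big_trace))    # False
```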

Here are the two strategies I experimented with:

Naive solution: Send the whole log or trace, along with the question, to ChatGPT in a single prompt. This would probably work if the system is small or the log has already been heavily pruned. However, pruning has a cost: the more information GPT receives, the more likely it is to give a comprehensive answer.
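A minimal sketch of the naive approach, with a simple pruning step that truncates long lists. `LLM` stands for any hypothetical function that sends one prompt to a chat model and returns its reply; the prompt wording and the `max_items` cut-off are illustrative assumptions. A fake model stands in so the sketch runs offline:

```python
import json
from typing import Callable

LLM = Callable[[str], str]  # hypothetical: one prompt in, one model reply out


def prune(value, max_items: int = 5):
    """Recursively truncate long lists, noting how many items were dropped."""
    if isinstance(value, dict):
        return {key: prune(item, max_items) for key, item in value.items()}
    if isinstance(value, list):
        kept = [prune(item, max_items) for item in value[:max_items]]
        if len(value) > max_items:
            kept.append(f"... {len(value) - max_items} more items")
        return kept
    return value


def naive_answer(trace: dict, question: str, llm: LLM, max_items: int = 5) -> str:
    """Send the whole (pruned) trace and the question in a single prompt."""
    prompt = (
        "Below is a trace of every API call made while serving one user request.\n"
        f"{json.dumps(prune(trace, max_items), indent=2)}\n\n"
        f"Question: {question}"
    )
    return llm(prompt)


def fake_llm(prompt: str) -> str:
    """Offline stand-in for a chat model: reports how large the prompt was."""
    return f"received a prompt of {len(prompt)} characters"


trace = {"scoring": {"request": {"post_ids": list(range(100))}, "response": {"scores": []}}}
print(naive_answer(trace, "Why is post 42 missing a score?", fake_llm))
```

Pruning like this is the trade-off described above: it buys room in the window at the cost of detail the model might have needed.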

Ensemble solution: To cover the whole trace without exceeding the token window, I split the work across an ensemble of GPT agents, so that no single prompt has to hold the entire trace. Here's a breakdown of my approach:

Figure: Illustration of how an ensemble of GPT agents collaborates to answer user questions

An orchestrator GPT agent: This agent was responsible for dividing the user question into individual queries for each expert agent, given the API dependency graph. The prompt engineering is simple, something along the lines of "Given the {dependency graph}, please direct your questions to {API} to answer {user question}". A ChatCompletion request is executed once for each API in the dependency graph. The orchestrator helps figure out what information each API can contribute to the full picture, which matters for questions such as "Explain to me how a user request works at a high level". In other cases, a question targets a single API, and we don't need to engage every API to answer it.
Expert GPT agents for each API: Each expert agent was responsible for scrutinizing the request/response payload of its respective API and answering the orchestrator GPT agent's question. The prompt here essentially relays the orchestrator's question together with the request/response log for that specific API. ChatGPT has good zero-shot comprehension of what information a service adds based on its request/response schema (even if the actual values are obfuscated). This closely mimics someone with a lot of software development experience and a basic understanding of the business domain.
An aggregator GPT agent: This agent was responsible for collecting the responses from all expert GPT agents and forming a comprehensive conclusion to the user's query. The aggregator's job is to see the forest for the trees: it pieces together the API experts' individual findings into a final, coherent answer to the user's question.
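The three tiers can be sketched as follows. This is an illustrative reconstruction, not the author's code: `LLM` is a hypothetical stand-in for whatever client sends one ChatCompletion request, the prompt wording is adapted from the orchestrator template above, and the "SKIP" reply convention for APIs that aren't needed is an assumption. The orchestrator makes one call per API in the dependency graph, each expert sees only its own API's log, and the aggregator combines the findings. A scripted fake model lets the sketch run offline:

```python
import json
from typing import Callable

LLM = Callable[[str], str]  # hypothetical: one prompt in, one model reply out


def orchestrate(question: str, graph: dict[str, list[str]], llm: LLM) -> dict[str, str]:
    """One call per API in the dependency graph; keep only the APIs that are needed."""
    sub_questions: dict[str, str] = {}
    for api in graph:
        reply = llm(
            f"Given the dependency graph {json.dumps(graph)}, please direct your "
            f"questions to {api} to answer: {question}. Reply SKIP if {api} is not needed."
        ).strip()
        if reply != "SKIP":  # assumed convention for irrelevant APIs
            sub_questions[api] = reply
    return sub_questions


def ask_expert(api: str, sub_question: str, log: dict, llm: LLM) -> str:
    """Relay the orchestrator's question together with this API's request/response log."""
    return llm(
        f"You are an expert on the {api} service. Its request/response log is:\n"
        f"{json.dumps(log, indent=2)}\n\nQuestion: {sub_question}"
    )


def aggregate(question: str, findings: dict[str, str], llm: LLM) -> str:
    """Combine the experts' answers into one answer to the user's question."""
    notes = "\n".join(f"- {api}: {answer}" for api, answer in findings.items())
    return llm(f"Findings from each API:\n{notes}\n\nUsing them, answer: {question}")


def answer_question(question: str, graph: dict[str, list[str]],
                    trace: dict[str, dict], llm: LLM) -> str:
    sub_questions = orchestrate(question, graph, llm)
    findings = {api: ask_expert(api, q, trace[api], llm) for api, q in sub_questions.items()}
    return aggregate(question, findings, llm)


def fake_llm(prompt: str) -> str:
    """Scripted offline stand-in: skips the ads API and echoes the aggregator prompt."""
    if prompt.startswith("Given the dependency graph"):
        return "SKIP" if "to ads to answer" in prompt else "Which fields do you add?"
    if prompt.startswith("You are an expert"):
        return "adds a score per post" if "scoring service" in prompt else "passes posts along"
    return prompt


graph = {"presentation": ["mixer"], "mixer": ["scoring", "ads"], "scoring": [], "ads": []}
trace = {api: {"request": {}, "response": {}} for api in graph}
result = answer_question("Why is a score missing from the feed?", graph, trace, fake_llm)
print(result)
```

Because each expert only ever sees one API's request/response, no single prompt has to hold the whole trace; only the aggregator's prompt grows with the number of APIs, and it carries short findings rather than raw payloads.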

This triage process required some prompt engineering to tailor each agent to its specific purpose. After just a few iterations it was operational, which suggests the approach can be adapted to other use cases without much effort.

Insights and Possibilities

The experiment was a success: the three-tier ensemble let ChatGPT inspect even extensive traces and deliver holistic answers to user questions. It could give a high-level overview of how the system operates, describe each service's function, identify the information each API provides, and suggest likely causes when specific data was missing from a response.

There were limitations, however. For instance, it can't reveal the logic inside an API, because the trace only contains API-level logs. Extending this architecture to the function level might offer interesting insights.

Final Thoughts

My experiment with gpt-4-32k was a revelation. I was astounded by the depth of understanding and the sophistication of its answers to my questions. It's clear that ChatGPT's answers are only as good as the information it's fed, and this experiment shows there's a promising future in using ChatGPT to handle complex tasks.

I am eager to explore the limitless possibilities of using AI-powered tools for codebase understanding. I'd love to hear your thoughts on this approach, potential improvements, or alternative ideas. Let's harness the power of AI and embark on this exciting journey together!