How game theory makes AI smarter


MIT CSAIL researchers developed a “consensus game” to improve AI understanding and generation of text by treating the process as a game in which one party generates sentences and another evaluates them. The method, called equilibrium ranking, significantly improves AI performance on tasks such as reading comprehension, math problem solving, and dialogue. Credit: Issues.fr.com

MIT CSAIL researchers have developed a new “consensus game” that improves AI text comprehension and generation skills.

MIT’s “consensus game” improves AI text generation using game theory. The method, equilibrium ranking, improves AI performance and reliability, though it faces computational challenges. It could significantly advance language-model decoding.

AI consensus game: a new approach to language models

Imagine playing a game with a friend in which your goal is to communicate secret messages to each other using only cryptic phrases. Your friend’s job is to guess the secret message behind your sentences. Sometimes you give clues directly, and other times your friend has to guess the message by asking yes or no questions about the clues you gave. The challenge is that you both want to make sure you understand each other and agree on the secret message.

Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) created a similar “game” to help improve how AI understands and generates text. It is known as a “consensus game” and involves two parts of an AI system: one part tries to generate sentences (like giving clues), and the other part tries to understand and evaluate these sentences (like guessing the secret message).

MIT Consensus Game

The MIT researchers’ “consensus game” is a game-theoretic approach to decoding language models. The equilibrium ranking algorithm harmonizes generative and discriminative queries to improve prediction accuracy across a variety of tasks, outperforming much larger models and demonstrating the potential of game theory to improve the consistency and truthfulness of language models. Credit: Alex Shipps/MIT CSAIL

Game-theoretic approach to AI

The researchers found that by treating this interaction as a game, in which the two parts of the AI work together according to specific rules to agree on the right message, they could significantly improve the AI’s ability to give correct and consistent answers to questions. They tested this new, gamified approach on a variety of tasks, such as reading comprehension, solving math problems, and carrying on conversations, and found that it helped the AI perform better across the board.

Traditionally, large language models answer in one of two ways: generating answers directly from the model (a generative query) or using the model to score a set of predefined answers (a discriminative query), which can produce different and sometimes incompatible results. Asked “Who is the President of the United States?”, the generative approach might directly produce a simple answer like “Joe Biden.” A discriminative query over candidate answers, however, could wrongly dispute this fact by scoring an alternative such as “Barack Obama” as more likely to be correct.
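The two query modes contrasted above can be sketched in a few lines. This is a toy illustration: the candidate answers and all log-probabilities below are made-up stand-ins for a real language model, not outputs of any actual system.

```python
import math

# Hypothetical log-probabilities standing in for a language model.
GEN_LOGPROBS = {"Joe Biden": -0.4, "Barack Obama": -1.6}  # log P(answer | question)
DISC_LOGPROBS = {                                          # log P(correct | question, answer)
    "Joe Biden": math.log(0.55),
    "Barack Obama": math.log(0.60),  # the discriminative view can disagree
}

def generative_pick(candidates):
    """Generative query: take the answer the model is most likely to produce."""
    return max(candidates, key=lambda a: GEN_LOGPROBS[a])

def discriminative_pick(candidates):
    """Discriminative query: take the answer the model scores as most likely correct."""
    return max(candidates, key=lambda a: DISC_LOGPROBS[a])

candidates = ["Joe Biden", "Barack Obama"]
print(generative_pick(candidates))      # "Joe Biden"
print(discriminative_pick(candidates))  # "Barack Obama": the two queries disagree
```

With these toy numbers the two procedures return different answers from the same underlying model, which is exactly the inconsistency the consensus game is designed to resolve.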

Balancing AI responses with equilibrium ranking

So how do we reconcile mutually incompatible scoring procedures to achieve consistent and effective predictions?

“Imagine a new way to help language models understand and generate text, like a game. We developed a training-free, game-theoretic method that treats the whole process as a complex game of clues and signals, in which a generator tries to send the right message to a discriminator using natural language. Instead of chess pieces, they use words and sentences,” says Athul Jacob, an MIT doctoral student in electrical engineering and computer science and CSAIL affiliate. “The way we navigate this game is by finding the ‘approximate equilibria,’ which leads to a new decoding algorithm called ‘equilibrium ranking.’ It’s a pretty exciting demonstration of how bringing game-theoretic strategies into the mix can tackle some major challenges in making language models more reliable and consistent.”

When tested on numerous tasks, including reading comprehension, commonsense reasoning, math problem solving, and dialogue, the team’s algorithm consistently improved model performance. Using equilibrium ranking with the LLaMA-7B model even outperformed much larger models. “Given that these models are already competitive, and that people have been working on them for a while, the level of improvement we saw, being able to outperform a model ten times larger, was a pleasant surprise,” says Jacob.

Game enabled

“Diplomacy,” a strategic board game set in pre-World War I Europe in which players negotiate alliances, betray friends, and conquer territories without using dice, relying solely on skill, strategy, and interpersonal manipulation, has recently experienced a resurgence. In November 2022, computer scientists including Jacob developed “Cicero,” an AI agent that achieves human-level play in the seven-player, mixed-motive game, which requires the same skills as above but conducted in natural language. The mathematics behind Cicero partly inspired the consensus game.

Although the history of AI agents long predates ChatGPT’s arrival in November 2022, it is well documented that they can still behave like well-meaning but pathologically untruthful friends.

The consensus gaming system achieves equilibrium in the form of an agreement, guaranteeing precision and fidelity to the original ideas of the model. To achieve this, the method iteratively adjusts the interactions between the generative and discriminative components until they reach consensus on an answer that accurately reflects reality and aligns with their initial beliefs. This approach effectively bridges the gap between the two query methods.
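The iterative adjustment described above can be illustrated with a small sketch. This is a loose, simplified illustration under stated assumptions, not the paper’s exact equilibrium-ranking algorithm: each side repeatedly mixes its initial beliefs with the other side’s current answer distribution (in log space) until the two distributions agree, and candidates are then ranked by the product of the two policies.

```python
import math

def softmax(scores):
    """Convert raw scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def consensus_policies(gen_scores, disc_scores, steps=200, lam=0.5):
    """Nudge the generator's and discriminator's answer distributions toward
    each other, while anchoring each (with weight lam) to its initial beliefs."""
    p0, q0 = softmax(gen_scores), softmax(disc_scores)
    p, q = p0[:], q0[:]
    for _ in range(steps):
        # Log-space mixture: own prior (weight lam) + other's policy (1 - lam).
        p_new = softmax([lam * math.log(a) + (1 - lam) * math.log(b)
                         for a, b in zip(p0, q)])
        q_new = softmax([lam * math.log(a) + (1 - lam) * math.log(b)
                         for a, b in zip(q0, p)])
        p, q = p_new, q_new
    return p, q

def equilibrium_rank(gen_scores, disc_scores):
    """Rank candidates by the product of the two consensus policies and
    return the index of the winning answer."""
    p, q = consensus_policies(gen_scores, disc_scores)
    combined = [a * b for a, b in zip(p, q)]
    return combined.index(max(combined))

# Toy scores: the generator slightly prefers candidate 0, as does the discriminator.
best = equilibrium_rank([2.0, 0.5, 0.1], [1.5, 1.0, 0.2])
```

Because an answer must look good to both the generator and the discriminator to be ranked first, the combined ranking captures the agreement property the paragraph describes; the anchoring to each side’s prior keeps the consensus faithful to the model’s original beliefs.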

Practical applications and challenges

In practice, implementing the consensus game approach for querying language models, especially for question-answering tasks, involves significant computational cost. With datasets like MMLU, which contain thousands of multiple-choice questions, the model must run the consensus mechanism on every query, reaching agreement between the generative and discriminative components for each question and each of its candidate answers.
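The per-question cost described above can be sketched as a batch loop. Everything here is a hypothetical illustration: the consensus step is stubbed as a simple average of the two score lists, and the toy scorers use made-up numbers rather than a real model.

```python
def consensus(gen_scores, disc_scores):
    # Stub: a real implementation would iterate the consensus game to
    # equilibrium; here we just average the two score lists (illustration only).
    return [(g + d) / 2 for g, d in zip(gen_scores, disc_scores)]

def answer_dataset(questions, score_gen, score_disc):
    """Run the (stubbed) consensus procedure once per question -- the cost
    the text describes, multiplied across thousands of MMLU-style items."""
    predictions = []
    for item in questions:
        gen = [score_gen(item["question"], a) for a in item["choices"]]
        disc = [score_disc(item["question"], a) for a in item["choices"]]
        scores = consensus(gen, disc)  # one game solved per question
        predictions.append(item["choices"][scores.index(max(scores))])
    return predictions

# Toy scorers with made-up numbers (not a real model).
GEN = {"Paris": 2.0, "Lyon": 0.5}
DISC = {"Paris": 1.0, "Lyon": 0.2}
toy = [{"question": "Capital of France?", "choices": ["Paris", "Lyon"]}]
```

Since every question triggers its own consensus computation over all of its candidate answers, the cost scales with both the number of questions and the number of choices per question.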

The system met its match with an elementary-school rite of passage: math word problems. It could not generate wrong answers, which is an essential part of understanding the process of arriving at the correct one.

Future directions

“Recent years have seen some truly impressive progress, both in strategic decision-making and in language generation from AI systems, but we are only just beginning to understand how to combine the two. Equilibrium ranking is a first step in this direction, but I think there is a lot we can do to extend it to more complex problems,” says Jacob.

One avenue for future work is to improve the base model by integrating the outputs of the current method. This is particularly promising because it could yield more factual and consistent answers across tasks, including factuality and open-ended generation. The potential for such a method to significantly improve the base model’s performance is high, which could mean more reliable and grounded output from ChatGPT and the similar language models people use every day.

Expert Opinions on AI Advances

“Even though modern language models, such as ChatGPT and Gemini, can tackle a wide range of tasks through chat interfaces, the statistical decoding process that generates a response from such models has remained unchanged for decades,” explains Ahmad Beirami, a research scientist at Google who was not involved in the work. “The MIT researchers’ proposal is an innovative game-theoretic framework for decoding language models by solving for the equilibrium of a consensus game. The significant performance gains reported in the paper are promising and open the door to a potential paradigm shift in language-model decoding that could fuel a wave of new applications.”

Jacob wrote the paper with MIT-IBM Watson AI Lab researcher Yikang Shen and MIT Department of Electrical Engineering and Computer Science assistant professors Gabriele Farina and Jacob Andreas, who is also a CSAIL member. They presented their work at the International Conference on Learning Representations (ICLR) earlier this month, where it was highlighted as a “spotlight paper.” The research also received a “Best Paper” award at the NeurIPS R0-FoMo workshop in December 2023.
