# Breakthrough in Artificial Intelligence Solves Game Theory Problem

Italian and CMU artificial intelligence (AI) researchers use correlated equilibrium for a new algorithm.

In a recent breakthrough in artificial intelligence (AI) machine learning for game theory, researchers are solving a persistent problem with real-world applicability. At last month’s 34th Conference on Neural Information Processing Systems (NeurIPS 2020), researchers from Politecnico di Milano and Carnegie Mellon University presented a new AI algorithm that allows a more nuanced approach to solving game theory problems, a solution with potential real-world impact in economics, industry, policymaking, and science.

Game theory is the science of strategy, pioneered by Princeton mathematician John von Neumann (1903-1957) with the 1928 publication of his paper on the theory of parlor games, “Zur Theorie der Gesellschaftsspiele.” It is a mathematical approach to modeling behavior.

Mention “game theory,” and the Nash equilibrium, one of the most important decision-making concepts in game theory, may come to mind for economists, mathematicians, scientists, scholars, computer programmers, CEOs, policymakers, entrepreneurs, financial analysts, and the tech-savvy. American mathematician John Forbes Nash Jr. (1928-2015) won the Abel Prize, the Nobel Memorial Prize in Economic Sciences, and the John von Neumann Theory Prize, among many other awards and honors.

The Nash equilibrium puts forward the concept that the optimal outcome of a game is one in which no player has an incentive to deviate from their chosen strategy after considering an opponent’s choice, and that such an equilibrium exists if there is a finite number of players and moves.

In data science, the Nash equilibrium assumes entirely decentralized interaction between the players; it is therefore a distribution over the uncorrelated strategy space.

To illustrate this concept, consider a hypothetical game in which two children each choose between two strategies, “get-cookie” or “lose-cookie.” Choose the get-cookie strategy and receive a delicious chocolate chip cookie. Choose the lose-cookie strategy and forgo a delicious treat. It logically follows that both children want a chocolate chip cookie and choose the get-cookie strategy. Even if one child revealed his or her strategy to the other, it would not affect the other child’s behavior. In other words, neither child has an incentive to deviate from the original strategy of getting a cookie; just ask any parent of children with a sweet tooth.

Standard game theory assumes homo economicus (the economic man), a view that people act as rational, self-interested agents who seek outcomes that maximize utility. This view is not popular among sociologists, psychologists, anthropologists, and behavioral economists, because many factors can affect decisions, such as imperfect information, changes in preference, and individual social preferences (fairness, altruism, reciprocity, etc.), among other reasons.

A common game theory example that illustrates the Nash equilibrium is the Prisoner’s Dilemma, in which prosecutors who lack the evidence to convict offer two prisoners with imperfect information (each is held in solitary confinement with no possibility of communicating with the other) the choice between testifying against the other or remaining silent. If both remain silent, both will serve one year in prison. If one betrays the other and the other remains silent, the betrayer goes free while the other serves 10 years in prison. If the two betray each other, each serves five years in prison. The Nash equilibrium in the Prisoner’s Dilemma is that the two prisoners betray each other.
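The equilibrium in this scenario can be checked by brute force. The following is a minimal Python sketch, not taken from the research paper: it encodes the prison sentences described above and tests every pair of choices to see whether either prisoner could shorten their own sentence by switching alone. The names `YEARS` and `is_nash` are illustrative.

```python
from itertools import product

ACTIONS = ("silent", "betray")

# Years in prison for (prisoner A, prisoner B), as described in the scenario.
YEARS = {
    ("silent", "silent"): (1, 1),
    ("silent", "betray"): (10, 0),
    ("betray", "silent"): (0, 10),
    ("betray", "betray"): (5, 5),
}


def is_nash(profile):
    """Return True if neither prisoner can cut their own sentence by switching alone."""
    for player in (0, 1):
        current = YEARS[profile][player]
        for alternative in ACTIONS:
            deviated = list(profile)
            deviated[player] = alternative
            if YEARS[tuple(deviated)][player] < current:
                return False
    return True


nash_profiles = [p for p in product(ACTIONS, repeat=2) if is_nash(p)]
print(nash_profiles)  # [('betray', 'betray')]
```

Mutual betrayal is the only pair of choices that survives the check, even though both prisoners would be better off if both stayed silent.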

In complex real-world situations where not everything is so black and white, it is not uncommon to find shortcomings in the Nash equilibrium. In these scenarios, the concept of correlated equilibrium (CE), proposed in 1974 by the mathematician Robert J. Aumann, winner of the 2005 Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel, offers a more general and flexible approach.

“A correlated strategy is a general distribution over joint action profiles and it is typically modeled via a trusted external mediator who derives an action profile from that distribution and privately recommends their component to each player,” wrote the research team of Andrea Celli, Alberto Marchesi, Gabriele Farina and Nicola Gatti.

The researchers cite many potential weaknesses of the Nash equilibrium that the correlated equilibrium can mitigate. Continuing the previous example, imagine the children’s mother, a trusted external mediator, privately giving each child a recommendation during play.
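The mediator idea can be made concrete with a small check. The sketch below is illustrative only and does not come from the paper: it assumes the classic two-player game of Chicken with hypothetical payoffs, and a mediator who draws one of three joint recommendations with equal probability. The function `is_correlated_equilibrium` (an assumed name) verifies that no player gains, on average, by ignoring a private recommendation.

```python
from fractions import Fraction

ACTIONS = ("chicken", "dare")

# Hypothetical payoffs (player 0, player 1) for the game of Chicken.
PAYOFF = {
    ("dare", "dare"): (0, 0),
    ("dare", "chicken"): (7, 2),
    ("chicken", "dare"): (2, 7),
    ("chicken", "chicken"): (6, 6),
}


def is_correlated_equilibrium(dist):
    """Check the incentive constraints of a distribution over joint actions.

    For every player, every recommended action, and every possible deviation,
    the expected gain from deviating (summed over the joint actions in which
    that recommendation is made) must not be positive.
    """
    for player in (0, 1):
        for recommended in ACTIONS:
            for deviation in ACTIONS:
                gain = Fraction(0)
                for profile, prob in dist.items():
                    if profile[player] != recommended:
                        continue
                    switched = list(profile)
                    switched[player] = deviation
                    gain += prob * (
                        PAYOFF[tuple(switched)][player] - PAYOFF[profile][player]
                    )
                if gain > 0:
                    return False
    return True


third = Fraction(1, 3)
mediated = {
    ("chicken", "chicken"): third,
    ("dare", "chicken"): third,
    ("chicken", "dare"): third,
}
always_chicken = {("chicken", "chicken"): Fraction(1)}

print(is_correlated_equilibrium(mediated))        # True
print(is_correlated_equilibrium(always_chicken))  # False
```

With these assumed payoffs, the mediated distribution never recommends the mutually destructive outcome, yet each player still finds it in their own interest to follow the advice. Always recommending “chicken” to both fails the check, because either player would profit by daring instead.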

“Most of the work in the multi-agent reinforcement learning community studies either fully competitive environments, where agents selfishly play to achieve Nash equilibrium, or fully cooperative scenarios in which agents have exactly the same goals,” the researchers explained in the study. “Our work could enable techniques that fall between these two extremes: agents have arbitrary goals, but coordinate their actions towards equilibrium with certain desired properties.”

Extensive-form correlated equilibrium (EFCE) extends Aumann’s strategic-form correlated equilibrium to sequential games: recommendations are revealed to a player incrementally, as the player reaches new information sets (decision points), at the moment the recommended move can be played. If the player deviates from the recommended action at any decision point, the player receives no further recommendations.

To model correlated equilibrium in extensive form, the researchers created an algorithm called Internal Counterfactual Regret Minimization (ICFR), which minimizes the regrets of trigger agents by decomposing those regrets across information sets. In other words, the algorithm is a way to minimize regret over laminar subtrees.
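To give a feel for what a regret minimizer does, here is a minimal sketch of plain regret matching in self-play on the one-shot Prisoner’s Dilemma. This is not ICFR: it is a much simpler, well-known building block, it ignores information sets and trigger agents entirely, and all function names are illustrative. Utilities are assumed to be the negative of the prison sentences described earlier.

```python
ACTIONS = ("silent", "betray")

# Assumed utilities: negative years in prison for (prisoner A, prisoner B).
UTILITY = {
    ("silent", "silent"): (-1, -1),
    ("silent", "betray"): (-10, 0),
    ("betray", "silent"): (0, -10),
    ("betray", "betray"): (-5, -5),
}


def strategy_from_regrets(regrets):
    """Regret matching: weight each action by its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0:
        return [1.0 / len(regrets)] * len(regrets)  # no regret yet: play uniformly
    return [p / total for p in positive]


def expected_utility(player, action, opponent_strategy):
    """Expected utility of `action` against the opponent's mixed strategy."""
    total = 0.0
    for opp_action, prob in zip(ACTIONS, opponent_strategy):
        profile = (action, opp_action) if player == 0 else (opp_action, action)
        total += prob * UTILITY[profile][player]
    return total


def self_play(iterations):
    """Run regret matching for both players; return their average strategies."""
    regrets = [[0.0, 0.0], [0.0, 0.0]]
    strategy_sums = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(iterations):
        # Both players fix their strategies before either one updates.
        strategies = [strategy_from_regrets(r) for r in regrets]
        for player in (0, 1):
            opponent = strategies[1 - player]
            values = [expected_utility(player, a, opponent) for a in ACTIONS]
            played = sum(s * v for s, v in zip(strategies[player], values))
            for i, value in enumerate(values):
                regrets[player][i] += value - played
                strategy_sums[player][i] += strategies[player][i]
    return [[s / iterations for s in sums] for sums in strategy_sums]


average = self_play(1000)
print(average)
```

In this toy game, both players’ average strategies place almost all their weight on “betray,” matching the Nash equilibrium of the Prisoner’s Dilemma. The algorithm described in the article applies regret minimization at each decision point of a game tree and targets a different, stronger notion of regret.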

The researchers assessed the convergence of their algorithm on four standard multiplayer benchmark games: Kuhn poker, Leduc poker, Goofspiel, and Battleship.

“We show that it is possible to orchestrate the learning procedure so that, for each set of information, the use of a regret minimizer per turn does not compromise the overall convergence of the algorithm,” wrote the researchers.

Researchers from Politecnico di Milano and Carnegie Mellon University were among the winners of the prestigious NeurIPS 2020 Best Paper Awards.

“This article shows the existence of such regret minimization algorithms which converge to CE in a much larger class of games: namely extensive (or tree) form games,” wrote the NeurIPS 2020 program chairs in the conference blog. “This result resolves a long-standing problem at the interface of game theory, computing and economics and may have a substantial impact on games that involve a mediator, for example, on efficient traffic routing through navigation applications.”

“We have provided empirical evidence that ICFR computes equilibria that are ‘not too far’ from optimal social welfare,” the researchers reported. “It could have a positive societal impact when applied to real economic issues.”

Copyright © 2021 Cami Rosso. All rights reserved.