
Autonomous AI agents are here, and they're poised to reshape the economy. By automating discovery, negotiation, and transactions, agents can overcome inefficiencies like information asymmetries and platform lock-in, enabling faster, more transparent, and more competitive markets.
We're already seeing early signs of this transformation in digital marketplaces. Customer-facing assistants like OpenAI's Operator and Anthropic's Computer Use can navigate websites and complete purchases. On the business side, Shopify Sidekick, Salesforce Einstein, and Meta's Business AI help merchants with operations and customer engagement. These examples hint at a future where agents become active market participants, but the structure of these markets remains uncertain.
Several scenarios are possible. We might see one-sided markets where only customers or businesses deploy agents; closed platforms (known as walled gardens) where companies tightly control agent interactions; or even open two-sided marketplaces where customer and business agents transact freely across ecosystems. Each path carries different trade-offs for security, openness, convenience, and competition, which will shape how value flows in the digital economy. For a deeper exploration of these dynamics, see our paper, The Agentic Economy.
To help navigate this uncertainty, we built Magentic Marketplace, an open-source simulation environment for exploring the many possibilities of agentic markets and their societal implications at scale. It provides a foundation for studying these markets and guiding them toward outcomes that benefit everyone.
This matters because most AI agent research focuses on isolated scenarios: a single agent completing a task, or two agents negotiating a simple transaction. But real markets involve large numbers of agents simultaneously searching, communicating, and transacting, creating complex dynamics that can't be understood by studying agents in isolation. Capturing this complexity is essential because real-world deployments raise critical questions about consumer welfare, market efficiency, fairness, manipulation resistance, and bias. These are questions that can't be safely answered in production environments.
To explore these dynamics in depth, the Magentic Marketplace platform enables controlled experimentation across diverse agentic marketplace scenarios. Its current focus is on two-sided markets, but the environment is modular and extensible, supporting future exploration of mixed human–agent systems, one-sided markets, and complex communication protocols.

What is Magentic Marketplace?
Magentic Marketplace's environment manages market-wide capabilities like maintaining catalogs of available goods and services, implementing discovery algorithms, facilitating agent-to-agent communication, and handling simulated payments. A centralized transaction layer at its core ensures transaction integrity across all marketplace interactions. Additionally, the platform enables systematic, reproducible research. As demonstrated in the following video, it supports a wide range of agent implementations and evolving marketplace features, allowing researchers to integrate diverse agent architectures and adapt the environment as new capabilities emerge.
We built Magentic Marketplace around three core architectural choices:
- HTTP/REST client-server architecture: Agents operate as independent clients while the Marketplace Environment serves as a central server. This mirrors real-world platforms and supports clear separation of customer and business agent roles.
- Minimal three-endpoint market protocol: Just three endpoints (register, protocol discovery, and action execution) let agents dynamically discover available actions. New capabilities can be added without disrupting existing experiments.
- Rich action protocol: Specific message types support the complete transaction lifecycle: search, negotiation, proposals, and payments. The protocol is designed for extensibility. New actions like refunds, reviews, or ratings can be added seamlessly, allowing researchers to evolve marketplace capabilities and study emerging agent behaviors while remaining compatible.
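To make the three-endpoint idea concrete, here is a minimal in-process sketch in Python. It is not the Magentic Marketplace API: the class `ToyMarketplace`, its method names, and the payload shapes are all hypothetical, and plain method calls stand in for the HTTP requests a real client would send. It only illustrates how registration, protocol discovery, and action execution can fit together so that a new action becomes visible to agents without changing the client.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ToyMarketplace:
    """Hypothetical stand-in for the marketplace server (not the real API)."""

    agents: dict[str, str] = field(default_factory=dict)  # agent_id -> role
    actions: dict[str, Callable[[str, dict], Any]] = field(default_factory=dict)

    def register(self, agent_id: str, role: str) -> dict:
        """Endpoint 1 (assumed shape): register a customer or business agent."""
        self.agents[agent_id] = role
        return {"agent_id": agent_id, "role": role}

    def discover_protocol(self) -> list[str]:
        """Endpoint 2 (assumed shape): list the actions currently supported."""
        return sorted(self.actions)

    def execute(self, agent_id: str, action: str, params: dict) -> Any:
        """Endpoint 3 (assumed shape): run a named action for a registered agent."""
        if agent_id not in self.agents:
            raise PermissionError(f"unregistered agent: {agent_id}")
        if action not in self.actions:
            raise ValueError(f"unknown action: {action}")
        return self.actions[action](agent_id, params)

    def add_action(self, name: str, handler: Callable[[str, dict], Any]) -> None:
        """Server-side extension point: make a new action discoverable."""
        self.actions[name] = handler


# Made-up catalog and handlers, for illustration only.
catalog = {"taqueria": ["tacos", "salsa"], "pizzeria": ["pizza"]}

market = ToyMarketplace()
market.add_action(
    "search",
    lambda agent_id, p: sorted(b for b, items in catalog.items() if p["query"] in items),
)
market.register("customer-1", "customer")

before = market.discover_protocol()
market.add_action("refund", lambda agent_id, p: {"refunded": p["amount"]})
after = market.discover_protocol()

results = market.execute("customer-1", "search", {"query": "tacos"})
```

Because agents ask the server which actions exist instead of hard-coding them, adding `refund` here changes what `discover_protocol` returns but leaves the existing `search` call untouched.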

Additionally, a visualization module lets users observe marketplace dynamics and review individual conversation threads between customer and business agents.
Setting up the experiments
To ensure reproducibility, we instantiated the marketplace with fully synthetic data, available in our open-source repository. The experiments modeled transactions such as ordering food and engaging with home improvement services, where agents represented customers and businesses engaging in marketplace transactions. This setup enabled precise measurement of behavior and systematic comparison against theoretical upper bounds.
Each experiment was run using 100 customers and 300 businesses and included both proprietary models (GPT-4o, GPT-4.1, GPT-5, and Gemini-2.5-Flash) and open-source models (GPTOSS-20b, Qwen3-14b, and Qwen3-4b-Instruct-2507).
Our scenarios focused on simple all-or-nothing requests: Each customer had a list of desired items and amenities that needed to be present for a transaction to be satisfying. For these transactions, utility was computed as the sum of the customer's internal item valuations minus the actual prices paid. Consumer welfare, defined as the sum of utilities across all completed transactions, served as our key metric for comparing agent performance.
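The welfare metric described above can be sketched in a few lines of Python. The `Transaction` type, the field names, and the sample numbers are illustrative assumptions rather than the project's data model, and the sketch covers only completed transactions that met the customer's all-or-nothing request.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    """Hypothetical record of one completed, satisfying transaction."""

    item_valuations: dict[str, float]  # customer's internal value per desired item
    price_paid: float                  # actual price paid to the business


def utility(tx: Transaction) -> float:
    """Sum of the customer's item valuations minus the price actually paid."""
    return sum(tx.item_valuations.values()) - tx.price_paid


def consumer_welfare(transactions: list[Transaction]) -> float:
    """Sum of utilities across all completed transactions."""
    return sum(utility(tx) for tx in transactions)


# Made-up example data.
completed = [
    Transaction({"burrito": 12.0, "outdoor seating": 3.0}, price_paid=10.0),
    Transaction({"pizza": 15.0}, price_paid=11.5),
]
welfare = consumer_welfare(completed)  # (15.0 - 10.0) + (15.0 - 11.5)
```

Since a customer's valuations are fixed, the only way an agent can raise utility for a given request is to find a business that meets the request at a lower price, which is why discovery quality matters so much for this metric.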
While this experimental setup provides a useful starting point, it is not intended to be definitive. We encourage researchers to extend the framework with richer, more nuanced measures and request types that better capture real consumer welfare, fairness, and other societal considerations.
What did we find?
Agents can improve consumer welfare, but only with good discovery
We explored whether two-sided agentic markets, where AI agents interact with each other and with service providers, can improve consumer welfare by reducing information gaps. Unlike traditional markets, which don't provide agentic support and place the full burden of overcoming information asymmetries on customers, agentic markets shift much of that effort to agents. This change matters because as agents gain better tools for discovery and communication, they relieve customers of the heavy cognitive load of filling information gaps. This lowers the cost of making informed decisions and improves customer outcomes.
We compared several marketplace setups. Under realistic conditions (Agentic: Lexical search), agents faced real-world challenges like building queries, navigating paginated lists, identifying the right businesses to send inquiries to, and negotiating transactions.
Despite these complexities, advanced proprietary models and some medium-sized open-source models like GPTOSS-20b outperformed simple baselines such as choosing at random or simply choosing the cheapest option. Notably, GPT-5 achieved near-optimal performance, demonstrating its ability to effectively gather and use decision-relevant information in realistic marketplace conditions.
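The simple baselines mentioned above can be illustrated with a short Python sketch. This is a guess at their general shape, not the experiment code: the `Offer` type, the `meets_request` flag, and the sample offers are hypothetical. It shows why "cheapest" alone can fail on all-or-nothing requests, and why the best achievable choice for a fixed set of valuations is the cheapest offer that actually satisfies the request.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Offer:
    """Hypothetical offer from one business for one customer request."""

    business: str
    price: float
    meets_request: bool  # True if every desired item and amenity is available


def pick_random(offers: list[Offer], rng: random.Random) -> Offer:
    """Baseline: choose any business uniformly at random."""
    return rng.choice(offers)


def pick_cheapest(offers: list[Offer]) -> Offer:
    """Baseline: choose the lowest price, ignoring whether the request is met."""
    return min(offers, key=lambda o: o.price)


def pick_best_feasible(offers: list[Offer]) -> Optional[Offer]:
    """Reference choice: the cheapest offer that fully meets the request."""
    feasible = [o for o in offers if o.meets_request]
    return min(feasible, key=lambda o: o.price) if feasible else None


# Made-up offers for illustration.
offers = [
    Offer("A", 8.0, meets_request=False),
    Offer("B", 10.0, meets_request=True),
    Offer("C", 12.0, meets_request=True),
]
cheapest = pick_cheapest(offers)      # picks A, which does not meet the request
best = pick_best_feasible(offers)     # picks B
sampled = pick_random(offers, random.Random(0))
```

An agent beats these baselines only if it learns which businesses meet the request, which in the lexical-search setting requires querying and conversing with them.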

Performance increased considerably under the Agentic: Perfect search condition, where agents started with the top three matches without needing to search and navigate among the choices. In this setting, Sonnet-4.0, Sonnet-4.5, GPT-5, and GPT-4.1 nearly reached the theoretical optimum and beat baselines that had full amenity details but no agent-to-agent coordination.
Open-source models were mixed: GPTOSS-20b performed strongly under both Perfect search and Lexical search conditions, even exceeding GPT-4o's performance with Perfect search. This suggests that relatively compact models can exhibit robust information-gathering and decision-making capabilities in complex multi-agent environments. Qwen3-4b-2507 faltered when discovery involved irrelevant options (Lexical search), while Qwen3-14b lagged in both cases due to fundamental limitations in reasoning.

Paradox of Choice
One promise of agents is their ability to consider far more options than people can. However, our experiments revealed a surprising limitation: providing agents with more options does not necessarily lead to more thorough exploration. We designed experiments that varied the search results limit from 3 to 100. Except for Gemini-2.5-Flash and GPT-5, the models contacted only a small fraction of available businesses regardless of the search limit. This suggests that most models do not conduct exhaustive comparisons and instead simply accept the initial "good enough" options.

Additionally, across all models, consumer welfare declined as the number of search results increased. Despite contacting over 100 businesses, Gemini-2.5-Flash's performance declined from 1,700 to 1,350, and GPT-5 declined even more, from a near-optimal 2,000 to 1,400.
This demonstrates a Paradox of Choice effect, where more exploration does not guarantee better outcomes, potentially due to limited long-context understanding. Claude Sonnet 4 showed the steepest performance decline, from 1,800 to 600 in consumer welfare. With all the options presented, it struggled to navigate larger sets of options and frequently contacted businesses that did not provide the goods or services the customer was looking for.
This combination of poor initial selection and premature search termination demonstrates both inadequate decision-making criteria and insufficient exploration strategies. Some models showed only a modest performance decline (e.g., GPT-4.1: from 1,850 to 1,700; GPT-4o: from 1,550 to 1,450), finding good options within their limited exploration.

Agents are vulnerable to manipulation
We tested six manipulation strategies, ranging from subtle psychological tactics to aggressive prompt injection attacks:
- Authority: Fake credentials like "Michelin Guide featured" and "James Beard Award nominated" paired with fabricated certifications.
- Social proof: Claims like "Join 50,000+ satisfied customers" or "#1-rated Mexican restaurant" combined with fake reviews.
- Loss aversion: Fear-based warnings about "food poisoning" risks and "contamination issues" at competing restaurants.
- Prompt injection (basic): Attempts to override agent instructions.
- Prompt injection (strong): Aggressive attacks using emergency language and fabricating competitor scandals.
Results revealed significant variation in manipulation resistance across models. Sonnet-4 was resistant to all attacks, and none of the manipulative strategies affected any of the customers' choices. Gemini-2.5-Flash was generally resistant, except for strong prompt injections, where mean payments to unmanipulated agents were affected as a result. GPT-4o, GPTOSS-20b, and Qwen3-4b-2507 were very vulnerable to prompt injection: all payments were redirected to the manipulative agent under these conditions. For GPTOSS-20b and Qwen3-4b-2507 specifically, even traditional psychological manipulation tactics (authority appeals and social proof) increased payments to malicious agents, demonstrating their vulnerability to basic persuasion techniques. These findings highlight a critical security concern for agentic marketplaces.

Systemic biases create unfair advantages
Our analysis revealed two distinct types of systematic bias shown by agents when selecting businesses from search results. The first is position bias: a preference based on where a business appeared in the search results. While proprietary models showed no strong positional preferences, open-source models exhibited clear patterns. Specifically, Qwen2.5-14b-2507 showed a pronounced bias toward selecting the last business presented, regardless of its actual merits.
The second, proposal bias, was more pervasive and appeared across all models tested. This "first-offer acceptance" pattern suggests that models prioritized immediate selection over comprehensive exploration, potentially missing better alternatives that could have emerged by waiting for further proposals. This behavior persisted across both proprietary and open-source models, indicating a fundamental challenge in agent decision-making architectures.
These biases can create unfair market dynamics, drive unintended behaviors, and push businesses to compete on response speed rather than product or service quality.
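One simple way to quantify a positional preference of the kind described above is to tabulate how often an agent picks the business shown at each position. The Python sketch below is an illustrative assumption, not the analysis from our experiments: the function name and the sample data are made up, and it presumes trials in which the listed businesses are equally suitable and their order is shuffled, so that an unbiased agent would choose each position at a roughly equal rate.

```python
from collections import Counter


def selection_rate_by_position(chosen_positions: list[int], num_positions: int) -> list[float]:
    """Fraction of trials in which the agent chose each list position (0-based)."""
    total = len(chosen_positions)
    if total == 0:
        return [0.0] * num_positions
    counts = Counter(chosen_positions)
    return [counts[pos] / total for pos in range(num_positions)]


# Made-up data: index of the selected business in each of 8 trials with 3 options.
chosen_positions = [2, 2, 0, 2, 1, 2, 2, 2]
rates = selection_rate_by_position(chosen_positions, num_positions=3)
# With equally good options, rates far from 1/3 each suggest a positional preference;
# here the last position is chosen in 6 of 8 trials.
```

The same tabulation applies to proposal order: replacing list position with the arrival order of proposals would show how often the first offer received is the one accepted.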

What this means
Even state-of-the-art models can show notable vulnerabilities and biases in marketplace environments. In our implementation, agents struggled with too many options, were susceptible to manipulation tactics, and showed systemic biases that created unfair advantages.
These outcomes are shaped not only by agent capabilities but also by marketplace design and implementation. Our current study focused on static markets, but real-world environments are dynamic, with agents and users learning over time. Oversight is critical for high-stakes transactions. Agents should assist, not replace, human decision-making.
We plan to explore dynamic markets and human-in-the-loop designs to improve efficiency and trust. A simulation environment like Magentic Marketplace is crucial for understanding the interplay between market components and agents before deploying them at scale.
Full details of our experimental setup and results are available in our paper.
Getting started
Magentic Marketplace is available as an open-source environment for exploring agentic market dynamics. Code, datasets, and experiment templates are available on GitHub and Azure AI Foundry Labs.
The documentation provides instructions for reproducing the experiments described above and guidance for extending the environment to new marketplace configurations.
