Magentic Marketplace: an open-source simulation environment for studying agentic markets

November 6, 2025



Autonomous AI agents are here, and they’re poised to reshape the economy. By automating discovery, negotiation, and transactions, agents can overcome inefficiencies like information asymmetries and platform lock-in, enabling faster, more transparent, and more competitive markets.

We’re already seeing early signs of this transformation in digital marketplaces. Customer-facing assistants like OpenAI’s Operator and Anthropic’s Computer Use can navigate websites and complete purchases. On the business side, Shopify Sidekick, Salesforce Einstein, and Meta’s Business AI assist merchants with operations and customer engagement. These examples hint at a future where agents become active market participants, but the structure of these markets remains uncertain.

Several scenarios are possible. We might see one-sided markets where only customers or businesses deploy agents; closed platforms (known as walled gardens) where companies tightly control agent interactions; or even open two-sided marketplaces where customer and business agents transact freely across ecosystems. Each path carries different trade-offs for security, openness, convenience, and competition, which will shape how value flows in the digital economy. For a deeper exploration of these dynamics, see our paper, The Agentic Economy.

To help navigate this uncertainty, we built Magentic Marketplace, an open-source simulation environment for exploring the numerous possibilities of agentic markets and their societal implications at scale. It provides a foundation for studying these markets and guiding them toward outcomes that benefit everyone.

This matters because most AI agent research focuses on isolated scenarios: a single agent completing a task, or two agents negotiating a simple transaction. But real markets involve many agents simultaneously searching, communicating, and transacting, creating complex dynamics that can’t be understood by studying agents in isolation. Capturing this complexity is essential because real-world deployments raise critical questions about consumer welfare, market efficiency, fairness, manipulation resistance, and bias, and these questions can’t be safely answered in production environments.

To explore these dynamics in depth, the Magentic Marketplace platform enables controlled experimentation across diverse agentic marketplace scenarios. Its current focus is on two-sided markets, but the environment is modular and extensible, supporting future exploration of mixed human–agent systems, one-sided markets, and complex communication protocols.

Figure 1. With Magentic Marketplace, researchers can model how agents representing customers and businesses interact, shedding light on the dynamics that could shape future digital markets. The diagram shows a customer request (“Could you find me a restaurant serving agua fresca and empanadas with free parking?”) handled by Customer Agents, businesses with menus (steak tacos, empanadas) represented by Business Agents, and a three-step process: Search, where the customer agent searches for a restaurant among multiple business agents; Multi-Agent Communication, where it asks several business agents about free parking and menu options; and Final Transaction, where it places the order with the selected business agent.

What is Magentic Marketplace?

Magentic Marketplace’s environment manages market-wide capabilities: maintaining catalogs of available goods and services, implementing discovery algorithms, facilitating agent-to-agent communication, and handling simulated payments. At its core is a centralized transaction layer, which ensures transaction integrity across all marketplace interactions. The platform also enables systematic, reproducible research. It supports a wide range of agent implementations and evolving marketplace features, allowing researchers to integrate diverse agent architectures and adapt the environment as new capabilities emerge.

We built Magentic Marketplace around three core architectural choices:

HTTP/REST client-server architecture: Agents operate as independent clients while the Marketplace Environment serves as a central server. This mirrors real-world platforms and supports clear separation of customer and business agent roles.

Minimal three-endpoint market protocol: Just three endpoints (register, protocol discovery, and action execution) let agents dynamically discover available actions. New capabilities can be added without disrupting existing experiments.

Rich action protocol: Specific message types support the complete transaction lifecycle: search, negotiation, proposals, and payments. The protocol is designed for extensibility. New actions like refunds, reviews, or ratings can be added seamlessly, allowing researchers to evolve marketplace capabilities and study emerging agent behaviors while remaining compatible.
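To make the three-endpoint idea concrete, here is a minimal in-memory sketch in Python. It is an illustration under stated assumptions, not the project's server: the paths follow the register, protocol, and action labels shown in Figure 2, while the `ToyMarket` class, the payload fields, and the `search` and `pay` actions are hypothetical names chosen for this example.

```python
from typing import Any, Callable, Dict, List


class ToyMarket:
    """Hypothetical in-memory stand-in for a marketplace server with three endpoints."""

    def __init__(self) -> None:
        self.agents: Dict[str, str] = {}          # agent_id -> role
        self.catalog: List[Dict[str, Any]] = []   # businesses and their menus
        self.ledger: List[Dict[str, Any]] = []    # simulated payments
        # Action registry: adding an entry extends the protocol without
        # changing the three endpoints themselves.
        self.actions: Dict[str, Callable[[str, Dict[str, Any]], Any]] = {
            "search": self._search,
            "pay": self._pay,
        }

    def handle(self, method: str, path: str, body: Dict[str, Any]) -> Dict[str, Any]:
        """Route a request by (method, path), as an HTTP server would."""
        if (method, path) == ("POST", "/register"):
            self.agents[body["agent_id"]] = body["role"]
            return {"status": 200, "result": "registered"}
        if (method, path) == ("GET", "/protocol"):
            # Protocol discovery: list the actions currently available.
            return {"status": 200, "result": sorted(self.actions)}
        if (method, path) == ("POST", "/action"):
            if body.get("agent_id") not in self.agents:
                return {"status": 403, "result": "unregistered agent"}
            handler = self.actions.get(body.get("action"))
            if handler is None:
                return {"status": 400, "result": "unknown action"}
            result = handler(body["agent_id"], body.get("params", {}))
            return {"status": 200, "result": result}
        return {"status": 404, "result": "no such endpoint"}

    def _search(self, agent_id: str, params: Dict[str, Any]) -> List[str]:
        query = params["query"].lower()
        return [b["name"] for b in self.catalog
                if query in " ".join(b["menu"]).lower()]

    def _pay(self, agent_id: str, params: Dict[str, Any]) -> Dict[str, Any]:
        entry = {"from": agent_id, "to": params["to"], "amount": params["amount"]}
        self.ledger.append(entry)
        return entry
```

In this sketch, a new action such as a refund would be added by registering one more handler in `actions`; agents that call the protocol endpoint again would then see it, and agents that never use it are unaffected.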

Figure 2. Magentic Marketplace includes two agent types: Assistant Agents (customers, representing user intention) and Service Agents (businesses, representing the point of sale). Both interact with a central Market Environment via REST APIs (POST /register, GET /protocol, and POST /action) for registration, service discovery, communication, and transaction execution. Inside the environment, the Catalog, Search, Communication, and Transaction components sit behind two Action Routers, which manage message flow and protocol requests, enabling autonomous negotiation and commerce in a two-sided marketplace.

Additionally, a visualization module lets users observe marketplace dynamics and review individual conversation threads between customer and business agents.

Setting up the experiments

To ensure reproducibility, we instantiated the marketplace with fully synthetic data, available in our open-source repository. The experiments modeled transactions such as ordering food and engaging with home improvement services, where agents represented customers and businesses engaging in marketplace transactions. This setup enabled precise measurement of behavior and systematic comparison against theoretical upper bounds.

Each experiment was run with 100 customers and 300 businesses and included both proprietary models (GPT-4o, GPT-4.1, GPT-5, and Gemini-2.5-Flash) and open-source models (GPTOSS-20b, Qwen3-14b, and Qwen3-4b-Instruct-2507).

Our scenarios focused on simple all-or-nothing requests: each customer had a list of desired items and amenities that all needed to be present for a transaction to be satisfying. For these transactions, utility was computed as the sum of the customer’s internal item valuations minus the actual prices paid. Consumer welfare, defined as the sum of utilities across all completed transactions, served as our key metric for evaluating agent performance.
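The utility and welfare definitions above can be written down in a few lines. The following Python sketch is illustrative only: the `Customer` and `Transaction` types are hypothetical, and it assumes that a purchase which misses any required item or amenity contributes no valuation, so its utility is simply the negative of the price paid. The text does not spell out how unsatisfying transactions are scored, so treat that branch as an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Customer:
    valuations: Dict[str, float]    # internal valuation per desired item
    required_amenities: Set[str]    # amenities that must all be offered


@dataclass
class Transaction:
    customer: Customer
    items_bought: Dict[str, float]  # item -> price actually paid
    amenities_offered: Set[str]


def utility(tx: Transaction) -> float:
    """Sum of item valuations minus prices paid, for an all-or-nothing request."""
    paid = sum(tx.items_bought.values())
    has_items = set(tx.customer.valuations) <= set(tx.items_bought)
    has_amenities = tx.customer.required_amenities <= tx.amenities_offered
    # Assumption: an unsatisfying purchase yields no value, so utility is -paid.
    value = sum(tx.customer.valuations.values()) if has_items and has_amenities else 0.0
    return value - paid


def consumer_welfare(completed: List[Transaction]) -> float:
    """Sum of utilities across all completed transactions."""
    return sum(utility(tx) for tx in completed)
```

For example, a customer who values empanadas at 12 and agua fresca at 5, needs free parking, and pays 8 and 3 at a restaurant with parking gets a utility of 17 minus 11, or 6. The same purchase at a restaurant without parking would score as a loss of 11 under the assumption above.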

While this experimental setup provides a useful starting point, it is not intended to be definitive. We encourage researchers to extend the framework with richer, more nuanced measures and request types that better capture real consumer welfare, fairness, and other societal concerns.

What did we find?

Agents can improve consumer welfare, but only with good discovery

We explored whether two-sided agentic markets, where AI agents interact with each other and with service providers, can improve consumer welfare by reducing information gaps. Unlike traditional markets, which offer no agentic support and place the full burden of overcoming information asymmetries on customers, agentic markets shift much of that effort to agents. This change matters because as agents gain better tools for discovery and communication, they relieve customers of the heavy cognitive load of filling information gaps. This lowers the cost of making informed decisions and improves customer outcomes.

We compared several marketplace setups. Under realistic conditions (Agentic: Lexical search), agents faced real-world challenges like building queries, navigating paginated lists, identifying the right businesses to send inquiries to, and negotiating transactions.
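As a rough illustration of what lexical discovery over paginated lists involves, the sketch below ranks businesses by token overlap between a query and their menu text and returns one page at a time. It is a simplified stand-in, not the environment's search implementation: the scoring rule and the `lexical_search` function are assumptions, and the page size of 10 follows the "paginated lists of 10" described for Figure 3.

```python
from typing import Dict, List

PAGE_SIZE = 10  # Figure 3 describes paginated lists of 10


def lexical_search(query: str, menus: Dict[str, List[str]], page: int = 0) -> List[str]:
    """Rank businesses by how many query tokens appear in their menu text."""
    tokens = set(query.lower().split())
    scored = []
    for name, items in menus.items():
        menu_tokens = set(" ".join(items).lower().split())
        overlap = len(tokens & menu_tokens)
        if overlap:
            scored.append((-overlap, name))  # negative so higher overlap sorts first
    scored.sort()  # ties are broken alphabetically by business name
    ranked = [name for _, name in scored]
    start = page * PAGE_SIZE
    return ranked[start:start + PAGE_SIZE]
```

A search of this kind only sees menu text. It cannot tell whether a restaurant has free parking, which is why the customer agent still has to decide which businesses to message and what to ask them.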

Despite these complexities, advanced proprietary models and some medium-sized open-source models like GPTOSS-20b outperformed simple baselines such as choosing at random or choosing the cheapest option. Notably, GPT-5 achieved near-optimal performance, demonstrating its ability to effectively gather and use decision-relevant information in realistic marketplace conditions.

Figure 3. Table comparing experimental setups for welfare outcomes in the restaurant industry. Each row shows a different way agents or baselines make decisions, from random selections to fully coordinated agentic systems. The columns are: Condition (Random w/ items only, Cheapest w/ items & prices, Random w/ items & amenities, Optimal, Perfect search, Lexical search); Query (N/A for most, “Agent decides” for Lexical search); Consideration Set (for example, all businesses with matching menus, or paginated lists of 10 based on menu items); Businesses Contacted (all in the consideration set, or agent decides); Information Used (menu items, prices, amenities, or whatever the agent-to-agent conversation yields); and Decision Criteria (random choice, lowest price, or agent decides). Cell colors indicate how much information is available: green, at the top left, represents full information; red, at the top right, represents limited information; and yellow, at the bottom, represents decisions that depend on agent communication.

Performance increased considerably under the Agentic: Perfect search condition, where agents started with the top three matches without needing to search and navigate among the choices. In this setting, Sonnet-4.0, Sonnet-4.5, GPT-5, and GPT-4.1 nearly reached the theoretical optimum and beat baselines that had full amenity details but no agent-to-agent coordination.

Open-source models were mixed: GPTOSS-20b performed strongly under both Perfect search and Lexical search conditions, even exceeding GPT-4o’s performance with Perfect search. This suggests that relatively compact models can exhibit robust information-gathering and decision-making capabilities in complex multi-agent environments. Qwen3-4b-2507 faltered when discovery involved irrelevant options (Lexical search), while Qwen3-14b lagged in both cases due to fundamental limitations in reasoning.

Figure 4. Consumer welfare outcomes in the restaurant industry under different marketplace setups (boxplot; y-axis: welfare, 0 to 2000+; x-axis: models and baseline conditions). The agentic models shown are Sonnet-4.0, Sonnet-4.5, GPT-5, GPT-4.1, Gemini-2.5-Flash, GPT-4o, GPTOSS-20b, Qwen3-4b-2507, and Qwen3-14b; the baselines are Random, Cheapest, and Random-items+amenities. Blue shows Agentic: Lexical search, where agents navigate realistic discovery challenges; yellow shows Agentic: Perfect search, where agents started with ideal matches; gray shows baselines; and a dashed line marks optimal welfare. Proprietary models approached optimal consumer welfare under perfect search, while open-source models and baselines lagged behind.

Paradox of Choice

One promise of agents is their ability to consider far more options than people can. However, our experiments revealed a surprising limitation: providing agents with more options does not necessarily lead to more thorough exploration. We designed experiments that varied the search results limit from 3 to 100. Except for Gemini-2.5-Flash and GPT-5, the models contacted only a small fraction of available businesses regardless of the search limit. This suggests that most models do not conduct exhaustive comparisons and instead simply accept the initial “good enough” options.

Figure 5. More options did not lead to broader exploration. Most models still contacted only a few businesses, except Gemini-2.5-Flash and GPT-5. (Line chart; x-axis: search limit, 3 to 100; y-axis: mean messages per customer, 0 to 120. Claude Sonnet 4 stays nearly flat at around 10 to 15 messages; Gemini 2.5 Flash rises sharply from about 5 to over 110; GPT-4.1 and GPT-4o remain low and stable at around 5 to 10; GPT-5 increases moderately to about 40, then plateaus.)

Additionally, across all models, consumer welfare declined as the number of search results increased. Despite contacting over 100 businesses, Gemini-2.5-Flash’s performance declined from 1,700 to 1,350, and GPT-5 declined even more, from a near-optimal 2,000 to 1,400.

This demonstrates a Paradox of Choice effect, where more exploration does not guarantee better outcomes, potentially due to limited long-context understanding. Claude Sonnet 4 showed the steepest performance decline, from 1,800 to 600 in consumer welfare. With all the options presented, it struggled to navigate larger sets of options and frequently contacted businesses that did not provide the goods or services the customer was looking for.

This combination of poor initial selection and premature search termination demonstrates both inadequate decision-making criteria and insufficient exploration strategies. Some models showed only a modest performance decline (namely GPT-4.1, from 1,850 to 1,700, and GPT-4o, from 1,550 to 1,450), finding good options within their limited exploration.
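Because the models start from different welfare levels, the declines are easier to compare as percentages. The short Python calculation below uses only the start and end welfare values quoted in the text; the `relative_decline` helper is introduced here for illustration.

```python
# (welfare at the smallest search limit, welfare at the largest), as quoted in the text
welfare = {
    "Claude Sonnet 4": (1800, 600),
    "GPT-5": (2000, 1400),
    "Gemini-2.5-Flash": (1700, 1350),
    "GPT-4.1": (1850, 1700),
    "GPT-4o": (1550, 1450),
}


def relative_decline(start: float, end: float) -> float:
    """Fraction of the starting welfare that was lost."""
    return (start - end) / start


decline_pct = {
    model: round(100 * relative_decline(start, end), 1)
    for model, (start, end) in welfare.items()
}

for model, pct in sorted(decline_pct.items(), key=lambda kv: -kv[1]):
    print(f"{model}: {pct}% decline")
# Claude Sonnet 4: 66.7% decline
# GPT-5: 30.0% decline
# Gemini-2.5-Flash: 20.6% decline
# GPT-4.1: 8.1% decline
# GPT-4o: 6.5% decline
```

Seen this way, the two models that explored most widely (Gemini-2.5-Flash and GPT-5) lost roughly a fifth to a third of their welfare, while the two that kept exploration narrow (GPT-4.1 and GPT-4o) lost under a tenth.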

Figure 6. Mean consumer welfare decreased as consideration set size grew, revealing a Paradox of Choice effect, where expanding options reduced overall welfare. (Line chart; x-axis: search limit, 3 to 100; y-axis: mean customer welfare, 0 to 2200, with a dashed line near 2200 marking optimal welfare. Claude Sonnet 4 starts near 1800 and declines sharply to about 600; Gemini 2.5 Flash decreases gradually from about 1700 to about 1300; GPT-4.1 remains highest and most stable, between about 1900 and 1700; GPT-4o stays near 1500 with a slight decline; GPT-5 starts near 2000 and drops to about 1100.)

Agents are vulnerable to manipulation

We tested six manipulation strategies, ranging from subtle psychological tactics to aggressive prompt injection attacks:

Authority: Fake credentials like “Michelin Guide featured” and “James Beard Award nominated” paired with fabricated certifications.
Social proof: Claims like “Join 50,000+ satisfied customers” or “#1-rated Mexican restaurant” combined with fake reviews.
Loss aversion: Fear-based warnings about “food poisoning” risks and “contamination issues” at competing restaurants.
Prompt injection (basic): Attempts to override agent instructions.
Prompt injection (strong): Aggressive attacks using emergency language and fabricating competitor scandals.

Results revealed significant variation in manipulation resistance across models. Sonnet-4 was resistant to all attacks: none of the manipulative strategies affected any of the customers’ choices. Gemini-2.5-Flash was generally resistant, except for strong prompt injections, which shifted mean payments away from unmanipulated agents. GPT-4o, GPTOSS-20b, and Qwen3-4b were very vulnerable to prompt injection: all payments were redirected to the manipulative agent under these conditions. For GPTOSS-20b and Qwen3-4b-2507 specifically, even traditional psychological manipulation tactics (authority appeals and social proof) increased payments to malicious agents, demonstrating their vulnerability to basic persuasion techniques. These findings highlight a critical security concern for agentic marketplaces.
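One simple way to quantify "payments redirected to the manipulative agent" is the share of total payment volume that the manipulative business receives under each condition. The Python sketch below is a hypothetical metric for illustration, not necessarily the measure plotted in Figure 7; the `redirected_share` function and the payment tuples are made up for this example.

```python
from typing import List, Tuple

Payment = Tuple[str, float]  # (receiving business id, amount paid)


def redirected_share(payments: List[Payment], manipulator: str) -> float:
    """Fraction of total payment volume received by the manipulative business."""
    total = sum(amount for _, amount in payments)
    if total == 0:
        return 0.0  # no payments were made, so nothing was redirected
    to_manipulator = sum(amount for receiver, amount in payments
                         if receiver == manipulator)
    return to_manipulator / total


# Hypothetical data: four customers paying 10 each under two conditions.
control = [("biz-a", 10.0), ("biz-b", 10.0), ("biz-m", 10.0), ("biz-a", 10.0)]
injected = [("biz-m", 10.0)] * 4

print(redirected_share(control, "biz-m"))   # 0.25
print(redirected_share(injected, "biz-m"))  # 1.0
```

Comparing the share under a manipulation condition against the control share separates a business that is simply popular from one whose messaging is steering agents toward it.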

Figure 7. Mean payments received by service agents with and without manipulation tactics. The results reveal substantial differences in manipulation resistance across models, with GPT-4.1 showing significantly higher vulnerability compared to Gemini-2.5-Flash. (Horizontal bar chart for six models: Claude Sonnet 4.5, Gemini 2.5 Flash, GPT-4o, GPT OSS 20B, Qwen3 14B, and Qwen3 4B. Each model has bars for six conditions: Control, Authority, Social Proof, Loss Aversion, Prompt Injection (Basic), and Prompt Injection (Strong). Bars are split into red for manipulated and gray for rest, with values ranging from near 0 to 3. Claude Sonnet 4.5 shows consistently high payments of about 3 across all conditions, while the Gemini and GPT models vary, and the Qwen models show very low manipulated values of about 0.2 compared to rest.)

Systemic biases create unfair advantages

Our analysis revealed two distinct types of systematic bias shown by agents when selecting businesses from search results. The first is positional bias: a systematic preference based on where businesses appeared in search results. While proprietary models showed no strong positional preferences, open-source models exhibited clear patterns. Specifically, Qwen2.5-14b-2507 showed a pronounced bias toward selecting the last business presented, regardless of its actual merits.

The second, proposal bias, was more pervasive across all models tested. This “first-offer acceptance” pattern suggests that models prioritized quick selection over comprehensive exploration, potentially missing better alternatives that might have emerged had they waited for more proposals. This behavior persisted across both proprietary and open-source models, indicating a fundamental challenge in agent decision-making architectures.
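First-offer acceptance can be measured by recording, for each customer, the arrival order of the proposal that was accepted and comparing the resulting rates with a uniform random baseline. The Python sketch below shows one way to do that. The function names are illustrative, and the sample of 15 customers is hypothetical, chosen so that the rates match the 93.3%, 0%, and 6.7% reported for one model in Figure 8.

```python
from collections import Counter
from typing import Dict, List


def selection_rates(accepted_ranks: List[int], n_proposals: int) -> Dict[int, float]:
    """Share of customers who accepted the proposal that arrived 1st, 2nd, and so on."""
    if not accepted_ranks:
        raise ValueError("need at least one accepted proposal")
    counts = Counter(accepted_ranks)
    total = len(accepted_ranks)
    return {rank: counts.get(rank, 0) / total for rank in range(1, n_proposals + 1)}


def first_offer_excess(rates: Dict[int, float], n_proposals: int) -> float:
    """How far the first-offer rate sits above a uniform random baseline."""
    return rates[1] - 1 / n_proposals


# Hypothetical sample: 14 of 15 customers took the first proposal, 1 took the third.
ranks = [1] * 14 + [3]
rates = selection_rates(ranks, n_proposals=3)
print({rank: round(rate, 3) for rank, rate in rates.items()})
# {1: 0.933, 2: 0.0, 3: 0.067}
print(round(first_offer_excess(rates, 3), 3))
# 0.6
```

With three proposals, an agent with no ordering preference would pick the first about a third of the time, so an excess of 0.6 indicates that arrival order, not offer quality, is driving most selections in this hypothetical sample.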

These biases can create unfair market dynamics, drive unintended behaviors, and push businesses to compete on response speed rather than product or service quality.

Figure 8. All models showed a strong preference for the first proposal received, accepting it without waiting for additional proposals or conducting systematic comparisons. (Bar chart of average selection rate for the 1st, 2nd, and 3rd proposals, with a dashed line marking the random selection baseline. Claude Sonnet 4.5: 93.3% for 1st, 0% for 2nd, 6.7% for 3rd. Gemini 2.5 Flash: 86.7% for 1st, 6.7% each for 2nd and 3rd. GPT-4o: 100% for 1st. GPT OSS 20B: 80% for 1st, 13.3% for 2nd, 6.7% for 3rd. Qwen3 14B: 0% for all. Qwen3 4B: 100% for 1st.)

What this means

Even state-of-the-art models can show notable vulnerabilities and biases in marketplace environments. In our implementation, agents struggled with too many options, were susceptible to manipulation tactics, and showed systemic biases that created unfair advantages.

These outcomes are shaped not only by agent capabilities but also by marketplace design and implementation. Our current study focused on static markets, but real-world environments are dynamic, with agents and users learning over time. Oversight is critical for high-stakes transactions. Agents should assist, not replace, human decision-making.

We plan to explore dynamic markets and human-in-the-loop designs to improve efficiency and trust. A simulation environment like Magentic Marketplace is crucial for understanding the interplay between market components and agents before deploying them at scale.

Full details of our experimental setup and results are available in our paper.

Getting started

Magentic Marketplace is available as an open-source environment for exploring agentic market dynamics. Code, datasets, and experiment templates are available on GitHub and Azure AI Foundry Labs.

The documentation provides instructions for reproducing the experiments described above and guidance for extending the environment to new marketplace configurations.
