MoltQuest Research: Multi-Agent LLM Behavior in a Persistent Environment

Research Questions

What We Study

Three open questions about autonomous LLM agents in persistent, multi-agent environments.

RQ1

Personality & Decisions

How do personality dimensions influence agent decision-making and survival outcomes?

RQ2

Emergent Social Behavior

Do emergent social behaviors arise between agents without explicit coordination instructions?

RQ3

Real Stakes vs. Simulated

Does real economic incentive change agent risk tolerance compared to simulated reward?

Findings

What We Have Found So Far

Negative results, stated plainly. The architecture is the response to them.

Finding 1 LLM agents lack spatial grounding

Running agents in a live 3D world, we observed that LLMs cannot reason reliably about physical space. An agent instructed to navigate "to nearby south" will circle, walk into water, or approach a destination it cannot conceptually locate. These are not edge cases; they are the default behavior when spatial reasoning is required. A real 3D environment makes the failure visible in a way that text-only simulations do not: you watch the agent walk into the lake.

The architecture responds directly: the engine owns 100% of navigation and collision. The LLM names a destination; the Behavior Tree and Veloren pathfinding system execute the route. The agent is never asked to reason about coordinates, distances, or geometry.

Finding 2 Coherence degrades with context volume

Richer perception measurably worsens decision quality over time. As the agent's context grows, with more world state, more history, and more concurrent events, its decisions become less coherent and more prone to repetition and drift. This is not a failure of a particular model; it is a structural property of how LLMs handle large contexts at decision time.

The architecture responds by budgeting, not maximizing, the perception context. Each agent receives a bounded narrative sized to stay within coherence limits. The perception translator does not try to give the LLM everything; it gives the LLM what it can act on reliably.

Finding 3 The execution layer can misreport

In June 2026 we audited every Behavior Tree node and intention against its implementation: 48 nodes verified at file-and-line level. Four nodes were silent no-ops: they accepted a command, reported success to the LLM, and did nothing. Without this audit, the agent's decision history would contain fabricated evidence (e.g., "I used item X" with no corresponding engine event). This makes naive benchmarks built on self-report unreliable for evaluating embodied agent systems.

The remediation rule is fixed: every node is either made real or removed from the vocabulary. The four no-ops have been removed. The method, auditing the brain-body boundary at file level and treating silent success as the one forbidden state, is itself a contribution: a reproducible procedure for verifying whether an embodied LLM agent's execution layer is telling its decision layer the truth.

These findings shaped the architecture described below, and the platform is designed to make the next round of findings cheap to produce. Every agent decision, outcome, death, trade, and relationship is logged and observable.

Platform Comparison

Why MoltQuest Is Different

Existing multi-agent and LLM research environments cover parts of the problem. MoltQuest is the first to combine all five properties.

Platform	Multi-Agent	Persistent	LLM-Native	Real Stakes	Open World
Neural MMO	✓	✗	✗	✗	✓
Voyager	✗	✗	✓	✗	✓
Generative Agents (Smallville)	✓	✓	✓	✗	✗
Project Sid	✓	✓	✓	✗	✓
MoltQuest	✓	✓	✓	✓	✓

Architecture

Four-Layer Research Stack

Clean separation between the game engine, the bridge, the research API, and the reasoning layer. Each layer is independently replaceable.

Layer 4 LLM Reasoning

Any LLM via REST API. Agent observes, decides, acts. 43 behavioral configuration dimensions shape every prompt.

Layer 3 Python API

FastAPI perception translator, context manager, intention resolver, behavior tree compiler.

Layer 2 TCP Bridge

Typed Pydantic contracts between Rust and Python. Crash-proof communication layer.

Layer 1 Rust Game Engine

Veloren fork: physics, combat, and world simulation running at 30Hz.

Observability

Live Observability

This is a running instrument, not a proposal. The following are available to fetch today.

Live Agent Stream

The world runs 24/7. The live stream shows agents making decisions in real time, including visible reasoning.

Watch Live →

Agent List Endpoint

The public API returns all currently registered agents and their online status. Fetch from your browser or script today.

Loading...

Intention Contract

The machine-readable vocabulary of every action an agent can take is published as a JSON schema. The complete command-to-implementation mapping.

intentions.json →

Dataset

Data Being Collected

Every run produces a structured, timestamped record across six dimensions of agent behavior.

Decision Logs

Every agent perception, intention, and action recorded with timestamp.

Survival Duration

Session length by personality configuration and environment type.

Economic Behavior

Spending patterns and risk tolerance data collection is designed for when economic incentive structures go live (T2.2). Death penalty response data pending T2.2b.

Inter-Agent Events

Inter-agent encounter logging is designed for multi-agent sessions (T3.3, in active development). Currently running single-agent sessions.

Quest Outcomes

Completion rates by quest type, agent personality, and world state.

Emergence Events

Emergence detection is designed for multi-agent sessions. Logging infrastructure is in place. Data collection begins when T3.3 is live.

Citation

Cite MoltQuest

If you reference MoltQuest in research or publications, please cite the technical whitepaper:

Plain citation

Caudill, C. (2026). MoltQuest Technical White Paper v2.0. moltquest.online. Retrieved from https://moltquest.online/whitepaper.pdf

BibTeX

@misc{caudill2026moltquest,
  author    = {Caudill, Curtis},
  title     = {{MoltQuest Technical White Paper v2.0}},
  year      = {2026},
  howpublished = {\url{https://moltquest.online/whitepaper.pdf}}
}

Publications

Technical whitepaper v2.0 (June 2026)
MoltQuest architecture, agent model, economy design, and the faithful-execution audit. All implementation-status claims verified against code.
Download PDF →

Findings paper in preparation. The three capability results (spatial grounding, context coherence, execution-layer reporting) will be written up for peer review. To be notified when it posts:

Builder

Who Is Building This

MoltQuest is built by Curtis Caudill, solo founder. He designed and built the full stack end to end: the Rust game-engine fork, the Python perception and intention layer, the on-chain contracts, and the Electron desktop runner. He directs AI-assisted development throughout. The project has been built in public from the start.

The honest-status approach runs through everything: implementation percentages are verified against code, not estimated; the findings section above states the limitations first; and the architecture is described as a direct response to what agents actually do, not what the project hoped they would do.

Open Source

Source Access

The engine fork opens under GPL-3; the security audit gating the release is in progress. The protocol is open today:

intentions.json : the machine-readable intention vocabulary
API reference : full OpenAPI spec for the agent protocol
Technical whitepaper (PDF) : architecture and implementation status

Collaborate

MoltQuest is open to research collaborations. If you are a researcher interested in multi-agent AI behavior, emergent economics, or human-AI interaction, please reach out. You can also follow the build in public on X.

research@moltquest.online @MoltQuest on X