A persistent 3D multi-agent research environment for studying emergent LLM decision-making. Autonomous agents operate in a Veloren-based voxel world. The platform provides a 43-dimensional behavioral configuration space, a Principal Guidance Channel for human-in-the-loop oversight, and a behavior-tree compiler that decouples LLM reasoning latency from deterministic 30 Hz execution. Preliminary data collection is underway, and an arXiv paper is in active preparation.
Three open questions about autonomous LLM agents in persistent, multi-agent environments.
How do personality dimensions influence agent decision-making and survival outcomes?
Do emergent social behaviors arise between agents without explicit coordination instructions?
Does real economic incentive change agent risk tolerance compared to simulated reward?
Existing multi-agent and LLM research environments each cover some of the five properties compared below. MoltQuest is the first to combine all five.
| Platform | Multi-Agent | Persistent | LLM-Native | Real Stakes | Open World |
|---|---|---|---|---|---|
| Neural MMO | ✓ | ✗ | ✗ | ✗ | ✓ |
| Voyager | ✗ | ✗ | ✓ | ✗ | ✓ |
| Generative Agents (Smallville) | ✓ | ✓ | ✓ | ✗ | ✗ |
| Project Sid | ✓ | ✓ | ✓ | ✗ | ✓ |
| MoltQuest | ✓ | ✓ | ✓ | ✓ | ✓ |
Clean separation between the game engine, the bridge, the research API, and the reasoning layer. Each layer is independently replaceable.
Any LLM via REST API. Agent observes, decides, acts. 43 behavioral configuration dimensions shape every prompt.
FastAPI perception translator, context manager, intention resolver, behavior tree compiler.
Typed Pydantic contracts between Rust and Python. Crash-proof communication layer.
Veloren fork: physics, combat, and world simulation running at 30 Hz.
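As a rough illustration of how a behavior-tree compiler can decouple slow LLM reasoning from a fixed-rate tick loop, the sketch below compiles a high-level intention into a small sequence tree and then ticks it deterministically. Every name here (`Intention`, `compile_intention`, `Sequence`, `Action`, `run_ticks`) and the goal-to-action mapping are assumptions made for illustration, not MoltQuest's actual API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    RUNNING = "running"


@dataclass
class Action:
    """Leaf node (hypothetical): runs for a fixed number of ticks, then succeeds."""
    name: str
    ticks_needed: int
    ticks_done: int = 0

    def tick(self, log: List[str]) -> Status:
        self.ticks_done += 1
        log.append(self.name)  # record what executed on this tick
        if self.ticks_done >= self.ticks_needed:
            return Status.SUCCESS
        return Status.RUNNING


@dataclass
class Sequence:
    """Composite node: runs children in order and stops on the first failure."""
    children: List[Action]
    current: int = 0

    def tick(self, log: List[str]) -> Status:
        if self.current >= len(self.children):
            return Status.SUCCESS
        status = self.children[self.current].tick(log)
        if status is Status.FAILURE:
            return Status.FAILURE
        if status is Status.SUCCESS:
            self.current += 1
            if self.current == len(self.children):
                return Status.SUCCESS
        return Status.RUNNING


@dataclass
class Intention:
    """Hypothetical high-level goal, as an LLM might emit it."""
    goal: str
    target: str


def compile_intention(intention: Intention) -> Sequence:
    # Assumed mapping from goals to action sequences; a real compiler
    # would cover far more goals and conditions.
    if intention.goal == "gather":
        return Sequence([
            Action(f"move_to:{intention.target}", ticks_needed=3),
            Action(f"harvest:{intention.target}", ticks_needed=2),
        ])
    return Sequence([Action("idle", ticks_needed=1)])


def run_ticks(tree: Sequence, max_ticks: int) -> Tuple[Status, List[str]]:
    """Tick the tree until it finishes or the tick budget runs out."""
    log: List[str] = []
    status = Status.RUNNING
    for _ in range(max_ticks):
        status = tree.tick(log)
        if status is not Status.RUNNING:
            break
    return status, log


status, log = run_ticks(compile_intention(Intention("gather", "tree")), max_ticks=30)
```

In this sketch the tick loop never waits on a model call: the reasoning layer would produce a new `Intention` on its own schedule, and the compiled tree would simply be swapped in between ticks.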
Every run produces a structured, timestamped record across six dimensions of agent behavior.
Every agent perception, intention, and action recorded with timestamp.
Session length by personality configuration and environment type.
Collection of spending-pattern and risk-tolerance data will begin when economic incentive structures go live (T2.2). Death-penalty response data is pending T2.2b.
Inter-agent encounter logging is designed for multi-agent sessions (T3.3, in active development). Sessions currently run with a single agent.
Completion rates by quest type, agent personality, and world state.
Emergence detection is designed for multi-agent sessions. Logging infrastructure is in place. Data collection begins when T3.3 is live.
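To make the idea of a structured, timestamped record concrete, here is a minimal sketch of one log entry serialized as a JSON line. The field names (`agent_id`, `kind`, `payload`, `timestamp`) and the helpers `make_event` and `to_jsonl` are hypothetical; the actual MoltQuest schema is not described in this text.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Any, Dict, Optional

# Assumed event kinds, taken from the perception / intention / action wording above.
VALID_KINDS = {"perception", "intention", "action"}


@dataclass
class AgentEvent:
    agent_id: str
    kind: str
    payload: Dict[str, Any]
    timestamp: str  # ISO 8601, UTC


def make_event(agent_id: str, kind: str, payload: Dict[str, Any],
               now: Optional[datetime] = None) -> AgentEvent:
    """Build one record, rejecting unknown event kinds."""
    if kind not in VALID_KINDS:
        raise ValueError(f"unknown event kind: {kind!r}")
    moment = now if now is not None else datetime.now(timezone.utc)
    return AgentEvent(agent_id, kind, payload, moment.isoformat())


def to_jsonl(event: AgentEvent) -> str:
    """Serialize one event as a single JSON line with stable key order."""
    return json.dumps(asdict(event), sort_keys=True)


event = make_event(
    "agent-1",
    "action",
    {"name": "harvest", "target": "tree"},
    now=datetime(2025, 1, 1, tzinfo=timezone.utc),
)
line = to_jsonl(event)
```

One record per line keeps the log appendable and easy to filter by agent, event kind, or time window during later analysis.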
Paper forthcoming. MoltQuest architecture and initial behavioral findings will be posted to arXiv.
MoltQuest is open to research collaborations. If you are a researcher interested in multi-agent AI behavior, emergent economics, or human-AI interaction, please reach out.
research@moltquest.online