
MDMP Platform Blueprint

Architectural Blueprint and Product Specification


Jeep Marshall, LTC, US Army (Retired)
Airborne Infantry | Special Operations | Process Improvement
jeep.marshall@gmail.com
March 2026


This is Paper 7 in the Herding Cats in the AI Age series. Paper 1 established that AI needs doctrine, not more intelligence. Paper 2 showed the military already built the coordination frameworks the AI industry lacks. Paper 3 proved those principles work in a live laboratory. Paper 4 demonstrated the consequences when coordination is absent. Paper 5 showed that two models can negotiate a coordination protocol in real time. Paper 6 proved that four models assigned military staff roles produce demonstrably better strategic decisions than a solo baseline. Paper 7 now asks the next operational question: what does a production platform look like that scales this coordination pattern from a proof-of-concept to a system any team can use?


Paper 6’s proof-of-concept validated that doctrine-structured multi-model ensembles produce measurably better strategic analysis than a single AI model. The ensemble surfaced 6 strategic insights the solo baseline missed, including 2 rated HIGH value. Most critically, Paper 6 demonstrated that a structured framework makes the difference: without doctrine, four models produce chaos. With doctrine, they function as a coordinated team.

This paper translates that proof-of-concept into a production platform specification. The thesis is straightforward: a conversational, MDMP-native platform that accepts natural voice or text input, converts it into structured tasks, deploys AI agents to work those tasks in parallel under defined roles, and surfaces decision-ready outputs is both technically viable and strategically necessary.

The platform is not a generic multi-agent orchestrator. It is doctrine-first, then software. The MDMP structure is the constraint that makes coordination work. This paper provides the technical architecture, deployment model, access tiers, and roadmap for building such a system.


Before Paper 6, the field evidence on multi-agent coordination was mixed at best. A UC Berkeley study identified 14 distinct failure modes across multi-agent systems with failure rates ranging from 41% to 86.7%.1 A Google/MIT collaboration found that multi-agent systems degraded performance on sequential tasks by 39–70%.2 The assumed answer — “more models = better decisions” — was empirically false.

Paper 6 tested a different hypothesis: more models with structure beats one model without it, and solo models with structure beat any ensemble without it. The structure in question was military doctrine — the MDMP (Military Decision Making Process).

Four frontier AI models were assigned specialized analysis roles:

  • Commander (Claude Opus 4.6): Synthesis and final recommendation
  • Intelligence Officer (Gemini 3): Environmental scan, threat analysis
  • Operations Officer (ChatGPT/GPT-4o): COA development and feasibility analysis
  • Analyst (Grok/SuperGrok): Contrarian analysis, failure mode identification

All four models received an identical mission briefing on the series’ own publication strategy — a real decision with real stakes, not a contrived benchmark. The solo baseline (Claude) ran the full process alone. The ensemble models ran independently, then the Commander synthesized their outputs.

What the Ensemble Surfaced That Solo Missed:

  1. Content Strategy (Gemini) — HIGH Value: Break Paper 2’s 33,500 words into 10 “Operational Briefs” for sustained content distribution. Solo Claude treated it as monolithic.

  2. Hybrid Approach (ChatGPT) — MEDIUM-HIGH Value: “Doctrine → Articles → Case Studies → Book” sequence. ChatGPT synthesized multiple approaches into a hybrid generating more content touchpoints.

  3. Competitive Threat (Gemini) — MEDIUM Value: McKinsey/Deloitte rebranding quality methodology for AI creates a specific, named competitive threat with urgency.

  4. Credibility Challenge (Grok) — HIGH Value: Specific vulnerabilities the author must pre-empt.

  5. Implementation Gap (Grok) — MEDIUM Value: Build a GitHub repo with code and simulations to compete on implementation, not essays alone.

  6. Alignment Opportunity (Gemini) — MEDIUM Value: Position the series to align with Department of War 2026 initiatives.

What Solo Did Better:

  • Operational detail: Week-by-week execution plans with hour estimates and decision gates
  • Risk register: 8 risks with likelihood, impact, and specific mitigations
  • Deadline management: Factored hard constraints into timeline
  • Assumption validation: Identified 5 specific assumptions requiring validation with deadlines

The Refined Thesis:

“A doctrine-structured multi-model ensemble surfaces strategic blind spots that solo analysis misses, while solo analysis produces superior operational detail. The optimal pattern is ensemble for strategy, solo for operations — and doctrine is the constant that makes both work.”


2.1 Core Principle: Doctrine First, Software Second


The platform is not a general-purpose multi-agent orchestrator. It is MDMP-native.3 The MDMP structure is not a bolt-on feature or an optional workflow. It is the foundation.

Why? Because Paper 6 proved that without structure, multi-model coordination fails. The MDMP provides the structure: defined phases, assigned roles, synthesis procedures, and decision gates. These are not procedural niceties. They are the constraints that make the system work.

A user speaks naturally into the system: “We have a market entry decision for the European AI regulation market. We need to assess whether to enter before the GDPR AI Act is finalized or wait for regulatory clarity. We need this decision by Friday.”

The platform converts this natural language input into structured tasks:

  1. Problem Receipt: Parse the decision, extract deadline, identify stakeholders
  2. Analysis Phase: What are the facts, assumptions, constraints, and risks?
  3. COA Development: What are the courses of action?
  4. COA Analysis: What are the failure modes and second-order effects?
  5. COA Comparison: Which COA is optimal against defined criteria?
  6. Approval: Commander recommends, human decides
  7. Execution: Actionable next steps, timelines, accountability

Throughout this process, AI agents work in parallel:

  • Intelligence agent scans the regulatory landscape
  • Operations agent develops market entry scenarios
  • Analyst tears each scenario apart
  • Commander synthesizes everything and presents decision-ready outputs

The human sees each stage of analysis and can inject decisions, redirect analysis, or request deeper dives at any phase.

The system produces outputs aligned to standard planning structure:

  • Situation Report: Current state, constraints, environmental factors
  • Problem Statement: What we are deciding and why
  • Strategic Objectives: What success looks like and decision-making logic
  • Courses of Action: Multiple COAs with analysis
  • Selected COA: The decision and reasoning
  • Execution Plan: Immediate next steps, accountability, timeline
  • Lessons Learned Registry: What we learned for future decisions

This structure is not arbitrary. It forces clarity. It prevents decisions from being based on vague intuitions. It creates an audit trail.


Voice-to-Text Pipeline:

  • Whisper API for transcription (proven quality, ~98% accuracy at 8+ second clips)4
  • Speaker identification (multi-user support with role tagging)
  • Real-time transcription (streaming to processing layer, no batch delays)

Text Chat Fallback:

  • Web/mobile chat interface for users who prefer typing
  • Prompt templates for common decision types (resource allocation, market entry, personnel decisions, etc.)
  • Structured prompt injection for role-bounded agent inputs

Inbox Queue: All inputs become discrete tasks in a message queue. No input is lost; no decision is left incomplete.

Task Parser AI: A dedicated Claude or Gemini model (small and fast) converts raw input into structured work items:

  • Problem statement
  • Context and constraints
  • Decision deadline
  • Primary stakeholder (the decision-maker)
  • Secondary stakeholders (who has input authority)

Priority Sorter: Routes tasks to appropriate phase. A task arriving mid-decision might jump to COA Analysis. A task arriving at 4 PM with a Friday deadline gets priority queued.
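The Task Parser's gate check can be sketched in a few lines of Python. The field names follow the work-item list above; the `validate_task` helper and its return convention are illustrative, not the platform API:

```python
REQUIRED_FIELDS = ("problem_statement", "deadline", "decision_maker")

def validate_task(task: dict) -> list[str]:
    """Return the fields that are missing or empty; an empty list means the task may advance."""
    return [f for f in REQUIRED_FIELDS if not task.get(f)]

task = {
    "problem_statement": "Go/no-go on European market entry",
    "deadline": "2026-03-27",
    "decision_maker": None,   # gap the parser must flag back to the human
    "stakeholders": ["CEO", "General Counsel"],
}
print(validate_task(task))    # ['decision_maker']
```

A task with gaps is not dropped; it stays in the inbox queue with its ambiguities flagged, consistent with "no input is lost."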

Slot-Based Architecture (Pluggable):

| Role | Primary Model | Backup Model | Function | Update Frequency |
| --- | --- | --- | --- | --- |
| Commander (Synthesis) | Claude Sonnet | Claude Opus | Decision synthesis, conflict resolution, final recommendation | Per decision |
| Intelligence Officer | Gemini 3 | Claude Opus | Environmental scan, competitive landscape, threat analysis | Per COA |
| Operations Officer | GPT-4o | Claude Opus | COA development, feasibility analysis, resource modeling | Per COA |
| Analyst | Grok/SuperGrok | Claude Opus | Critical analysis, failure modes, second-order effects | Per COA |
| Scribe (Optional) | Claude Haiku | — | Meeting notes, transcript synthesis, lesson extraction | Per session |

Each slot can be filled by different models based on budget, capability needs, or availability. The architecture is model-agnostic. The MDMP structure is model-independent.

The slot-based pluggable architecture implements dependency injection: the framework defines the interface (Intelligence, Operations), and any compatible model fills the slot. This is the software engineering principle of programming to interfaces, not implementations.5 The organizational chart is the contract. The model is the implementation. Swapping implementations does not require redesigning the organization.
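As a sketch of that principle, the slot contract can be expressed as a Python `Protocol`. The class and method names here are illustrative, not the platform's actual interfaces:

```python
from typing import Protocol

class StaffAgent(Protocol):
    """The slot contract: anything that can analyze a brief can fill the role."""
    def analyze(self, brief: str) -> str: ...

class GeminiIntelligence:
    def analyze(self, brief: str) -> str:
        return f"[gemini-3] intelligence estimate: {brief}"

class ClaudeIntelligence:
    def analyze(self, brief: str) -> str:
        return f"[claude-opus] intelligence estimate: {brief}"

# The organizational chart is the contract; the model is the implementation.
slots: dict[str, StaffAgent] = {"intelligence": GeminiIntelligence()}

# Backup activation: swap the implementation without touching the organization.
slots["intelligence"] = ClaudeIntelligence()
print(slots["intelligence"].analyze("EU market entry"))
```

Any object satisfying the protocol fits the slot, which is exactly what makes the architecture model-agnostic.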

Figure 1 — MDMP Platform Architecture

[Figure: MDMP-Native AI Decision Platform — architecture showing the input layer (voice input, text input, mission brief upload), the 7-phase processing pipeline (Phases 1 through 7 with acceptance Gates 1-2 through 6-7), quality controls (circuit breaker, control chart, force-advance log), the pluggable agent slot layer (Commander synthesis slot plus Intelligence, Operations, Plans, Personnel, Sustainment, and Communications slots), tiered access (Free for students and ROTC cadets, Pro for unit staff officers, Enterprise for government), and the output layer (orders, updates, notices, audit trail). Each gate implements force-advance logic and circuit breaker patterns for graceful degradation.]

Key Feature: Role Isolation

Each agent operates independently until the synthesis phase. The Intelligence Officer does not see Operations output before generating its analysis. The Analyst does not know which COA was recommended before attacking all three. This prevents groupthink and preserves distinct analytical perspectives.6

Multi-agent systems fail. One model times out. One API rate-limits. One agent hallucinates. The platform must degrade gracefully: survive partial failures and surface what is known.

Agent Timeout Handling:

Each agent has a 30-second default timeout (configurable per role: intelligence agents 45s, operations agents 60s, scribe 15s). If an agent does not return before timeout, the orchestrator:

  1. Logs the timeout with agent name, phase, and deadline.
  2. Marks the agent’s slot as “timed_out” in the decision JSON.
  3. If the agent is critical (Commander, Operations), escalates to backup model immediately.
  4. If the agent is secondary (Analyst), continues with 3-agent analysis (logged as reduced ensemble).
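The timeout path can be sketched with `asyncio.wait_for`. The per-role budgets mirror the figures above; the agent callables and the returned dict shape are illustrative:

```python
import asyncio

DEFAULT_TIMEOUT = 30
TIMEOUTS = {"intelligence": 45, "operations": 60, "scribe": 15}

async def run_agent(role: str, agent, brief: str) -> dict:
    """Run one agent under its per-role budget; mark the slot instead of crashing the ensemble."""
    try:
        output = await asyncio.wait_for(agent(brief), TIMEOUTS.get(role, DEFAULT_TIMEOUT))
        return {"role": role, "status": "ok", "output": output}
    except asyncio.TimeoutError:
        return {"role": role, "status": "timed_out", "output": None}

async def slow_agent(brief: str) -> str:
    await asyncio.sleep(0.2)   # stands in for a long model call
    return "analysis"

TIMEOUTS["analyst"] = 0.01     # shrink the budget so the demo trips the timeout instantly
print(asyncio.run(run_agent("analyst", slow_agent, "brief")))
# {'role': 'analyst', 'status': 'timed_out', 'output': None}
```

The orchestrator then inspects `status` to decide between backup escalation (critical roles) and reduced-ensemble continuation (secondary roles).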

Retry Logic:

Transient failures (API rate limit, network timeout) trigger exponential backoff: 1s, 2s, 4s, max 3 retries. Permanent failures (auth failure, model deprecated) log immediately and do not retry. Backoff is per-task, not per-request, preventing cascade failures across the queue.
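A sketch of that retry policy follows. The string-based error taxonomy is a stand-in; a real orchestrator would map each provider's exception types onto transient vs. permanent:

```python
import time

TRANSIENT = {"rate_limited", "network_timeout"}

def call_with_backoff(call, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Retry transient failures with exponential backoff (1s, 2s, 4s); fail fast otherwise."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RuntimeError as err:
            if str(err) not in TRANSIENT or attempt == max_retries:
                raise                          # permanent failure, or retries exhausted
            sleep(base_delay * 2 ** attempt)   # 1s, then 2s, then 4s

attempts = []
def flaky():
    """Fails twice with a transient error, then succeeds."""
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("rate_limited")
    return "ok"

print(call_with_backoff(flaky, sleep=lambda s: None))   # prints: ok
```

Injecting `sleep` keeps the demo instant and makes the backoff schedule unit-testable.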

Fallback Activation:

Each agent role has a primary and backup model. If the primary model becomes unavailable (API down, quota exceeded, model deprecated), the orchestrator switches to the backup model. The decision JSON records the switch: "s2_model": "gemini-3", "s2_backup_activated": true. Output quality may degrade, but analysis continues.

Contradiction Detection:

Intelligence and Operations outputs are cross-referenced. If Operations claims “market window closes in 12 months” but Intelligence found “market window closes in 8 months,” the contradiction is flagged: "contradiction_detected": {"claim_a": "...", "claim_b": "...", "severity": "HIGH"}. Human is alerted. Analysis continues, but the conflict is surfaced in the Commander’s recommendation.
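A toy version of that cross-reference check, using the market-window example above. The regex extraction is a deliberate simplification; production detection would need semantic comparison, not pattern matching:

```python
import re

def extract_months(claim: str):
    """Pull an 'N months' figure out of a free-text claim."""
    m = re.search(r"(\d+)\s*months", claim)
    return int(m.group(1)) if m else None

def detect_contradiction(claim_a: str, claim_b: str):
    """Flag two claims whose extracted figures disagree; None means no conflict found."""
    a, b = extract_months(claim_a), extract_months(claim_b)
    if a is not None and b is not None and a != b:
        return {"claim_a": claim_a, "claim_b": claim_b, "severity": "HIGH"}
    return None

ops = "market window closes in 12 months"
intel = "market window closes in 8 months"
print(detect_contradiction(ops, intel)["severity"])   # HIGH
```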

Graceful Degradation:

If 1 of 4 agents fails to return analysis (timeout, error):

  • 3-agent ensemble produces output.
  • The decision JSON marks the missing agent’s slot.
  • Commander synthesizes 3 analyses + notes the gap.
  • Post-decision review flags the missing perspective.

If 2 of 4 agents fail: escalate to human. The system will not produce a recommendation with less than half the ensemble.

Graceful degradation means the system’s output quality degrades proportionally to the lost model’s contribution weight, not catastrophically. A four-model ensemble that loses one model retains 75% of its analytical surface. It does not fail. It narrows. No single model is load-bearing — the ensemble distributes responsibility across roles.

Circuit Breaker Pattern:

If a single API (e.g., GPT-4o, Gemini) fails 3 times in 5 minutes, the orchestrator opens a circuit breaker: no further requests to that API for 5 minutes. Alternative models are used. After 5 minutes, a single test request is sent. If successful, the circuit closes and normal operation resumes. This prevents hammering a degraded service and prevents cascade failure.7
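The breaker's three states (closed, open, half-open) fit in a small class. Thresholds below match the text; the class shape and the injected clock are illustrative:

```python
import time

class CircuitBreaker:
    """Open after `threshold` failures within `window` seconds; half-open after `cooldown`."""
    def __init__(self, threshold=3, window=300, cooldown=300, clock=time.monotonic):
        self.threshold, self.window, self.cooldown = threshold, window, cooldown
        self.clock = clock                # injectable clock keeps the demo instant
        self.failures: list[float] = []   # timestamps of recent failures
        self.opened_at: float | None = None

    def record_failure(self) -> None:
        now = self.clock()
        self.failures = [t for t in self.failures if now - t < self.window] + [now]
        if len(self.failures) >= self.threshold:
            self.opened_at = now          # trip: stop sending requests to this API

    def record_success(self) -> None:
        self.failures.clear()
        self.opened_at = None             # close: resume normal operation

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True                   # closed: normal operation
        if self.clock() - self.opened_at >= self.cooldown:
            return True                   # half-open: allow a single test request
        return False                      # open: back off, use alternative models

t = [0.0]
cb = CircuitBreaker(clock=lambda: t[0])
for _ in range(3):
    cb.record_failure()                   # three failures inside the 5-minute window
print(cb.allow_request())                 # False: circuit is open
t[0] += 300
print(cb.allow_request())                 # True: cooldown elapsed, test request allowed
```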

The seven-step MDMP structure is the spine of the system:

Figure 7.1 — MDMP Seven-Step Cycle with AI Role Overlays

[Figure: The MDMP seven-step cycle (1. Receipt of Mission, 2. Mission Analysis, 3. COA Development, 4. COA Analysis, 5. COA Comparison, 6. COA Approval, 7. Orders Production) with AI agent roles overlaid per phase: Supervisor Agent, Intel + Analysis, Plans Agent, Devil's Advocate, Scoring Agent, and Orders + Comms. Each phase has a primary owner; the Supervisor Agent bookends the cycle (receipt + approval). The feedback loop from step 7 back to step 1 is the Plan→Execute→Assess rhythm that makes MDMP iterative rather than one-shot.]

Phase 1: Problem Receipt

  • Parse the decision problem statement
  • Extract explicit constraints (deadline, budget, stakeholder list)
  • Identify the decision-maker (who has authority)
  • Flag any ambiguities or missing context

Phase 2: Analysis

  • Intelligence Officer: environmental scan, threat analysis
  • Identify facts vs. assumptions
  • Extract constraints
  • Initial risk register

Phase 3: COA Development

  • Operations Officer: generate 3–5 mutually exclusive courses of action
  • Each COA is fully described (what, how, timeline, resource requirements)
  • Analyst: preliminary evaluation of each COA

Phase 4: COA Analysis

  • Operations Officer: detailed feasibility analysis for each COA
  • Intelligence Officer: implications of each COA in the broader environment
  • Analyst: full critical analysis — what could go wrong, what am I missing
  • Risk modeling: likelihood, impact, mitigation for each COA

Phase 5: COA Comparison

  • Operations Officer: structured comparison matrix
  • Score each COA against weighted criteria
  • Rank the COAs

Phase 6: Approval

  • Commander: review all analysis
  • Select recommended COA
  • Document reasoning (why this one, what other considerations mattered)
  • Human decision-maker: approve, reject, or request additional analysis

Phase 7: Execution

  • Scribe: translate selected COA into actionable next steps
  • Timeline: week-by-week or day-by-day depending on decision urgency
  • Assign ownership: who is accountable for each action
  • Define success metrics: how will we know this is working

Each MDMP phase transition requires explicit gate approval. A gate is a decision point: data sufficient to advance, or re-analysis required? Gates prevent premature advancement and ensure decision quality degrades gracefully under time pressure.

The MDMP pipeline maps precisely onto DMAIC — the Lean Six Sigma improvement cycle. Military doctrine and process improvement arrived at the same structure from different directions:

| DMAIC Phase | MDMP Equivalent | Platform Implementation |
| --- | --- | --- |
| Define | Analysis | Problem framing, decision scope, success metrics extraction |
| Measure | COA Development | Data collection, baseline assessment, COA generation |
| Analyze | COA Analysis | Root cause analysis, failure mode identification, COA evaluation |
| Improve | COA Selection + Execution | Solution implementation, execution plan, accountability assignment |
| Control | Execution + Assessment + Lessons | Monitoring, course correction, outcome tracking |

This equivalence is not decorative. It means every quality improvement tool from the DMAIC toolkit — control charts, Pareto analysis, root cause analysis, process sigma calculations — applies directly to MDMP phase performance measurement. The platform can report a process sigma for its decision pipeline the same way a manufacturing system reports sigma for its production line.

Rolled Throughput Yield across phase gates: Each phase gate has a first-pass yield — the probability that a decision advances without requiring rework. Rolled Throughput Yield across all gates: RTY = Π(FPY_phase_i). If each of seven phases passes at 90% FPY, the pipeline has RTY = 0.9⁷ = 47.8% — less than half of decisions pass all gates without rework at that rate. This is why force-advance logic exists: in time-compressed situations, perfect quality across every gate is the wrong optimization target. The platform logs which gates were forced-advanced and why, enabling post-decision RTY analysis and systematic improvement over time.8
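The RTY arithmetic above in a few lines of Python:

```python
from math import prod

def rolled_throughput_yield(first_pass_yields):
    """RTY = product of per-gate first-pass yields: RTY = Π(FPY_i)."""
    return prod(first_pass_yields)

# Seven phases, each passing 90% of decisions without rework:
print(f"{rolled_throughput_yield([0.9] * 7):.1%}")   # 47.8%
```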

Gate Structure:

| Phase | Gate Criteria | Owner | Force-Advance? |
| --- | --- | --- | --- |
| 1 → 2 | Problem statement clear; deadline explicit; stakeholders identified | Task Parser | Yes (logged) |
| 2 → 3 | Facts/assumptions documented; constraints listed; risk register initialized with ≥3 risks | Intelligence | Yes (logged) |
| 3 → 4 | ≥3 COAs generated, mutually exclusive, each fully described | Operations | Yes (logged) |
| 4 → 5 | COA analysis complete; Analyst objections documented; scoring rubric finalized | Analyst | Yes (logged) |
| 5 → 6 | Comparison matrix complete; ranking clear; tiebreaker criteria defined | Operations | Yes (logged) |
| 6 → 7 | Commander recommendation documented; human has reviewed analysis; approval/defer decision logged | Commander | Yes (logged) |

Gate Logic:

  • If gate criteria are met: advance immediately.
  • If gate criteria are not met and deadline permits: trigger re-analysis on the blocked phase.
  • If gate criteria are not met and deadline is imminent: human can force-advance. Log the override: timestamp, gate, reason, decision-maker.

Failed gates do not halt the system. They trigger re-analysis. If human forces advance, the gap is logged for post-decision review.

Rationale: Gates enforce discipline while respecting time pressure. In a 2-hour decision cycle, all gates may be forced-advanced (logged). In a 2-week cycle, most gates are met before advancement.

Humans decide. AI advises.

At three critical junctures, humans retain authority:

  1. After Comparison: Before Commander synthesis, the human reviews what agents generated. Any human can say “wait, I need to understand the intelligence analysis better” and request deeper analysis on specific aspects.

  2. Before Selection: The human makes the final decision. AI recommends. Human decides.

  3. Before Execution: The human reviews the execution plan. No task leaves the system without human eyes on it.

Override Logging:

Every human intervention is documented:

  • What analysis did the human override?
  • Why (what information changed the recommendation)?
  • Timestamp and decision-maker

This creates an audit trail. It also creates organizational learning: why did the machine recommend A but the human chose B? Was the human right? We learn over time.

Why JSON?

  • Industry standard, portable, scalable
  • No proprietary lock-in
  • Data exportable at any time
  • Future migration from JSON to PostgreSQL is a software problem, not an architecture problem9

Schema:

{
  "decision": {
    "id": "UUID",
    "problem_statement": "...",
    "deadline": "ISO-8601",
    "stakeholders": ["..."],
    "created": "ISO-8601",
    "decision_maker": "UUID"
  },
  "analysis": {
    "facts": ["..."],
    "assumptions": ["..."],
    "constraints": ["..."],
    "risks": [...]
  },
  "options": [
    {
      "id": "OPTION-1",
      "title": "...",
      "description": "...",
      "intelligence_analysis": "...",
      "operations_analysis": "...",
      "analyst_evaluation": "...",
      "score": 3.65
    }
  ],
  "selected_option": "OPTION-1",
  "commander_recommendation": "...",
  "human_decision": "APPROVE | REJECT | DEFER",
  "execution_plan": [...],
  "lessons_learned": [...]
}

Every decision is a JSON document. Every analysis is timestamped. Every decision is traceable.

Production deployment requires strict tenant isolation to prevent cross-customer data leakage. Every decision object carries a tenant_id field. Database security enforces row-level access control: a query for decisions belonging to Tenant A always returns only Tenant A’s data.

Isolation mechanisms:

  1. Row-Level Security (RLS): PostgreSQL policy on all tables restricts queries to WHERE tenant_id = current_setting('app.tenant_id'). Tenant context is set at connection initialization, not at query time.

  2. Logical Isolation: Each tenant has isolated API keys. A stolen key grants access only to that tenant’s decisions. Key scope is enforced at the API gateway.

  3. Blast Radius Containment: If Tenant A’s API key is compromised, exposure is limited to Tenant A’s data. Message queues are partitioned by tenant_id.

  4. Rate Limiting Per Tenant: Token quota and request rate limits are per-tenant. One tenant’s spike in usage does not starve other tenants’ processing.

  5. Credential Rotation: API keys expire after 90 days (configurable per tier). Expired keys return 401 immediately.
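Per-tenant rate limiting (mechanism 4) can be sketched as one token bucket per tenant. Capacity and refill rate below are illustrative, not platform quotas:

```python
import time

class TenantRateLimiter:
    """One token bucket per tenant: a spike from one tenant cannot starve another's quota."""
    def __init__(self, capacity=10, refill_per_sec=1.0, clock=time.monotonic):
        self.capacity, self.refill, self.clock = capacity, refill_per_sec, clock
        self.buckets: dict[str, tuple[float, float]] = {}   # tenant -> (tokens, last_ts)

    def allow(self, tenant_id: str) -> bool:
        now = self.clock()
        tokens, last = self.buckets.get(tenant_id, (self.capacity, now))
        tokens = min(self.capacity, tokens + (now - last) * self.refill)
        if tokens >= 1:
            self.buckets[tenant_id] = (tokens - 1, now)
            return True
        self.buckets[tenant_id] = (tokens, now)
        return False

t = [0.0]
rl = TenantRateLimiter(capacity=2, refill_per_sec=0.0, clock=lambda: t[0])
print([rl.allow("tenant-a") for _ in range(3)])   # [True, True, False]
print(rl.allow("tenant-b"))                       # True: tenant-b has its own bucket
```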

Situation Report Dashboard:

  • Current decision status
  • Timeline to deadline
  • Key insights from each agent
  • Outstanding decisions or approvals

The dashboard is a real-time control chart: decision quality metrics plotted against control limits derived from the platform’s historical performance baseline. Violations trigger investigation, not automatic rollback — distinguishing special cause variation from common cause variation.10 The commander sees not just the current status but whether current performance falls within the system’s normal operating envelope.

Execution Orders:

  • Problem statement
  • Strategic objectives
  • Selected COA and reasoning
  • Execution timeline
  • Accountability (who owns what)
  • Success metrics

Lessons Learned Registry:

  • Auto-generated from decision analysis
  • Connected to historical decisions (is this similar to a past decision?)
  • Tagged by decision domain (market entry, resource allocation, personnel, etc.)

4. MDMP PROCESSING PIPELINE: DETAILED FLOW


A user inputs (voice or text): “We’re deciding whether to acquire company X. They have 40 engineers, $2M ARR, 15 accounts. Acquisition would cost $50M. We need to decide in 2 weeks.”

The Task Parser generates:

{
  "decision_type": "acquisition",
  "problem_statement": "Go/no-go acquisition decision on company X",
  "context": {
    "target": "Company X",
    "metrics": {
      "headcount": 40,
      "arr": "$2M",
      "accounts": 15,
      "price": "$50M"
    }
  },
  "deadline": "2026-03-28",
  "constraint_budget": "$50M",
  "stakeholders": [
    {"role": "CEO", "type": "decision_maker"},
    {"role": "CFO", "type": "stakeholder"},
    {"role": "VP Engineering", "type": "stakeholder"}
  ]
}

This goes to the Analysis phase.

Intelligence Officer Analysis:

  • Competitive landscape: where is company X positioned?
  • Market conditions: is this the right time to acquire?
  • Regulatory or IP considerations
  • Team stability risk
  • Integration complexity

Output: A structured intelligence estimate identifying what we know, what we assume, and what we need to verify.

Operations Officer generates three courses of action:

COA 1: “Acquire Now”

  • Full acquisition at $50M asking price
  • Target integration in 90 days
  • Retain all 40 engineers
  • Accelerate product roadmap with acquired team

COA 2: “Negotiate and Acquire”

  • Counter-offer at $35M
  • Selective team acquisition (20 core engineers)
  • Slower integration (180 days)
  • Risk losing key talent if negotiations fail

COA 3: “Do Not Acquire”

  • Hire directly to fill the gap (estimated $1.5M/year for 5 years + 12-month ramp time)
  • Build internally vs. acquire
  • Lower immediate capital cost but longer time to capability

Operations Officer (Detailed Analysis):

  • COA 1: $50M cash outlay, 90-day integration risk, fast time to market
  • COA 2: $35M potential outlay (if negotiation succeeds), 180-day integration, risk of talent retention failure
  • COA 3: $7.5M total cost, 18-month ramp, retain capital flexibility

Intelligence Officer (Environmental Impact):

  • Competitor is also sniffing around Company X (time-sensitive)
  • Market window for the product closes in 12 months
  • Regulatory environment may tighten next year (affects post-acquisition integration)

Analyst (Critical Assessment):

  • COA 1: Integration failure rate for mid-market acquisitions is 40–60%.11 You might spend $50M and still lose the team.
  • COA 2: The asking price is artificially low (negotiation will fail). You’ll end up at COA 1 pricing or lose the deal.
  • COA 3: By the time you hire and onboard, the market window is closed. This is a $50M opportunity cost, not a cost savings.

Weighted Decision Matrix:

| Criterion | Weight | COA 1 | COA 2 | COA 3 |
| --- | --- | --- | --- | --- |
| Speed to Market | 25% | 5 — 90 days | 3 — 180 days | 1 — 18 months |
| Capital Efficiency | 20% | 2 — $50M outlay | 4 — $35M potential | 5 — $7.5M |
| Team Retention Risk | 20% | 3 — moderate risk | 2 — high risk | 5 — no risk (hire new) |
| Integration Complexity | 15% | 2 — high | 3 — moderate-high | 5 — low (own team) |
| Competitive Position | 20% | 5 — fast to capability | 4 — moderate | 1 — too slow |
| Weighted Score | 100% | 3.55 | 3.20 | 3.20 |

COA 1 wins, but the Analyst’s integration risk caveat carries weight.
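Recomputing the weighted sums from the per-criterion scores confirms the ranking (weights and row scores transcribed from the matrix; the helper function is illustrative):

```python
weights = {"speed": 0.25, "capital": 0.20, "retention": 0.20,
           "integration": 0.15, "competitive": 0.20}
scores = {
    "COA 1": {"speed": 5, "capital": 2, "retention": 3, "integration": 2, "competitive": 5},
    "COA 2": {"speed": 3, "capital": 4, "retention": 2, "integration": 3, "competitive": 4},
    "COA 3": {"speed": 1, "capital": 5, "retention": 5, "integration": 5, "competitive": 1},
}

def weighted_score(coa: dict) -> float:
    """Sum of weight x score across criteria, rounded to two decimals."""
    return round(sum(weights[c] * s for c, s in coa.items()), 2)

for name, coa in scores.items():
    print(name, weighted_score(coa))
# COA 1 3.55
# COA 2 3.2
# COA 3 3.2
```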

Commander Recommendation: “COA 1 with mitigated risk. The market window and competitive threat justify the $50M price. The integration risk is real but manageable with a 90-day structured integration plan (separate team, preserved decision authority, clear success metrics). I recommend COA 1 with the following risk mitigations: hire an integration lead with M&A experience, define month-1 through month-3 wins in advance, establish a weekly executive sync, and plan for 25% post-acquisition churn with a replacement hiring plan ready.”

Human Decision: CEO approves COA 1 with risk mitigations noted.

Execution Plan:

| Week | Action | Owner | Success Metric |
| --- | --- | --- | --- |
| W1 | Negotiate final terms, legal due diligence | CEO, General Counsel | LOI signed by Friday |
| W2 | Retention agreements with key engineers | CEO, VP Eng | All 15+ core team signed |
| W3 | Integration lead hired, 100-day plan drafted | CEO | Integration lead starts, plan reviewed |
| W1-2 | Product roadmap integration planning | VP Product | Unified roadmap draft |
| M2 | Week 30 integration milestones met | Integration Lead | First product release post-acquisition |
| M3 | Full team integration complete | Integration Lead | Post-integration engagement survey >3.5/5 |

Tier 1: Free

Target: ROTC cadets, military students, civilian undergraduates learning decision-making

Capabilities:

  • Access to MDMP pipeline with limited monthly token budget (100K tokens/month)
  • Single AI model (Claude or GPT-4o, user choice)
  • Up to 5 decisions per month
  • Student guide and teaching materials included

Pricing: Free

Government Pathway: DoD and military academy students have unlimited free access.

Tier 2: Pro

Target: Enterprise planners, consulting firms, startup leadership teams

Capabilities:

  • Full multi-agent ensemble (3–4 models, user configurable)
  • Unlimited token budget (pay-as-you-go, $0.03/K tokens)
  • Unlimited decisions
  • Advanced analytics (decision velocity, outcome tracking, team performance)
  • Custom agent roles (add specialist agents for specific domains)
  • Integration with external tools (calendar, project management, email)

Pricing: $500/month base + token overage

Tier 3: Enterprise

Target: Department of War, enterprise strategy teams, defense contractors

Capabilities:

  • Dedicated deployment (on-premise or private cloud)
  • Custom MDMP templates per unit or organization
  • DARPA/DoD integration pathway
  • SLA, security compliance (roadmap: FedRAMP in Phase 3), audit logging
  • Custom agent tuning per organization’s decision patterns
  • Post-decision outcome tracking and organizational learning feedback loop

Pricing: $500K–$2M annually depending on deployment scope and customization


| Layer | Technology | Rationale | Cost Model |
| --- | --- | --- | --- |
| API Gateway | FastAPI (Python) | Lightweight, async-first, built for AI pipelines | Open source |
| Message Queue | AWS SQS or Redis | Decouple input from processing, handle spikes | $0.50/M messages (SQS) |
| Orchestration | LangChain or custom | Route tasks to agents, manage parallel processing | LangChain free tier |
| Database | PostgreSQL (prod) / JSON (staging) | Scalable, ACID compliance, full-text search | Self-hosted or AWS RDS |
| Voice Processing | Whisper API + WebRTC | Real-time transcription, low latency | $0.006/minute12 |
| Agent Runtime | Claude API + GPT-4o + Gemini API | Multi-model routing, parallel execution | Pay-as-you-go per model |

| Component | Technology | Rationale |
| --- | --- | --- |
| Web UI | React 18 + TypeScript | Type-safe, component reuse |
| Mobile | React Native or native iOS/Android | Cross-platform, native performance |
| Voice Input | WebRTC + Whisper | Browser-based recording, cloud transcription |
| Real-time Updates | WebSocket + Redux | Live analysis updates as agents work |
| Document Export | ReportLab (Python) + pdfkit | Generate PDFs on demand |

Development: Docker containers, local Kubernetes for testing

Staging: AWS ECS on EC2, RDS PostgreSQL, CloudFront CDN

Production (Tier 1/2): AWS Lambda for stateless processing, managed PostgreSQL, API Gateway

Production (Tier 3/Enterprise): On-premise Kubernetes cluster with VPC isolation, encryption at rest and in transit, audit logging. Phase 3 roadmap includes security pathway validation.


7. COMPETITIVE POSITIONING AND GOVERNMENT PATHWAY


The AI industry is converging on multi-agent architectures. Every major lab has announced orchestration frameworks. But the industry assumes that more models = better decisions. Paper 6 proved this wrong. The differentiator is not capability. It is structure.

This platform competes not on model capability, but on decision quality. “More models” is commodity. “Better decisions through doctrine” is differentiation.

DARPA’s “Collaborative Learning for Resilient AI” (CLARA) program13 explicitly solicits AI systems that improve decision-making under uncertainty through multi-model coordination. This platform is CLARA-native:

  • Resilience: If one model fails, others continue. No single point of failure.
  • Collaborative Learning: Each decision feeds back into the lessons learned registry, improving future decisions.
  • Doctrine-Structured: MDMP provides the framework CLARA seeks.

The platform architecture aligns with DARPA, NSF (AI Institutes), and Department of War program requirements, pending formal partnership negotiations.14

Immediate (6–12 months):

  • Deploy Tier 1 free access to ROTC programs nationwide
  • Pilot Tier 2 with specialized commands
  • Conduct live exercise demonstrations

Medium Term (1–2 years):

  • Tier 3 deployment to service component commands
  • Integration with existing command systems
  • Certification as an approved planning tool

Long Term (2–3 years):

  • Multi-national integration (NATO allies)
  • Experimentation with adaptive staff models

Phase 1: MVP (Month 1–3, April–June 2026)

Deliverable: Single-decision prototype built on the current stack, with voice input and phases 1–5 (no Phase 7 execution yet)

Scope:

  • Voice-to-text pipeline (Whisper)
  • Task parser (a smaller Claude model)
  • Single agent (Claude Opus as Commander, running all staff roles)
  • Web UI for problem input and phase review
  • JSON backend storage
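
The Phase 1 pipeline can be sketched end to end. This is a sketch under stated assumptions: `transcribe` and `parse_task` are stubs standing in for the real Whisper and Claude calls, and the field names in the task record are illustrative.

```python
import json
import uuid

def transcribe(audio: bytes) -> str:
    # Stub for the Whisper voice-to-text call; returns the spoken problem.
    return "Relocate the battalion motor pool before the June exercise."

def parse_task(transcript: str) -> dict:
    # Stub for the task parser; in the MVP this would prompt a smaller
    # Claude model to extract mission, constraints, and deadline.
    return {"id": str(uuid.uuid4()), "mission": transcript, "phase": 1}

task = parse_task(transcribe(b"<audio bytes>"))
record = json.dumps(task)  # JSON backend: one document per decision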

Success Metric: Conduct 10 live decisions with cadets or military students. Measure decision quality vs. baseline.

Estimated Cost: $200K (engineering + API costs)

Phase 2: Multi-Model Ensemble (Month 4–6, July–September 2026)

Deliverable: Multi-agent platform with 3–4 model ensemble, full phases 1–7, JSON backend

Scope:

  • Integrate additional models (Gemini 3, GPT-4o, and others)
  • Implement role-bounded agent architecture
  • Phase 7: Execution with Scribe
  • Lessons Learned Registry (auto-extracted)
  • Mobile app (React Native)
  • Basic analytics dashboard
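
The Lessons Learned Registry can be sketched as an append-only store with tagged recall. All names here (`extract_lesson`, `record_decision`, `recall`) are hypothetical; in Phase 2 the extraction step would be a model call rather than the stub shown.

```python
REGISTRY: list[dict] = []  # in production, a per-organization JSON store

def extract_lesson(decision: dict) -> dict:
    # Stub: a real implementation would ask a model to summarize what the
    # ensemble caught or missed, then tag the lesson for later retrieval.
    return {"decision": decision["id"], "lesson": decision["outcome"],
            "tags": decision.get("tags", [])}

def record_decision(decision: dict) -> None:
    # Auto-extract a lesson at the close of every decision cycle.
    REGISTRY.append(extract_lesson(decision))

def recall(tag: str) -> list[dict]:
    # Surfaced to the agents at the start of the next decision cycle.
    return [entry for entry in REGISTRY if tag in entry["tags"]]

record_decision({"id": "d-001",
                 "outcome": "Logistics timeline was optimistic",
                 "tags": ["logistics"]})
```

The recall step is what makes the registry collaborative learning rather than an archive: each new decision starts with the relevant lessons already in context.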

Success Metric: Tier 2 private beta with 5 enterprise customers. Measure decision velocity and user satisfaction.

Estimated Cost: $500K (multi-model integration + mobile + database)

Phase 3: Enterprise Deployment (Month 7–12, October 2026–March 2027)

Deliverable: Tier 3 enterprise platform with on-premise option, custom templates, outcome tracking

Scope:

  • On-premise Kubernetes deployment
  • Security compliance assessment
  • Custom agent roles per organization
  • Post-decision outcome tracking
  • Integration with enterprise tools
  • Executive reporting and decision analytics

Success Metric: Tier 3 pilot with specialized command. Measure decision outcome improvement vs. baseline.

Estimated Cost: $1.5M (security, compliance, custom integrations)

Phase 4: Advanced Capabilities (Month 13–24, April 2027–March 2028)

Deliverable: Extension for tactical/operational planning (decision support for human commanders)

Scope:

  • Integration with intelligence feeds
  • Real-time threat assessment
  • Distributed decision authority models
  • Live exercise integration

Success Metric: Field exercise with a live decision loop demonstrating a 3–4x faster decision cycle.

Estimated Cost: $2M+ (R&D)


What: New frontier models emerge with different APIs, capabilities, or costs. Platform becomes obsolete.

Mitigation: API abstraction layer makes model swapping a configuration change, not code rewrite. The MDMP structure is model-agnostic.
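
This mitigation is the classic "program to an interface, not an implementation" pattern (footnote 5). A minimal sketch, with hypothetical adapter classes standing in for real provider SDKs:

```python
from typing import Protocol

class ModelAdapter(Protocol):
    # The one interface every provider must satisfy.
    def complete(self, prompt: str) -> str: ...

class ClaudeAdapter:
    def complete(self, prompt: str) -> str:
        return "claude: " + prompt  # stub for the Anthropic API call

class GeminiAdapter:
    def complete(self, prompt: str) -> str:
        return "gemini: " + prompt  # stub for the Google API call

# Swapping a model is a configuration change, not a code rewrite:
CONFIG = {"Commander": "claude", "Intelligence": "gemini"}
ADAPTERS = {"claude": ClaudeAdapter(), "gemini": GeminiAdapter()}

def agent_for(role: str) -> ModelAdapter:
    return ADAPTERS[CONFIG[role]]

commander_view = agent_for("Commander").complete("Frame the decision")
```

When a new frontier model appears, it gets one new adapter class and one new entry in `CONFIG`; the MDMP orchestration code never changes.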

What: Platform integrates with existing systems but integration is delayed or technically infeasible.

Mitigation: Partnership with integration specialist early. Build API contracts before development starts.

What: Decisions contain sensitive information. Deployment to government requires security hardening beyond Tier 2 scope.

Mitigation: Tier 3 assumes on-premise deployment from day one. Customer owns all data. Encryption at rest.

What: Users reject the structured approach as “too rigid” or “slowing down decisions.”

Mitigation: User training emphasizes that structure accelerates decisions by 3–4x (Paper 6 thesis). Demonstrate with live data from early pilots.

What: Querying 4 models simultaneously is expensive. Customers balk.

Mitigation: Tier 1 defaults to single-model to keep costs low. Tier 2 makes multi-model cost transparent and optional. Tier 3 budgets appropriately.
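
The cost tradeoff can be made concrete. The Whisper price below comes from footnote 12; the token count and per-token price are illustrative assumptions, not published figures.

```python
WHISPER_PER_MIN = 0.006  # published Whisper API price (footnote 12)

def decision_cost(audio_minutes: float, tokens_per_model: int,
                  price_per_1k_tokens: float, n_models: int) -> float:
    # Token count and token price are illustrative inputs.
    transcription = audio_minutes * WHISPER_PER_MIN
    inference = n_models * (tokens_per_model / 1000) * price_per_1k_tokens
    return round(transcription + inference, 4)

solo = decision_cost(5, 8_000, 0.015, n_models=1)      # Tier 1 default
ensemble = decision_cost(5, 8_000, 0.015, n_models=4)  # full staff ensemble
```

Transcription is a rounding error next to inference, and the ensemble multiplies only the inference term, which is why making multi-model cost transparent and optional is a per-tier decision rather than a platform-wide one.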


Paper 6 proved that doctrine-structured multi-model ensembles produce better strategic decisions than a single AI model. This paper translates that proof-of-concept into a platform specification that can scale from a cadet learning decision-making to an enterprise planning cell deciding on a multi-billion-dollar program.

The platform is not generic. It is doctrine-first. The MDMP structure is the constraint that makes coordination work. Every architectural decision flows from this principle.

The competitive advantage is not model capability — that is a commodity that every major lab will match. The advantage is structure. A team that runs its decisions through MDMP (with AI assistance) will make better decisions than a team running the same decision through a generic chatbot. The platform operationalizes this advantage.

The government pathway is clear. DARPA CLARA explicitly seeks this type of system. Specialized commands are actively seeking advanced decision support. The market exists. The need is documented.

Paper 8 later names this pattern the Toboggan Doctrine — gravity-fed governance where the agent takes a ride on the reverse-entropy information enricher slide, becoming a factory worker pushing templates around the work area rather than navigating every decision from scratch. The MDMP platform described here is the planning leg of that slide: templates pre-encode the seven steps, the agent rides the channel, and each cycle feeds lessons back into the template.

The remaining question is execution: can we build it fast enough to capture the window?



Canonical source: herding-cats.ai/papers/paper-7-mdmp-platform-blueprint/ · Series tag: HCAI-d641c4-P7

This paper: Paper 7 of 10
Previous: Paper 6: When the Cats Take the Same Test
Next: Paper 8: The Toboggan Doctrine
  1. UC Berkeley EECS-2025-164: “From Local Coordination to System-Level Strategies: Designing Reliable, Societal-Scale Multi-Agent Autonomy Across Scales,” Victoria Tuck, 2025. Identified failure modes across multiple categories in multi-agent systems with failure rates 41-86.7%. Available at: https://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-164.html

  2. “Towards a Science of Scaling Agent Systems,” Yubin Kim, Ken Gu, et al. (Google Research, MIT, Google DeepMind), 2025. arXiv:2512.08296. Found multi-agent systems degrade sequential task performance by 39-70% while improving parallel task performance by 80.9%. Available at: https://arxiv.org/abs/2512.08296

  3. The “doctrine first, software second” principle prioritizes organizational structure and decision-making processes over technical implementation. This inversion makes structure the constraint that governs software architecture.

  4. Whisper V3 achieves Word Error Rate (WER) of approximately 2% (~98% accuracy) on speech segments 8 seconds or longer under standard acoustic conditions. Performance varies by language, accent, and acoustic environment. Source: OpenAI Whisper documentation and independent benchmarks.

  5. Gamma, E., Helm, R., Johnson, R., & Vlissides, J. (1994). Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley. The principle “program to an interface, not an implementation” (p. 18) is the foundational design pattern enabling the slot-based pluggable agent architecture.

  6. Role isolation where Intelligence, Operations, and Analyst each generate analysis independently before synthesis prevents groupthink and preserves analytical diversity critical to adversarial thinking and blind spot detection.

  7. Cemri, M., et al. “Multi-Agent Systems Failure Taxonomy (MAST).” UC Berkeley EECS-2025-164, NeurIPS 2025 Spotlight. Identifies “error propagation in sequential multi-agent workflows” as a primary failure mode class. The circuit breaker pattern addresses this by isolating failed API dependencies before their failures cascade downstream.

  8. Rolled Throughput Yield (RTY) is a Lean Six Sigma metric: RTY = Π(FPY_i) across n process steps. For n=7 phases at 90% FPY each: RTY = 0.9⁷ = 47.8%. Source: Pyzdek, T., & Keller, P. (2014). The Six Sigma Handbook, 4th ed. McGraw-Hill. The force-advance mechanism trades RTY for decision velocity under time pressure — a deliberate operational tradeoff, not a quality failure.

  9. JSON-first backend design maintains technology independence: future migration to PostgreSQL or other databases remains a software problem, not an architecture problem, enabling graceful scaling without core redesign.

  10. Statistical Process Control (SPC) control charts — Shewhart X̄ and R charts, CUSUM, EWMA — distinguish common cause variation (within ±3σ of the process mean) from special cause variation (outside control limits or exhibiting non-random patterns). Source: Montgomery, D.C. (2020). Introduction to Statistical Quality Control, 8th ed. Wiley. Applied to decision quality metrics, control limits define what “normal analytical performance” looks like and flag when investigation is warranted.

  11. Integration failure rates for mid-market acquisitions (target valuations $20M-$500M) range from 40% to 60%, with common failure modes including cultural misalignment, technical debt incompatibility, and talent retention failure.

  12. OpenAI Whisper API transcription cost is $0.006 per minute of audio processed. Source: OpenAI API pricing, current as of March 2026.

  13. DARPA’s Collaborative Learning for Resilient AI (CLARA) program solicits AI systems that improve decision-making under uncertainty through multi-model coordination. Source: DARPA Programs. Available at: https://www.darpa.mil/research/programs/clara

  14. Specialized commands are actively exploring advanced decision support and adaptive planning architectures. The Department of War 2026 initiatives include funding for agentic AI experimentation. Formal partnership negotiations are pending.