
MDMP Platform Blueprint

Architectural Blueprint and Product Specification


Jeep Marshall, LTC, US Army (Retired)
Airborne Infantry | Special Operations | Process Improvement
jeep.marshall@gmail.com
March 2026


This is Paper 7 in the Herding Cats in the AI Age series. Paper 1 established that AI needs doctrine, not more intelligence. Paper 2 showed the military already built the coordination frameworks the AI industry lacks. Paper 3 proved those principles work in a live laboratory. Paper 4 demonstrated the consequences when coordination is absent. Paper 5 showed that two models can negotiate a coordination protocol in real time. Paper 6 proved that four models assigned military staff roles produce demonstrably better strategic decisions than a solo baseline. Paper 7 now asks the next operational question: what does a production platform look like that scales this coordination pattern from a proof-of-concept to a system any team can use?


Paper 6’s proof-of-concept validated that doctrine-structured multi-model ensembles produce measurably better strategic analysis than a single AI model. The ensemble surfaced 6 strategic insights the solo baseline missed, including 2 rated HIGH value. Most critically, Paper 6 demonstrated that a structured framework makes the difference: without doctrine, four models produce chaos. With doctrine, they function as a coordinated team.

This paper translates that proof-of-concept into a production platform specification. The thesis is straightforward: a conversational, MDMP-native platform that accepts natural voice or text input, converts it into structured tasks, deploys AI agents to work those tasks in parallel under defined roles, and surfaces decision-ready outputs is both technically viable and strategically necessary.

The platform is not a generic multi-agent orchestrator. It is doctrine-first, then software. The MDMP structure is the constraint that makes coordination work. This paper provides the technical architecture, deployment model, access tiers, and roadmap for building such a system.


Before Paper 6, the field evidence on multi-agent coordination was mixed at best. A UC Berkeley study identified 14 distinct failure modes across multi-agent systems with failure rates ranging from 41% to 86.7%.1 A Google/MIT collaboration found that multi-agent systems degraded performance on sequential tasks by 39–70%.2 The assumed answer — “more models = better decisions” — was empirically false.

Paper 6 tested a different hypothesis: more models with structure beats one model without it, and solo models with structure beat any ensemble without it. The structure in question was military doctrine — the MDMP (Military Decision Making Process).

Four frontier AI models were assigned specialized analysis roles:

  • Commander (Claude Opus 4.6): Synthesis and final recommendation
  • Intelligence Officer (Gemini 3): Environmental scan, threat analysis
  • Operations Officer (ChatGPT/GPT-4o): COA development and feasibility analysis
  • Analyst (Grok/SuperGrok): Contrarian analysis, failure mode identification

All four models received an identical mission briefing on the series’ own publication strategy — a real decision with real stakes, not a contrived benchmark. The solo baseline (Claude) ran the full process alone. The ensemble models ran independently, then the Commander synthesized their outputs.

What the Ensemble Surfaced That Solo Missed:

  1. Content Strategy (Gemini) — HIGH Value: Break Paper 2’s 33,500 words into 10 “Operational Briefs” for sustained content distribution. Solo Claude treated it as monolithic.

  2. Hybrid Approach (ChatGPT) — MEDIUM-HIGH Value: “Doctrine → Articles → Case Studies → Book” sequence. ChatGPT synthesized multiple approaches into a hybrid generating more content touchpoints.

  3. Competitive Threat (Gemini) — MEDIUM Value: McKinsey/Deloitte rebranding quality methodology for AI creates a specific, named competitive threat with urgency.

  4. Credibility Challenge (Grok) — HIGH Value: Specific vulnerabilities the author must pre-empt.

  5. Implementation Gap (Grok) — MEDIUM Value: Build a GitHub repo with code and simulations to compete on implementation, not essays alone.

  6. Alignment Opportunity (Gemini) — MEDIUM Value: Position the series to align with Department of War 2026 initiatives.

What Solo Did Better:

  • Operational detail: Week-by-week execution plans with hour estimates and decision gates
  • Risk register: 8 risks with likelihood, impact, and specific mitigations
  • Deadline management: Factored hard constraints into timeline
  • Assumption validation: Identified 5 specific assumptions requiring validation with deadlines

The Refined Thesis:

“A doctrine-structured multi-model ensemble surfaces strategic blind spots that solo analysis misses, while solo analysis produces superior operational detail. The optimal pattern is ensemble for strategy, solo for operations — and doctrine is the constant that makes both work.”


2.1 Core Principle: Doctrine First, Software Second


The platform is not a general-purpose multi-agent orchestrator. It is MDMP-native.3 The MDMP structure is not a bolt-on feature or an optional workflow. It is the foundation.

Why? Because Paper 6 proved that without structure, multi-model coordination fails. The MDMP provides the structure: defined phases, assigned roles, synthesis procedures, and decision gates. These are not procedural niceties. They are the constraints that make the system work.

A user speaks naturally into the system: “We have a market entry decision for the European AI regulation market. We need to assess whether to enter before the GDPR AI Act is finalized or wait for regulatory clarity. We need this decision by Friday.”

The platform converts this natural language input into structured tasks:

  1. Problem Receipt: Parse the decision, extract deadline, identify stakeholders
  2. Analysis Phase: What are the facts, assumptions, constraints, and risks?
  3. COA Development: What are the courses of action?
  4. COA Analysis: What are the failure modes and second-order effects?
  5. COA Comparison: Which COA is optimal against defined criteria?
  6. Approval: Commander recommends, human decides
  7. Execution: Actionable next steps, timelines, accountability

Throughout this process, AI agents work in parallel:

  • Intelligence agent scans the regulatory landscape
  • Operations agent develops market entry scenarios
  • Analyst tears each scenario apart
  • Commander synthesizes everything and presents decision-ready outputs

The human sees each stage of analysis and can inject decisions, redirect analysis, or request deeper dives at any phase.

The system produces outputs aligned to standard planning structure:

  • Situation Report: Current state, constraints, environmental factors
  • Problem Statement: What we are deciding and why
  • Strategic Objectives: What success looks like and decision-making logic
  • Courses of Action: Multiple COAs with analysis
  • Selected COA: The decision and reasoning
  • Execution Plan: Immediate next steps, accountability, timeline
  • Lessons Learned Registry: What we learned for future decisions

This structure is not arbitrary. It forces clarity. It prevents decisions from being based on vague intuitions. It creates an audit trail.


Voice-to-Text Pipeline:

  • Whisper API for transcription (proven quality, ~98% accuracy at 8+ second clips)4
  • Speaker identification (multi-user support with role tagging)
  • Real-time transcription (streaming to processing layer, no batch delays)

Text Chat Fallback:

  • Web/mobile chat interface for users who prefer typing
  • Prompt templates for common decision types (resource allocation, market entry, personnel decisions, etc.)
  • Structured prompt injection for role-bounded agent inputs

Inbox Queue: All inputs become discrete tasks in a message queue. No input is lost; no decision is left incomplete.

Task Parser AI: A dedicated Claude or Gemini model (small and fast) converts raw input into structured work items:

  • Problem statement
  • Context and constraints
  • Decision deadline
  • Primary stakeholder (the decision-maker)
  • Secondary stakeholders (who has input authority)

Priority Sorter: Routes tasks to appropriate phase. A task arriving mid-decision might jump to COA Analysis. A task arriving at 4 PM with a Friday deadline gets priority queued.
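The Task Parser's gate check can be sketched in a few lines of Python. The field names follow the work-item list above; the `validate_task` helper and its return convention are illustrative, not the platform API:

```python
REQUIRED_FIELDS = ("problem_statement", "deadline", "decision_maker")

def validate_task(task: dict) -> list[str]:
    """Return the fields that are missing or empty; an empty list means the task may advance."""
    return [f for f in REQUIRED_FIELDS if not task.get(f)]

task = {
    "problem_statement": "Go/no-go on European market entry",
    "deadline": "2026-03-27",
    "decision_maker": None,   # gap the parser must flag back to the human
    "stakeholders": ["CEO", "General Counsel"],
}
print(validate_task(task))    # ['decision_maker']
```

A task with gaps is not dropped; it stays in the inbox queue with its ambiguities flagged, consistent with "no input is lost."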

Slot-Based Architecture (Pluggable):

| Role | Primary Model | Backup Model | Function | Update Frequency |
| --- | --- | --- | --- | --- |
| Commander (Synthesis) | Claude Sonnet | Claude Opus | Decision synthesis, conflict resolution, final recommendation | Per decision |
| Intelligence Officer | Gemini 3 | Claude Opus | Environmental scan, competitive landscape, threat analysis | Per COA |
| Operations Officer | GPT-4o | Claude Opus | COA development, feasibility analysis, resource modeling | Per COA |
| Analyst | Grok/SuperGrok | Claude Opus | Critical analysis, failure modes, second-order effects | Per COA |
| Scribe (Optional) | Claude Haiku | — | Meeting notes, transcript synthesis, lesson extraction | Per session |

Each slot can be filled by different models based on budget, capability needs, or availability. The architecture is model-agnostic. The MDMP structure is model-independent.

The slot-based pluggable architecture implements dependency injection: the framework defines the interface (Intelligence, Operations), and any compatible model fills the slot. This is the software engineering principle of programming to interfaces, not implementations.5 The organizational chart is the contract. The model is the implementation. Swapping implementations does not require redesigning the organization.
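As a sketch of that principle, the slot contract can be expressed as a Python `Protocol`. The class and method names here are illustrative, not the platform's actual interfaces:

```python
from typing import Protocol

class StaffAgent(Protocol):
    """The slot contract: anything that can analyze a brief can fill the role."""
    def analyze(self, brief: str) -> str: ...

class GeminiIntelligence:
    def analyze(self, brief: str) -> str:
        return f"[gemini-3] intelligence estimate: {brief}"

class ClaudeIntelligence:
    def analyze(self, brief: str) -> str:
        return f"[claude-opus] intelligence estimate: {brief}"

# The organizational chart is the contract; the model is the implementation.
slots: dict[str, StaffAgent] = {"intelligence": GeminiIntelligence()}

# Backup activation: swap the implementation without touching the organization.
slots["intelligence"] = ClaudeIntelligence()
print(slots["intelligence"].analyze("EU market entry"))
```

Any object satisfying the protocol fits the slot, which is exactly what makes the architecture model-agnostic.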

Figure 1 — MDMP Platform Architecture

[Figure: MDMP-Native AI Decision Platform — architecture showing the input layer (voice input, text input, mission brief upload), the 7-phase processing pipeline (Phases 1 through 7 with acceptance Gates 1-2 through 6-7), quality controls (circuit breaker, control chart, force-advance log), the pluggable agent slot layer (Commander synthesis slot plus Intelligence, Operations, Plans, Personnel, Sustainment, and Communications slots), tiered access (Free for students and ROTC cadets, Pro for unit staff officers, Enterprise for government), and the output layer (orders, updates, notices, audit trail). Each gate implements force-advance logic and circuit breaker patterns for graceful degradation.]

Key Feature: Role Isolation

Each agent operates independently until the synthesis phase. The Intelligence Officer does not see Operations output before generating its analysis. The Analyst does not know which COA was recommended before attacking all three. This prevents groupthink and preserves distinct analytical perspectives.6

Multi-agent systems fail. One model times out. One API rate-limits. One agent hallucinates. The platform must degrade gracefully: survive partial failures and surface what is known.

Agent Timeout Handling:

Each agent has a 30-second default timeout (configurable per role: intelligence agents 45s, operations agents 60s, scribe 15s). If an agent does not return before timeout, the orchestrator:

  1. Logs the timeout with agent name, phase, and deadline.
  2. Marks the agent’s slot as “timed_out” in the decision JSON.
  3. If the agent is critical (Commander, Operations), escalates to backup model immediately.
  4. If the agent is secondary (Analyst), continues with 3-agent analysis (logged as reduced ensemble).
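The timeout path can be sketched with `asyncio.wait_for`. The per-role budgets mirror the figures above; the agent callables and the returned dict shape are illustrative:

```python
import asyncio

DEFAULT_TIMEOUT = 30
TIMEOUTS = {"intelligence": 45, "operations": 60, "scribe": 15}

async def run_agent(role: str, agent, brief: str) -> dict:
    """Run one agent under its per-role budget; mark the slot instead of crashing the ensemble."""
    try:
        output = await asyncio.wait_for(agent(brief), TIMEOUTS.get(role, DEFAULT_TIMEOUT))
        return {"role": role, "status": "ok", "output": output}
    except asyncio.TimeoutError:
        return {"role": role, "status": "timed_out", "output": None}

async def slow_agent(brief: str) -> str:
    await asyncio.sleep(0.2)   # stands in for a long model call
    return "analysis"

TIMEOUTS["analyst"] = 0.01     # shrink the budget so the demo trips the timeout instantly
print(asyncio.run(run_agent("analyst", slow_agent, "brief")))
# {'role': 'analyst', 'status': 'timed_out', 'output': None}
```

The orchestrator then inspects `status` to decide between backup escalation (critical roles) and reduced-ensemble continuation (secondary roles).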

Retry Logic:

Transient failures (API rate limit, network timeout) trigger exponential backoff: 1s, 2s, 4s, max 3 retries. Permanent failures (auth failure, model deprecated) log immediately and do not retry. Backoff is per-task, not per-request, preventing cascade failures across the queue.
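A sketch of that retry policy follows. The string-based error taxonomy is a stand-in; a real orchestrator would map each provider's exception types onto transient vs. permanent:

```python
import time

TRANSIENT = {"rate_limited", "network_timeout"}

def call_with_backoff(call, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Retry transient failures with exponential backoff (1s, 2s, 4s); fail fast otherwise."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RuntimeError as err:
            if str(err) not in TRANSIENT or attempt == max_retries:
                raise                          # permanent failure, or retries exhausted
            sleep(base_delay * 2 ** attempt)   # 1s, then 2s, then 4s

attempts = []
def flaky():
    """Fails twice with a transient error, then succeeds."""
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("rate_limited")
    return "ok"

print(call_with_backoff(flaky, sleep=lambda s: None))   # prints: ok
```

Injecting `sleep` keeps the demo instant and makes the backoff schedule unit-testable.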

Fallback Activation:

Each agent role has a primary and backup model. If the primary model becomes unavailable (API down, quota exceeded, model deprecated), the orchestrator switches to the backup model. The decision JSON records the switch: "s2_model": "gemini-3", "s2_backup_activated": true. Output quality may degrade, but analysis continues.

Contradiction Detection:

Intelligence and Operations outputs are cross-referenced. If Operations claims “market window closes in 12 months” but Intelligence found “market window closes in 8 months,” the contradiction is flagged: "contradiction_detected": {"claim_a": "...", "claim_b": "...", "severity": "HIGH"}. Human is alerted. Analysis continues, but the conflict is surfaced in the Commander’s recommendation.
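A toy version of that cross-reference check, using the market-window example above. The regex extraction is a deliberate simplification; production detection would need semantic comparison, not pattern matching:

```python
import re

def extract_months(claim: str):
    """Pull an 'N months' figure out of a free-text claim."""
    m = re.search(r"(\d+)\s*months", claim)
    return int(m.group(1)) if m else None

def detect_contradiction(claim_a: str, claim_b: str):
    """Flag two claims whose extracted figures disagree; None means no conflict found."""
    a, b = extract_months(claim_a), extract_months(claim_b)
    if a is not None and b is not None and a != b:
        return {"claim_a": claim_a, "claim_b": claim_b, "severity": "HIGH"}
    return None

ops = "market window closes in 12 months"
intel = "market window closes in 8 months"
print(detect_contradiction(ops, intel)["severity"])   # HIGH
```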

Graceful Degradation:

If 1 of 4 agents fails to return analysis (timeout, error):

  • 3-agent ensemble produces output.
  • The decision JSON marks the missing agent’s slot.
  • Commander synthesizes 3 analyses + notes the gap.
  • Post-decision review flags the missing perspective.

If 2 of 4 agents fail: escalate to human. The system will not produce a recommendation with less than half the ensemble.

Graceful degradation means the system’s output quality degrades proportionally to the lost model’s contribution weight, not catastrophically. A four-model ensemble that loses one model retains 75% of its analytical surface. It does not fail. It narrows. No single model is load-bearing — the ensemble distributes responsibility across roles.

Circuit Breaker Pattern:

If a single API (e.g., GPT-4o, Gemini) fails 3 times in 5 minutes, the orchestrator opens a circuit breaker: no further requests to that API for 5 minutes. Alternative models are used. After 5 minutes, a single test request is sent. If successful, the circuit closes and normal operation resumes. This prevents hammering a degraded service and prevents cascade failure.7
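The breaker's three states (closed, open, half-open) fit in a small class. Thresholds below match the text; the class shape and the injected clock are illustrative:

```python
import time

class CircuitBreaker:
    """Open after `threshold` failures within `window` seconds; half-open after `cooldown`."""
    def __init__(self, threshold=3, window=300, cooldown=300, clock=time.monotonic):
        self.threshold, self.window, self.cooldown = threshold, window, cooldown
        self.clock = clock                # injectable clock keeps the demo instant
        self.failures: list[float] = []   # timestamps of recent failures
        self.opened_at: float | None = None

    def record_failure(self) -> None:
        now = self.clock()
        self.failures = [t for t in self.failures if now - t < self.window] + [now]
        if len(self.failures) >= self.threshold:
            self.opened_at = now          # trip: stop sending requests to this API

    def record_success(self) -> None:
        self.failures.clear()
        self.opened_at = None             # close: resume normal operation

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True                   # closed: normal operation
        if self.clock() - self.opened_at >= self.cooldown:
            return True                   # half-open: allow a single test request
        return False                      # open: back off, use alternative models

t = [0.0]
cb = CircuitBreaker(clock=lambda: t[0])
for _ in range(3):
    cb.record_failure()                   # three failures inside the 5-minute window
print(cb.allow_request())                 # False: circuit is open
t[0] += 300
print(cb.allow_request())                 # True: cooldown elapsed, test request allowed
```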

The seven-step MDMP structure is the spine of the system:

Figure 7.1 — MDMP Seven-Step Cycle with AI Role Overlays

[Figure: The MDMP seven-step cycle (1. Receipt of Mission, 2. Mission Analysis, 3. COA Development, 4. COA Analysis, 5. COA Comparison, 6. COA Approval, 7. Orders Production) with AI agent roles overlaid per phase: Supervisor Agent, Intel + Analysis, Plans Agent, Devil's Advocate, Scoring Agent, and Orders + Comms. Each phase has a primary owner; the Supervisor Agent bookends the cycle (receipt + approval). The feedback loop from step 7 back to step 1 is the Plan→Execute→Assess rhythm that makes MDMP iterative rather than one-shot.]

Phase 1: Problem Receipt

  • Parse the decision problem statement
  • Extract explicit constraints (deadline, budget, stakeholder list)
  • Identify the decision-maker (who has authority)
  • Flag any ambiguities or missing context

Phase 2: Analysis

  • Intelligence Officer: environmental scan, threat analysis
  • Identify facts vs. assumptions
  • Extract constraints
  • Initial risk register

Phase 3: COA Development

  • Operations Officer: generate 3–5 mutually exclusive courses of action
  • Each COA is fully described (what, how, timeline, resource requirements)
  • Analyst: preliminary evaluation of each COA

Phase 4: COA Analysis

  • Operations Officer: detailed feasibility analysis for each COA
  • Intelligence Officer: implications of each COA in the broader environment
  • Analyst: full critical analysis — what could go wrong, what am I missing
  • Risk modeling: likelihood, impact, mitigation for each COA

Phase 5: COA Comparison

  • Operations Officer: structured comparison matrix
  • Score each COA against weighted criteria
  • Rank the COAs

Phase 6: Approval

  • Commander: review all analysis
  • Select recommended COA
  • Document reasoning (why this one, what other considerations mattered)
  • Human decision-maker: approve, reject, or request additional analysis

Phase 7: Execution

  • Scribe: translate selected COA into actionable next steps
  • Timeline: week-by-week or day-by-day depending on decision urgency
  • Assign ownership: who is accountable for each action
  • Define success metrics: how will we know this is working

Each MDMP phase transition requires explicit gate approval. A gate is a decision point: data sufficient to advance, or re-analysis required? Gates prevent premature advancement and ensure decision quality degrades gracefully under time pressure.

The MDMP pipeline maps precisely onto DMAIC — the Lean Six Sigma improvement cycle. Military doctrine and process improvement arrived at the same structure from different directions:

| DMAIC Phase | MDMP Equivalent | Platform Implementation |
| --- | --- | --- |
| Define | Analysis | Problem framing, decision scope, success metrics extraction |
| Measure | COA Development | Data collection, baseline assessment, COA generation |
| Analyze | COA Analysis | Root cause analysis, failure mode identification, COA evaluation |
| Improve | COA Selection + Execution | Solution implementation, execution plan, accountability assignment |
| Control | Execution + Assessment + Lessons | Monitoring, course correction, outcome tracking |

This equivalence is not decorative. It means every quality improvement tool from the DMAIC toolkit — control charts, Pareto analysis, root cause analysis, process sigma calculations — applies directly to MDMP phase performance measurement. The platform can report a process sigma for its decision pipeline the same way a manufacturing system reports sigma for its production line.

Rolled Throughput Yield across phase gates: Each phase gate has a first-pass yield — the probability that a decision advances without requiring rework. Rolled Throughput Yield across all gates: RTY = Π(FPY_phase_i). If each of seven phases passes at 90% FPY, the pipeline has RTY = 0.9⁷ = 47.8% — less than half of decisions pass all gates without rework at that rate. This is why force-advance logic exists: in time-compressed situations, perfect quality across every gate is the wrong optimization target. The platform logs which gates were forced-advanced and why, enabling post-decision RTY analysis and systematic improvement over time.8
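The RTY arithmetic above in a few lines of Python:

```python
from math import prod

def rolled_throughput_yield(first_pass_yields):
    """RTY = product of per-gate first-pass yields: RTY = Π(FPY_i)."""
    return prod(first_pass_yields)

# Seven phases, each passing 90% of decisions without rework:
print(f"{rolled_throughput_yield([0.9] * 7):.1%}")   # 47.8%
```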

Gate Structure:

| Phase | Gate Criteria | Owner | Force-Advance? |
| --- | --- | --- | --- |
| 1 → 2 | Problem statement clear; deadline explicit; stakeholders identified | Task Parser | Yes (logged) |
| 2 → 3 | Facts/assumptions documented; constraints listed; risk register initialized with ≥3 risks | Intelligence | Yes (logged) |
| 3 → 4 | ≥3 COAs generated, mutually exclusive, each fully described | Operations | Yes (logged) |
| 4 → 5 | COA analysis complete; Analyst objections documented; scoring rubric finalized | Analyst | Yes (logged) |
| 5 → 6 | Comparison matrix complete; ranking clear; tiebreaker criteria defined | Operations | Yes (logged) |
| 6 → 7 | Commander recommendation documented; human has reviewed analysis; approval/defer decision logged | Commander | Yes (logged) |

Gate Logic:

  • If gate criteria are met: advance immediately.
  • If gate criteria are not met and deadline permits: trigger re-analysis on the blocked phase.
  • If gate criteria are not met and deadline is imminent: human can force-advance. Log the override: timestamp, gate, reason, decision-maker.

Failed gates do not halt the system. They trigger re-analysis. If human forces advance, the gap is logged for post-decision review.

Rationale: Gates enforce discipline while respecting time pressure. In a 2-hour decision cycle, all gates may be forced-advanced (logged). In a 2-week cycle, most gates are met before advancement.

Humans decide. AI advises.

At three critical junctures, humans retain authority:

  1. After Comparison: Before Commander synthesis, the human reviews what agents generated. Any human can say “wait, I need to understand the intelligence analysis better” and request deeper analysis on specific aspects.

  2. Before Selection: The human makes the final decision. AI recommends. Human decides.

  3. Before Execution: The human reviews the execution plan. No task leaves the system without human eyes on it.

Override Logging:

Every human intervention is documented:

  • What analysis did the human override?
  • Why (what information changed the recommendation)?
  • Timestamp and decision-maker

This creates an audit trail. It also creates organizational learning: why did the machine recommend A but the human chose B? Was the human right? We learn over time.

Why JSON?

  • Industry standard, portable, scalable
  • No proprietary lock-in
  • Data exportable at any time
  • Future migration from JSON to PostgreSQL is a software problem, not an architecture problem9

Schema:

{
  "decision": {
    "id": "UUID",
    "problem_statement": "...",
    "deadline": "ISO-8601",
    "stakeholders": ["..."],
    "created": "ISO-8601",
    "decision_maker": "UUID"
  },
  "analysis": {
    "facts": ["..."],
    "assumptions": ["..."],
    "constraints": ["..."],
    "risks": [...]
  },
  "options": [
    {
      "id": "OPTION-1",
      "title": "...",
      "description": "...",
      "intelligence_analysis": "...",
      "operations_analysis": "...",
      "analyst_evaluation": "...",
      "score": 3.65
    }
  ],
  "selected_option": "OPTION-1",
  "commander_recommendation": "...",
  "human_decision": "APPROVE | REJECT | DEFER",
  "execution_plan": [...],
  "lessons_learned": [...]
}

Every decision is a JSON document. Every analysis is timestamped. Every decision is traceable.

Production deployment requires strict tenant isolation to prevent cross-customer data leakage. Every decision object carries a tenant_id field. Database security enforces row-level access control: a query for decisions belonging to Tenant A always returns only Tenant A’s data.

Isolation mechanisms:

  1. Row-Level Security (RLS): PostgreSQL policy on all tables restricts queries to WHERE tenant_id = current_setting('app.tenant_id'). Tenant context is set at connection initialization, not at query time.

  2. Logical Isolation: Each tenant has isolated API keys. A stolen key grants access only to that tenant’s decisions. Key scope is enforced at the API gateway.

  3. Blast Radius Containment: If Tenant A’s API key is compromised, exposure is limited to Tenant A’s data. Message queues are partitioned by tenant_id.

  4. Rate Limiting Per Tenant: Token quota and request rate limits are per-tenant. One tenant’s spike in usage does not starve other tenants’ processing.

  5. Credential Rotation: API keys expire after 90 days (configurable per tier). Expired keys return 401 immediately.
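Per-tenant rate limiting (mechanism 4) can be sketched as one token bucket per tenant. Capacity and refill rate below are illustrative, not platform quotas:

```python
import time

class TenantRateLimiter:
    """One token bucket per tenant: a spike from one tenant cannot starve another's quota."""
    def __init__(self, capacity=10, refill_per_sec=1.0, clock=time.monotonic):
        self.capacity, self.refill, self.clock = capacity, refill_per_sec, clock
        self.buckets: dict[str, tuple[float, float]] = {}   # tenant -> (tokens, last_ts)

    def allow(self, tenant_id: str) -> bool:
        now = self.clock()
        tokens, last = self.buckets.get(tenant_id, (self.capacity, now))
        tokens = min(self.capacity, tokens + (now - last) * self.refill)
        if tokens >= 1:
            self.buckets[tenant_id] = (tokens - 1, now)
            return True
        self.buckets[tenant_id] = (tokens, now)
        return False

t = [0.0]
rl = TenantRateLimiter(capacity=2, refill_per_sec=0.0, clock=lambda: t[0])
print([rl.allow("tenant-a") for _ in range(3)])   # [True, True, False]
print(rl.allow("tenant-b"))                       # True: tenant-b has its own bucket
```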

Situation Report Dashboard:

  • Current decision status
  • Timeline to deadline
  • Key insights from each agent
  • Outstanding decisions or approvals

The dashboard is a real-time control chart: decision quality metrics plotted against control limits derived from the platform’s historical performance baseline. Violations trigger investigation, not automatic rollback — distinguishing special cause variation from common cause variation.10 The commander sees not just the current status but whether current performance falls within the system’s normal operating envelope.

Execution Orders:

  • Problem statement
  • Strategic objectives
  • Selected COA and reasoning
  • Execution timeline
  • Accountability (who owns what)
  • Success metrics

Lessons Learned Registry:

  • Auto-generated from decision analysis
  • Connected to historical decisions (is this similar to a past decision?)
  • Tagged by decision domain (market entry, resource allocation, personnel, etc.)

4. MDMP PROCESSING PIPELINE: DETAILED FLOW


A user inputs (voice or text): “We’re deciding whether to acquire company X. They have 40 engineers, $2M ARR, 15 accounts. Acquisition would cost $50M. We need to decide in 2 weeks.”

The Task Parser generates:

{
  "decision_type": "acquisition",
  "problem_statement": "Go/no-go acquisition decision on company X",
  "context": {
    "target": "Company X",
    "metrics": {
      "headcount": 40,
      "arr": "$2M",
      "accounts": 15,
      "price": "$50M"
    }
  },
  "deadline": "2026-03-28",
  "constraint_budget": "$50M",
  "stakeholders": [
    {"role": "CEO", "type": "decision_maker"},
    {"role": "CFO", "type": "stakeholder"},
    {"role": "VP Engineering", "type": "stakeholder"}
  ]
}

This goes to the Analysis phase.

Intelligence Officer Analysis:

  • Competitive landscape: where is company X positioned?
  • Market conditions: is this the right time to acquire?
  • Regulatory or IP considerations
  • Team stability risk
  • Integration complexity

Output: A structured intelligence estimate identifying what we know, what we assume, and what we need to verify.

Operations Officer generates three courses of action:

COA 1: “Acquire Now”

  • Full acquisition at $50M asking price
  • Target integration in 90 days
  • Retain all 40 engineers
  • Accelerate product roadmap with acquired team

COA 2: “Negotiate and Acquire”

  • Counter-offer at $35M
  • Selective team acquisition (20 core engineers)
  • Slower integration (180 days)
  • Risk losing key talent if negotiations fail

COA 3: “Do Not Acquire”

  • Hire directly to fill the gap (estimated $1.5M/year for 5 years + 12-month ramp time)
  • Build internally vs. acquire
  • Lower immediate capital cost but longer time to capability

Operations Officer (Detailed Analysis):

  • COA 1: $50M cash outlay, 90-day integration risk, fast time to market
  • COA 2: $35M potential outlay (if negotiation succeeds), 180-day integration, risk of talent retention failure
  • COA 3: $7.5M total cost, 18-month ramp, retain capital flexibility

Intelligence Officer (Environmental Impact):

  • Competitor is also sniffing around Company X (time-sensitive)
  • Market window for the product closes in 12 months
  • Regulatory environment may tighten next year (affects post-acquisition integration)

Analyst (Critical Assessment):

  • COA 1: Integration failure rate for mid-market acquisitions is 40–60%.11 You might spend $50M and still lose the team.
  • COA 2: The asking price is artificially low (negotiation will fail). You’ll end up at COA 1 pricing or lose the deal.
  • COA 3: By the time you hire and onboard, the market window is closed. This is a $50M opportunity cost, not a cost savings.

Weighted Decision Matrix:

| Criterion | Weight | COA 1 | COA 2 | COA 3 |
| --- | --- | --- | --- | --- |
| Speed to Market | 25% | 5 — 90 days | 3 — 180 days | 1 — 18 months |
| Capital Efficiency | 20% | 2 — $50M outlay | 4 — $35M potential | 5 — $7.5M |
| Team Retention Risk | 20% | 3 — moderate risk | 2 — high risk | 5 — no risk (hire new) |
| Integration Complexity | 15% | 2 — high | 3 — moderate-high | 5 — low (own team) |
| Competitive Position | 20% | 5 — fast to capability | 4 — moderate | 1 — too slow |
| Weighted Score | 100% | 3.55 | 3.20 | 3.20 |

COA 1 wins, but the Analyst’s integration risk caveat carries weight.
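Recomputing the weighted sums from the per-criterion scores confirms the ranking (weights and row scores transcribed from the matrix; the helper function is illustrative):

```python
weights = {"speed": 0.25, "capital": 0.20, "retention": 0.20,
           "integration": 0.15, "competitive": 0.20}
scores = {
    "COA 1": {"speed": 5, "capital": 2, "retention": 3, "integration": 2, "competitive": 5},
    "COA 2": {"speed": 3, "capital": 4, "retention": 2, "integration": 3, "competitive": 4},
    "COA 3": {"speed": 1, "capital": 5, "retention": 5, "integration": 5, "competitive": 1},
}

def weighted_score(coa: dict) -> float:
    """Sum of weight x score across criteria, rounded to two decimals."""
    return round(sum(weights[c] * s for c, s in coa.items()), 2)

for name, coa in scores.items():
    print(name, weighted_score(coa))
# COA 1 3.55
# COA 2 3.2
# COA 3 3.2
```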

Commander Recommendation: “COA 1 with mitigated risk. The market window and competitive threat justify the $50M price. The integration risk is real but manageable with a 90-day structured integration plan (separate team, preserved decision authority, clear success metrics). I recommend COA 1 with the following risk mitigations: hire an integration lead with M&A experience, define month-1 through month-3 wins in advance, establish a weekly executive sync, and plan for 25% post-acquisition churn with a replacement hiring plan ready.”

Human Decision: CEO approves COA 1 with risk mitigations noted.

Execution Plan:

| Week | Action | Owner | Success Metric |
| --- | --- | --- | --- |
| W1 | Negotiate final terms, legal due diligence | CEO, General Counsel | LOI signed by Friday |
| W2 | Retention agreements with key engineers | CEO, VP Eng | All 15+ core team signed |
| W3 | Integration lead hired, 100-day plan drafted | CEO | Integration lead starts, plan reviewed |
| W1-2 | Product roadmap integration planning | VP Product | Unified roadmap draft |
| M2 | Week 30 integration milestones met | Integration Lead | First product release post-acquisition |
| M3 | Full team integration complete | Integration Lead | Post-integration engagement survey >3.5/5 |

Tier 1: Free

Target: ROTC cadets, military students, civilian undergraduates learning decision-making

Capabilities:

  • Access to MDMP pipeline with limited monthly token budget (100K tokens/month)
  • Single AI model (Claude or GPT-4o, user choice)
  • Up to 5 decisions per month
  • Student guide and teaching materials included

Pricing: Free

Government Pathway: DoD and military academy students have unlimited free access.

Tier 2: Pro

Target: Enterprise planners, consulting firms, startup leadership teams

Capabilities:

  • Full multi-agent ensemble (3–4 models, user configurable)
  • Unlimited token budget (pay-as-you-go, $0.03/K tokens)
  • Unlimited decisions
  • Advanced analytics (decision velocity, outcome tracking, team performance)
  • Custom agent roles (add specialist agents for specific domains)
  • Integration with external tools (calendar, project management, email)

Pricing: $500/month base + token overage

Tier 3: Enterprise

Target: Department of War, enterprise strategy teams, defense contractors

Capabilities:

  • Dedicated deployment (on-premise or private cloud)
  • Custom MDMP templates per unit or organization
  • DARPA/DoD integration pathway
  • SLA, security compliance (roadmap: FedRAMP in Phase 3), audit logging
  • Custom agent tuning per organization’s decision patterns
  • Post-decision outcome tracking and organizational learning feedback loop

Pricing: $500K–$2M annually depending on deployment scope and customization


| Layer | Technology | Rationale | Cost Model |
| --- | --- | --- | --- |
| API Gateway | FastAPI (Python) | Lightweight, async-first, built for AI pipelines | Open source |
| Message Queue | AWS SQS or Redis | Decouple input from processing, handle spikes | $0.50/M messages (SQS) |
| Orchestration | LangChain or custom | Route tasks to agents, manage parallel processing | LangChain free tier |
| Database | PostgreSQL (prod) / JSON (staging) | Scalable, ACID compliance, full-text search | Self-hosted or AWS RDS |
| Voice Processing | Whisper API + WebRTC | Real-time transcription, low latency | $0.006/minute12 |
| Agent Runtime | Claude API + GPT-4o + Gemini API | Multi-model routing, parallel execution | Pay-as-you-go per model |

| Component | Technology | Rationale |
| --- | --- | --- |
| Web UI | React 18 + TypeScript | Type-safe, component reuse |
| Mobile | React Native or native iOS/Android | Cross-platform, native performance |
| Voice Input | WebRTC + Whisper | Browser-based recording, cloud transcription |
| Real-time Updates | WebSocket + Redux | Live analysis updates as agents work |
| Document Export | ReportLab (Python) + pdfkit | Generate PDFs on demand |

Development: Docker containers, local Kubernetes for testing

Staging: AWS ECS on EC2, RDS PostgreSQL, CloudFront CDN

Production (Tier 1/2): AWS Lambda for stateless processing, managed PostgreSQL, API Gateway

Production (Tier 3/Enterprise): On-premise Kubernetes cluster with VPC isolation, encryption at rest and in transit, audit logging. Phase 3 roadmap includes security pathway validation.


7. COMPETITIVE POSITIONING AND GOVERNMENT PATHWAY


The AI industry is converging on multi-agent architectures. Every major lab has announced orchestration frameworks. But the industry assumes that more models = better decisions. Paper 6 proved this wrong. The differentiator is not capability. It is structure.

This platform competes not on model capability, but on decision quality. “More models” is commodity. “Better decisions through doctrine” is differentiation.

DARPA’s “Collaborative Learning for Resilient AI” (CLARA) program13 explicitly solicits AI systems that improve decision-making under uncertainty through multi-model coordination. This platform is CLARA-native:

  • Resilience: If one model fails, others continue. No single point of failure.
  • Collaborative Learning: Each decision feeds back into the lessons learned registry, improving future decisions.
  • Doctrine-Structured: MDMP provides the framework CLARA seeks.

The platform architecture aligns with DARPA, NSF (AI Institutes), and Department of War program requirements, pending formal partnership negotiations.14

Immediate (6–12 months):

  • Deploy Tier 1 free access to ROTC programs nationwide
  • Pilot Tier 2 with specialized commands
  • Conduct live exercise demonstrations

Medium Term (1–2 years):

  • Tier 3 deployment to service component commands
  • Integration with existing command systems
  • Certification as an approved planning tool

Long Term (2–3 years):

  • Multi-national integration (NATO allies)
  • Experimentation with adaptive staff models

Phase 1: MVP (Month 1–3, April–June 2026)

Deliverable: Single-decision prototype built on the current stack, with voice input and phases 1–5 (no Phase 7 execution yet)

Scope:

  • Voice-to-text pipeline (Whisper)
  • Task parser (a smaller Claude model)
  • Single agent (Claude Opus as Commander, running all staff roles)
  • Web UI for problem input and phase review
  • JSON backend storage
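
The Phase 1 pipeline can be sketched end to end. This is a sketch under stated assumptions: `transcribe` and `parse_task` are stubs standing in for the real Whisper and Claude calls, and the field names in the task record are illustrative.

```python
import json
import uuid

def transcribe(audio: bytes) -> str:
    # Stub for the Whisper voice-to-text call; returns the spoken problem.
    return "Relocate the battalion motor pool before the June exercise."

def parse_task(transcript: str) -> dict:
    # Stub for the task parser; in the MVP this would prompt a smaller
    # Claude model to extract mission, constraints, and deadline.
    return {"id": str(uuid.uuid4()), "mission": transcript, "phase": 1}

task = parse_task(transcribe(b"<audio bytes>"))
record = json.dumps(task)  # JSON backend: one document per decision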

Success Metric: Conduct 10 live decisions with cadets or military students. Measure decision quality vs. baseline.

Estimated Cost: $200K (engineering + API costs)

Phase 2: Multi-Model Ensemble (Month 4–6, July–September 2026)

Deliverable: Multi-agent platform with 3–4 model ensemble, full phases 1–7, JSON backend

Scope:

  • Integrate additional models (Gemini 3, GPT-4o, and others)
  • Implement role-bounded agent architecture
  • Phase 7: Execution with Scribe
  • Lessons Learned Registry (auto-extracted)
  • Mobile app (React Native)
  • Basic analytics dashboard
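
The Lessons Learned Registry can be sketched as an append-only store with tagged recall. All names here (`extract_lesson`, `record_decision`, `recall`) are hypothetical; in Phase 2 the extraction step would be a model call rather than the stub shown.

```python
REGISTRY: list[dict] = []  # in production, a per-organization JSON store

def extract_lesson(decision: dict) -> dict:
    # Stub: a real implementation would ask a model to summarize what the
    # ensemble caught or missed, then tag the lesson for later retrieval.
    return {"decision": decision["id"], "lesson": decision["outcome"],
            "tags": decision.get("tags", [])}

def record_decision(decision: dict) -> None:
    # Auto-extract a lesson at the close of every decision cycle.
    REGISTRY.append(extract_lesson(decision))

def recall(tag: str) -> list[dict]:
    # Surfaced to the agents at the start of the next decision cycle.
    return [entry for entry in REGISTRY if tag in entry["tags"]]

record_decision({"id": "d-001",
                 "outcome": "Logistics timeline was optimistic",
                 "tags": ["logistics"]})
```

The recall step is what makes the registry collaborative learning rather than an archive: each new decision starts with the relevant lessons already in context.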

Success Metric: Tier 2 private beta with 5 enterprise customers. Measure decision velocity and user satisfaction.

Estimated Cost: $500K (multi-model integration + mobile + database)

Phase 3: Enterprise Deployment (Month 7–12, October 2026–March 2027)

Deliverable: Tier 3 enterprise platform with on-premise option, custom templates, outcome tracking

Scope:

  • On-premise Kubernetes deployment
  • Security compliance assessment
  • Custom agent roles per organization
  • Post-decision outcome tracking
  • Integration with enterprise tools
  • Executive reporting and decision analytics

Success Metric: Tier 3 pilot with specialized command. Measure decision outcome improvement vs. baseline.

Estimated Cost: $1.5M (security, compliance, custom integrations)

Phase 4: Advanced Capabilities (Month 13–24, April 2027–March 2028)

Deliverable: Extension for tactical/operational planning (decision support for human commanders)

Scope:

  • Integration with intelligence feeds
  • Real-time threat assessment
  • Distributed decision authority models
  • Live exercise integration

Success Metric: Field exercise with a live decision loop demonstrating a 3–4x faster decision cycle.

Estimated Cost: $2M+ (R&D)


What: New frontier models emerge with different APIs, capabilities, or costs. Platform becomes obsolete.

Mitigation: API abstraction layer makes model swapping a configuration change, not code rewrite. The MDMP structure is model-agnostic.
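
This mitigation is the classic "program to an interface, not an implementation" pattern (footnote 5). A minimal sketch, with hypothetical adapter classes standing in for real provider SDKs:

```python
from typing import Protocol

class ModelAdapter(Protocol):
    # The one interface every provider must satisfy.
    def complete(self, prompt: str) -> str: ...

class ClaudeAdapter:
    def complete(self, prompt: str) -> str:
        return "claude: " + prompt  # stub for the Anthropic API call

class GeminiAdapter:
    def complete(self, prompt: str) -> str:
        return "gemini: " + prompt  # stub for the Google API call

# Swapping a model is a configuration change, not a code rewrite:
CONFIG = {"Commander": "claude", "Intelligence": "gemini"}
ADAPTERS = {"claude": ClaudeAdapter(), "gemini": GeminiAdapter()}

def agent_for(role: str) -> ModelAdapter:
    return ADAPTERS[CONFIG[role]]

commander_view = agent_for("Commander").complete("Frame the decision")
```

When a new frontier model appears, it gets one new adapter class and one new entry in `CONFIG`; the MDMP orchestration code never changes.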

What: Platform integrates with existing systems but integration is delayed or technically infeasible.

Mitigation: Partnership with integration specialist early. Build API contracts before development starts.

What: Decisions contain sensitive information. Deployment to government requires security hardening beyond Tier 2 scope.

Mitigation: Tier 3 assumes on-premise deployment from day one. Customer owns all data. Encryption at rest.

What: Users reject the structured approach as “too rigid” or “slowing down decisions.”

Mitigation: User training emphasizes that structure accelerates decisions by 3–4x (Paper 6 thesis). Demonstrate with live data from early pilots.

What: Querying 4 models simultaneously is expensive. Customers balk.

Mitigation: Tier 1 defaults to single-model to keep costs low. Tier 2 makes multi-model cost transparent and optional. Tier 3 budgets appropriately.
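
The cost tradeoff can be made concrete. The Whisper price below comes from footnote 12; the token count and per-token price are illustrative assumptions, not published figures.

```python
WHISPER_PER_MIN = 0.006  # published Whisper API price (footnote 12)

def decision_cost(audio_minutes: float, tokens_per_model: int,
                  price_per_1k_tokens: float, n_models: int) -> float:
    # Token count and token price are illustrative inputs.
    transcription = audio_minutes * WHISPER_PER_MIN
    inference = n_models * (tokens_per_model / 1000) * price_per_1k_tokens
    return round(transcription + inference, 4)

solo = decision_cost(5, 8_000, 0.015, n_models=1)      # Tier 1 default
ensemble = decision_cost(5, 8_000, 0.015, n_models=4)  # full staff ensemble
```

Transcription is a rounding error next to inference, and the ensemble multiplies only the inference term, which is why making multi-model cost transparent and optional is a per-tier decision rather than a platform-wide one.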


Paper 6 proved that doctrine-structured multi-model ensembles produce better strategic decisions than a single AI model. This paper translates that proof-of-concept into a platform specification that can scale from a cadet learning decision-making to an enterprise planning cell deciding on a multi-billion-dollar program.

The platform is not generic. It is doctrine-first. The MDMP structure is the constraint that makes coordination work. Every architectural decision flows from this principle.

The competitive advantage is not model capability — that is a commodity that every major lab will match. The advantage is structure. A team that runs its decisions through MDMP (with AI assistance) will make better decisions than a team running the same decision through a generic chatbot. The platform operationalizes this advantage.

The government pathway is clear. DARPA CLARA explicitly seeks this type of system. Specialized commands are actively seeking advanced decision support. The market exists. The need is documented.

Paper 8 later names this pattern the Toboggan Doctrine — gravity-fed governance where the agent takes a ride on the reverse-entropy information enricher slide, becoming a factory worker pushing templates around the work area rather than navigating every decision from scratch. The MDMP platform described here is the planning leg of that slide: templates pre-encode the seven steps, the agent rides the channel, and each cycle feeds lessons back into the template.

The remaining question is execution: can we build it fast enough to capture the window?



Canonical source: herding-cats.ai/papers/paper-7-mdmp-platform-blueprint/ · Series tag: HCAI-d641c4-P7

This paper: Paper 7 of 10
Previous: Paper 6: When the Cats Take the Same Test
Next: Paper 8: The Toboggan Doctrine
  1. UC Berkeley EECS-2025-164: “From Local Coordination to System-Level Strategies: Designing Reliable, Societal-Scale Multi-Agent Autonomy Across Scales,” Victoria Tuck, 2025. Identified failure modes across multiple categories in multi-agent systems with failure rates 41-86.7%. Available at: https://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-164.html

  2. “Towards a Science of Scaling Agent Systems,” Yubin Kim, Ken Gu, et al. (Google Research, MIT, Google DeepMind), 2025. arXiv:2512.08296. Found multi-agent systems degrade sequential task performance by 39-70% while improving parallel task performance by 80.9%. Available at: https://arxiv.org/abs/2512.08296

  3. The “doctrine first, software second” principle prioritizes organizational structure and decision-making processes over technical implementation. This inversion makes structure the constraint that governs software architecture.

  4. Whisper V3 achieves Word Error Rate (WER) of approximately 2% (~98% accuracy) on speech segments 8 seconds or longer under standard acoustic conditions. Performance varies by language, accent, and acoustic environment. Source: OpenAI Whisper documentation and independent benchmarks.

  5. Gamma, E., Helm, R., Johnson, R., & Vlissides, J. (1994). Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley. The principle “program to an interface, not an implementation” (p. 18) is the foundational design pattern enabling the slot-based pluggable agent architecture.

  6. Role isolation where Intelligence, Operations, and Analyst each generate analysis independently before synthesis prevents groupthink and preserves analytical diversity critical to adversarial thinking and blind spot detection.

  7. Cemri, M., et al. “Multi-Agent Systems Failure Taxonomy (MAST).” UC Berkeley EECS-2025-164, NeurIPS 2025 Spotlight. Identifies “error propagation in sequential multi-agent workflows” as a primary failure mode class. The circuit breaker pattern addresses this by isolating failed API dependencies before their failures cascade downstream.

  8. Rolled Throughput Yield (RTY) is a Lean Six Sigma metric: RTY = Π(FPY_i) across n process steps. For n=7 phases at 90% FPY each: RTY = 0.9⁷ = 47.8%. Source: Pyzdek, T., & Keller, P. (2014). The Six Sigma Handbook, 4th ed. McGraw-Hill. The force-advance mechanism trades RTY for decision velocity under time pressure — a deliberate operational tradeoff, not a quality failure.

  9. JSON-first backend design maintains technology independence: future migration to PostgreSQL or other databases remains a software problem, not an architecture problem, enabling graceful scaling without core redesign.

  10. Statistical Process Control (SPC) control charts — Shewhart X̄ and R charts, CUSUM, EWMA — distinguish common cause variation (within ±3σ of the process mean) from special cause variation (outside control limits or exhibiting non-random patterns). Source: Montgomery, D.C. (2020). Introduction to Statistical Quality Control, 8th ed. Wiley. Applied to decision quality metrics, control limits define what “normal analytical performance” looks like and flag when investigation is warranted.

  11. Integration failure rates for mid-market acquisitions (target valuations $20M-$500M) range from 40% to 60%, with common failure modes including cultural misalignment, technical debt incompatibility, and talent retention failure.

  12. OpenAI Whisper API transcription cost is $0.006 per minute of audio processed. Source: OpenAI API pricing, current as of March 2026.

  13. DARPA’s Collaborative Learning for Resilient AI (CLARA) program solicits AI systems that improve decision-making under uncertainty through multi-model coordination. Source: DARPA Programs. Available at: https://www.darpa.mil/research/programs/clara

  14. Specialized commands are actively exploring advanced decision support and adaptive planning architectures. The Department of War 2026 initiatives include funding for agentic AI experimentation. Formal partnership negotiations are pending.