1. Introduction
AI products are systems that exist at the intersection of human behavior, data constraints, model performance, governance, and business value. Building them well requires analytical discipline, architectural fluency, and a capacity to translate ambiguity into a sequence of testable decisions.
This document outlines, at a very high level, the end-to-end method I use whenever I am presented with a vague business problem and tasked with creating a solution, whether it involves Agentic AI, LLMs, classical ML, analytics, or traditional software. It reflects years of work across enterprises with varying levels of AI maturity, startups, and small- and medium-sized businesses. It's designed to show not only what I do, but how I think.
2. Phase One: Clarify and Frame the Problem
Before all else, I ask: What is the exact business problem to be solved? Framing the business problem and need clearly and accurately is the most critical part of the entire AI product development process. If this framing is wrong, the whole project will be off-base.
The reason is simple: every subsequent decision flows from the problem frame. If you misdiagnose the problem, you'll build the wrong solution, and no amount of technical excellence downstream will fix it. You can have a perfectly architected RAG pipeline, flawless evals, and robust monitoring, but if you're solving the wrong problem, it's all wasted effort.
2.1 Establish the Real Goal, Not the Stated Request
Every engagement begins with separating symptoms from causes. Stakeholders often present solutions disguised as problems ("We need an AI chatbot"), so I guide the conversation toward:
- What decision or behavior actually needs to change
- Who is affected, and why it matters now
- What failure looks like today
- What constraints define the solution space
I distill this into a crystal-clear, one-sentence problem hypothesis that becomes the anchor for all later decisions.
2.2 Validate Who the User Actually Is
AI is often inserted into workflows without accounting for human variation. I map:
- Primary users: their personas, contexts, and journeys
- Proxy users: their personas, contexts, and journeys
- Affected partners (legal, compliance, operations)
- Hidden constraints (skills, incentives, risk tolerance)
Clarity here prevents significant downstream rework.
2.3 Quantify the Cost of the Problem
The question is always "How expensive is the status quo?" I measure:
- Time lost
- Error rates
- Operational cost
- Revenue leakage
- Compliance risk
- Customer experience degradation
This creates a value envelope for the solution: what the AI must justify.
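To make that envelope concrete, I often rough out the annualized cost of the status quo in a few lines. A minimal sketch in Python; every figure below is a hypothetical placeholder to be replaced with measured values:

```python
# Rough annualized cost of the status quo; all figures are hypothetical
# placeholders to be replaced with measured values from the engagement.
HOURS_LOST_PER_WEEK = 120        # manual effort across the affected team
LOADED_HOURLY_RATE = 65.0        # fully loaded cost per hour (USD)
ERRORS_PER_MONTH = 40            # defects reaching customers or downstream systems
COST_PER_ERROR = 250.0           # rework, credits, escalation handling
ANNUAL_REVENUE_LEAKAGE = 90_000  # estimated revenue lost to the problem

labor_cost = HOURS_LOST_PER_WEEK * LOADED_HOURLY_RATE * 52
error_cost = ERRORS_PER_MONTH * COST_PER_ERROR * 12
status_quo_cost = labor_cost + error_cost + ANNUAL_REVENUE_LEAKAGE

print(f"Annualized cost of the status quo: ${status_quo_cost:,.0f}")
# This figure is the value envelope: the solution's build and run cost
# must sit comfortably inside it.
```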
3. Phase Two: Diagnose Feasibility
3.1 Data Readiness Assessment
Before AI is even discussed, I evaluate:
- Whether sufficient labeled or unlabeled data exists
- Data accessibility (APIs, lakes, schemas, silos)
- Data quality and semantic consistency
- Gaps requiring data generation or synthetic augmentation
- Privacy and regulatory constraints
This determines whether the solution is feasible with AI, or whether traditional software or process redesign is more appropriate.
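Much of this assessment is interviews and judgment, but the data-quality portion can start with a quick automated profile. A minimal pandas sketch, assuming the candidate dataset fits in memory; the file path and column names in the usage comment are hypothetical:

```python
import pandas as pd

def profile_readiness(df: pd.DataFrame, required_cols: list[str]) -> dict:
    """Quick first-pass data readiness profile: coverage, nulls, duplicates."""
    missing_cols = [c for c in required_cols if c not in df.columns]
    null_rates = df.isna().mean().sort_values(ascending=False)
    return {
        "rows": len(df),
        "missing_required_columns": missing_cols,
        "duplicate_row_rate": float(df.duplicated().mean()),
        "worst_null_rates": null_rates.head(5).to_dict(),
    }

# Usage (path and column names are hypothetical):
# df = pd.read_parquet("claims_2023.parquet")
# print(profile_readiness(df, required_cols=["claim_id", "description", "outcome"]))
```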
3.2 Technical Fit: LLM, Agentic AI, RAG, ML, Low-Code, or Software?
This is highly simplified, but my essential decision tree is:
3.2.1 Foundation LLM vs. LLM + RAG
I evaluate whether the task requires:
- World knowledge (LLM-only)
- Enterprise-specific knowledge (LLM + RAG)
- Factual precision requiring document-grounded retrieval
- Traceability for regulated workflows
RAG is selected when the model must remain tightly anchored to proprietary data with auditable provenance.
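To illustrate what auditable provenance means in practice, here is a deliberately simplified sketch: the retriever is a toy keyword-overlap scorer standing in for a real embedding index, and the key point is that every retrieved chunk carries a source identifier that is threaded into the prompt and logged with the answer.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    source_id: str   # document / section identifier used for audit trails
    text: str

def retrieve(query: str, corpus: list[Chunk], k: int = 3) -> list[Chunk]:
    """Toy retriever: rank chunks by keyword overlap with the query.
    A production system would use an embedding index instead."""
    q_tokens = set(query.lower().split())
    scored = [(len(q_tokens & set(c.text.lower().split())), c) for c in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored[:k] if score > 0]

def build_grounded_prompt(query: str, chunks: list[Chunk]) -> str:
    """Assemble a prompt that forces the model to answer only from cited context."""
    context = "\n".join(f"[{c.source_id}] {c.text}" for c in chunks)
    return (
        "Answer using only the context below. Cite source IDs in brackets.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

corpus = [
    Chunk("policy-7.2", "Refunds over $500 require manager approval."),
    Chunk("faq-12", "Standard refunds are processed within five business days."),
]
chunks = retrieve("How long do refunds take?", corpus)
prompt = build_grounded_prompt("How long do refunds take?", chunks)
# `prompt` is sent to the LLM; the retrieved source_ids are logged with the
# response so every answer can be traced back to the documents that grounded it.
```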
3.2.2 Agentic AI
I evaluate whether the task requires:
- Multi-step reasoning
- Tool use (APIs, search, calculators)
- Conditional branching
- Autonomous workflows
If so, an agentic architecture (single agent or orchestrated multi-agent) may be the correct pattern. I ensure safety boundaries: maximum thinking depth, tool restrictions, and deterministic checkpoints.
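A minimal sketch of those safety boundaries, assuming hypothetical `call_model`, `call_tool`, and `request_approval` callables injected for the real model, tool layer, and review step: the loop caps reasoning depth, rejects tools outside an explicit allowlist, and routes high-impact actions through a deterministic human checkpoint.

```python
MAX_STEPS = 6                                   # maximum thinking / tool-use depth
ALLOWED_TOOLS = {"search_kb", "get_order", "calculator"}
CHECKPOINT_TOOLS = {"issue_refund"}             # deterministic human-approval gate

def run_agent(task: str, call_model, call_tool, request_approval) -> str:
    """Bounded agent loop. The three callables are placeholders for the
    real model, tool executor, and human-review layer."""
    history = [{"role": "user", "content": task}]
    for step in range(MAX_STEPS):
        action = call_model(history)            # -> {"type": "tool" | "final", ...}
        if action["type"] == "final":
            return action["content"]
        tool = action["tool"]
        if tool not in ALLOWED_TOOLS | CHECKPOINT_TOOLS:
            history.append({"role": "system", "content": f"Tool '{tool}' is not permitted."})
            continue
        if tool in CHECKPOINT_TOOLS and not request_approval(tool, action["args"]):
            return "Stopped: action requires human approval that was not granted."
        result = call_tool(tool, action["args"])
        history.append({"role": "tool", "content": str(result)})
    return "Stopped: maximum reasoning depth reached without a final answer."
```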
3.2.3 Low-Code or No-Code Automation
If the solution is primarily workflow automation, notifications, integrations between SaaS systems, or routing of structured events, then a combination of platforms like n8n, Make, Zapier Enterprise, or Airplane may deliver faster, cheaper value than custom engineering or data science involvement.
3.2.4 Classical ML vs. Software
Classical ML remains the best fit for prediction, ranking, scoring, classification, and anomaly detection. Traditional software is selected when processes are rule-bound and deterministic, or when regulatory tolerance for probabilistic outcomes is low.
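When the task is plain prediction or classification, a scikit-learn baseline is often the quickest "simplest technology" test: if a logistic regression on existing features gets close to the target metric, heavier approaches are probably unnecessary. A minimal sketch, assuming a tabular feature matrix `X` and binary labels `y` are already available:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

def baseline_classifier(X, y):
    """Train and report a simple baseline before considering anything heavier."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    print(classification_report(y_test, model.predict(X_test)))
    return model
```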
I choose the simplest technology that reliably achieves the desired outcome. Complexity is never the goal.
3.3 Build vs. Buy Evaluation
3.3.1 The Decision Logic
I recommend building when the capability:
- Is core to competitive differentiation
- Requires deep customization or domain-specific tuning
- Must integrate tightly with internal systems
- Carries compliance or auditability obligations vendors cannot meet
- Has long-term reuse across multiple lines of business
I recommend buying when:
- Time to value is critical
- The capability is commoditized (summaries, extraction, embeddings)
- Internal engineering or DS capacity is limited
- Vendor SLAs and governance controls meet enterprise needs
- Total cost of ownership is lower when amortized over time
3.3.2 Simplified 5-Criteria Decision Matrix
| Criterion | Build Favors | Buy Favors |
|---|---|---|
| Strategic Differentiation | Proprietary workflows, domain specificity | Commodity capabilities |
| Time to Value | Long runway allowed | Immediate delivery required |
| Integration Complexity | Deep integration with legacy systems | API-level integration sufficient |
| Data Sensitivity & Governance | Strict auditability, on-prem needs | Vendor meets compliance controls |
| Total Cost of Ownership | Reuse across products justifies investment | Vendor amortizes R&D cost |
4. Phase Three: Frame the Solution
4.1 Co-Create a Shared Vision with Engineering and Data Science
Before writing anything formal, I conduct short technical framing sessions to align on:
- What we are building
- What we are not building
- What assumptions we are making
- Where uncertainty still exists
- What early experiments are possible
This reduces friction and creates psychological ownership across teams.
4.2 Architect the System Conceptually
For AI products, the architecture defines the product's behavior. I outline:
- Foundation model options
- Retrieval pipeline and vector database strategy
- Agent orchestration patterns (if applicable)
- Fine-tuning or prompt-engineering requirements
- Integration points with upstream and downstream systems
- Human-in-the-loop pathways
- Guardrails and safety layers (rate limiting, structured output, schema enforcement, deterministic checkpoints)
- Evaluation and monitoring loops
- Logging and traceability requirements
This architecture becomes the conceptual blueprint that anchors the PRD and accelerates feasibility discussion.
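One guardrail worth showing concretely is schema enforcement on model output. A minimal sketch using Pydantic (v2 API), assuming the model has been instructed to return JSON matching the schema; anything that fails validation is rejected or retried rather than passed downstream. The schema fields themselves are illustrative.

```python
from pydantic import BaseModel, Field, ValidationError

class TriageDecision(BaseModel):
    """Contract the LLM's JSON output must satisfy before downstream use."""
    category: str = Field(pattern="^(billing|technical|account|other)$")
    confidence: float = Field(ge=0.0, le=1.0)
    summary: str = Field(max_length=500)

def parse_or_reject(raw_output: str) -> TriageDecision | None:
    """Validate raw model output against the schema; reject (or retry) on failure."""
    try:
        return TriageDecision.model_validate_json(raw_output)
    except ValidationError as err:
        print(f"Schema violation, routing to fallback: {err.error_count()} errors")
        return None
```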
4.3 Decide What "Good" Looks Like
Reliable measurement is paramount to ongoing success. I define both user-centric and system-centric success metrics (a simple release-gate sketch follows this list):
- Accuracy thresholds
- Drift tolerance
- Feedback loop performance
- Latency expectations
- Reliability SLAs
- Reduction in operational effort
- Expected ROI over time
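A minimal sketch of how those definitions become an automated release gate; every threshold below is a hypothetical placeholder that would be agreed with stakeholders during framing:

```python
from dataclasses import dataclass

@dataclass
class SuccessCriteria:
    """Release gate derived from the agreed definition of 'good'.
    All thresholds are hypothetical placeholders."""
    min_accuracy: float = 0.90
    max_p95_latency_ms: float = 2500
    max_cost_per_request_usd: float = 0.03
    min_weekly_active_users: int = 50

    def passes(self, measured: dict) -> bool:
        return (
            measured["accuracy"] >= self.min_accuracy
            and measured["p95_latency_ms"] <= self.max_p95_latency_ms
            and measured["cost_per_request_usd"] <= self.max_cost_per_request_usd
            and measured["weekly_active_users"] >= self.min_weekly_active_users
        )
```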
5. Phase Four: Experiment and De-Risk
5.1 Build Fast, Instrumented Experiments
Before committing to a full PRD, I create 1 to 3 high-leverage experiments:
- A prompt-engineering sandbox
- A retrieval pipeline prototype
- A small labeled test set for early evals
- Synthetic data probes
- Error-mode analysis
- LLM-as-Judge evaluations to scale early testing when human annotation is too slow
These experiments validate the most fragile assumptions early.
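The LLM-as-Judge item above deserves a concrete illustration. A minimal sketch, assuming a hypothetical `judge_llm` callable that returns the judge model's raw text; the rubric and 1-to-5 scale are illustrative, and a slice of judged items is still spot-checked by humans to calibrate the judge.

```python
JUDGE_RUBRIC = """You are grading an assistant's answer.
Question: {question}
Reference notes: {reference}
Answer: {answer}
Score 1-5 for factual accuracy and completeness. Reply with only the number."""

def judge_answers(test_set: list[dict], judge_llm) -> float:
    """Score each (question, reference, answer) item with a judge model and
    return the mean score. `judge_llm` is a placeholder for a real model call."""
    scores = []
    for item in test_set:
        prompt = JUDGE_RUBRIC.format(**item)
        reply = judge_llm(prompt).strip()
        try:
            scores.append(min(5, max(1, int(reply))))
        except ValueError:
            scores.append(1)   # an unparseable judgment counts as a failure
    return sum(scores) / len(scores)
```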
5.2 Evaluate Against Reality, Not Hope
I test for:
- Failure patterns
- Hallucination types
- Robustness across edge cases
- Misalignment with user expectations
- Cost-performance ratios
- Latency risks
If the idea fails here, we pivot, saving months of downstream effort.
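Two of these, cost-performance ratios and latency risks, are cheap to measure early. A minimal sketch that wraps any model call and summarizes latency and estimated spend; `call_fn` is a placeholder returning the response text plus token counts, and the default per-token prices are illustrative, not any vendor's real pricing.

```python
import time
import statistics

def measure(call_fn, prompts, usd_per_1k_input=0.003, usd_per_1k_output=0.015):
    """Run `call_fn` over a prompt set and summarize latency and estimated cost.
    `call_fn` is a placeholder returning (text, input_tokens, output_tokens)."""
    latencies, cost = [], 0.0
    for p in prompts:
        start = time.perf_counter()
        _, tokens_in, tokens_out = call_fn(p)
        latencies.append(time.perf_counter() - start)
        cost += tokens_in / 1000 * usd_per_1k_input + tokens_out / 1000 * usd_per_1k_output
    return {
        "p50_latency_s": statistics.median(latencies),
        "max_latency_s": max(latencies),
        "est_cost_per_call_usd": cost / len(prompts),
    }
```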
5.3 Present a Feasibility Assessment to Stakeholders
I summarize what is viable, what is risky, what must change, and what the architecture will require to scale. This is the moment when a stakeholder begins to "see" the product.
6. Phase Five: Formalize the PRD
6.1 Write Requirements That Bridge Worlds
A strong AI PRD translates ambiguity into structured intention. My PRDs include, at a very high level (please see actual PRDs at bensweet.ai for more details):
- Business context
- Problem framing
- Product vision
- Goals, success metrics, and KPIs
- User persona and user journeys
- Architecture overview (I create full architecture specs with the engineering team and append them)
- Data and model requirements/considerations
- RAG or agent design (if applicable)
- Acceptance criteria and test cases
- Evaluation methodology
- Monitoring and fallback design
- Nonfunctional requirements (latency, security, scalability)
- Benefits/impacts to non-technical groups in plain English
- Release plan and effort estimates
The PRD becomes a shared reference point rather than a static artifact.
6.2 Sequence Delivery Through Iterative Risk Reduction
I design the roadmap in ascending order of uncertainty:
- Baselines
- Retrieval and data pipelines
- Model selection or tuning
- Interaction design
- Monitoring instrumentation
- Closed-beta testing
- Progressive rollout
This ensures predictable progress even in uncertain domains.
7. Phase Six: Partner for Implementation
7.1 Work as a Translational Layer
I ensure alignment across:
- Engineering
- Data Science
- Architecture
- Security
- Compliance
- Operations
- Executive sponsors
I make decisions transparent and reduce ambiguity that slows execution.
7.2 Maintain a Live Model of System Behavior
Small deviations early become systemic failures later. I:
- Watch real-time logs
- Analyze model errors
- Monitor drift
- Refine prompts or tuning based on empirical feedback
- Ensure continuous alignment with user intent
The solution becomes a living system, one we observe, not merely deploy.
8. Phase Seven: Measure and Scale
8.1 Validate Impact Quantitatively and Qualitatively
I measure:
- Whether users actually adopt the product
- Whether the model behaves reliably under real conditions
- Whether operational time decreases
- Whether error rates recede
- Whether costs remain within target ranges
This is where AI stops being academic and becomes economic.
8.2 Create Feedback Loops for Continuous Improvement
Every AI system experiences data and concept drift (the input distribution and the input-outcome relationship change over time) and behavioral drift (the model's actions change as prompts, models, or tools are updated). To maintain performance, I implement:
- Automated eval sets
- Human-in-the-loop review cycles
- Structured feedback collection
- Regression tests for prompt or model changes
- Telemetry and monitoring dashboards
Continuous tuning sustains real value.
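One item above, regression tests for prompt or model changes, is the easiest to neglect, so here is a minimal pytest-style sketch. It assumes a hypothetical `run_pipeline` entry point and `score_fn` scorer supplied as fixtures, plus a frozen eval set at an illustrative path; any change that drops quality below the recorded baseline fails the build.

```python
import json

BASELINE_SCORE = 0.87                           # score recorded for the shipped prompt/model
EVAL_SET_PATH = "evals/frozen_eval_set.jsonl"   # hypothetical frozen test set

def test_prompt_change_does_not_regress(run_pipeline, score_fn):
    """Re-run the frozen eval set against the candidate prompt/model and
    fail if quality drops below the shipped baseline."""
    with open(EVAL_SET_PATH) as f:
        eval_set = [json.loads(line) for line in f]
    outputs = [run_pipeline(item["input"]) for item in eval_set]
    score = score_fn(outputs, [item["expected"] for item in eval_set])
    assert score >= BASELINE_SCORE, f"Regression: {score:.3f} < baseline {BASELINE_SCORE}"
```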
8.3 Scale Horizontally Across Use Cases
If the product demonstrates stable value, I extend its architecture:
- Additional workflows
- New user groups
- Expanded retrieval corpora
- Additional agents
- Automated training data pipelines
Scaling is planned, not accidental.
9. My Guiding Principles
- Simplicity first, AI second.
- Every model is guilty until proven reliable.
- Architectures must be explainable to non-technical audiences.
- Human users determine whether the model is valuable, not metrics alone.
- AI product management is the craft of turning uncertainty into a sequence of learning loops.
- A PM's job is to make better decisions possible, not to win arguments.
10. Conclusion
This process—diagnose, frame, experiment, design, deliver, measure, and scale—is how I navigate the intersection of human needs, engineering constraints, and model behavior, and how I consistently transform ambiguous business challenges into workable, trustworthy, economically sound AI products.