Cup of Wit

The AI Agent Promised to Do Your Work. Here's Why It Didn't.

Core Idea: Agentic AI — AI that doesn't just answer questions but actually does things autonomously — was the biggest promise of 2025 and 2026. Book your meetings. Research your competitors. Write and send the report. Most people who tried it got burned. Not because the technology is fake, but because nobody told them the four specific conditions that have to be true before an agent can work reliably. This article names those conditions clearly, validates the frustration, and gives readers a practical test before they trust an agent with anything that matters.



📰 Working Title Options

  • The AI Agent Promised to Do Your Work. Here's Why It Didn't.

Draft opening:

You were told it would handle it. So you tried it. And somewhere in the middle of what was supposed to be an autonomous workflow, something went sideways. The agent booked the wrong time slot. Pulled data from the wrong source. Sent a half-finished email. Or, perhaps worst of all, it confidently completed every step and delivered an output that was wrong in ways you didn't catch until it mattered.


Section 1: What an AI Agent Actually Is (In Plain Terms)

Why this section matters: Half the frustration with agents comes from a mismatch between what people expected and what the technology actually does. Clear this up fast — don't talk down to readers, just be precise.

The simple version:

A regular AI interaction is a conversation. You ask. It answers. You decide what to do with it.

An AI agent is different. It doesn't just respond — it acts. It can:

  • Use tools (search the web, run code, read files, send emails)
  • Take a sequence of steps without asking you after each one
  • Make decisions along the way based on what it finds

That's genuinely powerful. It's also where things go wrong.

The core tension:

Every decision point in a multi-step workflow is a place where the agent can misinterpret, assume, or simply get it wrong — and then build the next step on top of that error. By the time the output reaches you, you're looking at the end result of a chain of small mistakes, not a single obvious failure.

A human doing the same task would pause, notice something felt off, and ask. An agent doesn't pause. It proceeds.
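
To put rough numbers on that compounding, here is a minimal sketch in Python. It assumes every step succeeds independently and with the same probability, which real workflows won't match exactly, and the 95% per-step figure is an illustrative assumption, not a measured rate.

```python
def chain_success_probability(per_step_success: float, steps: int) -> float:
    """Chance that every step in a chain succeeds, assuming independent steps."""
    if not 0.0 <= per_step_success <= 1.0:
        raise ValueError("per_step_success must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must not be negative")
    return per_step_success ** steps


if __name__ == "__main__":
    # 0.95 is an illustrative assumption, not a benchmark result.
    for steps in (1, 5, 10, 20):
        chance = chain_success_probability(0.95, steps)
        print(f"{steps:>2} steps at 95% each: {chance:.0%} chance of a clean run")
```

Under those assumptions, a ten-step workflow finishes cleanly only about 60% of the time, even though each individual step looks reliable.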


Section 2: The 4 Real Reasons Your Agent Failed

This is the core of the article. Make it specific — not vague critique, but named, recognisable failure modes.


Reason 1: The task wasn't actually well-defined — it just felt like it was

The most common failure. You gave the agent a clear instruction. What you didn't give it was every decision it would need to make inside that instruction.

❌ "Research our top three competitors and summarise their positioning."

Seems clear. But the agent now has to decide:

  • Which three competitors? By revenue? By market presence? By your perception?
  • What counts as "positioning"? Pricing? Messaging? Product features?
  • Which sources are authoritative? Their website? Press coverage? LinkedIn?
  • How long is a summary? Three sentences? Three pages?
  • What format? Bullet points? Prose? A comparison table?

Every ambiguity is a decision the agent makes without you. And it will make those decisions confidently, invisibly, and move on.

The test: Before giving a task to an agent, ask yourself — if I handed this to a new hire on their first day, what would they have to guess? Those guesses are where the agent will fail.


Reason 2: The agent had too much autonomy for the stakes involved

Agents exist on a spectrum. At one end: the agent suggests, you approve each step. At the other: the agent acts end-to-end with no checkpoints.

Most people set up agents closer to the second end, because that's the version that was advertised. Full autonomy. No interruptions.

But Deloitte's 2026 research is explicit on this point: successful agentic implementations define graduated autonomy — more human oversight for higher-stakes steps, less for routine ones. You don't give a new employee full signing authority on day one because they seem capable.

The test: For every step in your agent's workflow, ask — if this step goes wrong, what's the blast radius? High blast radius = needs a checkpoint. Low blast radius = let it run.
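
One way to picture graduated autonomy is as a gate in front of the risky steps. The sketch below is a hypothetical illustration, not any particular agent framework's API: `Step`, `BlastRadius`, `run_workflow`, and the `approve` callback are all names invented for this example. Low-blast-radius steps run straight through; high-blast-radius steps wait for a human yes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class BlastRadius(Enum):
    LOW = "low"    # easy to undo, nobody outside sees it
    HIGH = "high"  # hard to undo, or visible to clients


@dataclass
class Step:
    name: str
    blast_radius: BlastRadius
    action: Callable[[], str]  # hypothetical stand-in for a real tool call


def run_workflow(steps: list[Step], approve: Callable[[Step], bool]) -> list[str]:
    """Run steps in order, pausing for approval before any high-stakes step."""
    log: list[str] = []
    for step in steps:
        if step.blast_radius is BlastRadius.HIGH and not approve(step):
            log.append(f"STOPPED before '{step.name}': approval denied")
            break  # nothing after a denied checkpoint runs
        log.append(f"{step.name}: {step.action()}")
    return log


if __name__ == "__main__":
    workflow = [
        Step("draft email", BlastRadius.LOW, lambda: "draft saved"),
        Step("send email", BlastRadius.HIGH, lambda: "sent"),
    ]
    # In real use, approve would ask a person; here it always says no.
    for line in run_workflow(workflow, approve=lambda step: False):
        print(line)
```

The design choice worth noticing: the checkpoint is attached to the step, not to the agent. The same agent can run unattended on the draft and still be stopped at the send.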


Reason 3: The agent had bad inputs and didn't know it

Agents can only work with what they're given. If the data is incomplete, outdated, inconsistently formatted, or missing key context — the agent won't flag it. It will use what it has and produce a confident output based on flawed foundations.

This is the failure mode that catches people most off guard, because the output looks complete. All the fields are filled. The report has all its sections. The email is grammatically correct. But the source material was wrong, and the agent had no way to know.

The test: Before deploying an agent on a task, ask — if a human looked at these inputs cold, would they have everything they need? If not, the agent won't either — it'll just hide the gap better.
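
Some of that cold look at the inputs can be automated before the agent ever starts. The following is a minimal sketch under stated assumptions: the inputs are a list of dictionaries, staleness is tracked in a field called `updated`, and the function name `find_input_gaps` is made up for this example. It only catches missing and outdated values; it cannot tell whether a value that is present is actually correct.

```python
from datetime import date, timedelta


def find_input_gaps(
    records: list[dict],
    required_fields: list[str],
    max_age_days: int,
    today: date,
) -> list[str]:
    """Return human-readable problems; an empty list means no gaps were found."""
    problems: list[str] = []
    cutoff = today - timedelta(days=max_age_days)
    for index, record in enumerate(records):
        # Flag fields that are absent or blank.
        for field in required_fields:
            if record.get(field) in (None, ""):
                problems.append(f"record {index}: missing '{field}'")
        # Flag records whose 'updated' date (an assumed field name) is too old.
        updated = record.get("updated")
        if isinstance(updated, date) and updated < cutoff:
            problems.append(
                f"record {index}: last updated {updated.isoformat()}, "
                f"older than {max_age_days} days"
            )
    return problems


if __name__ == "__main__":
    sample = [
        {"name": "Acme", "price": "", "updated": date(2026, 1, 1)},
        {"name": "Beta", "price": "10", "updated": date(2025, 1, 1)},
    ]
    for problem in find_input_gaps(sample, ["name", "price"], 90, date(2026, 1, 31)):
        print(problem)
```

If the list comes back non-empty, the task goes back to a human before it goes to the agent.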


Reason 4: Nobody defined what "done" looks like

Agents are optimised to complete tasks. They're not optimised to question whether the completion was actually good. Without explicit success criteria, an agent will finish the workflow and hand you an output — whether that output is excellent or subtly broken.

This is different from how a capable human works. A senior person completing a task uses judgment to evaluate the result before it leaves their hands. They ask: does this actually achieve what we needed? An agent doesn't ask that question unless you build it into the workflow.

The test: Can you write down, in one or two sentences, what a good outcome looks like for this task — before the agent starts? If you can't, the agent can't evaluate its own work either.


Section 3: Before & After

Scenario: You ask an agent to prepare a competitive summary ahead of a client meeting.

❌ How most people set it up:

"Research our top competitors and prepare a summary I can use before the client meeting."

What the agent does: Searches the web, finds some publicly available information, formats a tidy document. Looks complete. Feels useful.

What actually happens in the meeting: The client mentions a product launch from last month that the agent's sources missed. Two of the three "competitors" the agent identified aren't actually in this client's market. The framing is generic — nothing in the summary reflects what this specific client cares about.

✅ How it works when set up properly:

Task: Prepare a competitive summary for a meeting with [Client Name], a mid-market 
financial services firm evaluating vendors in the compliance automation space.

Competitors to research: [Competitor A], [Competitor B], [Competitor C]
— use only information published in the last 90 days
— prioritise: pricing signals, product announcements, customer reviews on G2

Format: One page. Three sections — one per competitor.
Each section: 3 bullets max. Focus on what has changed recently, not general positioning.

Done when: Each bullet cites a specific source with a date. 
Nothing older than 90 days. No generic positioning statements.

What the agent does: Exactly what you defined. Nothing more, nothing ambiguous.

What happens in the meeting: You walk in with current, relevant intelligence. The client asks about a recent announcement — it's already in your summary.

The difference isn't the agent. It's the brief.
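
Part of that brief's "Done when" clause is mechanical enough to check in code. Here is a rough sketch, assuming the agent's output has already been parsed into structured bullets; the `Bullet` class and `unmet_done_criteria` function are hypothetical names for this illustration. The "no generic positioning statements" rule is deliberately left out, because that one still needs a human read.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Bullet:
    text: str
    source: Optional[str]        # where the claim came from
    source_date: Optional[date]  # when that source was published


def unmet_done_criteria(
    sections: dict[str, list[Bullet]],
    today: date,
    max_bullets: int = 3,
    max_age_days: int = 90,
) -> list[str]:
    """List every way the summary falls short of the brief's 'Done when' rules."""
    failures: list[str] = []
    cutoff = today - timedelta(days=max_age_days)
    for competitor, bullets in sections.items():
        if len(bullets) > max_bullets:
            failures.append(f"{competitor}: {len(bullets)} bullets, limit is {max_bullets}")
        for number, bullet in enumerate(bullets, start=1):
            if not bullet.source or bullet.source_date is None:
                failures.append(f"{competitor} bullet {number}: no dated source")
            elif bullet.source_date < cutoff:
                failures.append(
                    f"{competitor} bullet {number}: source older than {max_age_days} days"
                )
    return failures
```

An empty list doesn't prove the summary is good. It only proves the checkable half of "done" was met, which is still more than the vague brief could offer.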


Section 4: The Agent Readiness Test — 5 Questions Before You Deploy

Give readers a practical tool they can use immediately.

Before giving any task to an AI agent, run it through these five questions:

Question | Why it matters
Can I describe every decision the agent will need to make? | Undescribed decisions become agent guesses
What's the worst thing that happens if one step goes wrong? | Determines how much oversight each step needs
Are the inputs complete, current, and unambiguous? | Garbage in, confident garbage out
Can I write down what good output looks like before it starts? | Without this, the agent has no quality bar
Who reviews the output before it goes anywhere that matters? | Agents complete tasks. Humans are accountable for outcomes.

Rule of thumb: If you can't answer all five, the task isn't ready for an agent. It's ready for you to think it through more carefully first.
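
For teams that want the rule of thumb as a literal gate, a small sketch follows. It treats each question as answered or not answered, which is a simplification, and `readiness_gaps` and `is_ready_for_agent` are hypothetical helper names rather than part of any tool.

```python
READINESS_QUESTIONS = (
    "Can I describe every decision the agent will need to make?",
    "What's the worst thing that happens if one step goes wrong?",
    "Are the inputs complete, current, and unambiguous?",
    "Can I write down what good output looks like before it starts?",
    "Who reviews the output before it goes anywhere that matters?",
)


def readiness_gaps(answered: dict[str, bool]) -> list[str]:
    """Return the questions without a concrete answer; missing entries count as gaps."""
    return [q for q in READINESS_QUESTIONS if not answered.get(q, False)]


def is_ready_for_agent(answered: dict[str, bool]) -> bool:
    """Apply the rule of thumb: all five answered, or the task isn't ready."""
    return not readiness_gaps(answered)


if __name__ == "__main__":
    # Example: everything answered except who reviews the output.
    answered = {q: True for q in READINESS_QUESTIONS[:4]}
    print("Ready:", is_ready_for_agent(answered))
    for question in readiness_gaps(answered):
        print("Still open:", question)
```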


Section 5: What Agents Are Actually Good For Right Now

Don't leave readers thinking agents are useless. Be honest and specific about where they work.

Agents work reliably when:

  • The task is repetitive and the steps don't change
  • The inputs are clean and structured (a spreadsheet, a defined template)
  • Each step is verifiable before the next one starts
  • The stakes of any single step failing are low or recoverable
  • You review the output before it touches anything real

Agents struggle when:

  • The task requires judgment about ambiguous situations
  • The inputs are inconsistent, incomplete, or context-dependent
  • Errors in one step cascade invisibly into the next
  • There's no checkpoint between action and consequence
  • "Done" is defined by human judgment, not a checklist

The honest positioning for 2026: Agents are a powerful tool for well-defined, structured, low-ambiguity work. They are not yet reliable deputies for complex, judgment-heavy, high-stakes tasks. The gap between those two categories is where most of the frustration lives.


Section 6: The Closing Reframe

End on truth, not reassurance.

The promise of AI agents wasn't wrong. Autonomous AI that handles multi-step work is coming, and some version of it is already here for the right tasks.

But the version that was sold to most people — set it up, step back, let it run — skipped over everything that makes autonomous work actually work: clear task definition, appropriate oversight, quality inputs, and explicit success criteria.

Those aren't AI problems. They're management problems. And the people who figure that out first will use agents effectively while everyone else is still debugging them.

The agent didn't fail because the technology is broken. It failed because it was given a job with no brief, no checkpoints, and no definition of done. That's not the agent's fault. It's the setup.


✍️ Closing Line (CTA)

Think about the last AI task that went wrong for you. Which of the four failure modes was it? Drop it in the comments — I'd bet it was Reason 1 more often than not.