Claude Code, agents, and the CTO-shaped tasks they quietly handle — plus the bias nobody schedules for
If you squint hard enough — and if your scope is narrowly technical — Claude Code-style agents genuinely cover a meaningful slice of early “CTO work.”
They can skim a codebase, propose refactors, wire endpoints, scaffold tests, write migration notes, summarise architecture, churn through backlog-shaped tasks, even push changes through repeatable loops. Much of classic early-startup CTO labour historically looked exactly like sustained, patient execution with taste. Agents are increasingly excellent at sustained execution with superficial taste.
The good news and the bad news are the same: capabilities have moved faster than organisational literacy about what CTO-shaped work actually is.
Why it feels reasonable to deputise CTO work to Claude Code / agents
In very small teams:
- Nobody has forty hours weekly for exploratory refactors anyway.
- The gap between “thinking” and “typing” has shrunk; reasoning and implementation now interleave tightly.
- The highest-leverage bottleneck is frequently discovery inside your own codebase, which is exactly where this tooling shines.
Agents are also psychologically cheap at scale: they do not mind rereading a file for the seventeenth time. Humans do mind, and that resentment becomes hidden drag.
Seen this way: delegating CTO-shaped tasks is not irrational. Delegating CTO-shaped risk ownership remains a distinct question.
An agent stack does not:
- Negotiate timelines with founders when everything is labelled “critical.”
- Tell a salesperson that the roadmap story is drifting from reality, with career consequences hanging in the air.
- Go to bat for cutting scope, knowing the optics will sting this month.
- Repeatedly bet its reputation on correctness where the downside is asymmetric (security, tenancy, misuse of customer data).
- Persist through misaligned organisational incentives long enough that standards actually survive two hiring cycles.
CTO posture is partly output, but it is materially about bearing consequences and persisting across political weather. Agents do not accumulate institutional scar tissue in your hallways.
Even if you outsource “technical cognition” narrowly, founders still owe the business explicit ownership somewhere.
Confirmation bias meets “helpfulness”: when models rarely give you blunt bad news
The second half is less about tooling brands and more about how probabilistic assistants behave.
Most chat-style models are optimised (through training methods and RL-style preferences you do not configure) toward continuations that feel coherent, cooperative, and productive. That aligns uncomfortably well with psychological confirmation dynamics:
You describe half a plan loaded with unstated optimism; the safest-looking move is to refine and extend your frame, rarely to stop the room and insist the premise is dangerously wrong.
Concretely, this shows up when:
- You ask “is approach X sane?” The safest answer resembles “here is X with improvements”, far likelier than stopping you cold with evidence that assumption Y undermines feasibility.
- You ask architectural questions anchored in a favoured stack or vendor narrative. Answers tend to legitimise plausible paths already inside your linguistic bubble.
- You ask with vague success criteria. Answers fill the silence with scaffolding: specificity without any guarantee of epistemic discipline.
Researchers and practitioners often discuss alignment in terms of helpfulness. Helpfulness is statistically anti-adversarial, and the adversarial stance was part of the CTO tradition at the best shops: someone obligated to veto weak consensus.
You can prompt for criticism, and that helps at the margin; a sketch follows below. But prompting rarely replaces accountability when incentives press against conflict.
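To make that concrete, here is a minimal sketch of prompting for criticism wired in as a separate review pass. Everything named here is hypothetical: `red_team_review` and `call_model` stand in for your own plumbing, and only the shape of the instruction matters.

```python
# Hypothetical red-team pass: run the same plan past the model under an
# adversarial persona before anyone treats agreement as validation.
# `call_model` is a stand-in for whatever chat client you actually use;
# the persona text is the point, not the plumbing.

RED_TEAM_SYSTEM = (
    "You are reviewing a technical plan as a sceptical senior engineer whose "
    "only job is to find reasons it fails. Do not improve the plan. Do not "
    "suggest refinements. List the strongest disconfirming evidence, the "
    "assumptions that must hold, and the cheapest experiment that could "
    "falsify the riskiest assumption. If you find nothing, say so explicitly."
)

def red_team_review(plan: str, call_model) -> str:
    """Run one adversarial pass over a written plan."""
    return call_model(
        system=RED_TEAM_SYSTEM,
        user=f"Plan under review:\n\n{plan}\n\nAttack the premises, not the prose.",
    )
```

The constraint “do not improve the plan” is deliberate: the default helpful move is refinement, so the prompt has to close that exit. Even then, treat the output as marginal help, not accountability.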
Nobody keeps a recurring calendar event called “confirmation bias retrospective.” Nobody books “tell me this strategy is dangerously wrong” for Tuesday at 10 AM. Humans forget, and conversational AI does not spontaneously volunteer inconvenient institutional truths when it detects political pressure latent in your wording.
Bias does not evaporate inside coding agents that wrap tools, either: agent loops inherit the model's tendencies, favouring coherence over confrontation and flattering velocity over blocking questions about reversibility.
Startup failure modes here are quieter than an exploding API bill: people ship coherent junk at speed and learn months later that the foundational premises were leaky.
Strong teams compensate with rituals: independent review gates, deliberate adversarial personas in written design review, and outsiders with the political standing to say stop.
This is partly why organisational forms like the fractional CTO still exist, independently of whether any individual coder uses Claude extensively.
Human leadership does not magically fix tool bias (humans carry bias too), but accountable humans tolerate the professional friction that conversational agents optimise away.
Practical mitigations (short list)
Concrete enough to crib into a playbook, without being dogmatic:
- Force a binary risk classification up front: reversible versus expensive or dangerous to unwind. Write it down before implementation momentum takes over (see the sketch after this list).
- Ask explicitly for disconfirming evidence, then treat silence or hand-waving as a signal, not a green light.
- Rotate in reviewers who are not immersed in the same chat history; it reduces echo-chamber depth.
- Institutionalise the pre-mortem: five bullet-point failure modes, written before any greenfield architecture bet.
- Pair agent velocity with explicit team standards wherever multiple humans and AI coexist; it gets messy otherwise, and workshops can compress alignment.
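A minimal sketch of how the first two bullets can be enforced mechanically rather than by memory. The DECISION.md convention, its headings, and the script itself are assumptions invented for illustration, not an established tool:

```python
#!/usr/bin/env python3
"""Hypothetical pre-merge gate: block the change unless it ships with a
DECISION.md that (a) classifies the risk as reversible or irreversible and
(b) contains a non-empty 'Disconfirming evidence' section.
The file name and headings are conventions invented for this sketch."""

import re
import sys
from pathlib import Path

REQUIRED_CLASSES = {"reversible", "irreversible"}

def check_decision_note(path: Path) -> list[str]:
    """Return a list of blocking problems; empty means the gate passes."""
    if not path.exists():
        return [f"{path} is missing: write the risk classification first."]
    text = path.read_text(encoding="utf-8")
    errors = []

    # Expect a line like "Risk class: reversible" or "Risk class: irreversible".
    match = re.search(r"^Risk class:\s*(\w+)", text, re.MULTILINE | re.IGNORECASE)
    if not match or match.group(1).lower() not in REQUIRED_CLASSES:
        errors.append("Declare 'Risk class: reversible' or 'Risk class: irreversible'.")

    # Expect a '## Disconfirming evidence' heading with real content under it.
    section = re.search(r"## Disconfirming evidence\s*\n(.+)", text, re.IGNORECASE)
    if not section or not section.group(1).strip():
        errors.append("Add a '## Disconfirming evidence' section; silence is a signal.")
    return errors

if __name__ == "__main__":
    problems = check_decision_note(Path("DECISION.md"))
    for problem in problems:
        print(f"BLOCKED: {problem}", file=sys.stderr)
    sys.exit(1 if problems else 0)
```

Run it as a CI step or pre-merge hook. The parser is trivial on purpose; the value is that the classification and the disconfirming evidence get written down before implementation momentum makes every change feel reversible.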
Where this intersects CTO hiring discussions
Agents have moved the economic threshold that shapes when a full-time CTO feels obligatory as a first hire.
They have simultaneously shifted the shape of the risk toward hidden correlation: fast, correlated mistakes across modules when everyone vibes with the same conversational prior.
Neither observation alone settles fractional versus full-time; together they sharpen the questions you should be asking.
The older companion framing stays relevant: Do you really need a CTO? Tooling has partially rewritten that question since publication; at the accountability layer it remains as unanswered as ever.
—
Jarosław Michalik, Fractional CTO, Impl.



