Two engineers proposed the same architecture change. One wrote a 12-page document with every detail. The other wrote a 2-page RFC with a clear problem statement, three options with tradeoffs, and a recommendation. The 2-page RFC was approved in one meeting. The 12-page document generated three weeks of back-and-forth because nobody could find the actual decision points buried in the detail. Technical writing isn't about writing more — it's about writing so clearly that the reader never has to re-read a sentence.
Core Principles of Technical Writing
1. Lead with the Conclusion
Engineers scan documents. They don't read them top-to-bottom like novels. Put the most important information first: what you're proposing, what you need from the reader, and what the tradeoffs are. Background and context come after, for readers who want them.
This is the inverted pyramid structure journalists use. It works because busy readers can stop at any point and still have the key information.
2. One Idea Per Sentence
Bad: "We should migrate to PostgreSQL because MySQL doesn't support JSON operations natively and our analytics pipeline requires complex JSON queries and the migration can be done incrementally using dual-write patterns."
Good: "We should migrate to PostgreSQL. MySQL lacks native JSON query support, which our analytics pipeline needs. The migration can be done incrementally using a dual-write pattern."
Same information. Easier to parse. Each sentence makes one point.
3. Use Concrete Language
| Vague (Avoid) | Concrete (Use) | Why It's Better |
|---|---|---|
| "The system is slow" | "P95 latency is 2.3 seconds, up from 800ms last month" | Numbers make the problem actionable |
| "We need better error handling" | "The payment endpoint returns 500 when Stripe's API is down. It should return 503 with a `Retry-After` header" | Specific change, specific endpoint |
| "This will improve performance significantly" | "This reduces database queries per page load from 47 to 3 using eager loading" | Measurable improvement with a clear mechanism |
| "We should consider scaling" | "At current growth (15% MoM), we'll exceed the database connection limit (100) in 6 weeks" | Timeline creates urgency and precision |
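A claim like the one in the last row is easy to check, and showing the arithmetic makes it harder to dispute. The sketch below assumes compound monthly growth and a hypothetical current peak of 82 connections; the table gives only the growth rate (15% MoM) and the limit (100), so the starting value is an illustrative assumption.

```python
import math

WEEKS_PER_MONTH = 52 / 12  # about 4.33 weeks per month


def weeks_until_limit(current: float, limit: float, monthly_growth: float) -> float:
    """Weeks until `current` reaches `limit` under compound monthly growth.

    Solves current * (1 + monthly_growth) ** months = limit for months,
    then converts months to weeks.
    """
    if current >= limit:
        return 0.0
    months = math.log(limit / current) / math.log(1 + monthly_growth)
    return months * WEEKS_PER_MONTH


# Hypothetical inputs: 82 peak connections today, a limit of 100, 15% MoM growth.
weeks = weeks_until_limit(current=82, limit=100, monthly_growth=0.15)
print(f"Connection limit reached in about {weeks:.1f} weeks")  # about 6.2 weeks
```

Putting the inputs in the document lets a reviewer rerun the projection with the real current number instead of taking "6 weeks" on trust.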
4. Write for the Reader, Not for Yourself
Before writing, ask: who will read this and what do they need to do with it? A design doc for your team can assume shared context. An RFC for the wider engineering org can't. An incident report for leadership needs business impact, not stack traces.
Writing RFCs and Design Docs
RFCs (Request for Comments) are how engineers get buy-in for significant changes. A good RFC saves weeks of implementation time by catching issues before code is written.
RFC Template
# RFC: Migrate User Service to Event-Driven Architecture
**Author:** Priya Sharma
**Status:** Under Review
**Date:** 2025-10-15
**Reviewers:** Backend team, Platform team
## Summary (3 sentences max)
Migrate the user service from synchronous REST calls to an event-driven
architecture using Kafka. This eliminates the cascading failure pattern
we've seen 4 times this quarter and reduces inter-service coupling.
## Problem
When the notification service is slow (which happens ~2x/week during
email campaigns), user registration fails because the user service
calls notification synchronously. 340 users were affected last month.
## Proposed Solution
Publish a `UserCreated` event to Kafka. The notification service
subscribes asynchronously. User registration succeeds regardless of
notification service health.
## Alternatives Considered
1. **Add a retry queue to the user service** — Simpler, but doesn't solve
coupling. We'd still be blocked during notification outages.
2. **Use a message broker (RabbitMQ)** — Works, but we already run Kafka
for the analytics pipeline. Adding RabbitMQ means operating two
messaging systems.
3. **Do nothing** — Accept the failure rate. Not recommended given
business impact (340 lost registrations/month × $12 avg LTV = $4,080/month).
## Implementation Plan
Phase 1 (1 week): Add Kafka producer to user service, publish UserCreated events
Phase 2 (1 week): Notification service subscribes to UserCreated
Phase 3 (3 days): Remove synchronous call, monitor for 1 sprint
Rollback: Re-enable synchronous call via feature flag
## Open Questions
- Should we use Avro or JSON for event schemas? (Leaning Avro for type safety)
- Retention period for user events? (Proposing 30 days)
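The RFC's central claim, that registration succeeds regardless of notification service health, can be illustrated without Kafka at all. In the sketch below, `InMemoryTopic` is a hypothetical stand-in for a Kafka topic (it is not the Kafka client API), and `register_user`, `send_welcome_email`, and the offset handling are illustrative names, not part of the RFC.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class UserCreated:
    user_id: str
    email: str


class InMemoryTopic:
    """Hypothetical stand-in for a Kafka topic: an append-only event log."""

    def __init__(self) -> None:
        self.log: List[UserCreated] = []

    def publish(self, event: UserCreated) -> None:
        self.log.append(event)


class Consumer:
    """Reads the log at its own pace, tracking its position with an offset."""

    def __init__(self, topic: InMemoryTopic, handler: Callable[[UserCreated], None]) -> None:
        self.topic = topic
        self.handler = handler
        self.offset = 0  # index of the next event to process

    def poll(self) -> None:
        while self.offset < len(self.topic.log):
            try:
                self.handler(self.topic.log[self.offset])
            except ConnectionError:
                return  # keep the offset so the event is retried on the next poll
            self.offset += 1


users: Dict[str, str] = {}
topic = InMemoryTopic()


def register_user(user_id: str, email: str) -> str:
    users[user_id] = email                      # write the user record
    topic.publish(UserCreated(user_id, email))  # publish; no direct call to notification
    return "registered"


notification_up = False
sent: List[str] = []


def send_welcome_email(event: UserCreated) -> None:
    if not notification_up:
        raise ConnectionError("notification service unavailable")
    sent.append(event.email)


consumer = Consumer(topic, send_welcome_email)

status = register_user("u1", "a@example.com")  # succeeds while notification is down
consumer.poll()                                # handler fails; event stays pending
sent_while_down = list(sent)
notification_up = True                         # notification service recovers
consumer.poll()                                # pending event is delivered
print(status, sent_while_down, sent)           # registered [] ['a@example.com']
```

In a real deployment, the consumer's committed offset in Kafka plays roughly the role of `offset` here: the notification service catches up after an outage instead of blocking registration.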
What Makes an RFC Effective
- The problem section is specific. "340 users affected" is stronger than "reliability issues"
- Alternatives are genuinely considered, not strawmen set up to make the proposed solution look good
- The implementation plan is phased with a rollback strategy
- Open questions are listed honestly — this invites targeted feedback instead of broad criticism
- Business impact is quantified. "$4,080/month" gets leadership attention in a way "user registration sometimes fails" doesn't
Writing Incident Reports
Incident reports serve two audiences: the engineering team (what broke and how to prevent it) and leadership (what was the business impact and what are we doing about it). Write for both.
Incident Report Template
# Incident Report: Payment Processing Outage
**Date:** 2025-10-08 | **Duration:** 47 minutes | **Severity:** SEV-1
**Author:** Rahul Mehta | **Status:** Action items in progress
## Summary
Payment processing was unavailable from 14:23 to 15:10 IST.
~230 transactions failed. Estimated revenue impact: ₹3.8 lakh.
## Timeline (all times IST)
14:23 — PagerDuty alert: payment success rate drops below 90%
14:26 — On-call (Rahul) acknowledges. Checks Grafana dashboard
14:31 — Root cause identified: Stripe webhook endpoint returning 500
14:35 — Fix deployed: null check on new 'metadata' field Stripe added
14:38 — Webhook processing resumes. Backlog of 230 events queuing
15:10 — Backlog cleared. All pending transactions processed. Incident closed
## Root Cause
Stripe added a new optional 'metadata' field to webhook payloads on
Oct 7. Our webhook handler assumed all fields were present and threw
a NullPointerException when 'metadata' was null for certain event types.
## What Went Well
- Alert fired within 2 minutes of first failure
- Root cause identified in 8 minutes
- Fix deployed in 4 minutes after identification
## What Went Poorly
- No contract tests for Stripe webhook payloads — we didn't catch the schema change
- Webhook handler had no graceful degradation for unknown/null fields
## Action Items
1. [ ] Add contract tests for all third-party webhook payloads (Owner: Priya, Due: Oct 15)
2. [ ] Implement defensive parsing — treat all external fields as optional (Owner: Rahul, Due: Oct 12)
3. [ ] Subscribe to Stripe API changelog for advance notice (Owner: DevOps, Due: Oct 10)
Notice: no blame, just facts. "Our webhook handler assumed" — not "Rahul's code didn't handle." Blameless post-mortems produce better outcomes because people report issues honestly instead of hiding them. For more on this, see our incident management guide.
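Action item 2, defensive parsing, might look something like the sketch below. It assumes a simplified, hypothetical payload with top-level `id`, `type`, and `metadata` keys. Stripe's real event objects are structured differently, so treat the field names as placeholders. Required fields are validated explicitly, optional fields get defaults, and unknown fields are ignored rather than crashing the handler.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class PaymentEvent:
    event_id: str
    event_type: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def parse_webhook(raw: str) -> Optional[PaymentEvent]:
    """Parse a webhook body, treating every external field as untrusted.

    Returns None when a required field is missing, so the caller can log
    the payload and respond with an error instead of throwing.
    """
    try:
        body = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(body, dict):
        return None

    event_id = body.get("id")
    event_type = body.get("type")
    if not isinstance(event_id, str) or not isinstance(event_type, str):
        return None  # cannot process the event without these

    metadata = body.get("metadata")
    if not isinstance(metadata, dict):
        metadata = {}  # missing, null, or unexpected type: fall back to empty
    return PaymentEvent(event_id, event_type, metadata)


# The shape that caused the incident: 'metadata' present but null.
event = parse_webhook('{"id": "evt_1", "type": "charge.succeeded", "metadata": null}')
print(event)  # PaymentEvent(event_id='evt_1', event_type='charge.succeeded', metadata={})
```

Cases like the null `metadata` above are also natural inputs for the contract tests in action item 1.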
Writing Better PR Descriptions
PR descriptions are the most frequently written and most frequently neglected form of technical writing. A good PR description speeds up review because the reviewer understands the context before reading the code.
The Three-Part PR Description
- What — What does this PR change? One paragraph
- Why — Why is this change needed? Link to the issue or explain the problem
- How to verify — How can the reviewer test this? Steps, commands, or screenshots
Most PRs only include the "what." The "why" and "how to verify" are what separate fast reviews from slow ones. For comprehensive PR practices, see our code review guide.
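Some teams nudge authors toward all three parts by linting PR descriptions in CI. A minimal sketch, assuming the team adopts `What`, `Why`, and `How to verify` as markdown headings in every description (a hypothetical convention, not a GitHub feature); `missing_sections` is an illustrative name.

```python
import re
from typing import List

REQUIRED_SECTIONS = ["What", "Why", "How to verify"]


def missing_sections(description: str) -> List[str]:
    """Return required section headings that are absent or have no content."""
    missing = []
    for name in REQUIRED_SECTIONS:
        # Heading line for this section, then everything up to the next heading or the end.
        pattern = rf"^#+[ \t]*{re.escape(name)}[ \t]*$(.*?)(?=^#+[ \t]|\Z)"
        match = re.search(pattern, description, flags=re.MULTILINE | re.DOTALL | re.IGNORECASE)
        if match is None or not match.group(1).strip():
            missing.append(name)
    return missing


draft = "## What\nAdds a retry to the Stripe client.\n## Why\nSee the payment incident."
print(missing_sections(draft))  # ['How to verify']
```

A check like this only confirms the sections exist; whether the "why" actually explains anything is still the reviewer's call.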
Common Technical Writing Mistakes
| Mistake | Example | Fix |
|---|---|---|
| Burying the lead | Three paragraphs of background before stating the proposal | First sentence = what you're proposing. Background comes after |
| Weasel words | "It might be beneficial to perhaps consider migrating" | "We should migrate to PostgreSQL. Here's why:" |
| Passive voice overuse | "The database was migrated and the tests were updated" | "We migrated the database and updated the tests" |
| Acronym soup | "The SRE team's SLA for the CDN's TTL impacts the CMS's TTFB" | Define acronyms on first use. If a sentence has 3+ acronyms, rewrite it |
| Assuming context | "As we discussed, we'll use the new approach" | Write as if the reader wasn't in that meeting. They probably weren't |
| Wall of text | 8-line paragraphs with no headings or structure | Use headings, bullet points, and short paragraphs. People scan rather than read |
Getting Better at Technical Writing
The Edit Pass
First drafts are for getting ideas out. The edit pass is where writing gets good. On every edit pass, ask:
- Can I cut this sentence without losing meaning? If yes, cut it
- Is there a simpler word? ("Use" instead of "utilize", "because" instead of "due to the fact that")
- Would a table, list, or code block communicate this better than prose?
- If I were reading this for the first time, what questions would I have?
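Two of these checks are mechanical enough to script: swapping wordy phrases for simpler ones, and the "3+ acronyms" rule from the mistakes table. This is a rough sketch. The phrase list is a small hypothetical sample, the sentence split is naive, and the acronym pattern (two or more capital letters) will miss or over-count some cases.

```python
import re
from typing import List

# Hypothetical sample of wordy phrases and their simpler replacements.
SIMPLER = {
    "utilize": "use",
    "due to the fact that": "because",
    "in order to": "to",
}

# Two or more capital letters as a whole word, e.g. "SLA", "TTFB", "CDNs".
ACRONYM = re.compile(r"\b[A-Z]{2,}s?\b")


def edit_pass_warnings(text: str) -> List[str]:
    warnings = []
    lowered = text.lower()
    for wordy, simple in SIMPLER.items():
        if wordy in lowered:
            warnings.append(f'Use "{simple}" instead of "{wordy}"')
    # Naive sentence split: break after ., !, or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        acronyms = ACRONYM.findall(sentence)
        if len(acronyms) >= 3:
            warnings.append(f"{len(acronyms)} acronyms in: {sentence!r}")
    return warnings


for warning in edit_pass_warnings("The SRE team's SLA for the CDN's TTL impacts the CMS's TTFB."):
    print(warning)
```

A script catches the easy cases; the last two questions on the list still need a human reader.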
Read Good Technical Writing
Study engineering blogs from companies that write well: Stripe's engineering blog, Cloudflare's blog, and Julia Evans' work. Notice how they explain complex systems in simple language without dumbing things down. That's the target.
Practice on Internal Docs
Every PR description, every Slack thread summary, every design doc is practice. You don't need to write blog posts to improve. Start by making your internal writing clearer and more concise. Your teammates will notice.
Frequently Asked Questions
How long should a design doc or RFC be?
As short as possible while covering: the problem, proposed solution, alternatives considered, and implementation plan. For most changes, that's 1-3 pages. If your RFC is over 5 pages, you're probably trying to solve too many problems at once — split it.
Should engineers be expected to write well?
Yes. Writing is a core engineering skill at senior levels and above. You can't lead a technical direction, coordinate across teams, or drive architectural decisions without clear written communication. The code is only half the work — the other half is getting people aligned on what to build.
Can AI tools help with technical writing?
AI is great for first drafts, grammar checking, and restructuring messy notes into clear prose. But it can't provide the technical judgment, specific numbers, or institutional context that make technical writing valuable. Use AI to polish, not to think. The insights and decisions must be yours.
How do I get feedback on my technical writing?
Ask a teammate to read your doc before the formal review. One specific question works better than "any feedback?" — try "Is the problem statement clear?" or "Can you tell what I'm proposing by reading just the first paragraph?" Targeted questions get actionable answers.