**Core principle:** Simplest, fastest capture possible. AI processes in the background after the fact. AI is additive, never destructive — the original capture is always preserved.
1. Capture completes in under 2 seconds. Write raw input to .md, sync, done. No AI in the hot path.
2. AI is additive. The original capture is preserved verbatim in the note body. AI adds frontmatter, tags, links, and classification around it — never rewrites the user's words.
3. No forms at capture time. No type picker, no tag selector, no category dropdown. The user's only job is to dump content in.
4. Questions emerge; they aren't captured. Users capture sources and thoughts. The AI processing pipeline surfaces questions from patterns in annotations and reading — questions aren't manually filed.
5. Multi-item captures are a single bundle. A capture can contain multiple photos, URLs, or files. The AI pipeline processes them together — identifying which items are the same source, what's a new thought, and how they relate.
**steel add vs steel create**

steel add = raw capture. Dump in a URL, text, or files with zero metadata. AI structures it after.
steel create = manual authoring. You specify --type, --title, --tags, --body. You're building a fully-formed note — no AI needed.
Different intents: add = "capture this, deal with it later" vs create = "I'm writing a note right now."
steel add — single command, no flags needed. AI figures out the rest.
```bash
steel add "https://paulgraham.com/greatwork.html" # URL → Source
steel add "Nietzsche's will to power is less about domination than self-overcoming" # text → Thought
steel add ~/papers/attention-is-all-you-need.pdf # file → Source
steel add . # file in current dir
steel add cover.jpg page-47.jpg page-48.jpg # multiple items → capture bundle
```
What happens:
1. CLI writes a raw .md file to the vault inbox (inbox/capture-{timestamp}.md)
2. Any attached files (images, PDFs) are stored alongside as inbox/capture-{timestamp}/
3. Minimal frontmatter: captured_at, capture_source: cli, and input_type (url | text | file | bundle)
4. Sync pushes to server
5. Background AI pipeline picks it up
Multiple arguments create a capture bundle — all items processed together so AI can understand the relationship (e.g., a book cover photo + page photos = one source + annotations).
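The capture path above can be sketched as a few small Rust helpers (Rust being the server-side language named later in this document). The function names are hypothetical, and the filesystem check stands in for whatever argument detection the real CLI does:

```rust
use std::path::Path;

// Classify raw CLI arguments into the frontmatter's input types:
// url | text | file | bundle. Hypothetical helper, not the real CLI.
fn detect_input_type(args: &[&str]) -> &'static str {
    if args.len() > 1 {
        "bundle" // multiple items → capture bundle
    } else if args[0].starts_with("http://") || args[0].starts_with("https://") {
        "url"
    } else if Path::new(args[0]).exists() {
        "file"
    } else {
        "text"
    }
}

// Minimal frontmatter for a raw capture — no AI in the hot path.
fn capture_frontmatter(captured_at: &str, input_type: &str) -> String {
    format!(
        "---\ncaptured_at: {captured_at}\ncapture_source: cli\ninput_type: {input_type}\nprocessing_status: pending\n---\n"
    )
}

// Raw captures land in the vault inbox, keyed by timestamp.
fn inbox_path(timestamp: &str) -> String {
    format!("inbox/capture-{timestamp}.md")
}
```

Everything here is string assembly — the write-and-sync step stays under the 2-second budget because nothing waits on a model.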
No steel add for questions. Questions are generated by the AI transposition engine from annotations and reading patterns, not manually captured.
Audio-first capture. When the user taps Share → Steel Notes:
```
┌─────────────────────────────────┐
│ Steel Notes │
│ │
│ 🎙️ Recording... │
│ "Speak your thoughts" │
│ │
│ ┌─────────┐ ┌──────────────┐ │
│ │ Save │ │ Type Instead │ │
│ └─────────┘ └──────────────┘ │
│ │
│ Cancel │
└─────────────────────────────────┘
```
Flow:
1. Share Sheet opens → audio recording starts immediately
2. User speaks their annotation/thought about the shared content
3. Tap Save → audio sent to Amazon Transcribe for transcription
4. Or tap Type Instead → recording stops, text field appears
5. Or tap Cancel → nothing saved
What gets captured:
When sharing multiple photos (e.g., from the camera roll), all images are bundled into a single capture. The AI pipeline processes them together.
What gets written:
```markdown
---
captured_at: 2026-03-26T14:30:00Z
capture_source: share_sheet
capture_app: Safari
input_type: url
processing_status: pending
---

https://paulgraham.com/greatwork.html
The bit about bus ticket collectors is exactly what Nietzsche means by
"love of fate" — pursuing something not because it's useful but because
you can't help it.
```
Immediately saved, Share Sheet dismissed. AI pipeline handles the rest.
The primary in-app capture surface. Opens directly to camera — no menus, no pickers.
```
┌─────────────────────────────────┐
│ ┌─────┐ ┌─────┐ ┌─────┐ │ ← photo strip (scrollable)
│ │ 📷1 │ │ 📷2 │ │ 📷3 │ │ snapped photos collect here
│ └─────┘ └─────┘ └─────┘ │
├─────────────────────────────────┤
│ │
│ │
│ [ Camera View ] │
│ │
│ │
│ │
│ ┌───────────┐ │
│ │ ⏺ Snap │ │ ← shutter button
│ └───────────┘ │
│ │
│ ┌─────────────────────────┐ │
│ │ Save │ │ ← big save button, always visible
│ └─────────────────────────┘ │
└─────────────────────────────────┘
```
Flow:
1. User taps capture tab / button → camera opens immediately
2. Snap a photo → thumbnail appears in the strip at the top
3. Snap more → thumbnails accumulate, strip scrolls horizontally
4. Tap a thumbnail to remove it
5. Tap Save → bundle saved as inbox/capture-{timestamp}.md with all photos
6. Camera stays open for next capture (or user navigates away)
No voice note here — this is pure camera speed. If the user wants to add a voice note, they do it through the Share Sheet flow (share photos from camera roll → Steel Notes).
What gets written:
```markdown
---
captured_at: 2026-03-26T21:00:00Z
capture_source: camera
input_type: bundle
items: [IMG_001.jpg, IMG_002.jpg, IMG_003.jpg]
processing_status: pending
---
```
AI pipeline handles the rest — OCR the images, identify the source, create the right notes.
Quick-capture button on the widget opens the app directly to a minimal text input. Same flow as "Type Instead" in Share Sheet — raw text, saved immediately, AI processes later.
After a capture syncs to the server, the AI pipeline runs asynchronously. The user sees a "processing" indicator in their vault list that clears when done.
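A sketch of the status lifecycle behind that indicator. Only `pending` is confirmed by the frontmatter examples in this document; the other states are assumptions:

```rust
// Capture lifecycle as seen by the client. Only `Pending` appears in
// the frontmatter examples; the other states are illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ProcessingStatus {
    Pending,    // raw capture written, waiting for the pipeline
    Processing, // AI pipeline running server-side
    Done,       // processed notes synced back
}

// The vault list shows a "processing" indicator until the work completes.
fn indicator_visible(status: ProcessingStatus) -> bool {
    !matches!(status, ProcessingStatus::Done)
}
```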
```
Raw capture (.md + attachments in inbox/)
│
├─ 1. OCR / Extract ─── Images → text (vision model or OCR)
│ PDFs → text extraction
│ URLs → fetch + clean HTML
│
├─ 2. Resolve ────────── Is this a known Source? (FTS title/author match)
│ Book cover → identify title + author → check vault
│ If exists: link to it. If not: create new Source.
│
├─ 3. Classify ──────── What did the user capture?
│ Bundle of book cover + pages → Source + Annotation
│ Standalone text → Thought
│ URL → Source
│
├─ 4. Structure ─────── Build note(s) from extracted content
│ Clean up voice transcription
│ One capture may produce multiple notes
│
├─ 5. Enrich ────────── Add tags, update frontmatter
│ Generate IDs (src-2026-03-26-001)
│
├─ 6. Link ──────────── FTS search for related notes
│ Match against open Questions
│ Add markdown links to related notes
│
└─ 7. File ──────────── Move from inbox/ to proper folder(s)
Source content → sources/
User context → notes/ (companion Note)
Standalone thoughts → thoughts/
Sync all updated .md files to all devices
```
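The Enrich and File steps can be sketched as two helpers. The ID scheme follows the src-2026-03-26-001 pattern shown in the diagram; the fallback folder for unknown types is an assumption:

```rust
// Enrich: generate an ID like src-2026-03-26-001 from type, date, sequence.
fn generate_id(note_type: &str, date: &str, seq: u32) -> String {
    let prefix = match note_type {
        "source" => "src",
        "thought" => "th",
        _ => "note",
    };
    format!("{prefix}-{date}-{seq:03}")
}

// File: route each produced note from inbox/ to its vault folder.
fn destination_folder(note_type: &str) -> &'static str {
    match note_type {
        "source" => "sources/",
        "note" => "notes/",
        "thought" => "thoughts/",
        _ => "inbox/", // unknown types stay put for manual review (assumption)
    }
}
```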
A critical step: before creating a new Source, the pipeline checks if one already exists.
How it works:
1. Extract identifying info from the capture (title, author, URL, ISBN)
2. FTS search the vault for matches
3. If match found with high confidence → link to existing Source
4. If no match → create new Source
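The match-or-create decision might look like the following; the Candidate shape and the confidence threshold are illustrative, not the real FTS scoring:

```rust
// A hit from the FTS title/author search (hypothetical shape).
struct Candidate {
    path: String,
    score: f64, // higher = better match
}

// Link to an existing Source only on a high-confidence match;
// `None` tells the pipeline to create a new Source instead.
fn resolve_source(candidates: &[Candidate], threshold: f64) -> Option<&str> {
    candidates
        .iter()
        .filter(|c| c.score >= threshold)
        .max_by(|a, b| a.score.partial_cmp(&b.score).unwrap())
        .map(|c| c.path.as_str())
}
```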
This means a single capture can produce new notes attached to an existing Source rather than a duplicate, as the following example shows.

Example: a user snaps 3 photos — book cover, page 47, page 48 — and adds a voice note via Share Sheet.
Raw capture:
```markdown
---
captured_at: 2026-03-26T20:15:00Z
capture_source: share_sheet
input_type: bundle
items: [cover.jpg, page-47.jpg, page-48.jpg]
voice_note: thoughts.m4a
processing_status: pending
---

[pending transcription]
```
AI pipeline processes:
1. OCR cover.jpg → "Beyond Good and Evil — Friedrich Nietzsche"
2. OCR page photos → extracted text from pages 47-48
3. Resolve → FTS search finds sources/beyond-good-and-evil.md already exists
4. Classify → this is an Annotation on an existing Source, not a new Source
5. Transcribe voice note → "This section on master and slave morality connects to what Paul Graham said about doing great work from intrinsic motivation"
6. Structure → create a new Thought with the page text + voice note, linked to the existing Source
7. Link → also links to sources/how-to-do-great-work.md (Paul Graham mention)
Result — Note created (no new Source needed):
The pipeline recognized the book from its cover as an existing Source and created a Note linked to it:
```markdown
---
id: note-2026-03-26-004
type: note
title: "On master morality and intrinsic motivation"
sources: [src-2026-03-15-001]
tags: [morality, motivation, nietzsche, intrinsic-drive]
status: active
captured_at: 2026-03-26T20:15:00Z
processed_at: 2026-03-26T20:15:38Z
---

![[th-2026-03-26-005]]
This section on master and slave morality connects to what Paul Graham said
about doing great work from intrinsic motivation.
```
The AI pipeline also created a standalone Thought (th-2026-03-26-005) with the extracted page text and transcluded it into the Note. No new Source created — the pipeline recognized the book and linked to the existing one.
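A transclusion marker like ![[th-2026-03-26-005]] could be rendered by a small expander when the Note is displayed. This is a sketch; the lookup signature and fallback behavior (leaving unresolved markers in place) are assumptions:

```rust
// Expand Obsidian-style transclusions (![[id]]) by looking up each id.
// Unresolved or malformed markers are left verbatim in the output.
fn expand_transclusions(body: &str, lookup: &dyn Fn(&str) -> Option<String>) -> String {
    let mut out = String::new();
    let mut rest = body;
    while let Some(start) = rest.find("![[") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 3..];
        if let Some(end) = after.find("]]") {
            let id = &after[..end];
            out.push_str(&lookup(id).unwrap_or_else(|| format!("![[{id}]]")));
            rest = &after[end + 2..];
        } else {
            out.push_str(&rest[start..]); // no closing ]] — keep as-is
            rest = "";
        }
    }
    out.push_str(rest);
    out
}
```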
Before (raw capture):
```markdown
---
captured_at: 2026-03-26T14:30:00Z
capture_source: share_sheet
input_type: url
processing_status: pending
---

https://paulgraham.com/greatwork.html
The bit about bus ticket collectors is exactly what Nietzsche means by
"love of fate" — pursuing something not because it's useful but because
you can't help it.
```
After (AI processed) — two files created:
Source (sources/how-to-do-great-work.md) — external content only:
```markdown
---
id: src-2026-03-26-003
type: source
title: "How to Do Great Work"
author: Paul Graham
url: https://paulgraham.com/greatwork.html
status: unread
tags: [creativity, motivation, craft]
date_added: "2026-03-26T14:30:00Z"
---

Paul Graham — July 2023
[full extracted article body here]
```
Note (notes/how-to-do-great-work.md) — user's engagement:
```markdown
---
id: note-2026-03-26-003
type: note
title: "How to Do Great Work"
sources: [src-2026-03-26-003]
tags: [creativity, motivation, craft]
status: active
created: "2026-03-26T14:30:00Z"
updated: "2026-03-26T14:30:45Z"
---

The bit about bus ticket collectors is exactly what Nietzsche means by
"love of fate" — pursuing something not because it's useful but because
you can't help it.
```
Key points:
- Files move from inbox/ to sources/ + notes/
- The processing_status field is removed once processing completes
- See NOTE_ARCHITECTURE.md for the full Note model design — transclusion, Source auto-creation, and the VaultItem interface rename
Server-side via AWS Bedrock + Amazon Transcribe. The capture already syncs to S3, so the Lambda triggers AI processing directly — no external API keys needed on any client or server. Everything stays within the AWS ecosystem using IAM roles.
| Service | Use | Why |
|---|---|---|
| **Claude on Bedrock** | Classification, structuring, enrichment, linking, tag generation | Same models as direct Anthropic API, identical pricing, uses existing AWS IAM — no API keys to manage |
| **Claude Vision on Bedrock** | OCR for book covers, page photos, handwritten notes | Same Claude model handles both text and vision — no separate OCR service |
| **Amazon Transcribe** | Voice note transcription (Share Sheet audio) | AWS-native, reads audio directly from S3, async batch-friendly for Lambda |
No external API accounts needed. The Lambda's IAM role grants access to Bedrock and Transcribe directly.
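For the classification step, the Lambda would assemble a Messages API body and send it through the Bedrock runtime's InvokeModel call under its IAM role. This sketch builds only the request body; the anthropic_version value is the Bedrock-specific one, while the prompt wording and max_tokens are illustrative:

```rust
// Minimal JSON string escaping for embedding capture text in the prompt.
fn escape_json(s: &str) -> String {
    s.replace('\\', "\\\\").replace('"', "\\\"").replace('\n', "\\n")
}

// Sketch of a Messages API request body for Claude on Bedrock.
fn classification_request(capture_text: &str) -> String {
    format!(
        concat!(
            "{{\"anthropic_version\":\"bedrock-2023-05-31\",",
            "\"max_tokens\":1024,",
            "\"messages\":[{{\"role\":\"user\",\"content\":",
            "\"Classify this capture as source, thought, or annotation: {}\"}}]}}"
        ),
        escape_json(capture_text)
    )
}
```

A real implementation would use a JSON library rather than hand-built strings; the point is that the body is plain JSON and the only credential involved is the Lambda's role.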
```
Client saves capture → syncs to S3 → POST /vault/push/confirm
→ Server sees file in inbox/ → enqueues processing job
→ Lambda runs AI pipeline → writes processed .md back to S3
→ Push notification to all devices → clients pull updated file
```
Separate from capture, the transposition engine runs on demand or periodically. It reads annotations across sources and surfaces patterns.
Results appear in an "Insights" feed. They're auto-generated, but the user can dismiss, edit, or promote them to full notes.
Voice transcription for Share Sheet captures. Uses Amazon Transcribe instead of OpenAI Whisper — stays within the AWS ecosystem, reads audio directly from S3, no external API key needed.
| Decision | Choice |
|---|---|
| **Service** | Amazon Transcribe (AWS-native) |
| **Rust SDK** | aws-sdk-transcribe — same SDK pattern as S3, SNS |
| **When** | After capture syncs to S3, before AI pipeline |
| **Audio format** | m4a (iOS native), synced to S3 with the capture |
| **Processing** | Async batch — StartTranscriptionJob reads from S3, polls for completion |
| **Fallback** | If transcription fails, keep raw audio reference in the note |
| **Storage** | Audio files stored in S3 at {user_id}/audio/{capture_id}.m4a, deleted after successful transcription (unless user opts to keep) |
| **Cost** | ~$0.0001/second of audio |
Flow: Share Sheet records audio → saves .m4a + capture .md → syncs both to S3 → Lambda calls Amazon Transcribe → transcript injected into .md → AI pipeline continues.
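Two helpers implied by the table above — the key layout is the one given ({user_id}/audio/{capture_id}.m4a), and the cost figure is the document's own ~$0.0001/second estimate:

```rust
// S3 key for a capture's voice note: {user_id}/audio/{capture_id}.m4a
fn audio_key(user_id: &str, capture_id: &str) -> String {
    format!("{user_id}/audio/{capture_id}.m4a")
}

// Rough transcription cost at ~$0.0001 per second of audio,
// so a 10-minute voice note is on the order of six cents.
fn transcription_cost_usd(duration_secs: u32) -> f64 {
    duration_secs as f64 * 0.0001
}
```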