Steel Notes — Capture Architecture

**Core principle:** Simplest, fastest capture possible. AI processes in the background after the fact. AI is additive, never destructive — the original capture is always preserved.

Design Rules

1. Capture completes in under 2 seconds. Write raw input to .md, sync, done. No AI in the hot path.

2. AI is additive. The original capture is preserved verbatim in the note body. AI adds frontmatter, tags, links, and classification around it — never rewrites the user's words.

3. No forms at capture time. No type picker, no tag selector, no category dropdown. The user's only job is to dump content in.

4. Questions emerge, they aren't captured. Users capture sources and thoughts. The AI processing pipeline surfaces questions from patterns in annotations and reading — questions aren't manually filed.

5. Multi-item captures are a single bundle. A capture can contain multiple photos, URLs, or files. The AI pipeline processes them together — identifying which items are the same source, what's a new thought, and how they relate.


Capture Surfaces

CLI: steel add vs steel create

steel add = raw capture. Dump in a URL, text, or files with zero metadata. AI structures it after.

steel create = manual authoring. You specify --type, --title, --tags, --body. You're building a fully-formed note — no AI needed.

Different intents: add = "capture this, deal with it later" vs create = "I'm writing a note right now."

steel add — single command, no flags needed. AI figures out the rest.

```bash
steel add "https://paulgraham.com/greatwork.html"                                   # URL → Source
steel add "Nietzsche's will to power is less about domination than self-overcoming" # text → Thought
steel add ~/papers/attention-is-all-you-need.pdf                                    # file → Source
steel add .                                                                         # file in current dir
steel add cover.jpg page-47.jpg page-48.jpg                                         # multiple items → capture bundle
```

What happens:

1. CLI writes a raw .md file to the vault inbox (inbox/capture-{timestamp}.md)

2. Any attached files (images, PDFs) are stored alongside as inbox/capture-{timestamp}/

3. Minimal frontmatter: captured_at, capture_source: cli, raw input type (url | text | file | bundle)

4. Sync pushes to server

5. Background AI pipeline picks it up

Multiple arguments create a capture bundle — all items processed together so AI can understand the relationship (e.g., a book cover photo + page photos = one source + annotations).
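The raw input typing from step 3 above (url | text | file | bundle) could be decided with a few cheap heuristics before any AI runs. A minimal Rust sketch — `input_type` is a hypothetical helper, and the URL-prefix and path-existence checks are assumptions, not the actual CLI logic:

```rust
use std::path::Path;

/// Sketch of how `steel add` might pick the raw input type recorded in
/// frontmatter. Assumes at least one argument (the CLI rejects empty input).
fn input_type(args: &[&str]) -> &'static str {
    if args.len() > 1 {
        return "bundle"; // multiple items → one capture bundle
    }
    let arg = args[0];
    if arg.starts_with("http://") || arg.starts_with("https://") {
        "url"
    } else if Path::new(arg).exists() {
        "file" // covers `steel add .` and explicit paths
    } else {
        "text" // anything else is treated as a raw thought
    }
}
```

With this sketch, `steel add cover.jpg page-47.jpg page-48.jpg` records `input_type: bundle` without inspecting the individual items — relating them is the pipeline's job, not the CLI's.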

No steel add for questions. Questions are generated by the AI transposition engine from annotations and reading patterns, not manually captured.
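The capture write itself (steps 1-3 above) is deliberately trivial — minimal frontmatter, user input verbatim. A sketch assuming the frontmatter fields listed in this document; `raw_capture_md` is a hypothetical helper name:

```rust
/// Build the raw capture file body: minimal frontmatter, then the user's
/// input verbatim — nothing is rewritten at capture time.
fn raw_capture_md(input_type: &str, captured_at: &str, raw: &str) -> String {
    format!(
        "---\n\
         captured_at: {captured_at}\n\
         capture_source: cli\n\
         input_type: {input_type}\n\
         processing_status: pending\n\
         ---\n\n\
         {raw}\n"
    )
}
```

The CLI would write this string to inbox/capture-{timestamp}.md and hand off to sync — everything after that is the background pipeline's problem.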

iOS Share Sheet

Audio-first capture. When the user taps Share → Steel Notes:

```
┌─────────────────────────────────┐
│           Steel Notes           │
│                                 │
│         🎙️ Recording...         │
│      "Speak your thoughts"      │
│                                 │
│ ┌─────────┐  ┌──────────────┐   │
│ │  Save   │  │ Type Instead │   │
│ └─────────┘  └──────────────┘   │
│                                 │
│             Cancel              │
└─────────────────────────────────┘
```

Flow:

1. Share Sheet opens → audio recording starts immediately

2. User speaks their annotation/thought about the shared content

3. Tap Save → audio sent to Amazon Transcribe for transcription

4. Or tap Type Instead → recording stops, text field appears

5. Or tap Cancel → nothing saved

What gets captured:

  • The shared content (URL, text, images, PDF) — can be multiple items
  • The user's voice note (transcribed to text via Amazon Transcribe)
  • Timestamp and source app

When sharing multiple photos (e.g., from the camera roll), all images are bundled into a single capture. The AI pipeline processes them together.

What gets written:

```markdown
---
captured_at: 2026-03-26T14:30:00Z
capture_source: share_sheet
capture_app: Safari
input_type: url
processing_status: pending
---

## Captured URL

https://paulgraham.com/greatwork.html

## Voice Note

The bit about bus ticket collectors is exactly what Nietzsche means by
"love of fate" — pursuing something not because it's useful but because
you can't help it.
```

Immediately saved, Share Sheet dismissed. AI pipeline handles the rest.

In-App Camera Capture

The primary in-app capture surface. Opens directly to camera — no menus, no pickers.

```
┌─────────────────────────────────┐
│ ┌─────┐ ┌─────┐ ┌─────┐         │ ← photo strip (scrollable)
│ │ 📷1 │ │ 📷2 │ │ 📷3 │         │   snapped photos collect here
│ └─────┘ └─────┘ └─────┘         │
├─────────────────────────────────┤
│                                 │
│                                 │
│         [ Camera View ]         │
│                                 │
│                                 │
│          ┌───────────┐          │
│          │  ⏺ Snap   │          │ ← shutter button
│          └───────────┘          │
│                                 │
│   ┌─────────────────────────┐   │
│   │          Save           │   │ ← big save button, always visible
│   └─────────────────────────┘   │
└─────────────────────────────────┘
```

Flow:

1. User taps capture tab / button → camera opens immediately

2. Snap a photo → thumbnail appears in the strip at the top

3. Snap more → thumbnails accumulate, strip scrolls horizontally

4. Tap a thumbnail to remove it

5. Tap Save → bundle saved as inbox/capture-{timestamp}.md with all photos

6. Camera stays open for the next capture (or the user navigates away)

No voice note here — this is pure camera speed. If the user wants to add a voice note, they do it through the Share Sheet flow (share photos from camera roll → Steel Notes).

What gets written:

```markdown
---
captured_at: 2026-03-26T21:00:00Z
capture_source: camera
input_type: bundle
items: [IMG_001.jpg, IMG_002.jpg, IMG_003.jpg]
processing_status: pending
---

## Captured Photos

- IMG_001.jpg
- IMG_002.jpg
- IMG_003.jpg
```

AI pipeline handles the rest — OCR the images, identify the source, create the right notes.

Home Screen Widget

Quick-capture button on the widget opens the app directly to a minimal text input. Same flow as "Type Instead" in Share Sheet — raw text, saved immediately, AI processes later.


Background AI Pipeline

After a capture syncs to the server, the AI pipeline runs asynchronously. The user sees a "processing" indicator in their vault list that clears when done.

Pipeline Stages

```
Raw capture (.md + attachments in inbox/)
│
├─ 1. OCR / Extract ──── Images → text (vision model or OCR)
│                        PDFs → text extraction
│                        URLs → fetch + clean HTML
│
├─ 2. Resolve ────────── Is this a known Source? (FTS title/author match)
│                        Book cover → identify title + author → check vault
│                        If exists: link to it. If not: create new Source.
│
├─ 3. Classify ───────── What did the user capture?
│                        Bundle of book cover + pages → Source + Annotation
│                        Standalone text → Thought
│                        URL → Source
│
├─ 4. Structure ──────── Build note(s) from extracted content
│                        Clean up voice transcription
│                        One capture may produce multiple notes
│
├─ 5. Enrich ─────────── Add tags, update frontmatter
│                        Generate IDs (src-2026-03-26-001)
│
├─ 6. Link ───────────── FTS search for related notes
│                        Match against open Questions
│                        Add markdown links to related notes
│
└─ 7. File ───────────── Move from inbox/ to proper folder(s)
                         Source content → sources/
                         User context → notes/ (companion Note)
                         Standalone thoughts → thoughts/

Sync all updated .md files to all devices
```
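The seven stages above are strictly ordered, so the Lambda can drive them as a single pass. A sketch with hypothetical names — `step` stands in for the real per-stage work (Bedrock calls, FTS queries, file moves):

```rust
/// The seven pipeline stages, in the order shown in the diagram above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Stage { Extract, Resolve, Classify, Structure, Enrich, Link, File }

const PIPELINE: [Stage; 7] = [
    Stage::Extract, Stage::Resolve, Stage::Classify, Stage::Structure,
    Stage::Enrich, Stage::Link, Stage::File,
];

/// Run each stage in order over one capture. Keeping the order in one
/// const makes "Resolve before Classify" a structural guarantee.
fn run_pipeline(mut step: impl FnMut(Stage)) {
    for stage in PIPELINE {
        step(stage);
    }
}
```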

Source Resolution

A critical step: before creating a new Source, the pipeline checks if one already exists.

How it works:

1. Extract identifying info from the capture (title, author, URL, ISBN)

2. FTS search the vault for matches

3. If a match is found with high confidence → link to existing Source

4. If no match → create new Source

This means a single capture can produce:

  • **0 new Sources** (already exists) + **1 new Thought/Annotation** (linked to it)
  • **1 new Source** + **1 new Annotation** (if user added a voice note)
  • **1 new Thought** (standalone text, no source context)
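The resolve decision reduces to a threshold over the best FTS match. A sketch — the helper name and the 0.8 confidence cutoff are assumptions for illustration, not specified in this document:

```rust
/// Outcome of the Resolve stage.
#[derive(Debug, PartialEq)]
enum Resolution {
    LinkExisting(String), // high-confidence FTS match → reuse the Source
    CreateNew,            // no match, or low confidence → new Source
}

/// `best_match` is (source id, FTS confidence score in 0.0..=1.0).
fn resolve(best_match: Option<(&str, f32)>) -> Resolution {
    match best_match {
        Some((id, score)) if score >= 0.8 => Resolution::LinkExisting(id.to_string()),
        _ => Resolution::CreateNew,
    }
}
```

Erring toward CreateNew on a weak match is the safer default: a duplicate Source can be merged later, but a Thought linked to the wrong Source is silent corruption.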

Multi-Item Capture Example: Book Photos

User snaps 3 photos: book cover, page 47, page 48. Adds a voice note via Share Sheet.

Raw capture:

```markdown
---
captured_at: 2026-03-26T20:15:00Z
capture_source: share_sheet
input_type: bundle
items: [cover.jpg, page-47.jpg, page-48.jpg]
voice_note: thoughts.m4a
processing_status: pending
---

## Captured Items

- cover.jpg
- page-47.jpg
- page-48.jpg

## Voice Note

[pending transcription]
```

AI pipeline processes:

1. OCR cover.jpg → "Beyond Good and Evil — Friedrich Nietzsche"

2. OCR page photos → extracted text from pages 47-48

3. Resolve → FTS search finds sources/beyond-good-and-evil.md already exists

4. Classify → this is an Annotation on an existing Source, not a new Source

5. Transcribe voice note → "This section on master and slave morality connects to what Paul Graham said about doing great work from intrinsic motivation"

6. Structure → create a new Thought with the page text + voice note, linked to the existing Source

7. Link → also links to sources/how-to-do-great-work.md (Paul Graham mention)

Result — Note created (no new Source needed):

The pipeline recognized the book cover as an existing Source and created a Note linked to it:

```markdown
---
id: note-2026-03-26-004
type: note
title: "On master morality and intrinsic motivation"
sources: [src-2026-03-15-001]
tags: [morality, motivation, nietzsche, intrinsic-drive]
status: active
captured_at: 2026-03-26T20:15:00Z
processed_at: 2026-03-26T20:15:38Z
---

# On master morality and intrinsic motivation

![[th-2026-03-26-005]]

## Capture Note

This section on master and slave morality connects to what Paul Graham said
about doing great work from intrinsic motivation.

## Related

- Beyond Good and Evil — source, pp. 47-48
- How to Do Great Work — Paul Graham on intrinsic motivation
```

The AI pipeline also created a standalone Thought (th-2026-03-26-005) with the extracted page text and transcluded it into the Note. No new Source was created — the pipeline recognized the book and linked to the existing one.

What AI adds (example)

Before (raw capture):

```markdown
---
captured_at: 2026-03-26T14:30:00Z
capture_source: share_sheet
input_type: url
processing_status: pending
---

## Captured URL

https://paulgraham.com/greatwork.html

## Voice Note

The bit about bus ticket collectors is exactly what Nietzsche means by
"love of fate" — pursuing something not because it's useful but because
you can't help it.
```

After (AI processed) — two files created:

Source (sources/how-to-do-great-work.md) — external content only:

```markdown
---
id: src-2026-03-26-003
type: source
title: "How to Do Great Work"
author: Paul Graham
url: https://paulgraham.com/greatwork.html
status: unread
tags: [creativity, motivation, craft]
date_added: "2026-03-26T14:30:00Z"
---

# How to Do Great Work

Paul Graham — July 2023

[full extracted article body here]
```

Note (notes/how-to-do-great-work.md) — user's engagement:

```markdown
---
id: note-2026-03-26-003
type: note
title: "How to Do Great Work"
sources: [src-2026-03-26-003]
tags: [creativity, motivation, craft]
status: active
created: "2026-03-26T14:30:00Z"
updated: "2026-03-26T14:30:45Z"
---

# How to Do Great Work

## Capture Note

The bit about bus ticket collectors is exactly what Nietzsche means by
"love of fate" — pursuing something not because it's useful but because
you can't help it.

## Related

- Beyond Good and Evil — Nietzsche's amor fati
- On obsessive curiosity
```

Key points:

  • **Source + Note pair** created from a single capture — the Source holds the article content, the Note holds the user's reaction
  • **Original voice note preserved** in the Note under "Capture Note" — never rewritten
  • **AI added**: title, author, tags, related links, article body extraction
  • **Moved** from inbox/ to sources/ + notes/
  • **processing_status field removed** (done)

See NOTE_ARCHITECTURE.md for the full Note model design — transclusion, Source auto-creation, and the VaultItem interface rename.

Where AI Runs

Server-side via AWS Bedrock + Amazon Transcribe. The capture already syncs to S3, so the Lambda triggers AI processing directly — no external API keys needed on any client or server. Everything stays within the AWS ecosystem using IAM roles.

| Service | Use | Why |
| --- | --- | --- |
| **Claude on Bedrock** | Classification, structuring, enrichment, linking, tag generation | Same models as the direct Anthropic API, identical pricing, uses existing AWS IAM — no API keys to manage |
| **Claude Vision on Bedrock** | OCR for book covers, page photos, handwritten notes | The same Claude model handles both text and vision — no separate OCR service |
| **Amazon Transcribe** | Voice note transcription (Share Sheet audio) | AWS-native, reads audio directly from S3, async batch-friendly for Lambda |

No external API accounts needed. The Lambda's IAM role grants access to Bedrock and Transcribe directly.

Processing Trigger

```
Client saves capture → syncs to S3 → POST /vault/push/confirm
  → Server sees file in inbox/ → enqueues processing job
  → Lambda runs AI pipeline → writes processed .md back to S3
  → Push notification to all devices → clients pull updated file
```
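On push confirm, the server only needs to enqueue files that landed in inbox/ — already-filed notes sync without triggering processing. A sketch of that filter; function names are hypothetical:

```rust
/// A capture needs processing if it is a markdown file sitting in inbox/.
/// Attachments (inbox/capture-{timestamp}/...) ride along with their .md
/// and are not enqueued separately.
fn needs_processing(path: &str) -> bool {
    path.starts_with("inbox/") && path.ends_with(".md")
}

/// From the paths reported by POST /vault/push/confirm, pick the captures
/// to hand to the Lambda pipeline.
fn jobs_to_enqueue<'a>(pushed: &[&'a str]) -> Vec<&'a str> {
    pushed.iter().copied().filter(|p| needs_processing(p)).collect()
}
```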


    Transposition Engine

    Separate from capture, runs on demand or periodically. Reads annotations across sources and surfaces patterns:

  • **Themed groups**: "You've annotated 4 sources about amor fati — here's a synthesis draft"
  • **Questions**: "Your annotations suggest you're exploring: *Is obsessive curiosity a prerequisite for great work?*"
  • **Connections**: "This annotation on source A contradicts your note on source B"
  • Results appear in a "Insights" feed. They're auto-generated but the user can dismiss, edit, or promote them to full notes.


Voice Transcription (Amazon Transcribe)

Voice transcription for Share Sheet captures. Uses Amazon Transcribe instead of OpenAI Whisper — stays within the AWS ecosystem, reads audio directly from S3, no external API key needed.

| Decision | Choice |
| --- | --- |
| **Service** | Amazon Transcribe (AWS-native) |
| **Rust SDK** | aws-sdk-transcribe — same SDK pattern as S3, SNS |
| **When** | After capture syncs to S3, before the AI pipeline |
| **Audio format** | m4a (iOS native), synced to S3 with the capture |
| **Processing** | Async batch — StartTranscriptionJob reads from S3, polls for completion |
| **Fallback** | If transcription fails, keep the raw audio reference in the note |
| **Storage** | Audio files stored in S3 at {user_id}/audio/{capture_id}.m4a, deleted after successful transcription (unless the user opts to keep them) |
| **Cost** | ~$0.0001/second of audio |

Flow: Share Sheet records audio → saves .m4a + capture .md → syncs both to S3 → Lambda calls Amazon Transcribe → transcript injected into .md → AI pipeline continues.