Steel Notes — Capture Architecture

**Core principle:** Simplest, fastest capture possible. AI processes in the background after the fact. AI is additive, never destructive — the original capture is always preserved.

Design Rules

1. Capture completes in under 2 seconds. Write raw input to .md, sync, done. No AI in the hot path.

2. AI is additive. The original capture is preserved verbatim in the note body. AI adds frontmatter, tags, links, and classification around it — never rewrites the user's words.

3. No forms at capture time. No type picker, no tag selector, no category dropdown. The user's only job is to dump content in.

4. Questions emerge, they aren't captured. Users capture sources and thoughts. The AI processing pipeline surfaces questions from patterns in annotations and reading — questions aren't manually filed.

5. Multi-item captures are a single bundle. A capture can contain multiple photos, URLs, or files. The AI pipeline processes them together — identifying which items are the same source, what's a new thought, and how they relate.


Capture Surfaces

CLI: steel add vs steel create

steel add = raw capture. Dump in a URL, text, or files with zero metadata. AI structures it after.

steel create = manual authoring. You specify --type, --title, --tags, --body. You're building a fully-formed note — no AI needed.

Different intents: add = "capture this, deal with it later" vs create = "I'm writing a note right now."

steel add — single command, no flags needed. AI figures out the rest.

```bash
steel add "https://paulgraham.com/greatwork.html"                                   # URL → Source
steel add "Nietzsche's will to power is less about domination than self-overcoming" # text → Thought
steel add ~/papers/attention-is-all-you-need.pdf                                    # file → Source
steel add .                                                                         # file in current dir
steel add cover.jpg page-47.jpg page-48.jpg                                         # multiple items → capture bundle
```

What happens:

1. CLI writes a raw .md file to the vault inbox (inbox/capture-{timestamp}.md)

2. Any attached files (images, PDFs) are stored alongside as inbox/capture-{timestamp}/

3. Minimal frontmatter: captured_at, capture_source: cli, raw input type (url | text | file | bundle)

4. Sync pushes to server

5. Background AI pipeline picks it up

Multiple arguments create a capture bundle — all items processed together so AI can understand the relationship (e.g., a book cover photo + page photos = one source + annotations).
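The raw input typing from step 3 above (url | text | file | bundle) could be decided with a few cheap heuristics before any AI runs. A minimal Rust sketch — `input_type` is a hypothetical helper, and the URL-prefix and path-existence checks are assumptions, not the actual CLI logic:

```rust
use std::path::Path;

/// Sketch of how `steel add` might pick the raw input type recorded in
/// frontmatter. Assumes at least one argument (the CLI rejects empty input).
fn input_type(args: &[&str]) -> &'static str {
    if args.len() > 1 {
        return "bundle"; // multiple items → one capture bundle
    }
    let arg = args[0];
    if arg.starts_with("http://") || arg.starts_with("https://") {
        "url"
    } else if Path::new(arg).exists() {
        "file" // covers `steel add .` and explicit paths
    } else {
        "text" // anything else is treated as a raw thought
    }
}
```

With this sketch, `steel add cover.jpg page-47.jpg page-48.jpg` records `input_type: bundle` without inspecting the individual items — relating them is the pipeline's job, not the CLI's.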

No steel add for questions. Questions are generated by the AI transposition engine from annotations and reading patterns, not manually captured.
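The capture write itself (steps 1-3 above) is deliberately trivial — minimal frontmatter, user input verbatim. A sketch assuming the frontmatter fields listed in this document; `raw_capture_md` is a hypothetical helper name:

```rust
/// Build the raw capture file body: minimal frontmatter, then the user's
/// input verbatim — nothing is rewritten at capture time.
fn raw_capture_md(input_type: &str, captured_at: &str, raw: &str) -> String {
    format!(
        "---\n\
         captured_at: {captured_at}\n\
         capture_source: cli\n\
         input_type: {input_type}\n\
         processing_status: pending\n\
         ---\n\n\
         {raw}\n"
    )
}
```

The CLI would write this string to inbox/capture-{timestamp}.md and hand off to sync — everything after that is the background pipeline's problem.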

iOS Share Sheet

Audio-first capture. When the user taps Share → Steel Notes:

```
┌─────────────────────────────────┐
│           Steel Notes           │
│                                 │
│         🎙️ Recording...         │
│      "Speak your thoughts"      │
│                                 │
│ ┌─────────┐  ┌──────────────┐   │
│ │  Save   │  │ Type Instead │   │
│ └─────────┘  └──────────────┘   │
│                                 │
│             Cancel              │
└─────────────────────────────────┘
```

Flow:

1. Share Sheet opens → audio recording starts immediately

2. User speaks their annotation/thought about the shared content

3. Tap Save → audio sent to Amazon Transcribe for transcription

4. Or tap Type Instead → recording stops, text field appears

5. Or tap Cancel → nothing saved

What gets captured:

  • The shared content (URL, text, images, PDF) — can be multiple items
  • The user's voice note (transcribed to text via Amazon Transcribe)
  • Timestamp and source app

When sharing multiple photos (e.g., from the camera roll), all images are bundled into a single capture. The AI pipeline processes them together.

What gets written:

```markdown
---
captured_at: 2026-03-26T14:30:00Z
capture_source: share_sheet
capture_app: Safari
input_type: url
processing_status: pending
---

## Captured URL

https://paulgraham.com/greatwork.html

## Voice Note

The bit about bus ticket collectors is exactly what Nietzsche means by
"love of fate" — pursuing something not because it's useful but because
you can't help it.
```

Immediately saved, Share Sheet dismissed. AI pipeline handles the rest.

In-App Camera Capture

The primary in-app capture surface. Opens directly to camera — no menus, no pickers.

```
┌─────────────────────────────────┐
│ ┌─────┐ ┌─────┐ ┌─────┐         │ ← photo strip (scrollable)
│ │ 📷1 │ │ 📷2 │ │ 📷3 │         │   snapped photos collect here
│ └─────┘ └─────┘ └─────┘         │
├─────────────────────────────────┤
│                                 │
│                                 │
│         [ Camera View ]         │
│                                 │
│                                 │
│          ┌───────────┐          │
│          │  ⏺ Snap   │          │ ← shutter button
│          └───────────┘          │
│                                 │
│   ┌─────────────────────────┐   │
│   │          Save           │   │ ← big save button, always visible
│   └─────────────────────────┘   │
└─────────────────────────────────┘
```

Flow:

1. User taps capture tab / button → camera opens immediately

2. Snap a photo → thumbnail appears in the strip at the top

3. Snap more → thumbnails accumulate, strip scrolls horizontally

4. Tap a thumbnail to remove it

5. Tap Save → bundle saved as inbox/capture-{timestamp}.md with all photos

6. Camera stays open for the next capture (or the user navigates away)

No voice note here — this is pure camera speed. If the user wants to add a voice note, they do it through the Share Sheet flow (share photos from camera roll → Steel Notes).

What gets written:

```markdown
---
captured_at: 2026-03-26T21:00:00Z
capture_source: camera
input_type: bundle
items: [IMG_001.jpg, IMG_002.jpg, IMG_003.jpg]
processing_status: pending
---

## Captured Photos

- IMG_001.jpg
- IMG_002.jpg
- IMG_003.jpg
```

AI pipeline handles the rest — OCR the images, identify the source, create the right notes.

Home Screen Widget

Quick-capture button on the widget opens the app directly to a minimal text input. Same flow as "Type Instead" in Share Sheet — raw text, saved immediately, AI processes later.


Background AI Pipeline

After a capture syncs to the server, the AI pipeline runs asynchronously. The user sees a "processing" indicator in their vault list that clears when done.

Pipeline Stages

```
Raw capture (.md + attachments in inbox/)
│
├─ 1. OCR / Extract ──── Images → text (vision model or OCR)
│                        PDFs → text extraction
│                        URLs → fetch + clean HTML
│
├─ 2. Resolve ────────── Is this a known Source? (FTS title/author match)
│                        Book cover → identify title + author → check vault
│                        If exists: link to it. If not: create new Source.
│
├─ 3. Classify ───────── What did the user capture?
│                        Bundle of book cover + pages → Source + Annotation
│                        Standalone text → Thought
│                        URL → Source
│
├─ 4. Structure ──────── Build note(s) from extracted content
│                        Clean up voice transcription
│                        One capture may produce multiple notes
│
├─ 5. Enrich ─────────── Add tags, update frontmatter
│                        Generate IDs (src-2026-03-26-001)
│
├─ 6. Link ───────────── FTS search for related notes
│                        Match against open Questions
│                        Add markdown links to related notes
│
└─ 7. File ───────────── Move from inbox/ to proper folder(s)
                         Source content → sources/
                         User context → notes/ (companion Note)
                         Standalone thoughts → thoughts/

Sync all updated .md files to all devices
```
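The seven stages above are strictly ordered, so the Lambda can drive them as a single pass. A sketch with hypothetical names — `step` stands in for the real per-stage work (Bedrock calls, FTS queries, file moves):

```rust
/// The seven pipeline stages, in the order shown in the diagram above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Stage { Extract, Resolve, Classify, Structure, Enrich, Link, File }

const PIPELINE: [Stage; 7] = [
    Stage::Extract, Stage::Resolve, Stage::Classify, Stage::Structure,
    Stage::Enrich, Stage::Link, Stage::File,
];

/// Run each stage in order over one capture. Keeping the order in one
/// const makes "Resolve before Classify" a structural guarantee.
fn run_pipeline(mut step: impl FnMut(Stage)) {
    for stage in PIPELINE {
        step(stage);
    }
}
```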

Source Resolution

A critical step: before creating a new Source, the pipeline checks if one already exists.

How it works:

1. Extract identifying info from the capture (title, author, URL, ISBN)

2. FTS search the vault for matches

3. If a match is found with high confidence → link to existing Source

4. If no match → create new Source

This means a single capture can produce:

  • **0 new Sources** (already exists) + **1 new Thought/Annotation** (linked to it)
  • **1 new Source** + **1 new Annotation** (if user added a voice note)
  • **1 new Thought** (standalone text, no source context)
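The resolve decision reduces to a threshold over the best FTS match. A sketch — the helper name and the 0.8 confidence cutoff are assumptions for illustration, not specified in this document:

```rust
/// Outcome of the Resolve stage.
#[derive(Debug, PartialEq)]
enum Resolution {
    LinkExisting(String), // high-confidence FTS match → reuse the Source
    CreateNew,            // no match, or low confidence → new Source
}

/// `best_match` is (source id, FTS confidence score in 0.0..=1.0).
fn resolve(best_match: Option<(&str, f32)>) -> Resolution {
    match best_match {
        Some((id, score)) if score >= 0.8 => Resolution::LinkExisting(id.to_string()),
        _ => Resolution::CreateNew,
    }
}
```

Erring toward CreateNew on a weak match is the safer default: a duplicate Source can be merged later, but a Thought linked to the wrong Source is silent corruption.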

Multi-Item Capture Example: Book Photos

User snaps 3 photos: book cover, page 47, page 48. Adds a voice note via Share Sheet.

Raw capture:

```markdown
---
captured_at: 2026-03-26T20:15:00Z
capture_source: share_sheet
input_type: bundle
items: [cover.jpg, page-47.jpg, page-48.jpg]
voice_note: thoughts.m4a
processing_status: pending
---

## Captured Items

- cover.jpg
- page-47.jpg
- page-48.jpg

## Voice Note

[pending transcription]
```

AI pipeline processes:

1. OCR cover.jpg → "Beyond Good and Evil — Friedrich Nietzsche"

2. OCR page photos → extracted text from pages 47-48

3. Resolve → FTS search finds sources/beyond-good-and-evil.md already exists

4. Classify → this is an Annotation on an existing Source, not a new Source

5. Transcribe voice note → "This section on master and slave morality connects to what Paul Graham said about doing great work from intrinsic motivation"

6. Structure → create a new Thought with the page text + voice note, linked to the existing Source

7. Link → also links to sources/how-to-do-great-work.md (Paul Graham mention)

Result — Note created (no new Source needed):

The pipeline recognized the book cover as an existing Source and created a Note linked to it:

```markdown
---
id: note-2026-03-26-004
type: note
title: "On master morality and intrinsic motivation"
sources: [src-2026-03-15-001]
tags: [morality, motivation, nietzsche, intrinsic-drive]
status: active
captured_at: 2026-03-26T20:15:00Z
processed_at: 2026-03-26T20:15:38Z
---

# On master morality and intrinsic motivation

![[th-2026-03-26-005]]

## Capture Note

This section on master and slave morality connects to what Paul Graham said
about doing great work from intrinsic motivation.

## Related

- Beyond Good and Evil — source, pp. 47-48
- How to Do Great Work — Paul Graham on intrinsic motivation
```

The AI pipeline also created a standalone Thought (th-2026-03-26-005) with the extracted page text and transcluded it into the Note. No new Source was created — the pipeline recognized the book and linked to the existing one.

What AI adds (example)

Before (raw capture):

```markdown
---
captured_at: 2026-03-26T14:30:00Z
capture_source: share_sheet
input_type: url
processing_status: pending
---

## Captured URL

https://paulgraham.com/greatwork.html

## Voice Note

The bit about bus ticket collectors is exactly what Nietzsche means by
"love of fate" — pursuing something not because it's useful but because
you can't help it.
```

After (AI processed) — two files created:

Source (sources/how-to-do-great-work.md) — external content only:

```markdown
---
id: src-2026-03-26-003
type: source
title: "How to Do Great Work"
author: Paul Graham
url: https://paulgraham.com/greatwork.html
status: unread
tags: [creativity, motivation, craft]
date_added: "2026-03-26T14:30:00Z"
---

# How to Do Great Work

Paul Graham — July 2023

[full extracted article body here]
```

Note (notes/how-to-do-great-work.md) — user's engagement:

```markdown
---
id: note-2026-03-26-003
type: note
title: "How to Do Great Work"
sources: [src-2026-03-26-003]
tags: [creativity, motivation, craft]
status: active
created: "2026-03-26T14:30:00Z"
updated: "2026-03-26T14:30:45Z"
---

# How to Do Great Work

## Capture Note

The bit about bus ticket collectors is exactly what Nietzsche means by
"love of fate" — pursuing something not because it's useful but because
you can't help it.

## Related

- Beyond Good and Evil — Nietzsche's amor fati
- On obsessive curiosity
```

Key points:

  • **Source + Note pair** created from a single capture — the Source holds the article content, the Note holds the user's reaction
  • **Original voice note preserved** in the Note under "Capture Note" — never rewritten
  • **AI added**: title, author, tags, related links, article body extraction
  • **Moved** from inbox/ to sources/ + notes/
  • **processing_status field removed** (done)

See NOTE_ARCHITECTURE.md for the full Note model design — transclusion, Source auto-creation, and the VaultItem interface rename.

Where AI Runs

Server-side via AWS Bedrock + Amazon Transcribe. The capture already syncs to S3, so the Lambda triggers AI processing directly — no external API keys needed on any client or server. Everything stays within the AWS ecosystem using IAM roles.

| Service | Use | Why |
| --- | --- | --- |
| **Claude on Bedrock** | Classification, structuring, enrichment, linking, tag generation | Same models as the direct Anthropic API, identical pricing, uses existing AWS IAM — no API keys to manage |
| **Claude Vision on Bedrock** | OCR for book covers, page photos, handwritten notes | The same Claude model handles both text and vision — no separate OCR service |
| **Amazon Transcribe** | Voice note transcription (Share Sheet audio) | AWS-native, reads audio directly from S3, async batch-friendly for Lambda |

No external API accounts needed. The Lambda's IAM role grants access to Bedrock and Transcribe directly.

Processing Trigger

```
Client saves capture → syncs to S3 → POST /vault/push/confirm
  → Server sees file in inbox/ → enqueues processing job
  → Lambda runs AI pipeline → writes processed .md back to S3
  → Push notification to all devices → clients pull updated file
```
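On push confirm, the server only needs to enqueue files that landed in inbox/ — already-filed notes sync without triggering processing. A sketch of that filter; function names are hypothetical:

```rust
/// A capture needs processing if it is a markdown file sitting in inbox/.
/// Attachments (inbox/capture-{timestamp}/...) ride along with their .md
/// and are not enqueued separately.
fn needs_processing(path: &str) -> bool {
    path.starts_with("inbox/") && path.ends_with(".md")
}

/// From the paths reported by POST /vault/push/confirm, pick the captures
/// to hand to the Lambda pipeline.
fn jobs_to_enqueue<'a>(pushed: &[&'a str]) -> Vec<&'a str> {
    pushed.iter().copied().filter(|p| needs_processing(p)).collect()
}
```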


    Transposition Engine

    Separate from capture, runs on demand or periodically. Reads annotations across sources and surfaces patterns:

  • **Themed groups**: "You've annotated 4 sources about amor fati — here's a synthesis draft"
  • **Questions**: "Your annotations suggest you're exploring: *Is obsessive curiosity a prerequisite for great work?*"
  • **Connections**: "This annotation on source A contradicts your note on source B"
  • Results appear in a "Insights" feed. They're auto-generated but the user can dismiss, edit, or promote them to full notes.


Voice Transcription (Amazon Transcribe)

Voice transcription for Share Sheet captures. Uses Amazon Transcribe instead of OpenAI Whisper — stays within the AWS ecosystem, reads audio directly from S3, no external API key needed.

| Decision | Choice |
| --- | --- |
| **Service** | Amazon Transcribe (AWS-native) |
| **Rust SDK** | aws-sdk-transcribe — same SDK pattern as S3, SNS |
| **When** | After capture syncs to S3, before the AI pipeline |
| **Audio format** | m4a (iOS native), synced to S3 with the capture |
| **Processing** | Async batch — StartTranscriptionJob reads from S3, polls for completion |
| **Fallback** | If transcription fails, keep the raw audio reference in the note |
| **Storage** | Audio files stored in S3 at {user_id}/audio/{capture_id}.m4a, deleted after successful transcription (unless the user opts to keep them) |
| **Cost** | ~$0.0001/second of audio |

Flow: Share Sheet records audio → saves .m4a + capture .md → syncs both to S3 → Lambda calls Amazon Transcribe → transcript injected into .md → AI pipeline continues.