
Steel Notes — Technology Decisions

Architectural decisions with rationale, so future sessions don't re-litigate them.


Sync Backend: Custom S3 API vs Platform Cloud Storage

Decision: Custom REST API + S3. Do not use iCloud Drive, CloudKit, or any platform cloud as the sync backend.

Why not iCloud Drive / CloudKit

iCloud Drive was evaluated and rejected for product use. The core issue: you lose control of the most user-visible parts of the experience.

| What you lose | Impact |
|---|---|
| Sync timing | iCloud decides when to push. A note saved on iPhone may take 30s–5min to appear on Mac. Cannot be influenced. |
| Ordering | iCloud doesn't know that two files are related. Linked notes can arrive out of order. |
| Error states | "iCloud Sync Paused" is Apple's error, not ours. Cannot intercept, explain, or recover gracefully. |
| Conflict timing | Conflicts surface after both devices have diverged, never proactively. |
| Telemetry | Cannot see why sync is slow for a specific user. Debugging is blind. |
| Cross-platform | iCloud Drive has no Android story. Any future Android support would require a full re-architecture. |

iCloud Drive is acceptable as an optional data portability layer (letting users mirror their vault in the Files app), but it must never be the canonical sync backend.

Why S3 + custom API wins

The data is small text files. A heavy user with 1,000 notes is ~50MB in S3 — effectively free ($0.001/month in storage). The API is a thin coordination layer; it never touches file bytes.
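To make the "thin coordination layer" concrete: the API hands the client a short-lived presigned PUT URL, and the client uploads bytes directly to S3. A minimal sketch, assuming the aws-sdk-s3 crate; the key layout and function name are illustrative, not from the actual codebase:

```rust
use std::time::Duration;
use aws_sdk_s3::presigning::PresigningConfig;

// Sketch: return a short-lived upload URL for one vault file.
// The API signs the request but never proxies the file bytes;
// the client PUTs directly to S3 with this URL.
async fn presign_upload(
    s3: &aws_sdk_s3::Client,
    user_id: &str,
    rel_path: &str,
) -> Result<String, Box<dyn std::error::Error>> {
    let req = s3
        .put_object()
        .bucket("steel-notes-vaults")          // bucket name from this doc
        .key(format!("{user_id}/{rel_path}"))  // illustrative key layout
        .presigned(PresigningConfig::expires_in(Duration::from_secs(300))?)
        .await?;
    Ok(req.uri().to_string())
}
```

The same pattern with `get_object()` covers downloads, which is why the API's hot path is only auth, a Postgres lookup, and a signature.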

What we gain:

| Control point | iCloud Drive | Custom S3 |
|---|---|---|
| Push "sync now" to other devices | No | Yes — APNs/FCM on confirm |
| Near real-time sync | No | Yes — push-triggered, seconds |
| Sync telemetry per user | No | Yes |
| Graceful degradation UX | No | Yes — we control the error states |
| Cross-platform (Android, web) | No | Yes — same protocol |
| E2E encryption, user controls keys | Partial | Yes — API/S3 never read file contents |
| Data export (download as zip) | Files app hack | First-class GET /vault/export |
| Conflict detection | After-the-fact | Version-checked on every push |

Backend Language: Rust (Axum)

Decision: Rust with Axum. Not Go, not Kotlin/Ktor, not TypeScript.

The workload is I/O-bound, not CPU-bound

This API validates JWTs, queries Postgres, generates presigned URLs, and dispatches push notifications. It never reads or writes vault file bytes. The bottleneck is always network/DB round-trips.

Where Rust wins meaningfully for this workload

| Factor | Rust (Axum) | Go | Kotlin (Ktor) |
|---|---|---|---|
| Memory per instance | ~10–30 MB | ~30–50 MB | ~200–400 MB (JVM) |
| Cold start (Lambda) | ~50 ms | ~80 ms | ~2–5 s |
| Tail latency (p99) | Best | Very good | Good |
| Binary size | ~5 MB single binary | ~10 MB single binary | ~50 MB+ fat JAR |
| Monthly hosting cost at scale | Lowest | Low | 3–5Ɨ more (RAM) |

Kotlin/Ktor was specifically rejected despite the KMP model-sharing benefit. The JVM memory footprint means paying 3–5Ɨ more per instance. For a thin coordination API, sharing ~100 lines of data models is not worth the operational cost.

Go was the pragmatic alternative. If Rust proves too slow to iterate on, Go is the fallback — similar operational profile, faster to write.

Why Rust's slower dev velocity is acceptable here

The API surface is small: ~5 core endpoints, auth middleware, S3 presigned URL generation, push notification dispatch. This is not a complex domain — maybe 1,500–2,000 lines of actual logic. The Axum + sqlx + aws-sdk-rust stack is mature and well-documented for exactly this pattern.

The operational savings (smaller instances, faster cold starts on Lambda, lower tail latency) compound over the lifetime of the product.

Stack: Rust 2024 edition, Axum 0.8, sqlx (Postgres, compile-time checked queries), aws-sdk-s3, jsonwebtoken, tower middleware.
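A minimal sketch of what one endpoint looks like in this stack, assuming Axum's 0.7+/0.8 routing API and tokio; the `/health` route is illustrative, not one of the real endpoints:

```rust
use axum::{routing::get, Json, Router};
use serde::Serialize;

#[derive(Serialize)]
struct Health {
    ok: bool,
}

// Illustrative handler; real endpoints would add tower auth middleware,
// sqlx queries, and presigned-URL generation around this shape.
async fn health() -> Json<Health> {
    Json(Health { ok: true })
}

#[tokio::main]
async fn main() {
    let app = Router::new().route("/health", get(health));
    let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}
```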


Cloud Provider: AWS

Decision: AWS. Not Azure, not GCP.

The architecture is designed around S3. S3 presigned URLs are native; using Azure Blob Storage would require rewriting the presigned URL logic for no benefit.

AWS services used:

| Service | Purpose |
|---|---|
| **Lambda** | API hosting (pay-per-request, $0 at idle) |
| **API Gateway** | HTTPS endpoint in front of Lambda |
| **RDS Postgres** | User, device, and file version state |
| **S3** | Vault file storage (steel-notes-vaults/ bucket) |
| **SNS → APNs/FCM** | Push notification fan-out to devices |

Infrastructure is defined in Terraform at server/deploy/terraform/.


Starting Infrastructure Size: t4g.nano

Decision: RDS t4g.nano to start.

  • t4g.nano = 0.5 GB RAM, 2 vCPU burstable, ARM64 (Graviton2)
  • Cost: ~$6/month on-demand, ~$4/month reserved
  • Handles thousands of users at our query volume (simple indexed lookups, no complex joins)
  • Upgrade path: t4g.micro → t4g.small → Aurora Serverless when warranted by actual load
  • Lambda + API Gateway handles all compute — no EC2 or Fargate instances to size at launch.


Deployment Model: AWS Lambda via cargo-lambda

Decision: Lambda as the primary deployment target. ECS Fargate as a future fallback if Lambda p99 latency becomes a problem at scale.

Rust's small binary (~5 MB) and fast cold start (~50 ms) make Lambda viable in a way that JVM-based languages are not. cargo-lambda handles the build and deploy pipeline. The same binary can run on ECS Fargate with zero code changes if needed.
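Under Lambda, the same Axum router is served through the `lambda_http` adapter instead of a TCP listener. A sketch assuming the `lambda_http` crate's tower integration (the `/health` route is illustrative):

```rust
use axum::{routing::get, Router};
use lambda_http::{run, Error};

// The Axum Router is a tower Service, so lambda_http can drive it
// directly. Serving the identical Router via axum::serve on a TCP
// listener is what makes the Fargate fallback a zero-code-change swap.
#[tokio::main]
async fn main() -> Result<(), Error> {
    let app = Router::new().route("/health", get(|| async { "ok" }));
    run(app).await
}
```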


Repo Structure: Monorepo

Decision: Single repo for now. Backend lives in server/ alongside shared/, cli/, iosApp/.

Rationale: moving fast matters more than clean repo boundaries at this stage. Splitting into separate repos adds CI/CD complexity, versioning overhead, and friction for cross-cutting changes (e.g., a sync protocol change touches both shared/ and server/).

Revisit when: the team grows and backend/client development cadences diverge significantly.

```
steel-notes/
ā”œā”€ā”€ shared/    # KMP shared Kotlin module
ā”œā”€ā”€ cli/       # Kotlin/Native macOS CLI
ā”œā”€ā”€ iosApp/    # SwiftUI iOS app
ā”œā”€ā”€ server/    # Rust/Axum backend (AWS Lambda)
│   ā”œā”€ā”€ src/
│   ā”œā”€ā”€ Cargo.toml
│   ā”œā”€ā”€ Dockerfile
│   └── deploy/terraform/
└── docs/      # Architecture docs
```


Decisions Still Open

1. Pricing model — free tier limits + paid subscription? Storage cost per user is negligible (~$0.001/month for text). Real costs are Lambda invocations and push notifications. Decide before Phase 3E.

2. Cloudflare R2 vs AWS S3 — R2 has no egress fees (S3 charges $0.09/GB out). For a text-heavy app with small files, egress cost is negligible now but worth revisiting if attachments (images, PDFs) are added. S3 chosen for ecosystem simplicity; R2 is a drop-in swap if egress becomes meaningful.

3. E2E encryption — architecture supports it (API and S3 never read file contents), but key management UX is complex. Deferred post-launch or as a premium feature.

4. Lambda → Fargate migration trigger — if p99 API latency from Lambda cold starts exceeds 500 ms at scale, migrate to Fargate minimum-1-task. No code changes required.