# SEC-MS-02 — Streaming Upload Refactor (Requirements Draft)

**Goal**

Replace the current “single POST with multipart FormData” guest upload with a streaming / chunked pipeline that:

- avoids buffering entire files in PHP memory
- supports larger assets (target 25 MB originals)
- keeps antivirus/EXIF scrubbing and storage accounting intact
- exposes clear retry semantics to the guest PWA

This document captures the scope for SEC-MS-02 and feeds into implementation tickets.

---

## 1. Current State (Baseline)

- Upload endpoint: `POST /api/v1/events/{token}/upload` handled by `EventPublicController::upload`.
- Laravel validation enforces `image|max:6144` (≈6 MB). The entire file is received via `Request::file('photo')`.
- Storage flow: `Storage::disk($hotDisk)->putFile(...)` followed by synchronous thumbnail creation and `event_media_assets` bookkeeping.
- Device rate limiting: simple counter (`guest_name` = device id) per event.
- Security: join token validation + IP rate limiting; antivirus/EXIF cleanup handled asynchronously by `ProcessPhotoSecurityScan` (queued).
- Frontend: guest PWA uses `fetch` + FormData; progress handled by custom XHR queue for UI feedback.

Pain points:

- Upload size ceiling due to PHP `post_max_size` + memory usage.
- Slow devices stall the controller request; no streaming/chunk resume.
- Throttling/locks only consider completed uploads; partial data still consumes bandwidth.

---

## 2. Target Architecture Overview

### 2.1 Session-Based Chunk Upload

1. **Create session**
   - `POST /api/v1/events/{token}/uploads` → returns `upload_id`, `upload_key`, storage target, chunk size.
   - Validate join token + device limits *before* accepting the session. Record the session in a new table, `event_upload_sessions`.

2. **Upload chunks**
   - `PUT /api/v1/events/{token}/uploads/{upload_id}/chunk` with headers: `Content-Range`, `Content-Length`, `Upload-Key`.
   - Chunks are streamed to a staging destination on hot storage (e.g. `storage/app/uploads/{upload_id}/chunk_{index}`) by reading the request body as a stream (e.g. `fopen('php://input', 'rb')`) instead of buffering it. Note that `StreamedResponse` streams outgoing responses only and does not help with reading the upload.
   - Track received ranges in the session record; enforce sequential or limited parallel chunks.

3. **Complete upload**
   - `POST /api/v1/events/{token}/uploads/{upload_id}/complete`
   - Assemble chunks → single file (use stream copy to final path), compute checksum, dispatch queue jobs (AV/EXIF, thumbnail).
   - Persist `photos` row + `event_media_assets` references (mirroring current logic).

4. **Abort**
   - `DELETE /api/v1/events/{token}/uploads/{upload_id}` to clean up partial data.
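
The chunk step above depends on checking each `Content-Range` header against the session state. A minimal TypeScript sketch of that check follows, assuming strictly sequential chunks and the `bytes start-end/total` header form. The names `parseContentRange`, `validateChunk` and the `SessionState` shape are illustrative and not part of the existing codebase; the real check would live in the PHP controller.

```typescript
// Hypothetical shape for illustration; fields loosely mirror the session table draft.
interface SessionState {
  totalSize: number;     // bytes announced when the session was created
  receivedBytes: number; // bytes accepted so far
  chunkSize: number;     // chunk size handed out at session creation
}

interface ByteRange {
  start: number; // first byte offset, inclusive
  end: number;   // last byte offset, inclusive
  total: number; // full file size
}

type ChunkCheck = "ok" | "range_mismatch" | "chunk_out_of_order";

// Parses e.g. "bytes 0-1048575/26214400"; returns null for anything else.
function parseContentRange(header: string): ByteRange | null {
  const match = /^bytes (\d+)-(\d+)\/(\d+)$/.exec(header.trim());
  if (match === null) return null;
  const start = Number(match[1]);
  const end = Number(match[2]);
  const total = Number(match[3]);
  if (start > end || end >= total) return null;
  return { start: start, end: end, total: total };
}

// Sequential policy (an assumption): a chunk must start exactly where the
// previous one ended, and the header must agree with Content-Length.
function validateChunk(session: SessionState, header: string, contentLength: number): ChunkCheck {
  const range = parseContentRange(header);
  if (range === null) return "range_mismatch";
  if (range.total !== session.totalSize) return "range_mismatch";
  if (range.end - range.start + 1 !== contentLength) return "range_mismatch";
  if (contentLength > session.chunkSize) return "range_mismatch";
  if (range.start !== session.receivedBytes) return "chunk_out_of_order";
  return "ok";
}
```

If limited parallel chunks are allowed instead, the last comparison would be replaced by a lookup against the recorded ranges.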

### 2.2 Storage Strategy

- Use the `EventStorageManager` hot disk, but with a temporary “staging” directory.
- After successful assembly, move to final `events/{eventId}/photos/{uuid}.ext`.
- For S3 targets, evaluate direct multipart upload to S3 using pre-signed URLs:
  - Option A (short-term): stream onto local disk, then a background job pushes to S3.
  - Option B (stretch): delegate chunk upload directly to S3 using `createMultipartUpload`, storing uploadId + partETags.
- Ensure a staging cleanup job removes abandoned sessions (cron every hour).
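
As an illustration of the cleanup rule above, here is a small TypeScript sketch of how a prune job might pick abandoned sessions. The `PrunableSession` shape, the `selectPrunable` helper, and the decision to skip `completed` and `assembling` sessions are assumptions made for this example; the production job would be a Laravel command working against the session table.

```typescript
type SessionStatus = "pending" | "uploading" | "assembling" | "failed" | "completed";

// Hypothetical in-memory view of a session row.
interface PrunableSession {
  id: string;
  status: SessionStatus;
  expiresAt: Date;
}

// Returns the ids of sessions whose staging data can be deleted at time `now`.
// Assumptions: completed sessions already moved their file out of staging, and
// sessions that are still assembling are left alone until they finish or fail.
function selectPrunable(sessions: PrunableSession[], now: Date): string[] {
  return sessions
    .filter((s) => s.status !== "completed" && s.status !== "assembling")
    .filter((s) => s.expiresAt.getTime() <= now.getTime())
    .map((s) => s.id);
}
```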

### 2.3 Metadata & Limits

- New table `event_upload_sessions` fields:
  `id (uuid)`, `event_id`, `join_token_id`, `device_id`, `status (pending|uploading|assembling|failed|completed)`, `total_size`, `received_bytes`, `chunk_size`, `expires_at`, `failure_reason`, timestamps.
- Device/upload limits: enforce daily cap per device via session creation; consider max concurrent sessions per device/token (default 2).
- Maximum file size: 25 MB (configurable via `config/media.php`). Validate at `complete` by comparing expected vs actual bytes.
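
To make the limits above concrete, here is a hedged TypeScript sketch of the checks a session-creation request might pass through, plus the size comparison at `complete`. The `UploadLimits` and `DeviceUsage` shapes, the helper names, and the `invalid_size`, `file_too_large` and `too_many_sessions` results are hypothetical; only the 25 MB ceiling, the default of 2 concurrent sessions and the `upload_limit` code come from this draft.

```typescript
interface UploadLimits {
  maxFileSize: number;           // bytes; 26214400 (25 MiB) per the draft default
  maxConcurrentSessions: number; // per device/token; draft default is 2
  dailyCapPerDevice: number;     // assumption: the draft does not fix a value
}

interface DeviceUsage {
  activeSessions: number; // sessions currently pending, uploading or assembling
  uploadsToday: number;   // sessions this device has created today
}

type CreateDecision =
  | "accept"
  | "invalid_size"
  | "file_too_large"
  | "too_many_sessions"
  | "upload_limit";

// Checks run before a session row is written, cheapest first.
function checkSessionRequest(totalSize: number, usage: DeviceUsage, limits: UploadLimits): CreateDecision {
  if (!(totalSize > 0) || Math.floor(totalSize) !== totalSize) return "invalid_size";
  if (totalSize > limits.maxFileSize) return "file_too_large";
  if (usage.activeSessions >= limits.maxConcurrentSessions) return "too_many_sessions";
  if (usage.uploadsToday >= limits.dailyCapPerDevice) return "upload_limit";
  return "accept";
}

// Size check at `complete`: the assembled file must match the announced size.
function assembledSizeMatches(expectedBytes: number, actualBytes: number): boolean {
  return expectedBytes === actualBytes;
}
```

Rejecting an oversized `total_size` at creation is a design choice in this sketch: it avoids accepting chunks for a file that can never complete, while the comparison at `complete` still catches clients that send more or fewer bytes than announced.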

### 2.4 Validation & Security

- Require an `Upload-Key` secret per session (stored hashed) to prevent hijacking.
- Join token + device validations are reused; log chunk IP + UA for anomaly detection.
- Abort sessions on repeated integrity failures or mismatched `Content-Range`.
- Update the rate limiter to consider `PUT` chunk endpoints separately.

### 2.5 API Responses & Errors

- Provide consistent JSON:
  - `201` create: `{ upload_id, upload_key, chunk_size, expires_at }` (`upload_key` is included because chunk requests must send it in the `Upload-Key` header)
  - chunk success: `204`
  - complete: `201 { photo_id, file_path, thumbnail_path }`
  - error codes: `upload_limit`, `chunk_out_of_order`, `range_mismatch`, `session_expired`.
- Document in `docs/prp/03-api.md` + update guest SDK.
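
For the guest SDK, the response shapes above could be typed roughly as follows. This is a sketch only: the `retryAdvice` helper and its mapping of error codes to client actions are assumptions, since the draft fixes the status codes and error code names but not the client behaviour. The `upload_key` field is taken from the create step in section 2.1, and the type of `photo_id` is a guess.

```typescript
const UPLOAD_ERROR_CODES = [
  "upload_limit",
  "chunk_out_of_order",
  "range_mismatch",
  "session_expired",
] as const;

type UploadErrorCode = typeof UPLOAD_ERROR_CODES[number];

interface CreateSessionResponse {
  upload_id: string;
  upload_key: string; // sent back on every chunk request in the Upload-Key header
  chunk_size: number;
  expires_at: string; // assumption: ISO 8601 timestamp
}

interface CompleteResponse {
  photo_id: string | number; // assumption: the draft does not state the id type
  file_path: string;
  thumbnail_path: string;
}

// Hypothetical client-side actions for each documented error code.
type RetryAdvice = "give_up" | "resync_then_resume" | "restart_session";

function isUploadErrorCode(value: string): value is UploadErrorCode {
  return (UPLOAD_ERROR_CODES as readonly string[]).indexOf(value) !== -1;
}

function retryAdvice(code: UploadErrorCode): RetryAdvice {
  switch (code) {
    case "upload_limit":
      return "give_up"; // quota reached; retrying cannot succeed
    case "chunk_out_of_order":
    case "range_mismatch":
      return "resync_then_resume"; // re-read session status, resend from the right offset
    case "session_expired":
      return "restart_session"; // create a fresh session and upload again
  }
}
```

Keeping the code list in one `as const` array means the union type and the runtime guard cannot drift apart when a new error code is added.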

### 2.6 Backend Jobs

- Assembly job (if asynchronous) offloads the chunk merge for large files; update `ProcessPhotoSecurityScan` to depend on the final asset record.
- Add metric counters (Prometheus/Laravel events) for chunk throughput, failed sessions, average completion time.

---

## 3. Frontend Changes (Guest PWA)

- Replace the current FormData POST with a streaming uploader:
  - Request a session, slice the file into `chunk_size` pieces (default 1 MB) using `Blob.slice`, upload sequentially with retry/backoff.
  - Show granular progress (bytes uploaded / total).
- Support resume: store `upload_id` & received ranges in IndexedDB; on reconnect, query session status from the new endpoint `GET /api/v1/events/{token}/uploads/{upload_id}`.
- Ensure a compatibility fallback: if the browser lacks required APIs (e.g. old Safari), fall back to the legacy single POST (size-limited) with a warning.
- Update the service worker/queue to pause/resume chunk uploads when offline.
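
The slicing and retry steps above can be sketched in TypeScript as two pure helpers: one turns a file size into the byte ranges to pass to `Blob.slice` (with matching `Content-Range` values), and one computes an exponential backoff delay. The function names, the 500 ms base, the 30 s cap and the jitter scheme are illustrative assumptions, not decisions made in this draft. The uploader would call `file.slice(chunk.start, chunk.endExclusive)` for each planned chunk and wait `backoffDelayMs(attempt)` before retrying a failed `PUT`.

```typescript
interface PlannedChunk {
  index: number;
  start: number;        // inclusive offset, first argument to Blob.slice
  endExclusive: number; // exclusive offset, second argument to Blob.slice
  contentRange: string; // header value; Content-Range uses an inclusive end offset
}

// Splits a file of `totalSize` bytes into consecutive chunks of at most `chunkSize` bytes.
function planChunks(totalSize: number, chunkSize: number): PlannedChunk[] {
  if (!(totalSize > 0) || !(chunkSize > 0)) {
    throw new RangeError("totalSize and chunkSize must be positive");
  }
  const chunks: PlannedChunk[] = [];
  let index = 0;
  for (let start = 0; start < totalSize; start += chunkSize) {
    const endExclusive = Math.min(start + chunkSize, totalSize);
    chunks.push({
      index: index,
      start: start,
      endExclusive: endExclusive,
      contentRange: "bytes " + start + "-" + (endExclusive - 1) + "/" + totalSize,
    });
    index += 1;
  }
  return chunks;
}

// Exponential backoff with full jitter: a random delay in [0, min(capMs, baseMs * 2^attempt)).
// `random` is injectable so the delay can be made deterministic in tests.
function backoffDelayMs(
  attempt: number,
  random: () => number = Math.random,
  baseMs: number = 500,
  capMs: number = 30000,
): number {
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return Math.floor(random() * ceiling);
}
```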

---

## 4. Integration & Migration Tasks

1. **Schema**: create `event_upload_sessions` table + indices; optional `event_upload_chunks` if tracking per-part metadata.
2. **Config**: new entries in `config/media.php` for chunk size, staging path, session TTL, max size.
3. **Env**: add `.env` knobs (e.g. `MEDIA_UPLOAD_CHUNK_SIZE=1048576`, `MEDIA_UPLOAD_MAX_SIZE=26214400`).
4. **Cleanup Command**: `php artisan media:prune-upload-sessions` to purge expired sessions & staging files. Hook into cron `/cron/media-prune-sessions.sh`.
5. **Docs**: update PRP (sections 03, 10) and guest PWA README; add troubleshooting guide for chunk upload errors.
6. **Testing**:
   - Unit: session creation, chunk validation, assembly with mocked storage.
   - Feature: end-to-end upload success + failure (PHPUnit).
   - Playwright: simulate chunked upload with network throttling.
   - Load: ensure concurrent uploads do not exhaust disk IO.

---

## 5. Open Questions

- **S3 Multipart vs. Local Assembly**: confirm timeline for direct-to-S3; MVP may prefer local assembly to limit complexity.
- **Encryption**: decide whether staging chunks require at-rest encryption (likely yes if hot disk is shared).
- **Quota Enforcement**: should device/event caps be session-based (limit sessions) or final photo count (existing)? Combine both?
- **Backward Compatibility**: decide when to retire legacy endpoint; temporarily keep `/upload` fallback behind feature flag.

---

## 6. Next Steps

- Finalise design choices (S3 vs local) with Media Services.
- Break down into implementation tasks (backend API, frontend uploader, cron cleanup, observability).
- Schedule dry run in staging with large sample files (20+ MB) and monitor memory/CPU.
- Update SEC-MS-02 ticket checklist with deliverables above.