# Public API Incident Response Playbook (SEC-API-02)

Scope: Guest-facing API endpoints that rely on join tokens and power the guest PWA plus the public gallery. This includes:

- `/api/v1/events/{token}/*` (stats, tasks, uploads, photos)
- `/api/v1/gallery/{token}/*`
- Signed download/asset routes generated via `EventPublicController`

The playbook focuses on abuse, availability loss, and leaked content.

---

## 1. Detection & Alerting

| Signal | Where to Watch | Notes |
| --- | --- | --- |
| 4xx/5xx spikes | Application logs (`storage/logs/laravel.log`), centralized logging | Look for repeated `Join token access denied` / `token_rate_limited` or unexpected 5xx. |
| Rate-limit triggers | Laravel log lines emitted from `EventPublicController::handleTokenFailure` | Contains IP + truncated token preview. |
| CDN/WAF alerts | Reverse proxy (if enabled) | Ensure 429/403 anomalies are forwarded to the incident channel. |
| Synthetic monitors | Planned via `SEC-API-03` | Placeholder until monitors exist. |

Manual check commands:

```bash
php artisan log:tail --lines=200 | grep "Join token"
php artisan log:tail --lines=200 | grep "gallery"
```

## 2. Severity Classification

| Level | Criteria | Examples |
| --- | --- | --- |
| SEV-1 | Wide outage (>50% error rate), confirmed data leak or malicious mass-download | Gallery downloads serving wrong event, join-token table compromised. |
| SEV-2 | Localised outage (single tenant/event) or targeted brute force attempting to enumerate tokens | Single event returning 500, repeated `invalid_token` from single IP range. |
| SEV-3 | Minor functional regression or cosmetic issue | Rate limit misconfiguration causing occasional 429 for legitimate users. |

Escalate SEV-1/2 immediately to on-call via Slack `#incident-response` and open a PagerDuty incident (if configured).

## 3. Immediate Response Checklist

1. **Confirm availability**
   - `curl -I https://app.test/api/v1/gallery/{known_good_token}`
   - Use a tenant-provided test token to validate the `/events/{token}` flow.
2. **Snapshot logs**
   - Export the last 15 minutes from the log aggregator or `storage/logs`. Attach to the incident ticket.
3. **Assess scope**
   - Identify affected tenant/event IDs via log context.
   - Note IP addresses triggering rate limits.
4. **Decide mitigation**
   - Brute force? → throttle/block offending IPs.
   - Compromised token? → revoke token via Filament or `php artisan tenant:join-tokens:revoke {id}` (once command exists).
   - Endpoint regression? → begin rolling fix or feature flag toggle.

## 4. Mitigation Tactics

### 4.1 Abuse / Brute force

- Increase rate-limiter strictness temporarily by editing `config/limiting.php` (if available) or applying a runtime block in the load balancer.
- Use fail2ban/WAF rules to block offending IPs. For quick local action:

  ```bash
  sudo ufw deny from <offending-ip>
  ```

- Consider temporarily disabling gallery download by setting `PUBLIC_GALLERY_ENABLED=false` (feature flag planned) and clearing cache.

### 4.2 Token Compromise

- Revoke the specific token via the Filament “Join Tokens” modal (Event → Join Tokens → revoke).
- Notify the tenant with replacement token instructions.
- Audit join-token logs for additional suspicious use and consider rotating all tokens for the event.

### 4.3 Internal Failure (500s)

- Tail logs for stack traces.
- If due to downstream storage, fail closed: return 503 with maintenance banner while running `php artisan storage:diagnostics`.
- Roll back the recent deployment or disable the new feature flag if traced to a release.

## 5. Communication

| Audience | Channel | Cadence |
| --- | --- | --- |
| Internal on-call | Slack `#incident-response`, PagerDuty | Initial alert, hourly updates. |
| Customer Support | Slack `#support` with summary | Once per significant change (mitigation applied, issue resolved). |
| Tenants | Email template “Public gallery disruption” (see `resources/lang/*/emails.php`) | Only for SEV-1 or impactful SEV-2 after mitigation. |

Document timeline, impact, and mitigation in the incident ticket.

## 6. Verification & Recovery

After applying mitigation:

1. Re-run test requests for affected endpoints.
2. Validate join-token creation/revocation via Filament.
3. Confirm error rates return to baseline in monitoring/dashboard.
4. Remove temporary firewall blocks once the threat subsides.

## 7. Post-Incident Actions

- File an RCA within 48 hours including: root cause, detection gaps, follow-up tasks (e.g., enabling synthetic monitors, adding audit fields).
- Update documentation if new procedures are required (`docs/prp/11-public-gallery.md`, `docs/prp/03-api.md`).
- Schedule backlog items for long-term fixes (e.g., better anomaly alerting, token analytics dashboards).

## 8. References & Tools

- Log aggregation: `storage/logs/laravel.log` (local), Stackdriver/Splunk (staging/prod).
- Rate limit config: `App\Providers\AppServiceProvider` → `RateLimiter::for('tenant-api')` and `EventPublicController::handleTokenFailure`.
- Token management UI: Filament → Events → Join Tokens.
- Signed URL generation: `app/Http/Controllers/Api/EventPublicController` (for tracing download issues).

Keep this document alongside the other deployment runbooks and review quarterly.
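
As a rough illustration of the manual log checks in section 1 and the "note IP addresses" step in section 3, the sketch below tallies which client IPs appear most often on join-token failure lines. It is not part of the existing tooling: the function name `top_offenders` is hypothetical, and it assumes each relevant log line contains the client IPv4 address somewhere in its text, since this playbook does not specify the exact log layout. Adjust the patterns to match the real output of `EventPublicController::handleTokenFailure`.

```shell
#!/usr/bin/env bash
# Sketch only: count client IPs on join-token failure lines in a Laravel log.
# Assumption: every matching log line carries the client IPv4 address
# somewhere in its text (for example inside a JSON context object).

# top_offenders LOG_FILE [LIMIT]
# Prints "<count> <ip>" per line, busiest IP first (default LIMIT is 10).
top_offenders() {
  local log_file="$1"
  local limit="${2:-10}"

  # 1. Keep only the failure messages named in section 1.
  # 2. Extract anything shaped like an IPv4 address.
  # 3. Count occurrences and sort by count (descending), then by IP.
  grep -E 'Join token access denied|token_rate_limited' "$log_file" \
    | grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' \
    | sort \
    | uniq -c \
    | sort -k1,1nr -k2,2 \
    | awk '{ print $1, $2 }' \
    | head -n "$limit"
}
```

Once the function is sourced into a shell, a call such as `top_offenders storage/logs/laravel.log 5` would list the five busiest addresses, which can then feed the fail2ban/WAF or `ufw` blocking steps in section 4.1. Because the IPv4 pattern is deliberately loose, review the output before blocking anything.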