Internal Documentation

Mailroom Agent API Proxy

Secure, zero-retention API proxy for Gemini with enterprise-grade protection against abuse and key compromise.

System Architecture

Customer App: Mailroom Agent Desktop
API Proxy Worker: Validates, limits, forwards
Google Gemini: AI Model API
Cloudflare D1: Keys, limits, usage
Admin Dashboard: Key management

Request Flow

Every API request goes through multiple validation layers before reaching Gemini.

1. API Key Validation

The request arrives with the API key in the Authorization header. The key is hashed and looked up in D1. Invalid keys are rejected immediately.

2. Rate Limit Check

Check the requests-per-minute limit for this key. If exceeded, return 429 with a Retry-After header. Protects against runaway scripts.

3. Usage Quota Check

Verify the monthly token/request quota hasn't been exceeded. If over the limit, return 402 Payment Required.

4. Forward to Gemini

The request is forwarded to Gemini using our master API key. The response is streamed back to the customer. No content is logged.

5. Async Usage Update

After the response completes, usage counters are incremented asynchronously (non-blocking). Only metadata is stored: key_id, timestamp, token_count, HTTP status, and response time.
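
The ordering of the first three checks can be sketched as a single function. This is a minimal illustration, not the Worker's actual code: the SHA-256 hash, the 401 status for unknown keys, the fixed Retry-After value, the `KeyRecord` fields, and the in-memory dict standing in for D1 are all assumptions.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class KeyRecord:
    """Hypothetical per-key row; field names are illustrative only."""
    key_id: str
    rpm_limit: int
    monthly_token_limit: int
    requests_this_minute: int = 0
    tokens_this_month: int = 0


def hash_key(raw_key: str) -> str:
    # Assumption: keys are stored as SHA-256 hex digests.
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


def check_request(raw_key: str, store: dict[str, KeyRecord]) -> tuple[int, dict[str, str]]:
    """Return (status, headers); 200 means the request may be forwarded."""
    record = store.get(hash_key(raw_key))
    if record is None:
        return 401, {}  # assumed status for an invalid key
    if record.requests_this_minute >= record.rpm_limit:
        return 429, {"Retry-After": "60"}  # placeholder retry value
    if record.tokens_this_month >= record.monthly_token_limit:
        return 402, {}
    record.requests_this_minute += 1
    return 200, {}
```

Checking the cheap, frequently failing conditions first means a rejected request never reaches Gemini.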

Protection Layers

Multiple safeguards ensure a compromised key can't cause significant damage.

Per-Minute Rate Limits

Each key has a maximum requests-per-minute. Prevents scripts from hammering the API. Automatically throttles without blocking legitimate use.
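
One simple way to enforce a per-minute limit is a fixed-window counter. The sketch below assumes that approach; the document does not say which algorithm the proxy uses, and a sliding window or token bucket would work equally well. The class and method names are hypothetical.

```python
import math


class FixedWindowLimiter:
    """Counts requests per key in fixed windows of `window_s` seconds."""

    def __init__(self, limit: int = 60, window_s: int = 60) -> None:
        self.limit = limit
        self.window_s = window_s
        # (key_id, window index) -> request count; old windows are never
        # pruned here, which a real implementation would need to do.
        self._counts: dict[tuple[str, int], int] = {}

    def allow(self, key_id: str, now: float) -> tuple[bool, int]:
        """Return (allowed, retry_after_seconds) for a request at time `now`."""
        window = int(now // self.window_s)
        count = self._counts.get((key_id, window), 0)
        if count >= self.limit:
            window_end = (window + 1) * self.window_s
            return False, math.ceil(window_end - now)
        self._counts[(key_id, window)] = count + 1
        return True, 0
```

The second element of the result is the number of seconds until the current window ends, which is a natural value for a Retry-After header.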

Monthly Usage Caps

Hard limit on tokens/requests per billing period. Once reached, key stops working until reset. Customer sees clear usage in dashboard.

Anomaly Detection

Sudden usage spikes trigger alerts. If a key goes from 10 req/day to 1000 req/hour, we're notified and can investigate or pause.
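
A spike check of this kind might compare the current hourly rate with the key's historical baseline. The sketch below is one possible rule; the multiplier, the minimum-volume floor, and the function name are assumptions, not the production thresholds.

```python
def is_usage_spike(
    baseline_per_day: float,
    current_per_hour: float,
    factor: float = 10.0,        # assumed multiplier over baseline
    min_per_hour: float = 50.0,  # assumed floor to ignore tiny volumes
) -> bool:
    """Flag a key whose hourly rate far exceeds its historical baseline."""
    if current_per_hour < min_per_hour:
        return False
    baseline_per_hour = baseline_per_day / 24
    # Use at least 1 req/hour as the baseline so new or idle keys
    # do not divide the threshold down to zero.
    return current_per_hour > factor * max(baseline_per_hour, 1.0)
```

With these assumed defaults, the example from the text (10 requests per day jumping to 1000 per hour) is flagged, while a key that normally makes 2,400 requests per day and reaches 120 per hour is not.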

Instant Revocation

Any key can be revoked instantly from admin dashboard. Takes effect immediately - no propagation delay. Compromised key = dead key in seconds.

Key Rotation

Generate new key while old key remains valid for grace period. Allows seamless rotation without downtime. Old key auto-expires.
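
Rotation with a grace period can be modelled by giving the old key an expiry time instead of deleting it. The sketch below assumes a 7-day grace period and hypothetical `ApiKey` and `rotate` names; the actual grace period is not stated in this document.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class ApiKey:
    key_id: str
    expires_at: Optional[datetime] = None  # None means no expiry set

    def is_valid(self, now: datetime) -> bool:
        return self.expires_at is None or now < self.expires_at


def rotate(old: ApiKey, new_key_id: str, now: datetime,
           grace: timedelta = timedelta(days=7)) -> ApiKey:
    """Issue a new key and schedule the old one to expire after `grace`."""
    old.expires_at = now + grace
    return ApiKey(new_key_id)


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
old_key = ApiKey("old")
new_key = rotate(old_key, "new", now)
```

Both keys validate during the grace window, so a customer can deploy the new key without downtime.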

Global Circuit Breaker

If total spend across all keys exceeds daily threshold, all non-essential keys pause. Protects us from coordinated attacks or billing surprises.
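
The circuit breaker reduces to one comparison plus a set difference. In the sketch below, how spend is measured and which keys count as essential are assumptions; the function name is hypothetical.

```python
def keys_to_pause(
    spend_by_key: dict[str, float],
    essential_keys: set[str],
    daily_threshold: float,
) -> set[str]:
    """Return the keys to pause once total daily spend exceeds the threshold."""
    if sum(spend_by_key.values()) <= daily_threshold:
        return set()
    # Over budget: pause everything that is not marked essential.
    return set(spend_by_key) - essential_keys
```

For example, with spend `{"a": 60.0, "b": 50.0, "c": 5.0}`, essential key `"a"`, and a threshold of 100, keys `"b"` and `"c"` are paused.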

Progressive Throttling

Instead of hard blocking, we progressively slow down requests as usage increases. This provides a better user experience while still protecting resources.

Usage Level      | Request Delay   | Effect
0-50% of limit   | 0 ms (instant)  | Normal operation, no throttling
50-75% of limit  | 500 ms delay    | Gentle slowdown, still usable
75-90% of limit  | 2-second delay  | Noticeable slowdown, discourages heavy use
90-100% of limit | 5-second delay  | Heavy throttling, warning territory
100%+ of limit   | Blocked (429)   | Hard stop until reset
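
The throttling bands above translate directly into a lookup function. This sketch assumes each band includes its lower bound (so throttling begins exactly at 50% of the limit) and uses integer arithmetic to avoid floating-point edge cases; the function name is hypothetical.

```python
from typing import Optional

# (minimum usage in percent of the limit, delay in milliseconds), highest first
THROTTLE_BANDS = [(90, 5000), (75, 2000), (50, 500)]


def throttle_delay_ms(used: int, limit: int) -> Optional[int]:
    """Return the delay to apply, or None if the request must be blocked (429)."""
    if used >= limit:
        return None
    for min_percent, delay_ms in THROTTLE_BANDS:
        # Equivalent to used / limit >= min_percent / 100, without floats.
        if used * 100 >= min_percent * limit:
            return delay_ms
    return 0
```

With the default limit of 60 requests per minute, 29 used requests incur no delay, 30 incur 500 ms, 45 incur 2 seconds, 54 incur 5 seconds, and 60 are blocked.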

Why Throttle vs Block?

Blocking disrupts legitimate workflows. Throttling lets work continue while naturally limiting abuse. A stolen key running a script will be painfully slow.

Abuse Prevention

A script abusing a stolen key is slowed progressively and faces 5-second delays per request once it passes 90% of the limit. Work that would otherwise finish in seconds can stretch past an hour. Not worth the effort.
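
To put a number on the slowdown, the delays from the throttling table can be summed over a full quota. The sketch below assumes the bands apply to the daily request limit and that the script sends requests one at a time; the function name is hypothetical.

```python
# (upper bound of the band in percent of the limit, delay in seconds)
DELAY_BANDS = [(50, 0.0), (75, 0.5), (90, 2.0), (100, 5.0)]


def total_added_delay_s(limit: int) -> float:
    """Total throttling delay for a sequential script that uses the whole limit."""
    total = 0.0
    lower = 0
    for upper_percent, delay_s in DELAY_BANDS:
        upper = limit * upper_percent // 100
        total += (upper - lower) * delay_s  # requests in band x delay each
        lower = upper
    return total
```

For the default daily limit of 5,000 requests this gives 4,625 seconds, roughly 77 minutes of added waiting before the hard block is reached.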

Default Limits

These limits apply per API key and can be customized per subscription tier. Throttling kicks in before the hard limits are reached.

Limit Type             | Default Value | Purpose
Requests per minute    | 60 RPM        | Throttling starts at 30 RPM, blocked at 60
Requests per day       | 5,000 RPD     | Throttling starts at 2,500 RPD
Tokens per month       | 10M tokens    | Monthly quota tied to subscription tier
Max tokens per request | 32,000        | Prevents single massive requests
Concurrent requests    | 10            | Queue additional requests with delay
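
These defaults could be represented as an immutable configuration object with per-tier overrides. The sketch below is illustrative: the class, its field names, and the example tier value are assumptions, and only the default numbers come from the table.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Limits:
    requests_per_minute: int = 60
    requests_per_day: int = 5_000
    tokens_per_month: int = 10_000_000
    max_tokens_per_request: int = 32_000
    concurrent_requests: int = 10

    @property
    def rpm_throttle_start(self) -> int:
        # Throttling begins at half of the hard limit.
        return self.requests_per_minute // 2

    @property
    def rpd_throttle_start(self) -> int:
        return self.requests_per_day // 2


DEFAULT_LIMITS = Limits()
# Hypothetical higher tier: only the monthly token quota changes.
higher_tier = replace(DEFAULT_LIMITS, tokens_per_month=50_000_000)
```

Deriving the throttle thresholds from the hard limits keeps the two from drifting apart when a tier is customized.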

API Usage Example

Customers use their Mailroom Agent API key to access Gemini through our proxy.

Request to Mailroom Agent API (POST)

# Customer's request to our proxy
curl https://api.mailroomagent.com/v1/generateContent \
  -H "Authorization: Bearer ma_live_abc123..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-pro",
    "contents": [{
      "parts": [{ "text": "Classify this document..." }]
    }]
  }'

Response Headers (Rate Limit Info)

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 58
X-RateLimit-Reset: 1706486400
X-Monthly-Tokens-Used: 1234567
X-Monthly-Tokens-Limit: 10000000
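
A client can read these headers to pace itself before it hits a limit. The sketch below parses them into a small summary; it assumes `X-RateLimit-Reset` is a Unix timestamp in seconds, which the value suggests but this document does not state. The function name is hypothetical.

```python
from datetime import datetime, timezone


def summarize_limits(headers: dict[str, str]) -> dict[str, object]:
    """Turn the proxy's rate-limit headers into client-friendly values."""
    limit = int(headers["X-RateLimit-Limit"])
    remaining = int(headers["X-RateLimit-Remaining"])
    tokens_used = int(headers["X-Monthly-Tokens-Used"])
    tokens_limit = int(headers["X-Monthly-Tokens-Limit"])
    return {
        "requests_used": limit - remaining,
        # Assumption: the reset value is seconds since the Unix epoch (UTC).
        "resets_at": datetime.fromtimestamp(
            int(headers["X-RateLimit-Reset"]), tz=timezone.utc
        ),
        "monthly_tokens_used_pct": round(100 * tokens_used / tokens_limit, 2),
    }


example_headers = {
    "X-RateLimit-Limit": "60",
    "X-RateLimit-Remaining": "58",
    "X-RateLimit-Reset": "1706486400",
    "X-Monthly-Tokens-Used": "1234567",
    "X-Monthly-Tokens-Limit": "10000000",
}
summary = summarize_limits(example_headers)
```

For the example headers, two requests have been used in the current window and about 12.35% of the monthly token quota is consumed.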

Zero Retention Policy

We never store request or response content. Only the metadata required for billing, security, and debugging is kept.

Never Logged

Prompt content, document text, AI responses, file contents, user data in requests.

Only Stored

Key ID, timestamp, token count, HTTP status, response time (for billing and debugging).
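
One way to make the policy hard to violate by accident is to give the usage record a fixed schema that has no field for content. The sketch below assumes hypothetical field names for HTTP status and response time; `key_id`, `timestamp`, and `token_count` follow the names used earlier in this document.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class UsageRecord:
    """Metadata-only record; there is deliberately no prompt or response field."""
    key_id: str
    timestamp: float       # assumed: Unix time when the response completed
    token_count: int
    http_status: int
    response_time_ms: int


def build_usage_record(key_id: str, token_count: int, http_status: int,
                       started_at: float, finished_at: float) -> UsageRecord:
    """Build the record from timing and counts only, never from request bodies."""
    elapsed_ms = round((finished_at - started_at) * 1000)
    return UsageRecord(key_id, finished_at, token_count, http_status, elapsed_ms)


record = build_usage_record("k1", 321, 200, started_at=100.0, finished_at=100.25)
```

Because the builder never receives the request or response body, content cannot leak into storage through this path.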

If a Key is Compromised

Even in the worst case, damage is limited by design.

Scenario                     | Maximum Damage                   | Mitigation
Key stolen, used immediately | 60 requests in the first minute  | Rate limit blocks further requests within 1 minute
Key stolen, used over time   | 5,000 requests/day max           | Daily cap prevents extended abuse
Key stolen, billing impact   | Monthly quota only               | Can't exceed subscription tier limits
Key stolen, we're notified   | Revoked in <1 minute             | Anomaly alerts + instant revocation
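
The billing scenario can be checked with simple arithmetic on the default limits. The sketch below assumes every stolen-key request uses the maximum of 32,000 tokens and a 30-day month; the function names are hypothetical.

```python
import math


def worst_case_monthly_tokens(requests_per_day: int, max_tokens_per_request: int,
                              monthly_token_quota: int, days: int = 30) -> int:
    """Tokens an attacker could consume in a month, capped by the quota."""
    uncapped = requests_per_day * max_tokens_per_request * days
    return min(uncapped, monthly_token_quota)


def requests_to_exhaust_quota(monthly_token_quota: int,
                              max_tokens_per_request: int) -> int:
    """Number of maximum-size requests needed to use up the monthly quota."""
    return math.ceil(monthly_token_quota / max_tokens_per_request)
```

With the defaults (5,000 requests per day, 32,000 tokens per request, 10M tokens per month), the monthly quota is the binding limit: exposure is capped at 10,000,000 tokens, which 313 maximum-size requests would exhaust.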

Defense in Depth

No single point of failure. Multiple layers ensure that even if one protection is bypassed, others contain the damage.

Zero content retention
Hashed API keys
Edge-processed globally