Internal Documentation

Mailroom Agent API Proxy

Secure, zero-retention API proxy for Gemini with enterprise-grade protection against abuse and key compromise.

System Architecture

Customer App: Mailroom Agent Desktop

API Proxy Worker: validates, limits, forwards

Google Gemini: AI model API

Cloudflare D1: keys, limits, usage

Admin Dashboard: key management

Request Flow

Every API request passes three checks (key validation, rate limit, usage quota) before it is forwarded to Gemini.

API Key Validation

Request arrives with the API key in the Authorization header, sent as a bearer token. The key is hashed and the hash is looked up in D1. Invalid keys are rejected immediately.
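
The lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not the Worker's actual code: SHA-256 as the hash function, the `KEY_TABLE` dictionary standing in for the D1 key table, and all function names are assumptions.

```python
import hashlib
from typing import Optional


def hash_api_key(raw_key: str) -> str:
    """Hex SHA-256 digest of the raw key (hash function is an assumption)."""
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


# Hypothetical stand-in for the D1 key table: key hash -> key record.
KEY_TABLE = {
    hash_api_key("ma_live_example"): {"key_id": "key_1", "revoked": False},
}


def extract_bearer_key(authorization: Optional[str]) -> Optional[str]:
    """Pull the token out of an 'Authorization: Bearer <token>' header value."""
    if not authorization:
        return None
    scheme, _, token = authorization.partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return None
    return token


def validate_key(authorization: Optional[str]) -> Optional[dict]:
    """Return the key record for a valid key, or None if it must be rejected."""
    raw_key = extract_bearer_key(authorization)
    if raw_key is None:
        return None
    record = KEY_TABLE.get(hash_api_key(raw_key))
    if record is None or record["revoked"]:
        return None
    return record
```

Only the hash is ever compared or stored in this sketch, so the table itself never holds a usable key.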

Rate Limit Check

Check the requests-per-minute limit for this key. If it is exceeded, return 429 with a Retry-After header. This protects against runaway scripts.
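
One simple way to implement this check is a fixed one-minute window counter per key. The sketch below assumes that approach (the source does not say which windowing algorithm the proxy uses) and keeps counters in an in-memory dictionary rather than real storage; the class and method names are hypothetical.

```python
import time
from typing import Dict, Optional, Tuple

WINDOW_SECONDS = 60


class FixedWindowLimiter:
    """Counts requests per key within fixed one-minute windows."""

    def __init__(self, limit_per_minute: int = 60) -> None:
        self.limit = limit_per_minute
        # key_id -> (window start in Unix seconds, requests counted so far)
        self._windows: Dict[str, Tuple[int, int]] = {}

    def check(self, key_id: str, now: Optional[float] = None) -> Optional[dict]:
        """Return None if the request may proceed, else a 429 description."""
        now_s = int(time.time() if now is None else now)
        window_start = now_s - now_s % WINDOW_SECONDS
        start, count = self._windows.get(key_id, (window_start, 0))
        if start != window_start:
            # A new minute has begun: reset the counter.
            start, count = window_start, 0
        if count >= self.limit:
            retry_after = window_start + WINDOW_SECONDS - now_s
            return {"status": 429, "headers": {"Retry-After": str(retry_after)}}
        self._windows[key_id] = (start, count + 1)
        return None
```

The `Retry-After` value here is the number of seconds until the current window ends, so a well-behaved client knows exactly how long to wait.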

Usage Quota Check

Verify monthly token/request quota hasn't been exceeded. If over limit, return 402 Payment Required.

Forward to Gemini

Request is forwarded to Gemini using our master API key. Response is streamed back to customer. No content is logged.

Async Usage Update

After the response completes, usage counters are incremented asynchronously (non-blocking). Only metadata is stored: key_id, timestamp, token_count, plus the HTTP status and response time.
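
To make the metadata-only record concrete, here is a small sketch using Python's built-in `sqlite3` module as a local stand-in for D1, which is SQLite-based. The table name `usage_events`, the column names, and the helper functions are assumptions; the columns mirror the "Only Stored" list in the Zero Retention Policy section. The sketch runs synchronously and leaves out how the write is scheduled after the response.

```python
import sqlite3

# Hypothetical schema: one row per completed request, metadata only.
SCHEMA = """
CREATE TABLE usage_events (
    key_id      TEXT    NOT NULL,
    ts          INTEGER NOT NULL,  -- Unix seconds
    token_count INTEGER NOT NULL,
    http_status INTEGER NOT NULL,
    response_ms INTEGER NOT NULL
)
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the usage table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def record_usage(conn: sqlite3.Connection, key_id: str, ts: int,
                 token_count: int, http_status: int, response_ms: int) -> None:
    """Store one usage row. No prompt or response content is passed in."""
    conn.execute(
        "INSERT INTO usage_events VALUES (?, ?, ?, ?, ?)",
        (key_id, ts, token_count, http_status, response_ms),
    )
    conn.commit()


def tokens_used_since(conn: sqlite3.Connection, key_id: str,
                      period_start_ts: int) -> int:
    """Sum tokens for a key since the start of the billing period."""
    row = conn.execute(
        "SELECT COALESCE(SUM(token_count), 0) FROM usage_events "
        "WHERE key_id = ? AND ts >= ?",
        (key_id, period_start_ts),
    ).fetchone()
    return row[0]
```

The same table can answer the usage quota check: compare `tokens_used_since(...)` for the current billing period against the key's monthly token limit.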

Protection Layers

Multiple safeguards ensure a compromised key can't cause significant damage.

Per-Minute Rate Limits

Each key has a maximum requests-per-minute. Prevents scripts from hammering the API. Automatically throttles without blocking legitimate use.

Monthly Usage Caps

Hard limit on tokens/requests per billing period. Once reached, key stops working until reset. Customer sees clear usage in dashboard.

Anomaly Detection

Sudden usage spikes trigger alerts. If a key goes from 10 req/day to 1000 req/hour, we're notified and can investigate or pause the key.
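
A spike check of this kind can be as simple as comparing the current hourly rate with the key's historical baseline. The sketch below is only one possible rule: the 10x multiplier, the 100 req/hour floor, and the function name are hypothetical and are not taken from the actual alerting configuration.

```python
def is_usage_spike(baseline_per_day: float, current_per_hour: float,
                   factor: float = 10.0, min_per_hour: float = 100.0) -> bool:
    """Flag a key whose hourly rate far exceeds its historical baseline.

    baseline_per_day: typical requests per day for this key.
    current_per_hour: requests seen in the most recent hour.
    factor, min_per_hour: hypothetical tuning knobs. The floor keeps very
    quiet keys from alerting on a handful of extra requests.
    """
    baseline_per_hour = baseline_per_day / 24
    return (current_per_hour >= min_per_hour
            and current_per_hour > factor * baseline_per_hour)
```

With the numbers from the example above, a baseline of 10 req/day is roughly 0.4 req/hour, so 1000 req/hour is flagged by a wide margin.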

Instant Revocation

Any key can be revoked instantly from the admin dashboard. Revocation takes effect immediately, with no propagation delay: a compromised key is dead within seconds.

Key Rotation

Generate new key while old key remains valid for grace period. Allows seamless rotation without downtime. Old key auto-expires.
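
Revocation and rotation both reduce to one question at request time: is this key still active? A minimal sketch, assuming each key record carries a `revoked` flag and an optional expiry timestamp; the 24-hour grace period, the `KeyRecord` type, and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

GRACE_PERIOD_SECONDS = 24 * 60 * 60  # assumed value, not from the source


@dataclass
class KeyRecord:
    key_id: str
    revoked: bool = False
    expires_at: Optional[int] = None  # Unix seconds; None means no expiry


def rotate(old: KeyRecord, new_key_id: str, now: int) -> KeyRecord:
    """Issue a new key and schedule the old one to expire after the grace period."""
    old.expires_at = now + GRACE_PERIOD_SECONDS
    return KeyRecord(key_id=new_key_id)


def is_active(key: KeyRecord, now: int) -> bool:
    """A key is usable if it is not revoked and has not yet expired."""
    if key.revoked:
        return False
    return key.expires_at is None or now < key.expires_at
```

During the grace period both keys pass `is_active`, which is what lets a customer switch over without downtime.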

Global Circuit Breaker

If total spend across all keys exceeds daily threshold, all non-essential keys pause. Protects us from coordinated attacks or billing surprises.
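
The breaker logic can be sketched as a single function. How spend is measured and which keys count as essential are not specified here, so `spend_by_key`, `essential_keys`, and `daily_threshold` are illustrative parameters only.

```python
from typing import Dict, Set


def keys_to_pause(spend_by_key: Dict[str, float], essential_keys: Set[str],
                  daily_threshold: float) -> Set[str]:
    """Return the keys to pause once total daily spend exceeds the threshold.

    spend_by_key: today's spend per key (units are an assumption).
    essential_keys: keys that keep working even when the breaker trips.
    """
    total_spend = sum(spend_by_key.values())
    if total_spend <= daily_threshold:
        return set()  # breaker stays closed, nothing is paused
    return {key_id for key_id in spend_by_key if key_id not in essential_keys}
```

The decision depends only on the total across all keys, so many keys that each stay under their own limits can still trip the breaker together.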

Progressive Throttling

Instead of hard blocking, we progressively slow down requests as usage increases. This provides a better user experience while still protecting resources.

Usage Level	Request Delay	Effect
0-50% of limit	0 ms (instant)	Normal operation, no throttling
50-75% of limit	500 ms	Gentle slowdown, still usable
75-90% of limit	2 s	Noticeable slowdown, discourages heavy use
90-100% of limit	5 s	Heavy throttling, warning territory
100%+ of limit	Blocked (429)	Hard stop until reset
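
The table maps directly onto a lookup function. In this sketch the band edges in the table overlap (50% appears in two rows), so it assumes a boundary value belongs to the higher band; that reading matches throttling starting at exactly half the limit. The names `THROTTLE_BANDS` and `throttle_delay` are hypothetical.

```python
from typing import Optional

# (upper edge of the band as a percentage of the limit, delay in seconds)
THROTTLE_BANDS = [(50, 0.0), (75, 0.5), (90, 2.0), (100, 5.0)]


def throttle_delay(used: int, limit: int) -> Optional[float]:
    """Delay in seconds to apply before forwarding, or None to block with 429.

    used: requests already counted in the current window.
    limit: the hard limit for that window.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    for upper_pct, delay in THROTTLE_BANDS:
        # Integer comparison avoids floating-point edge cases at band edges.
        if used * 100 < upper_pct * limit:
            return delay
    return None  # at or above 100% of the limit: hard stop until reset


if __name__ == "__main__":
    for used in (0, 30, 45, 54, 60):
        print(used, throttle_delay(used, 60))
```

With a limit of 60, the delay becomes 0.5 s at 30 requests, 2 s at 45, 5 s at 54, and the request is blocked at 60.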

Why Throttle vs Block?

Blocking disrupts legitimate workflows. Throttling lets work continue while naturally limiting abuse. A stolen key running a script will be painfully slow.

Abuse Prevention

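A script abusing a stolen key quickly climbs into the upper bands, where every request waits 5 seconds. At that rate, 1,000 requests cost more than 80 minutes in delays alone, so a job that would otherwise finish in seconds stretches into hours. Not worth the effort.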

Default Limits

These limits apply per API key. Throttling kicks in before the hard limits are reached. Limits can be customized per subscription tier.

Limit Type	Default Value	Purpose
Requests per minute	60 RPM	Throttling starts at 30 RPM, blocked at 60
Requests per day	5,000 RPD	Throttling starts at 2,500 RPD
Tokens per month	10M tokens	Monthly quota tied to subscription tier
Max tokens per request	32,000	Prevents single massive requests
Concurrent requests	10	Queue additional requests with delay
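
For illustration, the defaults above could be carried as a small immutable configuration object, with per-tier overrides derived from it. The `KeyLimits` type, its field names, and the example tier are hypothetical; only the default values come from the table.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class KeyLimits:
    requests_per_minute: int = 60
    requests_per_day: int = 5_000
    tokens_per_month: int = 10_000_000
    max_tokens_per_request: int = 32_000
    max_concurrent_requests: int = 10

    @property
    def rpm_throttle_start(self) -> int:
        """Throttling begins at half the per-minute limit."""
        return self.requests_per_minute // 2

    @property
    def rpd_throttle_start(self) -> int:
        """Throttling begins at half the per-day limit."""
        return self.requests_per_day // 2


DEFAULT_LIMITS = KeyLimits()

# Hypothetical tier override: same defaults, larger monthly token quota.
EXAMPLE_TIER_LIMITS = replace(DEFAULT_LIMITS, tokens_per_month=50_000_000)
```

Deriving the throttle thresholds from the hard limits keeps the two in step when a tier changes one of them.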

API Usage Example

Customers use their Mailroom Agent API key to access Gemini through our proxy.

Request to Mailroom Agent API (POST)

# Customer's request to our proxy
curl https://api.mailroomagent.com/v1/generateContent \
  -H "Authorization: Bearer ma_live_abc123..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-pro",
    "contents": [{
      "parts": [{
        "text": "Classify this document..."
      }]
    }]
  }'
            

Response Headers (Rate Limit Info)

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 58
X-RateLimit-Reset: 1706486400
X-Monthly-Tokens-Used: 1234567
X-Monthly-Tokens-Limit: 10000000
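
A client can use these headers to pace itself. The sketch below parses the example values shown above; it assumes `X-RateLimit-Reset` is a Unix timestamp in seconds, which the sample value suggests but the source does not state, and the function name is hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict, Mapping


def summarize_limits(headers: Mapping[str, str]) -> Dict[str, object]:
    """Turn the rate-limit response headers into a few derived values."""
    limit = int(headers["X-RateLimit-Limit"])
    remaining = int(headers["X-RateLimit-Remaining"])
    # Assumption: the reset value is Unix seconds (UTC).
    reset_at = datetime.fromtimestamp(int(headers["X-RateLimit-Reset"]),
                                      tz=timezone.utc)
    tokens_used = int(headers["X-Monthly-Tokens-Used"])
    tokens_limit = int(headers["X-Monthly-Tokens-Limit"])
    return {
        "requests_used_in_window": limit - remaining,
        "window_resets_at": reset_at.isoformat(),
        "monthly_tokens_remaining": tokens_limit - tokens_used,
    }


EXAMPLE_HEADERS = {
    "X-RateLimit-Limit": "60",
    "X-RateLimit-Remaining": "58",
    "X-RateLimit-Reset": "1706486400",
    "X-Monthly-Tokens-Used": "1234567",
    "X-Monthly-Tokens-Limit": "10000000",
}

if __name__ == "__main__":
    print(summarize_limits(EXAMPLE_HEADERS))
```

A plain dictionary lookup is case-sensitive; most HTTP client libraries expose headers through a case-insensitive mapping, which works with this function unchanged.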
            

Zero Retention Policy

We never store request or response content. Only metadata required for billing and security.

Never Logged

Prompt content, document text, AI responses, file contents, user data in requests.

Only Stored

Key ID, timestamp, token count, HTTP status, response time (for billing and debugging).

If a Key is Compromised

Even in the worst case, damage is limited by design.

Scenario	Maximum Damage	Mitigation
Key stolen, used immediately	60 requests per minute, throttled from 30	Throttling starts at 30 RPM; hard block at 60 RPM
Key stolen, used over time	5,000 requests/day max	Daily cap prevents extended abuse
Key stolen, billing impact	Monthly quota only	Can't exceed subscription tier limits
Key stolen, we're notified	Revoked in <1 minute	Anomaly alerts + instant revocation

Zero content retention

Hashed API keys

Edge-processed globally