Chapter 11

Chapter 11: Machine Identity & API Keys

Chapter 11 — Non-Human Principals

Every chapter so far has described how a human authenticates. But modern infrastructure is dominated by non-human actors: CI/CD pipelines, microservices, automation scripts, and monitoring agents. These machines cannot scan a QR code or click a magic link. This chapter builds an API key system that gives machines a durable, revocable identity with the same audit trail as human sessions.

11.1 The Machine Identity Problem

Consider a CI/CD pipeline that runs integration tests against a staging environment protected by Rooiam. After a code push, the pipeline needs to:

  1. Authenticate to the Rooiam API.
  2. Provision a test user.
  3. Verify the test scenario.
  4. Clean up the test user.

A human performs these steps interactively. A machine must do it programmatically, in a headless environment, without any interactive input. Session cookies and magic links are designed for browsers and humans — they have no natural equivalent for a Python script.

The industry-standard solution is an API key: a long, random token that the machine includes in every HTTP request as a bearer credential.

Authorization: Bearer rooiam_sk_3a8f2e1c9b7d4...

An API key differs from a session cookie in several important ways:

PropertySession CookieAPI Key
DurationShort-lived (hours/days)Long-lived (months/permanent)
ScopeFull user accessCan be scoped to specific permissions
User interactionRenewed by user activityRenewed manually by an admin
RevocationAutomatic on expiryOnly by explicit deletion
StorageBrowser cookie jarApplication secret store or env var

The long lifetime and non-interactive nature of API keys make key management a first-class concern: who has which keys, when were they last used, and can they be revoked immediately if a key leaks?

11.2 The One-Way Storage Problem

API keys must be stored on the server for authentication. But unlike passwords, API keys are long random strings that the user typed zero times — they were generated by the server and copied once into a config file. Does this mean we can store them in plaintext?

No. A database breach that exposes plaintext API keys gives an attacker immediate access to every machine that ever used the system. The keys work until each is individually revoked — which is difficult to do quickly at scale.

The solution is the same as passwords: store a one-way hash. Rooiam hashes API keys with SHA-256 (not Argon2 — we explain why shortly):

A database breach exposes only hashes. SHA-256 of a 32-byte random key has ~256 bits of entropy — it is computationally infeasible to reverse.

Why SHA-256 and not Argon2? Argon2 is designed to be slow — that is its defence against dictionary attacks on low-entropy secrets like passwords. API keys are not low-entropy: they are 32 bytes of OS randomness (~256 bits). There is no dictionary to attack. The cost of Argon2 would be paid on every API call (potentially thousands per second) for zero security benefit. SHA-256 is fast, collision-resistant, and appropriate for high-entropy secrets.

11.3 The Schema

This schema shows one API key row tied to a workspace, with an optional user link when the key is meant to act as a specific person.

CREATE TABLE api_keys (
    id              UUID        PRIMARY KEY DEFAULT gen_random_uuid(),

    -- The owning principal
    organization_id UUID        NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
    user_id         UUID        REFERENCES users(id) ON DELETE SET NULL,
    -- NULL user_id = org-level key (for service accounts, not linked to a person)

    -- The key material — raw key is shown once and never stored
    key_hash        TEXT        UNIQUE NOT NULL, -- SHA-256 hex digest
    key_prefix      VARCHAR(16) NOT NULL,        -- e.g. "rooiam_sk_3a8f2e" (for display)

    -- Metadata
    name            VARCHAR(200) NOT NULL,       -- "CI Pipeline", "Monitoring Agent"
    scopes          TEXT[]       NOT NULL DEFAULT '{}', -- e.g. {"users:read", "audit:read"}

    -- Lifecycle
    last_used_at    TIMESTAMPTZ,
    expires_at      TIMESTAMPTZ,                 -- NULL = no expiry
    revoked_at      TIMESTAMPTZ,                 -- NULL = active
    created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    created_by      UUID        REFERENCES users(id) ON DELETE SET NULL
);

CREATE INDEX idx_api_keys_hash ON api_keys (key_hash);
CREATE INDEX idx_api_keys_org  ON api_keys (organization_id, revoked_at);

key_prefix — Stores the first 16 characters of the raw key. This lets the user identify a key on screen ("the key starting with rooiam_sk_3a8f2e") without storing the full secret. If a key appears in a log file or Slack message, the prefix helps the owner quickly identify and revoke the right key.

scopes — A PostgreSQL text array listing the permissions this key has. An API key for a read-only monitoring agent would have {"users:read", "audit:read"}. A full-access key would have {"*"}. This is a coarse-grained permission layer on top of the RBAC system from Chapter 8.

revoked_at — Revocation is implemented by setting this column rather than deleting the row. The row stays in the database, preserving the audit trail and allowing the "last used at" information to be displayed in the key management UI.

11.4 Key Generation

// src/modules/api_keys/service.rs

use rand::RngCore;
use rand::rngs::OsRng;
use sha2::{Sha256, Digest};

pub struct CreatedKey {
    pub raw:    String,   // shown to user once
    pub hash:   String,   // stored in DB
    pub prefix: String,   // stored in DB for display
}

pub fn generate_api_key() -> CreatedKey {
    // 32 bytes = 256 bits of OS entropy
    let mut bytes = [0u8; 32];
    OsRng.fill_bytes(&mut bytes);

    // Encode as URL-safe base64, prefix with "rooiam_sk_"
    let encoded = base64::engine::general_purpose::URL_SAFE_NO_PAD
        .encode(&bytes);
    let raw = format!("rooiam_sk_{encoded}");

    // Hash for storage (SHA-256 of the full raw key)
    let hash = hex::encode(Sha256::digest(raw.as_bytes()));

    // Prefix for display (first 16 chars of raw key after "rooiam_sk_")
    let prefix = raw[..16.min(raw.len())].to_string();

    CreatedKey { raw, hash, prefix }
}

pub async fn create_api_key(
    db:    &PgPool,
    org_id: Uuid,
    user_id: Uuid,
    name:   &str,
    scopes: &[String],
    expires_at: Option<DateTime<Utc>>,
) -> Result<CreatedKey, AppError> {
    let key = generate_api_key();

    sqlx::query!(
        r#"
        INSERT INTO api_keys
            (organization_id, user_id, key_hash, key_prefix, name, scopes,
             expires_at, created_by)
        VALUES ($1, $2, $3, $4, $5, $6, $7, $2)
        "#,
        org_id,
        user_id,
        key.hash,
        key.prefix,
        name,
        scopes,
        expires_at,
    )
    .execute(db)
    .await?;

    // Emit audit log
    AuditEvent::new("api_key.created")
        .org(org_id)
        .actor(user_id, /* session_id */ Uuid::nil())
        .meta(serde_json::json!({ "name": name, "prefix": key.prefix }))
        .fire(db.clone());

    Ok(key)  // raw key returned to caller once; not stored
}

The function returns CreatedKey which contains the raw key. The caller (the HTTP handler) includes raw in the API response. This is the only time the full key appears in the system. After the response is sent, the raw key is gone from server memory — only the hash remains.

11.5 Authentication Middleware

On every request that includes an Authorization: Bearer header, Rooiam runs the API key extractor middleware:

// src/modules/api_keys/middleware.rs

pub async fn extract_api_key(
    db:      &PgPool,
    bearer:  &str,
) -> Result<ApiKeyPrincipal, AppError> {
    // 1. Hash the incoming bearer token
    let hash = hex::encode(Sha256::digest(bearer.as_bytes()));

    // 2. Look up by hash — index makes this O(log n) not O(n)
    let key = sqlx::query_as!(
        ApiKey,
        r#"
        SELECT * FROM api_keys
        WHERE key_hash = $1
          AND revoked_at IS NULL
          AND (expires_at IS NULL OR expires_at > NOW())
        "#,
        hash,
    )
    .fetch_optional(db)
    .await?
    .ok_or(AppError::Unauthorized)?;

    // 3. Update last_used_at (fire-and-forget to avoid blocking request)
    let db2 = db.clone();
    let key_id = key.id;
    tokio::spawn(async move {
        let _ = sqlx::query!(
            "UPDATE api_keys SET last_used_at = NOW() WHERE id = $1",
            key_id
        )
        .execute(&db2)
        .await;
    });

    Ok(ApiKeyPrincipal {
        key_id:         key.id,
        organization_id: key.organization_id,
        user_id:        key.user_id,
        scopes:         key.scopes,
    })
}

The WHERE revoked_at IS NULL AND (expires_at IS NULL OR expires_at > NOW()) clause means revocation takes effect immediately on the next request. There is no cache to invalidate, no token to wait out. This is the advantage of opaque credentials over signed JWTs: the server holds the authoritative state.

11.6 Scope Enforcement

Requests authenticated via API key are checked against the key's scopes array in addition to the standard RBAC check:

pub fn check_api_key_scope(
    principal: &ApiKeyPrincipal,
    required:  &str,
) -> Result<(), AppError> {
    let has_wildcard = principal.scopes.iter().any(|s| s == "*");
    let has_scope    = principal.scopes.iter().any(|s| s == required);

    if has_wildcard || has_scope {
        return Ok(());
    }

    Err(AppError::Forbidden(format!(
        "API key does not have scope: {required}"
    )))
}

This creates a dual-layer authorisation for API keys:

  1. The RBAC permission check from Chapter 8 (what the key's owning user is allowed to do).
  2. The scope check (what this specific key was issued permission to do).

A key can only do things that are both within its scope and within the owning user's permissions. Granting a monitoring agent a users:read scope does not give it users:write access, even if the owning admin account has that permission.

11.7 The Key Lifecycle

Both expired and revoked keys are rejected by the middleware. The difference is:

  • Expired: expected lifecycle end. The key was created with a future expires_at and that date has passed. The owner planned for this.
  • Revoked: emergency or routine decommissioning. The owner explicitly terminated the key — perhaps because it leaked, or the machine that held it was decommissioned.

Neither state deletes the row, so the management UI can show the full history of all keys ever created for an organisation, including when they were last used before expiry.


Chapter Summary
  • API keys give non-human actors (CI pipelines, microservices) a durable, revocable identity that does not require interactive login.
  • Keys are stored as SHA-256 hashes, not plaintext. SHA-256 (not Argon2) is appropriate because API keys have ~256 bits of entropy — there is no dictionary to attack.
  • The key_prefix column stores the first 16 characters of the raw key for display purposes, allowing users to identify a key without ever storing the full secret.
  • Scopes create a least-privilege layer: a key can only access resources it was explicitly granted, even if the issuing user has broader permissions.
  • Revocation is immediate: the middleware looks up the key on every request; setting revoked_at blocks all subsequent requests with no cache to wait out.
  • The last_used_at update uses .fire() (background Tokio task) so key authentication never blocks on a write.

Exercises
  1. An engineer commits an API key to a public GitHub repository by mistake. Describe the exact sequence of operations needed to contain the incident. What Rooiam endpoints would be called? How quickly does revocation take effect?

  2. The middleware hashes the incoming bearer token and queries by key_hash. Could an attacker enumerate valid API keys by sending millions of requests with guessed values? How many hashes per second would they need to try, and how long would it take to find a 32-byte random key?

  3. An API key has scopes = ["users:read"]. The owning user account has the admin role with users:write permission. Can the key call a POST /v1/users endpoint? Trace the dual-layer authorisation check.

  4. Design a service_accounts table for org-level machine identities that have no associated human user. How does this differ from a key with user_id = NULL? What extra columns would a service account need?