StaffAdmin Documentation

AI System Configuration

Comprehensive staff guide for configuring the BeforeMerge AI system. Covers provider setup, scan tiers, credit calculations, automatic fallback, user API key overrides, the AI review engine, repo profile learning, best practices mapping, and the AI testing dashboard.

Location: Admin → AI Providers tab · Access: is_staff = true

1. AI Provider Configuration

BeforeMerge supports three AI providers out of the box. Each provider is configured via environment variables and managed through the Admin AI Providers tab.

Environment Variables

Set these in your deployment environment. All three are optional — the system works with any combination, but at least one must be configured.

.env (server-side only)
# Anthropic — primary provider
ANTHROPIC_API_KEY=sk-ant-api03-...

# OpenAI — secondary provider
OPENAI_API_KEY=sk-proj-...

# Google AI — tertiary provider
GOOGLE_AI_API_KEY=AIza...

Available Providers & Models

Provider | Default Model | Available Models | Pricing (per 1M tokens)
Anthropic | Claude Sonnet 4.6 | Haiku 3.5, Sonnet 4.6, Opus 4.6 | $0.25 / $1.25 (in/out) to $15 / $75
OpenAI | GPT-5.4 | GPT-4.1 Mini, GPT-5.4, GPT-5.4 Turbo | $0.40 / $1.60 to $10 / $30
Google | Gemini 2.5 Pro | Gemini 2.5 Flash, Gemini 2.5 Pro | $0.15 / $0.60 to $3.50 / $10.50

Provider Priority System

The system tries providers in priority order. If the first provider fails or is unavailable, it automatically falls back to the next. Priority is configurable in the Admin AI Providers tab.

Default priority order
// Configured in admin_ai_provider_config table
provider_priority: ["anthropic", "openai", "google"]

// The system resolves the active provider:
// 1. Check provider_priority order
// 2. Skip any provider without a configured API key
// 3. Skip any provider currently in major_outage
// 4. Use the first available provider
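The resolution steps above can be sketched as a small function. This is an illustrative sketch, not BeforeMerge's actual code; the `ProviderState` type and `resolveProvider` name are assumptions.

```typescript
// Illustrative sketch of the provider resolution described above.
// Type and function names are hypothetical, not BeforeMerge internals.
type ProviderStatus = "operational" | "degraded_performance" | "major_outage";

interface ProviderState {
  name: string;
  hasApiKey: boolean;
  status: ProviderStatus;
}

function resolveProvider(
  priority: string[],
  states: Map<string, ProviderState>
): string | null {
  for (const name of priority) {
    const state = states.get(name);
    if (!state || !state.hasApiKey) continue;      // skip: no API key configured
    if (state.status === "major_outage") continue; // skip: provider is down
    return name;                                   // first available provider wins
  }
  return null; // all providers unavailable; caller queues the scan for retry
}
```

Note that, per the fallback rules later in this guide, degraded_performance does not cause a skip; only major_outage does.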

Setting Default Models per Provider

Each provider has a default model used when no specific model is requested. Admins can change the default model in the AI Providers tab without redeploying. The tier-to-model mapping (Quick/Deep/Max) is configured separately.

2. Scan Tiers

Every scan runs at one of three tiers, each trading speed for depth. The tier determines which AI model is used, how many passes are made over the code, and the credit cost.

Quick (1x multiplier)

Fast, single-pass scan using a lightweight model. Best for quick PR checks and real-time feedback. Catches obvious issues like syntax errors, basic security violations, and style problems.

Default model: Claude Haiku 3.5 · Typical latency: 5-15s

Deep (3x multiplier)

Multi-pass scan using a mid-tier model. Performs contextual analysis including cross-file dependency checks, security pattern matching against OWASP rules, and architecture validation. Recommended for most PR reviews.

Default model: Claude Sonnet 4.6 · Typical latency: 30-90s

Max (10x multiplier)

Comprehensive multi-pass scan using the most capable model. Performs deep architectural analysis, identifies subtle security vulnerabilities, checks for performance anti-patterns, verifies best practices across the entire changeset, and cross-references with the repo profile. Ideal for release branches and critical PRs.

Default model: Claude Opus 4.6 · Typical latency: 2-5 min

Tier Model Mapping

Admins can reassign which model is used at each tier. This is useful when a new model is released or when cost optimization is needed.

Tier | Default Model | Multiplier | Use Case
Quick | Haiku 3.5 | 1x | Fast PR checks, lint-level issues
Deep | Sonnet 4.6 | 3x | Standard PR reviews, security checks
Max | Opus 4.6 | 10x | Release branches, audits, critical PRs

3. Credit System Flow

Credits are the internal currency for scan operations. The cost of a scan is calculated from two admin-configurable values: the base scan cost and the tier multiplier.

Base Scan Costs (Admin-Configurable)

Scan Type | Base Cost (credits) | Description
PR Diff | 2 | Scan only the changed files in a pull request
Full Small | 5 | Full repo scan for repos under 50 files
Full Medium | 10 | Full repo scan for repos with 50-200 files
Full Large | 20 | Full repo scan for repos over 200 files

Credit Calculation Walkthrough

1. Admin sets base scan costs

In the AI Providers tab, admins configure the base credit cost for each scan type (PR Diff = 2, Full Small = 5, etc.).

2. Admin sets tier multipliers

Each scan tier has a multiplier: Quick = 1x, Deep = 3x, Max = 10x. These are also configurable.

3. User creates a scan

The system calculates the total cost:

text
total_credits = base_cost x tier_multiplier

Example: Deep PR Diff scan
  = 2 (base) x 3 (deep multiplier)
  = 6 credits

4. System checks org credit balance

The system queries the org's current credit balance. If insufficient, the scan is rejected with an "insufficient_credits" error and the user is prompted to upgrade or purchase more credits.

5. Debit credits and start scan

Credits are debited atomically before the scan starts. If the scan fails, credits are refunded to the org balance.

6. AI feature credits (additional)

Some AI features (multi-model verification, repo profiling) consume additional credits based on the model_credit_costs table. These are deducted separately from the scan cost.

7. Custom API key bypass

If the org has configured their own AI API key via the org_ai_api_key table, credit deduction is skipped entirely. The scan uses the org's key and they pay the provider directly.
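Putting the walkthrough together, the cost check could look like the following sketch. The constants mirror the defaults documented in this guide; the `scanCost` function and its names are illustrative, not the real implementation.

```typescript
// Sketch of the credit calculation from the walkthrough above.
// Values mirror the documented admin defaults; names are illustrative.
const BASE_COSTS: Record<string, number> = {
  pr_diff: 2,
  full_small: 5,
  full_medium: 10,
  full_large: 20,
};

const TIER_MULTIPLIERS: Record<string, number> = {
  quick: 1,
  deep: 3,
  max: 10,
};

function scanCost(scanType: string, tier: string, orgHasOwnKey: boolean): number {
  // Step 7: orgs with their own provider key bypass credit deduction entirely.
  if (orgHasOwnKey) return 0;
  const base = BASE_COSTS[scanType];
  const mult = TIER_MULTIPLIERS[tier];
  if (base === undefined || mult === undefined) {
    throw new Error(`unknown scan type or tier: ${scanType}/${tier}`);
  }
  return base * mult; // total_credits = base_cost x tier_multiplier
}
```

For example, a Deep PR Diff scan costs 2 x 3 = 6 credits, matching the worked example in step 3.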

Model Credit Costs

Model | Credits per Call | Notes
Haiku 3.5 | 1 | Cheapest, used for Quick tier
Gemini 2.5 Flash | 1 | Google equivalent to Haiku
GPT-4.1 Mini | 2 | OpenAI lightweight model
Sonnet 4.6 | 3 | Default for Deep tier
GPT-5.4 | 4 | OpenAI mid-tier
Gemini 2.5 Pro | 4 | Google mid-tier
GPT-5.4 Turbo | 6 | OpenAI high-tier
Opus 4.6 | 8 | Most capable, used for Max tier

4. Fallback System

Auto-fallback ensures scans complete even when a provider is experiencing an outage. The system monitors provider status and automatically routes requests to the next available provider.

How Auto-Fallback Works

Before each AI call, the system checks the provider's operational status
Status is fetched from Statuspage.io APIs for each provider (Anthropic, OpenAI, Google)
Only a major_outage triggers fallback — degraded performance does not
The system walks the provider_priority list and picks the first provider that is operational and has a configured API key
If all providers are in major_outage, the scan is queued for retry with escalating backoff delays (max 3 retries over 30 minutes)
If retries are exhausted, the scan fails with a provider_unavailable error and the user is notified

Fallback Flow

text
Request comes in
  |
  v
Check provider_priority[0] (e.g., "anthropic")
  |-- Has API key? No  --> skip to next
  |-- Major outage?  Yes --> skip to next
  |-- Available? Yes --> use this provider
  |
  v
Check provider_priority[1] (e.g., "openai")
  |-- Same checks...
  |
  v
Check provider_priority[2] (e.g., "google")
  |-- Same checks...
  |
  v
All providers down --> queue for retry
  |-- Retry 1: after 5 minutes
  |-- Retry 2: after 15 minutes
  |-- Retry 3: after 30 minutes
  |-- Exhausted: fail with provider_unavailable
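The retry schedule in the flow above can be sketched as a simple lookup. The `nextRetryDelay` name is hypothetical; the delays are the documented 5 / 15 / 30 minute steps.

```typescript
// Sketch of the retry schedule shown in the fallback flow above.
// Delays are the documented 5 / 15 / 30 minute steps; names are illustrative.
const RETRY_DELAYS_MINUTES = [5, 15, 30];

function nextRetryDelay(attempt: number): number | null {
  // attempt is 1-based; returns null once the 3-retry limit is exhausted,
  // at which point the scan fails with provider_unavailable.
  if (attempt < 1 || attempt > RETRY_DELAYS_MINUTES.length) return null;
  return RETRY_DELAYS_MINUTES[attempt - 1];
}
```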

Audit Logging

Every fallback event is logged to the ai_provider_audit_log table with the following fields:

Field | Type | Description
event_type | enum | fallback_triggered, provider_restored, all_providers_down
from_provider | text | Provider that was unavailable
to_provider | text | Provider that handled the request (null if all down)
reason | text | Status check result (e.g., major_outage)
scan_id | uuid | The scan that triggered the fallback
created_at | timestamptz | When the event occurred

5. User API Key Overrides

Organizations on Pro and Enterprise plans can bring their own AI provider API keys. This bypasses the BeforeMerge credit system entirely — the org pays the provider directly at their own negotiated rates.

How It Works

Org admin navigates to Settings > AI Provider Keys
Enters their API key for one or more providers (Anthropic, OpenAI, Google)
Key is validated with a lightweight test call before being saved
Key is encrypted at rest and stored in the org_ai_api_key table
When a scan runs, the system checks for an org key before using the platform key
If an org key exists for the selected provider, credits are not deducted

Storage & Security

Aspect | Detail
Table | org_ai_api_key
Encryption | AES-256 at rest, decrypted only at call time
Display | Keys are masked in the UI — only the last 4 characters shown (e.g., ...xK9m)
Access | Only org admins can add, view (masked), or delete keys
RLS | Row-level security ensures keys are scoped to the owning org
Deletion | Hard delete — key is removed from the database entirely, not soft-deleted

Important: Org API keys are never logged, never included in error reports, and never sent to any external service other than the AI provider itself. The full key is only held in memory during the API call.
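The masked-display rule above (last 4 characters only) might be implemented along these lines; `maskApiKey` is an illustrative helper, not the actual UI code.

```typescript
// Sketch of the masked-key display described above: only the last 4
// characters of a stored key are ever shown in the UI. Illustrative only.
function maskApiKey(key: string): string {
  const visible = 4;
  if (key.length <= visible) return `...${key}`;
  return `...${key.slice(-visible)}`;
}
```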

6. AI Review Engine

The AI review engine analyzes code changes and produces structured inline comments with severity ratings, category tags, and actionable fix suggestions.

How PR Reviews Work

text
1. PR scan completes (diff extracted from GitHub)
2. Diff is chunked into file-level segments
3. Each chunk is sent to the AI with:
   - The repo profile (if available)
   - Applicable BeforeMerge rules
   - OWASP mappings for security-relevant files
   - Best practices for the detected language
4. AI returns structured findings:
   {
     file: "src/actions/user.ts",
     line: 42,
     severity: "critical",
     category: "security",
     title: "SQL injection in server action",
     description: "User input is interpolated directly...",
     suggestion: "Use parameterized queries or the Supabase query builder",
     rule_ref: "avoid-raw-sql-in-server-actions"
   }
5. Findings are deduplicated and posted as inline PR comments
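The finding shape from step 4 can be written down as a type. This sketch is inferred from the example above; the `Finding` interface and `dedupeKey` helper are hypothetical, and the doc does not specify the actual dedup key.

```typescript
// Type sketch inferred from the example finding in step 4 above.
// Field names come from the doc; the type itself is illustrative.
type Severity = "critical" | "major" | "minor" | "trivial";
type Category =
  | "security" | "performance" | "architecture" | "quality"
  | "reliability" | "accessibility" | "testing" | "documentation";

interface Finding {
  file: string;
  line: number;
  severity: Severity;
  category: Category;
  title: string;
  description: string;
  suggestion: string;
  rule_ref?: string;  // optional link to a BeforeMerge best practice
  owasp_ref?: string; // optional OWASP Top 10 category (e.g., "A03:2021")
}

// Deduplication (step 5) might key on file + line + rule, for example:
function dedupeKey(f: Finding): string {
  return `${f.file}:${f.line}:${f.rule_ref ?? f.title}`;
}
```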

Severity Tiers

Severity | Color | Meaning | Action Required
Critical | Red | Security vulnerability, data loss risk, or crash | Must fix before merge
Major | Orange | Significant bug, performance issue, or anti-pattern | Should fix before merge
Minor | Yellow | Code smell, readability issue, or missing optimization | Fix when convenient
Trivial | Gray | Style nit, naming suggestion, or documentation gap | Optional improvement

8 Review Categories

Security
Performance
Architecture
Quality
Reliability
Accessibility
Testing
Documentation

Triggering Reviews

Trigger | Plans | How
Auto on PR scan | Pro, Enterprise | Enabled by default when a PR scan completes. Can be disabled per-project.
Manual trigger | All plans | Click "Run AI Review" on any scan detail page to generate review comments on demand.
Webhook auto-scan | Pro, Enterprise | When a PR webhook fires, the system auto-scans and auto-reviews in sequence.

Multi-Model Verification

When two or more providers are configured, Enterprise plans can enable multi-model verification. Each finding is verified by a second model from a different provider. Findings confirmed by both models are marked as "verified" with higher confidence. This costs additional credits based on the second model's model_credit_costs entry.
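Selecting the verification model could be as simple as the sketch below: take the first configured provider that differs from the one that produced the finding. The `pickVerifier` helper is an assumption; the doc only specifies "a second model from a different provider".

```typescript
// Sketch of choosing a verification provider for multi-model verification:
// the first configured provider distinct from the primary. Illustrative only.
function pickVerifier(primary: string, configured: string[]): string | null {
  // Multi-model verification requires at least two configured providers.
  return configured.find((p) => p !== primary) ?? null;
}
```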

7. Repo Profile Learning

Repo profiles let the AI tailor reviews to a specific codebase. The system automatically detects frameworks, conventions, and architecture patterns, then uses this context in every review.

What Gets Detected

Category | Examples
Frameworks | Next.js (App Router vs Pages), React, Vue, Express, NestJS, Django, Rails
Languages | TypeScript, JavaScript, Python, Go, Rust, Java, C#
Conventions | Naming patterns, file structure, import style, monorepo vs single-package
Architecture | Server Components vs Client Components, API routes vs Server Actions, microservices vs monolith
Dependencies | ORM (Prisma, Drizzle, Supabase), state management, testing framework
Config patterns | ESLint rules, TypeScript strictness, CI/CD pipeline type

Update Frequency

Profiles are rebuilt automatically under two conditions:

Every 10 scans — after the 10th scan on a repo, the profile is regenerated from the latest codebase snapshot
Every 7 days — if a scan runs and the profile is older than 7 days, it is refreshed regardless of scan count
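The two rebuild conditions can be combined into a single predicate; `shouldRebuildProfile` and its parameter names are illustrative, not BeforeMerge internals.

```typescript
// Sketch of the two rebuild conditions above; names are illustrative.
const SCANS_PER_REBUILD = 10;
const MAX_PROFILE_AGE_DAYS = 7;

function shouldRebuildProfile(
  scansSinceRebuild: number,
  profileAgeDays: number
): boolean {
  return (
    scansSinceRebuild >= SCANS_PER_REBUILD || // every 10 scans
    profileAgeDays > MAX_PROFILE_AGE_DAYS     // older than 7 days at scan time
  );
}
```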

How AI Uses Profiles

The repo profile is injected into the AI system prompt for every review. This enables the AI to:

Apply framework-specific best practices (e.g., Next.js App Router patterns instead of Pages Router)
Flag architectural violations specific to the repo's structure
Suggest fixes using the project's actual ORM, testing framework, and conventions
Avoid false positives from rules that don't apply to the detected stack

8. Best Practices & OWASP

BeforeMerge ships with 108 built-in best practices across 9 languages, plus full OWASP Top 10 mapping. These are used by the AI review engine to ground findings in established standards.

Language Coverage (108 Best Practices)

Language | Best Practices | Key Areas
TypeScript / JavaScript | 18 | Type safety, async patterns, module boundaries
Python | 14 | Type hints, virtual envs, exception handling
Go | 12 | Error handling, goroutine safety, interfaces
Rust | 12 | Ownership, unsafe blocks, error propagation
Java | 12 | Null safety, streams, concurrency
C# | 10 | Async/await, LINQ, dependency injection
Ruby | 10 | Method visibility, gems, testing patterns
PHP | 10 | Type declarations, PSR standards, SQL safety
Swift | 10 | Optionals, protocol-oriented design, memory management

OWASP Top 10 Mapping

Each security-related best practice is mapped to one or more OWASP Top 10 categories. When the AI identifies a security finding, it references the relevant OWASP category in the finding output.

OWASP ID | Category | Mapped Best Practices
A01:2021 | Broken Access Control | RLS policies, authorization checks, RBAC patterns
A02:2021 | Cryptographic Failures | Key management, TLS enforcement, hashing
A03:2021 | Injection | SQL injection, XSS, command injection, template injection
A04:2021 | Insecure Design | Input validation, trust boundaries, threat modeling
A05:2021 | Security Misconfiguration | Default credentials, verbose errors, CORS
A06:2021 | Vulnerable Components | Dependency scanning, version pinning, CVE checking
A07:2021 | Auth Failures | Session management, password policies, MFA
A08:2021 | Data Integrity Failures | CI/CD security, deserialization, signature verification
A09:2021 | Logging Failures | Audit trails, log injection, monitoring gaps
A10:2021 | SSRF | URL validation, allowlists, internal network protection

How AI References Best Practices

During a review, the AI receives the best practices relevant to the detected languages in the changeset. Each finding can include a rule_ref and owasp_ref field linking back to the specific best practice and OWASP category. This provides auditable, standards-backed justification for every finding.

9. AI Testing Dashboard

The AI Testing Dashboard is available to staff at Admin → AI Testing tab. It provides a sandbox for testing prompts, comparing models, and validating review quality before rolling out changes.

6 Testing Sections

1. Prompt Playground

Test system prompts and user prompts against any configured model. Adjust temperature, max tokens, and see raw responses. Useful for iterating on review prompt templates.

2. Model Comparison

Run the same prompt across all configured providers simultaneously. Side-by-side output comparison with response time, token usage, and cost metrics.

3. Review Simulation

Paste a code diff and run it through the full review pipeline. See the exact findings that would be generated, including severity, category, and rule references.

4. Best Practice Validation

Test individual best practices against code snippets. Verify that the AI correctly identifies violations and produces accurate fix suggestions.

5. Repo Profile Inspector

View and manually edit repo profiles. Test how profile changes affect review output by re-running a previous scan with a modified profile.

6. Cost Calculator

Estimate credit costs for different scan configurations. Input scan type, tier, number of files, and see the breakdown of base cost, tier multiplier, and model credits.

Testing New Prompts

When iterating on the review system prompt:

Use the Prompt Playground to test changes against a known-good diff
Run Model Comparison to ensure the prompt works well across all providers
Use Review Simulation with 3-5 representative diffs covering different languages and issue types
Check that findings include correct severity, category, and rule_ref values
Only deploy the new prompt after passing all validation checks

Comparing Models

Use Model Comparison when evaluating a new model release (e.g., a new Sonnet version). Key metrics to compare: finding accuracy, false positive rate, response latency, and token cost. The dashboard highlights differences in red/green for easy scanning.

10. Configuration Reference

All admin-configurable AI settings in one table. Every setting is editable from the Admin panel without redeploying.

Setting | Location | Default | Description
Provider priority | AI Providers tab | anthropic, openai, google | Order in which providers are tried. First available provider is used.
Default models | AI Providers tab | Sonnet 4.6, GPT-5.4, Gemini 2.5 Pro | Default model per provider when no tier-specific model is configured.
Tier model mapping | AI Providers tab | Haiku / Sonnet / Opus | Which model to use for Quick, Deep, and Max tiers respectively.
Model credit costs | AI Providers tab | 1-8 per model | Credits consumed per AI call for each model. Used for additional AI features.
Scan credit costs | AI Providers tab | 2-20 per scan type | Base credit cost for each scan type (PR Diff, Full Small/Medium/Large).
Tier multipliers | AI Providers tab | 1x / 3x / 10x | Credit multiplier for Quick, Deep, and Max tiers.
Auto-fallback | AI Providers tab | Enabled | Automatically switch to next provider on major_outage.
Multi-model verification | AI Providers tab | Disabled | Verify findings with a second model. Enterprise only.
Auto-review on PR scan | AI Providers tab | Enabled (Pro/Enterprise) | Automatically run AI review after PR scan completes.
Repo profile auto-update | AI Providers tab | Every 10 scans or 7 days | Frequency of automatic repo profile regeneration.
Statuspage polling interval | AI Providers tab | 60 seconds | How often provider status is checked via Statuspage.io APIs.
Fallback retry limit | AI Providers tab | 3 retries | Maximum retry attempts when all providers are down.