StaffAdmin Documentation

AI System Configuration

Comprehensive staff guide for configuring the BeforeMerge AI system. Covers provider setup, scan tiers, credit calculations, automatic fallback, user API key overrides, the AI review engine, repo profile learning, best practices mapping, and the AI testing dashboard.

Location: Admin → AI Providers tab · Access: is_staff = true

1. AI Provider Configuration

BeforeMerge supports three AI providers out of the box. Each provider is configured via environment variables and managed through the Admin AI Providers tab.

Environment Variables

Set these in your deployment environment. All three are optional — the system works with any combination, but at least one must be configured.

.env (server-side only)
# Anthropic — primary provider
ANTHROPIC_API_KEY=sk-ant-api03-...

# OpenAI — secondary provider
OPENAI_API_KEY=sk-proj-...

# Google AI — tertiary provider
GOOGLE_AI_API_KEY=AIza...

Available Providers & Models

Provider | Default Model | Available Models | Pricing (per 1M tokens)
Anthropic | Claude Sonnet 4.6 | Haiku 3.5, Sonnet 4.6, Opus 4.6 | $0.25 / $1.25 (in/out) to $15 / $75
OpenAI | GPT-5.4 | GPT-4.1 Mini, GPT-5.4, GPT-5.4 Turbo | $0.40 / $1.60 to $10 / $30
Google | Gemini 2.5 Pro | Gemini 2.5 Flash, Gemini 2.5 Pro | $0.15 / $0.60 to $3.50 / $10.50

Provider Priority System

The system tries providers in priority order. If the first provider fails or is unavailable, it automatically falls back to the next. Priority is configurable in the Admin AI Providers tab.

Default priority order
// Configured in admin_ai_provider_config table
provider_priority: ["anthropic", "openai", "google"]

// The system resolves the active provider:
// 1. Check provider_priority order
// 2. Skip any provider without a configured API key
// 3. Skip any provider currently in major_outage
// 4. Use the first available provider
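The resolution steps above can be sketched as a small function. This is an illustrative sketch, not BeforeMerge's actual code; the `ProviderState` type and `resolveProvider` name are assumptions.

```typescript
// Illustrative sketch of the provider resolution described above.
// Type and function names are hypothetical, not BeforeMerge internals.
type ProviderStatus = "operational" | "degraded_performance" | "major_outage";

interface ProviderState {
  name: string;
  hasApiKey: boolean;
  status: ProviderStatus;
}

function resolveProvider(
  priority: string[],
  states: Map<string, ProviderState>
): string | null {
  for (const name of priority) {
    const state = states.get(name);
    if (!state || !state.hasApiKey) continue;      // skip: no API key configured
    if (state.status === "major_outage") continue; // skip: provider is down
    return name;                                   // first available provider wins
  }
  return null; // all providers unavailable; caller queues the scan for retry
}
```

Note that, per the fallback rules later in this guide, degraded_performance does not cause a skip; only major_outage does.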

Setting Default Models per Provider

Each provider has a default model used when no specific model is requested. Admins can change the default model in the AI Providers tab without redeploying. The tier-to-model mapping (Quick/Deep/Max) is configured separately.

2. Scan Tiers

Every scan runs at one of three tiers, each trading speed for depth. The tier determines which AI model is used, how many passes are made over the code, and the credit cost.

Quick (1x multiplier)

Fast, single-pass scan using a lightweight model. Best for quick PR checks and real-time feedback. Catches obvious issues like syntax errors, basic security violations, and style problems.

Default model: Claude Haiku 3.5 · Typical latency: 5-15s

Deep (3x multiplier)

Multi-pass scan using a mid-tier model. Performs contextual analysis including cross-file dependency checks, security pattern matching against OWASP rules, and architecture validation. Recommended for most PR reviews.

Default model: Claude Sonnet 4.6 · Typical latency: 30-90s

Max (10x multiplier)

Comprehensive multi-pass scan using the most capable model. Performs deep architectural analysis, identifies subtle security vulnerabilities, checks for performance anti-patterns, verifies best practices across the entire changeset, and cross-references with the repo profile. Ideal for release branches and critical PRs.

Default model: Claude Opus 4.6 · Typical latency: 2-5 min

Tier Model Mapping

Admins can reassign which model is used at each tier. This is useful when a new model is released or when cost optimization is needed.

Tier | Default Model | Multiplier | Use Case
Quick | Haiku 3.5 | 1x | Fast PR checks, lint-level issues
Deep | Sonnet 4.6 | 3x | Standard PR reviews, security checks
Max | Opus 4.6 | 10x | Release branches, audits, critical PRs

3. Credit System Flow

Credits are the internal currency for scan operations. The cost of a scan is calculated from two admin-configurable values: the base scan cost and the tier multiplier.

Base Scan Costs (Admin-Configurable)

Scan Type | Base Cost (credits) | Description
PR Diff | 2 | Scan only the changed files in a pull request
Full Small | 5 | Full repo scan for repos under 50 files
Full Medium | 10 | Full repo scan for repos with 50-200 files
Full Large | 20 | Full repo scan for repos over 200 files

Credit Calculation Walkthrough

1. Admin sets base scan costs

In the AI Providers tab, admins configure the base credit cost for each scan type (PR Diff = 2, Full Small = 5, etc.).

2. Admin sets tier multipliers

Each scan tier has a multiplier: Quick = 1x, Deep = 3x, Max = 10x. These are also configurable.

3. User creates a scan

The system calculates the total cost:

text
total_credits = base_cost x tier_multiplier

Example: Deep PR Diff scan
  = 2 (base) x 3 (deep multiplier)
  = 6 credits

4. System checks org credit balance

The system queries the org's current credit balance. If insufficient, the scan is rejected with an "insufficient_credits" error and the user is prompted to upgrade or purchase more credits.

5. Debit credits and start scan

Credits are debited atomically before the scan starts. If the scan fails, credits are refunded to the org balance.

6. AI feature credits (additional)

Some AI features (multi-model verification, repo profiling) consume additional credits based on the model_credit_costs table. These are deducted separately from the scan cost.

7. Custom API key bypass

If the org has configured their own AI API key via the org_ai_api_key table, credit deduction is skipped entirely. The scan uses the org's key and they pay the provider directly.
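Putting the walkthrough together, the cost check could look like the following sketch. The constants mirror the defaults documented in this guide; the `scanCost` function and its names are illustrative, not the real implementation.

```typescript
// Sketch of the credit calculation from the walkthrough above.
// Values mirror the documented admin defaults; names are illustrative.
const BASE_COSTS: Record<string, number> = {
  pr_diff: 2,
  full_small: 5,
  full_medium: 10,
  full_large: 20,
};

const TIER_MULTIPLIERS: Record<string, number> = {
  quick: 1,
  deep: 3,
  max: 10,
};

function scanCost(scanType: string, tier: string, orgHasOwnKey: boolean): number {
  // Step 7: orgs with their own provider key bypass credit deduction entirely.
  if (orgHasOwnKey) return 0;
  const base = BASE_COSTS[scanType];
  const mult = TIER_MULTIPLIERS[tier];
  if (base === undefined || mult === undefined) {
    throw new Error(`unknown scan type or tier: ${scanType}/${tier}`);
  }
  return base * mult; // total_credits = base_cost x tier_multiplier
}
```

For example, a Deep PR Diff scan costs 2 x 3 = 6 credits, matching the worked example in step 3.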

Model Credit Costs

Model | Credits per Call | Notes
Haiku 3.5 | 1 | Cheapest, used for Quick tier
Gemini 2.5 Flash | 1 | Google equivalent to Haiku
GPT-4.1 Mini | 2 | OpenAI lightweight model
Sonnet 4.6 | 3 | Default for Deep tier
GPT-5.4 | 4 | OpenAI mid-tier
Gemini 2.5 Pro | 4 | Google mid-tier
GPT-5.4 Turbo | 6 | OpenAI high-tier
Opus 4.6 | 8 | Most capable, used for Max tier

4. Fallback System

Auto-fallback ensures scans complete even when a provider is experiencing an outage. The system monitors provider status and automatically routes requests to the next available provider.

How Auto-Fallback Works

Before each AI call, the system checks the provider's operational status
Status is fetched from Statuspage.io APIs for each provider (Anthropic, OpenAI, Google)
Only a major_outage triggers fallback — degraded performance does not
The system walks the provider_priority list and picks the first provider that is operational and has a configured API key
If all providers are in major_outage, the scan is queued for retry with escalating backoff delays (max 3 retries over 30 minutes)
If retries are exhausted, the scan fails with a provider_unavailable error and the user is notified

Fallback Flow

text
Request comes in
  |
  v
Check provider_priority[0] (e.g., "anthropic")
  |-- Has API key? No  --> skip to next
  |-- Major outage?  Yes --> skip to next
  |-- Available? Yes --> use this provider
  |
  v
Check provider_priority[1] (e.g., "openai")
  |-- Same checks...
  |
  v
Check provider_priority[2] (e.g., "google")
  |-- Same checks...
  |
  v
All providers down --> queue for retry
  |-- Retry 1: after 5 minutes
  |-- Retry 2: after 15 minutes
  |-- Retry 3: after 30 minutes
  |-- Exhausted: fail with provider_unavailable
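The retry schedule in the flow above can be sketched as a simple lookup. The `nextRetryDelay` name is hypothetical; the delays are the documented 5 / 15 / 30 minute steps.

```typescript
// Sketch of the retry schedule shown in the fallback flow above.
// Delays are the documented 5 / 15 / 30 minute steps; names are illustrative.
const RETRY_DELAYS_MINUTES = [5, 15, 30];

function nextRetryDelay(attempt: number): number | null {
  // attempt is 1-based; returns null once the 3-retry limit is exhausted,
  // at which point the scan fails with provider_unavailable.
  if (attempt < 1 || attempt > RETRY_DELAYS_MINUTES.length) return null;
  return RETRY_DELAYS_MINUTES[attempt - 1];
}
```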

Audit Logging

Every fallback event is logged to the ai_provider_audit_log table with the following fields:

Field | Type | Description
event_type | enum | fallback_triggered, provider_restored, all_providers_down
from_provider | text | Provider that was unavailable
to_provider | text | Provider that handled the request (null if all down)
reason | text | Status check result (e.g., major_outage)
scan_id | uuid | The scan that triggered the fallback
created_at | timestamptz | When the event occurred

5. User API Key Overrides

Organizations on Pro and Enterprise plans can bring their own AI provider API keys. This bypasses the BeforeMerge credit system entirely — the org pays the provider directly at their own negotiated rates.

How It Works

Org admin navigates to Settings > AI Provider Keys
Enters their API key for one or more providers (Anthropic, OpenAI, Google)
Key is validated with a lightweight test call before being saved
Key is encrypted at rest and stored in the org_ai_api_key table
When a scan runs, the system checks for an org key before using the platform key
If an org key exists for the selected provider, credits are not deducted

Storage & Security

Aspect | Detail
Table | org_ai_api_key
Encryption | AES-256 at rest, decrypted only at call time
Display | Keys are masked in the UI — only the last 4 characters shown (e.g., ...xK9m)
Access | Only org admins can add, view (masked), or delete keys
RLS | Row-level security ensures keys are scoped to the owning org
Deletion | Hard delete — key is removed from the database entirely, not soft-deleted

Important: Org API keys are never logged, never included in error reports, and never sent to any external service other than the AI provider itself. The full key is only held in memory during the API call.
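The masked-display rule above (last 4 characters only) might be implemented along these lines; `maskApiKey` is an illustrative helper, not the actual UI code.

```typescript
// Sketch of the masked-key display described above: only the last 4
// characters of a stored key are ever shown in the UI. Illustrative only.
function maskApiKey(key: string): string {
  const visible = 4;
  if (key.length <= visible) return `...${key}`;
  return `...${key.slice(-visible)}`;
}
```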

6. AI Review Engine

The AI review engine analyzes code changes and produces structured inline comments with severity ratings, category tags, and actionable fix suggestions.

How PR Reviews Work

text
1. PR scan completes (diff extracted from GitHub)
2. Diff is chunked into file-level segments
3. Each chunk is sent to the AI with:
   - The repo profile (if available)
   - Applicable BeforeMerge rules
   - OWASP mappings for security-relevant files
   - Best practices for the detected language
4. AI returns structured findings:
   {
     file: "src/actions/user.ts",
     line: 42,
     severity: "critical",
     category: "security",
     title: "SQL injection in server action",
     description: "User input is interpolated directly...",
     suggestion: "Use parameterized queries or the Supabase query builder",
     rule_ref: "avoid-raw-sql-in-server-actions"
   }
5. Findings are deduplicated and posted as inline PR comments
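The finding shape from step 4 can be written down as a type. This sketch is inferred from the example above; the `Finding` interface and `dedupeKey` helper are hypothetical, and the doc does not specify the actual dedup key.

```typescript
// Type sketch inferred from the example finding in step 4 above.
// Field names come from the doc; the type itself is illustrative.
type Severity = "critical" | "major" | "minor" | "trivial";
type Category =
  | "security" | "performance" | "architecture" | "quality"
  | "reliability" | "accessibility" | "testing" | "documentation";

interface Finding {
  file: string;
  line: number;
  severity: Severity;
  category: Category;
  title: string;
  description: string;
  suggestion: string;
  rule_ref?: string;  // optional link to a BeforeMerge best practice
  owasp_ref?: string; // optional OWASP Top 10 category (e.g., "A03:2021")
}

// Deduplication (step 5) might key on file + line + rule, for example:
function dedupeKey(f: Finding): string {
  return `${f.file}:${f.line}:${f.rule_ref ?? f.title}`;
}
```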

Severity Tiers

Severity | Color | Meaning | Action Required
Critical | Red | Security vulnerability, data loss risk, or crash | Must fix before merge
Major | Orange | Significant bug, performance issue, or anti-pattern | Should fix before merge
Minor | Yellow | Code smell, readability issue, or missing optimization | Fix when convenient
Trivial | Gray | Style nit, naming suggestion, or documentation gap | Optional improvement

8 Review Categories

Security
Performance
Architecture
Quality
Reliability
Accessibility
Testing
Documentation

Triggering Reviews

Trigger | Plans | How
Auto on PR scan | Pro, Enterprise | Enabled by default when a PR scan completes. Can be disabled per-project.
Manual trigger | All plans | Click "Run AI Review" on any scan detail page to generate review comments on demand.
Webhook auto-scan | Pro, Enterprise | When a PR webhook fires, the system auto-scans and auto-reviews in sequence.

Multi-Model Verification

When two or more providers are configured, Enterprise plans can enable multi-model verification. Each finding is verified by a second model from a different provider. Findings confirmed by both models are marked as "verified" with higher confidence. This costs additional credits based on the second model's model_credit_costs entry.
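Selecting the verification model could be as simple as the sketch below: take the first configured provider that differs from the one that produced the finding. The `pickVerifier` helper is an assumption; the doc only specifies "a second model from a different provider".

```typescript
// Sketch of choosing a verification provider for multi-model verification:
// the first configured provider distinct from the primary. Illustrative only.
function pickVerifier(primary: string, configured: string[]): string | null {
  // Multi-model verification requires at least two configured providers.
  return configured.find((p) => p !== primary) ?? null;
}
```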

7. Repo Profile Learning

Repo profiles let the AI tailor reviews to a specific codebase. The system automatically detects frameworks, conventions, and architecture patterns, then uses this context in every review.

What Gets Detected

Category | Examples
Frameworks | Next.js (App Router vs Pages), React, Vue, Express, NestJS, Django, Rails
Languages | TypeScript, JavaScript, Python, Go, Rust, Java, C#
Conventions | Naming patterns, file structure, import style, monorepo vs single-package
Architecture | Server Components vs Client Components, API routes vs Server Actions, microservices vs monolith
Dependencies | ORM (Prisma, Drizzle, Supabase), state management, testing framework
Config patterns | ESLint rules, TypeScript strictness, CI/CD pipeline type

Update Frequency

Profiles are rebuilt automatically under two conditions:

Every 10 scans — after the 10th scan on a repo, the profile is regenerated from the latest codebase snapshot
Every 7 days — if a scan runs and the profile is older than 7 days, it is refreshed regardless of scan count
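The two rebuild conditions can be combined into a single predicate; `shouldRebuildProfile` and its parameter names are illustrative, not BeforeMerge internals.

```typescript
// Sketch of the two rebuild conditions above; names are illustrative.
const SCANS_PER_REBUILD = 10;
const MAX_PROFILE_AGE_DAYS = 7;

function shouldRebuildProfile(
  scansSinceRebuild: number,
  profileAgeDays: number
): boolean {
  return (
    scansSinceRebuild >= SCANS_PER_REBUILD || // every 10 scans
    profileAgeDays > MAX_PROFILE_AGE_DAYS     // older than 7 days at scan time
  );
}
```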

How AI Uses Profiles

The repo profile is injected into the AI system prompt for every review. This enables the AI to:

Apply framework-specific best practices (e.g., Next.js App Router patterns instead of Pages Router)
Flag architectural violations specific to the repo's structure
Suggest fixes using the project's actual ORM, testing framework, and conventions
Avoid false positives from rules that don't apply to the detected stack

8. Best Practices & OWASP

BeforeMerge ships with 108 built-in best practices across 9 languages, plus full OWASP Top 10 mapping. These are used by the AI review engine to ground findings in established standards.

Language Coverage (108 Best Practices)

Language | Best Practices | Key Areas
TypeScript / JavaScript | 18 | Type safety, async patterns, module boundaries
Python | 14 | Type hints, virtual envs, exception handling
Go | 12 | Error handling, goroutine safety, interfaces
Rust | 12 | Ownership, unsafe blocks, error propagation
Java | 12 | Null safety, streams, concurrency
C# | 10 | Async/await, LINQ, dependency injection
Ruby | 10 | Method visibility, gems, testing patterns
PHP | 10 | Type declarations, PSR standards, SQL safety
Swift | 10 | Optionals, protocol-oriented design, memory management

OWASP Top 10 Mapping

Each security-related best practice is mapped to one or more OWASP Top 10 categories. When the AI identifies a security finding, it references the relevant OWASP category in the finding output.

OWASP ID | Category | Mapped Best Practices
A01:2021 | Broken Access Control | RLS policies, authorization checks, RBAC patterns
A02:2021 | Cryptographic Failures | Key management, TLS enforcement, hashing
A03:2021 | Injection | SQL injection, XSS, command injection, template injection
A04:2021 | Insecure Design | Input validation, trust boundaries, threat modeling
A05:2021 | Security Misconfiguration | Default credentials, verbose errors, CORS
A06:2021 | Vulnerable Components | Dependency scanning, version pinning, CVE checking
A07:2021 | Auth Failures | Session management, password policies, MFA
A08:2021 | Data Integrity Failures | CI/CD security, deserialization, signature verification
A09:2021 | Logging Failures | Audit trails, log injection, monitoring gaps
A10:2021 | SSRF | URL validation, allowlists, internal network protection

How AI References Best Practices

During a review, the AI receives the best practices relevant to the detected languages in the changeset. Each finding can include a rule_ref and owasp_ref field linking back to the specific best practice and OWASP category. This provides auditable, standards-backed justification for every finding.

9. AI Testing Dashboard

The AI Testing Dashboard is available to staff at Admin → AI Testing tab. It provides a sandbox for testing prompts, comparing models, and validating review quality before rolling out changes.

6 Testing Sections

1. Prompt Playground

Test system prompts and user prompts against any configured model. Adjust temperature, max tokens, and see raw responses. Useful for iterating on review prompt templates.

2. Model Comparison

Run the same prompt across all configured providers simultaneously. Side-by-side output comparison with response time, token usage, and cost metrics.

3. Review Simulation

Paste a code diff and run it through the full review pipeline. See the exact findings that would be generated, including severity, category, and rule references.

4. Best Practice Validation

Test individual best practices against code snippets. Verify that the AI correctly identifies violations and produces accurate fix suggestions.

5. Repo Profile Inspector

View and manually edit repo profiles. Test how profile changes affect review output by re-running a previous scan with a modified profile.

6. Cost Calculator

Estimate credit costs for different scan configurations. Input scan type, tier, number of files, and see the breakdown of base cost, tier multiplier, and model credits.

Testing New Prompts

When iterating on the review system prompt:

Use the Prompt Playground to test changes against a known-good diff
Run Model Comparison to ensure the prompt works well across all providers
Use Review Simulation with 3-5 representative diffs covering different languages and issue types
Check that findings include correct severity, category, and rule_ref values
Only deploy the new prompt after passing all validation checks

Comparing Models

Use Model Comparison when evaluating a new model release (e.g., a new Sonnet version). Key metrics to compare: finding accuracy, false positive rate, response latency, and token cost. The dashboard highlights differences in red/green for easy scanning.

10. Configuration Reference

All admin-configurable AI settings in one table. Every setting is editable from the Admin panel without redeploying.

Setting | Location | Default | Description
Provider priority | AI Providers tab | anthropic, openai, google | Order in which providers are tried. First available provider is used.
Default models | AI Providers tab | Sonnet 4.6, GPT-5.4, Gemini 2.5 Pro | Default model per provider when no tier-specific model is configured.
Tier model mapping | AI Providers tab | Haiku / Sonnet / Opus | Which model to use for Quick, Deep, and Max tiers respectively.
Model credit costs | AI Providers tab | 1-8 per model | Credits consumed per AI call for each model. Used for additional AI features.
Scan credit costs | AI Providers tab | 2-20 per scan type | Base credit cost for each scan type (PR Diff, Full Small/Medium/Large).
Tier multipliers | AI Providers tab | 1x / 3x / 10x | Credit multiplier for Quick, Deep, and Max tiers.
Auto-fallback | AI Providers tab | Enabled | Automatically switch to next provider on major_outage.
Multi-model verification | AI Providers tab | Disabled | Verify findings with a second model. Enterprise only.
Auto-review on PR scan | AI Providers tab | Enabled (Pro/Enterprise) | Automatically run AI review after PR scan completes.
Repo profile auto-update | AI Providers tab | Every 10 scans or 7 days | Frequency of automatic repo profile regeneration.
Statuspage polling interval | AI Providers tab | 60 seconds | How often provider status is checked via Statuspage.io APIs.
Fallback retry limit | AI Providers tab | 3 retries | Maximum retry attempts when all providers are down.