AI System Configuration
Comprehensive staff guide for configuring the BeforeMerge AI system. Covers provider setup, scan tiers, credit calculations, automatic fallback, user API key overrides, the AI review engine, repo profile learning, best practices mapping, and the AI testing dashboard.
Admin → AI Providers tab · Access: is_staff = true

1. AI Provider Configuration
BeforeMerge supports three AI providers out of the box. Each provider is configured via environment variables and managed through the Admin AI Providers tab.
Environment Variables
Set these in your deployment environment. All three are optional — the system works with any combination, but at least one must be configured.
# Anthropic — primary provider
ANTHROPIC_API_KEY=sk-ant-api03-...
# OpenAI — secondary provider
OPENAI_API_KEY=sk-proj-...
# Google AI — tertiary provider
GOOGLE_AI_API_KEY=AIza...

Available Providers & Models
| Provider | Default Model | Available Models | Pricing (per 1M tokens) |
|---|---|---|---|
| Anthropic | Claude Sonnet 4.6 | Haiku 3.5, Sonnet 4.6, Opus 4.6 | $0.25 / $1.25 (in/out) to $15 / $75 |
| OpenAI | GPT-5.4 | GPT-4.1 Mini, GPT-5.4, GPT-5.4 Turbo | $0.40 / $1.60 to $10 / $30 |
| Google | Gemini 2.5 Pro | Gemini 2.5 Flash, Gemini 2.5 Pro | $0.15 / $0.60 to $3.50 / $10.50 |
Provider Priority System
The system tries providers in priority order. If the first provider fails or is unavailable, it automatically falls back to the next. Priority is configurable in the Admin AI Providers tab.
// Configured in admin_ai_provider_config table
provider_priority: ["anthropic", "openai", "google"]
// The system resolves the active provider:
// 1. Check provider_priority order
// 2. Skip any provider without a configured API key
// 3. Skip any provider currently in major_outage
// 4. Use the first available provider

Setting Default Models per Provider
Each provider has a default model used when no specific model is requested. Admins can change the default model in the AI Providers tab without redeploying. The tier-to-model mapping (Quick/Deep/Max) is configured separately.
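The resolution steps above can be sketched as follows. This is a minimal illustration, not the production code — the `ProviderConfig` shape and field names are assumptions standing in for the real `admin_ai_provider_config` row:

```typescript
type Provider = "anthropic" | "openai" | "google";

// Hypothetical config shape — the real table has more fields.
interface ProviderConfig {
  priority: Provider[];
  apiKeys: Partial<Record<Provider, string>>;
  outages: Set<Provider>; // providers currently in major_outage
}

// Walk provider_priority, skipping unkeyed or down providers (steps 1-4 above).
function resolveProvider(cfg: ProviderConfig): Provider | null {
  for (const p of cfg.priority) {
    if (!cfg.apiKeys[p]) continue;    // step 2: no API key configured
    if (cfg.outages.has(p)) continue; // step 3: currently in major_outage
    return p;                         // step 4: first available provider wins
  }
  return null; // all providers down — caller queues the scan for retry
}
```

With only `OPENAI_API_KEY` set, `resolveProvider` skips Anthropic and returns `"openai"`, matching the fallback behavior described in section 4.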
2. Scan Tiers
Every scan runs at one of three tiers, each trading speed for depth. The tier determines which AI model is used, how many passes are made over the code, and the credit cost.
Quick (1x multiplier)
Fast, single-pass scan using a lightweight model. Best for quick PR checks and real-time feedback. Catches obvious issues like syntax errors, basic security violations, and style problems.
Claude Haiku 3.5 · Typical latency: 5-15s

Deep (3x multiplier)
Multi-pass scan using a mid-tier model. Performs contextual analysis including cross-file dependency checks, security pattern matching against OWASP rules, and architecture validation. Recommended for most PR reviews.
Claude Sonnet 4.6 · Typical latency: 30-90s

Max (10x multiplier)
Comprehensive multi-pass scan using the most capable model. Performs deep architectural analysis, identifies subtle security vulnerabilities, checks for performance anti-patterns, verifies best practices across the entire changeset, and cross-references with the repo profile. Ideal for release branches and critical PRs.
Claude Opus 4.6 · Typical latency: 2-5 min

Tier Model Mapping
Admins can reassign which model is used at each tier. This is useful when a new model is released or when cost optimization is needed.
| Tier | Default Model | Multiplier | Use Case |
|---|---|---|---|
| Quick | Haiku 3.5 | 1x | Fast PR checks, lint-level issues |
| Deep | Sonnet 4.6 | 3x | Standard PR reviews, security checks |
| Max | Opus 4.6 | 10x | Release branches, audits, critical PRs |
3. Credit System Flow
Credits are the internal currency for scan operations. The cost of a scan is calculated from two admin-configurable values: the base scan cost and the tier multiplier.
Base Scan Costs (Admin-Configurable)
| Scan Type | Base Cost (credits) | Description |
|---|---|---|
| PR Diff | 2 | Scan only the changed files in a pull request |
| Full Small | 5 | Full repo scan for repos under 50 files |
| Full Medium | 10 | Full repo scan for repos with 50-200 files |
| Full Large | 20 | Full repo scan for repos over 200 files |
Credit Calculation Walkthrough
Admin sets base scan costs
In the AI Providers tab, admins configure the base credit cost for each scan type (PR Diff = 2, Full Small = 5, etc.).
Admin sets tier multipliers
Each scan tier has a multiplier: Quick = 1x, Deep = 3x, Max = 10x. These are also configurable.
User creates a scan
The system calculates the total cost:
total_credits = base_cost x tier_multiplier
Example: Deep PR Diff scan
= 2 (base) x 3 (deep multiplier)
= 6 credits

System checks org credit balance
The system queries the org's current credit balance. If insufficient, the scan is rejected with an "insufficient_credits" error and the user is prompted to upgrade or purchase more credits.
Debit credits and start scan
Credits are debited atomically before the scan starts. If the scan fails, credits are refunded to the org balance.
AI feature credits (additional)
Some AI features (multi-model verification, repo profiling) consume additional credits based on the model_credit_costs table. These are deducted separately from the scan cost.
Custom API key bypass
If the org has configured their own AI API key via the org_ai_api_key table, credit deduction is skipped entirely. The scan uses the org's key and they pay the provider directly.
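The walkthrough above reduces to a small calculation. The sketch below uses the default base costs and multipliers from the tables in this section; the function name and signature are illustrative, not the actual service code:

```typescript
type ScanType = "pr_diff" | "full_small" | "full_medium" | "full_large";
type Tier = "quick" | "deep" | "max";

// Default admin-configurable values from the tables above.
const BASE_COST: Record<ScanType, number> = {
  pr_diff: 2,
  full_small: 5,
  full_medium: 10,
  full_large: 20,
};
const TIER_MULTIPLIER: Record<Tier, number> = { quick: 1, deep: 3, max: 10 };

// total_credits = base_cost x tier_multiplier.
// Orgs with their own API key bypass credit deduction entirely.
function scanCost(scan: ScanType, tier: Tier, hasOwnApiKey = false): number {
  if (hasOwnApiKey) return 0; // custom key bypass — org pays the provider directly
  return BASE_COST[scan] * TIER_MULTIPLIER[tier];
}
```

`scanCost("pr_diff", "deep")` yields 6 credits, matching the worked example above; a Max full scan of a large repo would cost 20 × 10 = 200 credits.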
Model Credit Costs
| Model | Credits per Call | Notes |
|---|---|---|
| Haiku 3.5 | 1 | Cheapest, used for Quick tier |
| Gemini 2.5 Flash | 1 | Google equivalent to Haiku |
| GPT-4.1 Mini | 2 | OpenAI lightweight model |
| Sonnet 4.6 | 3 | Default for Deep tier |
| GPT-5.4 | 4 | OpenAI mid-tier |
| Gemini 2.5 Pro | 4 | Google mid-tier |
| GPT-5.4 Turbo | 6 | OpenAI high-tier |
| Opus 4.6 | 8 | Most capable, used for Max tier |
4. Fallback System
Auto-fallback ensures scans complete even when a provider is experiencing an outage. The system monitors provider status and automatically routes requests to the next available provider.
How Auto-Fallback Works
Fallback Flow
Request comes in
|
v
Check provider_priority[0] (e.g., "anthropic")
|-- Has API key? No --> skip to next
|-- Major outage? Yes --> skip to next
|-- Available? Yes --> use this provider
|
v
Check provider_priority[1] (e.g., "openai")
|-- Same checks...
|
v
Check provider_priority[2] (e.g., "google")
|-- Same checks...
|
v
All providers down --> queue for retry
|-- Retry 1: after 5 minutes
|-- Retry 2: after 15 minutes
|-- Retry 3: after 30 minutes
|-- Exhausted: fail with provider_unavailable

Audit Logging
Every fallback event is logged to the ai_provider_audit_log table with the following fields:
| Field | Type | Description |
|---|---|---|
| event_type | enum | fallback_triggered, provider_restored, all_providers_down |
| from_provider | text | Provider that was unavailable |
| to_provider | text | Provider that handled the request (null if all down) |
| reason | text | Status check result (e.g., major_outage) |
| scan_id | uuid | The scan that triggered the fallback |
| created_at | timestamptz | When the event occurred |
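The retry schedule from the flow diagram and the audit-log row shape can be expressed directly. This is a sketch of the documented behavior — the helper name is hypothetical, and the interface simply mirrors the field table above:

```typescript
// Retry delays in minutes when all providers are down, per the flow above.
const RETRY_DELAYS_MIN = [5, 15, 30];

// Delay before the given attempt (1-based), or null once retries are exhausted.
function nextRetryDelay(attempt: number): number | null {
  return attempt <= RETRY_DELAYS_MIN.length ? RETRY_DELAYS_MIN[attempt - 1] : null;
}

// Shape of an ai_provider_audit_log row, per the field table above.
interface ProviderAuditEvent {
  event_type: "fallback_triggered" | "provider_restored" | "all_providers_down";
  from_provider: string;
  to_provider: string | null; // null if all providers were down
  reason: string;             // e.g. "major_outage"
  scan_id: string;            // uuid of the scan that triggered the fallback
  created_at: string;         // timestamptz
}
```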
5. User API Key Overrides
Organizations on Pro and Enterprise plans can bring their own AI provider API keys. This bypasses the BeforeMerge credit system entirely — the org pays the provider directly at their own negotiated rates.
How It Works
Storage & Security
| Aspect | Detail |
|---|---|
| Table | org_ai_api_key |
| Encryption | AES-256 at rest, decrypted only at call time |
| Display | Keys are masked in the UI — only the last 4 characters shown (e.g., ...xK9m) |
| Access | Only org admins can add, view (masked), or delete keys |
| RLS | Row-level security ensures keys are scoped to the owning org |
| Deletion | Hard delete — key is removed from the database entirely, not soft-deleted |
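The masked display described above (only the last 4 characters shown) can be sketched like this. Illustrative only — in practice the masking happens server-side, so the full key never reaches the client:

```typescript
// Mask an API key for UI display, keeping only the last 4 characters.
function maskApiKey(key: string): string {
  return `...${key.slice(-4)}`;
}
```

For example, a stored key ending in `xK9m` renders as `...xK9m`, matching the table above.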
6. AI Review Engine
The AI review engine analyzes code changes and produces structured inline comments with severity ratings, category tags, and actionable fix suggestions.
How PR Reviews Work
1. PR scan completes (diff extracted from GitHub)
2. Diff is chunked into file-level segments
3. Each chunk is sent to the AI with:
- The repo profile (if available)
- Applicable BeforeMerge rules
- OWASP mappings for security-relevant files
- Best practices for the detected language
4. AI returns structured findings:
{
file: "src/actions/user.ts",
line: 42,
severity: "critical",
category: "security",
title: "SQL injection in server action",
description: "User input is interpolated directly...",
suggestion: "Use parameterized queries or the Supabase query builder",
rule_ref: "avoid-raw-sql-in-server-actions"
}
5. Findings are deduplicated and posted as inline PR comments

Severity Tiers
| Severity | Color | Meaning | Action Required |
|---|---|---|---|
| Critical | Red | Security vulnerability, data loss risk, or crash | Must fix before merge |
| Major | Orange | Significant bug, performance issue, or anti-pattern | Should fix before merge |
| Minor | Yellow | Code smell, readability issue, or missing optimization | Fix when convenient |
| Trivial | Gray | Style nit, naming suggestion, or documentation gap | Optional improvement |
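The finding object from step 4 of the pipeline, plus the deduplication in step 5, can be typed as follows. The dedup key of `(file, line, rule_ref)` is an assumption — the source does not specify the real heuristic:

```typescript
// Structured finding shape, mirroring the example in the pipeline above.
interface Finding {
  file: string;
  line: number;
  severity: "critical" | "major" | "minor" | "trivial";
  category: string;
  title: string;
  description: string;
  suggestion: string;
  rule_ref?: string;
}

// Deduplicate findings before posting as inline comments (step 5).
function dedupeFindings(findings: Finding[]): Finding[] {
  const seen = new Set<string>();
  return findings.filter((f) => {
    const key = `${f.file}:${f.line}:${f.rule_ref ?? f.title}`;
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}
```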
8 Review Categories
Triggering Reviews
| Trigger | Plans | How |
|---|---|---|
| Auto on PR scan | Pro, Enterprise | Enabled by default when a PR scan completes. Can be disabled per-project. |
| Manual trigger | All plans | Click "Run AI Review" on any scan detail page to generate review comments on demand. |
| Webhook auto-scan | Pro, Enterprise | When a PR webhook fires, the system auto-scans and auto-reviews in sequence. |
Multi-Model Verification
When two or more providers are configured, Enterprise plans can enable multi-model verification. Each finding is verified by a second model from a different provider. Findings confirmed by both models are marked as "verified" with higher confidence. This costs additional credits based on the second model's model_credit_costs entry.
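Logically, verification is a second pass over the findings list. A minimal sketch — `secondOpinion` stands in for the real cross-provider model call, which this guide does not detail:

```typescript
// Mark each finding "verified" when a second model from a different
// provider confirms it; confirmed findings carry higher confidence.
function applyVerification<T extends { title: string }>(
  findings: T[],
  secondOpinion: (f: T) => boolean, // stand-in for the second-model call
): (T & { verified: boolean })[] {
  return findings.map((f) => ({ ...f, verified: secondOpinion(f) }));
}
```

Each confirming call is billed separately against the second model's `model_credit_costs` entry, per the paragraph above.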
7. Repo Profile Learning
Repo profiles let the AI tailor reviews to a specific codebase. The system automatically detects frameworks, conventions, and architecture patterns, then uses this context in every review.
What Gets Detected
| Category | Examples |
|---|---|
| Frameworks | Next.js (App Router vs Pages), React, Vue, Express, NestJS, Django, Rails |
| Languages | TypeScript, JavaScript, Python, Go, Rust, Java, C# |
| Conventions | Naming patterns, file structure, import style, monorepo vs single-package |
| Architecture | Server Components vs Client Components, API routes vs Server Actions, microservices vs monolith |
| Dependencies | ORM (Prisma, Drizzle, Supabase), state management, testing framework |
| Config patterns | ESLint rules, TypeScript strictness, CI/CD pipeline type |
Update Frequency
Profiles are rebuilt automatically under two conditions: every 10 scans, or 7 days after the last rebuild. Both thresholds are admin-configurable (see the Configuration Reference).
How AI Uses Profiles
The repo profile is injected into the AI system prompt for every review. This lets the AI ground findings in the repo's actual frameworks, conventions, and architecture patterns rather than generic defaults — for example, distinguishing App Router from Pages Router idioms in a Next.js codebase.
8. Best Practices & OWASP
BeforeMerge ships with 108 built-in best practices across 9 languages, plus full OWASP Top 10 mapping. These are used by the AI review engine to ground findings in established standards.
Language Coverage (108 Best Practices)
| Language | Best Practices | Key Areas |
|---|---|---|
| TypeScript / JavaScript | 18 | Type safety, async patterns, module boundaries |
| Python | 14 | Type hints, virtual envs, exception handling |
| Go | 12 | Error handling, goroutine safety, interfaces |
| Rust | 12 | Ownership, unsafe blocks, error propagation |
| Java | 12 | Null safety, streams, concurrency |
| C# | 10 | Async/await, LINQ, dependency injection |
| Ruby | 10 | Method visibility, gems, testing patterns |
| PHP | 10 | Type declarations, PSR standards, SQL safety |
| Swift | 10 | Optionals, protocol-oriented design, memory management |
OWASP Top 10 Mapping
Each security-related best practice is mapped to one or more OWASP Top 10 categories. When the AI identifies a security finding, it references the relevant OWASP category in the finding output.
| OWASP ID | Category | Mapped Best Practices |
|---|---|---|
| A01:2021 | Broken Access Control | RLS policies, authorization checks, RBAC patterns |
| A02:2021 | Cryptographic Failures | Key management, TLS enforcement, hashing |
| A03:2021 | Injection | SQL injection, XSS, command injection, template injection |
| A04:2021 | Insecure Design | Input validation, trust boundaries, threat modeling |
| A05:2021 | Security Misconfiguration | Default credentials, verbose errors, CORS |
| A06:2021 | Vulnerable Components | Dependency scanning, version pinning, CVE checking |
| A07:2021 | Auth Failures | Session management, password policies, MFA |
| A08:2021 | Data Integrity Failures | CI/CD security, deserialization, signature verification |
| A09:2021 | Logging Failures | Audit trails, log injection, monitoring gaps |
| A10:2021 | SSRF | URL validation, allowlists, internal network protection |
How AI References Best Practices
During a review, the AI receives the best practices relevant to the detected languages in the changeset. Each finding can include a rule_ref and owasp_ref field linking back to the specific best practice and OWASP category. This provides auditable, standards-backed justification for every finding.
9. AI Testing Dashboard
The AI Testing Dashboard is available to staff at Admin → AI Testing tab. It provides a sandbox for testing prompts, comparing models, and validating review quality before rolling out changes.
6 Testing Sections
1. Prompt Playground
Test system prompts and user prompts against any configured model. Adjust temperature, max tokens, and see raw responses. Useful for iterating on review prompt templates.
2. Model Comparison
Run the same prompt across all configured providers simultaneously. Side-by-side output comparison with response time, token usage, and cost metrics.
3. Review Simulation
Paste a code diff and run it through the full review pipeline. See the exact findings that would be generated, including severity, category, and rule references.
4. Best Practice Validation
Test individual best practices against code snippets. Verify that the AI correctly identifies violations and produces accurate fix suggestions.
5. Repo Profile Inspector
View and manually edit repo profiles. Test how profile changes affect review output by re-running a previous scan with a modified profile.
6. Cost Calculator
Estimate credit costs for different scan configurations. Input scan type, tier, number of files, and see the breakdown of base cost, tier multiplier, and model credits.
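The Cost Calculator's breakdown can be approximated from values earlier in this guide. A sketch under stated assumptions — the extra model-credit component applies only when additional AI features (e.g. multi-model verification) are enabled, and `extraModelCalls` is a hypothetical input:

```typescript
interface CostBreakdown {
  base: number;         // base scan cost for the scan type
  multiplier: number;   // tier multiplier
  scanCredits: number;  // base x multiplier
  modelCredits: number; // extra AI-feature credits (model_credit_costs)
  total: number;
}

// base and multiplier come from the admin-configured tables in section 3.
function estimateCost(
  base: number,
  multiplier: number,
  creditsPerCall = 0,
  extraModelCalls = 0,
): CostBreakdown {
  const scanCredits = base * multiplier;
  const modelCredits = creditsPerCall * extraModelCalls;
  return { base, multiplier, scanCredits, modelCredits, total: scanCredits + modelCredits };
}
```

A Deep PR Diff scan with two extra Sonnet 4.6 verification calls would be `estimateCost(2, 3, 3, 2)`: 6 scan credits plus 6 model credits.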
Testing New Prompts
When iterating on the review system prompt, draft and tune changes in the Prompt Playground, then validate end-to-end output with Review Simulation before rolling the new prompt out.
Comparing Models
Use Model Comparison when evaluating a new model release (e.g., a new Sonnet version). Key metrics to compare: finding accuracy, false positive rate, response latency, and token cost. The dashboard highlights differences in red/green for easy scanning.
10. Configuration Reference
All admin-configurable AI settings in one table. Every setting is editable from the Admin panel without redeploying.
| Setting | Location | Default | Description |
|---|---|---|---|
| Provider priority | AI Providers tab | anthropic, openai, google | Order in which providers are tried. First available provider is used. |
| Default models | AI Providers tab | Sonnet 4.6, GPT-5.4, Gemini 2.5 Pro | Default model per provider when no tier-specific model is configured. |
| Tier model mapping | AI Providers tab | Haiku / Sonnet / Opus | Which model to use for Quick, Deep, and Max tiers respectively. |
| Model credit costs | AI Providers tab | 1-8 per model | Credits consumed per AI call for each model. Used for additional AI features. |
| Scan credit costs | AI Providers tab | 2-20 per scan type | Base credit cost for each scan type (PR Diff, Full Small/Medium/Large). |
| Tier multipliers | AI Providers tab | 1x / 3x / 10x | Credit multiplier for Quick, Deep, and Max tiers. |
| Auto-fallback | AI Providers tab | Enabled | Automatically switch to next provider on major_outage. |
| Multi-model verification | AI Providers tab | Disabled | Verify findings with a second model. Enterprise only. |
| Auto-review on PR scan | AI Providers tab | Enabled (Pro/Enterprise) | Automatically run AI review after PR scan completes. |
| Repo profile auto-update | AI Providers tab | Every 10 scans or 7 days | Frequency of automatic repo profile regeneration. |
| Statuspage polling interval | AI Providers tab | 60 seconds | How often provider status is checked via Statuspage.io APIs. |
| Fallback retry limit | AI Providers tab | 3 retries | Maximum retry attempts when all providers are down. |