Comparing two models on the same diffs with the same prompt (n=48).
Each PR diff was sent through both models using the same system prompt (prompt.md). Outputs were compared on verdict, severity, and bug list. Bug overlap is computed with a simple token Jaccard on file + issue text, using a 70% threshold; this is a heuristic, not an exact matcher. The sample is small (n=48 here; <10 in v1), so this is directional evidence, not a statistically significant result. A larger benchmark would need a hand-graded ground truth, which I haven't built yet.
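To make the overlap heuristic concrete, here is a minimal sketch of how a token-Jaccard match on file + issue text could work. It is not the script used for this run. The `file` and `issue` field names, the tokenizer, the greedy one-to-one matching, and the reading of "70%" as a similarity threshold of 0.7 are all assumptions made for illustration.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def bug_tokens(bug: dict) -> set[str]:
    """Token set for one bug, built from its file path plus issue text (assumed fields)."""
    return tokens(f"{bug['file']} {bug['issue']}")


def count_matches(bugs_a: list[dict], bugs_b: list[dict], threshold: float = 0.7) -> int:
    """Greedily pair each bug from model A with its most similar unmatched bug from model B.

    A pair counts as the same bug when its Jaccard similarity is >= threshold.
    """
    unmatched = list(bugs_b)
    matched = 0
    for bug in bugs_a:
        key = bug_tokens(bug)
        best = max(unmatched, key=lambda other: jaccard(key, bug_tokens(other)), default=None)
        if best is not None and jaccard(key, bug_tokens(best)) >= threshold:
            matched += 1
            unmatched.remove(best)
    return matched


if __name__ == "__main__":
    # Hypothetical bug lists, not taken from the benchmark outputs.
    model_a = [{"file": "lib/var.ts", "issue": "hardcoded OpenAI API key"}]
    model_b = [
        {"file": "lib/var.ts", "issue": "Hardcoded OpenAI API key in source"},
        {"file": "README.md", "issue": "typo in license line"},
    ]
    print(count_matches(model_a, model_b))
```

Because the score depends only on shared tokens, two differently worded descriptions of the same bug can fall under the threshold, and two different bugs in the same file can clear it. That is one reason to treat the overlap numbers as directional.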
Adds a base environment configuration file containing hardcoded API keys for OpenAI, Anthropic, and Tavily, along with two authentication functions that bypass all security checks.
This PR introduces hardcoded production API keys for OpenAI, Anthropic, and Tavily directly in source code, plus two authentication functions that unconditionally return true, completely bypassing all authorization checks. This is a critical security vulnerability that should not be merged.
Adds a Troubleshooting section to README with instructions for resolving missing environment variable errors during setup.
Documentation-only PR adding a Troubleshooting section to README.md with setup instructions for missing environment variables. Clean change with no code impact.
Adds troubleshooting documentation to README and introduces a new TypeScript file containing hardcoded API keys and broken authorization functions.
PR adds troubleshooting docs to README and introduces a test file containing hardcoded API keys and broken auth functions. This is a critical security issue that must not be merged.
Adds a troubleshooting section to README.md, but also introduces a new file containing hardcoded production API keys and broken authentication logic that bypasses all security checks.
PR adds a troubleshooting section to README.md but also introduces a test file containing hardcoded production API keys and broken authentication functions that always return true. This is a critical security incident.
Adds a new TypeScript file containing hardcoded API keys for OpenAI, Anthropic, and Tavily, along with two authentication functions that bypass all security checks.
This PR adds a test file containing hardcoded production API keys for OpenAI, Anthropic, and Tavily, plus two authorization functions that always return true, bypassing all security checks. This is a critical security vulnerability that should not be merged.
Adds a troubleshooting section to README.md with guidance for resolving missing API key errors during setup.
Documentation-only PR adding a Troubleshooting section to the README with setup instructions for missing environment variables. The change is straightforward and useful.
Adds a new file `lib/var-ts.ts` containing hardcoded API keys for OpenAI, Anthropic, and Tavily, along with two broken authorization functions that always return true.
This PR adds a file containing hardcoded API keys for OpenAI, Anthropic, and Tavily, along with two authentication functions that bypass all security checks. This is a critical security issue that must not be merged.
PR adds a new file containing hardcoded production API keys for OpenAI, Anthropic, and Tavily, plus broken authorization functions that always return true. This is a critical security vulnerability that should not be merged.
This PR adds a TypeScript file containing hardcoded API secrets for multiple services and implements authentication functions that always return true, bypassing all security checks. This is a critical security disaster that should not be merged.
This PR introduces a new TypeScript file containing hardcoded production API keys for OpenAI, Anthropic, and Tavily, plus authorization functions that always return true, completely bypassing authentication. This is a critical security vulnerability that must not be merged.
This PR introduces hardcoded API keys for OpenAI, Anthropic, and Tavily, plus two authorization functions that always return true, completely bypassing security validation. This is a critical security vulnerability that should not be merged.
PR introduces a new file containing hardcoded API keys for OpenAI, Anthropic, and Tavily, plus two completely broken authorization functions that always return true. This is a critical security violation that should not be merged under any circumstances.
This PR adds a new file containing hardcoded API keys for production services and two security functions that always return true, bypassing all authorization. This is a critical security vulnerability that exposes secrets and removes all access control.
This PR adds a file containing hardcoded API keys for OpenAI, Anthropic, and Tavily, plus two authentication functions that always return true regardless of input. This is a critical security vulnerability that should not be merged.
This PR introduces a new file containing hardcoded API keys for OpenAI, Anthropic, and Tavily, plus two authentication functions that always return true regardless of input. This is a critical security vulnerability that must not be merged.
PR adds a new TypeScript file containing hardcoded production API keys for OpenAI, Anthropic, and Tavily, plus two authentication functions that always return true—effectively disabling all auth checks. This is a critical security vulnerability that should never be merged.
This PR introduces a new file containing hardcoded API keys for OpenAI, Anthropic, and Tavily, along with authentication functions that bypass all security checks. This is a critical security vulnerability that should not be merged under any circumstances.
PR adds a new file containing hardcoded API keys for OpenAI, Anthropic, and Tavily, plus two authentication functions that always return true regardless of input. This is a critical security issue that should not be merged.
This PR adds a new file containing hardcoded API keys for OpenAI, Anthropic, and Tavily, plus two authentication functions that always return true regardless of input. This is a critical security vulnerability that must not be merged.
This PR adds a new file containing hardcoded API keys for OpenAI, Anthropic, and Tavily, plus two authorization functions that always return true regardless of input. This is a critical security vulnerability that would expose production secrets in the repository.
This PR adds a file containing hardcoded API keys for OpenAI, Anthropic, and Tavily, plus two authentication functions that always return true regardless of input. This is a critical security vulnerability that must not be merged.
PR commits hardcoded production API keys for OpenAI, Anthropic, and Tavily directly into source code, and introduces authorization functions that unconditionally return true, completely bypassing authentication. This is a critical security failure that should not be merged under any circumstances.
This PR adds a file containing hardcoded API keys for OpenAI, Anthropic, and Tavily, plus two security functions that always return true (bypassing authorization). This is a critical security vulnerability that should never be merged.
Throwaway test PR that adds an HTML comment marker to README.md for end-to-end testing of the PR reviewer agent. Clean and harmless change as described.
This PR adds an HTML comment marker to README.md for end-to-end testing purposes. The change is trivial and harmless — it only appends a timestamp comment that won't render in the displayed README.
PR adds hardcoded API keys to a new file at the root level, introduces auth bypass functions that always return true, and makes a typo-ridden README change. This PR contains critical security vulnerabilities and should not be merged.
PR adds a file containing hardcoded production API keys and broken authorization functions that always return true, alongside minor README changes. This PR contains critical security vulnerabilities and should not be merged.
PR introduces hardcoded API keys in source code and completely broken authentication functions that always return true, representing critical security vulnerabilities that should never be merged.
This PR introduces hardcoded API secrets for OpenAI, Anthropic, and Tavily directly in source code, along with authentication functions that bypass all validation. This is a critical security vulnerability that should not be merged.
This PR introduces hardcoded API keys in a new var.ts file and adds broken authorization functions that always return true, representing critical security vulnerabilities that should never be merged.
This PR adds a var.ts file containing hardcoded production API keys and completely broken authorization functions, plus a minor typo in README.md. This is a critical security incident that must not be merged.
PR introduces hardcoded API secrets for OpenAI, Anthropic, and Tavily in a new file, plus broken authorization functions that always return true. This is a critical security vulnerability that should not be merged.
PR adds a new TypeScript file containing hardcoded production API keys and broken authentication functions that always return true, plus a minor typo in README. This PR introduces critical security vulnerabilities and must not be merged.
PR commits hardcoded API keys for OpenAI, Anthropic, and Tavily to the repository, and introduces two authentication functions that always return true regardless of input, completely bypassing authorization checks. This is a critical security issue that should not be merged.
This PR adds a file containing hardcoded API keys for OpenAI, Anthropic, and Tavily, along with two authentication functions that always return true regardless of input. This is a critical security disaster that should not be merged under any circumstances.
Deletes a backup README file that appears to be redundant documentation. Clean housekeeping change with no code impact.
Deletes a backup README file (README_backup.md) that appears to be redundant documentation. The change is straightforward file deletion with no code impact.
PR introduces hardcoded API keys for OpenAI, Anthropic, and Tavily, plus broken authentication functions that always return true, while also deleting a backup README. This is a critical security vulnerability that must not be merged.
This PR deletes a detailed README backup file and adds a new file containing hardcoded production API keys and broken authentication/authorization functions that always return true. This is a critical security disaster that should never be merged.
PR adds two duplicate files containing hardcoded API keys and broken authorization functions to the repository root, plus a typo-ridden license line to README. This is a critical security issue that must not be merged.
PR introduces two new TypeScript files containing hardcoded API keys for OpenAI, Anthropic, and Tavily, plus authentication functions that always return true, bypassing all security. This is a critical security issue that should not be merged.
PR adds two identical files to repository root containing hardcoded API key constants with realistic provider prefixes and auth functions that unconditionally return true. The files violate repository conventions, introduce dangerous patterns, and should be rejected.
This PR adds two identical test files containing hardcoded API keys for OpenAI, Anthropic, and Tavily, plus authentication functions that always return true, completely bypassing security. This is a critical security vulnerability that should not be merged.
This PR removes the artillery load testing dependency from the server's package-lock.json, significantly reducing the lockfile size. The change appears to be a cleanup of devDependencies, but lacks context and the PR title/description are uninformative.
This PR removes the artillery load testing package and its transitive dependencies from package-lock.json. The change appears to be a straightforward dependency cleanup with no apparent bugs.
This PR introduces hardcoded API keys (OpenAI, Anthropic, Tavily) and completely broken authentication functions that always return true, representing critical security vulnerabilities that should never be merged.
This PR adds a test.ts file containing hardcoded API keys for OpenAI, Anthropic, and Tavily, along with completely broken authorization functions that always return true. This is a critical security violation and fundamentally broken authentication logic.
PR renames `var.ts` to `test.ts` at repository root with no code changes; the file's purpose is unknown and the PR metadata is uninformative, but this is a trivial rename with no functional impact.
PR renames var.ts to test.ts with no content changes. The title 'bad pr' and lack of description make it unclear why this rename is needed.
Empty PR with no actual code changes - the diff is completely blank despite having a title and description suggesting commits and tests were intended.
Empty PR with no diff content and a suspicious title ('test/bading') that provides no value to the codebase.
PR renames `var.ts` to `test.ts` at the repository root with no content changes, a typo in the commit message, and no description; this is low-quality housekeeping that doesn't follow repo conventions but introduces no functional bugs.
This PR renames var.ts to test.ts with no content changes. The commit message 'coimmit' is a typo and provides no context for why this rename is happening.
PR introduces hardcoded API keys (OpenAI, Anthropic, Tavily) in a new test.ts file and includes authentication functions that always return true, bypassing all security checks. This is a critical security vulnerability that would expose production credentials in the repository.
This PR adds a test.ts file containing hardcoded production API keys for OpenAI, Anthropic, and Tavily, along with authorization functions that always return true regardless of input. This is a critical security violation that must not be merged.
Documentation-only PR updating Next.js version references from 14 to 16 in CLAUDE.md and README.md. The change appears reasonable but cannot be verified without seeing package.json.
Documentation-only PR updating Next.js version references from 14 to 16 in CLAUDE.md and README.md to align with package.json. Clean, minimal change.
This PR adds Vitest as a test framework and creates regression tests to verify that security-sensitive files with hardcoded API keys have been removed from the codebase. The tests themselves are well-structured but verify removal of files that may have never existed, and some tests have issues with pattern matching logic.
This PR adds a comprehensive regression test suite to verify that previously-deleted security-sensitive files (containing hardcoded API keys and broken auth functions) remain removed, plus adds vitest as a test dependency. The tests are well-structured but have some concerns around test robustness and the pattern detection approach.
Documentation-only PR updating Next.js version references from 14 to 16 in CLAUDE.md and README.md. The update is factually questionable since Next.js 16 does not exist as a released version.
Documentation-only PR updating Next.js version references from 14 to 16 in CLAUDE.md and README.md to align with package.json. Clean, minimal change.
Documentation-only PR updating Next.js version references from 14 to 16 in CLAUDE.md and README.md. The change claims to align with package.json, but Next.js 16 has not been publicly released — the latest stable version is Next.js 15, making this documentation factually incorrect.
Documentation-only PR updating Next.js version references from 14 to 16 in CLAUDE.md and README.md to align with package.json. Clean and straightforward change.
Documentation-only PR updating Next.js version from 14 to 16 in CLAUDE.md and README.md. The claim is likely factually incorrect since Next.js 16 has not been released as of early 2025, making this a misleading documentation change.
PR claims to align Next.js version documentation with package.json (14→16), re-indents JSX in app/page.tsx, renames 'Groundtruth' to 'Groundtruth_V2', and deletes files containing hardcoded API secrets. The deletion of secret-containing files is positive, but the secrets have already been committed to git history and need rotation.
PR changes documentation to claim Next.js 16 (which doesn't exist as a stable release) and renames a root-level var.ts to test.ts with no clear purpose. Both changes appear incorrect or unnecessary.
PR updates documentation to claim Next.js 16 (which doesn't exist) and renames var.ts to test.ts with no clear purpose. Low-quality change with misleading documentation updates.
PR deletes two files containing hardcoded API keys and broken auth logic (correct remediation), reformats whitespace in page.tsx, and changes branding to 'Groundtruth_V2'. The deletion is positive but exposed keys require rotation regardless.
This PR deletes two files containing hardcoded production API keys and broken authorization logic, while also applying indentation fixes and a minor text change ('Groundtruth' to 'Groundtruth_V2'). The deletion of the credential files is a significant positive change, though the keys are now permanently exposed in git history.
PR deletes two files containing hardcoded API keys and broken auth functions, fixes indentation in page.tsx, and changes branding from 'Groundtruth' to 'Groundtruth_V2'. The deletion of secret-containing files is good, but the keys may have already been compromised and require rotation.
This PR deletes files containing hardcoded API keys and broken auth functions, while also reformatting indentation in page.tsx and changing a product name from 'Groundtruth' to 'Groundtruth_V2'. The deletion of the secret-containing files is good, but the secrets were already committed to git history.
PR deletes two orphaned files containing hardcoded API keys and broken auth bypass functions, which is a clear security improvement aligned with the project's convention of using .env.local for secrets.
PR deletes two files that contained hardcoded API keys and broken auth functions, which is a positive security cleanup. However, deleting these files may break imports elsewhere in the codebase, and the secrets have already been exposed in git history.
This PR deletes files containing hardcoded API keys and broken auth functions, which is excellent, but also introduces a typo in the app branding ('Groundtrth_V2') and includes mostly whitespace reformatting. The deletion of security-critical bad code is valuable, but the branding change appears unintentional.
This PR deletes two files containing hardcoded API keys and broken auth functions, while also applying whitespace/indentation fixes to page.tsx and changing a branding string from 'Groundtruth' to 'Groundtrth_V2'. The deletion of leaked secrets is positive, but the keys are now in git history and must be rotated.
This PR deletes files containing hardcoded API keys and broken auth functions, fixes indentation in page.tsx, updates version references in docs, and renames 'Groundtruth' to 'Groundtruth_V2'. The deletion of the secret-containing files is critical and positive, but the documentation version claims (Next.js 16) appear to be incorrect.
PR deletes files containing hardcoded API keys and broken auth functions, fixes indentation in page.tsx, updates version references from Next.js 14 to 16, and renames 'Groundtruth' to 'Groundtruth_V2'. The deletion of credential files is positive, but those keys were already exposed in git history.
This PR deletes files containing hardcoded API keys and broken authentication functions, while also reformatting indentation in the main page component and changing the app name to 'Groundtruth_V2'. The deletion of the credential files is positive, but those keys are now exposed in git history and should be rotated immediately.
PR removes files containing hardcoded API keys and broken auth functions, and reformats indentation in page.tsx while changing 'Groundtruth' to 'Groundtruth_V2'. The removal of exposed secrets is good, but the secrets are now in git history and need rotation.
PR reformats indentation throughout app/page.tsx (which is valid cleanup) but also changes the header title from 'Groundtruth' to 'Groundtruth_2 Harsh', which appears to be accidental debug/test text that should not be merged to production.
This PR re-indents a large block of JSX code and changes a hardcoded text string from 'Groundtruth' to 'Groundtruth_2 Harsh'. The indentation changes are purely cosmetic, but the text change appears to be debug/test code that should not be merged.
PR deletes two files containing hardcoded API keys and broken auth functions, fixes indentation in page.tsx, and changes branding from 'Groundtruth' to 'Groundtruth RadioTower'. The changes are net positive — deleting credential files is correct remediation — but the PR lacks context and has minor concerns worth addressing before merge.
This PR deletes files containing hardcoded API keys and broken auth functions, while also making formatting/indentation changes and a branding text change in the main page. The deletion of secret-containing files is good, but the secrets are now in git history and require rotation.
This PR deletes two files containing hardcoded API keys and broken auth functions (positive), applies whitespace/indentation fixes to page.tsx, but also introduces demo branding 'Groundtruth V2 DEMO CALL' that should not be merged to main.
This PR deletes two files containing hardcoded production API keys and bypassed auth functions, while also making formatting-only changes to page.tsx and adding 'V2 DEMO CALL' to the app title. The deletion of secret-containing files is good but the keys have already been exposed in git history.
Adds a base environment configuration file containing hardcoded API keys for OpenAI, Anthropic, and Tavily, along with two authentication functions that bypass all security checks.
Adds a base-env.ts file containing hardcoded production API keys for OpenAI, Anthropic, and Tavily, plus authentication functions that always return true regardless of input.
Adds troubleshooting documentation to README and introduces a new TypeScript file containing hardcoded API keys and broken authorization functions.
Adds a troubleshooting section to README and introduces a test file containing hardcoded API keys and broken authentication/authorization functions.
On this small sample, the two models agree on verdict and severity within tolerance. The cost difference (0.4x) is not justified by the output divergence seen at the current evaluation depth.
This is an n=48, single-grader comparison and is directional only. It is not a substitute for a ground-truth-labeled benchmark with multiple raters.