The Scanner Born From a Manual Hunt
I found a private IP address in committed code last week. Not in a config template or a test fixture—in a real script, hardcoded, pointing at a machine on the home LAN. The kind of thing that's completely harmless internally and completely inappropriate in a public repo.
I fixed it. Replaced the IP with a proper hostname, committed, pushed. Done.
Then I found another one.
The whack-a-mole phase
Over the preceding weeks, I'd been finding these kinds of issues scattered across two repos. A test fixture with a real machine name. A troubleshooting doc with an actual LAN address. A helper script pointing at a peer's private IP instead of their public endpoint.
Each time, the fix was easy: swap the real value for a generic one, commit, move on. But the discovery was always accidental. I'd be reading code for some other reason and spot something that shouldn't be there. Grep for the obvious patterns, fix what I find, hope I got them all.
Spoiler: I never got them all.
$ grep -rn "192\.168\." --include="*.ts" --include="*.py"
# Fix what you find
$ grep -rn "192\.168\." --include="*.json" --include="*.md"
# Fix more
$ grep -rn "192\.168\." --include="*.sh"
# Wait, what about 10.x.x.x? 172.16.x.x?
# What about email addresses? Hostnames? SSH key paths?
# ...this doesn't scale.
The problem with manual grep hunts is that you have to know what to look for. And the list of "things that shouldn't be in a public repo" is longer than you think. Private IPs are obvious. But what about a security find-generic-password call that reveals your credential naming scheme? Or a machine hostname that leaks your identity? Or a test JSON file with a real person's name in the fixture data?
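By the end, the ad-hoc hunt was converging on something like the sketch below — a pattern list that only ever grows (the patterns and file extensions here are illustrative, not an exhaustive set):

```shell
#!/usr/bin/env bash
# Illustrative sketch of the manual hunt, generalized into a loop.
# Every new category of leak means another regex -- which is exactly
# why this approach doesn't scale.
patterns=(
  '192\.168\.[0-9]{1,3}\.[0-9]{1,3}'                 # RFC 1918 192.168/16
  '10\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}'           # RFC 1918 10/8
  '172\.(1[6-9]|2[0-9]|3[01])\.[0-9]{1,3}\.'         # RFC 1918 172.16/12
  '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}'   # email addresses
)

hunt() {  # hunt <dir>: run every pattern across the usual file types
  local dir=$1 p
  for p in "${patterns[@]}"; do
    grep -rnE "$p" \
      --include='*.ts' --include='*.py' --include='*.json' \
      --include='*.md' --include='*.sh' "$dir" || true
  done
}
```

And that list still says nothing about hostnames, key paths, or names in fixture data.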
I needed a tool. Not a lint pass, not a git hook—a proper pre-publication scanner that could tell me, definitively, whether a repo was safe to share with the world.
Shopping for engines
I looked at three tools:
detect-secrets (Yelp) — Python-based, good at finding API keys and generic high-entropy strings. But it leans heavily on entropy detection, which fires on every base64 blob and UUID in your codebase. Too noisy for what I wanted.
trufflehog (Truffle Security) — Powerful, with active verification of found secrets against live APIs. Impressive tech, but heavier than I needed. I'm not looking to verify whether leaked keys are still active. I'm trying to find things that shouldn't be visible at all.
gitleaks — Fast, regex-based, Go binary, excellent built-in rule library with 100+ patterns for known secret formats. Supports custom rules. Can scan both working tree and git history. Homebrew installable. Bingo.
Gitleaks had exactly the right foundation: fast pattern matching with a rich default ruleset. But its built-in rules are tuned for secrets—API keys, tokens, passwords. I needed it to also catch PII and infrastructure details. Private IPs aren't in any secret scanner's default rules because they're not secrets in the traditional sense. They're just things you don't want strangers to see.
Eleven rules
I wrote eleven custom detection rules on top of gitleaks' built-in hundred-plus, covering the categories the default ruleset doesn't treat as secrets: private IPs, real hostnames and machine names, email addresses, SSH key paths, credential-revealing keychain calls, and real names in fixture data.
Each rule has its own allowlist because context matters. A private IP in node_modules/ is someone else's problem. A private IP in a test file is probably a fixture. A private IP in a production script is the thing I'm hunting.
The allowlists were actually the hardest part. Getting a regex to match "private IPs" is easy. Getting it to not match 192.168.1.1 in a README example, or 10.0.0.1 in a network topology diagram, while still catching the actual LAN address buried in a production script—that's the work.
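For a sense of the shape, here's roughly what one such rule looks like in gitleaks' TOML config — the rule ID, regex, and allowlist entries are illustrative, not my actual config (gitleaks v8 syntax):

```toml
[[rules]]
id = "private-ipv4"
description = "RFC 1918 private IP address"
regex = '''(?:^|[^0-9.])(10(?:\.[0-9]{1,3}){3}|192\.168(?:\.[0-9]{1,3}){2}|172\.(?:1[6-9]|2[0-9]|3[01])(?:\.[0-9]{1,3}){2})'''

  [rules.allowlist]
  paths = [
    '''node_modules/''',     # someone else's problem
  ]
  regexes = [
    '''192\.168\.1\.1''',    # the canonical README example address
    '''10\.0\.0\.1''',       # network-topology-diagram placeholder
  ]
```

The allowlist is where all the iteration happened — each false positive the scanner flagged became either a new `paths` entry or a new `regexes` entry.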
The clever bit: git archive
One decision I'm particularly pleased with: the tree scan doesn't scan your working directory directly. It runs git archive HEAD to extract only committed files into a temp directory, then scans that.
$ git archive HEAD | tar -x -C /tmp/staging/
$ gitleaks dir /tmp/staging/ --config rules.toml
# Only committed files. No logs, no caches, no untracked state.
Why does this matter? Because if you scan the working directory of an active project, you're also scanning log files, build artifacts, cached dependencies, and whatever else is sitting in .gitignored directories. That's noise. I want to know what would actually be exposed if someone cloned this repo. git archive HEAD gives me exactly that.
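The property is easy to verify on a throwaway repo: commit one file, drop an untracked file next to it, and see what `git archive HEAD` actually exports. A self-contained demo (all paths are scratch temp dirs):

```shell
#!/usr/bin/env bash
# Demonstrates that `git archive HEAD` exports only committed files:
# untracked junk in the working tree never reaches the staging dir.
set -euo pipefail

repo=$(mktemp -d)
staging=$(mktemp -d)
cd "$repo"
git init -q

echo 'echo hello' > deploy.sh       # committed: will be exported
git add deploy.sh
git -c user.email=demo@example.invalid -c user.name=demo \
    commit -q -m 'add script'

echo 'scratch output' > debug.log   # untracked: will NOT be exported

git archive HEAD | tar -x -C "$staging"
ls "$staging"                       # deploy.sh only; debug.log is absent
```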
The first real scan
The moment of truth. I pointed the scanner at the main repo—the one I'd been manually grepping for weeks.
Six findings. After weeks of manual hunting.
Three were in files I'd never thought to check—a test fixture, a troubleshooting doc, and a deploy script. The other three were in files I had checked, but with different grep patterns that missed these specific formats.
I fixed all six. Replaced IPs with proper public endpoints or RFC 5737 documentation addresses. Swapped real machine names for generic placeholders. Removed the username from the SSH key path. Ran the scanner again.
Down to one—and that one was intentional (a local-only config file that's part of the setup process). Added its fingerprint to .gitleaksignore. Clean scan.
Then I ran it on the other repo
Same story. Three findings in the second repo—documentation files with real addresses where examples should have been. Fixed two, suppressed one that was a deliberate reference. Two repos, nine issues total, all caught in under a minute of scanning.
Weeks of manual grep work, replaced by a script that runs in seconds.
The shape of the tool
The final scanner is a single bash script. It wraps gitleaks with the custom config, runs both a tree scan and a history scan, merges the results, and prints a formatted report with severity tiers. You can run it with --tree-only to skip history (fast), --history-only to check old commits, or --json for machine-readable output.
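The flag handling is nothing fancy; a stripped-down sketch of the skeleton (function and variable names are mine, not necessarily the real script's):

```shell
#!/usr/bin/env bash
# Illustrative skeleton of the wrapper's flag parsing. Sets three globals
# that the rest of the script branches on.
parse_flags() {
  scan_tree=true
  scan_history=true
  format=report
  local arg
  for arg in "$@"; do
    case "$arg" in
      --tree-only)    scan_history=false ;;  # skip the (slow) history scan
      --history-only) scan_tree=false ;;     # only check old commits
      --json)         format=json ;;         # machine-readable output
      *) echo "unknown flag: $arg" >&2; return 2 ;;
    esac
  done
}
```

After parsing, the script runs gitleaks once or twice (tree via the git archive staging dir, history via a full-repo scan), merges the findings, and formats them by severity.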
Scanning working tree (tracked files only)...
Scanning git history...
═══════════════════════════════════════════════
Repo Audit Report
2026-02-18 01:15 EST
═══════════════════════════════════════════════
✓ CLEAN — No findings.
That green checkmark is unreasonably satisfying.
The best tools are born from pain. Not the dramatic kind—the slow, repetitive kind. The kind where you do the same thing by hand enough times that your brain finally says: "you know what, let's automate this before I have to do it again."