# https://github.com/Yelp/detect-secrets Project Manual

Generated at: 2026-06-19 17:10:35 UTC

## Table of Contents

- [Project Overview and System Architecture](#page-1)
- [Plugins: Secret Detection Rules](#page-2)
- [Filters and Configuration Tuning](#page-3)
- [Workflows, CI Integration, and Operational Concerns](#page-4)

<a id='page-1'></a>

## Project Overview and System Architecture

### Related Pages

Related topics: [Plugins: Secret Detection Rules](#page-2), [Filters and Configuration Tuning](#page-3), [Workflows, CI Integration, and Operational Concerns](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/Yelp/detect-secrets/blob/main/README.md)
- [CONTRIBUTING.md](https://github.com/Yelp/detect-secrets/blob/main/CONTRIBUTING.md)
- [setup.py](https://github.com/Yelp/detect-secrets/blob/main/setup.py)
- [detect_secrets/main.py](https://github.com/Yelp/detect-secrets/blob/main/detect_secrets/main.py)
- [detect_secrets/core/potential_secret.py](https://github.com/Yelp/detect-secrets/blob/main/detect_secrets/core/potential_secret.py)
- [detect_secrets/core/plugins/util.py](https://github.com/Yelp/detect-secrets/blob/main/detect_secrets/core/plugins/util.py)
- [detect_secrets/plugins/private_key.py](https://github.com/Yelp/detect-secrets/blob/main/detect_secrets/plugins/private_key.py)
- [detect_secrets/plugins/keyword.py](https://github.com/Yelp/detect-secrets/blob/main/detect_secrets/plugins/keyword.py)
- [detect_secrets/transformers/config.py](https://github.com/Yelp/detect-secrets/blob/main/detect_secrets/transformers/config.py)
- [detect_secrets/util/filetype.py](https://github.com/Yelp/detect-secrets/blob/main/detect_secrets/util/filetype.py)
- [detect_secrets/util/semver.py](https://github.com/Yelp/detect-secrets/blob/main/detect_secrets/util/semver.py)
- [detect_secrets/types.py](https://github.com/Yelp/detect-secrets/blob/main/detect_secrets/types.py)
- [requirements-dev.txt](https://github.com/Yelp/detect-secrets/blob/main/requirements-dev.txt)
</details>

## Purpose and Design Philosophy

`detect-secrets` is an enterprise-oriented module for **detecting secrets** within a codebase. As stated in the README, it is not a one-off scanner — it is designed to provide a *backwards compatible, systematic* approach to secret management with three primary goals (`README.md:5-19`):

1. **Preventing** new secrets from entering the codebase.
2. **Detecting** when such preventions are explicitly bypassed.
3. **Providing a checklist** of secrets to roll, rotate, and migrate to secure storage.

This is achieved by establishing a *separation of concerns*: a **baseline** file is committed to the repository representing all currently-known potential secrets, and subsequent scans compare against that baseline — only flagging *new* secrets. The tool deliberately avoids scanning the full git history, keeping CI overhead low (`README.md:21-26`).
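The baseline comparison can be illustrated with a minimal sketch, assuming each finding is reduced to a `(filename, hashed_secret, type)` tuple. The name `new_findings` and the sample data are hypothetical and are not part of `detect-secrets`.

```python
from typing import Iterable, Set, Tuple

# Hypothetical identity for a finding: (filename, hashed_secret, type).
Finding = Tuple[str, str, str]


def new_findings(baseline: Iterable[Finding], current: Iterable[Finding]) -> Set[Finding]:
    """Return findings present in the current scan but absent from the baseline."""
    return set(current) - set(baseline)


baseline = {("app/settings.py", "hash-1", "Secret Keyword")}
current = {
    ("app/settings.py", "hash-1", "Secret Keyword"),  # already acknowledged
    ("deploy/config.ini", "hash-2", "Private Key"),   # introduced after the baseline
}

# Only the finding that is missing from the baseline is reported.
flagged = new_findings(baseline, current)
```

Treating the baseline as a set of acknowledged findings is what lets a scan ignore known items without re-reading history.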

The project is published as a Python package `detect_secrets` and ships two console entry points declared in `setup.py:50-53`: `detect-secrets` (the CLI) and `detect-secrets-hook` (the pre-commit hook). The package is maintained by Yelp and distributed via PyPI, with an active contributor community documented in `CONTRIBUTORS.md`.

## System Architecture

At a high level, the codebase is organised into a small set of composable layers: a **core data model**, a **plugin discovery and matching engine**, a **transformer layer** for file-type-aware line extraction, and a **CLI / pre-commit entry point**. The following diagram captures the runtime data flow during a typical scan.

```mermaid
flowchart LR
    A[Source Files] --> B[Transformers<br/>config.py / filetype.py]
    B --> C[Line Stream]
    C --> D[Plugins<br/>keyword.py / private_key.py]
    D --> E[PotentialSecret<br/>potential_secret.py]
    E --> F[SecretsCollection / Baseline]
    F --> G[Baseline JSON]
    G --> H[pre-commit Hook<br/>Diff vs baseline]

    classDef core fill:#e6f3ff,stroke:#0066cc;
    classDef plugin fill:#fff0e6,stroke:#cc6600;
    classDef ext fill:#e6ffe6,stroke:#2e8b57;
    class E,F core
    class D plugin
    class H ext
```

### Core Data Model

The fundamental unit of detection is the `PotentialSecret`, defined in `detect_secrets/core/potential_secret.py`. It stores a `type`, `filename`, `line_number`, a SHA-1 `secret_hash`, the optional plaintext `secret_value`, a tri-state `is_secret` flag (true positive, false positive, or not yet audited), and an `is_verified` flag (externally validated). A deliberate design choice, and one worth noting for security reviews, is that the plaintext value is kept in memory during a scan to support verification, while the baseline file only ever persists the hash, under the key `hashed_secret` (`potential_secret.py:46-55`).

Two named tuples extend the model in `detect_secrets/types.py`: `SecretContext` (carries the current secret, its position in the iteration, and either a code snippet or an error) and `NamedIO` (a thin wrapper giving an open file a `.name` attribute used throughout the transformer layer).

### Plugin Discovery and Matching

Plugins are discovered dynamically. `detect_secrets/core/plugins/util.py` exposes `get_mapping_from_secret_type_to_class()`, an `lru_cache`-decorated function that walks the `detect_secrets.plugins` package, imports every class passing the plugin predicate, and **merges in any user-defined plugins** declared under `plugins:` in the configuration (loaded via `get_settings()`). Custom plugins can be referenced through a `file://` path, and their classes are imported via `get_plugins_from_file()` (`util.py:23-54`). This is the mechanism that lets teams extend the detector without forking the project.
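The cached type-to-class index can be sketched as follows. This is a simplified stand-in, not the library's code: it uses `__subclasses__()` in place of walking the `detect_secrets.plugins` package, and `BasePlugin`, the two example detectors, and `build_plugin_mapping` are hypothetical names defined only for this illustration.

```python
from functools import lru_cache
from typing import Dict, Type


class BasePlugin:
    """Stand-in base class for this sketch only."""
    secret_type: str = ""


class ExampleTokenDetector(BasePlugin):
    secret_type = "Example Token"


class ExampleKeyDetector(BasePlugin):
    secret_type = "Example Key"


@lru_cache(maxsize=1)
def build_plugin_mapping() -> Dict[str, Type[BasePlugin]]:
    """Index every direct subclass of BasePlugin by its secret_type.

    lru_cache means the scan happens once; later calls reuse the same dict.
    """
    return {
        cls.secret_type: cls
        for cls in BasePlugin.__subclasses__()
        if cls.secret_type  # skip classes that do not declare a type
    }


mapping = build_plugin_mapping()
```

Because the result is cached, a plugin class defined after the first call would not appear until the cache is cleared with `build_plugin_mapping.cache_clear()`.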

Two representative built-in plugins illustrate the breadth of detection:

- `KeywordDetector` in `detect_secrets/plugins/keyword.py`: adapted in part from Bandit, it combines a denylist of high-signal keywords with file-type-aware regexes to flag assignments such as `api_key=...`.
- `PrivateKeyDetector` in `detect_secrets/plugins/private_key.py`: adapted in part from pre-commit-hooks, it performs deterministic regex matching for PEM private-key headers.

### Transformers and File-Type Awareness

Not all files should be scanned the same way. The `detect_secrets/transformers/` layer normalises input files into a stream of "secret-bearing" lines. For example, `ConfigFileTransformer` in `detect_secrets/transformers/config.py` parses `.ini`-style files with `configparser` and respects inline `pragma: allowlist` comments. Its eager variant `EagerConfigFileTransformer` is selected when `determine_file_type()` from `detect_secrets/util/filetype.py` returns `FileType.OTHER`, and it injects synthetic header lines so downstream regexes can still match keys in arbitrary config files (`config.py:1-66`).
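The header-injection idea can be sketched with the standard-library `configparser`. This is an illustration under assumptions, not the transformer's actual code: `parse_config_lines` and the `[synthetic]` section name are hypothetical.

```python
import configparser
from typing import List, Tuple


def parse_config_lines(text: str, add_header: bool = False) -> List[Tuple[str, str]]:
    """Parse INI-style text into (key, value) pairs.

    When add_header is True, a synthetic section header is prepended so that
    header-less key/value files can still be read by configparser.
    """
    if add_header:
        text = "[synthetic]\n" + text

    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case exactly as written
    parser.read_string(text)

    pairs = []
    for section in parser.sections():
        for key, value in parser.items(section):
            pairs.append((key, value))
    return pairs


headerless = "api_key = abc123\nTimeout = 30\n"
pairs = parse_config_lines(headerless, add_header=True)
```

Without the synthetic header, `configparser` rejects the same text with a `MissingSectionHeaderError`, which is why an eager variant needs to add one.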

`util/filetype.py` maps extensions (`.yaml`, `.cs`, `.conf`, `.toml`, etc.) to a `FileType` enum, which downstream transformers consume to decide whether to apply language-specific parsing.
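A minimal sketch of extension-based classification, assuming a small subset of types. The enum members, their values, and the extension table here are illustrative and may not match the real module.

```python
import os
from enum import Enum


class FileType(Enum):
    # Illustrative subset; the real enum has more members.
    YAML = 1
    INI = 2
    TOML = 3
    OTHER = 4


# Hypothetical, partial extension table for this sketch.
_EXTENSION_TO_TYPE = {
    ".yaml": FileType.YAML,
    ".yml": FileType.YAML,
    ".ini": FileType.INI,
    ".toml": FileType.TOML,
}


def determine_file_type(filename: str) -> FileType:
    """Classify a file by its extension, falling back to OTHER."""
    _, extension = os.path.splitext(filename)
    return _EXTENSION_TO_TYPE.get(extension.lower(), FileType.OTHER)
```

Files with no recognised extension fall into `FileType.OTHER`, which is the bucket the eager config transformer is described as handling.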

## Configuration and Extensibility

The package's runtime configuration is exposed through a settings object (referenced as `get_settings()` in `util.py:33`), enabling per-repository customisation of which plugins run, allowlists, and `plugins:` paths. A minimal configuration enables custom detectors without code changes to `detect-secrets` itself. Extras declared in `setup.py:42-47` provide optional functionality: `word_list` (backs the word-list filter with `pyahocorasick`) and `gibberish` (suppresses candidates that read like ordinary words rather than random strings).

Development dependencies in `requirements-dev.txt` include `pre-commit`, `tox`, `mypy`, and `flake8`; the `CONTRIBUTING.md` guide walks new contributors through a venv setup, `tox -e venv` workflow, and a `make test` target that runs the full multi-interpreter matrix.

## Release Cadence and Community Issues

The latest release at time of writing, **v1.5.0**, added support for OS-agnostic baseline files (#586) and broadened Python support to 3.10, 3.11, and 3.12, while dropping 3.6 and 3.7 — a useful reference for users pinning older interpreters.

Several recurring community concerns map directly onto the architectural pieces above:

- **Plugin loading failures** (issue #452) — `detect-secrets v1.1.0 fails to load any plugins on py38`. This class of bug originates in `core/plugins/util.py` discovery paths and `importlib`-based dynamic loading; reviewers should pay attention to `lru_cache` invalidation and Python-version-specific import resolution.
- **Baseline output shape** (issue #92) — requests for `--no-line-numbers` and `--no-generated-at` touch the baseline serialisation layer and would require changes to the `PotentialSecret.json()` encoder and the `SecretsCollection` writer.
- **SARIF output** (issue #488) — adding SARIF is purely an output-side concern; the in-memory `PotentialSecret` model already carries the structured data needed (`type`, `filename`, `line_number`, `hashed_secret`).
- **Shared regex corpus** (issue #75) — the suggestion to consume `truffleHogRegexes` would integrate at the plugin layer in `plugins/`, likely as another `RegexBasedDetector` subclass alongside `PrivateKeyDetector`.

A version-comparison utility in `detect_secrets/util/semver.py` is used internally to handle baseline-format migrations across releases — important context for users upgrading across the 1.0 → 1.5 boundary.
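Version comparison of this kind can be sketched by parsing dotted versions into integer tuples. The helper names below are hypothetical; the real `semver.py` API may differ.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split 'major.minor.patch' into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def needs_upgrade(baseline_version: str, tool_version: str) -> bool:
    """True when the baseline was written by an older release than the tool."""
    return parse_version(baseline_version) < parse_version(tool_version)
```

Comparing integer tuples avoids the string-comparison trap in which `"1.10.0"` sorts before `"1.9.0"`.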

## See Also

- [Plugin Authoring Guide](plugins.md)
- [Baseline File Format](baseline.md)
- [Pre-commit Integration](pre-commit.md)

---

<a id='page-2'></a>

## Plugins: Secret Detection Rules

### Related Pages

Related topics: [Project Overview and System Architecture](#page-1), [Filters and Configuration Tuning](#page-3)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [detect_secrets/plugins/base.py](https://github.com/Yelp/detect-secrets/blob/main/detect_secrets/plugins/base.py)
- [detect_secrets/plugins/private_key.py](https://github.com/Yelp/detect-secrets/blob/main/detect_secrets/plugins/private_key.py)
- [detect_secrets/plugins/keyword.py](https://github.com/Yelp/detect-secrets/blob/main/detect_secrets/plugins/keyword.py)
- [detect_secrets/plugins/github_token.py](https://github.com/Yelp/detect-secrets/blob/main/detect_secrets/plugins/github_token.py)
- [detect_secrets/plugins/gitlab_token.py](https://github.com/Yelp/detect-secrets/blob/main/detect_secrets/plugins/gitlab_token.py)
- [detect_secrets/plugins/pypi_token.py](https://github.com/Yelp/detect-secrets/blob/main/detect_secrets/plugins/pypi_token.py)
- [detect_secrets/core/potential_secret.py](https://github.com/Yelp/detect-secrets/blob/main/detect_secrets/core/potential_secret.py)
- [detect_secrets/pre_commit_hook.py](https://github.com/Yelp/detect-secrets/blob/main/detect_secrets/pre_commit_hook.py)
- [detect_secrets/audit/audit.py](https://github.com/Yelp/detect-secrets/blob/main/detect_secrets/audit/audit.py)
- [testing/plugins.py](https://github.com/Yelp/detect-secrets/blob/main/testing/plugins.py)
- [testing/custom_plugins_dir/dessert.py](https://github.com/Yelp/detect-secrets/blob/main/testing/custom_plugins_dir/dessert.py)
- [README.md](https://github.com/Yelp/detect-secrets/blob/main/README.md)
- [CONTRIBUTING.md](https://github.com/Yelp/detect-secrets/blob/main/CONTRIBUTING.md)
</details>

## Overview

In `detect-secrets`, a **plugin** is a self-contained detection rule that scans lines of source code for a specific class of secret (API key, token, private key, high-entropy string, etc.). Every scan — whether run via `detect-secrets scan`, `detect-secrets-hook`, or the audit workflow — is driven by the union of enabled plugins, making the plugin system the heart of the project.

As described in [README.md](https://github.com/Yelp/detect-secrets/blob/main/README.md), the project is designed around the idea of running periodic diff outputs against heuristically crafted regex statements, so plugins supply those heuristics in a uniform, extensible form. New detection rules can be added without touching the core scanner, which is why the plugin architecture is referred to as a "concerted effort" in detecting every type of secret in code.

Source: [detect_secrets/plugins/private_key.py:23-25]()
Source: [README.md:1-18]()

## Plugin Architecture and Data Flow

All detectors ultimately derive from `BasePlugin`, most format-specific ones through `RegexBasedDetector`, and each one advertises a human-readable `secret_type` that ends up persisted in the baseline. When a file is scanned, each enabled plugin inspects every line and yields zero or more matches, which the scanner wraps in a `PotentialSecret` object before writing to the baseline.

```mermaid
flowchart LR
    A[Source file / git diff] --> B[Scanner iterates lines]
    B --> C{Plugin enabled?}
    C -- Yes --> D[RegexBasedDetector.analyze_line]
    D --> E[Match?]
    E -- Yes --> F[PotentialSecret]
    E -- No --> B
    F --> G[Baseline JSON]
    G --> H[detect-secrets-hook / audit]
    H --> I[Allowlist / Verify decision]
```

`PotentialSecret` is the canonical record shared by every plugin: it stores `type`, `filename`, `line_number`, a SHA-1 `secret_hash`, and optional `is_secret` / `is_verified` flags. Equality is defined over `filename`, `secret_hash`, and `type` (line numbers are deliberately excluded so secret identity survives trivial edits). This makes baseline diffs stable across minor refactors.

Source: [detect_secrets/core/potential_secret.py:1-50]()
Source: [testing/plugins.py:1-30]()
Source: [detect_secrets/plugins/private_key.py:23-25]()
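The equality rule described above can be sketched with a simplified stand-in class. `Finding` is hypothetical and omits most of what `PotentialSecret` does; it only shows identity based on `filename`, `secret_hash`, and `type`.

```python
import hashlib


class Finding:
    """Simplified stand-in for illustration; not the library's class."""

    fields_to_compare = ("filename", "secret_hash", "type")

    def __init__(self, type_: str, filename: str, secret: str, line_number: int = 0):
        self.type = type_
        self.filename = filename
        self.line_number = line_number
        self.secret_hash = hashlib.sha1(secret.encode("utf-8")).hexdigest()

    def _identity(self):
        # line_number is deliberately left out of the identity.
        return tuple(getattr(self, field) for field in self.fields_to_compare)

    def __eq__(self, other):
        if not isinstance(other, Finding):
            return NotImplemented
        return self._identity() == other._identity()

    def __hash__(self):
        return hash(self._identity())


before = Finding("Secret Keyword", "app.py", "hunter2", line_number=10)
after = Finding("Secret Keyword", "app.py", "hunter2", line_number=42)
```

Here `before == after` holds even though the line moved, so a refactor that shifts the secret does not register as a new finding.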

## Built-in Plugin Families

The repository ships with two stylistic families of detectors.

### Keyword / assignment-based detectors

These plugins look for denylisted tokens (`password`, `api_key`, `token`, etc.) followed by an assignment or colon-style binding. The `KeywordDetector` in [detect_secrets/plugins/keyword.py]() composes several regexes (e.g., `FOLLOWED_BY_COLON_REGEX`, `FOLLOWED_BY_COLON_EQUAL_SIGNS_REGEX`) that match patterns such as `api_key: foo` or `my_password := "bar"`. It is sensitive to file type as well, importing `determine_file_type` so it can adapt its search to the surrounding syntax.

Source: [detect_secrets/plugins/keyword.py:1-60]()
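A much-reduced sketch of keyword-plus-assignment matching, assuming a four-word denylist. `DENYLIST`, `ASSIGNMENT_REGEX`, and `find_keyword_secrets` are hypothetical; the real detector composes several regexes and adapts them to the file type.

```python
import re
from typing import List

DENYLIST = ("api_key", "password", "secret", "token")  # hypothetical subset

_KEYWORDS = "|".join(re.escape(word) for word in DENYLIST)

ASSIGNMENT_REGEX = re.compile(
    r"\w*(?:" + _KEYWORDS + r")\w*"          # identifier containing a denylisted keyword
    r"\s*(?::=|:|=)\s*"                      # ':=', ':' or '=' binding
    r"""["']?(?P<value>[^\s"']+)["']?""",    # optionally quoted value
    re.IGNORECASE,
)


def find_keyword_secrets(line: str) -> List[str]:
    """Return the values bound to denylisted keywords on one line."""
    return [match.group("value") for match in ASSIGNMENT_REGEX.finditer(line)]
```

The two patterns mentioned above, `api_key: foo` and `my_password := "bar"`, both match, while an unrelated assignment such as `timeout = 30` does not.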

### Vendor / format-specific detectors

These plugins use tight format-aware regular expressions. A few representative examples:

| Plugin class | `secret_type` | Detection strategy |
|---|---|---|
| `PrivateKeyDetector` | `Private Key` | Matches PEM block headers (regex denylist). |
| `GitHubTokenDetector` | `GitHub Token` | Matches `(ghp\|gho\|ghu\|ghs\|ghr)_[A-Za-z0-9_]{36}`. |
| `GitLabTokenDetector` | `GitLab Token` | Matches `glpat-`, `gldt-`, `glft-`, `glsoat-`, `glrt-` prefixed tokens. |
| `PypiTokenDetector` | `PyPI Token` | Matches `pypi-AgEIcHlwaS5vcmc…` (warehouse token format). |

Source: [detect_secrets/plugins/private_key.py:23-25]()
Source: [detect_secrets/plugins/github_token.py:1-15]()
Source: [detect_secrets/plugins/gitlab_token.py:1-30]()
Source: [detect_secrets/plugins/pypi_token.py:1-15]()
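As an illustration of a format-specific rule, the GitHub pattern from the table can be exercised directly with `re`. The helper name and the sample token are fabricated for this sketch, and a non-capturing group is used so that `findall` returns whole matches.

```python
import re
from typing import List

# Pattern from the table above, with a non-capturing prefix group.
GITHUB_TOKEN_REGEX = re.compile(r"(?:ghp|gho|ghu|ghs|ghr)_[A-Za-z0-9_]{36}")


def find_github_tokens(line: str) -> List[str]:
    """Return every substring of the line that has the GitHub token shape."""
    return GITHUB_TOKEN_REGEX.findall(line)


# Fabricated value with the right shape; not a real credential.
fake_token = "ghp_" + "a" * 36
hits = find_github_tokens('TOKEN = "' + fake_token + '"')
```

Tight patterns like this one give few false positives because both the prefix and the length are fixed.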

All built-in plugins are enabled by default; users can disable them by class name via the `--disable-plugin` flag documented in [README.md]() (e.g., `Base64HighEntropyString`), or pass `--list-all-plugins` to enumerate the active set. The CLI also exposes `--base64-limit` and `--hex-limit` (0.0–8.0, defaults 4.5 and 3.0 respectively) for the entropy-based detectors.
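The entropy limits can be made concrete with a sketch of Shannon entropy over a candidate string. This version computes entropy over the characters actually observed, which is an assumption; the built-in detectors may compute it relative to a fixed character set. The function names are hypothetical.

```python
import math
from collections import Counter

BASE64_LIMIT = 4.5  # default cited above for --base64-limit
HEX_LIMIT = 3.0     # default cited above for --hex-limit


def shannon_entropy(data: str) -> float:
    """Entropy in bits per character, based on observed character frequencies."""
    if not data:
        return 0.0
    length = len(data)
    counts = Counter(data)
    return -sum((n / length) * math.log2(n / length) for n in counts.values())


def exceeds_limit(candidate: str, limit: float) -> bool:
    """True when the candidate looks random enough to be flagged."""
    return shannon_entropy(candidate) > limit
```

A string of one repeated character has zero entropy, and raising a limit makes the corresponding detector stricter about what it reports.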

## Writing and Registering Custom Plugins

A custom plugin is simply a subclass of `RegexBasedDetector` that defines `secret_type` and a `denylist` of compiled regexes. The test suite provides a canonical example — `DessertDetector` in [testing/custom_plugins_dir/dessert.py]() exposes a `Tasty Dessert` type with a single case-insensitive regex. The companion `register_plugin` context manager in [testing/plugins.py]() shows how to inject a plugin into the global secret-type-to-class mapping at runtime, which is the same mechanism used when loading user-supplied plugins via the `-p / --plugin` CLI flag.
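A custom detector in this style might look like the sketch below. The import path follows the `detect_secrets/plugins/base.py` file listed for this page; the `InternalTokenDetector` class, its `acme_` token format, and the fallback base class (used only so the example runs without the package installed) are assumptions made for illustration.

```python
import re

try:
    from detect_secrets.plugins.base import RegexBasedDetector
except ImportError:
    # Fallback stand-in so the sketch runs without detect-secrets installed.
    # It mirrors only the behaviour this example relies on.
    class RegexBasedDetector:
        denylist = ()

        def analyze_string(self, string):
            for regex in self.denylist:
                for match in regex.findall(string):
                    yield match


class InternalTokenDetector(RegexBasedDetector):
    """Hypothetical detector for a made-up 'acme_' token format."""

    secret_type = "Internal Acme Token"
    denylist = (
        re.compile(r"acme_[0-9a-f]{16}"),
    )


matches = list(InternalTokenDetector().analyze_string("token = acme_0123456789abcdef"))
```

The subclass only declares data (`secret_type` and `denylist`); the matching logic is inherited from the base class.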

The full scan-time integration loop is:

1. The scanner enumerates files (git-tracked by default, or `--all-files`).
2. For each line, it asks every enabled plugin whether it flags the line.
3. Matches are normalized into `PotentialSecret` objects and serialized into the baseline JSON, keyed by file.
4. On later invocations (`detect-secrets-hook` or `audit_baseline`), the scanner diffs new findings against the baseline and routes them through the audit decision loop in [detect_secrets/audit/audit.py]().

Source: [testing/custom_plugins_dir/dessert.py:1-15]()
Source: [testing/plugins.py:1-30]()
Source: [detect_secrets/audit/audit.py:1-30]()
Source: [detect_secrets/pre_commit_hook.py:1-30]()
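Steps 2 and 3 of the loop above can be sketched as a toy scanner. `ToyDetector`, `scan_lines`, and the `toy_` token format are hypothetical; the real scanner also applies transformers and filters that are omitted here.

```python
import hashlib
import re
from collections import defaultdict
from typing import Dict, Iterable, List


class ToyDetector:
    """Hypothetical plugin that flags a made-up 'toy_' token format."""

    secret_type = "Toy Token"
    pattern = re.compile(r"toy_[0-9a-f]{8}")

    def analyze_line(self, line: str) -> List[str]:
        return self.pattern.findall(line)


def scan_lines(filename: str, lines: Iterable[str], plugins) -> Dict[str, list]:
    """Ask every plugin about every line and group findings by file."""
    results = defaultdict(list)
    for line_number, line in enumerate(lines, start=1):
        for plugin in plugins:
            for match in plugin.analyze_line(line):
                results[filename].append({
                    "type": plugin.secret_type,
                    # Only the hash is kept, never the matched text.
                    "hashed_secret": hashlib.sha1(match.encode("utf-8")).hexdigest(),
                    "line_number": line_number,
                })
    return dict(results)


source = ["DEBUG = True", 'TOKEN = "toy_deadbeef"']
findings = scan_lines("config.py", source, [ToyDetector()])
```

The result is keyed by filename, which mirrors how the baseline groups findings per file.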

## Known Caveats and Community Notes

A few operational caveats are worth surfacing for any team adopting plugins:

- **Plugin loading failures on Python 3.8 / 3.9**: Community issue #452 reports that `detect-secrets` v1.1.0 fails to load any plugins on certain macOS Python builds, with `[scan] ERROR ...` output and zero findings. This is typically a packaging / entry-point discovery problem (not a defect in the plugin classes themselves), and rolling back to v1.0.3 or upgrading past the affected line generally resolves it.
- **Regex noise**: Keyword and entropy plugins trade precision for recall. The audit workflow exists so analysts can label false positives, `pragma: allowlist secret` suppresses a known-safe line inline, and the `--only-allowlisted` flag shown in [README.md]() scans only the lines carrying that annotation.
- **Baseline drift minimization**: Issue #92 requests smaller, more diff-friendly baselines (e.g., omitting `generated_at` and line numbers). This is partially addressed today by the `--slim` flag for `detect-secrets scan`; a slim baseline is incompatible with `audit`, so a full baseline must be regenerated when auditing is required.
- **SARIF output**: Issue #488 asks for SARIF 2.1.0 output of scan results. This is not a plugin concern per se, but plugin findings are what would be serialized, so any future SARIF work will pivot off the same `PotentialSecret` records.
- **Shared regex sources**: Issue #75 notes the existence of the `truffleHogRegexes` package; community-sourced regexes could be plugged in by writing a thin `RegexBasedDetector` subclass without modifying core code.

Source: [README.md:1-50]()
Source: [CONTRIBUTING.md:1-15]()

## See Also

- Architecture Overview (`docs/design.md`)
- Audit Workflow
- Baseline File Format
- Pre-commit Hook Integration

---

<a id='page-3'></a>

## Filters and Configuration Tuning

### Related Pages

Related topics: [Plugins: Secret Detection Rules](#page-2), [Workflows, CI Integration, and Operational Concerns](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [detect_secrets/filters/allowlist.py](https://github.com/Yelp/detect-secrets/blob/main/detect_secrets/filters/allowlist.py)
- [detect_secrets/filters/common.py](https://github.com/Yelp/detect-secrets/blob/main/detect_secrets/filters/common.py)
- [detect_secrets/filters/heuristic.py](https://github.com/Yelp/detect-secrets/blob/main/detect_secrets/filters/heuristic.py)
- [detect_secrets/filters/regex.py](https://github.com/Yelp/detect-secrets/blob/main/detect_secrets/filters/regex.py)
- [detect_secrets/filters/wordlist.py](https://github.com/Yelp/detect-secrets/blob/main/detect_secrets/filters/wordlist.py)
- [detect_secrets/filters/gibberish/__init__.py](https://github.com/Yelp/detect-secrets/blob/main/detect_secrets/filters/gibberish/__init__.py)
- [detect_secrets/settings.py](https://github.com/Yelp/detect-secrets/blob/main/detect_secrets/settings.py)
- [detect_secrets/core/plugins/util.py](https://github.com/Yelp/detect-secrets/blob/main/detect_secrets/core/plugins/util.py)
- [detect_secrets/plugins/keyword.py](https://github.com/Yelp/detect-secrets/blob/main/detect_secrets/plugins/keyword.py)
- [detect_secrets/transformers/config.py](https://github.com/Yelp/detect-secrets/blob/main/detect_secrets/transformers/config.py)
- [detect_secrets/util/filetype.py](https://github.com/Yelp/detect-secrets/blob/main/detect_secrets/util/filetype.py)
</details>

## Overview and Purpose

`detect-secrets` separates *detection* from *filtering*. Plugins (e.g. `Base64HighEntropyString`, `KeywordDetector`) surface candidate strings, and the filter layer decides whether those candidates are worth reporting. This division lets the tool stay aggressive in pattern matching while still allowing users to suppress noisy results on a project-by-project basis. As described in [README.md](https://github.com/Yelp/detect-secrets/blob/main/README.md), the project aims to "prevent new secrets from entering the code base" and to "provide a checklist of secrets to roll, and migrate off to a more secure storage" — a workflow that depends on tunable false-positive suppression.

The filter package lives under `detect_secrets/filters/` and is grouped into:

- **allowlist**: inline `pragma: allowlist secret` annotations that mark a line as intentionally safe.
- **regex**: user-supplied patterns that exclude files, lines, or secret values.
- **heuristic**: quick structural checks (sequential strings, UUID-like values, templated placeholders, and similar).
- **wordlist**: user-supplied benign words; a candidate that contains one is ignored.
- **gibberish**: drops candidates that read like ordinary words rather than random strings.
- **common**: shared utilities used by the other filters.

Source: [detect_secrets/filters/__init__.py:1-20]()

## Filter Categories

### Allowlist Filter

The allowlist filter is the primary mechanism for **explicit user opt-out** on a single line. It recognizes the `pragma: allowlist secret` inline comment, which is the recommended way to mark one line as intentionally safe, and it works per line rather than as a global pattern. `ConfigFileTransformer` and `EagerConfigFileTransformer` call `get_allowlist_regexes()` from the allowlist module when scanning `.ini` and similar files, meaning that allowlist annotations apply to config-style files just as they do to source code. Source: [detect_secrets/transformers/config.py:5-15]()

Broader exclusions live in the `regex.py` filter, which accepts regex patterns matched against a filename, a line, or the secret value itself and discards any candidate that matches. Source: [detect_secrets/filters/regex.py:1-40]()
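The inline pragma check can be sketched as a per-line regex test. This is a simplification: it looks only for the marker text and ignores comment syntax, and `line_has_allowlist_pragma` is a hypothetical name.

```python
import re

# Simplified marker pattern; the real filter is aware of comment syntax.
ALLOWLIST_PRAGMA = re.compile(r"pragma:\s*allowlist secret")


def line_has_allowlist_pragma(line: str) -> bool:
    """True when the line carries the inline allowlist marker."""
    return bool(ALLOWLIST_PRAGMA.search(line))


annotated = 'API_KEY = "test-value"  # pragma: allowlist secret'
plain = 'API_KEY = "test-value"'
```

Because the decision is made per line, the annotation travels with the code it exempts and needs no separate configuration.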

### Heuristic, Wordlist, and Gibberish Filters

The **heuristic** filter applies cheap, deterministic tests to each candidate, for example rejecting sequential strings, UUID-like values, and templated placeholders. Because these checks are inexpensive, they are a quick way to discard obvious false positives. Source: [detect_secrets/filters/heuristic.py:1-30]()
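Two cheap structural checks of this kind can be sketched as follows. The function names and the exact rules are hypothetical and are meant only to show the shape of a heuristic filter.

```python
import re

UUID_REGEX = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
    re.IGNORECASE,
)

# Doubling each sequence lets wrapped runs such as "890123" match too.
_SEQUENCES = ("0123456789" * 2, "abcdefghijklmnopqrstuvwxyz" * 2)


def looks_like_uuid(candidate: str) -> bool:
    return bool(UUID_REGEX.match(candidate))


def is_sequential(candidate: str) -> bool:
    lowered = candidate.lower()
    return bool(lowered) and any(lowered in sequence for sequence in _SEQUENCES)


def should_exclude(candidate: str) -> bool:
    """Discard candidates that are structurally unlikely to be secrets."""
    return looks_like_uuid(candidate) or is_sequential(candidate)
```

Each check is a single regex match or substring test, so running them on every candidate adds little cost.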

The **wordlist** filter checks each candidate against a user-supplied word list (passed with `--word-list`); a candidate that contains a listed word, such as a placeholder like `changeme` or `example`, is ignored. This is especially relevant for `KeywordDetector`, which scans around keywords like `password`, `secret`, or `api_key`. Source: [detect_secrets/plugins/keyword.py:1-40]()

The **gibberish** sub-package integrates the third-party `gibberish-detector` library (listed in `requirements-dev.txt`) to recognize candidates that read like ordinary language, such as English-looking words, so that they can be excluded as unlikely secrets. Source: [detect_secrets/filters/gibberish/__init__.py:1-25]()

## Configuration via `settings.py`

Filters and plugins are configured through a single settings module rather than scattered command-line flags. `get_mapping_from_secret_type_to_class()` consults `get_settings().plugins` to discover **custom plugins** loaded from external `file://` paths, and merges them with the built-in registry. Source: [detect_secrets/core/plugins/util.py:20-55]()

This is the same mechanism that powers the `KeywordDetector`'s tunable deny list: users can add or remove keywords through the project configuration without modifying source code. Source: [detect_secrets/plugins/keyword.py:30-60]()

The following diagram summarizes how configuration flows from the user into runtime detection:

```mermaid
flowchart LR
    A[User config] --> B[settings.py]
    B --> C[core/plugins/util.py]
    C --> D[Plugin registry]
    D --> E[Detection scan]
    E --> F[Filters allowlist, heuristic, wordlist, gibberish, regex]
    F --> G[Filtered PotentialSecret list]
```

| Filter | Source File | Primary Use |
| --- | --- | --- |
| Allowlist | `detect_secrets/filters/allowlist.py` | Inline `pragma: allowlist secret` annotations |
| Regex | `detect_secrets/filters/regex.py` | User-supplied regex exclusion of files, lines, or secret values |
| Heuristic | `detect_secrets/filters/heuristic.py` | Cheap structural checks to discard obvious noise |
| Wordlist | `detect_secrets/filters/wordlist.py` | Suppress candidates that contain user-listed benign words |
| Gibberish | `detect_secrets/filters/gibberish/__init__.py` | Exclude candidates that read like ordinary words |
| Common | `detect_secrets/filters/common.py` | Shared helpers for the rest of the filter package |

## File-Type Driven Filtering

`determine_file_type()` in `detect_secrets/util/filetype.py` maps file extensions to `FileType` enums (`YAML`, `INI`, `PROPERTIES`, `TOML`, `C`, `C_SHARP`, etc.). This mapping drives **eager transformers** such as `EagerConfigFileTransformer`, which only parses files that fall into the `FileType.OTHER` bucket — i.e., files that are not natively understood by a dedicated parser. Source: [detect_secrets/util/filetype.py:1-30]()

Combined with allowlist regexes that target entire file extensions (e.g. `.*\.lock$`), this provides a configuration lever for excluding generated or vendored content without writing per-line pragmas.
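Applying a file-level exclusion pattern can be sketched as a simple filter over filenames. `files_to_scan` is a hypothetical helper; the pattern is the `.*\.lock$` example from the text.

```python
import re
from typing import Iterable, List

EXCLUDE_FILES = re.compile(r".*\.lock$")


def files_to_scan(filenames: Iterable[str]) -> List[str]:
    """Drop any filename that matches the exclusion pattern."""
    return [name for name in filenames if not EXCLUDE_FILES.search(name)]


remaining = files_to_scan(["poetry.lock", "src/app.py", "yarn.lock"])
```

Anchoring the pattern with `$` keeps it from matching files that merely contain `.lock` in the middle of their name.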

## Practical Tuning Tips

1. **Start narrow, expand gradually.** Run a scan, then add the most frequent false positives to `wordlist` or `heuristic` configuration before touching the allowlist. Source: [detect_secrets/filters/wordlist.py:1-20]()
2. **Use the inline pragma for exceptions.** For a single line that legitimately contains a test secret, add `# pragma: allowlist secret` rather than a global pattern. Source: [detect_secrets/filters/allowlist.py]()
3. **Register custom plugins via settings.** Pointing the `plugins` map at a `file://` path is the supported way to ship a new detector without forking the project — important in environments where Python version compatibility matters (see community issue #452 about v1.1.0 failing to load plugins on Python 3.8). Source: [detect_secrets/core/plugins/util.py:25-50]()
4. **Mind the baseline file portability.** The v1.5.0 release added OS-agnostic baseline files, so any custom regex or wordlist embedded in a baseline should use forward slashes to remain portable. Source: [README.md:1-20]()
5. **Watch the gibberish dependency.** The `gibberish-detector` package is listed in `requirements-dev.txt`; production deployments that rely on the gibberish filter should pin it explicitly.

## Common Failure Modes

- **Plugin not loading on certain Python versions**: community issue #452 describes v1.1.0 silently failing to load any plugins on Python 3.8/3.9 because the entry-point discovery mechanism was incompatible. Custom plugins registered through `get_settings().plugins` (file-based) sidestep this entirely, since they do not rely on entry-point metadata. Source: [detect_secrets/core/plugins/util.py:30-50]()
- **Regex allowlist too greedy**: broad patterns such as `.*` will silently suppress real findings. Keep exclusion patterns scoped to file extension, directory, or specific token.
- **Wordlist growing unbounded**: the `KeywordDetector` consults the wordlist for *every* candidate, so an oversized wordlist can erode detection quality. Source: [detect_secrets/plugins/keyword.py:40-60]()
- **Inline pragma ignored**: the pragma applies to the line it sits on, or to the following line when written as `pragma: allowlist nextline secret`. Secrets that span several lines need an exclusion regex instead of a pragma.

## See Also

- [Plugins and Detection Architecture](plugins-and-detection.md)
- [Baseline File Format](baseline-file-format.md)
- [Pre-commit Integration](precommit-integration.md)
- [Writing Custom Plugins](writing-custom-plugins.md)

---

<a id='page-4'></a>

## Workflows, CI Integration, and Operational Concerns

### Related Pages

Related topics: [Project Overview and System Architecture](#page-1), [Plugins: Secret Detection Rules](#page-2), [Filters and Configuration Tuning](#page-3)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/Yelp/detect-secrets/blob/main/README.md)
- [CONTRIBUTING.md](https://github.com/Yelp/detect-secrets/blob/main/CONTRIBUTING.md)
- [requirements-dev.txt](https://github.com/Yelp/detect-secrets/blob/main/requirements-dev.txt)
- [detect_secrets/core/potential_secret.py](https://github.com/Yelp/detect-secrets/blob/main/detect_secrets/core/potential_secret.py)
- [detect_secrets/core/plugins/initialize.py](https://github.com/Yelp/detect-secrets/blob/main/detect_secrets/core/plugins/initialize.py)
- [detect_secrets/core/plugins/util.py](https://github.com/Yelp/detect-secrets/blob/main/detect_secrets/core/plugins/util.py)
- [detect_secrets/util/semver.py](https://github.com/Yelp/detect-secrets/blob/main/detect_secrets/util/semver.py)
- [detect_secrets/plugins/keyword.py](https://github.com/Yelp/detect-secrets/blob/main/detect_secrets/plugins/keyword.py)
- [detect_secrets/util/filetype.py](https://github.com/Yelp/detect-secrets/blob/main/detect_secrets/util/filetype.py)
- [detect_secrets/transformers/config.py](https://github.com/Yelp/detect-secrets/blob/main/detect_secrets/transformers/config.py)
- [testing/mocks.py](https://github.com/Yelp/detect-secrets/blob/main/testing/mocks.py)
</details>

## Overview

`detect-secrets` is designed as an **enterprise-oriented secret detection tool** that is most commonly embedded inside continuous integration (CI) and pre-commit pipelines. Rather than re-scanning the full git history on every run, the project establishes a *baseline* of known findings and then compares new diffs against that baseline. This baseline workflow is the operational core of the tool, and most CI integrations center around generating, maintaining, and auditing that file.

The basic operational loop, as documented in [README.md](https://github.com/Yelp/detect-secrets/blob/main/README.md), is:

```bash
$ detect-secrets scan > .secrets.baseline
$ detect-secrets scan --baseline .secrets.baseline
```

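The first command creates a snapshot of *potential* secrets currently in the repository. The second rescans the repository and updates that existing baseline in place. Blocking is done by `detect-secrets-hook --baseline .secrets.baseline`, which pre-commit and CI pipelines run against changed files and which fails only on findings that are not already in the baseline.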

## Pre-Commit and CI Pipeline Integration

The repository ships a pre-commit hook configuration that downstream projects consume directly. According to the [CONTRIBUTING.md](https://github.com/Yelp/detect-secrets/blob/main/CONTRIBUTING.md) guide, contributors are expected to run the same gating checks locally that CI runs in the cloud. The `tox` environment described there pins the dependency set listed in [requirements-dev.txt](https://github.com/Yelp/detect-secrets/blob/main/requirements-dev.txt), which includes `pre-commit==4.0.1`, `pluggy==1.5.0`, `pyahocorasick==2.1.0`, and `responses==0.25.3`. These cover the hook runner, test tooling, the optional word-list filter, and mocked HTTP responses for verification tests.

```mermaid
flowchart LR
    A[Developer commit] --> B[pre-commit hook]
    B --> C{detect-secrets scan}
    C --> D[.secrets.baseline]
    D --> E[Diff against baseline]
    E -->|New secrets| F[Block commit]
    E -->|No new secrets| G[Pass]
    H[CI Pipeline] --> C
```

Two important operational properties come from this design:

1. **Backwards compatibility** — older baselines remain readable in newer versions, which is critical for long-lived repositories.
2. **Separation of concerns** — existing secrets are not removed; they are merely acknowledged in the baseline. CI only fails on *new* findings, matching the philosophy stated in [README.md](https://github.com/Yelp/detect-secrets/blob/main/README.md).

The `PotentialSecret` class in [detect_secrets/core/potential_secret.py](https://github.com/Yelp/detect-secrets/blob/main/detect_secrets/core/potential_secret.py) is what makes the baseline stable across versions. It serializes the secret's type, its filename, the SHA-1 hash of the secret (`hashed_secret`), a verification flag (`is_verified`), and, when available, the line number and the audit label `is_secret`. The plaintext value is never written into the baseline file. This means a baseline produced on one machine can be diffed on another without leaking secret material into version control.
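
A minimal sketch of the "hash, never plaintext" idea follows. The `baseline_entry` helper is hypothetical and only approximates the field names mentioned above; it is not the `PotentialSecret` implementation.

```python
import hashlib
import json
from typing import Optional


def baseline_entry(
    secret: str,
    filename: str,
    secret_type: str,
    line_number: Optional[int] = None,
    is_verified: bool = False,
) -> dict:
    """Build a baseline-style record that stores only the SHA-1 of the secret."""
    entry = {
        "type": secret_type,
        "filename": filename,
        "hashed_secret": hashlib.sha1(secret.encode("utf-8")).hexdigest(),
        "is_verified": is_verified,
    }
    # The line number is optional, so it is only written when known.
    if line_number:
        entry["line_number"] = line_number
    return entry


if __name__ == "__main__":
    record = baseline_entry("hunter2", "config.py", "Secret Keyword", line_number=3)
    # The serialized record contains the digest but not the plaintext value.
    print(json.dumps(record, indent=2))
```

Because the digest is deterministic, two machines scanning the same file produce the same `hashed_secret`, which is what makes the records comparable across environments.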

## Plugin Loading and Initialization

Many CI failures attributed to `detect-secrets` come from plugin-loading problems, not from the scanning engine itself. The plugin discovery logic in [detect_secrets/core/plugins/util.py](https://github.com/Yelp/detect-secrets/blob/main/detect_secrets/core/plugins/util.py) walks the `detect_secrets.plugins` package and indexes each class by its `secret_type` attribute. Custom plugins referenced from a baseline file use a `file://` URL scheme and are loaded via `get_plugins_from_file()`.

The runtime instantiation lives in [detect_secrets/core/plugins/initialize.py](https://github.com/Yelp/detect-secrets/blob/main/detect_secrets/core/plugins/initialize.py), which exposes two entry points:

| Function | Purpose | Failure Mode |
| --- | --- | --- |
| `from_secret_type(secret_type)` | Resolve a plugin class by its serialized `secret_type` string | Raises `TypeError` if the type is no longer registered (e.g., after a plugin was removed) |
| `from_plugin_classname(classname)` | Resolve a plugin class by its Python class name | Logs an error and raises `TypeError` if the class cannot be found, with a hint to run `pre-commit autoupdate` |

This second failure mode is consistent with community issue **#452** ("detect-secrets v1.1.0 fails to load any plugins on py38"). When a baseline names a plugin class that the running version does not provide, `from_plugin_classname` exhausts the plugin list and emits the diagnostic "No such `<classname>` plugin to initialize." The error message itself recommends `pre-commit autoupdate`, because a stale hook `rev` in `.pre-commit-config.yaml` (and therefore a `detect-secrets` version that does not match the one that wrote the baseline) is a common cause.
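
The two lookup paths and their failure modes can be sketched as follows. Everything here is illustrative: the detector classes, `REGISTRY`, `lookup_by_secret_type`, and `lookup_by_classname` are hypothetical stand-ins, not the functions in `initialize.py`.

```python
import logging
from typing import Dict

log = logging.getLogger(__name__)


class HypotheticalKeywordDetector:
    secret_type = "Secret Keyword"


class HypotheticalPrivateKeyDetector:
    secret_type = "Private Key"


# Registry indexed by the serialized secret_type string, as described above.
REGISTRY: Dict[str, type] = {
    cls.secret_type: cls
    for cls in (HypotheticalKeywordDetector, HypotheticalPrivateKeyDetector)
}


def lookup_by_secret_type(secret_type: str) -> object:
    """Instantiate the plugin registered under a secret_type string."""
    try:
        plugin_class = REGISTRY[secret_type]
    except KeyError:
        raise TypeError(f"unknown secret type: {secret_type}") from None
    return plugin_class()


def lookup_by_classname(classname: str) -> object:
    """Instantiate the plugin whose Python class name matches."""
    for plugin_class in REGISTRY.values():
        if plugin_class.__name__ == classname:
            return plugin_class()
    # Nothing matched: log a hint before failing, mirroring the described behavior.
    log.error("No such `%s` plugin to initialize.", classname)
    log.error("Chances are you should run `pre-commit autoupdate`.")
    raise TypeError(f"unknown plugin class: {classname}")


if __name__ == "__main__":
    print(type(lookup_by_secret_type("Private Key")).__name__)
    try:
        lookup_by_classname("RemovedDetector")
    except TypeError as exc:
        print("lookup failed:", exc)
```

In a CI log, the two logged lines followed by a `TypeError` traceback are the signature to look for when the baseline and the installed version disagree about which plugins exist.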

## Versioning, Compatibility, and Upgrade Pitfalls

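The project implements a small semver comparator at [detect_secrets/util/semver.py](https://github.com/Yelp/detect-secrets/blob/main/detect_secrets/util/semver.py) rather than depending on `python-semver`. The `Version` class parses `major.minor.patch` strings into integer triples, supports `<`, `>`, `==`, `<=`, and `>=` comparisons between them, and raises `NotImplementedError` for comparisons against non-`Version` objects. Comparing integers rather than raw strings matters because `1.10.0` sorts before `1.9.0` as text but after it as a version. Internally the class is used to order version strings, for example when deciding whether an older baseline needs to be upgraded.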
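
A comparator with the behavior described above can be sketched in a few lines. `SketchVersion` is a hypothetical illustration, not the project's `Version` class. Note that returning `NotImplemented` is the more common Python idiom; this sketch raises `NotImplementedError` only to mirror the behavior described in the text.

```python
from functools import total_ordering
from typing import Any, Tuple


@total_ordering
class SketchVersion:
    """Compare major.minor.patch strings as integer triples."""

    def __init__(self, text: str) -> None:
        self.major, self.minor, self.patch = (int(part) for part in text.split("."))

    def _key(self) -> Tuple[int, int, int]:
        return (self.major, self.minor, self.patch)

    def __eq__(self, other: Any) -> bool:
        if not isinstance(other, SketchVersion):
            raise NotImplementedError
        return self._key() == other._key()

    def __lt__(self, other: Any) -> bool:
        if not isinstance(other, SketchVersion):
            raise NotImplementedError
        return self._key() < other._key()


if __name__ == "__main__":
    # Integer comparison orders 1.10.0 after 1.9.9; plain string comparison does not.
    print(SketchVersion("1.10.0") > SketchVersion("1.9.9"))
    print("1.10.0" > "1.9.9")
```

`functools.total_ordering` derives `<=`, `>`, and `>=` from the two methods defined here, which keeps the sketch short.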

Operationally, the most relevant constraints for CI integration are:

- **Python version support:** As announced in the v1.5.0 release notes (referenced in the community context), support for Python 3.6 and 3.7 was dropped, and 3.8 is on the deprecation path (upstream EOL was October 2024). CI images still pinned to 3.6 or 3.7 cannot run v1.5.0 or later and will typically either fail to install it or fall back to an older release.
- **OS-agnostic baseline files:** The same v1.5.0 release added support for OS-agnostic baselines, which means CI runners and developer machines can produce and consume the same `.secrets.baseline` without path-separator drift. The `convert_local_os_path()` call inside `PotentialSecret.load_secret_from_dict` ([detect_secrets/core/potential_secret.py](https://github.com/Yelp/detect-secrets/blob/main/detect_secrets/core/potential_secret.py)) is what normalizes paths on read.
- **Detector behavior depends on file type:** [detect_secrets/util/filetype.py](https://github.com/Yelp/detect-secrets/blob/main/detect_secrets/util/filetype.py) maps extensions to a `FileType` enum. Detectors such as [detect_secrets/plugins/keyword.py](https://github.com/Yelp/detect-secrets/blob/main/detect_secrets/plugins/keyword.py) call `determine_file_type()` and use the result to decide which patterns apply to a file. In monorepos that mix extensions, an apparent miss ("the scanner missed something") can therefore be caused by the file-type classification rather than by the detector itself.
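
The extension-to-enum mapping mentioned in the last bullet can be sketched as below. The enum members, the `EXTENSION_TO_TYPE` table, and `sketch_determine_file_type` are hypothetical; the real `FileType` enum and its extension table may differ.

```python
import os
from enum import Enum


class SketchFileType(Enum):
    PYTHON = "python"
    YAML = "yaml"
    OTHER = "other"


# Hypothetical extension table for illustration only.
EXTENSION_TO_TYPE = {
    ".py": SketchFileType.PYTHON,
    ".yaml": SketchFileType.YAML,
    ".yml": SketchFileType.YAML,
}


def sketch_determine_file_type(filename: str) -> SketchFileType:
    """Classify a file by extension, falling back to OTHER when unknown."""
    _, extension = os.path.splitext(filename)
    return EXTENSION_TO_TYPE.get(extension.lower(), SketchFileType.OTHER)


if __name__ == "__main__":
    for name in ("settings.py", "deploy.yml", "secrets.conf.template"):
        print(name, "->", sketch_determine_file_type(name).value)
```

The fallback branch is the part worth checking when a finding is unexpectedly absent: a file with an unusual extension is classified as `OTHER` and is handled by whatever default the detector applies to unrecognized types.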

## Common Operational Failure Modes

From the codebase and the community context, the recurring failure modes in CI fall into four buckets:

1. **Plugin drift:** A baseline references a plugin class that the running version does not provide, for example because the baseline was written by a newer release than the one installed on the runner. Mitigate by keeping hook revisions in sync and updating the baseline after every `pre-commit autoupdate`.
2. **Stale hook revisions:** `pre-commit` pins `detect-secrets` to the `rev` (a tag or commit SHA) recorded in `.pre-commit-config.yaml`, so a developer machine and a CI runner that installs the tool separately can run different versions. The `from_plugin_classname` log message in [detect_secrets/core/plugins/initialize.py](https://github.com/Yelp/detect-secrets/blob/main/detect_secrets/core/plugins/initialize.py) explicitly points at this by suggesting `pre-commit autoupdate`.
3. **Parser errors in config files:** The transformer in [detect_secrets/transformers/config.py](https://github.com/Yelp/detect-secrets/blob/main/detect_secrets/transformers/config.py) raises a `ParsingError` when `configparser` cannot parse a file. A `ParsingError` in CI logs means the file was not processed as a config file; it is not evidence that the file is clean.
4. **Test-harness interference:** The shared utilities in [testing/mocks.py](https://github.com/Yelp/detect-secrets/blob/main/testing/mocks.py) show that the test suite monkey-patches `print`, `log.error`, and `log.warning`. Custom CI steps that wrap the same calls need to be aware of this patching surface to avoid swallowing real diagnostics.
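
For the parser-error case in item 3, a minimal sketch using only the standard-library `configparser` shows why a parse failure should be read as "not parsed" rather than "clean". The `try_parse_ini` helper is hypothetical and is not part of `detect-secrets`.

```python
import configparser
from typing import Optional


def try_parse_ini(text: str) -> Optional[configparser.ConfigParser]:
    """Return a parser on success, or None when the text is not valid INI."""
    parser = configparser.ConfigParser()
    try:
        parser.read_string(text)
    except configparser.Error:
        # Covers ParsingError and its subclasses, e.g. a missing section header.
        return None
    return parser


if __name__ == "__main__":
    valid = "[db]\npassword = hunter2\n"
    invalid = "password = hunter2\n"  # no [section] header

    for label, text in (("valid", valid), ("invalid", invalid)):
        parsed = try_parse_ini(text)
        status = "parsed" if parsed is not None else "NOT parsed (needs another code path)"
        print(label, "->", status)
```

Both inputs contain the same secret-looking value, but only the first one is visible to logic that depends on a successful parse, so a `None` result here must never be reported as "no secrets found".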

The operational takeaway is that `detect-secrets` is intentionally conservative: the baseline is the contract, plugins are versioned, and the JSON schema is stable. Most CI breakage is a versioning problem masquerading as a detection problem.

## See Also

- [README.md](https://github.com/Yelp/detect-secrets/blob/main/README.md) — Quickstart and design philosophy
- [CONTRIBUTING.md](https://github.com/Yelp/detect-secrets/blob/main/CONTRIBUTING.md) — Development environment, testing, and pre-commit setup
- [CHANGELOG.md](https://github.com/Yelp/detect-secrets/blob/main/CHANGELOG.md) — Version-by-version compatibility notes
- [docs/](https://github.com/Yelp/detect-secrets/tree/main/docs) — Extended documentation (plugin authoring, design, audit)

---


## Pitfall Log

Project: Yelp/detect-secrets

Summary: Found 11 structured pitfall item(s), including 3 high/blocking item(s). Top priority: Installation risk - Installation risk requires verification.

## 1. Installation risk - Installation risk requires verification

- Severity: high
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/Yelp/detect-secrets/issues/360

## 2. Security or permission risk - Security or permission risk requires verification

- Severity: high
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/Yelp/detect-secrets/issues/858

## 3. Security or permission risk - Security or permission risk requires verification

- Severity: high
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: packet_text.keyword_scan | https://github.com/Yelp/detect-secrets

## 4. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/Yelp/detect-secrets/issues/968

## 5. Capability evidence risk - Capability evidence risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: capability.assumptions | https://github.com/Yelp/detect-secrets

## 6. Maintenance risk - Maintenance risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/Yelp/detect-secrets

## 7. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: downstream_validation.risk_items | https://github.com/Yelp/detect-secrets

## 8. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: risks.scoring_risks | https://github.com/Yelp/detect-secrets

## 9. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/Yelp/detect-secrets/issues/958

## 10. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: `issue_or_pr_quality=unknown`.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/Yelp/detect-secrets

## 11. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: `release_recency=unknown`.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/Yelp/detect-secrets

