# https://github.com/AliAmmar15/Velonus Project Documentation

Generated: 2026-05-15 09:28:06 UTC

## Table of Contents

- [Introduction to Velonus](#introduction)
- [Quick Start Guide](#quick-start)
- [System Architecture](#architecture-overview)
- [CLI Components](#cli-components)
- [API Backend](#api-backend)
- [Scanner Pipeline](#scanner-pipeline)
- [Security Detectors](#security-detectors)
- [Output Formats](#output-formats)
- [AI Engine](#ai-engine)
- [GitHub PR Reviewer](#github-pr-reviewer)

<a id='introduction'></a>

## Introduction to Velonus

### Related Pages

Related topics: [Quick Start Guide](#quick-start), [System Architecture](#architecture-overview), [Scanner Pipeline](#scanner-pipeline)

<details>
<summary>Related source files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/AliAmmar15/Velonus/blob/main/README.md)
- [CONTRIBUTING.md](https://github.com/AliAmmar15/Velonus/blob/main/CONTRIBUTING.md)
- [apps/cli/README.md](https://github.com/AliAmmar15/Velonus/blob/main/apps/cli/README.md)
- [packages/scanner/scanner/detectors/safety.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/scanner/scanner/detectors/safety.py)
- [packages/scanner/scanner/detectors/pip_audit.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/scanner/scanner/detectors/pip_audit.py)
- [packages/scanner/scanner/detectors/secrets.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/scanner/scanner/detectors/secrets.py)
- [packages/normalizer/deduplicator.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/normalizer/deduplicator.py)
- [packages/normalizer/models.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/normalizer/models.py)
- [apps/cli/shield/formatters/sarif.py](https://github.com/AliAmmar15/Velonus/blob/main/apps/cli/shield/formatters/sarif.py)
</details>


Velonus is an open-source security scanner CLI designed to detect secrets, vulnerabilities, and security issues in codebases. It aggregates multiple industry-standard security tools into a unified pipeline with normalized output formats and deduplication logic. Velonus is particularly useful for development teams seeking to integrate security scanning into their CI/CD pipelines and local development workflows.

## Overview

Velonus provides a command-line interface that orchestrates multiple security scanners to analyze source code and dependencies. The tool normalizes findings from different sources into a canonical data model, removes duplicates, and presents results in multiple output formats including terminal, JSON, and SARIF. Source: [README.md]()

The project is structured as a monorepo with three main packages:

| Package | Location | Purpose |
|---------|----------|---------|
| `scanner` | `packages/scanner/` | Security tool wrappers and detectors |
| `normalizer` | `packages/normalizer/` | Data normalization and deduplication |
| `cli` (shield) | `apps/cli/shield/` | CLI interface and output formatters |

Source: [CONTRIBUTING.md]()

## Architecture

### High-Level Component Architecture

```mermaid
graph TD
    User --> CLI[CLI Interface<br/>apps/cli/shield/]
    CLI --> Scanner[Scanner Package<br/>packages/scanner/]
    Scanner --> Secrets[Secrets Detector<br/>trufflehog + entropy]
    Scanner --> Bandit[Bandit Detector<br/>Python security linter]
    Scanner --> Semgrep[Semgrep Detector<br/>Static analysis]
    Scanner --> PipAudit[pip-audit Detector<br/>Dependency vulnerabilities]
    Scanner --> Safety[Safety Detector<br/>Python package safety]
    
    Scanner --> Normalizer[Normalizer Package<br/>packages/normalizer/]
    Normalizer --> Models[NormalizedFinding Model]
    Models --> Deduplicator[Deduplication Filter]
    
    Deduplicator --> Formatters[Output Formatters]
    Formatters --> Terminal[Terminal Formatter<br/>Rich tables]
    Formatters --> JSON[JSON Formatter]
    Formatters --> SARIF[SARIF Formatter<br/>GitHub Code Scanning]
```

### Scanner Pipeline Flow

```mermaid
graph LR
    A[Input Path] --> B[Secrets Scan]
    B --> C[Bandit Scan]
    C --> D[Semgrep Scan]
    D --> E[pip-audit Scan]
    E --> F[Safety Scan]
    F --> G[Normalize]
    G --> H[Deduplicate]
    H --> I[Format Output]
```

Source: [packages/normalizer/deduplicator.py:1-30]()

## Supported Security Tools

Velonus wraps and orchestrates the following security scanners:

### Scanner Comparison Table

| Tool | Purpose | Phase | Status |
|------|---------|-------|--------|
| **Secrets** | Hardcoded credential detection via TruffleHog + entropy analysis | Phase 0 | ✅ Complete |
| **Bandit** | Python-specific security issue detection | Phase 1 | ✅ Complete |
| **Semgrep** | Multi-language static analysis with custom rules | Phase 1 | ✅ Complete |
| **pip-audit** | Python dependency vulnerability scanning | Phase 1 | ✅ Complete |
| **Safety** | Python package security database scanning | Phase 1 | ✅ Complete |

Sources: [README.md](), [packages/scanner/scanner/detectors/secrets.py:1-50]()

## Data Models

### NormalizedFinding Schema

The core data model used throughout Velonus is `NormalizedFinding`, which provides a canonical representation of security findings regardless of their source tool. Source: [packages/normalizer/models.py:1-50]()

| Field | Type | Description |
|-------|------|-------------|
| `id` | `str` | SHA-256 fingerprint (first 16 hex chars) of `tool+file+line+rule_id` |
| `tool` | `str` | Source tool: `bandit`, `semgrep`, `secrets`, `pip-audit`, `safety` |
| `rule_id` | `str` | Tool-specific rule identifier |
| `cwe` | `list[str]` | CWE identifiers (e.g., `["CWE-89"]`) |
| `owasp` | `list[str]` | OWASP categories (e.g., `["A03:2021"]`) |
| `severity` | `Severity` | CRITICAL, HIGH, MEDIUM, LOW, INFO |
| `confidence` | `Confidence` | HIGH, MEDIUM, LOW |
| `file` | `str` | File path of the finding |
| `line_start` | `int` | Starting line number |
| `line_end` | `int` | Ending line number |
| `code_snippet` | `str` | Relevant code snippet |
| `message` | `str` | Human-readable finding message |
| `fix_available` | `bool` | Whether a fix is available |
| `suppressed` | `bool` | Whether the finding is suppressed |
| `first_seen` | `datetime` | Timestamp when first detected |

Source: [packages/normalizer/models.py:50-75]()
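
The `id` row describes a deterministic fingerprint, which can be sketched as follows. This is a minimal illustration, not the project's code: the function name `make_finding_id` and the `:` separator between fields are assumptions, and only the "first 16 hex characters of SHA-256 over `tool+file+line+rule_id`" rule comes from the table.

```python
import hashlib


def make_finding_id(tool: str, file: str, line: int, rule_id: str) -> str:
    """Return the first 16 hex characters of a SHA-256 digest over the key fields.

    Assumption: the fields are joined with ":"; the real separator may differ.
    """
    key = f"{tool}:{file}:{line}:{rule_id}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


# The same inputs always give the same id; changing the tool changes the id.
same_a = make_finding_id("bandit", "app/db.py", 42, "B608")
same_b = make_finding_id("bandit", "app/db.py", 42, "B608")
other_tool = make_finding_id("semgrep", "app/db.py", 42, "B608")
```

Because `tool` is part of the hashed key, two tools reporting the same location and rule still produce distinct ids.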

### Severity Levels

| Badge | Level | Color | Typical Issues |
|-------|-------|-------|----------------|
| 🔴 | CRITICAL | Bold red | Hardcoded secrets, RCE, auth bypass |
| 🟠 | HIGH | Orange | SQL injection, command injection, insecure deserialization |
| 🟡 | MEDIUM | Yellow | XSS, weak crypto, path traversal |
| 🔵 | LOW | Blue | Insecure defaults, minor misconfigurations |
| ⚪ | INFO | Grey | Style issues, informational notes |

Source: [apps/cli/README.md]()

## Core Components

### Scanner Package

The scanner package (`packages/scanner/`) contains wrappers for each security tool. Each detector implements a common interface and produces `RawFinding` objects that are later normalized.

#### Secrets Detector

The secrets detector performs two types of secret scanning:

1. **TruffleHog Integration**: Scans for verified and potential secrets using known detector patterns
2. **Entropy-based Fallback**: Uses Shannon entropy thresholding to detect high-entropy strings in credential assignments

```python
# Detection logic in secrets.py
if entropy >= _ENTROPY_THRESHOLD:
    findings.append(
        RawFinding(
            tool="secrets",
            rule_id="high-entropy-secret",
            severity="CRITICAL",
            message=f"High-entropy string detected (Shannon entropy={entropy:.2f})",
            metadata={"entropy": round(entropy, 3)},
        )
    )
```

Source: [packages/scanner/scanner/detectors/secrets.py:1-60]()
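
For readers unfamiliar with the entropy check, the sketch below shows one way to compute Shannon entropy for a candidate string. The function names are hypothetical, and the default threshold of 4.5 is the value listed on the System Architecture page; the detector's actual tokenization and filtering are not shown here.

```python
import math
from collections import Counter

ENTROPY_THRESHOLD = 4.5  # value taken from the architecture page; treat as illustrative


def shannon_entropy(value: str) -> float:
    """Shannon entropy in bits per character; 0.0 for an empty string."""
    if not value:
        return 0.0
    total = len(value)
    counts = Counter(value)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_like_secret(value: str, threshold: float = ENTROPY_THRESHOLD) -> bool:
    """Hypothetical helper: flag strings whose entropy meets the threshold."""
    return shannon_entropy(value) >= threshold
```

A short dictionary word such as `password` scores well below the threshold, while a long string of mostly distinct characters scores above it.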

#### pip-audit and Safety Detectors

Both dependency vulnerability detectors extract CVSS v3 scores and map them to Velonus severity levels:

```python
# Severity mapping from CVSS score
if score >= _CVSS_CRITICAL:
    return "CRITICAL"
if score >= _CVSS_HIGH:
    return "HIGH"
if score >= _CVSS_MEDIUM:
    return "MEDIUM"
return "LOW"
```

Sources: [packages/scanner/scanner/detectors/pip_audit.py:1-30](), [packages/scanner/scanner/detectors/safety.py:1-50]()

### Normalizer Package

The normalizer package (`packages/normalizer/`) transforms raw findings from various tools into the canonical `NormalizedFinding` format and handles deduplication.

#### Deduplication Strategy

Deduplication uses the deterministic `id` field as the key:

- Findings are processed in pipeline order: `secrets → bandit → semgrep → pip-audit → safety`
- The **first occurrence** of each `id` is kept (highest-priority tool wins)
- Subsequent duplicates are discarded, and each discard is logged at DEBUG level

**Important**: Cross-tool duplicates (e.g., bandit and semgrep flagging the same `eval()` call) are intentionally **not** deduplicated because they have different `id` values (the `id` includes `tool`). This allows the AI layer to analyze each finding independently. Source: [packages/normalizer/deduplicator.py:1-35]()
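
The first-occurrence rule can be sketched in a few lines. The `Finding` class below is a hypothetical stand-in for `NormalizedFinding` that carries only the fields needed for the illustration, and `deduplicate` is an assumed name rather than the project's function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical stand-in for NormalizedFinding with only the fields used here."""

    id: str
    tool: str


def deduplicate(findings: list[Finding]) -> list[Finding]:
    """Keep the first finding for each id, preserving pipeline order."""
    seen: set[str] = set()
    unique: list[Finding] = []
    for finding in findings:
        if finding.id in seen:
            continue  # later duplicate; a real implementation would log this at DEBUG
        seen.add(finding.id)
        unique.append(finding)
    return unique


findings = [
    Finding("a1", "secrets"),
    Finding("b2", "bandit"),
    Finding("a1", "secrets"),  # same id as the first entry, so it is dropped
    Finding("c3", "semgrep"),
]
result = deduplicate(findings)
```

Since ids embed the tool name, only repeated reports from the same tool at the same location collapse; findings from different tools pass through untouched.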

### CLI Package (Shield)

The CLI package (`apps/cli/shield/`) provides the user interface, argument parsing, and output formatting.

#### Output Formatters

| Format | Use Case | File |
|--------|----------|------|
| `terminal` (default) | Interactive use with colored Rich tables | Rich formatter |
| `json` | Piping to other tools, storing results | JSON formatter |
| `sarif` | GitHub Code Scanning, VS Code SARIF Viewer | SARIF formatter |

The SARIF formatter includes helper functions for URI conversion and rule naming:

```python
def _rule_id_to_name(rule_id: str) -> str:
    """Convert rule_id to PascalCase display name for SARIF."""
    base = rule_id.split("/")[-1]
    return "".join(word.capitalize() for word in base.replace("-", "_").split("_"))
```

Source: [apps/cli/shield/formatters/sarif.py:1-40]()

## CLI Usage

### Basic Commands

```bash
# Scan current directory
velonus scan ./

# Scan with severity filter
velonus scan ./ --severity high

# Output as JSON
velonus scan ./ --format json

# Export SARIF for GitHub Security tab
velonus scan ./ --format sarif -o results/velonus.sarif

# Show verbose output
velonus scan ./ --verbose
```

### Command Options

| Option | Default | Description |
|--------|---------|-------------|
| `PATH` | `.` | Path to scan |
| `--format`, `-f` | `terminal` | Output format: `terminal`, `json`, `sarif` |
| `--severity`, `-s` | `info` | Minimum severity: `critical`, `high`, `medium`, `low`, `info` |
| `--verbose`, `-v` | off | Show resolved path and extra detail |
| `-o` | stdout | Output file path |

### Exit Codes

| Code | Meaning |
|------|---------|
| `0` | Scan completed, no HIGH or CRITICAL findings |
| `1` | Scan completed, one or more HIGH or CRITICAL findings found |

Exit code `1` on HIGH/CRITICAL is intentional and enables CI gate functionality. Source: [apps/cli/README.md]()
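
The gate logic in the table amounts to a single check over finding severities. The helper below is a hypothetical sketch of that rule; whether the real CLI evaluates it before or after the `--severity` filter is not stated here, so that detail is left out.

```python
def exit_code_for(severities: list[str]) -> int:
    """Return 1 if any severity is HIGH or CRITICAL, otherwise 0."""
    blocking = {"HIGH", "CRITICAL"}
    return 1 if any(s.upper() in blocking for s in severities) else 0
```

A CI job only needs to propagate this code, for example by passing it to `sys.exit`.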

## CI/CD Integration

### GitHub Actions

```yaml
name: Velonus Security Scan

on: [push, pull_request]

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Install velonus-cli
        run: pip install -e apps/cli
      - name: Run security scan
        run: velonus scan ./ --severity high
```

Source: [README.md]()

### Pre-commit Hook

```yaml
# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: velonus-scan
        name: Velonus Security Scan
        entry: velonus scan
        args: ["./", "--severity", "high"]
        language: system
        pass_filenames: false
```

## Development Setup

### Requirements

- Python 3.10+ (Source: [CONTRIBUTING.md]())
- [uv](https://docs.astral.sh/uv/) package manager

### Setup Commands

```bash
# Clone and enter directory
git clone https://github.com/AliAmmar15/Velonus
cd Velonus

# Install all workspace packages and dev dependencies
uv sync --all-extras --dev

# Activate virtual environment
source .venv/bin/activate

# Install packages in editable mode
pip install -e apps/cli
pip install -e packages/scanner
pip install -e packages/normalizer

# Verify installation
velonus --help
```

### Code Quality Standards

All code must pass the following checks before PR submission:

| Tool | Purpose | Command |
|------|---------|---------|
| **ruff** | Linting and formatting | `ruff check . && ruff format .` |
| **mypy** | Type checking (strict mode) | `mypy apps/cli/shield --strict --ignore-missing-imports` |
| **pytest** | Unit tests | `pytest apps/cli/tests/` |

Source: [CONTRIBUTING.md]()

### PR Guidelines

1. **One feature or fix per PR** — do not bundle unrelated changes
2. **Tests are required** — every new scanner wrapper, formatter, or utility needs matching unit tests
3. **Keep it small** — PRs under 400 lines of diff get reviewed faster
4. **No AI-generated placeholder code** — every function must be functional and tested
5. **Target `main`** — all PRs merge into main; no long-lived feature branches

## Roadmap

| Phase | Status | Features |
|-------|--------|----------|
| Phase 0 — Foundation | ✅ Complete | CLI skeleton, Rich output, `NormalizedFinding` model |
| Phase 1 — Scanner Pipeline | ✅ Complete | Real secret detection, Bandit, Semgrep, pip-audit, SARIF |
| Phase 2 — AI Layer | 🔨 Building | AI prioritization, exploitability scoring, fix generation |
| Phase 3 — GitHub Integration | 🔜 Planned | PR inline review comments, one-click fix suggestions |
| Phase 4 — Web Dashboard | 🔜 Planned | Web UI, scan history, finding trends |

Source: [README.md]()

## Alpha Status

Velonus is currently in **alpha**. The tool is functional and actively used internally, but users should expect rough edges. The development team encourages feedback through GitHub issues. Source: [README.md]()

---

<a id='quick-start'></a>

## Quick Start Guide

### Related Pages

Related topics: [Introduction to Velonus](#introduction), [CLI Components](#cli-components)

<details>
<summary>Related source files</summary>

The following source files were used to generate this page:

- [apps/cli/shield/main.py](https://github.com/AliAmmar15/Velonus/blob/main/apps/cli/shield/main.py)
- [apps/cli/shield/commands/scan.py](https://github.com/AliAmmar15/Velonus/blob/main/apps/cli/shield/commands/scan.py)
- [README.md](https://github.com/AliAmmar15/Velonus/blob/main/README.md)
- [apps/cli/README.md](https://github.com/AliAmmar15/Velonus/blob/main/apps/cli/README.md)
- [packages/scanner/scanner/detectors/pip_audit.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/scanner/scanner/detectors/pip_audit.py)
- [packages/scanner/scanner/detectors/secrets.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/scanner/scanner/detectors/secrets.py)
</details>


## Overview

The **Quick Start Guide** provides developers and security teams with the essential steps to install, configure, and run Velonus for security scanning of Python projects. Velonus is a multi-tool security scanner that aggregates findings from Bandit, Semgrep, pip-audit, Safety, and TruffleHog into a unified output with severity-based filtering.

This guide covers:
- Installation from source
- Basic scan execution
- Output format configuration
- Severity filtering
- CI/CD integration patterns

Source: [README.md:1-10]()

## Prerequisites

### System Requirements

| Requirement | Version/Notes |
|-------------|---------------|
| Python | 3.12+ |
| pip | Latest recommended |
| OS | Linux, macOS, Windows (PowerShell) |

### Required Tools for Full Functionality

| Tool | Purpose | Install Command |
|------|---------|-----------------|
| semgrep | Static analysis for Python security rules | `pip install semgrep` |
| Bandit | Python security linter | `pip install bandit` |
| pip-audit | Dependency vulnerability scanner | `pip install pip-audit` |
| Safety | Additional dependency checking | `pip install safety` |
| TruffleHog | Secret detection | `pip install trufflehog` |

Velonus handles missing tools gracefully: a scanner whose underlying tool is not installed is skipped, and a warning is logged instead of the scan failing.

Source: [packages/scanner/scanner/detectors/semgrep.py:1-20]()

## Installation

### From Source

Clone the repository and install the CLI package:

```bash
git clone https://github.com/AliAmmar15/Velonus.git
cd Velonus
pip install -e apps/cli
```

### Verify Installation

```bash
velonus --version
```

Source: [apps/cli/README.md:1-30]()

## Basic Usage

### Running Your First Scan

Scan the current directory:

```bash
velonus scan ./
```

Scan a specific project path:

```bash
velonus scan ./my-python-project
```

### Workflow Overview

```mermaid
graph TD
    A[User runs velonus scan] --> B[Resolve target path]
    B --> C[Run secret detection<br/>TruffleHog + entropy scan]
    C --> D[Run Bandit static analysis]
    D --> E[Run Semgrep security rules]
    E --> F[Run pip-audit dependency scan]
    F --> G[Run Safety dependency check]
    G --> H[Normalize all findings]
    H --> I[Filter by severity]
    I --> J[Format output]
    J --> K[Print to terminal<br/>or write to file]
    K --> L[Exit with status code]
```

Source: [apps/cli/shield/commands/scan.py:1-50]()

## Scan Command Options

### Syntax

```
velonus scan [PATH] [OPTIONS]
```

### Options Reference

| Option | Short | Default | Description |
|--------|-------|---------|-------------|
| `PATH` | — | `.` | Path to project or file to scan |
| `--format` | `-f` | `terminal` | Output format: `terminal`, `json`, `sarif` |
| `--severity` | `-s` | `info` | Minimum severity: `critical`, `high`, `medium`, `low`, `info` |
| `--verbose` | `-v` | off | Show resolved target path and extra detail |
| `--help` | — | — | Show help and exit |

Source: [apps/cli/README.md:40-60]()

### Severity Filtering Examples

```bash
# Show all findings (info and above)
velonus scan ./

# Only HIGH and CRITICAL findings
velonus scan ./ --severity high

# Only CRITICAL findings
velonus scan ./ --severity critical
```
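
Conceptually, `--severity` sets a floor on an ordered scale. The sketch below illustrates that idea on plain dictionaries; `SEVERITY_ORDER` and `filter_by_severity` are hypothetical names, and the real CLI operates on its own finding model.

```python
# Lowest to highest, matching the values accepted by --severity.
SEVERITY_ORDER = ["info", "low", "medium", "high", "critical"]


def filter_by_severity(findings: list[dict], minimum: str = "info") -> list[dict]:
    """Keep findings whose severity is at or above the requested minimum."""
    threshold = SEVERITY_ORDER.index(minimum.lower())
    return [
        f for f in findings
        if SEVERITY_ORDER.index(f["severity"].lower()) >= threshold
    ]


sample = [
    {"rule_id": "B105", "severity": "LOW"},
    {"rule_id": "B608", "severity": "HIGH"},
    {"rule_id": "high-entropy-secret", "severity": "CRITICAL"},
]
high_and_up = filter_by_severity(sample, "high")
```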

### Output Format Examples

```bash
# Default: rich terminal table with colored output
velonus scan ./

# JSON output for piping
velonus scan ./ --format json

# SARIF output for GitHub Security tab
velonus scan ./ --format sarif

# Write SARIF to custom path
velonus scan ./ --format sarif -o results/velonus.sarif
```

Source: [apps/cli/README.md:70-100]()

## Output Formats

### Terminal (Default)

Rich table with severity badges, file paths, line numbers, rule IDs, and messages. Optimized for interactive use.

| Badge | Severity | Color | Typical Issues |
|-------|----------|-------|----------------|
| 🔴 | CRITICAL | Bold red | Hardcoded secrets, RCE, auth bypass |
| 🟠 | HIGH | Orange | SQL injection, command injection |
| 🟡 | MEDIUM | Yellow | XSS, weak crypto, path traversal |
| 🔵 | LOW | Blue | Insecure defaults, minor issues |
| ⚪ | INFO | Grey | Style issues, informational notes |

### JSON

Machine-readable format suitable for piping into other tools:

```bash
velonus scan ./ --format json | python -m json.tool
velonus scan ./ --format json > scan-results.json
```

### SARIF

Static Analysis Results Interchange Format for GitHub Code Scanning and VS Code SARIF Viewer integration:

```bash
velonus scan ./ --format sarif
velonus scan ./ --format sarif -o results/scan.sarif
```

Source: [README.md:30-60]()

## Exit Codes

| Code | Meaning | Use Case |
|------|---------|----------|
| `0` | Scan completed, no HIGH or CRITICAL findings | CI gate passes |
| `1` | Scan completed, HIGH or CRITICAL findings found | CI gate blocks merge |

The exit code behavior is intentional for CI/CD gate integration:

```bash
# CI will fail if HIGH or CRITICAL findings exist
velonus scan ./ --severity high
```

Source: [apps/cli/README.md:100-115]()

## CI/CD Integration

### GitHub Actions

```yaml
name: Velonus Security Scan

on: [push, pull_request]

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install velonus-cli
        run: pip install -e apps/cli

      - name: Run security scan
        run: velonus scan ./ --severity high
        # exits 1 if HIGH or CRITICAL findings found
```

### Pre-commit Hook

Add to `.pre-commit-config.yaml`:

```yaml
repos:
  - repo: local
    hooks:
      - id: velonus-scan
        name: Velonus Security Scan
        entry: velonus scan
        args: ["./", "--severity", "high"]
        language: system
        pass_filenames: false
```

Source: [README.md:60-95]()

## Common Workflows

### Quick Security Audit

```bash
velonus scan ./ --severity high
```

### Comprehensive Scan with Verbose Output

```bash
velonus scan ./ --verbose --format json > full-scan.json
```

### Generate SARIF for GitHub Security Tab

```bash
velonus scan ./ --format sarif -o velonus.sarif
```

### Export Findings as JSON

```bash
velonus scan ./ --format json --severity high > findings.json
```

## Project Status

Velonus follows a phased development approach:

| Phase | Status | Features |
|-------|--------|----------|
| Phase 0 | ✅ Done | CLI skeleton, Rich output, NormalizedFinding model |
| Phase 1 | ✅ Done | Real secret detection, Bandit, Semgrep, pip-audit, SARIF |
| Phase 2 | 🔨 Building | AI context engine (exploitability scoring + fix generation) |
| Phase 3 | 🔜 Planned | GitHub PR integration (inline fixes, one-click accept) |
| Phase 4 | 🔜 Planned | Web dashboard |

Source: [README.md:100-120]()

## Next Steps

- **Configuration**: Explore `velonus config` for API URL settings (Phase 2)
- **Authentication**: Set up API authentication with `velonus auth login` (Phase 2)
- **Contributing**: See [CONTRIBUTING.md](CONTRIBUTING.md) for development setup and PR guidelines
- **Issues**: Report problems at [github.com/AliAmmar15/Velonus/issues](https://github.com/AliAmmar15/Velonus/issues)

Velonus is in alpha. The core scanning functionality works reliably—use it in CI today with the exit code gate.

Source: [README.md:120-130]()

---

<a id='architecture-overview'></a>

## System Architecture

### Related Pages

Related topics: [Introduction to Velonus](#introduction), [CLI Components](#cli-components), [API Backend](#api-backend), [Scanner Pipeline](#scanner-pipeline)

<details>
<summary>Related source files</summary>

The following source files were used to generate this page:

- [CONTRIBUTING.md](https://github.com/AliAmmar15/Velonus/blob/main/CONTRIBUTING.md)
- [apps/cli/README.md](https://github.com/AliAmmar15/Velonus/blob/main/apps/cli/README.md)
- [README.md](https://github.com/AliAmmar15/Velonus/blob/main/README.md)
- [packages/scanner/scanner/detectors/pip_audit.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/scanner/scanner/detectors/pip_audit.py)
- [packages/scanner/scanner/detectors/secrets.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/scanner/scanner/detectors/secrets.py)
- [packages/scanner/scanner/detectors/safety.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/scanner/scanner/detectors/safety.py)
- [packages/scanner/scanner/detectors/semgrep.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/scanner/scanner/detectors/semgrep.py)
- [apps/cli/shield/formatters/sarif.py](https://github.com/AliAmmar15/Velonus/blob/main/apps/cli/shield/formatters/sarif.py)
</details>


## Overview

Velonus is a multi-tool security scanning platform designed to identify vulnerabilities in Python projects through a modular scanner pipeline. The system integrates multiple security tools (TruffleHog, Semgrep, Bandit, pip-audit, Safety) into a unified CLI experience with flexible output formats for both interactive use and CI/CD integration.

Source: [README.md](https://github.com/AliAmmar15/Velonus/blob/main/README.md)

## High-Level Architecture

Velonus follows a layered architecture with clear separation between the CLI interface, scanner core, detectors, and output formatters.

```mermaid
graph TD
    subgraph "CLI Layer"
        CLI[velonus CLI]
        Commands[scan / auth / config]
    end
    
    subgraph "Core Layer"
        Scanner[Scanner Pipeline]
        Normalizer[Normalizer]
        Formatters[Formatters]
    end
    
    subgraph "Detector Layer"
        Secrets[Secrets Detector]
        Semgrep[Semgrep Detector]
        PipAudit[pip-audit Detector]
        Safety[Safety Detector]
    end
    
    subgraph "External Tools"
        TH[TruffleHog]
        SG[Semgrep Binary]
        PA[pip-audit]
        SF[safety]
    end
    
    CLI --> Commands
    Commands --> Scanner
    Scanner --> Normalizer
    Scanner --> Secrets
    Scanner --> Semgrep
    Scanner --> PipAudit
    Scanner --> Safety
    Secrets --> TH
    Semgrep --> SG
    PipAudit --> PA
    Safety --> SF
    Normalizer --> Formatters
    Formatters --> Output[terminal / json / sarif]
```

## Project Structure

The repository is organized as a monorepo with separate packages and applications.

| Component | Path | Purpose |
|-----------|------|---------|
| CLI Application | `apps/cli/` | Command-line interface entry point |
| API Application | `apps/api/` | Backend API (Phase 2) |
| Scanner Package | `packages/scanner/` | Core scanner pipeline and detectors |
| Normalizer Package | `packages/normalizer/` | Finding normalization |

Source: [CONTRIBUTING.md:6-8](https://github.com/AliAmmar15/Velonus/blob/main/CONTRIBUTING.md)

## Scanner Pipeline

### Pipeline Architecture

The scanner pipeline orchestrates multiple security tools and aggregates their findings into a unified format.

```mermaid
graph LR
    Input[Target Path] --> Resolve[Path Resolution]
    Resolve --> SD[Secrets Detection]
    Resolve --> SEM[Semgrep Scan]
    Resolve --> PA[pip-audit]
    Resolve --> SF[Safety Scan]
    SD --> RF1[RawFinding]
    SEM --> RF2[RawFinding]
    PA --> RF3[RawFinding]
    SF --> RF4[RawFinding]
    RF1 --> Aggregate[Aggregate Findings]
    RF2 --> Aggregate
    RF3 --> Aggregate
    RF4 --> Aggregate
    Aggregate --> Normalize[Normalize to NormalizedFinding]
    Normalize --> Format[Format Output]
```

### Core Components

#### RawFinding Data Model

All detectors produce `RawFinding` objects as the initial representation of a security finding:

```python
RawFinding(
    tool="pip-audit",           # Source tool identifier
    rule_id="CVE-2023-12345",   # Vulnerability identifier
    file="requirements.txt",    # Affected file path
    line=0,                      # Line number (0 for dep issues)
    severity="HIGH",             # CRITICAL/HIGH/MEDIUM/LOW/INFO
    message="...",               # Human-readable message
    code_snippet="...",          # Relevant code context
    metadata={                   # Tool-specific metadata
        "package_name": "requests",
        "package_version": "2.28.0",
        "cvss_score": 7.5,
        "fix_available": True
    }
)
```

Source: [packages/scanner/scanner/detectors/pip_audit.py:49-61](https://github.com/AliAmmar15/Velonus/blob/main/packages/scanner/scanner/detectors/pip_audit.py)

#### NormalizedFinding

The normalizer transforms `RawFinding` objects into a standardized `NormalizedFinding` format for consistent output across all tools.

### Detector Interface

All detectors implement a common interface:

| Method | Purpose |
|--------|---------|
| `scan(target: Path) -> list[RawFinding]` | Execute scan and return findings |
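
One way to express this interface in Python is a `typing.Protocol`. The sketch below is an assumption about shape only: the `RawFinding` fields mirror the example shown earlier on this page, while `Detector`, `FixedDetector`, and `run_all` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Protocol


@dataclass
class RawFinding:
    """Fields mirror the RawFinding example above; defaults are assumptions."""

    tool: str
    rule_id: str
    file: str
    line: int
    severity: str
    message: str
    code_snippet: str = ""
    metadata: dict[str, Any] = field(default_factory=dict)


class Detector(Protocol):
    """Anything with a matching scan() method satisfies the interface."""

    def scan(self, target: Path) -> list[RawFinding]: ...


class FixedDetector:
    """Hypothetical detector returning one canned finding; stands in for a tool wrapper."""

    def __init__(self, tool: str) -> None:
        self.tool = tool

    def scan(self, target: Path) -> list[RawFinding]:
        return [
            RawFinding(
                tool=self.tool,
                rule_id="demo-rule",
                file=str(target / "app.py"),
                line=1,
                severity="LOW",
                message="demo finding",
            )
        ]


def run_all(detectors: list[Detector], target: Path) -> list[RawFinding]:
    """Run detectors in order and concatenate their findings."""
    collected: list[RawFinding] = []
    for detector in detectors:
        collected.extend(detector.scan(target))
    return collected


all_findings = run_all([FixedDetector("bandit"), FixedDetector("semgrep")], Path("."))
```

Structural typing keeps each wrapper independent: a new detector only has to provide `scan`, with no shared base class required.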

## Security Detectors

### Secrets Detector

The secrets detector identifies hardcoded credentials and sensitive information using two complementary approaches:

#### TruffleHog Integration

Primary detection method that leverages TruffleHog's extensive secret detection rules.

```python
RawFinding(
    tool="secrets",
    rule_id=f"trufflehog-{detector_name.lower().replace(' ', '-')}",
    file=file_path,
    line=line_num,
    severity="CRITICAL",
    message=f"{'Verified' if verified else 'Potential'} secret detected [{detector_name}]",
    metadata={
        "detector": detector_name,
        "verified": verified,
        "decoder": decoder_name
    }
)
```

Source: [packages/scanner/scanner/detectors/secrets.py:45-57](https://github.com/AliAmmar15/Velonus/blob/main/packages/scanner/scanner/detectors/secrets.py)

#### Entropy-Based Fallback

When TruffleHog is unavailable, a Shannon entropy-based scanner detects high-entropy strings in credential assignments.

| Parameter | Value | Description |
|-----------|-------|-------------|
| Entropy Threshold | 4.5 | Minimum Shannon entropy to flag |
| Skipped Dirs | `.git`, `node_modules`, `__pycache__`, `.venv`, `venv`, `.env`, `dist`, `build` | Directories excluded from scanning |

```python
RawFinding(
    tool="secrets",
    rule_id="high-entropy-secret",
    severity="CRITICAL",
    message=f"High-entropy string in secret assignment (Shannon entropy={entropy:.2f})",
    metadata={"entropy": round(entropy, 3)}
)
```

Source: [packages/scanner/scanner/detectors/secrets.py:78-91](https://github.com/AliAmmar15/Velonus/blob/main/packages/scanner/scanner/detectors/secrets.py)
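
The skipped-directory list in the table can be applied while walking the tree. The following is a rough sketch using `os.walk`; `iter_scannable_files` is a hypothetical helper, and the real detector may select files differently (for example by extension).

```python
import os
from pathlib import Path
from typing import Iterator

# Directory names copied from the table above.
SKIPPED_DIRS = {".git", "node_modules", "__pycache__", ".venv", "venv", ".env", "dist", "build"}


def iter_scannable_files(root: Path) -> Iterator[Path]:
    """Yield every file under root, never descending into skipped directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not enter the skipped directories.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIPPED_DIRS)
        for name in sorted(filenames):
            yield Path(dirpath) / name
```

Pruning `dirnames` in place avoids reading large vendored trees such as `node_modules` at all, rather than filtering their files afterwards.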

### Semgrep Detector

Runs Semgrep with the `p/python` ruleset for Python-specific security analysis.

```mermaid
graph TD
    Check[Check semgrep availability] -->|found| Run[Run semgrep scan]
    Check -->|not found| Skip[Return empty list + warning]
    Run --> Parse[Parse JSON output]
    Parse --> Extract[Extract CWE/OWASP metadata]
    Extract --> Create[Create RawFinding objects]
```

**Configuration:**
- Ruleset: `p/python`
- Flags: `--json --quiet --metrics=off`
- Exit code 1: Findings present (not an error)
- Exit code 2+: Real error

Source: [packages/scanner/scanner/detectors/semgrep.py:35-55](https://github.com/AliAmmar15/Velonus/blob/main/packages/scanner/scanner/detectors/semgrep.py)
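
The availability check and exit-code handling described above could look roughly like this. It is a sketch under stated assumptions: `run_semgrep` and its `binary` parameter are hypothetical, the command line simply combines the listed ruleset and flags, and the exit-code interpretation follows the configuration list rather than Semgrep's own documentation.

```python
import json
import logging
import shutil
import subprocess
from pathlib import Path

logger = logging.getLogger("semgrep_sketch")


def run_semgrep(target: Path, binary: str = "semgrep") -> list[dict]:
    """Return Semgrep's raw `results` entries, or [] when the binary is missing."""
    if shutil.which(binary) is None:
        logger.warning("%s not found on PATH, skipping scan", binary)
        return []
    proc = subprocess.run(
        [binary, "--config", "p/python", "--json", "--quiet", "--metrics=off", str(target)],
        capture_output=True,
        text=True,
        check=False,  # exit code 1 is not treated as a failure
    )
    # Per the list above: 0 = clean, 1 = findings present, 2 or more = real error.
    if proc.returncode >= 2:
        raise RuntimeError(
            f"{binary} failed with exit code {proc.returncode}: {proc.stderr.strip()}"
        )
    return json.loads(proc.stdout or "{}").get("results", [])
```

Returning an empty list for a missing binary lets the rest of the pipeline continue with the other detectors.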

**Metadata Extraction:**

| Metadata Field | Extraction Method |
|---------------|-------------------|
| CWE | Regex extraction from `extra.metadata.cwe` |
| OWASP | List deduplication from `extra.metadata.owasp` |
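
A minimal sketch of both extraction steps follows. It assumes the Semgrep metadata values may be either a single string or a list of strings, and the helper names are hypothetical.

```python
import re

_CWE_PATTERN = re.compile(r"CWE-\d+")


def _as_list(value: object) -> list[str]:
    """Normalize a missing value, a single string, or an iterable into a list of strings."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return [str(item) for item in value]


def extract_cwe(metadata: dict) -> list[str]:
    """Pull bare CWE identifiers out of entries like 'CWE-89: SQL Injection'."""
    found: list[str] = []
    for entry in _as_list(metadata.get("cwe")):
        for match in _CWE_PATTERN.findall(entry):
            if match not in found:
                found.append(match)
    return found


def extract_owasp(metadata: dict) -> list[str]:
    """Deduplicate OWASP categories while keeping their original order."""
    return list(dict.fromkeys(_as_list(metadata.get("owasp"))))
```

`dict.fromkeys` is used for deduplication because it preserves insertion order, unlike converting to a `set`.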

### pip-audit Detector

Scans Python dependencies for known vulnerabilities using `pip-audit`.

**Finding Structure:**
```python
RawFinding(
    tool="pip-audit",
    rule_id=vuln_id,
    file=attribution_path,
    line=0,  # Dependency vulnerabilities are file-level
    severity=severity,
    metadata={
        "package_name": package_name,
        "package_version": package_version,
        "aliases": aliases,
        "fix_versions": fix_versions,
        "cvss_score": cvss_score,
        "cwe": _CWE,
        "owasp": _OWASP,
        "fix_available": bool(fix_versions)
    }
)
```

Source: [packages/scanner/scanner/detectors/pip_audit.py:49-61](https://github.com/AliAmmar15/Velonus/blob/main/packages/scanner/scanner/detectors/pip_audit.py)

**CVSS to Severity Mapping:**

| CVSS Score Range | Severity |
|-----------------|----------|
| 9.0 - 10.0 | CRITICAL |
| 7.0 - 8.9 | HIGH |
| 4.0 - 6.9 | MEDIUM |
| 0.1 - 3.9 | LOW |
| 0.0 | INFO |
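
The table translates directly into a threshold chain. The function below is a sketch that follows the table row by row, including the `0.0` to INFO case; the name `cvss_to_severity` is hypothetical and the detector's own constants may be organized differently.

```python
def cvss_to_severity(score: float) -> str:
    """Map a CVSS base score to a severity label using the ranges in the table."""
    if score >= 9.0:
        return "CRITICAL"
    if score >= 7.0:
        return "HIGH"
    if score >= 4.0:
        return "MEDIUM"
    if score > 0.0:
        return "LOW"
    return "INFO"
```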

### Safety Detector

Handles both Safety v1 and v2 output formats for Python vulnerability scanning.

**Supported Formats:**

| Format | Source | Structure |
|--------|--------|-----------|
| Format A (v2) | `safety >= 2.0` | Dict with `vulnerabilities` list |
| Format B (v1) | `safety < 2.0` | List of 5-element lists |

**Vulnerability Entry Parsing:**

```python
# Required fields for v2
vuln_id: str = str(entry["vulnerability_id"])
package_name: str = str(entry["package_name"])
installed_version: str = str(entry["analyzed_version"])

# Optional fields
advisory: str = str(entry.get("advisory", ""))
cve: str = str(entry.get("CVE", ""))
fix_versions: list[str] = [str(v) for v in entry.get("fixed_versions", [])]
```

Source: [packages/scanner/scanner/detectors/safety.py:50-65](https://github.com/AliAmmar15/Velonus/blob/main/packages/scanner/scanner/detectors/safety.py)
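
Handling both formats usually comes down to branching on the top-level JSON type. The sketch below is illustrative: `parse_safety_output` is a hypothetical name, and the element order assumed for the v1 rows (package, affected spec, installed version, advisory, id) is an assumption that should be checked against real Safety output.

```python
from typing import Any


def parse_safety_output(data: Any) -> list[dict[str, str]]:
    """Normalize Safety JSON (dict-style v2 or list-style v1) into flat entries."""
    entries: list[dict[str, str]] = []
    if isinstance(data, dict):  # Format A (v2): dict with a "vulnerabilities" list
        for entry in data.get("vulnerabilities", []):
            entries.append({
                "vuln_id": str(entry["vulnerability_id"]),
                "package_name": str(entry["package_name"]),
                "installed_version": str(entry["analyzed_version"]),
                "advisory": str(entry.get("advisory", "")),
            })
    elif isinstance(data, list):  # Format B (v1): list of 5-element lists
        for row in data:
            if len(row) < 5:
                continue  # skip malformed rows
            # Assumed order: package, affected spec, installed version, advisory, id
            name, _spec, version, advisory, vuln_id = row[:5]
            entries.append({
                "vuln_id": str(vuln_id),
                "package_name": str(name),
                "installed_version": str(version),
                "advisory": str(advisory),
            })
    return entries
```

Both branches produce the same keys, so downstream code does not need to know which Safety version generated the report.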

## CLI Architecture

### Command Structure

```
velonus [OPTIONS] COMMAND [ARGS]...
```

| Command | Description | Phase |
|---------|-------------|-------|
| `scan` | Run security scanner pipeline | Phase 0 |
| `auth` | Manage API authentication | Phase 2 |
| `config` | Manage local configuration | Phase 2 |

Source: [apps/cli/README.md](https://github.com/AliAmmar15/Velonus/blob/main/apps/cli/README.md)

### Scan Command Interface

```bash
velonus scan [PATH] [OPTIONS]
```

| Argument/Option | Default | Description |
|-----------------|---------|-------------|
| `PATH` | `.` | Target project path |
| `--format`, `-f` | `terminal` | Output format |
| `--severity`, `-s` | `info` | Minimum severity filter |
| `--verbose`, `-v` | off | Show detailed output |

**Exit Codes:**

| Code | Meaning |
|------|---------|
| 0 | Scan completed, no HIGH/CRITICAL findings |
| 1 | Scan completed, HIGH or CRITICAL findings found |

## Output Formatters

### Terminal Formatter

Rich-formatted table with colored severity badges:

```
┏━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Severity      ┃ Tool     ┃ Rule             ┃ Message    ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ 🔴 CRITICAL   │ secrets  │ aws-access-key   │ Hardcoded  │
│ 🟠 HIGH       │ bandit   │ B413             │ Blacklist  │
```

### JSON Formatter

Structured JSON output for piping and tooling:

```json
{
  "findings": [
    {
      "severity": "CRITICAL",
      "tool": "secrets",
      "rule_id": "trufflehog-aws-access-key",
      "file": "config.py",
      "line": 42,
      "message": "Verified secret detected [AWS Access Key]"
    }
  ]
}
```
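
As an example of downstream tooling, the sketch below tallies findings by severity from a JSON report. It assumes only the `findings` and `severity` keys shown above; `count_by_severity` and `SAMPLE_OUTPUT` are illustrative names, and the sample simply mirrors the document's example.

```python
import json
from collections import Counter

# Sample report mirroring the structure shown above.
SAMPLE_OUTPUT = """
{
  "findings": [
    {"severity": "CRITICAL", "tool": "secrets",
     "rule_id": "trufflehog-aws-access-key",
     "file": "config.py", "line": 42,
     "message": "Verified secret detected [AWS Access Key]"}
  ]
}
"""


def count_by_severity(raw_json: str) -> Counter:
    """Count findings per severity label in a JSON report string."""
    report = json.loads(raw_json)
    return Counter(f["severity"] for f in report.get("findings", []))


counts = count_by_severity(SAMPLE_OUTPUT)
```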

### SARIF Formatter

Static Analysis Results Interchange Format for GitHub Code Scanning integration.

**Key SARIF Elements:**

| Element | Purpose |
|---------|---------|
| `runs[].results[]` | Individual findings |
| `runs[].tool.driver.rules[]` | Rule definitions |
| `runs[].artifacts` | Scanned files |
| `runs[].logicalLocations` | Code structure |

**Rule ID Transformation:**
```
secrets/aws-access-key-id → AwsAccessKeyId
generic-api-key → GenericApiKey
```

Source: [apps/cli/shield/formatters/sarif.py:40-58](https://github.com/AliAmmar15/Velonus/blob/main/apps/cli/shield/formatters/sarif.py)

## Severity Classification

| Badge | Level | Color | Use Case |
|-------|-------|-------|----------|
| 🔴 | CRITICAL | Bold red | Hardcoded secrets, RCE, auth bypass |
| 🟠 | HIGH | Orange | SQL injection, command injection, insecure deserialization |
| 🟡 | MEDIUM | Yellow | XSS, weak crypto, path traversal |
| 🔵 | LOW | Blue | Insecure defaults, minor misconfigurations |
| ⚪ | INFO | Grey | Style issues, informational notes |

Source: [apps/cli/README.md](https://github.com/AliAmmar15/Velonus/blob/main/apps/cli/README.md)

## Development Standards

### Code Quality Requirements

All code must pass:

| Tool | Purpose | Command |
|------|---------|---------|
| ruff | Linting & formatting | `ruff check . && ruff format --check .` |
| mypy | Type checking (strict) | `mypy <package> --strict --ignore-missing-imports` |

**Strict Requirements:**
- Zero mypy errors in strict mode
- No `type: ignore` without explanation comment
- All functions must be functional and tested
- No AI-generated placeholder code

Source: [CONTRIBUTING.md:32-44](https://github.com/AliAmmar15/Velonus/blob/main/CONTRIBUTING.md)

### PR Guidelines

| Rule | Description |
|------|-------------|
| One feature per PR | No bundling unrelated changes |
| Tests required | Every new scanner wrapper needs unit tests |
| Small diffs | Target under 400 lines for faster review |
| Target main | No long-lived feature branches |

## Roadmap

| Phase | Status | Deliverables |
|-------|--------|--------------|
| Phase 0 | ✅ Done | CLI skeleton, Rich output, NormalizedFinding model |
| Phase 1 | ✅ Done | Real secret detection, Bandit, Semgrep, pip-audit, SARIF |
| Phase 2 | 🔨 Building | AI prioritization, exploitability scoring, fix generation |
| Phase 3 | 🔜 Planned | PR inline review comments, one-click fix suggestions |
| Phase 4 | 🔜 Planned | Web UI, scan history, finding trends |

Source: [README.md](https://github.com/AliAmmar15/Velonus/blob/main/README.md)

## CI/CD Integration

### GitHub Actions Workflow

```yaml
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Install velonus-cli
        run: pip install -e apps/cli
      - name: Run security scan
        run: velonus scan ./ --severity high
```

### Pre-commit Hook

```yaml
repos:
  - repo: local
    hooks:
      - id: velonus-scan
        name: Velonus Security Scan
        entry: velonus scan
        args: ["./", "--severity", "high"]
        language: system
        pass_filenames: false
```

Source: [README.md](https://github.com/AliAmmar15/Velonus/blob/main/README.md)

---

<a id='cli-components'></a>

## CLI Components

### Related Pages

Related topics: [Scanner Pipeline](#scanner-pipeline), [API Backend](#api-backend), [Output Formats](#output-formats)

<details>
<summary>Relevant source files</summary>

The following source files were used to generate this page:

- [apps/cli/shield/main.py](https://github.com/AliAmmar15/Velonus/blob/main/apps/cli/shield/main.py)
- [apps/cli/shield/commands/scan.py](https://github.com/AliAmmar15/Velonus/blob/main/apps/cli/shield/commands/scan.py)
- [apps/cli/shield/commands/auth.py](https://github.com/AliAmmar15/Velonus/blob/main/apps/cli/shield/commands/auth.py)
- [apps/cli/shield/commands/config.py](https://github.com/AliAmmar15/Velonus/blob/main/apps/cli/shield/commands/config.py)
- [apps/cli/shield/commands/pr.py](https://github.com/AliAmmar15/Velonus/blob/main/apps/cli/shield/commands/pr.py)
- [apps/cli/shield/formatters/terminal.py](https://github.com/AliAmmar15/Velonus/blob/main/apps/cli/shield/formatters/terminal.py)
- [apps/cli/shield/formatters/sarif.py](https://github.com/AliAmmar15/Velonus/blob/main/apps/cli/shield/formatters/sarif.py)
- [apps/cli/shield/core/api_client.py](https://github.com/AliAmmar15/Velonus/blob/main/apps/cli/shield/core/api_client.py)
</details>

# CLI Components

## Overview

The Velonus CLI is a Typer-based command-line application that provides a unified interface for running security scans on local codebases. The CLI operates in **local-only mode** without requiring an API connection, making it immediately usable for developers and CI/CD pipelines.

The CLI is structured around command groups that delegate to specialized modules for scanning, authentication, configuration, and PR integration.

## Architecture

```mermaid
graph TD
    A["velonus (main.py)"] --> B["scan.app"]
    A --> C["auth.app"]
    A --> D["config.app"]
    A --> E["pr.app"]
    
    B --> F["scanner pipeline"]
    B --> G["formatters"]
    
    G --> H["terminal.py"]
    G --> I["sarif.py"]
    G --> J["json.py"]
    
    C --> K["api_client.py"]
    D --> L["config storage"]
    
    F --> M["NormalizedFinding"]
    M --> G
```

## Command Groups

### Root Application

The root Typer application is defined in `apps/cli/shield/main.py` and registers all command subgroups:

```python
app = typer.Typer(
    name="velonus",
    help="[bold green]Velonus[/bold green] — AI-native AppSec scanner for developers.",
    rich_markup_mode="rich",
    no_args_is_help=True,
    pretty_exceptions_enable=True,
    pretty_exceptions_show_locals=False,
)
```

Source: [apps/cli/shield/main.py:17-22]()

| Property | Value | Purpose |
|----------|-------|---------|
| `name` | `velonus` | CLI command name |
| `rich_markup_mode` | `rich` | Enable Rich markup for colored output |
| `no_args_is_help` | `True` | Show help when no arguments provided |
| `pretty_exceptions_enable` | `True` | Enhanced error tracebacks |

### Command Registration

| Command | Module | Phase | Description |
|---------|--------|-------|-------------|
| `velonus scan` | `shield.commands.scan` | Phase 0 | Run security scans on local paths |
| `velonus auth` | `shield.commands.auth` | Phase 2 | Authenticate with Velonus API |
| `velonus config` | `shield.commands.config` | Phase 2 | Manage local CLI configuration |
| `velonus pr` | `shield.commands.pr` | Phase 3 | GitHub PR integration utilities |

Source: [apps/cli/shield/main.py:29-32]()

## The `scan` Command

The `scan` command is the primary interface for running security analysis on local codebases.

### Command Interface

```bash
velonus scan [PATH] [OPTIONS]
```

| Argument/Option | Default | Type | Description |
|-----------------|---------|------|-------------|
| `PATH` | `.` | Path | Project or file to scan |
| `--format`, `-f` | `terminal` | Choice | Output format: `terminal`, `json`, `sarif` |
| `--severity`, `-s` | `info` | Choice | Minimum severity: `critical`, `high`, `medium`, `low`, `info` |
| `--verbose`, `-v` | `off` | Flag | Show resolved target path and extra detail |
| `--output`, `-o` | stdout | Path | Write output to file (SARIF/JSON) |

Source: [apps/cli/shield/commands/scan.py](https://github.com/AliAmmar15/Velonus/blob/main/apps/cli/shield/commands/scan.py)

### Severity Levels

| Badge | Level | Color | Use Case |
|-------|-------|-------|----------|
| 🔴 | `CRITICAL` | Bold red | Hardcoded secrets, RCE, auth bypass |
| 🟠 | `HIGH` | Orange | SQL injection, command injection |
| 🟡 | `MEDIUM` | Yellow | XSS, weak crypto, path traversal |
| 🔵 | `LOW` | Blue | Insecure defaults, minor misconfigs |
| ⚪ | `INFO` | Grey | Style issues, informational notes |
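
The `--severity` option acts as a minimum threshold over these levels. A minimal sketch of that comparison, assuming the ordering implied by the table (the names `SEVERITY_ORDER` and `meets_threshold` are hypothetical, not taken from the CLI source):

```python
# Lowest to highest, following the severity table above.
SEVERITY_ORDER = ["info", "low", "medium", "high", "critical"]


def meets_threshold(severity: str, minimum: str = "info") -> bool:
    """Return True if `severity` is at or above the `minimum` level."""
    rank = SEVERITY_ORDER.index(severity.lower())
    return rank >= SEVERITY_ORDER.index(minimum.lower())
```

With the default of `info`, every finding passes, which matches the documented default for `--severity`.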

### Exit Codes

| Code | Meaning |
|------|---------|
| `0` | Scan completed, no HIGH/CRITICAL findings |
| `1` | Scan completed with HIGH or CRITICAL findings (blocks CI) |

## Output Formatters

The CLI supports multiple output formats through a pluggable formatter system.

```mermaid
graph LR
    A["NormalizedFinding"] --> B["Formatter Interface"]
    B --> C["TerminalFormatter"]
    B --> D["SarifFormatter"]
    B --> E["JsonFormatter"]
```

### Terminal Formatter

Renders findings as colored Rich tables with severity badges, file paths, line numbers, rule IDs, and human-readable messages. This is the default format for interactive use.

**Output Structure:**
```
┏━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Severity     ┃ Tool     ┃ File         ┃ Line ┃ Rule          ┃ Message                ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
```

### SARIF Formatter

Generates Static Analysis Results Interchange Format output compatible with GitHub Code Scanning, VS Code SARIF Viewer, and other SAST tooling.

**Key transformations:**
- Converts rule IDs to PascalCase display names via `_rule_id_to_name()`
- Generates `file://` URIs for directory paths with trailing slashes per SARIF spec §3.14.14

```python
def _rule_id_to_name(rule_id: str) -> str:
    base = rule_id.split("/")[-1]
    return "".join(word.capitalize() for word in base.replace("-", "_").split("_"))
```

Source: [apps/cli/shield/formatters/sarif.py:58-68]()

### JSON Formatter

Outputs findings as structured JSON for piping into other tools or storing results:

```bash
velonus scan ./ --format json | python -m json.tool
velonus scan ./ --format json > scan-results.json
```

## Authentication Module

The `auth` command manages authentication with the Velonus API backend.

| Command | Description | Phase |
|---------|-------------|-------|
| `velonus auth login` | Authenticate via Clerk (browser OAuth flow) | Phase 2 |
| `velonus auth logout` | Clear stored credentials | Phase 2 |
| `velonus auth status` | Show current authentication status | Phase 2 |

> These commands are stubbed in Phase 0 and become functional in Phase 2 when the API backend is live.

## Configuration Module

The `config` command manages local CLI settings stored on disk.

| Command | Description |
|---------|-------------|
| `velonus config show` | Print current configuration |
| `velonus config set <key> <value>` | Set a configuration value |

**Example configuration:**
```bash
velonus config set api_url https://api.velonus.dev
```

> Stubbed in Phase 0, fully functional in Phase 2.

## PR Integration Module

The `pr` command provides GitHub PR integration utilities for Phase 3. This module is planned but not yet implemented in Phase 0.

## Core API Client

The `api_client` module in `shield/core/` handles communication with the Velonus backend API. It is used by the auth and config modules when API connectivity is required.

Source: [apps/cli/shield/core/api_client.py](https://github.com/AliAmmar15/Velonus/blob/main/apps/cli/shield/core/api_client.py)

## Module Structure

```
apps/cli/shield/
├── main.py                 # Root Typer application
├── commands/
│   ├── __init__.py
│   ├── scan.py            # Security scan command
│   ├── auth.py            # Authentication commands
│   ├── config.py          # Configuration commands
│   └── pr.py              # PR integration (Phase 3)
├── formatters/
│   ├── __init__.py
│   ├── terminal.py        # Rich table output
│   ├── sarif.py           # SARIF format output
│   └── json.py            # JSON output
└── core/
    ├── __init__.py
    └── api_client.py      # Backend API communication
```

## CLI Workflow

```mermaid
sequenceDiagram
    participant User
    participant CLI as velonus scan
    participant Scanner as scanner.pipeline
    participant Formatter
    participant Output

    User->>CLI: velonus scan ./ --severity high
    CLI->>Scanner: Run detectors (secrets, bandit, semgrep, etc.)
    Scanner-->>CLI: List[NormalizedFinding]
    CLI->>Formatter: Format findings based on --format
    Formatter-->>Output: Rendered output (terminal/JSON/SARIF)
    Output-->>User: Display or write to file
    
    alt HIGH/CRITICAL findings
        CLI-->>User: Exit code 1 (CI block)
    else Clean scan
        CLI-->>User: Exit code 0
    end
```

## Usage Examples

### Basic Scan

```bash
velonus scan ./
```

### Severity Filtered Scan

```bash
velonus scan ./ --severity high
```

### Export to SARIF

```bash
velonus scan ./ -o results/velonus.sarif --format sarif
```

### Verbose Output

```bash
velonus scan ./ --verbose
```

### CI/CD Integration

```yaml
- name: Velonus security scan
  run: velonus scan . --severity high
```

## Development Guidelines

Per the project's contribution guidelines:

| Requirement | Tool | Command |
|-------------|------|---------|
| Linting | Ruff | `ruff check .` |
| Formatting | Ruff | `ruff format .` |
| Type Checking | mypy | `mypy apps/cli/shield --strict --ignore-missing-imports` |
| Testing | pytest | `pytest apps/cli/tests/` |

All new CLI components must include matching unit tests. The test suite currently has **367 tests** covering scanner wrappers and formatters.

Source: [CONTRIBUTING.md](https://github.com/AliAmmar15/Velonus/blob/main/CONTRIBUTING.md)

---

<a id='api-backend'></a>

## API Backend

### Related Pages

Related topics: [System Architecture](#architecture-overview), [Scanner Pipeline](#scanner-pipeline), [AI Engine](#ai-engine), [GitHub PR Reviewer](#github-pr-reviewer)

<details>
<summary>Relevant source files</summary>

The following source files were used to generate this page:

- [apps/cli/shield/main.py](https://github.com/AliAmmar15/Velonus/blob/main/apps/cli/shield/main.py)
- [apps/cli/shield/core/__init__.py](https://github.com/AliAmmar15/Velonus/blob/main/apps/cli/shield/core/__init__.py)
- [apps/cli/shield/formatters/sarif.py](https://github.com/AliAmmar15/Velonus/blob/main/apps/cli/shield/formatters/sarif.py)
- [packages/scanner/scanner/detectors/secrets.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/scanner/scanner/detectors/secrets.py)
- [packages/scanner/scanner/detectors/semgrep.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/scanner/scanner/detectors/semgrep.py)
- [packages/scanner/scanner/detectors/safety.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/scanner/scanner/detectors/safety.py)
- [packages/scanner/scanner/detectors/pip_audit.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/scanner/scanner/detectors/pip_audit.py)
- [README.md](https://github.com/AliAmmar15/Velonus/blob/main/README.md)
- [CONTRIBUTING.md](https://github.com/AliAmmar15/Velonus/blob/main/CONTRIBUTING.md)
</details>

# API Backend

## Overview

The Velonus API Backend is a FastAPI-based service layer that provides remote scanning capabilities, AI-powered analysis, GitHub integration, and programmatic access to security findings. Currently planned for **Phase 2** of the Velonus roadmap, the API backend will extend the CLI's local-only scanning with cloud-native features including authentication, rate limiting, background processing, and AI-driven remediation suggestions.

**Current Status:** The API backend is in the planning/stub phase. The CLI is fully functional in local-only mode. Auth, config, and GitHub integration commands are stubbed and will be activated once the backend is live. Source: [README.md](https://github.com/AliAmmar15/Velonus/blob/main/README.md)

## Architecture Overview

```mermaid
graph TD
    Client["Client<br/>(CLI / Web UI)"] --> |"HTTP/REST"| API["API Gateway<br/>(FastAPI)"]
    
    API --> |"Auth"| AuthMW["Auth Middleware<br/>(Clerk OAuth)"]
    API --> |"Rate Limit"| RateMW["Rate Limit Middleware"]
    
    AuthMW --> |"Validated"| Routers["Routers"]
    RateMW --> |"Allowed"| Routers
    
    Routers --> Scans["/scans"]
    Routers --> Findings["/findings"]
    Routers --> GitHub["/github"]
    Routers --> Remediation["/remediation"]
    
    Scans --> ScanSvc["Scan Service"]
    Findings --> ScanSvc
    GitHub --> GitHubSvc["GitHub Service"]
    Remediation --> AISvc["AI Service"]
    
    ScanSvc --> ScanWorker["Scan Worker<br/>(Background)"]
    AISvc --> AIWorker["AI Worker<br/>(Background)"]
    
    ScanWorker --> |"Results"| DB["Database"]
    AIWorker --> |"Analysis"| DB
    
    DB --> Findings
```

## Planned Components

### Directory Structure

```
apps/api/shield_api/
├── main.py                 # FastAPI application entry point
├── middleware/
│   ├── auth.py            # Clerk OAuth authentication
│   └── rate_limit.py      # Request rate limiting
├── routers/
│   ├── scans.py           # Scan management endpoints
│   ├── findings.py       # Finding retrieval endpoints
│   ├── github.py          # GitHub integration endpoints
│   └── remediation.py     # AI remediation endpoints
├── services/
│   ├── scan_service.py    # Scan orchestration logic
│   ├── ai_service.py      # AI analysis and scoring
│   └── github_service.py  # GitHub API integration
└── background/
    ├── scan_worker.py     # Background scan processor
    └── ai_worker.py       # Background AI worker
```

Source: [README.md](https://github.com/AliAmmar15/Velonus/blob/main/README.md)

## Middleware Layer

### Authentication Middleware

The authentication middleware integrates with **Clerk** for browser-based OAuth authentication. This enables secure API access for authenticated users while maintaining compatibility with the CLI's local-only mode.

| Feature | Description |
|---------|-------------|
| Provider | Clerk (OAuth 2.0) |
| Token Type | Bearer JWT |
| Protected Routes | All `/scans`, `/findings`, `/github`, `/remediation` endpoints |
| CLI Bypass | Local scans work without authentication |

The `velonus auth` command group will provide login/logout/status operations once the backend is live. Source: [README.md](https://github.com/AliAmmar15/Velonus/blob/main/README.md)

### Rate Limiting Middleware

Rate limiting protects the API from abuse and ensures fair resource allocation across users.

| Tier | Limit | Purpose |
|------|-------|---------|
| Anonymous | TBD | Limited scans per hour |
| Authenticated | TBD | Higher quotas for logged-in users |
| Enterprise | TBD | Custom limits based on subscription |

## Router Modules

### `/scans` - Scan Management

Manages scan lifecycle including creation, status tracking, and result retrieval.

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/scans` | POST | Create a new scan job |
| `/scans/{id}` | GET | Get scan status and metadata |
| `/scans/{id}/results` | GET | Retrieve scan results |
| `/scans/{id}/cancel` | POST | Cancel a running scan |

#### Request/Response Model

```json
{
  "id": "uuid",
  "status": "pending|running|completed|failed",
  "target_path": "/path/to/project",
  "created_at": "ISO8601 timestamp",
  "completed_at": "ISO8601 timestamp|null",
  "tools_run": ["secrets", "bandit", "semgrep", "pip-audit", "safety"],
  "findings_count": {
    "critical": 0,
    "high": 0,
    "medium": 0,
    "low": 0,
    "info": 0
  }
}
```
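
Because the API is still planned, client code can only be sketched against the model above. The following is one possible way to parse and validate such a response. `ScanStatus` and `parse_scan_status` are hypothetical names, and the field set is taken directly from the JSON shown.

```python
import json
from dataclasses import dataclass
from typing import Optional

VALID_STATUSES = {"pending", "running", "completed", "failed"}


@dataclass(frozen=True)
class ScanStatus:
    id: str
    status: str
    target_path: str
    created_at: str
    completed_at: Optional[str]
    tools_run: list[str]
    findings_count: dict[str, int]

    @property
    def is_finished(self) -> bool:
        # Both terminal states stop polling.
        return self.status in {"completed", "failed"}


def parse_scan_status(raw: str) -> ScanStatus:
    """Parse a scan response body, rejecting unknown status values."""
    data = json.loads(raw)
    if data["status"] not in VALID_STATUSES:
        raise ValueError(f"unexpected scan status: {data['status']!r}")
    return ScanStatus(**data)
```

A polling client would call `GET /scans/{id}` until `is_finished` is true, then fetch `/scans/{id}/results`.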

### `/findings` - Finding Retrieval

Provides filtered access to security findings from completed scans.

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/findings` | GET | List findings with filters |
| `/findings/{id}` | GET | Get single finding details |
| `/findings/{id}/acknowledge` | POST | Mark finding as acknowledged |

#### Query Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `scan_id` | UUID | Filter by scan |
| `severity` | string | `critical`, `high`, `medium`, `low`, `info` |
| `tool` | string | `secrets`, `bandit`, `semgrep`, `pip-audit`, `safety` |
| `cwe` | string | Filter by CWE identifier |
| `owasp` | string | Filter by OWASP category |
| `page` | int | Pagination page number |
| `limit` | int | Results per page (max 100) |
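
A client-side sketch of building a `/findings` query from these parameters follows. The helper name `build_findings_query` and the default page size of 50 are assumptions; only the parameter names and the maximum of 100 come from the table.

```python
from typing import Optional
from urllib.parse import urlencode

MAX_LIMIT = 100  # documented upper bound for `limit`


def build_findings_query(
    scan_id: Optional[str] = None,
    severity: Optional[str] = None,
    tool: Optional[str] = None,
    page: int = 1,
    limit: int = 50,
) -> str:
    """Build a relative URL for GET /findings, dropping unset filters."""
    params = {
        "scan_id": scan_id,
        "severity": severity,
        "tool": tool,
        "page": page,
        "limit": min(limit, MAX_LIMIT),  # clamp rather than let the server reject it
    }
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"/findings?{query}"
```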

### `/github` - GitHub Integration

Enables GitHub Actions integration and PR inline comments for Phase 3.

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/github/webhook` | POST | Receive GitHub webhook events |
| `/github/install` | POST | Install GitHub App |
| `/github/scan` | POST | Trigger scan from PR |
| `/github/comments` | POST | Post inline review comments |

This router integrates with the GitHub Service for repository access and comment posting. Source: [README.md](https://github.com/AliAmmar15/Velonus/blob/main/README.md)

### `/remediation` - AI-Powered Fixes

Provides AI-generated fix suggestions for security findings.

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/remediation/{finding_id}` | GET | Get AI fix suggestion |
| `/remediation/{finding_id}/apply` | POST | Apply fix to codebase |
| `/remediation/pr` | POST | Create PR with fixes |

Phase 2 features include AI prioritization and exploitability scoring. Phase 3 adds one-click fix suggestions. Source: [README.md](https://github.com/AliAmmar15/Velonus/blob/main/README.md)

## Service Layer

### Scan Service

Orchestrates the security scanning pipeline across multiple tools:

| Tool | Purpose | Finding Type |
|------|---------|--------------|
| **Secrets** | TruffleHog-based secret detection | Hardcoded credentials, API keys |
| **Bandit** | Python static analysis | Security bugs, common vulnerabilities |
| **Semgrep** | Rule-based pattern matching | Custom security rules, CWE coverage |
| **pip-audit** | Dependency vulnerability scanning | Known CVEs in Python packages |
| **Safety** | Python dependency security | Vulnerable package versions |

The service normalizes findings into the `NormalizedFinding` model before storage. Source: [packages/scanner/scanner/detectors/secrets.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/scanner/scanner/detectors/secrets.py)

### AI Service

Provides AI-powered analysis capabilities for Phase 2:

| Feature | Description |
|---------|-------------|
| Exploitability Scoring | Assess if a vulnerability is actually exploitable in context |
| Fix Generation | Generate code patches for identified issues |
| False Positive Detection | Reduce noise by validating findings |
| Prioritization | Rank findings by real-world impact |

### GitHub Service

Handles GitHub API interactions:

| Feature | Description |
|---------|-------------|
| Repository Access | Read code from GitHub repos |
| Comment Posting | Post inline PR comments with findings |
| Status Checks | Update commit status for CI/CD gates |
| Webhook Handling | Process GitHub webhook events |

## Background Workers

### Scan Worker

Processes scan jobs asynchronously to avoid blocking HTTP requests.

```mermaid
graph LR
    A[Scan Request] --> B[Queue]
    B --> C[Worker Pool]
    C --> D[Tool 1: Secrets]
    C --> E[Tool 2: Bandit]
    C --> F[Tool 3: Semgrep]
    C --> G[Tool 4: pip-audit]
    C --> H[Tool 5: Safety]
    D & E & F & G & H --> I[Normalize]
    I --> J[Store Results]
```

### AI Worker

Processes AI analysis requests in the background:

| Task | Description |
|------|-------------|
| Batch Analysis | Analyze multiple findings together |
| Fix Generation | Generate remediation code |
| Score Updates | Recalculate exploitability scores |

## Data Models

### NormalizedFinding

All scanners output findings in a standardized format:

| Field | Type | Description |
|-------|------|-------------|
| `tool` | string | Source scanner name |
| `rule_id` | string | Rule identifier (e.g., `B413`) |
| `file` | string | File path with finding |
| `line` | int | Line number (0 for dependencies) |
| `severity` | enum | `CRITICAL`, `HIGH`, `MEDIUM`, `LOW`, `INFO` |
| `message` | string | Human-readable description |
| `code_snippet` | string | Relevant source code (if applicable) |
| `metadata` | dict | Scanner-specific data (CVE, CVSS, fix versions) |

#### Metadata Schema

| Key | Tool | Description |
|-----|------|-------------|
| `cwe` | semgrep, bandit | CWE identifier(s) |
| `owasp` | semgrep | OWASP category code |
| `cvss_score` | pip-audit, safety | CVSS v3 base score |
| `package_name` | pip-audit, safety | Vulnerable package name |
| `fix_versions` | pip-audit, safety | Safe package versions |
| `detector` | secrets | Secret detector name |
| `verified` | secrets | Whether secret was verified |
| `confidence` | semgrep | Rule confidence level |

Source: [packages/scanner/scanner/detectors/semgrep.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/scanner/scanner/detectors/semgrep.py)
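
The field and metadata tables can be pictured as a plain dataclass. This is an illustrative sketch only: the real model lives in `packages/normalizer/models.py` and may be defined differently, the class name `NormalizedFindingSketch` is hypothetical, and the example values (package name, rule ID, score) are placeholders.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class Severity(str, Enum):
    CRITICAL = "CRITICAL"
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"
    INFO = "INFO"


@dataclass
class NormalizedFindingSketch:
    tool: str
    rule_id: str
    file: str
    line: int  # 0 for dependency findings
    severity: Severity
    message: str
    code_snippet: str = ""
    metadata: dict[str, Any] = field(default_factory=dict)


# A dependency finding: no source line, scanner-specific data in metadata.
finding = NormalizedFindingSketch(
    tool="pip-audit",
    rule_id="EXAMPLE-VULN-ID",  # placeholder identifier
    file="requirements.txt",
    line=0,
    severity=Severity.HIGH,
    message="Known vulnerability in examplepkg",
    metadata={"package_name": "examplepkg", "cvss_score": 7.5, "fix_versions": ["1.2.3"]},
)
```

Keeping scanner-specific keys in `metadata` lets every formatter rely on the same eight top-level fields regardless of which tool produced the finding.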

## Output Formats

The API supports multiple output formats for findings:

| Format | Use Case | Phase |
|--------|----------|-------|
| `terminal` | Interactive CLI output | Phase 0 |
| `json` | Piping, tooling integration | Phase 0 |
| `sarif` | GitHub Code Scanning, VS Code | Phase 1 |

### SARIF Output

SARIF (Static Analysis Results Interchange Format) provides standardized output for integration with security tooling:

```json
{
  "version": "2.1.0",
  "$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
  "runs": [{
    "tool": {
      "driver": {
        "name": "Velonus",
        "version": "0.1.0"
      }
    },
    "results": [...]
  }]
}
```

Source: [apps/cli/shield/formatters/sarif.py](https://github.com/AliAmmar15/Velonus/blob/main/apps/cli/shield/formatters/sarif.py)
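
To show how findings might populate the `results` array elided above, here is a hedged sketch of a converter. It is not the project's `sarif.py`. The function name `to_sarif`, the dict-shaped findings, and the severity-to-level mapping are all assumptions; the SARIF property names (`ruleId`, `level`, `message.text`, `physicalLocation`, `region.startLine`) are from the SARIF 2.1.0 format.

```python
# Assumed mapping from Velonus severities to SARIF result levels.
LEVELS = {
    "CRITICAL": "error",
    "HIGH": "error",
    "MEDIUM": "warning",
    "LOW": "note",
    "INFO": "note",
}

SCHEMA_URL = (
    "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/"
    "master/Schemata/sarif-schema-2.1.0.json"
)


def to_sarif(findings: list[dict]) -> dict:
    """Wrap finding dicts in a minimal SARIF 2.1.0 log."""
    results = []
    for f in findings:
        physical: dict = {"artifactLocation": {"uri": f["file"]}}
        # SARIF line numbers are 1-based, so line 0 (dependencies) gets no region.
        if f.get("line", 0) >= 1:
            physical["region"] = {"startLine": f["line"]}
        results.append({
            "ruleId": f["rule_id"],
            "level": LEVELS[f["severity"]],
            "message": {"text": f["message"]},
            "locations": [{"physicalLocation": physical}],
        })
    driver = {"name": "Velonus", "version": "0.1.0"}
    return {
        "version": "2.1.0",
        "$schema": SCHEMA_URL,
        "runs": [{"tool": {"driver": driver}, "results": results}],
    }
```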

## Severity Levels

| Badge | Level | Color | Use Case |
|-------|-------|-------|----------|
| 🔴 | `CRITICAL` | Bold red | Hardcoded secrets, RCE, auth bypass |
| 🟠 | `HIGH` | Orange | SQL injection, command injection, insecure deserialization |
| 🟡 | `MEDIUM` | Yellow | XSS, weak crypto, path traversal |
| 🔵 | `LOW` | Blue | Insecure defaults, minor misconfigurations |
| ⚪ | `INFO` | Grey | Style issues, informational notes |

## Roadmap Integration

| Phase | Status | API Features |
|-------|--------|--------------|
| **Phase 0** | ✅ Complete | CLI skeleton, local scanning |
| **Phase 1** | ✅ Complete | Scanner pipeline, SARIF output |
| **Phase 2** | 🔨 Building | AI layer, API backend, authentication |
| **Phase 3** | 🔜 Planned | GitHub PR integration, inline fixes |
| **Phase 4** | 🔜 Planned | Web dashboard, scan history |

Source: [README.md](https://github.com/AliAmmar15/Velonus/blob/main/README.md)

## CLI vs API Mode

| Aspect | Local CLI | API Backend |
|--------|-----------|-------------|
| Authentication | Not required | Clerk OAuth required |
| Rate Limiting | None | Enforced per user |
| Scan Execution | Synchronous | Background workers |
| Results Storage | stdout or local file (`--output`) | Database |
| GitHub Integration | None | PR comments, status checks |
| AI Features | None | Fix generation, scoring |

When the API backend is live, users can choose between:

```bash
# Local mode (always works)
velonus scan ./my-project

# Cloud mode (requires auth)
velonus auth login
velonus scan ./my-project --remote
```

## Configuration

### CLI Configuration

The `velonus config` command manages local CLI settings:

| Command | Description |
|---------|-------------|
| `velonus config show` | Print current configuration |
| `velonus config set api_url <url>` | Set API endpoint |
| `velonus config set api_key <key>` | Set API authentication key |

### Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `VELONUS_API_URL` | API backend URL | `https://api.velonus.dev` |
| `VELONUS_API_KEY` | Authentication key | None |
| `VELONUS_TIMEOUT` | Scan timeout in seconds | 300 |
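
A minimal sketch of reading these variables with their documented defaults is shown below. `Settings` and `load_settings` are hypothetical names, and how the real CLI merges environment variables with `velonus config` values is not specified in the source.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Settings:
    api_url: str
    api_key: Optional[str]
    timeout: int  # seconds


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read Velonus settings from an environment mapping, applying defaults."""
    return Settings(
        api_url=env.get("VELONUS_API_URL", "https://api.velonus.dev"),
        api_key=env.get("VELONUS_API_KEY"),
        timeout=int(env.get("VELONUS_TIMEOUT", "300")),
    )
```

Accepting the mapping as a parameter keeps the function easy to test without touching the real process environment.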

## Error Handling

| HTTP Code | Meaning |
|-----------|---------|
| `200` | Success |
| `400` | Invalid request parameters |
| `401` | Not authenticated |
| `403` | Forbidden (insufficient permissions) |
| `404` | Resource not found |
| `429` | Rate limit exceeded |
| `500` | Internal server error |
| `503` | Service unavailable (maintenance) |

CLI exit codes:

| Code | Meaning |
|------|---------|
| `0` | Scan completed, no HIGH/CRITICAL findings |
| `1` | Scan completed, HIGH/CRITICAL findings found |

## Development Guidelines

The API backend follows the project's contribution standards:

- All code must pass `ruff check` and `ruff format --check`
- Type checking with `mypy --strict --ignore-missing-imports`
- Unit tests required for all new endpoints and services
- PRs should be under 400 lines of diff
- No AI-generated placeholder code

Source: [CONTRIBUTING.md](https://github.com/AliAmmar15/Velonus/blob/main/CONTRIBUTING.md)

---

<a id='scanner-pipeline'></a>

## Scanner Pipeline

### Related Pages

Related topics: [Security Detectors](#security-detectors), [Output Formats](#output-formats), [API Backend](#api-backend)

<details>
<summary>Relevant source files</summary>

The following source files were used to generate this page:

- [packages/scanner/scanner/pipeline.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/scanner/scanner/pipeline.py)
- [packages/scanner/scanner/detectors/bandit.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/scanner/scanner/detectors/bandit.py)
- [packages/scanner/scanner/detectors/semgrep.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/scanner/scanner/detectors/semgrep.py)
- [packages/scanner/scanner/detectors/pip_audit.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/scanner/scanner/detectors/pip_audit.py)
- [packages/scanner/scanner/detectors/safety.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/scanner/scanner/detectors/safety.py)
- [packages/scanner/scanner/detectors/secrets.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/scanner/scanner/detectors/secrets.py)
- [packages/normalizer/models.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/normalizer/models.py)
</details>

# Scanner Pipeline

## Overview

The Scanner Pipeline is the core orchestration layer in Velonus that coordinates multiple security scanning tools into a unified, concurrent execution framework. It provides a single entry point for running comprehensive security analysis across Python codebases, combining static analysis, secret detection, and dependency vulnerability scanning.

**Source:** [packages/scanner/scanner/pipeline.py:1-15]()

## Architecture

The pipeline follows a parallel execution model using Python's `asyncio` framework. Each security tool runs concurrently, with results collected, normalized, and deduplicated before presentation.

```mermaid
graph TD
    A[User Input: Target Path] --> B[ScanPipeline.run]
    B --> C{Is Async Context?}
    C -->|Yes| D[Direct await]
    C -->|No| E[asyncio.run wrapper]
    D --> F[Parallel Scanner Execution]
    E --> F
    F --> G[Secrets Detector]
    F --> H[Bandit Detector]
    F --> I[Semgrep Detector]
    F --> J[pip-audit Detector]
    F --> K[Safety Detector]
    G --> L[RawFinding List]
    H --> L
    I --> L
    J --> L
    K --> L
    L --> M[FindingNormalizer]
    M --> N[DeduplicationFilter]
    N --> O[NormalizedFinding List]
    O --> P[Output Formatters]
    P --> Q[Terminal / JSON / SARIF]
```

## Pipeline Execution Model

### Concurrent Execution

The pipeline executes all five detectors in parallel using `asyncio.gather()`. This approach minimizes total scan time by running independent security checks simultaneously rather than sequentially.

```python
results: list[list[RawFinding]] = await asyncio.gather(
    self._run_detector(secrets_runner, target, verbose),
    self._run_detector(bandit_runner, target, verbose),
    self._run_detector(semgrep_runner, target, verbose),
    self._run_detector(pip_audit_runner, target, verbose),
    self._run_detector(safety_runner, target, verbose),
)
```

**Source:** [packages/scanner/scanner/pipeline.py:80-90]()
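As a rough, self-contained illustration of this pattern (not the project's code), the sketch below runs two stand-in detectors concurrently. The names `slow_detector`, `run_detector`, and `run_all` are hypothetical, and offloading blocking work with `asyncio.to_thread` is an assumption; the excerpt above does not show how `_run_detector` is implemented.

```python
import asyncio
import time
from pathlib import Path


def slow_detector(name: str, delay: float):
    """Build a hypothetical blocking detector that sleeps, then returns one finding."""
    def run(target: Path) -> list[str]:
        time.sleep(delay)  # stands in for a subprocess call to a real tool
        return [f"{name}: finding in {target}"]
    return run


async def run_detector(runner, target: Path) -> list[str]:
    # Run the blocking callable in a worker thread so detectors overlap.
    return await asyncio.to_thread(runner, target)


async def run_all(target: Path) -> list[list[str]]:
    runners = [slow_detector("secrets", 0.2), slow_detector("bandit", 0.1)]
    # gather() returns results in the order the awaitables were passed,
    # regardless of which one finishes first.
    return await asyncio.gather(*(run_detector(r, target) for r in runners))


results = asyncio.run(run_all(Path("proj")))
```

Even though the second stand-in finishes first, its findings still appear at index 1, which is what makes index-based logging stable.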

### Execution Order

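Detectors run concurrently, but they are passed to `asyncio.gather()` in a fixed order. Because `gather()` returns results in the order of its arguments, each detector's findings land at a stable index, which keeps logging output consistent: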

| Order | Tool | Purpose |
|-------|------|---------|
| 0 | secrets | Hardcoded secrets and high-entropy strings |
| 1 | bandit | Python security best practices |
| 2 | semgrep | Custom ruleset scanning |
| 3 | pip-audit | Python dependency vulnerabilities |
| 4 | safety | Additional dependency checking |

**Source:** [packages/scanner/scanner/pipeline.py:69-70]()

### Dual Context Support

The pipeline supports both async and sync usage patterns:

```python
# Async context (e.g., API background worker)
pipeline = ScanPipeline()
findings = await pipeline.run(Path("./my-project"), verbose=True)

# Sync context (CLI)
import asyncio
findings = asyncio.run(ScanPipeline().run(Path("./my-project")))
```

**Source:** [packages/scanner/scanner/pipeline.py:14-25]()

## Supported Detectors

### Secrets Detector

The secrets detector provides dual-layer secret detection:

#### TruffleHog Integration

TruffleHog is the primary secret scanner. It uses commit history analysis and regex-based detector rules to identify verified secrets with high confidence.

**Source:** [packages/scanner/scanner/detectors/secrets.py:40-60]()

#### Entropy-Based Fallback

When TruffleHog is unavailable or returns no findings, an entropy-based scanner flags high-entropy strings in secret assignments as potential hardcoded credentials:

```python
message=(
    f"High-entropy string in secret assignment "
    f"(Shannon entropy={entropy:.2f}) — likely a hardcoded credential"
)
```

**Source:** [packages/scanner/scanner/detectors/secrets.py:180-185]()

The entropy scanner walks the target path recursively, skipping non-code directories and binary files:

**Skipped directories:** `.git`, `node_modules`, `__pycache__`, `.venv`, `venv`, `.env`, `dist`, `build`

**Source:** [packages/scanner/scanner/detectors/secrets.py:95-100]()
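For readers unfamiliar with the metric, here is a minimal sketch of a Shannon entropy check. The function names and the `EXAMPLE_THRESHOLD` value are assumptions made for illustration; the project's actual threshold and candidate-selection rules are not shown in this document.

```python
import math
from collections import Counter

# Hypothetical threshold in bits per character; not the project's real value.
EXAMPLE_THRESHOLD = 4.0


def shannon_entropy(value: str) -> float:
    """Return the Shannon entropy of a string in bits per character."""
    if not value:
        return 0.0
    total = len(value)
    counts = Counter(value)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_like_secret(value: str, threshold: float = EXAMPLE_THRESHOLD) -> bool:
    """Flag strings whose entropy exceeds the threshold."""
    return shannon_entropy(value) > threshold
```

Repetitive strings score near zero, while long random tokens approach the maximum for their alphabet, which is why entropy is a workable heuristic for generic credentials.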

### Bandit Detector

Bandit analyzes Python code for common security issues using configurable test sets. It produces findings with severity ratings based on the potential impact of identified issues.

### Semgrep Detector

Semgrep runs rule-based analysis using the `p/python` ruleset:

```python
def scan(self, target: Path) -> list[RawFinding]:
    return self._run_semgrep(target)
```

**Invocation command:** `semgrep scan --config p/python --json --quiet --metrics=off <target>`

**Source:** [packages/scanner/scanner/detectors/semgrep.py:22-30]()

#### Availability Check

Semgrep availability is verified via a lightweight version check:

```python
def _semgrep_available(self) -> bool:
    try:
        subprocess.run(
            ["semgrep", "--version"],
            capture_output=True,
            check=False,
            timeout=10,
        )
        return True
    except FileNotFoundError:
        return False
```

**Source:** [packages/scanner/scanner/detectors/semgrep.py:55-65]()

If semgrep is not installed, the detector logs a warning and skips analysis rather than failing the entire scan.

### pip-audit Detector

pip-audit scans Python dependencies against the Python Packaging Advisory Database. It extracts CVSS v3 scores to determine severity ratings and identifies available fix versions.

**Source:** [packages/scanner/scanner/detectors/pip_audit.py:60-85]()

### Safety Detector

Safety provides an additional layer of dependency vulnerability checking, extracting CVE identifiers and advisory information for prioritization.

**Source:** [packages/scanner/scanner/detectors/safety.py:1-30]()

## Data Flow

```mermaid
graph LR
    A[RawFinding] --> B[FindingNormalizer]
    B --> C[NormalizedFinding]
    C --> D[DeduplicationFilter]
    D --> E[Final Findings]
    
    F[tool+file+line+rule_id] --> G[SHA-256 Hash]
    G --> H[16-char ID]
```

### Raw Finding Structure

Detectors produce `RawFinding` objects containing raw output from security tools:

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class RawFinding:
    tool: str
    rule_id: str
    file: str
    line: int
    severity: str
    message: str
    code_snippet: str
    metadata: dict[str, Any]
```

### Normalized Finding Structure

`FindingNormalizer` converts raw findings into a canonical format:

```python
@dataclass
class NormalizedFinding:
    id: str  # SHA-256[:16] of tool+file+line+rule_id
    tool: str  # "bandit"|"semgrep"|"secrets"|"pip-audit"|"safety"
    rule_id: str
    cwe: list[str]  # e.g., ["CWE-89"]
    owasp: list[str]  # e.g., ["A03:2021"]
    severity: Severity  # CRITICAL | HIGH | MEDIUM | LOW | INFO
    confidence: Confidence  # HIGH | MEDIUM | LOW
    file: str
    line_start: int
    line_end: int
    code_snippet: str
    message: str
    first_seen: datetime  # listed before defaulted fields so the dataclass is valid
    fix_available: bool = False
    suppressed: bool = False
```

**Source:** [packages/normalizer/models.py:1-50]()

### Deduplication

The `DeduplicationFilter` removes duplicate findings across scans using deterministic SHA-256 identifiers derived from `tool + file + line + rule_id`. This ensures consistent identification of the same vulnerability across multiple scans.

**Source:** [packages/scanner/scanner/pipeline.py:77-78]()
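The identifier scheme can be sketched as follows. This is an illustration under assumptions: the `:` separator, the UTF-8 encoding, and the helper names `finding_id` and `deduplicate` are hypothetical, since the document only states that the ID is the first 16 characters of a SHA-256 hash over `tool + file + line + rule_id`.

```python
import hashlib


def finding_id(tool: str, file: str, line: int, rule_id: str) -> str:
    """Derive a deterministic 16-character ID from the four identifying fields."""
    # Assumption: fields are joined with ":" before hashing.
    key = f"{tool}:{file}:{line}:{rule_id}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


def deduplicate(findings: list[dict]) -> list[dict]:
    """Keep the first finding seen for each derived ID, preserving input order."""
    seen: set[str] = set()
    unique: list[dict] = []
    for item in findings:
        fid = finding_id(item["tool"], item["file"], item["line"], item["rule_id"])
        if fid not in seen:
            seen.add(fid)
            unique.append(item)
    return unique
```

Because the ID depends only on stable fields, the same issue reported in two separate scans maps to the same identifier.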

## Severity Classification

| Level | Badge | Use Case |
|-------|-------|----------|
| CRITICAL | 🔴 | Hardcoded secrets, RCE, auth bypass |
| HIGH | 🟠 | SQL injection, command injection, insecure deserialization |
| MEDIUM | 🟡 | XSS, weak crypto, path traversal |
| LOW | 🔵 | Insecure defaults, minor misconfigurations |
| INFO | ⚪ | Style issues, informational notes |

**Source:** [apps/cli/README.md:95-105]()

## Usage Examples

### Async Usage

```python
from scanner.pipeline import ScanPipeline
from pathlib import Path

async def scan_project():
    pipeline = ScanPipeline()
    findings = await pipeline.run(Path("./my-project"), verbose=True)
    return findings
```

### Sync Usage

```python
import asyncio
from scanner.pipeline import ScanPipeline
from pathlib import Path

findings = asyncio.run(
    ScanPipeline().run(Path("./my-project"))
)
```

### CLI Usage

```bash
# Full scan with all detectors
velonus scan ./

# High severity only (CI gate)
velonus scan ./ --severity high

# JSON output for tooling
velonus scan ./ --format json

# SARIF output for GitHub Security tab
velonus scan ./ --format sarif -o results.sarif

# Verbose timing output
velonus scan ./ --verbose
```

**Source:** [README.md:20-40]()

## Exit Codes

| Code | Meaning |
|------|---------|
| 0 | Scan completed, no HIGH or CRITICAL findings |
| 1 | Scan completed, one or more HIGH or CRITICAL findings found |

Exit code 1 on HIGH/CRITICAL is intentional for CI/CD integration, enabling automated blocking of merges.

**Source:** [apps/cli/README.md:70-80]()

## Error Handling

### Semgrep Exit Code Handling

Semgrep's exit code 1 signals that findings are present, not that the run failed. The detector source notes:

> Semgrep exits with code 1 when findings are present. This is NOT treated as an error — the JSON output is still fully valid and parsed normally. Exit code 2+ indicates a real error (bad arguments, semgrep crash).

**Source:** [packages/scanner/scanner/detectors/semgrep.py:22-30]()
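A minimal sketch of that exit-code rule is shown below. The function name `parse_semgrep_result` and the choice to return an empty list on a real error are assumptions; the only facts relied on are the rule quoted above and that Semgrep's JSON output carries findings under a top-level `results` key.

```python
import json


def parse_semgrep_result(returncode: int, stdout: str) -> list[dict]:
    """Interpret a finished semgrep process according to the exit-code rule."""
    if returncode not in (0, 1):
        # Real failure (bad arguments, crash): skip instead of aborting the scan.
        return []
    # 0 means no findings, 1 means findings present; both carry valid JSON.
    payload = json.loads(stdout or "{}")
    return payload.get("results", [])
```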

### Missing Tool Handling

If a security tool is not installed, the detector logs a warning and returns an empty list rather than failing the scan:

```python
if not self._semgrep_available():
    logger.warning(
        "semgrep not found on PATH — skipping Semgrep analysis. "
        "Install with: pip install semgrep"
    )
    return []
```

**Source:** [packages/scanner/scanner/detectors/semgrep.py:32-37]()

## Performance Characteristics

- **Parallel execution** of all detectors minimizes total scan time
- **Timing logging** occurs at INFO level when `verbose=True`, DEBUG otherwise
- **Skip patterns** exclude common non-code directories to reduce scan scope
- **Empty results** are handled gracefully, allowing partial scans when tools are unavailable

---

<a id='security-detectors'></a>

## Security Detectors

### Related Pages

Related topics: [Scanner Pipeline](#scanner-pipeline), [Output Formats](#output-formats)

<details>
<summary>Relevant Source Files</summary>

The following source files were used to generate this page:

- [packages/scanner/detectors/secrets.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/scanner/detectors/secrets.py)
- [packages/scanner/scanner/detectors/secrets.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/scanner/scanner/detectors/secrets.py)
- [packages/scanner/scanner/detectors/pip_audit.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/scanner/scanner/detectors/pip_audit.py)
- [packages/scanner/scanner/detectors/safety.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/scanner/scanner/detectors/safety.py)
- [packages/normalizer/models.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/normalizer/models.py)
</details>

# Security Detectors

## Overview

Security Detectors are the core scanning engines within Velonus that identify vulnerabilities, secrets, and insecure code patterns. Each detector wraps an external security tool (TruffleHog, Bandit, Semgrep, pip-audit, Safety) and normalizes its output into a unified `RawFinding` format. This abstraction layer enables Velonus to run multiple security scanners through a single interface while presenting results in a consistent structure.

The detector system is located in `packages/scanner/detectors/` and `packages/scanner/scanner/detectors/`, with each module handling a specific security tool. Detectors serve as the first stage in Velonus's scanning pipeline, producing raw findings that are later normalized by the `FindingNormalizer` into the canonical `NormalizedFinding` model. [Source: packages/scanner/detectors/__init__.py:1]()

## Architecture

```mermaid
graph TD
    A[Scan Request] --> B[Scanner Pipeline]
    B --> C[Secrets Detector]
    B --> D[Bandit Detector]
    B --> E[Semgrep Detector]
    B --> F[pip-audit Detector]
    B --> G[Safety Detector]
    
    C --> H[RawFinding Objects]
    D --> H
    E --> H
    F --> H
    G --> H
    
    H --> I[FindingNormalizer]
    I --> J[NormalizedFinding Objects]
    J --> K[Output Formatters]
    K --> L[Terminal / JSON / SARIF]
```

### Component Overview

| Detector | Tool Wrapped | Purpose |
|----------|--------------|---------|
| `secrets` | TruffleHog + Entropy | Hardcoded secrets and credentials |
| `bandit` | Bandit | Python security issues |
| `semgrep` | Semgrep | Custom security rules |
| `pip-audit` | pip-audit | Python dependency vulnerabilities |
| `safety` | Safety | Legacy Python dependency vulnerabilities |

## Secrets Detector

The Secrets Detector identifies hardcoded secrets, API keys, passwords, and other sensitive credentials in source code. It employs a two-tier detection strategy: primary TruffleHog scanning followed by an entropy-based fallback for generic high-entropy strings.

### Detection Pipeline

```mermaid
graph LR
    A[File Input] --> B{TruffleHog Scan}
    B -->|Verified Secrets| C[RawFinding]
    B -->|No Secrets Found| D[Entropy Fallback]
    D -->|High Entropy| E[RawFinding]
    D -->|Low Entropy| F[Skip]
```

### TruffleHog Integration

The detector invokes TruffleHog as the primary secret scanner. When TruffleHog returns results, each finding is parsed and converted to a `RawFinding` object with the following characteristics:

- **Tool**: `secrets`
- **Rule ID**: `trufflehog-{detector-name}` (lowercase, spaces replaced with hyphens)
- **Severity**: `CRITICAL` (all TruffleHog findings)
- **Verification Status**: Captured in metadata as `verified: bool`
- **Decoder Information**: Extracted from `DecoderName` in TruffleHog output

[Source: packages/scanner/detectors/secrets.py:1-30]()
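The mapping described above might look roughly like this. It is a sketch, not the detector's code: the function name and the returned dictionary layout are hypothetical, and it assumes TruffleHog's JSON records expose `DetectorName`, `DecoderName`, and `Verified` keys.

```python
import json


def trufflehog_line_to_finding(line: str) -> dict:
    """Convert one TruffleHog JSON record into the characteristics listed above."""
    record = json.loads(line)
    detector = str(record.get("DetectorName", "unknown"))
    return {
        "tool": "secrets",
        # Lowercase the detector name and replace spaces with hyphens.
        "rule_id": "trufflehog-" + detector.lower().replace(" ", "-"),
        "severity": "CRITICAL",  # all TruffleHog findings are treated as critical
        "metadata": {
            "detector": detector,
            "verified": bool(record.get("Verified", False)),
            "decoder": record.get("DecoderName", ""),
        },
    }
```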

### Entropy-Based Fallback

When TruffleHog is unavailable or returns no findings, the detector falls back to an entropy-based scanner (`_entropy_scan`). This method:

1. Walks the target directory recursively
2. Skips non-code directories (`.git`, `node_modules`, `__pycache__`, `.venv`, `venv`, `.env`, `dist`, `build`)
3. Applies regex patterns for known secret types
4. Calculates Shannon entropy for candidate strings
5. Flags strings exceeding the `_ENTROPY_THRESHOLD`

[Source: packages/scanner/detectors/secrets.py:50-80]()

### File Iteration Logic

The `_iter_files` method yields scannable source files while excluding:

```text
.git, node_modules, __pycache__, .venv, venv, .env, dist, build
```
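One way to implement this kind of pruned directory walk is sketched below. The names `SKIP_DIRS` and `iter_files` are illustrative rather than the project's own, and the sketch omits the binary-file check mentioned elsewhere in this document.

```python
import os
from collections.abc import Iterator
from pathlib import Path

# Directory names taken from the skip list above.
SKIP_DIRS = {".git", "node_modules", "__pycache__", ".venv", "venv", ".env", "dist", "build"}


def iter_files(root: Path) -> Iterator[Path]:
    """Yield files under root, pruning skipped directories before descending."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Mutating dirnames in place stops os.walk from entering these directories.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in sorted(filenames):
            yield Path(dirpath) / name
```

Pruning during the walk avoids reading large dependency trees at all, which is cheaper than filtering paths afterwards.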

### Finding Structure

Each secrets finding includes:

| Field | Value | Description |
|-------|-------|-------------|
| `tool` | `secrets` | Scanner identifier |
| `rule_id` | `high-entropy-secret` or `trufflehog-{name}` | Rule identifier |
| `severity` | `CRITICAL` | Always critical for secrets |
| `message` | Dynamic | Includes entropy score or verification status |
| `metadata` | `dict` | Contains entropy value, detector name, decoder |

[Source: packages/scanner/detectors/secrets.py:100-130]()

## Dependency Vulnerability Detectors

Velonus includes two detectors for Python dependency vulnerabilities: **pip-audit** and **Safety**. Both analyze dependency manifests (e.g., `requirements.txt`) and report known CVEs.

### pip-audit Detector

The pip-audit detector parses JSON output from the `pip-audit` tool and converts vulnerabilities into `RawFinding` objects.

#### Data Extraction

The detector extracts the following fields from pip-audit JSON:

| Field | Source | Purpose |
|-------|--------|---------|
| `package_name` | `entry` | Vulnerable package name |
| `package_version` | `entry` | Installed version |
| `vuln_id` | `entry` | Vulnerability identifier |
| `aliases` | `entry` | Alternative IDs (CVEs, GHSA) |
| `fix_versions` | `entry` | Safe versions to upgrade to |
| `cvss_score` | Nested `severity` dict | CVSS v3 base score |
| `fix_available` | `bool(fix_versions)` | Whether a fix exists |

[Source: packages/scanner/scanner/detectors/pip_audit.py:1-50]()

#### CVSS Score Extraction

The `_extract_cvss_score` helper parses pip-audit's nested CVSS format:

```python
[{"type": "CVSS_V3", "score": "CVSS:3.1/...", "base_score": 7.5}]
```

The detector prefers CVSS v3 scores and selects the highest when multiple are present. [Source: packages/scanner/scanner/detectors/pip_audit.py:80-100]()
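The selection rule can be sketched like this. The function below is a hypothetical stand-in for `_extract_cvss_score`; in particular, falling back to non-v3 entries when no v3 score exists is an assumption about what "prefers" means here.

```python
from __future__ import annotations


def extract_cvss_score(entries: list[dict]) -> float | None:
    """Return the highest CVSS v3 base score, falling back to any scored entry."""
    def scores(only_v3: bool) -> list[float]:
        return [
            float(e["base_score"])
            for e in entries
            if e.get("base_score") is not None
            and (not only_v3 or e.get("type") == "CVSS_V3")
        ]

    v3_scores = scores(only_v3=True)
    if v3_scores:
        return max(v3_scores)
    any_scores = scores(only_v3=False)
    return max(any_scores) if any_scores else None
```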

#### Message Construction

Finding messages include:
- Package name and version
- Vulnerability ID (preferred CVE alias)
- Fix hint (if available)
- Truncated description (max 200 characters)
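The parts listed above could be assembled along these lines. This is only a sketch: the wording of the message, the `build_message` name, and the `...` truncation marker are assumptions, not the detector's actual output.

```python
def build_message(
    package: str,
    version: str,
    vuln_id: str,
    aliases: list[str],
    fix_versions: list[str],
    description: str,
    max_len: int = 200,
) -> str:
    """Assemble a finding message from package, ID, fix hint, and description."""
    # Prefer a CVE alias for display when one exists.
    display_id = next((a for a in aliases if a.startswith("CVE-")), vuln_id)
    message = f"{package} {version} is affected by {display_id}"
    if fix_versions:
        message += f" (fix available: upgrade to {', '.join(fix_versions)})"
    if description:
        short = description if len(description) <= max_len else description[:max_len].rstrip() + "..."
        message += f": {short}"
    return message
```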

### Safety Detector

The Safety detector handles both Safety v1 and v2 JSON output formats, which differ significantly in structure.

#### Supported Output Formats

| Format | Version | Structure | Finding Count |
|--------|---------|-----------|----------------|
| Format A | Safety ≥2.0 | `{"vulnerabilities": [...]}` | One per entry |
| Format B | Safety <2.0 | `[...]` (list of lists) | One per 5-element list |

[Source: packages/scanner/scanner/detectors/safety.py:1-50]()
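Distinguishing the two layouts can be sketched as a simple type check on the parsed JSON. The `safety_entries` helper is hypothetical, and discarding lists that do not have exactly five elements is an assumption based on the table above.

```python
from typing import Any


def safety_entries(payload: Any) -> list[Any]:
    """Return one raw entry per finding for either Safety JSON layout."""
    if isinstance(payload, dict):
        # Format A (Safety >= 2.0): findings live under "vulnerabilities".
        return list(payload.get("vulnerabilities", []))
    if isinstance(payload, list):
        # Format B (Safety < 2.0): keep only well-formed 5-element lists.
        return [item for item in payload if isinstance(item, list) and len(item) == 5]
    return []
```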

#### Entry Parsing (v2 Format)

For Safety v2, each vulnerability entry must contain:

```python
required_fields = ["vulnerability_id", "package_name", "analyzed_version"]
```

Entries missing required fields are skipped with a warning. [Source: packages/scanner/scanner/detectors/safety.py:50-80]()

#### Parsed Fields

| Field | Extraction | Notes |
|-------|------------|-------|
| `vuln_id` | `entry["vulnerability_id"]` | str conversion |
| `package_name` | `entry["package_name"]` | str conversion |
| `installed_version` | `entry["analyzed_version"]` | str conversion |
| `advisory` | `entry.get("advisory")` | Human-readable text |
| `cve` | `entry.get("CVE")` | CVE identifier (preferred display) |
| `fix_versions` | `entry.get("fixed_versions", [])` | List of safe versions |

[Source: packages/scanner/scanner/detectors/safety.py:80-120]()

#### Severity Mapping

CVSS scores are converted to severity levels:

```python
def _cvss_to_severity(cvss_score: float | None) -> str:
    if cvss_score is None:
        return "MEDIUM"  # Default
    elif cvss_score >= 9.0:
        return "CRITICAL"
    elif cvss_score >= 7.0:
        return "HIGH"
    elif cvss_score >= 4.0:
        return "MEDIUM"
    else:
        return "LOW"
```

## Finding Data Models

### RawFinding

All detectors produce `RawFinding` objects with this structure:

```python
@dataclass
class RawFinding:
    tool: str           # "bandit"|"semgrep"|"secrets"|"pip-audit"|"safety"
    rule_id: str        # Scanner-specific rule identifier
    file: str           # Path to file containing the finding
    line: int           # Line number (0 for dependency findings)
    severity: str       # CRITICAL|HIGH|MEDIUM|LOW|INFO
    message: str        # Human-readable description
    code_snippet: str   # Relevant code (redacted for secrets)
    metadata: dict      # Tool-specific additional data
```

### NormalizedFinding

After normalization, findings conform to the canonical `NormalizedFinding` model:

```python
@dataclass
class NormalizedFinding:
    id: str                          # SHA-256: sha256(tool+file+line+rule_id)[:16]
    tool: str                        # Scanner identifier
    rule_id: str                     # Normalized rule identifier
    cwe: list[str]                   # CWE identifiers (e.g., ["CWE-89"])
    owasp: list[str]                 # OWASP categories (e.g., ["A03:2021"])
    severity: Severity               # Enum: CRITICAL|HIGH|MEDIUM|LOW|INFO
    confidence: Confidence           # Enum: HIGH|MEDIUM|LOW
    file: str                        # File path
    line_start: int                  # Start line
    line_end: int                    # End line
    code_snippet: str                # Redacted code snippet
    message: str                     # Finding message
    first_seen: datetime             # Timestamp (before defaulted fields so the dataclass is valid)
    fix_available: bool = False      # Whether a fix exists
    suppressed: bool = False         # Whether suppressed
```

[Source: packages/normalizer/models.py:1-50]()

## Extensibility

Adding a new detector follows a consistent pattern:

1. **Create a new module** in `packages/scanner/detectors/`
2. **Implement the detector class** with `_scan()` method
3. **Return `RawFinding` objects** for each finding
4. **Include metadata** for downstream normalization

### Detector Interface Pattern

```python
class BaseDetector:
    def _scan(self, target: Path) -> list[RawFinding]:
        """Main scanning entry point - override in subclasses."""
        raise NotImplementedError
    
    def _iter_files(self, root: Path) -> Iterator[Path]:
        """File iteration with exclusions - reusable utility."""
        ...
```
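To make the pattern concrete, here is a toy detector written against the interface above. Everything in it is hypothetical: `TlsVerifyDetector`, its rule ID, and the chosen severity are invented for illustration, and `RawFinding` is redefined locally so the sketch runs on its own.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any


@dataclass
class RawFinding:  # local copy mirroring the fields listed in this document
    tool: str
    rule_id: str
    file: str
    line: int
    severity: str
    message: str
    code_snippet: str
    metadata: dict[str, Any] = field(default_factory=dict)


class TlsVerifyDetector:
    """Hypothetical detector that flags `verify=False` in Python files."""

    def _scan(self, target: Path) -> list[RawFinding]:
        findings: list[RawFinding] = []
        for path in sorted(target.rglob("*.py")):
            text = path.read_text(encoding="utf-8", errors="ignore")
            for number, line in enumerate(text.splitlines(), start=1):
                if "verify=False" in line:
                    findings.append(RawFinding(
                        tool="tls-verify",
                        rule_id="tls-verify-disabled",
                        file=str(path),
                        line=number,
                        severity="MEDIUM",
                        message="TLS certificate verification is disabled",
                        code_snippet=line.strip(),
                    ))
        return findings
```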

## Severity Classification

| Badge | Level | Color | Use Case |
|-------|-------|-------|----------|
| 🔴 | `CRITICAL` | Bold red | Hardcoded secrets, RCE, auth bypass |
| 🟠 | `HIGH` | Orange | SQL injection, command injection, insecure deserialization |
| 🟡 | `MEDIUM` | Yellow | XSS, weak crypto, path traversal |
| 🔵 | `LOW` | Blue | Insecure defaults, minor misconfigurations |
| ⚪ | `INFO` | Grey | Style issues, informational notes |

## Output Integration

Detectors feed into Velonus's output formatters:

| Format | Use Case | Consumer |
|--------|----------|----------|
| `terminal` | Interactive scanning | CLI users |
| `json` | Piping to tools | CI/CD pipelines, scripting |
| `sarif` | GitHub Security tab | GitHub integration |

The scanner exits with code `1` when CRITICAL or HIGH findings are detected, enabling use as a CI gate.

---

<a id='output-formats'></a>

## Output Formats

### Related Pages

Related topics: [CLI Components](#cli-components), [Scanner Pipeline](#scanner-pipeline)

<details>
<summary>Relevant Source Files</summary>

The following source files were used to generate this page:

- [apps/cli/shield/formatters/__init__.py](https://github.com/AliAmmar15/Velonus/blob/main/apps/cli/shield/formatters/__init__.py)
- [apps/cli/shield/formatters/sarif.py](https://github.com/AliAmmar15/Velonus/blob/main/apps/cli/shield/formatters/sarif.py)
- [apps/cli/shield/core/output.py](https://github.com/AliAmmar15/Velonus/blob/main/apps/cli/shield/core/output.py)
- [packages/scanner/scanner/detectors/pip_audit.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/scanner/scanner/detectors/pip_audit.py)
- [packages/scanner/scanner/detectors/semgrep.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/scanner/scanner/detectors/semgrep.py)
- [packages/scanner/scanner/detectors/safety.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/scanner/scanner/detectors/safety.py)
- [packages/scanner/detectors/secrets.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/scanner/detectors/secrets.py)
</details>

# Output Formats

Velonus supports multiple output formats for scan results, enabling integration with various security tooling, CI/CD pipelines, and developer workflows. Each format serves a specific use case and targets a different audience.

## Overview

The output format system transforms `NormalizedFinding` objects, produced by normalizing each detector's `RawFinding` output, into format-specific representations. This abstraction lets Velonus aggregate results from multiple security scanners (Bandit, Semgrep, pip-audit, Safety, TruffleHog) and present them consistently regardless of the underlying tool.

Source: [apps/cli/shield/formatters/__init__.py:1]()

## Architecture

```mermaid
graph TD
    A[Scan Command] --> B[Scanner Pipeline]
    B --> C[RawFinding Objects]
    C --> D[Normalizer]
    D --> E[NormalizedFinding]
    E --> F[Output Formatters]
    
    F --> G[Terminal Formatter]
    F --> H[JSON Formatter]
    F --> I[SARIF Formatter]
    
    G --> J[Rich Console Output]
    H --> K[JSON File/Stream]
    I --> L[SARIF 2.1.0 File]
```

## Supported Formats

| Format | Command Flag | Use Case | Phase |
|--------|-------------|----------|-------|
| `terminal` | `--format terminal` | Interactive scanning | Phase 0 |
| `json` | `--format json` | Piping to tools, storage | Phase 0 |
| `sarif` | `--format sarif` | GitHub Security tab, VS Code | Phase 1 |

Source: [apps/cli/README.md:60-65]()

## Terminal Format

Terminal is the default output format. It uses the Rich library to render colored tables with severity badges.

### Severity Color Scheme

The terminal formatter uses a consistent color system defined in `output.py`:

| Severity | Color | Badge | Description |
|----------|-------|-------|-------------|
| `CRITICAL` | Bold red | 🔴 | Hardcoded secrets, RCE, auth bypass |
| `HIGH` | Dark orange | 🟠 | SQL injection, command injection |
| `MEDIUM` | Yellow | 🟡 | XSS, weak crypto, path traversal |
| `LOW` | Steel blue | 🔵 | Insecure defaults, minor issues |
| `INFO` | Grey | ⚪ | Style issues, informational notes |

Source: [apps/cli/shield/core/output.py:18-29]()

### Output Table Columns

| Column | Description |
|--------|-------------|
| Severity | Emoji badge and severity level |
| Tool | Source scanner (bandit, semgrep, pip-audit, etc.) |
| File | Path to affected file |
| Line | Line number of finding |
| Rule | Rule ID or check name |
| Message | Human-readable finding description |

### Terminal Usage

```bash
# Default terminal output
velonus scan ./

# Explicit terminal format
velonus scan ./ --format terminal

# Filter by severity
velonus scan ./ --severity high
```

## JSON Format

The JSON format produces machine-readable output suitable for piping into other tools or storing results.

### JSON Output Structure

```bash
velonus scan ./ --format json
```

Produces a JSON array of findings with the following schema:

| Field | Type | Description |
|-------|------|-------------|
| `tool` | string | Source scanner name |
| `rule_id` | string | Unique identifier for the rule |
| `file` | string | Path to affected file |
| `line` | integer | Line number (0 for dependency issues) |
| `severity` | string | CRITICAL, HIGH, MEDIUM, LOW, INFO |
| `message` | string | Human-readable description |
| `code_snippet` | string | Relevant source code (may be redacted) |
| `metadata` | object | Additional scanner-specific data |

Source: [packages/scanner/scanner/detectors/pip_audit.py:40-52]()

### JSON Usage Examples

```bash
# Pretty-print JSON
velonus scan ./ --format json | python -m json.tool

# Save to file
velonus scan ./ --format json > scan-results.json

# Filter by severity
velonus scan ./ --format json --severity high > findings.json
```
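Downstream tooling can consume this output with a few lines of Python. The sketch below assumes the JSON is an array of objects with the fields from the schema table above; the `at_least` helper and the sample data are invented for illustration.

```python
import json

# Severity levels from lowest to highest, as named in this document.
SEVERITY_ORDER = ["INFO", "LOW", "MEDIUM", "HIGH", "CRITICAL"]


def at_least(findings: list[dict], minimum: str) -> list[dict]:
    """Keep findings whose severity is at or above the given minimum."""
    floor = SEVERITY_ORDER.index(minimum.upper())
    return [f for f in findings if SEVERITY_ORDER.index(f["severity"].upper()) >= floor]


# Sample payload shaped like the schema above (not real scan output).
sample = """[
  {"tool": "bandit", "rule_id": "B602", "file": "run.py", "line": 12,
   "severity": "HIGH", "message": "shell=True", "code_snippet": "", "metadata": {}},
  {"tool": "semgrep", "rule_id": "style", "file": "run.py", "line": 3,
   "severity": "INFO", "message": "note", "code_snippet": "", "metadata": {}}
]"""

findings = json.loads(sample)
blocking = at_least(findings, "high")
```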

## SARIF Format

The SARIF format emits Static Analysis Results Interchange Format (SARIF) 2.1.0 output for compatibility with security tooling.

### SARIF Specification

| Property | Value |
|----------|-------|
| Schema Version | 2.1.0 |
| Specification URL | https://docs.oasis-open.org/sarif/sarif/v2.1.0/sarif-v2.1.0.html |
| Version Constant | `0.1.0` |

Source: [apps/cli/shield/formatters/sarif.py:38-41]()

### Severity Mapping

| Velonus Severity | SARIF Level |
|-----------------|-------------|
| CRITICAL | `error` |
| HIGH | `error` |
| MEDIUM | `warning` |
| LOW | `note` |
| INFO | `note` |

Source: [apps/cli/shield/formatters/sarif.py:48-54]()
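A minimal sketch of applying this mapping when building a SARIF `result` object follows. The helper name `to_sarif_result`, the input dictionary layout, and the `warning` fallback for unknown severities are assumptions; the `ruleId`, `level`, `message.text`, and `physicalLocation` properties are standard SARIF 2.1.0 fields.

```python
from typing import Any

# Mapping taken from the table above.
SARIF_LEVELS = {
    "CRITICAL": "error",
    "HIGH": "error",
    "MEDIUM": "warning",
    "LOW": "note",
    "INFO": "note",
}


def to_sarif_result(finding: dict[str, Any]) -> dict[str, Any]:
    """Build a SARIF result object from a finding-like dictionary."""
    physical: dict[str, Any] = {"artifactLocation": {"uri": finding["file"]}}
    # SARIF line numbers start at 1, so file-level findings (line 0) get no region.
    if finding.get("line", 0) >= 1:
        physical["region"] = {"startLine": finding["line"]}
    return {
        "ruleId": finding["rule_id"],
        "level": SARIF_LEVELS.get(finding["severity"].upper(), "warning"),
        "message": {"text": finding["message"]},
        "locations": [{"physicalLocation": physical}],
    }
```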

### Rule ID Transformation

SARIF rule IDs are converted to PascalCase display names:

```text
"generic-api-key"           → "GenericApiKey"
"secrets/aws-access-key-id" → "AwsAccessKeyId"
```

Source: [apps/cli/shield/formatters/sarif.py:59-60]()
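A conversion consistent with both examples can be sketched as below. The function name `rule_display_name` is hypothetical, and dropping any path prefix before the last `/` is inferred from the second example rather than confirmed by the source.

```python
import re


def rule_display_name(rule_id: str) -> str:
    """Convert a rule ID such as 'secrets/aws-access-key-id' to PascalCase."""
    last_segment = rule_id.rsplit("/", 1)[-1]
    # Split on any run of non-alphanumeric characters (hyphens, underscores, dots).
    words = [w for w in re.split(r"[^A-Za-z0-9]+", last_segment) if w]
    return "".join(w[:1].upper() + w[1:] for w in words)
```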

### Directory URI Handling

Per SARIF spec §3.14.14, directory URIs must end with a trailing slash:

```python
uri = path.as_uri()
return uri if uri.endswith("/") else uri + "/"
```

Source: [apps/cli/shield/formatters/sarif.py:23-24]()
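Wrapped in a complete function, the fragment above might look like this. The name `directory_uri` and the call to `resolve()` are assumptions; `Path.as_uri()` requires an absolute path, so the sketch resolves the path first.

```python
from pathlib import Path


def directory_uri(path: Path) -> str:
    """Return a file URI for a directory, always ending with a trailing slash."""
    uri = path.resolve().as_uri()  # as_uri() raises ValueError on relative paths
    return uri if uri.endswith("/") else uri + "/"
```

The trailing slash matters when the URI is used as a base: without it, relative artifact paths resolve against the parent directory instead.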

### SARIF Integration Points

```mermaid
graph LR
    A[Velonus Scan] --> B[SARIF File]
    B --> C[GitHub Security Tab]
    B --> D[VS Code SARIF Viewer]
    B --> E[Other SARIF Tools]
```

### SARIF Usage

```bash
# Default SARIF output
velonus scan ./ --format sarif

# Custom output path
velonus scan ./ --format sarif -o results/velonus.sarif

# Severity filtering (recommended for SARIF)
velonus scan ./ --format sarif --severity high -o findings.sarif
```

### GitHub Actions Integration

```yaml
- name: Velonus security scan
  run: velonus scan . --format sarif -o velonus-results.sarif

- name: Upload to GitHub Security tab
  uses: github/codeql-action/upload-sarif@v4
  with:
    sarif_file: velonus-results.sarif
```

Source: [README.md:45-50]()

## Finding Data Model

### RawFinding Structure

Each scanner produces `RawFinding` objects that are normalized before formatting:

```python
RawFinding(
    tool="pip-audit",
    rule_id=vuln_id,
    file=attribution_path,
    line=0,  # Dependency findings are file-level
    severity=severity,
    message=message,
    code_snippet="",
    metadata={
        "package_name": package_name,
        "package_version": package_version,
        "cvss_score": cvss_score,
        "fix_available": bool(fix_versions),
    },
)
```

Source: [packages/scanner/scanner/detectors/pip_audit.py:40-52]()

### Metadata Fields by Scanner

| Scanner | Unique Metadata Fields |
|---------|----------------------|
| pip-audit | `package_name`, `package_version`, `aliases`, `fix_versions`, `cvss_score`, `cwe`, `owasp`, `fix_available` |
| safety | `vulnerable_spec`, `analysis`, `published_date`, `fixed_versions`, `advisory` |
| semgrep | `cwe`, `owasp`, `confidence`, `rule_short` |
| secrets | `detector`, `verified`, `decoder`, `entropy` |

Source: [packages/scanner/scanner/detectors/semgrep.py:25-35]()

## CLI Options

### Global Format Options

| Option | Short | Default | Description |
|--------|-------|---------|-------------|
| `--format` | `-f` | `terminal` | Output format: `terminal`, `json`, `sarif` |
| `--severity` | `-s` | `info` | Minimum severity: `critical`, `high`, `medium`, `low`, `info` |
| `--output` | `-o` | (stdout) | Output file path (for SARIF/JSON) |
| `--verbose` | `-v` | off | Show resolved target path and extra detail |

Source: [apps/cli/README.md:61-66]()

### Exit Codes

| Code | Meaning |
|------|---------|
| `0` | Scan completed, no HIGH or CRITICAL findings |
| `1` | Scan completed, one or more HIGH or CRITICAL findings found |

Source: [apps/cli/README.md:97-98]()

## Format Selection Guide

```mermaid
graph TD
    A[Choose Output Format] --> B{Use Case}
    
    B -->|Interactive scanning| C[terminal]
    B -->|CI/CD pipeline| D{Hosting Platform}
    B -->|Tooling integration| E[json]
    
    D -->|GitHub| F[sarif]
    D -->|GitLab| G[json]
    D -->|Azure DevOps| H[sarif]
    
    C --> I[Rich colored tables]
    F --> J[GitHub Security tab]
    E --> K[JSON API processing]
```

### Quick Reference

| Scenario | Recommended Format |
|----------|-------------------|
| Local development | `terminal` (default) |
| Pre-commit hooks | `terminal --severity high` |
| CI gate (GitHub) | `sarif` |
| Log aggregation | `json` |
| Custom tooling | `json` |
| Security dashboards | `sarif` |

---

<a id='ai-engine'></a>

## AI Engine

### Related Pages

Related topics: [API Backend](#api-backend), [Scanner Pipeline](#scanner-pipeline)

<details>
<summary>Relevant Source Files</summary>

The following source files were used to generate this page:

- [packages/ai-engine/ai_engine/__init__.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/ai-engine/ai_engine/__init__.py)
- [packages/ai-engine/context_engine.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/ai-engine/context_engine.py)
- [packages/ai-engine/remediation_engine.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/ai-engine/remediation_engine.py)
- [packages/ai-engine/prompts.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/ai-engine/prompts.py)
- [packages/ai-engine/cache.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/ai-engine/cache.py)
- [apps/api/shield_api/services/ai_service.py](https://github.com/AliAmmar15/Velonus/blob/main/apps/api/shield_api/services/ai_service.py)
</details>

# AI Engine

The **AI Engine** is a core component of Velonus (Phase 2) that provides intelligent prioritization, exploitability scoring, and automated fix generation for security findings. It processes normalized findings from the scanner pipeline and applies AI-driven analysis to help developers focus on the most critical issues first.

## Architecture Overview

The AI Engine is structured as a modular system with three primary sub-engines:

| Component | Purpose |
|-----------|---------|
| **Context Engine** | Enriches findings with contextual information about the codebase |
| **Remediation Engine** | Generates actionable fix suggestions for identified vulnerabilities |
| **AI Service** | Handles API communication and LLM integration |

```mermaid
graph TD
    A[NormalizedFindings] --> B[Context Engine]
    B --> C[Enriched Findings]
    C --> D[Remediation Engine]
    D --> E[Fix Suggestions]
    C --> F[Exploitability Scoring]
    F --> G[Priority Ranking]
    G --> H[Dashboard / CLI Output]
```

Source: [packages/ai-engine/context_engine.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/ai-engine/context_engine.py)

## Core Sub-Engines

### Context Engine

The Context Engine analyzes security findings in their broader codebase context to determine:

- **Reachability analysis** — Is the vulnerable code actually executed?
- **Data flow analysis** — Does user input reach the vulnerable code path?
- **Dependency context** — Are there mitigating factors in dependencies?

Source: [packages/ai-engine/context_engine.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/ai-engine/context_engine.py)

### Remediation Engine

The Remediation Engine generates specific, actionable fix recommendations based on:

- The vulnerability type and severity
- The affected code location
- Project-specific patterns and conventions

Source: [packages/ai-engine/remediation_engine.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/ai-engine/remediation_engine.py)

### AI Service

Located in the API layer, the AI Service handles:

- LLM provider integration
- Request batching and rate limiting
- Response parsing and validation

Source: [apps/api/shield_api/services/ai_service.py](https://github.com/AliAmmar15/Velonus/blob/main/apps/api/shield_api/services/ai_service.py)

## Prompt Engineering

The AI Engine uses carefully crafted prompts stored in `prompts.py` to guide the LLM in:

| Task | Prompt Purpose |
|------|----------------|
| Exploitability Assessment | Determine if a vulnerability can be exploited in the given context |
| Fix Generation | Generate code patches that resolve the vulnerability |
| Impact Analysis | Evaluate the potential blast radius of a vulnerability |

Source: [packages/ai-engine/prompts.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/ai-engine/prompts.py)

## Caching Strategy

To optimize performance and reduce API costs, the AI Engine implements a caching layer:

```mermaid
graph LR
    A[Finding Hash] --> B{Cache Lookup}
    B -->|Hit| C[Return Cached Result]
    B -->|Miss| D[Call LLM API]
    D --> E[Store in Cache]
    E --> C
```

Cache entries are keyed by:
- Finding hash (vulnerability type + file + line)
- Code context snippet
- Project fingerprint

Source: [packages/ai-engine/cache.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/ai-engine/cache.py)
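A composite key over the three components listed above could be derived roughly as follows. This is a minimal sketch, assuming SHA-256 hashing; the helper name `make_cache_key` and its parameters are hypothetical and are not taken from `cache.py`.

```python
import hashlib


def make_cache_key(
    rule_id: str,
    file: str,
    line: int,
    code_snippet: str,
    project_fingerprint: str,
) -> str:
    """Build a deterministic cache key (hypothetical helper, not from cache.py)."""
    # Finding hash: vulnerability type + file + line.
    finding_hash = hashlib.sha256(f"{rule_id}:{file}:{line}".encode()).hexdigest()
    # Hash the snippet so the key stays short and changes when the code changes.
    snippet_hash = hashlib.sha256(code_snippet.encode()).hexdigest()
    # Join the three components with a NUL separator to avoid ambiguous joins.
    combined = "\0".join([finding_hash, snippet_hash, project_fingerprint])
    return hashlib.sha256(combined.encode()).hexdigest()
```

With a key of this shape, editing the flagged code or moving it to another line produces a cache miss, so stale analysis is not reused.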

## Exploitability Scoring

The AI Engine assigns exploitability scores (1-10) based on:

| Factor | Weight | Description |
|--------|--------|-------------|
| Reachability | 30% | Is the code path executable? |
| Attack Surface | 25% | Is user-controlled input present? |
| Preconditions | 20% | What conditions must be met? |
| Impact | 25% | What is the potential damage? |

Source: [packages/ai-engine/context_engine.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/ai-engine/context_engine.py)
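The weights in the table can be read as a weighted sum. The sketch below assumes each factor is first rated between 0 and 1 and that the result is rescaled linearly onto the 1-10 range; both the rating scale and the rescaling are assumptions for illustration, and the function name `exploitability_score` is hypothetical.

```python
# Weights from the table above; the dictionary keys are illustrative names.
WEIGHTS = {
    "reachability": 0.30,
    "attack_surface": 0.25,
    "preconditions": 0.20,
    "impact": 0.25,
}


def exploitability_score(factors: dict[str, float]) -> float:
    """Combine per-factor ratings in [0, 1] into a 1-10 score (assumed mapping)."""
    for name in WEIGHTS:
        value = factors[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    weighted = sum(WEIGHTS[name] * factors[name] for name in WEIGHTS)
    # Linearly rescale the weighted sum from [0, 1] onto the 1-10 range.
    return round(1 + 9 * weighted, 1)
```

Because the weights sum to 100%, a finding rated at the maximum on every factor lands at 10 and one rated at the minimum lands at 1.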

## Integration with Scanner Pipeline

The AI Engine receives input from the [ScanPipeline](https://github.com/AliAmmar15/Velonus/blob/main/packages/scanner/scanner/pipeline.py) after findings have been normalized:

```python
# Pipeline execution order (from packages/scanner/scanner/pipeline.py)
# 0 = secrets, 1 = bandit, 2 = semgrep, 3 = pip-audit, 4 = safety
```

Normalized findings flow into the AI Engine, which:
1. Deduplicates findings using the [DeduplicationFilter](https://github.com/AliAmmar15/Velonus/blob/main/packages/scanner/scanner/pipeline.py)
2. Enriches each finding with AI-generated context
3. Ranks findings by priority
4. Generates remediation suggestions

## Configuration

The AI Engine is configured via the CLI configuration system (`velonus config set`):

| Config Key | Default | Description |
|------------|---------|-------------|
| `ai_provider` | `openai` | LLM provider (openai, anthropic) |
| `ai_model` | `gpt-4` | Model to use for analysis |
| `ai_temperature` | `0.3` | Sampling temperature for fix generation (lower values give more deterministic output) |
| `cache_enabled` | `true` | Enable/disable result caching |
| `max_fixes_per_finding` | `3` | Maximum fix suggestions per vulnerability |

Source: [packages/ai-engine/ai_engine/__init__.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/ai-engine/ai_engine/__init__.py)

## CLI Integration

The AI Engine features are accessible through Phase 2 CLI commands:

```bash
# View AI-generated context for findings
velonus scan ./ --ai-context

# Generate fix suggestions
velonus fix <finding-id>

# Run with AI prioritization
velonus scan ./ --ai-prioritize
```

Source: [apps/cli/README.md](https://github.com/AliAmmar15/Velonus/blob/main/apps/cli/README.md)

## Development Status

| Phase | Status | Components |
|-------|--------|------------|
| Phase 0 | ✅ Done | CLI skeleton, Rich output, NormalizedFinding model |
| Phase 1 | ✅ Done | Real secret detection, Bandit, Semgrep, pip-audit, SARIF |
| **Phase 2** | 🔨 **Building** | AI context engine, exploitability scoring, fix generation |
| Phase 3 | 🔜 Planned | GitHub PR inline review comments, one-click fix suggestions |
| Phase 4 | 🔜 Planned | Web UI, scan history, finding trends |

Source: [README.md](https://github.com/AliAmmar15/Velonus/blob/main/README.md)

## API Endpoints

The AI Engine exposes REST endpoints via the API service:

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/v1/ai/enrich` | POST | Enrich findings with AI context |
| `/api/v1/ai/score` | POST | Calculate exploitability scores |
| `/api/v1/ai/remediate` | POST | Generate fix suggestions |

Source: [apps/api/shield_api/services/ai_service.py](https://github.com/AliAmmar15/Velonus/blob/main/apps/api/shield_api/services/ai_service.py)

## Security Considerations

The AI Engine handles sensitive data and implements the following safeguards:

- **No code exfiltration** — Source code is only processed locally or sent to configured LLM providers
- **Input sanitization** — All prompts are sanitized before sending to LLM
- **Audit logging** — All AI requests are logged for compliance
- **Cache encryption** — Cached results are encrypted at rest

Source: [packages/ai-engine/cache.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/ai-engine/cache.py)

---

<a id='github-pr-reviewer'></a>

## GitHub PR Reviewer

### Related Pages

Related topics: [API Backend](#api-backend), [Scanner Pipeline](#scanner-pipeline)

<details>
<summary>Related source files</summary>

The following source files were used to generate this page:

- [CONTRIBUTING.md](https://github.com/AliAmmar15/Velonus/blob/main/CONTRIBUTING.md)
- [README.md](https://github.com/AliAmmar15/Velonus/blob/main/README.md)
- [apps/cli/README.md](https://github.com/AliAmmar15/Velonus/blob/main/apps/cli/README.md)
- [packages/scanner/scanner/detectors/secrets.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/scanner/scanner/detectors/secrets.py)
- [packages/scanner/scanner/detectors/pip_audit.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/scanner/scanner/detectors/pip_audit.py)
- [packages/scanner/scanner/detectors/semgrep.py](https://github.com/AliAmmar15/Velonus/blob/main/packages/scanner/scanner/detectors/semgrep.py)
</details>

# GitHub PR Reviewer

## Overview

The GitHub PR Reviewer is a planned feature in the Velonus security scanning platform designed to provide automated code review capabilities directly within GitHub Pull Requests. Based on the project roadmap, this feature is designated as **Phase 3 — GitHub Integration** and is currently marked as **Not Started**.

Source: [README.md:Phase 3]()

The Velonus project is a security scanner designed to detect vulnerabilities, secrets, and security issues in Python codebases. The platform currently supports multiple security scanning tools including Bandit, Semgrep, pip-audit, Safety, and TruffleHog for secret detection.

Source: [README.md:Roadmap]()

## Project Architecture

Velonus follows a modular architecture with clear separation between the CLI layer, scanner package, and planned API layer.

```mermaid
graph TD
    A[velonus scan CLI] --> B[packages/scanner]
    B --> C[Detectors]
    C --> D[Secrets Scanner]
    C --> E[Bandit]
    C --> F[Semgrep]
    C --> G[pip-audit]
    C --> H[Safety]
    B --> I[Normalizer]
    I --> J[Formatted Output]
    J --> K[Terminal]
    J --> L[JSON]
    J --> M[SARIF]
    N[Planned: GitHub API] -.-> O[PR Reviewer]
```

Sources: [README.md:Architecture Overview](), [apps/cli/README.md:Commands]()

## Current Scanner Pipeline

The existing scanner pipeline forms the foundation upon which GitHub PR Reviewer will build. The scanner orchestrates multiple security tools and normalizes their findings into a unified format.

### Supported Detectors

| Detector | Purpose | Severity Levels |
|----------|---------|-----------------|
| **Secrets Scanner** | Detects hardcoded credentials, API keys, and high-entropy strings | CRITICAL |
| **Bandit** | Static security analysis for Python | CRITICAL, HIGH, MEDIUM, LOW |
| **Semgrep** | Rule-based pattern matching (config: `p/python`) | Multiple |
| **pip-audit** | Python dependency vulnerability scanning | Based on CVSS score |
| **Safety** | Additional dependency checking | Based on CVSS score |

Sources: [packages/scanner/scanner/detectors/secrets.py:Entropy Scanner](), [packages/scanner/scanner/detectors/semgrep.py:RULESET Definition]()

### Output Formats

The current implementation supports three output formats that GitHub PR Reviewer can leverage:

| Format | Use Case | Status |
|--------|----------|--------|
| `terminal` | Interactive display with Rich tables | Default |
| `json` | Programmatic consumption, piping | Available |
| `sarif` | GitHub Code Scanning, VS Code SARIF Viewer | Phase 1 |

Source: [apps/cli/README.md:Output Formats]()

The SARIF format is particularly relevant for future GitHub integration, as it provides compatibility with GitHub's security tab through the `github/codeql-action/upload-sarif` action.

```yaml
- name: Velonus security scan
  run: velonus scan . --sarif -o velonus-results.sarif

- name: Upload to GitHub Security tab
  uses: github/codeql-action/upload-sarif@v4
  with:
    sarif_file: velonus-results.sarif
```

Source: [README.md:CI Integration]()

## Planned GitHub PR Reviewer Features

Based on the project roadmap and existing architecture patterns, the GitHub PR Reviewer is expected to provide the following capabilities:

### Inline Review Comments

The primary feature will be posting inline comments on pull requests directly where security issues are detected. This follows the pattern of existing GitHub code scanning integrations.

| Feature | Description |
|---------|-------------|
| **Inline Comments** | Post findings as PR review comments with file paths and line numbers |
| **One-Click Fixes** | Suggest code changes to remediate vulnerabilities |
| **Status Checks** | Integration with GitHub's required status check system |

Source: [README.md:Phase 3 — GitHub Integration]()

### Severity-Based Filtering

The existing `--severity` filtering mechanism is expected to determine which findings block a pull request:

| Severity | Color | CI Behavior |
|----------|-------|-------------|
| 🔴 CRITICAL | Bold red | Blocks merge |
| 🟠 HIGH | Orange | Blocks merge |
| 🟡 MEDIUM | Yellow | Warning only |
| 🔵 LOW | Blue | Informational |
| ⚪ INFO | Grey | Informational |

Source: [apps/cli/README.md:Severity Levels]()

The CLI already exits with code `1` when CRITICAL or HIGH findings are detected, providing a natural CI gate mechanism.
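The gate described above amounts to a simple check over the severities of all findings. The following is an illustrative sketch only; `exit_code_for` and `BLOCKING_SEVERITIES` are hypothetical names, not identifiers from the Velonus CLI.

```python
# Severities that fail the build, per the table above.
BLOCKING_SEVERITIES = {"CRITICAL", "HIGH"}


def exit_code_for(severities: list[str]) -> int:
    """Return 1 if any finding is CRITICAL or HIGH, otherwise 0 (illustrative)."""
    blocked = any(s.upper() in BLOCKING_SEVERITIES for s in severities)
    return 1 if blocked else 0
```

A CI runner treats any non-zero exit code as a failed step, which is what lets a required status check stop the merge.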

### Finding Metadata Structure

Detectors emit `RawFinding` records such as the one below, which feed the normalizer. These fields are the data the GitHub PR Reviewer is expected to consume:

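```python
# Illustrative shape of a finding as built in the secrets detector;
# path, line_num, message, and snippet come from the surrounding scan loop.
RawFinding(
    tool="secrets",                  # Source tool
    rule_id="high-entropy-secret",   # Finding type
    file=str(path),                  # File path
    line=line_num,                   # Line number
    severity="CRITICAL",             # Severity level
    message=message,                 # Human-readable message (str)
    code_snippet=snippet,            # Code context (str)
    metadata={},                     # Additional data
)
```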

Source: [packages/scanner/scanner/detectors/secrets.py:RawFinding Creation]()
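Since Phase 3 is not implemented, the mapping from a finding to a PR comment is still open. As a sketch under stated assumptions, the fields above line up with the payload of GitHub's REST endpoint for creating a pull request review comment (`body`, `commit_id`, `path`, `line`, `side`). The `RawFinding` dataclass here is a stand-in that mirrors the fields shown above, and `to_review_comment` is a hypothetical helper; neither is the project's code.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class RawFinding:
    """Stand-in mirroring the fields shown above; not the project's class."""

    tool: str
    rule_id: str
    file: str
    line: int
    severity: str
    message: str
    code_snippet: str
    metadata: dict[str, Any] = field(default_factory=dict)


def to_review_comment(finding: RawFinding, commit_sha: str) -> dict[str, Any]:
    """Build the JSON payload for one inline review comment (hypothetical helper)."""
    body = (
        f"**[{finding.severity}] {finding.rule_id}** ({finding.tool})\n\n"
        f"{finding.message}"
    )
    return {
        "body": body,
        "commit_id": commit_sha,
        "path": finding.file,
        "line": finding.line,
        "side": "RIGHT",  # Comment on the new version of the file in the diff
    }
```

The sketch only builds the payload. Sending it would additionally require an authenticated request and a `path` that is relative to the repository root.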

## Current GitHub Integration Path

While the full GitHub PR Reviewer is not yet implemented, the project provides alternative GitHub integration methods:

### GitHub Actions Workflow

For CI/CD integration before Phase 3, users can leverage the existing Velonus CLI in GitHub Actions:

```yaml
name: Velonus Security Scan

on: [push, pull_request]

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Install velonus-cli
        run: pip install -e apps/cli
      - name: Run security scan
        run: velonus scan ./ --severity high
```

Source: [apps/cli/README.md:GitHub Actions]()

### Pre-commit Hook

Security scanning can be integrated into the development workflow using pre-commit hooks:

```yaml
repos:
  - repo: local
    hooks:
      - id: velonus-scan
        name: Velonus Security Scan
        entry: velonus scan
        args: ["./", "--severity", "high"]
        language: system
        pass_filenames: false
```

Source: [apps/cli/README.md:Pre-commit hook]()

## Development Guidelines

The project follows strict development standards that will apply to the GitHub PR Reviewer implementation:

### Code Quality Requirements

| Requirement | Tool | Command |
|-------------|------|---------|
| Linting | ruff | `ruff check .` |
| Formatting | ruff | `ruff format .` |
| Type Checking | mypy | `mypy --strict` |

Source: [CONTRIBUTING.md:Code Quality]()

### PR Guidelines

The contributing guidelines establish patterns that GitHub PR Reviewer development must follow:

1. **One feature per PR** — No bundling unrelated changes
2. **Tests required** — Every new component needs unit tests
3. **Small PRs preferred** — Under 400 lines of diff for faster review
4. **Target main** — All changes merge into main branch
5. **No AI-generated placeholder code** — Every function must be functional

Source: [CONTRIBUTING.md:PR Guidelines]()

### Commit Message Format

```
<type>: <short imperative summary>

Types: feat | fix | refactor | test | docs | infra | chore
```

Source: [CONTRIBUTING.md:Commit Message Format]()
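A commit subject in this format can be checked mechanically, for example in a local hook. The sketch below is an illustration, not a tool shipped by the project; `is_valid_subject` is a hypothetical name, and the pattern only checks the type prefix and a non-empty summary.

```python
import re

# Allowed types, copied from the format above.
COMMIT_TYPES = ("feat", "fix", "refactor", "test", "docs", "infra", "chore")

# "<type>: <summary>" where the summary starts with a non-space character.
_SUBJECT_PATTERN = re.compile(rf"^({'|'.join(COMMIT_TYPES)}): \S.*$")


def is_valid_subject(subject: str) -> bool:
    """Return True when the subject line matches '<type>: <summary>'."""
    return _SUBJECT_PATTERN.match(subject) is not None
```

Whether the summary is actually imperative is left to the reviewer, since a regular expression cannot judge that.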

## Project Roadmap

| Phase | Status | Components |
|-------|--------|------------|
| Phase 0 — Foundation | ✅ Done | CLI skeleton, Rich output, NormalizedFinding model |
| Phase 1 — Scanner Pipeline | ✅ Done | Bandit, Semgrep, pip-audit, Safety, SARIF |
| Phase 2 — AI Layer | 🔨 Building | AI prioritization, exploitability scoring, fix generation |
| **Phase 3 — GitHub Integration** | 🔴 Not Started | PR inline review comments, one-click fix suggestions |
| Phase 4 — Dashboard | 🔴 Not Started | Web UI, scan history, finding trends |

Source: [README.md:Roadmap]()

## Dependencies and Requirements

The scanner already includes several dependencies that would support GitHub PR Reviewer functionality:

### Security Scanning Dependencies

- **TruffleHog** — For secrets detection with decoder support
- **Bandit** — Python-specific security issues
- **Semgrep** — Pattern-based security rules
- **pip-audit** — Python dependency vulnerability scanning

### CVSS Integration

The pip-audit and Safety detectors already extract and normalize CVSS scores:

```python
from typing import Any

def _extract_cvss_score(cvss_list: Any) -> float | None:
    """Extract the highest CVSS v3 base score from pip-audit's cvss array."""
    # Returns CVSS:3.1/... formatted scores
```

Source: [packages/scanner/scanner/detectors/pip_audit.py:CVSS Helpers]()
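Once a base score is available, it has to be turned into one of the severity labels used elsewhere on this page. A minimal sketch follows, assuming the standard CVSS v3 qualitative bands (0.1-3.9 low, 4.0-6.9 medium, 7.0-8.9 high, 9.0-10.0 critical). Whether Velonus uses exactly these thresholds is not confirmed by the source, and both the function name `severity_from_cvss` and the `INFO` fallback for unscored findings are assumptions.

```python
from __future__ import annotations


def severity_from_cvss(score: float | None) -> str:
    """Map a CVSS v3 base score to a severity label (assumed thresholds)."""
    if score is None or score <= 0.0:
        return "INFO"  # Assumption: label used when no usable score exists
    if score >= 9.0:
        return "CRITICAL"
    if score >= 7.0:
        return "HIGH"
    if score >= 4.0:
        return "MEDIUM"
    return "LOW"
```

Under this mapping, a dependency advisory scored 7.0 or higher would fall into the severities that fail the CI gate.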

## Summary

The GitHub PR Reviewer is an anticipated Phase 3 feature of the Velonus security scanning platform. While not yet implemented, the foundation is being built through:

1. A robust scanner pipeline with multiple security tool integrations
2. Normalized finding formats suitable for PR comment generation
3. SARIF output for GitHub Security tab compatibility
4. CI/CD integration patterns via GitHub Actions
5. Strict code quality standards for future development

The existing architecture and roadmap indicate that when Phase 3 development begins, the PR Reviewer will leverage the normalized `RawFinding` data structures to generate inline comments on pull requests, provide one-click fix suggestions, and integrate with GitHub's status check system to block merges when critical vulnerabilities are detected.

---


## Doramagic Pitfall Log

Project: aliammar15/velonus

Summary: 8 potential pitfalls found, 0 of which are high/blocking. Highest priority: installation pitfall, source evidence: [bug] Scanner reports test-file findings as production vulnerabilities (no path exclusion).

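## 1. Installation pitfall · Source evidence: [bug] Scanner reports test-file findings as production vulnerabilities (no path exclusion)

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence indicates an unverified installation-related issue in this project: [bug] Scanner reports test-file findings as production vulnerabilities (no path exclusion)
- User impact: May raise the cost of trial and production adoption for new users.
- Suggested check: The source issue is still open; the Pack Agent needs to re-verify whether it still affects the current version.
- Safeguard: Do not escalate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_62cdd9b76ac447b8b11064afda81fb9a | https://github.com/AliAmmar15/Velonus/issues/1 | The source discussion mentions Python-related conditions that should be re-verified before installation or trial.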

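## 2. Capability pitfall · Capability assessment relies on assumptions

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- User impact: If the assumption does not hold, users will not get the promised capability.
- Suggested check: Turn the assumption into a downstream verification checklist.
- Safeguard: Assumptions must be turned into verification items; they must not be stated as fact before verification results exist.
- Evidence: capability.assumptions | hn_item:48143235 | https://news.ycombinator.com/item?id=48143235 | README/documentation is current enough for a first validation pass.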

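## 3. Maintenance pitfall · Maintenance activity unknown

- Severity: medium
- Evidence strength: source_linked
- Finding: last_activity_observed is not recorded.
- User impact: New, abandoned, and active projects become indistinguishable, which lowers recommendation confidence.
- Suggested check: Add signals for recent GitHub commits, releases, and issue/PR responsiveness.
- Safeguard: While maintenance activity is unknown, the recommendation must not be marked as high-trust.
- Evidence: evidence.maintainer_signals | hn_item:48143235 | https://news.ycombinator.com/item?id=48143235 | last_activity_observed missing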

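## 4. Security/permissions pitfall · Downstream validation found risk items

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: Downstream has already requested a review; this must not be downplayed on the page.
- Suggested check: Route to the security/permissions governance review queue.
- Safeguard: While downstream risk exists, the review/recommendation must stay downgraded.
- Evidence: downstream_validation.risk_items | hn_item:48143235 | https://news.ycombinator.com/item?id=48143235 | no_demo; severity=medium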

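## 5. Security/permissions pitfall · Scoring risk present

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: The risk affects whether the project is suitable for ordinary users to install.
- Suggested check: Record the risk in the boundary card and confirm whether manual review is needed.
- Safeguard: Scoring risks must be recorded in the boundary card, not kept only as an internal score.
- Evidence: risks.scoring_risks | hn_item:48143235 | https://news.ycombinator.com/item?id=48143235 | no_demo; severity=medium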

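## 6. Security/permissions pitfall · Source evidence: [bug] Same vulnerability reported twice when detected by both Bandit and Semgrep

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence indicates an unverified security/permissions-related issue in this project: [bug] Same vulnerability reported twice when detected by both Bandit and Semgrep
- User impact: May raise the cost of trial and production adoption for new users.
- Suggested check: The source issue is still open; the Pack Agent needs to re-verify whether it still affects the current version.
- Safeguard: Do not escalate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_3a9b7a6193ee4ea289f4dc4a12f6a0c1 | https://github.com/AliAmmar15/Velonus/issues/2 | The source discussion mentions Python-related conditions that should be re-verified before installation or trial.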

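## 7. Maintenance pitfall · Issue/PR response quality unknown

- Severity: low
- Evidence strength: source_linked
- Finding: issue_or_pr_quality=unknown.
- User impact: Users cannot tell whether anyone will respond when they run into problems.
- Suggested check: Sample recent issues/PRs to determine whether they go unanswered for long periods.
- Safeguard: While issue/PR responsiveness is unknown, a maintenance risk notice is required.
- Evidence: evidence.maintainer_signals | hn_item:48143235 | https://news.ycombinator.com/item?id=48143235 | issue_or_pr_quality=unknown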

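## 8. Maintenance pitfall · Release cadence unclear

- Severity: low
- Evidence strength: source_linked
- Finding: release_recency=unknown.
- User impact: Install commands and documentation may lag behind the code, making it more likely that users hit problems.
- Suggested check: Confirm that the latest release/tag matches the install commands in the README.
- Safeguard: When release cadence is unknown or stale, installation instructions must be flagged as possibly out of date.
- Evidence: evidence.maintainer_signals | hn_item:48143235 | https://news.ycombinator.com/item?id=48143235 | release_recency=unknown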

<!-- canonical_name: aliammar15/velonus; human_manual_source: deepwiki_human_wiki -->
