CSM - Continuous Security Monitor

Security monitoring and response for Linux web servers. Single Go binary that detects compromise, phishing, mail abuse, and suspicious activity - then auto-responds and alerts within seconds.

Originally designed as a full Imunify360 replacement for cPanel/WHM on CloudLinux/AlmaLinux. Also runs on plain Ubuntu/Debian + Nginx/Apache and on plain AlmaLinux/Rocky/RHEL + Apache/Nginx: the daemon auto-detects the OS, control panel, and web server at startup and picks the correct log paths, config candidates, and check set.

Includes nftables firewall (replaces LFD/fail2ban), ModSecurity management, email security, threat intelligence, hardening audit, performance monitoring, and a web dashboard.

See installation.md for supported platforms and how the check set differs between cPanel and non-cPanel hosts.

What CSM Does

csm daemon
 +-- fanotify file monitor         < 1s detection on /home, /tmp, /dev/shm
 +-- inotify log watchers          ~2s detection on auth, access, exim, FTP logs
 +-- PAM brute-force listener      Real-time login failure tracking
 +-- PHP runtime shield            auto_prepend_file protection
 +-- critical scanner (10 min)     Processes, network, tokens, logins, firewall
 +-- deep scanner (60 min)         WP/CMS integrity, package integrity, DB injection, phishing
 +-- nftables firewall engine      Kernel netlink API, IP sets, rate limiting
 +-- threat intelligence           IP reputation, attack scoring, GeoIP
 +-- ModSecurity manager           Rule deployment, overrides, escalation
 +-- email security                AV scanning, quarantine, password/forwarder audit
 +-- challenge server              Proof-of-work pages for suspicious IPs
 +-- alert dispatcher              Email, Slack, Discord, webhooks
 +-- web UI                        HTTPS dashboard with authenticated operator pages
 +-- hardening audit               On-demand server hardening checks + scoring
 +-- performance monitor           PHP, MySQL, Redis, WordPress metrics

Built From Real Incidents

CSM was built after real attacks where GSocket reverse shells, LEVIATHAN webshell toolkits, credential-stuffed cPanel accounts, and phishing kits were found across production servers.

Installation

Supported Platforms

| Platform | Web server | Package | Notes |
|---|---|---|---|
| cPanel/WHM on CloudLinux / AlmaLinux / Rocky | Apache (EA4) or LiteSpeed | .rpm | Primary target. Full cPanel account, WordPress, Exim, and WHM plugin coverage. |
| Plain AlmaLinux / Rocky / RHEL 8+ / CentOS Stream 8+ | Apache (httpd) or Nginx | .rpm | Generic Linux + web server checks. cPanel-specific checks are skipped cleanly. |
| Plain Ubuntu 20.04+ / Debian 11+ | Apache (apache2) or Nginx | .deb | Same as above, with debsums/dpkg --verify in place of rpm -V. |

The daemon auto-detects the OS, control panel (cPanel/Plesk/DirectAdmin/none), and web server (Apache/Nginx/LiteSpeed) at startup. The detected platform is logged at startup as:

[2026-04-10 08:13:37] platform: os=ubuntu/24.04 panel=none webserver=nginx

Check it with journalctl -u csm.service | grep platform: after starting the daemon.

The package repository at mirrors.pidginhost.com/csm/ is the preferred install method for Debian and Ubuntu. Future updates are picked up automatically via apt upgrade, and the repository metadata is GPG-signed, so apt enforces the trust chain on every update.

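# 1. Install the signing key (the keyring directory does not exist by
#    default on Ubuntu 20.04 and Debian 11, so create it first)
sudo install -d -m 0755 /etc/apt/keyrings
curl -fsSL https://mirrors.pidginhost.com/csm/csm-signing.gpg | \
  sudo gpg --dearmor -o /etc/apt/keyrings/csm.gpg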

# 2. Add the repository
echo "deb [signed-by=/etc/apt/keyrings/csm.gpg] https://mirrors.pidginhost.com/csm/deb stable main" | \
  sudo tee /etc/apt/sources.list.d/csm.list

# 3. Install
sudo apt update
sudo apt install csm

Works on Ubuntu 20.04+, Debian 11+, and any derivative. The single stable suite serves all Debian/Ubuntu releases – the Go binary is statically linked and has no per-release glibc dependency.

To upgrade later: sudo apt update && sudo apt upgrade csm.

For AlmaLinux, Rocky, RHEL, CloudLinux, and cPanel-managed hosts, use the DNF repository:

# 1. Import the signing key into the RPM keyring
sudo rpm --import https://mirrors.pidginhost.com/csm/csm-signing.gpg

# 2. Add the repository
sudo tee /etc/yum.repos.d/csm.repo >/dev/null <<'EOF'
[csm]
name=CSM - Continuous Security Monitor
baseurl=https://mirrors.pidginhost.com/csm/rpm/el$releasever/$basearch
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.pidginhost.com/csm/csm-signing.gpg
EOF

# 3. Install
sudo dnf install csm

The explicit rpm --import is important: without it, the first dnf install csm prompts “Is this ok [y/N]:” to trust the repo key, and dnf install -y answers package install prompts but not the key-trust prompt. If the prompt goes unanswered on a non-interactive install, dnf fails with repomd.xml GPG signature verification error: Signing key not found.

The $releasever variable auto-selects the matching EL major (8, 9, or 10). Both x86_64 and aarch64 are published. Works on AlmaLinux 8+, Rocky 8+, RHEL 8+, CloudLinux 8+, and cPanel-managed hosts.

To upgrade later: sudo dnf upgrade csm.

Quick Install (all platforms, one-shot)

For situations where you can’t add a package repository (disconnected hosts, air-gapped mirrors, Docker base images):

curl -sSL https://raw.githubusercontent.com/pidginhost/csm/main/scripts/install.sh | bash

The script auto-detects the hostname and alert email, generates a WebUI auth token, and prompts for confirmation before applying anything. It works on Debian/Ubuntu and RHEL-family distros. Non-interactive mode:

curl -sSL https://raw.githubusercontent.com/pidginhost/csm/main/scripts/install.sh | bash -s -- --email admin@example.com --non-interactive

Manual .rpm / .deb download

If you need a specific version or want to install without adding the repository:

# RHEL family
curl -LO https://github.com/pidginhost/csm/releases/latest/download/csm-VERSION-1.x86_64.rpm
sudo dnf install -y ./csm-VERSION-1.x86_64.rpm

# Debian/Ubuntu
curl -LO https://github.com/pidginhost/csm/releases/latest/download/csm_VERSION_amd64.deb
sudo apt install -y ./csm_VERSION_amd64.deb

Replace VERSION with a real version (e.g. 2.2.2). Both files are also available at https://mirrors.pidginhost.com/csm/deb/pool/main/c/csm/ and https://mirrors.pidginhost.com/csm/rpm/elN/ARCH/ if you prefer to pin versions from the mirror.

Filesystem layout

The package uses FHS paths for config, state, drop-ins, and shipped profiles. Upgrades keep /opt/csm/csm.yaml as a compatibility link for older scripts:

| Concern | Current path |
|---|---|
| Main config | /etc/csm/csm.yaml |
| Legacy config link | /opt/csm/csm.yaml |
| Drop-in fragments | /etc/csm/conf.d/*.yaml |
| State directory | /var/lib/csm/state/ |
| Shipped profiles | /usr/lib/csm/profiles/ |
| Audit log | /var/log/csm/audit.jsonl |
| Binary | /opt/csm/csm |
| Quarantine | /opt/csm/quarantine/ |
| YARA / signature rules | /opt/csm/rules/ |

The systemd unit declares StateDirectory=csm and ConfigurationDirectory=csm so systemd manages permissions for the FHS directories. On upgrade, the package copies a real legacy main config into /etc/csm/csm.yaml when needed and points /opt/csm/csm.yaml at it. On first start the daemon copies a non-empty legacy /opt/csm/state/ into /var/lib/csm/state/ (only when the new directory is empty), then continues using the FHS state path. See Upgrading - FHS migration for the manual-binary-swap case.
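
After a packaged upgrade, it can be worth confirming that the legacy path really resolves to the FHS config. The check below is a minimal sketch and not part of CSM: the function name csm_config_link_ok is hypothetical, and it assumes the compatibility link is implemented as a symlink, which the text above does not state explicitly.

```shell
# Hypothetical helper (not shipped with CSM): succeed when the legacy config
# path is a symlink that resolves to the same file as the main config.
# Assumption: the compatibility link is a symlink.
csm_config_link_ok() {
  legacy="${1:-/opt/csm/csm.yaml}"
  main="${2:-/etc/csm/csm.yaml}"
  # -L: the legacy path must be a symlink; -ef: both names refer to one file
  [ -L "$legacy" ] && [ "$legacy" -ef "$main" ]
}

if csm_config_link_ok; then
  echo "legacy config link points at /etc/csm/csm.yaml"
else
  echo "legacy config path is missing or not linked to /etc/csm/csm.yaml"
fi
```

If the check fails on a host that was upgraded by swapping the binary manually, the FHS migration notes in the Upgrading chapter are the place to look.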

Post-install (all methods)

sudo vi /etc/csm/csm.yaml              # Set hostname, alert email, infra IPs
sudo csm validate                      # Check config syntax (validates merged conf.d too)
sudo systemctl enable --now csm.service
sudo csm baseline                      # Record current state as known-good via the daemon

Rollback to an older version

Both the APT and DNF repositories retain the five most recent tagged releases. To downgrade:

# Debian/Ubuntu
sudo apt-cache policy csm              # Show available versions
sudo apt install csm=2.2.0-1

# RHEL family
sudo dnf --showduplicates list csm     # Show available versions
sudo dnf downgrade csm

Verifying platform auto-detection

After systemctl start csm.service, the first line after “CSM daemon starting” reports what CSM detected:

[2026-04-10 08:13:37] CSM daemon starting
[2026-04-10 08:13:37] platform: os=almalinux/10.0 panel=none webserver=apache
[2026-04-10 08:13:37] Watching: /var/log/secure
[2026-04-10 08:13:37] Watching: /var/log/httpd/error_log
[2026-04-10 08:13:37] Watching: /var/log/httpd/access_log

If any field shows none or unknown where you expected a value, auto-detection missed it. File a bug with the output of cat /etc/os-release, systemctl is-active nginx apache2 httpd, and which nginx apache2 httpd.
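
For provisioning scripts, the platform line can be split into its fields and checked automatically. This is a sketch under the assumption that the line keeps the key=value layout of the sample above; csm_platform_field is a hypothetical helper, not a CSM command. Note that panel=none is the expected value on a host without a control panel.

```shell
# Hypothetical helper: print the value of one key from a CSM platform line.
# Assumes the "key=value" layout of the sample line shown above.
csm_platform_field() {
  line="${1#*platform: }"   # drop the timestamp and "platform: " prefix
  key="$2"
  for token in $line; do    # fields are separated by spaces
    if [ "${token%%=*}" = "$key" ]; then
      printf '%s\n' "${token#*=}"
      return 0
    fi
  done
  return 1                  # key not present in the line
}

sample='[2026-04-10 08:13:37] platform: os=almalinux/10.0 panel=none webserver=apache'
for field in os panel webserver; do
  value=$(csm_platform_field "$sample" "$field")
  case "$value" in
    none|unknown|"") echo "$field: not detected" ;;
    *) echo "$field: $value" ;;
  esac
done
```

On a live host, the sample string could be replaced with the last line printed by journalctl -u csm.service | grep platform:.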

Optional system dependencies

CSM runs as a single static Go binary and has no hard dependencies beyond systemd, but a few host packages enable additional checks:

| Package | Platforms | Enables |
|---|---|---|
| auditd | All | Shadow file / SSH key tamper detection via auditd |
| debsums | Debian/Ubuntu | Cleaner system binary integrity output vs. dpkg --verify fallback |
| logrotate | All | Rotation of /var/log/csm/monitor.log |
| wp-cli | Optional | WordPress core integrity check |
| ModSecurity | All | WAF enforcement checks (see platform-specific install below) |

Installing ModSecurity

CSM detects ModSecurity but doesn’t install it for you. Platform-specific commands:

# Ubuntu/Debian + Nginx
sudo apt install libnginx-mod-http-modsecurity modsecurity-crs

# Ubuntu/Debian + Apache
sudo apt install libapache2-mod-security2 modsecurity-crs && sudo a2enmod security2

# AlmaLinux/Rocky/RHEL + Apache (requires EPEL)
sudo dnf install -y epel-release
sudo dnf install -y mod_security
sudo systemctl restart httpd

# AlmaLinux/Rocky/RHEL + Nginx (requires EPEL)
sudo dnf install -y epel-release
sudo dnf install -y nginx-mod-http-modsecurity
sudo systemctl restart nginx

After installing ModSecurity, run csm check and the waf_status finding should disappear.

Manual (deploy.sh)

/opt/csm/deploy.sh install
vi /etc/csm/csm.yaml   # set hostname, alert email, infra IPs
csm validate
systemctl enable --now csm.service
csm baseline

Post-Install

  1. Edit /etc/csm/csm.yaml – set hostname, alert email, infrastructure IPs
  2. Run csm validate to check config syntax (add --deep for connectivity probes)
  3. Start the daemon: systemctl enable --now csm.service
  4. Run csm baseline to record current state for change tracking (see below)
  5. Open the Web UI: https://<server>:9443/login

All installation methods produce the same installed state. RPM/DEB packages auto-detect hostname and email, and generate the auth token.

Baseline Scan

The csm baseline command scans the entire server and records the current state for change tracking. This is required on first install so CSM knows what’s “normal” for your server. Findings that should never be silently trusted, such as non-standard MySQL superusers or WHM root API tokens, can still be reported on this first scan.

What it does:

  • Scans all cPanel accounts for malware, permissions, and configuration issues
  • Records file hashes, email forwarder hashes, and plugin versions
  • Stores everything in the bbolt database (/var/lib/csm/state/csm.db)

How long it takes: Depends on server size. A server with 100+ cPanel accounts and thousands of WordPress sites can take 5-10 minutes. The daemon must be running because the baseline is coordinated through the control socket.

When to re-run:

  • After a fresh install
  • After restoring from backup
  • After an intentional state reset approved by the operator

You do NOT need to re-run for normal deploys/upgrades: the daemon handles incremental state.

Important: Start csm.service before running csm baseline. If existing history would be cleared, rerun with csm baseline --confirm only after verifying that reset is intended.
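
Because the baseline is coordinated through the running daemon, automation can guard the call with a service check. The wrapper below is a minimal sketch; the function name and the SYSTEMCTL / CSM override variables are hypothetical and exist only so the logic can be exercised without a real daemon.

```shell
# Hypothetical wrapper: refuse to run the baseline unless the daemon is active.
# SYSTEMCTL and CSM may be overridden (illustrative variables, not CSM settings).
csm_baseline_guarded() {
  systemctl_cmd="${SYSTEMCTL:-systemctl}"
  csm_cmd="${CSM:-csm}"
  if "$systemctl_cmd" is-active --quiet csm.service; then
    # Extra arguments such as --confirm are passed through unchanged
    "$csm_cmd" baseline "$@"
  else
    echo "csm.service is not active; start it before running csm baseline" >&2
    return 1
  fi
}
```

Run as root, csm_baseline_guarded behaves like csm baseline once the service is up, and csm_baseline_guarded --confirm forwards the flag after you have verified that clearing history is intended.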

Configuration

CSM is configured via /etc/csm/csm.yaml, with --config <path> to override. Legacy installs that only have /opt/csm/csm.yaml keep working; packaged upgrades migrate that file into /etc/csm/csm.yaml and leave the old path as a compatibility link. Optional drop-in fragments under /etc/csm/conf.d/*.yaml are merged on top of the main file at startup; see conf.d drop-ins below.

Platform & Web Server

CSM auto-detects the host OS (Ubuntu, Debian, AlmaLinux, Rocky, RHEL, CloudLinux), control panel (cPanel, Plesk, DirectAdmin, or none), and web server (Apache, Nginx, LiteSpeed, or none) at daemon startup. The detected platform is logged as:

[2026-04-10 08:13:37] platform: os=ubuntu/24.04 panel=none webserver=nginx

The daemon then chooses the correct log paths, config candidates, and check set without any configuration from you. Verify with:

journalctl -u csm.service | grep platform:

Web server overrides

For hosts with a custom layout (reverse proxy, non-standard package locations, chroot), add a web_server: section to csm.yaml. Every field is optional – anything left blank falls back to auto-detection.

web_server:
  type: "nginx"                          # apache | nginx | litespeed -- overrides auto-detect
  config_dir: "/etc/nginx"               # for info/diagnostics only
  access_logs:                           # tried in order until one exists
    - "/var/log/nginx/access.log"
    - "/srv/logs/nginx/access.log"
  error_logs:                            # used by ModSecurity deny watcher
    - "/var/log/nginx/error.log"
  modsec_audit_logs:
    - "/var/log/nginx/modsec_audit.log"

modsec_error_log (legacy single-path override) is still honored and takes precedence over web_server.error_logs for the ModSecurity watcher only:

modsec_error_log: "/opt/myapp/logs/modsec_audit.log"
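
The "tried in order until one exists" rule for the web_server log lists can be previewed on a host before editing csm.yaml. This sketch only mirrors the documented rule and is not the daemon's code; first_existing is a hypothetical helper, and the candidate paths are the ones from the example above.

```shell
# Hypothetical helper: print the first candidate path that exists, in the
# order given. Mirrors the documented rule; not the daemon's own code.
first_existing() {
  for candidate in "$@"; do
    if [ -e "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1   # none of the candidates exist
}

first_existing /var/log/nginx/access.log /srv/logs/nginx/access.log \
  || echo "no access log candidate exists on this host"
```

For the ModSecurity watcher, a configured modsec_error_log would be used ahead of this lookup, as described above.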

Account roots (plain Linux web-scan coverage)

By default, the account-scan checks (perf_error_logs, perf_wp_config, perf_wp_transients, and related) iterate over /home/*/public_html, which is the cPanel layout. On plain Ubuntu / AlmaLinux with Nginx or Apache, point CSM at your actual web roots:

account_roots:
  - "/var/www/*/public"            # e.g. Laravel/Symfony sites
  - "/srv/http/*"                  # Arch / generic layouts
  - "/home/*/public_html"          # add if you also have cPanel-style accounts

Each entry is a glob pattern expanded at scan time. Non-existent matches are silently dropped. If account_roots is empty and CSM is not on a cPanel host, the account-scan checks return no findings (they run but find nothing, which is the correct behavior for a plain-Linux host with no configured web roots).

Today, three checks consume this: perf_error_logs, perf_wp_config, perf_wp_transients. The remaining account-scan checks (WordPress core integrity, phishing kit detection, htaccess tampering, fileindex, etc.) still assume the cPanel /home/*/public_html layout and will be migrated in a follow-up release.
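
Before committing account_roots patterns to csm.yaml, it can help to see which directories they match on the host. The sketch below lets the shell expand each pattern and drops anything that does not exist, similar in spirit to the behavior described above. It is an approximation: expand_account_roots is a hypothetical helper, it assumes only directories count as roots, and it does not handle patterns that contain spaces.

```shell
# Hypothetical preview helper: expand account_roots-style glob patterns and
# print the directories that exist. Assumption: only directories are roots.
expand_account_roots() {
  for pattern in "$@"; do
    # Unquoted on purpose so the shell expands the glob. An unmatched
    # pattern stays literal and is dropped by the -d test below.
    for path in $pattern; do
      if [ -d "$path" ]; then
        printf '%s\n' "$path"
      fi
    done
  done
  return 0
}

expand_account_roots "/var/www/*/public" "/srv/http/*" "/home/*/public_html"
```

An empty result means none of the patterns match any directory, in which case the account-scan checks that consume account_roots would have nothing to scan.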

Minimal Config

hostname: "csm.example.com"

alerts:
  email:
    enabled: true
    to: ["admin@example.com"]
    disabled_checks: []                   # optional: suppress these checks from email only
    smtp: "localhost:25"

webui:
  enabled: true
  listen: "0.0.0.0:9443"
  auth_token: "your-secret-token"

infra_ips: ["10.0.0.0/8"]

Full Reference

hostname: "csm.example.com"

# --- Alerts ---
alerts:
  email:
    enabled: true
    to: ["admin@example.com"]
    from: "csm@csm.example.com"
    smtp: "localhost:25"
    disabled_checks: []                 # check names to keep in web/history but exclude from email
  webhook:
    enabled: false
    url: ""
    type: "slack"                       # slack, discord, generic, phpanel
    hmac_secret: ""                     # phpanel webhook signing secret
    hmac_secret_env: ""                 # env var containing phpanel signing secret
    per_finding: false                  # phpanel sends one signed POST per finding
  heartbeat:
    enabled: false
    url: ""                             # healthchecks.io, cronitor, dead man's switch
  max_per_hour: 10                      # default: 10
  audit_log:                            # SIEM-friendly per-finding stream
    file:
      enabled: false
      path: /var/log/csm/audit.jsonl    # default; logrotate fragment ships with the package
    syslog:
      enabled: false
      network: udp                      # udp | tcp | unix | unixgram | tls
      address: 127.0.0.1:514            # host:port, or filesystem path for unix variants
      facility: local0                  # default: local0
      tls_ca: ""                        # optional CA cert for tls transport

# --- Integrity ---
integrity:
  binary_hash: ""                       # auto-populated by install/rehash
  config_hash: ""                       # auto-populated by install/rehash
  confd_hash: ""                        # auto-populated by install/rehash
  immutable: false                      # prevent config changes at runtime

# --- Thresholds ---
thresholds:
  mail_queue_warn: 500                  # default: 500
  mail_queue_crit: 2000                 # default: 2000
  state_expiry_hours: 24                # default: 24
  deep_scan_interval_min: 60            # minutes between deep scans (default: 60)
  wp_core_check_interval_min: 60        # WordPress core checksum interval (default: 60)
  webshell_scan_interval_min: 30        # webshell scan interval (default: 30)
  filesystem_scan_interval_min: 30      # filesystem scan interval (default: 30)
  multi_ip_login_threshold: 3           # IPs per account before alert (default: 3)
  multi_ip_login_window_min: 60         # time window for multi-IP check (default: 60)
  cred_stuffing_distinct_accounts: 5    # failed accounts from one IP before credential_stuffing (default: 5)
  plugin_check_interval_min: 1440       # WordPress plugin check interval (default: 1440)
  brute_force_window: 5000              # failed auth attempts window (default: 5000)
  domlog_max_files: 500                 # per-domain access logs per WP brute-force scan (default: 500)
  domlog_tail_lines: 500                # trailing lines tailed from each domlog per scan (default: 500)
  domlog_max_age_min: 30                # skip per-domain access logs untouched in this many minutes (default: 30)
  mail_log_tail_lines: 500              # trailing lines of /var/log/exim_mainlog read by the mail-per-account scanner (default: 500)
  syslog_messages_tail_lines: 200       # trailing lines of /var/log/messages read by the FTP login scanner (default: 200)
  account_scan_max_files: 10000         # account and mail-domain paths per scanner cycle (default: 10000)
  # If this cap clips /home/<account>/ paths, account_scan_truncated names the affected account.
  crontab_base64_blob_max_bytes: 16384  # encoded bytes per crontab base64 candidate before decoded-content matching; must be a multiple of 4 (default: 16384)

  # HTTP request flood, User-Agent spoof, and distributed HTTP detection.
  # These detectors scan the same per-vhost access-log stream as the WP
  # brute-force scanner; no extra log tailer is needed.
  #
  # http_flood_threshold: minimum per-IP request count inside the window
  # that emits http_request_flood. 0 disables the detector. The detector
  # ships disabled so operators can sample local baseline traffic first.
  # Adjust up for CDNs or CGNAT-heavy visitor pools before enabling.
  http_flood_threshold: 0              # 0 = disabled; set after sampling baseline traffic
  http_flood_window_min: 5             # rate window in minutes (default: 5)

  # http_ua_spoof_threshold: per-IP per-window count for non-browser UA
  # kinds before http_ua_spoof fires. Claimed search-engine bots (Googlebot,
  # Bingbot, Applebot) that fail reverse-DNS confirmation fire regardless of
  # this threshold once the rDNS cache confirms the IP is not the real bot.
  http_ua_spoof_threshold: 30          # default: 30

  # http_distributed_min_ips: distinct already-abusive source IPs that hit
  # the same vhost in one scan window before a per-vhost distributed flood
  # finding fires. 0 disables the rollup for existing configs that do not
  # opt in.
  http_distributed_min_ips: 10         # sample setting; omit or set 0 to disable

  # These three opt-in flags extend UA spoof detection to additional UA
  # classes. Leave disabled on busy shared hosts; scripting-language agents
  # and headless browsers appear on many legitimate monitoring stacks.
  http_ua_scripting_enabled: false     # flag curl/wget/python-requests/Go-http style UAs
  http_ua_headless_enabled: false      # flag Puppeteer/Playwright/PhantomJS UAs
  http_ua_empty_enabled: false         # flag requests with no UA at all

  # SMTP brute-force tracker (Exim mainlog, dovecot SASL on submission ports)
  smtp_bruteforce_threshold: 5            # per-IP failed auths before block (default: 5)
  smtp_bruteforce_window_min: 10          # sliding window in minutes (default: 10)
  smtp_bruteforce_suppress_min: 60        # cooldown between repeat findings (default: 60)
  smtp_bruteforce_subnet_threshold: 8     # unique IPs per /24 before subnet block (default: 8)
  smtp_account_spray_threshold: 12        # unique IPs targeting one mailbox before visibility finding (default: 12)
  smtp_bruteforce_max_tracked: 20000      # soft cap on tracked entries; oldest evicted (default: 20000)

  # SMTP probe-abuse tracker (raw connect-rate per IP; catches scanners that
  # never reach AUTH). Threshold sized well above any legitimate MUA usage.
  smtp_probe_threshold: 100               # per-IP connects before block (default: 100; explicit 0 disables)
  smtp_probe_window_min: 5                # sliding window in minutes (default: 5)
  smtp_probe_suppress_min: 60             # cooldown between repeat findings (default: 60)
  smtp_probe_max_tracked: 20000           # soft cap on tracked entries; oldest evicted (default: 20000)

  # Mail brute-force tracker (IMAP/POP3/ManageSieve via mail_logs source)
  mail_bruteforce_threshold: 5            # per-IP failed auths before block (default: 5)
  mail_bruteforce_window_min: 10          # sliding window in minutes (default: 10)
  mail_bruteforce_suppress_min: 60        # cooldown between repeat findings (default: 60)
  mail_bruteforce_subnet_threshold: 8     # unique IPs per /24 before subnet block (default: 8)
  mail_account_spray_threshold: 12        # unique IPs targeting one mailbox before visibility finding (default: 12)
  mail_bruteforce_max_tracked: 20000      # soft cap on tracked entries; oldest evicted (default: 20000)
  mail_brute_account_key: "builtin:dovecot-user" # builtin:dovecot-user | builtin:postfix-sasl | regex:<capture>

  # ModSecurity escalation
  modsec_escalation_hits: 3               # denies from one IP before ModSecurity escalation (default: 3)
  modsec_escalation_window_min: 10        # ModSecurity escalation window in minutes (default: 10)

# --- Web server overrides ---
# Leave these empty to use auto-detected paths for the running platform.
web_server:
  # Override the per-vhost access-log glob patterns. Empty uses the
  # auto-detected default for the panel (cPanel, Plesk, DirectAdmin,
  # bare Apache, or bare Nginx).
  domlog_globs: []
  # IPs or CIDRs whose X-Forwarded-For header is trusted for client-IP
  # extraction. Leave empty to ignore XFF and use RemoteIP as-is.
  trusted_proxies: []

# --- Infrastructure ---
infra_ips: []                           # management IPs/CIDRs/hostnames - never blocked

# --- Mail Logs ---
# Packaged releases include journald support. Custom builds need
# `make JOURNAL=1 build-yara` before `source: journal` can be selected.
mail_logs:
  source: auto                          # auto | file | journal
  file: ""                              # optional path override for file source
  units: ["postfix", "dovecot"]         # journal units for source=journal or auto fallback

# --- State ---
state_path: "/var/lib/csm/state"        # bbolt DB and state files

# --- Suppressions ---
suppressions:
  upcp_window_start: "00:30"            # cPanel nightly update window start
  upcp_window_end: "02:00"              # cPanel nightly update window end
  known_api_tokens: []                  # API tokens to ignore in auth logs (e.g. ["phclient"])
  ignore_paths:                         # glob patterns to skip in filesystem scans
    - "*/cache/*"
    - "*/vendor/*"
  suppress_webmail_alerts: true         # don't alert on webmail logins
  suppress_cpanel_login_alerts: false   # don't alert on cPanel direct logins
  suppress_blocked_alerts: true         # don't alert on IPs that were auto-blocked
  trusted_countries: ["RO"]             # ISO 3166-1 alpha-2 - suppress cPanel login alerts from these

# --- Auto-Response ---
auto_response:
  enabled: false
  kill_processes: false                 # kill malicious processes
  quarantine_files: false               # move malware to quarantine
  block_ips: false                      # block attacker IPs via firewall
  block_expiry: "24h"                   # duration for temp blocks (e.g. "24h", "12h")
  max_blocks_per_hour: 50               # per-IP blocks per hour; 0/omitted uses default
  enforce_permissions: false            # auto-chmod 644 world/group-writable PHP files
  block_cpanel_logins: false            # block IPs on cPanel/webmail/FTP/API thresholded brute findings (multi-IP login, webmail/API brute, FTP brute). Single direct cPanel form logins stay audit-only regardless of this flag.
  netblock: false                       # auto-block IPv4 /24 or IPv6 /64 subnets
  netblock_threshold: 3                 # IPs from same IPv4 /24 or IPv6 /64 before subnet block
  permblock: false                      # promote temp blocks to permanent
  permblock_count: 4                    # temp blocks before promotion
  permblock_interval: "24h"             # window for counting temp blocks
  clean_database: false                 # auto-drop confirmed malicious DB objects after backup
  clean_htaccess: false                 # auto-clean .htaccess directives flagged by hardened detectors (backups under /opt/csm/quarantine/pre_clean/)
  disable_enforce_af_alg: false         # suspend periodic AF_ALG hardening re-assertion
  copy_fail_kill_process: false         # kill processes caught opening AF_ALG sockets via the live listener
  dry_run: true                         # safe default; logs intended IP blocks without mutating nftables
  verdict_callback:
    enabled: false                      # call panel before each auto-block
    url: ""                             # POST target for verdict requests
    hmac_secret: ""                     # signing secret, or use hmac_secret_env
    hmac_secret_env: ""                 # env var read at call time
    allow_unsigned: false               # true only for staged unsigned rollouts
    require_response_signature: true    # reject unsigned callback replies
    timeout_sec: 2                      # callback request timeout

  # PHP-relay auto-freeze. Off by default; only kicks in on cPanel hosts
  # where email_protection.php_relay.enabled is true. dry_run defaults to
  # true even when freeze is true, so an operator who enables freeze
  # without thinking gets a dry-run rather than a live exim -Mf storm.
  # Override at runtime with `csm phprelay dry-run on|off|reset`.
  php_relay:
    freeze: false                       # opt in to wire the exim -Mf hook into the alert pipeline
    dry_run: true                       # safe default; flip with `csm phprelay dry-run off [--persist]`
    max_actions_per_minute: 60          # rolling 60s cap on exim -Mf invocations

# --- Detection ---
detection:
  # db_object_scanning is tri-state: omit for the default (on),
  # `false` to explicitly disable. When off, the MySQL persistence
  # scanner emits no findings; the manual `csm db-clean --drop-object`
  # CLI keeps working for operator-driven cleanup.
  # db_object_scanning: true
  db_object_allowlist: []               # entries: <account>:<schema>:<type>:<name> -- suppresses db_unexpected_* warnings only
  admin_overlap_min_accounts: 2         # raise only if routine shared-admin accounts are expected on this host
  admin_overlap_trusted_emails: []       # exact reviewed admin emails that may manage multiple cPanel accounts
  admin_overlap_trusted_domains: []      # exact reviewed email domains for developer or reseller admin accounts
  # rescan_on_signature_update: true    # tri-state; omit for default-on, false to disable retroactive sweeps
  af_alg_backend: "auto"                # auto | bpf | auditd | none
  connection_tracker_backend: "auto"    # auto | bpf | legacy | none
  connection_poll_interval: 30s         # legacy connection tracker interval
  exec_monitor_backend: "auto"          # auto | bpf | legacy | none
  exec_monitor_poll_interval: 30m       # legacy process monitor interval
  sensitive_files_backend: "auto"       # auto | bpf | legacy | none
  sensitive_files_poll_interval: 5m     # sensitive-file poll/watchset refresh interval
  direct_smtp_egress:
    enabled: false                      # detect non-MTA local processes opening outbound SMTP
    backend: "auto"                     # auto | bpf | legacy | none
    dry_run: true                       # safe default for detector-scoped action
    ports: [25, 465, 587]               # destination ports to inspect

# --- BPF Enforcement ---
bpf_enforcement:
  enabled: false                        # master switch for in-kernel denial
  dry_run: true                         # log intended denials, allow the connect
  direct_smtp_egress: false             # gate enforcement on direct SMTP egress matches
  verdict_callback: false               # userspace advisory callback after the BPF decision

# --- Challenge Pages ---
challenge:
  enabled: false                        # enable PoW challenge pages instead of hard block
  listen_addr: 127.0.0.1                # bind address; use 0.0.0.0 for public direct redirects
  listen_port: 8439                     # port for challenge server; must fit the TCP port range
  tls_cert: ""                          # optional HTTPS cert for direct/public challenge listener
  tls_key: ""                           # optional HTTPS key for direct/public challenge listener
  public_url: ""                        # required by webserver-integration, e.g. https://host:8439/challenge
  secret: ""                            # HMAC secret for tokens (auto-generated if empty)
  difficulty: 2                         # SHA-256 proof-of-work difficulty 0-5 (default: 2)
  trusted_proxies: []                   # IPs/CIDRs allowed to supply X-Forwarded-For
  port_gate:
    enabled: false                      # nftables gate for non-loopback challenge listener
  captcha_fallback:                     # widget for JS-disabled visitors (default off)
    provider: ""                        # "turnstile" | "hcaptcha" | "" (off)
    site_key: ""                        # public key embedded in the widget
    secret_key: ""                      # verified server-side
    timeout: 10s
  verified_session:                     # signed-cookie bypass for authenticated operators
    enabled: false
    cookie_name: csm_admin_session
    ttl: 4h
    admin_secret: ""                    # POST'd to /challenge/admin-token to mint cookie
  verified_crawlers:                    # reverse-DNS forward-confirm for search crawlers
    enabled: false
    providers: []                       # names: googlebot | bingbot
    cache_ttl: 15m

# --- PHP Shield ---
php_shield:
  enabled: false                        # watch the PHP Shield event log for alerts

# --- Reputation ---
reputation:
  abuseipdb_key: ""                     # AbuseIPDB API key for IP reputation lookups
  whitelist: []                         # IPs to never flag as malicious
  # Async PTR + forward-A verification for IPs that claim search-engine
  # bot UAs (Googlebot, Bingbot, Applebot). When an IP claims a bot UA
  # but reverse DNS does not confirm it, the request counts toward
  # http_ua_spoof. Transient DNS lookup failures fail open and are
  # retried later. Set false only if your resolver is unreliable. See
  # docs/src/auto-response.md for the always-block behavior.
  bot_verify_enabled: true              # default: true
  rspamd:
    enabled: false                      # include rspamd rolling history in IP reputation
    url: "http://127.0.0.1:11334"       # rspamd controller URL
    token: ""                           # controller password, or use token_env
    token_env: ""                       # env var read at query time
  upstream:
    enabled: false                      # include panel-side threat-intel cache scores
    url: ""                             # HTTPS base URL; HTTP only allowed for loopback
    token: ""                           # bearer token, or use token_env
    token_env: ""                       # env var read at query time
    cache_ttl_min: 15                   # local cache TTL for upstream scores
    timeout_sec: 5                      # upstream request timeout
  report:
    enabled: false                      # opt-in abuse report delivery; restart required
    classes: []                         # bruteforce | php_relay | credential_stuffing | bad_asn_egress
    spool_path: ""                      # default: <state_path>/abuse_reports.db
    spool_max: 10000                    # max queued reports per target
    targets:
      - name: ""                        # stable target name
        url: ""                         # HTTPS collector URL; HTTP only allowed for loopback
        transport: "hmac"               # hmac | ed25519
        node_id: ""                     # sender node ID
        key_id: ""                      # receiver key ID
        key_env: ""                     # HMAC secret or Ed25519 private key env var
        token_env: ""                   # optional bearer token env var for HMAC targets
  central:
    enabled: false                      # opt-in central scored-set consume; restart required
    set_url: ""                         # HTTPS scored-set endpoint; HTTP only for loopback
    pubkey_env: ""                      # env var with Ed25519 public key hex
    refresh_interval: 6h                # pull interval; default 6h
    action: "challenge"                 # off | challenge | block_if_local_corroborated
    block_threshold: 80                 # score needed before local corroboration can block

# --- Signatures ---
signatures:
  rules_dir: "/opt/csm/rules"           # YAML signature rules directory
  update_url: ""                        # remote URL to fetch rule updates
  auto_update: false                    # auto-download rules on schedule
  update_interval: ""                   # how often to check (e.g. "24h")
  signing_key: ""                       # required for any remote rule update path; 64-char hex Ed25519 public key
  yara_forge:
    enabled: false                      # auto-fetch YARA Forge community rules
    tier: "core"                        # "core", "extended", "full" (default: "core")
    update_interval: "168h"             # how often to check for updates (default: weekly)
    download_url: ""                    # signed ZIP URL/template; supports {tier} and {version}
  disabled_rules: []                    # YARA rule names to exclude from Forge downloads
  # yara_worker_enabled: true           # tri-state: omit for the default (on), `false` to explicitly disable

# signatures.signing_key is mandatory whenever either signatures.update_url
# is set or signatures.yara_forge.enabled is true. It must be the hex
# Ed25519 public key used to verify detached .sig files for rule bundles.
# Remote update URLs must use HTTP or HTTPS and must not point at localhost,
# loopback, link-local, unspecified, or RFC1918 / ULA private addresses.
#
# YARA Forge upstream GitHub releases do not publish CSM detached signatures.
# To enable automatic Forge updates, mirror the ZIPs, sign each ZIP, publish
# the signature at the ZIP URL plus .sig, and set yara_forge.download_url to
# that signed mirror. Otherwise leave update_url empty and yara_forge.enabled
# false.

# --- Web UI ---
webui:
  enabled: true
  listen: "0.0.0.0:9443"               # address:port for HTTPS server
  auth_token: ""                        # Bearer/cookie auth token (auto-generated on install)
  tokens: []                            # optional scoped tokens: name/token/scope (admin or read)
  metrics_token: ""                     # optional Bearer token for /metrics only
  tls_cert: ""                          # path to TLS certificate PEM file
  tls_key: ""                           # path to TLS private key PEM file
  ui_dir: ""                            # path to UI files on disk (default: /opt/csm/ui)

# --- Email AV ---
email_av:
  enabled: false
  clamd_socket: "/var/run/clamd.scan/clamd.sock"  # path to ClamAV daemon socket
  scan_timeout: "30s"                   # per-attachment scan timeout
  max_attachment_size: 26214400         # max single attachment size in bytes (25MB)
  max_archive_depth: 1                  # max nested archive extraction depth
  max_archive_files: 50                 # max files extracted from a single archive
  max_extraction_size: 104857600        # max total extraction size in bytes (100MB)
  quarantine_infected: true             # quarantine emails with infected attachments
  scan_concurrency: 4                   # parallel scan workers

# --- Email Protection ---
email_protection:
  password_check_interval_min: 1440     # how often to audit email passwords (default: 1440)
  high_volume_senders: []               # accounts expected to send high volume (skip rate alerts)
  rate_warn_threshold: 50               # emails per window before warning (default: 50)
  rate_crit_threshold: 100              # emails per window before critical (default: 100)
  rate_window_min: 10                   # rate check window in minutes (default: 10)
  known_forwarders: []                  # expected plain mail forwarders

  # PHP-relay detector (cPanel only; gated by platform.IsCPanel at startup).
  # Off by default. When enabled, the daemon spawns the inotify spool
  # watcher, runs a startup spool walk, and starts the Path 2b retro scan
  # on /var/log/exim_mainlog. See docs/src/detection-realtime.md#php-relay
  # for what each path actually triggers on.
  php_relay:
    enabled: false                      # opt in to start the watcher
    rate_window_min: 5                  # Path 1 rolling window
    header_score_volume_min: 5          # Path 1: don't score until script has emitted N msgs
    absolute_volume_per_hour: 30        # Path 2 threshold per script
    account_volume_per_hour: 0          # Path 2b operator override; 0 = auto-derive from cpanel.config maxemailsperhour
    reputation_failures_per_24h: 3      # Path 3 threshold (Stage 2)
    fanout_distinct_scripts: 3          # Path 4 threshold
    fanout_window_min: 5                # Path 4 window
    baseline_sigma: 3.0                 # Path 5 (Stage 3)
    baseline_observation_days: 7        # Path 5 (Stage 3)
    policies_dir: "/opt/csm/policies/php_relay"  # mailer_classes.yaml + http_proxy_ranges.yaml; SIGHUP-reloadable
  cloud_relay:
    allow_users: []                     # full mailbox opt-outs for cloud-relay detection
    allow_domains: []                   # domain-wide opt-outs for cloud-relay detection

  # Email forward guard (cPanel only). Opt-in MTA-native enforcement for
  # external forward copies. Enforce mode can hold null-sender backscatter and
  # bad-sender-IP copies before they relay to an external provider, while the
  # local mailbox copy still delivers. Spam, malware, and auth-fail signals are
  # accounted in dry-run until Exim content scanning is wired. CSM is not in the
  # live mail path; an installed Exim rule can keep holding matching copies even
  # if the daemon is down. Held copies can be released or deleted from the Email page.
  forward_guard:
    enabled: false                      # master switch (default off)
    dry_run: true                       # account/log only, do not actually hold (default true)
    quarantine_retention_days: 14       # held-copy retention window
    skip_forwarders: []                 # reserved forwarder exemptions; not enforced yet
    hold_signals:                       # signal toggles, each default true
      bounce_backscatter: true          # null-sender bounce backscatter (enforceable)
      spam_flagged: true                # message flagged as spam (dry-run/accounting only)
      malware: true                     # message carries malware (dry-run/accounting only)
      bad_sender_ip: true               # originating IP has bad reputation (enforceable)
      auth_fail: true                   # sender failed SPF/DKIM/DMARC auth (dry-run/accounting only)

# --- Firewall ---
firewall:
  enabled: false

  # Open ports (IPv4). SSH (22) is intentionally absent; uncomment in
  # the YAML lists if sshd listens on 22. TCP 853 is DNS-over-TLS;
  # UDP 853 is DNS-over-QUIC.
  # 6277/24441 are DCC/Pyzor network checks used by SpamAssassin.
  tcp_in: [20,21,25,26,53,80,110,143,443,465,587,853,993,995,2077,2078,2079,2080,2082,2083,2091,2095,2096]
  tcp_out: [20,21,25,26,37,43,53,80,110,113,443,465,587,853,873,993,995,2082,2083,2086,2087,2089,2195,2325,2703]
  udp_in: [53,443,853]
  udp_out: [53,113,123,443,853,873,6277,24441]

  # IPv6
  ipv6: false
  tcp6_in: []                           # if empty, uses tcp_in
  tcp6_out: []                          # if empty, uses tcp_out
  udp6_in: []                           # if empty, uses udp_in
  udp6_out: []                          # if empty, uses udp_out

  # Restricted ports (infra IPs only)
  restricted_tcp: [2086,2087,2325]      # WHM ports

  # Passive FTP range
  passive_ftp_start: 49152
  passive_ftp_end: 65534

  # Infra IPs/CIDRs/hostnames for firewall rules
  infra_ips: []

  # Rate limiting
  conn_rate_limit: 200                  # new connections/min per IP (CGNAT-tolerant)
  syn_flood_protection: true
  conn_limit: 400                       # max concurrent connections per IP (0 = disabled)

  # Per-port flood protection: rate-limit new connections per source IP and IP family.
  # Defaults are sized for a busy mail host: 600/300s = 120 new conns/min/IP,
  # which tolerates a Thunderbird/iPhone client opening 5-15 parallel sessions
  # while still capping single-IP flood storms.
  port_flood:
    - port: 25
      proto: tcp
      hits: 600
      seconds: 300
    - port: 465
      proto: tcp
      hits: 600
      seconds: 300
    - port: 587
      proto: tcp
      hits: 600
      seconds: 300

  # UDP flood protection
  udp_flood: true
  udp_flood_rate: 100                   # packets per second
  udp_flood_burst: 500                  # burst allowance

  # Country blocking
  country_block: []                     # ISO country codes to block
  country_db_path: ""                   # path to MaxMind DB (uses geoip config if empty)

  # Silent drop (no logging)
  drop_nolog: [23,67,68,111,113,135,136,137,138,139,445,500,513,520]

  # IP limits
  deny_ip_limit: 3000                   # max permanent blocked IPs
  deny_temp_ip_limit: 500               # max temporary blocked IPs

  # Outbound SMTP restriction
  smtp_block: false                     # block outgoing mail except allowed users
  smtp_allow_users: []                  # usernames allowed to send
  smtp_ports: [25,465,587]

  # Dynamic DNS
  dyndns_hosts: []                      # hostnames to resolve and whitelist periodically

  # Logging
  log_dropped: true                     # log dropped packets
  log_rate: 5                           # log entries per minute

# --- GeoIP ---
geoip:
  account_id: ""                        # MaxMind account ID
  license_key: ""                       # MaxMind license key
  editions:                             # MaxMind database editions
    - GeoLite2-City
    - GeoLite2-ASN
  auto_update: true                     # auto-update GeoIP databases (default: true when credentials set)
  update_interval: "24h"                # update check interval

# --- ModSecurity ---
modsec_error_log: ""                    # path to Apache/LiteSpeed error log for ModSec parsing
modsec:
  rules_file: ""                        # path to modsec2.user.conf
  overrides_file: ""                    # path to csm-overrides.conf
  reload_command: ""                    # command to reload web server (e.g. "/usr/sbin/apachectl graceful")

# --- Performance ---
performance:
  enabled: true
  load_high_multiplier: 1.0             # load average / CPU cores multiplier for warning (default: 1.0)
  load_critical_multiplier: 2.0         # load average / CPU cores multiplier for critical (default: 2.0)
  php_process_warn_per_user: 20         # per-user PHP process count warning (default: 20)
  php_process_critical_total_multiplier: 5  # total PHP processes / CPU cores for critical (default: 5)
  error_log_warn_size_mb: 50            # error log size warning threshold (default: 50)
  mysql_join_buffer_max_mb: 64          # MySQL join_buffer_size warning threshold (default: 64)
  mysql_wait_timeout_max: 3600          # MySQL wait_timeout warning threshold (default: 3600)
  mysql_max_connections_per_user: 10    # per-user MySQL connections warning (default: 10)
  redis_bgsave_min_interval: 900        # minimum seconds between Redis BGSAVE (default: 900)
  redis_large_dataset_gb: 4             # Redis dataset size warning threshold in GB (default: 4)
  wp_memory_limit_max_mb: 512           # WordPress memory_limit warning threshold (default: 512)
  wp_transient_warn_mb: 1               # WordPress transient data warning in MB (default: 1)
  wp_transient_critical_mb: 10          # WordPress transient data critical in MB (default: 10)

# --- Cloudflare ---
cloudflare:
  enabled: false                        # auto-whitelist Cloudflare IP ranges
  refresh_hours: 6                      # how often to refresh Cloudflare IPs (default: 6)

# --- Threat Intel ---
c2_blocklist: []                        # known C2 server IPs to block permanently
backdoor_ports: [4444,5555,55553,55555,31337]  # ports indicating backdoor activity

# --- Update check ---
updates:
  check_enabled: true                   # notify only; CSM never downloads or applies updates
  interval: "24h"                       # release check interval
  github_api_url: ""                    # optional release API mirror or test endpoint
  package_name: "csm"                   # apt/dnf package name for package-manager fallback

# --- Incidents ---
incidents:
  auto_close:
    enabled: true                       # auto-close idle open/contained incidents
    dry_run: false                      # log decisions without writing status changes
    by_kind:
      mailbox_takeover: 24h
      credential_spray: 24h
      web_account_compromise: 168h
  spray_suppression:
    enabled: false                      # collapse one-source credential spray into one incident
    dry_run: true
    distinct_mailboxes: 10
    severity_escalate_at: 50
    per_check: [email_auth_failure_realtime, pam_auth_failure, ssh_bruteforce]
    max_tracked_ips: 10000
    block_at_severity: ""              # "" | high | critical
  auto_block:
    enabled: false                      # block source IPs from incident correlations
    block_at_severity: ""              # "" | high | critical
    kinds: []                           # empty means all non-spray kinds with remote_ip

# --- Disabled checks (skip whole categories per host) ---
# Listed finding names disable the scheduled check runner(s) that emit them,
# including sibling findings from the same runner. Realtime findings are not
# affected. Use for whole categories that don't apply to a host (e.g. WAF/web
# checks on DNS-only cPanel servers, where httpd is installed but no virtual
# hosts serve traffic).
# For email-only suppression, use `alerts.email.disabled_checks` instead.
disabled_checks: []                     # e.g. [waf_status, waf_rules, waf_detection_only]

# --- Retention (bbolt growth control) ---
retention:
  enabled: false                        # opt-in; when true, a daily sweep prunes old entries and compacts bbolt
  findings_days: 90                     # keep active findings this long (0 disables the findings sweep)
  history_days: 30                      # keep findings-history entries this long
  reputation_days: 180                  # keep IP reputation/attack entries this long
  sweep_interval: "24h"                 # how often the retention goroutine runs
  compact_min_size_mb: 128              # don't consider compaction below this file size
  compact_fill_ratio: 0.5               # compact when used_bytes / file_size drops below this

# --- Sentry (error reporting) ---
sentry:
  enabled: false                        # ship panics and selected errors to a Sentry server
  dsn: ""                               # Sentry project DSN
  environment: "production"             # e.g. "production", "staging"
  sample_rate: 1.0                      # fraction of errors to send, 0.0-1.0 (1.0 captures all errors)
  debug: false                          # SDK debug logs to stderr
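
The two compaction knobs under retention read as a two-part gate. The sketch below is one interpretation of the comments on compact_min_size_mb and compact_fill_ratio, not CSM's own code: the function name should_compact is hypothetical, and it assumes 1 MB means 1024 * 1024 bytes and that both conditions must hold before a compaction is worthwhile.

```shell
# Hypothetical helper: decide whether a bbolt file is worth compacting.
# Args: file_bytes used_bytes compact_min_size_mb compact_fill_ratio
# Returns 0 (compact) or 1 (skip).
should_compact() {
  awk -v fsz="$1" -v used="$2" -v min_mb="$3" -v ratio="$4" 'BEGIN {
    if (fsz <= 0) exit 1                      # nothing to compact
    if (fsz < min_mb * 1024 * 1024) exit 1    # below compact_min_size_mb: skip
    if (used / fsz < ratio) exit 0            # fill ratio dropped below threshold
    exit 1
  }'
}

# Example: a 256 MiB file holding 64 MiB of live data, with the defaults above.
# should_compact 268435456 67108864 128 0.5 && echo "would compact"
```

With the defaults shown, a file under 128 MB is never considered, and a larger file qualifies only once less than half of it is in use.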

TLS Certificates

The Web UI serves over HTTPS. Configure TLS certificates under webui:

webui:
  tls_cert: "/var/cpanel/ssl/cpanel/mycpanel.pem"   # certificate PEM file
  tls_key: "/var/cpanel/ssl/cpanel/mycpanel.pem"     # private key PEM file

On cPanel servers, you can reuse the cPanel self-signed certificate (both cert and key are in the same PEM file). For production, use a proper certificate from Let’s Encrypt or your CA.

If tls_cert and tls_key are empty, the Web UI will not start.

Validation

csm validate           # syntax check
csm validate --deep    # syntax + connectivity probes (SMTP, webhooks)
csm config show        # display config with secrets redacted

Editing csm.yaml by hand

CSM stores a sha256 of the main config in integrity.config_hash and a separate digest of loaded drop-ins in integrity.confd_hash. It refuses to start if the on-disk files disagree with those values. This is a tamper-detection feature. There are two supported edit workflows depending on which fields you touch.

Fast path: SIGHUP reload (safe fields only)

For fields tagged as hot-reload-safe (alerts, thresholds, detection, suppressions, auto_response, bpf_enforcement, reputation, email_protection, disabled_checks), the daemon can accept the change without a restart:

sudo cp /etc/csm/csm.yaml /etc/csm/csm.yaml.bak-$(date +%s)

# edit /etc/csm/csm.yaml with your favourite editor

sudo systemctl reload csm
sudo journalctl -u csm -n 20 --no-pager

systemctl reload sends SIGHUP (wired via ExecReload= in the unit file). The daemon re-reads the file, validates it, diffs it against the running config, and if every change is on a field tagged hotreload:"safe" it swaps the new values into the live config and re-signs the integrity hashes on disk. The next check tick sees the new thresholds; fanotify marks are not dropped.

The tagged-safe top-level fields are alerts, thresholds, detection, suppressions, auto_response, bpf_enforcement, reputation, email_protection, and disabled_checks. Changes to their sub-keys are picked up on the next tick by the periodic scanners, the auto-response helpers (block/kill/quarantine/challenge/permission-fix), alert dispatch, and the heartbeat. The Settings API derives its restart hints from the same manifest that drives config.Diff, so UI hints and SIGHUP behavior cannot drift silently.

Two sub-keys are exceptions. They live under a safe-tagged parent but seed a long-lived in-memory structure at daemon startup; the reload accepts the edit and re-signs the hash, but the running structure keeps the old value until the next restart:

  • reputation.whitelist – seeded into the threat database at startup. The threat database exposes its own runtime API for adding and removing whitelist entries (via the Threat Intelligence page in the Web UI or the /api/v1/threat/* endpoints); those paths survive restarts because the threat database persists the runtime list to disk. Reloading reputation.whitelist from csm.yaml does not automatically propagate to the running threat database.
  • email_protection.known_forwarders – captured by the forwarder watcher at startup and read by scheduled forwarder and mail-filter checks. There is no runtime API yet; restart the daemon if you edit this list.

If you change either of the above, run systemctl restart csm instead of a reload. The rest of the sub-keys in every safe-tagged section are read per-call (inside check functions, auto-response helpers, alert dispatchers) and hot-reload cleanly on the next tick.

Look for one of three log shapes in the journal:

  • SIGHUP: config reloaded; safe fields updated: [thresholds] – success. The new values are live.
  • config_reload_restart_required: SIGHUP reload: restart-required fields changed: [hostname ...]; live config unchanged – the edit touched a field that cannot be hot-swapped. A Warning config_reload_restart_required finding is also emitted. Fall back to the restart path below.
  • config_reload_error: SIGHUP reload: parse failed ... or ... validation error ... – the file on disk is not loadable or fails csm validate. A Critical config_reload_error finding is emitted. The live config is unchanged; fix the file and repeat.
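
The three log shapes above can be turned into a next-step decision for scripts. This is a minimal sketch: classify_reload_line and its return words (ok, restart, fix, unknown) are hypothetical names, and the patterns are just the prefixes quoted in the list, so they would need adjusting if the daemon's wording differs on your version.

```shell
# Hypothetical helper: map one journal line to the next operator action.
classify_reload_line() {
  case "$1" in
    *"SIGHUP: config reloaded"*)        echo "ok" ;;       # new values are live
    *"config_reload_restart_required"*) echo "restart" ;;  # use the restart path
    *"config_reload_error"*)            echo "fix" ;;      # file did not load
    *)                                  echo "unknown" ;;  # unrelated line
  esac
}

# Possible use after `sudo systemctl reload csm`:
# sudo journalctl -u csm -n 20 --no-pager | while IFS= read -r line; do
#   classify_reload_line "$line"
# done
```

A config-management handler could stop at the first ok, fall back to the restart path on restart, and fail the run on fix.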

Restart path: unsafe fields

Fields not tagged hotreload:"safe" (the majority, including hostname, state_path, webui.listen, firewall.*, email_av.*, and anything that is initialized only once per daemon lifetime) require a full restart. Re-sign the integrity hashes before restarting:

sudo cp /etc/csm/csm.yaml /etc/csm/csm.yaml.bak-$(date +%s)

# edit /etc/csm/csm.yaml with your favourite editor

sudo /opt/csm/csm rehash     # re-signs integrity hashes
sudo /opt/csm/csm validate   # syntax + value sanity
sudo systemctl restart csm
sudo systemctl status csm    # confirm active, no crash-loop

If the restart fails (most commonly because rehash was skipped), roll back with sudo cp <backup> /etc/csm/csm.yaml && sudo systemctl restart csm. The backup carries its own matching hash so no second rehash is needed.
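
Because the backups above are named csm.yaml.bak-<epoch>, the newest one can be picked by its numeric suffix. A minimal sketch, assuming the backups sit next to the config and follow that naming exactly; latest_backup is a hypothetical helper, not a CSM command.

```shell
# Hypothetical helper: print the newest csm.yaml.bak-<epoch> in a directory.
# Returns non-zero when no matching backup exists.
latest_backup() {
  dir=$1
  best="" best_ts=-1
  for f in "$dir"/csm.yaml.bak-*; do
    [ -e "$f" ] || continue                       # glob matched nothing
    ts=${f##*.bak-}
    case "$ts" in ''|*[!0-9]*) continue ;; esac   # skip non-numeric suffixes
    if [ "$ts" -gt "$best_ts" ]; then best=$f best_ts=$ts; fi
  done
  [ -n "$best" ] && printf '%s\n' "$best"
}

# Possible rollback using the commands from this section:
# backup=$(latest_backup /etc/csm) &&
#   sudo cp "$backup" /etc/csm/csm.yaml && sudo systemctl restart csm
```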

Config-management tools

Config-management workflows (Ansible, Puppet, Chef) should:

  • For safe changes, notify systemctl reload csm instead of restart. The daemon re-signs the hash itself; no separate csm rehash step is required.
  • For any change that may touch a restart-required field, run csm rehash before the restart notify fires. Or always send reload first, read the journal, and promote to restart only when the reload logs restart-required.

conf.d drop-ins

Files matching /etc/csm/conf.d/*.yaml are loaded after the main config and deep-merged on top of it. Override with --config-dir <path> or CSM_CONFIG_DIR; the flag wins when both are set.

  • Order: lexicographic by filename. Scalar keys in 20-overrides.yaml override the same keys in 10-base.yaml. Use a numeric prefix.
  • Merge semantics: maps merge recursively; scalars replace the value from the main file; lists append in fragment order. All-scalar lists drop duplicate entries while keeping the first occurrence; structured lists such as webui.tokens keep every entry.
  • Trust: override directories must be absolute, must exist, and must be owned by root or the running process. The directory and every loaded fragment must not be group- or world-writable. Safe symlinked fragments are allowed, so packaged profiles can still be linked into /etc/csm/conf.d/.
  • Integrity ownership: drop-ins cannot set the integrity block. Integrity metadata is stored only in the main config.
  • Hash: integrity.config_hash covers the main file and integrity.confd_hash covers loaded drop-ins. After editing a drop-in by hand, run csm rehash before restarting, or use systemctl reload csm so the daemon can re-sign after validating the merged config. Web settings saves refuse to bless a drop-in change that has not already been re-signed.
  • Use cases: packaged integration profiles (e.g. /usr/lib/csm/profiles/phpanel-agent.yaml symlinked into conf.d/), per-host automation that should not touch the operator’s csm.yaml, secret material rendered from a vault.

ls /etc/csm/conf.d/
# 10-phpanel-agent.yaml   20-tenant-overrides.yaml

csm validate                # validates the merged config
csm config show             # prints the merged, redacted config
csm config schema --json    # JSON Schema for editor / CI validation

csm validate and csm config show always operate on the merged config so you can audit the effective state without grepping fragments.
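
The ordering and write-safety rules for drop-ins can be spot-checked before a reload. The sketch below is an operator-side approximation, not the daemon's own trust check: audit_confd is a hypothetical name, it assumes "lexicographic" means byte-wise filename order, it relies on GNU find's -perm /022 syntax, and it inspects only file modes (not ownership or the directory itself).

```shell
# Hypothetical pre-flight check for a conf.d directory.
# Prints fragments in byte-wise filename order, marking any fragment that is
# group- or world-writable as "unsafe". Symlinks are followed to their target.
audit_confd() {
  find "$1" -maxdepth 1 \( -type f -o -type l \) -name '*.yaml' |
    LC_ALL=C sort |
    while IFS= read -r f; do
      if [ -n "$(find -L "$f" -maxdepth 0 -perm /022)" ]; then
        printf 'unsafe %s\n' "${f##*/}"
      else
        printf 'ok %s\n' "${f##*/}"
      fi
    done
}

# Possible use:
# audit_confd /etc/csm/conf.d
```

Later lines in the output override scalar keys from earlier ones, which matches the 10-base.yaml / 20-overrides.yaml example in the list above.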

detection.direct_smtp_egress

Phase 3 detector. backend accepts auto, bpf, legacy, or none; ports must contain TCP ports in the 1-65535 range. See Direct SMTP egress.

bpf_enforcement

Phase 4 enforcement. Requires a BPF-capable connection tracker at runtime; auto falls back to legacy detection on older servers. See BPF enforcement.

Upgrading

/opt/csm/deploy.sh upgrade

This will:

  1. Stop the daemon
  2. Back up the current binary
  3. Download the new version
  4. Verify SHA256 checksum
  5. Extract UI assets and rules
  6. Rehash config
  7. Restart the daemon

Rolls back automatically on failure.
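
For operators who fetch release artifacts by hand, the checksum step can be reproduced with standard tools. This is a generic sketch, not the logic inside deploy.sh: verify_sha256 is a hypothetical helper, it assumes GNU coreutils sha256sum is installed, and it expects the published digest as lowercase hex.

```shell
# Hypothetical helper: succeed only if FILE hashes to the expected SHA-256.
# Args: file expected_hex_digest
verify_sha256() {
  file=$1 expected=$2
  actual=$(sha256sum "$file" | awk '{print $1}')
  [ -n "$actual" ] && [ "$actual" = "$expected" ]
}

# Possible use (both values are placeholders):
# verify_sha256 ./csm-download "$PUBLISHED_DIGEST" || echo "checksum mismatch" >&2
```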

Troubleshooting

“store: opening bbolt: timeout” – Most operator commands that need live state now route through the control socket at /var/run/csm/control.sock. This error should only appear from commands that intentionally open the bbolt file directly, such as csm store compact, csm store import, csm store reset-bot-verify, csm db-clean --drop-object, or a second daemon start while one daemon already owns the database.

Fix: stop the daemon before direct-store maintenance commands, then retry:

systemctl stop csm
csm store compact
systemctl start csm

If systemctl says CSM is stopped but bbolt still times out, find the process holding /var/lib/csm/state/csm.db and stop that process after review. Do not delete csm.lock; it is only the daemon instance guard and does not release bbolt’s file lock.

“csm: daemon not running” – CLI commands that talk to the daemon exit 2 with this message when the control socket is missing. This includes csm run*, csm check*, csm baseline, csm status, csm firewall ..., csm store export, csm export --since, and csm phprelay .... Start the daemon with systemctl start csm. Bootstrap commands that run before the daemon exists (csm install, csm validate, csm config schema, csm verify, csm rehash) do not require it.

Never delete csm.db – it contains all historical findings, firewall state, email forwarder baselines, and per-account data. If you delete it, the web UI will show empty data until the next full scan cycle (up to 60 minutes for deep scan findings). Restore from backup when possible; for an intentional reset, run csm baseline --confirm rather than removing the database by hand.

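Config changes require rehash – After editing csm.yaml, run csm rehash twice (the config hash is stored inside the config file, creating a circular dependency – the second run stabilizes it), then restart with systemctl restart csm. For hot-reload-safe fields, systemctl reload csm is enough because the daemon re-signs the hashes itself. A restart without a rehash fails the integrity check and the daemon refuses to start.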

RPM/DEB

yum update csm              # RPM
dpkg -i csm_NEW.deb         # DEB

Package managers handle stop/start automatically.

FHS migration (state, config, drop-ins, and profiles)

Current packages use FHS paths for state, config, drop-ins, and shipped profiles. Legacy main configs continue to work during the transition.

| Concern | Legacy path | Current path |
| --- | --- | --- |
| Drop-in fragments | n/a | /etc/csm/conf.d/*.yaml |
| State directory | /opt/csm/state | /var/lib/csm/state |
| Shipped profiles | n/a | /usr/lib/csm/profiles |
| Binary | /opt/csm/csm | /opt/csm/csm (unchanged) |
| Main config | /opt/csm/csm.yaml | /etc/csm/csm.yaml |
| Legacy config path | n/a | /opt/csm/csm.yaml symlink |

The package postinstall creates the FHS directories with the right ownership. If /opt/csm/csm.yaml is a real file and /etc/csm/csm.yaml is absent or still the shipped placeholder, the package copies the legacy config into /etc/csm/csm.yaml and then replaces the old path with a symlink. If both paths are real files with different operator content, CSM refuses the implicit default path until you move one aside or pass --config <path>.

The daemon copies a non-empty legacy /opt/csm/state/ into the new state directory on first start, but only when the new directory is empty (so a partial migration cannot corrupt it). The legacy directory is left in place; remove it after you have verified the new install.

Operators upgrading by manual binary swap (without re-running the package postinstall) keep the legacy state path if state_path: /opt/csm/state is pinned in the existing csm.yaml. To move state to the FHS layout, either reinstall the package or create the directories by hand and remove the state_path: override.
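
Before an upgrade it can help to see which of the config cases above a host is in. The sketch below only inspects the two paths: config_layout and its output words are hypothetical, and it cannot tell a shipped placeholder from operator content, so a "conflict" result still needs a manual look.

```shell
# Hypothetical helper: classify the config layout from the two paths.
# Args: legacy_path current_path
config_layout() {
  legacy=$1 current=$2
  if [ -L "$legacy" ]; then
    echo "migrated"        # legacy path is already a symlink
  elif [ -f "$legacy" ] && [ -f "$current" ]; then
    if cmp -s "$legacy" "$current"; then
      echo "duplicate"     # two real files, identical content
    else
      echo "conflict"      # two real files that differ
    fi
  elif [ -f "$legacy" ]; then
    echo "legacy-only"     # postinstall would copy, then symlink
  elif [ -f "$current" ]; then
    echo "fhs-only"
  else
    echo "missing"
  fi
}

# Possible use:
# config_layout /opt/csm/csm.yaml /etc/csm/csm.yaml
```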

systemd Type=notify drop-in

The packaged unit file is Type=notify with WatchdogSec=300. The daemon signals READY=1 after watchers attach and pings WATCHDOG=1 on schedule, so systemctl is-active reflects truth and the watchdog kills a hung daemon.

Older units shipped Type=simple. The watchdog still functions because the daemon pings regardless of unit type, but systemctl status only sees the process, not “watchers attached.” If you need the new behavior on an older unit, drop in:

# /etc/systemd/system/csm.service.d/notify.conf
[Service]
Type=notify
NotifyAccess=main

Then systemctl daemon-reload && systemctl restart csm. Verify with systemctl show csm -p Type -p StatusText.

Auto-response dry-run safety default

auto_response.dry_run defaults to true when the key is absent. The daemon records every IP it would have blocked but does not touch nftables. If your auto_response: block sets enabled: true and block_ips: true but does not set dry_run, add dry_run: false explicitly before relying on auto-block. Verify with:

csm status --json | jq '.capabilities, .severities'
csm firewall status            # check that "Recently Blocked" picks up new entries after the restart

Manual csm firewall ... operations bypass dry-run and always apply.
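
Since an absent dry_run key silently means true, a quick check of the main config can confirm whether the key is set explicitly. A rough sketch under stated assumptions: auto_response_dry_run is a hypothetical helper, it does a line-based scan rather than real YAML parsing, it assumes space-indented block-style YAML, and it reads one file only, so values supplied by conf.d drop-ins are not seen.

```shell
# Hypothetical helper: print the dry_run value set inside the top-level
# auto_response block of FILE; return non-zero if the key is absent.
auto_response_dry_run() {
  awk '
    /^[^ #]/ { in_block = ($0 ~ /^auto_response:/) }   # entering a top-level key
    in_block && /^ +dry_run:/ {
      sub(/^ +dry_run: */, "")                         # keep only the value
      sub(/ *(#.*)?$/, "")                             # drop trailing comment
      print; found = 1; exit
    }
    END { exit (found ? 0 : 1) }
  ' "$1"
}

# Possible use:
# auto_response_dry_run /etc/csm/csm.yaml || echo "dry_run not set: defaults to true"
```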

CLI Commands

Global flags

| Flag | Description |
| --- | --- |
| --config <path> | Override the main config path. Default: /etc/csm/csm.yaml, with fallback to /opt/csm/csm.yaml on legacy installs. |
| --config-dir <path> | Override the conf.d directory. Default: /etc/csm/conf.d. Wins over CSM_CONFIG_DIR when both are set. Override paths must be absolute, trusted, and not group- or world-writable; loaded fragments must meet the same write-safety check. |

Daemon

| Command | Description |
| --- | --- |
| csm daemon | Run as persistent daemon (fanotify + inotify + PAM + periodic checks). Signals systemd READY=1 after watchers attach and pings WATCHDOG=1 on the configured interval. |

Checks

| Command | Description |
| --- | --- |
| csm run | Run all checks now via the daemon, send alerts |
| csm run-critical | Critical checks now via the daemon (the daemon also schedules critical checks internally every 10 min) |
| csm run-deep | Deep checks now via the daemon (the daemon also schedules deep checks internally every 60 min) |
| csm check | Run all checks via the daemon, print findings to stdout, no alerts / auto-response |
| csm check-critical | Test critical checks only (dry-run via daemon) |
| csm check-deep | Test deep checks only (dry-run via daemon) |
| csm scan <user> | Scan single cPanel account |

Management

| Command | Description |
| --- | --- |
| csm install | Deploy config, systemd, auditd rules, logrotate, WHM plugin |
| csm uninstall | Clean removal |
| csm baseline | Full server scan via the daemon, records current state for change tracking. Dangerous privileged accounts or WHM root tokens can still be reported on first scan. Takes 5-10 min on large servers. Required on first install. Add --confirm when existing history would be cleared. The daemon must be running. |
| csm rehash | Update binary/config hashes without scanning. Use after config edits. Run twice (circular hash). |
| csm status | Show current state, last run, active findings, and automation rollout state. Add --json for the full health snapshot (watchers, severity counts, store health, blocklist size, capabilities, version, hashes, automation). |
| csm doctor | Config + daemon + watchers + store sanity check. csm doctor challenge checks challenge public URL, TLS, port gate, webserver snippets, configtest, and the live /challenge/gate endpoint. Add --json for machine-readable output. |
| csm validate | Validate config (--deep for connectivity probes) |
| csm config show | Display config with secrets redacted |
| csm config schema --json | Print a JSON Schema reflected from the Config struct. Use for CI validation of conf.d drop-ins or panel-side editor schemas. |
| csm verify | Verify binary and config integrity |
| csm version | Version and build info |

Backup & restore

Command | Description
csm backup <path> | Bundle csm.yaml, /etc/csm/conf.d/, and the state directory into a tar.gz at <path>. Use for clean DR snapshots. Daemon may be running.
csm restore <archive> | Extract a backup archive into the live csm.yaml + conf.d + state directory. Rejects path-traversal entries and pre-existing symlinks under restore targets. Stop the daemon first.

csm store export / csm store import (below) is the lower-level alternative: tar+zstd, sha256-verified, finer-grained --only= flags. csm backup/restore is the convenience wrapper most operators want.

Hardening

Operator-driven mitigations applied to the host. Run csm harden with no arguments to print the available subcommands on the current host (the audit detects kernel build, panel, and existing mitigations and only offers what’s relevant). Background, full list, and live-detection details: CVE Mitigations.

Command | Description
csm harden | Print the hardening menu for this host.
csm harden --copy-fail | Apply the CVE-2026-31431 (Copy Fail) modprobe mitigation: blacklist algif_aead + af_alg, unload them. Refuses on built-in-AF_ALG kernels.
csm harden --copy-fail-seccomp | Apply the CVE-2026-31431 seccomp mitigation: write systemd RestrictAddressFamilies=~AF_ALG drop-ins for LiteSpeed, Apache/Nginx, every PHP-FPM pool, cron, and mail units. The right path on built-in-AF_ALG kernels (typical cPanel/CloudLinux 8).

Remediation

Command | Description
csm clean <path> | Clean infected PHP file (backs up original)
csm db-clean --option <account> <option_name> [--preview] | Sanitize malicious WordPress option values (e.g. injected siteurl / home)
csm db-clean --revoke-user <account> <user_id> [--demote] [--preview] | Revoke or demote a compromised WordPress admin and invalidate their sessions
csm db-clean --delete-spam <account> [--preview] | Purge spam comments and trackbacks from a WordPress account
csm db-clean --drop-object <account> <schema> <type> <name> [--preview] | Drop a MySQL trigger / event / stored procedure / stored function, capturing its CREATE SQL into the db_object_backups bbolt bucket first. <type> must be trigger, event, procedure, or function. <schema> must match a database discovered for <account>. Daemon must be stopped.
csm enable --php-shield | Enable PHP runtime protection
csm disable --php-shield | Disable PHP runtime protection

State database

Command | Description
csm store compact | Reclaim unused space in the bbolt state file (atomic rename over the live DB). Requires the daemon to be stopped (systemctl stop csm) because bbolt holds an exclusive file lock while running.
csm store compact --preview | Snapshot into a temp file next to the live DB and print src/dst sizes without replacing anything. Use to estimate reclaim before scheduling a maintenance window.
csm store export <path> | Write a tar+zstd backup containing the bbolt store, the state directory, and the signature-rules cache. A sibling <path>.sha256 companion file holds the archive hash for verification. Daemon must be running.
csm store import <path> | Restore from a backup archive. Daemon must be stopped. Default restores everything; --only=baseline restores only state JSON files (file hashes); --only=firewall merges only firewall buckets into the existing bbolt; --force-platform-mismatch allows restoring an archive captured on a different OS / panel / web server.
csm store reset-bot-verify | Drop cached bot PTR verification results so the next scan re-runs reverse DNS checks. Requires the daemon to be stopped because bbolt holds an exclusive file lock while running.
csm export --since <when> | Dump audit-log events for SIEM backfill. <when> is RFC 3339 (2026-04-01T00:00:00Z) or a duration relative to now (24h, 7d). One JSON event per line on stdout, in the same v=1 schema the live audit_log sinks emit. Pipe to a file or directly into a log shipper. Daemon must be running.
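The two accepted shapes of <when> can be illustrated with a small Python sketch. It is not CSM's parser: the function name parse_since is hypothetical, and it assumes only the h and d suffixes shown in the examples above are valid duration units.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: only the "h" and "d" suffixes from the documented examples are accepted.
_DURATION = re.compile(r"^(\d+)([hd])$")


def parse_since(when, now):
    """Resolve a --since style value to an absolute, timezone-aware UTC timestamp."""
    match = _DURATION.match(when)
    if match:
        amount, unit = int(match.group(1)), match.group(2)
        delta = timedelta(hours=amount) if unit == "h" else timedelta(days=amount)
        return now - delta
    # RFC 3339 timestamps may end in "Z"; older Python versions need an explicit offset.
    return datetime.fromisoformat(when.replace("Z", "+00:00")).astimezone(timezone.utc)
```

Anything that matches neither shape raises ValueError from datetime.fromisoformat, which is a reasonable place for a caller to reject the argument.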

Updates

Command | Description
csm update-rules | Download latest signature rules
csm update-geoip | Update MaxMind GeoLite2 databases

PHP-relay (mail abuse, cPanel only)

Operator controls for the email PHP-relay detector. Talks to the daemon’s control socket; the daemon must be running. See Real-time detection for what the detector fires on, and Auto-response for the freeze action.

Command | Description
csm phprelay status | Print the detector’s current state as JSON: enabled, platform, effective dry-run + source (runtime/bbolt/csm.yaml), Path 2b effective account limit, scripts/IPs/accounts tracked, msgID-index size, active ignores. Use to confirm the watcher is wired on a fresh install.
csm phprelay ignore-script <scriptKey> [--for-hours N] [--persist] [--reason ...] | Suppress all 4 paths for a host:/path scriptKey. Default TTL 168h (7d). --persist writes to the bbolt phprelay:ignore bucket so the suppression survives daemon restarts; without it the entry is in-memory only. <scriptKey> is the value the daemon prints in email_php_relay_abuse findings (e.g. shop.example.com:/wp-admin/admin-ajax.php).
csm phprelay unignore <scriptKey> [--persist] | Remove an active ignore. --persist also deletes the bbolt row.
csm phprelay ignore-list | List all active ignores as JSON: scriptKey, expiresAt, addedBy, reason.
csm phprelay dry-run on|off|reset [--persist] | Override the auto-freeze dry-run state at runtime. on = freeze findings emitted but no exim -Mf runs; off = live freezes; reset clears the runtime override and falls back to bbolt or csm.yaml. Precedence: runtime > bbolt > yaml. --persist writes the on/off choice to the bbolt phprelay:settings bucket so it survives restarts; on reset --persist the bbolt row is also deleted.
csm phprelay thaw <msgID> | Manually thaw a frozen Exim message. Wraps exim -Mt with msgID validation (rejects anything that isn’t [A-Za-z0-9-]{16,32}) and writes a thaw entry to the auto-freeze JSONL audit at /var/log/csm/php_relay_audit.jsonl.
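The msgID check quoted for csm phprelay thaw is a whole-string match against [A-Za-z0-9-]{16,32}. A minimal Python sketch of that check follows; the function name is hypothetical and the real validation lives in the Go binary.

```python
import re

# Pattern taken from the thaw description; fullmatch rejects any extra characters.
_MSGID = re.compile(r"[A-Za-z0-9-]{16,32}")


def is_valid_msgid(candidate):
    """Return True only when the entire string is 16-32 characters of [A-Za-z0-9-]."""
    return _MSGID.fullmatch(candidate) is not None
```

Matching the whole string rather than a prefix is what keeps shell metacharacters and extra arguments out of the wrapped exim -Mt call.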

Firewall

See Firewall for the full reference.

csm firewall status
csm firewall deny <ip> [reason]
csm firewall allow <ip> [reason]
csm firewall tempban <ip> <dur> [reason]
csm firewall deny-subnet <cidr> [reason]
csm firewall grep <pattern>
csm firewall flush
csm firewall rollback status|confirm|revert
# ...

Real-Time Detection

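CSM detects threats within about 2 seconds using three watchers running inside the daemon: the fanotify file monitor (under 1 second), the inotify log watchers (about 2 seconds), and the PAM brute-force listener (real time).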

fanotify File Monitor (< 1 second)

Monitors /home, /tmp, /dev/shm for filesystem events.

Detects:

  • Webshell creation (PHP files in web directories)
  • PHP in uploads, languages, upgrade directories
  • PHP in .ssh, .cpanel, mail directories (critical escalation)
  • Executable drops in .config
  • .htaccess injection (auto_prepend, eval, base64 handlers)
  • .user.ini tampering
  • Obfuscated PHP (encoded, packed, concatenated)
  • Fragmented base64 evasion ($a="base"; $b="64_decode" – function name split across variables)
  • Concatenation payloads (hundreds of $z .= "xxxx" lines with eval at end)
  • Tail scanning: payloads appended to the end of large legitimate PHP files (beyond the 32KB head window)
  • CGI backdoors: Perl, Python, Bash, Ruby scripts in web directories (e.g., LEVIATHAN toolkit)
  • SEO spam: gambling/togel dofollow link injection in PHP/HTML files
  • Phishing pages and credential harvest logs
  • Phishing kit ZIP archives
  • YAML signature matches (PHP, HTML, .htaccess, .user.ini)
  • YARA-X rule matches (if built with -tags yara)

Features:

  • Per-path alert deduplication (30s cooldown)
  • Process info enrichment (PID, command, UID)
  • Auto-quarantine on high-confidence matches (category + entropy validation)
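The per-path deduplication can be pictured as a small cooldown map. The Python sketch below is illustrative only: the class name is hypothetical, and it assumes a suppressed event does not restart the 30-second cooldown, which the text above does not specify.

```python
class AlertDeduper:
    """Suppress repeat alerts for the same path inside a cooldown window."""

    def __init__(self, cooldown_s=30.0):
        self.cooldown_s = cooldown_s
        self._last_alert = {}  # path -> timestamp of the last emitted alert

    def should_alert(self, path, now):
        last = self._last_alert.get(path)
        if last is not None and now - last < self.cooldown_s:
            return False  # still cooling down; assumption: the timer is not reset
        self._last_alert[path] = now
        return True
```

A long-running watcher would also need to prune old entries from the map; that is omitted here for brevity.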

inotify Log Watchers (~2 seconds)

Tails auth, access, and mail logs in real-time. The exact file paths are chosen per platform at daemon startup – see the platform: ... line in the daemon log.

Log | Platforms | What it detects
cPanel session log (/usr/local/cpanel/logs/session_log) | cPanel only | Logins from non-infra IPs, password changes, File Manager uploads
cPanel access log (/usr/local/cpanel/logs/access_log) | cPanel only | cPanel-API auth patterns
Auth log | All | SSH logins and failures. /var/log/auth.log on Debian/Ubuntu, /var/log/secure on RHEL family and cPanel
Exim mainlog (/var/log/exim_mainlog) | cPanel; non-cPanel when the file exists | Mail anomalies, queue issues, SMTP brute force, probe abuse, and cloud relay abuse
Apache/LiteSpeed/Nginx access log | All | WordPress brute force (wp-login.php, xmlrpc.php), real-time. Paths: /var/log/apache2/access.log (Debian), /var/log/httpd/access_log (RHEL), /var/log/nginx/access.log (Nginx), /usr/local/apache/logs/access_log (cPanel)
Mail log (platform file or journal) | All hosts with Postfix/Dovecot logs | IMAP/POP3/ManageSieve account compromise and mail brute-force
FTP log (/var/log/messages) | cPanel only | FTP logins and failures
ModSecurity error log | All (if ModSec installed) | WAF blocks and attacks. Auto-discovered from the detected web server
Nginx error log (/var/log/nginx/error.log) | Nginx hosts | General web errors, ModSecurity denies

cPanel-only log watchers are not registered on non-cPanel hosts, so you will not see “not found, retrying every 60s” warnings for them on plain Ubuntu or AlmaLinux.

SMTP / Dovecot Brute-Force Tracker

Detects credential stuffing, password spray, and raw SMTP probe storms. Runs as part of the Exim mainlog watcher on cPanel hosts and on non-cPanel Exim hosts where /var/log/exim_mainlog exists.

Four attack patterns:

Signal | What triggers it | Auto-response
smtp_bruteforce | A single attacker IP exceeds the per-IP failed-auth threshold within the configured window | IP blocked via nftables
smtp_probe_abuse | A single attacker IP exceeds the raw SMTP connect-rate threshold before AUTH | IP blocked via nftables
smtp_subnet_spray | Multiple distinct attacker IPs from the same /24 subnet exceed the subnet threshold | Entire /24 subnet blocked via nftables
smtp_account_spray | Many distinct attacker IPs targeting the same mailbox exceed the account threshold | Visibility finding only. No auto-block, because attackers span many subnets and no single-IP action helps

Tunable via the thresholds.smtp_bruteforce_* and thresholds.smtp_probe_* keys in csm.yaml. Infrastructure IPs (from infra_ips) are never counted or blocked.
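To make the difference between the per-IP and /24 signals concrete, here is a minimal Python sketch of a sliding-window tracker. It is a simplification under stated assumptions: the class name, window length, and thresholds are hypothetical, it handles IPv4 only, and it ignores the infra_ips exclusion and the probe and account signals.

```python
import ipaddress
from collections import deque


class FailedAuthTracker:
    """Count failed auths per IP and distinct attacker IPs per /24 in a time window."""

    def __init__(self, window_s, per_ip_threshold, subnet_threshold):
        self.window_s = window_s
        self.per_ip_threshold = per_ip_threshold
        self.subnet_threshold = subnet_threshold
        self._events = deque()  # (timestamp, ip) in arrival order

    def record(self, ip, now):
        self._events.append((now, ip))
        # Drop failures that have aged out of the window.
        while self._events and now - self._events[0][0] > self.window_s:
            self._events.popleft()
        subnet = ipaddress.ip_network(ip + "/24", strict=False)
        per_ip = sum(1 for _, seen in self._events if seen == ip)
        distinct = {seen for _, seen in self._events
                    if ipaddress.ip_address(seen) in subnet}
        findings = []
        if per_ip > self.per_ip_threshold:
            findings.append("smtp_bruteforce")
        if len(distinct) > self.subnet_threshold:
            findings.append("smtp_subnet_spray")
        return findings
```

The subnet signal counts distinct addresses rather than total failures, which is why a spray of single attempts from many neighbours can trip it while each IP stays under its own threshold.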

Cloud-Relay Credential Abuse

Detects authenticated outbound Exim deliveries where the same mailbox is sending through public-cloud relay sources. The realtime Exim mainlog watcher evaluates new accepted deliveries, and a bounded startup replay covers recent lines already on disk.

The finding is email_cloud_relay_abuse. Auto-response actions follow the global dry-run and block settings plus the email hold path. Operators with legitimate cloud mailers can opt out specific mailboxes or domains under email_protection.cloud_relay, or use email_protection.high_volume_senders for known high-volume senders.

Mail Auth Brute-Force Tracker

Detects credential stuffing and password spray against IMAP, POP3, and ManageSieve. Runs through the mail_logs reader: file source uses /var/log/mail.log on Debian-family hosts and /var/log/maillog on RHEL-family and cPanel hosts, while journal source reads configured Postfix/Dovecot units. The wrapper composes with the existing geo-based login monitor, so email_suspicious_geo keeps firing for successful logins from novel countries.

Four attack patterns:

Signal | What triggers it | Auto-response
mail_bruteforce | A single attacker IP exceeds the per-IP failed-auth threshold within the configured window | IP blocked via nftables
mail_subnet_spray | Multiple distinct attacker IPs from the same /24 subnet exceed the subnet threshold | Entire /24 subnet blocked via nftables
mail_account_spray | Many distinct attacker IPs targeting the same mailbox exceed the account threshold | Visibility finding only. No auto-block, because attackers span many subnets and no single-IP action helps
mail_account_compromised | A successful login comes from an IP that just failed auth against the same account | IP blocked immediately. Rotate the password and revoke sessions

Tunable via the thresholds.mail_bruteforce_* keys in csm.yaml. Independent from the SMTP tracker so the Dovecot noise floor can be tuned separately. Infrastructure IPs are never counted or blocked.
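The mail_account_compromised signal is a success-after-failure correlation on the same (IP, account) pair. A minimal Python sketch follows; the class name is hypothetical, and the explicit window_s parameter is an assumption standing in for "just failed", which the table does not quantify.

```python
class CompromiseDetector:
    """Flag a successful login that follows a recent failure from the same IP."""

    def __init__(self, window_s):
        self.window_s = window_s  # assumption: how recent a failure must be
        self._last_failure = {}   # (ip, account) -> timestamp of last failed auth

    def on_failure(self, ip, account, now):
        self._last_failure[(ip, account)] = now

    def on_success(self, ip, account, now):
        last = self._last_failure.get((ip, account))
        return last is not None and now - last <= self.window_s
```

Keying on the pair rather than the IP alone avoids flagging a shared NAT address where one user mistypes a password and a different user logs in normally.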

Admin-Panel Brute-Force Tracker

Counts repeated POST requests to high-value non-WordPress admin login endpoints. Runs as part of the web access-log watcher.

Covered endpoints (tight set to avoid false positives on shared hosting):

  • phpMyAdmin: /phpmyadmin/index.php, /pma/index.php, /phpMyAdmin/index.php
  • Joomla: /administrator/index.php

When an IP crosses the POST-rate threshold, admin_panel_bruteforce fires and the attacker IP is auto-blocked.
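As a rough illustration of the counting step, the Python sketch below tallies POST requests to the covered endpoints per source IP. The function names, the already-parsed (ip, method, path) input shape, and the threshold value are hypothetical; the time window is left out.

```python
from collections import Counter

# Endpoint list copied from the covered set above.
ADMIN_LOGIN_PATHS = frozenset({
    "/phpmyadmin/index.php",
    "/pma/index.php",
    "/phpMyAdmin/index.php",
    "/administrator/index.php",
})


def count_admin_posts(requests):
    """Count POSTs per IP; `requests` is an iterable of (ip, method, path) tuples."""
    counts = Counter()
    for ip, method, path in requests:
        # Ignore the query string so /index.php?route=... still matches.
        if method == "POST" and path.split("?", 1)[0] in ADMIN_LOGIN_PATHS:
            counts[ip] += 1
    return counts


def offenders(requests, threshold):
    """Return the IPs whose POST count exceeds the threshold, sorted for stable output."""
    return sorted(ip for ip, n in count_admin_posts(requests).items() if n > threshold)
```

Matching exact paths, not prefixes, mirrors the "tight set" intent: a GET of the login page or a POST to an unrelated script does not count.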

Drupal /user/login and Tomcat Manager /manager/html are intentionally out of scope here. Drupal’s path is too generic on shared hosting, and Tomcat Manager uses HTTP Basic auth (repeated GET requests with 401 responses), not POST form submissions. Both need different detectors and are tracked as follow-up work.

PHP-Relay (Mail Abuse, cPanel Only)

Real-time inotify watcher on /var/spool/exim/input catches WordPress contact-form spam relays where an attacker uses PHPMailer (or similar) with a spoofed From, an external Reply-To, and a script URL that doesn’t belong to the cPanel account. The occonsultingcy incident (2026-04) drove the design: a legitimate site running a vulnerable contact-form plugin became a per-message spam relay through the operator’s own mail account.

The detector runs four paths and only fires email_php_relay_abuse (Critical) when one of them crosses threshold. Paths 1 and 2 are scoped per-script, using the host:/path from the X-PHP-Script Exim header. Path 2b is per cPanel user. Path 4 is per HTTP source IP across distinct scripts.

Path | What triggers it | Why it exists
Path 1: header score | Per-script: From domain not in the account’s authorised domains AND additional signal (PHPMailer / suspicious Reply-To / suspicious User-Agent), evaluated over a rolling 5-min window once the script has emitted at least header_score_volume_min messages | The shape that matched the original incident: spoofed sender, contact-form-style. FromMismatch is a HARD precondition – the score never accumulates without it
Path 2: absolute volume per script | A single script emits more than absolute_volume_per_hour messages in the last hour | Catches a compromised script even if the headers themselves are legit-shaped
Path 2b: account log-tail volume | Per cPanel user: more than effective_account_limit outbound messages through the redirect_resolver router in the last hour. The effective limit is auto-derived from /var/cpanel/cpanel.config’s maxemailsperhour (60% of it, clamped to 20-60), capped at 95% of the cPanel limit when an operator override is set | Backstop for when Path 2 misses the window. Reads /var/log/exim_mainlog directly; only fires on lines tagged B=redirect_resolver so forwarders don’t trip it
Path 4: HTTP-IP fanout | Per HTTP source IP: one source IP appears in more than fanout_distinct_scripts distinct script keys in fanout_window_min minutes, after excluding loaded HTTP-proxy ranges, loopback, and the host’s own interface addresses | Catches one client walking many scripts while avoiding CDN/proxy traffic and local cron or panel callbacks

Path 5 (behavioural baseline) is deferred to Stage 2.
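The Path 2b effective limit arithmetic can be worked through in a few lines of Python. This sketch assumes integer floor rounding and a positive maxemailsperhour; the function and parameter names are hypothetical, and how CSM treats an unlimited (0) cPanel setting is not covered here.

```python
def effective_account_limit(max_emails_per_hour, override=None):
    """Derive the per-account hourly limit used by the log-tail volume path."""
    if override is not None:
        # An operator override is capped at 95% of the cPanel limit.
        return min(override, max_emails_per_hour * 95 // 100)
    # Auto-derived: 60% of the cPanel limit, clamped to the 20-60 range.
    derived = max_emails_per_hour * 60 // 100
    return max(20, min(60, derived))
```

For example, a cPanel limit of 50 yields 30, a limit of 500 is clamped down to 60, and an override of 200 against a cPanel limit of 100 is capped at 95.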

The detector starts a one-shot retrospective scan of exim_mainlog at daemon startup so Path 2b can fire on history already on disk. IN_Q_OVERFLOW triggers a bounded recovery walk of the spool (capped at 1000 files; if more were skipped, an email_php_relay_overflow_scan_truncated Critical fires too – Path 2b backstops the missed messages).

Operator suppressions (csm phprelay ignore-script <host:/path>) short-circuit the pipeline before any path scoring runs, so a known-noisy contact form can be opted out individually without disabling the detector. See PHP-relay CLI for the full operator surface.

PAM Brute-Force Listener

Real-time authentication monitoring across all PAM-enabled services.

  • SSH login tracking with geolocation
  • cPanel, FTP, and webmail authentication
  • Credential stuffing / password spray breadth: one source IP failing against many distinct accounts inside thresholds.multi_ip_login_window_min. The finding is credential_stuffing; tune the account floor with thresholds.cred_stuffing_distinct_accounts (default 5).
  • Blocks IPs within seconds of threshold breach
  • Integrates with the nftables firewall for instant blocking
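The credential_stuffing breadth signal counts distinct accounts per source IP rather than total failures. A minimal Python sketch, with a hypothetical class name and the assumption that the finding fires once the distinct-account count reaches the configured floor (default 5):

```python
from collections import defaultdict


class CredStuffingTracker:
    """Track how many distinct accounts one source IP has failed against recently."""

    def __init__(self, window_s, distinct_accounts=5):
        self.window_s = window_s
        self.distinct_accounts = distinct_accounts
        self._failures = defaultdict(dict)  # ip -> {account: last failure timestamp}

    def on_failure(self, ip, account, now):
        seen = self._failures[ip]
        seen[account] = now
        # Forget accounts whose last failure is older than the window.
        for stale in [a for a, ts in seen.items() if now - ts > self.window_s]:
            del seen[stale]
        # Assumption: "floor" means the finding fires at >= distinct_accounts.
        return len(seen) >= self.distinct_accounts
```

Repeated failures against one account never trip this signal on their own; that pattern is what the per-IP brute-force threshold is for.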

Process Context

Exec and outbound-connection findings carry an optional process object with PID, PPID, UID, user, cPanel account (when known), comm, exe, sanitized cmdline, and a parent chain up to depth 5. The chain is materialized from an in-memory LRU+TTL cache (cap 16384 entries, 30-minute TTL) populated from BPF exec events. Cache misses trigger a bounded async /proc read, so process-context enrichment does not add blocking work to the connection event loop. When neither cache nor enricher has data (e.g., a process that exited before userspace reads its event), the process field is omitted entirely and the finding still emits.
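An LRU+TTL cache of the kind described can be sketched in Python with an OrderedDict. This is an illustration, not the daemon's Go implementation: the class and attribute names are hypothetical, and only the capacity (16384) and TTL (30 minutes) come from the text above. The counters mirror the idea that a TTL purge also counts as a miss.

```python
from collections import OrderedDict


class LruTtlCache:
    """Bounded cache that evicts the least recently used entry and expires old ones."""

    def __init__(self, capacity=16384, ttl_s=1800.0):
        self.capacity = capacity
        self.ttl_s = ttl_s
        self._data = OrderedDict()  # key -> (inserted_at, value), oldest use first
        self.evictions = 0
        self.ttl_purges = 0
        self.misses = 0

    def put(self, key, value, now):
        self._data.pop(key, None)
        self._data[key] = (now, value)
        while len(self._data) > self.capacity:
            self._data.popitem(last=False)  # drop the least recently used entry
            self.evictions += 1

    def get(self, key, now):
        item = self._data.get(key)
        if item is None:
            self.misses += 1
            return None
        inserted_at, value = item
        if now - inserted_at > self.ttl_s:
            del self._data[key]
            self.ttl_purges += 1
            self.misses += 1  # an expired entry is reported as a miss too
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value
```

In this sketch a miss simply returns None; in the design above, that is the point where the bounded async /proc read would be queued.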

Counters exposed at /metrics:

  • csm_process_context_cache_entries
  • csm_process_context_cache_evictions_total (LRU)
  • csm_process_context_cache_ttl_purges_total
  • csm_process_context_cache_misses_total (includes TTL purges)
  • csm_process_context_enrich_queue_drops_total
  • csm_process_context_enrich_reads_total
  • csm_process_context_enrich_errors_total
  • csm_process_context_enrich_stale_total
  • csm_process_context_enrich_latency_seconds

Caveats:

  • started_at is emitted only when the event source supplies a trustworthy start timestamp. Phase 1 does not infer process start time from procfs directory metadata. A future refinement may add /proc/<pid>/stat field 22 + /proc/stat btime for kernel-tick precision.
  • After daemon restart, the csm_process_context_enrich_* counters may show a small enqueued - reads delta. Pending requests in the enricher queue are dropped on shutdown by design.
  • Hosts without BPF support fall back to /proc/net/tcp[6] polling. That path has no PID, so emitted findings do not carry a process field. A future refinement could resolve the socket inode to a PID via /proc/<pid>/fd, but that is out of scope for Phase 1.

HTTP Flood, UA Spoof, and Distributed Flood

http_request_flood, http_ua_spoof, and http_distributed_flood are periodic, not real-time. They run inside the same wp_bruteforce scheduled check that scans per-vhost access logs every 10 minutes. A real-time inotify tailer would need to hold per-IP state across log rotations and is out of scope for the initial release (see the plan non-goals). For attack types where sub-minute response matters, the access-log inotify watcher already covers wp_login_bruteforce and xmlrpc_abuse; the periodic scan adds volume-based rate enforcement and per-vhost distributed attack rollups on top.

Direct SMTP Egress

Outbound connections to SMTP ports from non-MTA local processes emit a direct_smtp_egress finding. See Direct SMTP egress for the full rule set, config schema, and metric.

Direct SMTP egress

CSM watches the local mail stack via spool + log scanning. Non-MTA processes that open outbound SMTP connections directly bypass that path. The direct SMTP egress detector catches that at connect time and feeds the incident correlator from Phase 2.

What fires

A finding with check: "direct_smtp_egress" is emitted when:

  • A non-root process opens an outbound TCP connection.
  • Destination port is one of the configured SMTP ports (default 25, 465, 587).
  • Destination IP is not loopback, infra, or in the operator’s infra_ips list.
  • The process user is NOT a known MTA user (mail, mailnull, postfix, dovecot, dovenull, mailman, plus exim on cPanel).

Process names are never a standalone allow condition. A hosted account renaming malware to smtp or smtpd still emits a finding.
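The four conditions combine into a single predicate. The Python sketch below restates them under simplifying assumptions: the function name and parameters are hypothetical, infra ranges are passed in as ipaddress network objects, and the process name is deliberately not an input because it is never an allow condition.

```python
import ipaddress

DEFAULT_SMTP_PORTS = frozenset({25, 465, 587})
MTA_USERS = frozenset({"mail", "mailnull", "postfix", "dovecot", "dovenull", "mailman"})


def is_direct_smtp_egress(uid, user, dst_ip, dst_port,
                          infra_nets=(), ports=DEFAULT_SMTP_PORTS, cpanel=False):
    """Return True when an outbound connect matches all four documented conditions."""
    if uid == 0:
        return False  # only non-root processes are considered
    if dst_port not in ports:
        return False
    addr = ipaddress.ip_address(dst_ip)
    if addr.is_loopback or any(addr in net for net in infra_nets):
        return False
    # exim counts as an MTA user only on cPanel hosts.
    mta_users = MTA_USERS | {"exim"} if cpanel else MTA_USERS
    return user not in mta_users
```

Because the check keys on the user rather than comm, a hosted account that renames its binary to smtpd still matches.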

The detector always emits findings when enabled. The dry_run knob does not suppress findings; it participates in the Phase 4 BPF enforcement gate, where any dry_run=true layer keeps kernel denial in observe-only mode.

Configuration

detection:
  direct_smtp_egress:
    enabled: true
    backend: auto       # auto / bpf / legacy / none
    dry_run: true       # safety default for detector-scoped action
    ports:             # each value must be 1-65535
      - 25
      - 465
      - 587

Backends

  • auto – allow both BPF and legacy scan paths. Live backend choice still follows detection.connection_tracker_backend.
  • bpf – emit only from the cgroup/connect4,6 consumer.
  • legacy – emit only from the /proc/net/tcp[6] polling path (live poller or scheduled critical scan). This path lacks PID/comm; MTA matching is user-only.
  • none – detector disabled even when enabled: true is set elsewhere; useful for staged rollout.

The generic outbound connection tracker is still governed by detection.connection_tracker_backend; this setting only gates direct_smtp_egress findings.

Metric

csm_direct_smtp_egress_findings_total – monotonic counter, incremented per finding emitted by the BPF connection consumer. The legacy poller does not bump this counter today; operators who run backend=legacy should track findings via the audit log.

rDNS enrichment

When the BPF backend is active, finding details include a Domain field populated from a TTL-cached reverse lookup (30 min TTL, 1 second per-lookup deadline). The lookup runs only after the cheap direct-SMTP filters match. On resolver miss or timeout the field is omitted; the finding still fires.

Caveats

  • 2525 is intentionally NOT in the default port list. Many operators run unrelated services on it. Add it to ports if your infra uses it for submission.
  • The detector emits regardless of the dry_run knob. Kernel denial requires auto_response.dry_run, this dry_run key, and bpf_enforcement.dry_run to all be explicitly false.

BPF cgroup-deny enforcement

Phase 4 of the BPF Incident Response Roadmap. Optional in-kernel denial of outbound connections that match a Phase 3 detection (direct SMTP egress is the only gate landed today). Defaults are all-safe; operators flip live denial only after Phase 3 telemetry review.

What it does

When bpf_enforcement.enabled=true, bpf_enforcement.direct_smtp_egress=true, the connection tracker is running on BPF, and all dry-run layers are false:

  • The cgroup/connect4 + cgroup/connect6 BPF program inspects each outbound TCP connect.
  • If destination port is in the protected set AND the source UID is not in the safe-UID map AND the gated detector matches, the program returns 0 (kernel denies the connect).
  • Userspace observes the decision via the decision field on the ringbuf event and emits an audit-log entry.

When any dry-run layer is true (the default), the program emits the decision but always returns 1 (allow). Operators can run dry-run for as long as they need to gather telemetry before flipping to live denial.
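The per-connect decision can be modelled in userspace Python to show how the dry-run flag changes the return code but not the evaluation. The function name and parameters are hypothetical; the real logic runs in the BPF program, and the labels are borrowed from the decision values of csm_bpf_enforcement_decisions_total.

```python
def connect_decision(dst_port, uid, protected_ports, safe_uids, detector_match, dry_run):
    """Return (decision_label, return_code); return code 1 allows, 0 denies."""
    would_deny = (
        dst_port in protected_ports
        and uid not in safe_uids
        and detector_match
    )
    if not would_deny:
        return ("allow", 1)
    if dry_run:
        # Any dry-run layer being true: report the decision but let the connect proceed.
        return ("dry_run", 1)
    return ("deny", 0)
```

The same inputs produce ("dry_run", 1) or ("deny", 0) depending only on the flag, which is what makes dry-run telemetry a faithful preview of live denial.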

What it does NOT do

  • It does NOT wait on remote verdict callbacks in-kernel. That would add HTTP latency to every connect. The verdict callback (if enabled) runs in userspace after the BPF decision and enriches the emitted finding; it cannot undo a kernel denial.
  • It does NOT enforce on UDP, ICMP, or non-cgroup paths.
  • It does NOT replace any Phase 3 detection. Detections still run regardless; enforcement is a separate, layered control.

Configuration

bpf_enforcement:
  enabled: false              # master switch; default off
  dry_run: true               # safety default; flip after telemetry review
  direct_smtp_egress: false   # gate enforcement on the Phase 3 detector
  verdict_callback: false     # userspace post-decision callback

bpf_enforcement.enabled=true requires at least one feature gate. Today the only gate is direct_smtp_egress, which itself requires detection.direct_smtp_egress.enabled=true. The connection tracker backend must be auto or bpf, and the direct SMTP backend must be auto or bpf.
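These cross-key requirements lend themselves to a pre-deploy check. The Python sketch below validates an already-parsed config dictionary; the function name and error strings are hypothetical, and treating a missing backend key as auto is an assumption, not documented behaviour.

```python
def validate_bpf_enforcement(cfg):
    """Return a list of human-readable problems; an empty list means consistent."""
    errors = []
    enforcement = cfg.get("bpf_enforcement", {})
    detection = cfg.get("detection", {})
    if not enforcement.get("enabled", False):
        return errors  # nothing to check while the master switch is off
    if not enforcement.get("direct_smtp_egress", False):
        errors.append("bpf_enforcement.enabled requires at least one feature gate")
        return errors
    smtp = detection.get("direct_smtp_egress", {})
    if not smtp.get("enabled", False):
        errors.append("detection.direct_smtp_egress.enabled must be true")
    # Assumption: an unset backend behaves like "auto".
    if detection.get("connection_tracker_backend", "auto") not in ("auto", "bpf"):
        errors.append("detection.connection_tracker_backend must be auto or bpf")
    if smtp.get("backend", "auto") not in ("auto", "bpf"):
        errors.append("detection.direct_smtp_egress.backend must be auto or bpf")
    return errors
```

csm validate remains the authoritative check; a sketch like this is only useful for catching obvious mismatches in CI before a drop-in reaches a host.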

Kernel requirements

  • Linux >= 4.10 with CONFIG_CGROUP_BPF=y.
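  • cgroup/connect4 and cgroup/connect6 BPF program types. Upstream kernels gained these attach types in 4.17, so CONFIG_CGROUP_BPF on a 4.10-4.16 kernel is not sufficient on its own.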
  • The capability surface bpf_enforcement.available.v1 is the wire signal that the binary supports the feature; combined with bpf_enforcement_active on the health snapshot, operators can detect both feature presence and runtime state.

On older kernels or default builds without the BPF tag, detection.connection_tracker_backend: auto falls back to the legacy /proc/net/tcp[6] poller. In that state direct SMTP findings still work when detection.direct_smtp_egress.backend is auto or legacy, but BPF enforcement is inactive.

When CSM attempts BPF and cannot start it, it emits a bpf_unavailable finding. The message reports whether the daemon is running on a fallback backend or has no live fallback active.

Metrics

  • csm_bpf_enforcement_decisions_total{decision="allow|dry_run|deny"}
  • csm_bpf_enforcement_uid_map_refresh_total – successful periodic refreshes of the safe-UID BPF map.
  • csm_bpf_enforcement_uid_map_refresh_failures_total – failed refreshes (e.g. /etc/passwd unreadable).

Dry-run precedence

Three independent dry_run knobs interact:

  1. auto_response.dry_run (global): suppresses every automatic action (firewall block, kill, etc.).
  2. detection.direct_smtp_egress.dry_run: detector-scoped action knob.
  3. bpf_enforcement.dry_run: kernel-side denial knob.

Rule: any dry_run=true wins. Live denial requires all three to be false at the layer they apply, plus a BPF runtime backend. Defaults are dry_run=true everywhere on first install.

Rollout recipe

  1. Phase 3 detector enabled, no Phase 4 wiring. Watch csm_direct_smtp_egress_findings_total for a week.
  2. Phase 4 enabled with dry_run: true. Watch csm_bpf_enforcement_decisions_total{decision="dry_run"} and confirm dry-run denials track expected hosted-account egress.
  3. Phase 4 dry_run=false on a single canary host. Audit incidents for false positives.
  4. Roll out to fleet.

Critical Checks

Critical checks run every 10 minutes. Typical wall-clock cost on a busy shared host is a few seconds; the runner enforces the 10-minute cadence even when a tick takes longer.

Process & System

Check | Description
fake_kernel_threads | Non-root processes masquerading as kernel threads (rootkit indicator)
suspicious_processes | Reverse shells, interactive shells, GSocket, suspicious executables
php_processes | PHP process execution, working dirs, environment variables
shadow_changes | /etc/shadow modification outside maintenance windows
uid0_accounts | Unauthorized root (UID 0) accounts
kernel_modules | Kernel module loading (post-baseline)
af_alg_socket_use | AF_ALG socket use that may indicate Copy Fail exploit activity
af_alg_enforcement | AF_ALG hardening policy drift and correction status

SSH & Access

Check | Description
ssh_keys | Unauthorized entries in /root/.ssh/authorized_keys
sshd_config | SSH hardening (PermitRootLogin, PasswordAuthentication, etc.)
ssh_logins | SSH access anomalies with geolocation
api_tokens | cPanel/WHM API token usage
whm_access | WHM/root login patterns, multi-IP access
cpanel_logins | cPanel login anomalies, multi-IP correlation
cpanel_filemanager | File Manager usage for unauthorized access

Network

Check | Description
outbound_connections | Root-level outbound to non-infra IPs (C2, backdoor ports)
user_outbound | Per-user outbound connections (non-standard ports)
bad_asn_outbound | Outbound connection whose destination resolves (via GeoLite2-ASN) to a bad or unexpected autonomous system. Config detection.bad_asn_outbound: blocked_asns (always bad) and/or allowed_asns (allowlist mode – anything outside is bad). Classified for every process including root (the periodic connection scan); non-root connections are also flagged in real time by the live BPF tracker. Off by default; the third leg of the host_takeover incident chain
dns_connections | DNS exfiltration and suspicious queries
firewall | Firewall status and rule integrity
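The interaction between blocked_asns and allowed_asns in bad_asn_outbound can be shown with a short Python sketch. The function name is hypothetical, ASN lookup via GeoLite2-ASN is assumed to have happened already, and an empty allowed_asns is assumed to mean allowlist mode is off.

```python
def is_bad_asn(asn, blocked_asns=(), allowed_asns=()):
    """Classify a destination ASN against the block list and optional allowlist."""
    if asn in blocked_asns:
        return True  # always bad, regardless of the allowlist
    if allowed_asns and asn not in allowed_asns:
        return True  # allowlist mode: anything outside the list is bad
    return False
```

Checking the block list first means an ASN listed in both places is still treated as bad.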

Brute Force & Auth

Check | Description
wp_bruteforce | WordPress login brute force (wp-login.php, xmlrpc.php)
http_ua_spoof | IP claiming a search-engine bot UA (Googlebot, Bingbot, Applebot) that fails reverse-DNS verification, or exceeding the per-IP spoof threshold for scripting/headless/empty UAs when those opt-in flags are enabled
http_distributed_flood | Many already-abusive HTTP source IPs hitting the same vhost in one scheduled scan window
ftp_logins | FTP access patterns and failed auth
webmail_logins | Roundcube/Horde access anomalies
api_auth_failures | API authentication failure patterns

Email

Check | Description
mail_queue | Mail queue buildup (spam outbreak indicator)
mail_per_account | Per-account email volume spikes

Data & Integrity

Check | Description
crontabs | Suspicious cron jobs and scheduled commands
mysql_users | MySQL user accounts and privileges
database_dumps | Database exfiltration attempts
exfiltration_paste | Connections to pastebin/code-sharing sites

Threat Intelligence

Check | Description
ip_reputation | IPs against external threat databases and optional rspamd history
local_threat_score | Aggregated score from internal attack database
modsec_audit | ModSecurity audit log parsing

Performance

Check | Description
perf_load | CPU load average thresholds
perf_php_processes | PHP process count and memory
perf_memory | Swap usage and OOM killer activity

Health

Check | Description
health | Daemon health, binary integrity, required services

Platform Support

Runs on every supported platform unless noted below. The daemon auto-detects OS and panel at startup and silently skips cPanel-specific checks on plain Linux hosts (no “not found” spam).

cPanel-only (skipped on plain Ubuntu/AlmaLinux):

  • api_tokens, whm_access, cpanel_logins, cpanel_filemanager – read WHM API and cPanel session logs
  • wp_bruteforce – iterates /home/*/public_html/*/wp-login.php and per-domain access logs. The domlog pass ranks recent logs first and honors thresholds.domlog_max_files, thresholds.domlog_tail_lines, and thresholds.domlog_max_age_min.
  • webmail_logins – parses cPanel Roundcube/Horde logs
  • mail_queue, mail_per_account – read Exim queue and /var/log/exim_mainlog

Plain Linux equivalents that still provide coverage:

  • Access log brute-force detection (wp_login_bruteforce, xmlrpc_abuse) runs against the detected web server’s access log (/var/log/nginx/access.log or /var/log/httpd/access_log), so WordPress brute-force alerts still fire on non-cPanel hosts – they just rely on the live log watcher rather than per-domain domlog scanning.
  • modsec_audit runs on any host with ModSecurity installed.
  • ssh_logins, SSH brute force, PAM listener, firewall, kernel modules, RPM/DEB integrity, and threat intelligence all run on every supported platform.

Deep Checks

Deep checks run every 60 minutes and cover thorough filesystem, CMS, email, and database scans.

Filesystem

Check | Description
filesystem | Backdoors, hidden executables, suspicious SUID binaries
webshells | Known webshell patterns (c99, r57, b374k, etc.)
htaccess | .htaccess injection (auto_prepend_file, eval, base64 handlers) plus seven hardened per-pattern detectors – htaccess_php_in_uploads, htaccess_auto_prepend, htaccess_user_agent_cloak, htaccess_spam_redirect, htaccess_filesmatch_shield, htaccess_header_injection, htaccess_errordocument_hijack. Auto-cleaning gated by auto_response.clean_htaccess.
file_index | Indexed file listing to detect new/unauthorized files
php_content | Suspicious PHP functions (exec, eval, system, passthru)
group_writable_php | World/group-writable PHP files (privilege escalation)
symlink_attacks | Symlink-based privilege escalation attempts

WordPress

| Check | Description |
| --- | --- |
| wp_core | Core file integrity via official WordPress.org checksums |
| nulled_plugins | Cracked/nulled plugin detection |
| outdated_plugins | Plugins with known CVEs |
| db_content | Database injection, siteurl hijacking, rogue admins, spam. Multisite-aware: when wp-config.php declares define('MULTISITE', true), secondary blogs (wp_<N>_options / wp_<N>_posts for active blog IDs from wp_blogs) are scanned alongside the unprefixed main-site tables. |
| db_content_joomla | Joomla database content scanning. Discovers installs via configuration.php containing class JConfig, parses credentials from public $...; assignments. Scans <prefix>extensions params, <prefix>content article bodies, and joins <prefix>users with <prefix>user_usergroup_map for Super User detection (group_id=8). Findings: joomla_extensions_injection, joomla_content_injection, joomla_admin_injection. |
| db_content_drupal | Drupal 8+ database content scanning. Discovers installs via sites/default/settings.php plus the core/lib/Drupal.php marker. Credentials parsed from the $databases array. Scans config, node_revision__body, and users_field_data joined with user__roles (administrator role). Findings: drupal_settings_injection, drupal_content_injection, drupal_admin_injection. Drupal 7 not yet covered. |
| db_content_magento | Magento 1.x and 2.x database content scanning. Discovers installs via app/etc/env.php (M2, preferred) or app/etc/local.xml (M1). Credentials parsed via encoding/xml for M1 (CDATA-aware) or field-level regex for M2. Scans core_config_data, catalog_product_entity_text, cms_block, cms_page, and admin_user (with the configured db.prefix). Findings: magento_settings_injection, magento_content_injection, magento_admin_injection. |
| db_content_opencart | OpenCart database content scanning. Discovers installs via the config.php + admin/config.php pair both containing define('DB_DRIVER'. Credentials parsed from DB_HOSTNAME / DB_USERNAME / DB_PASSWORD / DB_DATABASE / DB_PREFIX defines. Scans <prefix>setting (config_url / config_ssl are canonical hijack targets), <prefix>product_description, <prefix>information_description, and <prefix>user (admin/staff). Findings: opencart_settings_injection, opencart_content_injection, opencart_admin_injection. |
| db_objects | MySQL persistence mechanisms: triggers, events, stored procedures, stored functions. Critical when the body matches known-malware patterns (sys_+exec, INTO OUTFILE, LOAD_FILE, etc.); Warning when an object exists at all (vanilla CMSes ship none). Toggle with detection.db_object_scanning; suppress Warnings via detection.db_object_allowlist. Manual drop via csm db-clean --drop-object. |
| admin_overlap | WordPress administrator email overlap across cPanel accounts. Reports when the same admin email appears on the configured number of accounts, with reviewed emails and domains suppressible in detection. |
| credential_reuse | WordPress administrator password-hash reuse across cPanel accounts. Groups identical hashes with an in-memory fingerprint and reports only the affected accounts and count. |
| supply_chain | Composer and npm lockfile advisory matching against the local advisory database. Silent when no advisory file is present. |

CMS Scanner Support Policy

New CMS scanner work targets upstream-supported major versions. EOL versions are best-effort when the existing scanner covers them through the same low-risk layout or schema. Adding a new EOL-only scanner needs operator fleet data and an explicit security reason.

Current scanner scope:

  • WordPress single-site and multisite.
  • Joomla installs using the common configuration.php / JConfig layout and standard content/user tables used by supported Joomla releases.
  • Drupal 8 and newer. Drupal 7 is not a planned support target.
  • Magento 1 and 2.
  • OpenCart installs using the standard storefront and admin config pair.

Phishing & Malware

| Check | Description |
| --- | --- |
| phishing | 8-layer phishing detection (kit directories, credential harvesting) |
| email_content | Outbound email body scanning for credentials and suspicious URLs |

System Integrity

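| Check | Description |
| --- | --- |
| rpm_integrity | System binary verification via rpm -V on the RHEL family, or debsums / dpkg --verify on the Debian family |
| open_basedir | open_basedir restriction validation |
| php_config_changes | php.ini modifications |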

DNS & SSL

| Check | Description |
| --- | --- |
| dns_zones | DNS zone file changes (MX record hijacking) |
| ssl_certs | SSL certificate issuance (subdomain takeover) |
| waf_status | WAF mode, staleness, bypass detection |

Email Security

| Check | Description |
| --- | --- |
| email_weak_password | Email accounts with weak passwords |
| email_forwarder_audit | Forwarders redirecting to external addresses |
| email_mail_filters | Exim mail filters that intercept mail (copy to an external address while keeping a local copy), forward externally, pipe to a command, or blackhole all mail |

Performance

| Check | Description |
| --- | --- |
| perf_php_handler | PHP handler configuration (DSO vs CGI vs FPM) |
| perf_mysql_config | MySQL my.cnf optimization |
| perf_redis_config | Redis configuration |
| perf_error_logs | Error log file growth (bloat) |
| perf_wp_config | WordPress wp-config.php settings |
| perf_wp_transients | WordPress database transient bloat |
| perf_wp_cron | WordPress cron scheduling (missed crons) |

Platform Support

The deep checks are the most cPanel-biased part of CSM because they iterate account home directories and per-user public_html trees. On plain Ubuntu/AlmaLinux the account-scan based checks do not run today:

cPanel-only (skipped on plain Linux):

  • htaccess, file_index, php_content, group_writable_php, symlink_attacks – iterate /home/*/public_html/**
  • wp_core, nulled_plugins, outdated_plugins, db_content – find WordPress installs under /home/*/public_html
  • supply_chain – scans composer.lock and package-lock.json under /home/* and /home/*/public_html
  • phishing, email_content – scan user home directories and Exim spool
  • dns_zones, ssl_certs – read cPanel’s DNS zone store and SSL installation records
  • email_weak_password, email_forwarder_audit – read /etc/valiases, Dovecot/Courier auth databases
  • email_mail_filters – read per-mailbox Exim filters under /home/*/etc/<domain>/<localpart>/filter and domain filters under /etc/vfilters
  • open_basedir, php_config_changes – read EA-PHP php.ini under /opt/cpanel/ea-php*/
  • perf_wp_config, perf_wp_transients, perf_wp_cron, perf_php_handler – WordPress and PHP handler introspection via cPanel’s EA-PHP layout

Runs on every platform:

  • filesystem, webshells – fanotify and file-tree scans over /home, /tmp, /dev/shm
  • rpm_integrity – dispatches to rpm -V on RHEL family or debsums / dpkg --verify on Debian family
  • waf_status – detects ModSecurity on Apache, Nginx, and LiteSpeed across all supported distros
  • perf_mysql_config, perf_redis_config, perf_error_logs – rely on standard service locations

Operators on plain Linux can opt a subset of the account-scan perf checks (perf_error_logs, perf_wp_config, perf_wp_transients) into scanning generic webroots by configuring the account_roots glob list in csm.yaml (see configuration.md). The remaining account-scan checks still assume the cPanel /home/*/public_html layout.

Auto-Response

When enabled, CSM automatically responds to detected threats. All actions are logged in the audit trail.

Actions

| Action | Description |
| --- | --- |
| Kill processes | Fake kernel threads, reverse shells, GSocket. Never kills root or system processes. |
| Quarantine files | Moves webshells, backdoors, phishing to /opt/csm/quarantine/ with full metadata (owner, permissions, mtime). Restorable from the web UI. |
| Block IPs | Adds attacker IPs to the nftables firewall with configurable expiry. Rate-limited by auto_response.max_blocks_per_hour (default 50/hour). |
| Clean malware | 7 strategies: @include removal, prepend/append stripping, inline eval removal, base64 chain decoding, chr/pack cleanup, hex injection removal, confirmed database cleanup. |
| Drop malicious DB objects | When clean_database is on, confirmed-malicious stored triggers/events/procedures/functions are dropped after a SHOW CREATE backup is recorded, so the drop is reversible. Detection runs regardless; the drop is gated on the operator opt-in. |
| PHP shield | Blocks PHP execution from uploads/tmp directories, detects webshell parameters. |
| PAM blocking | Instant IP block on brute force threshold breach. |
| Subnet blocking | Auto-blocks IPv4 /24 or IPv6 /64 when 3+ IPs from the same range attack. |
| Permblock escalation | Promotes temporary blocks to permanent after N repeated offenses. |
| Auto-freeze (PHP relay) | When the email PHP-relay detector fires (Path 1 / 2 / 4), runs exim -Mf against the message IDs the offending script is currently sending. Snapshots activeMsgs from the per-script window first, falls back to a spool walk if the snapshot was capped or if the finding is a late reputation event. Default dry-run; flip to live with csm phprelay dry-run off. Skips volume_account (per-cpuser, no scriptKey). Rate-limited to auto_response.php_relay.max_actions_per_minute (default 60). cPanel only. See PHP-relay CLI. |
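
The subnet-blocking rule in the table can be sketched as a grouping step: map each attacker IP to its covering IPv4 /24 or IPv6 /64 and flag ranges that reach the threshold. This is an illustrative Python sketch, not CSM's Go implementation; covering_range and ranges_to_block are hypothetical names, and the safety guards (infrastructure IPs, default route) are left out.

```python
import ipaddress
from collections import defaultdict

NETBLOCK_THRESHOLD = 3  # mirrors auto_response.netblock_threshold


def covering_range(ip: str) -> str:
    """Return the IPv4 /24 or IPv6 /64 that contains ip."""
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 64
    # strict=False lets us pass a host address and get the network back.
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))


def ranges_to_block(attacker_ips, threshold=NETBLOCK_THRESHOLD):
    """Return ranges with at least `threshold` distinct attacker IPs."""
    per_range = defaultdict(set)
    for ip in attacker_ips:
        per_range[covering_range(ip)].add(ip)
    return sorted(r for r, ips in per_range.items() if len(ips) >= threshold)
```

Counting distinct IPs rather than events means one noisy address cannot trigger a subnet block on its own.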

Configuration

auto_response:
  enabled: true
  kill_processes: true
  quarantine_files: true
  block_ips: true
  block_expiry: "24h"         # default temp block duration
  max_blocks_per_hour: 50     # per-IP blocks per hour; 0/omitted uses default
  netblock: true              # enable subnet blocking
  netblock_threshold: 3       # IPs from same IPv4 /24 or IPv6 /64 before subnet block
  permblock: true             # promote temp blocks to permanent
  permblock_count: 4          # temp blocks before promotion

  # SAFETY DEFAULT: dry_run defaults to TRUE when this key is absent.
  # In dry-run, BlockIP records the intended block to bbolt but does
  # NOT touch nftables. Manual operator commands (`csm firewall ...`)
  # bypass via BlockIPForce and always apply. Flip to false only after
  # verifying the policy in dry-run.
  dry_run: false

  # Advisory verdict callback. CSM POSTs each impending auto-block
  # to the panel before applying. The panel can downgrade to "allow"
  # (audit-only), attach `tenant_id` for downstream correlation, or
  # add a reason. CSM fails open on hook errors. Wire contract:
  # docs/verdict-callback-contract.md.
  verdict_callback:
    enabled: false
    url: ""                            # POST target
    hmac_secret: ""                    # signing secret, or use hmac_secret_env
    hmac_secret_env: ""
    allow_unsigned: false              # true only for staged unsigned rollouts
    require_response_signature: true   # reject unsigned callback replies
    timeout_sec: 2

  # PHP-relay auto-freeze (cPanel only). Off by default; opt in
  # explicitly. dry_run defaults to true even when freeze=true so an
  # operator who enables freeze without thinking gets a dry-run.
  php_relay:
    freeze: true                       # enable the exim -Mf hook
    dry_run: true                      # safe default; flip with `csm phprelay dry-run off`
    max_actions_per_minute: 60         # rolling 60s window cap on exim -Mf invocations
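
The max_actions_per_minute comment describes a rolling 60-second window. One common way to build such a cap is a sliding-window limiter like the Python sketch below; the class name and the choice to pass the clock in as an argument are assumptions made for illustration, not a description of CSM's internals.

```python
from collections import deque


class RollingWindowLimiter:
    """Allow at most max_actions within any window_sec-long window."""

    def __init__(self, max_actions: int, window_sec: float = 60.0):
        self.max_actions = max_actions
        self.window_sec = window_sec
        self._stamps = deque()  # timestamps of accepted actions, oldest first

    def allow(self, now: float) -> bool:
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_sec:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_actions:
            return False
        self._stamps.append(now)
        return True
```

Rejected calls are not recorded, so a burst that is refused does not extend the time until the next action is allowed.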

Dry-run safety default

auto_response.dry_run defaults to true when the key is absent. This is deliberate: an operator who turns on block_ips: true without thinking through policy gets recorded-but-not-applied blocks. The dry-run count surfaces in csm status --json and /api/v1/status so dashboards can verify the policy before flipping live. CSM clears those records when auto-response starts or reloads in live mode, and ages out records older than a week while dry-run remains enabled.

IP auto-blocking still requires firewall.enabled: true. The firewall engine owns both live nftables mutations and dry-run block records; with the firewall disabled there is no engine to call, so csm validate warns on auto_response.enabled: true plus block_ips: true.

Verify dry-run state explicitly:

csm status --json | jq '.severities, .blocklist_size'
csm firewall status   # "Recently Blocked" entries with timestamps after the restart confirm live mode

To go live: set dry_run: false, run csm rehash (twice, due to the circular hash), then restart or SIGHUP-reload (the field is hot-reload-safe).

Verdict callback (advisory)

When verdict_callback.enabled: true, every auto-block call POSTs a signed JSON request to the panel before mutating nftables. CSM refuses to start without hmac_secret or a non-empty hmac_secret_env value unless allow_unsigned: true is set for a staged unsigned rollout. Without that opt-in, an unsigned allow response is rejected and the default block continues. When a secret is configured, CSM also requires the panel to sign the response body unless require_response_signature: false is set for a staged rollout. With that opt-out, CSM still checks any echoed nonce or timestamp when a secret is configured; a legacy response that omits both keeps working. The panel can return {"verdict": "block"} (apply), {"verdict": "allow"} (audit-only; CSM logs the decision and skips nftables), or attach metadata (tenant_id, note). The callback runs after local validation and infra-IP safety checks, and before the dry-run gate, so panels can observe dry-run decisions too.

CSM fails open on hook errors (timeout, non-2xx, malformed body): the block continues as if the hook were disabled, or is recorded as dry-run when dry-run is active. The failure is written to the daemon log. Full request/response schema: docs/verdict-callback-contract.md.
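
The fail-open behaviour can be pictured as a function that returns "allow" only for a well-formed, trusted reply and falls back to the default block in every other case. The Python sketch below assumes a secret is configured and assumes HMAC-SHA256 with a hex digest; the real algorithm, header names, and nonce/timestamp checks are defined in docs/verdict-callback-contract.md and are not reproduced here. sign and resolve_verdict are hypothetical names.

```python
import hashlib
import hmac
import json


def sign(secret: bytes, body: bytes) -> str:
    # Assumed scheme: hex-encoded HMAC-SHA256 over the raw body.
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def resolve_verdict(status, body, signature, secret,
                    require_response_signature=True):
    """Return 'allow' only for a valid allow reply; otherwise 'block'."""
    if status is None or not (200 <= status < 300):
        return "block"  # timeout or non-2xx: the default block continues
    if require_response_signature:
        expected = sign(secret, body).encode()
        if signature is None or not hmac.compare_digest(expected, signature.encode()):
            return "block"  # unsigned or mis-signed reply is rejected
    try:
        verdict = json.loads(body).get("verdict")
    except (ValueError, AttributeError):
        return "block"  # malformed body
    return "allow" if verdict == "allow" else "block"
```

Here "fail open" means the hook can never prevent a block: any error leaves the outcome exactly as it would be with the hook disabled.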

Infrastructure IP DNS guard

Hostnames listed in top-level infra_ips or firewall.infra_ips are resolved every 5 minutes and their current addresses feed the infra auto-block guard. If a hostname stops resolving, the daemon emits an infra_ips_unresolvable Warning finding and keeps the last known addresses protected during the grace period (default 10 min). The finding auto-clears when resolution recovers.
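
A rough model of the guard, as a Python sketch: each refresh either records fresh addresses or, on failure, keeps serving the last known set while the grace period lasts. InfraGuard and its injected resolve callable are hypothetical. The source does not say what happens to the protected set once the grace period expires; this sketch assumes it is dropped.

```python
class InfraGuard:
    """Track protected addresses for infra hostnames across DNS failures."""

    def __init__(self, resolve, grace_sec=600):
        self._resolve = resolve      # callable: hostname -> iterable of IPs
        self._grace_sec = grace_sec  # default 10 min, as in the text
        self._last = {}              # hostname -> (addresses, last success)

    def refresh(self, hostname, now):
        """Return (protected_addresses, unresolvable)."""
        try:
            addrs = set(self._resolve(hostname))
        except OSError:              # socket.gaierror is an OSError
            addrs = set()
        if addrs:
            self._last[hostname] = (addrs, now)
            return addrs, False      # finding clears on recovery
        cached = self._last.get(hostname)
        if cached and now - cached[1] <= self._grace_sec:
            return cached[0], True   # warn, but keep protecting
        return set(), True           # assumption: protection lapses
```

In production the resolver could wrap socket.getaddrinfo; injecting it keeps the sketch testable without network access.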

Findings that always trigger IP block

When auto_response.block_ips: true and the firewall is enabled, the source IP is blocked for every finding in this list. The dry-run gate still applies if dry_run: true.

| Finding | Description |
| --- | --- |
| wp_login_bruteforce | WordPress login flood via wp-login.php |
| xmlrpc_abuse | XML-RPC endpoint flood |
| http_request_flood | Per-IP HTTP request volume exceeds threshold (disabled by default; enable by setting thresholds.http_flood_threshold > 0) |
| http_ua_spoof | IP spoofing a search-engine bot UA or exceeding the UA anomaly threshold (periodic; see configuration.md for opt-in flags) |
| ftp_bruteforce | FTP authentication flood |
| smtp_bruteforce | SMTP authentication flood |
| smtp_probe_abuse | Raw SMTP connect-rate flood before AUTH |
| mail_bruteforce | IMAP/POP3/ManageSieve authentication flood |
| mail_account_compromised | Successful login from an IP that just failed auth on the same mailbox |
| admin_panel_bruteforce | phpMyAdmin or Joomla admin POST flood |
| ssh_login_unknown_ip | SSH login from an IP with no prior history |
| ssh_login_realtime | SSH login anomaly detected by realtime watcher |
| c2_connection | Outbound connection to a known C2 server |
| ip_reputation | IP flagged by AbuseIPDB / rspamd / upstream threat-intel |
| local_threat_score | IP crosses the aggregated internal attack-history threshold |
| modsec_block_escalation | ModSecurity deny escalation |
| waf_attack_blocked | WAF high-volume attacker |
| email_compromised_account | Email account compromise indicator |
| email_cloud_relay_abuse | Cloud relay abuse |

Distributed HTTP flood rollups do not trigger a direct IP block because they describe one targeted vhost, not one source IP. The per-IP findings that feed the rollup still drive normal block decisions.

Safety Guards

  • Never kills root processes, system daemons, or cPanel services
  • Infrastructure IPs (infra_ips in config) are never blocked
  • Subnet blocks refuse the default route and any range that covers infrastructure, local host, allowed, or port-specific allowed IPs
  • Quarantined files preserve full metadata for restoration
  • Auto-quarantine requires high confidence: category match (webshell/backdoor/dropper) + entropy >= 4.8 or hex density > 20%. This prevents legitimate WordPress plugins from being quarantined.
  • IP block rate limited by auto_response.max_blocks_per_hour (default 50/hour) to prevent runaway blocking
  • CRITICAL alerts always bypass the email rate limit (default 30/hour)
  • Trusted countries (trusted_countries) suppress login alerts from expected geolocations
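
The auto-quarantine confidence gate combines a category match with a statistical signal. The Python sketch below shows one way to compute those signals. Shannon entropy over bytes is standard; the hex-density definition (share of the file taken up by \xNN escapes) and the reading of the rule as "category AND (entropy OR hex density)" are assumptions, since the text does not spell them out.

```python
import math
import re
from collections import Counter

HEX_ESCAPE = re.compile(rb"\\x[0-9a-fA-F]{2}")


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte (0.0 for empty or uniform input)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(data).values())


def hex_density(data: bytes) -> float:
    """Assumed definition: fraction of bytes inside \\xNN escapes."""
    if not data:
        return 0.0
    return 4 * len(HEX_ESCAPE.findall(data)) / len(data)


def high_confidence(category: str, data: bytes) -> bool:
    if category not in {"webshell", "backdoor", "dropper"}:
        return False
    return shannon_entropy(data) >= 4.8 or hex_density(data) > 0.20
```

Requiring both a category match and an obfuscation signal is what keeps ordinary plugin code, which rarely shows high entropy or dense hex escapes, out of quarantine.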

What CSM Detects in Real-Time

Beyond standard malware patterns, CSM detects advanced evasion techniques:

  • Fragmented function names: attackers split base64_decode across variables ($a="base"; $b="64_decode") to evade simple string matching
  • Appended payloads: malicious code added to the end of large legitimate files, beyond typical scan windows. Realtime PHP checks scan the first and last 32KB, and periodic PHP content analysis scans a larger head window plus the tail.
  • Non-PHP backdoors: Perl, Python, Bash CGI scripts in web directories (detects toolkits like LEVIATHAN)
  • SEO spam injection: gambling/togel dofollow link injection into theme files
  • WordPress brute force: real-time access log monitoring for wp-login.php and xmlrpc.php floods (blocks within seconds, not the 10-minute periodic scan)
  • Admin-panel brute force: same access-log path, tracks POSTs to /phpmyadmin/index.php, /pma/index.php, /phpMyAdmin/index.php, and Joomla /administrator/index.php. Emits admin_panel_bruteforce and auto-blocks the IP. Path matcher is intentionally tight to avoid false positives on shared hosting; Drupal and Tomcat Manager use different attack shapes and need separate detectors.
  • SMTP brute force and probes: tails /var/log/exim_mainlog on cPanel and non-cPanel Exim hosts where the file exists. Emits smtp_probe_abuse and smtp_bruteforce (per-IP, auto-blocks), smtp_subnet_spray (per-/24, auto-blocks the whole subnet), and smtp_account_spray (per-mailbox, visibility only).
  • Mail brute force: tails /var/log/maillog for direct IMAP, POP3, and ManageSieve auth failures. Composes with the existing geo-login monitor so email_suspicious_geo keeps working. Emits mail_bruteforce, mail_subnet_spray, mail_account_spray, and mail_account_compromised (the last one fires when a successful login arrives from an IP that just failed auth against the same mailbox; auto-blocks with no false positives by construction).

Dry-run precedence (Phase 4)

CSM has three independent dry_run knobs after Phase 4. Any dry_run that is true wins; live actions require all applicable knobs to be false.

| Layer | Knob | Default | Effect when true |
| --- | --- | --- | --- |
| Global | auto_response.dry_run | true | Suppress all automatic actions |
| Detector | detection.direct_smtp_egress.dry_run | true | Suppress detector-scoped action |
| Kernel | bpf_enforcement.dry_run | true | BPF program emits decision but allows traffic |

The kernel knob is consulted by the BPF program itself; the others gate userspace action paths. All three default to true on a first install so a configuration mistake cannot start blocking traffic.
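
The precedence rule reduces to a logical OR over the applicable knobs, with a missing key treated as true. The Python sketch below evaluates it against a parsed csm.yaml represented as nested dicts. knob_value and is_live are hypothetical helpers, and treating an absent key as true for all three knobs follows the defaults listed above rather than a confirmed parser rule.

```python
KNOBS = (
    ("auto_response", "dry_run"),
    ("detection", "direct_smtp_egress", "dry_run"),
    ("bpf_enforcement", "dry_run"),
)


def knob_value(config: dict, path) -> bool:
    """Read a dry_run knob; an absent key falls back to the safe default."""
    node = config
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return True
        node = node[key]
    return bool(node)


def is_live(config: dict, applicable=KNOBS) -> bool:
    """Live actions require every applicable knob to be false."""
    return not any(knob_value(config, path) for path in applicable)
```

For example, a config that sets only auto_response.dry_run: false is still not live for a path that also consults the detector or kernel knob.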

Incidents

CSM groups related findings into Incident objects so operators see one escalating story per account, mailbox, or process instead of a stream of unrelated findings. Original findings are not mutated or suppressed – the Incident is layered on top.

Lifecycle

| Status | Meaning |
| --- | --- |
| open | Active. New findings for the same correlation key keep merging in. |
| contained | Operator marked under control. Findings still merge in window. |
| resolved | Closed. Future findings start a new incident. |
| dismissed | False positive. Future findings start a new incident. |

Resolved and dismissed incidents are pruned 30 days after their last update. Open and contained incidents are never auto-pruned by the retention loop, but they may be auto-resolved by the per-kind idle threshold described under “Auto-close” below.

Auto-close

To stop the open-incident backlog from growing without bound on busy hosts, the daemon scans Open / Contained incidents shortly after startup and then once an hour, auto-resolving any whose updated_at exceeds the per-kind idle threshold. A live sweep closes at most 1000 stale incidents at a time; if more stale incidents remain, follow-up sweeps run every 30 seconds until the backlog drains. Dry-run sweeps still scan the full set so the counters show every would-close decision. Auto-resolved incidents carry closed_by: "auto:stale" and an incident_auto_closed action in their timeline so reporting can distinguish them from operator closes.

Defaults (configurable in csm.yaml):

incidents:
  auto_close:
    enabled: true            # set false to disable
    dry_run: false           # set true to log decisions without writing back
    by_kind:
      mailbox_takeover: 24h
      credential_spray: 24h
      web_account_compromise: 168h

Kinds absent from by_kind are never auto-closed. The default map omits host_integrity_risk, host_takeover, and post_exploit_process because those host-level incidents should stay open until an operator reviews them. host_takeover is the compound escalation raised when any two of three host-takeover legs (a new uid-0 account, a planted suid binary, an outbound connection to a bad ASN) are correlated for the same host inside the merge window.
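
The selection step of an auto-close sweep can be sketched as a filter over open and contained incidents. The Python below is illustrative only: the incident dict shape (id, kind, status, updated_at) and the function name are hypothetical, while the thresholds and the 1000-per-sweep cap come from the text above.

```python
from datetime import datetime, timedelta

DEFAULT_BY_KIND = {
    "mailbox_takeover": timedelta(hours=24),
    "credential_spray": timedelta(hours=24),
    "web_account_compromise": timedelta(hours=168),
}
MAX_PER_SWEEP = 1000


def stale_incidents(incidents, now, by_kind=DEFAULT_BY_KIND,
                    limit=MAX_PER_SWEEP):
    """Return (ids to auto-resolve this sweep, whether more remain)."""
    stale = []
    for inc in incidents:
        if inc["status"] not in ("open", "contained"):
            continue
        threshold = by_kind.get(inc["kind"])
        if threshold is None:
            continue  # kinds absent from by_kind are never auto-closed
        if now - inc["updated_at"] > threshold:
            stale.append(inc["id"])
    return stale[:limit], len(stale) > limit
```

The second return value corresponds to the condition that schedules a follow-up sweep when the backlog exceeds the cap.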

If a fresh finding for the same correlation key arrives after the auto-close, the merge-window stale-binding logic creates a new open incident – nothing about auto-close blocks re-detection. History is preserved on the closed record.

Tuning on high-volume hosts. Each by_kind threshold is the idle time before a kind auto-resolves; they are independent and operator-set. A host under sustained brute-force keeps a large open set mostly from the longer-lived kinds (web_account_compromise defaults to 168h). If the open-incident count is higher than you want to triage, shorten the relevant by_kind entry (e.g. web_account_compromise: 72h) rather than disabling auto-close. The closed records are retained 30 days regardless, measured from when the incident resolves, so shortening the threshold also moves the eventual prune point earlier relative to the last finding. Auto-close still keeps a resolved record for follow-up instead of deleting history at close time.

Metrics: csm_incidents_auto_closed_total and csm_incidents_auto_close_dry_run_total.

Credential-spray suppression

Without this path, an attacker IP that brute-forces 6500 distinct usernames produces 6500 mailbox_takeover incidents because the correlator keys on the mailbox, not the source IP. The spray-suppression detector tracks the distinct-mailbox set per source IP across the merge window and, once an IP exceeds distinct_mailboxes, opens a single credential_spray super-incident keyed on the IP. Subsequent findings from that IP attach to the spray incident’s timeline instead of opening per-mailbox incidents.

Defaults (configurable in csm.yaml):

incidents:
  spray_suppression:
    enabled: false           # default OFF; opt-in
    dry_run: true            # default ON; counters move, routing unchanged
    distinct_mailboxes: 10   # threshold to trip
    severity_escalate_at: 50 # bump severity to CRITICAL at this many
    per_check:
      - email_auth_failure_realtime
      - pam_auth_failure
      - ssh_bruteforce
    max_tracked_ips: 10000
    block_at_severity: ""    # "" detection-only, "high" block on open,
                             # "critical" block on escalation

Setting block_at_severity hands the source IP to the firewall as soon as the spray detector trips at the chosen tier, once spray_suppression.dry_run is false. The detector also requires auto_response.enabled and auto_response.block_ips; the firewall still honors auto_response.dry_run, so a dry-run host logs the would-be block without applying nftables rules. Live accepted requests are recorded on the incident timeline as a credential_spray_block_requested action. Non-live outcomes (dry-run, verdict-allow, already blocked) and failed attempts do not latch the incident, so a later finding can retry after blocking is live again. Concurrent findings for the same incident share one in-flight firewall call, and resolved or dismissed spray incidents do not make new block decisions.

Whitelisted IPs (entries in reputation.whitelist and the live bbolt whitelist updated via the Web UI) are skipped from spray detection so internal mail relays, NAT egresses, and known-good infrastructure never produce a spray incident.
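
The core of the detector is a per-IP set of distinct mailboxes compared against two thresholds. The Python sketch below models only that part. SprayTracker is a hypothetical name, merge-window expiry and max_tracked_ips eviction are omitted, and reporting "high" at the trip point is inferred from the block_at_severity comments rather than stated outright.

```python
from collections import defaultdict


class SprayTracker:
    def __init__(self, distinct_mailboxes=10, severity_escalate_at=50,
                 whitelist=()):
        self.distinct_mailboxes = distinct_mailboxes
        self.severity_escalate_at = severity_escalate_at
        self.whitelist = set(whitelist)
        self._seen = defaultdict(set)  # source IP -> distinct mailboxes

    def observe(self, ip: str, mailbox: str):
        """Return None (below threshold), 'high', or 'critical'."""
        if ip in self.whitelist:
            return None  # whitelisted sources never produce a spray incident
        seen = self._seen[ip]
        seen.add(mailbox)  # repeated failures on one mailbox count once
        if len(seen) >= self.severity_escalate_at:
            return "critical"
        if len(seen) > self.distinct_mailboxes:
            return "high"
        return None
```

Keying the set on the source IP is what collapses thousands of per-mailbox findings into a single super-incident.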

Choosing block_at_severity:

  • "" (default) – detection-only. Spray incidents open, no firewall hand-off. Use during dry-run validation and on hosts where blocking is owned by a separate system.
  • high – block at the distinct_mailboxes trip. Recommended once the dry-run counter looks clean. Trips on the first sustained burst before the source IP goes idle for longer than the merge window.
  • critical – block only after severity escalates, i.e. one IP hits severity_escalate_at distinct mailboxes before the source IP is idle for more than the merge window. A low-and-slow attacker that stays below that count before each idle reset never escalates and never blocks. Pick this only when you have strong shared-NAT exposure and accept that slow sprayers evade the gate.

Rollout:

  1. Ship the daemon with enabled: false, dry_run: true. The detector tracks per-IP mailbox sets and increments csm_credential_spray_dry_run_total whenever the threshold would have tripped, but routing stays on the legacy per-mailbox path.
  2. Validate the counter on your own infrastructure for 24h. If a trusted IP shows up in the dry-run trips, add it to reputation.whitelist.
  3. Flip enabled: true, dry_run: false. New attacker IPs route through the spray path; existing per-mailbox backlog drains via the auto-close path.
  4. After another 24h, set block_at_severity: high. The firewall hand-off runs on every spray decision (open + merge), so an incident opened before the flag was armed still blocks on the next finding from the same IP.

Metrics: csm_credential_spray_opened_total, csm_credential_spray_suppressed_mailbox_takeover_total, csm_credential_spray_dry_run_total, csm_credential_spray_tracked_ips.

Incident auto-block

spray_suppression only handles the credential_spray super-incident kind. Low-and-slow scanners that never trip a per-detector window (modsec escalation, mail brute-force, smtp probe) still produce mailbox_takeover or web_account_compromise incidents but never get firewalled. The incidents.auto_block block adds a generic incident-driven firewall hand-off:

incidents:
  auto_block:
    enabled: false           # default OFF; opt-in
    block_at_severity: ""    # "" / "high" / "critical"
    kinds: []                # empty = any non-spray kind with one source IP

When the gate trips, the correlator hands the source IP to the firewall through the same dry-run / block_ips gate as the spray path. A live accepted request records incident_block_requested; non-live outcomes (dry-run, verdict-allow, already blocked) do not latch the incident, so an operator who arms auto_block AFTER an incident has already crossed the gate still gets a block on the next finding while the incident is open or contained. Incidents with multiple source IPs are left for manual review. If a long-running incident’s timeline was truncated and the source IP is not part of the incident key, auto-block also stays off because the remaining visible timeline may not contain every source IP.

credential_spray is explicitly excluded from this path; the dedicated spray hand-off owns it. Set kinds to narrow the surface (e.g. only web_account_compromise) if you do not want every CRITICAL mailbox_takeover incident to block its source IP.

This pairs naturally with the ModSecurity escalation thresholds (thresholds.modsec_escalation_hits, thresholds.modsec_escalation_window_min) – raising the window from the shipped default of 10 minutes to e.g. 4 hours lets the modsec detector promote paced scanners to a Critical escalation finding, which then trips the generic auto_block gate.

Kinds

  • web_account_compromise – default for findings attributable to a hosted account or script (PHP relay, webshell, login bruteforce, etc.).
  • mailbox_takeover – SMTP/SASL, suspicious-login, credential-abuse, and rate signals tied to a mailbox or cPanel-local mail account.
  • post_exploit_process – process exec from /tmp, /var/tmp, /dev/shm.
  • host_integrity_risk – daemon/kernel-level signals (sensitive file writes, fake kernel threads, auditd disabled).
  • host_takeover – any two of a new uid-0 account, a planted suid binary, and an outbound connection to a bad ASN, seen for the same host inside the merge window.
  • credential_spray – one source IP brute-forcing many distinct mailboxes/accounts inside the merge window. Keyed on the source IP rather than per-mailbox, so a scanner spraying thousands of usernames produces one super-incident instead of thousands of mailbox_takeover rows. Findings from the same IP after the trip attach to this incident’s timeline. See “Credential-spray suppression” above.

Severity policy

Severity escalates only. Each incident keeps the highest severity any joined finding has carried. Findings themselves are never re-emitted at a higher severity. The audit trail records an incident_severity_changed action when an incident’s severity bumps.

Correlation window

15 minutes by default. Findings outside the window for the same key start a new incident. The window is a named constant in code and is not yet exposed via config.

Open threshold

Non-Critical findings need at least two correlated sightings inside the merge window before an incident opens. The first sighting is held in a pending bucket and counted toward the threshold; the second promotes both into a new incident with a two-event timeline. Stale pending entries are pruned by the daily retention sweep.

Critical-severity findings (account compromise, cloud-relay abuse, modsec rule escalations) bypass the threshold and open immediately so escalations still page on first hit.

The threshold suppresses one-shot scanner noise (a single modsec deny from a wandering scanner, an isolated mistyped password) without hiding sustained activity. The current pending-bucket size is exposed as the csm_incidents_pending gauge.
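
The gate can be modelled as a small pending map keyed by correlation key. This Python sketch is a simplification under stated assumptions: OpenThreshold and submit are hypothetical names, the 900-second window mirrors the 15-minute correlation window, and merging into already-open incidents and the retention sweep are out of scope.

```python
class OpenThreshold:
    def __init__(self, window_sec=900):
        self.window_sec = window_sec
        self._pending = {}  # correlation key -> (finding, seen_at)

    def submit(self, key, finding, severity, now):
        """Return the findings that open an incident, or [] if held."""
        if severity == "critical":
            return [finding]  # Critical bypasses the threshold
        held = self._pending.get(key)
        if held and now - held[1] <= self.window_sec:
            del self._pending[key]
            return [held[0], finding]  # two-event timeline
        # First sighting, or the earlier one fell outside the window.
        self._pending[key] = (finding, now)
        return []

    def pending_count(self):
        return len(self._pending)  # analogous to csm_incidents_pending
```

A single stray finding therefore never opens an incident by itself, while a second sighting inside the window promotes both.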

The stored incident includes the full correlation key, including process PID/UID and remote IP when those are the only available dimensions, so active incidents keep merging after daemon restart.

API

  • GET /api/v1/incidents – list, newest first. Without query parameters the response is a bare JSON array (compat with the existing wire shape phpanel/SIEM consumers decode against). When ?limit=, ?offset=, or ?status= is present the response switches to an envelope: {"items":[...], "total":N, "offset":N, "limit":N, "status":"..."}. Status accepts the four spec values plus active (open + contained, the default web UI filter). Limit is capped server-side at a safe maximum.
  • GET /api/v1/incidents/<id> – one incident.
  • POST /api/v1/incidents/<id>/status – transition status.
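
Because the list endpoint returns either a bare array or an envelope depending on the query string, clients may want to normalize both shapes. A minimal Python sketch, assuming the envelope keys shown above; parse_incident_list is a hypothetical helper and the id field in the usage example is illustrative.

```python
import json


def parse_incident_list(payload: str):
    """Return (items, total); total is None for the bare-array shape."""
    data = json.loads(payload)
    if isinstance(data, list):
        return data, None  # no query parameters: legacy wire shape
    return data["items"], data["total"]
```

For example, parse_incident_list('[{"id": "a"}]') yields the items with total None, while an envelope response also yields the server-reported total for pagination.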

See api.md for endpoint detail.

Web UI

Open Monitor -> Incidents. The page has three tabs:

  • Correlated – the default flat list of incidents with status filter, page size, and detail panel. The detail panel shows the current firewall block state for the incident’s source IP (permanent, temporary, cphulk, or not blocked) when an IP is known.
  • Grouped – rolls up incidents by (kind, source) so a credential spray that produced thousands of mailbox_takeover incidents shows as one row per attacker IP. Pageable with the same page-size selector as Correlated. Click a group to see member incidents in the detail panel, which also surfaces the source IP’s firewall block state; clicking a sample id jumps back to the Correlated tab focused on that incident.
  • Timeline Search – the older IP/account history search across the audit log.

Admin tokens can transition incident status (open / contained / resolved / dismissed); read-scope tokens can browse all three tabs.

Control socket

csm incidents list [--status all|active|open|contained|resolved|dismissed] [--limit N] [--offset N] [--all]
csm incidents show <id>
csm incidents status <id> <open|contained|resolved|dismissed> [details]
csm incidents bulk-status --older-than 24h [--last-seen-before RFC3339] [--status active|open|contained] [--kind K] [--domain D] [--account A] [--mailbox M] [--limit N] [--to resolved|dismissed] [--apply --confirm]

csm incidents list returns the first 100 incidents by default. Use --offset for the next page, --status active for open + contained incidents, or --all for an explicit full dump.

csm incidents bulk-status defaults to dry-run. It prints the total match count and a bounded preview of the incidents that would change. At least one age guard is required: --older-than, --last-seen-before, or both. To mutate incidents, pass both --apply and --confirm.

Metrics

  • csm_incidents_open – gauge of currently open + contained incidents.
  • csm_incidents_created_total
  • csm_incidents_severity_changed_total
  • csm_incidents_status_changed_total
  • csm_incidents_findings_merged_total
  • csm_incidents_compacted_total
  • csm_incidents_pending – gauge of findings held in the threshold gate, awaiting a second correlated sighting.

Incident Response Runbook

Use this flow when CSM flags account compromise, mailbox takeover, malicious database triggers, or outbound spam on a production cPanel host.

Safety rules

  • Do not delete customer files during first response.
  • Do not thaw, release, or purge queued mail until affected credentials are rotated or an operator approves the specific queue action.
  • Do not close incidents until the account has been reviewed, credentials have been rotated or explicitly deferred, and a fresh scan is clean.
  • Take a CSM backup before upgrading CSM or changing incident state.

1. Verify the deployed binary

Deploy only after the required GitLab pipeline has passed and the package has been published.

/root/deploy-csm.sh check
/root/deploy-csm.sh upgrade
/opt/csm/csm version
/opt/csm/csm doctor --json

2. Take a backup

mkdir -p /root/csm-backups
/opt/csm/csm backup /root/csm-backups/csm-pre-response-$(date +%Y%m%d%H%M%S).tar.gz
sha256sum /root/csm-backups/csm-pre-response-*.tar.gz

Confirm the archive is readable:

gzip -t /root/csm-backups/csm-pre-response-*.tar.gz
tar -tzf /root/csm-backups/csm-pre-response-*.tar.gz | sed -n '1,80p'

3. Preserve evidence

mkdir -p /root/csm-forensics
/opt/csm/csm forensic-snapshot <account> --out /root/csm-forensics/<account>-$(date +%Y%m%d%H%M%S).tar.gz
sha256sum -c /root/csm-forensics/<account>-*.sha256
tar -xOzf /root/csm-forensics/<account>-*.tar.gz manifest.txt

Check the manifest for private-path exclusions, schema count, capture errors, and recent_mtimes_status=ok.

4. Map affected accounts

Map incident domains and queued local senders to cPanel users before rotating credentials or changing mail queue state.

/opt/csm/csm incidents list --status open --all
exim -bpc
exim -bp | exiqsumm
grep -E '^example\.com:' /etc/trueuserdomains /etc/userdomains
whmapi1 listaccts searchtype=user search=<account> --output=json

Use native cPanel APIs for inventory:

uapi --user=<account> Email list_pops --output=json
uapi --user=<account> Ftp list_ftp --output=json
uapi --user=<account> Mysql list_users --output=json

5. Rotate credentials

Rotate the cPanel account password, FTP accounts, affected mailboxes, WordPress administrator users, database users, and application secrets for the affected account. Prefer WHM and UAPI calls or the control panel over direct file edits.

Do this before releasing mail or marking incidents resolved unless the operator explicitly defers rotation for a documented reason.

6. Review queued mail

Start with read-only summaries:

exim -bpc
exim -bp | exiqsumm
exim -bp

Review headers before any queue action:

exim -Mvh <message-id>

Group messages into:

  • safe to remove: frozen bounces, obvious backscatter, duplicate failed delivery notices with no customer value
  • do not touch: current customer conversations, invoices, form leads, or any message where the business value is unclear
  • needs review: suspicious local sender messages, mixed external bulk mail, or messages tied to an account whose credentials are not rotated

Only remove or thaw message IDs that were reviewed:

exim -Mrm <message-id>
exim -Mt <message-id>

7. Review stale incidents

Preview first:

/opt/csm/csm incidents bulk-status --older-than 72h --status active --kind web_account_compromise --limit 20
/opt/csm/csm incidents bulk-status --older-than 24h --status active --kind mailbox_takeover --limit 20

Apply in bounded batches only after review:

/opt/csm/csm incidents bulk-status --older-than 72h --status active --kind web_account_compromise --limit 100 --apply --confirm --details "operator cleanup after review"

For mailbox incidents, confirm mailbox rotation or explicit operator deferral before applying status changes.

8. Confirm recovery

/opt/csm/csm status --json
/opt/csm/csm doctor --json
exim -bpc

Keep the forensic archives, CSM backup, command notes, and queue decisions with the incident record.

CVE Mitigations

CSM treats CVEs as a three-layer problem:

  1. Operator-driven hardening via csm harden ... - applies the right preventive control for the host (modprobe blacklist, seccomp drop-ins, sysctl tweaks).
  2. Continuous enforcement by the daemon - re-asserts the control on every startup and as a periodic check, so a package upgrade or manual modprobe does not silently undo the mitigation.
  3. Live detection - auditd/BPF listeners flag exploit signatures the moment they fire, even on hosts where the kernel itself cannot be patched.

The hardening audit detects what the host actually has (kernel build, KernelCare livepatches, seccomp coverage) and refuses to claim protection it cannot deliver. Run csm harden with no arguments for the full list of available mitigations on the current host.

Active mitigations

CVE-2026-31431 - “Copy Fail” (Linux kernel AF_ALG)

Two operator paths depending on whether AF_ALG is loadable on the kernel:

  • csm harden --copy-fail - writes /etc/modprobe.d/csm-copy-fail-mitigation.conf blacklisting algif_aead and af_alg, then unloads them. Refuses on kernels where AF_ALG is built in (typical cPanel / CloudLinux 8), since the blacklist would have no effect there.
  • csm harden --copy-fail-seccomp - writes systemd RestrictAddressFamilies=~AF_ALG drop-ins for the units that spawn untrusted code: LiteSpeed, Apache/Nginx, every PHP-FPM pool, cron, mail. The right path on built-in-AF_ALG kernels.

The audit recognises KernelCare CVE-2026-31431 livepatches via kcarectl --patch-info and reports pass only when the host is genuinely mitigated (module blacklisted, seccomp drop-ins applied, or KernelCare livepatch active).

Live detection and BPF blocking

BPF-tagged builds (make BPF=1 or go build -tags bpf) load an LSM socket_create program on kernels with BPF LSM and ringbuf support. That program refuses socket(AF_ALG, ...) from non-root UIDs before the AF_ALG socket is allocated, returns EPERM, emits a ringbuf event, and feeds the existing Critical af_alg_socket_use finding path. UID 0 keeps AF_ALG access for system crypto users. There are no UID-range or alert-only BPF tunables today; use detection.af_alg_backend to select auditd or none if a host needs to avoid kernel-side refusal.

Default builds, BPF-tagged builds on unsupported kernels, and hosts forced to detection.af_alg_backend: auditd keep the audit-log listener. The audit path catches non-system AF_ALG socket attempts at Critical within about 500 ms but cannot stop the syscall before it reaches the kernel. Hosts that are not exploitable skip the live listener.

If CSM attempts the AF_ALG BPF path and cannot start it, it emits a bpf_unavailable finding. The finding says whether the audit fallback is active or no live fallback is available.

Auto-response

  • auto_response.copy_fail_kill_process: true - SIGKILL the offending process when the live listener fires. Default off (alert-only).
  • auto_response.disable_enforce_af_alg: true - suspend the periodic re-assertion of the module blacklist without removing the hardening marker. For triage only.

The daemon self-heals its auditd rule file on startup if it has drifted from the embedded copy, closing the upgrade gap where a new binary shipped without re-running postinstall would leave detection silently inactive.

Configuration knob

  • detection.af_alg_backend - auto (default) | bpf | auditd | none. auditd is the kill switch for a misbehaving BPF-tagged release. bpf is strict mode (no fallback). none disables the live listener entirely.

The csm_af_alg_backend{kind="bpf-lsm"|"auditd-tail"|"none"} Prometheus gauge surfaces which backend the coordinator selected at startup, so dashboards can see the active path without parsing logs.

BPF validation

On a BPF LSM host with a BPF-tagged CSM build:

  1. Set detection.af_alg_backend: bpf for strict validation, or leave it as auto and confirm BPF was selected.

  2. Start the daemon and check metrics:

    curl -k -H "Authorization: Bearer $CSM_TOKEN" https://127.0.0.1:9443/metrics \
      | grep -E 'csm_af_alg_backend|csm_bpf_backend'
    

    Expected selected series:

    csm_af_alg_backend{kind="bpf-lsm"} 1
    csm_bpf_backend{feature="af_alg",kind="bpf"} 1
    
  3. As a non-root user, attempt an AF_ALG socket:

    sudo -u nobody python3 - <<'PY'
    import errno
    import socket
    import sys
    
    try:
        socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0)
    except OSError as exc:
        print(exc.errno)
        sys.exit(0 if exc.errno == errno.EPERM else 1)
    
    raise SystemExit("AF_ALG socket unexpectedly succeeded")
    PY
    
  4. Confirm the command prints 1 (EPERM) and CSM emits a Critical af_alg_socket_use finding. The finding details should say the call was refused by the BPF LSM program.

CVE-2026-41940 - cPanel/WHM auth-bypass

Detection in the access-log path:

  • Non-infra WHM login attempts surface at Warning (suppressible alongside other cPanel-login alerts) so an operator can see brute-force traffic against the bypass surface.
  • The tokenless WHM-script request the published exploit uses for cache promotion surfaces at Critical, always-on, and feeds auto-block.

No operator hardening command is required. The host fix is to apply the cPanel patched build. CSM provides the detection layer for windows where patching has not yet rolled out.

Future CVEs

New mitigations land here as they ship. The bar is:

  • The host can be measurably hardened (modprobe / seccomp / sysctl / config), and/or
  • An exploit-signature detector can fire reliably without false positives.

CVEs that are purely “patch the package”, with no preventive control we can apply and no signature we can detect, do not get a CSM mitigation; the right answer is the vendor patch. The daemon’s package-integrity check (rpm -V / debsums) covers the “did the operator actually apply the patch” question.

Firewall (nftables)

CSM includes a native nftables firewall engine that replaces LFD and fail2ban. It uses the kernel netlink API directly via google/nftables - no iptables, no Perl, no shell commands.

Features

  • Atomic ruleset - single netlink transaction, no partial application
  • Named IP sets with per-element timeouts (blocked, allowed, infra, country)
  • Rate limiting - SYN flood, UDP flood, per-IP connection rate, per-port flood
  • Country blocking via MaxMind GeoIP CIDR ranges
  • Outbound SMTP restriction by UID (prevent spam from compromised accounts)
  • Subnet/CIDR blocking with auto-escalation from individual IPs and safety guards for infra, local, and allowed addresses
  • Permanent block escalation after repeated temp blocks
  • Dynamic DNS hostname resolution (updated every 5 min) with grace-period guard against transient resolver failures
  • IPv6 dual-stack with separate sets
  • Commit-confirmed safety - Juniper-style auto-rollback timer
  • Infra IP protection - refuses to block infrastructure IPs
  • Auto-response dry-run - safety default that records intended blocks without touching nftables
  • Verdict callback - optional advisory hook to the panel before each auto-block (allow / block / attach metadata)
  • cphulk integration - unblock flushes cphulk too
  • Audit trail - JSONL log with 10MB rotation
  • State persistence with atomic writes

CLI Commands

# Status
csm firewall status                              # Show status and statistics
csm firewall ports                               # Show configured port rules

# Block / Allow
csm firewall deny <ip> [reason]                  # Block IP permanently
csm firewall allow <ip> [reason]                 # Allow IP (all ports)
csm firewall allow-port <ip> <port> [reason]     # Allow IP on specific port
csm firewall remove <ip>                         # Remove from blocked and allowed
csm firewall remove-port <ip> <port>             # Remove port-specific allow

# Temporary
csm firewall tempban <ip> <dur> [reason]         # Temporary block
csm firewall tempallow <ip> <dur> [reason]       # Temporary allow

# Subnets
csm firewall deny-subnet <cidr> [reason]         # Block subnet
csm firewall remove-subnet <cidr>               # Remove subnet block

# Search
csm firewall grep <pattern>                      # Search blocked/allowed IPs
csm firewall lookup <ip>                         # GeoIP + block status lookup

# Bulk operations
csm firewall deny-file <path>                    # Bulk block from file
csm firewall allow-file <path>                   # Bulk allow from file
csm firewall flush                               # Clear all dynamic blocks

# Safety
csm firewall apply-confirmed <minutes>           # Apply with auto-rollback timer
csm firewall confirm                             # Confirm applied changes
csm firewall rollback status|confirm|revert      # Manage pending config rollback
csm firewall restart                             # Reapply full ruleset

# Profiles
csm firewall profile save|list|restore <name>    # Profile management

# Audit
csm firewall audit [limit]                       # View audit log

# GeoIP
csm firewall update-geoip                        # Download country IP blocks

# Cloudflare
csm firewall cf-status                           # Show Cloudflare IP whitelist status

Configuration

Firewall defaults can be edited in two places:

  • Web UI: Settings -> Firewall section. Port lists, rate limits, flood protection, deny caps, country block, and outbound SMTP restriction are all editable. Changes are restart-class. The save endpoint warns if the WebUI listen port is missing from tcp_in. The port_flood per-port rule list is YAML-only for now.
  • YAML: edit /etc/csm/csm.yaml directly. Run csm rehash then systemctl restart csm.

Tentative apply (rollback timer)

The Firewall section in the Web UI offers two save buttons. Save writes the new config and prompts you to restart. Apply with rollback timer writes the new config, restarts the daemon, and starts a timer (default 5 minutes, range 1-30). If you do not click Confirm before the timer expires, the daemon restores the previous config and restarts again. This protects against locking yourself out by, for example, removing the WebUI port from tcp_in.

When the Web UI is unreachable (firewall mistuned, daemon broken), use the CLI escape hatch:

csm firewall rollback status
csm firewall rollback confirm
csm firewall rollback revert

Rollback state survives daemon restarts (the snapshot is persisted in bbolt). On startup the daemon checks for a pending rollback: if the deadline has already passed it restores the previous config and restarts; otherwise it rearms the timer for the remaining window.
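
The startup decision described above can be sketched as a small pure function. This is an illustration under assumptions, not the daemon's code: the function name, the action labels, and the use of timezone-aware datetimes are all hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def startup_rollback_action(deadline: Optional[datetime],
                            now: datetime) -> Tuple[str, timedelta]:
    """Decide what to do about a persisted rollback deadline at startup."""
    if deadline is None:
        # No pending rollback was persisted.
        return ("none", timedelta(0))
    if now >= deadline:
        # Deadline already passed while the daemon was down: restore the old config.
        return ("revert", timedelta(0))
    # Still inside the window: rearm the timer for what is left of it.
    return ("rearm", deadline - now)


# Usage: a 5-minute timer that was started 2 minutes before a restart.
now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
print(startup_rollback_action(now + timedelta(minutes=3), now))
```

Keeping the decision free of side effects makes the three cases easy to test without touching the real config or timer.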

firewall:
  enabled: true
  ipv6: false
  conn_rate_limit: 200         # new connections per minute per IP (CGNAT-tolerant)
  syn_flood_protection: true
  conn_limit: 400              # max concurrent connections per IP (0 = disabled)
  smtp_block: false            # restrict outbound SMTP
  log_dropped: true
  dyndns_hosts:                # resolved every 5 min and whitelisted
    - "monitoring.example.com"

Full firewall reference: Configuration - Firewall.

Auto-response interaction

Auto-block calls require firewall.enabled: true because they go through the firewall engine. The engine consults two policy hooks first:

  1. auto_response.verdict_callback - when enabled, the engine POSTs a signed JSON request to the panel after local validation and infra-IP safety checks. When a secret is configured, CSM rejects unsigned callback replies by default. The panel can downgrade to allow (audit-only), attach tenant_id for downstream correlation, or add a note. CSM fails open on hook errors. Wire contract: docs/verdict-callback-contract.md.

  2. auto_response.dry_run - when true (or absent; safety default), BlockIP() records the intended block to bbolt and returns success without touching nftables. Manual csm firewall ... operator commands bypass dry-run via BlockIPForce and always apply. Verify with csm firewall status after policy changes; “Recently Blocked” timestamps newer than the last restart confirm live mode. See Auto-response - Dry-run safety default.

Subnet blocks refuse the default route and any range that contains an infrastructure IP, a resolved infra hostname, a local host address, a full-IP allow, or a port-specific allow. Remove the allow or narrow the CIDR before applying the block.
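
A minimal sketch of the subnet guard, assuming the protected addresses (infrastructure IPs, resolved infra hostnames, local host addresses, and allows) have already been collected into one flat list. The function name and the refusal strings are hypothetical; only the refusal conditions come from the paragraph above.

```python
import ipaddress
from typing import Optional, Sequence


def subnet_block_refusal(cidr: str, protected: Sequence[str]) -> Optional[str]:
    """Return a refusal reason for blocking `cidr`, or None if the block may proceed."""
    net = ipaddress.ip_network(cidr, strict=False)
    if net.prefixlen == 0:
        # 0.0.0.0/0 or ::/0 would block everything.
        return "default route"
    for addr in protected:
        ip = ipaddress.ip_address(addr)
        # Only compare addresses of the same IP version as the network.
        if ip.version == net.version and ip in net:
            return "contains protected address " + addr
    return None


# Usage: the /24 is refused because it covers a protected address.
print(subnet_block_refusal("192.0.2.0/24", ["192.0.2.10", "2001:db8::1"]))
```

When a block is refused, the remedy is the one stated above: remove the allow or narrow the CIDR so that it no longer covers the protected address.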

Infrastructure IP DNS guard

Hostnames listed in top-level infra_ips or firewall.infra_ips are resolved every 5 minutes and their current addresses feed the infra auto-block guard. If a hostname stops resolving, the daemon emits an infra_ips_unresolvable Warning finding and keeps the last known addresses protected during the grace period (default 10 min). This prevents a transient DNS outage from deprotecting the management plane. The finding auto-clears when resolution recovers.

ModSecurity Integration

CSM detects and manages ModSecurity (WAF) on Apache, Nginx, and LiteSpeed across cPanel, plain Debian/Ubuntu, and plain AlmaLinux/Rocky/RHEL hosts. It deploys custom rules (cPanel only) and provides a web UI for rule overrides and escalation.

Supported Web Servers

| Web server | Config candidates | Status check | Custom rule deployment |
|---|---|---|---|
| Apache on cPanel EA4 | /usr/local/apache/conf/*, /etc/apache2/conf.d/modsec*, whmapi1 modsec_is_installed | Yes | Yes (via cPanel modsec user conf) |
| Apache on Debian/Ubuntu | /etc/apache2/mods-enabled/security2.conf, /etc/apache2/conf-enabled/*, /etc/apache2/conf.d/modsec2.conf | Yes | Not yet (plain Linux) |
| Apache on RHEL/Alma/Rocky | /etc/httpd/conf.d/mod_security.conf, /etc/httpd/conf.modules.d/* | Yes | Not yet (plain Linux) |
| Nginx on any distro | /etc/nginx/nginx.conf, /etc/nginx/modules-enabled/50-mod-http-modsecurity.conf, /etc/nginx/modsec/main.conf | Yes | Not yet (plain Linux) |
| LiteSpeed | /usr/local/lsws/conf/httpd_config.xml, /usr/local/lsws/conf/modsec2.conf | Yes | Not yet |

When ModSecurity is not installed, the waf_status check emits a platform-specific install hint:

# On Ubuntu + Nginx:
Install: apt install libnginx-mod-http-modsecurity modsecurity-crs

# On Ubuntu + Apache:
Install: apt install libapache2-mod-security2 modsecurity-crs && a2enmod security2

# On AlmaLinux + Apache:
Install (requires EPEL): dnf install -y epel-release && dnf install -y mod_security

# On AlmaLinux + Nginx:
Install (requires EPEL): dnf install -y epel-release && dnf install -y nginx-mod-http-modsecurity

# On cPanel:
Install: WHM > Security Center > ModSecurity

Rule-staleness alerts scan both the flat CRS layout (/usr/share/modsecurity-crs/rules/*.conf) used by distro packages and the per-vendor subdirectory layout used by cPanel (/usr/local/apache/conf/modsec_vendor_configs/VENDOR/*.conf). Update instructions are also platform-specific (apt update && apt upgrade modsecurity-crs, dnf upgrade modsecurity-crs, or WHM on cPanel).

Features

  • Custom CSM rules - IDs 900000-900999 in configs/csm_modsec_custom.conf (cPanel only today)
  • Rule override management - SecRuleRemoveById directives for false positive suppression
  • Escalation control - change rule severity or action per-rule
  • Live deny escalation - repeated ModSecurity deny events from one IP emit an escalation finding that feeds auto-response blocking. CSM-owned rules keep their existing per-rule escalation controls.
  • WAF event log parsing - correlates events by IP, URI, and rule ID
  • Hot-reload - apply changes without Apache restart (cPanel only)

Web UI Pages

ModSecurity (/modsec) - WAF status overview, event log, active block list

ModSec Rules (/modsec/rules) - per-rule management:

  • View loaded rules with descriptions
  • Enable/disable individual rules
  • Override rule severity or action
  • Deploy custom rules

API Endpoints

GET  /api/v1/modsec/stats            WAF statistics
GET  /api/v1/modsec/blocks           Blocked request log
GET  /api/v1/modsec/events           WAF event details
GET  /api/v1/modsec/rules            Loaded rules list
POST /api/v1/modsec/rules/apply      Apply custom rules
POST /api/v1/modsec/rules/escalation Change rule severity/action

Signature Rules

CSM uses YAML and YARA-X rules for malware detection. Rules are stored in /opt/csm/rules/ and applied both in real time (fanotify) and during deep scans.

YAML Rules

rules:
  - name: webshell_c99
    severity: critical
    category: webshell
    file_types: [".php"]
    patterns: ["c99shell", "c99_buff_prepare"]
    min_match: 1

  - name: phishing_login
    severity: high
    category: phishing
    file_types: [".html", ".php"]
    patterns: ["password.*submit", "credit.*card.*number"]
    exclude: ["legitimate_form_handler"]
    min_match: 2

Fields:

  • name - unique rule identifier
  • severity - critical, high, or warning
  • category - webshell, backdoor, phishing, dropper, exploit
  • file_types - file extensions to match (or ["*"] for all)
  • patterns - literal strings or regex patterns
  • exclude - patterns that prevent a match (false positive reduction)
  • min_match - minimum patterns that must match
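
To make the field semantics concrete, here is a hedged sketch of how one rule could be evaluated against one file. It is not CSM's matcher: it assumes every pattern is treated as a case-sensitive regular expression, that min_match counts distinct patterns, and that any exclude hit vetoes the rule. The function name rule_matches is hypothetical.

```python
import os
import re


def rule_matches(rule: dict, path: str, content: str) -> bool:
    """Evaluate one YAML-style rule (already parsed into a dict) against a file."""
    types = rule.get("file_types", ["*"])
    ext = os.path.splitext(path)[1].lower()
    if "*" not in types and ext not in types:
        return False  # extension not covered by this rule
    if any(re.search(p, content) for p in rule.get("exclude", [])):
        return False  # an exclude pattern vetoes the match
    # Count how many distinct patterns occur at least once.
    hits = sum(1 for p in rule.get("patterns", []) if re.search(p, content))
    return hits >= rule.get("min_match", 1)


# Usage with the phishing_login rule from the example above.
phishing = {
    "name": "phishing_login",
    "file_types": [".html", ".php"],
    "patterns": ["password.*submit", "credit.*card.*number"],
    "exclude": ["legitimate_form_handler"],
    "min_match": 2,
}
page = "enter password then submit; also your credit card number"
print(rule_matches(phishing, "/home/user/public_html/login.php", page))
```

With min_match: 2, a page that only contains the password pattern does not match, which is what keeps ordinary login forms from triggering the rule.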

YARA-X Rules (Optional)

Build CSM with YARA-X support:

CGO_LDFLAGS="$(pkg-config --libs --static yara_x_capi)" go build -tags yara ./cmd/csm/

Place .yar or .yara files alongside YAML rules in /opt/csm/rules/. CSM compiles them at startup and uses them for:

  • Real-time fanotify file scanning
  • Deep scan filesystem sweeps
  • Email attachment scanning

Without the yara build tag, YARA rules are silently ignored.

Updating Rules

csm update-rules          # download latest rules and reload the running daemon

csm update-rules now asks the daemon to reload through the control socket once the download completes. If the daemon is not running, the next start picks the files up automatically. kill -HUP $(pidof csm) still works.

Or from the web UI: Rules page > Reload Rules button.

Remote rule updates are now signature-verified. Any configuration that enables signatures.update_url or signatures.yara_forge.enabled must also set signatures.signing_key to the 64-character hex-encoded Ed25519 public key that verifies the downloaded .sig files. Remote update URLs must use HTTP or HTTPS and must not point at localhost, loopback, link-local, unspecified, or RFC1918 / ULA private addresses.
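
The two constraints above (key format and URL target) can be pre-checked before editing the config. The sketch below is an approximation under stated assumptions, not CSM's validator: the function names are hypothetical, hostnames are not resolved (only IP literals and localhost names are checked), and Python's is_private flag covers more reserved ranges than RFC1918 / ULA alone, so it is stricter than the text requires.

```python
import ipaddress
import string
from urllib.parse import urlparse


def valid_signing_key(key: str) -> bool:
    """True if `key` is exactly 64 hex characters (a 32-byte Ed25519 public key)."""
    return len(key) == 64 and all(c in string.hexdigits for c in key)


def update_url_allowed(url: str) -> bool:
    """Reject non-HTTP(S) URLs and URLs aimed at local or private addresses."""
    parts = urlparse(url)
    host = parts.hostname
    if parts.scheme not in ("http", "https") or not host:
        return False
    if host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name: this sketch does not resolve it, so it is accepted here.
        return True
    return not (ip.is_loopback or ip.is_link_local
                or ip.is_unspecified or ip.is_private)


# Usage
print(valid_signing_key("0123456789abcdef" * 4))
print(update_url_allowed("http://127.0.0.1/rules.zip"))
```

A format check like this only catches typos; whether the key is the right one is decided when the downloaded .sig file is verified.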

YARA Forge Integration

CSM can automatically fetch curated YARA rules from YARA Forge, which aggregates and quality-tests rules from 40+ public sources including signature-base, Elastic, Malpedia, and ESET.

Configuration

signatures:
  signing_key: "0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"
  yara_forge:
    enabled: true
    tier: "core"              # core (5K rules, low FP), extended (10K), full (12K)
    update_interval: "168h"   # weekly
    download_url: "https://mirrors.pidginhost.com/csm/yara-forge/{version}/yara-forge-rules-{tier}.zip"
  disabled_rules:             # rule names to exclude from Forge downloads
    - SUSP_Example_Rule

The project operates the signed mirror shown above. A ready-to-use drop-in is shipped at /usr/lib/csm/profiles/yara-forge.example.yaml; copy or include it under /etc/csm/conf.d/ to enable Forge without editing the main csm.yaml. The matching signing_key is the project Ed25519 public key (hex), published on the release signing page.

signing_key must be a hex string for the Ed25519 public key that matches the private key used to sign the remote Forge artifact. It is not a PEM block and not a file path.

YARA Forge’s upstream GitHub releases publish ZIP files, but not CSM detached signatures. CSM therefore requires yara_forge.download_url to point at a signed mirror: either the project mirror or one you operate. The URL may contain {tier} and {version} placeholders. The detached signature must be available at the resolved ZIP URL plus .sig.

If you do not have a signed update source yet, disable remote updates instead:

signatures:
  signing_key: ""
  update_url: ""
  yara_forge:
    enabled: false

Tiers

| Tier | Rules | Size | False Positive Risk |
|---|---|---|---|
| core | ~5,000 | 1.6 MB | Low (quality >= 70, score >= 65) |
| extended | ~10,500 | 3.3 MB | Medium |
| full | ~11,600 | 3.7 MB | Higher (includes score >= 40) |

Update Flow

  1. CSM checks the latest YARA Forge release tag on GitHub
  2. If newer than the installed version, downloads the ZIP for the configured tier from yara_forge.download_url and its detached signature
  3. Verifies the download against signatures.signing_key
  4. Filters out any rules listed in disabled_rules
  5. Compile-tests the rules with YARA-X before installing
  6. Atomically replaces the previous Forge rules file
  7. Reloads the YARA scanner

Custom rules in malware.yar are never overwritten by the Forge fetcher.

Disabling Rules

If a Forge rule produces false positives, add its name to disabled_rules in the config and reload:

signatures:
  disabled_rules:
    - SUSP_XOR_Encoded_URL
    - HKTL_Mimikatz_Strings

After editing, send SIGHUP or restart the daemon to apply.

How Rules Avoid False Positives

Signature rules require structural nesting, not co-presence of strings. Two dangerous function calls appearing in the same file but in unrelated code paths won’t trigger a rule. The call must directly wrap or chain with the other for a match.

Auto-quarantine adds a safety gate: files need Shannon entropy >= 4.8 or hex density > 20% before automatic quarantine. Legitimate plugin code (~4.2 entropy) passes through; obfuscated malware (~5.5+) is caught.
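
The gate can be illustrated with a short sketch. Shannon entropy is computed in bits per byte, which matches the ~4.2 and ~5.5 figures above. How CSM measures hex density is not documented here, so the definition used below (share of bytes inside runs of 16 or more hex digits) is purely an assumption, as are the function names.

```python
import math
import re
from collections import Counter

# Assumption: only long hex runs count, so ordinary words such as "decade" do not.
_HEX_RUN = re.compile(rb"[0-9A-Fa-f]{16,}")


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


def hex_density(data: bytes) -> float:
    """Fraction of bytes that sit inside long hex-digit runs (assumed definition)."""
    if not data:
        return 0.0
    return sum(len(m.group()) for m in _HEX_RUN.finditer(data)) / len(data)


def passes_quarantine_gate(data: bytes) -> bool:
    """Apply the documented thresholds: entropy >= 4.8 or hex density > 20%."""
    return shannon_entropy(data) >= 4.8 or hex_density(data) > 0.20


# Usage: a hex-packed payload passes the gate on density even though its entropy is low.
payload = b"$x=" + b"deadbeef" * 8
print(round(shannon_entropy(payload), 2), round(hex_density(payload), 2),
      passes_quarantine_gate(payload))
```

Using two independent measures matters because hex-encoded payloads draw on only 16 distinct symbols and can therefore score low on entropy while still being obviously packed.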

Alert Rate Limiting

Default: 30 emails/hour (configurable via max_per_hour). CRITICAL findings always get through regardless of rate limit. Only lower-severity alerts are rate-limited.
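
A minimal sketch of this policy, assuming a sliding one-hour window and assuming that CRITICAL alerts do not consume the hourly budget; neither detail is stated above. The class name and the severity strings other than CRITICAL are illustrative.

```python
from collections import deque


class AlertRateLimiter:
    """Sliding-window limiter: at most max_per_hour non-critical alerts per hour."""

    def __init__(self, max_per_hour: int = 30):
        self.max_per_hour = max_per_hour
        self._sent = deque()  # timestamps (seconds) of rate-limited alerts sent

    def allow(self, severity: str, now: float) -> bool:
        if severity.upper() == "CRITICAL":
            return True  # critical findings always get through
        # Drop timestamps that have aged out of the one-hour window.
        while self._sent and now - self._sent[0] >= 3600:
            self._sent.popleft()
        if len(self._sent) >= self.max_per_hour:
            return False
        self._sent.append(now)
        return True


# Usage with a tiny budget to show the behaviour.
limiter = AlertRateLimiter(max_per_hour=2)
print([limiter.allow(s, t) for s, t in
       [("high", 0), ("high", 1), ("warning", 2), ("critical", 3), ("high", 3601)]])
```

Passing the clock in as an argument keeps the limiter deterministic and easy to test.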

Suppressions

Create suppression rules to silence known false positives:

  • From the Findings page: click the suppress button on any finding
  • From the Rules page: manage suppression rules directly
  • Via API: POST /api/v1/suppressions

To suppress email alerts for specific checks while keeping them visible in the web UI, use disabled_checks in your config:

alerts:
  email:
    disabled_checks:
      - "email_spam_outbreak"
      - "perf_memory"

Email AV

CSM scans email attachments in real time using ClamAV and YARA-X on the Exim mail spool.

How It Works

  1. fanotify watches the Exim spool directory for new messages
  2. Attachments are extracted and scanned by ClamAV (socket) and YARA-X (if available)
  3. Zip and tar.gz attachments are unpacked within configured size and file limits
  4. Extracted parts are staged under state_path/emailav-tmp, which must stay daemon-owned and private
  5. Attachment names written to logs and the UI use sanitized base names
  6. Infected messages are quarantined with full metadata
  7. Sender, recipient, and message ID are logged

Web UI

The Email page (/email) shows:

  • AV watcher status (active, engine health)
  • Scan statistics (scanned, infected, quarantined)
  • Quarantined email list with release/delete actions

API Endpoints

GET  /api/v1/email/stats         Scan statistics
GET  /api/v1/email/quarantine    Quarantined email list
GET  /api/v1/email/av/status     AV watcher status
POST /api/v1/email/quarantine/   Release or delete quarantined email

Related Checks

  • email_content - scans outbound email body for credentials and suspicious URLs
  • email_weak_password - detects email accounts with weak passwords
  • email_forwarder_audit - audits forwarders for exfiltration redirects
  • mail_queue - alerts on queue buildup (spam outbreak indicator)
  • mail_per_account - per-account sending volume spikes

Threat Intelligence

CSM tracks, scores, and correlates attacks using a local attack database enriched with external feeds and GeoIP data.

Attack Database

  • Per-IP event tracking (brute force, webshell upload, phishing, C2, WAF block)
  • Threat score calculation with temporal decay (older attacks weighted less)
  • Auto-block on reputation threshold
  • Top attackers leaderboard
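
Temporal decay can be illustrated with an exponential half-life model. The decay curve, the 24-hour half-life, and the per-event weights below are assumptions chosen for the example; the actual scoring function is not described in this section.

```python
def threat_score(events, now: float, half_life_hours: float = 24.0) -> float:
    """Sum event weights, halving each event's contribution every half-life.

    events: iterable of (timestamp_seconds, weight) pairs.
    """
    score = 0.0
    for ts, weight in events:
        age_hours = max(0.0, now - ts) / 3600.0
        score += weight * 0.5 ** (age_hours / half_life_hours)
    return score


# Usage: the same event counts fully when fresh and half as much a day later.
event = [(0.0, 10.0)]
print(threat_score(event, now=0.0), threat_score(event, now=86400.0))
```

With a model like this, an IP that stops attacking drifts back below the auto-block threshold on its own, while a persistent attacker keeps accumulating score.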

IP Intelligence

Combines multiple sources into a unified verdict:

| Source | Data |
|---|---|
| Local attack DB | Event count, types, score |
| AbuseIPDB | External reputation (if API key configured) |
| Rspamd | Per-IP rolling history (if controller access configured) |
| Upstream HTTP cache | Panel-side shared score (if reputation.upstream configured) |
| Permanent blocklist | Operator-managed persistent blocks |
| Firewall state | Currently blocked/allowed status |
| GeoIP | Country, city, ASN, ISP |
| RDAP | Network name, organization (cached 24h) |

Verdicts: clean, suspicious, malicious, blocked

Pluggable sources

Threat-intel sources implement a small Source interface (lookup-by-IP returning a score + reason). The aggregator queries every enabled source in parallel, applies per-source weighting, and produces the unified verdict above. Adding a new source means implementing the interface and registering it; no existing source code changes.
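
The aggregation pattern can be sketched as follows. CSM itself is a Go binary, so this Python version only illustrates the shape of the idea. The callable-based Source type, the weighted average, and the choice to skip failing sources are assumptions, not the shipped interface.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Assumed shape of a source: lookup-by-IP returning (score, reason).
Source = Callable[[str], Tuple[float, str]]


def aggregate(ip: str, sources: Dict[str, Source],
              weights: Dict[str, float]) -> Tuple[float, List[str]]:
    """Query every source in parallel and return (weighted score, reasons)."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = {name: pool.submit(fn, ip) for name, fn in sources.items()}
    # Leaving the with-block waits for all lookups to finish.
    total = weight_sum = 0.0
    reasons = []
    for name, fut in futures.items():
        try:
            score, reason = fut.result()
        except Exception:
            continue  # a failing source contributes nothing in this sketch
        w = weights.get(name, 1.0)
        total += w * score
        weight_sum += w
        reasons.append(name + ": " + reason)
    return (total / weight_sum if weight_sum else 0.0), reasons


# Usage with two stand-in sources.
score, why = aggregate(
    "192.0.2.1",
    {"local": lambda ip: (80.0, "5 events"), "abuse": lambda ip: (20.0, "low")},
    {"local": 3.0, "abuse": 1.0},
)
print(score, why)
```

Registering a new source then amounts to adding one entry to the sources mapping, which mirrors the claim that no existing source code has to change.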

Currently shipped:

  • AbuseIPDB (reputation.abuseipdb_key) - external IP reputation feed. CSM caps uncached lookups per cycle and reserves store-backed daily quota before sending requests.
  • Rspamd (reputation.rspamd.*) - per-IP rolling-history signals from the local rspamd controller. Token resolves from token_env at query time so rotation does not require a daemon restart.
  • Upstream HTTP cache (reputation.upstream.*) - shared panel-side cache of AbuseIPDB or proprietary scores. Useful in fleets: agents pay a bounded local cache hit (cache_ttl_min, default 15 m) instead of hammering the upstream once per agent. CSM temporarily opens a fail-open circuit breaker after repeated upstream failures and lets only one cooldown probe through at a time. Use HTTPS for remote panels; plain HTTP is accepted only for loopback. Wire contract: docs/upstream-threat-intel-contract.md.

Abuse Reporting

reputation.report can send minimized confirmed-abuse reports to a central database or private collector. It is off by default. Remote targets must use HTTPS; plain HTTP is accepted only for loopback collectors. Keys and target wiring are read at daemon startup, so changes to this block require a restart.

Web UI

The Threat Intel page (/threat) provides:

  • IP lookup with composite scoring
  • Top attackers with GeoIP enrichment
  • Attack type breakdown chart
  • Hourly trend chart
  • Whitelist management (permanent and temporary)

API Endpoints

GET  /api/v1/threat/stats            Attack stats and type breakdown
GET  /api/v1/threat/top-attackers    Top attacking IPs with GeoIP
GET  /api/v1/threat/ip               IP threat lookup
GET  /api/v1/threat/events           IP event history
GET  /api/v1/threat/whitelist        Whitelisted IPs
GET  /api/v1/threat/db-stats         Attack database statistics
POST /api/v1/threat/block-ip         Block IP permanently
POST /api/v1/threat/whitelist-ip     Permanent whitelist
POST /api/v1/threat/temp-whitelist-ip  Temporary whitelist
POST /api/v1/threat/clear-ip         Clear from attack DB
POST /api/v1/threat/unwhitelist-ip   Remove from whitelist

GeoIP

MaxMind GeoLite2 integration for IP geolocation and ASN enrichment.

Features

  • City database - country, city, latitude/longitude
  • ASN database - ISP, organization, autonomous system number
  • Auto-download on first use
  • Auto-update every 24 hours (configurable)
  • RDAP fallback for detailed ISP/org info (cached 24h)

Where It’s Used

  • Threat intel page (top attackers, IP lookup)
  • Firewall audit log (country flags)
  • Login alerts (geographic context)
  • Country-based login suppression (trusted_countries)
  • Country blocking (firewall CIDR ranges)

Configuration

geoip:
  account_id: "YOUR_MAXMIND_ACCOUNT_ID"
  license_key: "YOUR_MAXMIND_LICENSE_KEY"
  editions:
    - GeoLite2-City
    - GeoLite2-ASN
  auto_update: true
  update_interval: 24h

Free account: maxmind.com/en/geolite2/signup

CLI

csm update-geoip                    # Manual database update
csm firewall update-geoip           # Download country CIDR blocks
csm firewall lookup <ip>            # GeoIP + block status lookup

API

GET  /api/v1/geoip              IP geolocation (?ip=&detail=1)
POST /api/v1/geoip/batch        Batch lookup (array of IPs)

Challenge Pages

JavaScript proof-of-work challenge pages - a CAPTCHA alternative for suspicious IPs.

How It Works

  1. Suspicious IP hits a protected resource
  2. CSM serves a challenge page requiring client-side SHA-256 proof-of-work
  3. Browser computes the proof (shows progress bar)
  4. On valid solution, CSM issues an HMAC-verified token
  5. Subsequent requests pass through automatically
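
The proof-of-work step above can be sketched in a few lines. This is a minimal illustration, not CSM's implementation: the `nonce:counter` input format and the reading of "difficulty" as a count of leading zero hex digits are assumptions made for the example.

```python
import hashlib


def leading_zero_hex_digits(digest_hex: str) -> int:
    """Count the leading '0' characters of a hex digest."""
    return len(digest_hex) - len(digest_hex.lstrip("0"))


def pow_digest(nonce: str, counter: int) -> str:
    """Hash the server nonce together with the client's counter.

    The "nonce:counter" layout is hypothetical, chosen for this sketch.
    """
    return hashlib.sha256(f"{nonce}:{counter}".encode()).hexdigest()


def solve(nonce: str, difficulty: int) -> int:
    """Client side: try counters until the digest has enough leading zeros."""
    counter = 0
    while leading_zero_hex_digits(pow_digest(nonce, counter)) < difficulty:
        counter += 1
    return counter


def verify(nonce: str, counter: int, difficulty: int) -> bool:
    """Server side: a single hash checks the submitted proof."""
    return leading_zero_hex_digits(pow_digest(nonce, counter)) >= difficulty
```

Under this assumed scheme, each extra difficulty level multiplies the expected client work by 16, while verification stays at one hash per submission.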

Features

  • SHA-256 based difficulty - configurable 0-5 levels
  • Client-side computation - the browser does the hashing; the server only verifies the submitted proof
  • HMAC token verification - prevents forged or tampered tokens
  • Nonce-based anti-replay
  • User-friendly - progress bar, instant feedback
  • Bot filtering - headless browsers and scripts fail the challenge

Use Cases

  • Gray-listing alternative to hard IP blocks
  • Protecting WordPress login pages
  • Rate limiting without blocking legitimate users
  • DDoS mitigation layer

Routing Behavior

When challenge.enabled: true, CSM routes eligible IPs to the challenge page instead of hard-blocking them. This works independently of auto_response settings.

Challenge-Eligible Checks

Pre-auth, browser-visible attack signals only: wp_login_bruteforce, xmlrpc_abuse, wp_user_enumeration, webmail_bruteforce, ip_reputation, local_threat_score. Post-auth audit events (cPanel, webmail, file upload, WHM logins), WAF high-volume attacker findings, and non-browser protocols (SSH, FTP, DNS recursion, outbound traffic, API auth) are excluded - their IPs have no useful challenge step or no browser session to render the PoW page.

Always Hard-Blocked

Confirmed malware (webshells, YARA/signature matches), WAF high-volume attackers, C2 connections, backdoor ports, phishing pages, database injections, and spam outbreaks are hard-block candidates immediately, even when challenge is enabled.

Timeout Escalation

If an IP doesn’t solve the PoW challenge within 30 minutes, it is automatically escalated to a hard firewall block.

Bind address

The listener binds to 127.0.0.1 by default, so enabling the challenge server alone does not expose a new public port. The shipped webserver integration, however, redirects browsers directly to challenge.public_url, so a host with the integration installed needs a non-loopback listener and a public URL ending in /challenge.

challenge:
  enabled: true
  listen_addr: 0.0.0.0
  listen_port: 8439
  public_url: https://cpanel.example.com:8439/challenge
  tls_cert: /var/cpanel/ssl/cpanel/mycpanel.pem
  tls_key:  /var/cpanel/ssl/cpanel/mycpanel.pem

When CSM’s firewall is enabled and challenge.port_gate.enabled is true, the daemon also opens challenge.listen_port in the main firewall rules. The port-gate chain still drops traffic to that port unless the source is loopback, an infra_ips entry, or an IP currently on the challenge list. Port-gate rules follow the configured listener address family. An IPv6-only listener gates only IPv6 clients; IPv4 challenge entries stay in the webserver map but are ignored by the IPv6 nftables set.
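
The port-gate decision described above can be modelled as a small predicate. This is only a sketch of the logic: the real gate is an nftables chain, and the function name and argument shapes here are hypothetical.

```python
import ipaddress


def gate_allows(src: str, infra_ips: list, challenge_list: set) -> bool:
    """Return True if traffic from `src` may reach the challenge port.

    Mirrors the three exceptions in the text: loopback sources,
    infra_ips entries (addresses or CIDR ranges), and IPs currently
    on the challenge list. Everything else is dropped.
    """
    addr = ipaddress.ip_address(src)
    if addr.is_loopback:
        return True
    for entry in infra_ips:
        # strict=False lets a bare address or a host-bits-set CIDR parse.
        if addr in ipaddress.ip_network(entry, strict=False):
            return True
    pending = {ipaddress.ip_address(ip) for ip in challenge_list}
    return addr in pending
```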

Run csm doctor challenge after changing these fields. The command checks the public URL shape, TLS files, port-gate setting, installed webserver snippet version, webserver configtest, and the live /challenge/gate endpoint. Add --json for automation.

TLS

The challenge listener serves HTTPS when challenge-specific TLS material is configured. Loopback listeners stay on plain HTTP by default. Direct/public listeners can reuse the Web UI cert.

Resolution order:

  1. challenge.tls_cert + challenge.tls_key (explicit per-service).
  2. webui.tls_cert + webui.tls_key (shared cert), used only when challenge.listen_addr is not loopback. On cPanel, mycpanel.pem covers both the Web UI and the challenge port without extra config.
  3. Plain HTTP. This is expected for the default loopback-only path. Public listeners without TLS log a startup warning. HSTS-pinned parent domains (cPanel, phpanel, customer apex) will fail with ERR_SSL_PROTOCOL_ERROR because the browser auto-upgrades the URL to https; ship TLS material in production.

challenge:
  tls_cert: /var/cpanel/ssl/cpanel/mycpanel.pem
  tls_key:  /var/cpanel/ssl/cpanel/mycpanel.pem

Trusted Proxies

By default, the challenge server uses RemoteAddr to identify clients. The shipped webserver integration redirects browsers directly to challenge.public_url, so it does not need trusted_proxies. Configure trusted proxies only for a custom proxy deployment where CSM receives traffic from a proxy and must trust X-Forwarded-For from that proxy.

challenge:
  enabled: true
  trusted_proxies:
    - "127.0.0.1"
    - "::1"

Without trusted_proxies, X-Forwarded-For is ignored to prevent IP spoofing.
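
A rough sketch of that rule follows. The header is only consulted when the socket peer is a trusted proxy. Which X-Forwarded-For hop CSM picks is not stated here, so the "rightmost untrusted hop" choice below is an assumption, as are the function and parameter names.

```python
import ipaddress


def client_ip(remote_addr: str, x_forwarded_for: str, trusted_proxies: list) -> str:
    """Pick the address to treat as the client."""
    trusted = [ipaddress.ip_network(p, strict=False) for p in trusted_proxies]

    def is_trusted(ip_text: str) -> bool:
        try:
            ip = ipaddress.ip_address(ip_text)
        except ValueError:
            return False
        return any(ip in net for net in trusted)

    # No trusted proxy in front: ignore the header entirely.
    if not x_forwarded_for or not is_trusted(remote_addr):
        return remote_addr

    # Walk the hop list right to left, skipping our own proxies.
    for hop in reversed([h.strip() for h in x_forwarded_for.split(",")]):
        try:
            ipaddress.ip_address(hop)
        except ValueError:
            return remote_addr  # malformed header: fall back to the peer
        if not is_trusted(hop):
            return hop
    return remote_addr
```

Reading the list from the right means a client-supplied leading entry cannot displace the address that the trusted proxy itself appended.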

Successful Verification

When a client passes the challenge:

  1. The IP is temporarily allowed through the firewall for 4 hours
  2. A verification cookie is set
  3. The IP is removed from the challenge list so the webserver stops sending that visitor to the challenge flow

Webserver Integration

The webserver integration redirects challenge-listed IPs to challenge.public_url. The installer refuses to run until that URL is an absolute http or https URL ending in /challenge, and the configured challenge listener is non-loopback.

csm webserver-integration install     # initial wire-up
csm webserver-integration upgrade     # re-apply after a CSM upgrade
csm webserver-integration status      # show detected stack + version drift
csm webserver-integration validate    # run the webserver's configtest
csm webserver-integration remove      # uninstall the snippet

The installer auto-detects the active webserver via internal/platform. Supported stacks and snippet paths:

| Stack | Snippet path |
| --- | --- |
| cPanel + Apache (EasyApache) | /etc/apache2/conf.d/csm-challenge.conf |
| Debian/Ubuntu Apache | /etc/apache2/conf-enabled/csm-challenge.conf |
| RHEL family Apache (httpd) | /etc/httpd/conf.d/csm-challenge.conf |
| LiteSpeed (LSWS) | /usr/local/lsws/conf/templates/csm-challenge.conf |
| Nginx (plain + Engintron + phpanel) | /etc/nginx/conf.d/csm-challenge.conf |

The snippets are rendered from the effective CSM config. Apache and LSWS read their RewriteMap from /run/csm/challenge_ips.txt; Nginx reads a native map include from /run/csm/challenge_ips.nginx.map. Both live outside the private state directory so the webserver user can read them. CSM rewrites the Nginx include on challenge-list changes and reloads Nginx only when the file content changes.

On every run, the installer:

  1. Writes the new snippet to a sibling temp file and renames it into place atomically.
  2. Runs the webserver’s own configtest (apachectl configtest, nginx -t, lswsctrl conftest).
  3. On pass: reloads the webserver gracefully and exits 0.
  4. On fail: restores the previous snippet bytes (or removes the file if it did not exist before) and exits non-zero with the captured configtest output. The webserver is never reloaded with a broken config.
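
The write, test, and rollback sequence above follows a common pattern that can be sketched as below. It is an illustration only: `install_snippet` and the `configtest` callable are hypothetical stand-ins, and the graceful reload is left to the caller.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Write to a sibling temp file, then rename over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".csm-snippet-")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp, path)  # atomic within one filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def install_snippet(path: str, new_bytes: bytes, configtest):
    """Install a snippet; roll back if `configtest()` reports failure.

    `configtest` returns (ok, output), e.g. a wrapper around `nginx -t`.
    Returns (exit_code, configtest_output).
    """
    previous = None
    if os.path.exists(path):
        with open(path, "rb") as handle:
            previous = handle.read()

    atomic_write(path, new_bytes)
    ok, output = configtest()
    if ok:
        return 0, output  # the caller would reload the webserver here

    # Restore the previous bytes, or remove a file that did not exist before.
    if previous is None:
        os.remove(path)
    else:
        atomic_write(path, previous)
    return 1, output
```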

The snippet header carries a version marker; upgrade is a no-op when the on-disk version matches the shipped version. Hand-edited files (missing or mismatched marker) trip a “modified” status and the installer refuses to overwrite them - remove or rename first.

Hosts with no detectable webserver exit with status=skipped so package post-install hooks succeed cleanly on, e.g., a plain phpanel worker that doesn’t run nginx locally.

Bypass Paths

Three opt-in bypass mechanisms let legitimate traffic skip the PoW page entirely. All default off; an upgraded csm.yaml with no new blocks behaves exactly as before.

CAPTCHA Fallback (JS-Disabled Visitors)

The PoW solver requires JavaScript. Visitors with JS off (older mobile browsers, accessibility tooling, text browsers, scripted integrations) would otherwise be locked out. When configured, CSM renders a Cloudflare Turnstile or hCaptcha widget inside a <noscript> block; on completion the form posts to /challenge/captcha-verify and CSM validates the token server-side against the provider’s siteverify endpoint. Provider rejections do not spend the page nonce, so a visitor can retry the same challenge page after a mistyped, expired, or failed widget response.

challenge:
  captcha_fallback:
    provider: turnstile         # turnstile | hcaptcha | "" (off)
    site_key: "0xAAAA..."       # public key embedded in the widget
    secret_key: "0xBBBB..."     # verified server-side; never sent to client
    timeout: 10s
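
For orientation, server-side validation against a provider's siteverify endpoint usually looks like the sketch below. The endpoint URLs and the `secret` / `response` / `remoteip` form fields come from the providers' public documentation; the helper names are hypothetical and this is not CSM's code.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

SITEVERIFY_URLS = {
    "turnstile": "https://challenges.cloudflare.com/turnstile/v0/siteverify",
    "hcaptcha": "https://api.hcaptcha.com/siteverify",
}


def build_siteverify_request(provider, secret_key, widget_response, client_ip=None):
    """Build the POST that asks the provider whether a widget token is valid."""
    fields = {"secret": secret_key, "response": widget_response}
    if client_ip:
        fields["remoteip"] = client_ip
    return Request(
        SITEVERIFY_URLS[provider],
        data=urlencode(fields).encode(),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def is_success(response_body: bytes) -> bool:
    """Both providers answer with JSON carrying a boolean `success` field."""
    return json.loads(response_body).get("success") is True
```

A caller would send the request with `urllib.request.urlopen(req, timeout=10)`, matching the 10s timeout in the config above, and treat any network error as a failed verification.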

Verified Operator Sessions

Operators who repeatedly hit the challenge during normal admin work can mint a signed cookie that bypasses PoW for the cookie’s TTL. The signing key is generated at daemon startup and rotates on every restart – old cookies stop working automatically.

challenge:
  verified_session:
    enabled: true
    cookie_name: csm_admin_session    # default
    ttl: 4h                            # default
    admin_secret: "long-shared-secret" # required

To issue a cookie, POST the secret to the challenge server:

curl -i -X POST -d "secret=long-shared-secret" \
  https://your-host:8439/challenge/admin-token
# 204 No Content
# Set-Cookie: csm_admin_session=...; Path=/; HttpOnly; Secure; SameSite=Lax

The cookie binds to the requester’s IP, so a stolen cookie does not work from a different network.
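
One way to get both properties (IP binding and automatic invalidation on restart) is an HMAC over the client IP and expiry, keyed with a secret generated at process start. The sketch below assumes a hypothetical `expiry.mac` cookie layout; CSM's actual cookie format is not documented here.

```python
import hashlib
import hmac
import secrets
import time

# Regenerated on every process start, so cookies from a previous run fail.
SIGNING_KEY = secrets.token_bytes(32)


def _mac(key: bytes, client_ip: str, expires: str) -> str:
    return hmac.new(key, f"{client_ip}|{expires}".encode(), hashlib.sha256).hexdigest()


def mint_cookie(client_ip: str, ttl_seconds: int, now=None, key: bytes = SIGNING_KEY) -> str:
    """Return '<expiry>.<mac>' bound to the requester's IP."""
    issued = int(time.time() if now is None else now)
    expires = str(issued + ttl_seconds)
    return f"{expires}.{_mac(key, client_ip, expires)}"


def cookie_valid(cookie: str, client_ip: str, now=None, key: bytes = SIGNING_KEY) -> bool:
    """Accept only an unexpired cookie whose MAC matches this client IP."""
    current = int(time.time() if now is None else now)
    expires, _, mac = cookie.partition(".")
    try:
        if int(expires) < current:
            return False
    except ValueError:
        return False
    expected = _mac(key, client_ip, expires)
    # Constant-time comparison; bytes avoid errors on non-ASCII input.
    return hmac.compare_digest(mac.encode(), expected.encode())
```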

Verified Search Crawlers

Googlebot and Bingbot can be allow-passed by reverse-DNS forward-confirm. CSM looks up the visitor’s PTR, checks it ends in a known crawler suffix (e.g. .googlebot.com), then forward-resolves that name to confirm the original IP appears in the result. A spoofed User-Agent: Googlebot from a residential IP fails forward-confirm and falls through to PoW.

challenge:
  verified_crawlers:
    enabled: true
    providers: [googlebot, bingbot]
    cache_ttl: 15m

Positive results cache for cache_ttl; negative results cache for one-fifth that long so a transiently-broken resolver does not lock out a real crawler for the full window.
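
The forward-confirm check and the shorter negative TTL can be sketched as follows. The suffix table is taken from the crawlers' public documentation, and which suffixes CSM actually accepts is an assumption; the resolver functions are injectable so the logic can be exercised without DNS.

```python
import socket

# Assumed suffix table; only ".googlebot.com" is named in the text above.
CRAWLER_SUFFIXES = {
    "googlebot": (".googlebot.com", ".google.com"),
    "bingbot": (".search.msn.com",),
}


def system_reverse(ip: str):
    """PTR lookup; returns None when the address has no reverse record."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def system_forward(hostname: str) -> set:
    """Forward lookup; returns the set of addresses the name resolves to."""
    try:
        return {info[4][0] for info in socket.getaddrinfo(hostname, None)}
    except OSError:
        return set()


def verify_crawler(ip, providers, reverse=system_reverse, forward=system_forward):
    """Return the matching provider name, or None if verification fails."""
    name = reverse(ip)
    if not name:
        return None
    name = name.rstrip(".").lower()
    for provider in providers:
        if name.endswith(CRAWLER_SUFFIXES.get(provider, ())):
            # Forward-confirm: the PTR name must resolve back to the same IP.
            return provider if ip in forward(name) else None
    return None


def negative_ttl(cache_ttl_seconds: float) -> float:
    """Negative results are cached for one-fifth of the positive TTL."""
    return cache_ttl_seconds / 5
```

With the `cache_ttl: 15m` shown above, a failed lookup would be retried after 3 minutes.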

Operational

Backups

csm store export and csm store import capture the bbolt store, state JSON files (baseline file hashes), and signature-rules cache into a single tar+zstd archive. Use these for re-provisioning, cluster cloning, and disaster recovery rather than re-baselining a 200k-file account tree from scratch.

csm store export /var/backups/csm-$(date +%F).csmbak
sha256sum -c /var/backups/csm-$(date +%F).csmbak.sha256
# transfer the .csmbak + .sha256 to the target host
systemctl stop csm
csm store import /var/backups/csm-2026-04-27.csmbak
systemctl start csm

Partial restore: --only=baseline restores only the file-hash state (useful after a full re-install where firewall and history should stay fresh); --only=firewall merges the firewall buckets into an existing daemon (useful for cloning blocklists across a cluster).

Performance Monitor

CSM monitors server performance metrics and generates findings when thresholds are exceeded.

Critical Checks (every 10 min)

| Check | What it monitors |
| --- | --- |
| perf_load | CPU load average vs core count (critical/high/warning thresholds) |
| perf_php_processes | PHP process count and total memory usage |
| perf_memory | Swap usage percentage and OOM killer activity |

Deep Checks (every 60 min)

| Check | What it monitors |
| --- | --- |
| perf_php_handler | PHP handler type (DSO vs CGI vs FPM) and configuration |
| perf_mysql_config | MySQL my.cnf settings (buffer pool, connections, query cache) |
| perf_redis_config | Redis memory limits, persistence, eviction policy |
| perf_error_logs | Error log file sizes (bloat detection) |
| perf_wp_config | WordPress wp-config.php hardening and debug settings |
| perf_wp_transients | WordPress database transient bloat |
| perf_wp_cron | WordPress cron scheduling (missed crons, excessive events) |

Web UI

The Performance page (/performance) shows real-time metrics:

  • Server load and CPU usage
  • PHP process and memory charts
  • MySQL and Redis health
  • WordPress performance indicators

The findings list also exposes admin-only fixes, per-row and as a Bulk fix dropdown that applies one fix to every matching finding at once:

  • perf_error_logs: truncate a bloated error_log in place. The inode is preserved so running PHP processes keep writing to the same file.
  • perf_wp_config: disable display_errors in .user.ini, php.ini, or .htaccess by commenting the matched line and appending an Off override.
  • perf_wp_cron: add define('DISABLE_WP_CRON', true) to wp-config.php and install a per-user system cron that runs wp-cron.php on a fixed interval. Disabling WP-Cron alone would stop scheduled WordPress tasks, so the cron is installed in the account owner’s own crontab (visible and editable by the customer). The cron is installed before the define is written, so a crontab failure leaves WordPress scheduling unchanged. The define is inserted before the “stop editing” marker (or the wp-settings.php require); insertion points inside multiline comments or heredocs are ignored, and the fix refuses a wp-config.php with no safe insertion point rather than corrupt it.

These actions are limited to configured account roots, reject symlinks and unsupported file types, and remove the fixed row from the active findings view after a successful edit.

WP-Cron fix settings

Tune the WP-Cron remediation under Settings -> Performance:

  • performance.wp_cron_fix.interval_minutes (default 5, range 1-60): how often the installed system cron runs wp-cron.php. 5 minutes balances scheduled-task responsiveness against the load that HTTP-triggered WP-Cron creates.
  • performance.wp_cron_fix.php_bin (default empty = auto-detect): the PHP interpreter for the cron line. CLI php is used instead of an HTTP request so the job never ties up a web worker.
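
To make the two settings concrete, here is a sketch of how an interval and interpreter could translate into a crontab entry. The exact line CSM installs is not documented here, so the schedule mapping, output redirection, and example paths are all assumptions.

```python
import shlex


def cron_schedule(interval_minutes: int) -> str:
    """Map an interval in the documented 1-60 range to a cron schedule."""
    if not 1 <= interval_minutes <= 60:
        raise ValueError("interval_minutes must be between 1 and 60")
    if interval_minutes == 60:
        return "0 * * * *"  # the minute field only spans 0-59
    if interval_minutes == 1:
        return "* * * * *"
    return f"*/{interval_minutes} * * * *"


def cron_line(interval_minutes: int, php_bin: str, wp_cron_path: str) -> str:
    """Build a crontab entry that runs wp-cron.php with CLI PHP."""
    command = f"{shlex.quote(php_bin)} {shlex.quote(wp_cron_path)} >/dev/null 2>&1"
    return f"{cron_schedule(interval_minutes)} {command}"
```

Note that a step value that does not divide 60 evenly (for example 7) fires at minutes 0, 7, ..., 56 and then again at 0, so the final gap in each hour is shorter than the rest.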

To let the daemon apply this fix automatically on every WP-Cron finding, set auto_response.fix_wp_cron: true (default false; requires auto_response.enabled: true). It is opt-in because it edits customer wp-config.php files and crontabs.

MySQL telemetry auth

The MySQL panel runs mysql -e "SHOW STATUS LIKE 'Threads_connected'" from the csm process. The client needs to authenticate against the local server, and csm supports two setups out of the box:

  • A ~/.my.cnf for the csm runtime user with credentials for a MySQL account that holds at least the PROCESS privilege. cPanel and CloudLinux ship /root/.my.cnf for the root user; csm running as root picks it up automatically.

  • A unix-socket grant for the csm OS user, e.g. on Debian/Ubuntu MariaDB:

    CREATE USER 'root'@'localhost' IDENTIFIED VIA unix_socket;
    GRANT PROCESS ON *.* TO 'root'@'localhost';
    

If neither is configured, the MYSQL card renders n/a / n/a instead of a misleading 0 conn. csm makes no attempt to connect over TCP or store credentials on its own.

Redis telemetry auth

The Redis panel connects to local Redis at 127.0.0.1:6379. If Redis requires a password, set REDISCLI_AUTH in the csm daemon environment. The dashboard uses that password for its in-process Redis client.

API

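GET  /api/v1/performance              Current performance metrics snapshot
POST /api/v1/perf/fix-error-log       Truncate a bloated error_log in place
POST /api/v1/perf/fix-display-errors  Disable display_errors for a flagged config file
POST /api/v1/perf/fix-wp-cron         Disable WP-Cron and install the per-user system cron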

Web UI

HTTPS dashboard with polling-based live updates (10s feed, 60s stats). Dark/light theme toggle.

The sidebar groups pages by operator workflow. URLs are stable; the groups only change how pages are arranged in the sidebar:

  • Overview - Dashboard
  • Triage - Incidents, Findings (Active and History tabs)
  • Response - Firewall, Quarantine, Cleanup, Email, ModSecurity, Threat Intel
  • Operations - Performance, Hardening, Rules, ModSec Rules, Audit
  • Configuration - Settings

Sidebar group expand/collapse state is saved in the browser. On viewports under 992px the sidebar collapses into a top-bar drawer toggled from the hamburger button. Account detail (/account) is hidden from the sidebar; it is reached from finding rows, incident detail, and Threat Intel result panels. Read-scope sessions hide admin-only navigation entries such as Configuration and ModSec Rules.

Pages

| Page | URL | Purpose |
| --- | --- | --- |
| Dashboard | /dashboard | Triage queue, daemon status strip, Components matrix, system posture, 24h stats, recent activity, accounts at risk, auto-response summary, brute-force summary, timeline charts |
| Findings | /findings | Active findings with search, check/account filters, header grouping toggle, detail panel, fix/dismiss/suppress actions, sticky bulk operations, modal account scan |
| Findings > History | /findings?tab=history | Paginated archive of all findings with date range and severity filters, CSV export |
| Quarantine | /quarantine | Quarantined files with content preview, restore capability |
| Cleanup | /cleanup-history | File pre-clean backups and DB-object backups with preview and restore controls |
| Firewall | /firewall | Subview-tabbed page (?view=overview/lookup/blocks/allow/config/audit/danger): blocked IPs/subnets with GeoIP, whitelist management, search, audit log; destructive actions live under the Danger tab |
| ModSecurity | /modsec | WAF workbench: status strip, Active WAF pressure summary list (top attackers by hits), top rules / domains side panel, and Blocked IPs / Events / Rules tabs. Block detail panels show first-seen, top URIs, sample events, and direct links to Threat Intel, Firewall lookup, and rule management |
| ModSec Rules | /modsec/rules | Per-rule management, overrides, escalation control |
| Email | /email | Email workbench: status strip (queue, frozen, oldest, AV, group counts), grouped action rows on the left (compromised, spam outbreak, auth failure, queue, malware), Mail protection state on the right, and Findings / Auth failures / Queue / Quarantine / Senders / Forwarders / Deliverability / Outbound abuse tabs below. Queue breaks the spool into real mail vs null-sender bounce backscatter (frozen count, oldest age, top stuck recipients) and flushes frozen backscatter in one click without touching real or retrying mail. Forwarders lists cPanel forwarders – destination provider, owner, and whether a local copy is also kept – so off-server relays to free providers are visible at a glance; held forward copies appear here to release or delete. Enforce mode currently holds null-sender backscatter and bad-sender-IP copies before external relay while the local copy still delivers. Deliverability shows which providers are throttling the server, the affected sending IPs, and each provider’s stated reason. Outbound abuse lists recent PHP-mail relay detections (spam outbreaks from one source IP across many sites, high-volume scripts or accounts) with the contributing site/script breakdown and a one-click 24h block. |
| Threat Intel | /threat | IP lookup with scoring/GeoIP/ASN, top attackers, attack type charts, trends |
| Hardening | /hardening | On-demand hardening audit, stored report, score, and remediation guidance |
| Incidents | /incident | Correlated incident list with detail panel plus forensic timeline search by IP or account |
| Rules | /rules | YAML/YARA rule management, suppressions, state export/import, test alerts |
| Account | /account | Per-account analysis: findings, quarantine, history, on-demand scan |
| Audit | /audit | System-wide action log with search, action and date filters, URL state, and export |
| Performance | /performance | Server load, PHP processes, MySQL, Redis, WordPress metrics |
| Settings | /settings | Searchable config editor with grouped large sections, field-level validation errors, restart notices, redacted secret updates, and firewall tentative apply with rollback timer |

Security

  • Authentication - Bearer token (header or HttpOnly/Secure/SameSite=Strict cookie)
  • CSRF - HMAC-derived token on all POST mutations
  • Headers - X-Frame-Options DENY, Content-Security-Policy, HSTS, nosniff
  • TLS - Auto-generated self-signed certificate
  • Rate limiting - 5 login attempts/min, 600 API requests/min per IP
  • Bearer auth skips CSRF (for API-to-API calls)
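
As an illustration of the per-IP limits listed above, a sliding-window limiter can be written as below. Whether the daemon uses a sliding window, a fixed window, or a token bucket is not stated, so the algorithm choice and the class name are assumptions.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` events per `window_seconds` for each key."""

    def __init__(self, limit: int, window_seconds: float = 60.0):
        self.limit = limit
        self.window = window_seconds
        self._hits = defaultdict(deque)  # key -> timestamps of accepted events

    def allow(self, key: str, now: float) -> bool:
        hits = self._hits[key]
        # Forget events that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # rejected attempts are not recorded
        hits.append(now)
        return True
```

A login limiter matching the documented figure would be `SlidingWindowLimiter(limit=5)`, and the API limiter `SlidingWindowLimiter(limit=600)`, each keyed by client IP.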

Keyboard Shortcuts

General

| Key | Action |
| --- | --- |
| ? | Show shortcut help |
| / | Focus search input |
| Ctrl-K / Cmd-K | Open command palette |

| Key | Action |
| --- | --- |
| g d | Go to Dashboard |
| g f | Go to Findings |
| g h | Go to Findings > History tab |
| g t | Go to Threat Intel |
| g r | Go to Rules |
| g b | Go to Blocked IPs (Firewall) |

Findings page

| Key | Action |
| --- | --- |
| j / k | Move selection down/up |
| d | Dismiss selected finding |
| f | Fix selected finding |

WHM Plugin

CSM installs a WHM plugin (addon_csm.cgi) that redirects operators from WHM to the daemon Web UI. After the redirect, API calls are same-origin requests to the daemon.

API Reference

Machine-readable HTTPS API. All endpoints require token authentication. State-changing POST, PUT, PATCH, and DELETE requests require CSRF protection for browser cookie sessions.

Authentication

# Bearer token (header)
curl -H "Authorization: Bearer YOUR_TOKEN" https://server:9443/api/v1/status

# Cookie-based (after login)
curl -b "csm_auth=YOUR_TOKEN" https://server:9443/api/v1/status

Cookie-authenticated state-changing requests require the X-CSRF-Token header (obtained from the login response or page meta tag). Admin-scope Bearer requests are CSRF-exempt because the Authorization header is the write credential.
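
For scripted clients, the Bearer form is the simpler of the two. The sketch below builds an authenticated request with the Python standard library; the `api_request` helper is hypothetical, and the host and port are the placeholders from the curl examples.

```python
from urllib.request import Request


def api_request(base_url: str, path: str, token: str, method: str = "GET", body: bytes = None) -> Request:
    """Build a request that authenticates with a Bearer token.

    Bearer requests carry the credential in the Authorization header,
    so no X-CSRF-Token header is added.
    """
    headers = {"Authorization": f"Bearer {token}", "Accept": "application/json"}
    if body is not None:
        headers["Content-Type"] = "application/json"
    return Request(base_url.rstrip("/") + path, data=body, headers=headers, method=method)
```

Sending it is a matter of `urllib.request.urlopen(req, context=ctx)`. Because the Web UI can run on a self-signed certificate, `ctx` would typically be an `ssl.SSLContext` that trusts the daemon's certificate rather than one with verification disabled.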

Token scopes

Configure tokens under webui.tokens: with a scope of admin or read:

webui:
  tokens:
    - name: "operator"
      token: "..."
      scope: admin       # full read+write
    - name: "panel-readonly"
      token: "..."
      scope: read        # status, findings, history, stats, blocked IPs, health, components, capabilities, SSE

The legacy single-token webui.auth_token: is migrated automatically to a legacy-auth-token admin entry on first start. Read-scope tokens are intended for orchestrators and dashboards that consume status, findings, history, stats, blocked-IP summaries, health, components, capabilities, and SSE events. Admin scope is still required for write routes and for sensitive reads such as quarantine, settings, firewall internals, threat-intel detail, rules, ModSecurity, account detail, exports, incident timelines, and audit history. metrics_token: is a separate, read-only credential for /metrics only.

Status & Data

GET  /api/v1/status              Full health snapshot: version, uptime, watchers, severity counts,
                                 store health, blocklist size, capabilities[], config_hash, binary_hash,
                                 automation rollout state, challenge pending count, rollback state.
                                 `latest_scan` is the canonical last-scan timestamp; `last_scan_time`
                                 is a legacy alias kept for older clients and will be removed.
GET  /api/v1/capabilities        Static feature list (e.g. `confd.dropins.v1`, `events.sse.v1`,
                                 `webhook.phpanel.v1`, `webui.prefs.v1`, `webui.undo.v1`,
                                 `mail.queue.composition.v1`). Use for orchestrator feature-detect.
GET  /api/v1/components          Watcher/component matrix with attachment, event, and upstream freshness state.
GET  /api/v1/events              Server-Sent Events stream of findings as they dispatch.
                                 Read-scope token sufficient. One JSON event per `data:` line.
GET  /api/v1/health              Daemon health (fanotify, watchers, engines)
GET  /api/v1/findings            Current active findings
GET  /api/v1/findings/enriched   Enriched findings with GeoIP, accounts, fix info
GET  /api/v1/finding-detail      Finding detail with action history (?check=&message=)
GET  /api/v1/history             Paginated history (?limit=&offset=&from=&to=&severity=&search=)
GET  /api/v1/history/csv         CSV export (up to 5,000 entries)
GET  /api/v1/stats               24h severity counts, accounts at risk, auto-response summary
GET  /api/v1/stats/trend         30-day daily severity counts
GET  /api/v1/stats/timeline      Event timeline
GET  /api/v1/quarantine          Quarantined files with metadata (incl. htaccess pre_clean backups)
GET  /api/v1/quarantine-preview  Preview quarantined file content (?id=)
GET  /api/v1/db-object-backups   db_object_backups bucket (MySQL trigger/event/procedure/function drops)
GET  /api/v1/db-object-backup-preview Preview captured CREATE SQL (?key=)
GET  /api/v1/blocked-ips         Blocked IPs with reason and expiry
GET  /api/v1/accounts            cPanel account list
GET  /api/v1/account             Per-account findings, quarantine, history (?name=)
GET  /api/v1/audit               UI audit log
GET  /api/v1/export              Export state (suppressions, whitelist)
GET  /api/v1/incident            Incident timeline (?ip=&account=&hours=)
GET  /api/v1/performance         Performance metrics snapshot
POST /api/v1/perf/fix-error-log  Truncate a fixed-row error_log finding
POST /api/v1/perf/fix-display-errors
                                  Disable display_errors for a fixed-row config finding
GET  /api/v1/hardening           Last stored hardening audit report
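
A consumer of the /api/v1/events stream listed above only needs to pick out the `data:` lines and decode each as JSON. The parser below is a minimal sketch; the finding field names used in any example payloads are hypothetical, since the event schema is not shown here.

```python
import json


def iter_sse_events(lines):
    """Yield one decoded JSON object per `data:` line of an SSE stream.

    `lines` is any iterable of str or bytes lines, such as an HTTP
    response object opened on /api/v1/events.
    """
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # comments, blank separators, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload:
            yield json.loads(payload)
```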

GeoIP

GET  /api/v1/geoip               IP geolocation (?ip=&detail=1)
POST /api/v1/geoip/batch         Batch GeoIP lookup (JSON array of IPs)

Threat Intelligence

GET  /api/v1/threat/stats        Attack stats, type breakdown, hourly trend
GET  /api/v1/threat/top-attackers Top attacking IPs with GeoIP (?limit=)
GET  /api/v1/threat/ip           IP threat lookup (?ip=)
GET  /api/v1/threat/events       IP event history (?ip=&limit=)
GET  /api/v1/threat/whitelist    Whitelisted IPs
GET  /api/v1/threat/db-stats     Attack database statistics
POST /api/v1/threat/block-ip     Block IP permanently
POST /api/v1/threat/whitelist-ip       Permanent whitelist
POST /api/v1/threat/temp-whitelist-ip  Temporary whitelist (with expiry)
POST /api/v1/threat/clear-ip           Clear IP from attack database
POST /api/v1/threat/unwhitelist-ip     Remove from whitelist
POST /api/v1/threat/bulk-action        Bulk block/clear/whitelist across many IPs

Firewall

GET  /api/v1/firewall/status         Config, blocked/allowed counts
GET  /api/v1/firewall/allowed        Whitelisted IPs
GET  /api/v1/firewall/subnets        Blocked subnets
GET  /api/v1/firewall/audit          Firewall audit log
GET  /api/v1/firewall/check          Check if IP is blocked (?ip=)
POST /api/v1/block-ip                Block an IP
POST /api/v1/unblock-ip              Unblock an IP
POST /api/v1/unblock-bulk            Bulk unblock IPs
POST /api/v1/firewall/allow-ip       Allow an IP
POST /api/v1/firewall/remove-allow   Remove IP from allow list
POST /api/v1/firewall/deny-subnet    Block subnet
POST /api/v1/firewall/remove-subnet  Remove subnet block
POST /api/v1/firewall/flush          Clear all blocks
POST /api/v1/firewall/unban          Unblock IP + flush cphulk
POST /api/v1/firewall/cphulk-clear   Flush cphulk bans only

ModSecurity

GET  /api/v1/incidents/groups          Roll up open/contained incidents by (kind, source) so credential spray collapses into one row per attacker IP. Read scope. Accepts ?status=active|all|open|contained|resolved|dismissed, ?kind=, ?limit=.

GET  /api/v1/modsec/stats              WAF statistics (read scope)
GET  /api/v1/modsec/blocks             Blocked requests log, aggregated per IP (read scope)
GET  /api/v1/modsec/events             WAF event details (read scope)
GET  /api/v1/modsec/rules              Loaded rules list
POST /api/v1/modsec/rules/apply        Apply custom rules
POST /api/v1/modsec/rules/escalation   Change rule severity/action

Rules & Suppressions

GET  /api/v1/rules/status        YAML/YARA rule counts, version
GET  /api/v1/rules/list          Rule files
GET  /api/v1/suppressions        Suppression rules
POST /api/v1/rules/reload        Reload signature rules from disk
POST /api/v1/suppressions        Add or delete suppression rule
POST /api/v1/rules/modsec-escalation   ModSec escalation override

Email

GET  /api/v1/email/stats         Email scanning statistics
GET  /api/v1/email/forwarders    Mail forwarder inventory with destination providers and local-copy flags (read scope)
GET  /api/v1/email/deferrals     Outbound deferral rollup by provider and sending IP with reason codes, parsed from exim_mainlog (read scope)
GET  /api/v1/email/queue-composition  Mail queue makeup: real vs null-sender bounce backscatter, frozen count, oldest age, top stuck recipients (read scope)
POST /api/v1/email/queue/flush-backscatter  Remove only frozen null-sender (backscatter) messages from the exim queue on cPanel hosts; returns removed count or 503 when unavailable (admin scope, CSRF)
GET  /api/v1/email/held          Forward copies held by the forward guard (admin scope)
POST /api/v1/email/held/{id}/release   Re-inject a held forward copy to its external recipient (admin scope, CSRF)
DELETE /api/v1/email/held/{id}   Discard a held forward copy (admin scope, CSRF)
GET  /api/v1/email/groups        Server-grouped action rows (kind=compromised_account|spam_outbreak|auth_failure|queue_alert|malware) with from/to/limit (read scope)
GET  /api/v1/email/relay-abuse   Outbound PHP-mail abuse detections (spam outbreaks, high-volume scripts/accounts) with per-site script breakdown; from/to/limit (read scope)
GET  /api/v1/email/quarantine    Quarantined email list
GET  /api/v1/email/av/status     Email AV watcher status
POST /api/v1/email/quarantine/   Release or delete quarantined email

Hardening

GET  /api/v1/hardening           Load last hardening audit report
POST /api/v1/hardening/run       Run hardening audit and save report

Actions

POST /api/v1/fix                      Apply fix for a finding
POST /api/v1/fix-bulk                 Bulk fix multiple findings
POST /api/v1/dismiss                  Dismiss a finding
POST /api/v1/scan-account             On-demand account scan
POST /api/v1/quarantine-restore       Restore quarantined file
POST /api/v1/quarantine/bulk-delete   Bulk-delete quarantined files
POST /api/v1/db-object-backup-restore Restore a dropped MySQL object from its db_object_backups record
POST /api/v1/test-alert               Send test alert through all channels
POST /api/v1/import                   Import state bundle (suppressions, whitelist)

Settings

GET  /api/v1/settings             List editable config sections
GET  /api/v1/settings/<section>   Read a config section (secrets redacted)
POST /api/v1/settings/<section>   Update a config section (safe fields reload, restart fields queue)
POST /api/v1/settings/restart     Request a daemon restart (after editing restart-required fields)
POST /api/v1/settings/firewall/tentative-apply  Save firewall config, restart, and arm rollback timer
GET  /api/v1/settings/firewall/rollback         Read pending rollback state
POST /api/v1/settings/firewall/confirm          Confirm tentative firewall changes
POST /api/v1/settings/firewall/revert           Revert tentative firewall changes now

Sections map to top-level config keys: alerts, auto_response, challenge, reputation, performance, infra_ips, sentry, etc. Writes persist to csm.yaml, re-sign the integrity hash, and hot-reload where possible; restart-required changes are queued for /api/v1/settings/restart. Invalid field values return 422 and do not touch disk. Firewall tentative apply is restart-class by design: it snapshots the previous config, writes the new one, restarts the daemon, and auto-reverts unless the operator confirms before the timer expires.

Operator preferences

Per-operator state (UI density, timestamp display, default auto-refresh, saved filter views) is keyed server-side by SHA-256 of the auth token, so preferences follow the operator across browsers and devices without the daemon ever storing the raw credential. Capability flag: webui.prefs.v1. These endpoints require admin scope because they read or mutate operator-private UI state.

GET    /api/v1/prefs/user        Read this operator's UI preferences
PUT    /api/v1/prefs/user        Replace the prefs blob (CSRF on cookie sessions)
GET    /api/v1/prefs/views       List saved views; `?page=findings` filters by page
PUT    /api/v1/prefs/views       Upsert one view {page, name, params} (CSRF on cookie sessions)
DELETE /api/v1/prefs/views       Delete one view {page, name} (CSRF on cookie sessions)

Response shape for GET /api/v1/prefs/user:

{
  "density":       "comfortable",
  "timezone":      "local",
  "auto_refresh":  "on",
  "table_columns": { "findings-table": ["check","severity","when"] }
}

density is comfortable or compact. timezone is server, local, or an IANA-shaped zone string (e.g. Europe/Bucharest). auto_refresh is on or off. Server-side sanitisation drops any other value. Unset prefs encode as empty strings; the UI applies comfortable, local, and on defaults.

Response shape for GET /api/v1/prefs/views:

[
  {
    "name": "Critical SSH",
    "page": "findings",
    "params": { "severity": "critical", "check": "smtp_bruteforce" },
    "updated": 1779743255
  }
]

Saved views are operator-scoped and capped at 200 per operator. The whole saved-view collection is stored as a single preference blob of at most 64 KiB. page and params keys must be simple identifiers: ASCII letters, digits, underscore, hyphen, or dot, up to 64 bytes. Each view has at most 32 params, and param string values are capped at 256 bytes. name must be 1-80 bytes with no control characters. PUT and DELETE return {"status":"ok"} on success.
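A client can apply the per-view limits above before sending a PUT. The following Go sketch is a hypothetical client-side validator, not the daemon's sanitiser: the savedView type, the validateView name, and the assumption that every param value is a string are all illustrative. The 200-view and blob-size caps apply to the whole collection and are not checked here.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"unicode"
)

// identRe matches a simple identifier: ASCII letters, digits, underscore,
// hyphen, or dot, 1 to 64 bytes. The 1-byte minimum is an assumption.
var identRe = regexp.MustCompile(`^[A-Za-z0-9_.-]{1,64}$`)

// savedView is a hypothetical client-side mirror of {page, name, params}.
type savedView struct {
	Page   string
	Name   string
	Params map[string]string
}

// validateView applies the documented per-view limits.
func validateView(v savedView) error {
	if !identRe.MatchString(v.Page) {
		return errors.New("page must be a simple identifier of at most 64 bytes")
	}
	if len(v.Name) < 1 || len(v.Name) > 80 {
		return errors.New("name must be 1-80 bytes")
	}
	for _, r := range v.Name {
		if unicode.IsControl(r) {
			return errors.New("name must not contain control characters")
		}
	}
	if len(v.Params) > 32 {
		return errors.New("a view has at most 32 params")
	}
	for k, val := range v.Params {
		if !identRe.MatchString(k) {
			return fmt.Errorf("param key %q is not a simple identifier", k)
		}
		if len(val) > 256 {
			return fmt.Errorf("param %q value exceeds 256 bytes", k)
		}
	}
	return nil
}

func main() {
	v := savedView{
		Page:   "findings",
		Name:   "Critical SSH",
		Params: map[string]string{"severity": "critical"},
	}
	fmt.Println(validateView(v) == nil)
}
```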

Bulk-action undo

Bulk threat block / whitelist and bulk firewall unblock responses return an undo_token when the daemon queues an inverse operation server-side for 30 seconds. The UI surfaces a banner with the same TTL; CLI callers can act on the token through the endpoints below. Each successful undo writes an undo_<original_action> audit entry. Capability flag: webui.undo.v1. These endpoints require admin scope because they read or mutate operator-private action state.

GET  /api/v1/undo/pending    Latest pending undo entry for this operator (empty object if none)
POST /api/v1/undo/run        Consume an entry and dispatch its inverse {id}; empty id uses latest

Non-empty response shape for GET /api/v1/undo/pending:

{
  "id": "188d1f2a6c8b0000",
  "action": "threat_bulk_block",
  "inverse": "threat_bulk_unblock",
  "summary": "Blocked 2 IPs",
  "recorded_at": "2026-05-26T00:07:09Z",
  "expires_at": "2026-05-26T00:07:39Z"
}

POST /api/v1/undo/run returns {status, action, inverse, count} on success, or 410 Gone when the entry is missing, already consumed, or past its 30-second TTL. Recognised inverse action keys are threat_bulk_unblock, threat_bulk_block, threat_bulk_unwhitelist, threat_bulk_whitelist, and firewall_bulk_reblock. Other bulk actions (quarantine delete, generic fix) do not surface an undo token because they have no clean inverse.
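The consume-once, 30-second semantics can be modelled with a small in-memory store. The sketch below is an assumption-laden illustration (type and function names are hypothetical, there is no locking, and the daemon's real storage is not shown); errGone stands for the cases the API reports as 410 Gone.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// undoTTL mirrors the documented 30-second window.
const undoTTL = 30 * time.Second

// errGone covers the cases the API reports as 410 Gone.
var errGone = errors.New("undo entry missing, already consumed, or expired")

// exampleStart is a fixed instant used only by the demo below.
var exampleStart = time.Date(2026, 5, 26, 0, 7, 9, 0, time.UTC)

// undoEntry is a hypothetical in-memory shape for one pending undo.
type undoEntry struct {
	ID         string
	Inverse    string
	RecordedAt time.Time
}

// undoStore keeps at most one pending entry per operator (not thread-safe).
type undoStore struct {
	pending map[string]undoEntry
}

func newUndoStore() *undoStore {
	return &undoStore{pending: map[string]undoEntry{}}
}

// record replaces the operator's latest pending entry.
func (s *undoStore) record(operator string, e undoEntry) {
	s.pending[operator] = e
}

// consume removes and returns the entry. An empty id selects the latest one.
func (s *undoStore) consume(operator, id string, now time.Time) (undoEntry, error) {
	e, ok := s.pending[operator]
	if !ok || (id != "" && id != e.ID) {
		return undoEntry{}, errGone
	}
	delete(s.pending, operator) // an entry can be consumed only once
	if now.Sub(e.RecordedAt) > undoTTL {
		return undoEntry{}, errGone
	}
	return e, nil
}

func main() {
	s := newUndoStore()
	s.record("op", undoEntry{ID: "188d1f2a6c8b0000", Inverse: "threat_bulk_unblock", RecordedAt: exampleStart})

	_, err := s.consume("op", "", exampleStart.Add(10*time.Second))
	fmt.Println("first run error:", err)
	_, err = s.consume("op", "", exampleStart.Add(11*time.Second))
	fmt.Println("second run error:", err)
}
```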

Finding fields

Every finding in /api/v1/findings, /api/v1/events, and the JSONL audit log carries optional correlation fields when CSM can attribute them:

| Field | Meaning |
|---|---|
| tenant_id | Tenant attribution from the verdict callback or panel-side webhook reply |
| domain | Domain associated with the event (e.g. PHP-relay scriptKey host, mailbox domain) |
| mailbox | Mailbox attribution (e.g. mail brute-force target, PHP-relay envelope-from) |
| relay_total | PHP-relay trigger count for the path that fired |
| relay_breakdown | PHP-relay script samples that contributed to the alert, with script key, hit count, last seen time, and a bounded sample subject when available |

Fields are omitted when the daemon could not attribute them. Orchestrators should treat absence as “unknown,” not “global.”
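One way for an orchestrator to keep "absent" distinct from "empty" is to decode the optional fields into pointers. The Go sketch below assumes a reduced, hypothetical finding struct (only check plus three of the correlation fields) and a made-up tenantScope helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// finding is a reduced, hypothetical view of a CSM finding. Pointer fields
// stay nil when the key is absent from the JSON.
type finding struct {
	Check    string  `json:"check"`
	TenantID *string `json:"tenant_id,omitempty"`
	Domain   *string `json:"domain,omitempty"`
	Mailbox  *string `json:"mailbox,omitempty"`
}

// tenantScope reports "unknown" when tenant_id is absent, never "global".
func tenantScope(raw []byte) (string, error) {
	var f finding
	if err := json.Unmarshal(raw, &f); err != nil {
		return "", err
	}
	if f.TenantID == nil {
		return "unknown", nil
	}
	return "tenant:" + *f.TenantID, nil
}

func main() {
	for _, raw := range []string{
		`{"check":"webshell_realtime"}`,
		`{"check":"webshell_realtime","tenant_id":"t-42"}`,
	} {
		scope, err := tenantScope([]byte(raw))
		fmt.Println(scope, err)
	}
}
```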

Cleanup fields

GET /api/v1/quarantine also powers the Cleanup page’s file-backup list. Entries include:

| Field | Meaning |
|---|---|
| kind | quarantine or pre_clean |
| live_state | original_missing, live_differs, original_not_file, archive_missing, archive_not_file, or unknown. Byte-identical restored entries are hidden. |

GET /api/v1/db-object-backups returns restored and restored_at when a captured MySQL trigger/event/procedure/function backup has already been replayed.

Incidents

GET /api/v1/incidents

Returns every incident (open, contained, resolved, dismissed) sorted by updated_at descending.

GET /api/v1/incidents/<id>

Returns one incident by id. 404 if not found.

POST /api/v1/incidents/<id>/status

Body:

{"status": "resolved", "details": "operator-marked"}

Status values: open, contained, resolved, dismissed. Closing an incident (resolved/dismissed) means future findings for the same correlation key start a fresh incident. Reopening an incident binds the same key again. Incident JSON includes correlation_key when CSM has a stored account, mailbox, domain, process, or remote-IP key.

Metrics (Prometheus)

CSM exposes a /metrics endpoint on its HTTPS web UI port (default 9443). The endpoint serves the Prometheus text exposition format (Content-Type: text/plain; version=0.0.4) and is safe to scrape every 15 seconds.

“Available metrics” below is the shipped set. New call sites are instrumented in ongoing releases; check CHANGELOG.md under ## [Unreleased] for the latest additions.

Enabling

Metrics are on whenever webui.enabled: true is set in csm.yaml. The endpoint has its own auth knob:

webui:
  enabled: true
  auth_token: "<UI login token>"
  metrics_token: "<long random string for Prometheus scraper>"

metrics_token is optional. When set, a Bearer header containing this exact value unlocks /metrics. The UI auth_token or a valid UI session cookie is also accepted so the dashboard can self-scrape, but keeping the two tokens separate is recommended: rotating auth_token does not then break Prometheus scraping, and giving your monitoring stack the scrape token does not also give it UI access.
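The acceptance rule for Bearer headers can be sketched as follows. This is not CSM's handler: the function names are hypothetical, the session-cookie path is left out, and the constant-time comparison is a common hardening choice assumed here rather than something the text confirms.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"strings"
)

// bearerToken extracts the credential from an Authorization header value.
func bearerToken(header string) (string, bool) {
	const prefix = "Bearer "
	if !strings.HasPrefix(header, prefix) {
		return "", false
	}
	tok := strings.TrimPrefix(header, prefix)
	return tok, tok != ""
}

// tokenEqual compares two tokens without leaking where they first differ.
func tokenEqual(a, b string) bool {
	return subtle.ConstantTimeCompare([]byte(a), []byte(b)) == 1
}

// metricsAuthorized accepts the scrape token when one is configured, and
// otherwise falls back to the UI token. Session cookies are not modelled.
func metricsAuthorized(header, metricsToken, uiToken string) bool {
	tok, ok := bearerToken(header)
	if !ok {
		return false
	}
	if metricsToken != "" && tokenEqual(tok, metricsToken) {
		return true
	}
	return uiToken != "" && tokenEqual(tok, uiToken)
}

func main() {
	fmt.Println(metricsAuthorized("Bearer scrape-secret", "scrape-secret", "ui-secret"))
	fmt.Println(metricsAuthorized("Bearer wrong", "scrape-secret", "ui-secret"))
}
```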

Prometheus scrape config

scrape_configs:
  - job_name: csm
    scheme: https
    tls_config:
      # CSM serves a self-signed cert by default; either skip
      # verification here or pin the CA you chose.
      insecure_skip_verify: true
    authorization:
      type: Bearer
      credentials: "<metrics_token from csm.yaml>"
    static_configs:
      - targets:
          - csm-host-1.example.internal:9443
          - csm-host-2.example.internal:9443

A complete, validated version of this snippet (with global: block) ships as docs/src/examples/prometheus-scrape.yml. The CI pipeline runs promtool check config against that file in the promtool-check job; if the example ever stops validating, the pipeline fails.

Quick check

curl -sk -H "Authorization: Bearer $METRICS_TOKEN" \
    https://localhost:9443/metrics | head

Available metrics

Build / process

  • csm_build_info{version} (gauge, always 1): build metadata. Scrape once to discover the running version. Join on it in queries via group_left(version).

YARA-X worker (default-on; off only if signatures.yara_worker_enabled: false)

  • csm_yara_worker_restarts_total (counter): cumulative number of times the supervisor has restarted the csm yara-worker child. Alert on sustained growth: a single restart is routine (rule deploys), a steady climb means the worker is crash-looping and real-time YARA scans are degraded.

Findings

  • csm_findings_total{severity} (counter): every finding CSM records is counted here. Severities are CRITICAL, HIGH, and WARNING (matching the alert.Severity enum). Use rate(...) for arrival velocity; watch for sudden CRITICAL spikes.

Alert delivery

  • csm_alert_dispatch_failures_total (counter): alert channel sends that failed after CSM detected findings. Counts email, webhook, and phpanel delivery failures. Sustained growth means findings are not reaching operators; check SMTP, webhook reachability, and credentials.

State

  • csm_store_size_bytes (gauge): on-disk size of the bbolt state database (/var/lib/csm/state/csm.db by default). Enable the retention: block to bound logical growth and run csm store compact during maintenance to reclaim freelisted pages; without either, this gauge only climbs.

Fanotify realtime monitor

  • csm_fanotify_queue_depth (gauge): current number of queued events waiting for the analyzer pool. The queue capacity is 4000; sustained values near that cap mean drops are imminent. Alert target: max_over_time(csm_fanotify_queue_depth[5m]) > 3500.
  • csm_fanotify_events_dropped_total (counter): cumulative events dropped because the analyzer queue was full. The reconcile pass still rescans drop-affected directories 60 s later, so dropped events do not disappear from detection – they arrive delayed. Alert target: rate(csm_fanotify_events_dropped_total[5m]) > 0 paired with a short for-clause.
  • csm_fanotify_reconcile_latency_seconds (histogram): how long the post-overflow reconcile pass takes to walk drop-affected directories and rescan recent files. Buckets: 0.01 s .. 60 s. Watch p95: reconcile stealing tens of seconds means bulk events are piling up faster than the walker can keep up.
  • csm_checks_domlog_discovery_dropped_total{reason} (counter): per-vhost access-log paths the WP brute-force domlog discovery helper dropped before scanning. Labels: reason is evalsymlinks_error (broken symlink, attacker-removed log file) or stat_error (file vanished between glob and stat, permission regression on the log directory). Steady growth means a real chunk of vhosts is being silently skipped each cycle. Stale-mtime drops are intentional filtering and are NOT counted here.
  • csm_realtime_content_scan_truncated_total{check} (counter): cumulative real-time content checks where the underlying file was larger than the main read window, so the full-rule pass saw only the leading window. The read cap protects RE2 cost on huge files; sustained growth on a label means full-rule coverage is capped on large files. Labels currently emitted: phpcontent_inline (known webshell filename), phpcontent_uploads (PHP in uploads), php_check (generic PHP content scan), crontab (per-user /var/spool/cron write), htaccess (per-vhost .htaccess write), user_ini (per-vhost .user.ini write), html_phishing (HTML phishing heuristic), and cgi_backdoor (CGI backdoor heuristic). Compare against finding history for webshell_content_realtime (or the matching check name for non-PHP labels) to judge whether a raised cap would surface real findings.

Periodic check runner

  • csm_checks_crontab_base64_truncated_total (counter): crontab base64 candidates that exceeded the per-blob decode cap before decoded-content pattern matching ran. Sustained growth means encoded cron content is larger than the scanner currently inspects; review affected crontabs and tune the scanner before redeploying.

  • csm_check_duration_seconds{name,tier} (histogram): wall-clock time each check takes to complete. Label name is one of the 62 checks (fake_kernel_threads, webshells, …); label tier is critical, deep, or all. Buckets: 0.01 s .. 900 s. Most checks keep the 300 s timeout ceiling; heavy filesystem checks can run up to 900 s. Useful aggregations:

    # p95 duration per check in the critical tier:
    histogram_quantile(0.95,
      sum by (le, name) (
        rate(csm_check_duration_seconds_bucket{tier="critical"}[10m])
      )
    )

    # seconds spent in deep-tier checks over the last hour (roughly one deep cycle):
    sum by (tier) (increase(csm_check_duration_seconds_sum{tier="deep"}[1h]))
    

Threat intelligence

Registered when reputation.upstream.enabled: true.

  • csm_threatintel_cache_hits_total (counter): upstream threat-intel lookups served from CSM’s local per-IP cache.
  • csm_threatintel_cache_misses_total (counter): upstream threat-intel lookups not served from the local cache. A miss may still fail open without an HTTP request when the breaker is open.
  • csm_threatintel_backend_failures_total (counter): upstream backend failures from network errors, non-200 responses, malformed JSON, response IP mismatches, or out-of-range scores.
  • csm_threatintel_breaker_open (gauge): 1 while the upstream circuit breaker is refusing calls, 0 when closed or allowing its single cooldown probe.

Firewall

  • csm_blocked_ips_total (gauge): number of IPs currently on the firewall block list. Excludes expired temp bans – the store’s LoadFirewallState filters those before the gauge reads.
  • csm_firewall_rules_total (gauge): total firewall rules across all four categories (blocked IPs, allowed IPs, blocked subnets, port-specific allows). Excludes expired temporary blocks and expired allow-list rows. Sudden drops are worth investigating; expected drops happen when temporary block or allow deadlines pass.

Config reloads

  • csm_config_reloads_total{result} (counter): SIGHUP reload attempts, by outcome. Labels: result is one of:
    • success – safe fields swapped in place, integrity hash re-signed, live config updated.
    • restart_required – one or more fields that need a full restart changed; live config unchanged.
    • error – YAML parse failure, validation failure, or re-sign failure; live config unchanged.
    • noop – file edit produced no semantic change (identical values, whitespace edit, etc.).
    Alert target: rate(csm_config_reloads_total{result="error"}[5m]) > 0 paired with a short for-clause.

Auto-response

  • csm_auto_response_actions_total{action} (counter): every auto-response action fired, by class. Labels: action is kill, quarantine, or block. Incremented once per finding the corresponding Auto* helper produces, so a batch blocking four IPs in one cycle adds 4 to action=block. Useful for detecting response storms: rate(csm_auto_response_actions_total[5m]).

Retention (when retention.enabled: true)

  • csm_retention_sweeps_total (counter): number of retention sweep cycles completed since daemon start. A flatline after a restart means the sweep goroutine is not scheduling; a healthy daemon increments this on every sweep_interval tick.
  • csm_retention_deleted_total (counter): cumulative entries deleted across the history, attacks:events, and reputation buckets. Spikes on the first sweep after enabling retention (initial backlog), then settles to the steady-state churn. Useful for estimating when the file might benefit from a csm store compact maintenance window.

PHP-relay (email abuse, cPanel only)

All series are prefixed csm_php_relay_. Registered when email_protection.php_relay.enabled: true and the host is cPanel; otherwise zero across the board. See Real-time detection.

  • csm_php_relay_findings_total{path} (counter): findings emitted per detection path. Labels: path is one of header, volume, volume_account, fanout (and later baseline, reputation for Stages 2-3). Use rate(...) to spot detection storms; a sudden rate jump on header typically means a contact-form vulnerability is being exploited, on volume_account typically means an account password was leaked.
  • csm_php_relay_actions_total{action,result} (counter): auto-freeze invocations attempted. Labels: action is currently freeze; result is ok or fail. Pair with csm_php_relay_findings_total to confirm freeze keeps up with detection.
  • csm_php_relay_action_gone_total (counter): messages already absent from the spool by the time exim -Mf ran. Normal queue churn; not a failure. Sustained growth means the spool is moving fast and the freezer is racing the queue runner.
  • csm_php_relay_path_skipped_total{path,reason} (counter): path evaluation that bailed before producing a finding. Labels: path matches the finding labels above; reason enumerates the gate that fired (e.g. ignore-list match, missing scriptKey).
  • csm_php_relay_spool_scan_fallbacks_total{reason} (counter): AutoFreeze fell back to a full spool walk to find msgIDs. Labels: reason is capped (the in-memory activeMsgs per script hit its cap, so a fresh disk walk was needed) or reputation (a late reputation finding arrived for a script with no live activeMsgs). Sustained growth on capped means a single script is firing faster than the in-memory window keeps state for; consider raising header_score_volume_min or adding an ignore.
  • csm_php_relay_active_msgs_capped_total (counter): per-script activeMsgs set hit its cap and dropped the oldest entry. Counts the eviction event itself; the next freeze for that script will land in csm_php_relay_spool_scan_fallbacks_total{reason="capped"}.
  • csm_php_relay_windows_active{kind} (gauge): retained per-script / per-IP / per-account window state. Labels: kind is script, ip, or account. Sized by Flow E sweep cadence (5 min for windows, 24 h retention for accounts); flat values across hours are normal.
  • csm_php_relay_msgid_index_size{layer} (gauge): msgID dedup index size by storage layer. Labels: layer is memory (in-process map) or bbolt (persisted batch writer). Memory ceiling is 200k entries; bbolt grows freely until the 25 h Flow E sweep prunes it.
  • csm_php_relay_msgindex_persist_dropped_total (counter): bbolt persist queue overflow drops (the 4096-deep buffered channel was full when the watcher tried to enqueue). Should be zero in steady state; a non-zero value means the bbolt writer is blocked on disk and the in-memory dedup is the only thing protecting against double-fire on a queue-runner re-write.
  • csm_php_relay_msgindex_persist_errors_total (counter): bbolt commit failures from the async batch writer. Each bump also emits a Critical email_php_relay_msgindex_persist_failed finding. Disk-full or permissions issue on /var/lib/csm/state/csm.db.
  • csm_php_relay_inotify_overflows_total (counter): kernel IN_Q_OVERFLOW events on the spool watcher. Each one triggers a bounded recovery scan (default cap 1000 files); if the cap fires, also emits email_php_relay_overflow_scan_truncated Critical. Sustained growth means the spool is churning faster than inotify can keep up — usually a backup restore or a real attack.
  • csm_php_relay_spool_read_errors_total (counter): emailspool.ParseHeaders errors on -H files the watcher tried to consume. Usually transient (file disappeared between inotify event and open) and self-correcting; sustained growth points at a permissions or filesystem problem.
  • csm_php_relay_userdata_errors_total (counter): cpanelUserDomains resolver errors reading /var/cpanel/userdata/. Used by the Path 1 From mismatch check; errors here mean Path 1 is potentially undercounting until the read recovers.

Signature retroactive rescans

  • csm_signature_rescans_total (counter): full deep-tier sweeps completed because a signature file’s mtime advanced. Steady-state zero on hosts that don’t auto-update rules; ticks once per update-rules invocation otherwise.

Counter reset semantics

Prometheus counters in CSM live in process memory. They reset to zero whenever the daemon restarts (config change, binary upgrade, crash recovery). This is the standard behaviour for every Prometheus-instrumented daemon; Prometheus’s scrape pipeline detects counter resets on its own and rate(), increase(), and irate() all handle them correctly.

Operators should not alert on “counter decreased across a scrape” as a failure condition. Alert on rate() or increase() of a counter over a window long enough to absorb expected restarts.
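To see why a restart does not corrupt rate-style queries, the reset handling can be sketched in Go. This is a simplified model of the idea, not Prometheus's implementation: it ignores timestamps and the extrapolation to window boundaries that the real increase() performs.

```go
package main

import "fmt"

// increase sums the growth of a counter across consecutive samples. A sample
// lower than its predecessor is treated as a reset to zero, so the whole new
// value counts as growth since the restart.
func increase(samples []float64) float64 {
	total := 0.0
	for i := 1; i < len(samples); i++ {
		prev, cur := samples[i-1], samples[i]
		if cur < prev {
			total += cur // counter restarted from zero
			continue
		}
		total += cur - prev
	}
	return total
}

func main() {
	// The daemon restarts between the second and third scrape.
	samples := []float64{10, 15, 3, 7}
	fmt.Println(increase(samples)) // prints 12
}
```

A naive last-minus-first calculation over the same samples gives -3, which is the "counter decreased" signal that should not be alerted on.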

Persisting counters across restarts would require writing to bbolt on every increment, which would not pay for itself. If a specific metric needs restart-stable behaviour later, a gauge-over-the-bbolt-counter pattern can be added for that one case without affecting the rest.

Caveats

  • Scrape the web UI’s HTTPS port, not a separate listener.
  • curl -k / insecure_skip_verify is appropriate only when the cert is self-signed and the network path is trusted. Pin a CA for anything else.
  • Prometheus label cardinality: per-account and per-IP labels are deliberately not exposed. Shared-hosting deployments with 1000+ cPanel users would otherwise overwhelm a Prometheus server.
  • Metric vectors cap label-value combinations at 1000 children per metric, including the overflow bucket. Once a vector reaches that cap, new combinations are aggregated under _overflow_.
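The overflow behaviour in the last bullet can be sketched as a capped counter vector. The Go code below is illustrative only: cappedVec and its methods are hypothetical names, a single string stands in for a label-value combination, and the sketch is not safe for concurrent use.

```go
package main

import "fmt"

// overflowLabel is the aggregate bucket named in the text above.
const overflowLabel = "_overflow_"

// cappedVec is a hypothetical counter vector that never holds more than
// maxChildren series, counting the overflow bucket as one of them.
type cappedVec struct {
	maxChildren int
	counts      map[string]float64
}

func newCappedVec(maxChildren int) *cappedVec {
	return &cappedVec{maxChildren: maxChildren, counts: map[string]float64{}}
}

// inc bumps the series for label, or the overflow bucket once the cap
// (minus the slot reserved for overflow) has been reached.
func (v *cappedVec) inc(label string) {
	if _, known := v.counts[label]; !known && label != overflowLabel {
		if len(v.counts) >= v.maxChildren-1 {
			label = overflowLabel
		}
	}
	v.counts[label]++
}

func main() {
	v := newCappedVec(3) // the documented cap is 1000; 3 keeps the demo short
	for _, l := range []string{"a", "b", "c", "d", "a"} {
		v.inc(l)
	}
	fmt.Println(len(v.counts), v.counts["a"], v.counts[overflowLabel])
}
```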

Not instrumented (yet)

  • Per-account labels on any metric. Deliberately off: shared-hosting deployments with 1000+ cPanel users would blow out Prometheus cardinality.
  • Fanotify inline auto-response actions (the quarantine-while-seeing-the-write path in fanotify.go). The periodic csm_auto_response_actions_total does not count those; a follow-up may split the metric or add a source label.
  • bbolt per-bucket size breakdown, csm_store_used_bytes, and csm_store_last_compact_ts. Deferred to the online-compaction follow-up of the retention work (see ROADMAP.md).

Audit Log (SIEM)

CSM ships every deduplicated finding to one or more SIEM-friendly sinks before the operator-alert rate limiter runs, so Splunk, Loki, Elastic, and friends always see the complete picture even when email and webhook traffic is throttled.

Two sink types ship today, both opt-in via csm.yaml. They can be enabled together or independently.

Schema

Every event, regardless of transport, has the same shape:

{
  "v": 1,
  "ts": "2026-04-28T10:32:14.512938Z",
  "finding_id": "8e3f1c204c1d8b95",
  "severity": "CRITICAL",
  "check": "webshell_realtime",
  "message": "PHP execution primitive in uploads/",
  "details": "...",
  "file_path": "/home/customer/public_html/uploads/x.php",
  "hostname": "host.example.com"
}

The v field is the schema version. CSM bumps it on incompatible changes and will not bump it for additive fields, so SIEM parsers can pin on v: 1 and ignore unknown keys.
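A consumer that pins on v: 1 and tolerates additive fields might look like the following Go sketch. The auditEvent struct lists only some of the documented keys, and parseEvents is a hypothetical helper; skipping (rather than rejecting) events with another version is a design choice made for this example.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// auditEvent mirrors a subset of the documented schema. encoding/json
// ignores keys that are not listed here, so additive fields are harmless.
type auditEvent struct {
	V         int    `json:"v"`
	TS        string `json:"ts"`
	FindingID string `json:"finding_id"`
	Severity  string `json:"severity"`
	Check     string `json:"check"`
	Message   string `json:"message"`
}

// parseEvents decodes JSONL input and keeps only schema version 1 events.
func parseEvents(jsonl string) ([]auditEvent, error) {
	var out []auditEvent
	sc := bufio.NewScanner(strings.NewReader(jsonl))
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // allow long detail fields
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		var ev auditEvent
		if err := json.Unmarshal([]byte(line), &ev); err != nil {
			return out, fmt.Errorf("decode audit event: %w", err)
		}
		if ev.V != 1 {
			continue // written by an incompatible schema version
		}
		out = append(out, ev)
	}
	return out, sc.Err()
}

func main() {
	input := `{"v":1,"check":"webshell_realtime","severity":"CRITICAL","new_field":true}` + "\n" +
		`{"v":2,"check":"future_shape"}` + "\n"
	events, err := parseEvents(input)
	fmt.Println(len(events), err)
}
```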

finding_id is a stable 16-hex-char hash of the canonical fields (timestamp, check, severity, message, file path). Two emits of the same finding produce the same ID, so downstream dedup works across re-runs.
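The idea of a content-derived, 16-hex-character ID can be illustrated in Go. Everything specific here is an assumption: the text does not name the hash function, the field encoding, or the separator, so SHA-256 truncated to 8 bytes with NUL-separated fields is only one plausible construction, and findingID is a hypothetical name.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// findingID hashes the canonical fields in a fixed order and keeps the first
// 8 bytes (16 hex characters). A NUL byte after each field prevents
// ("ab","c") and ("a","bc") from hashing to the same input.
func findingID(ts, check, severity, message, filePath string) string {
	h := sha256.New()
	for _, field := range []string{ts, check, severity, message, filePath} {
		h.Write([]byte(field))
		h.Write([]byte{0})
	}
	return hex.EncodeToString(h.Sum(nil)[:8])
}

func main() {
	id := findingID(
		"2026-04-28T10:32:14.512938Z",
		"webshell_realtime",
		"CRITICAL",
		"PHP execution primitive in uploads/",
		"/home/customer/public_html/uploads/x.php",
	)
	fmt.Println(id, len(id))
}
```

The output of this sketch will not match IDs produced by CSM; it only shows why two emits of the same finding yield the same identifier.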

Process context

Exec and outbound-connection findings on BPF-backed hosts carry an optional process object with PID, PPID, UID, user, cPanel account (when known), comm, exe, sanitized cmdline, and a parent chain. The field is omitted when no context is available, so existing parsers that ignore unknown keys see no schema change.

{
  "severity": "HIGH",
  "check": "outbound_connection",
  "message": "Suspicious outbound connection",
  "process": {
    "pid": 4242,
    "ppid": 4200,
    "uid": 1001,
    "user": "alice",
    "account": "alice",
    "comm": "ncat",
    "exe": "/usr/bin/ncat",
    "cmdline": ["ncat", "203.0.113.10", "587"],
    "parent": {
      "pid": 4200,
      "ppid": 4100,
      "uid": 1001,
      "comm": "sh"
    }
  },
  "timestamp": "2026-05-07T12:34:56Z"
}

The parent chain may be truncated at depth 5 and may stop early if an intermediate parent has been evicted from the cache.

File sink (JSONL)

alerts:
  audit_log:
    file:
      enabled: true
      path: /var/log/csm/audit.jsonl    # default

The default path is created with mode 0640 and the parent dir with 0750. The packaged logrotate fragment uses copytruncate mode so the daemon’s open file descriptor stays valid across rotation – no SIGHUP needed.

Tail it for an interactive view:

tail -F /var/log/csm/audit.jsonl | jq -c

Or hand it to a log shipper such as Vector, Filebeat, or Fluent Bit.

Syslog sink (RFC 5424)

alerts:
  audit_log:
    syslog:
      enabled: true
      network: udp                  # udp | tcp | unix | unixgram | tls
      address: 127.0.0.1:514        # host:port for udp/tcp/tls, path for unix*
      facility: local0              # default
      tls_ca: ""                    # optional PEM file for tls transport

The wire format is RFC 5424 with the JSON event embedded as the MSG body, so receivers that already understand the JSONL schema parse it the same way regardless of transport. UDP and unix-datagram emit one datagram per message; TCP, TLS, and unix-stream use LF framing.

Severity mapping onto the standard syslog level set:

| CSM severity | Syslog level | Numeric |
|---|---|---|
| CRITICAL | crit | 2 |
| HIGH | err | 3 |
| WARNING | warning | 4 |

Tested against rsyslog and syslog-ng receivers in integration.
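The severity mapping combines with the facility into the RFC 5424 PRI value (facility × 8 + level; local0 is facility 16). The Go sketch below shows that arithmetic and a minimal header layout. The APP-NAME "csm" and the nil ("-") PROCID, MSGID, and structured-data fields are assumptions for illustration; the text does not specify what CSM puts there.

```go
package main

import "fmt"

// facilityLocal0 is the numeric code of the local0 facility.
const facilityLocal0 = 16

// syslogLevel maps a CSM severity onto the syslog level from the table.
func syslogLevel(severity string) (int, bool) {
	switch severity {
	case "CRITICAL":
		return 2, true
	case "HIGH":
		return 3, true
	case "WARNING":
		return 4, true
	}
	return 0, false
}

// pri computes the RFC 5424 PRI value.
func pri(facility, level int) int {
	return facility*8 + level
}

// frame builds one RFC 5424 line:
// <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA MSG
func frame(facility int, severity, ts, host, jsonBody string) (string, error) {
	level, ok := syslogLevel(severity)
	if !ok {
		return "", fmt.Errorf("unknown severity %q", severity)
	}
	return fmt.Sprintf("<%d>1 %s %s csm - - - %s", pri(facility, level), ts, host, jsonBody), nil
}

func main() {
	line, err := frame(facilityLocal0, "CRITICAL",
		"2026-04-28T10:32:14.512938Z", "host.example.com", `{"v":1}`)
	fmt.Println(line, err)
}
```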

Backfill

When you first turn on the audit log, the SIEM has no history. Use csm export --since <when> to dump prior findings in the same JSONL schema:

csm export --since 24h > recent.jsonl
csm export --since 2026-04-01T00:00:00Z > q2.jsonl

<when> is either an RFC 3339 timestamp or a duration relative to now (24h, 7d). The output is one JSON event per line on stdout, identical in shape to what the live sinks emit, so you can pipe it straight into the same ingest pipeline.

csm export requires a running daemon.
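The two accepted forms of <when> can be parsed along these lines. This Go sketch is not the csm export implementation: parseSince is a hypothetical name, and handling the d suffix separately is needed because Go's time.ParseDuration does not accept days.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// exampleNow is a fixed "now" so the demo output is reproducible.
var exampleNow = time.Date(2026, 4, 28, 12, 0, 0, 0, time.UTC)

// parseSince turns an RFC 3339 timestamp or a relative duration such as
// "24h" or "7d" into an absolute cutoff time.
func parseSince(s string, now time.Time) (time.Time, error) {
	if t, err := time.Parse(time.RFC3339, s); err == nil {
		return t, nil
	}
	if strings.HasSuffix(s, "d") {
		days, err := strconv.Atoi(strings.TrimSuffix(s, "d"))
		if err != nil || days < 0 {
			return time.Time{}, fmt.Errorf("invalid day count in %q", s)
		}
		return now.Add(-time.Duration(days) * 24 * time.Hour), nil
	}
	d, err := time.ParseDuration(s)
	if err != nil || d < 0 {
		return time.Time{}, fmt.Errorf("invalid --since value %q", s)
	}
	return now.Add(-d), nil
}

func main() {
	for _, in := range []string{"24h", "7d", "2026-04-01T00:00:00Z"} {
		cutoff, err := parseSince(in, exampleNow)
		fmt.Println(in, cutoff.Format(time.RFC3339), err)
	}
}
```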

What gets logged

Every finding the alert pipeline produces, after deduplication but before:

  • the per-account rate limiter (so audit signal is not lost when email and webhook are throttled);
  • the “blocked IP suppression” filter (so SIEM correlation sees events that operators were spared);
  • the per-sink disabled-checks list (audit log is not subject to email’s disabled_checks).

This means audit-log volume is generally higher than the email or webhook stream. Plan SIEM retention accordingly.

What does not get logged

The audit log is not a replacement for csm.history (the bbolt history bucket). Only findings that pass through alert.Dispatch() are emitted. Internal state changes – daemon startup, reload events, config changes – live in journald via csm.service and are not mirrored here.

Building & Testing

Build

# Standard build (no YARA-X)
go build ./cmd/csm/

# Build with YARA-X support (requires libyara_x_capi)
CGO_LDFLAGS="$(pkg-config --libs --static yara_x_capi)" go build -tags yara ./cmd/csm/

Test

go test ./... -count=1           # all tests
go test -race -short ./...       # CI mode (race detector, skip slow tests)

Fuzz

CSM has a dozen parsers that read attacker-controlled input: Exim mainlog lines, Dovecot maillog lines, Apache Combined Log Format, /proc/net/tcp rows, wp-config.php bodies, /etc/shadow, auditd comm fields, and finding messages coming back from the WebUI.

Each parser has a Go fuzz target (files named fuzz_parsers_test.go under internal/checks/ and internal/daemon/). Fuzz targets do two things:

  1. Their seed corpus runs as part of the normal test suite. go test ./... executes every seed, so a known-bad input stays a regression test forever.
  2. The actual fuzzer runs with -fuzz=FuzzFoo.

Run a target for a fixed time while investigating:

go test ./internal/checks/... -run=^$ -fuzz=^FuzzExtractPHPDefine$ -fuzztime=30s

Run only the seeds:

go test -run=Fuzz ./internal/checks/... ./internal/daemon/...

If the fuzzer finds a crasher it writes the failing input to testdata/fuzz/FuzzFoo/<hash>. Commit that file alongside the fix and the input becomes a permanent seed.

Adding a fuzz target:

func FuzzMyParser(f *testing.F) {
    // Seeds: real-world valid shape, empty, malformed.
    f.Add("valid input")
    f.Add("")
    f.Add("corrupt/truncated")

    f.Fuzz(func(t *testing.T, s string) {
        _ = myParser(s)   // must not panic on any input
    })
}

Keep the target tight: call one function, assert it returns. Output verification belongs in a regular test.

Lint

make lint                        # must pass before push
gofmt -l .                       # must produce no output

make lint uses repo-local cache directories under .cache/ so the command behaves consistently in local shells, sandboxes, and CI runners.

Linter config in .golangci.yml: errcheck, govet, staticcheck, unused, ineffassign, gocritic, misspell, bodyclose, nilerr.

CI/CD

GitLab CI (.gitlab-ci.yml) is the internal build pipeline. It runs lint/test/package jobs, publishes internal packages, mirrors to GitHub, and creates the public GitHub release artifacts.

| Stage | What it does |
|---|---|
| lint | golangci-lint, gofmt, gosec (blocking), govulncheck |
| test | go test -v -race -timeout=300s -covermode=atomic -coverprofile -coverpkg=./internal/... ./... |
| build-image | Build CSM builder Docker image with YARA-X (manual trigger) |
| build | Two architectures: amd64 with YARA-X CGO, arm64 pure Go |
| integration | Spin up AlmaLinux + Ubuntu cloud servers via phctl, install CSM from the public mirror, run the integration test binary on both hosts, collect coverage. Only runs on main |
| package | RPM + DEB via nFPM |
| sign | Detached signatures on release artifacts |
| publish | Internal GitLab Generic Package Registry (versioned + latest) |
| repo | Publish RPM/DEB to the public mirrors.pidginhost.com apt/dnf repos |
| pages | Docs + coverage HTML (GitLab Pages preview) |
| cleanup | Remove old package versions |
| release | GitLab release on tags matching v* |
| github | Mirror to GitHub + upload release artifacts (auto on tag push) |

Public Releases

To cut a release:

  1. Move the [Unreleased] heading in CHANGELOG.md to the new version (e.g. [2.4.2] - YYYY-MM-DD), commit as release: cut X.Y.Z.
  2. Tag and push:
    git tag vX.Y.Z
    git push origin main vX.Y.Z
    
  3. Wait. The tag pipeline runs integration, publishes packages to the mirror, creates the GitHub release, and uploads every artifact including the fresh merged-coverage.out. No manual pipeline clicks needed.

The coverage badge rebuilds automatically once the GitHub release exists, because the Pages workflow fetches merged-coverage.out from the latest release that carries one (it walks back through releases if the newest is missing the asset).

Installs and upgrades on end-user servers come from the GitHub release artifacts or the apt/dnf mirror. The internal GitLab package registry is operational tooling only.

Code Conventions

  • Imports: stdlib, blank line, third-party, blank line, internal. Use goimports -local github.com/pidginhost/csm
  • Errors: Return up the call stack. Wrap with fmt.Errorf("context: %w", err)
  • Store: store.Global() singleton bbolt DB. Always nil-check.
  • State: state.Store handles finding dedup, alert throttling, baseline tracking, latest findings persistence. Passed to subsystems at init
  • Web UI: Vanilla JS, no framework, no build step. Tabler CSS framework. Use CSM.get() / CSM.post() / CSM.delete() for API calls. Escape string-built markup with CSM.esc(); prefer DOM APIs for attacker-controlled values.
  • Logging: New code should use internal/log (wraps log/slog). Legacy fmt.Fprintf(os.Stderr, "[%s] ...", ts()) call sites remain valid until migrated.

Structured Logging (slog)

The CSM daemon has ~190 log call sites that write via fmt.Fprintf(os.Stderr, "[%s] ...", ts()). The internal/log package provides a drop-in slog wrapper so operators can opt into JSON output for log-shipping pipelines (Loki, ELK, Datadog) without a big-bang migration.

Operator controls

Two environment variables, read once at daemon startup:

| Variable | Values | Default | Effect |
|---|---|---|---|
| CSM_LOG_FORMAT | text, json | text | Output handler |
| CSM_LOG_LEVEL | debug, info, warn, error | info | Minimum log level |

Set via systemd drop-in:

# /etc/systemd/system/csm.service.d/logging.conf
[Service]
Environment="CSM_LOG_FORMAT=json"
Environment="CSM_LOG_LEVEL=info"

Then systemctl daemon-reload && systemctl restart csm.
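How two such variables can select a log/slog handler is sketched below using only the standard library (Go 1.21 or later). This is not the internal/log package: parseLevel and newLogger are hypothetical names, and falling back to text and info on unrecognised values is an assumption based on the documented defaults.

```go
package main

import (
	"io"
	"log/slog"
	"os"
	"strings"
)

// parseLevel maps a CSM_LOG_LEVEL value onto a slog level, defaulting to info.
func parseLevel(s string) slog.Level {
	switch strings.ToLower(s) {
	case "debug":
		return slog.LevelDebug
	case "warn":
		return slog.LevelWarn
	case "error":
		return slog.LevelError
	default:
		return slog.LevelInfo
	}
}

// newLogger picks the JSON handler for "json" and the text handler otherwise.
func newLogger(w io.Writer, format, level string) *slog.Logger {
	opts := &slog.HandlerOptions{Level: parseLevel(level)}
	if strings.ToLower(format) == "json" {
		return slog.New(slog.NewJSONHandler(w, opts))
	}
	return slog.New(slog.NewTextHandler(w, opts))
}

func main() {
	// Read once at startup, as the section above describes.
	logger := newLogger(os.Stderr, os.Getenv("CSM_LOG_FORMAT"), os.Getenv("CSM_LOG_LEVEL"))
	logger.Info("scan complete", "findings", 3, "duration_ms", 412)
	logger.Debug("hidden unless CSM_LOG_LEVEL=debug")
}
```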

Writing new logging code

import csmlog "github.com/pidginhost/csm/internal/log"

csmlog.Info("scan complete", "findings", len(f), "duration_ms", d.Milliseconds())
csmlog.Warn("log not found, will retry", "path", path, "retry_in", "60s")
csmlog.Error("alert dispatch failed", "err", err, "channel", "email")

Keys should be snake_case. Values should be machine-parseable (numbers, strings, booleans) – avoid formatted strings when you can pass the raw value.

Migrating legacy call sites

Migration is incremental and optional. The legacy format stays valid. Start with the hottest subsystems (alert dispatch, firewall operations, WAF handlers) where structured fields provide the most value, then work outward. Do not batch-convert – each subsystem should get a dedicated commit with before/after log samples in the PR description.

Human readability of journalctl output is preserved after migration: slog’s text handler replaces the legacy [TIMESTAMP] prefix with time=... level=... msg=... pairs, which are still easy to read by eye, so journalctl viewers keep working.

YARA-X Worker Process

CSM runs YARA-X in a supervised child process by default (since the 2026-04-23 default-flip). The goal is blast-radius control: a cgo crash inside yara_x_capi (the 2026-04-16 production incident) stays contained to the child and the daemon keeps its fanotify watchers, log watchers, and firewall engine alive. See ROADMAP.md (Related work already landed → “YARA-X process isolation”) for the decision record.

The knob is a tri-state *bool: omit it (or set true) for the default-on child process; set false to fall back to the in-process scanner.

signatures:
  # yara_worker_enabled: true    # default; omit for default-on
  # yara_worker_enabled: false   # explicit opt-out → in-process

When on, daemon startup:

  1. Does not call yara.Init() in the daemon process.
  2. Builds a yaraworker.Supervisor and calls Start(ctx).
  3. The supervisor runs exec.Command("/opt/csm/csm", "yara-worker", "--socket", "/var/run/csm/yara-worker.sock", "--rules-dir", <rulesDir>).
  4. The supervisor waits for the worker’s first Ping before returning.
  5. The supervisor installs itself via yara.SetActive(...) so the existing yara.Active() callers (fanotify, rule reload) route transparently through the IPC.

Operator view:

  • ps axf shows the daemon with one csm yara-worker child.
  • New socket: /var/run/csm/yara-worker.sock (0600, root-only).
  • Crashes produce a Critical yara_worker_crashed finding (rate-limited to one per minute) and restart with exponential backoff (1 s, 2 s, 4 s, capped at 60 s). Restarts reset to 1 s after the worker stays up for 30 s.
  • A csm update-rules run that completes triggers the supervisor’s in-process Reload (the worker recompiles). Escalate to a full worker restart from Go code via Supervisor.RestartWorker().
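The restart schedule described above (1 s, doubling, capped at 60 s, reset after 30 s of uptime) can be sketched as a single pure function. The function and constant names are hypothetical; the supervisor's actual implementation may differ.

```go
package main

import (
	"fmt"
	"time"
)

const (
	initialDelay = 1 * time.Second  // first restart delay
	maxDelay     = 60 * time.Second // backoff cap
	stableUptime = 30 * time.Second // uptime that resets the backoff
)

// nextDelay returns how long to wait before the next restart, given the
// previous delay and how long the worker stayed up before it crashed.
func nextDelay(prev, uptime time.Duration) time.Duration {
	if prev <= 0 || uptime >= stableUptime {
		return initialDelay
	}
	next := prev * 2
	if next > maxDelay {
		next = maxDelay
	}
	return next
}

func main() {
	var d time.Duration
	// Eight crashes in a row, each immediately after start.
	for i := 0; i < 8; i++ {
		d = nextDelay(d, 0)
		fmt.Println("restart in", d)
	}
	// The worker then stays up for 45 s before crashing again.
	fmt.Println("after stable run:", nextDelay(d, 45*time.Second))
}
```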

Emailav under worker mode: the IPC wire format carries string-valued rule metadata on every match (yaraipc.Match.Meta / yara.Match.Meta). The emailav adapter consumes Meta["severity"] via yara.Active(), so both in-process and worker backends produce the same verdict shape. Non-string metadata (ints, floats, bytes) is deliberately dropped at the worker boundary; add a typed value struct here only if a future consumer actually needs one.
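Dropping non-string metadata at the boundary amounts to a type filter. A minimal sketch, assuming rule metadata arrives as a map of untyped values; stringMeta is a hypothetical name and not the worker's actual function:

```go
package main

import "fmt"

// stringMeta keeps only the string-valued entries of a rule's metadata.
// Ints, floats, byte slices, and anything else are silently dropped.
func stringMeta(raw map[string]any) map[string]string {
	out := make(map[string]string, len(raw))
	for k, v := range raw {
		if s, ok := v.(string); ok {
			out[k] = s
		}
	}
	return out
}

func main() {
	meta := stringMeta(map[string]any{
		"severity": "high",       // kept
		"score":    90,           // dropped: int
		"weight":   0.5,          // dropped: float
		"blob":     []byte{0x01}, // dropped: bytes
	})
	fmt.Println(meta)
}
```

A consumer such as the emailav adapter then only ever reads values like Meta["severity"] as strings, regardless of which backend produced the match.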

Testing:

  • Unit-level: internal/yaraipc (protocol framing + round-trip) and internal/yaraworker (handler adapter, Run, supervisor). The supervisor tests re-invoke the test binary as a mock worker via the standard TestMain + env-var helper-process pattern, including a real SIGKILL-driven signal-death test that exercises the syscall.WaitStatus.Signaled() branch.
  • Integration: staged in the GitLab pipeline’s integration stage against AlmaLinux + Ubuntu cloud servers.
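For readers unfamiliar with the signal-death branch mentioned above, the sketch below shows how a parent can tell that a child was killed by a signal rather than exiting normally. It is Unix-only and uses sleep as a stand-in child (an assumption; the real tests re-invoke the test binary). Both helper names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
	"syscall"
)

// diedFromSignal inspects the error returned by cmd.Wait and reports which
// signal terminated the child, if any.
func diedFromSignal(err error) (syscall.Signal, bool) {
	var exitErr *exec.ExitError
	if !errors.As(err, &exitErr) {
		return 0, false
	}
	ws, ok := exitErr.Sys().(syscall.WaitStatus)
	if !ok || !ws.Signaled() {
		return 0, false
	}
	return ws.Signal(), true
}

// killAndWait starts a long-running child, sends it SIGKILL, and returns
// the error from Wait.
func killAndWait() error {
	cmd := exec.Command("sleep", "60") // stand-in for the worker process
	if err := cmd.Start(); err != nil {
		return err
	}
	if err := cmd.Process.Kill(); err != nil { // Kill sends SIGKILL on Unix
		return err
	}
	return cmd.Wait()
}

func main() {
	sig, signaled := diedFromSignal(killAndWait())
	fmt.Println("signaled:", signaled, "signal:", sig)
}
```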

Building the Documentation

cd docs
mdbook build              # generates docs/book/
mdbook serve              # local preview at http://localhost:3000

Release Signing

CSM has two separate signing paths:

  • Package repository signing for the normal APT/DNF install path.
  • Detached Ed25519 artifact signatures for raw binaries, tarballs, and package files downloaded outside the package manager.

Do not reuse keys between these paths. The package repositories use GPG because APT and DNF verify repository metadata that way. Detached release signatures use Ed25519 because the standalone install and deploy scripts verify raw artifact bytes with OpenSSL.

Status

| Surface | Key type | CI variable | Notes |
|---|---|---|---|
| APT repository metadata | GPG | CSM_GPG_SIGNING_KEY | Published by repo:publish; operators install with signed-by=/etc/apt/keyrings/csm.gpg. |
| RPM packages and repository metadata | GPG | CSM_GPG_SIGNING_KEY | Published by repo:publish; operators use gpgcheck=1 and repo_gpgcheck=1. |
| Raw binaries, tarballs, .deb, .rpm siblings | Ed25519 | CSM_SIGNING_KEY | Detached .sig files for direct downloads and standalone scripts. |
| YARA Forge rule ZIPs | Ed25519 | CSM_SIGNING_KEY | Signed by the yara-forge-mirror job; clients verify via signatures.signing_key. |

The preferred operator path is the signed APT/DNF repository documented in Installation. Standalone scripts also verify detached signatures: scripts/install.sh embeds the Ed25519 public key in EMBEDDED_SIGNING_KEY. Override it at runtime with CSM_SIGNING_KEY_PEM. Without any key, the scripts warn and continue unless CSM_REQUIRE_SIGNATURES=1 is set.

Public Key

The same Ed25519 key signs release artifacts and YARA Forge rule ZIPs.

Hex form, for signatures.signing_key in CSM config:

2d1472b2a1d9728c2717b75111487145a7863f7ce731c1b44181f7a68bb908f7

PEM form, for standalone script verification (EMBEDDED_SIGNING_KEY / CSM_SIGNING_KEY_PEM):

-----BEGIN PUBLIC KEY-----
MCowBQYDK2VwAyEALRRysqHZcownF7dREUhxRaeGP3znMcG0QYH3pou5CPc=
-----END PUBLIC KEY-----
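The hex and PEM forms above encode the same 32-byte key; the PEM body is the standard SubjectPublicKeyInfo wrapper around those bytes. A short sketch that checks this with the Go standard library (sameKey is a hypothetical helper name):

```go
package main

import (
	"bytes"
	"crypto/ed25519"
	"crypto/x509"
	"encoding/hex"
	"encoding/pem"
	"fmt"
)

const hexKey = "2d1472b2a1d9728c2717b75111487145a7863f7ce731c1b44181f7a68bb908f7"

const pemKey = `-----BEGIN PUBLIC KEY-----
MCowBQYDK2VwAyEALRRysqHZcownF7dREUhxRaeGP3znMcG0QYH3pou5CPc=
-----END PUBLIC KEY-----
`

// sameKey reports whether the hex form and the PEM form carry the same
// Ed25519 public key bytes.
func sameKey(hexForm, pemForm string) (bool, error) {
	raw, err := hex.DecodeString(hexForm)
	if err != nil {
		return false, err
	}
	block, _ := pem.Decode([]byte(pemForm))
	if block == nil {
		return false, fmt.Errorf("no PEM block found")
	}
	pub, err := x509.ParsePKIXPublicKey(block.Bytes)
	if err != nil {
		return false, err
	}
	edPub, ok := pub.(ed25519.PublicKey)
	if !ok {
		return false, fmt.Errorf("not an Ed25519 key: %T", pub)
	}
	return bytes.Equal(raw, edPub), nil
}

func main() {
	same, err := sameKey(hexKey, pemKey)
	fmt.Println("same key:", same, "err:", err)
}
```

The same check is a cheap sanity test after a key rotation, when both forms have to be updated together.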

Package Repository Signing

repo:publish runs on version tag pipelines and rebuilds the public package repositories from the current tag plus the retained historical releases.

Required protected CI variables:

| Variable | Type | Purpose |
|---|---|---|
| CSM_GPG_SIGNING_KEY | File | GPG private key used to sign APT metadata, RPM packages, and RPM repo metadata. |
| CSM_MIRROR_SSH_KEY | File | SSH key used to publish the mirror output. |
| CSM_MIRROR_KNOWN_HOSTS | Variable | SSH host keys for the mirror host. |

The job exports the public key as csm-signing.gpg and publishes it at the mirror root so install docs can reference:

https://mirrors.pidginhost.com/csm/csm-signing.gpg

APT verifies signed repository metadata through the signed-by= keyring. DNF verifies both RPM package signatures and repository metadata via gpgcheck=1 and repo_gpgcheck=1.

Detached Artifact Signatures

sign:artifacts signs release files with the Ed25519 private key in CSM_SIGNING_KEY when that variable is present. Each signed file gets a .sig sibling uploaded with the artifact.

Examples:

csm-linux-amd64
csm-linux-amd64.sig
csm_3.0.0_amd64.deb
csm_3.0.0_amd64.deb.sig
csm-3.0.0-1.x86_64.rpm
csm-3.0.0-1.x86_64.rpm.sig

The signature covers the raw artifact bytes with no hashing wrapper. Verification uses:

openssl pkeyutl -verify -pubin -inkey csm-signing.pub -rawin \
  -sigfile csm-linux-amd64.sig -in csm-linux-amd64
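Because the signature is plain Ed25519 over the raw bytes, the same check should be reproducible from Go with crypto/ed25519, without shelling out to OpenSSL. The sketch below uses a throwaway key and a stand-in artifact, since the real private key is not available; verifyDetached and demo are hypothetical names.

```go
package main

import (
	"crypto/ed25519"
	"fmt"
)

// verifyDetached reports whether sig is a valid Ed25519 signature over the
// raw artifact bytes. ed25519.Verify panics on a wrong-sized key, so the
// length is checked first.
func verifyDetached(pub ed25519.PublicKey, artifact, sig []byte) bool {
	return len(pub) == ed25519.PublicKeySize && ed25519.Verify(pub, artifact, sig)
}

// demo signs a stand-in artifact with a throwaway key, then verifies the
// intact bytes and a copy with one byte flipped.
func demo() (intact, tampered bool) {
	pub, priv, err := ed25519.GenerateKey(nil) // nil selects crypto/rand
	if err != nil {
		panic(err)
	}
	artifact := []byte("stand-in for csm-linux-amd64 contents")
	sig := ed25519.Sign(priv, artifact)

	modified := append([]byte(nil), artifact...)
	modified[0] ^= 0xff

	return verifyDetached(pub, artifact, sig), verifyDetached(pub, modified, sig)
}

func main() {
	intact, tampered := demo()
	fmt.Println("intact verifies:", intact, "tampered verifies:", tampered)
}
```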

Detached Signature Setup

On a trusted workstation:

openssl genpkey -algorithm ed25519 -out csm-signing.key
openssl pkey -in csm-signing.key -pubout -out csm-signing.pub

Store the private key in GitLab as a protected CSM_SIGNING_KEY variable. Keep the private key in an offline password manager and a second secure backup location. Do not commit it.

For standalone script verification, either:

  • Embed the public key PEM in EMBEDDED_SIGNING_KEY in scripts/install.sh, scripts/deploy.sh, and scripts/deploy-gitlab.sh.
  • Or pass the public key at runtime with CSM_SIGNING_KEY_PEM.

To make missing signatures or missing public keys fatal:

curl -sSL https://raw.githubusercontent.com/pidginhost/csm/main/scripts/install.sh | CSM_REQUIRE_SIGNATURES=1 bash

If a .sig file exists but verification fails, the installer aborts regardless of CSM_REQUIRE_SIGNATURES.
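The rules above reduce to a small decision table. The sketch below is one reading of them, written in Go for illustration only; the installer itself is a shell script, and the function, type, and parameter names are hypothetical. It assumes a .sig file without any public key is treated like a missing key.

```go
package main

import "fmt"

type outcome string

const (
	proceed        outcome = "proceed"
	warnAndProceed outcome = "warn-and-proceed"
	abort          outcome = "abort"
)

// decide models the installer's signature policy:
//   - havePubKey: a key is embedded or passed via CSM_SIGNING_KEY_PEM
//   - haveSig:    a .sig file was found next to the artifact
//   - verified:   verification succeeded (only meaningful if both are true)
//   - required:   CSM_REQUIRE_SIGNATURES=1 is set
func decide(havePubKey, haveSig, verified, required bool) outcome {
	if !havePubKey || !haveSig {
		// Nothing can be verified: fatal only in strict mode.
		if required {
			return abort
		}
		return warnAndProceed
	}
	if !verified {
		// A present signature that fails always aborts.
		return abort
	}
	return proceed
}

func main() {
	fmt.Println(decide(true, true, true, false))   // verified artifact
	fmt.Println(decide(true, false, false, false)) // missing .sig, lenient
	fmt.Println(decide(true, false, false, true))  // missing .sig, strict
	fmt.Println(decide(true, true, false, false))  // bad signature, lenient
}
```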

Key Rotation

Package repository GPG key rotation:

  1. Generate a new GPG signing key.
  2. Replace CSM_GPG_SIGNING_KEY in protected CI variables.
  3. Publish a tag pipeline so repo:publish exports the new public key to the mirror.
  4. Update install docs or automation if the key URL changes.

Detached Ed25519 key rotation:

  1. Generate a new Ed25519 key pair.
  2. Replace CSM_SIGNING_KEY in protected CI variables.
  3. Update the embedded public key in standalone scripts, or rotate the CSM_SIGNING_KEY_PEM value used by automation.
  4. Tag a new release.

Old detached signatures remain verifiable only with the old public key. Archive old public keys alongside release metadata so historical releases can still be checked.

Manual Detached Verification

curl -LO https://github.com/pidginhost/csm/releases/download/v3.0.0/csm-linux-amd64
curl -LO https://github.com/pidginhost/csm/releases/download/v3.0.0/csm-linux-amd64.sig

openssl pkeyutl -verify -pubin -inkey csm-signing.pub -rawin \
  -sigfile csm-linux-amd64.sig -in csm-linux-amd64

If verification fails, treat the artifact as untrusted. Do not install it.