Playbook: security-testing-army

Orchestrates practical security testing across API, frontend, and supply chain, emitting security-audit-v1 artifact output.

Security Testing Army

Core Philosophy

Security testing here is deterministic, scoped, and merge-safe.

This skill runs structured domain specialists and returns artifact-compatible results even when no findings are produced.

COGNITIVE INTEGRITY PROTOCOL v2.3

This skill follows the Cognitive Integrity Protocol.
Reference: team_members/COGNITIVE-INTEGRITY-PROTOCOL.md
Reference: team_members/_standards/CLAUDE-PROMPT-STANDARDS.md
Reference: team_members/_standards/security-audit-artifact-v1.md

VALUE HIERARCHY

| Tier | Priority | Focus |
|---|---|---|
| PRESCRIPTIVE | deterministic outputs | artifact schema + gate fields |
| PREDICTIVE | APED profile expansion | abuse simulation + edge cases |
| DIAGNOSTIC | domain specialization | code/frontend/dependency partitioning |
| DESCRIPTIVE | ad hoc prose | unsupported without findings contract |

SELF-LEARNING PROTOCOL

Monthly:

  • review successful/failed APED attacks and adjust profiles
  • refresh dependency + license advisories
  • check latest input boundary and prompt manipulation techniques

COMPANY CONTEXT

| Context | Scope | Profile |
|---|---|---|
| pfp.aped.wtf API | auth, route abuse, image generation | aped-pfp-audit-profile.md |
| Frontend flows | challenge/generation interaction | mobile/desktop posture |

DEEP EXPERT KNOWLEDGE

Domains:

  • API auth/rate-abuse
  • challenge/regen execution flow
  • frontend trust-boundary review
  • dependency/lockfile risk
  • deployment/runtime controls

SOURCE TIERS

| Source | Type | Use |
|---|---|---|
| team_members/_standards/security-audit-artifact-v1.md | standard | output schema |
| team_members/security-testing-army/references/aped-pfp-audit-profile.md | profile | mission-bound checks |
| internal child specialists | execution | domain-specific findings |

CROSS-SKILL HANDOFF RULES

| Trigger | Route To | Pass Along |
|---|---|---|
| API auth or backend abuse | api-security-specialist, application-security-engineer | route list + assumptions |
| Frontend trust boundary | frontend-security-auditor | route + payload + viewport assumptions |
| Dependency and license risk | dependency-license-auditor | package graph + evidence |
| Unresolved context | security-threat-model | assumed trust levels and user roles |
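The handoff rules above can be sketched as a routing table. This is a hypothetical illustration: the specialist names come from the table, but the dictionary keys, context field names, and `dispatch` helper are assumptions, not a real API.

```python
# Hypothetical routing sketch for the cross-skill handoff rules.
# Specialist names mirror the handoff table; everything else is illustrative.
HANDOFF_ROUTES = {
    "api_auth_abuse": {
        "route_to": ["api-security-specialist", "application-security-engineer"],
        "pass_along": ["route_list", "assumptions"],
    },
    "frontend_trust_boundary": {
        "route_to": ["frontend-security-auditor"],
        "pass_along": ["route", "payload", "viewport_assumptions"],
    },
    "dependency_license_risk": {
        "route_to": ["dependency-license-auditor"],
        "pass_along": ["package_graph", "evidence"],
    },
    "unresolved_context": {
        "route_to": ["security-threat-model"],
        "pass_along": ["assumed_trust_levels", "user_roles"],
    },
}

def dispatch(trigger: str, context: dict) -> list[dict]:
    """Build one handoff message per target specialist for a trigger."""
    route = HANDOFF_ROUTES[trigger]
    payload = {k: context.get(k) for k in route["pass_along"]}
    return [{"to": target, "context": payload} for target in route["route_to"]]
```

Note that one trigger can fan out to multiple specialists (API abuse routes to two lanes), so `dispatch` returns a list rather than a single message.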

ANTI-PATTERNS

| Anti-pattern | Failure risk | Replacement |
|---|---|---|
| Open findings but empty evidence | unverifiable output | require reproducibility and a verification command |
| Non-interactive dead-end | CI stall | explicit assumptions_required output |
| Missing APED profile on matched scope | false-negative mission drift | bind profile deterministically |

I/O CONTRACT

Required Inputs

| Field | Type | Required | Description |
|---|---|---|---|
| target | string | ✅ | URL/path/domain |
| mode | enum | ⚠️ | non_interactive default |
| scope | string | ⚠️ | mission intent |
| mission_profile_path | string | ⚠️ | optional APED profile override |

Required Output Contract

  • Always emit security-audit-v1.
  • Use an empty findings array only when the scan is clean.
  • Include assumptions + merge-safe IDs.

Evidence: command, payload, route, route-level assumptions.
Breaks when: a critical domain specialist is missing for the security scope.

Escalation Triggers

  • open P0/P1 findings with weak confidence
  • unresolved abuse assumptions with no fallback

ACTIONABLE PLAYBOOK

  1. Resolve APED profile when required.
  2. Dispatch to code-intelligence-sast-auditor, frontend-security-auditor, dependency-license-auditor.
  3. Normalize all child findings into security-audit-v1.
  4. Canonicalize dedupe by (file, route, class, title).
  5. Emit strict artifact and mission metadata.

VERIFY: every child specialist has a deterministic output contract.
VERIFY: all required sections are present with reproducibility.
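Step 4's canonical dedupe can be sketched as follows. The (file, route, class, title) key comes from the playbook; the severity scale and its ordering are assumptions (P0 treated as most severe, matching the escalation triggers).

```python
# Sketch of step 4: canonical dedupe by (file, route, class, title),
# keeping the highest-severity finding per key. Severity ranking is assumed.
SEVERITY_RANK = {"P0": 0, "P1": 1, "P2": 2, "P3": 3}

def dedupe(findings: list[dict]) -> list[dict]:
    best: dict[tuple, dict] = {}
    for f in findings:
        key = (f["file"], f["route"], f["class"], f["title"])
        cur = best.get(key)
        # Lower rank number means more severe; keep the worst offender.
        if cur is None or SEVERITY_RANK[f["severity"]] < SEVERITY_RANK[cur["severity"]]:
            best[key] = f
    return list(best.values())
```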

Verification Trace Lane (Mandatory)

Meta-lesson: broad autonomous agents are effective at discovery but weak at verification. Every run must follow the two-lane workflow and return evidence-backed conclusions.

  1. Discovery lane

    1. Generate candidate findings rapidly from code/runtime patterns, diff signals, and known risk checklists.
    2. Tag each candidate with confidence (LOW/MEDIUM/HIGH), impacted asset, and a reproducibility hypothesis.
    3. VERIFY: Candidate list is complete for the explicit scope boundary and does not include unscoped assumptions.
    4. IF FAIL → pause and expand scope boundaries, then rerun discovery limited to missing context.
  2. Verification lane (mandatory before any PASS/HOLD/FAIL)

    1. For each candidate, execute/trace a reproducible path: exact file/route, command(s), input fixtures, observed outputs, and expected/actual deltas.
    2. Evidence must be traceable to source of truth (code, test output, log, config, deployment artifact, or runtime check).
    3. Re-test at least once when confidence is HIGH or when a claim affects auth, money, secrets, or data integrity.
    4. VERIFY: Each finding either has (a) concrete evidence, (b) explicit unresolved assumption, or (c) is marked as speculative with remediation plan.
    5. IF FAIL → downgrade severity or mark unresolved assumption instead of deleting the finding.
  3. Human-directed trace discipline

    1. In non-interactive mode, unresolved context must be emitted as assumptions_required (explicitly scoped and prioritized).
    2. In interactive mode, unresolved items must request direct user validation before final recommendation.
    3. VERIFY: Output includes a chain of custody linking input artifact → observation → conclusion for every non-speculative finding.
    4. IF FAIL → do not finalize output, route to SELF-AUDIT-LESSONS-compliant escalation with an explicit evidence gap list.
  4. Reporting contract

    1. Distinguish discovery_candidate from verified_finding in reporting.
    2. Never mark a candidate as closure-ready without verification evidence or an accepted assumption and owner.
    3. VERIFY: Output includes what was verified, what was not verified, and why any gap remains.
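The reporting contract above can be sketched as a small classifier. The `discovery_candidate` / `verified_finding` distinction and `assumptions_required` come from the lane rules; the field names and the evidence check are illustrative assumptions.

```python
# Sketch of the two-lane reporting contract: a candidate becomes a
# verified_finding only when it carries traceable evidence; otherwise it
# stays a discovery_candidate with an explicit unresolved assumption.
def classify(candidate: dict) -> dict:
    if candidate.get("evidence"):  # traceable command/output/log/config
        return {**candidate, "status": "verified_finding"}
    return {
        **candidate,
        "status": "discovery_candidate",
        "assumptions_required": candidate.get(
            "assumptions_required", ["evidence gap: not reproduced"]
        ),
    }

def report(candidates: list[dict]) -> dict:
    rows = [classify(c) for c in candidates]
    return {
        "verified": [r for r in rows if r["status"] == "verified_finding"],
        "unverified": [r for r in rows if r["status"] == "discovery_candidate"],
    }
```

Because unverified candidates are carried forward with an explicit gap rather than dropped, the output always states what was verified, what was not, and why, as the reporting contract requires.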

SELF-EVALUATION CHECKLIST

  • [ ] Profile loaded when target includes pfp.aped.wtf
  • [ ] Non-interactive mode uses assumptions_required
  • [ ] Findings are normalized with owner/due/route/evidence
  • [ ] Deduplication retained the highest-severity item per canonical key

Challenge Before Delivery

  • [ ] Could any high-risk issue remain untested due to missing specialist lane?
  • [ ] Are all findings mapped to attack paths and remediation owners?

FEW-SHOT OUTPUT EXAMPLES

Example 1: APED profile scan (clean)

No open findings; artifact emitted with gate=PASS.

Example 2: Cost abuse gap

Open abuse simulation path; emits HOLD with reproducible command.

Example 3: Mixed scan

Dependency + frontend issues merged by canonical key into one de-duplicated output.