Enterprise Cybersecurity Management
Architecture and Automation
Role of Security Architecture
The security architecture team is responsible for defining the cybersecurity toolset: selecting detective and protective controls, deploying logging and monitoring infrastructure, and specifying configurations. Architectural decisions — how networks are laid out, where controls are placed, and how systems are segmented — can be more impactful to security than the tools themselves.
Build vs. Buy — "Builders Buy"
Rather than defaulting to either pure in-house development or pure vendor purchasing, the recommended approach is "builders buy": build a prototype first to validate the need, understand the hard problems, and develop intelligent questions for vendor evaluation. Benefits of building first:
- Confirms a genuine organic need exists before going to market
- Reveals which parts of the problem are trivial vs. genuinely difficult — enabling informed vendor assessment
- Develops in-house engineering talent that is broadly valuable across the security organization
- Justifies custom development when a problem is truly unique to the organization and no market solution fits
Scalability is the primary reason to ultimately buy: vendors invest in high availability, resiliency, and maturity that internal prototypes will not reach. The right time to go to market is after conceptual validation — when the remaining gap is scalability and polish, not core functionality.
Buying Solutions, Not Problems
Vendors have financial incentive to expand their addressable market, creating marketing pressure that can distort an organization's sense of its own threat profile. The discipline is to identify problems before going to market. The correct sequence:
- Calibrate threat objectives (governance + threat intelligence)
- Red team against those specific threat objectives using real-world TTPs
- Document what the red team was able to accomplish — these are your actual problems
- Only then go to market with a defined problem statement and prototype ideas
This prevents vendors from selling problems that do not apply to the organization's actual threat profile.
Defense in Depth vs. Zero Trust
Defense in depth (30+ year concept): do not rely on a single control; layer complementary controls so that failure of one does not result in compromise. Key distinction: complementary controls address different attack vectors (e.g., a Layer 3 network firewall + a Layer 7 web application firewall); redundant controls address the same vector twice and create the "too many outfielders" problem — diffused ownership leads to both parties assuming the other has coverage, resulting in neither maintaining the control.
Zero trust (recent): assume no device, process, or piece of software is trustworthy by default, even within the internal network. Implementation approach:
- Place any new device or third-party software in an isolated network with no default access
- Define the precise use case — what the software actually needs to communicate with and why
- Implement specific allow rules for only those communications; block everything else including outbound internet access
- Monitor for attempted violations of those restrictions
SolarWinds example: if zero-trust egress restrictions had been in place, malware implanted via the software update would have been unable to establish command-and-control communication; the kill chain would have been broken even after initial compromise. Zero trust does not prevent the initial infection, but it contains the blast radius and enables early detection.
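The allow-rule steps above can be sketched as a default-deny lookup. This is an illustrative sketch, not a firewall implementation; the host and destination names are hypothetical:

```python
# Zero-trust egress policy: explicit allow rules only, everything else denied.
# Rule entries are (source_host, destination, port); names are hypothetical.
ALLOW_RULES = {
    ("app-server-01", "db.internal.example.com", 5432),
    ("app-server-01", "ntp.internal.example.com", 123),
}

def egress_allowed(source: str, destination: str, port: int) -> bool:
    """Deny unless an explicit allow rule matches. Denied attempts should
    also be logged and monitored as potential command-and-control activity."""
    return (source, destination, port) in ALLOW_RULES
```

In this model, general outbound internet access is simply absent from the rule set, so a SolarWinds-style implant has no path to its command-and-control server.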
Roles and Responsibilities for Tool Deployment
Three distinct ownership zones for any security tool:
- Security Architecture (cybersecurity) — selects the tool, runs vendor bake-offs, defines configuration standards, writes firewall policies and security rules
- Operations and Engineering (IT/infrastructure) — deploys hardware, provisions servers, installs software, ensures high availability and disaster recovery, applies configurations consistently at scale
- Security Operations (cybersecurity) — operates the tool: logs in, runs queries, performs threat hunting, generates reports
Mission-critical tools that affect production (e.g., firewalls, load balancers) are deployed by engineering/operations because they have the automation, templating, and consistency required. Passive tools (e.g., intrusion detection systems) that do not affect production traffic can be deployed and operated by security directly.
Security Tool Taxonomy by NIST CSF Function
Identify: bug bounties, attack surface management, vulnerability scanners (internal and external vantage points), cloud configuration auditors, static/dynamic application security testing (SAST/DAST), software composition analysis (SCA) for open-source dependencies.
Protect (Ingress): firewalls, DDoS protection, web application firewalls (WAF — Layer 7, inspects HTTP request strings for known attack patterns), email security gateways.
Protect (Egress): firewalls, proxy servers restricting outbound internet access.
Protect (Other): privileged access management (PAM — authenticated gateway requiring MFA before connecting to sensitive resources, full session recording); endpoint protection platform (EPP — behavioral analysis of memory allocation and binary execution, the successor to antivirus).
Detect: intrusion detection systems (IDS), network behavioral analytics (dynamic baseline, flags deviations), endpoint detection and response (EDR — telemetry collection: process lists, DNS lookups per process, network connections; used for threat hunting and forensics), SIEM (centralized log aggregation + correlation logic that generates incidents), deception technology/honeypots (less effective than anticipated; sophisticated red teams can detect and evade them).
Respond: endpoint quarantine/containment capabilities, packet capture infrastructure (must be pre-positioned or runbooks developed to use native tools), security orchestration and automation response (SOAR).
Recover: primarily IT/engineering domain — system rebuilding, code redeployment. Security role: declare clean bill of health before rebuild begins; data diodes (one-way network connections) to protect backup infrastructure from reinfection.
Cyber Defense Matrix (CDM)
The Cyber Defense Matrix is a two-dimensional MECE framework for organizing cybersecurity capabilities. The x-axis is the NIST CSF (Identify, Protect, Detect, Respond, Recover). The y-axis adds five asset classes: Devices, Applications, Networks, Data, Users. This second dimension enables gap analysis — identifying where no controls exist for a given asset type and function combination (e.g., application detection and response is an observed gap in the commercial market).
People/Process/Technology dimension: Identify and Protect should be heavily technology-dependent (automation scales; manual asset inventory does not). Detect, Respond, and Recover should be heavily people-dependent — technology has failed by the time detection occurs, and novel situations require human judgment. Vendors selling detect/respond technology as a replacement for headcount have consistently underdelivered.
CDM use cases: buy vs. build decisions (gaps in the matrix = build candidates), vendor evaluation (force vendors to identify their single primary matrix cell rather than claiming to cover everything), defense in breadth (complements defense in depth by identifying entire uncovered capability classes), and Zero Trust mapping (identity attributes from multiple asset classes feed access control decisions in both Identify and Protect columns).
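The gap-analysis use case can be sketched as a sparse mapping from matrix cells to controls. Cell names come from the CDM axes above; the example coverage entries are placeholders, not a real inventory:

```python
from itertools import product

# The two CDM axes: NIST CSF functions and asset classes.
FUNCTIONS = ["Identify", "Protect", "Detect", "Respond", "Recover"]
ASSETS = ["Devices", "Applications", "Networks", "Data", "Users"]

# Each deployed control mapped to its single primary cell (per the
# vendor-evaluation guidance above). Example entries only.
coverage = {
    ("Identify", "Applications"): ["SAST", "DAST"],
    ("Protect", "Networks"): ["firewall"],
    ("Detect", "Devices"): ["EDR"],
}

def gaps(coverage):
    """Cells with no control at all: candidates for build or buy."""
    return [cell for cell in product(FUNCTIONS, ASSETS) if cell not in coverage]
```

With the toy coverage above, `("Detect", "Applications")` appears in the gap list, matching the observed commercial-market gap noted earlier.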
DNS Tunneling — Architecture Deep Dive
Classic three-tier web architecture uses egress restrictions to prevent back-end servers from communicating with the internet. This blocks most command-and-control establishment. However, DNS queries travel a separate path through DNS servers that typically have recursive internet resolution enabled.
The attack: malware on a back-end server encodes data into dynamically generated subdomain strings within DNS queries (e.g., [encoded-data].badsite.com). The DNS server resolves these outbound. The adversary operates a custom DNS server that decodes the subdomains, extracting data and injecting instructions in the DNS response. A full command shell can be operated entirely through DNS traffic, bypassing all firewall egress rules.
Mitigations: restrict back-end servers to local DNS resolution only; implement Response Policy Zones (RPZ) to whitelist only necessary domain lookups; route DNS through a policy-enforced resolver that blocks queries exceeding length thresholds or matching encoded-string patterns. Automated key control testing should periodically verify these DNS restrictions remain in place — security control failures are silent and generate no operational pain signal.
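The length-threshold and encoded-string checks can be sketched as a simple detector over query names. This is a heuristic sketch, not a production RPZ policy; the thresholds (40-character labels, entropy above 4.0 bits) are illustrative tuning values:

```python
import math
from collections import Counter

def label_entropy(label: str) -> float:
    """Shannon entropy of a DNS label; encoded/compressed data scores high."""
    counts = Counter(label)
    total = len(label)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def looks_like_tunneling(qname: str, max_label_len: int = 40,
                         entropy_threshold: float = 4.0) -> bool:
    """Flag queries whose subdomain labels are unusually long or
    high-entropy -- the signature of data encoded into [subdomain].badsite.com."""
    labels = qname.rstrip(".").split(".")
    # Inspect only subdomain labels left of the registered domain.
    subdomains = labels[:-2] if len(labels) > 2 else []
    return any(len(l) > max_label_len or
               (len(l) >= 16 and label_entropy(l) > entropy_threshold)
               for l in subdomains)
```

A policy-enforced resolver would drop or alert on flagged queries rather than resolving them outbound.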
Automation Team
Security-specific automation teams (Python-focused scripting rather than full-stack product development) serve all cybersecurity functions:
- GRC — risk workflow automation: scoring, assignment, escalation timers, management reminders
- Red Team — automated key control playback testing; verifying controls remain operational across all environments, Cloud regions, offices, and OS types
- AppSec — aggregating findings from multiple testing tools into a single queue; converting findings into developer task formats
- IAM — automated account provisioning and termination across all systems triggered by HR events
- Threat Intelligence — operationalizing indicator feeds into endpoint and network controls
- SOC/DFIR — triage automation: header analysis, link reputation checks, sender history lookups; SOAR runbooks for common incident patterns
- Management — automated dashboard and report generation for board and governance reporting
Security controls fail silently. Operational failures are noticed immediately. Security control failures often go undetected until the next red team exercise or actual incident. Automated key control testing — periodic synthetic tests that verify controls are blocking what they should block — is the primary defense against silent control degradation.
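A minimal synthetic key control test might look like the following: attempt a connection that the control should block, and treat success as a silent control failure. This is a sketch under the assumption that the control under test is an egress or segmentation block; target host and port are whatever the runbook specifies:

```python
import socket

def key_control_test(host: str, port: int, timeout: float = 3.0) -> bool:
    """Synthetic test of a network block. Returns True if the control held
    (connection refused or timed out), False if traffic got through --
    i.e., the control has silently failed and should raise an alert."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return False  # connection succeeded: control failure
    except OSError:
        return True       # blocked as expected
```

Run the test periodically from every environment, cloud region, office, and OS type the control is supposed to cover, and alert on any `False` result.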
Cyber Incident Response Procedures
Key Terminology
- SOC (Security Operations Center) — team or physical location for security monitoring; organized into L1, L2, L3 by expertise; best practice is for junior analysts to retain ownership throughout the incident lifecycle for professional development
- IR (Incident Response) — also means Investor Relations in corporate contexts; clarify when using the abbreviation
- DFIR (Digital Forensics and Incident Response) — IR plus forensic investigation: deep media analysis, recovery of deleted artifacts, evidence preservation
- CIRP (Cyber Incident Response Plan) — procedural document governing incident response; serves dual audience: practitioners following it and third parties (regulators, auditors, law enforcement) reviewing it
- CINC (Cyber Incident) — an actual security incident with a unique identifier; not a vulnerability, not a project, not an audit finding
- SIEM — log aggregation database plus correlation logic; correlation logic is what distinguishes a SIEM from a simple log repository
- DPO (Data Privacy Officer) — separate from security; triggered by potential data disclosure; has independent regulatory notification obligations that do not map neatly to cybersecurity severity levels
CIRP Structure and Document Governance
The CIRP is a procedural document (not a policy) — it contains detailed technical minutiae about tools and workflows. It is approved within the information security team, not ratified by a governance committee. Changes are disseminated to stakeholders but do not require formal governance approval. A companion policy document (approved by governance) simply commits the organization to maintaining a CIRP.
The CIRP should define: incident field definitions (the schema for the incident register), severity level definitions and escalation thresholds, escalation path by severity level, integration touchpoints with operational incident management, and chain of custody guidelines (apply selectively — only when prosecution is a realistic path).
Chain of custody: modeled on physical crime scene investigation. In practice, very few cybersecurity incidents lead to prosecution due to jurisdictional limitations and extradition barriers. Mandating full chain of custody on every incident is unsustainable and impedes investigation speed. Best practice: give management discretion to invoke chain of custody procedures when prosecution appears viable; document that discretion in the CIRP.
Incident Fields (CINC Schema)
- Unique identifier — auto-generated sequential ID
- Severity level — dropdown (5 through 1, or the organization's equivalent scale)
- Reported by — source: internal staff, external customer, public report, automated tool (SIEM, IDS, WAF, EDR, deception technology, etc.); used to measure detection coverage gaps
- Potential data disclosure — Boolean; if checked, immediately engages Data Privacy Officer regardless of severity; triggers privacy notification timeline review
- Risk register referral — Boolean; if checked, creates a corresponding risk entry for long-term remediation tracking; incidents are closed once contained, but associated vulnerabilities persist in the risk register
- Status — lifecycle states: New → Assigned → Triage → Review → Closed; each transition is timestamped
- Timestamps — enable calculation of mean time to detect (MTTD), mean time to respond (MTTR), mean time to contain (MTTC)
Note: MTTD/MTTR/MTTC calculated from real incidents are inherently biased — they only capture incidents that were detected. Red team exercises provide more reliable metrics because start/end times are known and the red team can confirm whether containment was real or partial.
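The schema and the timestamp-derived metrics can be sketched as a small record type. Field names mirror the list above but are illustrative, not a standard schema; `occurred_at` stands in for the best-known start-of-incident time:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass
class CyberIncident:
    incident_id: int                          # auto-generated sequential ID
    severity: int                             # 5 (lowest) .. 1 (material)
    reported_by: str                          # e.g. "SIEM", "EDR", "external customer"
    potential_data_disclosure: bool = False   # True -> engage the DPO
    risk_register_referral: bool = False      # True -> create risk entry
    occurred_at: Optional[datetime] = None
    detected_at: Optional[datetime] = None
    contained_at: Optional[datetime] = None

    def time_to_detect(self) -> Optional[timedelta]:
        if self.occurred_at and self.detected_at:
            return self.detected_at - self.occurred_at
        return None

def mean_time_to_detect(incidents):
    """MTTD over detected incidents only -- hence the bias noted above."""
    deltas = [d for i in incidents if (d := i.time_to_detect()) is not None]
    return sum(deltas, timedelta()) / len(deltas) if deltas else None
```

MTTR and MTTC follow the same pattern against `contained_at` and the other lifecycle timestamps.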
Severity Level Model (5 to 1)
Raw logs, alerts, and events sit below the incident threshold and are processed by automated correlation logic in the SIEM. When correlation logic fires, an incident is created at an initial severity level.
- Severity 5 — warrants human review; unconfirmed suspicious activity; automated alerts of ambiguous significance
- Severity 4 — confirmed unauthorized activity or policy violation; even if no damage occurred (e.g., a phishing attempt that was fully blocked), confirmed malicious activity elevates to Severity 4
- Severity 3 — either (a) targeted malicious intent (adversary demonstrably researched and targeted the organization, e.g., spear phish using internal names/roles) OR (b) operational impact (any system disruption, data movement, or service interruption), even without targeted intent
- Severity 2 — both targeted malicious intent AND operational impact; triggers immediate escalation to senior leadership and compliance personnel; candidate for immediate regulatory reporting
- Severity 1 — material impact; engages major incident response team including non-security stakeholders (communications, legal, investor relations, press)
Severity and reporting: Severity 3 = periodic (quarterly) report to regulators and board. Severity 2+ = immediate reporting evaluation; compliance team engaged to assess regulatory notification obligations and timelines.
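The 5-to-1 model above reduces to a small decision function over the classification dimensions. This is a sketch of the logic as written; real triage would of course weigh more context:

```python
def classify_severity(confirmed_malicious: bool,
                      targeted_intent: bool,
                      operational_impact: bool,
                      material_impact: bool) -> int:
    """Initial severity per the 5-to-1 model (5 = lowest, 1 = material).
    Checks run highest-severity first so the worst applicable level wins."""
    if material_impact:
        return 1  # major incident response team, non-security stakeholders
    if targeted_intent and operational_impact:
        return 2  # immediate escalation, regulatory reporting evaluation
    if targeted_intent or operational_impact:
        return 3  # periodic (quarterly) regulator/board reporting
    if confirmed_malicious:
        return 4  # confirmed unauthorized activity, even if fully blocked
    return 5      # unconfirmed suspicious activity; warrants human review
```

Note that the function takes the three classification dimensions (intent, impact, materiality) plus the confirmed/unconfirmed distinction that separates Severity 4 from 5.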
Operational vs. Cyber Incident Integration
Operational incidents are triggered by observable impact (outages, performance degradation). Cyber incidents are often impact-free. Best practice: when an operational incident reaches a defined severity threshold, automatically spawn a cybersecurity investigation. If the incident is predictable and explainable, the cyber investigation closes quickly. If it is unpredictable or unexplained, full cyber investigation proceeds.
Cyber incidents without operational impact should remain confidential — premature operational disclosure can tip off an adversary or compromise evidence. Operational incidents rarely need to be created from cyber incidents for this reason.
Logging Pipeline and SIEM Architecture
Log pipeline hierarchy (largest to smallest volume):
- Raw logs — all system events; operationally determined retention periods; err on the side of more logging
- Alerts — security-significant log entries (failed logins, WAF blocks, security tool notifications); subset of raw logs
- Events — correlated alerts; SIEM logic combines multiple alerts into a meaningful security event
- Incidents — subset of events requiring human investigation; Severity 5 through 1
High-volume, low-fidelity log sources (DLP events, firewall drops, phish reports) are not ingested into the SIEM directly — SIEM storage is expensive and correlation value is low. Instead, a pointer/summary entry is created in the SIEM and analysts follow the link to purpose-built tools (DLP console, firewall log viewer, phish analysis platform) for detailed investigation.
CIRP Examples: Hypothetical Incident Walkthroughs
Incident Classification Framework
Each incident should be analyzed along three dimensions: (1) Is there targeted malicious intent? (2) Is there operational impact? (3) Is there potential data disclosure? These three factors drive severity classification and required escalation actions.
False Positive Spam / Unsolicited Commercial Email
Severity 5 if automation determines the message is benign; Severity 4 if it is flagged as potentially malicious. Employees trained to report phishing will generate false positives from legitimate marketing email. Automation performs initial triage: link reputation, sender domain history, attachment analysis. Key lesson: high false-positive volume from phishing reporting buttons is expected — automate the triage.
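The triage step might be sketched as below. The three boolean signals are placeholders for real link-reputation, sender-history, and attachment-sandbox lookups, and the routing logic is illustrative:

```python
from typing import Optional

def triage_reported_email(links_malicious: bool,
                          sender_known_good: bool,
                          attachment_flagged: bool) -> Optional[int]:
    """First-pass triage of a user-reported email.
    Returns an initial severity, or None to auto-close as benign."""
    if links_malicious or attachment_flagged:
        return 4   # confirmed malicious content: Severity 4
    if sender_known_good:
        return None  # legitimate bulk/marketing mail: close, no incident
    return 5       # ambiguous: Severity 5, route for human review
```

The point is not the specific rules but the shape: automation absorbs the false-positive volume so analysts only see the ambiguous and confirmed-malicious cases.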
Narrative Phish / Business Email Spoof (BES)
Example: email impersonating the CFO (display name matches, email address is external Gmail) sent to accounts payable. Targeted malicious intent: yes (adversary identified the CFO name and targeted a financial control point). Classification: Severity 3 minimum; Severity 2 if a wire transfer was initiated.
Terminology: Business Email Compromise (BEC) = an actual email account was compromised. Vendor Email Compromise (VEC) = a vendor's real account was hijacked. Business Email Spoof (BES) = display name impersonation using an external address; no account compromise occurred.
Red Team Finds Prior Compromise During M&A Assessment
After acquiring a company, a red team assessment finds Active Directory is accessible from the internet and discovers evidence of prior unauthorized access. Severity 2 minimum. Key lessons:
- Maintain an M&A cybersecurity playbook; conduct red team assessment before network integration
- Red teaming is more effective than a compromise assessment for detecting prior breach — red teams find the attack paths adversaries use and observe forensic evidence along those paths
- Compromise assessments require agent installation on every device (often blocked on production systems) and are slow and expensive; red teams generate results quickly and focus forensic effort precisely
- Cold case problem: if evidence is five-plus years old and logs have rolled off, you may have a reporting obligation because you cannot confirm what was taken
Unsecured Cloud Storage (S3 Bucket)
A researcher reports finding 100GB of data in a publicly accessible storage bucket. Severity depends on intent determination. Cooperative researcher sharing data: benevolent, no malicious intent. Demands payment before sharing details: extortion, engage law enforcement. Middle ground (asking about bug bounty enrollment): triage aggressively to resolve intent quickly.
Key lessons: maintain a vulnerability disclosure program (VDP) with a security.txt file published per RFC 9116; publish safe harbor language; have a bug bounty program if resources permit. Pre-build tools to rapidly process large data sets and identify sensitive content — "dumpster diving preparedness."
Insider Data Exfiltration Before Resignation ("Hit It and Quit It")
DLP flags a large outbound transfer from an employee who resigned shortly after. Standard approach: ask the employee to identify what data they believe is theirs; HR and security review the specific files; package and return genuinely personal data through approved channels.
Steganography indicator: if an outbound file claims to be an image but is unusually large (e.g., 45MB), investigate for hidden data encoding. An employee who used steganography to conceal data has surrendered plausible deniability. Key lesson: act quickly before final paycheck or severance is issued — financial leverage enables cooperation that disappears after payment.
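The "image that is too large to be an image" indicator can be sketched as a DLP follow-up heuristic. The 20 MB threshold is an illustrative tuning value, not a standard, and real stego detection needs far more than a size check:

```python
# Magic bytes identifying common image formats by file header.
IMAGE_MAGIC = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
}

def suspicious_image(header: bytes, size_bytes: int,
                     threshold: int = 20 * 1024 * 1024) -> bool:
    """Flag a file whose header says 'image' but whose size is far beyond
    a typical photo -- a candidate for a steganography investigation."""
    is_image = any(header.startswith(magic) for magic in IMAGE_MAGIC)
    return is_image and size_bytes > threshold
```

A 45 MB "JPEG" like the one in the example trips the check; an ordinary 2 MB photo does not.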
Network Device Sold on eBay with Configuration Data
A buyer contacts the organization reporting a purchased network switch contained the organization's configuration data and credentials. Respond cooperatively with a buyback offer. If the finder demands inflated payment, treat as extortion and involve law enforcement. Key lesson: the best risk assessments are free — this incident reveals a gap in the device decommissioning and secure wipe process. Risk register referral: yes.
Wiper Malware (Sabotage — Remote Office)
All computers in a remote office are rendered unusable. Severity 1 if material impact; Severity 2 minimum given targeted malicious intent and operational impact. Key lessons:
- Network segmentation success: if only one office is affected, segmentation worked — reinforce and document
- Containment drills: practice isolating an office or subnet from the WAN and internet before a real incident forces the decision
- Incident commander authority: designate in advance who has authority to make network isolation decisions
- Potential data disclosure: yes — modern ransomware/wiper actors typically exfiltrate before encrypting; assume data may have left until investigation proves otherwise
DDoS Extortion
Organization receives a ransom demand accompanied by a 10-minute test attack demonstrating capability. Targeted malicious intent: yes. Classification: Severity 2 if the test caused measurable impact; Severity 3 if absorbed by existing mitigation.
Always-on DDoS protection activates immediately; scrubbing-center protection may have a trigger threshold before activating, creating a gap window. Investigate whether targeting is IP-based or DNS-based — DNS targeting reveals which specific properties the adversary researched. Share crypto wallet addresses with the security community: a single wallet shared across victims proves the ransom is non-targeted; unique wallets per victim indicate an organized adversary.
Unexplained EPP Alert — Password Dumping Tool (Mimikatz)
EDR detects Mimikatz on a developer's desktop. First step: contact the employee. Authorized use on own credentials: Severity 5. Policy violation but no malicious intent: Severity 4. No knowledge of the tool: assume intrusion, escalate to Severity 3, begin full forensic investigation of lateral movement from that endpoint.
Key principle: unexplained alerts must be fully resolved before closing. An alert that cannot be attributed to authorized activity should be treated as an intrusion indicator. Absence of observed impact does not mean absence of intrusion.
Responsible Disclosure — Key Principles
- Publish a security.txt file at /.well-known/security.txt per RFC 9116 (the domain root is a legacy fallback location)
- Publish a vulnerability disclosure policy with safe harbor language (reference DHS guidelines) to encourage good-faith reporting without fear of prosecution
- If running a bug bounty program, link to it from the disclosure policy
- Engage law enforcement when a researcher demands payment before sharing vulnerability details — that crosses from disclosure into extortion
- Do not antagonize researchers who may be acting in good faith; cooperative engagement resolves intent ambiguity faster than adversarial posturing
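A minimal security.txt following the RFC 9116 field set might look like the following; the example.com contact addresses and URLs are placeholders:

```
Contact: mailto:security@example.com
Expires: 2026-12-31T23:59:59Z
Policy: https://example.com/vulnerability-disclosure
Acknowledgments: https://example.com/security/thanks
Preferred-Languages: en
```

`Contact` and `Expires` are the required fields; `Policy` is where the safe harbor language and any bug bounty link belong.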