Post

Falsification in the SOC Applying Karl Popper's Philosophy to Modern Cybersecurity

Karl Popper's philosophy of science, particularly his concept of falsificationism, offers a surprisingly powerful framework for rethinking how we approach cybersecurity. While Popper was concerned with demarcating science from pseudoscience, his ideas about conjecture, refutation, and the asymmetry between verification and falsification map remarkably well onto the challenges facing security professionals today.

Falsification in the SOC Applying Karl Popper's Philosophy to Modern Cybersecurity

Introduction

Karl Popper’s philosophy of science, particularly his concept of falsificationism, offers a surprisingly powerful framework for rethinking how we approach cybersecurity. While Popper was concerned with demarcating science from pseudoscience, his ideas about conjecture, refutation, and the asymmetry between verification and falsification map remarkably well onto the challenges facing security professionals today.

In this post, I’ll explore how Popperian philosophy can reshape our approach to threat detection, defense architecture, and security epistemology. We’ll examine why the security industry’s obsession with “knowing” we’re secure is fundamentally misguided, and how embracing falsificationism can lead to more robust, adaptive security postures.

The Verification Problem in Cybersecurity

The Allure of the White List

Traditional security thinking has long been seduced by the idea of positive verification. Consider the evolution of application control:

1
2
3
4
5
6
7
8
9
# The verification approach - attempt to enumerate "safe"
APPROVED_APPLICATIONS = {
    "firefox.exe": "a1b2c3d4...",
    "chrome.exe": "e5f6g7h8...",
    "outlook.exe": "i9j0k1l2..."
}

def is_safe(application):
    return application in APPROVED_APPLICATIONS

This approach mirrors the logical positivists’ attempt to verify statements through observation. Just as Popper argued that no number of white swans can prove “all swans are white,” no whitelist can prove “all running processes are safe.” The asymmetry is stark: a single black swan falsifies the universal claim, just as a single unrecognized malicious process defeats the security control.

The Induction Problem in Threat Intelligence

Hume’s problem of induction, which heavily influenced Popper, appears throughout security operations. When we observe that a particular IP address has sent malicious traffic 100 times, we induce that it will do so on the 101st connection. Our entire threat intelligence ecosystem is built on inductive reasoning:

1
2
3
4
5
6
7
# Threat Intelligence Feed Entry
indicator: 192.0.2.47
type: ip-address
threat_score: 95
observed_malicious_behavior: 1,247
first_seen: 2024-01-15
last_seen: 2026-02-01

But as Popper argued, induction is logically invalid. Past observations don’t logically entail future behavior. The attacker who has always attacked from 192.0.2.47 can switch to 192.0.2.48. The malware family that has always used TCP/443 can pivot to DNS tunneling.

This isn’t merely philosophical pedantry. The security industry’s confidence in “known bad” indicators has created a massive blind spot exploited by sophisticated adversaries who simply avoid triggering known signatures.

Falsificationism as Security Methodology

Conjectures and Refutations in Threat Hunting

Popper’s methodology of “conjectures and refutations” provides a rigorous framework for threat hunting. Rather than passively consuming threat intelligence (inductive reasoning), we actively propose hypotheses about adversary behavior and attempt to falsify them.

A Popperian threat hunt looks like this:

Conjecture: “No attacker has established command-and-control channels using DNS TXT record exfiltration in our environment.”

Falsification Attempt:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
-- Hunt query attempting to falsify the hypothesis
SELECT 
    timestamp,
    source_ip,
    query_name,
    txt_record_length,
    entropy(txt_record) as entropy_score
FROM dns_logs
WHERE 
    record_type = 'TXT'
    AND txt_record_length > 100
    AND entropy(txt_record) > 4.5
    AND query_frequency < 10 per hour
ORDER BY entropy_score DESC
LIMIT 1000;

If we find evidence matching our criteria, we’ve falsified our hypothesis; we’ve discovered something that contradicts our assumption of safety. If we find nothing, we haven’t proven the hypothesis true; we’ve merely failed to falsify it this time, with these particular tests.

This approach is fundamentally different from signature matching. We’re not looking for known bad; we’re systematically attempting to prove our assumptions wrong.

The Demarcation Problem: Security Theater vs. Real Security

Popper’s demarcation criterion—the principle that distinguishes science from pseudoscience—is that scientific theories must be falsifiable. A theory that cannot be proven wrong isn’t scientific; it’s metaphysical or pseudoscientical.

Apply this to security controls:

Unfalsifiable Security Theater:

  • “Our security awareness training makes employees more secure”
  • “Our AI-powered next-gen solution stops advanced threats”
  • “We follow industry best practices”

These statements are often unfalsifiable because they lack specific, measurable predictions that could be contradicted by observation. They’re the security equivalent of Freudian psychoanalysis, which Popper famously criticized for being able to explain any outcome.

Falsifiable Security Claims:

  • “Our MFA implementation will prevent credential-based attacks that don’t involve MFA bypass techniques or session hijacking”
    • Falsification test: Attempt phishing attacks, credential stuffing, and password spraying against MFA-protected systems
  • “Our network segmentation prevents lateral movement from the DMZ to the internal database subnet”
    • Falsification test: Deploy ethical adversary simulation attempting to pivot from compromised DMZ hosts
  • “Our EDR solution will detect execution of in-memory .NET assemblies within 60 seconds”
    • Falsification test: Execute known red team tools using assembly loading techniques and measure detection latency

The key insight is that good security engineering makes specific, risky predictions that could be proven wrong. Bad security makes vague claims that explain all possible outcomes.

Asymmetry of Verification and Falsification

Proving Security is Impossible

One of Popper’s central insights is the asymmetry between verification and falsification. We can never verify a universal positive claim (“all processes are benign”), but we can falsify it with a single counterexample.

In cybersecurity terms:

1
2
3
4
5
Verification claim: "Our network is secure"
Required proof: Demonstrate absence of all possible vulnerabilities across all systems, configurations, and attack vectors
               
Falsification: Demonstrate a single successful attack
Required proof: One working exploit

This asymmetry explains why defenders feel perpetually disadvantaged. Organizations pour millions into security tools seeking the impossible: verification of security. Meanwhile, attackers need only falsify the claim once.

Embracing the Asymmetry: Assume Breach

Microsoft’s “assume breach” methodology aligns perfectly with Popperian thinking. Rather than attempting to verify that no breach has occurred (impossible), we design security architectures that remain resilient even after falsification of our perimeter security.

1
2
3
4
5
6
Traditional Model:
"Our firewall prevents intrusions" → Seeks verification → Fails on first bypass

Assume Breach Model:
"Attackers cannot access Tier 0 assets from Tier 2 even after compromising perimeter" 
→ Falsifiable prediction → Testable through Purple Team exercises → Fails gracefully

Zero Trust Architecture takes this further, treating every access request as a hypothesis to be tested:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
def evaluate_access_request(request):
    """
    Popperian access control: Each request is a hypothesis 
    "User X is authorized to access Resource Y from Context Z"
    We attempt to falsify through continuous verification
    """
    
    # Series of falsification attempts
    falsification_tests = [
        verify_identity_tokens(request),
        check_device_posture(request),
        evaluate_behavioral_analytics(request),
        assess_risk_score(request),
        validate_resource_permissions(request),
        check_contextual_factors(request)
    ]
    
    # Any falsification denies access
    for test in falsification_tests:
        if test.falsifies_authorization():
            return DENY
    
    # Failure to falsify grants temporary access
    # Not proof of authorization, merely insufficient evidence against
    return GRANT_WITH_MONITORING

The Growth of Knowledge Through Error Elimination

Learning from Incidents

Popper argued that science progresses through bold conjectures followed by severe attempts at refutation. When theories survive testing, we haven’t proven them true, but we’ve eliminated errors and grown our knowledge.

Post-incident reviews should follow this model:

Traditional Approach (seeks to verify): “What happened? How do we prevent this specific attack?” → Results in narrow, signature-based controls → Defeated by variation

Popperian Approach (seeks to falsify broader hypotheses): “What assumptions about our security posture were falsified by this incident?” → Results in architectural changes → Eliminates classes of errors

Example from a real incident:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
Incident: Attacker pivoted from marketing WordPress server to internal database

Traditional Analysis:
- Patch WordPress vulnerability
- Add WordPress-specific WAF rules
- Implement file integrity monitoring on web servers

Popperian Analysis:
Falsified Hypothesis: "Web application compromises cannot reach internal databases"

This reveals flawed assumptions about:
- Network segmentation architecture
- Service authentication models  
- Privilege boundaries

Remediation:
- Implement application-layer network segmentation
- Enforce service authentication via mutual TLS
- Deploy database activity monitoring
- Redesign to eliminate credential-based service auth

The Popperian approach eliminates broader categories of error rather than playing whack-a-mole with specific exploits.

Evolutionary Security Architecture

Popper’s evolutionary epistemology, the idea that knowledge grows through variation and selection similar to biological evolution, provides a model for security architecture:

1
2
3
4
5
6
P1 → TT → EE → P2

P1: Initial Problem (security requirement)
TT: Tentative Theories (proposed security controls)
EE: Error Elimination (testing, red team, incidents)
P2: New Problem (refined understanding, new requirements)

This maps directly to DevSecOps and continuous security improvement:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
# Security Control Evolution Cycle
iteration_n:
  problem: "Prevent unauthorized access to production databases"
  
  tentative_theory:
    - "Network firewalls will block unauthorized database connections"
    - "VPN access will authenticate users before network access"
  
  error_elimination:
    - red_team_testing: "Found lateral movement path from dev to prod"
    - incident_analysis: "Compromised VPN credentials led to breach"
    
  refined_problem: "Prevent unauthorized database access even after network compromise"
  
iteration_n+1:
  tentative_theory:
    - "Database-level authentication separate from network authentication"
    - "Query-level authorization enforcement"
    - "Just-in-time privileged access with approval workflows"
  
  error_elimination:
    - purple_team_validation: "Attempted privilege escalation blocked"
    - chaos_testing: "Removed database network rules, access still controlled"

Critical Rationalism in Security Decision Making

The Myth of the Secure Baseline

Popper’s critical rationalism holds that we can never achieve certain knowledge, only progressively better approximations through criticism and testing. This has profound implications for compliance and security baselines.

Many organizations treat frameworks like CIS Benchmarks or NIST CSF as verified secure configurations. This is a fundamental misunderstanding. These frameworks are conjectures, well-reasoned, thoroughly tested conjectures, but conjectures nonetheless.

A critical rationalist approach:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
class SecurityControl:
    def __init__(self, control_id, hypothesis, implementation):
        self.control_id = control_id
        self.hypothesis = hypothesis
        self.implementation = implementation
        self.falsification_attempts = []
        self.status = "conjecture"
    
    def test(self, attack_scenario):
        """
        Attempt to falsify the control's security hypothesis
        """
        result = attack_scenario.execute_against(self.implementation)
        
        self.falsification_attempts.append({
            'scenario': attack_scenario,
            'result': result,
            'timestamp': now()
        })
        
        if result.bypassed_control:
            self.status = "falsified"
            return self.propose_refinement(result)
        else:
            # Not proven, merely corroborated
            self.status = "corroborated"
            return self.increase_confidence()

# CIS Control 5.1: Establish and Maintain an Inventory
control_5_1 = SecurityControl(
    control_id="CIS-5.1",
    hypothesis="Complete asset inventory enables detection of unauthorized systems",
    implementation=AssetInventorySystem()
)

# Subject to continuous falsification attempts
control_5_1.test(RogueDeviceScenario())
control_5_1.test(ShadowITScenario())
control_5_1.test(TemporaryContractorDeviceScenario())

Degrees of Corroboration vs. Probability

Popper distinguished between degrees of corroboration (how well a theory has withstood testing) and probability. A theory that makes risky predictions and survives severe tests is highly corroborated, even if we can’t assign it a probability of being true.

This distinction is crucial for security metrics. Common approaches try to quantify security in probabilistic terms:

1
2
"There is a 73% probability our network is secure"
"Our security posture score is 8.2/10"

These numbers are often meaningless because they lack the theoretical foundation for probability assignment. We don’t have the frequency data or the causal models necessary for genuine probability estimates.

Instead, we should think in terms of corroboration:

1
2
3
4
5
6
7
8
9
Control: MFA on all external-facing authentication
Corroboration: 
  - Survived 15 red team engagements attempting MFA bypass
  - Withstood 3 months of production credential phishing campaigns
  - Tested against 47 documented MFA bypass techniques
  - 12 failed falsification attempts in purple team exercises
  
Status: Highly corroborated for defending against credential-based attacks
        Not tested against: MFA fatigue attacks, adversary-in-the-middle with real-time phishing proxies (identified as gap for next testing cycle)

This provides actionable information: where our security has been tested, where it’s survived scrutiny, and crucially, where it hasn’t been tested at all.

The Open Society and Security Through Transparency

Security Through Obscurity as Unfalsifiable Doctrine

Popper’s political philosophy in “The Open Society and Its Enemies” argued for transparent, falsifiable governance over closed, authoritarian systems. The parallel to “security through obscurity” is striking.

Security through obscurity is often unfalsifiable:

1
2
3
4
5
6
Claim: "Our system is secure because attackers don't know how it works"

Problem: This cannot be falsified without revealing the system design
         It explains success (attackers were confused) and failure 
         (attackers figured it out) equally well
         It's immune to criticism because criticism requires knowledge

Contrast with Kerckhoffs’s principle, which is fundamentally Popperian:

1
2
3
4
5
Claim: "Our cryptographic system remains secure even when the algorithm is public, relying solely on key secrecy"

This is falsifiable: Publish the algorithm and attempt to break it 
                     If broken without key compromise, hypothesis is falsified
                     If it survives public scrutiny, it's corroborated

Modern security benefits from this open approach:

  • Open source security tools: Subjected to global scrutiny and falsification attempts
  • Responsible disclosure: Security vulnerabilities are publicly tested hypotheses about system security
  • Public CTFs and bug bounties: Crowdsourced falsification attempts
  • Penetration test reports: Documented falsification attempts and results

The Paradox of Tolerance in Security

Popper’s paradox of tolerance states that unlimited tolerance must lead to the disappearance of tolerance. In security terms: unlimited trust inevitably leads to compromise.

This justifies defense-in-depth not as paranoia, but as rational skepticism:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
class SecurityLayering:
    """
    Popperian defense-in-depth: Each layer is a falsifiable hypothesis
    System security doesn't depend on any single layer being true
    """
    
    layers = [
        Layer("perimeter", hypothesis="Unauthorized traffic blocked at edge"),
        Layer("network", hypothesis="Segmentation prevents lateral movement"),
        Layer("host", hypothesis="Endpoint controls prevent execution"),
        Layer("application", hypothesis="App logic enforces authorization"),
        Layer("data", hypothesis="Encryption protects data at rest")
    ]
    
    def evaluate_security(self):
        """
        Security survives even if multiple layers are falsified
        Not because we trust any layer, but because we trust none completely
        """
        return all(layer.provides_independent_control() for layer in self.layers)

Practical Applications

Threat Modeling as Hypothesis Generation

Traditional threat modeling often focuses on enumerating threats. A Popperian approach generates falsifiable hypotheses:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
Traditional: "What threats exist?"
- Results in endless enumeration
- Never complete
- Focuses on known threats

Popperian: "What security properties must hold?"
- Results in testable hypotheses
- Focuses on critical assumptions
- Can be systematically falsified

Example:
Hypothesis: "Users cannot execute arbitrary code in the web application context"

Falsification attempts:
- XSS injection testing
- Template injection testing  
- Server-side request forgery
- File upload restrictions bypass
- Deserialization attacks

If any attempt succeeds → Hypothesis falsified → Remediate
If all attempts fail → Hypothesis corroborated → Document and continue testing

Continuous Validation Pipelines

Modern CI/CD can incorporate Popperian falsification:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
# .github/workflows/security-falsification.yml
name: Security Hypothesis Testing

on: [push, pull_request, schedule]

jobs:
  falsify-authentication-hypothesis:
    runs-on: ubuntu-latest
    steps:
      - name: Hypothesis
        run: |
          echo "All authentication endpoints require valid JWT tokens"
          echo "All endpoints reject expired/malformed/unsigned tokens"
      
      - name: Attempt Falsification - No Token
        run: |
          for endpoint in $AUTH_REQUIRED_ENDPOINTS; do
            response=$(curl -s -o /dev/null -w "%{http_code}" $endpoint)
            if [ $response != "401" ]; then
              echo "FALSIFIED: $endpoint accessible without authentication"
              exit 1
            fi
          done
      
      - name: Attempt Falsification - Expired Token
        run: |
          # Generate intentionally expired JWT
          expired_token=$(generate_expired_jwt)
          # Test all endpoints reject it
          test_token_rejection $expired_token
      
      - name: Attempt Falsification - Malformed Token
        run: |
          # Test various malformation attacks
          test_token_rejection "eyJhbGciOiJub25lIn0..."
          test_token_rejection "....${jndi:ldap://...}"
          
  falsify-authorization-hypothesis:
    runs-on: ubuntu-latest
    steps:
      - name: Hypothesis  
        run: |
          echo "Users can only access resources they own or have been granted access to"
      
      - name: Attempt Falsification - Horizontal Privilege Escalation
        run: |
          # User A should not access User B's resources
          test_idor_vulnerability
          
      - name: Attempt Falsification - Vertical Privilege Escalation
        run: |
          # Regular user should not access admin functions
          test_privilege_escalation

Security Metrics That Actually Mean Something

Popperian thinking leads to better security metrics focused on falsification attempts rather than vague scores:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
Bad Metrics (unfalsifiable):
- "Security posture: 87%"
- "Risk reduced by 45%"
- "99.9% of threats blocked"

Good Metrics (based on falsification):
- "Authentication bypassed in 0 of 47 red team attempts this quarter"
- "Lateral movement prevented in 12 of 12 purple team exercises"
- "Zero successful privilege escalations in 6 months of chaos testing"
- "MFA bypass attempts: 127 attempted, 0 successful, 3 new techniques identified"

Better yet, track corroboration over time:
{
  "control": "Network segmentation prevents database access from web tier",
  "falsification_history": [
    {"date": "2025-Q1", "attempts": 15, "successes": 0, "new_techniques": 2},
    {"date": "2025-Q2", "attempts": 23, "successes": 1, "remediation": "added app-level auth"},
    {"date": "2025-Q3", "attempts": 31, "successes": 0, "new_techniques": 5},
    {"date": "2025-Q4", "attempts": 28, "successes": 0, "new_techniques": 1}
  ],
  "corroboration_level": "High - survived 97 attempts including 8 novel techniques",
  "known_gaps": ["DNS exfiltration path not yet tested", "IPv6 paths under-tested"]
}

Challenges and Limitations

The Resource Problem

Popperian methodology demands rigorous, continuous testing. Attempting to falsify every security hypothesis requires significant resources:

1
2
3
4
5
6
7
8
9
10
Hypothesis: "Our EDR detects all in-memory .NET assembly execution"

Thorough falsification requires:
- Testing all .NET loading techniques (Assembly.Load, Assembly.LoadFile, etc.)
- Testing all execution contexts (PowerShell, InstallUtil, MSBuild, etc.)  
- Testing all obfuscation methods
- Testing all timing variations
- Testing on all OS versions and EDR versions

This could require hundreds of test cases

The pragmatic solution is risk-based prioritization: focus falsification efforts on hypotheses whose failure would be most consequential, and on attack techniques most likely to be employed by relevant threat actors.

The Creativity Arms Race

Popper noted that we can never enumerate all possible falsifiers. Attackers continuously invent new techniques that weren’t included in our falsification attempts:

1
2
3
4
We test Control X against: {Technique₁, Technique₂, ..., Techniqueₙ}
Attacker employs: Techniqueₙ₊₁

Our falsification attempts were incomplete

This is unavoidable. The response is to:

  1. Maximize diversity of falsification attempts through red team creativity
  2. Focus on architectural controls that resist classes of attacks
  3. Implement detection for when assumptions are violated
  4. Accept that perfect security is impossible and design for resilience

The Demarcation Problem in Practice

Determining whether a security claim is falsifiable can be subtle:

1
2
3
4
5
6
Claim: "Our AI-powered behavioral analytics detects anomalous behavior"

Is this falsifiable?
- Too vague as stated (what constitutes anomalous? what's the detection criteria?)
- Could be reformulated falsifiably: "System alerts on anomalies matching signatures S1-S47"
- Then falsifiable by: Execute known-anomalous behavior, verify alert generation

The lesson is that security claims must be made precise before they can be properly tested.

Conclusion: Toward a Popperian Security Practice

Karl Popper’s philosophy of science offers cybersecurity a rigorous epistemological foundation. The key insights are:

  1. Security cannot be verified, only falsified: Stop trying to prove systems are secure. Instead, make specific security claims and attempt to prove them wrong.
  2. Good security makes risky predictions: The value of a security control is proportional to how much it sticks its neck out with specific, testable predictions.
  3. Learn through error elimination: Security improves not by accumulating verified truths, but by systematically eliminating errors through testing and incidents.
  4. Embrace the asymmetry: Attackers need to falsify only one security claim; defenders must survive continuous falsification attempts. Design accordingly.
  5. Measure corroboration, not probability: Track how well security controls have withstood testing, not meaningless numerical scores.
  6. Practice critical rationalism: All security controls are conjectures subject to revision. Nothing is sacred; everything is testable.
  7. Openness enables falsification: Security through obscurity prevents the very testing that strengthens security. Transparency enables crowdsourced falsification.

Moving forward, security teams should:

  • Reframe security controls as falsifiable hypotheses
  • Build continuous falsification into development and operations
  • Track the history of falsification attempts and results
  • Design security architectures that fail gracefully when components are falsified
  • Replace unfalsifiable security theater with testable, specific claims
  • Embrace the adversarial mindset not as pessimism but as the scientific method applied to security

Popper argued that the growth of knowledge is one of humanity’s greatest achievements. In cybersecurity, knowledge grows not from accumulated detections or compliance checkboxes, but from the systematic, relentless attempt to prove our security assumptions wrong.

The attacker who falsifies our hypotheses is not our enemy but our teacher, albeit an unwelcome one. The question is whether we learn those lessons through actual incidents or through deliberate, controlled falsification attempts.

I know which Popper would choose.


Further Reading

Philosophy:

Security:

Applied:

This post is licensed under CC BY 4.0 by the author.