Enhancing IoT Security and
Information Systems Resilience

An Extreme Value Machine Approach for Open-Set Recognition
Ali Safari & Dr. Dan J. Kim
University of North Texas
ICIS 2025 | Nashville | Cybersecurity Track

The Growing IoT Security Challenge

27B
IoT Devices by 2025

(Industry estimates: 21.1B by end of 2025, 39B by 2030)

Zero-Day
Attacks Cause the Most Damage

Attacks no one has seen before

Heavy-Tail
Loss Distribution

Cybersecurity losses follow a heavy-tail distribution: rare events cause disproportionately large losses.

The Core Problem: Traditional intrusion detection systems work under a closed-set assumption. They can only detect attacks they have seen before.

Key Question: How do we protect billions of IoT devices from attacks we have never seen before?

When a new attack appears, these systems fail. This creates serious risk for organizations that depend on IoT networks.

27B: Choudhary (2024). Internet of Things: Overview. Discover IoT [Academic] | 21.1B (2025), 39B (2030): IoT Analytics (2025) [Industry]

Closed-Set vs Open-Set Recognition


Closed-Set (Traditional)

  • Assumes all attack types are known
  • Classifies everything into known categories
  • Cannot say "I don't know this"
  • Fails silently on new threats

Open-Set (Our Approach)

  • Accepts that new attacks will appear
  • Models boundary of known classes
  • Can reject unknown samples
  • Alerts on novel threats
Scheirer et al. (2013). Toward Open Set Recognition. IEEE TPAMI.

How EVM Works: Extreme Value Theory

The Weibull distribution models the boundary of each known class: it is fitted to the tail of within-class distances, and points beyond the resulting threshold are rejected as unknown.

The EVM algorithm:

1. Calculate distances from the new sample to known class samples
2. Fit a Weibull distribution to the extreme distances
3. Compute an inclusion probability for each class
4. Reject as unknown if the probability is too low for ALL classes
[Architecture diagram: IoT network traffic feeds the EVM core (distance calculation, Weibull fitting, probability check), which outputs either a known-attack classification or an unknown-attack alert (99.98% detection).]
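A minimal Python sketch of these four steps, assuming scikit-learn-style feature arrays; scipy's weibull_min stands in for the tail fit, and the function names, the nearest-neighbour distance choice, and the 0.5 rejection threshold are illustrative simplifications rather than the exact libEVM implementation:

import numpy as np
from scipy.stats import weibull_min
from scipy.spatial.distance import cdist

def fit_class_tails(X_train, y_train, tail_size=20):
    # Steps 1-2: per known class, fit a Weibull to the largest within-class distances.
    models = {}
    for c in np.unique(y_train):
        Xc = X_train[y_train == c]
        d = cdist(Xc, Xc, metric="cosine")                        # pairwise cosine distances
        tail = np.sort(d[np.triu_indices_from(d, k=1)])[-tail_size:]
        shape, _, scale = weibull_min.fit(tail, floc=0.0)         # Weibull fit on extreme distances only
        models[c] = (Xc, shape, scale)
    return models

def predict_open_set(x, models, reject_below=0.5):
    # Steps 3-4: inclusion probability per class; reject as unknown if all are low.
    probs = {}
    for c, (Xc, shape, scale) in models.items():
        d_min = cdist(x[None, :], Xc, metric="cosine").min()      # distance to nearest known sample
        probs[c] = float(weibull_min.sf(d_min, shape, scale=scale))  # 1 - CDF: inclusion probability
    best = max(probs, key=probs.get)
    return best if probs[best] >= reject_below else "unknown"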
Rudd et al. (2018). The Extreme Value Machine. IEEE TPAMI.

Research Methodology

Primary Dataset: Kitsune

  • 100,000 samples (10k per class)
  • 9 attack types + benign traffic
  • Video Injection as "unknown"
  • 70/30 train-test split

Validation Dataset: IoT-23

  • 323,517 network flows
  • 20 malware scenarios
  • Leave-one-scenario-out protocol

Preprocessing

  • Variance threshold filtering
  • Standard scaling
  • Feature selection (top 50)
  • Unknown class held out completely

Baselines

  • Isolation Forest
  • One-Class SVM
Mirsky et al. (2018). Kitsune Dataset. | García et al. (2020). IoT-23 Dataset.

Results: Kitsune Dataset

99.98% Unknown Attack Recall
Metric | EVM | Isolation Forest | One-Class SVM
Unknown Recall | 0.9998 | 0.0318 | 0.0219
Unknown Precision | 0.3227 | 0.1172 | 0.0854
Training Time | ~2.1s | ~0.5s | ~21.7s

EVM detected 9,998 out of 10,000 unknown attack samples. Baselines detected only 2-3%.

Video Injection attack held out as unknown class.

Results: IoT-23 Dataset (Generalizability)

89.1%
Mean Unknown Recall
0.83
Mean AUROC
58.2%
Recall @ 5% FPR
Metric | Minimum | Maximum | Mean
Unknown Recall | 0.564 | 1.000 | 0.891
Unknown Precision | 0.002 | 0.164 | 0.104
Known Accuracy | 0.766 | 0.993 | 0.885
20 malware scenarios tested with leave-one-out protocol.

Statistical Validation: DiD Analysis

We used Difference-in-Differences analysis to compare EVM against baselines across all 20 IoT-23 scenarios.

Metric | vs Isolation Forest | p-value | vs One-Class SVM | p-value
Unknown Recall | +0.436 | p < 0.001 | +0.352 | p < 0.01
Unknown Precision | +0.262 | p < 0.001 | +0.212 | p < 0.001
Overall Accuracy | +0.058 | p < 0.001 | +0.043 | p < 0.001

Statistically Significant

EVM improvements are consistent across diverse malware families, not just in aggregate.

Paired t-test results with n=20 scenarios.

Ablation Studies: What Matters?

Feature Selection

Features Recall Accuracy
Ports only (2) 0.620 0.446
Ports + Protocol (3) 0.963 0.639
+ Duration (4) 0.963 0.600
All features (10) 0.963 0.558

Adding protocol improves recall from 62% to 96%. More features add noise.

Tail Size Parameter

Tail Size Recall FPR
5 0.824 0.167
10-20 0.964 0.510-0.555
50 0.824 0.168

Tail size controls sensitivity. Organizations can tune based on risk tolerance.

Ablation on IoT-23 dataset across 20 scenarios.

Theoretical Foundation: D&M IS Success Model

We interpret our findings through the DeLone & McLean IS Success Model.

System Quality

89-99%

Unknown Recall

vs 2-3% for baselines

Information Quality

32%

Precision

vs 8-12% for baselines

Net Benefits

Resilience

Prevent severe losses

Heavy-tail risk mitigation

The Trade-off: EVM's high recall means more alerts, but in IoT security, missing an attack is more costly than investigating false positives. False positives can be filtered; missed attacks cause severe damage.

DeLone & McLean (2003). D&M Model of IS Success: A Ten-Year Update. JMIS.

Practical Implications for Organizations

High-Security Environments

Use tail_size = 20 for maximum detection. Accept higher alert volume. Secondary verification for flagged traffic.

Resource-Constrained Settings

Use tail_size = 5 for lower FPR. Still achieves 82% unknown recall. Balance detection vs operational load.

Operating Point Guidance: At a 5% false-positive budget, EVM still recovers 58% of unknown threats. This gives practitioners a concrete deployment configuration.

Controllable trade-off based on organizational risk tolerance.

Limitations and Future Work

Current Limitations

  • Computational constraints required data sampling
  • Focus on offline processing
  • Comparison limited to classical baselines
  • Single train/test split (reproducible)

Future Directions

  • Streaming EVM for real-time detection
  • Benchmark against deep learning OSR methods
  • Cross-validation for stability confirmation
  • Integration with existing IDS infrastructure
Planned extensions to address current constraints.

Conclusion

Key Contribution

We demonstrate how Extreme Value Theory provides a theoretically grounded solution to open-set recognition in IoT security, achieving 89-99% unknown attack recall with statistically significant improvements over baselines.

99.98%
Kitsune Recall
89.1%
IoT-23 Mean Recall
p<0.001
Statistical Significance
Tunable
Risk Trade-off

EVM enables organizations to detect unforeseen threats and build resilient IoT security systems.

Safari & Kim (2025). Enhancing IoT Security and IS Resilience. ICIS 2025.

Thank You

Questions?

Paper QR Code

Scan for Paper

Ali Safari

alisafari@my.unt.edu

University of North Texas

Appendix

A1: Extreme Value Theory (EVT) - Foundation

What is EVT? A branch of statistics dealing with extreme deviations from the median of probability distributions. It models the behavior of maximum or minimum values.

Key Concepts

  • Models tail behavior of distributions
  • Fisher-Tippett-Gnedenko theorem
  • Generalized Extreme Value (GEV) distribution
  • Peak Over Threshold (POT) method

Why for Security?

  • Attacks are rare events (in tails)
  • Heavy-tailed loss distributions
  • Principled threshold selection
  • Statistical guarantees
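A toy illustration of the Peak Over Threshold (POT) method listed above, fitting a generalized Pareto distribution to the exceedances of a synthetic heavy-tailed loss sample; the lognormal data and the 99th-percentile threshold are illustrative choices, not our IoT measurements:

import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(42)
losses = rng.lognormal(mean=10.0, sigma=2.0, size=50_000)   # synthetic heavy-tailed losses

threshold = np.quantile(losses, 0.99)                       # POT: keep only the extreme values
exceedances = losses[losses > threshold] - threshold

shape, _, scale = genpareto.fit(exceedances, floc=0.0)
# Conditional probability that a loss already above the threshold exceeds 10x the threshold
p_10x = genpareto.sf(9.0 * threshold, shape, scale=scale)
print(f"threshold = {threshold:,.0f}, GPD shape = {shape:.2f}, P(>10x | >threshold) = {p_10x:.4f}")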
Rocco (2014). Extreme Value Theory in Finance: A Survey. Journal of Economic Surveys.

EVM & Weibull Assumptions (Simple Check)

What EVM Really Assumes

  • We do not assume "all data are Weibull".
  • EVM uses distances between samples inside each class.
  • It models only the largest distances (the tail) with a Weibull curve.
  • Extreme Value Theory says: for many distributions, the tail can be approximated by an extreme value law such as Weibull.

What We Tested on IoT-23

  • Used the same processed IoT-23 feature sample as in our experiments.
  • Fitted our EVM with cosine distance and tail_size = 20.
  • For Benign and Malicious classes, took nearest-neighbour distance tails (largest distances).
  • Fitted a Weibull distribution only on these extreme distances for each class.
  • The fits are numerically stable and follow the empirical tails reasonably well.

Takeaway: EVM does not require the whole IoT traffic to be Weibull. We only model extreme distances, and on IoT-23 these tails are well behaved and can be fitted by a Weibull distribution in practice.

Based on Rudd et al. (2018), Vignotto & Engelke (2020), and our own IoT-23 tail-fit check.

A2: Weibull Distribution in EVM

The EVM uses the Weibull distribution to model extreme distances between samples.

Shape (k)

Controls the tail behavior. k < 1: heavy tail, k = 1: exponential, k > 1: light tail

Scale (λ)

Controls the spread of the distribution. Larger λ means more spread.

Location (θ)

Shifts the distribution. In EVM, often set based on minimum distances.

Probability Function: P(x) = 1 - exp(-((x-θ)/λ)^k) for x ≥ θ
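A quick numerical check that scipy's three-parameter Weibull matches this formula; the shape, scale, and location values below are arbitrary, chosen only for illustration:

import numpy as np
from scipy.stats import weibull_min

k, lam, theta = 1.5, 0.3, 0.05        # shape, scale, location (illustrative values)
x = np.array([0.1, 0.3, 0.6, 1.0])    # candidate distances, all >= theta

p_manual = 1.0 - np.exp(-(((x - theta) / lam) ** k))      # formula from this slide
p_scipy = weibull_min.cdf(x, k, loc=theta, scale=lam)     # scipy equivalent
print(np.allclose(p_manual, p_scipy))                     # True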

Weibull (1951). A Statistical Distribution Function of Wide Applicability.

A3: EVM Algorithm Details

Training Phase

1. Compute pairwise distances
2. Select extreme distances (tail)
3. Fit Weibull to each class
4. Store class models

Inference Phase

1. Calculate distance to each class
2. Compute inclusion probability
3. Compare to threshold
4. Accept or reject
Rudd et al. (2018). The Extreme Value Machine. IEEE TPAMI.

A4: Distance Metrics in EVM

Metric | Formula | Properties | Used in Our Study
Cosine Distance | 1 - cos(θ) | Scale invariant, [0,2] range | Yes (Primary)
Euclidean Distance | ||x - y||₂ | Sensitive to scale | No
Manhattan Distance | ||x - y||₁ | Robust to outliers | No
Mahalanobis Distance | √((x-y)ᵀΣ⁻¹(x-y)) | Accounts for covariance | No

Why Cosine? Network traffic features have varying scales. Cosine distance focuses on direction rather than magnitude, making it robust for our feature space.
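A small check of the scale-invariance argument; the vectors are arbitrary illustrations:

import numpy as np
from scipy.spatial.distance import cosine, euclidean

a = np.array([1.0, 2.0, 3.0])
b = 1000.0 * a                          # same direction, very different magnitude

print(cosine(a, b))                     # ~0.0: cosine distance ignores scale
print(euclidean(a, b))                  # large: Euclidean distance is dominated by magnitude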

EVM implementation uses cosine distance as default for network traffic analysis.

A5: Kitsune Dataset - Complete Description

Dataset Overview

  • Created by Mirsky et al. (2018)
  • Captured from real IoT network
  • Contains 9 attack types + benign
  • Features extracted using AfterImage
  • 115 original features

Our Sampling

  • 10,000 samples per class
  • 100,000 total samples
  • Balanced class distribution

Network Setup

  • IP cameras, smart devices
  • IoT gateway
  • Standard home network topology
  • Traffic captured via mirroring

Feature Types

  • Packet size statistics
  • Time interval statistics
  • Channel behavior (src-dst pairs)
  • Socket behavior
Mirsky et al. (2018). Kitsune: An Ensemble of Autoencoders. arXiv:1802.09089.

A6: Kitsune Attack Types (All 9)

# | Attack Type | Description | Category
1 | Active Wiretap | Man-in-the-middle eavesdropping | Reconnaissance
2 | ARP MitM | ARP spoofing for traffic interception | MitM
3 | Fuzzing | Random/malformed packet injection | Fuzzing
4 | Mirai Botnet | IoT botnet infection | Malware
5 | OS Scan | Operating system fingerprinting | Reconnaissance
6 | SSDP Flood | UDP amplification DDoS | DoS
7 | SSL Renegotiation | SSL/TLS resource exhaustion | DoS
8 | SYN DoS | TCP SYN flood attack | DoS
9 | Video Injection | IP camera feed hijacking | Unknown (held out)
Video Injection selected as unknown class due to distinct traffic patterns.

A7: IoT-23 Dataset - Complete Description

323,517
Total Network Flows
23
Capture Scenarios
20
Malware Families

Dataset Composition

  • 3 benign honeypot captures
  • 20 malware infection scenarios
  • Real IoT malware samples
  • Labeled at connection level

Our Protocol

  • Leave-one-scenario-out
  • Each malware as unknown once
  • 20 experimental runs
  • Aggregate statistics reported
García et al. (2020). IoT-23 Dataset. Stratosphere Laboratory, Czech Technical University.

A8: IoT-23 Malware Scenarios (All 20)

# | Malware | Type
1 | Mirai | Botnet
2 | Hajime | Botnet
3 | Linux.Hakai | Botnet
4 | Linux.Tsunami | Backdoor
5 | Gafgyt | Botnet
6 | Muhstik | Botnet
7 | IRCBot | Botnet
8 | Linux.Okiru | Botnet
9 | Kenjiro | Botnet
10 | Torii | Botnet
11 | Trojan | Trojan
12 | Hide and Seek | Botnet
13 | Linux.Miori | Botnet
14 | Hakai | Botnet
15 | DDoS | DDoS
16 | Linux.Gafgyt | Botnet
17 | PartOfAHorizontalPortScan | Scan
18 | Okiru | Botnet
19 | C&C | C&C
20 | FileDownload | Malware
Diverse malware families ensure robust generalizability testing.

A9: Complete Preprocessing Pipeline

1. Separate unknown class
2. Variance threshold (0.0)
3. Train/test split (70/30)
4. Standard scaler
5. Simple imputer (mean)
6. SelectKBest (top 50)
7. Transform all sets
8. Combine test set

Critical: All transformers fitted ONLY on training data to prevent data leakage. Unknown class never seen during training.
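A condensed scikit-learn sketch of steps 1-8, fitting every transformer on the known-class training split only; the synthetic arrays and shapes are placeholders for the real Kitsune features:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import VarianceThreshold, SelectKBest, mutual_info_classif
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X_known = rng.normal(size=(1000, 115)); y_known = rng.integers(0, 9, size=1000)  # placeholder data
X_unknown = rng.normal(size=(200, 115))                      # held-out class, never used in fitting

X_tr, X_te, y_tr, y_te = train_test_split(
    X_known, y_known, test_size=0.30, stratify=y_known, random_state=42)

prep = Pipeline([
    ("variance", VarianceThreshold(threshold=0.0)),          # drop zero-variance features
    ("scale", StandardScaler()),
    ("impute", SimpleImputer(strategy="mean")),
    ("select", SelectKBest(mutual_info_classif, k=50)),      # top 50 by mutual information
])

X_tr_p = prep.fit_transform(X_tr, y_tr)                      # fit ONLY on training data
X_te_p = prep.transform(X_te)                                # transform known test set
X_unknown_p = prep.transform(X_unknown)                      # transform held-out unknown class
X_eval = np.vstack([X_te_p, X_unknown_p])                    # combined open-set test set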

Reproducible with random_state=42 for all random operations.

A10: Feature Selection Details

Method: Mutual Information

  • SelectKBest with mutual_info_classif
  • Non-parametric, captures nonlinear relationships
  • Selected top 50 features from 115
  • Fitted only on known training classes

Why 50 Features?

  • Balance between information and noise
  • Reduces computational complexity
  • Prevents overfitting
  • Empirically validated

Top Feature Categories

Category Count
Packet Size Stats 18
Time Interval Stats 15
Channel Behavior 10
Socket Features 7
mutual_info_classif from scikit-learn used for feature ranking.

A11: Kitsune Results - Video Injection (Full)

Metric | EVM | Isolation Forest | One-Class SVM
Overall Accuracy | 0.3824 | 0.0842 | 0.0831
Macro F1-Score | 0.2318 | 0.0276 | 0.0242
Unknown Recall | 0.9998 | 0.0318 | 0.0219
Unknown Precision | 0.3227 | 0.1172 | 0.0854
Training Time (s) | ~2.1 | ~0.5 | ~21.7
Inference Time (s) | ~14.3 | ~0.2 | ~12.1

Key Finding: EVM detected 9,998/10,000 unknown samples (Video Injection). Baselines detected only 318 and 219 respectively.

Test set: 27,000 known + 10,000 unknown = 37,000 samples.

A12: Kitsune Results - SSDP Flood (Replication)

Metric | EVM | Isolation Forest | One-Class SVM
Overall Accuracy | 0.2943 | 0.1378 | 0.0882
Macro F1-Score | 0.1336 | 0.0981 | 0.0312
Unknown Recall | 0.9996 | 0.2074 | 0.0383
Unknown Precision | 0.2941 | 0.4607 | 0.1398

Replication Confirms Findings

Near-perfect unknown recall (99.96%) replicated with different unknown class (SSDP Flood), confirming EVM's robustness.

Same methodology, different unknown class for validation.

A13: IoT-23 Results - All 20 Scenarios

Scenario Unk. Recall Unk. Precision Known FPR AUROC
Minimum 0.564 0.002 0.238 0.780
25th Percentile 0.824 0.056 0.320 0.805
Median 0.923 0.098 0.401 0.832
75th Percentile 0.982 0.145 0.478 0.867
Maximum 1.000 0.164 0.577 0.920
Mean 0.891 0.104 0.400 0.831
Std. Dev. 0.132 0.048 0.098 0.042

Consistency: EVM achieves >80% unknown recall in 75% of scenarios, demonstrating robust performance across diverse malware families.

Statistics computed across 20 leave-one-out experimental runs.

A14: Confusion Matrix Analysis

Known Class Confusions

Some misclassifications occur between:

  • DoS attacks (similar traffic patterns)
  • Reconnaissance attacks (similar probing)
  • Benign vs low-intensity attacks

These confusions are within-category and less critical than missing unknown attacks.

Unknown Detection

EVM behavior on unknowns:

  • 99.98% correctly rejected
  • 0.02% misclassified to known
  • Misclassified unknowns go to similar attack types

Even when wrong, EVM flags the sample as an attack (not benign).

Critical Insight: The confusion pattern shows that EVM errs on the side of caution; it rarely misses attacks entirely.

Confusion matrix analysis from Kitsune experiments.

A15: ROC Curve and AUROC

What is ROC?

  • Receiver Operating Characteristic
  • Plots TPR vs FPR at all thresholds
  • Shows trade-off between sensitivity and specificity
  • Area Under Curve (AUROC) summarizes performance

Interpretation

  • AUROC = 0.5: Random classifier
  • AUROC = 1.0: Perfect classifier
  • AUROC > 0.8: Good discrimination

Our Results

0.831
Mean AUROC on IoT-23

Range: 0.780 - 0.920 across scenarios

Indicates good discrimination between known and unknown classes.
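For reference, the AUROC for the binary known-vs-unknown task can be computed as below; the toy labels and scores (higher score = more likely unknown) are illustrative, not our actual model outputs:

import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
y_true = np.r_[np.ones(200), np.zeros(1800)]                 # 1 = unknown attack, 0 = known traffic
scores = np.r_[rng.normal(0.7, 0.15, 200), rng.normal(0.4, 0.15, 1800)]  # e.g. 1 - max inclusion prob.

print(f"AUROC = {roc_auc_score(y_true, scores):.3f}")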

AUROC computed for binary known/unknown classification task.

A16: Precision-Recall Trade-off

High Recall (EVM)

99.98%

Catches almost all unknowns

Cost: More false positives

Lower Precision

32.27%

Some known flagged as unknown

Benefit: Still 3x better than baselines

Why Favor Recall? In security, missing an attack (false negative) is typically far more costly than a false alarm (false positive). Zero-day attacks cause disproportionate damage.

Trade-off inherent in open-set recognition where unknown space is infinite.

A17: Operating Point Analysis

Practitioners need concrete guidance for deployment. We analyze performance at fixed false positive rates.

False Positive Rate Budget | Unknown Recall Achieved | Practical Interpretation
1% | ~35% | Very conservative, misses many
5% | 58.2% | Recommended operating point
10% | ~72% | Moderate alert load
20% | ~85% | High sensitivity mode

Deployment Recommendation

At 5% FPR, EVM catches 58% of unknown attacks while maintaining manageable alert volume. Use secondary verification for flagged traffic.
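A sketch of how such operating points can be read off a ROC curve; the helper function and the toy scores are illustrative assumptions, not the exact analysis code:

import numpy as np
from sklearn.metrics import roc_curve

def recall_at_fpr(y_true, scores, fpr_budget):
    # Unknown-attack recall (TPR) achievable within a given false-positive budget.
    fpr, tpr, _ = roc_curve(y_true, scores)
    return float(np.interp(fpr_budget, fpr, tpr))

rng = np.random.default_rng(42)
y_true = np.r_[np.ones(200), np.zeros(1800)]                 # 1 = unknown, 0 = known
scores = np.r_[rng.normal(0.7, 0.15, 200), rng.normal(0.4, 0.15, 1800)]

for budget in (0.01, 0.05, 0.10, 0.20):
    print(f"FPR budget {budget:.0%}: unknown recall = {recall_at_fpr(y_true, scores, budget):.2f}")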

Analysis based on IoT-23 results across 20 scenarios.

A18: Difference-in-Differences Analysis

What is DiD? A statistical technique that compares the difference in outcomes between treatment and control groups across multiple conditions.

Our Application

  • Treatment: EVM model
  • Control: Baseline models
  • Conditions: 20 malware scenarios
  • Outcome: Unknown recall, precision

Why DiD?

  • Controls for scenario-specific effects
  • Tests consistency across conditions
  • Provides statistical significance
  • More robust than simple averaging

Result: EVM improvement of +0.436 in unknown recall vs Isolation Forest is statistically significant (p < 0.001) across all 20 scenarios.

Paired t-test used for statistical significance testing.

A19: Statistical Testing Details

Test | Purpose | Result | Interpretation
Paired t-test (EVM vs IF) | Unknown Recall Difference | p = 5.4×10⁻⁴ | Highly significant
Paired t-test (EVM vs OCSVM) | Unknown Recall Difference | p = 1.5×10⁻³ | Highly significant
Effect Size (Cohen's d) | Magnitude of difference | d > 1.5 | Large effect
95% Confidence Interval | Precision of estimate | [+0.35, +0.52] | Narrow, reliable

Statistical Power: With n=20 scenarios and observed effect sizes, our tests have >95% power to detect true differences.
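A sketch of the paired comparison on per-scenario recall values; the toy arrays stand in for the 20 IoT-23 results, and the paired-samples Cohen's d formula is an assumption about the exact variant used:

import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(42)
recall_evm = rng.uniform(0.56, 1.00, size=20)                      # per-scenario unknown recall (toy)
recall_baseline = np.clip(recall_evm - 0.44 + rng.normal(0, 0.10, 20), 0.0, 1.0)

t_stat, p_value = ttest_rel(recall_evm, recall_baseline)           # paired t-test across 20 scenarios
diff = recall_evm - recall_baseline
cohens_d = diff.mean() / diff.std(ddof=1)                          # paired-samples effect size
print(f"t = {t_stat:.2f}, p = {p_value:.1e}, Cohen's d = {cohens_d:.2f}")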

All tests conducted at α = 0.05 significance level.

A20: Full Feature Set Ablation

Feature Set | # Features | Unk. Recall | Unk. Precision | Known FPR | Accuracy
Ports only | 2 | 0.620 | 0.161 | 0.310 | 0.446
Ports + Protocol | 3 | 0.963 | 0.223 | 0.252 | 0.639
+ Duration | 4 | 0.963 | 0.178 | 0.342 | 0.600
+ Bytes (src/dst) | 6 | 0.963 | 0.159 | 0.451 | 0.500
All available | 10 | 0.963 | 0.162 | 0.396 | 0.558

Finding

Protocol is the key feature. Adding it improves recall from 62% to 96%. Additional features do not improve recall but increase false positives.

IoT-23 dataset, leave-one-out protocol, tail_size=20.

A21: Full Tail Size Parameter Ablation

Tail Size Unk. Recall Unk. Precision Known FPR Accuracy
5 0.824 0.241 0.167 0.668
10 0.964 0.121 0.510 0.444
20 0.964 0.105 0.555 0.420
50 0.824 0.240 0.168 0.664

Aggressive (tail_size=10-20)

96% recall, higher FPR. For high-security environments.

Conservative (tail_size=5 or 50)

82% recall, lower FPR. For resource-constrained settings.

Tail size controls sensitivity-specificity trade-off.

A22: Computational Complexity Analysis

Operation | EVM | Isolation Forest | One-Class SVM
Training Complexity | O(n² · d) | O(n · log n · t) | O(n² · d)
Inference Complexity | O(m · n · d) | O(m · t · log n) | O(m · sv · d)
Training Time (70k samples) | ~2.1s | ~0.5s | ~21.7s
Inference Time (37k samples) | ~14.3s | ~0.2s | ~12.1s
Memory Usage | Moderate | Low | High

n = training samples, d = features, m = test samples, t = trees, sv = support vectors

Practical: EVM training is fast (~2s), inference is reasonable for near-real-time (~0.4ms per sample). Suitable for IoT gateway deployment.

Benchmarked on standard compute resources.

A23: DeLone & McLean IS Success Model

[D&M IS Success Model diagram: System Quality (EVM: 89-99% recall) and Information Quality (precision: 32%) drive Use and User Satisfaction, which produce Net Benefits (organizational resilience) with a feedback loop; Service Quality is not directly affected.]
DeLone & McLean (2003). D&M Model of IS Success: A Ten-Year Update. JMIS, 19(4), 9-30.

A24: System Quality - Detailed Mapping

Definition: System Quality refers to the technical characteristics of the system, including reliability, flexibility, response time, and security.

D&M Dimensions

  • Reliability: System performs consistently
  • Flexibility: Adapts to new situations
  • Security: Protects against threats
  • Response Time: Quick processing

EVM Contribution

  • Reliability: 89-99% detection across datasets
  • Flexibility: Detects unknown attacks
  • Security: OSR capability
  • Response: ~0.4ms per sample
30x
Improvement in Unknown Attack Detection vs Baselines
EVM directly enhances System Quality dimension of IS Success.

A25: Information Quality - Detailed Mapping

Definition: Information Quality refers to the quality of outputs: accuracy, completeness, relevance, and timeliness.

D&M Dimensions

  • Accuracy: Correctness of information
  • Completeness: All relevant info present
  • Relevance: Information is useful
  • Timeliness: Information is current

EVM Trade-offs

  • Accuracy: 32% precision (3x better)
  • Completeness: 99.98% recall (comprehensive)
  • Relevance: Catches unknown threats
  • Timeliness: Near real-time

The Challenge: High recall means more alerts. While precision is 3x better than baselines, alert volume may impact analyst workload. Secondary filtering recommended.

Trade-off between completeness and analyst workload.

A26: Net Benefits - Organizational Resilience

Definition: Net Benefits represent the overall impact on individuals, organizations, and society.

Risk Reduction

Detecting unknown attacks prevents zero-day compromises that cause disproportionate damage.

Cost Avoidance

Heavy-tail loss distribution means catching rare attacks provides outsized value.

Operational Continuity

Preventing network compromise maintains business operations.

Resilience = Absorb + Adapt + Recover

EVM enhances organizational ability to absorb unknown threats, adapt detection capabilities, and recover from security incidents.

Eling & Wirfs (2019). What are the actual costs of cyber risk events? EJOR.

A27: Heavy-Tail Loss Distribution

What is Heavy-Tail?

  • Most incidents cause small losses
  • Rare incidents cause extreme losses
  • Tail probability decays slowly
  • Mean can be dominated by extremes

Implications for Security

  • Average loss understates risk
  • Single incident can be catastrophic
  • Traditional risk models fail
  • EVT is designed for this

Cyber Attack Loss Data

Median Loss: ~$50,000

Mean Loss: ~$3.6 million

99th Percentile: >$100 million

The gap between median and mean indicates heavy tails. Zero-day attacks often fall in the extreme tail.
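The median-mean gap is easy to reproduce with a synthetic heavy-tailed sample; the lognormal parameters below are illustrative and not calibrated to the Eling & Wirfs data:

import numpy as np

rng = np.random.default_rng(42)
losses = rng.lognormal(mean=11.0, sigma=2.2, size=100_000)   # synthetic cyber-loss sample

print(f"median loss:     ${np.median(losses):>15,.0f}")
print(f"mean loss:       ${losses.mean():>15,.0f}")
print(f"99th percentile: ${np.percentile(losses, 99):>15,.0f}")
# The mean is pulled far above the median by a handful of extreme losses: the heavy-tail signature.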

Eling & Wirfs (2019). European Journal of Operational Research.

A28: Comparison with Deep Learning Methods

Aspect | EVM | Deep Learning OSR
Theoretical Foundation | EVT (strong) | Learned representations
Interpretability | High (statistical) | Low (black box)
Training Data Needs | Moderate | Large
Computational Cost | Low-Medium | High
IoT Deployment | Feasible | Challenging
State-of-the-Art Performance | Strong (this paper) | Potentially higher

Future Work: Benchmark EVM against deep OSR methods (OpenMax, OSDN) when computational resources permit.

Geng et al. (2020). Recent Advances in Open Set Recognition. IEEE TPAMI.

A29: Why Not Autoencoder?

Original Plan: We intended to include Autoencoder as a baseline but encountered persistent Python environment conflicts during setup.

Autoencoder Approach

  • Learn compressed representation
  • Reconstruct input
  • High reconstruction error = anomaly
  • Popular for anomaly detection

Challenges for OSR

  • Threshold selection is arbitrary
  • No probabilistic rejection
  • May reconstruct novel patterns
  • Less principled than EVT

Future Work

Include Autoencoder and Variational Autoencoder comparisons in extended study.

Environment issues documented in methodology section.

A30: Implementation Details

Libraries Used

Python 3.8+
scikit-learn 1.0+
pandas 1.3+
numpy 1.21+
libEVM Latest

EVM Configuration

Distance Metric Cosine
Tail Size 20
Cover Threshold Default
Inclusion Threshold Calibrated

Reproducibility: All random operations use random_state=42. Code and preprocessed data available upon request.

libEVM: github.com/EMRResearch/libEVM

A31: libEVM Library Details

libEVM is a Python implementation of the Extreme Value Machine algorithm by Rudd et al. (2018).

Key Features

  • Multiple distance metrics
  • Configurable tail size
  • Incremental learning support
  • Probability calibration
  • scikit-learn compatible API

Usage Example

from libEVM import EVM

evm = EVM(tail_size=20, distance_metric='cosine')
evm.fit(X_train, y_train)
probs = evm.predict_proba(X_test)
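Continuing the snippet above, an open-set decision can be derived from the class probabilities; the assumption that predict_proba returns one inclusion probability per known class, and the 0.5 rejection threshold, are illustrative:

import numpy as np

max_prob = probs.max(axis=1)                     # highest inclusion probability over known classes
pred_class = probs.argmax(axis=1)

UNKNOWN = -1
open_set_pred = np.where(max_prob >= 0.5, pred_class, UNKNOWN)   # reject low-confidence samples as unknown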

Henrydoss et al. (2017). Incremental Open Set Intrusion Recognition Using EVM. IEEE ICMLA.

A32: Related Work - Key Papers

Paper | Contribution | Relation to Our Work
Scheirer et al. (2013) | Defined Open Set Recognition | Foundational framework
Rudd et al. (2018) | Proposed EVM | Our core method
Henrydoss et al. (2017) | EVM for intrusion detection | Prior application, different setup
Mirsky et al. (2018) | Kitsune dataset & autoencoder | Our primary dataset
García et al. (2020) | IoT-23 dataset | Our validation dataset
Geng et al. (2020) | OSR survey | Context and baselines
Full references in paper bibliography.

A33: Limitation - Computational Constraints

Challenge: Initial experiments with full datasets encountered memory constraints during EVM training.

Constraint Details

  • EVM computes pairwise distances
  • O(n²) memory for distance matrix
  • Full Kitsune: millions of samples
  • Required 100k sample subset

Mitigation

  • Stratified sampling (10k per class)
  • Balanced class representation
  • IoT-23 validation confirms findings
  • Results still significant

Future: Streaming EVM

Implement online/streaming version that processes data incrementally without full distance matrix.

Computational scalability is active research direction.

A34: Limitation - Offline Processing

Current State: Experiments conducted on static datasets, not real-time traffic streams.

Implications

  • No concept drift handling
  • No real-time latency testing
  • Assumes preprocessed features
  • No integration with live IDS

Path Forward

  • Streaming data pipeline
  • Incremental model updates
  • Real-time feature extraction
  • Production deployment testing

Note: Inference time of ~0.4ms per sample suggests real-time processing is feasible; needs validation in production environment.

Future work: Real-time deployment evaluation.

A35: Future Work - Streaming EVM

Proposed pipeline: packet capture → real-time feature extraction → streaming EVM → alert/decision

Technical Requirements

  • Online Weibull parameter updates
  • Sliding window for tail estimation
  • Approximate distance computation
  • Memory-bounded operation

Expected Benefits

  • Adapt to concept drift
  • Handle infinite data streams
  • Deploy on edge devices
  • Real-time protection
Proposed extension for production deployment.

A36: Future Work - Deep Learning Comparison

Planned comparison with state-of-the-art deep learning OSR methods:

Method | Approach | Expected Trade-offs
OpenMax | Calibrated softmax with EVT | Better with large data
OSDN | Open Set Deep Network | Requires more training data
Contrastive Learning | Learn discriminative embeddings | Higher computational cost
Variational OSR | Generative modeling | Complex training

Research Question: Do deep methods' potential performance gains justify increased complexity and reduced interpretability for IoT deployment?

Planned as extended journal version of this work.

A37: IoT Security Landscape

27B
Devices by 2025
$1.1T
IoT Market Size
57%
Vulnerable Devices

Security Challenges

  • Resource-constrained devices
  • Heterogeneous protocols
  • Large attack surface
  • Difficult to patch

Attack Trends

  • Mirai-style botnets
  • Supply chain attacks
  • Zero-day exploits
  • Targeted industrial attacks
Choudhary (2024). IoT Overview. Discover Internet of Things.

A38: Zero-Day Attack Examples

Attack | Year | Target | Impact
Mirai Botnet | 2016 | IoT devices | Major DNS outage
VPNFilter | 2018 | Routers | 500k+ devices
Triton/TRISIS | 2017 | Industrial safety | Physical damage risk
Log4Shell | 2021 | Java applications | Widespread exploitation

Common Thread: Traditional IDS failed to detect these because they were unknown patterns. Open-set recognition could have provided earlier warning.

These attacks caused billions in damages before detection.

A39: Reproducibility Checklist

Item | Status | Details
Random Seeds | Fixed | random_state=42 throughout
Data Availability | Public | Kitsune & IoT-23 publicly available
Code | On Request | Available upon reasonable request
Preprocessing Steps | Documented | Full pipeline in paper
Hyperparameters | Specified | All values in methodology
Hardware | Standard | Consumer-grade hardware sufficient
Contact: alisafari@my.unt.edu for code access.

A40: Anticipated Questions

Question | Short Answer | Detail Slide
Why EVM over deep learning? | Interpretable, efficient, principled | A28
Why cosine distance? | Scale-invariant for network features | A4
Low precision concern? | Still 3x better; recall prioritized | A16, A17
Real-time deployment? | Feasible; streaming EVM planned | A34, A35
Generalizability? | IoT-23 confirms across 20 scenarios | A13
Statistical validity? | DiD analysis, p < 0.001 | A18, A19
Quick reference for navigating appendix slides.

A41: References (1/2)

  • Choudhary, A. (2024). Internet of Things: Overview, architectures, applications. Discover Internet of Things, 4, 31.
  • DeLone, W. H., & McLean, E. R. (2003). The DeLone and McLean model of IS success. JMIS, 19(4), 9-30.
  • Eling, M., & Wirfs, J. (2019). What are the actual costs of cyber risk events? EJOR, 272(3), 1109-1119.
  • García, S., et al. (2020). An empirical comparison of botnet detection methods. Stratosphere Laboratory.
  • Geng, C., et al. (2020). Recent advances in open set recognition. IEEE TPAMI, 43(10), 3614-3631.
  • Henrydoss, J., et al. (2017). Incremental open set intrusion recognition using EVM. IEEE ICMLA, 1089-1093.
  • Mirsky, Y., et al. (2018). Kitsune: An ensemble of autoencoders. arXiv:1802.09089.
Full citations in paper bibliography.

A42: References (2/2)

  • Pereira, E. S., et al. (2023). On-device tiny ML for anomaly detection based on EVT. IEEE Micro, 43(6), 58-65.
  • Qiu, H., et al. (2021). Adversarial attacks against network intrusion detection in IoT. IEEE IoT Journal, 8(13), 10327-10335.
  • Ribeiro Mendes Junior, P., et al. (2022). Open-set SVMs. IEEE Trans. SMC: Systems, 52(6), 3785-3798.
  • Rocco, M. (2014). Extreme value theory in finance. Journal of Economic Surveys, 28(1), 82-108.
  • Rudd, E. M., et al. (2018). The extreme value machine. IEEE TPAMI, 40(3), 762-768.
  • Scheirer, W. J., et al. (2013). Toward open set recognition. IEEE TPAMI, 35(7), 1757-1772.
  • Vignotto, E., & Engelke, S. (2020). EVT for anomaly detection. Extremes, 23(4), 501-520.
Full citations in paper bibliography.