Enhancing IoT Security and Information Systems Resilience
An Extreme Value Machine Approach for Open-Set Recognition
Ali Safari & Dr. Dan J. Kim
University of North Texas
ICIS 2025 | Nashville | Cybersecurity Track
The Growing IoT Security Challenge
27B: IoT devices by 2025 (industry estimates: 21.1B by end of 2025, 39B by 2030)
Zero-Day: attacks no one has seen before cause the most damage
Heavy-Tail: cybersecurity losses follow a heavy-tail distribution; rare events cause disproportionately large losses
The Core Problem: Traditional intrusion detection systems operate under a closed-set assumption: they can only detect attacks they have seen before. When a new attack appears, these systems fail, creating serious risk for organizations that depend on IoT networks.
Key Question: How do we protect billions of IoT devices from attacks we have never seen before?
27B: Choudhary (2024). Internet of Things: Overview. Discover IoT [Academic] |
21.1B (2025), 39B (2030): IoT Analytics (2025) [Industry]
Closed-Set vs Open-Set Recognition
Closed-Set (Traditional)
Assumes all attack types are known
Classifies everything into known categories
Cannot say "I don't know this"
Fails silently on new threats
Open-Set (Our Approach)
Accepts that new attacks will appear
Models boundary of known classes
Can reject unknown samples
Alerts on novel threats
Scheirer et al. (2013). Toward Open Set Recognition. IEEE TPAMI.
How EVM Works: Extreme Value Theory
Weibull distribution models the tail of distances.
Points beyond the threshold are rejected as unknown.
The EVM algorithm:
1. Calculate distances from the new sample to known class samples
2. Fit a Weibull distribution to the extreme distances
3. Compute an inclusion probability for each class
4. Reject as unknown if the probability is too low for ALL classes
Pipeline: IoT network traffic (packets) → EVM core (distance calculation, Weibull fitting, probability check) → known attack classification or unknown attack detection (99.98%)
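To make the four steps above concrete, here is a minimal sketch of the Weibull tail-fit and rejection logic in Python (SciPy only); the helper names, toy threshold, and nearest-neighbour tail choice are illustrative assumptions, not the libEVM implementation used in the study.

# Minimal sketch of the EVM rejection idea (illustrative, not the study's libEVM code).
import numpy as np
from scipy.stats import weibull_min
from scipy.spatial.distance import cdist

def fit_class_tail(class_samples, tail_size=20):
    """Fit a Weibull model to the largest nearest-neighbour distances within one class."""
    d = cdist(class_samples, class_samples, metric="cosine")
    np.fill_diagonal(d, np.inf)
    nn = d.min(axis=1)                      # nearest-neighbour distance per sample
    tail = np.sort(nn)[-tail_size:]         # keep only the extreme (largest) distances
    shape, loc, scale = weibull_min.fit(tail, floc=0.0)
    return class_samples, (shape, loc, scale)

def inclusion_probability(x, class_model):
    """High when x is closer to the class than its modelled distance tail."""
    samples, (shape, loc, scale) = class_model
    dist = cdist(x[None, :], samples, metric="cosine").min()
    return 1.0 - weibull_min.cdf(dist, shape, loc=loc, scale=scale)

def predict(x, class_models, threshold=0.05):
    """Reject as unknown only if the probability is too low for ALL known classes."""
    probs = {c: inclusion_probability(x, m) for c, m in class_models.items()}
    best = max(probs, key=probs.get)
    return best if probs[best] >= threshold else "unknown"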
Rudd et al. (2018). The Extreme Value Machine. IEEE TPAMI.
Research Methodology
Primary Dataset: Kitsune
100,000 samples (10k per class)
9 attack types + benign traffic
Video Injection as "unknown"
70/30 train-test split
Validation Dataset: IoT-23
323,517 network flows
20 malware scenarios
Leave-one-scenario-out protocol
Preprocessing
Variance threshold filtering
Standard scaling
Feature selection (top 50)
Unknown class held out completely
Baselines
Isolation Forest
One-Class SVM
Mirsky et al. (2018). Kitsune Dataset. | García et al. (2020). IoT-23 Dataset.
Results: Kitsune Dataset
99.98%
Unknown Attack Recall
Metric | EVM | Isolation Forest | One-Class SVM
Unknown Recall | 0.9998 | 0.0318 | 0.0219
Unknown Precision | 0.3227 | 0.1172 | 0.0854
Training Time | ~2.1 s | ~0.5 s | ~21.7 s
EVM detected 9,998 out of 10,000 unknown attack samples. Baselines detected only
2-3%.
Video Injection attack held out as unknown class.
Results: IoT-23 Dataset (Generalizability)
89.1%
Mean Unknown Recall
0.83
Mean AUROC
58.2%
Recall @ 5% FPR
Metric | Minimum | Maximum | Mean
Unknown Recall | 0.564 | 1.000 | 0.891
Unknown Precision | 0.002 | 0.164 | 0.104
Known Accuracy | 0.766 | 0.993 | 0.885
20 malware scenarios tested with leave-one-out protocol.
Statistical Validation: DiD Analysis
We used Difference-in-Differences analysis to compare EVM against
baselines across all 20 IoT-23 scenarios.
Metric | vs Isolation Forest | p-value | vs One-Class SVM | p-value
Unknown Recall | +0.436 | p < 0.001 | +0.352 | p < 0.01
Unknown Precision | +0.262 | p < 0.001 | +0.212 | p < 0.001
Overall Accuracy | +0.058 | p < 0.001 | +0.043 | p < 0.001
Statistically Significant
EVM improvements are consistent across diverse malware families, not just in aggregate.
Paired t-test results with n=20 scenarios.
Ablation Studies: What Matters?
Feature Selection
Features | Recall | Accuracy
Ports only (2) | 0.620 | 0.446
Ports + Protocol (3) | 0.963 | 0.639
+ Duration (4) | 0.963 | 0.600
All features (10) | 0.963 | 0.558
Adding protocol improves recall from 62% to 96%.
More features add noise.
Tail Size Parameter
Tail Size | Recall | FPR
5 | 0.824 | 0.167
10-20 | 0.964 | 0.510-0.555
50 | 0.824 | 0.168
Tail size controls sensitivity. Organizations
can tune based on risk tolerance.
Ablation on IoT-23 dataset across 20 scenarios.
Theoretical Foundation: D&M IS Success Model
We interpret our findings through the DeLone & McLean IS Success Model.
System Quality: 89-99% unknown recall (vs 2-3% for baselines)
Information Quality: 32% unknown precision (vs 8-12% for baselines)
Net Benefits: resilience; preventing severe losses mitigates heavy-tail risk
The Trade-off: EVM's high recall means more alerts, but in IoT security, missing
an attack is more costly than investigating false positives. False positives can be filtered;
missed attacks cause severe damage.
DeLone & McLean (2003). D&M Model of IS Success: A Ten-Year Update. JMIS.
Practical Implications for Organizations
High-Security Environments: use tail_size = 20 for maximum detection. Accept higher alert volume; apply secondary verification to flagged traffic.
Resource-Constrained Settings: use tail_size = 5 for lower FPR. Still achieves 82% unknown recall; balance detection against operational load.
Operating Point Guidance: At 5% false positive budget, EVM still recovers 58% of
unknown threats. This provides concrete deployment configuration for practitioners.
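As a rough configuration sketch (using the libEVM-style constructor shown in Appendix A31; the two presets simply restate the guidance above and are not prescriptive):

from libEVM import EVM

# High-security preset: maximize unknown-attack detection, accept a higher alert volume.
evm_high_security = EVM(tail_size=20, distance_metric='cosine')

# Resource-constrained preset: lower false positive rate, ~82% unknown recall reported above.
evm_low_fpr = EVM(tail_size=5, distance_metric='cosine')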
Controllable trade-off based on organizational risk tolerance.
Limitations and Future Work
Current Limitations
Computational constraints required data sampling
Focus on offline processing
Comparison limited to classical baselines
Single train/test split (reproducible)
Future Directions
Streaming EVM for real-time detection
Benchmark against deep learning OSR methods
Cross-validation for stability confirmation
Integration with existing IDS infrastructure
Planned extensions to address current constraints.
Conclusion
Key Contribution
We demonstrate how Extreme Value Theory provides a theoretically grounded
solution to open-set recognition in IoT security, achieving 89-99% unknown attack recall with
statistically significant improvements over baselines.
99.98%
Kitsune Recall
89.1%
IoT-23 Mean Recall
p<0.001
Statistical Significance
Tunable
Risk Trade-off
EVM
enables organizations to detect unforeseen threats and build resilient IoT security systems.
Safari & Kim (2025). Enhancing IoT Security and IS Resilience. ICIS 2025.
Thank You
Questions?
Scan for Paper
Ali Safari
alisafari@my.unt.edu
University of North Texas
Appendix
A1: Extreme Value Theory (EVT) - Foundation
What is EVT? A branch of statistics dealing with extreme deviations from the
median of probability distributions. It models the behavior of maximum or minimum values.
Key Concepts
Models tail behavior of distributions
Fisher-Tippett-Gnedenko theorem
Generalized Extreme Value (GEV) distribution
Peak Over Threshold (POT) method
Why for Security?
Attacks are rare events (in tails)
Heavy-tailed loss distributions
Principled threshold selection
Statistical guarantees
Rocco (2014). Extreme Value Theory in Finance: A Survey. Journal of Economic
Surveys.
EVM & Weibull Assumptions (Simple Check)
What EVM Really Assumes
We do not assume "all data are Weibull".
EVM uses distances between samples inside each class.
It models only the largest distances (the tail) with a Weibull curve.
Extreme Value Theory says: for many distributions, the tail can be approximated by an
extreme value law such as Weibull.
What We Tested on IoT-23
Used the same processed IoT-23 feature sample as in our experiments.
Fitted our EVM with cosine distance and tail_size = 20.
For Benign and Malicious classes, took nearest-neighbour distance tails (largest
distances).
Fitted a Weibull distribution only on these extreme distances for each class.
The fits are numerically stable and follow the empirical tails reasonably well.
Takeaway: EVM does not require the whole IoT traffic to be Weibull. We only
model extreme distances, and on IoT-23 these tails are well behaved and can be fitted by a
Weibull distribution in practice.
Based on Rudd et al. (2018), Vignotto & Engelke (2020), and our own IoT-23
tail-fit check.
A2: Weibull Distribution in EVM
The EVM uses the Weibull distribution to model extreme distances
between samples.
Shape (k): controls tail behavior. k < 1: heavy tail; k = 1: exponential; k > 1: light tail.
Scale (λ): controls the spread of the distribution. Larger λ means more spread.
Location (θ): shifts the distribution. In EVM, often set based on minimum distances.
Probability Function: P(x) = 1 - exp(-((x-θ)/λ)^k) for x ≥ θ
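A quick numerical check of the stated probability function using SciPy's three-parameter Weibull (weibull_min, with c = k, loc = θ, scale = λ); the parameter values are arbitrary illustrations, not fitted values from the study.

# Evaluate P(x) = 1 - exp(-((x - θ)/λ)^k) via SciPy's weibull_min CDF.
from scipy.stats import weibull_min

k, theta, lam = 1.5, 0.0, 0.3      # illustrative shape, location, and scale values
for x in (0.1, 0.3, 0.6):
    print(x, round(weibull_min.cdf(x, k, loc=theta, scale=lam), 4))
# Smaller k gives a heavier tail, larger λ spreads the distribution, and θ shifts it along x.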
Weibull (1951). A Statistical Distribution Function of Wide Applicability.
A3: EVM Algorithm Details
Training Phase
1. Compute pairwise distances → 2. Select extreme distances (tail) → 3. Fit Weibull to each class → 4. Store class models
Inference Phase
1. Calculate distance to each class → 2. Compute inclusion probability → 3. Compare to threshold → 4. Accept or reject
Rudd et al. (2018). The Extreme Value Machine. IEEE TPAMI.
A4: Distance Metrics in EVM
Metric | Formula | Properties | Used in Our Study
Cosine Distance | 1 - cos(θ) | Scale invariant, [0, 2] range | Yes (primary)
Euclidean Distance | ||x - y||₂ | Sensitive to scale | No
Manhattan Distance | ||x - y||₁ | Robust to outliers | No
Mahalanobis Distance | √((x-y)ᵀΣ⁻¹(x-y)) | Accounts for covariance | No
Why Cosine? Network traffic features have varying scales. Cosine distance
focuses on direction rather than magnitude, making it robust for our feature space.
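A small illustration of the scale-invariance argument, using toy vectors and NumPy/SciPy only (not tied to the study's feature pipeline):

import numpy as np
from scipy.spatial.distance import cosine, euclidean

a = np.array([10.0, 200.0, 3000.0])   # toy flow-feature vector
b = 5.0 * a                            # same direction, different magnitude

print(cosine(a, b))      # ~0.0: cosine distance ignores overall scale
print(euclidean(a, b))   # large: Euclidean distance is dominated by magnitude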
EVM implementation uses cosine distance as default for network traffic analysis.
A5: Kitsune Dataset - Complete Description
Dataset Overview
Created by Mirsky et al. (2018)
Captured from real IoT network
Contains 9 attack types + benign
Features extracted using AfterImage
115 original features
Our Sampling
10,000 samples per class
100,000 total samples
Balanced class distribution
Network Setup
IP cameras, smart devices
IoT gateway
Standard home network topology
Traffic captured via mirroring
Feature Types
Packet size statistics
Time interval statistics
Channel behavior (src-dst pairs)
Socket behavior
Mirsky et al. (2018). Kitsune: An Ensemble of Autoencoders. arXiv:1802.09089.
A6: Kitsune Attack Types (All 9)
# | Attack Type | Description | Category
1 | Active Wiretap | Man-in-the-middle eavesdropping | Reconnaissance
2 | ARP MitM | ARP spoofing for traffic interception | MitM
3 | Fuzzing | Random/malformed packet injection | Fuzzing
4 | Mirai Botnet | IoT botnet infection | Malware
5 | OS Scan | Operating system fingerprinting | Reconnaissance
6 | SSDP Flood | UDP amplification DDoS | DoS
7 | SSL Renegotiation | SSL/TLS resource exhaustion | DoS
8 | SYN DoS | TCP SYN flood attack | DoS
9 | Video Injection | IP camera feed hijacking | Unknown (held out)
Video Injection selected as unknown class due to distinct traffic patterns.
A7: IoT-23 Dataset - Complete Description
323,517
Total Network Flows
23
Capture Scenarios
20
Malware Families
Dataset Composition
3 benign honeypot captures
20 malware infection scenarios
Real IoT malware samples
Labeled at connection level
Our Protocol
Leave-one-scenario-out
Each malware as unknown once
20 experimental runs
Aggregate statistics reported
García et al. (2020). IoT-23 Dataset. Stratosphere Laboratory, Czech Technical
University.
A8: IoT-23 Malware Scenarios (All 20)
# | Malware | Type
1 | Mirai | Botnet
2 | Hajime | Botnet
3 | Linux.Hakai | Botnet
4 | Linux.Tsunami | Backdoor
5 | Gafgyt | Botnet
6 | Muhstik | Botnet
7 | IRCBot | Botnet
8 | Linux.Okiru | Botnet
9 | Kenjiro | Botnet
10 | Torii | Botnet
11 | Trojan | Trojan
12 | Hide and Seek | Botnet
13 | Linux.Miori | Botnet
14 | Hakai | Botnet
15 | DDoS | DDoS
16 | Linux.Gafgyt | Botnet
17 | PartOfAHorizontalPortScan | Scan
18 | Okiru | Botnet
19 | C&C | C&C
20 | FileDownload | Malware
Diverse malware families ensure robust generalizability testing.
A9: Complete Preprocessing Pipeline
1. Separate Unknown Class → 2. Variance Threshold (0.0) → 3. Train/Test Split (70/30) → 4. Standard Scaler →
5. Simple Imputer (Mean) → 6. SelectKBest (Top 50) → 7. Transform All Sets → 8. Combine Test Set
Critical: All transformers fitted ONLY on training data to prevent data leakage.
Unknown class never seen during training.
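A condensed scikit-learn sketch of the steps above. Variable names (X_known, y_known, X_unknown) are placeholders, and imputation is placed before scaling so the sketch runs on data with missing values; otherwise the parameters follow the slide (variance threshold 0.0, 70/30 split, top 50 features, random_state=42).

from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import VarianceThreshold, SelectKBest, mutual_info_classif
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Known classes only: the unknown class is separated beforehand and never used for fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X_known, y_known, test_size=0.30, stratify=y_known, random_state=42)

prep = Pipeline([
    ("variance", VarianceThreshold(threshold=0.0)),        # drop constant features
    ("impute",   SimpleImputer(strategy="mean")),          # mean imputation
    ("scale",    StandardScaler()),                        # standard scaling
    ("select",   SelectKBest(mutual_info_classif, k=50)),  # top 50 features by mutual information
])

X_train_p   = prep.fit_transform(X_train, y_train)   # all transformers fitted on training data only
X_test_p    = prep.transform(X_test)
X_unknown_p = prep.transform(X_unknown)              # unknown class only ever transformed, never fitted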
Reproducible with random_state=42 for all random operations.
A10: Feature Selection Details
Method: Mutual Information
SelectKBest with mutual_info_classif
Non-parametric, captures nonlinear relationships
Selected top 50 features from 115
Fitted only on known training classes
Why 50 Features?
Balance between information and noise
Reduces computational complexity
Prevents overfitting
Empirically validated
Top Feature Categories
Category | Count
Packet Size Stats | 18
Time Interval Stats | 15
Channel Behavior | 10
Socket Features | 7
mutual_info_classif from scikit-learn used for feature ranking.
A11: Kitsune Results - Video Injection (Full)
Metric | EVM | Isolation Forest | One-Class SVM
Overall Accuracy | 0.3824 | 0.0842 | 0.0831
Macro F1-Score | 0.2318 | 0.0276 | 0.0242
Unknown Recall | 0.9998 | 0.0318 | 0.0219
Unknown Precision | 0.3227 | 0.1172 | 0.0854
Training Time (s) | ~2.1 | ~0.5 | ~21.7
Inference Time (s) | ~14.3 | ~0.2 | ~12.1
Key Finding: EVM detected 9,998/10,000 unknown samples (Video Injection).
Baselines detected only 318 and 219 respectively.
Test set: 27,000 known + 10,000 unknown = 37,000 samples.
A12: Kitsune Results - SSDP Flood (Replication)
Metric | EVM | Isolation Forest | One-Class SVM
Overall Accuracy | 0.2943 | 0.1378 | 0.0882
Macro F1-Score | 0.1336 | 0.0981 | 0.0312
Unknown Recall | 0.9996 | 0.2074 | 0.0383
Unknown Precision | 0.2941 | 0.4607 | 0.1398
Replication Confirms Findings
Near-perfect unknown recall (99.96%) replicated with different unknown class (SSDP Flood),
confirming EVM's robustness.
Same methodology, different unknown class for validation.
A13: IoT-23 Results - All 20 Scenarios
Statistic (across scenarios) | Unk. Recall | Unk. Precision | Known FPR | AUROC
Minimum | 0.564 | 0.002 | 0.238 | 0.780
25th Percentile | 0.824 | 0.056 | 0.320 | 0.805
Median | 0.923 | 0.098 | 0.401 | 0.832
75th Percentile | 0.982 | 0.145 | 0.478 | 0.867
Maximum | 1.000 | 0.164 | 0.577 | 0.920
Mean | 0.891 | 0.104 | 0.400 | 0.831
Std. Dev. | 0.132 | 0.048 | 0.098 | 0.042
Consistency: EVM achieves >80% unknown recall in 75% of scenarios, demonstrating
robust performance across diverse malware families.
Statistics computed across 20 leave-one-out experimental runs.
A14: Confusion Matrix Analysis
Known Class Confusions
Some misclassifications occur between:
DoS attacks (similar traffic patterns)
Reconnaissance attacks (similar probing)
Benign vs low-intensity attacks
These confusions are within-category and less
critical than missing unknown attacks.
Unknown Detection
EVM behavior on unknowns:
99.98% correctly rejected
0.02% misclassified to known
Misclassified unknowns go to similar attack types
Even when wrong, EVM flags as attack (not
benign).
Critical Insight: The confusion pattern shows EVM errs on the side of caution -
it rarely misses attacks entirely.
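A small sketch of how such an analysis can be produced with scikit-learn, treating "unknown" as an explicit label in the prediction vector (the label set and the y_true/y_pred arrays are placeholders, not the study's exact classes):

import pandas as pd
from sklearn.metrics import confusion_matrix

# y_true holds the real labels (including 'unknown' for held-out attack samples);
# y_pred holds EVM outputs, where rejected samples are labelled 'unknown'.
labels = ["benign", "dos", "recon", "mitm", "unknown"]   # illustrative label set
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(pd.DataFrame(cm, index=labels, columns=labels))

# Unknown-class recall: fraction of true unknowns that were rejected rather than
# absorbed into a known class.
unk = labels.index("unknown")
print("unknown recall:", cm[unk, unk] / cm[unk].sum())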
Confusion matrix analysis from Kitsune experiments.
A15: ROC Curve and AUROC
What is ROC?
Receiver Operating Characteristic
Plots TPR vs FPR at all thresholds
Shows trade-off between sensitivity and specificity
Area Under Curve (AUROC) summarizes performance
Interpretation
AUROC = 0.5: Random classifier
AUROC = 1.0: Perfect classifier
AUROC > 0.8: Good discrimination
Our Results
0.831
Mean AUROC on IoT-23
Range: 0.780 - 0.920 across scenarios
Indicates good discrimination between known and unknown classes.
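The AUROC here is computed on the binary known-vs-unknown task. A minimal scikit-learn sketch, where the "unknown score" is one minus the maximum per-class inclusion probability (an assumption consistent with the algorithm description, not the exact study code):

import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# probs: (n_samples, n_known_classes) inclusion probabilities from the EVM;
# is_unknown: 1 if the sample's true class was held out, else 0.
unknown_score = 1.0 - probs.max(axis=1)        # low max inclusion probability -> likely unknown

auroc = roc_auc_score(is_unknown, unknown_score)
fpr, tpr, thresholds = roc_curve(is_unknown, unknown_score)
print("AUROC:", auroc)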
AUROC computed for binary known/unknown classification task.
A16: Precision-Recall Trade-off
High Recall (EVM)
99.98%
Catches almost all unknowns
Cost: More false positives
Lower Precision
32.27%
Some known flagged as unknown
Benefit: Still 3x better than
baselines
Why Favor Recall? In security, missing an attack (false negative) is typically
far more costly than a false alarm (false positive). Zero-day attacks cause disproportionate
damage.
Trade-off inherent in open-set recognition where unknown space is infinite.
A17: Operating Point Analysis
Practitioners need concrete guidance for deployment. We analyze performance at fixed false positive
rates.
False Positive Rate Budget | Unknown Recall Achieved | Practical Interpretation
1% | ~35% | Very conservative, misses many
5% | 58.2% | Recommended operating point
10% | ~72% | Moderate alert load
20% | ~85% | High sensitivity mode
Deployment Recommendation
At 5% FPR, EVM catches 58% of unknown attacks while maintaining manageable alert volume. Use
secondary verification for flagged traffic.
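A sketch of how the operating points above can be read off a ROC curve (same unknown_score convention as the A15 sketch; the budgets mirror the table, and the helper name is illustrative):

import numpy as np
from sklearn.metrics import roc_curve

fpr, tpr, thresholds = roc_curve(is_unknown, unknown_score)

def recall_at_fpr_budget(fpr, tpr, budget=0.05):
    """Largest unknown-recall (TPR) achievable while FPR stays within the budget."""
    ok = fpr <= budget
    return tpr[ok].max() if ok.any() else 0.0

for budget in (0.01, 0.05, 0.10, 0.20):
    print(f"FPR budget {budget:.0%}: unknown recall {recall_at_fpr_budget(fpr, tpr, budget):.1%}")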
Analysis based on IoT-23 results across 20 scenarios.
A18: Difference-in-Differences Analysis
What is DiD? A statistical technique that compares the difference in outcomes
between treatment and control groups across multiple conditions.
Our Application
Treatment: EVM model
Control: Baseline models
Conditions: 20 malware scenarios
Outcome: Unknown recall, precision
Why DiD?
Controls for scenario-specific effects
Tests consistency across conditions
Provides statistical significance
More robust than simple averaging
Result: EVM improvement of +0.436 in unknown recall vs Isolation Forest is
statistically significant (p < 0.001) across all 20 scenarios.
Paired t-test used for statistical significance testing.
A19: Statistical Testing Details
Test | Purpose | Result | Interpretation
Paired t-test (EVM vs IF) | Unknown recall difference | p = 5.4×10⁻⁴ | Highly significant
Paired t-test (EVM vs OCSVM) | Unknown recall difference | p = 1.5×10⁻³ | Highly significant
Effect Size (Cohen's d) | Magnitude of difference | d > 1.5 | Large effect
95% Confidence Interval | Precision of estimate | [+0.35, +0.52] | Narrow, reliable
Statistical Power: With n=20 scenarios and observed effect sizes, our tests have
>95% power to detect true differences.
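A minimal sketch of the paired test across the 20 leave-one-out scenarios with SciPy (the per-scenario recall arrays are placeholders; Cohen's d is computed on the paired differences):

import numpy as np
from scipy.stats import ttest_rel, t

# Per-scenario unknown recall for EVM and one baseline (length-20 arrays, placeholders).
diff = evm_recall - baseline_recall
n = len(diff)

t_stat, p_value = ttest_rel(evm_recall, baseline_recall)   # paired t-test across scenarios
cohens_d = diff.mean() / diff.std(ddof=1)                   # effect size on paired differences

se = diff.std(ddof=1) / np.sqrt(n)
t_crit = t.ppf(0.975, df=n - 1)                             # 95% CI for the mean difference
ci_low, ci_high = diff.mean() - t_crit * se, diff.mean() + t_crit * se

print(f"t = {t_stat:.2f}, p = {p_value:.1e}, d = {cohens_d:.2f}, 95% CI = [{ci_low:.3f}, {ci_high:.3f}]")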
All tests conducted at α = 0.05 significance level.
A20: Full Feature Set Ablation
Feature Set | # Features | Unk. Recall | Unk. Precision | Known FPR | Accuracy
Ports only | 2 | 0.620 | 0.161 | 0.310 | 0.446
Ports + Protocol | 3 | 0.963 | 0.223 | 0.252 | 0.639
+ Duration | 4 | 0.963 | 0.178 | 0.342 | 0.600
+ Bytes (src/dst) | 6 | 0.963 | 0.159 | 0.451 | 0.500
All available | 10 | 0.963 | 0.162 | 0.396 | 0.558
Finding
Protocol is the key feature. Adding it improves recall from 62% to 96%. Additional features do
not improve recall but increase false positives.
Notation: n = training samples, d = features, m = test samples, t = trees, sv = support vectors.
Practical: EVM training is fast (~2s), inference is reasonable for
near-real-time (~0.4ms per sample). Suitable for IoT gateway deployment.
Benchmarked on standard compute resources.
A23: DeLone & McLean IS Success Model
DeLone & McLean (2003). D&M Model of IS Success: A Ten-Year Update. JMIS, 19(4),
9-30.
A24: System Quality - Detailed Mapping
Definition: System Quality refers to the technical characteristics of the
system, including reliability, flexibility, response time, and security.
D&M Dimensions
Reliability: System performs consistently
Flexibility: Adapts to new situations
Security: Protects against threats
Response Time: Quick processing
EVM Contribution
Reliability: 89-99% detection across datasets
Flexibility: Detects unknown attacks
Security: OSR capability
Response: ~0.4ms per sample
30x
Improvement in Unknown Attack Detection vs Baselines
EVM directly enhances System Quality dimension of IS Success.
A25: Information Quality - Detailed Mapping
Definition: Information Quality refers to the quality of outputs: accuracy,
completeness, relevance, and timeliness.
D&M Dimensions
Accuracy: Correctness of information
Completeness: All relevant info present
Relevance: Information is useful
Timeliness: Information is current
EVM Trade-offs
Accuracy: 32% precision (3x better)
Completeness: 99.98% recall (comprehensive)
Relevance: Catches unknown threats
Timeliness: Near real-time
The Challenge: High recall means more alerts. While precision is 3x better than
baselines, alert volume may impact analyst workload. Secondary filtering recommended.
Trade-off between completeness and analyst workload.
A26: Net Benefits - Organizational Resilience
Definition: Net Benefits represent the overall impact on individuals,
organizations, and society.
Risk Reduction
Detecting unknown attacks prevents zero-day compromises that cause
disproportionate damage.
Cost Avoidance
Heavy-tail loss distribution means catching rare attacks provides
outsized value.
Operational Continuity
Preventing network compromise maintains business operations.
Resilience = Absorb + Adapt + Recover
EVM enhances organizational ability to absorb unknown threats, adapt detection capabilities, and
recover from security incidents.
Eling & Wirfs (2019). What are the actual costs of cyber risk events? EJOR.
A27: Heavy-Tail Loss Distribution
What is Heavy-Tail?
Most incidents cause small losses
Rare incidents cause extreme losses
Tail probability decays slowly
Mean can be dominated by extremes
Implications for Security
Average loss understates risk
Single incident can be catastrophic
Traditional risk models fail
EVT is designed for this
Cyber Attack Loss Data
Median Loss: ~$50,000
Mean Loss: ~$3.6 million
99th Percentile: >$100 million
The gap between median and mean indicates heavy tails. Zero-day attacks often fall in the extreme tail.
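A toy numerical illustration of why the median understates heavy-tailed risk, using synthetic lognormal losses (illustrative parameters, not the Eling & Wirfs data):

import numpy as np

rng = np.random.default_rng(42)
losses = rng.lognormal(mean=11.0, sigma=2.8, size=100_000)   # synthetic heavy-tailed losses (dollars)

print(f"median:          ${np.median(losses):,.0f}")
print(f"mean:            ${losses.mean():,.0f}")
print(f"99th percentile: ${np.percentile(losses, 99):,.0f}")
# A few extreme losses dominate the total, so the mean sits far above the median.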
Eling & Wirfs (2019). European Journal of Operational Research.
A28: Comparison with Deep Learning Methods
Aspect | EVM | Deep Learning OSR
Theoretical Foundation | EVT (strong) | Learned representations
Interpretability | High (statistical) | Low (black box)
Training Data Needs | Moderate | Large
Computational Cost | Low-Medium | High
IoT Deployment | Feasible | Challenging
State-of-the-Art Performance | Strong (this paper) | Potentially higher
Future Work: Benchmark EVM against deep OSR methods (OpenMax, OSDN) when
computational resources permit.
Geng et al. (2020). Recent Advances in Open Set Recognition. IEEE TPAMI.
A29: Why Not Autoencoder?
Original Plan: We intended to include Autoencoder as a baseline but encountered
persistent Python environment conflicts during setup.
Autoencoder Approach
Learn compressed representation
Reconstruct input
High reconstruction error = anomaly
Popular for anomaly detection
Challenges for OSR
Threshold selection is arbitrary
No probabilistic rejection
May reconstruct novel patterns
Less principled than EVT
Future Work
Include Autoencoder and Variational Autoencoder comparisons in extended study.
Environment issues documented in methodology section.
A30: Implementation Details
Libraries Used
Python 3.8+ | scikit-learn 1.0+ | pandas 1.3+ | numpy 1.21+ | libEVM (latest)
EVM Configuration
Distance Metric: Cosine | Tail Size: 20 | Cover Threshold: Default | Inclusion Threshold: Calibrated
Reproducibility: All random operations use random_state=42. Code and
preprocessed data available upon request.
libEVM: github.com/EMRResearch/libEVM
A31: libEVM Library Details
libEVM is a Python implementation of the Extreme Value Machine algorithm by Rudd
et al. (2018).
Key Features
Multiple distance metrics
Configurable tail size
Incremental learning support
Probability calibration
scikit-learn compatible API
Usage Example
from libEVM import EVM

evm = EVM(tail_size=20, distance_metric='cosine')  # configuration used in this study
evm.fit(X_train, y_train)                          # fit per-class Weibull models on known classes
probs = evm.predict_proba(X_test)                  # per-class inclusion probabilities
Henrydoss et al. (2017). Incremental Open Set Intrusion Recognition Using EVM.
IEEE ICMLA.
A32: Related Work - Key Papers
Paper | Contribution | Relation to Our Work
Scheirer et al. (2013) | Defined Open Set Recognition | Foundational framework
Rudd et al. (2018) | Proposed EVM | Our core method
Henrydoss et al. (2017) | EVM for intrusion detection | Prior application, different setup
Mirsky et al. (2018) | Kitsune dataset & autoencoder | Our primary dataset
García et al. (2020) | IoT-23 dataset | Our validation dataset
Geng et al. (2020) | OSR survey | Context and baselines
Full references in paper bibliography.
A33: Limitation - Computational Constraints
Challenge: Initial experiments with full datasets encountered memory constraints
during EVM training.
Constraint Details
EVM computes pairwise distances
O(n²) memory for distance matrix
Full Kitsune: millions of samples
Required 100k sample subset
Mitigation
Stratified sampling (10k per class)
Balanced class representation
IoT-23 validation confirms findings
Results still significant
Future: Streaming EVM
Implement online/streaming version that processes data incrementally without full distance
matrix.
Computational scalability is active research direction.
A34: Limitation - Offline Processing
Current State: Experiments conducted on static datasets, not real-time traffic
streams.
Implications
No concept drift handling
No real-time latency testing
Assumes preprocessed features
No integration with live IDS
Path Forward
Streaming data pipeline
Incremental model updates
Real-time feature extraction
Production deployment testing
Note: Inference time of ~0.4ms per sample suggests real-time processing is
feasible; needs validation in production environment.
Future work: Real-time deployment evaluation.
A35: Future Work - Streaming EVM
1. Packet Capture → 2. Real-time Features → 3. Streaming EVM → 4. Alert / Decision
Technical Requirements
Online Weibull parameter updates
Sliding window for tail estimation
Approximate distance computation
Memory-bounded operation
Expected Benefits
Adapt to concept drift
Handle infinite data streams
Deploy on edge devices
Real-time protection
Proposed extension for production deployment.
A36: Future Work - Deep Learning Comparison
Planned comparison with state-of-the-art deep learning OSR methods:
Method | Approach | Expected Trade-offs
OpenMax | Calibrated softmax with EVT | Better with large data
OSDN | Open Set Deep Network | Requires more training data
Contrastive Learning | Learn discriminative embeddings | Higher computational cost
Variational OSR | Generative modeling | Complex training
Research Question: Do deep methods' potential performance gains justify
increased complexity and reduced interpretability for IoT deployment?
Planned as extended journal version of this work.
A37: IoT Security Landscape
27B
Devices by 2025
$1.1T
IoT Market Size
57%
Vulnerable Devices
Security Challenges
Resource-constrained devices
Heterogeneous protocols
Large attack surface
Difficult to patch
Attack Trends
Mirai-style botnets
Supply chain attacks
Zero-day exploits
Targeted industrial attacks
Choudhary (2024). IoT Overview. Discover Internet of Things.
A38: Zero-Day Attack Examples
Attack | Year | Target | Impact
Mirai Botnet | 2016 | IoT devices | Major DNS outage
VPNFilter | 2018 | Routers | 500k+ devices
Triton/TRISIS | 2017 | Industrial safety | Physical damage risk
Log4Shell | 2021 | Java applications | Widespread exploitation
Common Thread: Traditional IDS failed to detect these because they were unknown
patterns. Open-set recognition could have provided earlier warning.
These attacks caused billions in damages before detection.
A39: Reproducibility Checklist
Item | Status | Details
Random Seeds | Fixed | random_state=42 throughout
Data Availability | Public | Kitsune & IoT-23 publicly available
Code | On Request | Available upon reasonable request
Preprocessing Steps | Documented | Full pipeline in paper
Hyperparameters | Specified | All values in methodology
Hardware | Standard | Consumer-grade hardware sufficient
Contact: alisafari@my.unt.edu for code access.
A40: Anticipated Questions
Question | Short Answer | Detail Slide
Why EVM over deep learning? | Interpretable, efficient, principled | A28
Why cosine distance? | Scale-invariant for network features | A4
Low precision concern? | Still 3x better; recall prioritized | A16, A17
Real-time deployment? | Feasible; streaming EVM planned | A34, A35
Generalizability? | IoT-23 confirms across 20 scenarios | A13
Statistical validity? | DiD analysis, p < 0.001 | A18, A19
Quick reference for navigating appendix slides.
A41: References (1/2)
Choudhary, A. (2024). Internet of Things: Overview, architectures, applications. Discover Internet of Things, 4, 31.
DeLone, W. H., & McLean, E. R. (2003). The DeLone and McLean model of IS success. JMIS, 19(4), 9-30.
Eling, M., & Wirfs, J. (2019). What are the actual costs of cyber risk events? EJOR, 272(3), 1109-1119.
García, S., et al. (2020). An empirical comparison of botnet detection methods. Stratosphere Laboratory.
Geng, C., et al. (2020). Recent advances in open set recognition. IEEE TPAMI, 43(10), 3614-3631.
Henrydoss, J., et al. (2017). Incremental open set intrusion recognition using EVM. IEEE ICMLA, 1089-1093.
Mirsky, Y., et al. (2018). Kitsune: An ensemble of autoencoders. arXiv:1802.09089.
Full citations in paper bibliography.
A42: References (2/2)
Pereira, E. S., et al. (2023). On-device tiny ML for anomaly detection based on EVT. IEEE Micro, 43(6), 58-65.
Qiu, H., et al. (2021). Adversarial attacks against network intrusion detection in IoT. IEEE IoT Journal, 8(13), 10327-10335.