Architecture
This document describes the safeuploads validation pipeline, component responsibilities, and data flow for each file type.
Component Overview
safeuploads/
├── file_validator.py # FileValidator — orchestrator
├── config.py # FileSecurityConfig, SecurityLimits
├── protocols.py # SeekableFile, UploadFileProtocol
├── exceptions.py # Exception hierarchy, ErrorCode
├── enums.py # Threat categories, patterns
├── audit.py # SecurityAuditLogger, correlation IDs
├── utils.py # ResourceMonitor
├── validators/
│ ├── base.py # BaseValidator interface
│ ├── unicode_validator.py # Unicode normalization & checks
│ ├── extension_validator.py # Extension allow/block rules
│ ├── windows_validator.py # Windows reserved name checks
│ ├── compression_validator.py # ZIP bomb detection
│ └── xml_validator.py # XXE-safe XML parsing
└── inspectors/
    ├── zip_inspector.py # Deep ZIP content analysis
    ├── gzip_inspector.py # Gzip bomb detection
    └── content_inspector.py # Malware/polyglot scanning
Roles
| Component | Responsibility |
|---|---|
| FileValidator | Orchestrates validation for each file type. Manages streaming, resource monitoring, audit events, and delegates to validators/inspectors. |
| FileSecurityConfig | Centralizes all configuration: allowed MIME types, extensions, blocked extensions, Unicode characters, and Windows reserved names. |
| SecurityLimits | Holds numeric thresholds: file sizes, compression ratios, timeouts, entry limits, resource caps. |
| BaseValidator | Abstract base class; validators inherit and implement validate(). |
| UnicodeSecurityValidator | NFC normalization, dangerous character detection, null byte rejection. |
| ExtensionSecurityValidator | Allowlist/blocklist enforcement, compound extension checks. |
| WindowsSecurityValidator | Rejects Windows reserved device names. |
| CompressionSecurityValidator | Checks compression ratio, uncompressed size, entry count, and nested archive limits. |
| XmlSecurityValidator | Parses XML with defusedxml, blocking DTDs and external entities. |
| ZipContentInspector | Iterates ZIP entries checking for traversal, executables, scripts, symlinks, and recursive structures. |
| GzipContentInspector | Streams gzip decompression, checking ratio and size progressively. |
| ContentSecurityInspector | Scans raw bytes for malware signatures, web shells, and polyglot patterns. |
| ResourceMonitor | Context manager enforcing wall-clock time and memory limits. |
| SecurityAuditLogger | Emits structured audit events under safeuploads.audit with correlation IDs. |
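
The validator role described in the table can be illustrated with a minimal sketch. It assumes that validate() takes a filename and signals rejection by raising; the real BaseValidator signature is not shown in this document, and ReservedNameValidator and RESERVED_NAMES are hypothetical names used only for illustration.

```python
from abc import ABC, abstractmethod

# Device names reserved by Windows. In safeuploads the configured list lives
# in FileSecurityConfig; this set is an illustrative stand-in.
RESERVED_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


class BaseValidator(ABC):
    """Assumed interface: subclasses raise when the input is rejected."""

    @abstractmethod
    def validate(self, filename: str) -> None:
        ...


class ReservedNameValidator(BaseValidator):
    """Hypothetical stand-in for WindowsSecurityValidator."""

    def validate(self, filename: str) -> None:
        # Windows matches on the part before the first dot, ignoring case,
        # so "con.txt" and "NUL.tar.gz" are both reserved.
        stem = filename.split(".", 1)[0].strip().upper()
        if stem in RESERVED_NAMES:
            raise ValueError(f"reserved device name: {filename!r}")
```

Keeping each rule behind the same validate() method lets the orchestrator run the filename checks as a simple sequence.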
Validation Pipelines
Image Validation (validate_image_file)
UploadFile
│
▼
┌─────────────────────────────┐
│ 1. Audit: VALIDATION_START │
└─────────────┬───────────────┘
│
▼
┌─────────────────────────────┐
│ 2. Filename Validation │
│ ├── Unicode normalization │
│ ├── Extension allowlist │
│ ├── Extension blocklist │
│ └── Windows reserved name │
└─────────────┬───────────────┘
│
▼
┌─────────────────────────────┐
│ 3. File Size Check │
│ (chunked progressive) │
└─────────────┬───────────────┘
│
▼
┌─────────────────────────────┐
│ 4. ResourceMonitor START │
└─────────────┬───────────────┘
│
▼
┌─────────────────────────────┐
│ 5. MIME Type Detection │
│ (python-magic, first 8KB)│
└─────────────┬───────────────┘
│
▼
┌─────────────────────────────┐
│ 6. File Signature Check │
│ (magic bytes) │
└─────────────┬───────────────┘
│
▼
┌─────────────────────────────┐
│ 7. Content Analysis │
│ (if enabled) │
│ ├── Executable signatures│
│ ├── Script injection │
│ └── Polyglot detection │
└─────────────┬───────────────┘
│
▼
┌─────────────────────────────┐
│ 8. ResourceMonitor END │
└─────────────┬───────────────┘
│
▼
┌─────────────────────────────┐
│ 9. Audit: VALIDATION_SUCCESS│
└─────────────────────────────┘
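Step 6 of the image pipeline (the magic-byte check) could look roughly like the sketch below. The signature table and the matches_signature helper are assumptions made for illustration, not the library's own data or API; only a few formats are listed.

```python
# Leading magic bytes for a few common image formats.
IMAGE_SIGNATURES = {
    "image/jpeg": (b"\xff\xd8\xff",),
    "image/png": (b"\x89PNG\r\n\x1a\n",),
    "image/gif": (b"GIF87a", b"GIF89a"),
}


def matches_signature(stream, mime_type: str) -> bool:
    """Return True if the stream starts with a known signature for mime_type."""
    signatures = IMAGE_SIGNATURES.get(mime_type, ())
    stream.seek(0)
    header = stream.read(12)  # the longest signature above is 8 bytes
    stream.seek(0)  # rewind so the next stage sees the whole file
    return any(header.startswith(sig) for sig in signatures)
```

Comparing the detected MIME type against the leading bytes catches files whose declared type and actual content disagree, for example an executable renamed to .png.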
ZIP Validation (validate_zip_file)
UploadFile
│
▼
┌─────────────────────────────┐
│ 1. Audit: VALIDATION_START │
└─────────────┬───────────────┘
│
▼
┌─────────────────────────────┐
│ 2. Filename Validation │
│ (same as image) │
└─────────────┬───────────────┘
│
▼
┌─────────────────────────────┐
│ 3. File Size Check │
└─────────────┬───────────────┘
│
▼
┌─────────────────────────────┐
│ 4. ResourceMonitor START │
└─────────────┬───────────────┘
│
▼
┌─────────────────────────────┐
│ 5. Stream to SpooledTempFile│
│ (memory < 10MB → disk) │
└─────────────┬───────────────┘
│
▼
┌─────────────────────────────┐
│ 6. MIME + Signature Check │
└─────────────┬───────────────┘
│
▼
┌─────────────────────────────┐
│ 7. Compression Validation │
│ ├── Ratio check │
│ ├── Uncompressed size │
│ ├── Entry count │
│ └── Nested archive check │
└─────────────┬───────────────┘
│
▼
┌─────────────────────────────┐
│ 8. ZIP Content Inspection │
│ ├── Entry name checks │
│ │ ├── Null bytes │
│ │ ├── Path traversal │
│ │ ├── Absolute paths │
│ │ └── Symlinks │
│ ├── Extension checks │
│ ├── Binary signatures │
│ ├── Script patterns │
│ └── Recursive inspection │
│ ├── Depth tracking │
│ ├── Hash tracking │
│ └── Entry count cap │
└─────────────┬───────────────┘
│
▼
┌─────────────────────────────┐
│ 9. Content Analysis │
│ (if enabled) │
└─────────────┬───────────────┘
│
▼
┌─────────────────────────────┐
│ 10. ResourceMonitor END │
└─────────────┬───────────────┘
│
▼
┌─────────────────────────────┐
│ 11. Audit: VALIDATION_SUCCESS│
└──────────────────────────────┘
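Steps 7 and 8 can be approximated with the standard zipfile module. The following is a sketch under assumed limits; check_zip, its parameter names, and its default values are hypothetical, and the real validators raise typed exceptions rather than returning strings.

```python
import stat
import zipfile
from pathlib import PurePosixPath


def check_zip(stream, max_ratio=100.0, max_total=100 * 1024 * 1024, max_entries=1000):
    """Return a list of problem descriptions; an empty list means nothing was flagged."""
    problems = []
    with zipfile.ZipFile(stream) as archive:
        entries = archive.infolist()  # reads the central directory only
        if len(entries) > max_entries:
            problems.append("too many entries")
        total = sum(e.file_size for e in entries)
        compressed = sum(e.compress_size for e in entries)
        if total > max_total:
            problems.append("uncompressed size too large")
        if compressed and total / compressed > max_ratio:
            problems.append("compression ratio too high")
        for entry in entries:
            name = entry.filename.replace("\\", "/")
            if name.startswith("/") or (len(name) > 1 and name[1] == ":"):
                problems.append(f"absolute path: {name}")
            if ".." in PurePosixPath(name).parts:
                problems.append(f"path traversal: {name}")
            # The upper 16 bits of external_attr hold the Unix file mode.
            if stat.S_ISLNK(entry.external_attr >> 16):
                problems.append(f"symlink: {name}")
    return problems
```

The sizes in the central directory are declared by whoever built the archive, so a metadata check like this is only a first filter; the per-entry reads in the content inspection stage still need their own bounds.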
Activity File Validation (validate_activity_file)
UploadFile (.gpx, .tcx, .fit)
│
▼
┌──────────────────────────────┐
│ 1. Audit: VALIDATION_START │
└──────────────┬───────────────┘
│
▼
┌──────────────────────────────┐
│ 2. Filename Validation │
└──────────────┬───────────────┘
│
▼
┌──────────────────────────────┐
│ 3. File Size Check │
│ (max_activity_file_size) │
└──────────────┬───────────────┘
│
▼
┌──────────────────────────────┐
│ 4. ResourceMonitor START │
└──────────────┬───────────────┘
│
▼
┌──────────────────────────────┐
│ 5. MIME + Signature Check │
│ ├── FIT: binary signature │
│ │ (.FIT at bytes 8-11) │
│ └── GPX/TCX: XML header │
│ (<?xml) │
└──────────────┬───────────────┘
│
▼
┌──────────────────────────────┐
│ 6. XML Security Validation │
│ (GPX/TCX only) │
│ ├── defusedxml parsing │
│ ├── DTD forbidden │
│ └── Entity expansion block│
└──────────────┬───────────────┘
│
▼
┌──────────────────────────────┐
│ 7. ResourceMonitor END │
└──────────────┬───────────────┘
│
▼
┌──────────────────────────────┐
│ 8. Audit: VALIDATION_SUCCESS │
└──────────────────────────────┘
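The signature logic in step 5 can be sketched as follows. detect_activity_format is a hypothetical helper; the FIT offset comes from the diagram above, and tolerating a UTF-8 byte order mark before the XML declaration is an assumption of this sketch.

```python
from typing import Optional

UTF8_BOM = b"\xef\xbb\xbf"


def detect_activity_format(header: bytes) -> Optional[str]:
    """Classify the first bytes of an upload as 'fit', 'xml', or None."""
    # A FIT header carries the ASCII tag ".FIT" at bytes 8-11.
    if len(header) >= 12 and header[8:12] == b".FIT":
        return "fit"
    # GPX and TCX are XML documents and start with an XML declaration.
    if header.startswith(UTF8_BOM):
        header = header[len(UTF8_BOM):]
    if header.startswith(b"<?xml"):
        return "xml"
    return None
```

For step 6, defusedxml.ElementTree.fromstring accepts forbid_dtd, forbid_entities, and forbid_external keyword arguments; passing forbid_dtd=True is one way to get the behavior the diagram describes, though this document does not show how XmlSecurityValidator configures the parser.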
Gzip Validation (validate_gzip_file)
UploadFile (.gz)
│
▼
┌──────────────────────────────┐
│ 1. Audit: VALIDATION_START │
└──────────────┬───────────────┘
│
▼
┌──────────────────────────────┐
│ 2. Filename Validation │
└──────────────┬───────────────┘
│
▼
┌──────────────────────────────┐
│ 3. File Size Check │
│ (max_gzip_size) │
└──────────────┬───────────────┘
│
▼
┌──────────────────────────────┐
│ 4. ResourceMonitor START │
└──────────────┬───────────────┘
│
▼
┌──────────────────────────────┐
│ 5. MIME + Signature Check │
│ (gzip magic: \x1f\x8b) │
└──────────────┬───────────────┘
│
▼
┌──────────────────────────────┐
│ 6. Gzip Content Inspection │
│ ├── Chunked decompression │
│ ├── Progressive ratio chk │
│ └── Progressive size chk │
└──────────────┬───────────────┘
│
▼
┌──────────────────────────────┐
│ 7. ResourceMonitor END │
└──────────────┬───────────────┘
│
▼
┌──────────────────────────────┐
│ 8. Audit: VALIDATION_SUCCESS │
└──────────────────────────────┘
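Step 6 relies on never decompressing more than a bounded amount at once. A minimal sketch of that idea with the standard zlib module is shown below; inspect_gzip and its default limits are assumptions, the real GzipContentInspector raises library exceptions rather than ValueError, and multi-member gzip files are not handled here.

```python
import zlib


def inspect_gzip(stream, max_ratio=100.0, max_size=50 * 1024 * 1024, chunk_size=64 * 1024):
    """Decompress in bounded steps; return the uncompressed size or raise ValueError."""
    # MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(zlib.MAX_WBITS | 16)
    consumed = 0
    produced = 0

    def account(out: bytes) -> None:
        nonlocal produced
        produced += len(out)
        if produced > max_size:
            raise ValueError("uncompressed size limit exceeded")
        if produced > max_ratio * consumed:
            raise ValueError("compression ratio limit exceeded")

    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        consumed += len(chunk)
        data = chunk
        while data:
            # max_length caps the output of each call; input that was not
            # processed yet is kept in unconsumed_tail for the next round.
            account(decompressor.decompress(data, chunk_size))
            data = decompressor.unconsumed_tail
    account(decompressor.flush())
    return produced
```

Because the limits are checked after every bounded step, a gzip bomb is rejected after a few chunks of output instead of after the whole payload has been expanded.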
Data Flow: Where File Content Is Read
| Stage | What is read | Buffer size |
|---|---|---|
| File size check | Chunked reads (chunk_size, 64 KB default) | Progressive, discarded after counting |
| Stream to temp | Chunked reads to SpooledTemporaryFile | In-memory up to max_memory_buffer_size (10 MB), then disk |
| MIME detection | First 8 KB via python-magic | 8 KB |
| Signature check | First 4-12 bytes depending on format | Minimal |
| Compression validation | ZIP central directory (metadata only) | Metadata |
| ZIP content inspection | Per-entry read for binary/text scan | Per-entry, bounded by max_individual_file_size |
| Gzip inspection | Chunked streaming decompression | chunk_size at a time |
| XML validation | Full XML content parsed by defusedxml | Full file (bounded by max_activity_file_size) |
| Content analysis | Full file byte scan | Bounded by content_scan_max_size (50 MB) |
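
The first two rows of the table can be illustrated with the standard tempfile module. This sketch merges the size check and the spooling into one pass for brevity, whereas the pipeline lists them as separate stages; spool_with_limit and its parameters are hypothetical names.

```python
import tempfile


def spool_with_limit(source, max_size, chunk_size=64 * 1024, max_memory=10 * 1024 * 1024):
    """Copy source into a spooled temp file, failing once max_size is exceeded."""
    # The file stays in memory until max_memory bytes are written, after
    # which SpooledTemporaryFile rolls over to a real file on disk.
    spooled = tempfile.SpooledTemporaryFile(max_size=max_memory)
    total = 0
    try:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
            if total > max_size:
                raise ValueError("file exceeds size limit")
            spooled.write(chunk)
    except Exception:
        spooled.close()  # do not leak the temp file on failure
        raise
    spooled.seek(0)
    return spooled, total
```

Counting bytes while reading means the limit holds even when the client-supplied length is missing or wrong.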
Exception Hierarchy
Exception
└── FileSecurityError
├── FileValidationError
│ ├── FilenameSecurityError
│ ├── UnicodeSecurityError
│ ├── ExtensionSecurityError
│ ├── WindowsReservedNameError
│ ├── FileSizeError
│ ├── MimeTypeError
│ └── FileSignatureError
├── FileProcessingError
│ ├── CompressionSecurityError
│ │ └── ZipBombError
│ ├── ZipContentError
│ └── ResourceLimitError
└── FileSecurityConfigurationError
All exceptions carry an error_code from ErrorCode for
machine-readable classification.
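A reduced sketch of how such a hierarchy and its error codes might fit together is shown below. The class names follow the tree above, but the ErrorCode members, the constructor signature, and the describe helper are assumptions made for illustration.

```python
from enum import Enum


class ErrorCode(str, Enum):
    # Hypothetical members; the real enum is defined in exceptions.py.
    UNKNOWN = "UNKNOWN"
    FILE_TOO_LARGE = "FILE_TOO_LARGE"
    ZIP_BOMB = "ZIP_BOMB"


class FileSecurityError(Exception):
    def __init__(self, message: str, error_code: ErrorCode = ErrorCode.UNKNOWN):
        super().__init__(message)
        self.error_code = error_code


class FileValidationError(FileSecurityError):
    pass


class FileSizeError(FileValidationError):
    pass


class FileProcessingError(FileSecurityError):
    pass


class CompressionSecurityError(FileProcessingError):
    pass


class ZipBombError(CompressionSecurityError):
    pass


def describe(exc: FileSecurityError) -> dict:
    """Turn any library exception into a machine-readable payload."""
    return {"code": exc.error_code.value, "detail": str(exc)}
```

Callers can catch FileSecurityError once at the boundary and branch on error_code, or catch a narrower class such as FileProcessingError when they need to treat archive problems differently.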
Audit Event Flow
FileValidator.validate_*()
│
├── set_correlation_id()
├── audit.log(VALIDATION_START)
│
├── [validation pipeline]
│ │
│ ├── ZipContentInspector ──► audit.log(THREAT_DETECTED)
│ ├── GzipContentInspector ─► audit.log(THREAT_DETECTED)
│ ├── CompressionValidator ─► audit.log(THREAT_DETECTED)
│ └── ContentSecurityInspector ► audit.log(THREAT_DETECTED)
│
├── On success: audit.log(VALIDATION_SUCCESS)
├── On failure: audit.log(VALIDATION_FAILURE)
│
└── reset_correlation_id()
All log records include the correlation ID in extra via
log_extra(), enabling full request-level tracing across
sub-components.