Scanning

How SimSweep scans your mods folder, what it detects, and what all those numbers mean.


Scanning is the core of SimSweep. It reads every file in your Mods folder, figures out what each one is, and builds a complete picture of your CC library. Here's what it actually does under the hood.


How It Works

When you hit Scan, SimSweep goes through several phases. The progress bar is weighted to reflect how long each phase actually takes, so it won't jump from 10% to 90% and then sit there.
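
As a rough illustration of a weighted progress bar, the sketch below maps the current phase and its completion to an overall percentage. The weights and the `overall_progress` helper are hypothetical; SimSweep's real weights aren't documented here.

```python
# Hypothetical weights: each phase's assumed share of total scan time (sums to 1.0).
PHASE_WEIGHTS = [
    ("discovery", 0.05),
    ("indexing", 0.60),
    ("analysis", 0.20),
    ("classification", 0.05),
    ("save_matching", 0.08),
    ("community_sync", 0.02),
]


def overall_progress(phase: str, fraction_done: float) -> float:
    """Return overall progress (0-100) for the given phase and its completion (0-1)."""
    fraction_done = min(max(fraction_done, 0.0), 1.0)
    completed = 0.0
    for name, weight in PHASE_WEIGHTS:
        if name == phase:
            return 100.0 * (completed + weight * fraction_done)
        completed += weight
    raise ValueError(f"unknown phase: {phase}")
```

With these assumed weights, being halfway through indexing reports 35% overall rather than a raw phase count of 2 out of 6.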

Phase 1: Discovery

Walks your entire Mods folder and finds every .package and .ts4script file, including files inside subfolders. Files that can't be read (permissions issues, locked files) get logged as warnings instead of aborting the scan.
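
A minimal sketch of this kind of discovery walk, using only the Python standard library. The function name and warning format are made up for illustration, and this version only catches folders that can't be listed; unreadable individual files would surface later, when they are opened for indexing.

```python
import os
from pathlib import Path

MOD_EXTENSIONS = {".package", ".ts4script"}


def discover_mod_files(mods_root):
    """Walk mods_root recursively and return (mod_files, warnings)."""
    files = []
    warnings = []

    def on_error(err: OSError) -> None:
        # os.walk calls this for any directory it cannot list;
        # record a warning and keep walking instead of aborting.
        warnings.append(f"Could not read {err.filename}: {err.strerror}")

    for dirpath, _dirnames, filenames in os.walk(mods_root, onerror=on_error):
        for name in filenames:
            if Path(name).suffix.lower() in MOD_EXTENSIONS:
                files.append(Path(dirpath) / name)
    return sorted(files), warnings
```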

Phase 2: Indexing

This is the heavy part. SimSweep opens each file and reads the binary data inside:

  • DBPF parsing - .package files use EA's DBPF container format. SimSweep reads the header, index table, and individual resource entries.
  • CAS metadata - For clothing, hair, accessories, etc., it reads the CAS Part data to extract body types, age flags, gender, outfit categories, colors, and tags.
  • Resource types - Every resource inside the package is cataloged. SimSweep recognizes 60+ resource types including CAS parts, thumbnails, object definitions (OBJD), catalog objects (COBJ), tuning files, animations, state machines, meshes, textures, pelt brushes, skin tones, and more.
  • Thumbnails - Extracts preview images from all available image resources (small, medium, large, extra, wall variants) and picks the highest resolution one. DDS textures get converted to PNG.
  • File hashes - SHA-256 fingerprint of each file, used for duplicate detection and catalog matching. Files are hashed as a stream in 64 KB chunks, so memory use stays flat no matter how big the file is.
  • Script analysis - .ts4script files get scanned for module names, imports, and dependencies.
  • Build/Buy tags - COBJ (Catalog Object) binary resources are parsed to extract catalog tags for data-driven subcategorization of build/buy items.
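
The file hashing step can be sketched with Python's `hashlib`. This is an illustration of streaming a file through SHA-256 in 64 KB chunks, not SimSweep's actual code; the function name is hypothetical.

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # 64 KB, the chunk size mentioned above


def sha256_of_file(path) -> str:
    """Hash a file without loading it fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # read() returns b"" at end of file, which stops the iterator.
        for chunk in iter(lambda: fh.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two files with the same hex digest are byte-for-byte identical for all practical purposes, which is what makes the hash usable for exact duplicate detection.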

Indexing uses a two-pass architecture for speed. The fast pass indexes file metadata (resource keys, instance IDs, conflict maps) without decompressing any resources. The enrichment pass then decompresses only the CAS Parts and Build/Buy objects that need deeper analysis. This avoids the most expensive operation (CAS Part decompression) during the main indexing loop. Resource reads within each file are sorted by disk offset for better sequential I/O, especially on HDDs.
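
To illustrate the disk-ordered reads, here is a small sketch that sorts a package's index entries by offset before reading them, so the file is read front to back. The `IndexEntry` record and function name are hypothetical, and the bytes returned are raw (possibly still compressed) resource data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    """Hypothetical index record: where one resource lives inside a package."""
    resource_type: int
    instance_id: int
    offset: int
    size: int


def read_resources_in_disk_order(path, entries):
    """Read each entry's raw bytes, visiting offsets in ascending order."""
    blobs = {}
    with open(path, "rb") as fh:
        for entry in sorted(entries, key=lambda e: e.offset):
            fh.seek(entry.offset)
            blobs[(entry.resource_type, entry.instance_id)] = fh.read(entry.size)
    return blobs
```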

Indexing runs in parallel across multiple threads (max 4 for the fast pass, max 3 for enrichment, max 2 in safe mode). Threads run at below-normal priority so your system stays responsive.
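
The thread caps above could be expressed roughly like this. It's a sketch under assumptions: the helper names are invented, limiting by CPU count is a guess, and safe mode is assumed to cap both passes at 2. Lowering thread priority is platform-specific and not shown.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Caps taken from the text above.
MAX_FAST_PASS_WORKERS = 4
MAX_ENRICHMENT_WORKERS = 3
MAX_SAFE_MODE_WORKERS = 2


def worker_count(pass_name: str, safe_mode: bool = False) -> int:
    """Pick a thread count for the 'fast' or 'enrichment' pass."""
    cap = MAX_FAST_PASS_WORKERS if pass_name == "fast" else MAX_ENRICHMENT_WORKERS
    if safe_mode:
        cap = min(cap, MAX_SAFE_MODE_WORKERS)
    # Assumption: never use more threads than the machine has cores.
    return max(1, min(cap, os.cpu_count() or 1))


def index_in_parallel(paths, index_one, pass_name="fast", safe_mode=False):
    """Run index_one(path) for every path, returning results in input order."""
    with ThreadPoolExecutor(max_workers=worker_count(pass_name, safe_mode)) as pool:
        return list(pool.map(index_one, paths))
```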

Phase 3: Analysis

Cross-references all the indexed data:

  • Conflict detection - Finds files that modify the same game resource. Conflicts are classified by type (Appearance, CAS, Build/Buy, Gameplay, Mixed) and severity.
  • Soft conflict detection - Finds tuning-level incompatibilities that resource-key conflicts miss (conflicting traits, buff priority, conflicting perks, lot traits, prohibited situations, relationship bit groups). Adds about 2 seconds to the scan.
  • Override vs conflict - When two mods override the same base game resource, that's a "pick one" situation, not a broken mod. These conflicts get reduced severity.
  • Broken CC detection - Finds CAS items with missing GEOM meshes and Build/Buy objects with missing MODL meshes. Uses a 3-tier lookup: same file, then all installed mods, then the base game index. CC requiring uninstalled DLC packs is excluded (that's a pack requirement, not broken CC).
  • Duplicate detection - Finds exact duplicates by comparing SHA-256 file hashes.
  • Dependency tracking - Maps which script mods provide which modules and which modules each mod imports from.
  • Deletion markers - Detects mods that include COMPRESSION_DELETED entries (which remove base game resources). These are flagged as high-risk.
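
The basic idea behind resource-key conflict detection can be sketched as grouping files by the resources they contain. This assumes a resource key is a (type, group, instance) tuple; the function name and input shape are hypothetical, and severity or type classification is left out.

```python
from collections import defaultdict


def find_conflicts(file_resources):
    """Group files by resource key and keep keys claimed by two or more files.

    file_resources: {filename: iterable of (type, group, instance) keys}
    Returns {resource_key: sorted list of filenames}.
    """
    owners = defaultdict(set)
    for filename, keys in file_resources.items():
        for key in keys:
            owners[key].add(filename)
    return {key: sorted(files) for key, files in owners.items() if len(files) > 1}
```

For example, if `hair_a.package` and `hair_b.package` both contain the same key, that key maps to both filenames and the pair forms one conflict group.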

Phase 4: Classification

Auto-labels each file by type based on what's actually inside it, not just the filename:

  • Resource-based classification - 20+ rules that look at resource type combinations. If a package has interaction tuning, buff tuning, and trait tuning, it's classified as a mod. If it has CAS parts and mesh data, it's CC. Career mods, spell mods, pose packs, build/buy CC, and more all get their own labels.
  • Filename heuristics - Used as a fallback when resource analysis isn't conclusive. The scanner recognizes 70+ known mod patterns.
  • Conflict types - Each conflict group gets classified as Appearance (purple), CAS (blue), Build/Buy (teal), Gameplay (red), or Mixed (orange).
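
Here is a sketch of resource-first classification with a filename fallback. The two resource rules mirror the examples above; the resource type names, labels, and filename patterns are all placeholders, not SimSweep's real rule set.

```python
# Hypothetical rules: if a package contains every listed type, apply the label.
RESOURCE_RULES = [
    (frozenset({"interaction_tuning", "buff_tuning", "trait_tuning"}), "Mod"),
    (frozenset({"cas_part", "geometry"}), "CAS CC"),
]

# Hypothetical filename fragments, checked only when no resource rule matches.
FILENAME_PATTERNS = [
    ("pose", "Pose Pack"),
    ("career", "Career Mod"),
]


def classify(resource_types, filename):
    """Label a file from its contents first, then fall back to its name."""
    found = set(resource_types)
    for required, label in RESOURCE_RULES:
        if required <= found:  # all required types are present
            return label
    lowered = filename.lower()
    for fragment, label in FILENAME_PATTERNS:
        if fragment in lowered:
            return label
    return "Unknown"
```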

Phase 5: Save Matching

Reads your save files and tray data to figure out which CC is actually being used:

  • CAS items only get the "used" label when their parts appear in actual save files, not just tray files
  • Objects get matched against placed items in your lots
  • Presets can't be tracked (the game doesn't save which presets a sim uses)

Phase 6: Community Sync

If community sharing is enabled, anonymous metadata gets sent to the CC Hub. This includes file hashes, categories, body types, and extracted thumbnails. Not your filenames, paths, or personal info.


Resource Types

SimSweep recognizes a huge range of Sims 4 resource types. Here are the main categories:

CAS Resources: CAS Part definitions, meshes, textures, thumbnails, skin tones, pelt brushes, pelt layers

Build/Buy Resources: Object definitions (OBJD), catalog objects (COBJ), meshes, thumbnails across multiple sizes, walls, floors, roofing, fencing, stairs

Tuning Resources: 18+ tuning types including interactions, buffs, traits, snippets, situations, careers, object states, moods, recipes, statistics, rewards, aspirations, relationship bits, spells, venues, zone modifiers, drama nodes, bucks perks

Animation Resources: State machines, animation clips, clip headers, rigs, slots

Other: Combined tuning (game-internal, automatically filtered out), script archives, string tables

The scanner skips the game's internal COMBINED_TUNING format since it's never user CC and would inflate resource counts.


Scan History

Every scan is saved. The scan history dropdown on the dashboard lets you see past scans with their dates, health scores, and file counts. Useful for tracking how your mods folder changes over time.

Full scan history requires Decrypt tier. Free users see the most recent scan only.


Scan Caching

Results are stored in a local SQLite database. This means:

  • Instant browse on startup - Last scan's data loads immediately without re-scanning
  • Incremental scans - Only new or changed files get fully processed
  • Low memory - Data is paginated from disk instead of held in RAM
  • Auto-recovery - If the database gets corrupted, SimSweep deletes it, retries once, then falls back to in-memory storage as a last resort
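
The auto-recovery behavior might look something like the following with Python's `sqlite3` module. This is a sketch, not the actual implementation: the function names are invented, and using `PRAGMA quick_check` as the corruption test is an assumption.

```python
import os
import sqlite3


def _try_open(path):
    """Return a healthy connection, or None if the file is unusable."""
    try:
        conn = sqlite3.connect(path)
    except sqlite3.Error:
        return None
    try:
        # connect() is lazy, so run a check to force the file to be read.
        if conn.execute("PRAGMA quick_check").fetchall() == [("ok",)]:
            return conn
    except sqlite3.DatabaseError:
        pass
    conn.close()
    return None


def open_scan_database(path):
    """Open the cache; on corruption delete it, retry once, then use memory."""
    conn = _try_open(path)
    if conn is not None:
        return conn
    try:
        os.remove(path)
    except OSError:
        pass  # nothing to delete, or the file is locked
    conn = _try_open(path)  # the single retry, against a fresh file
    if conn is not None:
        return conn
    return sqlite3.connect(":memory:")  # last resort: nothing persists
```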

Scan indices (in-memory hash maps) get freed 5 minutes after the scan finishes. The Browse page pulls from SQLite so this doesn't affect anything.


Auto-Scan and File Watching

Requires Decrypt tier

With Decrypt or Override, SimSweep can watch your Mods folder for changes and trigger scans automatically:

  • File watcher - Monitors your Mods folder for new, changed, or deleted files. When something changes, SimSweep can prompt you or auto-scan.
  • Auto-scan on file change - When enabled, dropping a new .package into your mods folder triggers a scan without you having to click anything.

Configure these in Settings > Scanning.
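
For a sense of what "new, changed, or deleted" means in practice, here is a simple polling sketch that compares two snapshots of a folder. This is not how SimSweep's watcher is implemented (a real watcher would more likely use OS file notifications); the function names are hypothetical.

```python
import os


def snapshot(mods_root):
    """Map every file path under mods_root to (modified time in ns, size)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(mods_root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            state[full] = (st.st_mtime_ns, st.st_size)
    return state


def diff_snapshots(old, new):
    """Return (added, changed, deleted) path lists between two snapshots."""
    added = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    changed = sorted(p for p in new.keys() & old.keys() if new[p] != old[p])
    return added, changed, deleted
```

Any non-empty result from `diff_snapshots` would be the trigger to prompt the user or start a scan.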


Performance

A few things SimSweep does to keep scans fast without hurting your system:

  • Two-pass indexing - fast metadata pass skips all decompression, enrichment pass only hits files that need it
  • Disk-ordered reads - resource reads within each file sorted by offset for sequential I/O
  • Parallel indexing on multiple threads at below-normal CPU priority
  • 64 KB streaming for file hashing instead of loading entire files into memory
  • Batch SQLite inserts (20 rows per batch) instead of one-at-a-time
  • PRAGMA tuning on the scan database (cache_size, temp_store, mmap_size)
  • Dedicated indexes for common queries (broken files, file lookups, save usage)
  • Index cleanup after 5 minutes to free ~100-150 MB of RAM
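
The SQLite items in this list can be sketched with Python's `sqlite3` module. The three PRAGMAs are real SQLite settings, but the values, table layout, index name, and function names below are illustrative assumptions, and whether SimSweep batches with `executemany` or multi-row INSERT statements isn't stated.

```python
import sqlite3

BATCH_SIZE = 20  # rows per batch, as stated above


def tune(conn):
    """Apply the three PRAGMAs named above (values are assumptions)."""
    conn.execute("PRAGMA cache_size = -20000")    # negative = KiB, about 20 MB
    conn.execute("PRAGMA temp_store = MEMORY")    # keep temp tables in RAM
    conn.execute("PRAGMA mmap_size = 67108864")   # allow 64 MiB of memory-mapped I/O


def insert_files(conn, rows):
    """Insert (path, sha256, is_broken) rows in batches, one transaction each."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files "
        "(path TEXT PRIMARY KEY, sha256 TEXT, is_broken INTEGER)"
    )
    # Dedicated index so "show broken files" doesn't scan the whole table.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_files_broken ON files (is_broken)")
    for start in range(0, len(rows), BATCH_SIZE):
        batch = rows[start:start + BATCH_SIZE]
        with conn:  # commits the batch, or rolls it back on error
            conn.executemany("INSERT OR REPLACE INTO files VALUES (?, ?, ?)", batch)
```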