Abstract

This document presents a systematic analysis of the structural and functional parallels between the Linux kernel’s GPU memory management subsystem (TTM, GEM, GPUVM) and the classical CPU memory management (MM) subsystem. We argue that the GPU memory stack is not merely inspired by CPU MM—it is a re-derivation of the same fundamental principles under different hardware constraints. The analogy is not incidental; it is architecturally inevitable, deeply embedded in the kernel’s DRM subsystem through shared terminology, identical algorithmic patterns, and converging design trajectories.


1. Thesis

The Linux GPU memory management subsystem (TTM/GEM/GPUVM) constitutes a domain-specific re-implementation of classical CPU virtual memory concepts—virtual address spaces, demand paging, LRU-based eviction, and swap—adapted to the constraints of discrete accelerator memory hierarchies.

This is evidenced by:

  1. Direct lexical borrowing (swap_storage, TTM_TT_FLAG_SWAPPED, ttm_tt_swapin())
  2. Isomorphic data structure design (drm_gpuvm ↔ mm_struct, drm_gpuva ↔ vm_area_struct)
  3. Identical algorithmic strategies (LRU eviction scanning, shrinker integration, fault-driven population)
  4. Converging evolution (GPU fault handling moving toward CPU-style demand paging via SVM/HMM)

2. Argumentation

2.1 Address Space Management

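┌────────────────────────────────────┬───────────────────────────────────────────┬─────────────────────────────────────────────────┐
│ CPU MM                             │ GPU MM (DRM)                              │ Structural Role                                 │
├────────────────────────────────────┼───────────────────────────────────────────┼─────────────────────────────────────────────────┤
│ mm_struct                          │ struct drm_gpuvm                          │ Per-context virtual address space container     │
│ vm_area_struct                     │ struct drm_gpuva                          │ Contiguous VA region mapped to a backing object │
│ VMA rb-tree (maple tree since 6.1) │ GPUVM interval rb-tree                    │ Spatial indexing for fast lookup                │
│ mmap() / munmap()                  │ drm_gpuvm_sm_map() / drm_gpuvm_sm_unmap() │ User-facing VA space mutation                   │
│ VMA split/merge on partial unmap   │ drm_gpuva_op_remap (split)                │ Maintaining VA space consistency                │
└────────────────────────────────────┴───────────────────────────────────────────┴─────────────────────────────────────────────────┘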

The drm_gpuvm documentation states:

“The DRM GPU VA Manager keeps track of a GPU’s virtual address space by using maple_tree structures… There should be one manager instance per GPU virtual address space.”

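This is functionally equivalent to mm_struct managing a process’s virtual address space. (The quoted comment mentions maple_tree, but the merged drm_gpuvm code indexes mappings with an interval rb-tree, as the table above lists; the wording appears to be left over from earlier revisions of the patch series.) The kernel_alloc_node in drm_gpuvm mirrors the kernel’s reserved address range in a process VA space: it represents the part of the GPU VA space set aside for the kernel.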
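The split performed on a partial unmap can be illustrated with a small user-space sketch. It is not the drm_gpuvm API: toy_va, toy_remap, and toy_unmap_split() are hypothetical names, and the sketch only computes which pieces of a single existing mapping survive when a sub-range is removed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one mapping covering [addr, addr + range). */
struct toy_va {
    uint64_t addr;
    uint64_t range;
};

/* Result of unmapping part of a mapping: up to two surviving pieces. */
struct toy_remap {
    bool has_prev, has_next;
    struct toy_va prev;   /* piece below the unmapped range */
    struct toy_va next;   /* piece above the unmapped range */
};

/*
 * Unmap [req_addr, req_addr + req_range) from mapping va.
 * Assumes the request overlaps va and does not wrap around.
 */
static struct toy_remap toy_unmap_split(struct toy_va va,
                                        uint64_t req_addr, uint64_t req_range)
{
    uint64_t va_end = va.addr + va.range;
    uint64_t req_end = req_addr + req_range;
    struct toy_remap op = { false, false, {0, 0}, {0, 0} };

    assert(req_addr < va_end && req_end > va.addr); /* must overlap */

    if (req_addr > va.addr) {          /* keep the head of the mapping */
        op.has_prev = true;
        op.prev.addr = va.addr;
        op.prev.range = req_addr - va.addr;
    }
    if (req_end < va_end) {            /* keep the tail of the mapping */
        op.has_next = true;
        op.next.addr = req_end;
        op.next.range = va_end - req_end;
    }
    return op;
}
```

Unmapping the middle of a mapping yields both a prev and a next piece, which is the same outcome a CPU VMA split produces on a partial munmap(); unmapping the whole mapping yields neither.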

2.2 Backing Storage and Placement

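┌───────────────────────────────┬───────────────────────────────┬──────────────────────────────────────┐
│ CPU MM                        │ GPU MM (TTM)                  │ Structural Role                      │
├───────────────────────────────┼───────────────────────────────┼──────────────────────────────────────┤
│ Physical RAM (zones)          │ VRAM (TTM_PL_VRAM)            │ Fast, limited primary storage        │
│ Swap device                   │ System memory (TTM_PL_SYSTEM) │ Slower, larger overflow storage      │
│ struct page                   │ struct ttm_resource           │ Unit of physical allocation tracking │
│ Page frame allocation (buddy) │ gpu_buddy_alloc_blocks()      │ Power-of-two physical allocator      │
│ NUMA node affinity            │ ttm_place.mem_type            │ Placement preference hierarchy       │
└───────────────────────────────┴───────────────────────────────┴──────────────────────────────────────┘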

The TTM placement system (struct ttm_placement with an ordered array of struct ttm_place) is analogous to NUMA memory policies—expressing a preference hierarchy for where memory should physically reside.
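A minimal sketch of such a preference hierarchy, under simplifying assumptions: toy_device, toy_mem_type, and toy_place() are hypothetical names rather than TTM interfaces, and the sketch simply takes the first listed memory type with enough free space. Real TTM can also evict other buffers to make room, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memory types, loosely modelled on TTM_PL_SYSTEM / TTM_PL_VRAM. */
enum toy_mem_type { TOY_PL_SYSTEM, TOY_PL_VRAM, TOY_NUM_TYPES };

/* Free bytes remaining in each memory type. */
struct toy_device {
    uint64_t free_bytes[TOY_NUM_TYPES];
};

/*
 * Walk the placement list in order and take the first memory type with
 * enough free space. Returns the chosen type, or -1 if none fits.
 */
static int toy_place(struct toy_device *dev, const enum toy_mem_type *prefs,
                     size_t num_prefs, uint64_t size)
{
    assert(dev && prefs);

    for (size_t i = 0; i < num_prefs; i++) {
        enum toy_mem_type t = prefs[i];

        if (dev->free_bytes[t] >= size) {
            dev->free_bytes[t] -= size;
            return (int)t;
        }
    }
    return -1;
}
```

With a preference list of VRAM followed by system memory, allocations land in VRAM until it fills up and then fall back to system memory, much as a NUMA policy falls back to a more distant node.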

2.3 The Swap Analogy

This is the most explicit parallel, with TTM directly adopting CPU MM terminology:

struct ttm_tt {
    struct page **pages;                 /* backing pages in system memory */
#define TTM_TT_FLAG_SWAPPED  BIT(0)      /* CPU swap terminology */
    uint32_t page_flags;                 /* holds the TTM_TT_FLAG_* bits */
    struct file *swap_storage;           /* shmem backing, like swap */
    ...
};

The eviction-to-system-memory path in TTM is structurally identical to page swap-out:

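┌────────────────────────────────────┬──────────────────────────────────────────────────────────┐
│ CPU Swap-Out                       │ GPU Eviction (TTM)                                       │
├────────────────────────────────────┼──────────────────────────────────────────────────────────┤
│ Select victim via LRU scan         │ Select BO via ttm_resource_manager.lru[]                 │
│ Write page contents to swap device │ Move BO contents to system memory / shmem (swap_storage) │
│ Replace PTE with swap entry        │ Update ttm_resource placement; set TTM_TT_FLAG_SWAPPED   │
│ Free physical page frame           │ Free VRAM allocation                                     │
└────────────────────────────────────┴──────────────────────────────────────────────────────────┘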

The swap-in / fault-in path:

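┌─────────────────────────────────────┬───────────────────────────────────────────────────┐
│ CPU Swap-In                         │ GPU Re-validation                                 │
├─────────────────────────────────────┼───────────────────────────────────────────────────┤
│ Page fault triggers do_swap_page()  │ Command submission triggers drm_gpuvm_validate()  │
│ Read from swap, allocate page frame │ Call ttm_bo_validate() → allocate VRAM, copy back │
│ Install PTE pointing to new frame   │ Update GPU page table entry                       │
│ Clear swap entry                    │ Clear TTM_TT_FLAG_SWAPPED; call ttm_tt_swapin()   │
└─────────────────────────────────────┴───────────────────────────────────────────────────┘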

The GEM VRAM documentation makes this explicit:

“If there’s no more space left in VRAM, inactive GEM objects can be moved to system memory.”

This sentence is the GPU equivalent of: “If there’s no more physical RAM, inactive pages can be moved to swap.”
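The eviction and re-validation steps can be modelled as a two-state machine. The sketch below is a user-space toy, not TTM code: toy_bo, toy_vram, toy_evict(), and toy_validate() are hypothetical, TOY_FLAG_SWAPPED only stands in for TTM_TT_FLAG_SWAPPED, and placement in system memory and swap-out to shmem are collapsed into a single state, as the comparison above does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_FLAG_SWAPPED (1u << 0)  /* stands in for TTM_TT_FLAG_SWAPPED */

enum toy_where { TOY_IN_VRAM, TOY_IN_SYSTEM };

struct toy_bo {
    uint64_t size;
    enum toy_where where;
    uint32_t flags;
};

struct toy_vram {
    uint64_t free_bytes;
};

/* Eviction: release VRAM and mark the buffer as swapped out. */
static void toy_evict(struct toy_vram *vram, struct toy_bo *bo)
{
    assert(bo->where == TOY_IN_VRAM);
    vram->free_bytes += bo->size;      /* counterpart of freeing the page frame */
    bo->where = TOY_IN_SYSTEM;         /* contents now live in system memory */
    bo->flags |= TOY_FLAG_SWAPPED;
}

/* Re-validation: bring the buffer back if VRAM has room. */
static bool toy_validate(struct toy_vram *vram, struct toy_bo *bo)
{
    if (bo->where == TOY_IN_VRAM)
        return true;                   /* already resident */
    if (vram->free_bytes < bo->size)
        return false;                  /* caller would have to evict first */
    vram->free_bytes -= bo->size;
    bo->where = TOY_IN_VRAM;
    bo->flags &= ~TOY_FLAG_SWAPPED;
    return true;
}
```

A failed toy_validate() corresponds to the point where a real driver would evict other buffers and retry, just as a CPU page fault may trigger reclaim before a frame can be allocated.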

2.4 LRU-Based Eviction and Reclaim

Both subsystems use LRU (Least Recently Used) as the primary eviction policy:

CPU MM:

  • Active/inactive LRU lists per memory zone
  • kswapd daemon performs background reclaim
  • Shrinker callbacks for slab caches
  • lru_gen (multi-generational LRU) for improved aging

GPU MM (TTM/GEM):

  • ttm_resource_manager.lru[TTM_MAX_BO_PRIORITY] — priority-aware LRU per memory type
  • drm_gem_lru with drm_gem_lru_scan() — shrinker integration for GEM objects
  • ttm_pool_type.shrinker_list — TTM page pools registered as kernel shrinkers
  • drm_mm_scan_init() / drm_mm_scan_add_block() — LRU scan for contiguous eviction

The DRM MM documentation describes the eviction scan pattern:

“Eviction candidates are added using drm_mm_scan_add_block() until a suitable hole is found or there are no further evictable objects.”

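This closely parallels the CPU MM’s shrink_inactive_list(), which scans candidates until enough memory is freed. The stopping condition differs: the drm_mm scan ends when the candidate blocks form a sufficiently large contiguous hole, while page reclaim ends when enough pages have been freed regardless of adjacency.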
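The priority-aware LRU walk can be sketched as follows. This is a simplified model with hypothetical names (toy_lru, toy_bo, toy_evict_until()); it uses fixed-size arrays instead of kernel lists and assumes that lower-priority buffers are evicted first, oldest entry first, until a byte target is met.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_PRIORITY 2
#define TOY_MAX_PER_LIST 4

struct toy_bo {
    uint64_t size;
    int resident;   /* 1 while the buffer occupies fast memory */
};

/* One list per priority; index 0 of each list is least recently used. */
struct toy_lru {
    struct toy_bo *list[TOY_MAX_PRIORITY][TOY_MAX_PER_LIST];
    size_t count[TOY_MAX_PRIORITY];
};

/*
 * Evict from the lowest priority first, oldest entry first, until at
 * least `needed` bytes are freed. Returns the number of bytes freed,
 * which may be less than `needed` if nothing evictable remains.
 */
static uint64_t toy_evict_until(struct toy_lru *lru, uint64_t needed)
{
    uint64_t freed = 0;

    assert(lru);

    for (int prio = 0; prio < TOY_MAX_PRIORITY && freed < needed; prio++) {
        for (size_t i = 0; i < lru->count[prio] && freed < needed; i++) {
            struct toy_bo *bo = lru->list[prio][i];

            if (!bo->resident)
                continue;              /* already evicted earlier */
            bo->resident = 0;
            freed += bo->size;
        }
    }
    return freed;
}
```

The loop stops as soon as the byte target is reached, so higher-priority buffers are only touched when the lower-priority lists cannot satisfy the request.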

2.5 Demand Paging and Fault Handling

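┌───────────────────────────────────────────┬────────────────────────────────────────────────┬───────────────────────────────┐
│ CPU MM                                    │ GPU MM                                         │ Structural Role               │
├───────────────────────────────────────────┼────────────────────────────────────────────────┼───────────────────────────────┤
│ handle_mm_fault()                         │ GEM fault handler (vm_operations_struct.fault) │ On-demand page/BO population  │
│ Lazy allocation (allocate on first touch) │ ttm_tt_populate() on first use                 │ Defer physical allocation     │
│ FAULT_FLAG_WRITE → CoW                    │ Pin on write / migrate on access               │ Access-type-specific handling │
└───────────────────────────────────────────┴────────────────────────────────────────────────┴───────────────────────────────┘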

The GEM documentation states:

“Drivers are responsible for the actual physical pages allocation by calling shmem_read_mapping_page_gfp() for each page. Note that they can decide to allocate pages when initializing the GEM object, or to delay allocation until the memory is needed (for instance when a page fault occurs).”

This is textbook demand paging, transplanted into the GPU domain.
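A user-space sketch of the delayed-allocation pattern the quote describes. The names toy_object, toy_fault(), and TOY_PAGE_SIZE are hypothetical, and calloc() stands in for the page allocation a driver would perform; no backing page exists until the first access to it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_PAGE_SIZE 4096

struct toy_object {
    size_t num_pages;
    unsigned char **pages;   /* NULL entry = page not yet allocated */
    size_t populated;        /* number of pages allocated so far */
};

/* Create the object without allocating any backing pages. */
static int toy_object_init(struct toy_object *obj, size_t num_pages)
{
    assert(obj && num_pages > 0);
    obj->pages = calloc(num_pages, sizeof(*obj->pages));
    if (!obj->pages)
        return -1;
    obj->num_pages = num_pages;
    obj->populated = 0;
    return 0;
}

/* "Fault handler": allocate the backing page on first access only. */
static unsigned char *toy_fault(struct toy_object *obj, size_t page_index)
{
    if (page_index >= obj->num_pages)
        return NULL;                       /* access outside the object */
    if (!obj->pages[page_index]) {
        obj->pages[page_index] = calloc(1, TOY_PAGE_SIZE);
        if (!obj->pages[page_index])
            return NULL;                   /* out of memory */
        obj->populated++;
    }
    return obj->pages[page_index];
}

/* Release every page that was populated, then the page array. */
static void toy_object_fini(struct toy_object *obj)
{
    for (size_t i = 0; i < obj->num_pages; i++)
        free(obj->pages[i]);
    free(obj->pages);
}
```

An object of eight pages that is touched only at page 3 ends up with a single populated page, which is the memory saving that motivates deferring allocation.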

2.6 Madvise and Purgeability

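┌────────────────────────────────────────────────┬─────────────────────────────────────────────────┬──────────────────────────────────┐
│ CPU MM                                         │ GPU MM                                          │ Structural Role                  │
├────────────────────────────────────────────────┼─────────────────────────────────────────────────┼──────────────────────────────────┤
│ madvise(MADV_DONTNEED)                         │ DRM_GEM_OBJECT_PURGEABLE / shmem.madv           │ Hint: memory can be reclaimed    │
│ madvise(MADV_WILLNEED)                         │ Prefetch operations (DRM_GPUVA_OP_PREFETCH)     │ Hint: memory will be needed soon │
│ Page marked as clean → free without write-back │ ttm_backup_flags.purge: free without backing up │ Optimization: skip write-back    │
└────────────────────────────────────────────────┴─────────────────────────────────────────────────┴──────────────────────────────────┘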
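The purge-versus-backup decision can be sketched in a few lines. This is an illustrative model only: toy_bo, toy_madv, and toy_reclaim() are hypothetical names, and the madv field merely echoes the idea behind shmem.madv rather than reproducing any driver’s bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum toy_madv { TOY_MADV_WILLNEED, TOY_MADV_DONTNEED };

struct toy_bo {
    uint64_t size;
    enum toy_madv madv;   /* userspace hint about whether contents matter */
    bool resident;
    bool backed_up;       /* contents copied to slower storage */
    bool purged;          /* contents discarded for good */
};

/*
 * Reclaim one resident buffer and return the bytes that had to be
 * written back. DONTNEED buffers are purged, so they cost nothing.
 */
static uint64_t toy_reclaim(struct toy_bo *bo)
{
    uint64_t written = 0;

    assert(bo->resident);
    if (bo->madv == TOY_MADV_DONTNEED) {
        bo->purged = true;             /* skip the write-back entirely */
    } else {
        bo->backed_up = true;          /* preserve contents first */
        written = bo->size;
    }
    bo->resident = false;
    return written;
}
```

Both paths free the fast memory; the hint only decides whether the contents are preserved, which mirrors how a clean page can be dropped while a dirty one must be written out.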

2.7 Hibernation as Full Swap-Out

TTM even handles system hibernation by treating it as a complete swap-out:

int ttm_device_prepare_hibernation(struct ttm_device *bdev);
// "move GTT BOs to shmem for hibernation"

This is the GPU equivalent of the CPU MM writing all active pages to swap during suspend-to-disk.

2.8 Convergence: GPU SVM and HMM

The analogy is not static—it is converging. Modern GPU drivers (AMD KFD SVM, Intel Xe SVM) now implement true shared virtual memory where:

  • GPU and CPU share the same virtual address space
  • GPU page faults are handled like CPU page faults
  • HMM (hmm_range_fault()) bridges the two worlds
  • Device-private pages (MEMORY_DEVICE_PRIVATE) appear as swap entries in CPU page tables

This convergence validates the thesis: the GPU memory subsystem was always solving the same problem as CPU MM, and the two are now literally merging through HMM.
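How a device-private page behaves from the CPU’s point of view can be shown with a toy page-table entry. The types and functions below (toy_pte, toy_migrate_to_device(), toy_cpu_access()) are hypothetical and do not reflect the kernel’s PTE encoding or the HMM API; the sketch only captures the idea that a CPU access to a device-resident page faults and migrates the page back.

```c
#include <assert.h>
#include <stdint.h>

enum toy_pte_kind {
    TOY_PTE_PRESENT,         /* maps a system-memory frame */
    TOY_PTE_DEVICE_PRIVATE   /* data lives in device memory, not CPU-accessible */
};

struct toy_pte {
    enum toy_pte_kind kind;
    uint64_t frame;          /* system or device frame, depending on kind */
};

/* Migrate a page to the device: the CPU entry becomes a non-present marker. */
static void toy_migrate_to_device(struct toy_pte *pte, uint64_t device_frame)
{
    assert(pte->kind == TOY_PTE_PRESENT);
    pte->kind = TOY_PTE_DEVICE_PRIVATE;
    pte->frame = device_frame;
}

/*
 * CPU access. A present entry is used directly; a device-private entry
 * "faults" and the page is migrated back to the given system frame.
 * Returns 1 if a fault (migration) was needed, 0 otherwise.
 */
static int toy_cpu_access(struct toy_pte *pte, uint64_t new_system_frame)
{
    if (pte->kind == TOY_PTE_PRESENT)
        return 0;
    pte->kind = TOY_PTE_PRESENT;
    pte->frame = new_system_frame;
    return 1;
}
```

The fault-then-migrate sequence has the same shape as swap-in: a non-present entry records where the data lives, and the fault handler brings it back before the access completes.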


3. Architectural Mapping (Complete)

┌─────────────────────────────────────────────────────────────────┐
│                    STRUCTURAL ISOMORPHISM                       │
├──────────────────────────┬──────────────────────────────────────┤
│       CPU MM             │          GPU MM (DRM/TTM)            │
├──────────────────────────┼──────────────────────────────────────┤
│ mm_struct                │ drm_gpuvm                            │
│ vm_area_struct           │ drm_gpuva                            │
│ struct page / folio      │ ttm_resource / drm_gem_object        │
│ Physical RAM             │ VRAM (TTM_PL_VRAM)                   │
│ Swap space               │ System memory (TTM_PL_SYSTEM)        │
│ Page tables (PGD→PTE)    │ GPU page tables (driver-specific)    │
│ Buddy allocator          │ gpu_buddy / drm_mm range allocator   │
│ LRU lists + kswapd       │ ttm_resource_manager.lru[] + shrinker│
│ do_swap_page()           │ ttm_tt_swapin()                      │
│ swap entry in PTE        │ TTM_TT_FLAG_SWAPPED                  │
│ madvise(MADV_DONTNEED)   │ DRM_GEM_OBJECT_PURGEABLE             │
│ handle_mm_fault()        │ ttm_tt_populate() / GEM fault handler│
│ migrate_pages()          │ ttm_bo_validate() (move between mems)│
│ mmu_notifier             │ drm_gpuvm_bo_evict() callbacks       │
│ /proc/pid/maps           │ debugfs GPU VA dump                  │
│ OOM killer               │ Eviction failure → -ENOMEM           │
└──────────────────────────┴──────────────────────────────────────┘

4. Why the Analogy Is Architecturally Inevitable

The convergence is not coincidental. Both subsystems solve the same abstract problem:

Given a processor with a virtual address space larger than its fast local memory, multiplex limited physical storage among competing consumers using indirection (page tables), lazy allocation (demand paging), and capacity management (eviction/swap).

The only differences are:

  1. Granularity: CPU MM operates at page granularity (typically 4 KiB base pages up to 2 MiB huge pages); GPU MM often operates at buffer-object granularity (KiB to GiB).
  2. Coherence model: CPU has hardware cache coherence; GPU requires explicit flush/invalidate or domain transitions.
  3. Fault latency tolerance: CPU page faults stall a single thread; GPU “faults” (eviction+revalidation) are batched at submission time (though modern GPUs now support true page faults).
  4. Multiplexing unit: CPU multiplexes among processes; GPU multiplexes among buffer objects (though GPUVM now provides per-process GPU VA spaces).

5. Documented Sources

5.1 Primary Kernel Documentation

  • DRM Memory Management — The canonical reference. TTM, GEM, GPUVM, DRM MM, Buddy Allocator all documented here. Contains the swap_storage, LRU, shrinker, and eviction APIs.

  • DRM GPUVM — Documents the GPU virtual address space manager with eviction tracking, split/merge, and validation.

5.2 In-Tree Source Code (self-documenting)

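┌────────────────────────────────┬─────────────────────────────────────────────────┐
│ File                           │ Relevant Analogy                                │
├────────────────────────────────┼─────────────────────────────────────────────────┤
│ drivers/gpu/drm/ttm/ttm_tt.c   │ ttm_tt_swapin(), ttm_tt_swapout(), swap_storage │
│ drivers/gpu/drm/ttm/ttm_bo.c   │ LRU eviction, ttm_bo_validate()                 │
│ drivers/gpu/drm/ttm/ttm_pool.c │ Page pool with shrinker (mirrors slab shrinker) │
│ drivers/gpu/drm/drm_gpuvm.c    │ GPU VA space management, eviction lists         │
│ drivers/gpu/drm/drm_gem.c      │ drm_gem_lru_scan(): shrinker for GEM objects    │
│ include/drm/ttm/ttm_tt.h       │ TTM_TT_FLAG_SWAPPED, struct ttm_tt              │
└────────────────────────────────┴─────────────────────────────────────────────────┘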

5.3 Conference Presentations and Articles

  • XDC (X.Org Developer’s Conference) — Christian König’s presentations on TTM rework explicitly discuss the memory hierarchy and eviction model.
  • LWN.net — Articles on DRM memory management, GPUVM (Danilo Krummrich’s series, 2023), and VM_BIND discuss GPU VA management in terms familiar to CPU MM developers.
  • “GEM - the Graphics Execution Manager” (LWN, 2008) — The foundational article establishing GEM’s design philosophy around shmem-backed objects.

5.4 The HMM Bridge

The HMM (Heterogeneous Memory Management) subsystem is the ultimate evidence of this analogy, as it literally bridges the two:

  • Device-private pages appear as swap-like entries in CPU page tables
  • hmm_range_fault() mirrors handle_mm_fault() for device memory
  • migrate_vma_*() extends migrate_pages() to device memory

6. Conclusion

The GPU memory management subsystem in Linux is not merely analogous to CPU memory management—it is a convergent re-derivation of the same solutions to the same fundamental problem of virtual memory management under physical scarcity. The evidence is:

  1. Lexical: TTM uses CPU MM terminology (swap, swapin, populate, evict, LRU, shrinker).
  2. Structural: drm_gpuvm/drm_gpuva mirror mm_struct/vm_area_struct in role and implementation.
  3. Algorithmic: LRU-based eviction scanning, demand population, and shrinker integration follow identical patterns.
  4. Evolutionary: The two subsystems are actively converging through HMM/SVM, with GPU drivers now participating directly in CPU MM’s page fault and migration infrastructure.

For kernel developers, understanding CPU MM provides a direct conceptual framework for understanding GPU memory management—and vice versa. The DRM subsystem’s memory management is best understood not as a novel design, but as classical virtual memory theory applied to a different class of processor.


References

  1. Linux Kernel Documentation: DRM Memory Management
  2. Linux Kernel Source: include/drm/ttm/ttm_tt.h
  3. Linux Kernel Source: drivers/gpu/drm/ttm/ttm_tt.c
  4. Linux Kernel Source: drivers/gpu/drm/drm_gpuvm.c
  5. Krummrich, D. “DRM GPUVM” — Kernel patches and documentation, 2023
  6. König, C. “TTM Rework” — XDC presentations, 2019–2023
  7. Corbet, J. “GEM - the Graphics Execution Manager” — LWN.net, 2008
  8. Linux Kernel Documentation: Heterogeneous Memory Management (HMM)