A BitLocker false front

false front, noun; a facade extending beyond and especially above the true dimensions of a building to give it a more imposing appearance.

Overview

During a review of a stale production environment, I found a gap between reported BitLocker compliance and actual device state. BitLocker configuration policies were present, and devices were reporting compliance, but several inspected devices showed no trace of key escrow in either Entra or AD. Without escrowed recovery keys, something as routine as a firmware update can push a device into BitLocker recovery with no way to unlock it, which means permanent data loss. This warranted immediate investigation.

Environmental Context

The endpoint environment was a standard Microsoft hybrid environment, with Windows devices managed through a combination of MDM policies in Intune and GPOs via AD.

Records showed that the MDM policies had not been touched for several years, and the GPOs for even longer. It gave the impression of an environment that had begun a modernisation effort and then frozen once it reached a minimum viable state. On-prem did its job, the cloud was functional, and the end users were not complaining because their workflows were undisturbed. Nothing was "obviously" broken, so no one thought anything warranted fixing. The environment sat in this state for years.

On the surface, BitLocker appeared to exist in production, but its actual state was another story. The goal was to assess the deployment state, make explicit how encryption was functioning, and create a safe path toward proper enforcement without disrupting users.

Investigation

The existing policies indicated compliance, which suggested that the environment had already implemented encryption and only needed cleanup or correction.

At the time of this investigation, BitLocker compliance policies could not check for escrowed keys. With that context, my first decision was to check how many devices were lacking escrowed keys. It was all of them. My next decision was to back up the keys manually on a handful of business-critical endpoints. Since this was a non-disruptive operation, I decided to invoke the necessary commands remotely via a domain controller.

The first step was to check the output of Get-BitLockerVolume on the remote machines.

I did not expect to find an entirely unencrypted drive. After finding a second and a third, I had to rethink my mental model.

That led me to where I probably should have begun: an analysis of the BitLocker policy itself.

I think I did not look there at first because, as someone reasonably new to the industry, I had an implicit trust in, or reverence for, those who built the systems. I was looking for the bug in the system, but it turned out that the system was the bug.

The critical finding was that the existing policy's settings applied only to fixed data drives, and not to OS drives. No endpoint in the environment had a fixed data drive.
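
The mismatch can be illustrated with a minimal sketch. It assumes a simplified model in which a policy configures a set of drive classes and each device reports its volumes by class. The names DriveClass, Volume, and volumes_in_scope are hypothetical and do not correspond to any Intune or Group Policy API.

```python
from dataclasses import dataclass
from enum import Enum


class DriveClass(Enum):
    OPERATING_SYSTEM = "os"
    FIXED_DATA = "fixed_data"
    REMOVABLE = "removable"


@dataclass(frozen=True)
class Volume:
    mount_point: str
    drive_class: DriveClass


def volumes_in_scope(policy_classes, volumes):
    """Return the volumes whose drive class the policy actually configures."""
    return [v for v in volumes if v.drive_class in policy_classes]


# Hypothetical legacy policy: settings exist only for fixed data drives.
legacy_policy = {DriveClass.FIXED_DATA}

# Hypothetical endpoint with a single OS volume, like those in this environment.
laptop = [Volume("C:", DriveClass.OPERATING_SYSTEM)]

print(volumes_in_scope(legacy_policy, laptop))  # → []
```

An empty result is the whole problem in miniature: the policy can be delivered and acknowledged while having no volume to act on.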

Root Cause

The BitLocker policy originally in place was assigned to a user group, and its configuration settings did not apply to the only kind of drive with which the endpoints were equipped. There were no keys, and there was no encryption, because as the policy was configured there was nothing for it to encrypt.

The result was a false front. All the right things appeared to be in place. On the surface, the devices appeared to have secure data at rest. BitLocker-related configuration existed, but the actual encryption event had never been triggered for the operating system volume.

The root cause was not a broken BitLocker deployment, but an ineffective one. The legacy policy targeted the wrong drive class and therefore never established meaningful OS-drive encryption. This created the appearance of security coverage without the actual lifecycle events that would prove deployment, such as recovery-key generation and escrow.

Findings

Compliance was not signal here; it was noise. It showed only that policies existed, and that they had been acknowledged by the endpoints.
The absence of escrowed recovery keys was not a side issue. It was one of the indicators that the state being reported by the environment was false.
BitLocker settings are not all equal. Some only matter at the point encryption begins, which means a policy can look populated for years without ever having produced the event it was supposed to govern.
Assigning a device-level configuration to a user group creates unnecessary noise, because the assignment follows the person rather than the hardware it is meant to configure.
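
One way to make the first two findings concrete is to cross-check reported compliance against independent evidence. The sketch below assumes three hypothetical inputs, gathered by hand or by export: the device names reported as compliant, the device names with an escrowed recovery key, and the device names whose OS volume is actually encrypted. The function name and the sample data are illustrative only.

```python
def find_false_fronts(reported_compliant, has_escrowed_key, os_volume_encrypted):
    """Return devices that report compliance but lack evidence of it.

    Each argument is a set of device names. A device counts as verified only
    if it has both an escrowed recovery key and an encrypted OS volume.
    """
    verified = has_escrowed_key & os_volume_encrypted
    return sorted(reported_compliant - verified)


# Hypothetical sample data, not taken from the environment described here.
reported_compliant = {"PC-01", "PC-02", "PC-03"}
has_escrowed_key = {"PC-03"}
os_volume_encrypted = {"PC-02", "PC-03"}

print(find_false_fronts(reported_compliant, has_escrowed_key, os_volume_encrypted))
# → ['PC-01', 'PC-02']
```

In the environment described here, that list would have contained every device.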

Approach

Once the actual state of the environment was clear, the work stopped being cleanup and became a matter of designing and implementing an actual deployment in a production environment, with minimal end-user disruption.

The new approach was built around a few principles:

target devices rather than users
make escrow a prerequisite rather than an afterthought
use TPM-backed enablement when hardware supported it
apply in lab first, then a curated pilot group, observe closely, then widen only if the evidence justified it
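
The staged rollout in the last principle can be sketched as a gate between rings. This is an illustration under assumptions: the ring names, the RingResult fields, and the rule that every device in a ring must show both escrow and encryption before widening are hypothetical choices, not the exact criteria used in this rollout.

```python
from dataclasses import dataclass

RINGS = ["lab", "pilot", "broad"]  # hypothetical ring names, in rollout order


@dataclass
class RingResult:
    devices: int          # devices targeted in the ring
    keys_escrowed: int    # devices with a recovery key visible in the directory
    os_encrypted: int     # devices whose OS volume finished encrypting
    support_tickets: int  # user-facing incidents attributed to the rollout


def next_ring(current, result):
    """Return the next ring to target, or None if the rollout should hold
    (or is already at the last ring)."""
    evidence_complete = (
        result.devices > 0
        and result.keys_escrowed == result.devices
        and result.os_encrypted == result.devices
    )
    if not evidence_complete or result.support_tickets > 0:
        return None  # hold and investigate before widening
    position = RINGS.index(current)
    return RINGS[position + 1] if position + 1 < len(RINGS) else None


print(next_ring("lab", RingResult(5, 5, 5, 0)))       # → pilot
print(next_ring("pilot", RingResult(20, 19, 20, 0)))  # → None
```

The gate is deliberately strict: a single missing key holds the ring, which matches the principle that escrow is a prerequisite rather than an afterthought.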

Risk Management

The main risk was not theoretical security loss. That had already happened years earlier when the appearance of encryption was accepted as if it were the thing itself. The live risk now was operational: if I moved too aggressively, I could create disruption for users or lose the opportunity to stage the correction properly.

The design had to be conservative. Recovery material needed to be in the right place first. The targeting model needed to be corrected. The initial rollout needed to happen on a small enough group that failure would be visible and containable.

In practical terms, the acceptable failure mode was that a device might not encrypt and would need further classification. The unacceptable failure mode was a broad rollout that created downtime or put endpoints in an unrecoverable state.

Validation

To validate, I had to look for real signal: OS-drive encryption beginning on the endpoint, and recovery keys appearing where they could actually be recovered.

The checks were simple enough, but this time they were grounded in evidence across Entra, Intune, and the endpoints themselves:

did recovery keys appear in Entra for the device
did the pilot devices begin OS-drive encryption
did TPM or identity issues surface during the process
did the devices remain usable without turning the rollout into a support event
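
The four checks above can be recorded per device, so that a pilot result is a list of observed facts rather than a single green tick. A minimal sketch, assuming the observations are collected manually or by script into a hypothetical PilotObservation record; the field names and the sample device are illustrative.

```python
from dataclasses import dataclass


@dataclass
class PilotObservation:
    device: str
    key_in_entra: bool           # recovery key visible on the Entra device object
    os_encryption_started: bool  # OS volume encrypting or encrypted on the endpoint
    tpm_or_identity_issue: bool  # any TPM or identity problem surfaced
    user_disrupted: bool         # rollout caused a support event for the user


def failed_checks(obs):
    """Return the names of the validation checks this device did not pass."""
    checks = {
        "recovery key escrowed": obs.key_in_entra,
        "OS-drive encryption started": obs.os_encryption_started,
        "no TPM or identity issues": not obs.tpm_or_identity_issue,
        "device remained usable": not obs.user_disrupted,
    }
    return [name for name, passed in checks.items() if not passed]


# Hypothetical pilot device that encrypted but never escrowed its key.
obs = PilotObservation("PILOT-07", False, True, False, False)
print(failed_checks(obs))  # → ['recovery key escrowed']
```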

Because standard reporting had been misread at the time of the original implementation, the point was to confirm that the underlying lifecycle event had finally taken place.

Outcome

The false front was identified, the targeting model was corrected, and the rollout was implemented in phases.

All Windows 11 computers in the tenant were brought to a functional BitLocker state with recovery keys escrowed. Older Windows 10 computers without TPM 2.0 were not forced through the same path; they were left out of the encryption rollout and scheduled for upgrade and decommissioning.

The real change was that encryption was no longer being assumed from the presence of policy objects and green check marks. The state had to be verified from the endpoint and from recovery-key escrow.

Lessons Learned

A control can exist in the environment for years without ever having done the thing people assume it did.
Compliance indicators can hide the true state if they are not tied to measurement.
The absence of an artefact can be more useful in diagnosis than the presence of a reassuring setting.
There is a difference between inheriting a system and trusting it. One is unavoidable. The other has to be earned.
Sometimes the work is not fixing the bug in the system. Sometimes it is recognising that the system itself is what needs to be rethought.