
Sensitive data discovery

Hunt credential stores, secret material, and high-risk file types tenant-wide across SharePoint, OneDrive, and (via keyword search) Exchange Online. Our tool of choice is Microsoft Purview eDiscovery: it indexes mailboxes and sites org-wide and lets us query them with KeyQL and built-in sensitive-information types.

Purview eDiscovery

Prereqs: an account with the eDiscovery Manager or eDiscovery Administrator role (or a custom role with search and export rights). Credential-related sensitive information types (SITs), e.g. All credentials, General password, and Client secret / API key, require advanced classification and typically E5 licensing / eDiscovery (Premium). Confirm licensing before relying on SensitiveType to find secrets.

Scope note: SensitiveType:"..." matches classified content on SharePoint / OneDrive (indexed documents). It does not search mailbox/Teams chat bodies for SIT matches the same way; combine with keyword queries for mailboxes. Stay within ROE for search, export, and retention.

  1. In the Purview portal, go to eDiscovery and open the system Content Search.
  2. Create a new search and add locations (all users, specific users, sites, or org-wide SharePoint/OneDrive/Exchange, as needed).
  3. In the query box, enter a KeyQL (Keyword Query Language) query, add conditions with the condition builder, or combine both.
  4. Run the search, then review the Sample results (hit counts, locations, size).

Downloading arbitrary files

Direct download works for common file types (e.g. .docx, .csv, email): select the item and use Download.

When Download is unavailable (e.g. for .kdbx and other file types that cannot be previewed), pull the file through the preview API instead:

  1. Open the item in preview anyway, then open DevTools and switch to the Network tab.
  2. Find the GetPreviewInfo request and copy the DocumentId value from its JSON response.
  3. Download any item that does offer Download, then right-click the resulting request to /api/DocumentPreview/DownloadDocument and choose Copy as cURL.
  4. In the copied command, replace the documentId query value with the ID from step 2, then run it with -o <filename>.
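```bash
# The download request should look something like this. The command copied
# from DevTools also carries the session's headers and cookies, which provide
# authentication; they are omitted here for brevity.
curl 'https://purview.microsoft.com/api/DocumentPreview/DownloadDocument?documentId=<DocumentId>' \
  -X POST \
  -o file.kdbx
```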
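Step 4 can also be scripted, so you don't have to hand-edit the ID inside a long copied command. This is a minimal sketch, not a Purview feature. `swap_document_id` is a hypothetical helper name. It assumes the copied command contains one `documentId=<value>` parameter, and that the value ends at `&`, a quote, or a space.

```bash
# Hypothetical helper (not part of Purview): swap the documentId query value
# in a command captured with DevTools "Copy as cURL".
# Assumes one documentId=<value> parameter, terminated by & ' " or a space.
swap_document_id() {
  local cmd new_id stop head rest old_id tail
  cmd=$1
  new_id=$2
  case $cmd in
    *documentId=*) ;;
    *) echo "swap_document_id: no documentId= parameter found" >&2; return 1 ;;
  esac
  stop="&'\" "                     # characters that end the old value
  head=${cmd%%documentId=*}        # text before the first documentId=
  rest=${cmd#*documentId=}         # text after it
  old_id=${rest%%[$stop]*}         # old value, up to the first stop character
  tail=${rest#"$old_id"}           # whatever followed the old value
  printf '%s%s%s\n' "$head" "documentId=$new_id" "$tail"
}

# Example with placeholder IDs:
copied="curl 'https://purview.microsoft.com/api/DocumentPreview/DownloadDocument?documentId=OLD_ID' -X POST"
swap_document_id "$copied" "NEW_ID"
```

Append `-o <filename>` to the printed command and run it. The real copied command will also include the session headers, which the function leaves untouched.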

Query patterns

Credential / secret files (extension-based)

Target vaults, keys, and common leak formats on sites. Use the filetype: keyword in the query box when you need prefix/wildcard matching. In the GUI File type condition, list every extension explicitly (doc* does not match docx):

```text
filetype:kdbx OR filetype:pem OR filetype:pfx OR filetype:p12 OR filetype:key OR filetype:ppk OR filetype:ovpn OR filetype:rdp
```

```text
(filename:password* OR filename:*secret* OR filename:*credential*) AND (filetype:txt OR filetype:csv OR filetype:xlsx OR filetype:json OR filetype:xml OR filetype:env)
```
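The GUI condition needs every extension spelled out. It can therefore help to generate the OR-chain from a single list and paste the result into the query box. This is a minimal sketch. `keyql_filetype_or` is a hypothetical helper, not part of Purview. Called with the extensions above, it reproduces the first query.

```bash
# Hypothetical helper (not part of Purview): join extensions into a KeyQL
# "filetype:" OR-chain. A leading dot is tolerated (".pem" -> "filetype:pem").
keyql_filetype_or() {
  local out ext
  out=""
  for ext in "$@"; do
    ext=${ext#.}                 # strip one leading dot, if present
    if [ -n "$out" ]; then
      out="$out OR "             # separator between terms
    fi
    out="${out}filetype:$ext"
  done
  printf '%s\n' "$out"
}

# Regenerates the first credential-file query above.
keyql_filetype_or kdbx pem pfx p12 key ppk ovpn rdp
```

When combining the output with another clause, as in the second query, wrap it in parentheses so the OR-chain stays grouped.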

Keyword hunts (content in body/metadata)

```text
password OR passwd OR pwd OR secret OR "api key" OR apikey OR api_key OR connectionstring OR "client secret" OR privatekey OR "BEGIN RSA PRIVATE KEY"
```
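Multi-word terms must be wrapped in double quotes to match as phrases, and that is easy to miss when editing a long keyword list by hand. The sketch below quotes any term that contains a space and joins all terms with OR. `keyql_terms_or` is a hypothetical helper name. Given the terms above, it reproduces the query exactly.

```bash
# Hypothetical helper (not part of Purview): join keyword terms with OR,
# wrapping any term that contains a space in double quotes (phrase search).
keyql_terms_or() {
  local out term
  out=""
  for term in "$@"; do
    case $term in
      *" "*) term="\"$term\"" ;;  # multi-word term -> quoted phrase
    esac
    if [ -n "$out" ]; then
      out="$out OR "
    fi
    out="$out$term"
  done
  printf '%s\n' "$out"
}

keyql_terms_or password passwd pwd secret "api key" apikey api_key \
  connectionstring "client secret" privatekey "BEGIN RSA PRIVATE KEY"
```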

Externally shared sensitive data

```text
ViewableByExternalUsers:true AND SensitiveType:"All Credentials"
```

```text
ViewableByExternalUsers:true AND filetype:kdbx
```
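Each query above ANDs the external-sharing property with one condition. To scope a longer OR-chain the same way, group it explicitly instead of relying on operator precedence. This is a minimal sketch using a hypothetical `keyql_external_only` helper.

```bash
# Hypothetical helper (not part of Purview): restrict any inner query to
# externally viewable items. The inner query is parenthesized so an
# OR-chain stays grouped under the AND.
keyql_external_only() {
  printf 'ViewableByExternalUsers:true AND (%s)\n' "$1"
}

keyql_external_only "filetype:kdbx OR filetype:pem OR filetype:pfx"
```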
