Detect threats using Microsoft Graph activity logs - Part 2

Fabian Bader included in Azure AD Entra KQL Sentinel Entra ID Microsoft Graph Security

2023-11-11 2221 words 11 minutes

In part one I focused mostly on detecting offensive security tools like AzureHound, GraphRunner, and PurpleKnight. In part two I will go into more depth on how you can use the newly available information for hunting and how to correlate it with other datasets to gain deeper insights.

Correlate Graph activities with other log sources

While the MicrosoftGraphActivityLogs table alone is a trove of information, correlating it with other logs makes it an even more interesting data source. Here are a few examples of how to get additional information.

Resolve User Id to UPN

With User and Entity Behavior Analytics (UEBA) enabled in Sentinel, the IdentityInfo table gives a great overview of all user identities in Entra ID. It behaves more like a hybrid between a watchlist and a regular table, so you should always query the last 14 days and aggregate to the newest available entry per user to make sure you have all information available.

Since the Graph logs only contain a user Id but no user principal name, this is something you might need to better identify the user responsible for the Graph call.

MicrosoftGraphActivityLogs
| where TimeGenerated > ago(1d)
| where isnotempty( UserId )
| join kind=inner (IdentityInfo
    | where TimeGenerated > ago(14d)
    | summarize arg_max(TimeGenerated, *) by AccountObjectId
    | project UserId=AccountObjectId, AccountUPN)
    on UserId
| project-away UserId1
| limit 100
| sort by TimeGenerated

This returns only Graph calls made by a regular user, and only if that user could be resolved through the IdentityInfo table. Change the join kind to leftouter to also include Graph calls from user IDs that cannot be resolved.
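To make the difference between the two join kinds tangible, here is a minimal sketch that replaces the real tables with made-up sample rows via datatable, so it can be run in any workspace. The object IDs and UPNs are hypothetical. It shows both the arg_max aggregation (only the newest IdentityInfo entry per user survives) and the leftouter behavior (the unresolved user stays in the result with an empty AccountUPN).

```kql
// Hypothetical sample rows standing in for MicrosoftGraphActivityLogs
let GraphCalls = datatable(TimeGenerated: datetime, UserId: string, RequestUri: string)
[
    datetime(2023-11-10 08:00:00), "11111111-1111-1111-1111-111111111111", "https://graph.microsoft.com/v1.0/me",
    datetime(2023-11-10 09:00:00), "22222222-2222-2222-2222-222222222222", "https://graph.microsoft.com/v1.0/users"
];
// Hypothetical sample rows standing in for IdentityInfo (same user logged twice)
let Identities = datatable(TimeGenerated: datetime, AccountObjectId: string, AccountUPN: string)
[
    datetime(2023-11-01), "11111111-1111-1111-1111-111111111111", "old.name@contoso.com",
    datetime(2023-11-09), "11111111-1111-1111-1111-111111111111", "new.name@contoso.com"
];
GraphCalls
| join kind=leftouter (Identities
    // Keep only the newest entry per user
    | summarize arg_max(TimeGenerated, *) by AccountObjectId
    | project UserId = AccountObjectId, AccountUPN)
    on UserId
| project-away UserId1
| sort by TimeGenerated desc
```

The first user resolves to new.name@contoso.com, the second user has no IdentityInfo entry and is returned with an empty AccountUPN. With kind=inner the second row would be dropped.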

Another cool trick is the ability to map the sign-in information using the field SignInActivityId, which translates to UniqueTokenIdentifier in the sign-in logs. That way you can easily map a particular sign-in event to the events in Microsoft Graph.

Since the object id of the active entity can be in either the field UserId or in ServicePrincipalId depending on the object type you must consider this when querying the data.

I created two new fields, ObjectId and ObjectType, for this reason.

Now you should join SigninLogs, AADNonInteractiveUserSignInLogs, AADServicePrincipalSignInLogs, and AADManagedIdentitySignInLogs to get the best coverage of all available sign-ins. (ADFS logs are not covered, because please don’t use ADFS anymore.)

MicrosoftGraphActivityLogs
| where TimeGenerated > ago(8d)
| extend ObjectId = iff(isempty(UserId), ServicePrincipalId, UserId)
| extend ObjectType = iff(isempty(UserId), "ServicePrincipalId", "UserId")
| join kind=inner (union isfuzzy=true
        SigninLogs,
        AADNonInteractiveUserSignInLogs,
        AADServicePrincipalSignInLogs,
        AADManagedIdentitySignInLogs
    | where TimeGenerated > ago(90d)
    | summarize arg_max(TimeGenerated, *) by UniqueTokenIdentifier
    )
    on $left.SignInActivityId == $right.UniqueTokenIdentifier
| project-reorder TimeGenerated, ObjectType, UserPrincipalName, ObjectId, SignInActivityId, RequestUri, RequestMethod

With this you get a good understanding of when the entity signed in to Entra ID and what they did using the Microsoft Graph API.

/en/detect-threats-microsoft-graph-logs-part-2/images/MapSignInEventsToGraphCalls.png: IdentityInfo is a real help when you want to map Graph calls to user data.

From a threat detection perspective, let’s change the direction of this query for a second: ignore all the Graph calls that can be matched to a sign-in and keep only those for which you find no sign-in information in the logs.

MicrosoftGraphActivityLogs
| where TimeGenerated > ago(8d)
| extend ObjectId = iff(isempty(UserId), ServicePrincipalId, UserId)
| extend ObjectType = iff(isempty(UserId), "ServicePrincipalId", "UserId")
| summarize by ObjectType, ObjectId, SignInActivityId
| join kind=leftanti (union isfuzzy=true
        SigninLogs,
        AADNonInteractiveUserSignInLogs,
        AADServicePrincipalSignInLogs,
        AADManagedIdentitySignInLogs
    | where TimeGenerated > ago(90d)
    | summarize arg_max(TimeGenerated, *) by UniqueTokenIdentifier
    )
    on $left.SignInActivityId == $right.UniqueTokenIdentifier
| summarize by ObjectType, ObjectId

In my environment this resulted in about 55 unique service principal IDs I could not find any sign-in data for. Either my lab is hopelessly compromised or there is some data missing from the logs.
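If the leftanti join is new to you, the following minimal sketch shows the idea in isolation. It uses made-up rows via datatable instead of the real tables; the object IDs and token identifiers are hypothetical placeholders.

```kql
// Hypothetical Graph calls: one user and two service principals
let GraphCalls = datatable(UserId: string, ServicePrincipalId: string, SignInActivityId: string)
[
    "user-1", "",     "token-a",
    "",       "sp-1", "token-b",
    "",       "sp-2", "token-c"
];
// Hypothetical sign-ins: only token-a and token-b were ever logged
let SignIns = datatable(TimeGenerated: datetime, UniqueTokenIdentifier: string)
[
    datetime(2023-11-10 08:00:00), "token-a",
    datetime(2023-11-10 08:05:00), "token-b"
];
GraphCalls
| extend ObjectId = iff(isempty(UserId), ServicePrincipalId, UserId)
| extend ObjectType = iff(isempty(UserId), "ServicePrincipalId", "UserId")
// leftanti keeps only the left rows WITHOUT a match on the right side
| join kind=leftanti (SignIns) on $left.SignInActivityId == $right.UniqueTokenIdentifier
| summarize by ObjectType, ObjectId
```

Only sp-2 is returned, because token-c has no corresponding sign-in event. Everything that can be matched is filtered out.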

Let’s map all these object IDs to the service principals that exist in my tenant, whether as Enterprise Applications, multi-tenant apps, or even managed identities.

Since there is no native IdentityInfo table for such objects, my colleague Thomas Naunheim and I created such an enrichment table based on our Sentinel Enrichment Framework. More on that in due time.

MicrosoftGraphActivityLogs
| where TimeGenerated > ago(8d)
| extend ObjectId = iff(isempty(UserId), ServicePrincipalId, UserId)
| extend ObjectType = iff(isempty(UserId), "ServicePrincipalId", "UserId")
| summarize by ObjectType, ObjectId, SignInActivityId
| join kind=leftanti (union isfuzzy=true
        SigninLogs,
        AADNonInteractiveUserSignInLogs,
        AADServicePrincipalSignInLogs,
        AADManagedIdentitySignInLogs
    | where TimeGenerated > ago(90d)
    | summarize arg_max(TimeGenerated, *) by UniqueTokenIdentifier
    )
    on $left.SignInActivityId == $right.UniqueTokenIdentifier
| summarize by ObjectType, ObjectId
| join kind=leftouter (_GetWatchlist('WorkloadIdentityInfo')) on $left.ObjectId == $right.SearchKey
| project ObjectType, ObjectId, AppDisplayName, AppId, IsFirstPartyApp

This changes the perception of the data quite a bit. All of the service principals are resolved and, as indicated by the IsFirstPartyApp field, belong to Microsoft. But it’s still curious that they don’t show up in any sign-in log I have access to. With names like Yggdrasil, there definitely are some creative minds at work.

/en/detect-threats-microsoft-graph-logs-part-2/images/IsFirstPartyApp.png — First party Microsoft apps that don't sign in?

Missing object ids

Another thing I found very curious is that there are Graph events without any user or service principal ID.

let ClientAuthMethods = dynamic ({"0": "public client", "1": "client secret", "2": "Certificate"});
MicrosoftGraphActivityLogs
| where TimeGenerated > ago(8d)
| where isempty(UserId) and isempty(ServicePrincipalId)
| extend ClientAuthMethodName = tostring(ClientAuthMethods[tostring(ClientAuthMethod)])
| summarize Count=count() by RequestUri, UserAgent, ClientAuthMethodName
| project-reorder Count, ClientAuthMethodName, UserAgent, RequestUri

/en/detect-threats-microsoft-graph-logs-part-2/images/MissingObjectIds.png — Are these a bug or problems in mapping object id to Graph event?

Some of them I was able to match to sign-in events of a user using GDAP, but for others I didn’t find any correlating logs. This case is still unsolved.

let ClientAuthMethods = dynamic ({"0": "public client", "1": "client secret", "2": "Certificate"});
MicrosoftGraphActivityLogs
| where TimeGenerated > ago(90d)
| where isempty(UserId) and isempty(ServicePrincipalId)
| extend ClientAuthMethodName = tostring(ClientAuthMethods[tostring(ClientAuthMethod)])
| join kind=inner (union isfuzzy=true
        SigninLogs,
        AADNonInteractiveUserSignInLogs,
        AADServicePrincipalSignInLogs,
        AADManagedIdentitySignInLogs
    | where TimeGenerated > ago(90d)
    | summarize arg_max(TimeGenerated, *) by UniqueTokenIdentifier
    )
    on $left.SignInActivityId == $right.UniqueTokenIdentifier
| summarize by RequestUri, UserAgent, ClientAuthMethodName, ClientAuthMethod, Identity

/en/detect-threats-microsoft-graph-logs-part-2/images/MissingObjectIdsCorrespondingSignIn.png — SignIn information to the rescue.

Correlate the data with itself

The batch endpoint

When using Microsoft Graph you might have encountered it already, and if you take a look at the browser developer tools when using the Entra portal you definitely have seen it:

https://graph.microsoft.com/beta/$batch

This endpoint accepts multiple Graph requests using the POST method and returns all results in one handy response. But how does such a request show up in the MicrosoftGraphActivityLogs?

MicrosoftGraphActivityLogs
| where TimeGenerated > ago(1d)
| where RequestMethod == "POST" and RequestUri == "https://graph.microsoft.com/beta/$batch"
| limit 10
| join kind=inner (MicrosoftGraphActivityLogs
    // Exclude only the batch call itself; the batched requests can be POST requests too
    | where RequestUri != "https://graph.microsoft.com/beta/$batch"
    | project-rename BatchRequestUri = RequestUri
    )
    on OperationId
| project-reorder TimeGenerated, OperationId, RequestUri, BatchRequestUri

Using this Kusto query you can get 10 of those batch calls and map them to the actual requests based on the OperationId. So even if the batch endpoint is used, all related Graph calls are logged and can be used in the investigation.
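Building on the OperationId relationship described above, a possible follow-up is to summarize how many individual requests each batch call contained. This is only a sketch: it assumes the v1.0 and beta batch endpoints are logged the same way, and the limits (1 day, 25 URIs per set, top 20) are arbitrary choices of mine.

```kql
// All operations that were started through a $batch call
let BatchOperations = MicrosoftGraphActivityLogs
    | where TimeGenerated > ago(1d)
    | where RequestMethod == "POST" and RequestUri endswith "/$batch"
    | distinct OperationId;
MicrosoftGraphActivityLogs
| where TimeGenerated > ago(1d)
// Skip the batch call itself, keep the requests it contained
| where RequestUri !endswith "/$batch"
| join kind=inner (BatchOperations) on OperationId
| summarize SubRequests = count(), RequestUris = make_set(RequestUri, 25) by OperationId
| top 20 by SubRequests desc
```

Unusually large batches, or batches that touch many different endpoints at once, might be a starting point for a hunt.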

Hunting

All new data sources should help you build either detections or hunting queries to find the needle in the haystack. Here are a few ideas of mine you can use in your environment.

Unusual user agent

let HistoricalActivity = MicrosoftGraphActivityLogs
    | where TimeGenerated between (ago(30d) .. startofday(now()))
    | where isnotempty(UserAgent)
    | extend ObjectId = iff(isempty(UserId), ServicePrincipalId, UserId)
    | extend ObjectType = iff(isempty(UserId), "ServicePrincipalId", "UserId")
    | summarize by ObjectId, UserAgent, IPAddress;
MicrosoftGraphActivityLogs
| where TimeGenerated between (startofday(now()) .. now())
| extend ObjectId = iff(isempty(UserId), ServicePrincipalId, UserId)
| where isnotempty(UserAgent)
// Remove known user agents
| join kind=leftanti (HistoricalActivity
    | summarize by ObjectId, UserAgent
    )
    on UserAgent, ObjectId
// Remove known IP addresses to limit false positives
//| join kind=leftanti (HistoricalActivity | summarize by IPAddress) on IPAddress

Building a list of known user agents per entity and comparing it to current data can help to identify if something is off. The false positive rate can be medium to high depending on how much your environment changes. Removing known “good” IP addresses can help mitigate this quite a bit.

New sensitive role used

Using the awesome Microsoft Graph classification information provided by Thomas Naunheim, it’s super easy to get all Graph requests that use an API role assigned to the tier level ControlPlane for the first time.

let SensitiveMsGraphPermissions = externaldata(AppId: guid, AppRoleId: guid, AppRoleDisplayName: string, Category: string, EAMTierLevelName: string, EAMTierLevelTagValue: string)["https://raw.githubusercontent.com/Cloud-Architekt/AzurePrivilegedIAM/main/Classification/Classification_AppRoles.json"] with (format='multijson')
    | where EAMTierLevelName == "ControlPlane"
    | distinct AppRoleDisplayName;
let HistoricalActivity = MicrosoftGraphActivityLogs
    | where TimeGenerated between (ago(30d) .. startofday(now()))
    | where Roles has_any (SensitiveMsGraphPermissions)
    | extend ObjectId = iff(isempty(UserId), ServicePrincipalId, UserId)
    | extend ObjectType = iff(isempty(UserId), "ServicePrincipalId", "UserId")
    | summarize by ObjectId;
MicrosoftGraphActivityLogs
| where TimeGenerated between (startofday(now()) .. now())
| extend ObjectId = iff(isempty(UserId), ServicePrincipalId, UserId)
| where Roles has_any (SensitiveMsGraphPermissions)
// Remove known object ids
| where ObjectId !in (HistoricalActivity)

Of course you can also adjust this query to return all the service principals that would be classified as Control Plane assets and use set_intersect to identify which role permissions are the critical ones.

let SensitiveMsGraphPermissions = externaldata(AppId: guid, AppRoleId: guid, AppRoleDisplayName: string, Category: string, EAMTierLevelName: string, EAMTierLevelTagValue: string)["https://raw.githubusercontent.com/Cloud-Architekt/AzurePrivilegedIAM/main/Classification/Classification_AppRoles.json"] with (format='multijson')
    | where EAMTierLevelName == "ControlPlane"
    | distinct AppRoleDisplayName;
let ScalarRoles = toscalar(SensitiveMsGraphPermissions
    | summarize AppRoleDisplayName=make_set(AppRoleDisplayName, 1000));
MicrosoftGraphActivityLogs
| where TimeGenerated > ago(30d)
| where Roles has_any (SensitiveMsGraphPermissions)
| extend Roles = split(Roles, ' ')
| extend ControlPlaneRoles=set_intersect(todynamic(Roles), ScalarRoles)
| extend ObjectId = iff(isempty(UserId), ServicePrincipalId, UserId)
| extend ObjectType = iff(isempty(UserId), "ServicePrincipalId", "UserId")
| summarize by ObjectId, ObjectType, tostring(ControlPlaneRoles)
| join kind=leftouter (_GetWatchlist('WorkloadIdentityInfo')
    | project-away ['_DTItemId'], LastUpdatedTimeUTC, SearchKey
    | project-rename ObjectId=ServicePrincipalObjectId
    | extend ObjectId = tostring(ObjectId))
    on ObjectId
| join kind=leftouter (IdentityInfo
    | where TimeGenerated > ago(14d)
    | summarize arg_max(TimeGenerated, *) by AccountObjectId
    | project-rename ObjectId=AccountObjectId)
    on ObjectId
| project ObjectType, AppDisplayName, AccountUPN, ControlPlaneRoles

/en/detect-threats-microsoft-graph-logs-part-2/images/AppUsedSensitiveRoles.png — set_intersect is a powerful tool to compare data and only show the interesting bits, like control plane roles used.
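To see what set_intersect does in isolation, here is a tiny self-contained sketch. The role names in both lists are illustrative only; the real classification comes from the externaldata source used in the query above.

```kql
// Illustrative list, not the actual ControlPlane classification
let ControlPlaneRoles = dynamic(["RoleManagement.ReadWrite.Directory", "Application.ReadWrite.All"]);
// The Roles column is a space separated string, so split it into an array first
print Roles = "User.Read.All Application.ReadWrite.All Mail.Read"
| extend UsedControlPlaneRoles = set_intersect(split(Roles, " "), ControlPlaneRoles)
```

UsedControlPlaneRoles contains only Application.ReadWrite.All, the single role that appears in both arrays.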

Audit data

The last idea is paired with a funny coincidence. I was using the Entra ID audit logs and correlated them with the graph data. This got me thinking: Is there a source for this information already?

A database where you can see which Graph call will result in which audit event and the other way around?

And it seemed that I wasn’t the only one thinking about this at the time. On X/Twitter Andy Robbins (@_wald0) had exactly this question.

/en/detect-threats-microsoft-graph-logs-part-2/images/_wald0.png — Great minds think alike ;)

And now I can answer this question with a definitive: “Some of it”.

In the EntraIDAuditLogToMicrosoftGraph repository, you will find a nice list, either as CSV or as JSON, which contains this data.

https://github.com/f-bader/EntraIDAuditLogToMicrosoftGraph

The data is based on the Graph logs, and the following query is the source of it. If you want to contribute, feel free to run this query in your environment, export the results, and create a pull request with your file added to the source folder.

Hopefully with a collective effort we will get a good coverage of the data.

AuditLogs
| where TimeGenerated > ago(90d)
| join kind=inner (
    MicrosoftGraphActivityLogs
    // Ignore GET requests
    | where RequestMethod != 'GET'
    )
    on $left.CorrelationId == $right.ClientRequestId
// Remove PII information and normalize the RequestURI
| extend NormalizedRequestUri = replace_regex(RequestUri, @'[0-9a-fA-F]{8}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{12}', @'<UUID>')
| extend NormalizedRequestUri = replace_regex(NormalizedRequestUri, @'[a-zA-Z0-9_-]{41,65}', @'<ID>')
| extend NormalizedRequestUri = replace_regex(NormalizedRequestUri, @'\d+$', @'<ID>')
| extend NormalizedRequestUri = replace_regex(NormalizedRequestUri, @'\/+', @'/')
| extend NormalizedRequestUri = replace_regex(NormalizedRequestUri, @'https:\/', @'https://')
| extend NormalizedRequestUri = replace_regex(NormalizedRequestUri, @'%23EXT%23', @'')
| extend NormalizedRequestUri = replace_regex(NormalizedRequestUri, @'\/[a-zA-Z0-9+_.\-]+@[a-zA-Z0-9.]+\/', @'/<UPN>/')
| extend NormalizedRequestUri = replace_regex(NormalizedRequestUri, @'^\/<UUID>', @'')
| extend NormalizedRequestUri = replace_regex(NormalizedRequestUri, @'\?.*$', @'')
// Remove POST requests to the batch endpoint
| where not ( NormalizedRequestUri matches regex @"https:\/\/graph.microsoft.com\/(v1\.0|beta)/\$batch" )
| summarize by OperationName, NormalizedRequestUri, RequestMethod, OperationVersion
| project-rename
    MicrosoftGraphRequestUri = NormalizedRequestUri,
    EntraIDOperationName = OperationName,
    EntraIDOperationVersion = OperationVersion
| sort by EntraIDOperationName asc
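To get a feeling for the normalization, the following sketch applies three of the replace_regex steps from the query above to two made-up request URIs. The GUID and the UPN are hypothetical sample values.

```kql
datatable(RequestUri: string)
[
    "https://graph.microsoft.com/v1.0/users/11111111-2222-3333-4444-555555555555/memberOf?$select=id",
    "https://graph.microsoft.com/beta/users/jane.doe@contoso.com/authentication/methods"
]
// Replace GUIDs
| extend NormalizedRequestUri = replace_regex(RequestUri, @'[0-9a-fA-F]{8}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{12}', @'<UUID>')
// Replace user principal names
| extend NormalizedRequestUri = replace_regex(NormalizedRequestUri, @'\/[a-zA-Z0-9+_.\-]+@[a-zA-Z0-9.]+\/', @'/<UPN>/')
// Remove the query string
| extend NormalizedRequestUri = replace_regex(NormalizedRequestUri, @'\?.*$', @'')
| project RequestUri, NormalizedRequestUri
```

The first URI should collapse to https://graph.microsoft.com/v1.0/users/<UUID>/memberOf and the second one to https://graph.microsoft.com/beta/users/<UPN>/authentication/methods, so requests against different objects end up as the same entry.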

Missing data

One big caveat of this query is that it can only map events that are actually logged and carry the correct correlation ID, and in many cases that does not seem to be true. In my environment I cannot map all the audit data to a Graph call. While ClientRequestId is the best anchor I found in the data, it’s not perfect. RequestId and OperationId only return a subset of the results from ClientRequestId, so I don’t use them anymore.

My best guess on the missing data is that the client did not use Microsoft Graph to make the change, but used other administrative APIs.

AuditLogs
| where TimeGenerated > ago(14d)
| join kind=leftanti (MicrosoftGraphActivityLogs) on $left.CorrelationId == $right.ClientRequestId
| count

/en/detect-threats-microsoft-graph-logs-part-2/images/MissingLogData.png — 73 audit events in a test tenant, in production this number will most likely be higher.
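Instead of only counting the unmatched audit events, it might help to break them down by operation and by the service that logged them. This is a sketch under the assumption that the LoggedByService column of the AuditLogs table is populated in your tenant. The time filter on the Graph side is my addition to keep the join small.

```kql
AuditLogs
| where TimeGenerated > ago(14d)
| join kind=leftanti (MicrosoftGraphActivityLogs
    // One extra day to catch Graph calls logged shortly before the audit event
    | where TimeGenerated > ago(15d)
    )
    on $left.CorrelationId == $right.ClientRequestId
| summarize Count = count() by OperationName, LoggedByService
| sort by Count desc
```

If most of the unmatched events come from a handful of operations, that could support the theory that those changes were made through other administrative APIs.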

Community resources

Since I wrote the initial draft for this post, the amazing security community has come up with more and more use cases for this log type. Here are a few of those.

Conclusion

MicrosoftGraphActivityLogs is an excellent source of data that can be used to analyze Graph usage in any environment. There are some caveats, mostly that other APIs are not part of this log, so there might be some gaps that you as a defender should be aware of.

Overall I would recommend that everybody invest the time to identify additional use cases, build upon the provided ones, and share the results with the community.