Access Risks of Non-Human Identities

New strategies are urgently needed to combat threats exacerbated by the Gen AI revolution

Atul Tulshibagwale, CTO, SGNL
July 17, 2024

The Gen AI revolution and the increase in the number of automated processes have placed even greater emphasis on the access risks of non-human identities (aka machine identities). Organizations are struggling to manage their inventory of NHIs and to detect their malicious use. As uses of AI mature from a proliferation of chatbots to truly empowered agents, the problems associated with NHIs are set to explode. Attackers are looking forward to these new opportunities to create havoc for you, and you must be prepared to stop them.

Aren’t these just identities?

To answer this question, let's first look at how we secure access for human (user) identities.

Technology to secure user access has evolved over time, but the latest technology emphasizes the following:

  1. Ensure strong, phishing-resistant authentication at the time of login.
  2. Ensure a tight binding to the user's point of use, such as device, location, and time of day.
  3. Ensure zero standing privilege: dynamically grant and remove access to data as justified by the current task the user is expected to perform.
  4. Evaluate access continuously based on any changes to authentication, device, usage, and environmental properties.

(Talk to SGNL if you would like to know how to achieve all this)

But many of these considerations are dramatically different when it comes to non-human identities:

  • Strong authentication for non-human identities boils down to how secrets are stored and handled, as noted below.
  • Binding environmental factors to access from non-human identities is trickier and sometimes not effective.
  • Non-human identities, by design, may need access to vast swaths of customer data.

As a result, the strategies to secure such identities, limit their access, and detect their abuse, all need to be very different for human and non-human identities.

What are non-human identities?

Ultimately, all non-human identities are asserted using secrets. For example, when one program calls another program, the callee knows who the caller is by recognizing a secret that is in the caller's possession and mapping that secret to something that was issued to a specific non-human entity.
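As a deliberately simplified illustration of that mapping, here is a sketch of a callee resolving a caller's identity from a presented secret. The identity names, the example key, and the inventory structure are hypothetical, not from any specific product.

```python
# Minimal sketch: a callee resolving a caller's identity from a presented secret.
# The inventory and identity names below are hypothetical examples.
import hashlib
import hmac

# Inventory mapping a hash of each issued secret to the non-human identity it asserts.
SECRET_HASH_TO_IDENTITY = {
    hashlib.sha256(b"example-api-key-123").hexdigest(): "svc:billing-batch-job",
}

def identify_caller(presented_secret: str) -> str | None:
    """Return the non-human identity bound to the presented secret, if any."""
    digest = hashlib.sha256(presented_secret.encode()).hexdigest()
    for known_hash, identity in SECRET_HASH_TO_IDENTITY.items():
        # Constant-time comparison to avoid timing side channels.
        if hmac.compare_digest(digest, known_hash):
            return identity
    return None

print(identify_caller("example-api-key-123"))  # -> "svc:billing-batch-job"
```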

Terminology varies as to what counts as a "machine account" or a "service account" versus an "identity", versus a specific secret "key" that binds to that identity. In this article, I use the terms account and identity interchangeably. While a machine may have multiple identities associated with it, for the purposes of this article let's assume there is only one identity associated with any specific non-human entity such as a machine or workload. There may be multiple secrets that assert the same identity, but that doesn't change the fact that non-human identities are always asserted using secrets.

These secrets may be credentials (e.g., passwords or private keys) or they may be tokens derived from a trust exchange that occurred sometime in the past. Examples of such a trust exchange include setting up a “client secret” while registering a new app in a service, or using OAuth to authenticate the client using the “client-credentials flow” and obtaining an access token and a refresh token, which are then used for subsequent accesses.
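To make the second example concrete, here is a minimal sketch of an OAuth 2.0 client-credentials exchange, in which a registered client secret is traded for an access token used on subsequent calls. The token endpoint URL, client credentials, and scope are placeholders.

```python
# Minimal sketch of the OAuth 2.0 client-credentials flow described above.
# The token endpoint and credentials are placeholders, not a specific product's API.
import requests

TOKEN_URL = "https://auth.example.com/oauth2/token"  # hypothetical endpoint

def get_access_token(client_id: str, client_secret: str) -> str:
    """Exchange a long-lived client secret for a (typically shorter-lived) access token."""
    resp = requests.post(
        TOKEN_URL,
        data={
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            "scope": "billing.read",  # request only the scope this workload needs
        },
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["access_token"]
```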

This is not unique to non-human identities; even user identities are ultimately asserted using secrets, typically stored as a cookie in the browser or an access token in a mobile app. However, non-human identity secrets are typically longer-lived than user identity secrets. Even if you are using ephemeral access tokens derived from public-private key-based authentication, those private keys are long-lived secrets.

Unlike user identities, though, non-human identity secrets may be automatically unlocked. For a user identity, the user is often required to unlock it using a secret that is not stored (e.g., a password), a biometric factor, or both. For human identities, we generally refer to this as multi-factor authentication. As you can imagine, the fact that machine identity secrets can be automatically unlocked has drastic consequences for how these secrets can be protected and how the identities they assert can be trusted.

The fact that a callee's identification of the caller depends entirely upon the caller's possession of a secret is the root cause of many non-human identity attacks. By simply copying the secret, the attacker appears no different from the legitimate possessor of the secret. Current standard protocols lack a binding that limits such access to, say, the same device to which the secret was issued.

Non-human identity types

Non-human identities can be broken into three broad categories:

  • Inter-organizational: These are issued to supplier, partner, or customer organizations in order to access your services via published APIs. These are used to assert the identity of the calling organization. Let’s call these “API Principals”. An “organization” here does not necessarily mean a business entity such as a company; it could just be an administrative or functional unit within a company.
  • Internal, infrastructure bound: These are often used to establish trust between communicating workloads such as microservices. They are used from specific instances of workloads that map to infrastructure resources such as VMs, containers, or servers, and they assert the identity of the calling service or workload. Let's call these "Service IDs".
  • Internal, application bound: These are secrets inside applications that may reside anywhere, even on end-user devices (e.g., refresh tokens or access tokens in OAuth), and are used to assert the identity of the calling application. Let's call these "App IDs".

Securing non-human identities

As I mentioned earlier, since secrets that assert non-human identities are automatically unlocked, it is critical to put strong controls around how they are issued, stored and unlocked, and how they may be trusted when asserted.

Control the issuance

The best way to keep a secret is to not have one. Being very careful in issuing secrets that assert non-human identities, even if they are internal Service IDs or App IDs, is very important. It is easy to lose track of these identities, but given the risks, you cannot afford to.

Corral the secrets

If you do not have such issuance-limiting processes in place and there are already a lot of secrets "out there", institute a program to phase out old secrets and control the issuance of new ones.

Unknown secrets that have powerful access capabilities are potentially the biggest threat vectors in non-human identities, so the first step is to recognize the severity of the problem. The solution requires an organization-wide commitment and instituting a phased process to achieve compliance. The steps in this process include:

  • Discover such secrets and maintain an inventory
  • Control the issuance of new secrets
  • Phase out any secrets that were issued prior to the process being instituted

Defining levels of compliance and setting target achievement levels for specific teams based on their exposure can help an organization reach these goals.
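The discovery step in particular lends itself to automation. Below is an illustrative sketch that scans a source tree for strings shaped like common secrets; the patterns and paths are examples only, and real programs typically rely on dedicated scanners that also cover configuration stores and CI logs.

```python
# Illustrative sketch of the "discover and inventory" step: scan a source tree for
# strings that look like issued secrets. Patterns here are examples, not exhaustive.
import pathlib
import re

# A couple of common token shapes (AWS access key IDs, generic "api_key = ..." assignments).
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),
    re.compile(r"api[_-]?key\s*[:=]\s*['\"][A-Za-z0-9_\-]{16,}['\"]", re.IGNORECASE),
]

def scan_tree(root: str) -> list[tuple[str, int, str]]:
    """Return (file, line number, matched text) for every suspected secret under root."""
    findings = []
    for path in pathlib.Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            lines = path.read_text(errors="ignore").splitlines()
        except OSError:
            continue
        for lineno, line in enumerate(lines, start=1):
            for pattern in SECRET_PATTERNS:
                match = pattern.search(line)
                if match:
                    findings.append((str(path), lineno, match.group(0)))
    return findings

for file, lineno, text in scan_tree("."):
    print(f"{file}:{lineno}: possible secret: {text}")
```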

Protect at Runtime

One best practice I have seen for managing secrets is to have the CI/CD pipeline insert the secrets into the configuration right before deployment into the target environment (development, staging, production, etc.). That way the secrets are never exposed to any individual developer, yet they are available to the runtime code as required. The secrets are generated and stored in a secure database that cannot be accessed by anything other than the workflow that deploys the code.
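Here is an illustrative sketch of that deploy-time step, assuming a hypothetical internal secret store and a pipeline token held only by the deployment workflow. The store URL, secret names, and file layout are placeholders.

```python
# Illustrative deploy-time step: the pipeline (not a human) fetches secrets from a
# secure store and writes them into the runtime configuration just before deployment.
# The store URL, token variable, secret names, and output file are hypothetical.
import json
import os
import requests

SECRET_STORE_URL = "https://secrets.internal.example.com/v1/secret"  # hypothetical store

def fetch_secret(name: str) -> str:
    """Fetch one secret using the pipeline's own credential; developers never see the value."""
    resp = requests.get(
        f"{SECRET_STORE_URL}/{name}",
        headers={"Authorization": f"Bearer {os.environ['PIPELINE_TOKEN']}"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["value"]

def render_runtime_config(environment: str) -> None:
    """Write the configuration consumed by the workload at runtime."""
    config = {
        "environment": environment,
        "db_password": fetch_secret(f"{environment}/db_password"),
        "payments_api_key": fetch_secret(f"{environment}/payments_api_key"),
    }
    with open("runtime-config.json", "w") as f:
        json.dump(config, f)

render_runtime_config(os.environ.get("DEPLOY_ENV", "staging"))
```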

There are other best practices that can be used to protect secrets at runtime. The critical point is to ensure the secrets are not visible to humans, including administrators and developers. This ensures that even if those identities are compromised (or abused), the secrets are safe.

Short-lived secrets

A common strategy for improving the security of API Principals is to use them to produce short-lived secrets that are then used for the actual API access. Using the runtime protection strategy outlined above can prevent illegal copying. Rotating secrets, even if they are Service IDs or App IDs, also makes stolen or unauthorized copies of those secrets less useful. In systems like SPIRE, which implements SPIFFE, Service IDs can be rotated automatically to achieve the same effect.

If you are currently using long-lived access tokens or credentials, consider upgrading to short-lived secrets. If you use common infrastructure components such as API gateways, they likely include features that will help you adopt short-lived secrets.
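As a minimal sketch of the idea, the snippet below mints a short-lived signed token from a long-lived signing key, so that a stolen copy is only briefly useful. The key handling, claim names, and service identifier are illustrative; real deployments would typically use asymmetric keys and a standard profile such as OAuth access tokens or SPIFFE JWT-SVIDs.

```python
# Sketch of deriving a short-lived secret from a long-lived one: a signed token that
# expires in minutes. Key handling and claim names are illustrative only.
import datetime
import jwt  # PyJWT

LONG_LIVED_SIGNING_KEY = "replace-with-a-key-from-a-protected-store"  # placeholder

def mint_short_lived_token(service_id: str, scope: str, ttl_minutes: int = 5) -> str:
    now = datetime.datetime.now(datetime.timezone.utc)
    claims = {
        "sub": service_id,   # the Service ID this token asserts
        "scope": scope,      # narrow scope for the current task
        "iat": now,
        "exp": now + datetime.timedelta(minutes=ttl_minutes),  # short lifetime
    }
    return jwt.encode(claims, LONG_LIVED_SIGNING_KEY, algorithm="HS256")

token = mint_short_lived_token("svc:report-generator", "reports.read")
```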

Manage access

There are multiple dimensions along which one can limit access and the potential for abuse of non-human identities.

  • Contextual access restrictions: Ensure there is a good binding between the source of the non-human identity and the secret itself, such as specific IP addresses or specific times of the day or week when such access is allowed. Detecting anomalous use can also help you flag illegal use as an indicator of compromise and take remedial action (see the sketch after this list).

  • Reduce trust: If you are blindly trusting non-human identities to enable a caller to perform any action, you end up exposing yourself to devastating attacks because the caller can pretty much access any data. This trust can be reduced in a few ways:

    • Limit usage scope: Effectively limit the permitted OAuth scopes for each such non-human identity, and use newer technologies like Transaction Tokens to limit the access of Service IDs.
    • Limit app scope: Limiting the types of applications in which such non-human identities can be used is also a good way to prevent attacks. For example, there's no reason why a SQL Server identity should be allowed RDP access.
    • Limit the breadth of use: Using a new standard called Transaction Tokens, you can ensure that only a few services obtain the token that can give broad access. Not exposing the original token presented by a caller drastically reduces the attack surface for the NHI. Transaction Tokens are short-lived signed tokens that have far more restrictive scope and tight binding to a specific call context so that they cannot be reused or abused for anything other than their intended purpose.

    While user identities can operate on a “zero standing privilege” principle (which SGNL offers), non-human identities are issued with specific privileges in mind. The important thing in managing access from such entities is to ensure the scope of the actions they are able to take is restricted to match the intent with which the identity was issued.

  • Propagate detection: If one application or service in your infrastructure detects anomalous behavior indicating abuse of a secret, take immediate action to notify other parts of your environment so they stop honoring that secret. Using protocols such as CAEP can help propagate such detection.
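Here is a minimal sketch of the contextual restrictions from the first bullet above: bind a non-human identity's secret to expected source networks and usage hours, and treat anything outside that envelope as a potential indicator of compromise. The identity names, networks, and time windows are illustrative, not taken from any specific product.

```python
# Minimal sketch of contextual access restrictions for a non-human identity:
# check source network and time-of-day against a per-identity policy.
import datetime
import ipaddress

POLICY = {
    "svc:billing-batch-job": {
        "allowed_networks": [ipaddress.ip_network("10.20.0.0/16")],
        "allowed_hours_utc": range(1, 5),  # nightly batch window, 01:00-04:59 UTC
    },
}

def is_access_allowed(identity: str, source_ip: str, when: datetime.datetime) -> bool:
    policy = POLICY.get(identity)
    if policy is None:
        return False  # unknown identities are denied outright
    ip = ipaddress.ip_address(source_ip)
    in_network = any(ip in net for net in policy["allowed_networks"])
    in_window = when.hour in policy["allowed_hours_utc"]
    if not (in_network and in_window):
        # In a real system this would also raise an alert so the detection can be propagated.
        return False
    return True

now = datetime.datetime.now(datetime.timezone.utc)
print(is_access_allowed("svc:billing-batch-job", "10.20.4.7", now))
```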

Solutions

Organizations need solutions that track the inventory of non-human identities and manage their lifecycle. On the usage side, they need runtime protections for the secrets that assert these identities, and once those identities are in use, they need solutions to manage their access effectively.

The industry as a whole needs standards that communicate risks, such as SSF and CAEP. Standards such as Transaction Tokens, SPIFFE, and SPIRE are needed to reduce reliance on long-lived secrets. Other standards like DPoP provide contextual restrictions (e.g., proof of possession of a private key) when issuing new secrets (e.g., access tokens).
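To make the DPoP idea concrete, here is a structural sketch of a proof as described in RFC 9449: a short-lived JWT signed with a private key the client holds, whose header carries the matching public key and whose claims bind it to a single HTTP method and URL. The endpoint URL is a placeholder, and production code would reuse a persistent, protected key rather than generating one per run.

```python
# Structural sketch of a DPoP proof (RFC 9449), built with PyJWT and cryptography.
# The target URL is a placeholder; the key pair here is generated ad hoc for illustration.
import base64
import time
import uuid

import jwt  # PyJWT
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import ec

def b64url_uint(value: int) -> str:
    """Base64url-encode an unsigned integer as a 32-byte big-endian value (P-256 coordinate)."""
    return base64.urlsafe_b64encode(value.to_bytes(32, "big")).rstrip(b"=").decode()

# The client's key pair; in practice this is generated once and kept in protected storage.
private_key = ec.generate_private_key(ec.SECP256R1())
pub = private_key.public_key().public_numbers()
public_jwk = {"kty": "EC", "crv": "P-256", "x": b64url_uint(pub.x), "y": b64url_uint(pub.y)}

private_pem = private_key.private_bytes(
    serialization.Encoding.PEM,
    serialization.PrivateFormat.PKCS8,
    serialization.NoEncryption(),
)

# The proof binds this one request (method + URL) to possession of the private key.
dpop_proof = jwt.encode(
    {
        "jti": str(uuid.uuid4()),                   # unique ID so the proof cannot be replayed
        "htm": "POST",                              # HTTP method of the request
        "htu": "https://api.example.com/resource",  # target URL (placeholder)
        "iat": int(time.time()),
    },
    private_pem,
    algorithm="ES256",
    headers={"typ": "dpop+jwt", "jwk": public_jwk},  # public key travels in the header
)
print(dpop_proof)
```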

On a personal note, I decided to write this blog after thinking about the problem and the evolution of access management. I work at SGNL as the CTO, where we have been focused on helping organizations move toward zero-standing access. We built SGNL with user accounts and non-human identities in mind.

By the nature of their intended use cases, non-human identities tend to have even more standing access to critical systems and data. As we talked to customers, we realized that the mechanisms for contextual and conditional access for non-human identities are in many ways the same as for human identities, but the specific policies can be very different.

If you are looking to solve non-human / machine identity access management, please reach out. I would be happy to have a conversation.
