Semantic Cache for LLM responses
Return cached responses not only for identical prompts but also for prompts that are similar in meaning.

Prerequisites
- API Management instance with an Azure OpenAI model deployment imported as an API
- Deployments for the following APIs:
  - Chat Completion API
  - Embeddings API
- API Management instance configured to use managed identity authentication to the Azure OpenAI service
- An Azure Managed Redis instance with the RediSearch module enabled
Authenticate with managed identity
Let’s quickly cover how to use a managed identity to authenticate to Azure OpenAI. First, make sure that managed identity is enabled on your API Management instance.

Assign the Cognitive Services OpenAI User role to the managed identity on the Azure OpenAI resource. Then add the following inbound policy section to authenticate requests to the API by using the managed identity.

<authentication-managed-identity resource="https://cognitiveservices.azure.com" output-token-variable-name="managed-id-access-token" ignore-error="false" />
<set-header name="Authorization" exists-action="override">
    <value>@("Bearer " + (string)context.Variables["managed-id-access-token"])</value>
</set-header>
Import an Azure OpenAI API
If you have already imported an Azure OpenAI instance as an API into API Management, it appears on the APIs page of your instance.

If not, you can either import an Azure OpenAI API directly from a deployment in Microsoft Foundry or download and edit the OpenAPI specification.
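If you go the OpenAPI route, the import can also be scripted against the Azure Resource Manager REST API. A rough sketch, assuming azure-identity and requests are installed; the resource names, API version, and spec URL are placeholders to adapt:

# Import an (edited) OpenAPI specification as an API in API Management.
# Every value in angle brackets is a placeholder for your own resources.
import requests
from azure.identity import DefaultAzureCredential

token = DefaultAzureCredential().get_token("https://management.azure.com/.default")

url = (
    "https://management.azure.com/subscriptions/<subscription-id>"
    "/resourceGroups/<resource-group>/providers/Microsoft.ApiManagement"
    "/service/<apim-name>/apis/azure-openai?api-version=2022-08-01"
)
body = {
    "properties": {
        "format": "openapi-link",               # import the spec from a URL
        "value": "<url-to-edited-openapi-spec>",
        "path": "openai",                       # API URL suffix on the gateway
        "protocols": ["https"],
    }
}
resp = requests.put(url, json=body, headers={"Authorization": f"Bearer {token.token}"})
resp.raise_for_status()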
Azure Managed Redis Setup
We will set up an external cache using the Azure Managed Redis instance.

Azure API Management uses a Redis connection string to connect to the cache. Because we are using Azure Managed Redis, we need to enable access key authentication on the cache and use the key as the password in the connection string. Currently, Microsoft Entra authentication can't be used to connect Azure API Management to Azure Managed Redis.
The connection string will look like:
<cache-name>:10000,password=<cache-access-key>,ssl=True,abortConnect=False
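Before configuring the external cache in API Management, it can be worth verifying that the connection string values work. A minimal sketch with the redis-py package; the host name format and all angle-bracket values are assumptions to replace with your own cache name, region, and access key:

# Quick connectivity check against the Azure Managed Redis instance.
# Host, port, and password mirror the connection string above.
import redis

r = redis.Redis(
    host="<cache-name>.<region>.redis.azure.net",  # placeholder host
    port=10000,                                    # Azure Managed Redis port
    password="<cache-access-key>",                 # placeholder access key
    ssl=True,
)
print(r.ping())  # True when the cache is reachable

If the ping succeeds, the same values can go into the external cache configuration of the API Management instance.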
Semantic Cache Configuration
Start by creating a backend for the Embeddings API. The backend URL points to the embeddings deployment on your Azure OpenAI resource, in the form https://<openai-resource>.openai.azure.com/openai/deployments/<embeddings-deployment>/embeddings.
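As with the API import, this step can be scripted through the Resource Manager REST API; a hedged sketch, again with placeholder names:

# Create the "Embedding" backend pointing at the embeddings deployment.
import requests
from azure.identity import DefaultAzureCredential

token = DefaultAzureCredential().get_token("https://management.azure.com/.default")

url = (
    "https://management.azure.com/subscriptions/<subscription-id>"
    "/resourceGroups/<resource-group>/providers/Microsoft.ApiManagement"
    "/service/<apim-name>/backends/Embedding?api-version=2022-08-01"
)
body = {
    "properties": {
        # Runtime URL of the embeddings deployment on the Azure OpenAI resource
        "url": "https://<openai-resource>.openai.azure.com/openai/deployments/<embeddings-deployment>/embeddings",
        "protocol": "http",
    }
}
resp = requests.put(url, json=body, headers={"Authorization": f"Bearer {token.token}"})
resp.raise_for_status()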

Next, let’s configure the semantic caching policies. In the Inbound processing section, add the azure-openai-semantic-cache-lookup policy. In the embeddings-backend-id attribute, specify the Embeddings API backend you created; in my case, the value is Embedding. The score-threshold attribute controls how similar a new prompt must be to a cached one to count as a hit (smaller values require greater semantic similarity), and the vary-by element partitions the cache, here per subscription.
<azure-openai-semantic-cache-lookup
    score-threshold="0.15"
    embeddings-backend-id="Embedding"
    embeddings-backend-auth="system-assigned"
    ignore-system-messages="true"
    max-message-count="10">
    <vary-by>@(context.Subscription.Id)</vary-by>
</azure-openai-semantic-cache-lookup>
In the Outbound processing section for the API, add the azure-openai-semantic-cache-store policy. The duration attribute specifies the cache entry lifetime in seconds, so responses here are cached for one minute.
<azure-openai-semantic-cache-store duration="60" />
To confirm the semantic cache is working, trace a test Completion or Chat Completion operation by using the test console in the portal. The trace output shows whether the request was served from the semantic cache or forwarded to the backend.
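You can also observe the behavior from a client. A sketch under a few assumptions: the API is exposed at a hypothetical gateway URL, the subscription key is sent in the api-key header, and the deployment name and api-version are placeholders. Two semantically similar prompts should result in the second response coming from the cache, typically much faster:

# Fire two similar prompts at the APIM gateway and compare latency.
# Endpoint, deployment, api-version, and key are placeholders.
import time
import requests

ENDPOINT = (
    "https://<apim-name>.azure-api.net/<api-suffix>/deployments/"
    "<deployment>/chat/completions?api-version=<api-version>"
)
HEADERS = {"api-key": "<apim-subscription-key>"}

for prompt in [
    "How do I reset my password?",
    "What are the steps to reset my password?",
]:
    start = time.perf_counter()
    resp = requests.post(
        ENDPOINT,
        headers=HEADERS,
        json={"messages": [{"role": "user", "content": prompt}]},
    )
    resp.raise_for_status()
    answer = resp.json()["choices"][0]["message"]["content"]
    # The second request should be a semantic cache hit.
    print(f"{time.perf_counter() - start:.2f}s  {answer[:60]}")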