The Secret Behind Better RAG Results: Hybrid Search in Azure AI Search
Inside Azure AI Search: Hybrid Retrieval with Vector and Semantic Ranking

What is hybrid search?
Hybrid search combines full-text and vector queries into a single request, running against an index that holds both plain-text content and generated embeddings.
Vector search excels at surfacing information that is conceptually related to your query, even when no exact keywords match in the inverted index. Full-text search, on the other hand, offers precision, and can be paired with optional semantic ranking to further improve the quality of results.
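To make the vector side concrete, here is a toy sketch of nearest-neighbor ranking by cosine similarity. It is not how Azure AI Search is implemented internally. The three-dimensional vectors, the document IDs, and the choice of cosine as the metric are assumptions for illustration; real embeddings have hundreds or thousands of dimensions, and the metric is whatever the index's vector configuration specifies.

```csharp
using System;
using System.Linq;

// Cosine similarity: dot product divided by the product of the vector lengths.
static double CosineSimilarity(float[] a, float[] b)
{
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Hypothetical documents with made-up 3-dimensional "embeddings".
var documents = new (string Id, float[] Vector)[]
{
    ("doc-001", new[] { 1f, 0f, 0f }),
    ("doc-002", new[] { 0.9f, 0.1f, 0f }),
    ("doc-003", new[] { 0f, 1f, 0f }),
};

float[] toyQueryVector = { 0.7f, 0.3f, 0f };
int k = 2;

// Score every document against the query and keep the k most similar.
var nearest = documents
    .Select(d => (d.Id, Similarity: CosineSimilarity(toyQueryVector, d.Vector)))
    .OrderByDescending(x => x.Similarity)
    .Take(k)
    .ToList();

foreach (var (id, similarity) in nearest)
{
    Console.WriteLine($"{id}: {similarity:F3}");
}
```

With these toy values, doc-002 ranks first and doc-001 second, even though no keywords are involved at all: only the direction of the vectors matters.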
Prerequisites
An Azure Cosmos DB database with sample documents to index
An Azure AI Search service
A search index containing searchable vector and nonvector fields
The semantic ranker enabled on the search service
Overview of the services
The Azure Cosmos DB database holds five documents with the following structure:
{
  "id": "doc-005",
  "title": "Azure OpenAI Integration Patterns",
  "category": "AI Architecture",
  "tags": [
    "OpenAI",
    "Azure",
    "Integration"
  ],
  "content": "Azure OpenAI can be integrated with backend systems using APIs, event-driven architectures, or orchestration frameworks like Semantic Kernel. It enables intelligent applications with natural language capabilities.",
  "summary": "Patterns for integrating Azure OpenAI into applications.",
  "author": "Michael Green",
  "createdDate": "2025-05-01T16:20:00Z",
  "region": "EU",
  "accessLevel": "public"
}
The Azure AI Search service contains an index that holds both text fields and a vector field.
Notice the text_vector field, which the vector query targets.
Finally, the index defines a semantic configuration, which the query references by name.
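As a rough sketch of what such an index definition could look like in code, the snippet below builds one with the Azure.Search.Documents SDK. Only the index name, the text_vector field, the semantic configuration name, and the chunk, summary, category, and author fields come from the query code in this article. Everything else is an assumption: the chunk_id key, the title field, the profile and algorithm names, the filterable flags, and the 1536 dimensions (which presumes an ada-002 style embedding model). Your own index may differ.

```csharp
using System;
using Azure.Search.Documents.Indexes.Models;

// Hypothetical names for the vector profile and its HNSW algorithm configuration.
const string vectorProfile = "hypothetical-vector-profile";
const string hnswConfig = "hypothetical-hnsw-config";

var index = new SearchIndex("rag-hybrid")
{
    Fields =
    {
        // Assumed key field; the real index may use a different key.
        new SimpleField("chunk_id", SearchFieldDataType.String) { IsKey = true },
        new SearchableField("title"),
        new SearchableField("chunk"),
        new SearchableField("summary"),
        new SearchableField("category") { IsFilterable = true },
        new SearchableField("author") { IsFilterable = true },
        // The vector field targeted by the vector query.
        new SearchField("text_vector", SearchFieldDataType.Collection(SearchFieldDataType.Single))
        {
            IsSearchable = true,
            VectorSearchDimensions = 1536, // assumption: must match the embedding model
            VectorSearchProfileName = vectorProfile
        }
    },
    VectorSearch = new VectorSearch
    {
        Algorithms = { new HnswAlgorithmConfiguration(hnswConfig) },
        Profiles = { new VectorSearchProfile(vectorProfile, hnswConfig) }
    },
    SemanticSearch = new SemanticSearch
    {
        Configurations =
        {
            // Tells the semantic ranker which fields act as title and content.
            new SemanticConfiguration(
                "rag-hybrid-semantic-configuration",
                new SemanticPrioritizedFields
                {
                    TitleField = new SemanticField("title"),
                    ContentFields = { new SemanticField("chunk") }
                })
        }
    }
};

Console.WriteLine($"{index.Name}: {index.Fields.Count} fields");
```

The snippet only builds the definition in memory. To create the index on a service, you would pass it to SearchIndexClient.CreateOrUpdateIndexAsync.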
Rise of the working example
Let’s walk through how to build this in C#. Here’s the code:
using Azure;
using Azure.Identity;
using Azure.Search.Documents;
using Azure.Search.Documents.Models;
using Azure.AI.OpenAI;

// Service endpoints come from environment variables.
string searchEndpoint = Environment.GetEnvironmentVariable("AZURE_SEARCH_ENDPOINT")!;
string indexName = "rag-hybrid";
string openAiEndpoint = Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT")!;
string deploymentName = "text-embedding-ada-itt";

// Azure OpenAI is accessed with Microsoft Entra ID credentials.
var azureOpenAIClient = new AzureOpenAIClient(
    new Uri(openAiEndpoint),
    new DefaultAzureCredential());

// Create the embedding client for the deployed embedding model.
var embeddingClient = azureOpenAIClient.GetEmbeddingClient(deploymentName);

string query = "how to integrate RAG";

// Generate the embedding for the query text.
var embeddingResponse = await embeddingClient.GenerateEmbeddingAsync(query);

// ToFloats() returns the vector as ReadOnlyMemory<float>, which VectorizedQuery accepts directly.
var queryVector = embeddingResponse.Value.ToFloats();

// The search service is accessed with an API key.
string searchKey = Environment.GetEnvironmentVariable("AZURE_SEARCH_KEY")!;
SearchClient client = new(
    new Uri(searchEndpoint),
    indexName,
    new AzureKeyCredential(searchKey));

// One request: keyword query + vector query + semantic ranking.
SearchResults<SearchDocument> results = await client.SearchAsync<SearchDocument>(
    query,
    new SearchOptions
    {
        VectorSearch = new()
        {
            Queries =
            {
                new VectorizedQuery(queryVector)
                {
                    Fields = { "text_vector" },
                    Exhaustive = true,          // scan all vectors instead of an approximate search
                    KNearestNeighborsCount = 5
                },
            }
        },
        QueryType = SearchQueryType.Semantic,
        SemanticSearch = new SemanticSearchOptions
        {
            SemanticConfigurationName = "rag-hybrid-semantic-configuration"
        },
        Select = { "summary", "chunk", "category", "author" },
        Size = 5
    });

// Print category, summary, and chunk for each result.
await foreach (SearchResult<SearchDocument> result in results.GetResultsAsync())
{
    Console.WriteLine($"{result.Document["category"]} -> {result.Document["summary"]} \n" +
        $"{result.Document["chunk"]}");
    Console.WriteLine();
}
The core of the example is the SearchAsync call, which sends a single request that does four things.
First, the query string is passed as the first argument to SearchAsync, which triggers traditional full-text (keyword) search against the searchable text fields.
Second, the VectorizedQuery uses the vector generated earlier to search the text_vector field in the index. KNearestNeighborsCount = 5 asks for the five nearest neighbors, and Exhaustive = true makes the service compare the query against every vector in the field rather than run an approximate nearest-neighbor search.
Third, the results from keyword search and vector search are merged using Reciprocal Rank Fusion (RRF), producing a single ranked list.
Fourth, the QueryType is set to Semantic, which activates the Azure AI Search semantic ranker. It re-ranks the top fused results using a language model that understands context, not just keywords. The semantic configuration tells the ranker which fields serve as title and content, guiding how relevance is computed.
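The merge in the third step can be illustrated with a small, self-contained sketch of Reciprocal Rank Fusion. This is a simplified illustration, not the service's actual code: the FuseWithRrf helper, the two rankings, and the document IDs are hypothetical, and k = 60 is the constant commonly cited for RRF, used here as an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Each document earns 1 / (k + rank) from every ranked list it appears in;
// the contributions are summed and the documents are sorted by the total.
static List<(string Id, double Score)> FuseWithRrf(
    IReadOnlyList<string>[] rankedLists, int k = 60)
{
    var scores = new Dictionary<string, double>();
    foreach (var list in rankedLists)
    {
        for (int i = 0; i < list.Count; i++)
        {
            double contribution = 1.0 / (k + i + 1); // ranks are 1-based
            scores.TryGetValue(list[i], out double current);
            scores[list[i]] = current + contribution;
        }
    }

    return scores
        .OrderByDescending(p => p.Value)
        .ThenBy(p => p.Key, StringComparer.Ordinal) // stable order for ties
        .Select(p => (Id: p.Key, Score: p.Value))
        .ToList();
}

// Hypothetical rankings, best match first.
var keywordRanking = new[] { "doc-005", "doc-002", "doc-001" };
var vectorRanking = new[] { "doc-003", "doc-005", "doc-002" };

var fused = FuseWithRrf(new IReadOnlyList<string>[] { keywordRanking, vectorRanking });

foreach (var (id, score) in fused)
{
    Console.WriteLine($"{id}: {score:F4}");
}
```

In this toy run, doc-005 comes out on top because it ranks highly in both lists, followed by doc-002. A document that appears in only one list, such as doc-003, still makes the fused list but with a lower combined score. Only ranks matter, so the keyword and vector scores never need to be on the same scale.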
Let's observe the results returned by two different queries. The first sets string query = "how to integrate RAG"; and the second sets string query = "written by Michael Green";





