How can I avoid 504 Gateway Timeout when an LLM response takes more than 240 seconds in Azure App Service?

asked 8 hours ago by @qa-prs2dupqpocneoy2juw5 0 rep · 20 views

I have deployed a FastAPI application on Azure App Service. The application processes large tender/RFP documents using Azure Document Intelligence and then sends the extracted content to an LLM (GPT-4o) for response generation.

Current workflow:

1. User uploads a tender document.
2. Azure Document Intelligence extracts the text.
3. The document is split into chunks (~10,000 tokens each).
4. Chunk-level LLM processing is performed.
5. An aggregation prompt combines the chunk outputs.
6. The final response can be around 16,000 output tokens.
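
For reference, the chunking, per-chunk, and aggregation steps can be sketched roughly as below. This is a minimal sketch, not the production code: `call_llm` is a hypothetical stand-in for the GPT-4o request, `split_into_chunks` and `process_document` are illustrative names, and whitespace-separated words are used as a rough proxy for tokens.

```python
import asyncio

CHUNK_TOKENS = 10_000  # approximate chunk size from the workflow


def split_into_chunks(text: str, max_tokens: int = CHUNK_TOKENS) -> list[str]:
    """Split text into chunks of at most max_tokens words.

    Assumption: words stand in for tokens; a real tokenizer would differ.
    """
    words = text.split()
    return [
        " ".join(words[i : i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]


async def call_llm(prompt: str) -> str:
    """Hypothetical placeholder for the GPT-4o call."""
    await asyncio.sleep(0)  # stands in for network latency
    return f"summary({len(prompt.split())} words)"


async def process_document(text: str) -> str:
    chunks = split_into_chunks(text)
    chunk_outputs = []
    for chunk in chunks:  # sequential: one LLM call at a time
        chunk_outputs.append(await call_llm(chunk))
    # The aggregation prompt combines the chunk outputs into one response.
    return await call_llm("\n".join(chunk_outputs))
```

Because each call is awaited before the next one starts, total latency grows with the number of chunks, and the aggregation call is added on top of that.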

The issue is that the complete processing sometimes takes more than 240 seconds, and the client receives a 504 Gateway Timeout from Azure App Service.

Constraints:

- Azure App Service deployment
- GPT-4o model
- Large prompts and large outputs
- Current implementation is mostly sequential
- The request is processed synchronously and the API waits for the final LLM response before returning

Questions:

1. What are the recommended architectural patterns for handling long-running LLM workloads in Azure App Service?
2. Is moving the LLM processing to a background job (Azure Functions, WebJobs, Service Bus, etc.) the preferred solution?
3. Would streaming responses prevent the gateway timeout, or does the backend request still need to complete within the App Service timeout limit?
4. What are the best practices for reducing end-to-end latency when processing large documents with GPT-4o?
5. Has anyone implemented asynchronous job-based processing for similar RFP/tender document generation workflows?

Any guidance on avoiding gateway timeouts and designing a scalable architecture for long-running LLM requests would be appreciated.
