The single most impactful optimization is reusing your client instance across requests. This enables HTTP connection pooling, which skips the TCP and TLS handshake on subsequent calls.
Reusing your client instance can reduce network overhead by up to 50% on consecutive requests.
When you create a new client for every request, each call must establish a fresh TCP connection and negotiate TLS — adding several tens of milliseconds depending on your location and network conditions. By reusing the client, the underlying connection stays open and subsequent requests skip this setup entirely.
```python
from sambanova import SambaNova

# Create the client once
client = SambaNova(
    base_url="https://api.infercom.ai/v1",
    api_key="your-infercom-api-key"
)

# Reuse it for all requests
response_1 = client.chat.completions.create(
    model="Meta-Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Hello"}]
)

response_2 = client.chat.completions.create(
    model="Meta-Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Follow-up question"}]
)
```
Avoid creating a new client inside loops or request handlers. This forces a new TCP+TLS handshake on every call.
```python
# Avoid this pattern
for message in messages:
    client = SambaNova(base_url="https://api.infercom.ai/v1", api_key="...")
    response = client.chat.completions.create(...)  # New connection each time
```
Both the SambaNova SDK and OpenAI SDK use httpx under the hood, which automatically manages a connection pool when you reuse the client. The default pool maintains up to 20 keep-alive connections.