LLMHub API Rate Limits
Official API documentation for the LLMHub API (api.llmhub.dev)
The LLMHub API does NOT impose rate limits on users. We will do our best to serve every request.
However, please note that when our servers are under heavy traffic, your requests may take some time to receive a response. During this period, your HTTP connection will remain open, and you may continuously receive content in the following formats:
- Non-streaming requests: the server continuously returns empty lines
- Streaming requests: the server continuously returns SSE keep-alive comments (`: keep-alive`)
This content does not affect JSON parsing by the OpenAI SDK. If you are parsing HTTP responses yourself, be sure to handle these empty lines and comments appropriately.
If the request is still not completed after 30 minutes, the server will close the connection.
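If you handle HTTP yourself instead of using the OpenAI SDK, a minimal sketch of skipping these keep-alive payloads might look like the following. The `/v1/chat/completions` path, payload shape, and timeout values are illustrative assumptions based on OpenAI-compatible conventions, not confirmed by this page.

```python
import json

import requests

resp = requests.post(
    "https://api.llmhub.dev/v1/chat/completions",  # assumed OpenAI-compatible path
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "automatic",
        "stream": True,
        "messages": [{"role": "user", "content": "Hello"}],
    },
    stream=True,
    # requests' read timeout resets on every byte received, so the periodic
    # keep-alives prevent it from firing while the server is under load.
    timeout=(10, 60),
)

for line in resp.iter_lines(decode_unicode=True):
    if not line:
        continue  # empty keep-alive line: ignore
    if line.startswith(":"):
        continue  # SSE comment such as ": keep-alive": ignore
    if line.startswith("data: "):
        data = line[len("data: "):]
        if data == "[DONE]":  # end-of-stream sentinel used by OpenAI-style APIs
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        print(delta.get("content", ""), end="")
```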
Intelligent Routing and Performance
When using `model="automatic"`, LLMHub's routing system optimizes not just for quality but also for availability. During high-traffic periods, our system may route requests to models with lower queue times while still meeting quality requirements.
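As an illustration, a request using automatic routing through the OpenAI Python SDK might look like this; the `/v1` base path is an assumption based on OpenAI-compatible conventions, so verify the exact base URL for your account.

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.llmhub.dev/v1",  # assumed base URL
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="automatic",  # let LLMHub's router balance quality and availability
    messages=[{"role": "user", "content": "Explain SSE keep-alive comments."}],
)
print(response.choices[0].message.content)
```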
Best Practices
For time-sensitive applications, we recommend:
- Implementing appropriate timeout handling in your client code
- Using streaming responses when possible to start receiving content faster
- Setting up retry logic with exponential backoff for failed requests (a combined sketch follows below)
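Here is a minimal sketch combining these three practices using the OpenAI Python SDK. The base URL, timeout value, and retry parameters are illustrative assumptions, not documented defaults.

```python
import random
import time

from openai import APIConnectionError, OpenAI

client = OpenAI(
    base_url="https://api.llmhub.dev/v1",  # assumed OpenAI-compatible base path
    api_key="YOUR_API_KEY",
    timeout=120.0,  # client-side timeout per attempt, in seconds (illustrative)
)

def chat_with_retry(messages, max_attempts=5):
    """Retry transient failures with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(
                model="automatic",
                messages=messages,
                stream=True,  # start receiving tokens as soon as they are ready
            )
        except APIConnectionError:  # covers timeouts and dropped connections
            if attempt == max_attempts - 1:
                raise
            # Backoff: 1s, 2s, 4s, ... plus up to 1s of random jitter.
            time.sleep(2 ** attempt + random.random())

stream = chat_with_retry([{"role": "user", "content": "Hello"}])
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```

Jitter is included so that many clients retrying after the same failure do not hammer the server in lockstep; in production you may also want to retry on 5xx status errors while leaving client errors (such as invalid credentials) to fail fast.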
Please contact support@llmhub.dev if you experience persistent latency issues or need guidance on optimizing your implementation.