OpenAI Proxy Server
A local, fast, and lightweight OpenAI-compatible server to call 100+ LLM APIs.
Usage
pip install litellm
$ litellm --model ollama/codellama
#INFO: Ollama running on http://0.0.0.0:8000
Test
In a new shell, run:
$ litellm --test
Replace the OpenAI base URL

import openai

openai.api_key = "anything"              # the proxy ignores this, but the SDK requires a value
openai.api_base = "http://0.0.0.0:8000"  # your proxy server url

print(openai.ChatCompletion.create(model="test", messages=[{"role": "user", "content": "Hey!"}]))
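If you'd rather not go through the OpenAI SDK, the same test can be done over plain HTTP. A minimal sketch with the requests library, assuming the proxy exposes the standard OpenAI-style /chat/completions route on port 8000:

import requests

# send an OpenAI-style chat request straight to the proxy
resp = requests.post(
    "http://0.0.0.0:8000/chat/completions",
    json={"model": "test", "messages": [{"role": "user", "content": "Hey!"}]},
)
print(resp.json())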
Other supported models:

- VLLM
  $ litellm --model vllm/facebook/opt-125m

- OpenAI Compatible Server
  $ litellm --model openai/<model_name> --api_base <your-api-base>

- Huggingface
  $ export HUGGINGFACE_API_KEY=my-api-key #[OPTIONAL]
  $ litellm --model huggingface/<your-model-name>

- Anthropic
  $ export ANTHROPIC_API_KEY=my-api-key
  $ litellm --model claude-instant-1

- TogetherAI
  $ export TOGETHERAI_API_KEY=my-api-key
  $ litellm --model together_ai/lmsys/vicuna-13b-v1.5-16k

- Replicate
  $ export REPLICATE_API_KEY=my-api-key
  $ litellm \
    --model replicate/meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3

- Petals
  $ litellm --model petals/meta-llama/Llama-2-70b-chat-hf

- Palm
  $ export PALM_API_KEY=my-palm-key
  $ litellm --model palm/chat-bison

- Azure OpenAI
  $ export AZURE_API_KEY=my-api-key
  $ export AZURE_API_BASE=my-api-base
  $ litellm --model azure/my-deployment-name

- AI21
  $ export AI21_API_KEY=my-api-key
  $ litellm --model j2-light

- Cohere
  $ export COHERE_API_KEY=my-api-key
  $ litellm --model command-nightly
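Whichever backend you start the proxy with, the client call looks the same. Here's a minimal streaming sketch using the (pre-1.0) OpenAI SDK, assuming the proxy is running on port 8000 and the backend you started supports streaming:

import openai

openai.api_key = "anything"              # the proxy ignores this
openai.api_base = "http://0.0.0.0:8000"  # your proxy server url

# stream tokens as they arrive (assumes the chosen backend supports streaming)
response = openai.ChatCompletion.create(
    model="test",
    messages=[{"role": "user", "content": "Write a haiku about proxies."}],
    stream=True,
)
for chunk in response:
    print(chunk["choices"][0]["delta"].get("content", ""), end="")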
[Tutorial]: Use with Continue-Dev/Aider/AutoGen/Langroid/etc.
Here's how to use the proxy to test codellama/mistral/etc. models with different GitHub repos.

pip install litellm

$ ollama pull codellama # our local CodeLlama
$ litellm --model ollama/codellama --temperature 0.3 --max_tokens 2048
Implementation for different repos:

- ContinueDev
- Aider
- AutoGen
- Langroid
- GPT-Pilot
- guidance

ContinueDev

Continue-Dev brings ChatGPT to VSCode. See how to install it here.
In config.py, set this as your default model:

default=OpenAI(
    api_key="IGNORED",
    model="fake-model-name",
    context_length=2048,  # customize if needed for your model
    api_base="http://localhost:8000",  # your proxy server url
),
Credits @vividfog for this tutorial.
Aider

$ pip install aider
$ aider --openai-api-base http://0.0.0.0:8000 --openai-api-key fake-key
AutoGen

pip install pyautogen
from autogen import AssistantAgent, UserProxyAgent, oai

config_list = [
    {
        "model": "my-fake-model",
        "api_base": "http://localhost:8000",  # litellm compatible endpoint
        "api_type": "open_ai",
        "api_key": "NULL",  # just a placeholder
    }
]

response = oai.Completion.create(config_list=config_list, prompt="Hi")
print(response)  # works fine

llm_config = {
    "config_list": config_list,
}

assistant = AssistantAgent("assistant", llm_config=llm_config)
user_proxy = UserProxyAgent("user_proxy")
user_proxy.initiate_chat(
    assistant,
    message="Plot a chart of META and TESLA stock price change YTD.",
    config_list=config_list,
)
Credits @victordibia for this tutorial.
Langroid

pip install langroid
from langroid.language_models.openai_gpt import OpenAIGPTConfig, OpenAIGPT

# configure the LLM
my_llm_config = OpenAIGPTConfig(
    # format: "local/[URL where LiteLLM proxy is listening]"
    chat_model="local/localhost:8000",
    chat_context_length=2048,  # adjust based on model
)

# create llm, one-off interaction
llm = OpenAIGPT(my_llm_config)
response = llm.chat("What is the capital of China?", max_tokens=50)

# Create an Agent with this LLM, wrap it in a Task, and
# run it as an interactive chat app:
from langroid.agent.base import ChatAgent, ChatAgentConfig
from langroid.agent.task import Task

agent_config = ChatAgentConfig(llm=my_llm_config, name="my-llm-agent")
agent = ChatAgent(agent_config)

task = Task(agent, name="my-llm-task")
task.run()
Credits @pchalasani and Langroid for this tutorial.
GPT-Pilot

In your .env, set the openai endpoint to your local server:

OPENAI_ENDPOINT=http://0.0.0.0:8000
OPENAI_API_KEY=my-fake-key

guidance

NOTE: Guidance sends additional params like stop_sequences, which can cause some models to fail if they don't support them. Fix: start your proxy with the --drop_params flag:

litellm --model ollama/codellama --temperature 0.3 --max_tokens 2048 --drop_params
import guidance
# set api_base to your proxy
# set api_key to anything
gpt4 = guidance.llms.OpenAI("gpt-4", api_base="http://0.0.0.0:8000", api_key="anything")
experts = guidance('''
{{#system~}}
You are a helpful and terse assistant.
{{~/system}}
{{#user~}}
I want a response to the following question:
{{query}}
Name 3 world-class experts (past or present) who would be great at answering this?
Don't answer the question yet.
{{~/user}}
{{#assistant~}}
{{gen 'expert_names' temperature=0 max_tokens=300}}
{{~/assistant}}
''', llm=gpt4)
result = experts(query='How can I be more productive?')
print(result)
Contribute

Using this server with a project? Contribute your tutorial here!
Advanced
Multiple LLMs
$ litellm
#INFO: litellm proxy running on http://0.0.0.0:8000
Send a request to your proxy
import openai
openai.api_key = "any-string-here"
openai.api_base = "http://0.0.0.0:8000" # your proxy url
# call gpt-3.5-turbo
response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hey"}])
print(response)
# call ollama/llama2
response = openai.ChatCompletion.create(model="ollama/llama2", messages=[{"role": "user", "content": "Hey"}])
print(response)
Logs
$ litellm --logs
This will return the most recent log (the call that went to the LLM API + the received response).
All logs are saved to a file called api_logs.json in the current directory.
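If you want to inspect the full history rather than just the latest call, you can load the log file directly. A minimal sketch, assuming api_logs.json in the current directory is valid JSON:

import json

# pretty-print whatever litellm wrote to the log file
with open("api_logs.json") as f:
    logs = json.load(f)

print(json.dumps(logs, indent=2))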
Deploy Proxy

- Self-Hosted
- Ollama/OpenAI Docker
- LiteLLM-Hosted

Self-Hosted

Step 1: Clone the repo
git clone https://github.com/BerriAI/litellm.git
Step 2: Modify secrets_template.toml
Add your api keys / configure default model:
[keys]
OPENAI_API_KEY="sk-..."
[general]
default_model = "gpt-3.5-turbo"
Step 3: Deploy Proxy
docker build -t litellm . && docker run -p 8000:8000 litellm
Ollama/OpenAI Docker

It works for models like Mistral, Llama2, CodeLlama, etc. (any model supported by Ollama).

Usage

docker run --name ollama litellm/ollama
More details 👉 https://hub.docker.com/r/litellm/ollama
LiteLLM-Hosted

Deploy the proxy to https://api.litellm.ai
$ export ANTHROPIC_API_KEY=sk-ant-api03-1..
$ litellm --model claude-instant-1 --deploy
#INFO: Uvicorn running on https://api.litellm.ai/44508ad4
This will host a ChatCompletions API at: https://api.litellm.ai/44508ad4
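You can then call the hosted endpoint just like a local proxy. A minimal sketch with the (pre-1.0) OpenAI SDK, assuming your deploy printed the https://api.litellm.ai/44508ad4 URL shown above:

import openai

openai.api_key = "anything"                          # ignored; the deployed proxy holds the Anthropic key
openai.api_base = "https://api.litellm.ai/44508ad4"  # the URL printed by `litellm --deploy`

print(openai.ChatCompletion.create(
    model="claude-instant-1",
    messages=[{"role": "user", "content": "Hey!"}],
))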
Configure Proxy
If you need to:
- save API keys
- set litellm params (e.g. drop unmapped params, set fallback models, etc.)
- set model-specific params (max tokens, temperature, api base, prompt template)
You can set these just for that session (via the CLI), or persist them across restarts (via a config file).
Save API Keys
$ litellm --api_key OPENAI_API_KEY=sk-...
LiteLLM will save this to a locally stored config file, and persist this across sessions.
LiteLLM Proxy supports all litellm supported api keys. To add keys for a specific provider, check this list:
- Huggingface
  $ litellm --add_key HUGGINGFACE_API_KEY=my-api-key #[OPTIONAL]

- Anthropic
  $ litellm --add_key ANTHROPIC_API_KEY=my-api-key

- PerplexityAI
  $ litellm --add_key PERPLEXITYAI_API_KEY=my-api-key

- TogetherAI
  $ litellm --add_key TOGETHERAI_API_KEY=my-api-key

- Replicate
  $ litellm --add_key REPLICATE_API_KEY=my-api-key

- Bedrock
  $ litellm --add_key AWS_ACCESS_KEY_ID=my-key-id
  $ litellm --add_key AWS_SECRET_ACCESS_KEY=my-secret-access-key

- Palm
  $ litellm --add_key PALM_API_KEY=my-palm-key

- Azure OpenAI
  $ litellm --add_key AZURE_API_KEY=my-api-key
  $ litellm --add_key AZURE_API_BASE=my-api-base

- AI21
  $ litellm --add_key AI21_API_KEY=my-api-key

- Cohere
  $ litellm --add_key COHERE_API_KEY=my-api-key
E.g.: Set api base, max tokens and temperature.
For that session:
litellm --model ollama/llama2 \
--api_base http://localhost:11434 \
--max_tokens 250 \
--temperature 0.5
# OpenAI-compatible server running on http://0.0.0.0:8000
Across restarts:
Create a file called litellm_config.toml and paste this in there:
[model."ollama/llama2"] # run via `litellm --model ollama/llama2`
max_tokens = 250 # set max tokens for the model
temperature = 0.5 # set temperature for the model
api_base = "http://localhost:11434" # set a custom api base for the model
Save it to the proxy with:
$ litellm --config -f ./litellm_config.toml
LiteLLM will save a copy of this file in its package, so it can persist these settings across restarts.
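Before handing the file to the proxy, it can be handy to confirm it parses. A minimal sketch using Python's built-in tomllib (Python 3.11+; on older versions the third-party toml package works similarly):

import tomllib  # Python 3.11+

# sanity-check litellm_config.toml before running `litellm --config -f ./litellm_config.toml`
with open("litellm_config.toml", "rb") as f:
    config = tomllib.load(f)

for model_name, settings in config.get("model", {}).items():
    print(model_name, settings)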
- Complete Config File
- 🔥 [Tutorial] Modify a model prompt on the proxy
Track Costs
By default, the litellm proxy writes cost logs to litellm/proxy/costs.json. (How can the proxy be better? Let us know here.)

{
  "Oct-12-2023": {
    "claude-2": {
      "cost": 0.02365918,
      "num_requests": 1
    }
  }
}

You can view costs on the CLI using:

litellm --cost
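To aggregate those numbers yourself, you can read costs.json directly. A minimal sketch, assuming the date → model → {cost, num_requests} layout shown above (the path may differ in your install):

import json
from collections import defaultdict

# roll up per-model spend across all logged days
with open("litellm/proxy/costs.json") as f:
    costs = json.load(f)

totals = defaultdict(float)
for day, models in costs.items():
    for model, stats in models.items():
        totals[model] += stats["cost"]

for model, total in totals.items():
    print(f"{model}: ${total:.6f}")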
Support / talk with founders
- Schedule Demo 👋
- Community Discord 💭
- Our numbers 📞 +1 (770) 8783-106 / +1 (412) 618-6238
- Our emails ✉️ ishaan@berri.ai / krrish@berri.ai