Fuck You, Show Me The Prompt.
Background
There are many libraries that aim to make the outputs of your LLMs better by re-writing or constructing the prompt for you. A common theme among some of these tools is that they encourage users to disintermediate themselves from prompting. For example:
DSPy: “This is a new paradigm in which LMs and their prompts fade into the background …. you can compile your program again and DSPy will create new effective prompts”
guidance: “guidance is a programming paradigm that offers superior control and efficiency compared to conventional prompting …”
Even when tools don't discourage prompting, I've often found it difficult to retrieve the final prompt(s) these tools send to the language model. The prompts these tools send to the LLM are a natural-language description of what the tools are doing, and they are the fastest way to understand how they work. Furthermore, some tools have dense terminology to describe internal constructs, which can further obfuscate what they are doing.
For reasons I'll explain below, I think most people would benefit from the following mindset: "Fuck you, show me the prompt."
In this blog post, I'll show you how you can intercept the API calls and prompts for any tool, without having to fumble through docs or read source code. I'll show you how to set up and operate mitmproxy, with examples from the LLM tools I previously mentioned.
Motivation: Minimize accidental complexity
Before adopting an abstraction, it's important to consider the dangers of taking on accidental complexity. This danger is especially acute for LLM abstractions relative to programming abstractions. With LLM abstractions, we often force the user to regress towards writing code instead of conversing with the AI in natural language, which can run counter to the purpose of LLMs:
Programming abstraction -> a human-like language you can use to translate your task into machine code
LLM abstraction -> an unintelligible framework you can use to translate your task into human language
— Hamel Husain (@HamelHusain) February 5, 2024
While this is a cheeky comment, it's worth keeping in mind while evaluating tools. There are two main types of automation that tools provide:
- Interleaving code and LLMs: Expressing this automation is often best done through code, since code must be run to carry out the task. Examples include routing, executing functions, retries, chaining, etc.
- Re-writing and constructing prompts: Expressing your intent is often best done through natural language. However, there are exceptions! For example, it's convenient to express a function definition or schema in code instead of natural language.
Many frameworks offer both types of automation. However, going too far with the second type can have negative consequences. Seeing the prompt allows you to decide:
- Is this framework really necessary?
- Should I just steal the final prompt (a string) and jettison the framework?
- Can we write a better prompt than this one (shorter, better aligned with your intent, etc.)?
- Is this the best approach (does the number of API calls seem appropriate)?
In my experience, seeing the prompts and API calls is essential to making informed decisions.
Intercepting LLM API calls
There are many potential ways to intercept LLM API calls, such as monkey-patching source code or finding a user-facing option. I've found that these approaches take far too much time, since the quality of source code and documentation can vary greatly. After all, I just want to see the API calls without worrying about how the code works!
A framework-agnostic way to see API calls is to set up a proxy that logs your outgoing API requests. This is easy to do with mitmproxy, a free, open-source HTTPS proxy.
Setting Up mitmproxy
This is an opinionated way to set up mitmproxy that's beginner-friendly for our intended purposes:

1. Follow the installation instructions on the website.

2. Start the interactive UI by running mitmweb in the terminal. Pay attention to the URL of the interactive UI in the logs, which will look something like this: Web server listening at http://127.0.0.1:8081/

3. Next, you need to configure your device (i.e. your laptop) to route all traffic through mitmproxy, which listens on http://localhost:8080. Per the documentation: "We recommend to simply search the web on how to configure an HTTP proxy for your system. Some operating systems have a global setting, some browsers have their own, other applications use environment variables, etc." In my case, a Google search for "set proxy for macOS" returned these results: "Choose Apple menu > System Settings, click Network in the sidebar, click a network service on the right, click Details, then click Proxies." I then inserted localhost and 8080 in the following places in the UI:

4. Next, navigate to http://mitm.it and it will give you instructions on how to install the mitmproxy Certificate Authority (CA), which you will need for intercepting HTTPS requests. (You can also do this manually here.) Also, take note of the location of the CA file, as we will reference it later.

5. You can test that everything works by browsing to a website like https://mitmproxy.org/ and seeing the corresponding output in the mitmweb UI, which for me is located at http://127.0.0.1:8081/ (look at the logs in your terminal to get the URL).

6. Now that you have set everything up, you can disable the proxy that you previously enabled for your network. I do this on my Mac by toggling the proxy buttons in the screenshot I showed above. This is because we want to scope the proxy to only the Python program, to eliminate unnecessary noise.
Networking-related software often allows you to proxy outgoing requests by setting environment variables. This is the approach we will use to scope our proxy to specific Python programs. However, I encourage you to play with other types of programs to see what you find once you're comfortable!
Environment variables for Python
We need to set the following environment variables so that the requests and httpx libraries will direct traffic to the proxy and reference the CA file for HTTPS traffic:
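The exact values below are an assumption based on mitmproxy's defaults (proxy on localhost:8080, CA certificate under ~/.mitmproxy/); adjust them to match your setup. You can export them in your shell, or set them at the top of your script before making any requests:

import os

# Route HTTP(S) traffic from requests/httpx through mitmproxy.
os.environ["HTTP_PROXY"] = "http://localhost:8080"
os.environ["HTTPS_PROXY"] = "http://localhost:8080"

# Point the HTTP clients at mitmproxy's CA certificate so HTTPS interception is trusted.
mitm_ca = os.path.expanduser("~/.mitmproxy/mitmproxy-ca-cert.pem")
os.environ["REQUESTS_CA_BUNDLE"] = mitm_ca  # used by requests
os.environ["SSL_CERT_FILE"] = mitm_ca       # used by httpx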
Make sure you set these environment variables before running any of the code snippets in this blog post.
You can do a minimal test by running the following code:
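The original test snippet isn't reproduced here; this is a minimal sketch using requests, with httpbin.org as an arbitrary test endpoint:

import requests

# With the environment variables above set, this request should show up in the mitmweb UI.
response = requests.post("https://httpbin.org/post", data={"hello": "world"})
print(response.status_code)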
This will appear in the UI like so:
Examples
Now for the fun part, let’s run through some examples of LLM libraries and intercept their API calls!
Guardrails
Guardrails allows you to specify structure and types, which it uses to validate and correct the outputs of large language models. This is a hello-world example from the guardrails-ai/guardrails README:
from pydantic import BaseModel, Field
from guardrails import Guard
import openai

class Pet(BaseModel):
    pet_type: str = Field(description="Species of pet")
    name: str = Field(description="a unique pet name")

prompt = """
What kind of pet should I get and what should I name it?
${gr.complete_json_suffix_v2}
"""

guard = Guard.from_pydantic(output_class=Pet, prompt=prompt)

validated_output, *rest = guard(
    llm_api=openai.completions.create,
    engine="gpt-3.5-turbo-instruct"
)

print(f"{validated_output}")
{
    "pet_type": "dog",
    "name": "Buddy"
}
What is happening here? How is this structured output and validation working? Looking at the mitmproxy UI, I can see that the above code resulted in two LLM API calls, the first one with this prompt:
What kind of pet should I get and what should I name it?
Given below is XML that describes the information to extract from this document and the tags to extract it into.
<output>
<string name="pet_type" description="Species of pet"/>
<string name="name" description="a unique pet name"/>
</output>
ONLY return a valid JSON object (no other text is necessary), where the key of the field in JSON is the `name` attribute of the corresponding XML, and the value is of the type specified by the corresponding XML's tag. The JSON MUST conform to the XML format, including any types and format requests e.g. requests for lists, objects and specific types. Be correct and concise.
Here are examples of simple (XML, JSON) pairs that show the expected behavior:
- `<string name="foo" format="two-words lower-case" />` => `{'foo': 'example one'}`
- `<list name="bar"><string format="upper-case" /></list>` => `{"bar": ['STRING ONE', 'STRING TWO', etc.]}`
- `<object name="baz"><string name="foo" format="capitalize two-words" /><integer name="index" format="1-indexed" /></object>` => `{'baz': {'foo': 'Some String', 'index': 1}}`
Followed by another call with this prompt:
I was given the following response, which was not parseable as JSON.
"{\n "pet_type": "dog",\n "name": "Buddy"
Help me correct this by making it valid JSON.
Given below is XML that describes the information to extract from this document and the tags to extract it into.
<output>
<string name="pet_type" description="Species of pet"/>
<string name="name" description="a unique pet name"/>
</output>
ONLY return a valid JSON object (no other text is necessary), where the key of the field in JSON is the `name` attribute of the corresponding XML, and the value is of the type specified by the corresponding XML's tag. The JSON MUST conform to the XML format, including any types and format requests e.g. requests for lists, objects and specific types. Be correct and concise. If you are unsure anywhere, enter `null`.
Woof. That's a lot of ceremony to get structured output! We learned that this library's approach to structured output uses XML schemas (while others use function calling). It's worth considering whether you could fashion a better or simpler approach now that the magic has been revealed. Either way, we now have insight into how it works without getting dragged into the framework's internals, which is a win.
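For comparison, here is one possible simpler route (a sketch of my own, not something the guardrails library does) that leans on OpenAI's function-calling feature to enforce the same Pet schema; choose_pet is a made-up function name:

import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

# Force the model to "call" a function whose parameters mirror the Pet schema.
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "What kind of pet should I get and what should I name it?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "choose_pet",
            "parameters": {
                "type": "object",
                "properties": {
                    "pet_type": {"type": "string", "description": "Species of pet"},
                    "name": {"type": "string", "description": "a unique pet name"},
                },
                "required": ["pet_type", "name"],
            },
        },
    }],
    tool_choice={"type": "function", "function": {"name": "choose_pet"}},
)
print(json.loads(response.choices[0].message.tool_calls[0].function.arguments))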
Guidance
Guidance offers constrained generation and programming constructs for writing prompts. Let’s dive into a chat example from their tutorials:
import guidance
gpt35 = guidance.models.OpenAI("gpt-3.5-turbo")

import re
from guidance import gen, select, system, user, assistant

@guidance
def plan_for_goal(lm, goal: str):

    # This is a helper function which we will use below
    def parse_best(prosandcons, options):
        best = re.search(r'Best=(\d+)', prosandcons)
        if not best:
            best = re.search(r'Best.*?(\d+)', 'Best= option is 3')
        if best:
            best = int(best.group(1))
        else:
            best = 0
        return options[best]

    # Some general instruction to the model
    with system():
        lm += "You are a helpful assistant."

    # Simulate a simple request from the user
    # Note that we switch to using 'lm2' here, because these are intermediate steps (so we don't want to overwrite the current lm object)
    with user():
        lm2 = lm + f"""
I want to {goal}
Can you please generate one option for how to accomplish this?
Please make the option very short, at most one line."""

    # Generate several options. Note that this means several sequential generation requests
    n_options = 5
    with assistant():
        options = []
        for i in range(n_options):
            options.append((lm2 + gen(name='option', temperature=1.0, max_tokens=50))["option"])

    # Have the user request pros and cons
    with user():
        lm2 += f"""
I want to {goal}
Can you please comment on the pros and cons of each of the following options, and then pick the best option?
---
"""
        for i, opt in enumerate(options):
            lm2 += f"Option {i}: {opt}\n"
        lm2 += f"""
---
Please discuss each option very briefly (one line for pros, one for cons), and end by saying Best=X, where X is the number of the best option."""

    # Get the pros and cons from the model
    with assistant():
        lm2 += gen(name='prosandcons', temperature=0.0, max_tokens=600, stop="Best=") + "Best=" + gen("best", regex="[0-9]+")

    # The user now extracts the one selected as the best, and asks for a full plan
    # We switch back to 'lm' because this is the final result we want
    with user():
        lm += f"""
I want to {goal}
Here is my plan: {options[int(lm2["best"])]}
Please elaborate on this plan, and tell me how to best accomplish it."""

    # The plan is generated
    with assistant():
        lm += gen(name='plan', max_tokens=500)

    return lm
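The function is then composed with the model to produce the transcript below; this is a sketch, with the goal string inferred from that transcript:

results = gpt35 + plan_for_goal(goal="read more books")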
system
You are a helpful assistant.
user
I want to read more books Here is my plan: Set aside 30 minutes of dedicated reading time each day. Please elaborate on this plan, and tell me how to best accomplish it.
assistant
Setting aside 30 minutes of dedicated reading time each day is a great plan to read more books. Here are some tips to help you accomplish this goal: 1. Establish a routine: Choose a specific time of day that works best for you, whether it's in the morning, during lunch break, or before bed. Consistency is key to forming a habit. 2. Create a reading-friendly environment: Find a quiet and comfortable spot where you can focus on your reading without distractions. It could be a cozy corner in your home, a park bench, or a local library. 3. Minimize distractions: Put away your phone, turn off the TV, and avoid any other potential interruptions during your dedicated reading time. This will help you stay focused and fully immerse yourself in the book. 4. Choose books that interest you: Select books that align with your personal interests, hobbies, or goals. When you're genuinely interested in the subject matter, you'll be more motivated to read regularly. 5. Start with manageable goals: If you're new to reading or have a busy schedule, start with a smaller time commitment, such as 15 minutes, and gradually increase it to 30 minutes or more as you become more comfortable. 6. Set a timer: Use a timer or a reading app that allows you to track your reading time. This will help you stay accountable and ensure that you dedicate the full 30 minutes to reading. 7. Make reading enjoyable: Create a cozy reading atmosphere by lighting a candle, sipping a cup of tea, or playing soft background music. Engaging all your senses can enhance your reading experience. 8. Join a book club or reading group: Consider joining a book club or participating in a reading group to connect with fellow book lovers. This can provide additional motivation, discussion opportunities, and book recommendations. 9. Keep a reading log: Maintain a record of the books you've read, along with your thoughts and reflections. This can help you track your progress, discover patterns in your reading preferences, and serve as a source of inspiration for future reading. 10. Be flexible: While it's important to have a dedicated reading time, be flexible and adaptable. Life can sometimes get busy, so if you miss a day, don't be discouraged. Simply pick up where you left off and continue with your reading routine. Remember, the goal is to enjoy the process of reading and make it a regular part of your life. Happy reading!
This looks pretty neat! But what is it doing exactly? This makes a total of 7 calls to OpenAI, which I have put in this gist. Five of the seven API calls are "internal" thoughts asking the LLM to generate ideas. Even though the temperature is set to 1.0, these "ideas" are mostly redundant. The penultimate call to OpenAI enumerates these "ideas", which I've included below:
I want to read more books
Can you please comment on the pros and cons of each of the following options, and then pick the best option?
---
Option 0: Set aside dedicated time each day for reading.
Option 1: Set aside 30 minutes of dedicated reading time each day.
Option 2: Set aside dedicated time each day for reading.
Option 3: Set aside dedicated time each day for reading.
Option 4: Join a book club.
---
Please discuss each option very briefly (one line for pros, one for cons), and end by saying Best=X, where X is the number of the best option.
I know from experience that you are likely to get better results if you tell the language model to generate ideas in a single shot. That way, the LLM can reference its previous ideas and achieve more diversity. This is a good example of accidental complexity: it's very tempting to take this design pattern and apply it blindly. This is less of a critique of this particular framework, since the code makes it clear that 5 independent calls will happen. Either way, it's a good idea to check your work by inspecting the API calls!
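As a rough sketch of the single-shot alternative (the prompt wording and model here are my own, not from the guidance example), you could ask for all five options in one request so the model can see and avoid repeating its earlier ideas:

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

goal = "read more books"
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    temperature=1.0,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": (
                f"I want to {goal}\n"
                "Please generate five distinct one-line options for how to accomplish this, "
                "numbered 0-4. Make each option meaningfully different from the others."
            ),
        },
    ],
)
print(response.choices[0].message.content)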
Langchain
Langchain is a multi-tool for all things LLM. Lots of people rely on Langchain when getting started with LLMs. Since Langchain has a lot of surface area, I'll go through two examples.
LCEL Batching
First, let's take a look at this example from their new LCEL (LangChain Expression Language) guide:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
prompt = ChatPromptTemplate.from_template(
    "Tell me a short joke about {topic}"
)
output_parser = StrOutputParser()
model = ChatOpenAI(model="gpt-3.5-turbo")
chain = (
    {"topic": RunnablePassthrough()}
    | prompt
    | model
    | output_parser
)
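The chain is then invoked over several topics at once. The exact call isn't shown above, so this is a sketch with the topic list inferred from the jokes below:

chain.batch(["ice cream", "spaghetti", "dumplings", "tofu", "pizza"])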
["Why did the ice cream go to therapy?\n\nBecause it had too many toppings and couldn't find its flavor!",
 'Why did the tomato turn red?\n\nBecause it saw the spaghetti sauce!',
 'Why did the dumpling go to the bakery?\n\nBecause it kneaded some company!',
 'Why did the tofu go to the party?\n\nBecause it wanted to blend in with the crowd!',
 'Why did the pizza go to the wedding?\n\nBecause it wanted to be a little cheesy!']
That’s interesting! So how does this actually work? When looking at mitmproxy, I see five separate API calls:
{ "messages": [{"content": "Tell me a short joke about spaghetti", "role": "user"}],
"model": "gpt-3.5-turbo", "n": 1, "stream": false, "temperature": 0.7}
{ "messages": [{"content": "Tell me a short joke about ice cream", "role": "user"}],
"model": "gpt-3.5-turbo", "n": 1, "stream": false, "temperature": 0.7}
…and so on for each of the five items in the list.
Five separate calls (albeit async) to OpenAI may not be what you want, as the OpenAI API allows batching requests. I've personally hit rate limits when using LCEL in this way – it was only when I looked at the API calls that I understood what was happening! (It's easy to be misled by the word "batch".)
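For reference, the legacy completions endpoint accepts a list of prompts in a single request; here is a sketch of my own (not LCEL internals) of what true request batching can look like:

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

topics = ["ice cream", "spaghetti", "dumplings", "tofu", "pizza"]
response = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt=[f"Tell me a short joke about {t}" for t in topics],  # one HTTP request, five prompts
    max_tokens=60,
    temperature=0.7,
)
for choice in sorted(response.choices, key=lambda c: c.index):
    print(choice.text.strip())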
SmartLLMChain
Next, I'll focus on automation that writes prompts for you, specifically SmartLLMChain:
from langchain.prompts import PromptTemplate
from langchain_experimental.smart_llm import SmartLLMChain
from langchain_openai import ChatOpenAI

hard_question = (
    "I have a 12 liter jug and a 6 liter jug. "
    "I want to measure 6 liters. How do I do it?"
)
prompt = PromptTemplate.from_template(hard_question)
llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo")
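The chain is then constructed and run. The exact arguments aren't shown above, so this is a sketch based on LangChain's documented usage and the two "ideas" that appear in the output below:

chain = SmartLLMChain(llm=llm, prompt=prompt, n_ideas=2, verbose=True)
chain.run({})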
Idea 1: 1. Fill the 12 liter jug completely.
2. Pour the contents of the 12 liter jug into the 6 liter jug. This will leave you with 6 liters in the 12 liter jug.
3. Empty the 6 liter jug.
4. Pour the remaining 6 liters from the 12 liter jug into the now empty 6 liter jug.
5. You now have 6 liters in the 6 liter jug.
Idea 2: 1. Fill the 12 liter jug completely.
2. Pour the contents of the 12 liter jug into the 6 liter jug. This will leave you with 6 liters in the 12 liter jug.
3. Empty the 6 liter jug.
4. Pour the remaining 6 liters from the 12 liter jug into the now empty 6 liter jug.
5. You now have 6 liters in the 6 liter jug.
Improved Answer:
1. Fill the 12 liter jug completely.
2. Pour the contents of the 12 liter jug into the 6 liter jug until the 6 liter jug is full. This will leave you with 6 liters in the 12 liter jug and the 6 liter jug completely filled.
3. Empty the 6 liter jug.
4. Pour the remaining 6 liters from the 12 liter jug into the now empty 6 liter jug.
5. You now have 6 liters in the 6 liter jug.
Full Answer:
To measure 6 liters using a 12 liter jug and a 6 liter jug, follow these steps:
1. Fill the 12 liter jug completely.
2. Pour the contents of the 12 liter jug into the 6 liter jug until the 6 liter jug is full. This will leave you with 6 liters in the 12 liter jug and the 6 liter jug completely filled.
3. Empty the 6 liter jug.
4. Pour the remaining 6 liters from the 12 liter jug into the now empty 6 liter jug.
5. You now have 6 liters in the 6 liter jug.
Neat! So what happened exactly? While this API emits logs that show you a lot of information (available in this gist), the API request pattern is interesting:
1. Two separate API calls, one for each "idea".
2. Another API call that incorporates the two ideas as context, with the prompt: "You are a researcher tasked with investigating the 2 response options provided. List the flaws and faulty logic of each answer option. Let'w work this out in a step by step way to be sure we have all the errors:"
3. A final API call that takes the critique from step 2 and generates an answer.
It's not clear that this approach is optimal. I'm not sure it should take 4 separate API calls to accomplish this task. Perhaps the critique and the final answer could be generated in a single step? Furthermore, the prompt has a spelling error (Let'w) and also focuses overly on the negative by hunting for errors – which makes me skeptical that this prompt has been optimized or tested.
Instructor
Instructor is a framework for structured outputs.
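Its basic usage looks roughly like this (a sketch assuming the instructor.patch-style API; UserDetail is a made-up schema):

import instructor
from openai import OpenAI
from pydantic import BaseModel

class UserDetail(BaseModel):
    name: str
    age: int

# Patching the client lets you pass `response_model` to chat.completions.create.
client = instructor.patch(OpenAI())

user = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=UserDetail,
    messages=[{"role": "user", "content": "Extract: Jason is 25 years old"}],
)
print(user.name, user.age)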
Validation
However, instructor has other APIs that are more aggressive and write prompts for you. For example, consider this validation example. Running through that example should trigger similar questions to the exploration of Langchain's SmartLLMChain above. In this example, you will observe 3 LLM API calls to get the right answer, with the final payload looking like this:
{
"function_call": {
"name": "Validator"
},
"functions": [
{
"description": "Validate if an attribute is correct and if not,\nreturn a new value with an error message",
"name": "Validator",
"parameters": {
"properties": {
"fixed_value": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "If the attribute is not valid, suggest a new value for the attribute",
"title": "Fixed Value"
},
"is_valid": {
"default": true,
"description": "Whether the attribute is valid based on the requirements",
"title": "Is Valid",
"type": "boolean"
},
"reason": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "The error message if the attribute is not valid, otherwise None",
"title": "Reason"
}
},
"required": [],
"type": "object"
}
}
],
"messages": [
{
"content": "You are a world class validation model. Capable to determine if the following value is valid for the statement, if it is not, explain why and suggest a new value.",
"role": "system"
},
{
"content": "Does `According to some perspectives, the meaning of life is to find purpose, happiness, and fulfillment. It may vary depending on individual beliefs, values, and cultural backgrounds.` follow the rules: don't say objectionable things",
"role": "user"
}
],
"model": "gpt-3.5-turbo",
"temperature": 0
}
Concretely, I’m curious if these steps could be collapsed into two LLM calls instead of three. Furthermore, I wonder if generic validation functions (as supplied in the above payload) are the right way to critique output? I don’t know the answer, but this is an interesting design pattern that is worth poking at.
As far as LLM frameworks go, I really like this one. The core functionality of defining schemas with Pydantic is very convenient. The code is also very readable and easy to understand. Despite this, I still found it helpful to intercept instructor’s API calls to get another perspective.
There is a way to set a logging level in instructor to see the raw API calls; however, I like using a framework-agnostic approach 🙂
DSPy
DSPy is a framework that helps you optimize your prompts to optimize an arbitrary metric. There is a fairly steep learning curve to DSPy, partly because it introduces many new technical terms specific to its framework, like compilers and teleprompters. However, we can quickly peel back the complexity by looking at the API calls it makes!
Let’s run the minimal working example:
import time
import dspy
from dspy.datasets.gsm8k import GSM8K, gsm8k_metric
start_time = time.time()
# Set up the LM
turbo = dspy.OpenAI(model='gpt-3.5-turbo-instruct', max_tokens=250)
dspy.settings.configure(lm=turbo)
# Load math questions from the GSM8K dataset
gms8k = GSM8K()
trainset, devset = gms8k.train, gms8k.dev
from dspy.teleprompt import BootstrapFewShotWithRandomSearch
# Set up the optimizer: we want to "bootstrap" (i.e., self-generate) 8-shot examples of our CoT program.
# The optimizer will repeat this 10 times (plus some initial attempts) before selecting its best attempt on the devset.
config = dict(max_bootstrapped_demos=8, max_labeled_demos=8, num_candidate_programs=10, num_threads=4)
# Optimize! Use the `gsm8k_metric` here. In general, the metric is going to tell the optimizer how well it's doing.
teleprompter = BootstrapFewShotWithRandomSearch(metric=gsm8k_metric, **config)
optimized_cot = teleprompter.compile(CoT(), trainset=trainset, valset=devset)
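Note that the snippet references a CoT module that isn't defined above; in DSPy's minimal working example it is a small ChainOfThought wrapper along these lines (a reconstructed sketch, not copied from the original post):

import dspy

class CoT(dspy.Module):
    def __init__(self):
        super().__init__()
        # A single chain-of-thought step that maps a question to an answer.
        self.prog = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        return self.prog(question=question)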
Despite this being the official quick-start/minimal working example, this code took more than 30 minutes to run, and made hundreds of calls to OpenAI! This cost non-trivial time (and money), especially as an entry point to the library for someone just trying to take a look. There was no prior warning that this would happen.
DSPy made hundreds of API calls because it was iteratively sampling examples for a few-shot prompt and selecting the best ones based on the gsm8k_metric on a validation set. I was able to quickly understand this by scanning through the API requests logged to mitmproxy.
DSPy offers an inspect_history method which allows you to see the last n prompts and their completions:
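For example, assuming the turbo LM object configured in the snippet above (the exact call may differ across DSPy versions):

turbo.inspect_history(n=2)  # prints the last two prompts and their completions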
I was able to verify that these prompts matched the last few API calls being made in mitmproxy. Overall, I would be inclined to keep the prompt and jettison the library. That being said, I am curious to see how this library evolves.
My Personal Experience
Do I hate LLM libraries? No! I think many of the libraries in this blog post could be helpful if used thoughtfully in the right situations. However, I’ve witnessed too many people fall into the trap of using these libraries without understanding what they are doing.
One thing I focus on as an independent consultant is to make sure my clients don’t take on accidental complexity. It’s very tempting to adopt additional tools given all the excitement around LLMs. Looking at prompts is one way to mitigate that temptation.
I’m wary of frameworks that distance the human too far from LLMs. By whispering “Fuck you, show me the prompt!” when using these tools, you are empowered to decide for yourself.