Bring up an SSH session on your Linux VM. Clone the course repository, create a Python virtual environment, activate it, and then install the packages needed to use LangChain with Google's models. LangChain is a Python framework that makes it easy for developers to build applications on top of language models.

git clone https://github.com/wu4f/cs410g-src
cd cs410g-src/01*
virtualenv -p python3 env
source env/bin/activate
pip install -r requirements.txt

Examine the 01_gemini.py program. The program instantiates the model and then calls its invoke() method with a prompt to generate a completion. The invoke() call blocks until the entire response has been generated before returning the result.

from langchain_google_genai import GoogleGenerativeAI
llm = GoogleGenerativeAI(model="gemini-pro")
response = llm.invoke("Write me a haiku about Portland State University")
print(response)

Ensure the appropriate environment variable is set to specify your API key. Then, run the program and view the results the model returns.
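
The langchain_google_genai package typically reads the key from the GOOGLE_API_KEY environment variable; assuming that is the variable your setup uses, set it before running the program:

export GOOGLE_API_KEY="<your-api-key>"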

python3 01_gemini.py

Next, examine the Python snippet below. Instead of using the model's invoke() method, it uses its stream() method, which returns results to the program as they are generated. As longer responses are produced, the model passes back partial results in a generator that is iteratively printed in the for loop.

from langchain_google_genai import GoogleGenerativeAI
llm = GoogleGenerativeAI(model="gemini-pro")

for chunks in llm.stream("Write me a short story about a rabbit"):
    print(chunks,end="")

We can wrap the execution of the model with a simple interactive shell that accepts input in a loop and sends it to the language model. The shell exits when a blank line is given.

while True:
    line = input("llm>> ")
    if line:
        for chunks in llm.stream(line):
            print(chunks,end="")
        print("")
    else:
        break

Run the program and experiment with the model. Note that this particular program does not store prior conversational messages, as one would expect a chatbot to do.

python3 02_gemini_interact.py
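
As noted above, the program sends each line of input to the model on its own, with no memory of earlier turns. A minimal, hypothetical sketch of how prior messages could be carried forward is to accumulate the conversation in a string and resend it with each query (the course program does not do this):

history = ""
while True:
    line = input("llm>> ")
    if not line:
        break
    history += f"User: {line}\n"
    reply = ""
    # Resend the accumulated conversation so the model sees prior turns
    for chunk in llm.stream(history + "Assistant:"):
        print(chunk, end="")
        reply += chunk
    print("")
    history += f"Assistant: {reply}\n"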

LangChain provides a simple way to access a subset of the Gemini model's features via API calls. If one wishes to access the full feature set of a model, as well as leverage the computing infrastructure of Google Cloud to help run applications, one can do so via Vertex AI. In this step, we will do so using a Python Jupyter notebook. From the Google Cloud Platform console, navigate to the Vertex AI Dashboard, enable the recommended APIs, then continue to Colab Enterprise (a managed Jupyter notebook service).

On your local machine, clone the course repository.

git clone https://github.com/wu4f/cs410g-src

Within Colab Enterprise, click on the upload button and upload the Python notebook 03_gemini.ipynb from the repository. Then, connect it to a runtime environment. This will create a GPU-enabled Compute Engine instance that will run your notebook.

Run all cells in the notebook and ensure you are able to query the Gemini Pro model.
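
The notebook's cells may differ in detail, but querying Gemini through the Vertex AI Python SDK looks roughly like the sketch below; the project ID and region are placeholders for your own values.

import vertexai
from vertexai.generative_models import GenerativeModel

# Placeholders: substitute your own project ID and region
vertexai.init(project="your-project-id", location="us-west1")

model = GenerativeModel("gemini-pro")
response = model.generate_content("Write me a haiku about Portland State University")
print(response.text)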

We can programmatically access our Ollama models from Python as well. If it is not still running, bring up the Ollama server for your project. Then, within the Compute Engine interface, make a note of its external IP address (e.g. 35.230.93.16 below).
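
Assuming the gcloud CLI is configured for your project, the external IP addresses of your instances can also be listed from the command line:

gcloud compute instances list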

Consider the snippet below. It instantiates an Ollama model, given the name of a model that has been installed on the Ollama server (e.g. llama2) and the location of the server's API endpoint, specified by the IP address in the base_url parameter.

from langchain_community.llms import Ollama
llm = Ollama( model="llama2", base_url="http://35.230.93.16:11434" )
for chunks in llm.stream("Write me a short story about a rabbit"):
    print(chunks,end="")

On your Linux VM, run the program below that implements a simple interactive shell for querying your Ollama server, supplying the IP address of your Ollama server and the model on the server you want to query as arguments. Test out the functionality of the models you've installed.

python3 04_ollama.py <OllamaIPAddress> <OllamaModel>
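
The program reads the server address and model name from the command line before entering a prompt loop. A minimal sketch of how it might be structured (the actual 04_ollama.py may differ) is:

import sys
from langchain_community.llms import Ollama

# Command-line arguments: Ollama server IP address and model name
ip_address, model_name = sys.argv[1], sys.argv[2]
llm = Ollama(model=model_name, base_url=f"http://{ip_address}:11434")

# Interactive prompt loop; exits on a blank line
while True:
    line = input("llm>> ")
    if not line:
        break
    for chunk in llm.stream(line):
        print(chunk, end="")
    print("")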

One of the benefits of LangChain is that it provides a unified interface for integrating models into an application. This allows one to swap different models in and out and compare their behavior and performance for a given application. One useful LangChain module for doing so is the Model Laboratory. With it, one can instantiate a range of models and automatically run a given prompt across all of them, then analyze each model's results. The snippet below shows an example using models pulled from HuggingFace, OpenAI, and Google.

from langchain.model_laboratory import ModelLaboratory

from langchain_community.llms import HuggingFaceEndpoint
from langchain_google_genai import GoogleGenerativeAI
from langchain_community.llms import HuggingFaceHub
from langchain_openai import OpenAI

llms = [
    HuggingFaceEndpoint(repo_id="mistralai/Mistral-7B-Instruct-v0.2", max_new_tokens=128),
    HuggingFaceHub(repo_id="google/flan-t5-small"),
    HuggingFaceHub(repo_id="google/gemma-2b"),
    OpenAI(temperature=0),
    GoogleGenerativeAI(model="models/text-bison-001"),
    GoogleGenerativeAI(model="gemini-pro")
]
model_lab = ModelLaboratory.from_llms(llms)
model_lab.compare("Who is the president of the United States?")

Examine the program provided in the repository that iteratively queries a set of LLMs with your prompts. Edit the file to select the models you want to compare, then run the program and examine how the models compare across a variety of prompts.

python3 05_multi.py

There are a number of models that can now analyze and produce images and video. In this set of exercises, we'll use a variety of examples to see how we can leverage these tools programmatically to perform a range of tasks. The code below shows how one can leverage the Gemini Vision model API from LangChain. In the code, as part of the chat message, we include structured input that encodes the URL of the image we want to analyze and the prompt we'd like the model to answer about the image.

from langchain_core.messages import HumanMessage
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-pro-vision")

def query_llm(llm, query, image_url):
  message = HumanMessage(
      content=[
          {
              "type": "text",
              "text": query,
          },
          {   "type": "image_url",
              "image_url": image_url
          }
      ]
  )
  response = llm.invoke([message])
  return response

prompt = "What is going on in this image?"
image_url = 'https://portswigger.net/cms/images/91/43/e4e5-article-popovers-hidden-inputs.png'
print(query_llm(llm, prompt, image_url))

The repository includes an interactive application that tells you what is going on in an image, given its URL. Run it and test the model's ability to analyze images.

python3 06_gemini_vision.py

Accessing model functionality, especially for multimedia content, can be done more efficiently on the cloud platform itself. Within Vertex AI, go back to Colab Enterprise, click on the upload button, and upload the Python notebook 07_gemini_vision.ipynb from the repository. Then, connect it to a runtime environment. This will create a GPU-enabled Compute Engine instance that will run your notebook.

Run all cells in the notebook and ensure you are able to query the Gemini Pro Vision model. The notebook will walk you through the analysis of images downloaded via their URL as well as video stored in a storage bucket.
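
For the video portion, a cell in the notebook might resemble the sketch below, which uses the Vertex AI SDK to pass a video stored in a Cloud Storage bucket to the model. The bucket path is a placeholder, and the notebook's actual code may differ.

from vertexai.generative_models import GenerativeModel, Part

model = GenerativeModel("gemini-pro-vision")

# Placeholder gs:// URI; substitute the bucket and object the notebook uses
video = Part.from_uri("gs://your-bucket/your-video.mp4", mime_type="video/mp4")

response = model.generate_content([video, "What is going on in this video?"])
print(response.text)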