
Streamlining Structured Outputs with Instructor


You know how some guys share memes like there's a meme-connection thing? Well, I want to be your go-to guy for sharing awesome libraries that'll supercharge your workflows! 😄 Get it? Let's kick things off with Instructor.

So, library lord 😎😎, what's the deal with Instructor?

Instructor is a Python library designed to facilitate interaction with large language models (LLMs) by offering a user-friendly API for managing structured outputs.

It's built on Pydantic, and that's a big part of why you'd want to use Instructor:

  1. Powered by Type Hints: Instructor leverages Pydantic’s use of type hints, enabling schema validation and prompting through type annotations. This means less to learn, less code to write, and better integration with your IDE.

  2. Customizable: Pydantic is highly customizable. You can define your own validators, custom error messages, and much more (see the short sketch after this list).

  3. Ecosystem: Pydantic is the most widely used data validation library for Python, with over 100 million downloads a month. It powers popular libraries like FastAPI and Typer.
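For instance, here's a tiny plain-Pydantic sketch of points 1 and 2, type-hint-driven validation plus a custom error message (the User model is just my own toy illustration, not anything from Instructor):

from pydantic import BaseModel, field_validator

class User(BaseModel):
    name: str  # type hints define the schema and drive validation
    age: int

    @field_validator("age")
    @classmethod
    def age_in_range(cls, v: int) -> int:
        if not 0 <= v <= 150:
            raise ValueError("age must be between 0 and 150")  # custom error message
        return v

User(name="Ada", age=36)   # OK
User(name="Ada", age=-1)   # raises ValidationError with the message above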

Key Features

  • Response Models: Define structured outputs using Pydantic models.
  • Retry Management: Configure retry attempts for requests seamlessly (sketched right after this list).
  • Validation: Ensure LLM responses meet your specifications.
  • Streaming Support: Handle lists and partial responses effortlessly.
  • Multiple Language Support: Available in Python, TypeScript, Ruby, Go, and Elixir.
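To give you a taste of retries and validation working together, here's a minimal sketch, assuming Instructor's max_retries keyword (the Rating model is my own toy example). As I understand it, when a response fails Pydantic validation, Instructor re-asks the LLM with the validation error attached, up to the configured number of attempts:

import instructor
from openai import OpenAI
from pydantic import BaseModel, field_validator

class Rating(BaseModel):
    score: int

    @field_validator("score")
    @classmethod
    def score_in_range(cls, v: int) -> int:
        if not 1 <= v <= 5:
            raise ValueError("score must be between 1 and 5")
        return v

client = instructor.from_openai(OpenAI())

rating = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=Rating,
    max_retries=3,  # on a validation failure, re-ask the model up to 3 times
    messages=[{"role": "user", "content": "Rate this review from 1 to 5: 'Decent, nothing special.'"}],
)
print(rating.score)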

Getting Started

Requires Python 3.9+

To install Instructor, simply run:

pip install -U instructor

I find the library extremely helpful, clear, and simple, as it lets you:

  1. Define a schema with class StructuredData(BaseModel):
  2. Define validators and methods for your schema.
  3. Encapsulate all your LLM logic within a single function using def extract(text: str) -> StructuredData:
  4. Run typed computations against your data with def compute(data: StructuredData): or call methods on your schema using data.compute()

A minimal sketch of this pattern follows below.
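Here it is in miniature. StructuredData, extract, and compute are just the placeholder names from the steps above, and the person-extraction task is purely illustrative:

import instructor
from openai import OpenAI
from pydantic import BaseModel

# 1. Define a schema
class StructuredData(BaseModel):
    name: str
    age: int

    # 2. Define methods (and validators) on your schema
    def greeting(self) -> str:
        return f"{self.name} is {self.age} years old"

client = instructor.from_openai(OpenAI())

# 3. Encapsulate all the LLM logic in one function
def extract(text: str) -> StructuredData:
    return client.chat.completions.create(
        model="gpt-3.5-turbo",
        response_model=StructuredData,
        messages=[{"role": "user", "content": text}],
    )

# 4. Run typed computations against the result
def compute(data: StructuredData) -> str:
    return data.greeting()

data = extract("Ada is 36 years old.")
print(compute(data))  # e.g. "Ada is 36 years old"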

Example Usage

Here’s a quick example illustrating how to extract structured data from LLM responses:

import instructor
from pydantic import BaseModel
from openai import OpenAI

# Define your desired output structure
class ProductReview(BaseModel):
    product_name: str
    rating: int
    comment: str

# Patch the OpenAI client
client = instructor.from_openai(OpenAI())

# Extract structured data from natural language
review_data = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=ProductReview,
    messages=[
        {"role": "user", "content": "I recently bought the Acme Coffee Maker. It's amazing, I'd give it a 5 out of 5! The coffee is always hot and delicious, but the design could be better."},
    ],
)

print(review_data.product_name)  # Output: Acme Coffee Maker
print(review_data.rating)        # Output: 5
print(review_data.comment)       # Output: "The coffee is always hot and delicious, but the design could be better."
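And remember the "Streaming Support" feature from the list above? Continuing with the client and ProductReview from the example, here's a hedged sketch: recent versions of Instructor expose a create_partial method that yields partially-filled model instances as tokens arrive (check the docs for your installed version):

partial_stream = client.chat.completions.create_partial(
    model="gpt-3.5-turbo",
    response_model=ProductReview,
    messages=[
        {"role": "user", "content": "The Acme Coffee Maker is great, 5/5, the coffee is always hot."},
    ],
)

for partial_review in partial_stream:
    print(partial_review)  # fields fill in incrementally, e.g. rating=None until it streams in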

Conclusion

I truly appreciate the existence of libraries like this; it's one of the most elegant I've encountered, streamlining the process of working with LLM outputs and enhancing both efficiency and accuracy in data extraction. I encourage you to give it a try!

Thank you for reading! Stay tuned for more insights from your knowledgeable library lord! 😉
