Image Captioning
Image captioning generates descriptive text for images, which is useful for accessibility and content indexing.
Hands-on Example: Generating Captions for Images
from transformers import pipeline
from PIL import Image
import requests
from io import BytesIO
import matplotlib.pyplot as plt
# Initialize the image-to-text pipeline
image_captioner = pipeline("image-to-text")
# Load images from URLs
image_urls = [
"https://upload.wikimedia.org/wikipedia/commons/thumb/f/f3/San_Francisco_skyline_at_night_from_Pier_7.jpg/800px-San_Francisco_skyline_at_night_from_Pier_7.jpg",
"https://upload.wikimedia.org/wikipedia/commons/thumb/d/d5/Giraffe_at_Kruger_National_Park%2C_South_Africa_%28square_crop%29.jpg/800px-Giraffe_at_Kruger_National_Park%2C_South_Africa_%28square_crop%29.jpg"
]
# Generate captions for each image
for i, url in enumerate(image_urls):
    # Download and open the image
    response = requests.get(url)
    image = Image.open(BytesIO(response.content))

    # Display the image
    plt.figure(figsize=(8, 8))
    plt.imshow(image)
    plt.axis('off')
    plt.title(f"Image {i+1}")
    plt.show()

    # Generate a caption
    captions = image_captioner(image)
    print(f"Generated captions for Image {i+1}:")
    for caption in captions:
        print(f"• {caption['generated_text']}")
    print("-" * 50)
The image-to-text pipeline demonstrates how vision and language models can be combined: a vision encoder turns the image into visual features, and a language decoder generates a descriptive sentence from those features.
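To make that combination concrete, the sketch below loads an encoder-decoder captioner directly rather than through the pipeline wrapper. It assumes the nlpconnect/vit-gpt2-image-captioning checkpoint mentioned in the suggestions below (a ViT encoder paired with a GPT-2 decoder), and photo.jpg is a placeholder path for any local image.
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer
from PIL import Image

model_name = "nlpconnect/vit-gpt2-image-captioning"
model = VisionEncoderDecoderModel.from_pretrained(model_name)   # ViT encoder + GPT-2 decoder
processor = ViTImageProcessor.from_pretrained(model_name)       # prepares pixel values for the encoder
tokenizer = AutoTokenizer.from_pretrained(model_name)           # decodes the generated token IDs

# "photo.jpg" is a placeholder; any local image works
image = Image.open("photo.jpg").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# The encoder embeds the image, the decoder generates the caption tokens
output_ids = model.generate(pixel_values, max_new_tokens=30)
caption = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(caption)
This is essentially what the pipeline does for you behind the scenes; using the model directly just makes the encoder and decoder roles visible.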
Try It Yourself:
- Generate captions for personal photos or artwork to see how the model interprets different visual styles.
- Try different models like nlpconnect/vit-gpt2-image-captioning for comparison, as in the sketch after this list.
- Test the captioning on abstract or ambiguous images to see how the model handles them.
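As a starting point for the model-comparison suggestion, here is a minimal sketch that captions one image with two checkpoints and prints the results side by side. Both model names are public Hugging Face checkpoints, but the choice is illustrative, and photo.jpg is again a placeholder for your own image.
from transformers import pipeline
from PIL import Image

# Two publicly available captioning checkpoints to compare;
# swap in any other image-to-text model you want to test
model_names = [
    "nlpconnect/vit-gpt2-image-captioning",
    "Salesforce/blip-image-captioning-base",
]

image = Image.open("photo.jpg")  # placeholder path for a local image

for name in model_names:
    captioner = pipeline("image-to-text", model=name)
    caption = captioner(image)[0]["generated_text"]
    print(f"{name}: {caption}")
Running the same image through different checkpoints is a quick way to see how training data and model size affect the style and detail of the captions.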