Claude Just Took the Wheel

Plus: A Tutorial on Activating Computer Use 🖱️

Huge developments from Anthropic spell out a crazy future for AI…

In this edition we’ll be covering…

  • Anthropic’s new Computer Use capability and Claude model refreshes

  • Ideogram’s new Canvas

  • An introduction to vector embeddings

  • And much more…

Let’s get into it!

Move Over, Humans — Claude Can Click That for You Now

Okay hold up, Claude can do WHAT?

Anthropic has introduced a new capability in public beta called Computer Use, allowing the Claude 3.5 Sonnet model to interact with computers like humans—by navigating screens, clicking buttons, and typing text.

For example, it can book flights and even order pizza.

While still experimental and prone to errors, this feature is now available via the API for developers to test and provide feedback, with improvements expected over time.
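
Curious what a raw request looks like? Here’s a minimal sketch in Python using Anthropic’s SDK, based on the beta documentation at launch. The model string, tool type, and beta flag below are the values Anthropic published for this release, but double-check them against the current docs:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# The "computer" tool definition tells the model the resolution of the
# (virtual) display it will be driving.
response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=[{
        "type": "computer_20241022",
        "name": "computer",
        "display_width_px": 1024,
        "display_height_px": 768,
    }],
    messages=[{"role": "user", "content": "Find me a flight from NYC to SF."}],
    betas=["computer-use-2024-10-22"],
)
print(response.content)  # Claude's proposed next action, e.g. "take a screenshot"

A single call like this only gets you Claude’s proposed next action. Actually executing the clicks and keystrokes requires an agent loop, which is exactly what the demo covered later in this issue packages up for you.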

Behind the scenes…

  • Anthropic’s previous research on tool use and multimodality laid the foundation for Claude’s new computer use skills, which require the ability to interpret images of a computer screen and reason about when to perform specific actions.

  • Every AI advancement brings new safety challenges. Since computer use applies existing skills rather than enhancing them, Anthropic’s focus is on present risks. Claude 3.5 Sonnet’s computer use skill remains at AI Safety Level 2, requiring no additional safety measures (for now).

And let’s not forget…

Although computer use stole the headlines this week, flying under the radar are new model refreshes from Anthropic. The upgraded Claude 3.5 Sonnet enhances coding and tool use, outperforming models like o1-preview on key benchmarks, while the new Claude 3.5 Haiku matches the capabilities of the previous generation’s larger models at improved speed and lower cost.

So What?

Computer use flips the script on AI development—rather than building custom tools for models, Claude has now proven it can adapt to everyday computer environments, using existing software just like a person would. This is a huge step in building autonomous AI systems, and right now is probably the worst it will be…

Innovation Showcase

Taking Computer Use on a Test Drive

It got scared when asked to verify that it’s a human 😂

You don’t think I’d highlight Computer Use without showing you how to use it yourself… right?

Anthropic has an entire demo dedicated to this. Their GitHub repository helps you get started with computer use on Claude, with reference implementations of:

  • Build files to create a Docker container with all necessary dependencies

  • A computer use agent loop using the Anthropic API, Bedrock, or Vertex to access the updated Claude 3.5 Sonnet model (sketched just after this list)

  • Anthropic-defined computer use tools

  • A Streamlit app for interacting with the agent loop
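
Before jumping into the setup, here’s a stripped-down sketch of what that agent loop actually does. This is illustrative rather than the repo’s real code, and execute_action is a hypothetical stub standing in for the actual tool implementations that take screenshots, click, and type:

import anthropic

client = anthropic.Anthropic()
tools = [{"type": "computer_20241022", "name": "computer",
          "display_width_px": 1024, "display_height_px": 768}]

def execute_action(action):
    # Hypothetical stub: the repo's real tools perform the screenshot,
    # click, or keystroke described by `action` and return the result.
    print("Claude wants to:", action)
    return "stub result"

messages = [{"role": "user", "content": "Open the settings app."}]
while True:
    response = client.beta.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        tools=tools,
        messages=messages,
        betas=["computer-use-2024-10-22"],
    )
    messages.append({"role": "assistant", "content": response.content})
    if response.stop_reason != "tool_use":
        break  # Claude finished the task; no further actions requested
    # Execute each requested action and feed the results back to the model.
    results = [{"type": "tool_result", "tool_use_id": block.id,
                "content": execute_action(block.input)}
               for block in response.content if block.type == "tool_use"]
    messages.append({"role": "user", "content": results})

The key design point: Claude never touches your machine directly. It only emits structured actions, and your code decides how (and whether) to carry them out.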

Here’s how you can take computer use for a test drive yourself locally with the API 👇️ 

  1. Make sure you have an Anthropic API key and Docker installed

  2. Open a terminal and set your key as an environment variable (the export syntax below is for macOS/Linux; on Windows, use set instead)

export ANTHROPIC_API_KEY=your_api_key

  3. Run the Docker container

docker run \
    -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \
    -v $HOME/.anthropic:/home/computeruse/.anthropic \
    -p 5900:5900 \
    -p 8501:8501 \
    -p 6080:6080 \
    -p 8080:8080 \
    -it ghcr.io/anthropics/anthropic-quickstarts:computer-use-demo-latest

  4. Once the container is running, navigate to http://localhost:8080 in your browser to access the combined interface that includes both the agent chat and virtual desktop view! (Ports 8501 and 6080 serve the Streamlit chat and the desktop view on their own, if you want them individually.)

Quick note: anyone using the computer use version of Claude during the public beta should take the relevant precautions to minimize risks, which is why I recommend running it inside the container’s virtual machine with minimal privileges.

Image Startup Ideogram Drops Canvas

AI startup Ideogram is shaking things up with the launch of Canvas, a new tool designed to manipulate and combine AI-generated images in ways that go beyond traditional editing.

This feature allows users to interact with multiple images on a shared canvas, making it easier to blend elements, experiment with new ideas, and generate creative works. Some of the new features include:

  • Magic Fill: an inpainting tool that lets you edit specific regions of your images to replace objects, add text, fix imperfections, change backgrounds, and more.

  • Extend: an outpainting tool that expands images beyond their original borders while maintaining a consistent style.

The design world is already familiar with AI tools, but Ideogram’s latest release hits different—offering exactly the kind of intuitive tool that has the potential to reshape creative workflows.

Check it out below 👇️ 

AI Visualized

Introduction to Vector Embeddings

Example from a Hugging Face application - You see here that similar colors are placed close to each other in space as they have similar embedding representations!

Vector Embeddings are a foundational concept in AI, and I want to explain what they are and why they're so important.

This video from Colin Talks Tech is a great introduction to the concept of vector embeddings.

Okay so what the heck are embeddings?

Vector embeddings are a way to represent objects, such as text and images, as a list of numbers. These numbers can be thought of as coordinates in a high-dimensional space. The closer two objects are in this space, the more similar they are.

How are they created?

There are a few different ways to create vector embeddings. One way is to use a machine learning model, such as a neural network. Another way is to use a technique called feature engineering.

Vector embeddings are typically stored in a vector database. A vector database is a specialized database that is designed to store and search for vector embeddings.
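
To make that concrete, here’s a minimal sketch in Python using the popular sentence-transformers library. The model name is just a common small default, and the brute-force search at the end is exactly the lookup a vector database performs for you at scale:

from sentence_transformers import SentenceTransformer
import numpy as np

# Any off-the-shelf embedding model works; this one is a small, popular default.
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = ["I love pizza",
             "Pizza is my favorite food",
             "The stock market fell today"]
embeddings = model.encode(sentences)  # one 384-dimensional vector per sentence

# Embed a query and find the most similar stored sentence via cosine similarity.
query = model.encode(["What food do I enjoy?"])[0]
scores = embeddings @ query / (np.linalg.norm(embeddings, axis=1) * np.linalg.norm(query))
print(sentences[int(np.argmax(scores))])  # one of the pizza sentences, not the stocks one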

How are they used?

You can see in the illustration above that similar colors are placed close to each other in space.

When objects are positioned based on their key properties, measuring semantic similarity becomes as simple as calculating distances between points. This is far more effective than traditional methods like matching color names.
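
To ground the color example, here’s a tiny self-contained sketch where RGB values stand in as three-dimensional “embeddings.” Real embeddings have hundreds of dimensions, but the distance math is identical:

import math

# RGB values as toy 3-D embeddings; real embeddings just have more dimensions.
colors = {
    "crimson":   (220, 20, 60),
    "firebrick": (178, 34, 34),
    "navy":      (0, 0, 128),
}

def distance(a, b):
    # Euclidean distance between two vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Crimson sits far closer to firebrick than to navy, even though the names
# look nothing alike -- exactly the similarity that name matching misses.
print(distance(colors["crimson"], colors["firebrick"]))  # ~51
print(distance(colors["crimson"], colors["navy"]))       # ~231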

Why should I care?

The most foundational AI applications depend on some sort of embeddings framework under the hood. Some common use cases for vector embeddings include:

  • Recommendation Systems

  • Search

  • Chatbots

  • Retrieval Augmented Generation

Quick Bites

Stay updated with our favorite highlights, and dive in for the full flavor of the coverage!

Microsoft announced that Agents are coming to Copilot Studio in public preview next month.

In August, Elon Musk’s xAI promised to make Grok, the company’s flagship generative AI model powering a number of features on X, available via an API. Now, that API has arrived.

Marc Andreessen and Ben Horowitz explore the fascinating intersection of AI and crypto, highlighting the rise of Truth Terminal (@truth_terminal), an autonomous chatbot that made $1M.

Cohere releases their new state-of-the-art multimodal AI search model, Embed 3.

The Neural Network

Accurate…

Keeping GPT-4o honest with meme analysis

Also we’re curious…

Do you trust fully autonomous AI systems like Claude's Computer Use?

Until We Type Again…

How did we do?

This helps us create better newsletters!

If you have any suggestions or specific feedback, simply reply to this email or fill out this form. Additionally, if you found this insightful, don't hesitate to engage with us on our socials and forward this over to your friends!

You can find our past newsletter editions here!

This newsletter was brought to you by Digestible AI. Was this forwarded to you? Subscribe here to get your AI fix delivered right to your inbox. 👋