Zuck's INSANE Vision for AI
How to use ChatGPT's New Advanced Voice Mode 🗣️

Meta gave us an event to remember yesterday, and there are MAJOR developments out of OpenAI…
In this edition we'll be covering…
Meta Connect 2024
ChatGPT's Advanced Voice Mode
Google's new Gemini models
And much more…
Letās get into it!
Meta Goes Bonkers at its Flagship Event, Connect 2024
Meta Connect 2024 did not disappoint, and it left us AI nerds SALIVATING at what's to come next from Meta. In the race to actually democratize AI, I'd say Meta jumped a few steps ahead of the competition yesterday, and specifically I want to highlight the new set of models, Llama 3.2.
Meta's latest flagship family of LLMs comes in two sizes: large (11B and 90B parameters) and small (1B and 3B parameters).
The two largest models of the Llama 3.2 collection support image reasoning use cases, such as document-level understanding (including charts and graphs), image captioning, and visual grounding tasks such as directionally pinpointing objects in images based on natural language descriptions. The Llama 3.2 vision models are competitive with top models like Claude 3 Haiku and GPT-4o mini in image recognition and visual tasks.
The 3B version surpasses Gemma 2 2.6B and Phi 3.5-mini in instruction-following and summarization, while the 1B model performs comparably with Gemma. The smaller models are optimized for mobile and edge computing.
More releases include:
The new Orion AR Glasses (powered by Llama 3.2), voice chat with Meta AI (with celebrities), and having your own AI Avatar that answers an audience in real time!
Get Your Hands Dirty!
How to Use OpenAI's New Advanced Voice Mode
OpenAI's highly anticipated "Advanced Voice Mode" has finally rolled out, per Sam Altman. It was demoed back in the spring, and it was built to make interactions with ChatGPT feel as seamless as speaking to a real person.
It's currently only accessible via mobile, but here's how you can use it starting now!
Log into the ChatGPT app on your mobile device (premium or team subscription required).
Open up a new chat and make sure you are toggled to either GPT-4o or GPT-4o Mini.
Tap the "wave" icon on the bottom right next to the microphone.
Choose a voice persona to chat with, then ask away!
🔥 Pro tip: Some users may not see this right away in the mobile interface. Try deleting and reinstalling the app; that did the trick for me!
Google Gemini is the Gift that Keeps on Giving for Developers
This week, Google announced two new "production-ready" models, Gemini-1.5-Pro-002 and Gemini-1.5-Flash-002.
The new Gemini 1.5 models build on earlier releases with improvements in math, long context, and vision tasks.
The new models are available for free via Google AI Studio and the Gemini API, and also through Vertex AI for larger organizations and Google Cloud users.
Significant gains include ~7% improvement in MMLU-Pro and ~20% in math benchmarks, along with enhanced performance in visual understanding and Python code generation.
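If you want to try the new models from code, here's a minimal sketch using the `google-generativeai` Python package. The `pick_model` helper and the environment-variable guard are my own illustrative additions, and the model ids are assumptions based on the names in the announcement:

```python
import os

# Assumed model ids, matching the announced names.
MODELS = {"quality": "gemini-1.5-pro-002", "speed": "gemini-1.5-flash-002"}

def pick_model(prefer_speed: bool) -> str:
    """Flash for latency/cost-sensitive calls, Pro for harder reasoning."""
    return MODELS["speed" if prefer_speed else "quality"]

if os.environ.get("GOOGLE_API_KEY"):
    # Only runs when a key is set; requires `pip install google-generativeai`.
    import google.generativeai as genai

    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel(pick_model(prefer_speed=True))
    print(model.generate_content("Summarize Gemini 1.5 in one line.").text)
```

Grab a free API key from Google AI Studio, export it as `GOOGLE_API_KEY`, and the script will make a real call.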
So What?
When we talk about developing models, it's often tough to find the line between capable and cost-effective LLMs. Google is looking to solve that problem by addressing both at the same time.
Starting October 1st, 2024, Google is reducing prices for the existing Gemini 1.5 Pro models by up to 64% for input and cached tokens, and 52% for output tokens on prompts under 128K tokens.
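To make the math concrete, here's a tiny sketch of what those cuts mean for a bill. The base prices below are purely hypothetical placeholders; only the 64% and 52% figures come from the announcement:

```python
def price_after_cut(price_per_mtok: float, cut: float) -> float:
    """New per-million-token price after a fractional price cut."""
    return price_per_mtok * (1.0 - cut)

# Hypothetical base prices in USD per 1M tokens, purely for illustration:
base_input, base_output = 3.50, 10.50
new_input = price_after_cut(base_input, 0.64)    # up to 64% off input/cached tokens
new_output = price_after_cut(base_output, 0.52)  # up to 52% off output tokens
print(f"input: ${new_input:.2f}/1M, output: ${new_output:.2f}/1M")
```

In other words, a prompt-heavy workload under the 128K threshold could see its input bill drop to roughly a third of what it was.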
Tool Spotlight
How to Actually Start Chatting with Open Source Models
Open source LLMs continue to gain popularity, and Ollama is a great place to get started with them. The platform offers the latest open source LLMs (including Llama 3.2) out of the box to chat with. To get started with Ollama and Google's Gemma 2…
Download Ollama, open a Terminal (or equivalent) on your machine, and type in the following commands:
Grab the Gemma 2 model from the Ollama registry:
ollama pull gemma2:2b
List your downloaded language models:
ollama list
Run the model of your choice (beware of your compute limitations, I am running gemma2:2b on a 16GB MacBook Pro):
ollama run gemma2:2b
Ask it to make you a carrot cake 🥕:
>>> Give me a recipe for a carrot cake

Getting my next great dessert recipe…
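Beyond the CLI, Ollama also serves a local REST API (on port 11434 by default), so you can call your downloaded models from code. Here's a minimal sketch using only the Python standard library; the `build_generate_request` and `ask` helper names are my own, and the `OLLAMA_READY` guard is just there so the script does nothing unless you opt in:

```python
import json
import os
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_generate_request(model: str, prompt: str) -> dict:
    """Payload for Ollama's /api/generate; stream=False returns one JSON reply."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model: str, prompt: str) -> str:
    """Send a prompt to the local Ollama server and return the model's reply."""
    payload = json.dumps(build_generate_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if os.environ.get("OLLAMA_READY"):
    # Set OLLAMA_READY=1 once the Ollama app is running and gemma2:2b is pulled.
    print(ask("gemma2:2b", "Give me a recipe for a carrot cake"))
```

This is handy once you outgrow the interactive `ollama run` prompt and want your local model wired into a script or app.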
Quick Bites
Stay updated with our favorite highlights, dive in for a full flavor of the coverage!
I shared the following note with the OpenAI team today.
– Mira Murati (@miramurati)
7:34 PM • Sep 25, 2024
OpenAI CTO Mira Murati announced her resignation from the prolific startup yesterday.
Middle Eastern funds are plowing billions of dollars into the hottest AI start-ups
Jony Ive has confirmed that heās working with OpenAI CEO Sam Altman on an AI hardware project.
The new Notion AI lets you search, generate, analyze, and chat all in one tool.
Sundar Pichai recently shared his thoughts on why AI is a fundamental platform shift and how itās accelerating scientific discoveries to help people in remarkable ways.
The Neural Network
Now that we can have live conversations with ChatGPT…
What will you use Advanced Voice Mode for?
Until We Type Again…
How did we do? This helps us create better newsletters!
If you have any suggestions or specific feedback, simply reply to this email or fill out this form. Additionally, if you found this insightful, don't hesitate to engage with us on our socials and forward this over to your friends!
You can find our past newsletter editions here!
This newsletter was brought to you by Digestible AI. Was this forwarded to you? Subscribe here to get your AI fix delivered right to your inbox.