Brainstorm AI 2024: OpenAI Demo
Fortune
12/10/2024
Presenter: Romain Huet, Head, Developer Experience, OpenAI
Category
🤖
Tech
Transcript
00:00
Thank you. Good afternoon, everyone. Super thrilled to be here today.
00:04
It's an interesting time in AI. The pace of change is really accelerating,
00:09
and there are more AI applications going from prototype to production every single day.
00:14
I'm Romain, and I lead developer experience at OpenAI.
00:17
And you know, a lot of people know us for ChatGPT.
00:19
But in fact, our first product was the OpenAI API,
00:22
the ability for developers and for companies to take our AI models
00:27
and bring them into their own products and services.
00:30
And today, we're very fortunate to work with more than 3 million developers on the platform,
00:34
and 92% of the Fortune 500 actually use our products.
00:37
So it's an amazing time for all of us to be here together, building in AI.
00:42
You know, one of my favorite things in my role day to day
00:46
is to work with amazing founders and companies who are pushing the envelope of these models every day
00:51
and also showing them the art of the possible with what we put out there in the world.
00:56
So in that spirit, instead of any slide presentation today,
01:00
let's jump to my laptop and take a look at some live demos.
01:05
So first off, I'd be remiss not to mention Sora.
01:08
We just launched Sora this morning, our video generation model,
01:13
and it's now available on Sora.com.
01:15
So what you're looking at here is the homepage with featured videos
01:19
already created in the last few hours by the community.
01:22
It's pretty amazing.
01:23
In fact, we can fire off maybe a first video generation here.
01:27
I'll say a golden retriever dressed as Santa on Christmas Day.
01:33
You know, sounds pretty cool.
01:36
There is an enormous amount of demand at the moment for Sora.
01:39
So it sounds like, you know, the service is already at capacity.
01:42
That's okay.
01:43
You can check it out later, you know.
01:46
And these are some of the beautiful videos that people are already creating.
01:50
So now, in fact, to show you something more concrete but with a wink to
01:54
Sora, I'm going to bring up something about multimodality.
01:59
Multimodality is what we refer to as the ability for AI models to not
02:04
just work with text but also work with speech, with audio, with vision,
02:10
and also with video, as you've seen with Sora.
02:13
So I wanted to bring you a demo here of how we can combine all of
02:17
these modalities together to create something truly unique.
02:21
So this is the only part of the demo that is not live.
02:24
This is describing a prompt, as you can read on the screen in great
02:27
detail, about kind of the history of the German automotive industry.
02:33
And so if I click here, this is the video that came out of Sora.
02:45
So what's pretty amazing here is that it's only about us stitching
02:48
together some scenes out of Sora and adding a little bit of a
02:51
soundtrack on top, but all of the images are untouched.
02:55
This is straight out of the model.
02:57
It's pretty magical.
02:59
So now let's say we want to use such a video as a basis for a
03:03
presentation, and what if we want to tell a story that these images
03:07
and this video can tell us?
03:09
So what I'll do next is I'll click this button, extract frames, and
03:13
I'll slice a few frames out of this video that Sora generated.
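[Editor's note: the frame-slicing step is ordinary video tooling rather than an OpenAI API. A minimal sketch with ffmpeg, assuming a hypothetical local file `sora_clip.mp4` and an `ffmpeg` binary on the PATH:]

```python
def build_frame_extraction_cmd(video_path, fps, out_pattern):
    """Build an ffmpeg command that samples `fps` frames per second
    from `video_path` into numbered image files matching `out_pattern`."""
    return [
        "ffmpeg",
        "-i", video_path,      # input video
        "-vf", f"fps={fps}",   # video filter: sample at this rate
        out_pattern,           # e.g. "frame_%02d.png"
    ]

# One frame every two seconds of a (hypothetical) downloaded Sora clip:
cmd = build_frame_extraction_cmd("sora_clip.mp4", 0.5, "frame_%02d.png")
# To actually run it: subprocess.run(cmd, check=True)
```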
03:18
And now what I'll do next, I'll bring another modality of GPT-4,
03:22
the ability to understand and extract insights from images or
03:26
pictures, and that's what we call vision.
03:28
And so what I'll do next here, I'll click this button, and I'll
03:31
take all of these six frames I sliced out of the Sora video, and
03:35
I'll send them to GPT-4 with vision to come up with a story.
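[Editor's note: this step maps onto the Chat Completions API, which accepts images alongside text as base64 data URIs. A minimal sketch of assembling that request payload, assuming the sliced frames are already in memory as PNG bytes; the model name in the comment is illustrative:]

```python
import base64

def build_vision_messages(frames, prompt):
    """Assemble a Chat Completions `messages` payload pairing one text
    prompt with several frames (raw PNG bytes, encoded as data URIs)."""
    content = [{"type": "text", "text": prompt}]
    for frame in frames:
        b64 = base64.b64encode(frame).decode("ascii")
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{b64}"},
        })
    return [{"role": "user", "content": content}]

messages = build_vision_messages(
    [b"<png bytes>"] * 6,  # the six frames sliced out of the Sora video
    "Write a short documentary narration that tells the story of these frames.",
)
# The real call would be roughly:
# client.chat.completions.create(model="gpt-4o", messages=messages)
```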
03:39
So don't blink.
03:40
It goes fast.
03:41
There we go.
03:42
Every single time this is unique.
03:43
Again, this is all happening in real time as we speak.
03:46
So now out of these few frames, we were able to come up with that
03:50
story.
03:51
So this could be, for instance, the narration for a documentary,
03:55
or this could be something that you can adapt to your own use
03:57
cases and industries.
04:00
But next, what if we were to bring this to life?
04:02
What if we were to be able to bring a voice onto that narration?
04:06
Now I'm sure some of you are already thinking ahead here.
04:09
So what I'll do next is I'll click here to create a voiceover,
04:12
and we now have amazing voices in our API, like preset voices,
04:17
and they are much more expressive.
04:19
We'll see that in a minute with the next demo.
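[Editor's note: the preset voices mentioned here are exposed through OpenAI's text-to-speech endpoint. A minimal sketch of the request parameters, assuming the `tts-1` model and the preset voice `alloy`:]

```python
def build_voiceover_request(script, voice="alloy"):
    """Parameters for OpenAI's text-to-speech endpoint; `tts-1` and
    `alloy` are real options, and any preset voice can be swapped in."""
    return {
        "model": "tts-1",
        "voice": voice,
        "input": script,
        "response_format": "mp3",
    }

params = build_voiceover_request("In the heart of the German automotive industry...")
# With the openai SDK the call would be roughly:
# audio = client.audio.speech.create(**params)
```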
04:21
But here, I also wanted to show you a preview of our voice
04:25
engine model.
04:26
The voice engine model is the ability to take a short sample
04:29
of a voice and turn it into a voice you can use in your apps.
04:33
So this is only available to trusted partners for now, and
04:37
we're actively testing this with them.
04:39
We obviously take safety very seriously.
04:41
We wanted to show you a quick preview of what that could look
04:44
like.
04:45
So I'll go ahead, and here I'm going to record a few seconds of
04:48
my voice.
04:49
I'm extremely excited to be on stage at the Fortune Brainstorm
04:53
AI Conference today, and all I need is a few more seconds,
04:56
and I think we should be good to go for me to actually narrate
04:59
this beautiful video that Sora created.
05:02
All right, so now that this is...
05:04
So I'll go ahead, record it right there.
05:06
I can just send the script that we created before.
05:09
Now send this little sample to our voice engine, and we'll see
05:12
how all of these can be combined together to create something
05:16
a little unique here.
05:18
So in a second, we should see that exact same video.
05:21
In the heart of the German automotive industry...
05:24
But now, narrate it.
05:25
...Workers meticulously assemble the machines that...
05:27
So one more modality that you can play with, right?
05:30
...Every bolt secures a legacy of engineering precision and
05:34
innovation.
05:35
Next, I could also do translation.
05:37
So as you can hear from my French accent, I can maybe also
05:41
narrate it in French.
05:42
So I'll go ahead and click French here.
05:44
...Parallel commitment to quality and efficiency,
05:46
setting a couture pour rejoindre le monde.
05:48
And now this is me narrating this video in French.
05:51
...Témoigne d'un engagement sans pareil envers la qualité...
05:54
And last but not least, another modality that you can now tap
05:57
into from our multi-modal AI model is the ability to bring
06:01
the speech to text as well.
06:03
So here, for instance, most of you don't speak French, but I
06:06
can actually turn on the captions, and all of a sudden,
06:09
we have this other modality of speech to text available.
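[Editor's note: timed captions like these map onto the transcription endpoint, which can return SRT directly. A minimal sketch of the parameters, assuming the `whisper-1` model; for English captions of non-English speech, the companion translations endpoint takes a similar shape:]

```python
def build_caption_request(language=None):
    """Parameters for a Whisper transcription call that returns SRT,
    a timed-caption format, so the text lines up with the video."""
    params = {"model": "whisper-1", "response_format": "srt"}
    if language:
        params["language"] = language  # ISO 639-1 hint, e.g. "fr"
    return params

params = build_caption_request(language="fr")
# The real call attaches the audio file, roughly:
# client.audio.transcriptions.create(file=open("narration_fr.mp3", "rb"), **params)
```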
06:13
So this is just one example of how you can combine now these
06:17
multi-modal models like GPT-4 to create something very unique
06:21
that you can adapt to your own businesses and products and
06:25
services.
06:26
But next, I want to show you something else about real-time
06:30
and our real-time API.
06:32
First of all here, how many of you in the room have tried
06:34
ChatGPT with the advanced voice mode before?
06:37
Wow, quite a few of you.
06:38
That's awesome.
06:39
Well, you know, one question that we've heard so many times
06:42
over the past few months has been, how can I bring these
06:44
exact same experiences into my own products, into my own
06:48
services, whether it's to reinvent customer service at
06:51
scale or if it's about, like, a new way to interface with
06:55
products?
06:56
Well, here, what I'm going to show you is a little sneak peek
06:59
behind the scenes.
07:00
So this is the OpenAI playground.
07:02
So this is where developers and builders start their journey
07:06
with our models.
07:07
And so here, I can start a session at any time and actually
07:12
engage to start to have a feel for how the real-time API and
07:16
the speech-to-speech capabilities of GPT-4 come to life.
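[Editor's note: under the hood, a Realtime API session is a websocket exchanging JSON events. A minimal sketch of the first configuration event a client sends after connecting, assuming the documented `session.update` shape; the model name in the URL is illustrative and may differ from the current snapshot:]

```python
import json

# Illustrative endpoint; check current docs for the exact model snapshot.
REALTIME_URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"

def build_session_update(instructions, voice="alloy"):
    """A `session.update` event, sent as the first message over the
    websocket, setting the assistant's voice and behavior."""
    return json.dumps({
        "type": "session.update",
        "session": {
            "modalities": ["audio", "text"],
            "instructions": instructions,
            "voice": voice,
        },
    })

event = build_session_update("Greet conference attendees with energy.")
# A websocket client would then do: ws.send(event)
```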
07:19
So let's try that.
07:22
Hi, there.
07:24
Hello, there.
07:25
How can I assist you today?
07:26
Yeah, I'm on stage at the Fortune Brainstorm AI conference.
07:29
Could you welcome the attendees for me?
07:32
Welcome, everyone, to the Fortune Brainstorm AI conference.
07:35
Okay.
07:36
So I like how you started, but I think I want you to bring more
07:38
energy, more enthusiasm, and don't forget people watching on
07:41
the live stream.
07:43
Welcome, everyone, to the Fortune Brainstorm AI conference
07:47
at the fabulous St. Regis Hotel in San Francisco.
07:51
To our incredible attendees and viewers online, prepare for an
07:56
inspiring and groundbreaking event as we dive into the future
08:00
of AI.
08:01
Cool.
08:02
Now, could you whisper with a very low voice a secret about San
08:05
Francisco that maybe only the true locals could know, like
08:08
very, very low, like...
08:11
Did you know that underneath San Francisco's bustling streets,
08:16
there's a hidden network of tunnels?
08:19
I did not know that.
08:20
These tunnels were used during Prohibition to transport liquor
08:25
discreetly.
08:26
It's a little-known secret.
08:28
So as you can see, the ability to interrupt the model at any
08:31
time, the ability to jump back and, like, generate intonations,
08:35
all of that is at your disposal to create amazing ways to
08:38
interface between your brand, your product, and your own
08:41
customers.
08:42
Maybe one last example here.
08:44
I have a test for you here.
08:45
So here is the deal.
08:47
I'm going to speak to you in three different languages, and
08:50
I'd like you to speak back to me with those same three languages
08:53
mixed up.
08:54
Does that sound good?
08:55
Absolutely.
08:56
I'm ready for the challenge.
08:58
Let's hear it.
08:59
All right.
09:00
Ciao.
09:01
Parlo italiano.
09:02
Je suis français.
09:03
Très heureux d'être ici on stage, and I can switch back to
09:05
English at any time.
09:06
Your turn.
09:07
Ciao.
09:08
Je parle français aussi.
09:10
And I'm switching to English now.
09:12
It's a pleasure to mix these languages on stage with you.
09:16
So this is the kind of power now you have with these
09:19
multimodal abilities for speech-to-speech.
09:22
Here, one example, for instance, is that if you want to
09:24
imagine what that could look like briefly, well, this is what
09:27
you can do.
09:29
You can now have voice as a modality to interface with no
09:32
keyboard.
09:33
What if I were to say, hey, can you show me on the map where
09:36
we are?
09:38
Sure, Romain.
09:40
Showing your current location in San Francisco now.
09:42
Awesome.
09:43
I'm planning a trip to Paris.
09:44
Can you show that to me?
09:47
Here's Paris, a beautiful city with rich history.
09:50
Amazing.
09:51
I like to see amazing Christmas decorations when I'm there.
09:53
Can you show a few examples to me?
09:58
Here are some spots in Paris with amazing Christmas
10:00
decorations.
10:02
Galeries Lafayette, the Champs-Élysées.
10:04
So this is one way you can imagine how we'll be able to
10:07
reinvent computer-to-human interfaces in the future with
10:11
multimodality.
10:12
And, of course, this is not the only paradigm shift that we're
10:15
excited about because we've also launched the o1 series of
10:18
models that can now tackle very complex tasks with reasoning.
10:22
And we strongly believe that reasoning plus multimodality
10:25
combined will unleash AI agents next year.
10:29
And so we're extremely excited about these two paradigm shifts.
10:32
Thank you so much for your time.
10:34
And I can't wait to see what you build.