Skip to player
Skip to main content
Skip to footer
Search
Connect
Watch fullscreen
Like
Comments
Bookmark
Share
Add to Playlist
Report
Open AI Just Shocked the world "GPT-o1"The Most Intelligent AI Ever !
High tech & Ai world
Follow
9/15/2024
Open AI has unveilde it's latest AI model,o1 preview designed to excel in complex reasoning tasks such as science,coding mathematics.
Category
๐ค
Tech
Transcript
Display full video transcript
00:00
So, in our last video, we discussed OpenAI's upcoming model, which we referred to by its
00:06
internal codename, Strawberry.
00:08
The anticipation has been building, and now the wait is over.
00:12
OpenAI has officially unveiled their latest AI model, now known as OpenAI-01 Preview.
00:17
There's actually a lot to cover, so let's get into it.
00:20
Alright, so OpenAI-01 Preview is part of a new series of reasoning models designed to
00:25
tackle complex problems by spending more time thinking before responding.
00:29
Unlike previous models like GPT-4 and GPT-4.0, which focused on rapid responses, 01 Preview
00:35
emphasizes in-depth reasoning and problem solving.
00:38
This approach allows the model to reason through intricate tasks and solve more challenging
00:43
problems in fields such as science, coding, and mathematics.
00:47
Starting from September 12th, OpenAI released the first iteration of this series in ChatGPT
00:53
and their API.
00:54
This releases a preview version with regular updates and improvements expected.
00:58
Alongside this, they've included evaluations for the next update that's currently in development.
01:03
This means we're witnessing the beginning of a significant evolution in AI capabilities.
01:08
So how does this new model work?
01:10
OpenAI-trained 01 Preview to spend more time deliberating on problems before providing
01:15
an answer, much like a person tackling a difficult question.
01:19
Through this training, the model learns to refine its thought process, experiment with
01:23
different strategies, and recognize its mistakes.
01:26
This method is known as chain-of-thought reasoning.
01:29
In terms of performance, 01 Preview shows substantial improvements over its predecessors.
01:34
In internal tests, the next model update performs similarly to PhD students on challenging benchmark
01:40
tasks in physics, chemistry, and biology.
01:43
For instance, in a qualifying exam for the International Mathematics Olympiad, IMO, GPT-4.0
01:51
correctly solved only 13% of the problems.
01:54
In contrast, the new reasoning model achieved an impressive 83% success rate.
01:59
This represents a significant leap in problem-solving capabilities.
02:04
When it comes to coding abilities, the model has been evaluated in Codeforces competitions
02:08
reaching the 89th percentile.
02:11
For context, Codeforces is a platform for competitive programming contests, and ranking
02:15
in the 89th percentile indicates a high level of proficiency.
02:19
These results suggest that 01 Preview is not just better at reasoning, but also excels
02:25
in practical applications like coding.
02:27
As an early model, 01 Preview doesn't yet have some of the features that make ChatGPT
02:32
particularly versatile, such as browsing the web for information or uploading files and
02:37
images.
02:38
For many common use cases, GPT-4.0 remains more capable in the near term.
02:43
However, for complex reasoning tasks, 01 Preview represents a significant advancement
02:48
and a new level of AI capability.
02:51
Recognizing this leap, OpenAI has reset the model numbering back to 1, hence the name
02:56
01.
02:58
Safety is a critical aspect of any AI deployment, and OpenAI has taken substantial steps to
03:03
ensure that 01 Preview is both powerful and safe to use.
03:07
They've developed a new safety training approach that leverages the model's reasoning capabilities
03:12
to make it adhere to safety and alignment guidelines.
03:15
By being able to reason about safety rules in context, the model can apply them more
03:19
effectively.
03:20
One method they use to measure safety is by testing how well the model continues to follow
03:24
its safety rules if a user tries to bypass them, a practice known as jailbreaking.
03:29
On one of their most challenging jailbreaking tests, GPT-4.0 scored 22 out of 100.
03:36
In contrast, the 01 Preview model scored 84 out of 100, indicating a substantial improvement
03:42
in resisting attempts to generate disallowed content.
03:45
To align with the new capabilities of these models, OpenAI has bolstered their safety
03:50
work, internal governance, and collaboration with federal governments.
03:54
This includes rigorous testing and evaluations using their preparedness framework, top-tier
03:59
red teaming, which involves ethical hacking to identify vulnerabilities, and board-level
04:04
review processes overseen by their Safety and Security Committee.
04:09
They've also formalized agreements with the U.S. and U.K. AI safety institutes.
04:14
OpenAI has begun operationalizing these agreements, granting the institutes early access to a
04:19
research version of the model.
04:21
This partnership helps establish a process for research, evaluation, and testing of future
04:25
models before and after their public release.
04:29
The 01 Preview model is particularly beneficial for those tackling complex problems in science,
04:34
coding, math, and related fields.
04:36
Healthcare researchers can use it to annotate cell sequencing data.
04:40
Physicists can generate complex mathematical formulas needed for quantum optics.
04:45
Developers across various disciplines can build and execute multi-step workflows.
04:50
The enhanced reasoning capabilities open up new possibilities for solving challenging
04:54
tasks.
04:55
Delving deeper into the technical aspects, the 01 model series is trained using large-scale
05:00
reinforcement learning to reason using a chain of thought.
05:04
This means the model generates a sequence of intermediate reasoning steps before arriving
05:08
at a final answer.
05:10
These advanced reasoning capabilities provide new avenues for improving the safety and robustness
05:15
of AI models.
05:17
By reasoning about safety policies in context, the models achieve state-of-the-art performance
05:22
on benchmarks for risks such as generating illicit advice, selecting stereotyped responses,
05:27
and succumbing to known jailbreaks.
05:29
For example, on the Strong Reject benchmark, a test designed to evaluate a model's resistance
05:34
to jailbreaks, 01 Preview achieved a goodness score of 84, significantly outperforming GPT-40.
05:41
OpenAI conducted thorough safety evaluations, including both internal assessments and external
05:47
red teaming.
05:48
They used a range of public and internal evaluations to measure 01 Preview on tasks such as propensity
05:54
to generate disallowed content, performance on tasks relevant to demographic fairness,
05:59
tendency to hallucinate, and presence of dangerous capabilities.
06:03
In disallowed content evaluations, 01 Preview either matches or outperforms GPT-40.
06:08
On their challenging refusal evaluation, 01 Preview achieved a not-unsafe score of 93.4%,
06:15
compared to GPT-40's 71.3%.
06:19
This indicates that the model is better at refusing to produce disallowed content while
06:23
also avoiding over-refusal on benign prompts.
06:27
Regarding hallucinations, instances where the model generates incorrect or nonsensical
06:31
information, 01 Preview shows improvement over GPT-40.
06:36
In the SimpleQA dataset, 01 Preview's hallucination rate was 44%, compared to GPT-40's 61%.
06:44
However, anecdotal feedback suggests that 01 Preview can be more convincing when it
06:49
does hallucinate, potentially increasing the risk of users trusting incorrect information.
06:54
Bias evaluations were also conducted.
06:57
On the BBQ evaluation, which tests for stereotyped responses, 01 Preview selects the correct
07:02
answer 94% of the time on unambiguous questions, whereas GPT-40 does so 72% of the time.
07:08
This suggests that 01 Preview is less prone to selecting stereotyped options and demonstrates
07:14
improved fairness.
07:15
An intriguing aspect of the 01 models is the chain-of-thought safety.
07:20
The models' chain-of-thought reasoning allows for the potential of monitoring their latent
07:24
thinking processes.
07:26
OpenAI explored methods for chain-of-thought monitoring to detect instances where the model
07:30
may engage in deceptive behavior or generate disallowed content.
07:34
In their analysis of 100,000 synthetic prompts, only 0.8% of 01 Preview's responses were
07:41
flagged as being deceptive.
07:44
Now, external red teaming played a significant role in their safety assessments.
07:48
OpenAI collaborated with multiple organizations and individuals to assess key risks associated
07:54
with the 01 model series' improved reasoning capabilities.
07:58
This included testing the models' resistance to jailbreaks and their ability to handle
08:03
real-world attack planning prompts.
08:05
In terms of their preparedness framework evaluations, OpenAI assessed the models in categories such
08:11
as cybersecurity, biological threat creation, persuasion, and model autonomy.
08:17
Both 01 Preview and 01 Mini were rated as medium risk overall.
08:21
Specifically, they were rated as medium risk in persuasion and CBRN, chemical, biological,
08:27
radiological, nuclear, and low risk in cybersecurity and model autonomy.
08:31
For cybersecurity, they evaluated the models using Capture the Flag, CTF challenges, which
08:37
are competitive hacking tasks.
08:38
The models were able to solve 26.7% of high school-level challenges but struggled with
08:43
more advanced tasks, achieving 0% success in collegiate level and 2.5% in professional-level
08:49
challenges.
08:50
This indicates that while the models have some capability in cybersecurity tasks, they
08:54
do not significantly advance real-world vulnerability exploitation capabilities.
09:00
In biological threat creation evaluations, the models can assist experts with operational
09:05
planning for reproducing known biological threats, which meets the medium risk threshold.
09:10
However, they do not enable non-experts to create biological threats, as this requires
09:15
hands-on laboratory skills that the models cannot replace.
09:18
In persuasion evaluations, 01 Preview demonstrates human-level persuasion capabilities.
09:24
In the Change My View evaluation, which measures the ability to produce persuasive arguments,
09:29
01 Preview achieved a human persuasiveness percentile of 81.8%.
09:34
This means the models' responses are considered more persuasive than approximately 82% of
09:39
human responses.
09:41
Regarding model autonomy, the models do not advance self-exfiltration, self-improvement,
09:46
or resource acquisition capabilities sufficiently to indicate medium risk.
09:50
They performed well on self-contained coding and multiple-choice questions, but struggled
09:55
with complex agentic tasks that require long-term planning and execution.
10:00
OpenAI has also made efforts to ensure that the models' training data is appropriately
10:05
filtered and refined.
10:07
Their data processing pipeline includes rigorous filtering to maintain data quality and mitigate
10:12
potential risks.
10:14
They use advanced data filtering processes to reduce personal information from training
10:18
data and employ their moderation API and safety classifiers to prevent the use of harmful
10:24
or sensitive content.
10:25
Now, addressing some of the points we speculated on in the previous video, particularly regarding
10:30
the models' response times and integration with ChatGPT, the 01 Preview model does take
10:36
longer to generate responses, typically between 10 and 20 seconds.
10:40
This deliberate pause allows the model to engage in deeper reasoning, enhancing accuracy,
10:45
especially for complex queries.
10:47
While this might seem slow compared to the instant responses we're accustomed to, the
10:51
tradeoff is improved quality and reliability in the answers provided.
10:55
As for integration, 01 Preview is available through ChatGPT and their API, but it's important
11:00
to note that it's an early model.
11:03
It lacks some of the features of GPT-4.0, such as multimodal capabilities and web browsing.
11:09
OpenAI hasn't introduced any new pricing tiers specifically for 01 Preview at this
11:14
time.
11:15
Reflecting on the concerns about Artificial General Intelligence, AGI, OpenAI appears
11:20
to be cognizant of the potential risks associated with increasingly capable AI models.
11:26
Their extensive safety measures, transparency, and collaborations with AI safety institutes
11:31
indicate a commitment to responsible development and deployment.
11:35
The model's chain of thought reasoning aligns with what's known as system-two thinking,
11:40
a concept from psychology that describes slow, deliberate, and analytical thought processes.
11:46
This contrasts with system-one thinking, which is fast and intuitive.
11:49
By incorporating system-two thinking, 01 Preview aims to reduce errors and improve the quality
11:54
of responses, particularly in tasks that require deep reasoning.
11:58
In terms of future developments, while there's no official word on integrating 01 Preview
12:02
with other AI models like Orion, OpenAI's focus on continuous improvement suggests that
12:08
we might see more advanced models combining strengths from multiple systems in the future.
12:13
Training advanced models like 01 Preview is resource-intensive.
12:17
OpenAI seems mindful of balancing the development of cutting-edge technology with practical
12:21
applications that provide tangible benefits to users and businesses.
12:25
The goal is to ensure that the significant investments in AI development translate into
12:29
real-world value.
12:31
In conclusion, OpenAI 01 Preview represents a significant advancement in AI capabilities,
12:37
especially in complex reasoning tasks.
12:39
The model excels in areas like science, coding, and mathematics, demonstrating improved safety
12:45
and alignment with OpenAI's policies.
12:48
While it's still an early model lacking some features of previous versions, its potential
12:52
applications are vast, particularly for professionals tackling complex problems.
12:57
Alright, thanks for tuning in.
12:58
If you enjoyed this video, don't forget to like, subscribe, and hit that notification
13:02
bell so you don't miss any of our future videos on the latest in tech and AI.
13:06
We've got more exciting content coming your way, so stay tuned and keep exploring the
13:11
wonders of AI with us.
Recommended
9:06
|
Up next
Open AI 's New AI Model Makes Google's Gemini Flash Look Weak
High tech & Ai world
7/20/2024
8:52
STRAWBERRY ๐ OpenAIโs Most POWERFUL AI Yet! ๐ค๐ง Human-Level Reasoning Is HERE! | AI Revolution
Ai Revolution
4/19/2025
8:52
STRAWBERRY- OpenAI's MOST POWERFULL AI Ever With Human-Level Reasoning
High tech & Ai world
7/17/2024
9:24
OpenAI's New AI BROKE FREE and No One Can STOP IT! | AI Revolution
Ai Revolution
4/1/2025
8:04
AI SHOCKS Again: Falcon 3, OpenAI o1 UPDATE ๐, DeepMind FACTS ๐ง | The Future of AI Is Here | AI Revolution
Ai Revolution
4/5/2025
12:08
Open AI Unveils GPT-4o Mini ( Most Exciting AI Model Of The Year)
High tech & Ai world
7/23/2024
9:06
OpenAIโs New AI Just EMBARRASSED Gemini Flash ๐ฎ๐ฅ | The Battle of Titans Begins! | AI Revolution
Ai Revolution
4/19/2025
9:02
OpenAI Going OPEN WEIGHTS Shocks Everyone! | AI Revolution
Ai Revolution
4/3/2025
6:25
This NEW AI Is Teaching Other AIs ๐ค๐ | Is This Approximate AGI?! ๐ฑ | AI Revolution
Ai Revolution
5/3/2025
8:11
This AI Creates OTHER AIs - Scientists Warn It's TOO POWERFUL (The End of Human Coding?)
Ai Revolution
3/29/2025
11:06
New AI Just Beat DeepSeek With Almost No Effort! (This Shouldn't Be Possible!) | AI Revolution
Ai Revolution
4/3/2025
10:05
New AI from China DESTROYS GPT-4.5 ๐ฅ๐ค | The AI Wars Just Got Real! AI Revolution
Ai Revolution
5/8/2025
9:56
AI Is Officially Outsmarting Humans! (Singularity Soon!)
High tech & Ai world
6/29/2024
12:55
๐ฅ AI News EXPLOSION: Infinite AI Video Machine, Microsoft Agents & More! | AI Revolution
Ai Revolution
5/8/2025
11:10
OpenAI GPT-4.5 Will Make GPT-4 Look Like a Joke (AI Explosion) | AI Revolution
Ai Revolution
3/30/2025
13:16
๐ OpenAI Just UNLEASHED "GPT-01" โ The Most Advanced AI Ever Created! ๐คฏ | AI Revolution
Ai Revolution
4/15/2025
8:37
xAI Just Unveiled GROK 3 โ Hyped as the Most Powerful AI Ever (w/ BIG BRAIN Mode) | AI Revolution
Ai Revolution
4/3/2025
3:01
Session 5 : AI Explained | What is Applied AI vs. Generalized AI : A Beginner's Guide!
Learn And Grow Community
1/29/2024
9:22
Sam Altmanโs AI Superintelligence Is Coming Soon ๐คโก + His Ultra AI Device! (Are We Ready?) | AI Revolution
Ai Revolution
4/14/2025
8:59
What Would Happen If AI Cloned and Replaced You? | Unveiled
Unveiled
6/23/2023
9:38
Open AI ORION (GPT-5) Arrives with Strawberry AI This Fall: AGI Soon!
High tech & Ai world
8/29/2024
5:16
AI Robots Now Learn Like HUMANS! ๐ค๐ง | Shocking Breakthrough from Ex-OpenAI Team ๐จ AI Revolution
Ai Revolution
5/4/2025
5:07
$100 Billion STARGATE Project Revealed ๐ | Microsoft & OpenAIโs Bold AI Superplan! | AI Revolution
Ai Revolution
5/1/2025
8:34
Snapchat is AI Now! -Snap AI Video Generator, Spectacles 5 AI Glasses & More
High tech & Ai world
9/19/2024
8:57
Deep Mind Just Made AI ROBOTS Shockingly Human Like!
High tech & Ai world
9/18/2024