5/15/2025
The future is here! Discover the groundbreaking AI that learns all by itself: no human help needed and no limits to its growth. This self-trained AI is set to change everything in technology and beyond. Dive into the revolution! 🌐✨

#AI #SelfTrainedAI #ArtificialIntelligence #TechRevolution #MachineLearning #FutureTech #Innovation #AIRevolution #NoLimitsAI
Transcript
00:00 So here's what's going on. One AI model just figured out how to train itself with
00:05 zero data. Literally no human input. Another one just leveled up into a full-on
00:10 autonomous research agent. It browses the web, digs through complex info, and
00:15 writes full reports. ChatGPT now saves all your generated images in one neat
00:21 library. And OpenAI might even offer a lifetime subscription for ChatGPT soon.
00:26 But none of that tops this. A woman actually divorced her husband after ChatGPT
00:33 told her he was cheating. She fed it a few details, the AI connected the dots, and
00:38 that was enough for her to walk. So yeah, we're really living in that timeline.
00:42 Let's talk about it. Alright, so something pretty insane just happened in AI
00:46 research and it flew under the radar for a lot of people. A team out of Tsinghua
00:50 University, working with BAAI and Penn State, might have just cracked one of the
00:55 biggest bottlenecks in training large language models. You've probably heard how
01:00 most models rely on massive datasets, millions of human-labeled examples, to get
01:07 better at reasoning. But now they're doing the exact opposite. The new system,
01:11 called Absolute Zero Reasoner, or AZR, trains itself without any external data. Zero.
01:20 Nada. It generates its own problems, solves them, checks if it got the answers
01:27 right, and then learns from that entire loop without ever needing human-made tasks
01:32 or gold answers. This new framework, which they're calling the Absolute Zero
01:37 paradigm, builds on the idea of reinforcement learning with verifiable
01:41 rewards, or RLVR. That basically means the model doesn't need to copy human
01:47 reasoning steps. It just gets feedback based on whether its final answer is
01:51 right or wrong. And that feedback comes from a code executor. So the model generates
01:57 little programming tasks, runs them, and gets automatic verification. That feedback
02:03 becomes its learning signal. It's kind of like the model is playing chess with
02:07 itself, but instead of chess, it's inventing logic puzzles, solving them, and
02:12 checking the results. And it turns out this self-play setup can lead to some pretty
02:17 serious gains. Now you might think this would only work on basic tasks, but AZR is
02:23 pulling off some wild results. On math and coding benchmarks, it actually beats
02:28 models that were trained on tens of thousands of curated examples. The coder
02:33 variant, AZR-Coder-7B, went head-to-head with top-tier zero-shot models and still
02:39 came out on top, scoring five points higher on code tasks and over 15 points
02:43 higher in math reasoning. And this is important: it never saw any of the benchmark
02:48 tasks during training. It was trained entirely on tasks it made up for itself.
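
To picture that propose-solve-verify loop, here's a minimal sketch in Python. The `model` object and its `propose_task` / `solve` / `update` methods are hypothetical stand-ins, not the paper's actual API; the load-bearing idea is just that the reward comes from executing code, not from a human label.

```python
# Hedged sketch of an Absolute Zero-style self-play step.
# `model` is a hypothetical policy; the learning signal is a
# pass/fail reward produced by actually running the code.

def run_program(program: str, test_input: str) -> str:
    """Execute a proposed program (assumed to define `f`) and return its output."""
    scope = {}
    exec(program, scope)  # NOTE: sandbox this in any real setting
    return str(scope["f"](test_input))

def training_step(model):
    # 1. The model proposes its own task: a program plus an input.
    program, test_input = model.propose_task()

    # 2. The code executor establishes the ground-truth output.
    expected = run_program(program, test_input)

    # 3. The same model now tries to solve the task it just posed.
    predicted = model.solve(program, test_input)

    # 4. Verifiable reward: right or wrong, no human judgment needed.
    reward = 1.0 if predicted == expected else 0.0

    # 5. Reinforcement-learning update from that scalar reward.
    model.update(reward)
```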
02:55 Here's how it works. The model plays two roles at once: it proposes tasks and it
03:00 solves them. So let's say it's doing a coding task. It might write a small Python
03:05 function, pick an input, and then check what the output would be. Then it turns
03:10 around, takes part of that problem as input, and tries to reason its way back
03:15 to the missing piece, maybe the output, the input, or even the original program. It
03:21 uses deduction, abduction, and induction. Deduction is predicting the output based on
03:27 a function and input. Abduction is guessing the input that led to an output. And
03:32 induction is figuring out the function itself from input-output examples. These
03:38 are core reasoning modes, and the model rotates between them to build general
03:43 thinking ability. Now the crazy part is this didn't need any complicated setup. The
03:48 researchers started with the most basic program ever, literally just a function
03:52 that returns "hello world", and that was enough to kick off the entire learning loop.
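
To illustrate those three modes, here's how a single (program, input, output) triple, seeded with that hello-world function, could be sliced into three task types. The task format below is an illustrative guess, not the paper's exact prompt layout.

```python
# Illustrative only: one (program, input, output) triple yields
# three task types. The seed mirrors the paper's trivial starting program.

seed_program = "def f(x):\n    return 'hello world'"
seed_input = "anything"
seed_output = "hello world"

# Deduction: given the program and input, predict the output.
deduction_task = {"given": (seed_program, seed_input), "predict": "output"}

# Abduction: given the program and output, guess an input that produces it.
abduction_task = {"given": (seed_program, seed_output), "predict": "input"}

# Induction: given input-output examples, recover the program itself.
induction_task = {"given": [(seed_input, seed_output)], "predict": "program"}
```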
03:57 From there, the model started building out harder and harder problems for itself. It
04:02 created coding puzzles, validated them, solved them, and gradually got better at
04:07 solving more complex ones. And this isn't just hand-wavy theory. They ran this thing
04:12 on models of all sizes and saw consistent improvements, especially with larger models.
04:18 The three-billion-parameter version showed a five-point gain, the seven-billion got ten,
04:23 and the 14-billion version improved by over 13 points. Now this isn't just a party trick
04:29 for Python puzzles. What's wild is the cross-domain gains. The model was only trained
04:34 on coding tasks, but it ended up significantly improving its math reasoning too. For example,
04:38 AZR-Coder-7B jumped over 15 percentage points on math benchmarks, even outperforming models
04:45 that were specifically trained on math. And get this: most other models that are fine-tuned on code
04:51 barely improve at all in math. So there's something deep going on here. Code seems to
04:57 sharpen general reasoning way more than expected. They also observed the model naturally developing
05:03 step-by-step plans, writing comments inside code to help itself think, just like how humans jot down
05:09 rough work before solving something. In abduction tasks, where it has to guess the input from the output,
05:15 the model does trial and error. It tests guesses, revises them, runs the code again, and keeps going until the
05:21 output matches. That's not just output prediction, that's real reasoning behavior, and it's fully
05:26 self-taught. Of course, this raises some safety concerns. In a few edge cases, especially with the
05:32 Llama 3.1 8B version, the model started generating questionable outputs. One example said something like,
05:41 "the aim is to outsmart all these groups of intelligent machines and less intelligent humans."
05:47 They called these "uh-oh moments." It's rare, but it shows we're stepping into a territory where models
05:53 might start behaving in ways we didn't expect, especially as they design their own learning
05:58 curriculum. So yeah, this is groundbreaking, but also something we'll need to watch very closely.
06:03 Alright, so while AZR is out here teaching itself to reason like a human coder, another team has been
06:09 working on the opposite end: how to give models better access to outside knowledge. This one's
06:15 called WebThinker, and it's basically an AI agent that lets large reasoning models like DeepSeek R1,
06:21 OpenAI o1, or Qwen break out of their internal bubbles and browse the web in real time. Think of
06:29 it like giving GPT eyes and a search engine. The problem it solves is super important. LLMs don't know
06:35 everything. Even the best ones can struggle when they hit a knowledge gap, especially on real-world,
06:40 complex queries. WebThinker fixes this by giving the model tools to search the web, click through pages,
06:46 gather info, and write detailed reports, all autonomously. So instead of hallucinating answers
06:53 or getting stuck, the model pauses, looks it up, reasons through what it finds, and then drafts a response.
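
The transcript doesn't spell out WebThinker's internals, so here's only a generic sketch of what a search-while-reasoning loop like this tends to look like. Every helper here (`search_web`, `fetch_page`, the `model` methods) is an assumption, not WebThinker's real interface.

```python
# Generic sketch of a search-augmented reasoning loop in the spirit of
# WebThinker. All helpers and model methods are hypothetical stand-ins.

def answer_with_web(model, question: str, max_searches: int = 5) -> str:
    notes = []
    for _ in range(max_searches):
        # The model reasons over what it has gathered and picks its next move.
        action = model.next_action(question, notes)

        if action.kind == "search":
            # Pull fresh pages instead of hallucinating across a knowledge gap.
            for url in search_web(action.query):
                page = fetch_page(url)
                notes.append(model.extract_relevant(page, question))
        elif action.kind == "answer":
            break  # enough evidence collected

    # Draft the final structured report from the gathered evidence.
    return model.write_report(question, notes)
```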
06:59 They trained WebThinker using an RL strategy that rewards the model for using tools properly. During
07:05 this training, it learns when to search, how to search better, how to extract what it needs from
07:10 messy web pages, and how to write structured research reports based on what it finds. It works in two
07:16 modes: problem solving and report generation. The first one's all about answering tough queries by using
07:23 the Deep Web Explorer to go out and dig up information. The second one's for creating full-blown research
07:28 reports with help from a support model that organizes and polishes the output. And the results speak for
07:34 themselves. WebThinker-32B beat out other systems like Search-o1 and even Gemini Deep Research. On WebWalkerQA,
07:43 it had a 23% improvement, and on HLE, it jumped over 20%. On average, it scored 8.0 across all complex
07:53 reasoning tasks, more than any other current deep research model. And when using the DeepSeek R1 7B
07:59 backbone, it crushed both direct generation methods and classic retrieval-based systems by
08:06 over 100% in some benchmarks. That's a massive leap. The cool part is how this opens up new use cases.
08:13 WebThinker can now be used to write scientific papers, help with legal research, or even guide students
08:18 through complex topics by doing real-time research, not just repeating what it was trained on. Future plans
08:25 include adding multimodal reasoning, tool learning, and even GUI-based web navigation. So it's not just browsing text,
08:31 but maybe interacting with visual elements on the web too. And while all of that is happening behind the scenes,
08:37 OpenAI is quietly changing how people interact with ChatGPT on the front end. First off, they just launched
08:45 a brand-new image library feature. Until now, if you generated images using ChatGPT, you'd have to scroll
08:52 through your chat history to find them or download each one immediately. Not exactly ideal. But now,
08:58 every image you make with the 4o model gets automatically saved to your own personal image library.
09:05 They're organized by date, they get auto-generated titles, and you can even browse them in a nice
09:11 full-screen carousel view. There's also a built-in image editor now, so if your last image was close
09:16 to what you wanted, you can hit Edit, tweak the prompt, and regenerate it without starting from scratch.
09:21 It's super useful for anyone doing a lot of visual content. The only catch: you still can't jump back
09:26 to the exact chat where the image came from, and you can't delete images individually from the library.
09:33 You'd need to delete the whole conversation to remove them. Hopefully that part gets improved soon.
09:38 And finally, there's one more potential bombshell brewing. Leaked code from the ChatGPT app suggests
09:44 that OpenAI is experimenting with a lifetime subscription plan. One payment, and you get access
09:50 to premium features forever. Alongside that, there's also talk of a weekly subscription option.
09:56 If true, this could seriously shake up the AI pricing game. Right now they offer monthly and yearly Plus
10:02 plans, $20 a month or $200 per year. But the idea of a one-time fee for lifetime access? That's basically
10:10 unheard of in SaaS. It might be a strategic move to lock in users before competitors like Gemini, Grok,
10:17 or DeepSeek gain more traction. Offering lifetime access would be a big bet, but it could pay off
10:23 by boosting user loyalty and turning ChatGPT into more of a long-term platform, not just a tool you try
10:31 once. Of course, nothing's confirmed yet, but the code leaks are pretty convincing.
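
For a sense of the math, the breakeven against the current $200-per-year Plus plan is simple arithmetic. The $500 lifetime price below is purely hypothetical, since no real price has leaked.

```python
# Back-of-the-envelope breakeven, with a purely hypothetical lifetime price.
yearly_plan = 200          # current Plus annual price in USD
lifetime_price = 500       # hypothetical; no actual price has been announced

breakeven_years = lifetime_price / yearly_plan
print(f"Lifetime pays off after {breakeven_years:.1f} years")  # -> 2.5 years
```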
10:36 Alright, now, would you actually pay for a lifetime ChatGPT subscription if it meant unlimited access
10:42 forever, even if it cost a few hundred bucks up front? And more importantly, if an AI told you
10:49 your partner was cheating, would you believe it enough to end the relationship? Let me know what
10:55 you think in the comments, hit that like button, and make sure to subscribe for more wild AI stories
11:00 and updates. Thanks for watching, and catch you in the next one.
