  • 5/28/2025
The AI landscape just got shaken up! ByteDance’s BAGEL, Anthropic’s Claude 4, and Mistral’s Devstral have all launched groundbreaking models that push the boundaries of language understanding, creativity, and efficiency. These AI powerhouses are setting new standards and surprising experts worldwide with their capabilities. Discover what makes these three the talk of the tech world! ⚡🧠


Category: 🤖 Tech
Transcript
00:00 So three AI giants just dropped within 48 hours. ByteDance launched BAGEL, a wild new model that doesn't just generate images: it reasons, edits, navigates, and thinks in full multimodal context. Then Anthropic came out swinging with Claude 4, an AI that can code for seven hours straight, juggling tools like it's running an entire dev team solo. And right in the middle of it all, Mistral dropped Devstral, a monster open-source coding model trained to chew through real GitHub issues and beat out some of the biggest closed systems. So, three drops in two days: BAGEL, Claude 4, Devstral. Everything's moving again, so let's talk about it.
00:44 Let's kick off with BAGEL, because that announcement hit first, on May 20th. ByteDance calls it a unified multimodal model, which basically means one network juggles language, images, video frames, even web data, instead of stapling separate subsystems together. The core engine is a Mixture-of-Transformer-Experts, MoT for short, with 7 billion active parameters out of a 14 billion total count. ByteDance bolted on two different encoders: one chews on raw pixels, the other tracks semantic cues, so BAGEL sees both the fine-grained texture of an image and the higher-level idea of what's in it. Pre-training wasn't modest either: trillions of interleaved tokens spanning text, stills, clips, memes, you name it. That's why the dev team says the model can think across modalities; it literally predicts the next group of tokens, whether those tokens describe words or visual patches.
01:44 Day-one demos looked pretty spicy. Someone dropped in a snapshot of Michelangelo's David, and BAGEL casually rattled off the statue's history, its Renaissance context, even where it's housed, at the Accademia in Florence. That same prompt stream flipped straight into generation mode: the model pumped out a photorealistic scene of three antique potion bottles labeled SDXL, BAGEL, and Flux, with correct reflections and old-glass imperfections, without needing instruction about lens choice or lighting. Then came editing. A clip showed a man squatting to pat a dog, and BAGEL rewrote the action into new frames, keeping the pose consistent while avoiding jitter. Style transfer? It took a 2D cosplay photo and re-rendered it in a crisp 3D animated look after a single "change to 3D" instruction. It even did navigation: after watching 0.4 seconds of a video, the model predicted the next forward step for a virtual camera.
02:46 All of that sits under what ByteDance calls its thinking mode, where the model writes an internal chain of thought between think tags, refines the prompt, then starts drawing, so you get fewer random artifacts and more coherent scenes, like that wild composite of hundreds of toy cars forming a life-sized sedan.
03:06 Now, numbers. On the visual-understanding board, BAGEL hit an MME score of 2,388, edged past Qwen2.5-VL on MMBench with 85.0, landed 55.3 on MMMU, 67.2 on MM-Vet, and pulled 73.1 on the new MathVista reasoning test. Generation quality is equally solid: GenEval climbs to 0.88 when you let BAGEL think, while WISE bumps to 0.70, which puts it shoulder to shoulder with specialized diffusion behemoths. Editing? GEdit-Bench shows 7.36 for single-condition prompts, and a 44.0 on IntelligentBench. With chain of thought, that last score jumps to 55.3, a real testament to the reason-then-render workflow. During ablation, mixing VAE and ViT features turned out to be the secret sauce: drop either, and intelligent editing tanks.
04:09 If you want to run it locally, the quick start is right in the repo: a conda environment on Python 3.10, snapshot_download from Hugging Face with the 7B-MoT checkpoint, then crack open inference.ipynb and you're good. Then the generation dials: cfg_text_scale around 4 to 8 locks the model to your prompt; cfg_image_scale at 1 to 2 preserves source detail during edits; and cfg_interval, think of it as how long you keep classifier-free guidance engaged, defaults to the 0.4 to 1.0 window. Tweak timestep_shift if you need cleaner layout versus sharper details, and play with 60-ish total timesteps. It's all in the README, but those knobs matter.
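For orientation, here's a minimal Python sketch of that quick start. The repo id is an assumption based on the release naming, and the dial names mirror the README; treat the exact values as illustrative and confirm against inference.ipynb before running.

```python
# Minimal sketch: fetch the checkpoint, then set the dials discussed above.
# The repo id and parameter names are assumptions; confirm against the
# BAGEL README and inference.ipynb.
from huggingface_hub import snapshot_download

# Pull the 7B-MoT checkpoint to a local folder.
ckpt_dir = snapshot_download(
    repo_id="ByteDance-Seed/BAGEL-7B-MoT",  # assumed repo id
    local_dir="./BAGEL-7B-MoT",
)

# The generation dials, in the ranges the video recommends.
gen_config = {
    "cfg_text_scale": 6.0,       # 4-8: locks the model to your prompt
    "cfg_image_scale": 1.5,      # 1-2: preserves source detail during edits
    "cfg_interval": [0.4, 1.0],  # window where classifier-free guidance stays on
    "timestep_shift": 3.0,       # cleaner layout vs. sharper details
    "num_timesteps": 60,         # ~60 total denoising steps
}
```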
04:57 Two days later, Anthropic dropped Claude 4's two siblings, Opus 4 and Sonnet 4, and both aim squarely at the coding crowd. Opus 4 storms the SWE-bench Verified leaderboard with 72.5%, owns Terminal-bench at 43.2%, and claims the title of world's best coding model. Sonnet 4, which costs less compute, actually ekes out 72.7% on SWE-bench, but trades a little depth for latency. Both models are hybrid: they can spit back near-instant paragraphs, or slip into extended thinking that reasons across up to 64,000 tokens, calling external tools in the middle of a thought chain. So you might see Claude google something, summarize a PDF it found, update its plan, and continue without you poking it.
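In API terms, that hybrid mode is a single flag. Here's a minimal sketch against the Anthropic Python SDK, assuming the launch-day model id and an illustrative thinking budget; check Anthropic's docs for current values.

```python
# Minimal sketch of extended thinking with the Anthropic SDK
# (pip install anthropic).
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-opus-4-20250514",  # assumed launch-day id
    max_tokens=16000,
    # Extended thinking: grant a budget of reasoning tokens the model
    # can spend before committing to an answer.
    thinking={"type": "enabled", "budget_tokens": 8000},
    messages=[{
        "role": "user",
        "content": "Audit this repo's error handling and propose a patch plan.",
    }],
)

# Thinking and the final answer arrive as separate content blocks.
for block in message.content:
    if block.type == "thinking":
        print("[thinking]", block.thinking[:200], "...")
    elif block.type == "text":
        print(block.text)
```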
05:44 The coolest bit is endurance. Anthropic's internal tests ran Opus 4 for nearly seven hours straight on a single goal. Think: audit a codebase, patch bugs, update docs, without fresh human prompts. CNN's write-up framed it as an almost full workday of uninterrupted AI labor. Anthropic's product lead, Scott White, spun it like this: let the model grind through the 30% of your day nobody finds thrilling, so you can handle the creative piece. And they have receipts. Cursor calls Opus 4 state-of-the-art for complex multi-file refactors. Replit says precision shot way up. Rakuten let the model hammer on an open-source refactor for seven solid hours, and it still held context. Cognition's evaluation claims Opus 4 tackles challenges that stump other models.
06:35 Tooling support is thick. Developers now get a code-execution tool, an MCP connector, a Files API, and prompt-caching knobs, so you're not burning tokens on identical system prompts every minute.
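That caching knob, as a minimal sketch: mark a large, stable system prompt as cacheable so repeat calls reuse the prefix. The cache_control block syntax is from Anthropic's docs; the style-guide file and model id are hypothetical stand-ins.

```python
# Minimal prompt-caching sketch with the Anthropic SDK. The style guide
# is a hypothetical large, stable prefix worth caching.
import anthropic

client = anthropic.Anthropic()

long_style_guide = open("style_guide.md").read()  # hypothetical big prefix

message = client.messages.create(
    model="claude-sonnet-4-20250514",  # assumed launch-day id
    max_tokens=1024,
    system=[{
        "type": "text",
        "text": long_style_guide,
        # Cacheable block: repeat calls reuse this prefix instead of
        # re-billing the same system-prompt tokens every minute.
        "cache_control": {"type": "ephemeral"},
    }],
    messages=[{"role": "user", "content": "Review this diff for style issues."}],
)
print(message.content[0].text)
```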
06:47 Claude Code exits preview and ships straight into VS Code and JetBrains plug-ins. Edits appear inline, so you accept or reject them like a human teammate's patch. In the terminal, the same agent can spin up GitHub Actions runs or respond to PR comments.
07:02 Under the hood, both models got a discipline check: shortcut-taking behavior, where an LLM tries a loophole instead of the intended workflow, dropped by 65% compared to Sonnet 3.7. Pricing sticks to Anthropic tradition. Opus 4 remains 15 bucks per million input tokens and 75 per million output; Sonnet 4 sits at 3 and 15. The usual Pro, Max, Team, and Enterprise tiers include the extended-thinking toggle, and free users even get Sonnet 4's core mode, which is kind of nuts.
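To make those rates concrete, a quick back-of-envelope in Python, with the token counts purely illustrative:

```python
# Back-of-envelope cost check at the list prices quoted above,
# in USD per million tokens (input, output).
PRICES = {
    "opus-4": (15.00, 75.00),
    "sonnet-4": (3.00, 15.00),
}

def cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one workload at list prices."""
    p_in, p_out = PRICES[model]
    return input_tokens / 1e6 * p_in + output_tokens / 1e6 * p_out

# Example: an agent session that reads 2M tokens and writes 300K.
print(f"Opus 4:   ${cost('opus-4', 2_000_000, 300_000):.2f}")    # $52.50
print(f"Sonnet 4: ${cost('sonnet-4', 2_000_000, 300_000):.2f}")  # $10.50
```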
07:35 For transparency, Anthropic now compresses the chain of thought with a smaller model in about 5% of cases, but devs can apply for a developer mode to see raw reasoning when needed.
07:47 Safety got its own bullet-time montage: Anthropic activates its ASL-3 safeguards for Opus 4, smelling salts for enterprises that fear runaway tools. Both models are live on the Anthropic endpoint, Amazon Bedrock, and Google Vertex AI. And yes, the developer plugins are already shipping. Run a single command in VS Code or JetBrains and Claude Code shows edits right in your files, no copy-paste needed. Fire it up in a GitHub Action and the agent will comment on PR feedback, chase failing tests, and push patches to a branch. It feels less like a chatbot bolted to the side of the repo and more like a junior engineer who never takes a coffee break.
08:28 Now, wedged neatly between those two announcements came something from Paris. On May 21st, Mistral AI and All Hands AI unveiled Devstral: 24 billion parameters, an Apache 2.0 license, and a 128,000-token window. The whole brief screams open-source agent for real software engineering, not code-completion toys. Training wasn't just RLHF on docstrings; the team piped Devstral through actual GitHub issues inside agent scaffolds like OpenHands and SWE-agent, which forced the model to read stack traces, locate the bad file, write a patch, rerun the tests, and iterate until green.
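That loop is worth seeing in skeletal form. Here's a rough sketch where every model-facing function is a hypothetical placeholder; OpenHands and SWE-agent each implement their own versions of these steps.

```python
# Hypothetical skeleton of the patch-and-test loop the scaffolds enforce.
# `model.propose_patch` and `patch.apply` are stand-ins, not real APIs.
import subprocess

def run_tests() -> subprocess.CompletedProcess:
    """Run the project's test suite and capture the stack traces."""
    return subprocess.run(
        ["pytest", "-x", "--tb=short"], capture_output=True, text=True
    )

def fix_issue(model, issue_text: str, max_iters: int = 10) -> bool:
    """Iterate patch -> test until green, the curriculum described above."""
    for _ in range(max_iters):
        result = run_tests()
        if result.returncode == 0:
            return True  # tests pass: the patch is green
        # Feed the failure back so the model can locate the bad file
        # and draft the next patch (hypothetical model interface).
        patch = model.propose_patch(issue=issue_text, trace=result.stdout)
        patch.apply()  # write the edit into the working tree
    return False
```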
09:09 That curriculum explains why SWE-bench Verified jumps to 46.8%, six full points higher than the next open model, and, hilariously, 20 points above GPT-4.1 mini when that tiny proprietary sibling is forced through the same harness. Because the weights are public, you can grab them from Hugging Face, Ollama, Kaggle, LM Studio, or Unsloth if you want quantized flavors. If local hardware is your thing, a single RTX 4090 or an M-series Mac with 32 gigs handles it. If cloud is easier, Mistral's endpoint, Devstral Small 2505, bills at 10 cents per million input tokens and 30 cents per million output, matching Mistral Small 3.1. Enterprises that need private fine-tuning or distillation into even lighter models can ping Mistral's applied AI team: same open philosophy, just pay for the consulting hours.
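Calling the hosted endpoint is a few lines with the mistralai SDK; a minimal sketch, assuming the model id follows the "Devstral Small 2505" naming quoted above:

```python
# Minimal sketch against Mistral's hosted endpoint (pip install mistralai).
# The model id is assumed from the naming above; verify against the
# current model list.
import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

response = client.chat.complete(
    model="devstral-small-2505",
    messages=[{
        "role": "user",
        "content": "Here's a failing stack trace; locate the bug and draft a patch.",
    }],
)
print(response.choices[0].message.content)
```

For the fully local route, the quantized builds on Hugging Face or Ollama slot into the same agent scaffolds.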
10:06 The bigger story is context stability. Because Devstral can swallow an entire monorepo in one go, it keeps variable scope, import paths, and architectural patterns in working memory. Early community tests show the model hopping through 40 or 50 files without losing track of variable names, and still writing indentation-perfect patches. It even reads markup like XML or HTML templates, which means it can adjust a Django config and the corresponding Jinja view in one reasoning burst.
10:37 And because the license is so permissive, university teams and indie IDE plug-in authors are already hacking it into local copilots that run entirely offline. The release video shows a coder with no internet toggling an OpenHands panel, selecting a failing test, and watching Devstral rewrite the function right inside VS Code before saving. No tokens leave the laptop.
11:00 Mistral, in case you missed the origin story, was founded in April 2023 by Arthur Mensch, Guillaume Lample, and Timothée Lacroix. They've been shipping open-weight checkpoints on a six-month cadence: Mistral Small, Codestral, now Devstral. Venture money keeps flowing, but they still grant Apache licenses and push Metal kernels for MacBooks. That combination, transparency on weights, a permissive legal wrapper, and hardware frugality, has turned them into the EU's favorite counterbalance to U.S. cloud giants.
11:33 So, there you have it. Three releases, each stretching a different muscle. BAGEL lets a single decoder talk prose, paint frames, reshape video, and even chart navigation steps. Claude 4 holds a conversation for hours, flipping through in-house tools and leaving breadcrumb memory files so it can resume work after lunch. Devstral shows that an open agent trained on the act of fixing real GitHub tickets can beat much larger closed models on the exact tasks developers care about, all while running on hardware you can stick under your desk.
12:09 So, here's what I'm wondering: did we just see the start of AI models specializing harder than ever, or are we heading toward one system that just does everything better than you? Drop your take in the comments; I'm reading all of them, and if you've already got one of these models running, let me know how it's holding up. If you liked this, you know what to do. Thanks for watching. Catch you in the next one.
