5/8/2025
💥 BREAKING AI NEWS! Meet the AI HYENA: a revolutionary new AI that DESTROYS old models and smashes speed and memory records! 🤖⚡

In this AI Revolution episode, we cover:
🔥 AI HYENA's groundbreaking performance: faster and more efficient than anything before
💡 How it's setting new benchmarks in speed, memory, and AI capabilities
🔍 What this means for the future of AI technology in industries like gaming, research, and more
🚀 Why this is a game-changer and how it challenges existing AI models like GPT-4
💬 Potential real-world applications and the risks it brings

Stay ahead of the curve with the latest in AI tech. Don't miss out on this revolutionary breakthrough!

🔔 SUBSCRIBE for more of the hottest AI updates every week.

#AIHYENA
#AIRevolution
#AIUpdate
#ArtificialIntelligence
#AIModels
#SpeedRecords
#MemoryRecords
#NextGenAI
#AI2025
#MachineLearning
#AIRevolution2025
#FutureOfAI
#TechNews
#AIvsGPT4
#AIPerformance
#AIInnovation
#AIWorld
#AItech
#EmergingTech
#AIInsider
Transcript
00:00 Something big just dropped, and it's not another transformer upgrade; it's something completely different.
00:07 Liquid AI, a Boston startup spun out of MIT, just revealed Hyena Edge on April 25th, right before ICLR 2025 kicks off in Singapore.
00:18 And yeah, Hyena, like the animal that laughs at lions, except this thing is laughing at latency graphs on your phone.
00:26 It's built to run powerful AI right on your device, faster and lighter than anything we're used to.
00:32 And it might just be the first real sign that the transformer era is starting to crack.
00:37 So let's rewind just a bit.
00:39 For years, we've been in this romance with the transformer architecture because of that parallelizable attention mechanism Vaswani and friends introduced back in 2017.
00:48 It's gotten us some wild breakthroughs, sure, but there's a big catch: squeezing these chunky transformer models onto a smartphone
00:55 without frying your battery or devouring your RAM has been painful.
01:00 Most edge-optimized models, like SmolLM2, like Phi, even Meta's Llama 3.2 1B, still lug around standard attention blocks and rely on kernels that are great on a data-center-grade GPU, but not so hot on the Snapdragon inside your pocket.
01:16 Liquid AI is basically saying: why not ditch most of that attention baggage and do something leaner?
01:21 Enter Hyena Edge, a convolution-based multi-hybrid model.
01:26 Convolutions in a language model?
01:28 Yep.
01:29 Convolutions aren't new.
01:30 They rule in vision.
01:32 But here they're part of this broader family of operators called Hyena, which Michael Poli's group kicked off a couple of years ago.
01:38 The Edge variant takes it further by replacing roughly two-thirds of the grouped-query attention operations inside a top-shelf Transformer++ backbone with these gated convolutions from the Hyena sub-family.
01:52 That swap alone cuts a heap of memory overhead and avoids the quadratic time blow-up that attention brings.
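
To make "gated convolutions" concrete, here's a minimal PyTorch sketch of the general idea: a causal depthwise convolution whose output is modulated by a learned elementwise gate. This is an intuition-building toy under my own assumptions, not Liquid AI's actual Hyena operator (the real Hyena family uses richer filter parameterizations), and every name in it is hypothetical.

```python
# Minimal sketch of a gated causal convolution block (hypothetical;
# NOT Liquid AI's actual Hyena operator, which uses richer filters).
import torch
import torch.nn as nn

class GatedConvBlock(nn.Module):
    def __init__(self, dim: int, kernel_size: int = 64):
        super().__init__()
        self.in_proj = nn.Linear(dim, 2 * dim)  # value stream + gate stream
        # Depthwise conv; left padding keeps it causal (no peeking ahead).
        self.conv = nn.Conv1d(dim, dim, kernel_size,
                              padding=kernel_size - 1, groups=dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        v, g = self.in_proj(x).chunk(2, dim=-1)
        v = self.conv(v.transpose(1, 2))[..., : x.shape[1]]  # trim to stay causal
        return self.out_proj(v.transpose(1, 2) * torch.sigmoid(g))
```

Note that nothing here ever builds a seq_len-by-seq_len matrix, which is the structural reason the memory overhead and the quadratic blow-up go away.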
01:58 Now, Liquid AI didn't just eyeball some abstract benchmarks on a desktop GPU and declare victory.
02:05 They ran the whole thing on an actual Samsung Galaxy S24 Ultra, yes, the phone you might have in your pocket right now, and compared it to a parameter-matched GQA Transformer++ model.
02:18 Pre-fill latency?
02:19 Hyena Edge was faster across the board, and at longer contexts
02:23 we're talking up to 30% quicker.
02:25 Decode latency?
02:27 Same story.
02:28 Once you hit sequences over 256 tokens, that convolution magic really kicks in.
02:34 And memory usage was lower at every sequence length they measured, which is huge when your model has to squeeze in between Spotify, TikTok, and the photo album of your cat.
02:46 What about accuracy, though?
02:47 Because let's be honest, nobody cares if a model is lightning fast if it can't finish your sentence.
02:51 Liquid AI trained both models on the exact same 100 billion tokens and then unleashed them on a battery of standard language model benchmarks.
03:00 On WikiText, Hyena Edge's perplexity dropped to 16.2, compared to 17.3 for the Transformer baseline.
03:09 Lambada went from 10.8 down to 9.4.
03:12 On PiQA, the accuracy nudged up from 71.1 to 72.3.
03:20 HellaSwag saw a jump from 49.3 to 52.8.
03:25 WinoGrande climbed from 51.4 to 54.8.
03:29 ARC Easy crept up from 63.2 to 64.4.
03:34 And ARC Challenge pushed from 53.34 to 55.2.
03:39 One funny footnote: both models tied on a PiQA variant at 31.7, so the Hyena didn't win absolutely everything, but it never fell behind either.
03:50 Net result: speed and memory savings come with equal or better predictive oomph, which is the holy grail for on-device AI.
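
Quick decoder for those perplexity numbers: perplexity is just the exponential of the model's average per-token cross-entropy, so lower means the model is less "surprised" by the test text. A minimal sketch, as my own illustration rather than Liquid AI's evaluation harness:

```python
# Perplexity = exp(mean negative log-likelihood per token); lower is better.
# Illustrative only, not Liquid AI's evaluation code.
import math
import torch.nn.functional as F

def perplexity(logits, targets):
    # logits: (num_tokens, vocab_size), targets: (num_tokens,) token ids
    nll = F.cross_entropy(logits, targets, reduction="mean")
    return math.exp(nll.item())
```

Intuitively, a WikiText perplexity of 16.2 means the model is about as uncertain, on average, as if it were choosing uniformly among roughly 16 candidate tokens at each step.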
03:58 Okay, but how did they actually arrive at that architecture?
04:01 This is where it gets super nerdy, but also kind of sci-fi cool.
04:04 Back in December 2024, Liquid AI unveiled something called STAR, the Synthesis of Tailored Architectures framework.
04:13 Picture an evolutionary algorithm wearing a lab coat.
04:17 You feed it a bunch of primitive operators, some constraints about latency and memory, sprinkle in linear systems theory, and then let it evolve architectures generation after generation.
04:27 For Hyena Edge, they kicked off with a population of 16 candidate models.
04:33 Over 24 generations, STAR juggled 18 different convolution options:
04:38 Hyena full, Hyena X, Hyena Y, with filter lengths ranging from 3 to 128, plus several flavors of grouped-query attention and SwiGLU feed-forward layers.
04:50 Every candidate got its memory footprint and latency profiled on the actual S24 Ultra, not a random desktop card.
04:59 And they even trained each mini model for 5 billion tokens to keep score on perplexity in real time.
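
Structurally, that kind of search loop is simple to picture. Here's a deliberately toy sketch that mirrors the numbers from the video (16 candidates, 24 generations) but whose operators, fitness, and mutation internals are pure stand-ins, not STAR's actual method:

```python
# Toy evolutionary architecture search in the spirit of STAR.
# Population size and generation count follow the video; everything
# else (operators, fitness, mutation) is a made-up placeholder.
import random

OPERATORS = [f"conv_variant_{i}" for i in range(18)] + ["gqa", "swiglu"]

def random_arch(num_blocks: int = 32) -> list[str]:
    return [random.choice(OPERATORS) for _ in range(num_blocks)]

def fitness(arch: list[str]) -> float:
    # Stand-in for the real scoring: profile latency/memory on the
    # target phone and train a small proxy model to get perplexity.
    return -sum(len(op) for op in arch) + random.random()

def mutate(arch: list[str]) -> list[str]:
    child = list(arch)
    child[random.randrange(len(child))] = random.choice(OPERATORS)
    return child

population = [random_arch() for _ in range(16)]
for generation in range(24):
    population.sort(key=fitness, reverse=True)
    survivors = population[:8]                      # keep the fitter half
    offspring = [mutate(random.choice(survivors)) for _ in range(8)]
    population = survivors + offspring

best = max(population, key=fitness)
```

The expensive part in practice is the fitness call, which is exactly why profiling on the real phone and scoring with short proxy training runs matters.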
05:05 As the evolutionary cycles rolled on, the Hyena Y operator kept muscling its way to the front of the pack.
05:11 It turns out this variant strikes that sweet balance: plenty of expressive power without the inner convolution overhead you see in Hyena full, and a lighter gating setup than Hyena X.
05:23 STAR could literally visualize how many self-attention, Hyena, and SwiGLU blocks were inside each generation.
05:29 If you watch the walkthrough video Liquid AI posted, you'll see those histograms shifting over time:
05:34 self-attention bars shrinking, Hyena Y bars swelling, latency curves dipping, kind of like watching natural selection, but with code blocks instead of genes.
05:45 By the final generation, STAR spat out the design that became Hyena Edge: 32 layers deep, width 2048, attention head size 64, and two-thirds of what used to be GQA replaced by Hyena gated convolutions.
05:59 No hand-tuning, no trust-me-bro, just hardcore automated search.
06:03 And they stress-tested the outcome directly on the phone again to be sure those earlier approximations held true at full scale, which they did, otherwise we wouldn't be talking about this.
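
Since those shape numbers fly by quickly, here they are pinned down as a hypothetical config object; the field names are my own invention, not Liquid AI's code, and the exact conv/attention split is just an approximation of "two-thirds":

```python
# Hypothetical config capturing the reported Hyena Edge shape.
# Field names are illustrative; the 21/11 split is ~two-thirds of 32.
from dataclasses import dataclass

@dataclass
class HyenaEdgeConfig:
    num_layers: int = 32        # "32 layers deep"
    width: int = 2048           # model width
    head_dim: int = 64          # attention head size
    conv_layers: int = 21       # ~2/3 of layers use gated convolutions
    attention_layers: int = 11  # remaining ~1/3 keep grouped-query attention
```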
06:14 One angle I love is how they benchmarked responsiveness on short prompts, because that's where hybrids usually fall flat.
06:21 Edge apps like voice assistants often fire off queries under 20 tokens, so shaving milliseconds there is everything.
06:28 Liquid AI reports their pre-fill latency advantage is visible right from the shortest sequences.
06:35 That's basically the model's first impression for the user, and it only widens as you feed in longer contexts.
06:41 Even if you're just sending a single sentence to your on-device language model, Hyena Edge can answer faster than its transformer twin.
06:49 Now, a quick side tour. Grouped-query attention, GQA, was already an optimization to make transformers more manageable by letting multiple queries share key/value heads.
06:59 It's lighter than full attention, but still attention at heart.
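
As a quick illustration of that head sharing, here's a minimal sketch (my own toy, not any production implementation) of the core GQA trick: fewer key/value heads, each broadcast to a group of query heads.

```python
# Toy grouped-query attention: many query heads share fewer KV heads,
# shrinking the KV cache. Illustrative sketch only.
import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v):
    # q: (batch, num_q_heads, seq, head_dim)
    # k, v: (batch, num_kv_heads, seq, head_dim), num_kv_heads < num_q_heads
    group = q.shape[1] // k.shape[1]
    k = k.repeat_interleave(group, dim=1)  # each KV head serves `group` query heads
    v = v.repeat_interleave(group, dim=1)
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)
```

Sharing shrinks the KV cache, but the score computation is still quadratic in sequence length, which is exactly the cost Hyena Edge goes after.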
07:03 Hyena Edge swaps most of those heads out entirely, and that's what slices the quadratic term down.
07:08 Convolution operations scale linearly with sequence length, so for 512 or 1024 tokens, you're looking at serious compute savings.
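
A crude back-of-the-envelope count shows why that matters; the constants below are made up and ignore real kernel behavior, but the scaling trend is the point:

```python
# Rough per-layer cost: attention grows quadratically in sequence
# length n, a depthwise convolution grows linearly. Constants are toy.
def attention_cost(n: int, dim: int) -> int:
    return n * n * dim              # ~O(n^2 * d) score/value matmuls

def conv_cost(n: int, dim: int, kernel: int = 64) -> int:
    return n * dim * kernel         # ~O(n * d * k)

for n in (256, 512, 1024):
    ratio = attention_cost(n, 2048) / conv_cost(n, 2048)
    print(f"n={n}: attention is ~{ratio:.0f}x the conv cost")
# n=256: ~4x, n=512: ~8x, n=1024: ~16x; the gap widens with context length.
```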
07:18 The kicker is that Liquid AI engineered their gated convolutions so they still capture long-range dependencies, something older conv models struggled with.
07:27 That's how they kept or improved the perplexity numbers without falling back on heavy attention.
07:32 All that said, Liquid AI isn't keeping this in a vault.
07:35 They've already stated, loudly, that they plan to open-source Hyena Edge and a series of Liquid Foundation Models in the coming months.
07:43 If you're like me, that sentence feels like Christmas morning.
07:46 It means developers will get a turnkey model that can run natively on stuff like the S24 Ultra, maybe the iPhone 16 Pro, maybe even a Raspberry Pi if you dare, without requiring a cloud subscription or a 5-watt charger.
07:59 And because it's open, we'll see a thousand forks: someone will quantize it to 4 bits, someone else will fine-tune it for coding assistance, another team will port it to watchOS.
08:09 Mark my words.
08:10 More broadly, this is part of a bigger trend.
08:12 We're stepping into a post-transformer world, or at least a poly-architecture ecosystem.
08:18 Transformers are still unbeatable for heavy GPU jobs, though.
08:22 But when it comes to edge devices, where every bit of energy matters, alternatives like convolutions, recurrent models, and even state-space models are finally getting their moment.
08:32 And with smart tools like STAR doing the architecture search, we're moving faster than manual tweaking ever allowed.
08:37 The best part? Everything's tested directly on real devices like smartphones, not just on GPUs in a lab, so what works in theory also works in your hand.
08:48 Zooming out a little: phones now have powerful NPUs, laptops are shipping with crazy AI accelerators, and there's pressure to keep AI local for privacy.
08:57 Models that can crush benchmarks like Lambada at a perplexity of 9.4, use less RAM, and respond 30% faster could be exactly what tips the scale.
09:06 Running everything on-device just feels better.
09:09 No lag, no cloud dependency, no leaking personal data when you're offline.
09:14 Credit where it's due.
09:15 The team behind this includes Armin Thomas, Stefano Massaroli, Michael Poli, and the rest of the Liquid AI crew.
09:23 They built on the Hyena X and Hyena Y work, plus tweaks like SwiGLU, and of course the original Transformer ideas.
09:30 It's less about one-off brilliance and more about letting algorithms evolve designs smarter than we could by hand.
09:39 One quick note: early versions of Hyena Edge were tested at a width of 512 before scaling up to 2048 for the final model, keeping the attention head size at 64.
09:49 During STAR's evolution runs, they estimated whole-model latency and memory by adding up pre-measured operator stats, which let them move fast without wasting full training cycles.
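
That additive estimation trick is worth a quick sketch: profile each operator once on the target phone, then score any candidate as the sum of its parts. The numbers and names below are placeholders of my own, not Liquid AI's measurements:

```python
# Additive latency estimation: profile each operator once on-device,
# then estimate any candidate architecture as a sum of its parts.
# All latencies below are made-up placeholders, not real measurements.
measured_ms = {
    "hyena_y": 0.8,   # per-block latency profiled once on the phone
    "gqa": 1.9,
    "swiglu": 0.6,
}

def estimate_latency(arch: list[str]) -> float:
    return sum(measured_ms[op] for op in arch)

candidate = ["hyena_y"] * 21 + ["gqa"] * 11 + ["swiglu"] * 32
print(f"estimated latency: ~{estimate_latency(candidate):.1f} ms")
```

Cheap estimates like this let the search score whole generations in seconds, with full on-device runs reserved for the finalists.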
09:59 They even visualized it.
10:01 Watching those models slide toward the bottom-left corner, where low latency meets low perplexity, was like watching a stock ticker, only cooler.
10:10 So, where are we headed?
10:12 If the last few years belonged to Transformers, the next could be ruled by automated architecture search, hybrid models, and real, practical edge AI.
10:21 Hyena Edge proves you can rip out most of the attention, still match or beat quality, and get way faster on real-world devices.
10:29 And since Liquid AI plans to open-source it, they're inviting everyone to take it even further.
10:34 Is this the future: powerful AI running straight from your pocket with no cloud in sight?
10:41 Or are we just dreaming too big too soon?
10:44 Thanks for watching and I'll catch you in the next one.
