The Download: sycophantic LLMs, and the AI Hype Index

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

This benchmark used Reddit’s AITA to test how much AI models suck up to us

Back in April, OpenAI announced it was rolling back an update to its GPT-4o model that made ChatGPT’s responses to user queries too sycophantic.

An AI model that acts in an overly agreeable and flattering way is more than just annoying. It can reinforce users’ incorrect beliefs, mislead people, and spread misinformation that can be dangerous. That is a particular risk when growing numbers of young people are using ChatGPT as a life advisor. And because sycophancy is difficult to detect, it can go unnoticed until a model or update has already been deployed.

A new benchmark called Elephant that measures the sycophantic tendencies of major AI models could help companies avoid these issues in the future. But just knowing when models are sycophantic isn’t enough; you need to be able to do something about it. And that’s trickier. Read the full story.

—Rhiannon Williams

The AI Hype Index

Separating AI reality from hyped-up fiction isn’t always easy. That’s why we’ve created the AI Hype Index: a simple, at-a-glance summary of everything you need to know about the state of the industry. Check out this month’s edition of the index here.

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.

1 Anduril is partnering with Meta to build an advanced weapons system
EagleEye’s VR headsets will enhance soldiers’ hearing and vision. (WSJ $)
+ Palmer Luckey wants to turn “warfighters into technomancers.” (TechCrunch)
+ Luckey and Mark Zuckerberg have buried the hatchet, then. (Insider $)
+ Palmer Luckey on the Pentagon’s future of mixed reality. (MIT Technology Review)

2 A new Texas law requires app stores to verify users’ ages
It follows in the footsteps of Utah, which passed a similar bill in March. (NYT $)
+ Apple has pushed back on the law. (CNN)

3 What happens to DOGE now?
It has lost its leader and a top lieutenant within the space of a week. (WSJ $)
+ Musk’s departure raises questions over how much power it will wield without him. (The Guardian)
+ DOGE’s tech takeover threatens the safety and stability of our critical data. (MIT Technology Review)

4 NASA’s ambitions of a 2027 moon landing are looking less likely
It needs SpaceX’s Starship, which keeps blowing up. (WP $)
+ Is there a viable alternative? (New Scientist $)

5 Students are using AI to generate nude images of each other
It’s a grave and growing problem that no one has a solution for. (404 Media)

6 Google AI Overviews doesn’t know what year it is
A year after its introduction, the feature is still making obvious mistakes. (Wired $)
+ Google’s new AI-powered search isn’t fit to handle even basic queries. (NYT $)
+ The company is pushing AI into everything. Will it pay off? (Vox)
+ Why Google’s AI Overviews gets things wrong. (MIT Technology Review)

7 Hugging Face has created two humanoid robots
The machines are open source, meaning anyone can build software for them. (TechCrunch)

8 A popular vibe coding app has a major security flaw
Despite being notified about it months ago. (Semafor)
+ Any AI coding program catering to amateurs faces the same problem. (The Information $)
+ What is vibe coding, exactly? (MIT Technology Review)

9 AI-generated videos are becoming much more realistic
But not when it comes to depicting gymnastics. (Ars Technica)

10 This electronic tattoo measures your stress levels
Consider it a mood ring for your face. (IEEE Spectrum)

Quote of the day

“I think finally we’re seeing Apple being dragged into the child safety arena kicking and screaming.”

—Sarah Gardner, CEO of child safety collective Heat Initiative, tells the Washington Post why Texas’ new app store law could signal a turning point for Apple.

One more thing

House-flipping algorithms are coming to your neighborhood

When Michael Maxson found his dream home in Nevada, it was not owned by a person but by a tech company, Zillow. When he went to check out the property, however, he discovered it had been damaged by a huge water leak. Despite offering to handle the costly repairs himself, Maxson discovered that the house had already been sold to another family, at the same price he had offered.

During this time, Zillow lost more than $420 million in three months of erratic house buying and unprofitable sales, leading analysts to question whether the entire tech-driven model is fundamentally viable. For the rest of us, a bigger question remains: Does the arrival of Silicon Valley tech point to a better future for housing or an industry disruption to fear? Read the full story.

—Matthew Ponsford

We can still have nice things

A place for comfort, fun and distraction to brighten up your day. (Got any ideas? Drop me a line or skeet ’em at me.)

+ A 100-mile real-time ultramarathon video game that lasts anywhere up to 27 hours is about as fun as it sounds.
+ Here’s how edible glitter could help save the humble water vole from extinction.
+ Cleaning giant statues is not for the faint-hearted ($)
+ When is a flute teacher not a flautist? When he’s a whistleblower.