The Daily AI Show: Issue #38
The Quantum Leap... Of Faith?

Welcome to #38
In this issue:
ChatGPT 4o Evolves: Better Search, Broken GPTs, and a New Memory Update
Top Scores + X Data: Grok 3 Challenges the Leaders
AI Memory is Changing, But Do We Really Want Total Recall?
Plus, we discuss the quantum question of faith, Figure has new robots and they know where the food goes, why AI general reasoning might actually be a good thing, a reason to launch “lemonade”, riding the AI hype train, and all the news we found interesting this week.
It’s Sunday morning!
AI is advancing faster than your weekend plans are disappearing, so let’s get into it!
The DAS Crew - Andy, Beth, Brian, Eran, Jyunmi, and Karl
Why It Matters
Our Deeper Look Into This Week’s Topics
Better Search, Broken GPTs, and a New Memory Update: ChatGPT 4o Evolves
OpenAI quietly rolled out an update to ChatGPT 4o, catching users off guard with unannounced changes. Sam Altman confirmed the update in a casual social media post, stating that the model is “pretty good” and will continue improving. While some users immediately noticed improvements in creativity, conversational flow, and web search quality, others found that the update broke their Custom GPTs, disrupting workflows that relied on multi-step processes.
The update also introduced a memory refresh and moved the knowledge cutoff to June 2024, raising questions about OpenAI’s approach to iterative releases. Instead of major version launches, OpenAI seems to be shifting toward stealth updates that deliver continuous refinements. Arriving without warning, though, these updates can create unintended disruptions for users whose applications depend on stable model behavior.
WHY IT MATTERS
Unannounced Updates Can Break Workflows: Users running Custom GPTs for business applications found their tools malfunctioning overnight, exposing unexpected risks in OpenAI’s stealth update strategy.
Better Conversational Flow: Many users reported a noticeable improvement in how ChatGPT 4o responds, with more natural phrasing and enhanced creativity.
Search Functionality is Stronger: Altman claimed ChatGPT now provides the best AI-powered search, sparking a debate with Perplexity’s CEO over whose model truly delivers better results.
Memory and Context Advances May Improve Consistency: With an updated knowledge cutoff date and potential enhancements to how ChatGPT recalls past conversations, users may experience more reliable long-term interactions.
AI’s Iterative Release Model is Taking Shape: OpenAI appears to be moving toward a continuous deployment strategy, meaning users should expect frequent, silent updates rather than milestone-based releases.
Top Scores + X Data: Grok 3 Challenges the Leaders
Elon Musk has called Grok 3 the smartest AI on Earth, but early reactions suggest it may not be the breakthrough he claims. The model was released with little warning, landing for X Premium+ subscribers just days after the announcement. While its reasoning benchmarks are impressive, some users are questioning how meaningful these numbers really are in real-world applications.
Grok 3’s updates focus heavily on math, coding, and scientific reasoning, mirroring trends from Anthropic, OpenAI, and Google. Initially released only to X Premium+ subscribers, Grok 3 was opened on Friday to all users on X, publicly through the x.ai website, and via mobile apps on iOS and Android. The new Grok 3 includes a DeepSearch mode (to compete with the Deep Research features from OpenAI, Google, and Perplexity) and a “Thinking” mode. Grok’s unique access to X (formerly Twitter) posts suggests it may be especially well-positioned for social media and news-based queries.
WHY IT MATTERS
Benchmark Scores Look Great, But Real-World Use is Unclear: Grok 3 posted top-tier reasoning scores, but its practical benefits over competitors remain to be tested.
Faster Development Cycles Could Be a Strength: xAI moved from Grok 2 to Grok 3 in just eight months, setting expectations for rapid iteration compared to competitors. xAI’s “Colossus” data center, coordinating 200,000 GPUs, cut the training time needed to reach frontier-model performance.
Integration with X Data Could Be a Competitive Edge: Grok 3 has access to X’s massive dataset, giving it a potential advantage in real-time news analysis and trend tracking.
The Next Model is Already in Training: Musk confirmed that Grok 4 is already in development, suggesting xAI is pushing hard to keep pace with OpenAI, Google, and Anthropic.
AI Memory is Changing, But Do We Really Want Total Recall?
AI memory is evolving, but do we actually want our language models to remember everything we have said? Current models struggle with short context windows, hallucinations, and static knowledge, and they aren’t designed to give hundreds of millions of users meaningful long-term memory of their exchanges. New approaches such as Google’s Titans, MemoryBank, and MemoryLLM aim to create more persistent, structured AI memory, but this raises important questions. Google’s Gemini Advanced can now remember your past conversations (see the news item below).
Persistent memory could allow AI to remember preferences, workflows, and past interactions, reducing repetitive prompting and improving efficiency. However, unrestricted memory could also lead to unintended biases, privacy concerns, and increased computational costs. The ideal AI memory may need to balance short-term context, long-term storage, and adaptive forgetting, similar to how human memory functions.
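To make that balance concrete, here is a minimal, purely illustrative Python sketch of adaptive forgetting: memories carry an importance weight that decays with age, so recall favors recent or important items. None of the systems named above publish their internals, so the MemoryStore class and its parameters are hypothetical stand-ins, not anyone’s actual implementation.

```python
import time

class MemoryStore:
    """Toy long-term memory with exponential decay (hypothetical sketch)."""

    def __init__(self, half_life_days: float = 30.0):
        # A memory's effective weight halves every `half_life_days`.
        self.half_life_secs = half_life_days * 86_400
        self.items: list[dict] = []

    def remember(self, text: str, importance: float = 1.0) -> None:
        self.items.append(
            {"text": text, "importance": importance, "stored_at": time.time()}
        )

    def recall(self, top_k: int = 3) -> list[str]:
        # Score = importance decayed by age, so stale, low-importance
        # memories sink below newer or weightier ones and stop surfacing.
        now = time.time()

        def score(item: dict) -> float:
            age = now - item["stored_at"]
            return item["importance"] * 0.5 ** (age / self.half_life_secs)

        ranked = sorted(self.items, key=score, reverse=True)
        return [item["text"] for item in ranked[:top_k]]
```

A real system would also need retrieval by relevance to the current query, not just recency and importance, plus explicit user controls for inspecting and deleting memories, which is where the privacy concerns below come in.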
WHY IT MATTERS
Memory Impacts Workflow Automation: AI that remembers past tasks and actions could reduce friction in repetitive workflows, especially for business applications.
Forgetting is Just as Important as Remembering: Human memory decays naturally, and AI systems may need selective forgetting to avoid accumulating outdated or irrelevant information.
Context Windows Are Still a Bottleneck: Even with larger context limits, current AI models require manual summaries or workarounds to retain ongoing context-awareness in long multi-turn conversations.
Ethical and Security Risks Exist: Unregulated AI memory could introduce privacy issues if models retain sensitive user data without proper safeguards.
Next-Gen Models are Experimenting with Hybrid Approaches: Systems like Google’s Titans and MemoryLLM aim to mix long-term retention, short-term recall, and surprise-based learning, mimicking human cognitive functions.
Did you know?
MIT researchers have developed a new AI approach that allows large language models to reason across different types of data in a more generalized way. Unlike traditional AI models that are trained for specific tasks, this method enables AI to analyze text, images, and structured data in a unified way, making it far more versatile.
The research team found that when large language models are fine-tuned with a mixture of different data formats, they can solve problems that would typically require multiple specialized AI systems. A single AI model could analyze financial reports, interpret medical scans, and summarize legal documents without needing to be retrained for each task.
This breakthrough could improve AI-powered decision-making across industries. Businesses could automate tasks more efficiently, researchers could process complex data more effectively, and medical diagnostics could benefit from AI that can connect insights across multiple sources. The study shows how AI is evolving from task-specific tools into general reasoning systems capable of handling a wide range of challenges.
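As a purely hypothetical illustration of what “a mixture of different data formats” could look like as fine-tuning data, here is a sketch in Python: one unified instruction schema spanning text, image, and table tasks. The field names and examples are invented for illustration; the paper’s actual data pipeline is not described here.

```python
# Hypothetical mixed-format fine-tuning records. Each record maps a task
# and its (possibly non-text) input to a target answer, so a single model
# sees financial text, medical images, and structured tables side by side.
mixed_training_data = [
    {
        "task": "summarize_text",
        "input": "Q3 revenue rose 12% year over year, driven by services...",
        "target": "Revenue grew 12% YoY, led by the services segment.",
    },
    {
        "task": "describe_image",
        "input": {"image_path": "scans/chest_xray_0041.png"},
        "target": "No acute cardiopulmonary abnormality identified.",
    },
    {
        "task": "answer_from_table",
        "input": {
            "table": [["clause", "risk"], ["auto-renewal", "high"]],
            "question": "Which clause carries high risk?",
        },
        "target": "The auto-renewal clause.",
    },
]
```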

Heard Around The Slack Cooler
The conversations we are having outside the live show
Karl shares the hype:
We are expecting 4.5 soonish (or as Beth said, “between next week and Summer”). But we are also expecting a new model from Anthropic and possible new updates from Google and Meta.
It is shaping up to be an interesting first half of 2025.
Whatever comes, we can probably thank DeepSeek for pushing the timelines forward a bit.

Chatbot Builder 2.0 = Lemonade
Andy shared this TikTok from Vanessa about Launch Lemonade. In the video, she builds a quick chatbot in about 25 minutes. What stood out to us, though, was the addition of knowledge documents and the way the chatbot becomes part of a larger workflow automation.
Check out the video by clicking on the image below.
Consensus@64 vs Pass@64
Karl shared this thread from X about model testing.
When AI models are benchmarked, evaluators often use metrics like consensus@64 or pass@64 to measure accuracy. Both involve running the same query through the model 64 times and then scoring the results. Consensus@64 marks the output correct only if the most common answer across the 64 runs (the consensus) is correct. This filters out random mistakes and makes the evaluation more stable.
Pass@64, on the other hand, is more lenient: the query counts as correct if any of the 64 outputs contains a correct answer. This is useful for problems with multiple correct answers, or where a model might occasionally guess right. Why 64 samples? Results tend to stabilize around that count, so running more trials wouldn’t change the score much, and the method is simple to implement and works even for weaker models.
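To make the difference concrete, here is a minimal Python sketch of both metrics on toy data. Real evaluation harnesses also normalize answers before comparing (stripping whitespace, canonicalizing numbers, and so on), which is omitted here.

```python
from collections import Counter

def consensus_at_k(samples: list[str], correct: str) -> bool:
    """Correct only if the most common of the k sampled answers is correct."""
    most_common_answer, _count = Counter(samples).most_common(1)[0]
    return most_common_answer == correct

def pass_at_k(samples: list[str], correct: str) -> bool:
    """Correct if any one of the k sampled answers is correct."""
    return correct in samples

# 64 toy samples of one query: the right answer appears, but isn't the majority.
samples = ["42"] * 30 + ["41"] * 34
print(consensus_at_k(samples, "42"))  # False: "41" wins the vote 34 to 30
print(pass_at_k(samples, "42"))       # True: "42" shows up at least once
```

The toy run shows why consensus@k is the stricter, more stable headline number, while pass@k rewards a model that can find the right answer at least once.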

This Week’s Conundrum
A difficult problem or question that doesn't have a clear or easy solution.
The Quantum Leap of Faith
Quantum computers may soon solve problems that humans cannot verify. They may uncover new physical laws, optimize complex systems, or make discoveries that defy traditional proof. But what happens when these machines produce answers that we cannot explain, replicate, or even comprehend?
For centuries, science has relied on human understanding, skepticism, and proof. Quantum computing challenges that foundation by producing knowledge we must accept without fully grasping how or why it is true. This shifts science away from something humans can verify toward something we must trust. If we accept answers simply because an advanced system says they are correct, then knowledge itself may start to resemble belief.
The conundrum: If quantum computers reveal truths beyond human comprehension, does accepting their answers require a kind of faith in machines? And if science begins to operate on trust rather than understanding, have we advanced knowledge or abandoned it?
News That Caught Our Eye
Humane Officially Shuts Down as HP Buys Assets for $116 Million
Humane, the company behind the AI Pin, has officially shut down, selling its remaining assets to HP for $116 million. The deal includes over 300 patents and key technical staff, but Humane’s AI cloud services will shut down within days, rendering existing devices useless. The AI Pin’s failure was largely attributed to poor user experience, high return rates, and negative reviews, despite price cuts and attempts to rebrand the device as an AI operating system rather than a wearable.
Deeper Insight:
Humane’s downfall highlights a key lesson in AI hardware adoption: tech that promises to replace smartphones must actually be better than smartphones. The AI Pin never delivered on its vision, and its failure could slow down AI-first wearables development in the short term. HP’s acquisition suggests that some of Humane’s tech, possibly its AI OS and patents, may live on in future HP products like laptops, printers, or smart assistants.
Mira Murati, Former OpenAI CTO, Launches Thinking Machines Lab
Mira Murati has announced her new AI startup, Thinking Machines Lab, which aims to build customizable AI systems that adapt to users' unique needs. The company’s website is minimal, featuring a retro typewriter-style aesthetic, leading to debate over whether this is a branding choice or just a placeholder.
Deeper Insight:
Murati’s new venture is part of a broader trend of OpenAI alumni launching competing AI companies, including Safe Superintelligence (SSI), the startup from Ilya Sutskever, OpenAI co-founder and former chief scientist. Thinking Machines Lab may focus on modular AI tools for enterprises, mirroring Perplexity’s approach to research AI, but with a broader scope. Whether it becomes an OpenAI rival or a niche AI provider remains to be seen.
MIT-Harvard Breakthrough in Quantum Computing Eliminates Need for Extreme Cooling
A team of researchers from MIT and Harvard has spun out a startup that unveiled a neutral-atom quantum computing method, which eliminates the need for extreme cooling. Unlike traditional quantum computers that require near-absolute-zero temperatures, the new system traps rubidium atoms with laser light to store quantum information. The approach allows for scalable quantum computing without bulky refrigeration units, making it a potentially significant advance in the field.
Deeper Insight:
If this technique proves scalable, it could dramatically accelerate quantum computing adoption, reducing energy costs and making quantum systems more practical for industries beyond academia. While IBM and Google focus on supercooled superconducting qubits, this method might allow for smaller, more efficient quantum computers that fit in data centers rather than research labs.
The New York Times Uses AI While Suing OpenAI
Despite its lawsuit against OpenAI for copyright infringement, The New York Times has rolled out its own AI tool, Echo, to assist journalists with editing, summarization, and social media copywriting. The tool is designed to help reporters, but it is unclear how much AI-edited content will be allowed in published articles.
Deeper Insight:
This move highlights the tension between media companies and AI. While The New York Times fights against AI scraping its content, it also recognizes AI’s value in news production. This raises an ethical question: if AI tools are beneficial to journalism, will future lawsuits focus more on compensation for training data than on outright AI bans?
Google’s Gemini Advanced Can Now Remember Past Conversations
Google has announced that Gemini Advanced can now reference previous conversations, allowing for longer-term memory in AI interactions. This feature enables users to pick up conversations where they left off, reducing the need for repetitive context-setting.
Deeper Insight:
This upgrade signals a shift toward AI that functions more like a personal assistant rather than a chatbot. Unlike OpenAI’s memory feature, which is still in limited rollout, Gemini Advanced seems to be more context-aware across multiple conversations. However, concerns over data privacy and selective memory retrieval remain, as AI memory needs to balance convenience with user control and transparency.
Meta Enters the Humanoid Robot Race with Figure and Unitree Partnerships
Meta has confirmed its entry into humanoid robotics, partnering with Figure AI and Unitree Robotics to develop AI-driven household robots. While Meta does not plan to build its own robots, it will focus on integrating AI assistants into third-party hardware.
Deeper Insight:
This marks the latest step in Big Tech’s push toward embodied AI. With OpenAI and Tesla also investing in robotics, Meta’s focus on AI integration suggests it could become the software backbone for future personal robots. However, consumer adoption of humanoid robots is still years away, and privacy concerns over AI-powered home assistants could slow progress.
Google Introduces AI Co-Scientist to Accelerate Research
Google has launched AI Co-Scientist, an AI-powered tool that helps researchers generate and refine scientific hypotheses. The system, built on Gemini 2.0, can analyze research papers, generate experiments, and predict outcomes based on existing data. Early tests show that AI Co-Scientist accelerates lab research significantly, with scientists at Imperial College London using it to complete experiments in days that would have taken years.
Deeper Insight:
AI’s role in scientific discovery is evolving beyond data analysis into actual hypothesis generation. This could reshape drug discovery, material science, and theoretical physics by reducing trial-and-error research cycles. However, widespread adoption may require new ethical guidelines to ensure AI-generated hypotheses are rigorously validated before being acted upon.
Saronic Raises $600 Million for AI-Powered Military Vessels
Saronic, a startup specializing in AI-driven autonomous surface vessels, has raised $600 million at a $4 billion valuation. These vessels, capable of surveillance, reconnaissance, and combat, are designed for independent operation in high-risk environments.
Deeper Insight:
Autonomous military AI is moving beyond drones and cyber warfare into naval operations. With growing concerns over AI-controlled weapons, this raises urgent questions about autonomous warfare, military accountability, and global AI arms control.
OpenAI’s New Poison Pill Blocks Elon Musk Takeover
OpenAI has implemented a corporate defense mechanism that prevents major investors, including Elon Musk, from taking control of the company’s nonprofit board. This move, often referred to as a poison pill, ensures that the nonprofit arm retains ultimate control over OpenAI’s for-profit ventures.
Deeper Insight:
This is a direct response to Musk’s failed $97 billion bid to take over OpenAI. With the company now valued at $260 billion, OpenAI is reinforcing its independence amid growing investor influence. Whether this move protects OpenAI’s mission or simply secures Altman’s leadership remains to be seen.
Did You Miss A Show Last Week?
Enjoy the replays on YouTube or take us with you in podcast form on Apple Podcasts or Spotify.