The Last Six Months in LLMs: From Coding Agents to Laptop Models

Trends Reporter

A comprehensive overview of the major developments in Large Language Models from November 2025 to May 2026, highlighting the inflection point in coding capabilities, emergence of personal AI assistants, and surprising advancements in open-weight models.

The past six months in Large Language Models have moved quickly, and November 2025 stands out as the inflection point. The period brought not just incremental improvements but qualitative shifts in how developers use AI systems.

The November Inflection Point

November 2025 was the month in which the title of "best" model passed through five releases from the three major providers. At the start of the month, Claude Sonnet 4.5, released on September 29th, held the crown. It was overtaken by GPT-5.1, then Gemini 3, then GPT-5.1 Codex Max, and finally Anthropic reclaimed the top position with Claude Opus 4.5. While Gemini 3 reportedly drew the best pelican riding a bicycle (a whimsical benchmark the author uses), most practitioners would agree that Opus 4.5 held the lead for the following couple of months.

The real significance of November, however, wasn't just the model leadership changes but the dramatic improvement in coding agents. OpenAI and Anthropic had invested heavily in Reinforcement Learning from Verifiable Rewards throughout 2025 to enhance code quality when paired with their Codex and Claude Code agent harnesses. By November, these efforts bore fruit, with coding agents transitioning from "often-work" to "mostly-work." This crossing of a quality barrier meant developers could use these systems as daily drivers for real work without constantly fixing fundamental errors.
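To make "verifiable rewards" concrete: the reward comes from a check that a program can run, such as a test suite, rather than from a human rating. The following is a minimal Python sketch of that idea, not a description of how OpenAI or Anthropic actually train their models. The function name `verifiable_reward` and the toy `add` task are hypothetical, and a real pipeline would run candidates in a sandbox rather than calling `exec` directly.

```python
from typing import Any, Dict, List, Tuple

# Each test case pairs positional arguments with the expected return value.
TestCase = Tuple[Tuple[Any, ...], Any]


def verifiable_reward(candidate_source: str, func_name: str,
                      cases: List[TestCase]) -> float:
    """Return 1.0 if the candidate passes every test case, else 0.0.

    Hypothetical helper for illustration only. exec() on untrusted
    model output is unsafe; a real system would isolate this step.
    """
    namespace: Dict[str, Any] = {}
    try:
        exec(candidate_source, namespace)  # define the candidate function
        func = namespace[func_name]
        for args, expected in cases:
            if func(*args) != expected:
                return 0.0  # wrong answer: no reward
    except Exception:
        return 0.0  # syntax error, crash, or missing function: no reward
    return 1.0


# Toy task: two model "completions" for an add(a, b) function.
cases = [((1, 2), 3), ((-1, 1), 0)]
good = "def add(a, b):\n    return a + b\n"
bad = "def add(a, b):\n    return a - b\n"

print(verifiable_reward(good, "add", cases))  # 1.0
print(verifiable_reward(bad, "add", cases))   # 0.0
```

The reward is binary and fully automatic, which is what allows it to be computed at scale. Note that it only verifies what the tests cover.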

The Rise of Coding Agents

The improvement in coding capabilities wasn't merely theoretical. Practitioners reported tangible productivity gains as AI assistants became more reliable for complex programming tasks. The shift was particularly noticeable in debugging, code refactoring, and implementing novel algorithms—areas where earlier models frequently produced syntactically correct but semantically flawed solutions.

"We've reached a point where the AI can understand not just what code does, but why it's structured a certain way," noted one developer on social media. "The explanations for suggested improvements actually make sense now."

However, some skeptics cautioned against over-reliance on these systems. "Just because the code looks correct doesn't mean it is," warned a senior engineer at a major tech company. "We're still seeing subtle race conditions and edge cases that the models miss, especially in distributed systems."

The Emergence of Personal AI Assistants

While coding agents were improving in November, another development was quietly beginning: the first commit to a project that would eventually become OpenClaw. What started as an obscure repository by developer Pete underwent several name changes before exploding in popularity in February under its final name.

OpenClaw represents a new category of "personal AI assistants"—collectively referred to as "Claws"—that have gained remarkable traction in just a few months. The adoption has been so rapid that Mac Minis began selling out in Silicon Valley, with some jokingly calling them "the new digital pets" and "perfect aquariums for your Claw."

The author offers an intriguing metaphor comparing these assistants to Doc Ock's AI-powered claws from Spider-Man 2: "They're perfectly safe provided nothing damages the inhibitor chip... after which they turn evil and take over." This playful warning highlights legitimate concerns about the control and safety of increasingly autonomous AI systems.

The community remains divided on the utility of these personal assistants. Proponents argue they represent the natural evolution of AI from task-specific tools to integrated personal companions. Detractors counter that they're essentially repackaged versions of existing technologies with minimal innovation, creating unnecessary complexity for solving problems that could be addressed more simply.

Advancements in Open-Weight Models

Perhaps the most surprising development has been the rapid improvement in open-weight models, including ones that can run on consumer hardware. In April, Google released the Gemma 4 series, which the author describes as "the most capable open weight models I've seen from a US company." Chinese AI lab Z.ai released GLM-5.1, a 1.5TB open-weight model that delivers impressive performance but requires substantial hardware resources.

Even more noteworthy is the emergence of models like Qwen3.6-35B-A3B, a 20.9GB open-weight model that reportedly outperforms Claude Opus 4.7 on the author's pelican benchmark when running on a laptop. This represents a significant leap in capability for models that don't require enterprise infrastructure.
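A rough back-of-the-envelope check shows why a file of that size fits on a laptop. The sketch below assumes, based on the model name alone, that "35B" means about 35 billion total parameters, and that 1GB is 10^9 bytes; neither is confirmed by the article. Under those assumptions it estimates the average storage per weight and compares the file with an unquantized 16-bit copy.

```python
def bits_per_weight(file_size_gb: float, params_billions: float) -> float:
    """Average bits stored per parameter (GB taken as 10**9 bytes)."""
    return file_size_gb * 8 / params_billions


def size_gb_at(params_billions: float, bits: float) -> float:
    """File size in GB if every parameter used `bits` bits."""
    return params_billions * bits / 8


# Assumed figures: 20.9 GB on disk, 35 billion parameters.
print(round(bits_per_weight(20.9, 35), 2))  # 4.78
print(size_gb_at(35, 16))                   # 70.0
```

If the assumptions hold, the download averages just under 5 bits per weight, which would be consistent with a quantized build rather than a 16-bit one of roughly 70GB.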

"The gap between frontier models and what can run locally is narrowing faster than anyone expected," observed one AI researcher. "We're reaching a point where personal devices can host models capable of complex reasoning that would have required massive clusters just a year ago."

However, others caution that these benchmarks can be misleading. "The pelican test has exceeded its usefulness as a meaningful benchmark," suggests a model evaluation specialist. "It's become more of a meme than a serious assessment of model capabilities."

Community Sentiment and Adoption Signals

The adoption patterns reveal interesting shifts in how developers are integrating these technologies. The holiday period from December to January saw a surge of experimentation, with many developers embarking on ambitious projects to test the limits of new models and coding agents. The author admits to their own "short-lived bout of a form of LLM psychosis" during this time, creating projects like micro-javascript—a JavaScript implementation in Python that, while technically impressive, served no practical purpose.

This pattern of enthusiastic experimentation followed by more pragmatic adoption mirrors previous technology cycles. The initial hype gives way to more realistic assessments of utility and limitations.

Counter-Perspectives and Challenges

Despite the rapid progress, significant challenges remain. The author's own experience with abandoned holiday projects serves as a reminder that technical capability doesn't always translate to practical value. Many AI-generated solutions solve problems that don't actually need solving or introduce unnecessary complexity.

Privacy and security concerns also persist with personal AI assistants. The "Claw" metaphor's darker undertone reflects legitimate worries about autonomous systems that could potentially act against their users' interests if safeguards fail.

Additionally, the resource requirements of the largest open-weight models remain prohibitive for many. GLM-5.1 is an impressive achievement, but at 1.5TB it is beyond the reach of most individual developers and small organizations.
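To put the size gap in numbers, the sketch below counts how many machines would be needed just to hold each model's weights in memory. The 64GB per-machine figure is a hypothetical budget chosen for illustration, 1TB is taken as 1000GB, and the calculation ignores the working memory needed for activations and context, so real requirements would be higher.

```python
import math


def machines_needed(model_size_gb: float, memory_per_machine_gb: float) -> int:
    """Minimum machine count whose combined memory holds the weights."""
    return math.ceil(model_size_gb / memory_per_machine_gb)


MEMORY_GB = 64  # hypothetical per-machine memory budget

print(machines_needed(1500, MEMORY_GB))  # 24  (GLM-5.1 at 1.5TB)
print(machines_needed(20.9, MEMORY_GB))  # 1   (Qwen3.6-35B-A3B at 20.9GB)
```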

The Path Forward

The past six months have shown AI development accelerating on several fronts at once: model capabilities, coding assistance, personal AI integration, and efficient deployment. Together, these trends suggest a new phase in which AI is integrated into daily workflows rather than remaining a specialized tool.

As the author's pelican benchmark becomes increasingly inadequate for distinguishing model capabilities, we may need to develop more nuanced evaluation methods that better reflect real-world utility. The future of AI likely lies not just in raw power but in specialized, efficient systems that can operate effectively on diverse hardware platforms.

The pace of change shows no sign of slowing. Each new development challenges developers to adapt their approaches and expectations. The next six months promise further innovations, as well as a more refined understanding of how these systems can best serve human needs.

Featured image: The evolving landscape of AI capabilities over the past six months.
