This Was a Big Week for AI
New benchmarks, smarter reasoning, and major moves from OpenAI, Google, and Anthropic.


Monday AI & Tech Brief
Your theme-based roundup of what really moved AI today
Welcome to this Monday's AI & Tech News edition. Last week quietly turned into a benchmark war, with major labs racing to prove who really leads in reasoning, coding, and research-grade AI. Here's what matters.
Model Wars: Benchmarks Are Heating Up
Anthropic's Claude Opus 4.5 is currently leading the charts. The model scored 80.9% on SWE-bench Verified, becoming the first AI system to cross the 80% mark. More interestingly, Anthropic claims it outperformed human engineers in controlled coding tests, something few labs openly state.
OpenAI responded fast. On December 9, it released GPT-5.2, available in Instant, Thinking, and Pro variants. Internally, this was reportedly a "code red" move after Google's Gemini 3 announcement. OpenAI claims GPT-5.2 tops several benchmarks, including 55.6% on SWE-Bench Pro, positioning it as a strong competitor in real-world software tasks.
Google, meanwhile, upgraded its Gemini Deep Research agent, posting notable gains like 46.4% on HLE and 66.1% on DeepSearchQA. The bigger update? Google has now opened this research agent to third-party developers via AI Studio, signalling a push beyond first-party products.
AI in Science: From Papers to Practice
A new platform called SciSciGPT has debuted, focusing on human-AI collaboration in scientific research. Instead of replacing scientists, it uses models like Claude to assist with data analysis, hypothesis exploration, and research navigation. This feels like a shift away from "AI replaces researchers" toward "AI works alongside them."
Adding to this, Nature published studies showing how AI can model brain and behavioural dynamics. One tool, AmadeusGPT, allows researchers to interactively analyse animal behaviour, something that would traditionally take weeks of manual work.
Reasoning & Intelligence Milestones
Google has rolled out Gemini 3 Deep Think mode for subscribers. This version is designed for long-form reasoning, and early results show strong performance in mathematics, science, and logic-heavy tasks. It's clearly aimed at users who want fewer fast answers and more thought-out ones.
In another interesting development, researchers found that GPT-4.5 was judged to be the human 73% of the time in short Turing Test conversations, more often than the actual human participants. While this doesn't prove AGI, it does suggest that conversational intelligence is now crossing human-like thresholds in specific settings.
Open Source & Strategy Shifts
The original authors of the Transformer paper have launched Rnj-1, a new open-source AI model trained entirely from scratch. The goal is explicit: strengthen US-led AI development amid rising global competition, particularly from China.
Meanwhile, OpenAI has reportedly paused expansion plans around ads, shopping agents, and its Pulse assistant. The company is instead refocusing on core ChatGPT improvements, likely due to increasing pressure from Google, Anthropic, and open-source alternatives.
The Prompt Library
Stop reinventing the wheel every time you need to write something.

Geekflare Connect comes with a curated prompt library with thousands of ready-to-use prompts for common business tasks like marketing, sales, customer support, HR, finance, and more. No need to start from scratch.
How it works:
Browse the library and find a prompt that matches your task
Load it into your chat
Customize the variables (change names, dates, topics, etc.)
Run it on your preferred AI model
Save your own prompts and share them with your team for consistency
Cheers,
Keval, Editor