Reflect 25 Ai
AI in 2025 from a Twitter user’s perspective
INCOMPLETE,
Imagine being there when the first seed was planted, or when the first branch was pulled from a brushfire on the savanna. We can look back now and see how those events changed us. But exponential growth can’t be grasped when it’s right in front of you.
-Pantheon, S1 E5
(TODO: Too much “I”) I have spent an unhealthy amount of time on Twitter following the progress of this AI revolution. But I am happy that I got to witness humanity’s reaction (or at least that of the tech part of Twitter) as the revolution was happening. Memory is a product of repetition or strong emotion. If I were to recall the AI news of 2025 purely from memory, the events that surface would be the ones that personally stuck with me. Here are some that come to mind:
Shock Moments - Release of Deepseek R1 and o3:
- DeepSeek R1: o1 was still the leading reasoning model, and it was the first of its kind. When a Chinese company released an open-source equivalent built with extreme efficiency, the world went into shock, because much of the AI community believed that China was 6 months behind the US (Gwern’s comment) and that Chinese labs mostly replicate rather than innovate. This was a wake-up call for US tech and can be considered the beginning of the AI race that followed and will continue. Dario even wrote a blog post calling for tighter restrictions on chip sales to China. At first, I felt that it was a move to reduce competition, but over the months I came to understand Dario’s position more and more.
- o3: Personally, o3 is the model that gave me a glimpse of how capable models can be. It is one heck of a persistent model when it comes to finding the answer to your question. Once, out of laziness, I took a photo of a book and asked it to find the cited paper. It searched the author’s papers and couldn’t find it. It then searched for the sentence from the photo, found the book I was reading, checked the references section and found the right paper. I think the community still misses the tables that o3 loved to create! This tweet summarizes it well. With o3, it almost seemed to me that hallucination was a solved problem. Since the release of o3, I have been using only thinking models, no matter the query.

Vibe Coding
Karpathy brilliantly named this. In the early days, it was treated as a fun thing to try. By the end of the year, a lot of people were embracing it, but with more caution and discipline. The tools and models have matured enough to make the vibe coding infrastructure solid. Soon, the word “vibe” might be dropped, and the future will look up to people who wrote code by hand the way we used to look at programmers who dealt with memory allocation and compilers directly. But I like Karpathy’s idea that it will be like arithmetic: you will be taught to do it, but you will almost always reach for a calculator when faced with a problem.
New kind of software
Neural OS release by Lambda: Sadly, this didn’t blow up on Twitter, but I feel it is a very promising idea. The software we interact with daily, say in coffee machines or on customer service websites, is all written in a programming language. Its features are decided by the programmer. If the programmer doesn’t handle failure cases well, the user suffers. But imagine a future where all software is basically an interaction in natural language. It’s not that AI writes the code for the coffee machine; rather, the AI is the software that handles the user’s requests. Here you don’t have to think about edge cases, because the AI will respond in natural language. One might feel that AI is overkill for a simple problem like this. But 100 years ago, one would also have thought that a machine is overkill for the task of making coffee. When AI becomes as common as electricity or machines, it becomes inevitable that it will be everywhere, simply because it is convenient.
GPT 5
I still remember the hype before the release day of GPT-5. This tweet perfectly summarizes the hype:

I watched the livestream. After the release, though, a lot of people were angry about the chart crimes and the lack of new features. But over the following days, it became almost everyone’s top model until Opus 4.1 got released.
Media Generation - Nano Banana Pro
Nano Banana Pro seemed like a huge leap in text-to-image generation. I haven’t used this model seriously because I never had the need to. But the infographic posters generated by the model are surprisingly good. In fact, the AI newsletter by smol.ai even ran a headline declaring the text-on-image problem solved. I am hoping to use this model in the future once I have a good use case. It’s amazing that other modalities like image and video generation are close to becoming practically useful.
Hitting the wall, Julian’s blog, METR Graph, GDPval:
Twitter users had grown used to new models raining down throughout the year. But at one point there was a pause, and Twitter began to cry that we had hit a wall. I think it was around this time that Julian Schrittwieser released a blog post with a very catchy and appropriate title: “Failing to understand the exponential again”. The blog makes the nice point that people often fail to understand exponentials (Neil deGrasse Tyson also has a nice explanation of this):
We also have linear brains, which prevents us from seeing exponential change. And the best example of this is the algae on a pond. You see like one square foot of it, and you learn—someone tells you—the algae is doubling every day. And you have this huge pond. You go away for a month and come back, pond is half covered with algae. You say, “Oh my gosh, I was away for a month and this happened.” When will it be completely covered? So, what’s the answer?
COVID was proof of this that we all saw. Julian cited the METR graph, which shows that the length of tasks models can complete has indeed been doubling about every 6 months, and GDPval, where we already have a model, Opus 4.1, that is on par with human performance. The METR graph has become the poster that perfectly captures AI progress. I expect it to appear in future books when someone documents all this.
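The arithmetic behind the pond puzzle and the doubling trend can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the pond is taken to be exactly half covered on day 30 (“a month”), the 6-month doubling period is the figure mentioned above, and the function names `pond_coverage` and `growth_factor` are hypothetical helpers made up for this example.

```python
def pond_coverage(day, half_covered_day=30):
    """Fraction of the pond covered on `day`, assuming coverage doubles daily
    and is exactly one half on `half_covered_day` (an illustrative assumption)."""
    return min(1.0, 0.5 * 2.0 ** (day - half_covered_day))


def growth_factor(months, doubling_months=6):
    """How many times larger a quantity gets after `months` of steady doubling.
    The 6-month default is the figure quoted in the text, not a measured value."""
    return 2.0 ** (months / doubling_months)


# Walk backwards and forwards from the half-covered day.
for day in (20, 25, 29, 30, 31):
    print(f"day {day}: {pond_coverage(day):.2%} covered")

# Compound the assumed doubling period over a few years.
for years in (1, 2, 5):
    print(f"after {years} year(s): {growth_factor(12 * years):.0f}x")
```

Under these assumptions the pond is fully covered on day 31, just one day after it is half covered, while on day 25 it was still under 2% covered. That is why the change feels sudden. The same compounding turns a 6-month doubling into a 16x change over two years.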

Much of the general public is not aware of the progress in AI. For them, AI is mostly ChatGPT, which can respond with text and talk. Similarly, a lot of Twitter users who don’t work directly in the big labs are also not aware of the progress that is possible. The subsequent releases of Gemini 3, Opus 4.5 and GPT 5.2 proved that point.
AI safety
- 0 to 1
- Dario accused
- But Dario thought about it in 2016
- Recent papers by Anthropic - reward hacking
Notable mentions
- AI 2027 was another hype of this year. I really appreciate the great effort that went into modeling the future. But when I read the “China steals American LLM weights” part, I stopped taking it seriously.
- Chinese labs have done pretty amazingly this year. Qwen, GLM and Kimi have gained popularity.
- AI Digest is a beautiful project where LLMs work together using the tools humans use: Gmail, Drive, Docs. A lot of the time, they face the same problems we do, such as permission issues and finding the right button to click (although the latter is not really a problem for us).
What I am looking forward to
OpenAI has an ambitious plan of releasing a fully autonomous AI research intern by September 2026. We are already seeing glimpses of AI helping to accelerate scientific research. It is difficult to trust such claims when the company making them has a large monetary stake. But there is a section where Timothy Gowers (a Fields Medal winner) writes:
Taking this together, my current assessment of LLMs is that they are just beginning to be useful as research collaborators: one can bounce ideas off them in the way that one can with a human collaborator and one gets a response extremely quickly. As with human collaborators, even the less good ideas of an LLM can sometimes stimulate me to make progress. (I think of this as the “That clearly doesn’t work … but wait a minute!” phenomenon.) So we have reached the stage where LLMs can speed up the process of thinking about a problem, especially if that problem is a little outside one’s primary domain of expertise, but we have not yet reached the stage where an LLM is likely to have the main idea for solving a difficult problem.
I am looking forward to the moment when a knowledgeable individual can collaborate with an LLM to produce a paper in which the LLM comes up with the key ideas, implements them, makes the plots and also writes it up. Once the infrastructure to achieve this has been built, we need to start collecting good questions to work on!