AI for Science: Transforming Research and Industry

Introduction

As AI reshapes the foundational logic of research and industry, AI for Science has become more than just a theoretical concept. On April 28, the Future Light Cone collaborated with Beijing Zhongguancun Academy’s AI Business School to launch the “AI for Science Innovators: Academia × Industry Pioneers Dialogue Series Salon.” The first event featured three frontline guests. Zheng Shuxin, an associate professor and co-dean of the AI Business School, provided solid data and insights to address three pressing questions: Will the “ChatGPT moment” for AI4S arrive? What are the barriers for entrepreneurs? How should young people invest in this field?

The Essence of Large Models: Intelligence from Compression

Where does the general intelligence of large models come from? Ilya Sutskever, OpenAI's former chief scientist, put it succinctly: “Intelligence arises from compression.” A model's intelligence comes from its ability to compress vast amounts of human language data into a relatively small parameter space. In this process, the model is forced to distill the common structures and underlying representations in the data, and intelligence emerges.

For instance, the original GPT-3, with 175 billion parameters, was trained to absorb an enormous corpus of human-written text. If it relied solely on memorization, it would essentially be a hard drive, and a hard drive does not produce intelligence. But when it must “remember” this data with far fewer parameters and a smaller size, it is compelled to extract common structures and underlying representations; intelligence emerges in this compression process.
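The contrast between a hard drive and a compressor can be shown with a deliberately tiny toy analogy (not how GPT-3 actually works). The hidden rule `y = 3x + 2` and the helper names `memorizer` and `fit_line` are hypothetical, chosen only for illustration: one learner stores all 1,000 training pairs verbatim, the other is forced to squeeze them into just two parameters by ordinary least squares.

```python
# Toy analogy only: a hidden "law" y = 3x + 2 generates the training data.
train = {x: 3 * x + 2 for x in range(1000)}

def memorizer(x):
    """'Hard drive': stores all 1,000 pairs verbatim and can only look them up."""
    return train.get(x)  # None for any input it has never seen

def fit_line(pairs):
    """'Compressor': squeezes the same 1,000 pairs into two parameters via least squares."""
    n = len(pairs)
    mean_x = sum(pairs) / n                  # iterating a dict yields its keys (x values)
    mean_y = sum(pairs.values()) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs.items())
    var = sum((x - mean_x) ** 2 for x in pairs)
    slope = cov / var
    return slope, mean_y - slope * mean_x    # (slope, intercept)

slope, intercept = fit_line(train)
print(memorizer(5000))             # None: never seen, nothing to recall
print(slope * 5000 + intercept)    # 15002.0: the extracted structure generalizes
```

The memorizer needs 2,000 stored numbers and still fails on an unseen input; the two-parameter model was forced to find the structure, so it answers correctly far outside its training range.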

A more rigorous theoretical underpinning is Kolmogorov complexity, which measures the complexity of data by the length of the shortest program that can reproduce it. For example, a sequence of a million zeros can be compressed into a single line of Python because of its simple internal structure. Kolmogorov complexity itself cannot be computed, but the large language model paradigm of predicting the next word is a good practical approximation of it: the better a model predicts the next token, the more compactly it can encode the text.
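The three ideas in the paragraph above can be sketched in a few lines of Python. This is an illustration under simple assumptions, not a measurement of any real model: `zlib` stands in as a computable upper bound on Kolmogorov complexity, and the two next-symbol predictors (`uniform`, `pattern_model`) are hypothetical toys used to show that better prediction means a shorter ideal code length of about -log2(p) bits per symbol.

```python
import math
import os
import zlib

# 1) Kolmogorov view: a long but simple string has a very short generating program.
data = "0" * 1_000_000
program = '"0" * 1_000_000'            # a one-line Python expression that regenerates it
assert eval(program) == data            # the short "program" really reproduces the data
print(len(data), len(program))          # prints: 1000000 15

# 2) Kolmogorov complexity is uncomputable, but any real compressor gives an upper bound.
def compressed_size(blob: bytes) -> int:
    return len(zlib.compress(blob, 9))

structured = b"0" * 100_000
random_blob = os.urandom(100_000)       # no internal structure for the compressor to exploit
print(compressed_size(structured), compressed_size(random_blob))

# 3) Prediction is compression: a symbol predicted with probability p costs about
#    -log2(p) bits (achievable in practice with arithmetic coding).
def code_length_bits(text, predict):
    """Total ideal code length of `text` under a next-symbol model predict(prefix, symbol)."""
    return sum(-math.log2(predict(text[:i], ch)) for i, ch in enumerate(text))

def uniform(prefix, ch):
    """Knows nothing: all 26 lowercase letters are equally likely."""
    return 1 / 26

def pattern_model(prefix, ch):
    """Has 'learned' the alternating a/b structure; leftover mass spread over 25 letters."""
    expected = "a" if len(prefix) % 2 == 0 else "b"
    return 0.99 if ch == expected else 0.01 / 25

text = "ab" * 500
print(code_length_bits(text, uniform))        # about 4700 bits
print(code_length_bits(text, pattern_model))  # about 14.5 bits
```

The structured bytes shrink to a tiny fraction of their size while the random bytes barely compress at all, and the model that captures the pattern encodes the same 1,000 characters in roughly 300 times fewer bits.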

However, this also sets a ceiling: human knowledge. You cannot surpass humans by learning from them. AI for Science is taking a completely different path.

Two Core Paths of AI4S

AI4S does not work with human language; it directly studies physical laws, biological processes, and molecular conformations. It compresses data from nature itself rather than human descriptions of nature.

A prime example is AlphaFold, a Nobel Prize-winning achievement. What does it do? It finds correlations within natural data. Once the Protein Data Bank (PDB) had accumulated hundreds of thousands of experimentally determined protein structures, the model could learn the mapping from sequence to three-dimensional structure, effectively “solving” the protein structure problem.

This leads to a core analytical framework, the two legs of AI4S:

Scientist: Reading literature, proposing hypotheses, and designing experiments, which fundamentally involve language intelligence, knowledge integration, and logical reasoning. Its strengths are reasoning and knowledge; its weakness is the lack of a “direct understanding” of the physical world. Representatives include research models from frontier labs such as OpenAI, Anthropic, and DeepMind.
Simulator: Using data-driven AI to model the laws of the physical world. Its strength is modeling the world itself, which cannot be achieved merely by stacking more parameters; its weakness is the absence of explicit knowledge chains and reasoning ability. Representatives include AlphaFold and various meteorological models.

The endpoint of large models is AGI, while the vast potential of AI4S lies in breaking through the boundaries of human cognition. Much of the universe remains unknown, and only the Simulator path theoretically allows AI to explore what humanity has yet to discover.

However, today, the Simulator alone cannot solve all problems—it lacks logic and reasoning. Relying solely on either path is insufficient. The true endgame of AI4S is the convergence of both paths: the ability to reason and propose hypotheses like top scientists while directly understanding the physical world itself.

This is why I repeatedly emphasize that AI for Science requires more than just larger models. Even if you scale GPT up by 100 times, it still won’t automatically understand how a protein folds or how a cloud evolves.

No Unified “ChatGPT Moment” for AI4S

My core judgment is that AI4S will experience continuous breakthroughs, but it will not manifest as a unified moment of widespread excitement; its progress resembles a highly uneven map.

In any given field, the more it satisfies the three elements of “clear problem structure + sufficient data + short validation loop,” the faster AI4S will advance there.

Protein Folding: The most interesting aspect of protein folding is that both the Scientist and Simulator paths have produced significant results. AlphaFold answers “what does a protein look like,” while DiG and BioEmu answer “how does a protein move”—one captures still images, while the other films a movie. Only by producing the movie can the functional mechanisms of proteins be truly explained.
AI Drug Development: This field has crossed a critical threshold. There are over 200 AI drug clinical pipelines, with Phase I success rates of 80%-90%, double that of traditional methods; the first AI drug has shown efficacy in Phase II clinical trials, with a crucial data readout window expected in 2026-2027.
AI Meteorology: Chinese players are leading globally. Huawei's Pangu-Weather, Fudan University's FuXi, and Shanghai AI Laboratory's FengWu keep making breakthroughs, with FengWu delivering accurate forecasts out to 11.25 days, the first model in the world to break the 10-day accuracy barrier.
Materials Science: This field is in a different state. It has just evolved from “screening known compounds” to “designing unprecedented molecules from scratch.” The most critical signal in 2025-2026 will be frontline modelers beginning to genuinely trust the tools at their disposal. Although the field is still in its early stages, the space of possible element combinations is nearly infinite; once a breakthrough occurs, the value will be immense.
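The protein item's contrast between a “still image” and a “movie” can be made concrete with a toy sketch. DiG and BioEmu are generative models that sample ensembles of conformations; this sketch does not reproduce them. It swaps in classic Metropolis Monte Carlo on a hypothetical one-dimensional double-well energy, where x is an assumed stand-in for a conformational coordinate. Energy minimization returns one structure; sampling returns the ensemble, including how often the system switches between its two states.

```python
import math
import random

def energy(x):
    """Toy double-well energy along one hypothetical conformational coordinate."""
    return (x * x - 1.0) ** 2          # minima at x = -1 and x = +1, barrier of 1 at x = 0

def minimize(x, lr=0.01, steps=5_000):
    """'Still image': gradient descent settles into a single lowest-energy structure."""
    for _ in range(steps):
        grad = 4.0 * x * (x * x - 1.0)  # dE/dx
        x -= lr * grad
    return x

def metropolis(n_steps, kT=0.4, step=0.5, seed=0):
    """'Movie': Metropolis Monte Carlo samples the Boltzmann distribution exp(-E/kT)."""
    rng = random.Random(seed)
    x, traj = 1.0, []
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)        # symmetric random proposal
        dE = energy(x_new) - energy(x)
        if dE <= 0 or rng.random() < math.exp(-dE / kT):
            x = x_new                               # accept; otherwise keep the old x
        traj.append(x)
    return traj

still_image = minimize(0.3)            # ends in the right-hand well near x = 1
traj = metropolis(200_000)
frac_left = sum(x < 0 for x in traj) / len(traj)
transitions = sum((a < 0) != (b < 0) for a, b in zip(traj, traj[1:]))
print(f"minimized structure: x = {still_image:.3f}")
print(f"ensemble: {frac_left:.2f} of frames in the left well, {transitions} switches across x = 0")
```

The minimizer reports a single state and never reveals the second well, while the sampled trajectory spends roughly half its time in each well and crosses the barrier repeatedly; that population and switching behavior is the kind of information a single static structure cannot provide.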

The Barriers for Entrepreneurs in the Age of Large Model Giants

An undeniable fact is that the six major AI giants—OpenAI, Anthropic, Google DeepMind, Microsoft, NVIDIA, and Meta—are all entering the AI4S arena.

Even OpenAI is developing a dedicated life sciences model, GPT-Rosalind, while Anthropic is fully committing to Claude for Life Sciences, indicating that the narrative of “one universal model solving everything” is quietly being abandoned by these companies.

With the giants entering the field, what are the barriers for entrepreneurs? My answer is clear: the threshold lies not in prompts and workflows, but in scientific capabilities, data loops, and depth of industry embedding.

You must first clarify which game you are playing:

Product-oriented: Competing on rapid iteration and user stickiness, with validation cycles ranging from days to weeks, represented by Manus and Cursor.
Resource-oriented: Competing on depth of industry embedding and customer resources, with validation cycles from quarters to years, represented by traditional SaaS and industry solutions.
Science-story-oriented: Competing on scientific capability and the data flywheel, with validation cycles ranging from years to decades, represented by Isomorphic Labs.

AI4S companies can actually be divided into two categories: scientific companies (science story-oriented) and scientific service companies (resource-oriented). Both paths are viable, but the greatest risk is misidentifying oneself as a “scientific company” while ultimately becoming a “scientific service company.”

If you are confident in your technology and truly believe you can strike gold, you should naturally tell a scientific story; if your technology still has gaps, focus on delivery and customer resources, and earnestly deepen your industry engagement.

Now is the Golden Window for AI4S

Why do I say that now is the window period? Because the money is already moving. A single AI4S company's annual financing can reach $550 million, and a substantial share of the global VC funding flowing into AI is now directed toward AI4S. The U.S. Department of Energy has invested $320 million to launch the Genesis program, and China is following suit.

Why is funding concentrating on AI4S? Four forces are reinforcing one another: technological breakthroughs, the inefficiency of traditional R&D, the nascent state of data infrastructure, and national strategic support.

Even if some bubbles burst along the way, this is entirely different from the industry frenzy of five or six years ago: this time, the technology has genuinely reached a critical point.

The two most important long-term trends to watch for in the future are:

Self-Driving Labs: Achieving a complete loop of “hypothesis → experiment → data → model update → new hypothesis.” The more experiments are run, the better the models become; the better the models, the smarter the next experiments. This forms a true flywheel. Key players include Lila Sciences, Recursion, and Atinary.
National-Level AI4S Infrastructure: AI4S is transitioning from “academic research” to “industrial infrastructure,” which is a core layout for national competitiveness.
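The self-driving lab loop described above can be sketched as a small closed loop. Everything here is a hypothetical stand-in: `run_experiment` plays the robotic lab (a made-up yield curve peaking at 72.5 °C), and the “model” is deliberately simple successive parabolic interpolation, which fits a parabola through the three best results so far and proposes its peak as the next hypothesis. Real systems such as those named above use far richer models; this is not their method, only the shape of the flywheel.

```python
import math

def run_experiment(temp_c):
    """Hypothetical stand-in for a robotic experiment: product yield (%) at a temperature."""
    return 100.0 * math.exp(-((temp_c - 72.5) / 15.0) ** 2)

def propose_next(history):
    """Model update + new hypothesis: fit a parabola through the three best results
    so far and propose its peak (successive parabolic interpolation)."""
    (x1, y1), (x2, y2), (x3, y3) = sorted(history, key=lambda p: p[1], reverse=True)[:3]
    den = (x2 - x1) * (y2 - y3) - (x2 - x3) * (y2 - y1)
    if den == 0:
        return None                                  # degenerate fit: no usable hypothesis
    curvature = den / ((x2 - x1) * (x2 - x3) * (x3 - x1))
    if curvature >= 0:
        return None                                  # fitted parabola has no maximum
    num = (x2 - x1) ** 2 * (y2 - y3) - (x2 - x3) ** 2 * (y2 - y1)
    return x2 - 0.5 * num / den                      # x-coordinate of the parabola's peak

def self_driving_lab(initial_temps, lo=20.0, hi=150.0, max_rounds=20, tol=0.01):
    history = [(t, run_experiment(t)) for t in initial_temps]    # seed experiments
    for _ in range(max_rounds):
        hypothesis = propose_next(history)                        # data -> model -> hypothesis
        if hypothesis is None:
            break
        hypothesis = min(max(hypothesis, lo), hi)                 # stay inside the allowed range
        if any(abs(hypothesis - t) < tol for t, _ in history):
            break                                                 # nothing new to learn: converged
        history.append((hypothesis, run_experiment(hypothesis)))  # experiment -> new data
    return history

history = self_driving_lab([40.0, 60.0, 90.0])
best_temp, best_yield = max(history, key=lambda p: p[1])
print(f"{len(history)} experiments, best yield {best_yield:.2f}% at {best_temp:.2f} °C")
```

Each round feeds the newest result back into the model before the next proposal, so experiments concentrate where they are most informative; the stopping rule ends the loop once the model only proposes conditions it has already tested.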

Five Hard-Hitting Suggestions for Young Professionals

Choosing a field is more important than choosing a technology. The true moat is domain knowledge, not model architecture; select a scientific problem you are willing to immerse yourself in for five years.

Learn to speak the language of experiments. People with purely computational backgrounds often lack an understanding of experiments. Spending three months in a lab is more valuable than reading ten papers.

Data capability is the core leverage. The performance ceiling of a model ultimately depends on the information limits of the training data. Those who can establish a data flywheel are far more valuable than those who can merely tune models; acquiring, cleaning, and annotating scientific data is hard currency.

Clarify which game you are playing. The science story game requires long-term patience, the resource game requires industry embedding, and the product game competes on rapid iteration; do not mix the approaches.

Now is the window period. The convergence of technology, capital, and national strategy is significant, but the window will not remain open forever.

Three Core Conclusions

Returning to the three initial questions, the answers are now very clear:

Breakthroughs in AI4S are already occurring, but there will not be a unified “ChatGPT moment”; progress varies greatly across fields, and the speed of breakthroughs depends on problem structure, data foundation, and validation loops.
The core barrier for entrepreneurs lies in “scientific capability + data loops,” never in model size.
Choosing the right direction fundamentally means selecting a scientific problem you are willing to delve into for five years.

In closing, I leave you with this thought: the window belongs to those who are willing to do the heavy lifting and who dare to bet amid uncertainty.