AI expands across scientific research
AI is becoming more deeply embedded across scientific domains, including biology, chemistry, physics, and astronomy. The natural sciences produced approximately 80,150 AI publications in 2025, up 26% from 2024. AI now accounts for 5.8% to 8.8% of scientific research output, depending on the field, up from below 1% in 2010.
Performance is advancing in several specialized areas, but reliability remains uneven. On ChemBench, the best models surpass the human expert average across 2,700+ chemistry questions while still struggling with basic tasks. On ReplicationBench, frontier models score below 20% on paper-scale replication in astrophysics. On UnivEarth, LLM agents answer Earth observation questions with 33% accuracy, and their code fails 58% of the time. On end-to-end scientific research tasks, the best AI agents score roughly half of what PhD experts achieve. On PaperArena, the best agent reaches 38.8% accuracy versus a PhD expert baseline of 83.5%. On BixBench, frontier models achieve roughly 17% accuracy on real-world bioinformatics analysis.
Scientific infrastructure is also shifting toward larger AI-native systems and datasets. Astronomy gained its first foundation model, its first visualization benchmark, and a 100 TB training dataset in 2025, signaling a field-wide shift toward AI infrastructure. AION-1, trained on over 200 million celestial objects from five major surveys, is the first astronomy foundation model. AstroVisBench is the field's first benchmark for LLM scientific computing and visualization.
Weather and climate research saw a major operational step forward. An AI system ran a full weather forecasting pipeline end to end for the first time in 2025. Aardvark Weather replaced the traditional numerical prediction pipeline with a single ML system, and multiple AI weather models reached operational deployment. FourCastNet 3 generates a 60-day global forecast in under 4 minutes, running 8 to 60 times faster than prior approaches.
Research automation also moved forward, though confirmed scientific impact remains limited. The first fully AI-generated paper was accepted at a peer-reviewed workshop in 2025, but the list of experimentally confirmed AI discoveries remains short. Sakana’s AI Scientist-v2 produced a paper accepted at an ICLR workshop without relying on human-coded templates. Google’s AI co-scientist was validated in three biomedical areas. Most AI models for science still come from academic and government institutions, while industry leads foundation model development in weather and climate.