AI Pangaea: Unifying Intelligence Islands for Adapting Myriad Tasks

Pangaea Banner
a, AI models build intelligence from data via modality-specific encodings, leading to Intelligence Islands. b, Pangaea unifies Intelligence Islands through a unified data encoding, constructing an AI supercontinent.

The pursuit of artificial general intelligence (AGI) demands that a single model generalize across myriad tasks, including tasks it has never seen. However, current AI models are isolated from one another because each is limited to specific tasks, a phenomenon defined here for the first time as Intelligence Islands. To unify these Intelligence Islands, we propose Pangaea, the first AI supercontinent, named after the geological Pangaea. Pangaea encodes any data into a unified format and accumulates universal knowledge through pre-training on 296 datasets spanning diverse modalities. It demonstrates remarkable generalization across 45 general tasks and 15 scientific tasks covering a wide range of scientific subjects. Investigating Pangaea further reveals a scaling effect of modality: the accumulation of universal knowledge across modalities is quantified by the cumulative distribution function of a geometric distribution. Overall, Pangaea shows strong potential to handle myriad tasks, indicating a new direction toward AGI.

📄 Read on arXiv


✨ Highlights


๐Ÿ—‚๏ธ Unified Data Encoding

Data is viewed as a discretized modeling of the real world and is represented as triplets, which serve as the fundamental unit of data.
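The triplet encoding above can be sketched in a few lines. Note that the field names below (`position`, `attribute`, `value`) and the table example are hypothetical illustrations of "data as triplets", not the paper's actual schema:

```python
from dataclasses import dataclass
from typing import Any, List

# Hypothetical triplet schema: the paper represents data as triplets,
# but these specific fields are an assumption made for illustration.
@dataclass(frozen=True)
class Triplet:
    position: Any   # where the value sits in the sample (e.g., column name, pixel coords)
    attribute: str  # modality-specific tag describing the measurement
    value: float    # the measurement itself

def encode_table_row(row: dict, modality: str = "table") -> List[Triplet]:
    """Flatten one table row into triplets, one triplet per cell."""
    return [Triplet(position=col, attribute=modality, value=float(v))
            for col, v in row.items()]

triplets = encode_table_row({"age": 42, "height_cm": 181})
print(triplets[0])  # Triplet(position='age', attribute='table', value=42.0)
```

The same flattening idea extends to other modalities (e.g., an image pixel becomes a triplet keyed by its coordinates), which is what makes a single unified encoder possible.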

🔄 Cross-Modal Learning

a, Pangaea architecture. b, Pre-training dataset. c, Sample and feature distributions of the table datasets used for pre-training. d, Pre-training convergence curve of Pangaea.

🔗 Knowledge Transfer

Pangaea outperforms both Pangaea_w/o and competitive models across 45 downstream tasks, demonstrating knowledge transfer from pre-training.

🧪 Scientific Assessment

a, Overview of the performance comparison between Pangaea and competitive models on all 15 scientific tasks. b-e, Pangaea is applied in Health and Biological sciences. b, Prostate cancer grading. c, Cyclic peptide membrane permeability prediction. d, Drug molecule toxicity prediction. e, Blood-brain barrier penetration prediction.
a-f, Pangaea is applied in Earth and environmental, and Physical sciences. a, Worldwide temperature forecasting. b, High-energy particle identification. c, Marine mammal vocalization classification. d, Molecule electronic property prediction. e, Reservoir property estimation. f, Material band gap prediction.
a-e, Pangaea is applied in Business and commerce, Humanities, Astronomy, Mathematical, and Social sciences. a, Stock movement prediction. b, Drug consumer type classification. c, Active galactic nuclei classification. d, Mathematical subject classification. e, Massive multitask language understanding.

📈 Scaling Effect

a, Scaling effect of pre-training modalities. b, Scaling effect of unseen modalities.
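The abstract quantifies knowledge accumulation across modalities as the cumulative distribution function of a geometric distribution. A minimal sketch of that curve follows; the per-modality rate `p` is a placeholder value, not a number taken from the paper:

```python
def geometric_cdf(k: int, p: float) -> float:
    """CDF of a geometric distribution: F(k) = 1 - (1 - p)**k.

    Read here as the fraction of universal knowledge accumulated after
    pre-training on k modalities; p is a per-modality gain rate
    (placeholder, not a value reported in the paper).
    """
    return 1.0 - (1.0 - p) ** k

# Diminishing returns: each additional modality adds less than the last.
for k in (1, 5, 10, 20):
    print(k, round(geometric_cdf(k, p=0.2), 3))
```

The shape matters more than the placeholder numbers: the curve rises quickly for the first few modalities and saturates toward 1, matching the intuition that early modalities contribute the bulk of transferable knowledge.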

💞 Modality Affinity

The modality affinity phenomenon: fine-tuning performance of Pangaea across 31 pre-training modality combinations, compared with that of Pangaea_w/o.

📚 Citation

If you find Pangaea useful for your research, please cite our work:


@misc{chang2025aipangaeaunifyingintelligence,
      title={AI Pangaea: Unifying Intelligence Islands for Adapting Myriad Tasks}, 
      author={Jianlong Chang and Haixin Wang and Zhiyuan Dang and Li Huang and Zhiyu Wang and Ruoqi Cao and Shihao Piao and Dongzhe Li and Dianyu Gao and Dongsheng Wang and Yin Li and Jinan Sun and Lu Fang and Zhouchen Lin},
      year={2025},
      eprint={2509.17460},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2509.17460}, 
}