Preprint / Version 1

Neural Scaling: From Empirical Laws to Geometric Saturation

A Harmonic State Equation for Optimal Entropic Dimensionality


DOI: https://doi.org/10.51094/jxiv.2289

Keywords:

Neural Scaling Laws, Optimal Entropic Dimensionality, Effective Dimension, Large Language Models, Information Geometry, Data Efficiency, Power Laws, Fisher Information, Model Saturation, Deep Learning Theory

Abstract

Empirical scaling laws make performance extrapolation routine, yet they leave a practical blind spot: when should scaling bend, and why should that bend be systematic rather than ad hoc? We address this by recasting scaling as a geometric saturation problem. In Optimal Entropic Dimensionality (OED), effective dimensionality is treated as a state variable set by an equilibrium between the geometric cost of expansion and the informational cost of compression (we use equilibrium/state language as an analogy, not as a claim of literal thermodynamics). A key step is to separate the payload into a free (irreducible/novel) component and a structural (redundant/correlated) component, so that redundancy reduces the usable pressure available to create effective capacity. To connect the equilibrium picture to observed curves, we introduce a Harmonic Saturation ansatz: realized capacity is the harmonic combination of an architecture-governed spectral potential and a payload/geometry-governed OED cap. This yields a closed-form macroscopic state equation that recovers dilute-regime power laws while predicting a systematic bend toward a geometry-dependent floor. In dimensionless collapse coordinates, the theory reduces to a linear master relation, providing a falsifiable diagnostic: agreement supports the chosen state variables and proxies, while structured departures point to missing effects such as non-equilibrium optimization, effective-token limits, or proxy mismatch. The takeaway is a practical reframing of late-stage progress: near saturation, gains come less from brute-force expansion and more from improving spectral efficiency, data novelty, and effective geometry.
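
Illustrative note (an assumption for exposition, not reproduced from the paper: the symbols D_eff, D_spec, D_OED and the explicit formula below are hypothetical). One natural reading of the Harmonic Saturation ansatz is a reciprocal-sum combination of the architecture-governed spectral potential D_spec and the payload/geometry-governed cap D_OED:

    1/D_eff = 1/D_spec + 1/D_OED

In the dilute regime (D_spec << D_OED) this gives D_eff ≈ D_spec, the unimpeded power-law behavior; once D_spec grows past D_OED, D_eff bends toward the geometry-dependent floor D_OED. The relation is already linear in reciprocal coordinates, consistent with the advertised collapse to a linear master relation. A minimal Python sketch of that diagnostic, under the same assumed form:

    import numpy as np

    # Hypothetical values: d_oed stands in for the payload/geometry-governed
    # cap, d_spec sweeps the architecture-governed spectral potential.
    d_oed = 50.0
    d_spec = np.logspace(0, 4, 20)

    # Assumed harmonic combination (illustrative; not the paper's equation).
    d_eff = 1.0 / (1.0 / d_spec + 1.0 / d_oed)

    # In reciprocal ("collapse") coordinates the ansatz predicts a straight
    # line with slope 1 and intercept 1/d_oed.
    x, y = 1.0 / d_spec, 1.0 / d_eff
    slope, intercept = np.polyfit(x, y, 1)
    print(f"slope={slope:.3f}  intercept={intercept:.5f}  1/d_oed={1.0/d_oed:.5f}")

On real measurements, agreement with such a line would support the chosen state variables and proxies, while curvature or a regime-dependent slope would flag the missing effects the abstract names (non-equilibrium optimization, effective-token limits, proxy mismatch).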

Disclosure of Conflicts of Interest

The author declares that there are no conflicts of interest.


References

J. Hestness, S. Narang, N. Ardalani, G. Diamos, H. Jun, H. Kianinejad, M. M. A. Patwary, Y. Yang, Y. Zhou, Deep learning scaling is predictable, empirically, arXiv preprint arXiv:1712.00409 (2017). doi:10.48550/arXiv.1712.00409. URL https://arxiv.org/abs/1712.00409

J. Kaplan, S. McCandlish, T. Henighan, T. B. Brown, B. Chess, R. Child, S. Gray, A. Radford, J. Wu, D. Amodei, Scaling laws for neural language models, arXiv preprint arXiv:2001.08361 (2020). URL https://arxiv.org/abs/2001.08361

J. Hoffmann, S. Borgeaud, A. Mensch, E. Buchatskaya, T. Cai, E. Rutherford, et al., Training compute-optimal large language models, arXiv preprint arXiv:2203.15556 (2022). URL https://arxiv.org/abs/2203.15556

J. S. Rosenfeld, A. Rosenfeld, Y. Belinkov, N. Shavit, A constructive prediction of the generalization error across scales, ICLR 2020; arXiv preprint arXiv:1909.12673 (2019). doi:10.48550/arXiv.1909.12673. URL https://arxiv.org/abs/1909.12673

Y. Bahri, J. Kadmon, S. Ganguli, et al., Explaining neural scaling laws, arXiv preprint arXiv:2102.06701 (2021). URL https://arxiv.org/abs/2102.06701

T. Imaizumi, Optimal entropic dimensionality: A continuous variational principle for geometric equilibrium, Jxiv preprint (2025). doi:10.51094/jxiv.2161.

C. E. Shannon, A mathematical theory of communication, Bell System Technical Journal 27 (3–4) (1948) 379–423, 623–656.

S.-i. Amari, H. Nagaoka, Methods of Information Geometry, American Mathematical Society, 2000.

O. Roy, M. Vetterli, The effective rank: A measure of effective dimensionality, Tech. rep., EPFL (2007). URL https://infoscience.epfl.ch/record/111177

Submitted: 2025-12-19 03:00:58 UTC

Published: 2026-01-08 10:49:37 UTC

Research field: Information Science