Neural Scaling: From Empirical Laws to Geometric Saturation
A Harmonic State Equation for Optimal Entropic Dimensionality
DOI: https://doi.org/10.51094/jxiv.2289
Keywords: Neural Scaling Laws, Optimal Entropic Dimensionality, Effective Dimension, Large Language Models, Information Geometry, Data Efficiency, Power Laws, Fisher Information, Model Saturation, Deep Learning Theory
Abstract
Empirical scaling laws make performance extrapolation routine, yet they leave a practical blind spot: when should scaling bend, and why should that bend be systematic rather than ad hoc? We address this by recasting scaling as a geometric saturation problem. In Optimal Entropic Dimensionality (OED), effective dimensionality is treated as a state variable set by an equilibrium between geometric cost of expansion and informational cost of compression (we use equilibrium/state language as an analogy, not as a claim of literal thermodynamics). A key step is to separate the payload into a free (irreducible/novel) component and a structural (redundant/correlated) component, so that redundancy reduces the usable pressure available to create effective capacity. To connect the equilibrium picture to observed curves, we introduce a Harmonic Saturation ansatz: realized capacity is the harmonic combination of an architecture-governed spectral potential and a payload/geometry-governed OED cap. This yields a closed-form macroscopic state equation that recovers dilute-regime power laws while predicting a systematic bend toward a geometry-dependent floor. In dimensionless collapse coordinates, the theory collapses to a linear master relation, providing a falsifiable diagnostic: agreement supports the chosen state variables and proxies, while structured departures point to missing effects such as non-equilibrium optimization, effective-token limits, or proxy mismatch. The takeaway is a practical reframing of late-stage progress: near saturation, gains come less from brute-force expansion and more from improving spectral efficiency, data novelty, and effective geometry.
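As an illustrative reading only (the paper's own definitions are not reproduced here, and the symbols $D_{\mathrm{spec}}$, $D_{\mathrm{OED}}$, and $D_{\mathrm{eff}}$ are placeholders rather than the author's notation), a harmonic combination of an architecture-governed spectral potential and a geometry-governed OED cap might take the form

$$\frac{1}{D_{\mathrm{eff}}} \;=\; \frac{1}{D_{\mathrm{spec}}} \;+\; \frac{1}{D_{\mathrm{OED}}},$$

so that in the dilute regime ($D_{\mathrm{spec}} \ll D_{\mathrm{OED}}$) one has $D_{\mathrm{eff}} \approx D_{\mathrm{spec}}$ and power-law growth survives, while for $D_{\mathrm{spec}} \gg D_{\mathrm{OED}}$ realized capacity bends toward the geometry-dependent cap. In the dimensionless variables $y = D_{\mathrm{eff}}/D_{\mathrm{OED}}$ and $x = D_{\mathrm{spec}}/D_{\mathrm{OED}}$ this reads $1/y = 1 + 1/x$, a linear master relation of the kind the collapse coordinates are meant to expose; systematic curvature in those coordinates would then point to the missing effects listed above.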
Conflict of Interest Disclosure
The author declares that there are no conflicts of interest.
Status: Published
Submitted: 2025-12-19 03:00:58 UTC
Published: 2026-01-08 10:49:37 UTC
License
Copyright (c) 2026 Takeo Imaizumi
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
