Study and Evaluation of Automatic Offloading of Application Function Blocks

Yoji Yamato

doi:10.51094/jxiv.2817

Authors

Yoji Yamato, NTT, Inc., Network Service Systems Laboratories

Keywords:

environment-adaptive software, GPGPU, automatic offloading, performance, function block

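Abstract

GPUs (Graphics Processing Units) and FPGAs (Field Programmable Gate Arrays) offer advantages over CPUs (Central Processing Units), so systems that use them are increasing. However, using such systems requires understanding hardware-specific technical specifications such as hardware description languages (HDLs) and CUDA (Compute Unified Device Architecture), which is a high barrier. Against this background, we have previously proposed environment-adaptive software, which automatically converts and configures existing code for the hardware on which it is deployed so that it runs with high performance. As one element of this concept, we have also proposed a method that automatically offloads loop statements in application source code written for CPUs to GPUs and FPGAs. In this paper, we propose a method that speeds up applications by automatically offloading function blocks, a larger unit than individual loop statements, to GPUs and FPGAs. We implemented the proposed method and evaluated it with existing applications offloaded to GPUs and FPGAs.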
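The abstract does not describe how function blocks are found or replaced, so the following is only a minimal sketch of what replacement at function-block granularity could look like, not the author's implementation. It assumes a hypothetical catalog that maps the name of a CPU function block to the name of an accelerated equivalent, and it rewrites matching calls in Python source with the standard `ast` module. The names `OFFLOAD_CATALOG`, `fft_cpu`, `fft_gpu`, `matmul_cpu`, `matmul_gpu`, and `offload_function_blocks` are all invented for illustration.

```python
import ast

# Hypothetical catalog (an assumption, not from the paper):
# CPU function block name -> name of an accelerated replacement.
OFFLOAD_CATALOG = {
    "fft_cpu": "fft_gpu",
    "matmul_cpu": "matmul_gpu",
}


class BlockReplacer(ast.NodeTransformer):
    """Rewrite calls to cataloged function blocks and record what changed."""

    def __init__(self, catalog):
        self.catalog = catalog
        self.replaced = []

    def visit_Call(self, node):
        # Visit nested calls first so inner function blocks are handled too.
        self.generic_visit(node)
        if isinstance(node.func, ast.Name) and node.func.id in self.catalog:
            self.replaced.append(node.func.id)
            new_name = ast.Name(id=self.catalog[node.func.id], ctx=ast.Load())
            node.func = ast.copy_location(new_name, node.func)
        return node


def offload_function_blocks(source, catalog=OFFLOAD_CATALOG):
    """Return (rewritten source, list of replaced function block names)."""
    tree = ast.parse(source)
    replacer = BlockReplacer(catalog)
    new_tree = ast.fix_missing_locations(replacer.visit(tree))
    return ast.unparse(new_tree), replacer.replaced


if __name__ == "__main__":
    original = "y = fft_cpu(x)\nz = other(y)\n"
    rewritten, replaced = offload_function_blocks(original)
    print(rewritten)
    print(replaced)
```

The sketch replaces a whole call rather than analyzing the loops inside it, which is the contrast with loop-statement offloading that the abstract draws. A practical system would presumably also need to confirm that the replacement computes the same result and actually runs faster on the target hardware; that step is outside this sketch.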

Yoji Yamato, "Study and Evaluation of Automatic Offloading of Application Function Blocks," Automatika, Taylor & Francis, Jan. 2024.

Conflict of Interest Disclosure

The author declares no conflicts of interest associated with this manuscript.
