Efficient Biomedical Text Summarization with Quantized LLAMA 2: Enhancing Memory Usage and Inference on Low Powered Devices
  • Sanjeev Kumar,
  • Vikas Ranjan,
  • Arjab Chakrabarti,
  • Tridib Kumar Das,
  • Anushka Singh
Sanjeev Kumar
University of Illinois Urbana-Champaign

Corresponding Author: [email protected]

Vikas Ranjan
Birla Institute of Technology & Science Pilani
Arjab Chakrabarti
Kalinga Institute of Industrial Technology Deemed to be University
Tridib Kumar Das
Kalinga Institute of Industrial Technology Deemed to be University
Anushka Singh
Kalinga Institute of Industrial Technology Deemed to be University

Abstract

Deploying large language models (LLMs) on edge devices and in non-server environments presents significant challenges, primarily because of constraints on memory, computational power, and inference time. This paper investigates the feasibility of running LLMs on such devices by optimizing memory usage, employing quantization techniques, and reducing inference time. Specifically, we use LLaMA 2 for biomedical text summarization and implement Low-Rank Adaptation (LoRA) quantization to compress the model size and fine-tune it using limited resources. Our study systematically evaluates memory consumption during both the training and inference phases, demonstrating substantial reductions through efficient LoRA quantization. Our results indicate that, with careful optimization, it is feasible to deploy sophisticated LLMs such as LLaMA 2 on low-powered devices, thereby broadening the scope of their application in resource-constrained environments.
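
As a rough illustration of why reduced-precision weights and low-rank adapters matter on memory-limited hardware, the sketch below estimates the memory needed for model weights at several bit widths and counts the trainable parameters a LoRA adapter would add. Every number in it is an illustrative assumption rather than a value reported in the paper: a nominal 7-billion-parameter model, a hidden size of 4096, 32 transformer layers, two adapted square projection matrices per layer, and a LoRA rank of 8. The estimate covers weights only and ignores activations, optimizer state, and quantization metadata.

```python
def weight_memory_gib(n_params: int, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


def lora_trainable_params(d_in: int, d_out: int, rank: int, n_matrices: int) -> int:
    """Trainable parameters when each adapted d_out x d_in matrix gets
    a low-rank update B @ A, with A of shape (rank, d_in) and B of
    shape (d_out, rank)."""
    return n_matrices * rank * (d_in + d_out)


# Assumption: nominal 7B-parameter model (not stated in the abstract).
N_PARAMS = 7_000_000_000

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_gib(N_PARAMS, bits):.2f} GiB")

# Assumptions: hidden size 4096, 32 layers, 2 adapted matrices per layer, rank 8.
ADAPTER_PARAMS = lora_trainable_params(4096, 4096, rank=8, n_matrices=2 * 32)
print(f"LoRA trainable parameters: {ADAPTER_PARAMS:,}")
print(f"Fraction of base model: {ADAPTER_PARAMS / N_PARAMS:.4%}")
```

Under these assumptions, the adapter is a tiny fraction of the base model, which is why fine-tuning only the low-rank matrices on top of compressed weights can fit within limited memory.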
18 Jul 2024 Submitted to Expert Systems
18 Jul 2024 Submission Checks Completed
18 Jul 2024 Assigned to Editor
23 Jul 2024 Reviewer(s) Assigned
20 Aug 2024 Review(s) Completed, Editorial Evaluation Pending
26 Aug 2024 Editorial Decision: Revise Major
21 Sep 2024 1st Revision Received
24 Sep 2024 Submission Checks Completed
24 Sep 2024 Assigned to Editor
24 Sep 2024 Reviewer(s) Assigned
01 Oct 2024 Review(s) Completed, Editorial Evaluation Pending
06 Oct 2024 Editorial Decision: Accept