MapReduce and K-Means Clustering Method for Long Text Summarization on Large Language Model

TENCON 2025 - IEEE Region 10 Conference

2025

Publication Overview

Published at TENCON 2025 - 2025 IEEE Region 10 Conference (October 27-30, 2025). The paper tackles summarization of documents that are too long to fit within a Large Language Model's token limit by combining a MapReduce-style processing scheme with K-Means clustering.

Abstract

This paper addresses the summarization of lengthy documents with LLMs by combining MapReduce and K-Means clustering. Long documents are segmented into chunks, the chunks are converted to embeddings, and the embeddings are clustered, which lets the approach handle documents that exceed the model's token limit. Using Qwen2.5-7B with LoRA-based fine-tuning on banking-sector documents, the method achieved ROUGE-1, ROUGE-2, and ROUGE-L scores of 0.416, 0.118, and 0.219, respectively, compared with 0.320, 0.090, and 0.168 for direct truncation.
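The chunk, embed, cluster, and summarize flow described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. It rests on several assumptions: `embed` is a hypothetical hashed bag-of-words stand-in for a real embedding model, `placeholder_llm` is a hypothetical stand-in for a call to the fine-tuned LLM, and choosing the chunk nearest each centroid as the cluster representative is an assumed selection rule that the abstract does not specify.

```python
import math
import random
import re
import zlib


def split_into_chunks(text, max_words=50):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(chunk, dim=64):
    """Hypothetical embedding: hashed bag-of-words, L2-normalised.
    A real system would call a sentence-embedding model instead."""
    vec = [0.0] * dim
    for tok in re.findall(r"\w+", chunk.lower()):
        vec[zlib.crc32(tok.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def placeholder_llm(text, max_words=12):
    """Hypothetical stand-in for an LLM summarization call: keeps the first words."""
    return " ".join(text.split()[:max_words])


def summarize_long_document(text, k=3, max_words=50, llm=placeholder_llm):
    chunks = split_into_chunks(text, max_words)
    k = min(k, len(chunks))
    vectors = [embed(c) for c in chunks]
    labels, centroids = kmeans(vectors, k)
    # Assumed selection rule: one representative chunk per non-empty cluster.
    picked = set()
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            picked.add(min(members, key=lambda i: sq_dist(vectors[i], centroids[c])))
    partial = [llm(chunks[i]) for i in sorted(picked)]  # map: summarize each pick
    return llm(" ".join(partial))                       # reduce: merge the partials


if __name__ == "__main__":
    doc = " ".join(f"word{i % 37}" for i in range(400))
    print(summarize_long_document(doc, k=3))
```

In this sketch the clustering step bounds the number of LLM calls in the map phase at `k`, and the reduce phase sees only the short partial summaries, so no single call needs the whole document in context.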

Authors

Moh. Rosy Haqqy Aminy, Diana Purwitasari, Dwi Sunaryono, Ilham Gurat Adillion, Dini Adni Navastara, Bilqis Amaliah, Hilmil Pradana, Yoga Yustiawan

Publication Details

Conference: TENCON 2025 - 2025 IEEE Region 10 Conference
Date: October 27-30, 2025
Pages: 1909-1913
DOI: 10.1109/TENCON66050.2025.11375088
Publisher: IEEE
ISBN: 979-8-3315-3772-2
ISSN: 2159-3450

Keywords

Clustering, Large Language Model, MapReduce, ROUGE, Summarization

Key Results

ROUGE-1: 0.416 (vs 0.320 with direct truncation)
ROUGE-2: 0.118 (vs 0.090 with direct truncation)
ROUGE-L: 0.219 (vs 0.168 with direct truncation)
Effective processing of documents exceeding LLM token limits
LoRA-based fine-tuning on Qwen2.5-7B
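To put the reported scores in perspective, the short calculation below derives the absolute and relative gains over direct truncation. The numbers are the ones listed above; the `gains` helper and variable names are illustrative only.

```python
# Scores as reported on this page.
proposed = {"ROUGE-1": 0.416, "ROUGE-2": 0.118, "ROUGE-L": 0.219}
truncation = {"ROUGE-1": 0.320, "ROUGE-2": 0.090, "ROUGE-L": 0.168}


def gains(method, baseline):
    """Return {metric: (absolute gain, relative gain in percent)}."""
    result = {}
    for metric, score in method.items():
        diff = score - baseline[metric]
        result[metric] = (diff, diff / baseline[metric] * 100.0)
    return result


if __name__ == "__main__":
    for metric, (abs_gain, rel_gain) in gains(proposed, truncation).items():
        print(f"{metric}: +{abs_gain:.3f} absolute, +{rel_gain:.1f}% relative")
```

The relative improvement works out to roughly 30% on each of the three metrics.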

Affiliation

Department of Informatics, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia
