TENCON 2025 - IEEE Region 10 Conference
Published at TENCON 2025 - 2025 IEEE Region 10 Conference (October 27-30, 2025).
This paper addresses the summarization of lengthy documents with Large Language Models (LLMs) by combining MapReduce with K-means clustering. Long documents are segmented into chunks, converted to embeddings, and clustered, which lets the approach process documents that exceed the model's token limit. Using Qwen2.5-7B with LoRA-based fine-tuning on banking-sector documents, the method achieved ROUGE-1, ROUGE-2, and ROUGE-L scores of 0.416, 0.118, and 0.219, respectively, compared with 0.320, 0.090, and 0.168 for direct truncation.
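The chunk, embed, cluster, and MapReduce steps described above might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the hashing-based `embed` function, the `stub_llm` placeholder (standing in for a call to a fine-tuned model such as Qwen2.5-7B), and the choice to summarize the chunk nearest each cluster centroid are all hypothetical, since the abstract does not specify the embedding model or how clusters feed the map step.

```python
import math
import random
import zlib


def chunk_words(text, size=50):
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(chunk, dim=64):
    """Hypothetical stand-in embedding: L2-normalized hashed bag of words."""
    vec = [0.0] * dim
    for word in chunk.lower().split():
        vec[zlib.crc32(word.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's K-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist2(v, centroids[c]))
                  for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def stub_llm(text, max_words=12):
    """Placeholder for an LLM summarization call: keeps the first words."""
    return " ".join(text.split()[:max_words])


def summarize_long(text, k, summarize, chunk_size=50):
    """Cluster chunks, summarize one representative per cluster (map),
    then summarize the concatenated partial summaries (reduce)."""
    chunks = chunk_words(text, chunk_size)
    vectors = [embed(c) for c in chunks]
    k = min(k, len(chunks))
    labels, centroids = kmeans(vectors, k)
    picked = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            # Assumption: the chunk nearest the centroid represents the cluster.
            picked.append(min(members, key=lambda i: dist2(vectors[i], centroids[c])))
    picked.sort()  # keep representatives in document order
    partials = [summarize(chunks[i]) for i in picked]  # map step
    return summarize(" ".join(partials))               # reduce step
```

In this sketch the number of model calls depends on the number of clusters `k` rather than on document length, which is one plausible way such a scheme keeps each prompt within a token limit; in practice `summarize` would wrap the actual model call.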
Moh. Rosy Haqqy Aminy, Diana Purwitasari, Dwi Sunaryono, Ilham Gurat Adillion, Dini Adni Navastara, Bilqis Amaliah, Hilmil Pradana, Yoga Yustiawan
Clustering, Large Language Model, MapReduce, ROUGE, Summarization
Department of Informatics, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia