Modeling and automation of the process for detecting duplicate objects in memory snapshots

Mitikov, Nikolay Y.; Мітіков, Микола Юрійович; Guk, Natalia A.; Гук, Наталія Анатоліївна

dc.contributor.author	Mitikov, Nikolay Y.
dc.contributor.author	Мітіков, Микола Юрійович
dc.contributor.author	Guk, Natalia A.
dc.contributor.author	Гук, Наталія Анатоліївна
dc.date.accessioned	2024-06-04T21:08:24Z
dc.date.available	2024-06-04T21:08:24Z
dc.date.issued	2024-05-27
dc.identifier.issn	2663-0176
dc.identifier.issn	2663-7731
dc.identifier.uri	http://dspace.opu.ua/jspui/handle/123456789/14505
dc.description.abstract	The paper is devoted to the problem of detecting increased memory usage by software applications. The modern software development cycle is focused on functionality and often overlooks optimal resource utilization. Limited physical scalability sets an upper limit on the system's capacity to handle requests. The presence of immutable objects holding identical information is a sign of increased memory consumption. Avoiding duplicate objects in memory allows existing resources to be used more rationally and increases the volume of information that can be processed. Existing scientific publications focus on memory leaks and pay limited attention to excessive memory use, because there is no unified model for detecting it. Existing programming patterns do include the “object pool” pattern, but they leave the decision on whether to implement it to engineers and provide no mathematical grounding. This paper presents a mathematical model of the process of detecting duplicates of immutable String objects in a memory snapshot. Industrial systems that require hundreds of gigabytes of random-access memory and hold millions of objects in memory have been analyzed. At such data scales, the duplicate search itself needs to be optimized. The research method is the analysis of memory snapshots of high-load systems using code written on .NET technology with the ClrMD library. A memory snapshot reflects the state of the process under investigation at a particular moment in time and contains all objects, threads, and operations being performed. The ClrMD library allows programmatic exploration of objects and their types, retrieval of field values, and construction of graphs of relationships between objects.
The experiments were conducted on virtual machines running Windows, although similar results can be obtained on Linux because the in-memory data layout standard is cross-platform. Based on the results of the study, an optimization is proposed that speeds up the duplicate search several-fold. The scientific contribution of the research lies in a mathematically substantiated approach that significantly reduces memory use and optimizes computational processes. The practical utility of the model is confirmed by the optimization results achieved by following the resulting recommendations, by reduced hosting costs (which makes deploying and operating software systems in industrial conditions more economically efficient), and by an increase in the volume of processed data.	en
dc.description.abstract	Мета цієї роботи полягає у виявленні збільшеного використання пам'яті програмними застосунками. Сучасний цикл розробки програмного забезпечення зосереджений на функціональності і часто ігнорує аспекти оптимального використання ресурсів. Обмежене фізичне масштабування задає верхній ліміт на пропускну здатність системи оброблювати запити. Наявність незмінних об’єктів з однаковою інформацією є ознакою збільшеної витрати пам’яті. Уникнення дублікатів об’єктів в пам’яті дозволяє більш раціонально використовувати існуючий ресурс і збільшити обсяги оброблюваної інформації. Існуючі наукові публікації фокусуються на дослідженні проблем витоків пам’яті та приділяють обмежену увагу надмірному використанню пам’яті через відсутність уніфікованої моделі його пошуку. Варто зазначити, що існуючі шаблони програмування містять шаблон «пул об’єктів», але залишають висновок про доцільність його впровадження інженерам, не надаючи математичного підґрунтя. Представлено розробку математичної моделі для процесу виявлення дублікатів об'єктів з властивістю незмінності типу String в знімку пам’яті. Проаналізовано промислові системи, які вимагають сотні гігабайт оперативної пам’яті для роботи та містять мільйони об’єктів в оперативній пам’яті. За таких масштабів даних існує необхідність оптимізувати саме процес пошуку дублікатів. Методом дослідження є аналіз знімків пам’яті високонавантажених систем за допомогою програмного коду, розробленого на технології .NET та бібліотеці ClrMD. Знімок пам’яті відображає стан досліджуваного процесу у момент часу, містить усі об’єкти, потоки та виконувані операції. Бібліотека ClrMD дозволяє програмно досліджувати об’єкти, їх типи, отримувати значення полів, будувати графи зв’язків між об’єктами.
Серію експериментів було проведено на віртуальних машинах під керуванням операційної системи Windows, але схожі результати можуть бути отримані для операційної системи Linux через крос-платформний стандарт позиціювання даних в пам’яті. За результатами дослідження було запропоновано оптимізацію, яка дозволяє пришвидшити процес пошуку дублікатів у декілька разів. Науковий внесок дослідження полягає в створенні математично обґрунтованого підходу, який сприяє значному зменшенню використання ресурсів пам'яті та оптимізації обчислювальних процесів. Практична користь моделі підтверджується результатами оптимізації, досягнутими завдяки отриманим рекомендаціям, зниженням витрат на хостинг (що забезпечує більшу економічну ефективність у розгортанні та використанні програмних систем у промислових умовах), а також збільшенням обсягів оброблених даних.	uk
dc.language.iso	en	en
dc.publisher	Odessa Polytechnic National University	en
dc.subject	Optimization	en
dc.subject	algorithm	en
dc.subject	performance	en
dc.subject	memory snapshot	en
dc.subject	duplication	en
dc.subject	string	en
dc.subject	оптимізація	uk
dc.subject	алгоритм	uk
dc.subject	продуктивність	uk
dc.subject	знімок пам’яті	uk
dc.subject	дублювання	uk
dc.subject	строка	uk
dc.title	Modeling and automation of the process for detecting duplicate objects in memory snapshots	en
dc.title.alternative	Моделювання та автоматизація процесу пошуку дублікатів об'єктів у знімках пам'яті	uk
dc.type	Article	en
opu.citation.journal	Herald of Advanced Information Technology	en
opu.citation.volume	7	en
opu.citation.firstpage	147	en
opu.citation.lastpage	157	en
opu.citation.issue	2	en