23. 12. 2020 Michele Santuari Development

RPM/ISO Repository: Disk Space Optimization

Our NetEye Unified Monitoring Solution is distributed and maintained via ISO images and RPM packages. In the past, we used the mrepo tool to manage our RPM/ISO repositories, and this year we migrated to Pulp, as my colleague Andrea has already described.

As an R&D team, we continuously release new features in development so that they can be installed on “pre-release” machines to gather internal feedback, and on our R&D NetEye machine, which is used to monitor our internal infrastructure. Moreover, we have some dedicated development RPMs that contain all the unit, integration and front-end tests.

As you can imagine, storing all of these RPMs for every NetEye version makes disk space particularly critical. For this reason, we decided to use a Pulp repository, which optimizes disk storage by eliminating duplicate copies of the same RPM (e.g., we might have the same RPM in both the NetEye 4.15 and 4.16 repositories).

Although Pulp can greatly reduce the disk space needed for copies of an RPM, we can do even better by using a “special” type of volume called VDO (Virtual Data Optimizer), which adds a transparent compression and de-duplication layer for the data.

VDO is slower than plain LVM and also uses 3 GB to 4 GB for the metadata that supports its compression and de-duplication system, but it can significantly reduce the disk space used.

The VDO on our production Pulp repository management system saves almost 40% of the space compared to before. Given this rate and 200 GB of storage, you might be able to save ~78 GB (taking into account the 4 GB of “wasted” space for the VDO metadata).
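
As a rough sketch of the arithmetic behind that estimate: one reading that reproduces the ~78 GB figure is to reserve the fixed metadata first and then apply the saving rate to what is left. The function name and the 4 GB default below are illustrative assumptions, not part of any VDO tooling.

```python
def estimate_saved_gb(volume_gb: float, saving_rate: float,
                      metadata_gb: float = 4.0) -> float:
    """Estimate the space saved on a VDO volume.

    Assumption: the metadata is a fixed overhead that is subtracted first,
    and the saving rate applies only to the remaining data area.
    """
    if not 0.0 <= saving_rate <= 1.0:
        raise ValueError("saving_rate must be between 0 and 1")
    data_area_gb = volume_gb - metadata_gb
    if data_area_gb < 0:
        raise ValueError("volume is smaller than the metadata overhead")
    return data_area_gb * saving_rate


# 200 GB of storage at a 40% saving rate: (200 - 4) * 0.40
print(round(estimate_saved_gb(200, 0.40), 1))
```

The actual rate depends entirely on how similar the stored files are, so the 40% used here is only the value observed on our repository.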

I created a test VDO to see what it could do:

  • I started with a physical volume of 10 GB.
  • 4 GB are used for the metadata, which is fixed and does not depend on the total size of the physical volume. Clearly you’ll need to use a larger volume to see a real gain using VDO, so that the “wasted” space of the metadata will be negligible compared to the disk space saved 🙂
  • 6 GB are thus available for data.
Device               Size      Used Available Use% Space saving%
/dev/mapper/vdo     10.0G      4.0G      6.0G  40%           N/A
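
To illustrate why a larger volume makes the metadata negligible, here is a small sketch that computes the share of a volume taken by the fixed metadata. It assumes the 4 GB figure from the test above stays constant; the constant and function names are hypothetical.

```python
METADATA_GB = 4.0  # fixed overhead observed on the test volume (assumption)


def metadata_share(volume_gb: float) -> float:
    """Return the fraction of the physical volume used by VDO metadata."""
    if volume_gb <= 0:
        raise ValueError("volume_gb must be positive")
    return METADATA_GB / volume_gb


# The 10 GB test volume matches the 40% shown in the Use% column above;
# on larger volumes the same 4 GB quickly becomes a small fraction.
for size_gb in (10, 50, 200, 1000):
    print(f"{size_gb:>5} GB volume: {metadata_share(size_gb):.1%} metadata")
```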

Now if we add two RPMs of two different versions of Elasticsearch, we can see that only ~300 MB are used instead of ~600 MB (excluding the 4 GB of metadata):

Device               Size      Used Available Use% Space saving%
/dev/mapper/vdo     10.0G      4.3G      5.7G  42%           50%

Files stored on VDO:
Size    File
302M    elasticsearch-7.10.1-aarch64.rpm
302M    elasticsearch-7.9.3-x86_64.rpm
total 603M

We can also check the behavior of the VDO with two NetEye ISOs, where we can see that the disk space saved reaches 35% (again, excluding the 4 GB of metadata).

Device               Size      Used Available Use% Space saving%
/dev/mapper/vdo     10.0G      6.1G      3.9G  60%           35%

Files stored on VDO:
Size   File
1.6G   neteye4.15-centos7.stable.iso
1.6G   neteye4.16-centos7.stable.iso
total 3.2G
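
The percentages in the two tests above can be roughly cross-checked from the tables. The sketch below assumes that “Space saving%” is the share of written data that did not consume physical space, and that the physical space consumed is the increase in “Used” over the 4.0G metadata baseline. Because the table values are rounded, the results only approximate the reported 50% and 35%.

```python
def space_saving_percent(logical_gb: float, physical_gb: float) -> float:
    """Share of the written (logical) data that needed no physical space."""
    if logical_gb <= 0:
        raise ValueError("logical_gb must be positive")
    return 100.0 * (logical_gb - physical_gb) / logical_gb


BASELINE_USED_GB = 4.0  # "Used" on the empty test volume (metadata only)

# Two Elasticsearch RPMs: 603M written, "Used" grew from 4.0G to 4.3G.
rpm_saving = space_saving_percent(603 / 1024, 4.3 - BASELINE_USED_GB)

# Two NetEye ISOs: 3.2G written, "Used" grew from 4.0G to 6.1G.
iso_saving = space_saving_percent(3.2, 6.1 - BASELINE_USED_GB)

print(f"RPMs: {rpm_saving:.0f}%  ISOs: {iso_saving:.0f}%")
```
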
Michele Santuari

Software Architect at Wuerth Phoenix
Hi, my name is Michele Santuari and I am a telecommunications engineer who fell in love with OpenFlow, the first attempt at centralized network management, provisioning, and monitoring. I embraced the Software Defined Networking approach and discovered a passion for programming languages. Now, I am into Agile methodologies and crazy development process management.
