Speed up csv compression on SMP machines #1085

Open: sbernhard wants to merge 1 commit into theforeman:master from ATIX-AG:use_pbzip


@sbernhard
Contributor

The output of pbzip2 is fully compatible with bzip2. pbzip2 is optimized for SMP machines: it splits the input into multiple chunks and uses threads to compress them on multiple CPU cores in parallel, which speeds up the compression.

This change requires adding pbzip2 as a requirement in the RPM spec for foreman-maintain.

@sbernhard
Contributor Author

Thoughts on this, @evgeni?

@sbernhard
Contributor Author

@stejskalleos can you maybe have a look at this?

@stejskalleos self-assigned this May 25, 2026
@stejskalleos
Contributor

/packit build

   f.close
 end
-execute("bzip2 #{filepath} -c -9 > #{filepath}.bz2")
+execute("pbzip2 #{filepath} -c -9 > #{filepath}.bz2")
Contributor

What if pbzip2 is not available? I could not find information about where it is installed by default, so a check with a fallback to bzip2 might be a good idea:

compressor = system("which pbzip2 > /dev/null 2>&1") ? "pbzip2" : "bzip2"
execute("#{compressor} #{filepath} -c -9 > #{filepath}.bz2")

Second question:
By default, pbzip2 uses all available cores. This can make the Foreman instance unresponsive and cause access issues. Should we lower the process's priority or (somehow) limit the number of cores it uses, so that we don't exhaust the server's resources?
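
One possible way to combine both points, as a rough sketch rather than a tested change: fall back to bzip2 when pbzip2 is not on the PATH, cap the thread count with pbzip2's `-p` option, and run the compressor under `nice`. The helper names `command_available?` and `compression_command` are hypothetical and do not exist in foreman-maintain; using half of the cores and a niceness of 19 are arbitrary assumptions that would need tuning.

```ruby
require 'etc'
require 'shellwords'

# Hypothetical helper (not part of foreman-maintain): true when `name`
# is an executable file in one of the directories listed in `path`.
def command_available?(name, path = ENV.fetch('PATH', ''))
  path.split(File::PATH_SEPARATOR).any? do |dir|
    candidate = File.join(dir, name)
    File.file?(candidate) && File.executable?(candidate)
  end
end

# Hypothetical helper: builds the shell command that compresses
# `filepath` into `filepath.bz2`.
# Assumptions: half of the cores (at least one) is an acceptable limit,
# and the lowest scheduling priority (nice 19) is wanted.
def compression_command(filepath,
                        pbzip2_available: command_available?('pbzip2'),
                        max_cores: [Etc.nprocessors / 2, 1].max)
  source = Shellwords.escape(filepath)
  target = Shellwords.escape("#{filepath}.bz2")
  # pbzip2 accepts -p<N> to limit the number of processors it uses;
  # plain bzip2 is single-threaded and needs no limit.
  compressor = pbzip2_available ? "pbzip2 -p#{max_cores}" : 'bzip2'
  "nice -n 19 #{compressor} -c -9 #{source} > #{target}"
end
```

With such helpers, the changed line could read `execute(compression_command(filepath))`. Escaping the path is an addition that the original command does not have.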
