@leborchuk
Owner

Fixes #ISSUE_Number

What does this PR do?

Type of Change

  • Bug fix (non-breaking change)
  • New feature (non-breaking change)
  • Breaking change (fix or feature with breaking changes)
  • Documentation update

Breaking Changes

Test Plan

  • Unit tests added/updated
  • Integration tests added/updated
  • Passed make installcheck
  • Passed make -C src/test installcheck-cbdb-parallel

Impact

Performance:

User-facing changes:

Dependencies:

Checklist

Additional Context

CI Skip Instructions


@leborchuk leborchuk force-pushed the AddDeployScripts branch 2 times, most recently from 535d5da to f637ea2 Compare August 18, 2025 08:50
@leborchuk leborchuk force-pushed the AddDeployScripts branch 2 times, most recently from 331d799 to 208f386 Compare August 29, 2025 10:23
gongxun0928 and others added 5 commits September 3, 2025 11:29
apache#1337)

* feat: use ColumnEncoding_Kind_DIRECT_DELTA as default in offset stream

Optimize performance of variable-length column offsets by switching from
Zstd to delta encoding. This approach better compresses incremental integer
sequences, cutting disk space by more than half while maintaining performance.
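
To illustrate why delta encoding suits an offset stream, here is a minimal Python sketch. It is not the PAX implementation, and the function names are hypothetical. The offsets of a variable-length column only grow, so the differences between neighbours are just the value lengths, which are small integers that pack tightly.

```python
from itertools import accumulate

def delta_encode(offsets):
    """Return the first offset followed by successive differences."""
    if not offsets:
        return []
    deltas = [offsets[0]]
    for prev, cur in zip(offsets, offsets[1:]):
        deltas.append(cur - prev)
    return deltas

def delta_decode(deltas):
    """Rebuild the original offsets with a running sum."""
    return list(accumulate(deltas))

# Start offsets of five variable-length values; the first four
# values have lengths 5, 7, 3 and 6.
offsets = [0, 5, 12, 15, 21]
encoded = delta_encode(offsets)
assert encoded == [0, 5, 7, 3, 6]
assert delta_decode(encoded) == offsets
```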

The following is a comparison of file sizes for different encoding methods on TPC-DS 20G:

Name                   PAX(ZSTD)    AOCS         PAX(Delta)    PAX(Delta) / AOCS (%)
call_center               12 kB       231 kB      10185 bytes        4.31%
catalog_page             499 kB       653 kB       393 kB           60.18%
catalog_returns          240 MB       171 MB       178 MB          104.09%
catalog_sales           3033 MB      1837 MB      1977 MB          107.63%
customer                  16 MB        12 MB        12 MB          100.00%
customer_address        7008 kB      3161 kB      3115 kB           98.54%
customer_demographics     28 MB      8164 kB      9292 kB          113.82%
date_dim                3193 kB      1406 kB      1249 kB           88.85%
household_demographics    42 kB       248 kB        28 kB           11.29%
income_band            1239 bytes     225 kB      1239 bytes         0.54%
inventory                 36 MB        71 MB        36 MB           50.70%
item                    3084 kB      2479 kB      2227 kB           89.84%
promotion                 27 kB       239 kB        18 kB            7.53%
reason                 2730 bytes     226 kB      2280 bytes         0.99%
ship_mode              3894 bytes     227 kB      3315 bytes         1.43%
store                     23 kB       239 kB        18 kB            7.53%
store_returns            400 MB       265 MB       277 MB          104.53%
store_sales             4173 MB      2384 MB      2554 MB          107.12%
time_dim                1702 kB       819 kB       627 kB           76.56%
warehouse              5394 bytes     227 kB      4698 bytes         2.02%
web_page                  21 kB       236 kB        14 kB            5.93%
web_returns              116 MB        83 MB        85 MB          102.41%
web_sales               1513 MB       908 MB       982 MB          108.15%
* PAX: Support LZ4 compression for table columns

PAX only supports zlib and zstd compression for column values.
This commit adds LZ4 support for PAX table columns.

* map compress level to acceleration for lz4

* restrict acceleration to the range [0, 3]

* add macro control
Remove the USE_ORCA ifdef around OptimizerOptions. The struct is
required regardless of ORCA support, and the conditional caused
compilation failures when configured with --disable-orca.
If QueryFinishPending is set while a query is running in dumptuples, the
tuplecontext is reset but the memtuples are not consumed. When the query
enters dumptuples again, tuplesort_sort_memtuples accesses these memtuples,
whose memory in tuplecontext has already been freed, causing an
invalid memory access.

To avoid this situation, do nothing in dumptuples if QueryFinishPending is
set.
huansong and others added 19 commits September 8, 2025 15:20
We did not previously have a clear naming guideline for the existing
'pg_%' system views and their MPP versions. For example,
we renamed PG's pg_stat_all_tables and pg_stat_all_indexes to have
an '_internal' suffix, and used their original names to collect
aggregated results from all segments (commit e6f9303).

However, with the previous commit, all existing PG system
views keep their original names, while corresponding 'gp_%'
views provide the non-aggregated results from all segments, and
'gp_%_summary' views provide the aggregated results from all segments.

Therefore, we now revert pg_stat_all_tables and pg_stat_all_indexes
back to their original definitions, which just collect stats from
a single segment. Then, we add them to system_views_gp.in to produce
gp_stat_all_tables and gp_stat_all_indexes, which collect non-aggregated
results from all segments. Finally, we rename the aggregate versions of
those views to gp_stat_all_tables_summary and gp_stat_all_indexes_summary.

Because the views pg_stat_user_tables and pg_stat_user_indexes use the above
summary views, we have to add _summary views for these two views as well.
We will add _summary views for other system views later.

Modify the regress tests accordingly.
Added the following views:

gp_stat_progress_vacuum_summary
gp_stat_progress_analyze_summary
gp_stat_progress_cluster_summary
gp_stat_progress_create_index_summary

Also replaced pg_stat_progress_* views with gp_stat_progress_* views for
existing tests.
These summary views offer basic aggregation of the gp_stat_* views across Greenplum coordinator and
segments.

Aggregation logic applied as follows:
* Time related (last_%): use max()
* Transaction related, not innately summable (number of commits/rollbacks) : use max()
* Table specific: sum()/numsegments for replicated tables, sum() for
  distributed tables
* Innately summable stats, if no particular table is involved: use sum()
* pid: use coordinator's pid (not used here, but this is the convention in other gp_%_summary views)
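
The aggregation rules above can be sketched in Python. This is an illustration only: the real views are SQL, and the `summarize` helper and its row layout are hypothetical. The keys `last_vacuum` and `n_live_tup` stand in for one time-related column and one table-specific counter.

```python
def summarize(rows, replicated, numsegments):
    """Collapse per-segment stat rows for one table into a summary row.

    rows: list of dicts with a comparable 'last_vacuum' value and a
    numeric 'n_live_tup' counter (one dict per segment).
    """
    # Time-related columns: report the most recent value.
    last_vacuum = max(r["last_vacuum"] for r in rows)
    # Table-specific counters: sum across segments.
    total = sum(r["n_live_tup"] for r in rows)
    if replicated:
        # Each segment holds a full copy, so divide by the segment count.
        total = total / numsegments
    return {"last_vacuum": last_vacuum, "n_live_tup": total}

rows = [
    {"last_vacuum": 10, "n_live_tup": 100},
    {"last_vacuum": 20, "n_live_tup": 100},
    {"last_vacuum": 15, "n_live_tup": 100},
]
assert summarize(rows, replicated=False, numsegments=3)["n_live_tup"] == 300
assert summarize(rows, replicated=True, numsegments=3)["n_live_tup"] == 100
```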
This commit replaces the use of the -d parameter with the -e parameter
when checking for the presence of a Git repository. This allows for more
comprehensive checks, including cases where the working directory may be
part of a Git repository but not the entire repository.
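
The difference between the two tests can be reproduced with a short Python sketch. It assumes the check is made against a `.git` path, which the commit message does not state, and the helper names are hypothetical. In linked worktrees and submodules, `.git` is a regular file rather than a directory, so a directory-only test misses it.

```python
import os
import tempfile

def has_git_dir(path):
    """Equivalent of `test -d path/.git`: true only for a directory."""
    return os.path.isdir(os.path.join(path, ".git"))

def has_git_entry(path):
    """Equivalent of `test -e path/.git`: true for a file or a directory."""
    return os.path.exists(os.path.join(path, ".git"))

with tempfile.TemporaryDirectory() as worktree:
    # Simulate a linked worktree: `.git` is a file pointing at the
    # real repository directory (the target path here is made up).
    with open(os.path.join(worktree, ".git"), "w") as f:
        f.write("gitdir: /path/to/main/.git/worktrees/example\n")
    assert not has_git_dir(worktree)
    assert has_git_entry(worktree)
```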
Fix issue: apache#1240

A Replicated locus can meet a Partitioned locus in an EXCEPT when there is
a writable CTE on a replicated table.
In that case we can make both sides SingleQE or Entry to do the set operation.

with result as (update r_1240 set a = a +1 where a <
5 returning *) select * from result except select * from p1_1240;
                          QUERY PLAN
---------------------------------------------------------------
 HashSetOp Except
   ->  Append
         ->  Explicit Gather Motion 3:1  (slice1; segments: 3)
               ->  Subquery Scan on "*SELECT* 1"
                     ->  Update on r_1240
                           ->  Seq Scan on r_1240
                                 Filter: (a < 5)
         ->  Gather Motion 3:1  (slice2; segments: 3)
               ->  Subquery Scan on "*SELECT* 2"
                     ->  Seq Scan on p1_1240
 Optimizer: Postgres-based planner
(11 rows)

Authored-by: Zhang Mingli avamingli@gmail.com
Make resource group io limit testing reproducible.

If we retain the objects created during testing, we must clear those
objects before re-running the tests locally, which is inconvenient for
developers.
* add function to clear io.max

This PR has several improvements for io limit:

1. Add a function to clear io.max. This function should be used when
   altering io_limit.

2. Check tablespaces in io_limit when dropping tablespaces. If the
   tablespace to be dropped is present in any io_limit resource
   group, the DROP TABLESPACE statement is aborted.

3. During InitResGroup and AlterResourceGroup, if parseio raises an error,
   the error is demoted to a WARNING, so the cluster can still start after
   a tablespace has been removed.
Fix resource group io limit flaky case.

The flaky case is caused by running mkdir from multiple segments on the same
host.

Catching FileExistsError and ignoring it is sufficient, because the mkdir
function only needs the directory to exist.
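
A minimal sketch of that pattern follows. The `ensure_dir` name is hypothetical and is not the test helper's actual name.

```python
import os

def ensure_dir(path):
    """Create path if needed; succeed if another process created it first."""
    try:
        os.mkdir(path)
    except FileExistsError:
        # Another segment on the same host won the race. The directory
        # exists, which is all the caller needs.
        pass
```

Note that `FileExistsError` is also raised when the path exists as a regular file, so this sketch assumes nothing else creates a file at that location. `os.makedirs(path, exist_ok=True)` is an alternative that also creates missing parent directories.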
When io_limit encountered a syntax error, the previous log was just
"Error: Syntax error".

Now io_limit logs a detailed message for syntax errors:

```
demo=# create resource group rg1 WITH (cpu_max_percent=10, 
                                       io_limit='pg_defaultrbps=100, 
                                       wbps=550,riops=1000,wiops=1000');
ERROR:  io limit: syntax error, unexpected '=', expecting ':'
HINT:   pg_defaultrbps=100, wbps=550,riops=1000,wiops=1000
                      ^
```

```
demo=# create resource group rg1 WITH (cpu_max_percent=10,
                                       io_limit='pg_default:
                                       rbps=100wbps=550,riops=1000,
                                       wiops=1000');
ERROR:  io limit: syntax error, unexpected IO_KEY, expecting end of file or ';'
HINT:   pg_default:rbps=100wbps=550,riops=1000,wiops=1000
                           ^
```
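
For orientation, the syntax that these messages imply can be sketched as a small Python parser. This is an assumption-laden illustration, not the actual grammar: the real parser is generated code in the server, and this sketch accepts only integer values for the four keys shown above, with ':' after the tablespace name, ',' between limits and ';' between tablespaces. The `parse_io_limit` name is hypothetical.

```python
IO_KEYS = ("rbps", "wbps", "riops", "wiops")

def parse_io_limit(text):
    """Parse 'tablespace:key=value,...;tablespace:...' into nested dicts."""
    compact = "".join(text.split())  # drop spaces and line breaks
    result = {}
    for entry in compact.split(";"):
        tablespace, colon, limits = entry.partition(":")
        if not colon:
            raise ValueError(f"expecting ':' after tablespace name in {entry!r}")
        parsed = {}
        for item in limits.split(","):
            key, equals, value = item.partition("=")
            if key not in IO_KEYS or not equals or not value.isdigit():
                raise ValueError(f"bad limit {item!r} for tablespace {tablespace!r}")
            parsed[key] = int(value)
        result[tablespace] = parsed
    return result

limits = parse_io_limit("pg_default:rbps=100,wbps=550,riops=1000,wiops=1000")
assert limits["pg_default"]["wbps"] == 550
```

Both failing inputs from the examples above are rejected by this sketch as well: the first has no ':' after the tablespace name, and the second is missing the ',' between `rbps=100` and `wbps=550`.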
io limit: fix double free.

In 'alterResgroupCallback', the io_limit pointers of 'caps' and 'oldCaps'
may point to the same location, so a double free is possible.

In 'alterResgroupCallback', 'oldCaps' is filled in by
'GetResGroupCapabilities' and then assigned to 'caps' via:
caps = oldCaps

To resolve this problem, the code should free oldCaps.io_limit and
set it to NIL when the io_limit has not been altered.

So, if the io_limit has not been altered, caps.io_limit =
oldCaps.io_limit = NIL. If io_limit has been altered, caps.io_limit !=
oldCaps.io_limit.
Add one more level to the resource group hierarchy when using cgroup v2.

The current leaf node in the gpdb cgroup hierarchy is
/sys/fs/cgroup/gpdb/<oid>, which is fine for the gpdb workflow but
inconvenient for extensions that want to use the gpdb cgroup hierarchy.

Extensions like plcontainer want to create a sub-cgroup under
/sys/fs/cgroup/gpdb/<oid> as a new leaf node. This is not possible in the
current hierarchy because of the "no internal processes" constraint of
cgroup v2.

This commit introduces a new hierarchy to accommodate extensions that want
to use the gpdb cgroup hierarchy, and the modification is small: move
processes from /sys/fs/cgroup/gpdb/<oid>/cgroup.procs to
/sys/fs/cgroup/gpdb/<oid>/queries/cgroup.procs, and keep the limits in
/sys/fs/cgroup/gpdb/<oid>.

With this modification, extensions that want to use the gpdb cgroup
hierarchy can create a sub-cgroup under /sys/fs/cgroup/gpdb/<oid>.
For example, plcontainer will create a cgroup
/sys/fs/cgroup/gpdb/<oid>/docker-12345 and put processes into it.
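
The resulting layout can be summarized with a few path helpers. This is only a sketch of the directory scheme described above; the function names are hypothetical, and the OID 16384 is an arbitrary example value.

```python
import posixpath

GPDB_CGROUP_ROOT = "/sys/fs/cgroup/gpdb"

def group_dir(oid):
    """Directory that keeps the limits for one resource group."""
    return posixpath.join(GPDB_CGROUP_ROOT, str(oid))

def queries_procs(oid):
    """cgroup.procs file of the leaf that now holds the query processes."""
    return posixpath.join(group_dir(oid), "queries", "cgroup.procs")

def extension_leaf(oid, name):
    """Sibling leaf that an extension could create next to 'queries'."""
    return posixpath.join(group_dir(oid), name)

assert group_dir(16384) == "/sys/fs/cgroup/gpdb/16384"
assert queries_procs(16384) == "/sys/fs/cgroup/gpdb/16384/queries/cgroup.procs"
assert extension_leaf(16384, "docker-12345") == "/sys/fs/cgroup/gpdb/16384/docker-12345"
```

Because processes live only in leaves such as `queries` and `docker-12345`, the group directory itself holds no processes, which is what the cgroup v2 constraint requires.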
Delete the cgroup leaf dir only when using group-v2.

There is no leaf directory in the gpdb cgroup hierarchy when using cgroup v1,
so rmdir(leaf_path) always returns a non-zero value and rmdir(path)
is then skipped.
When a resource group is dropped, the corresponding cgroup dir therefore
cannot be removed because rmdir(path) is never executed, and this
causes CI failures.

This commit adds logic to check the resource group version in deleteDir:
when using group-v1, rmdir(leaf_path) is skipped.
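
The intended control flow can be sketched in Python as follows. This is a simplification of the idea, not the C code of deleteDir: the `delete_dir` name and the boolean flag are hypothetical, and the leaf name `queries` is taken from the hierarchy described in the earlier commit message.

```python
import os

def delete_dir(path, cgroup_v2):
    """Remove a resource group's cgroup directory.

    The extra leaf directory exists only under cgroup v2, so it is
    removed first only in that case. Under cgroup v1 the leaf step is
    skipped and cannot block removal of the group directory.
    """
    if cgroup_v2:
        os.rmdir(os.path.join(path, "queries"))
    os.rmdir(path)
```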
Add guc: gp_resource_group_cgroup_parent (only for cgroup v2).

Currently gpdb doesn't support changing the root cgroup path of resource
groups. In some situations it is better if gpdb can change it.

For example, on an OS with systemd, a user may want to create a
delegated cgroup for gpdb via systemd, but the delegated cgroup must end
with .service, typically /sys/fs/cgroup/gpdb.service. On an
OS without systemd, a user may want to use /sys/fs/cgroup/gpdb or
another location directly. Adding gp_resource_group_cgroup_parent
makes resource groups more flexible.
Fix no response when altering io_limit of a resource group to '-1'.

Previously, ALTER RESOURCE GROUP xxx SET IO_LIMIT '-1' took no action.

Now it clears the content of io.max and updates the relation
pg_resgroupcapability.
…eter

This commit fixes issues introduced in
"Add guc: gp_resource_group_cgroup_parent (#16738)" where the
gp_resource_group_cgroup_parent GUC parameter was added but the
gpcheckresgroupv2impl script still used hardcoded "gpdb" paths.

Changes:
- Implement get_cgroup_parent() method to dynamically retrieve the
  gp_resource_group_cgroup_parent value from database
- Replace all hardcoded "gpdb" paths with dynamic cgroup parent value
- Improve error handling in cgroup.c with more descriptive error messages
- Fix test configuration order: set gp_resource_group_cgroup_parent before
  enabling gp_resource_manager=group-v2 to avoid validation failures

This ensures the cgroup validation script works correctly with custom
cgroup parent directories configured via the GUC parameter, making the
resource group feature more flexible for different deployment scenarios.
When a seq scan begins, check whether the table AM's scanflags is set to
determine whether the runtime filter has been pushed down.

When the runtime filter is pushed down to pax am, pax am converts the min/max
scankey in the runtime filter into PFTNode and performs min/max filtering.
Leonid Borchuk added 2 commits September 11, 2025 07:38
…in repo

The changes here include the following original commits:
```
git log --pretty=format:"%H%x09%an%x09%ad%x09%s"

5c1a2ada9a93ab5f930aebd0018a7369fdf61930        Dianjin Wang    Wed Jun 25 18:28:43 2025 +0800  Update gcc/g++ settings for PAX on RockyLinux 8
133a81303555dedc07c36ec16aa686367c47c774        Leonid Borchuk  Wed Jul 16 15:58:52 2025 +0000  Rename greenplum_path to cloudberry-env
eef2516b90bb7b9de2af95cc2a9df5b125533794        Dianjin Wang    Sat Jun 7 08:51:09 2025 +0800   Update `dorny/paths-filter` version tag to commit
e06dd830250ce89184b13a18aa6663ffcb56db4b        Ed Espino       Sun Jun 1 12:27:17 2025 -0700   Initial commit: Apache Cloudberry (Incubating) release script
384202893e571ce06a2224a116019c2ca9a3dce5        Dianjin Wang    Wed Apr 30 16:09:51 2025 +0800  Add the PAX support in the configure
5081920c07096e9e2cf217ad1b7489ae8963a86a        Ed Espino       Tue Apr 1 03:31:45 2025 -0700   Add protobuf-devel to Dockerfiles for Rocky 8 and 9 builds (apache#14)
7a6549cedb84edad516b89ebbd73529b773aaf03        Jianghua Yang   Thu Feb 27 22:31:07 2025 +0800  Add debuginfo package.
5965faabbf15965778ec09bca02960dbc1900a6a        Ed Espino       Thu Feb 13 01:28:47 2025 -0800  Add new Cloudberry dependency (apache#11)
d50af03d7c04341ce86047aa098bfd8e6a914804        Ed Espino       Sun Dec 15 23:27:47 2024 -0800  Add script to analyze core dumps with gdb (#10)
54bbb3d10ad642f98cf97418978319d6d030070c        Ed Espino       Sun Dec 15 21:39:20 2024 -0800  Adding packages (gdb and file) used in core file analysis. (apache#9)
9638c9e3c983e4d9ae7327517c3f88b0f8335614        Ed Espino       Mon Dec 9 18:43:57 2024 -0800   Container - Multi arch support for Rocky 8 & 9 (apache#8)
5249d69825ec34c2363aa3de3391ddc790786ffe        Ed Espino       Wed Nov 27 02:09:46 2024 -0800  Enhance test result parsing for ignored tests (apache#7)
f6fb4296c392b3cfaef0da45da282ca667af1025        Ed Espino       Tue Nov 19 18:02:14 2024 -0800  fix: remove -e from set options in parse-test-results.sh (apache#6)
2c8302c2c2beb2682185cc9c54503342b0bb0351        Ed Espino       Thu Nov 7 22:11:17 2024 -0800   build: add Apache Cloudberry build automation and test framework
6b8f8938196d55902eecf3506a9142811f97d633        Ed Espino       Thu Nov 14 02:42:02 2024 -0800  Update to Apache Cloudberry (incubating) rpm name and add disclaimer
1a5852903579b11ec0bd12d3083cf4299250eb96        Ed Espino       Thu Nov 14 10:44:10 2024 -0800  Update repository names for pushing to official Docker Hub repository
7f82c004c4a106832a2483e509ba152082561838        Ed Espino       Thu Nov 14 00:13:43 2024 -0800  Add initial Dockerfiles, configs, and GitHub workflows for Cloudberry
72e5f06a7cdf6e807b255717ffe5681a122da474        Dianjin Wang    Tue Nov 5 14:21:11 2024 +0800   Add asf-yaml and basic community files
5acab8e2c785c8036c92562017d06f666b42cfd5        Ed Espino       Wed Sep 4 04:16:35 2024 -0700   Fix release names and paramerize pgvector version.
4ec026dea578f397512c3d8686082317338d6d2c        Ed Espino       Tue Sep 3 01:50:34 2024 -0700   Using Cloudberry pgvector 0.5.1
ed928320f640731efd33e6157a315a4d3100efb5        Ed Espino       Mon Sep 2 23:50:39 2024 -0700   Change ownership of symlink.
e8b22b0bffdb44a65528cbc862cfe8d6425302af        Ed Espino       Sun Sep 1 23:52:57 2024 -0700   Update elf scripts
7f295146bc3412894ecbb0027c10754e736fb7ed        Ed Espino       Fri Aug 30 16:23:38 2024 -0700  Change default installation directory.
1a571a8bdeac6eb1ccf4e7d81d7d89da7ae6b0e8        Ed Espino       Fri Aug 30 16:10:44 2024 -0700  Change default installation directory.
8ed243cff42501843f86a8b70925a6b1e34c6681        Ed Espino       Fri Aug 30 03:52:16 2024 -0700  Updates
ab43f527c56b197f5c223672e92fb1d384f40858        Ed Espino       Fri Aug 30 03:36:35 2024 -0700  Add hll and pgvector extensions
a133b6ea1fe5b0cb8c7440218873023be990ddfa        Ed Espino       Wed Aug 28 22:57:20 2024 -0700  Remove Changelog
371e50dafd31282da78ac661aa7f9890f083b3b9        Ed Espino       Wed Aug 28 22:56:26 2024 -0700  Add Group to spec file.
54be4f19ecc15ee477040e169cd7e66246c3034e        Ed Espino       Wed Aug 28 12:43:09 2024 -0700  Fix changing ownership to gpadmin.
357936d3efd9de69b913b736172270781ac0b6f1        Ed Espino       Wed Aug 28 12:02:15 2024 -0700  Adjustment for shipping GO apps (e.g. gpbackup).
4f2cde6f1c97913152498a6cbb54fcc0d9785958        Ed Espino       Mon Aug 26 23:29:03 2024 -0700  Update spec files
44b1425441c482d260f087d0702aef66f379e840        Ed Espino       Mon Aug 26 22:57:05 2024 -0700  Update docs
57faae9ab7600efc3a4bfe0dd7a2b4eb2d82229f        Ed Espino       Mon Aug 26 22:51:41 2024 -0700  minor enhancements
44123c9a7a133b4625ddaa2a06fb0be0db77f6a2        Ed Espino       Thu Aug 22 02:15:00 2024 -0700  Script update.
60a5884b33ef848aa6861bb9537f28b84ce6597a        Ed Espino       Wed Aug 21 22:28:07 2024 -0700  Clean up repo rpm making it noarch and making the repo entry dynamic
e40c74a91ee4dcac727622fdfbef84e41e61969e        Ed Espino       Wed Aug 21 13:09:22 2024 -0700  Add repo tool
16b3dfe3ead0834bd83165b5559f3388f4e55e6e        Ed Espino       Wed Aug 21 00:45:53 2024 -0700  Fix description in repo RPM.
feb096c5d8887dd6b29d382be9b37844169df1ca        Ed Espino       Wed Aug 21 00:42:28 2024 -0700  Fix relocation RPM feature by createing own prefix variable.
fd5017365759572248fd07c39af67f83f3987295        Ed Espino       Wed Aug 21 00:10:05 2024 -0700  In spec file, set version and release variables via script (build-rpm.sh).
b9668e1b86cc631db50ae603cc63cc0ffa35a400        Ed Espino       Tue Aug 20 11:44:05 2024 -0700  EL SPEC file consolidation.
904f2981e8d1217b57bc5490d31e4463399d4551        Ed Espino       Sun Aug 18 00:45:32 2024 -0700  Rename spec file and add additional runtime dependencies.
89f65a6a0902bbe0fd8294c5ad0767419e53e608        Ed Espino       Sat Aug 17 01:04:02 2024 -0700  Add RPM GPG KEY
91ea01f7f0b7ce9e5553d0861ec640b3392bacd2        Ed Espino       Fri Aug 16 22:56:54 2024 -0700  Update Spec file.
c362bdab21ef9e33e885ee839ad9cf8b009122bd        Ed Espino       Fri Aug 16 19:00:41 2024 -0700  Create repo RPM
9b02885090fb9c09dd03202616303afe9c7f93f5        Ed Espino       Fri Aug 16 11:14:36 2024 -0700  feat: Add ELF dependency analyzer script
f3a569e9620c1d643046e92d1387f104a3dbf8cd        Ed Espino       Fri Aug 16 10:30:55 2024 -0700  Initial EL9 spec file
```