Skip to content

Conversation

@GavinHeff
Copy link
Contributor

Adds playbook to gather more advanced facts about a multinode cloud

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears to have picked up some stray changes from another PR. You might need another rebase

@Alex-Welsh
Copy link
Member

/gemini review

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new playbook to gather more detailed facts about the cloud environment, supported by a new custom Ansible filter. While the overall direction is good, the new filter plugin in filters.py contains several issues, including a critical bug that silences errors, leftover debugging comments, and inconsistencies. The get-cloud-facts.yml playbook also has some potential problems, such as a misleading fact name and unsafe variable access that could lead to playbook failures. I have provided specific comments and code suggestions to address these issues and improve the robustness and clarity of the new code.

@GavinHeff GavinHeff force-pushed the gatherCloudFactsv2 branch 2 times, most recently from 9949f32 to 8b1311e Compare September 25, 2025 14:53
indiv_host_vars = hostvars[host][var]
for key in indiv_host_vars.keys():
if key == "skipped":
return {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think return here is just going to instantly exit the function completely and return an empty dictionary. I think you want continue. May have to refactor the loop a bit to get that to work properly though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be clear, what's the intention with this bit of the loop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its a way to handle hosts not having Docker installed (or any other service it might be checking). If the docker ps command is skipped then Ansible saves "skipped" to containers_list. This part of the loop catches that and returns an empty dict as the value for that host

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, okay have you tested it? I'm concerned that return is going to exit the entire function and just ignore any future iterations of the loop

Copy link
Member

@Alex-Welsh Alex-Welsh Oct 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a quick example of what I mean:

def testfunc():
    for i in range(10):
        if not i == 3:
            print(i)
        else:
            return i

    return "complete"

output = testfunc()

print("Final output")
print(output)
alex@alex-framework:/tmp$ python3 testo.py 
0
1
2
Final output
3

The loop never gets past 3 because it returns too early

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whereas continue will just skip the element:

alex@alex-framework:/tmp$ python3 testo.py 
0
1
2
4
5
6
7
8
9
Final output
complete

Copy link
Member

@Alex-Welsh Alex-Welsh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, just a few minor tweaks

ansible.builtin.set_fact:
seed_node_is_vm: "{{ ansible_facts.virtualization_role == 'guest' }}"

- name: Get facts
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be extra clear

Suggested change
- name: Get facts
- name: Set facts for all hosts

- name: Get facts
hosts: all
gather_facts: true
tasks:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a pass on spacing between these tasks. Sometimes there's no gap, sometimes there's two lines.
Should be one empty line between each task (no gap between the first task and the tasks: line)

Comment on lines +30 to +40
- name: Set internet connectivity fact
ansible.builtin.set_fact:
internet_connectivity: "{{ not successful_ping.failed }}"
- name: Get OS distribution
ansible.builtin.set_fact:
distribution: "{{ ansible_facts.distribution }}"


- name: Get OS distribution release
ansible.builtin.set_fact:
distribution_release: "{{ ansible_facts.distribution_release }}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went looking for ways to deduplicate this and it turns out you can specify multiple facts in one task, see docs

So you can do:

Suggested change
- name: Set internet connectivity fact
ansible.builtin.set_fact:
internet_connectivity: "{{ not successful_ping.failed }}"
- name: Get OS distribution
ansible.builtin.set_fact:
distribution: "{{ ansible_facts.distribution }}"
- name: Get OS distribution release
ansible.builtin.set_fact:
distribution_release: "{{ ansible_facts.distribution_release }}"
- name: Set facts
ansible.builtin.set_fact:
internet_connectivity: "{{ not successful_ping.failed }}"
distribution: "{{ ansible_facts.distribution }}"
distribution_release: "{{ ansible_facts.distribution_release }}"
hugepages_enabled: "{{ 'hugepages' in ansible_facts.cmdline }}"

(note I've also added hugepages_enabled to my example)

ansible.builtin.set_fact:
hugepages_enabled: "{{ 'hugepages' in ansible_facts.cmdline }}"

- name: Gather Cloud Facts
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of this play is now a bit similar to the previous "gather facts" plays. I'd rename it to be a bit more clear

Suggested change
- name: Gather Cloud Facts
- name: Write cloud facts file

pulp_tls_enabled: "{{ pulp_enable_tls }}"
kolla_image_tags: "{{ kolla_image_tags }}"
gpu_passthrough_enabled: "{{ gpu_group_map | length > 0 | bool }}"
vGPU_enabled: "{{ groups['vgpu'] | length > 0 | bool }}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just by convention, lets keep all vars lowercase

Suggested change
vGPU_enabled: "{{ groups['vgpu'] | length > 0 | bool }}"
vgpu_enabled: "{{ groups['vgpu'] | length > 0 | bool }}"

@@ -0,0 +1,124 @@
---
- name: Get facts for control host
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Set facts rather than Get facts

Suggested change
- name: Get facts for control host
- name: Set facts for control host

Applies to subsequent plays as well

ansible.builtin.command: "uname -r"
register: kernel_version

- name: Check if docker is installed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: capitalisation

Suggested change
- name: Check if docker is installed
- name: Check if Docker is installed

@Alex-Welsh
Copy link
Member

You'll also need to rebase this on stackhpc/2025.1. In a recent change, all the ansible playbooks were moved to separate sub-directories, so there's a conflict where the playbook moved

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants