I've made two attempts this afternoon to create a new CitC on AWS using the one-click installer, but for some reason the installation "hangs".
The management node gets created and I can SSH into it, but the finish command keeps producing this (with or without a limits.yaml file):
[citc@mgmt ~]$ finish
Error: The management node has not finished its setup
Please allow it to finish before continuing.
For information about why they have not finished, check the file /root/ansible-pull.log
The last part in /root/ansible-pull.log is this:
TASK [slurm : open all ports] **************************************************
Friday 19 February 2021 14:19:11 +0000 (0:00:00.045) 0:06:17.021 *******
That was over an hour ago, and there has been no progress since.
/var/log/slurm exists, but it is entirely empty.
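For anyone wanting to check the same thing, here is a rough sketch of how the stall can be measured from the log. It assumes the log uses the banner format shown above (a `TASK [...]` line followed by a timestamp line) and that Python runs with English day and month names. The helper names `last_task` and `stalled_for` are made up for illustration and are not part of CitC or Ansible.

```python
import re
from datetime import datetime, timezone

# Timestamp line printed under each TASK banner, e.g.
# "Friday 19 February 2021 14:19:11 +0000 (0:00:00.045) 0:06:17.021 *******"
STAMP_RE = re.compile(
    r"^(\w+ \d{1,2} \w+ \d{4}\s+\d{2}:\d{2}:\d{2} [+-]\d{4}) \("
)


def last_task(log_text):
    """Return (task_name, started_at) for the last timestamped TASK, or None."""
    task = None
    result = None
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith("TASK ["):
            # Keep the text between "TASK [" and the closing bracket.
            task = line[len("TASK ["):].split("]", 1)[0]
            continue
        match = STAMP_RE.match(line)
        if match and task is not None:
            # %A and %B assume English names (the default C locale).
            started = datetime.strptime(
                match.group(1), "%A %d %B %Y %H:%M:%S %z"
            )
            result = (task, started)
    return result


def stalled_for(log_text, now=None):
    """Return (task_name, time elapsed since that task started), or None."""
    found = last_task(log_text)
    if found is None:
        return None
    task, started = found
    if now is None:
        now = datetime.now(timezone.utc)
    return task, now - started
```

Passing the contents of /root/ansible-pull.log (read as root) to `stalled_for` returns the name of the last task that started and how long ago that was.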
Running processes:
root 1515 0.0 1.0 372592 40816 ? Ss 14:12 0:00 /usr/libexec/platform-python /usr/bin/cloud-init modules --mode=final
root 1997 0.0 0.0 217052 732 ? S 14:12 0:00 \_ tee -a /var/log/cloud-init-output.log
root 2037 0.0 0.0 235744 3412 ? S 14:12 0:00 \_ /bin/bash /var/lib/cloud/instance/scripts/part-001
root 4767 0.0 0.9 406240 34832 ? S 14:12 0:00 \_ /usr/bin/python3 -u /usr/bin/ansible-pull --url=https://github.com/clusterinthecloud/ansible.git --checkout=6 --inventory=/root/hosts management.yml
root 9929 7.3 1.6 590508 61548 ? Sl 14:12 5:24 \_ /usr/bin/python3.6 /usr/bin/ansible-playbook -c local /root/.ansible/pull/ip-10-0-16-0.eu-west-1.compute.internal/management.yml -t all -l localhost,mgmt,ip-10-0-16-0,ip-10-0-16-0.eu-west-1.com
root 27615 0.0 1.4 583004 54488 ? S 14:19 0:00 \_ /usr/bin/python3.6 /usr/bin/ansible-playbook -c local /root/.ansible/pull/ip-10-0-16-0.eu-west-1.compute.internal/management.yml -t all -l localhost,mgmt,ip-10-0-16-0,ip-10-0-16-0.eu-west-1
root 27616 0.0 0.0 235744 3372 ? S 14:19 0:00 \_ /bin/sh -c /usr/libexec/platform-python && sleep 0
root 27617 0.0 0.8 415588 30484 ? S 14:19 0:00 \_ /usr/libexec/platform-python
dirsrv 17078 0.1 2.1 662068 81740 ? Ssl 14:14 0:06 /usr/sbin/ns-slapd -D /etc/dirsrv/slapd-mgmt -i /run/dirsrv/slapd-mgmt.pid
citc 17138 0.0 0.2 93904 9968 ? Ss 14:15 0:00 /usr/lib/systemd/systemd --user
citc 17142 0.0 0.1 257440 5068 ? S 14:15 0:00 \_ (sd-pam)
mysql 21671 0.0 2.4 1776020 93568 ? Ssl 14:15 0:01 /usr/libexec/mysqld --basedir=/usr
munge 22577 0.0 0.1 125220 4048 ? Sl 14:17 0:00 /usr/sbin/munged
root 24674 0.0 1.0 509096 41380 ? Ssl 14:17 0:00 /usr/libexec/platform-python -s /usr/sbin/firewalld --nofork --nopid
root 27703 0.0 0.0 232532 2036 ? Ss 15:01 0:00 /usr/sbin/anacron -s
Any suggestions on how to figure out what went wrong?