
Katalyst-colocation-orm can be installed on an enhanced-k8s cluster, but katalyst-colocation cannot #617

@ozline


What happened?

I followed the "Colocate your application using Katalyst" guide to install Katalyst.

The guide says that clusters running KubeWharf enhanced Kubernetes should install katalyst-colocation, and clusters running vanilla Kubernetes should install katalyst-colocation-orm.

My nodes were set up by following the "Install KubeWharf enhanced-k8s" guide, yet only katalyst-colocation-orm installs successfully; katalyst-colocation does not.

When I install katalyst-colocation, the katalyst-colocation agent logs the following error:

I0610 13:10:27.641756       1 state_checkpoint.go:121] "[cpu_plugin] State checkpoint: restored state from checkpoint"
I0610 13:10:27.641777       1 util.go:68] [katalyst-core/pkg/agent/qrm-plugins/cpu/util.GetCoresReservedForSystem] get reservedQuantityInt: 0 from ReservedCPUCores configuration
I0610 13:10:27.641787       1 util.go:77] [katalyst-core/pkg/agent/qrm-plugins/cpu/util.GetCoresReservedForSystem] take reservedCPUs:  by reservedCPUsNum: 0
I0610 13:10:27.641832       1 policy.go:950] [katalyst-core/pkg/agent/qrm-plugins/cpu/dynamicpolicy.(*DynamicPolicy).cleanPools] there is no pool to delete
I0610 13:10:27.641842       1 policy.go:964] [katalyst-core/pkg/agent/qrm-plugins/cpu/dynamicpolicy.(*DynamicPolicy).initReservePool] initReservePool reserve:
I0610 13:10:27.641859       1 state_mem.go:109] "[cpu_plugin] updated cpu plugin pod entries" podUID="reserve" containerName="" allocationInfo="{\"pod_uid\":\"reserve\",\"owner_pool_name\":\"reserve\",\"allocation_result\":\"\",\"original_allocation_result\":\"\",\"topology_aware_assignments\":{},\"original_topology_aware_assignments\":{},\"init_timestamp\":\"\",\"labels\":null,\"annotations\":null,\"qosLevel\":\"\"}"
I0610 13:10:27.644274       1 policy.go:1039] [katalyst-core/pkg/agent/qrm-plugins/cpu/dynamicpolicy.(*DynamicPolicy).initReclaimPool] exist initial reclaim: 0-9
I0610 13:10:27.644300       1 agent.go:102] needToRun "qrm_cpu_plugin"
I0610 13:10:27.644308       1 agent.go:91] initializing "qrm_io_plugin"
I0610 13:10:27.644320       1 agent.go:102] needToRun "qrm_io_plugin"
I0610 13:10:27.644325       1 agent.go:91] initializing "qrm_network_plugin"
W0610 13:10:27.644335       1 util.go:122] [katalyst-core/pkg/agent/qrm-plugins/network/staticpolicy.filterNICsByAvailability] nic: eno1 doesn't have IP address
I0610 13:10:27.644344       1 util.go:302] [katalyst-core/pkg/agent/qrm-plugins/network/staticpolicy.getReservedBandwidth] reservedBanwidth: 0, nicCount: 1, policy: first,
I0610 13:10:27.644361       1 state_net.go:47] "[network_plugin: katalyst-core/pkg/agent/qrm-plugins/network/state.NewNetworkPluginState] initializing new network plugin in-memory state store"
I0610 13:10:27.644372       1 util.go:37] [GenerateMachineState: katalyst-core/pkg/agent/qrm-plugins/network/state.GenerateMachineState] NIC wlp2s0's speed: -1, capacity: [0/0], reservation: 0
I0610 13:10:27.644511       1 util.go:37] [GenerateMachineState: katalyst-core/pkg/agent/qrm-plugins/network/state.GenerateMachineState] NIC wlp2s0's speed: -1, capacity: [0/0], reservation: 0
I0610 13:10:27.644531       1 state_net.go:121] "[network_plugin: katalyst-core/pkg/agent/qrm-plugins/network/state.(*networkPluginState).SetMachineState] updated network plugin machine state" NICMap="{\"wlp2s0\":{\"egress_state\":{\"Capacity\":0,\"SysReservation\":0,\"Reservation\":0,\"Allocatable\":0,\"Allocated\":0,\"Free\":0},\"ingress_state\":{\"Capacity\":0,\"SysReservation\":0,\"Reservation\":0,\"Allocatable\":0,\"Allocated\":0,\"Free\":0},\"pod_entries\":{}}}"
I0610 13:10:27.644543       1 state_net.go:145] "[network_plugin: katalyst-core/pkg/agent/qrm-plugins/network/state.(*networkPluginState).SetPodEntries] updated network plugin pod resource entries" podEntries="{}"
I0610 13:10:27.644555       1 state_checkpoint.go:136] "[network_plugin: katalyst-core/pkg/agent/qrm-plugins/network/state.(*stateCheckpoint).restoreState] state checkpoint: restored state from checkpoint"
I0610 13:10:27.644572       1 policy.go:177] [katalyst-core/pkg/agent/qrm-plugins/network/staticpolicy.(*StaticPolicy).ApplyConfig] apply configs, qosLevelToNetClassMap: map[dedicated_cores:0 reclaimed_cores:0 shared_cores:0 system_cores:0], podLevelNetClassAnnoKey: katalyst.kubewharf.io/net_class_id, podLevelNetAttributesAnnoKeys: []
I0610 13:10:27.644581       1 agent.go:102] needToRun "qrm_network_plugin"
I0610 13:10:27.644588       1 agent.go:91] initializing "periodical-handler-manager"
I0610 13:10:27.644593       1 agent.go:102] needToRun "periodical-handler-manager"
I0610 13:10:27.644600       1 agent.go:91] initializing "katalyst-agent-orm"
I0610 13:10:27.644631       1 manager.go:86] "Creating topology manager with policy per scope" topologyPolicyName=""
E0610 13:10:27.644640       1 manager.go:129] unknown policy: ""
E0610 13:10:27.644647       1 agent.go:94] Error initializing "katalyst-agent-orm"
I0610 13:10:27.644662       1 file.go:257] [GetUniqueLock] release lock successfully
I0610 13:10:28.396105       1 file.go:90] fsNotify watcher notify "/var/lib/kubelet/resource-plugins/kubelet_qrm_checkpoint": CREATE
I0610 13:10:28.396155       1 topology_adapter.go:281] qrm state file changed, notify to update topology status
I0610 13:10:28.396166       1 kubeletplugin.go:177] send topology change notification to plugin kubelet-reporter-plugin
run command error: failed to init ORM: unknown policy: ""

Only katalyst-agent fails; every other component is Running:

root@debian-node-1:~# kubectl get pods -n katalyst-system
NAME                                                       READY   STATUS             RESTARTS      AGE
katalyst-colocation-katalyst-agent-f5glx                   0/1     CrashLoopBackOff   4 (36s ago)   2m32s
katalyst-colocation-katalyst-agent-jzgft                   0/1     CrashLoopBackOff   4 (52s ago)   2m32s
katalyst-colocation-katalyst-controller-59b5c89cd6-jcn9m   1/1     Running            0             2m32s
katalyst-colocation-katalyst-controller-59b5c89cd6-vpjvq   1/1     Running            0             2m32s
katalyst-colocation-katalyst-metric-85c47ff4bf-nl9sf       1/1     Running            0             2m32s
katalyst-colocation-katalyst-scheduler-77cdd9d66f-8mszz    1/1     Running            0             2m32s
katalyst-colocation-katalyst-scheduler-77cdd9d66f-c27qc    1/1     Running            0             2m32s
katalyst-colocation-katalyst-webhook-5f6ccc7cb-ngz2x       1/1     Running            0             2m32s
katalyst-colocation-katalyst-webhook-5f6ccc7cb-vrnzs       1/1     Running            0             2m32s

By contrast, installing katalyst-colocation-orm on the same KubeWharf enhanced Kubernetes cluster works fine (the agent pods are Running).

What did you expect to happen?

katalyst-colocation installs and runs correctly on KubeWharf enhanced Kubernetes.

How can we reproduce it (as minimally and precisely as possible)?

After installing KubeWharf enhanced Kubernetes, install katalyst-colocation with Helm:

helm install katalyst-colocation -n katalyst-system --create-namespace kubewharf/katalyst-colocation

Software version

No response
