Troubleshoot Cluster Bootstrapping

When you create a management cluster, a bootstrap cluster is created on your local client machine. This is a kind-based cluster, that is, a cluster running in a container. The bootstrap cluster then creates the management cluster on your specified target provider.

The bootstrap cluster is the key to introspecting and understanding what happens during bootstrapping. Issues or errors that surface in the bootstrap cluster point to potential problems in the final cluster on the target provider.

Troubleshoot with Tanzu Diagnostics

One way to debug cluster bootstrap issues is to use the tanzu diagnostics CLI plugin, which comes with Tanzu Community Edition.

Prerequisites

Prior to collecting diagnostics data, your local machine must have the following programs in its $PATH:

  • kind
  • kubectl

Before you Begin

The kind bootstrap cluster name and the management cluster name are not related. If multiple kind bootstrap clusters are present, determine the name of the kind cluster before you begin debugging. You can find it in one or both of the following locations:

  • ~/.kube-tkg/tmp/
  • ~/.config/tanzu/tkg/config.yaml - find the management cluster name in the name parameter and the corresponding kind bootstrap cluster name in the context parameter, prefixed by kind-tkg-kind
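For example, assuming the default paths listed above, you can inspect both locations from a shell; the grep simply prints the name and context lines so you can match a management cluster to its kind bootstrap cluster:

ls ~/.kube-tkg/tmp/
grep -E 'name:|context:' ~/.config/tanzu/tkg/config.yaml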

Collecting bootstrap diagnostics

If the bootstrap process is stuck or did not finish successfully, use the diagnostics plugin to collect logs and cluster information:

tanzu diagnostics collect

This command searches for Tanzu bootstrap kind clusters (with tkg-kind-* prefix) and collects logs and API objects (excluding Secrets) that can help diagnose the bootstrap issues. You should see output similar to the following:

tanzu diagnostics collect
2021/09/20 11:05:17 Collecting bootstrap cluster diagnostics
2021/09/20 11:05:17 Info: Found bootstrap cluster(s): ["tkg-kind-b4o9sn5948199qbgca8d"]
2021/09/20 11:05:17 Info: Bootstrap cluster: tkg-kind-b4o9sn5948199qbgca8d: capturing node logs
2021/09/20 11:05:22 Info: Capturing pod logs: cluster=tkg-kind-b4o9sn5948199qbgca8d
2021/09/20 11:05:22 Info: Capturing API objects: cluster=tkg-kind-b4o9sn5948199qbgca8d
2021/09/20 11:05:25 Info: Archiving: bootstrap.tkg-kind-b4o9sn5948199qbgca8d.diagnostics.tar.gz

The command automatically generates a tar file, which you can unpack to analyze and troubleshoot your cluster.
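
For example, to unpack the archive from the sample output above (your file name will differ):

tar -xzf bootstrap.tkg-kind-b4o9sn5948199qbgca8d.diagnostics.tar.gz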

What is collected?

The diagnostics plugin collects a set of known resources to help troubleshoot bootstrap cluster problems, including:

  • Kind cluster logs (obtained with command kind export logs)
  • Pod logs from capi*, capv*, capa*, tkg-system namespaces (if they exist)
  • Pods, services, deployments, apps (from capi*, capv*, capa*, tkg-system namespaces)
  • Any other server resources in the cluster-api resource category

No other cluster objects are collected.

Collecting diagnostics for a specific bootstrap cluster

The previous command collects diagnostics for all bootstrap kind clusters that it finds. You can also use the plugin to target a specific bootstrap cluster by name.

First, determine the name of the bootstrap cluster. This can be done using kind itself to list its clusters:

kind get clusters
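
The output lists the kind cluster names. A Tanzu bootstrap cluster will look similar to the following (the ID shown is from the earlier sample output and will differ on your machine):

tkg-kind-b4o9sn5948199qbgca8d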

Next, run the tanzu diagnostics command to collect diagnostics data to help troubleshoot the cluster identified in the step above:

tanzu diagnostics collect --bootstrap-cluster-name=<BOOTSTRAP-CLUSTER-NAME>
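
For example, using the bootstrap cluster name from the earlier sample output:

tanzu diagnostics collect --bootstrap-cluster-name=tkg-kind-b4o9sn5948199qbgca8d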

Diagnostics for bootstrap clusters only

If you do not have a management cluster yet (or do not need to collect management cluster diagnostics), you can modify the tanzu diagnostics command to only collect diagnostics for the bootstrap cluster as follows:

tanzu diagnostics collect --management-cluster-skip

Troubleshooting manually

If the steps above are not enough, or you want complete control over the troubleshooting process, complete the following steps to troubleshoot a bootstrap cluster:

  1. Run docker ps on your local Docker system to get the name of the bootstrap cluster container:

    docker ps
    

    The bootstrap cluster container name begins with tkg-kind followed by a unique ID, for example, tkg-kind-b4o9sn5948199qbgca8d-control-plane. Copy the CONTAINER ID of the bootstrap cluster container.
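
    If many containers are running, you can narrow the listing by filtering on the name prefix, for example:

    docker ps --filter name=tkg-kind --format "table {{.ID}}\t{{.Names}}"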

  2. Open a bash shell in the bootstrap cluster container:

    docker exec -it <BOOTSTRAP-CLUSTER-ID> bash
    

    Where <BOOTSTRAP-CLUSTER-ID> is the value copied in the previous step.

  3. Before you can proceed to run kubectl commands against the pods inside the bootstrap cluster container, copy the admin.conf file to the default kubeconfig location:

    cp -v /etc/kubernetes/admin.conf ~/.kube/config
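
    You can then confirm that kubectl can reach the bootstrap cluster's API server:

    kubectl cluster-info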
    
  4. Now that you are inside the bootstrap cluster container that bootstraps your cluster on the target provider, you can run kubectl commands against it. By watching the status of the pods, you can see what might be going wrong in the bootstrap process. Run the following command to see the pods being created inside the container:

    kubectl get po -A
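
    To keep watching as pod status changes, you can add kubectl's watch flag:

    kubectl get po -A -w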
    
  5. Copy the name of the controller manager; it is usually first in the list. It will be named similarly to the following, depending on your target provider:

    • capa-controller-manager-12a3456789-b1cde (AWS)
    • capd-controller-manager-12a3456789-b1cde (Docker)
    • capv-controller-manager-12a3456789-b1cde (vSphere)
    • capz-controller-manager-12a3456789-b1cde (Azure)
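
    If it is not obvious from the listing, you can filter the pod list for controller managers, for example:

    kubectl get po -A | grep controller-manager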
  6. Next, examine the logs of the controller manager that communicates with the target provider. This step is important: if you are having problems bootstrapping, the errors in these logs provide the detail. Examine the logs for the controller manager:

    kubectl logs -n <NAMESPACE> <CONTROLLER-MANAGER> -c manager -f
    

    Where

    • <CONTROLLER-MANAGER> is the value copied in the previous step.
    • <NAMESPACE> varies based on your provider; use:
      • capa-system (AWS)
      • capd-system (Docker)
      • capv-system (vSphere)
      • capz-system (Azure)
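
    For example, for the Docker provider, substituting the pod name you copied:

    kubectl logs -n capd-system capd-controller-manager-12a3456789-b1cde -c manager -f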
  7. [Optional] Events are also reported based on actions taken in the target provider. You can view these events by running:

    kubectl get events -n tkg-system
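
    Events can arrive out of order; sorting them by creation time often makes the sequence easier to follow:

    kubectl get events -n tkg-system --sort-by=.metadata.creationTimestamp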
    
