### Overview
Kubernetes relies heavily on API calls and is sensitive to network issues. When troubleshooting, standard Linux tools and processes are essential. If a shell, such as bash, is not available in an affected Pod, consider deploying another, similar Pod with a shell, like busybox. Checking DNS configuration files and using `dig` are good starting points. For more challenging issues, tools like `tcpdump` may be necessary.
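For example, DNS resolution can be checked from any Pod that has a shell (a minimal sketch; the Pod name is a placeholder, and busybox ships `nslookup` rather than `dig`):
```bash
# Inspect the DNS configuration the kubelet injected into the Pod
$ kubectl exec -ti <busybox_pod> -- cat /etc/resolv.conf

# Resolve the cluster API Service; a failure here points at CoreDNS or kube-proxy
$ kubectl exec -ti <busybox_pod> -- nslookup kubernetes.default
```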
Large and diverse workloads can be difficult to track, making usage monitoring essential. Monitoring involves collecting key metrics, such as CPU, memory, disk usage, and network bandwidth, from nodes and applications. Kubernetes now includes the Metrics Server for this purpose, but logging remains unintegrated.
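Once the Metrics Server is deployed, current resource usage is available directly through kubectl:
```bash
# CPU and memory usage per node, served by the Metrics Server
$ kubectl top nodes

# Per-Pod usage in a given namespace
$ kubectl top pods -n default
```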
### External Logging Solutions
To aggregate logs, external solutions such as Fluentd and the ELK stack (Elasticsearch, Logstash, Kibana) are commonly used; for metrics, Prometheus fills the same role. Fluentd serves as a unified logging layer, making it easier to search and visualize logs. Prometheus provides a time-series database and integrates with Grafana for visualization and dashboards.
### Basic Troubleshooting Steps
The troubleshooting flow should start with the obvious. If there are errors from the command line, investigate them first. Here's a basic flow:
1. **Command Line Errors**: Start by resolving any command line errors.
2. **Pod Logs**: Use `kubectl logs pod-name` to view container logs (see the command sketch after this list). If no logs are available, consider deploying a sidecar container in the Pod to generate and expose them.
3. **Networking**: Check DNS, firewalls, and general connectivity using standard Linux commands and tools.
4. **Security Settings**: Verify RBAC settings, SELinux, and AppArmor configurations.
5. **API Calls**: Enable auditing for the kube-apiserver to review accepted actions.
6. **Node Logs**: Check node logs for errors and ensure sufficient resources.
7. **Inter-Node Networking**: Troubleshoot DNS and firewall settings between nodes.
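A few standard commands that map onto these steps (a sketch; Pod and node names are placeholders):
```bash
# Step 2: container logs; --previous shows the last terminated instance
$ kubectl logs <pod-name> -c <container-name> --previous

# Steps 3 and 6: recent events and node state often reveal the cause
$ kubectl get events --sort-by=.metadata.creationTimestamp
$ kubectl describe node <node-name>

# Step 5: auditing is enabled on the kube-apiserver with the
# --audit-policy-file and --audit-log-path flags
# Step 6: kubelet logs on a systemd-based node
$ journalctl -u kubelet | tail -50
```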
#### Example Commands
Deploy a busybox container and access its shell:
```bash
$ kubectl create deploy busybox --image=busybox -- sleep 3600
$ kubectl exec -ti <busybox_pod> -- /bin/sh
```
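If a full Deployment is not needed, a single unmanaged Pod works just as well; note that `kubectl run` (unlike `kubectl create deploy`) does accept a `--command` flag:
```bash
# Run a one-off busybox Pod; everything after -- becomes the container command
$ kubectl run busybox --image=busybox --command -- sleep 3600
$ kubectl exec -ti busybox -- /bin/sh
```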
### Ephemeral Containers
Kubernetes 1.16 introduced ephemeral containers as an alpha feature, allowing you to add a container to a running Pod for debugging purposes; the feature has since graduated to stable. Ephemeral containers are added to a running Pod through the `ephemeralcontainers` subresource of the API, not declared in the PodSpec when the Pod is created.
Example command:
```bash
$ kubectl debug buggypod --image=debian --attach
```
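Once added, the ephemeral container is recorded on the Pod and can be verified (using `buggypod` from the example above):
```bash
# List the names of ephemeral containers attached to the Pod
$ kubectl get pod buggypod -o jsonpath='{.spec.ephemeralContainers[*].name}'

# kubectl describe shows them in an "Ephemeral Containers" section
$ kubectl describe pod buggypod
```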
### Cluster Start Sequence
For clusters built with `kubeadm`, the startup sequence is managed by `systemd`. The `kubelet.service` uses configuration files to start necessary Pods:
- `/etc/systemd/system/kubelet.service.d/10-kubeadm.conf`
- `/var/lib/kubelet/config.yaml`
- `/etc/kubernetes/manifests/`
The `kubelet` creates Pods for `kube-apiserver`, `etcd`, `kube-controller-manager`, and `kube-scheduler` from YAML files in the `manifests` directory.
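These pieces can be inspected on a control plane node (paths are the `kubeadm` defaults listed above):
```bash
# Confirm the kubelet unit is active and see which drop-in files it loaded
$ systemctl status kubelet.service

# Static Pod manifests watched by the kubelet, one YAML file per component
$ ls /etc/kubernetes/manifests/
```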
### Monitoring
Monitoring involves collecting metrics from both infrastructure and applications; the Metrics Server has replaced Heapster for this purpose. Prometheus, a CNCF project, scrapes resource usage metrics across the cluster. Other CNCF projects, such as OpenTelemetry (for adding instrumentation to code) and Jaeger (for distributed tracing), are also useful.
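Many Prometheus scrape configurations (for example, those shipped with the community Helm charts) discover targets through Pod annotations. The `prometheus.io/*` names below are that convention, not part of the Kubernetes API, so treat them as an assumption about your particular Prometheus setup:
```bash
# Mark a Pod for scraping under the common prometheus.io annotation convention
$ kubectl annotate pod <pod-name> prometheus.io/scrape="true" prometheus.io/port="8080"
```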
### Using krew
`krew` is a plugin manager for kubectl that extends it with additional functionality. Install krew by following the instructions in its GitHub repository. After installation, ensure your `$PATH` includes the krew plugin directory, as in the first command below.
Example commands:
```bash
$ export PATH="${KREW_ROOT:-$HOME/.krew}/bin:$PATH"
$ kubectl krew search
$ kubectl krew install tail
$ kubectl plugin list
```
### Sniffing Traffic with Wireshark
Use the `sniff` plugin to monitor network traffic within the cluster. Wireshark must be installed locally, and your environment must be able to display its graphical output (for example, via X forwarding when working on a remote system).
Example command:
```bash
$ kubectl krew install sniff
$ kubectl sniff nginx-123456-abcd -c webcont
```
### Logging Tools
Logging in Kubernetes involves collecting container logs locally and aggregating them before ingestion by a search engine like Elasticsearch. The ELK stack is common for this purpose.
Fluentd is a CNCF project for log aggregation; paired with Prometheus, the two cover logging and monitoring. Setting up Fluentd for Kubernetes logging involves deploying an agent on each node via a DaemonSet, as sketched below.
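A minimal sketch of such a DaemonSet follows; the image tag, toleration, and mount path are assumptions to adapt (the official fluent/fluentd-kubernetes-daemonset images additionally bundle Kubernetes metadata and output plugins, and a real deployment also needs a Fluentd configuration, typically from a ConfigMap):
```bash
$ kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: fluentd
  template:
    metadata:
      labels:
        name: fluentd
    spec:
      tolerations:                        # also run on control plane nodes
      - key: node-role.kubernetes.io/control-plane
        effect: NoSchedule
      containers:
      - name: fluentd
        image: fluent/fluentd:v1.16       # assumed tag; pick one suited to your outputs
        volumeMounts:
        - name: varlog
          mountPath: /var/log             # node logs, including container log files
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
EOF
```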
### Additional Resources
For more information and resources, refer to the Kubernetes documentation and community channels:
- [Troubleshooting applications](https://kubernetes.io/docs/tasks/debug-application-cluster/debug-application/)
- [Troubleshooting clusters](https://kubernetes.io/docs/tasks/debug-application-cluster/debug-cluster/)
- [Debugging Pods](https://kubernetes.io/docs/tasks/debug-application-cluster/debug-pods/)
- [Debugging Services](https://kubernetes.io/docs/tasks/debug-application-cluster/debug-service/)
- [Kubernetes GitHub issues and bug tracking](https://github.com/kubernetes/kubernetes/issues)
- [Kubernetes Slack channel](https://slack.k8s.io/)
Continue: [[13-Custom Resource Definition - CRD]]