1. Installation requirements (confirm in advance)
Before starting, the machines used to deploy the Kubernetes cluster must meet the following requirements:
- Three machines running CentOS 7.5+ (minimal install)
- Hardware: 2 GB RAM, 2 CPUs, 30 GB disk per machine
2. Installation steps
Role | IP
---|---
master | 192.168.50.128
node1 | 192.168.50.131
node2 | 192.168.50.132
2.1 Pre-installation preparation
Note: all 7 steps in this subsection must be performed on every node (both master and worker nodes).
(1) Disable the firewall and SELinux
~]# systemctl disable --now firewalld
~]# setenforce 0
~]# sed -i 's/enforcing/disabled/' /etc/selinux/config
(2) Disable the swap partition
~]# swapoff -a
~]# sed -i.bak 's/^.*centos-swap/#&/g' /etc/fstab
Note that swapoff -a only disables swap for the current boot; the sed command above makes the change permanent by commenting out the swap mount line in /etc/fstab.
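To double-check that swap is really off (optional):
~]# swapon --show        # no output means no active swap devices
~]# free -m              # the Swap line should show 0 total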
(3) Set the hostname
On the master node:
~]# hostnamectl set-hostname master
On node1:
~]# hostnamectl set-hostname node1
On node2:
~]# hostnamectl set-hostname node2
Run bash to start a new shell so that the newly set hostname takes effect in your session.
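For example, on the master node:
~]# bash          # open a new shell; the prompt now reflects the new hostname
~]# hostname      # verify; should print: master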
(4) Add hosts entries
~]# cat >>/etc/hosts <<EOF
192.168.50.128 master
192.168.50.131 node1
192.168.50.132 node2
EOF
(5) Enable IP forwarding and let bridged traffic be processed by iptables/ip6tables.
~]# cat > /etc/sysctl.d/k8s.conf << EOF
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
~]# sysctl --system    # apply immediately
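Note: the two bridge-nf-call settings only take effect when the br_netfilter kernel module is loaded. If sysctl --system complains that those keys do not exist, load the module first (an optional step, depending on your environment):
~]# modprobe br_netfilter
~]# echo br_netfilter > /etc/modules-load.d/br_netfilter.conf    # also load it on every boot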
(6) Configure yum repositories
All nodes use the Aliyun mirrors of the base and epel repositories.
~]# mv /etc/yum.repos.d/* /tmp
~]# curl -o /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-7.repo
~]# curl -o /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo
(7) Time zone and time synchronization
~]# ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
~]# yum install dnf ntpdate -y
~]# dnf makecache
~]# ntpdate ntp.aliyun.com
2.2 Install Docker
(1) Add the Docker yum repository
~]# curl -o /etc/yum.repos.d/docker-ce.repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
~]# cat /etc/yum.repos.d/docker-ce.repo
[docker-ce-stable]
name=Docker CE Stable - $basearch
baseurl=https://mirrors.aliyun.com/docker-ce/linux/centos/7/$basearch/stable
enabled=1
gpgcheck=1
gpgkey=https://mirrors.aliyun.com/docker-ce/linux/centos/gpg
.......
(2) Install docker-ce
List all installable versions:
~]# dnf list docker-ce --showduplicates
docker-ce.x86_64 3:18.09.6-3.el7 docker-ce-stable
docker-ce.x86_64 3:18.09.7-3.el7 docker-ce-stable
docker-ce.x86_64 3:18.09.8-3.el7 docker-ce-stable
docker-ce.x86_64 3:18.09.9-3.el7 docker-ce-stable
docker-ce.x86_64 3:19.03.0-3.el7 docker-ce-stable
docker-ce.x86_64 3:19.03.1-3.el7 docker-ce-stable
docker-ce.x86_64 3:19.03.2-3.el7 docker-ce-stable
docker-ce.x86_64 3:19.03.3-3.el7 docker-ce-stable
docker-ce.x86_64 3:19.03.4-3.el7 docker-ce-stable
docker-ce.x86_64 3:19.03.5-3.el7 docker-ce-stable
.....
Here we install the latest version of Docker; the Docker service must be installed on all nodes.
~]# dnf install -y docker-ce docker-ce-cli
(3) Start Docker and enable it at boot
~]# systemctl enable --now docker
Check the version number to confirm that Docker was installed successfully:
~]# docker --version
Docker version 19.03.12, build 48a66213fea
The command above only shows the version of the Docker client. The following command is recommended instead, because it clearly shows the versions of both the Docker client and the Docker server.
~]# docker version
Client:
Version: 19.03.12
API version: 1.40
Go version: go1.13.10
Git commit: 039a7df9ba
Built: Wed Sep 4 16:51:21 2019
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 19.03.12
API version: 1.40 (minimum version 1.12)
Go version: go1.13.10
Git commit: 039a7df
Built: Wed Sep 4 16:22:32 2019
OS/Arch: linux/amd64
Experimental: false
(4) Change the Docker registry mirror
There are many registry mirrors inside China, such as Aliyun, the Tsinghua mirror, USTC, and Docker's official China mirror.
~]# cat > /etc/docker/daemon.json << EOF
{
"registry-mirrors": ["https://f1bhsuge.mirror.aliyuncs.com"]
}
EOF
Restart Docker so that the new registry mirror configuration is loaded:
~]# systemctl restart docker
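To confirm that the mirror was picked up (optional):
~]# docker info | grep -A1 "Registry Mirrors"    # should list the mirror configured above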
2.3 Install the Kubernetes components
(1) Add the Kubernetes yum repository
Tip: open mirrors.aliyun.com in a browser and look for kubernetes to find the repository URL.
~]# cat > /etc/yum.repos.d/kubernetes.repo << EOF
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
(2) Install the kubeadm, kubelet, and kubectl components
These components must be installed on all nodes.
~]# dnf list kubeadm --showduplicates
kubeadm.x86_64 1.17.7-0 kubernetes
kubeadm.x86_64 1.17.7-1 kubernetes
kubeadm.x86_64 1.17.8-0 kubernetes
kubeadm.x86_64 1.17.9-0 kubernetes
kubeadm.x86_64 1.18.0-0 kubernetes
kubeadm.x86_64 1.18.1-0 kubernetes
kubeadm.x86_64 1.18.2-0 kubernetes
kubeadm.x86_64 1.18.3-0 kubernetes
kubeadm.x86_64 1.18.4-0 kubernetes
kubeadm.x86_64 1.18.4-1 kubernetes
kubeadm.x86_64 1.18.5-0 kubernetes
kubeadm.x86_64 1.18.6-0 kubernetes
Because Kubernetes versions change very quickly, the available versions are listed first; here we install version 1.18.6, on all nodes.
~]# dnf install -y kubelet-1.18.6 kubeadm-1.18.6 kubectl-1.18.6
(3) Enable the service at boot
We enable kubelet at boot, but do not start the service yet.
~]# systemctl enable kubelet
2.4 Deploy the master node with kubeadm
(1) Generate the initialization configuration file
Run the following command on the master node. It may print a WARNING, but that does not affect the deployment:
~]# kubeadm config print init-defaults > kubeadm-init.yaml
The kubeadm-init.yaml file generated here is what we use for initialization; roughly the following parameters need to be adjusted.
[root@master ~]# cat kubeadm-init.yaml
apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.50.128
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: master
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers  # Aliyun image repository
kind: ClusterConfiguration
kubernetesVersion: v1.18.3           # Kubernetes version
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12        # the default is fine, or use a custom CIDR
  podSubnet: 10.244.0.0/16           # add the pod network CIDR
scheduler: {}
(2) Pull the images in advance
If you run kubeadm init directly, the images are pulled automatically during initialization, which is slow. I recommend splitting that step out and pulling the images in advance. Run the following on the master node:
[root@master ~]# kubeadm config images pull --config kubeadm-init.yaml
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-apiserver:v1.18.0
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-controller-manager:v1.18.0
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-scheduler:v1.18.0
[config/images] Pulled registry.aliyuncs.com/google_containers/kube-proxy:v1.18.0
[config/images] Pulled registry.aliyuncs.com/google_containers/pause:3.1
[config/images] Pulled registry.aliyuncs.com/google_containers/etcd:3.4.3-0
[config/images] Pulled registry.aliyuncs.com/google_containers/coredns:1.6.5
If you see two warning lines at the beginning of the output (not shown here), don't worry: they are only warnings and do not prevent us from completing the exercise.
Now that the images have been pulled successfully, we can start the initialization.
(3) Initialize the Kubernetes master node
Run the following command:
[root@master ~]# kubeadm init --config kubeadm-init.yaml
[init] Using Kubernetes version: v1.18.3
[preflight] Running pre-flight checks
[WARNING Service-Docker]: docker service is not enabled, please run 'systemctl enable docker.service'
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [master kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.50.128]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [master localhost] and IPs [192.168.50.128 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [master localhost] and IPs [192.168.50.128 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
W0629 21:47:51.709568 39444 manifests.go:214] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
[control-plane] Creating static Pod manifest for "kube-scheduler"
W0629 21:47:51.711376 39444 manifests.go:214] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 14.003225 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.17" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node master as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node master as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: abcdef.0123456789abcdef
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.50.128:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:05b84c41152f72ca33afe39a7ef7fa359eec3d3ed654c2692b665e2c4810af3e
The whole process takes only about 15 seconds, and it is this fast precisely because we pulled the images in advance.
If, as above, there are no error messages and the output ends with the kubeadm join 192.168.50.128:6443 --token abcdef.0123456789abcdef command, the master was initialized successfully.
As the final hint says, a little housekeeping is still needed before using the cluster; note that these commands are run on the master node.
[root@master ~]# mkdir -p $HOME/.kube
[root@master ~]# sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
[root@master ~]# sudo chown $(id -u):$(id -g) $HOME/.kube/config
At this point the master node is fully initialized. One important detail is the last line of the output: it is the command worker nodes use to authenticate and join the Kubernetes cluster. The --discovery-token-ca-cert-hash value is a SHA-256 hash derived from the cluster CA certificate, and a node must present a matching token and hash to join this cluster.
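If you ever need to recompute that CA hash by hand, it can be derived from the CA certificate on the master (this is the standard openssl pipeline documented for kubeadm):
[root@master ~]# openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | \
    openssl rsa -pubin -outform der 2>/dev/null | \
    openssl dgst -sha256 -hex | sed 's/^.* //'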
If you list the cluster's nodes now, only the master node itself shows up.
[root@master ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
master NotReady master 2m53s v1.18.6
Next we join the worker nodes to the Kubernetes cluster.
2.5 Join the worker nodes to the cluster
First, to be clear about the join command: it is the command printed when the master node finished initializing.
Note that your token and hash will certainly differ from mine, so you must use the command from your own output; mine is pasted below only as a demonstration.
~]# kubeadm join 192.168.50.128:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:05b84c41152f72ca33afe39a7ef7fa359eec3d3ed654c2692b665e2c4810af3e
(1) Join node1 to the cluster
[root@node1 ~]# kubeadm join 192.168.50.128:6443 --token abcdef.0123456789abcdef --discovery-token-ca-cert-hash sha256:05b84c41152f72ca33afe39a7ef7fa359eec3d3ed654c2692b665e2c4810af3e
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.17" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster
The line This node has joined the cluster indicates that the node joined the cluster successfully.
(2) Join node2 to the cluster
node2 is joined in exactly the same way. Once all the nodes have joined, we can run the following on the master node to list the cluster's current nodes.
[root@master ~]# kubectl get nodes
NAME     STATUS     ROLES    AGE     VERSION
master   NotReady   master   2m53s   v1.18.6
node1    NotReady   <none>   73s     v1.18.6
node2    NotReady   <none>   7s      v1.18.6
All three nodes of the cluster now exist, but the cluster is not usable yet: as the STATUS column shows, all three nodes are NotReady. This is because no network plugin has been installed; here we choose the flannel plugin.
2.6 Install the Flannel network plugin
Flannel is an overlay network tool designed by the CoreOS team for Kubernetes; its goal is to give every host that runs Kubernetes a complete subnet of its own. This section briefly covers what Flannel is, how it works, and how to install and configure it.
Flannel provides a virtual network for containers by assigning each host a subnet. It is built on Linux TUN/TAP, encapsulates IP packets in UDP to create the overlay network, and uses etcd to keep track of how the network has been allocated.
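Once Flannel is up, each node records the subnet it was leased in a small environment file, which is a handy way to see the per-host subnet assignment in practice (the paths below are the defaults used by the flannel manifest deployed in this section):
~]# cat /run/flannel/subnet.env    # e.g. FLANNEL_SUBNET=10.244.1.1/24 on one of the nodes
~]# ip -d link show flannel.1      # the VXLAN device Flannel creates for the overlay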
(1) The default method
Most tutorials online use the following command to deploy flannel.
~]# kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
In practice many users cannot get this to work because of network restrictions inside China, so the following approach can be used instead.
(2) Switch the flannel image source
Add the following entry to the local hosts file so the domain resolves and the file can be downloaded:
199.232.28.133 raw.githubusercontent.com
Then download the flannel manifest:
[root@master ~]# curl -o kube-flannel.yml https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
Edit the image source: change every occurrence of quay.io in the yaml file to quay-mirror.qiniu.com.
[root@master ~]# sed -i 's/quay.io/quay-mirror.qiniu.com/g' kube-flannel.yml
Save and exit, then run this command on the master node.
[root@master ~]# kubectl apply -f kube-flannel.yml
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds-amd64 created
daemonset.apps/kube-flannel-ds-arm64 created
daemonset.apps/kube-flannel-ds-arm created
daemonset.apps/kube-flannel-ds-ppc64le created
daemonset.apps/kube-flannel-ds-s390x created
This way the flannel images can be pulled successfully. You can also use the kube-flannel.yml file I have provided.
- Check whether the kube-flannel pods are running normally
[root@master ~]# kubectl get pod -n kube-system | grep kube-flannel
kube-flannel-ds-amd64-8svs6 1/1 Running 0 44s
kube-flannel-ds-amd64-k5k4k 0/1 Running 0 44s
kube-flannel-ds-amd64-mwbwp 0/1 Running 0 44s
(3) What to do when the image cannot be pulled
If, when checking the kube-flannel pods as above, a pod is not fully Running, that pod has a problem and needs further analysis.
Run kubectl describe pod xxxx.
If it reports errors like the following:
Normal BackOff 24m (x6 over 26m) kubelet, master3 Back-off pulling image "quay-mirror.qiniu.com/coreos/flannel:v0.12.0-amd64"
Warning Failed 11m (x64 over 26m) kubelet, master3 Error: ImagePullBackOff
or:
Error response from daemon: Get https://quay.io/v2/: net/http: TLS handshake timeout
Both of these mean the image cannot be pulled because of network problems. I have prepared the flannel image in advance; just load it:
[root@master ~]# docker load -i flannel.tar
2.7 Verify that the nodes are usable
Wait a moment, then run the following to check whether the nodes are usable.
[root@master ~]# kubectl get nodes
NAME     STATUS   ROLES    AGE   VERSION
master   Ready    master   82m   v1.18.6
node1    Ready    <none>   60m   v1.18.6
node2    Ready    <none>   55m   v1.18.6
The node status is now Ready, which means the cluster nodes are usable.
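As an extra sanity check (optional), all of the control-plane and flannel pods in the kube-system namespace should be in the Running state:
[root@master ~]# kubectl get pods -n kube-system -o wide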
3. Test the Kubernetes cluster
3.1 Cluster test
(1) Create an nginx pod
Now we create an nginx pod in the Kubernetes cluster to verify that workloads run normally.
Perform the following steps on the master node:
[root@master ~]# kubectl create deployment nginx --image=nginx
deployment.apps/nginx created
[root@master ~]# kubectl expose deployment nginx --port=80 --type=NodePort
service/nginx exposed
Now look at the pod and the service:
[root@master ~]# kubectl get pod,svc -o wide
NAME                         READY   STATUS    RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES
pod/nginx-86c57db685-kk755   1/1     Running   0          29m   10.244.1.10   node1   <none>           <none>
NAME                 TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)        AGE   SELECTOR
service/kubernetes   ClusterIP   10.96.0.1     <none>        443/TCP        24h   <none>
service/nginx        NodePort    10.96.5.205   <none>        80:32627/TCP   29m   app=nginx
In the output, the first half is pod information and the second half is service information. The service/nginx line shows that the port the service exposes on the nodes is 32627; remember this port.
The pod details also show that the pod is currently running on node1, whose IP address is 192.168.50.131.
(2) Access nginx to verify the cluster
Now let's access it. Open a browser (Firefox is recommended) and go to: http://192.168.50.131:32627
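You can also test from the command line; since this is a NodePort service, any node's IP works (a quick check, using the port shown above):
~]# curl -I http://192.168.50.131:32627    # expect an HTTP/1.1 200 OK from nginx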
3.2 Install the dashboard
(1) Create the dashboard
First download the dashboard manifest. Because we added the hosts entry for raw.githubusercontent.com earlier, the download works.
~]# curl -o recommended.yaml https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.0-beta8/aio/deploy/recommended.yaml
By default the Dashboard can only be accessed from inside the cluster, so change the Service to type NodePort to expose it externally.
Around lines 32-44 of the file, modify the Service as follows:
kind: Service
apiVersion: v1
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard
  namespace: kubernetes-dashboard
spec:
  type: NodePort      # add this line
  ports:
    - port: 443
      targetPort: 8443
      nodePort: 30001 # add this line; port 30001 can be chosen freely
  selector:
    k8s-app: kubernetes-dashboard
- Apply this yaml file
[root@master ~]# kubectl apply -f recommended.yaml
namespace/kubernetes-dashboard created
serviceaccount/kubernetes-dashboard created
service/kubernetes-dashboard created
secret/kubernetes-dashboard-certs created
...
service/dashboard-metrics-scraper created
deployment.apps/dashboard-metrics-scraper created
- Check whether the dashboard is running normally
[root@master ~]# kubectl get pod,svc -n kubernetes-dashboard -o wide
NAME READY STATUS RESTARTS AGE IP NODE
pod/dashboard-metrics-scraper-76585494d8-vd9w6 1/1 Running 0 4h50m 10.244.2.3 node2
pod/kubernetes-dashboard-594b99b6f4-72zxw 1/1 Running 0 4h50m 10.244.2.2 node2
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/dashboard-metrics-scraper   ClusterIP   10.96.45.110   <none>   8000/TCP        4h50m   k8s-app=dashboard-metrics-scraper
service/kubernetes-dashboard        NodePort    10.96.217.29   <none>   443:30001/TCP   4h50m   k8s-app=kubernetes-dashboard
The output above shows that the pod kubernetes-dashboard-594b99b6f4-72zxw is running on node2 and that the exposed NodePort is 30001, so the access URL is: https://192.168.50.132:30001 (as a NodePort, it is reachable through any node's IP).
- Access it in a browser
The login page asks for a token; the token value can be retrieved like this:
~]# kubectl describe secrets -n kube-system $(kubectl -n kube-system get secret | awk '/dashboard-admin/{print $1}')
Enter that token value and you are taken to the dashboard UI.
However, although we can now log in, we do not yet have enough permissions to view cluster information, because no cluster role has been bound yet. Feel free to try the steps above first and then continue with the steps below.
(2) Bind the cluster-admin role
~]# kubectl create serviceaccount dashboard-admin -n kube-system
~]# kubectl create clusterrolebinding dashboard-admin --clusterrole=cluster-admin --serviceaccount=kube-system:dashboard-admin
~]# kubectl describe secrets -n kube-system $(kubectl -n kube-system get secret | awk '/dashboard-admin/{print $1}')
Then log in to the dashboard again with the token from that output.
4. Summary of cluster errors
(1) Image pull fails with "not found"
[root@master ~]# kubeadm config images pull --config kubeadm-init.yaml
W0801 11:00:00.705044 2780 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
failed to pull image "registry.aliyuncs.com/google_containers/kube-apiserver:v1.18.4": output: Error response from daemon: manifest for registry.aliyuncs.com/google_containers/kube-apiserver:v1.18.4 not found: manifest unknown: manifest unknown
, error: exit status 1
To see the stack trace of this error execute with --v=5 or higher
The requested Kubernetes image version is too new for the mirror, so lower it a bit by editing the kubernetesVersion field in kubeadm-init.yaml.
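For example, if v1.18.4 is not available, you could pin the config to a version that is (the exact version numbers here are only illustrative):
~]# sed -i 's/^kubernetesVersion:.*/kubernetesVersion: v1.18.3/' kubeadm-init.yaml
~]# kubeadm config images pull --config kubeadm-init.yaml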
(2) Docker cgroup driver error
During the Kubernetes installation the following error is common:
failed to create kubelet: misconfiguration: kubelet cgroup driver: "cgroupfs" is different from docker cgroup driver: "systemd"
The cause is that Docker's Cgroup Driver and the kubelet's Cgroup Driver do not match; the two must be the same.
Fix: change Docker's Cgroup Driver by editing the /etc/docker/daemon.json file:
{
"exec-opts": ["native.cgroupdriver=systemd"]
}
Then restart Docker:
systemctl daemon-reload
systemctl restart docker
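To confirm the drivers now agree (optional; where the kubelet setting lives can vary by setup):
~]# docker info --format '{{.CgroupDriver}}'                              # should print: systemd
~]# grep -i cgroup /var/lib/kubelet/kubeadm-flags.env /var/lib/kubelet/config.yaml 2>/dev/null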
(3) Worker node reports a localhost:8080 connection refused error
Running kubectl get pod on a worker node fails like this:
[root@node1 ~]# kubectl get pod
The connection to the server localhost:8080 was refused - did you specify the right host or port?
This happens because kubectl needs the kubernetes-admin credentials (a kubeconfig) to talk to the API server.
Solution:
Copy the /etc/kubernetes/admin.conf file from the master node to the /etc/kubernetes directory on the worker node, then configure an environment variable on the worker node.
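A sketch of that copy step, run on the master (node1 is used as the example target; scp relies on the hosts entries added earlier):
[root@master ~]# scp /etc/kubernetes/admin.conf node1:/etc/kubernetes/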
[root@node1 images]# echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> ~/.bash_profile
[root@node1 images]# source ~/.bash_profile
Run kubectl get pod on the worker node again:
[root@node1 ~]# kubectl get pod
NAME READY STATUS RESTARTS AGE
nginx-f89759699-z4fc2 1/1 Running 0 20m
(4) Identity verification error when a worker node joins the cluster
[root@node1 ~]# kubeadm join 192.168.50.128:6443 --token abcdef.0123456789abcdef \
> --discovery-token-ca-cert-hash sha256:05b84c41152f72ca33afe39a7ef7fa359eec3d3ed654c2692b665e2c4810af3e
W0801 11:06:05.871557 2864 join.go:346] [preflight] WARNING: JoinControlPane.controlPlane settings will be ignored when control-plane flag is not set.
[preflight] Running pre-flight checks
error execution phase preflight: couldn't validate the identity of the API Server: cluster CA found in cluster-info ConfigMap is invalid: none of the public keys "sha256:a74a8f5a2690aa46bd2cd08af22276c08a0ed9489b100c0feb0409e1f61dc6d0" are pinned
To see the stack trace of this error execute with --v=5 or higher
The token or hash was copied incorrectly; copy the join command printed after the master's initialization again and rerun it.
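If the original output has been lost, the master can print a fresh join command at any time:
[root@master ~]# kubeadm token create --print-join-command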
(5) Swap not disabled when initializing the master node
[ERROR Swap]: running with swap on is not supported. Please disable swap
Disable the swap partition and retry:
swapoff -a
sed -i.bak 's/^.*centos-swap/#&/g' /etc/fstab
(6) kubectl get cs shows components in an unhealthy state
[root@master ~]# kubectl get cs
NAME STATUS MESSAGE ERROR
scheduler Unhealthy Get http://127.0.0.1:10251/healthz: dial tcp 127.0.0.1:10251: connect: connection refused
controller-manager Unhealthy Get http://127.0.0.1:10252/healthz: dial tcp 127.0.0.1:10252: connect: connection refused
etcd-0 Healthy {"health":"true"}
Edit the configuration files of the scheduler and controller-manager components and remove the --port=0 argument from each. The files are in /etc/kubernetes/manifests/: kube-controller-manager.yaml and kube-scheduler.yaml. A quick way to do the edit is shown below.
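A sketch of that edit (it deletes the line containing --port=0; back up the manifests first if you prefer):
~]# sed -i '/--port=0/d' /etc/kubernetes/manifests/kube-scheduler.yaml
~]# sed -i '/--port=0/d' /etc/kubernetes/manifests/kube-controller-manager.yaml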
After saving the changes there is no need to restart anything manually; within about half a minute the cluster recovers on its own, and running kubectl get cs again shows the components as healthy.
(7) Dashboard error: Get https://10.96.0.1:443/version: dial tcp 10.96.0.1:443: i/o timeout
This is really still a cluster network problem, but if the nodes, the flannel pods, and so on all look normal, you may not be able to track it down. The quickest workaround is to schedule the dashboard onto the master node.
Edit the dashboard manifest and comment out a few lines (around lines 232-234), as shown:
      nodeSelector:
        "beta.kubernetes.io/os": linux
      # Comment the following tolerations if Dashboard must not be deployed on master
      # tolerations:
      #   - key: node-role.kubernetes.io/master
      #     effect: NoSchedule
That is, comment out the last three lines shown above (the tolerations block).
Then pin the pod to the chosen node:
    template:
      metadata:
        labels:
          k8s-app: kubernetes-dashboard
      spec:
        nodeName: master
        containers:
          - name: kubernetes-dashboard
            image: kubernetesui/dashboard:v2.0.0-beta8
            imagePullPolicy: Always
            ports:
Around line 190, add the single line nodeName: master.
After saving, run kubectl apply -f recommended.yaml again to re-apply the manifest.
If you want to keep digging into the root cause yourself, look closely at whether the flannel network CIDR definition is the real problem.
5. References
Some blogs I consulted, recorded here: https://www.cnblogs.com/FengGeBlog/p/10810632.html