Prometheus is widely adopted as a standard monitoring tool for Kubernetes because it provides many useful features, such as dynamic service discovery, powerful queries, and seamless alert notification integration. Many applications and client libraries support Prometheus, which makes life easier for operations teams. Although things go pretty well with Prometheus, a stock Prometheus deployment cannot easily achieve high availability (HA) or long-term storage.
Thanos, developed by Improbable, integrates with Prometheus transparently and solves the HA and long-term storage issues without hurting performance. The idea of Thanos is to run a sidecar component alongside Prometheus; the sidecar interacts with Prometheus to upload or query metrics. Also, Prometheus Operator supports Thanos natively, which makes it easier to deploy a Prometheus cluster along with Thanos. This solution is pretty elegant if you choose Prometheus Operator to provision your Prometheus cluster.
This article covers the following:
- How to deploy the Prometheus Operator on Kubernetes
- How to deploy the Thanos sidecar with Prometheus
- Achieving HA: Thanos Querier
- Querying historical data: Thanos Store
- Reducing data size: Thanos Compactor
There are tons of articles explaining why we should adopt prometheus-operator to provision Prometheus. I recommend reading the following references if you are not familiar with prometheus-operator.
brew install kubernetes-helm
sudo snap install helm --classic
Note that we are using the stable/prometheus-operator chart, because the coreos/prometheus-operator Helm chart is going to be deprecated. Later we need to modify the chart values to provision the Prometheus cluster along with the Thanos sidecar. To install the stable Helm chart with custom values, download values.yaml from the GitHub repo.
In this example, we name our Prometheus Operator release prom-op and install it in the monitoring namespace:
$ helm upgrade --install prom-op stable/prometheus-operator --namespace monitoring -f values.yaml
Use the following command to verify that prometheus-operator was provisioned successfully.
kubectl --namespace monitoring get pods -l "release=prom-op"
NEED TO KNOW
prometheus-operator should be version 0.28.0 or later to support Thanos 0.2
Official Architecture of Thanos
Our deployment steps
According to the picture above, Thanos consists of several components, including the Sidecar, Querier, Store, and Compactor used in this article.
The deployment steps:
- Prometheus should be deployed with Thanos.
- Deploy Thanos Querier, which is able to talk to the Prometheus Sidecar through the gossip protocol.
- Make sure Thanos Sidecar is able to upload Prometheus metrics to the given S3 bucket.
- Establish the Thanos Store for retrieving data from long-term storage.
- Set up the Compactor to shrink historical data.
To install the Thanos sidecar along with prometheus-operator, we specify the Thanos sidecar in the chart values.
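The chart values for the sidecar could look roughly like the sketch below. It assumes that the chart passes prometheus.prometheusSpec.thanos through to the operator's Thanos spec, that the object storage secret is named thanos-objstore-config with the key thanos.yaml (as created later in this article), and that the peers address points at the thanos-peers service used by the other components. The file name thanos-values.yaml is hypothetical; check the field names against the chart version you install.

```shell
# Write a small values overlay for the Thanos sidecar (file name is hypothetical).
cat > thanos-values.yaml <<'EOF'
prometheus:
  prometheusSpec:
    thanos:
      # Gossip address the sidecar uses to join the Thanos cluster (assumed service name).
      peers: thanos-peers.monitoring.svc.cluster.local:10900
      # Reference to the Kubernetes secret that holds the object storage config.
      objectStorageConfig:
        name: thanos-objstore-config
        key: thanos.yaml
EOF

# Show the result so it can be reviewed before installing.
cat thanos-values.yaml
```

Helm accepts several -f flags, so this overlay can be passed after values.yaml in the helm upgrade --install command, or the same keys can be edited directly in values.yaml.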
objectStorageConfig can be configured through a configuration file.
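As a minimal sketch, the object storage configuration file for an S3 bucket could be written as shown below. The bucket name, endpoint, and credentials are placeholders, not values from this article; the path /tmp/thanos-config.yaml matches the one used when creating the secret.

```shell
# Write the Thanos object storage config (all values below are placeholders).
cat > /tmp/thanos-config.yaml <<'EOF'
type: S3
config:
  bucket: my-thanos-bucket              # hypothetical bucket name
  endpoint: s3.us-west-2.amazonaws.com  # regional S3 endpoint (assumed region)
  access_key: REPLACE_WITH_ACCESS_KEY
  secret_key: REPLACE_WITH_SECRET_KEY
  insecure: false                       # keep TLS enabled
EOF

# Keep the credentials readable only by the current user.
chmod 600 /tmp/thanos-config.yaml
```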
Create the Kubernetes secret with the following command:
kubectl -n monitoring create secret generic thanos-objstore-config --from-file=thanos.yaml=/tmp/thanos-config.yaml
endpoint must be set to indicate the region in which the bucket is located.
$ kubectl get po -n monitoring
kubectl describe po/prometheus-prom-op-prometheus-0 -n monitoring
If everything goes well, you will find a thanos-sidecar container in the Prometheus pod,
and if you check the sidecar's log, you will see messages like the following.
kubectl logs -f po/prometheus-prom-op-prometheus-0 -n monitoring -c thanos-sidecar
level=info ts=2019-02-01T09:33:15.173007261Z caller=flags.go:90 msg="StoreAPI address that will be propagated through gossip" address=10.11.29.191:10901
The Thanos Querier layer provides the ability to retrieve metrics from all Prometheus instances at once. It is fully compatible with the original Prometheus PromQL and HTTP APIs, so it can be used with Grafana.
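Because the Querier speaks the Prometheus HTTP API, it can be checked with a plain HTTP request. The sketch below assumes the Querier's HTTP port is reachable on localhost:10902 (for example through kubectl port-forward; the service name in the comment is hypothetical) and runs the instant query up.

```shell
# Assumption: the Querier HTTP port has been forwarded to localhost:10902, e.g.
#   kubectl -n monitoring port-forward svc/thanos-query 10902:10902
# (the service name thanos-query is hypothetical).
QUERIER_URL="${QUERIER_URL:-http://localhost:10902}"

# Prometheus-compatible instant query endpoint served by the Querier.
QUERY_ENDPOINT="${QUERIER_URL}/api/v1/query"
echo "Querying ${QUERY_ENDPOINT}"

# -G sends the --data-urlencode pair as a URL query string.
# The fallback message keeps the sketch from aborting when nothing is listening.
curl -sS -G --max-time 5 "${QUERY_ENDPOINT}" --data-urlencode 'query=up' \
  || echo "querier not reachable at ${QUERIER_URL}"
```

The same base URL can be configured in Grafana as a Prometheus data source.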
Since there are too many YAML files, I put everything in my GitHub repo.
Thanos Store collaborates with the Querier to retrieve historical data from the given bucket. It joins the Thanos cluster on startup.
kubectl apply -f thanos-store.yaml
Thanos Compactor downsamples all of your historical data. It is a really useful component that can reduce file size. I recommend that everyone read this well-explained article.
kubectl apply -f thanos-compactor.yaml
If the thanos-peers service has not been created yet, you will see this kind of error message from the Thanos components:
level=error ts=2019-02-01T05:11:40.805153721Z caller=cluster.go:269 component=cluster msg="Refreshing memberlist" err="join peers thanos-peers.monitoring.svc.cluster.local:10900 : 1 error occurred:\n\t* Failed to resolve thanos-peers.monitoring.svc.cluster.local:10900: lookup thanos-peers.monitoring.svc.cluster.local on 172.20.0.10:53: no such host\n\n"
kubectl apply -f thanos-peers-svc.yaml
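The manifest itself lives in the repo. As a rough sketch, a headless service for gossip discovery could look like the following. The service name, namespace, and port come from the error message above, while the selector label thanos-peer is an assumption and has to match whatever label your Thanos pods carry. The output file name is hypothetical so that it does not overwrite the real manifest.

```shell
# Write an example headless Service manifest (file name is hypothetical).
cat > thanos-peers-svc.example.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: thanos-peers
  namespace: monitoring
spec:
  # Headless: DNS returns the pod IPs directly, which is what gossip discovery needs.
  clusterIP: None
  ports:
    - name: cluster
      port: 10900
      targetPort: 10900
  selector:
    # Assumed label; every Thanos pod that should join the cluster must carry it.
    thanos-peer: "true"
EOF

# Print the manifest for review before applying it with kubectl apply -f.
cat thanos-peers-svc.example.yaml
```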