Preface
Prometheus is widely adopted as a standard monitoring tool for Kubernetes because it provides useful features such as dynamic service discovery, a powerful query language, and seamless alert notification integration. Many applications and client libraries support Prometheus, which makes life easier for operations teams. Although things generally go well with Prometheus, a stock Prometheus deployment cannot easily achieve high availability (HA) or long-term storage.
Thanos comes to the rescue
Thanos, developed by Improbable, integrates with Prometheus transparently and solves the HA and long-term storage problems without hurting performance. The idea is to run a Thanos sidecar next to each Prometheus instance, so that the sidecar can interact with Prometheus to upload or query metrics. The Prometheus operator also supports Thanos natively, which makes it easier to deploy a Prometheus cluster along with Thanos. This solution is quite elegant if you use the Prometheus operator to provision your Prometheus cluster.
This article covers the following topics:
- How to deploy the Prometheus operator on Kubernetes
- How to deploy the Thanos sidecar with Prometheus
- Achieving HA with Thanos Querier
- Querying historical data with Thanos Store
- Reducing data size with Thanos Compactor
Install Prometheus through Prometheus operator
There are plenty of articles explaining why to adopt prometheus-operator to provision Prometheus. I recommend reading reference [2] if you are not familiar with prometheus-operator.
1. Install Helm in your environment
- macOS:
brew install kubernetes-helm
- Linux:
sudo snap install helm --classic
2. Initialize helm and install tiller
$ helm init
3. Install coreos prometheus operator
Note that we use the stable/prometheus-operator chart because the coreos/prometheus-operator Helm chart is going to be deprecated. We will later modify the chart values to provision the Prometheus cluster along with the Thanos sidecar. To install a stable Helm chart with custom values, download values.yaml from the chart's GitHub repo.
In this example, we name the release prom-op and install it in the monitoring namespace.
$ helm upgrade --install prom-op stable/prometheus-operator --namespace monitoring -f values.yaml
Use the following command to verify that prometheus-operator was provisioned successfully.
kubectl --namespace monitoring get pods -l "release=prom-op"
Thanos Deployment
NEED TO KNOW
The prometheus-operator version should be greater than 0.28.0 to support Thanos 0.2.0.
Thanos Architecture
Official Architecture of Thanos
Our deployment steps
According to the architecture diagram above, Thanos consists of several components:
- Sidecar
- Querier
- Store
- Compactor
The deployment steps:
- Deploy Prometheus with the Thanos Sidecar.
- Deploy the Thanos Querier, which talks to the Prometheus Sidecar through the gossip protocol.
- Make sure the Thanos Sidecar is able to upload Prometheus metrics to the given S3 bucket.
- Set up the Thanos Store for retrieving data from long-term storage.
- Set up the Compactor to shrink historical data.
Install Thanos sidecar
To install the Thanos sidecar along with prometheus-operator, specify the Thanos sidecar in the chart values as follows:
thanos:
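As a hedged sketch only, the Thanos part of the chart values might look roughly like the file written below. It assumes the chart exposes the operator's Thanos settings under prometheus.prometheusSpec.thanos. The file name values-thanos.yaml, the image, and the version are illustrative assumptions, not the author's values. The secret name and key match the secret created later in this section, and the peers address is the one that appears in the Troubleshooting log.

```shell
# Sketch only: write an illustrative values override for stable/prometheus-operator.
# Assumptions (not from the article): the file name, the image, and the version.
VALUES_FILE="${VALUES_FILE:-/tmp/values-thanos.yaml}"

cat > "$VALUES_FILE" <<'EOF'
prometheus:
  prometheusSpec:
    thanos:
      baseImage: improbable/thanos   # assumed image
      version: v0.2.1                # assumed version
      # gossip peers address, as seen in the Troubleshooting log
      peers: thanos-peers.monitoring.svc.cluster.local:10900
      objectStorageConfig:
        name: thanos-objstore-config # Kubernetes secret name
        key: thanos.yaml             # key inside that secret
EOF

echo "wrote $VALUES_FILE"
```

A file like this could be passed as an additional `-f` argument to the same `helm upgrade --install` command used earlier, or its contents merged into values.yaml.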
objectStorageConfig can be configured through the configuration file thanos.yaml:
type: s3
Create the Kubernetes secret with the following command:
kubectl -n monitoring create secret generic thanos-objstore-config --from-file=thanos.yaml=/tmp/thanos-config.yaml
Warning: endpoint must be set in order to specify the region in which the bucket is located.
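For illustration, a minimal S3 object storage configuration could be generated as in the sketch below. The field names follow the Thanos S3 object storage configuration format. The bucket name, region, and credentials are placeholders, not values from this article. The file is written to /tmp/thanos-config.yaml so that it lines up with the `kubectl create secret` command above.

```shell
# Sketch only: write an S3 object storage config for the Thanos sidecar.
# Bucket, endpoint, and credentials below are placeholders.
OBJSTORE_FILE="${OBJSTORE_FILE:-/tmp/thanos-config.yaml}"

cat > "$OBJSTORE_FILE" <<'EOF'
type: s3
config:
  bucket: my-thanos-bucket               # placeholder bucket name
  endpoint: s3.us-west-2.amazonaws.com   # regional endpoint; placeholder region
  access_key: REPLACE_ME
  secret_key: REPLACE_ME
  insecure: false                        # keep TLS enabled
EOF

# The file holds credentials, so restrict its permissions.
chmod 600 "$OBJSTORE_FILE"
echo "wrote $OBJSTORE_FILE"
```

The endpoint line is the one the warning refers to: it should point at the S3 endpoint of the region where the bucket lives.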
Verify Thanos Sidecar
$ kubectl get po -n monitoring
kubectl describe po/prometheus-prom-op-prometheus-0 -n monitoring
If everything goes well, the Prometheus pod contains a thanos-sidecar container:
thanos-sidecar:
If you check the sidecar log, you will see messages like the following:
kubectl logs -f po/prometheus-prom-op-prometheus-0 -n monitoring -c thanos-sidecar
level=info ts=2019-02-01T09:33:15.173007261Z caller=flags.go:90 msg="StoreAPI address that will be propagated through gossip" address=10.11.29.191:10901
Install Thanos Querier
The Thanos Querier layer can retrieve metrics from all Prometheus instances at once. It is fully compatible with the original Prometheus PromQL and HTTP APIs, so it can be used with Grafana.
Since there are many YAML files, I put everything in my GitHub repo.
$ cd thanos
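Because the Querier speaks the Prometheus HTTP API, it can be queried like a Prometheus server once it is running. The sketch below shows one way to do that with curl. The helper names query_url and thanos_query are hypothetical, and the address localhost:10902 assumes the Querier's HTTP port has been made reachable locally (for example through a port-forward) and uses the Thanos default HTTP port.

```shell
# Sketch only: query the Thanos Querier through its Prometheus-compatible HTTP API.
# Assumption: the Querier HTTP port is reachable at the given host:port.

# Build the instant-query endpoint for a given host:port.
query_url() {
  printf 'http://%s/api/v1/query' "$1"
}

# Run an instant PromQL query, asking the Querier to deduplicate
# series coming from replicated Prometheus instances.
thanos_query() {
  curl -sG "$(query_url "$1")" \
    --data-urlencode "query=$2" \
    --data-urlencode "dedup=true"
}

# Example (needs a reachable Querier):
#   thanos_query localhost:10902 'up'
```

The same base URL can be configured in Grafana as a Prometheus data source.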
Install Thanos Store
Thanos Store works with the Querier to retrieve historical data from the given bucket. It joins the Thanos cluster on startup.
$ kubectl apply -f thanos-store.yaml
Install Thanos Compactor
Thanos Compactor downsamples all of your historical data. It is a very useful component that can reduce file size. I recommend that everyone read this well-explained article.
$ kubectl apply -f thanos-compactor.yaml
Troubleshooting
Peering service is not set up properly
If the peers service is missing, Thanos components log an error like this:
level=error ts=2019-02-01T05:11:40.805153721Z caller=cluster.go:269 component=cluster msg="Refreshing memberlist" err="join peers thanos-peers.monitoring.svc.cluster.local:10900 : 1 error occurred:\n\t* Failed to resolve thanos-peers.monitoring.svc.cluster.local:10900: lookup thanos-peers.monitoring.svc.cluster.local on 172.20.0.10:53: no such host\n\n"
Creating the peers service resolves it:
$ kubectl apply -f thanos-peers-svc.yaml
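The error above is a DNS failure: nothing named thanos-peers exists in the monitoring namespace. As a rough sketch of what such a peers service might contain (the author's actual file is in their repo and may differ), a headless service lets the name resolve to the pod IPs of all Thanos components. The service name, namespace, and port come from the log line. The thanos-peer label and the output path are hypothetical.

```shell
# Sketch only: write a headless Service manifest for Thanos gossip peering.
# Name, namespace, and port are taken from the error log; the selector
# label is a hypothetical one that every Thanos pod would need to carry.
PEERS_SVC_FILE="${PEERS_SVC_FILE:-/tmp/thanos-peers-svc.yaml}"

cat > "$PEERS_SVC_FILE" <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: thanos-peers
  namespace: monitoring
spec:
  type: ClusterIP
  clusterIP: None          # headless: DNS returns the pod IPs directly
  ports:
    - name: cluster
      port: 10900
      targetPort: 10900
  selector:
    thanos-peer: "true"    # hypothetical label
EOF

echo "wrote $PEERS_SVC_FILE"
```

After applying a manifest like this, `kubectl -n monitoring get endpoints thanos-peers` should list the IPs of the matching pods. An empty list usually means the selector does not match the pod labels.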
References
1. https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/getting-started.md
2. https://sysdig.com/blog/kubernetes-monitoring-prometheus-operator-part3/
3. https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4. Thanos: Transforming Prometheus to a Global Scale in Seven Simple Steps (FOSDEM), PDF slides