Google Stackdriver Monitoring, 10/2019
Google Cloud Monitoring, which used to be Google Stackdriver Monitoring, helps you get visibility into your cloud resources. I ran some performance tests with it and reviewed it to show how things went. Here it is.
How Google Cloud Monitoring Monitors
Hey everybody, it’s Jean here with Monitoring Tool Reviews where I personally test each and every performance monitoring product I can get my hands on so that I can take a look at it, review it and help you figure out whether it’s something worth looking further into for your organization. So let’s get into the review.
So today we’re going to be taking a look at Stackdriver Monitoring. Stackdriver Monitoring is a product from Google that helps with cloud-native apps on both Google Cloud Platform and Amazon Web Services. So as we scroll through this page down here, what we see is that it’s basically full-stack monitoring all from Google. It covers both GCP and AWS or hybrid deployments of those two if you have them.
Obviously, you can look at getting information about issues for your apps. And one of the things that I like about it is the various integrations that it has. But first, let’s take a look at the pricing.
So as someone who’s been in the performance monitoring space for about twenty years now, one of the things that I really like about these more cloud-centric performance monitoring tools is the pricing structure. No more of this thing from years ago where you pay upfront for licensing or a maintenance fee or something like that before you’ve even had a chance to use it. A lot of the pricing is basically pay-per-use.
For Stackdriver Monitoring, for example, if you have your deployments on Google Cloud Platform, basically everything is free for you. And if you’re not using GCP, or you have your instances on AWS, for example, the first 150 meg per month is free. And you can see here from the pricing that from 150 to 100,000 meg, basically up to one hundred gig, you’re only paying about twenty-five cents. Beyond one hundred gig, you’re paying about fifteen cents. And beyond 250 gig, you’re paying about six cents.
So it’s all based on pay-per-use. Obviously, the more data you have, the more careful you want to be with that, but the pricing is very flexible. And I really like that. It makes it easy for you to kind of ease into using a tool like Stackdriver Monitoring to monitor all of your applications.
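To make those tiers concrete, here is a minimal sketch of the arithmetic in Python. It assumes the rates quoted above are charged per meg (MiB) of metric data, which the walkthrough doesn’t state explicitly, and that each rate applies only to the data falling inside its own tier. The function name and the tier table are made up for illustration, and the figures are the rounded ones from this review, not an official price list.

```python
# Illustrative only: tier boundaries (in MiB) and rates (in US cents) are the
# rounded figures quoted in this review, ASSUMED to be charged per MiB.
TIERS = [
    (150, 100_000, 25),      # 150 MiB up to 100,000 MiB: about 25 cents
    (100_000, 250_000, 15),  # 100,000 MiB up to 250,000 MiB: about 15 cents
    (250_000, None, 6),      # beyond 250,000 MiB: about 6 cents
]


def estimate_monthly_cost(ingested_mib):
    """Return an estimated monthly bill in dollars for non-GCP metric data."""
    total_cents = 0
    for lower, upper, cents_per_mib in TIERS:
        if ingested_mib <= lower:
            break  # usage never reaches this tier (or any later one)
        top = ingested_mib if upper is None else min(ingested_mib, upper)
        total_cents += (top - lower) * cents_per_mib
    return total_cents / 100


if __name__ == "__main__":
    for mib in (100, 1_150, 300_000):
        print(f"{mib:>7} MiB -> ${estimate_monthly_cost(mib):,.2f}")
```

Under those assumptions, anything at or below 150 meg costs nothing, and the bill grows tier by tier after that, which is why you want to keep an eye on how much data you send.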
So let’s kind of get into it here. So in order to get to Stackdriver, you would need to have an account with GCP at some point if you’re using GCP, obviously. But what I have here is my account with GCP and you can just go through from GCP, so you click on the console here and that’ll take us to the console.
So we get to the console and in your GCP instance, you have your project. And effectively, from your project, you have the ability to go, if you scroll all the way down here, under Stackdriver, so you have Monitoring. So there are other features of Stackdriver itself. We’re not going to get into these other features today. I’ll probably do a review of these other ones on a later date, but today I want to focus on just the monitoring piece.
So let’s click on Monitoring. Stackdriver was a product before it became part of Google; Google bought them a few years ago, so it has its own separate domain, stackdriver.com. When you click on the Monitoring URL from GCP, it takes you here. What I have here is basically a couple of instances that I’ve run some tests on so that I could take a look at this and review the product. The first screen that I land on is just the Monitoring Overview. This effectively is like a quick dashboard of what you have: the instances and any storage that Stackdriver Monitoring has detected, plus any uptime checks and charts that I’ve created. A lot of that stuff is there.
By default, it gives you these four that you see here: CPU utilization, received bytes, sent bytes, and disk read bytes. You can change these, obviously. You can add more. I’m not sure how they came to select these four. For me personally, one of the things that I would like to see on a quick dashboard when I get into something is where the errors are happening, right? CPU utilization, that’s great. Received bytes, sent bytes, that’s nice, but I would not just want to know what’s been sent or received. I’d want to know what’s dropping. If there are issues that I’d like to be able to jump right into, it would be nice to be able to see that here.
So I guess you know, that’s something that you can always add, obviously. But I’m curious as to why they added just these four, but anyway, this Monitoring Overview is effectively a dashboard for you that once you come into here, you can kind of get a quick glance of how your applications are doing.
So let’s kind of go through and give you guys a walkthrough here of Stackdriver Monitoring. The first thing I want to take a look at is the resources. You see the Resources dashboard here, and you can also get to it from here. Resources is basically everything that Stackdriver has discovered in your systems that can be monitored. So whether it’s your instances or your applications, in this case on GCP, whether you have applications running on Google App Engine or you have GCE instances. This here, Infrastructure, automatically gets discovered for you because, especially on GCP, the Stackdriver agent already comes as part of it. So once you add Stackdriver to your GCP account, the agent basically gets triggered and it starts collecting data.
And once you come here, it’ll autodiscover some of the things that you may already have and, as I mentioned earlier, I have two instances, right? So I can just go here and select instances and see, get kind of like a high-level overview of how my instances are performing. CPU usage. Memory is not here. Memory is something that needs to be added. It’s not there by default, but you can kind of get to see some of those.
But if I wanted to add new metrics beyond CPU that I want to create charts for, I go to Metrics Explorer. And, you know, this will come up based on the things that were detected and you can see here the VM instance, GAE, Google App Engine, and many other things. So all the various different types of metrics that can be collected, I can add them and you know create charts for a lot of these things.
And I like being able to quickly go to one-hour, six-hour, one-day, one-month and that kind of thing and just save it as a chart. And once I’ve saved that as a chart, I can basically add it to my dashboard, as we’ll see a little bit later on.
The next thing you want to be able to do is once you have all of these charts, right? You always want to be able to create dashboards. You have that ability in Stackdriver Monitoring. So you can basically just go in and create a dashboard here. And you can create your dashboard. Just go to add-to-chart and pretty much kind of gives you a view into resources.
So let’s just say we wanted to create a CPU utilization chart. So we just go here and select CPU utilization. And we have that here. And we just simply add metric. And we can add more if we wanted to, you know, some more metrics if we wanted to.
So, let’s say we’re done with that. We give our dashboard a name. Just call it Utilization Dashboard. And that’s our dashboard. Now let’s say we wanted to take a look and see errors. Once you start typing here, it tries to give you errors, any errors. Whatever words you start typing, you start seeing matches in the dropdown here. So let’s see, dropped bytes. Let’s add that as a metric and save it. So okay, we have that. We have CPU utilization. We have errors.
So now we have our dashboard. We can see it’s pretty straightforward. We can have this set up so that it auto-refreshes. One of the things I like about this is that you can do a full screen. So if you have this on a TV, you can just have it full screen and it shows up like that, or you can have an individual chart shown full screen as well. So if you have this on a TV and you turn on auto-refresh, you basically just show it on the TV for a NetOps team or some Ops team that’s looking at all this stuff, so that they can see it.
So you have your dashboard. But in situations where you’re not part of the Ops team sitting in front of a screen to see this, or you have other stuff that you’re probably doing so you’re not able to be in front of your screen, you want to have something where you can get some alerts, for example.
So you also have the ability to create alerts in Stackdriver Monitoring. And you can go here, and let’s say we try to create a policy alert. And we go in and we add a condition. Let’s see, something like dropped packets. So let’s say dropped packets. And Advance.
Any time. So any time the series violates. So our condition is, if any time series violates, if it’s above, let’s say, let’s see. We have about point one five packets anyway, if we take a look at this over the past day.
Okay, let’s do point two. 0.2.
If the number of dropped packets is over 0.2 for greater than one minute, then that is our condition and we can alert. You can see here there are a number of different mechanisms. So let’s pick the Email notification channel. So I’ll get an email whenever that’s the case.
You can see here there are other notification types, like Slack if I have Slack as an integration, text messaging, or even webhook or PagerDuty. I can do that as well. I can go in and add documentation, you know, something that’s a little bit more descriptive. Let’s say this alert rarely happens because, you know, one of the things you want to avoid is alert fatigue, right? Realistically, you don’t want something where you’re constantly being alerted and you have to go in and take a look. You want something where you’re not being alerted as often, and when you do get alerted, you know what it is and it’s something that you need to get to as quickly as possible because it likely means that something bad is really happening. So you could say something like, you know, too many dropped packets on my instances.
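A rough way to picture that condition: the alert should not fire on a single spike, only when the value stays above the threshold for the whole duration. Here is a minimal Python sketch of that logic. It is not how Stackdriver evaluates conditions internally; the function name, the sample format of (timestamp in seconds, value) pairs, and the choice to fire once a full minute has elapsed are all assumptions for illustration.

```python
def first_violation(samples, threshold=0.2, duration_s=60):
    """Return the timestamp at which the condition is first met, or None.

    samples: iterable of (timestamp_seconds, value) pairs in time order.
    The condition is met once the value has stayed above `threshold`
    continuously for at least `duration_s` seconds.
    """
    breach_start = None
    for ts, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = ts  # start of a new run above the threshold
            if ts - breach_start >= duration_s:
                return ts
        else:
            breach_start = None  # run broken, start over
    return None


if __name__ == "__main__":
    # A brief spike followed by a sustained breach (made-up numbers).
    dropped = [(0, 0.15), (30, 0.25), (60, 0.10),
               (90, 0.25), (120, 0.30), (150, 0.40)]
    print(first_violation(dropped))
```

In the made-up series, the spike at 30 seconds is ignored because the value drops back below 0.2 before a minute has passed; only the run that starts at 90 seconds lasts long enough to count.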
Preview. Take a look here.
Okay, and we name this policy Dropped packets. Give it a name.
So now we have this alert, right? I like how, once you’ve done it, it shows you where that threshold line is. Right now, on average, I’m below it. So anytime it goes above this for a period of a minute, I should get an alert.
So it’s easy to create these alerts. One of the things that I don’t like about this is that all of my alerts are basically static alerts. I see no ability to create dynamic alerts, and that’s something that I would like to see: over time, it determines what’s considered normal across my infrastructure, and when something is abnormal, it automatically notifies me of that abnormal behavior. So that’s something that I don’t see here that I would certainly like to see.
The other thing is to create Uptime Checks. One of the things that I like about Stackdriver is that not only is it monitoring your cloud infrastructure, they also have a platform that allows you to effectively run external checks against your infrastructure. So say I have a web service or a web application, and I want to create a check of whether it’s up or not, or something like that.
So let’s say I have my website and I want to test it. I want to test if it’s always running HTTPS. Let’s do “my blog is always up,” and I want it to check every fifteen minutes. Let’s do a quick test here to see if it works. Okay, here we go. Response 200 OK. And then, you know, I can get into a little bit more detail and stuff like that, but let’s just go with that. So every fifteen minutes, this thing will basically run a check against my blog and see if that blog is up. And as long as it gets HTTP 200 OK, then everything should be fine.
As you can see here, I can create an alert. If it was very important to me that that blog is always up when that check is done, then I could just create a policy on that and get alerted, and I guess I can just set up an email or something like that.
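To illustrate the kind of dynamic alert I mean, here is a minimal Python sketch that learns a baseline from recent values and flags anything far above it. This is a generic rolling mean and standard deviation technique, not a Stackdriver feature; the function name, the window size, and the multiplier are arbitrary choices for illustration.

```python
from collections import deque
from statistics import mean, pstdev


def find_anomalies(values, window=5, k=3.0):
    """Return indexes of values that sit more than k standard deviations
    above the mean of the preceding `window` values."""
    history = deque(maxlen=window)
    anomalies = []
    for i, value in enumerate(values):
        if len(history) == window:
            baseline = mean(history)   # what "normal" has looked like lately
            spread = pstdev(history)   # how much normal tends to wobble
            if value > baseline + k * spread:
                anomalies.append(i)
        history.append(value)
    return anomalies


if __name__ == "__main__":
    # Made-up dropped-packet readings with one obvious jump.
    readings = [0.10, 0.12, 0.11, 0.13, 0.12, 0.11, 0.60, 0.12]
    print(find_anomalies(readings))
```

The point of the sketch is that nobody has to pick 0.2 by hand: the threshold moves with the data. A real implementation would also have to handle flat baselines, seasonality, and anomalous values polluting the history.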
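What the uptime check does can be sketched in a few lines of Python using only the standard library. This is a minimal illustration of the idea, not Stackdriver’s implementation: the function names are made up, the URL in the example is a placeholder, and a real check would run from several external locations on a schedule rather than once from your own machine.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout_s=10):
    """Issue one HTTP GET and return the status code (None if unreachable)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # DNS failure, refused connection, timeout, ...


def is_up(url, fetch=fetch_status):
    """Treat the target as up only when the check gets HTTP 200."""
    return fetch(url) == 200


if __name__ == "__main__":
    # Placeholder URL; substitute the site you actually want to check.
    print(is_up("https://blog.example.com/"))
```

The `fetch` parameter is there so the up-or-down decision can be exercised without touching the network. To mimic the fifteen-minute schedule, you would call `is_up` in a loop with a fifteen-minute sleep, or run the script from a scheduler.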
So that’s one of the things that I definitely think is a positive for Stackdriver Monitoring, and that I really like, because this will check whether your systems, your whole infrastructure, your application is available. I mean, at the end of the day, no one cares whether it’s, you know, a 200, or whether performance is two seconds or five seconds or fifteen milliseconds or fifteen seconds or anything like that. If it’s not available, it doesn’t matter. So this uptime check is definitely something that I like.
And if you have a lot of instances in your infrastructure, as you can see, I only have about two but if you have a whole lot of instances, a whole lot of applications on GAE or even AWS, and bringing all that information in, you can group some of this stuff here. So that’s something that’s helpful as well, that can be pretty useful to help you group all of your different resources that you have that Stackdriver’s monitoring.
So there you are. That’s basically it. On a scale of one to ten, I would give Stackdriver probably something like a six or seven. There are a lot of things that it does, but monitoring itself is one of those things where it’s largely showing you time series data. That’s good and that’s useful, so I can see here, for example, that there are errors. But what I would like to be able to see is, okay, if there are errors, like this jump here, can I click on it or right-click or in some way have it take me right to that error? That’s something that I would like to see but that doesn’t seem to be there now. Overall, though, I think it’s a pretty good tool to start monitoring your systems with, and it’s definitely something to get started with and start playing around with.
And I really like the integration with AWS and being able to bring in hybrid cloud information. I don’t believe that they support Azure as of yet, but maybe that’s coming soon. But definitely, if you have your systems, your infrastructure, across GCP and AWS, then Stackdriver is something that you can use. So that’s it. That was my walkthrough of Stackdriver, and see you next time.