Editor’s note: This post was originally written for the Raygun blog. You can check out the original here.
APM is something that some organizations either don’t fully understand or don’t put much thought into until it’s too late. When there’s a problem with an application, the organization scrambles to find a tool that can help solve the problem at hand. They contact the sales teams of various APM vendors to see which ones can help them.
But remember that the sales team’s job is to do their ABCs and “always be closing.” And they do a good job of it: I’ve come across many organizations over the years that have purchased an APM product but don’t have a clear understanding of what it’s doing for them or how it’s doing it.
But sometimes it’s our fault for not asking the right questions about what we need.
Well, I’m here to give you some questions to ask when your organization begins its search for an APM solution.
The importance of the right tool
Sometimes finding the right APM solution for your organization can be challenging. There are so many providers out there. Where do you start? Do you just look at what Gartner’s magic quadrant has to say and pick one or two from there to contact? I’d argue that while Gartner is a good place to start, you shouldn’t base your decision on that. Having the right solution for your organization is too important.
Using the right tool for the job can make all the difference for your organization and, most importantly, your applications’ end users. The best APM solution will let you know when slowness is an actual symptom of a problem that’s likely to occur and will alert you before your end users experience any issues.
Evaluation criteria
1. Solution architecture
The first criterion to consider when evaluating an APM product is whether it’ll be able to collect all the data you need from your applications. Like any software or hardware solution, you want something that has a robust architecture. The APM solution’s architecture must be able to scale appropriately to collect and process all the data in your infrastructure. And it has to do that without slowing down or impacting your applications. You shouldn’t collect less data just because the tool can’t handle more. Also, the architecture design shouldn’t require numerous systems to be deployed on your network or cloud infrastructure in order to scale to your organization’s needs.
Here are questions to consider asking:
1) What other components need to be implemented?
2) Do you need a database installed on a separate machine?
3) If it uses agents to collect data, how many agents do you need to cover your infrastructure?
4) To process and view your data, what is the server count?
5) Will all this be on your network or the vendor’s network?
Many APM solutions tend to have collection agents installed on servers or instances, and they send the data back to a collection or analysis server. You want to see how much data gets sent across your infrastructure, especially as things get busy on your production environment. It may look pretty good in your development environment or in a small network lab, but you want to see how it performs in more of a pre-production environment. You want a tool that’ll help solve performance problems, not create new ones.
Make sure to understand an APM tool’s architecture before moving to the next step.
2. Operational overhead
The next criterion to consider is what kind of overhead the APM solution adds as your applications get busier, especially in production. If a vendor demos their APM product for you, you’ll more than likely see the application in its best light. You want to avoid doing a vendor demo on their environment only; do one on your environment, too, to test that their product can handle it.
If you’re unable to do an appropriate test, ask the vendor if you can speak with an implementation engineer who’s deployed their APM product on a network that’s as big or complex as yours. Try to get an honest answer about the product’s fit for your needs.
In your testing, you also want to see how much overhead the APM product adds to your environment. When the tool gets deployed, it’ll be sitting right alongside your applications. The last thing you want is for it to be the cause of your applications slowing down. Try to see how much storage, CPU, and memory get eaten up. I recommend doing load tests, if you can, to generate all of the traffic.
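One way to put a number on that overhead is to run the same load test twice, once without the agent and once with it, and compare the results. The sketch below assumes you already have response-time samples in milliseconds from both runs. The function names, the sample values, and the choice of median and 95th percentile are illustrative assumptions, not part of any vendor's tooling.

```python
import statistics


def percentile(samples, pct):
    """Return the pct-th percentile (0-100) using the nearest-rank method."""
    ordered = sorted(samples)
    # Nearest rank: smallest value with at least pct% of samples at or below it.
    rank = max(1, -(-len(ordered) * pct // 100))  # ceiling division
    return ordered[int(rank) - 1]


def overhead_report(baseline_ms, with_agent_ms):
    """Compare two load-test runs and return relative overhead in percent."""
    report = {}
    for name, fn in (("median", statistics.median),
                     ("p95", lambda s: percentile(s, 95))):
        before = fn(baseline_ms)
        after = fn(with_agent_ms)
        report[name] = round((after - before) / before * 100, 1)
    return report


# Hypothetical samples: response times in ms from two identical load tests.
baseline = [100, 102, 98, 101, 99, 105, 97, 103, 100, 110]
with_agent = [104, 106, 101, 105, 103, 109, 100, 107, 104, 121]

print(overhead_report(baseline, with_agent))
```

Looking at a tail percentile as well as the median matters here: an agent can leave typical requests almost untouched and still make the slowest ones noticeably slower. The same comparison works for CPU, memory, and storage figures if you record them during both runs.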
3. Ease of implementation
A big positive for an APM solution is how fast you can deploy it and start getting value out of it. Can you deploy the product yourself, or do you need to sign up for some consulting services to help?
What used to be an age-old discussion of agent vs. agentless seems to be decided—most APM products utilize some form of agent-based monitoring. Do you have a preference? Typically, the longer an APM provider has been around, the more likely they are to have both agent and agentless products. Are there any restrictions in your environment for either kind of monitoring? If so, you have your answer.
You also want to find out how easy it is to implement the APM product into your environment.
- Do you simply install an agent on your servers or services?
- Can it be installed automatically as part of a server build process?
- Once it’s installed, do you need to go in and start configuring applications and thresholds?
- Or can the solution simply self-tune and identify applications for you and start highlighting performance issues within hours?
- Will the APM product take forever installing agents or appliances?
You want to find out all these things. You can’t have a product that’ll be hard to implement in your infrastructure.
4. Integrations
I’m going to take a wild guess here: you probably have other tools your organization uses to help manage the IT environment. Ideally, you want your shiny new APM tool to slide right on in without creating yet another place to look for data. So find out what tools the APM product you’re testing integrates with. Then compare these to what tools you already have or are considering implementing. Ask these questions of your APM vendors:
1) Can the APM tool pull data from other existing tools and be able to make some correlations about that data?
2) Does it contain an application program interface (API) that allows other tools to pull data from it?
3) Will you have to manually get the data, or can it be sent to your development team’s existing tools for alerts?
You’ll also want to make sure that when an application moves from development to production, the APM tool carries data from whatever testing was done during continuous delivery all the way to production. You need to have a baseline.
What about when there’s a problem? Will you hear about it from the APM tool, or will your users complain first? You want a tool that’ll integrate with any existing incident management, ChatOps, or other tools so that you can be alerted about any symptoms that could lead to problems. That way, you have a chance at solving something before it becomes a problem.
So make sure that your APM tool is able to integrate with existing tools you use in your IT organization for day-to-day operation.
5. Security
The security of your data is obviously important—more so than ever before. While this is performance data, it still may have a lot of personal information that you can’t afford to have stolen. So you want to find out whether your APM product will keep your organization’s data safe.
Make sure they’re compliant with your industry’s security standard. For instance, if you’re in healthcare, is the solution HIPAA compliant? If you’re in retail, is it PCI (payment card industry) compliant?
Some tools store the data they collect in your environment while others store it in their own environment. You want to ask the same set of questions for both—even more so if the data is taken off-prem.
If the product is agent-based, it’s likely listening on a TCP (transmission control protocol) port for requests to or from a collection server. Can this port be locked down or changed? Will your firewalls need to be opened up to allow this communication? How does the vendor protect against abuse of that communication?
You also want to find out what the retention period is for your data. A long one is good for analysis, but that also leaves your data more vulnerable. Also, what’s stored?
- Is it class and method names within your applications and their associated time-series data or actual user-specific application data?
- What types of privileges are required to make changes to the agent? Can changes be made directly to the agent where it’s installed, or can changes only be made through the APM tool’s user interface?
- Is role-based security available?
Find out where this data is viewable and how access is restricted. Your vendor should have a plan to keep your APM data out of the hands of bad actors who will get in even if there’s just a tiny crack in your security.
6. Metrics and data
The whole point of having an APM solution is to collect the metrics and data that your applications generate. As end users utilize your application as intended, data goes through your application’s tiers, across the infrastructure, and to your users. You don’t know where an issue can occur when you have so many components involved to deliver your application.
Your APM solution needs to collect the transactions, metrics, and data needed to resolve a performance issue, regardless of what component is the cause. And it needs to do so for all your applications. Here are a few questions to consider:
1) Does it monitor your application’s specific technologies, protocols, and languages?
2) Does your application have any characteristics you consider unique, and does the APM product support them?
3) What level of granularity can it give you for all these application metrics?
7. Root cause analysis
When there’s an actual performance problem, your organization is losing money. You’ll need to solve it quickly. The goal of an APM product is to help get to the root cause or causes of any problem—and fast. That’s where mean time to recovery (MTTR) comes in. We’re oftentimes trying to find a needle in a haystack when troubleshooting.
- How fast can you find that needle with the APM product?
- Do you have to search for it, or is the product intelligent enough to surface abnormal application or infrastructure events?
- Will it alert you of performance that has deviated from the norm? Or will it only report on issues that you have specified as a threshold?
- If you do have to search, do you type a search query and then wait far too long, looking at the hourglass, while the APM product chugs along like the little engine that could?
Check also that your APM solution follows application transactions across server tiers and you don’t have to look at each tier individually. Having an APM tool that looks for problems across an application’s architecture of servers or instances—and that does it with speed—will help reduce MTTR.
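To make "following a transaction across tiers" a little more concrete, here is a minimal sketch. It assumes each tier records a span tagged with a shared trace ID, which is a common approach but not necessarily how any particular APM product works. The Span fields and tier names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Span:
    trace_id: str       # shared by every tier that handled the same request
    tier: str           # hypothetical tier names, e.g. "web", "api", "db"
    duration_ms: float  # time spent in this tier alone


def slowest_tier_per_trace(spans):
    """Group spans by trace ID and report the tier that took the longest."""
    by_trace = defaultdict(list)
    for span in spans:
        by_trace[span.trace_id].append(span)
    result = {}
    for trace_id, trace_spans in by_trace.items():
        worst = max(trace_spans, key=lambda s: s.duration_ms)
        total = sum(s.duration_ms for s in trace_spans)
        result[trace_id] = (worst.tier, worst.duration_ms, total)
    return result


spans = [
    Span("t1", "web", 12.0), Span("t1", "api", 30.0), Span("t1", "db", 410.0),
    Span("t2", "web", 15.0), Span("t2", "api", 22.0), Span("t2", "db", 9.0),
]
for trace_id, (tier, ms, total) in slowest_tier_per_trace(spans).items():
    print(f"{trace_id}: {tier} took {ms} of {total} ms")
```

Without a shared identifier, you would have to line up timestamps from each tier by hand, which is exactly the tier-by-tier digging that drives MTTR up.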
8. Alert behavior
Modern applications are much more complex, and they’re deployed in even more complex environments. You can even have a mix of old and modern environments at your organization: physical, virtual, mainframe, cloud, and mobile. That’s a lot of data an APM tool would need to collect and process.
With all that data, unless you have a tool that alerts you intelligently about problematic transactions or metrics, your users will. And you know that, at that point, it could be too late. That user could be lost to a competitor, or your organization may be paying up for violating a certain service level agreement. So make sure the APM solution brings the most pressing performance issues to your attention via alerts.
Do you have to be the one to configure the APM solution to alert you of every problem, or can it dynamically adapt to what it sees as a typical application or transaction performance behavior?
You need an APM solution that’s capable of helping to reduce the noise and alerting you of real potential problems that you can solve before end users experience them.
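The difference between a fixed threshold and an alert that adapts to typical behavior can be sketched in a few lines. This is a simplified illustration that assumes a rolling mean plus a multiple of the standard deviation as the baseline. Real products may use quite different techniques, and the class name, window size, and sample values are all hypothetical.

```python
import statistics
from collections import deque


class AdaptiveAlert:
    """Flag values that deviate from a rolling baseline of recent samples."""

    def __init__(self, window=20, sigmas=3.0, min_samples=5, min_spread=1.0):
        self.history = deque(maxlen=window)
        self.sigmas = sigmas
        self.min_samples = min_samples
        self.min_spread = min_spread  # floor so flat data does not alert on noise

    def observe(self, value):
        """Return True if value is abnormally high versus the recent baseline."""
        alert = False
        if len(self.history) >= self.min_samples:
            mean = statistics.fmean(self.history)
            spread = max(statistics.pstdev(self.history), self.min_spread)
            alert = value > mean + self.sigmas * spread
        # Only normal samples update the baseline, so a spike does not
        # immediately become the new "normal".
        if not alert:
            self.history.append(value)
        return alert


# Hypothetical response times in ms. A static threshold of 500 ms would
# stay silent on the 450 ms sample even though normal is about 200 ms.
samples = [200, 205, 198, 202, 201, 203, 199, 450, 204]
detector = AdaptiveAlert()
flags = [detector.observe(s) for s in samples]
print(flags)
```

The point of the sketch is the question to ask a vendor: does the product learn what normal looks like for each transaction, or do you have to guess a number for every one of them yourself?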
9. User interface
It’s all well and good to have an APM tool that can do what the vendor says it can do, but whether or not you can navigate your way through the product is a completely different thing. You want to find out if the UI is intuitive and easy to use. And since each person has their own preference, see if the dashboards and other parts of the UI are customizable. Can development and operations use the same tool but have their own views set to their liking? Having all that data in your APM is great, but if your team can’t easily solve problems because the UI is bad, it’s almost useless.
10. Maintenance
If you’ve been able to identify some APM solutions that fit the above criteria so far, that’s great! But don’t jump to buy or download just yet. As the saying goes, “there’s no such thing as a free lunch.” Even with free products, you have to pay somewhere. That hidden cost is maintenance, and nothing is maintenance-free.
You can’t have an APM solution that’s so maintenance-dependent that it requires one or two people—or a whole team—to manage it. The ease of use and deployment that I mentioned earlier is key here. If you buy a product that’s hard to deploy and use, you’ll likely need more people to maintain it. If the product is complex, you’ll also likely need to train those people.
Ask your potential vendor these questions:
1) How many people will you need to manage their APM solution?
2) When new applications get deployed, does it dynamically detect and start monitoring them, or will you have to manually instrument them?
3) As new code releases occur, what changes do you need to make so the solution tracks releases?
4) Can anything that’s done manually be automated?
5) How does the provider update the product?
6) Does the vendor offer free training, or is everything paid?
7) Does the vendor provide well-documented steps on how to do functions in the product so you don’t need to call support for help?
11. Pricing
APM solutions are priced in a number of different ways. Some are per host or per month, while others charge per microservice or per app. You need to decide which is best for you based on your infrastructure and number of applications.
The most expensive option isn’t always the best. But the cheapest option may lead to disaster. Remember, a “cheap” option could be open-source, but you have to maintain a lot more yourself. Even if your organization is capable of doing that, you could be in for some late-night troubleshooting sessions.
The key thing to look for is value for your organization. You’re not only looking at the cost to purchase the product, but you’re also looking at your all-in costs. These costs include licensing, support, and people (for maintenance).
Here are some questions to consider:
1) Will you need to hire additional people or bring in consultants to run the APM tool?
2) Do you need to buy new hardware or software for the product?
3) What additional database licenses might you need?
4) Will overhead cause you to upgrade server resources?
5) How does the vendor charge for the product? Many APM products charge per server. (Raygun charges per trace.) Does that fit with how your organization typically does billing?
APM products that require a lot of money to maintain, especially in FTEs, usually end up being shelfware.
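A quick back-of-the-envelope calculation can make the all-in cost comparison less abstract. The sketch below adds up licensing, support, people, and extra infrastructure for a year. Every price, host count, trace volume, and staffing figure in it is made up for illustration; plug in the quotes and estimates you actually receive.

```python
def per_host_license(hosts, price_per_host_month):
    """Yearly license cost when the vendor charges per host per month."""
    return hosts * price_per_host_month * 12


def per_trace_license(traces_per_month, price_per_million_traces):
    """Yearly license cost when the vendor charges by trace volume."""
    return traces_per_month / 1_000_000 * price_per_million_traces * 12


def annual_cost(license_cost, support_cost, maintainer_ftes, fte_cost,
                extra_infra_cost=0):
    """All-in yearly cost: licensing, support, people, added infrastructure."""
    return (license_cost + support_cost
            + maintainer_ftes * fte_cost + extra_infra_cost)


FTE_COST = 100_000  # hypothetical fully loaded yearly cost of one person

# All figures below are invented for the example.
options = {
    "per-host product": annual_cost(per_host_license(40, 30), 0, 0.25, FTE_COST),
    "per-trace product": annual_cost(per_trace_license(50_000_000, 20), 0, 0.25, FTE_COST),
    "self-run open source": annual_cost(0, 0, 1.0, FTE_COST, extra_infra_cost=6_000),
}
for name, cost in sorted(options.items(), key=lambda kv: kv[1]):
    print(f"{name}: {cost:,.0f} per year")
```

With these invented numbers, the license line is the smallest part of the bill and the people line decides the ranking. Your own figures may look very different, which is the reason to do the arithmetic before you sign.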
12. Support
Every APM tool at some point will break or not operate effectively. You’ll need to contact the provider to get some assistance on what to do. If it breaks at 3 am, will the vendor be there to support you? Can you talk to someone, or do they do support only via email? Do you get to speak to the developers to help solve problems, or is it a technical support person?
When doing a proof of concept, pay attention to how responsive a provider is during this process. If you have a question about something, can you talk to someone, or are you sent to the documentation? If it’s the latter, it could be a sign of bad things to come. Don’t be so blinded by the product features that you neglect the importance of having decent support. But bear in mind that if the provider has an online documentation system you feel comfortable with, you may be able to get by with less support.
Here are some questions to consider:
1) Is there an online community for the APM product?
2) Is documentation easily accessible?
3) Does the vendor have a YouTube presence, showing how you use the product?
4) Is training available for free, or is it paid?
Conclusion
So there you have it—some key things to look at and questions to ask when evaluating an APM product. With the right APM tool, your infrastructure operations can run fairly smoothly. You’ll have bumps here and there, but if you’ve followed what I’ve written here, you’ll choose a product that best fits your organization and infrastructure.
The right tool helps with minimizing silos found in traditional IT environments. You’ll be collecting data from everything. With the right role-based access, you’ll be able to see how your infrastructure is performing in a safer way.
So start defining what you need to have in a tool, go through the criteria I mentioned, research APM products that fit these criteria, and reach out to some vendors to do some proofs of concept.