In the previous post, I talked about how an application originally designed for the LAN is one reason why small and medium businesses need application testing. If you haven’t read it, go here.
In this post, I talk about reason #2.
Not only was this application designed for users local to the server, it was also deployed in a complex, multi-tiered architecture.
Users accessing the web application hit a web server that in turn talked to a database server to get and provide application data. Sounds simple enough, right?
You wish!
User traffic goes through a Multiprotocol Label Switching (MPLS) network to a data center where the web and database servers are located. The data center has private line connections into the SaaS provider’s internal network, where traffic lands on a virtual IP on a load balancer. To separate the two networks, the customer’s egress router runs network address translation (NAT), and to save money, only one IP address was being used.
The load balancer manages traffic to various pools of web servers, forwarding each user request to the most available web server and relaying the responses back.
The customer’s MPLS service provider recommended a bandwidth upgrade to fix the issue. There was already a concern that data was being dropped on the service provider’s network. So the customer wasn’t buying it…literally.
Unless you understand the application and how it works, you cannot make a blanket recommendation to upgrade the bandwidth. Bandwidth upgrades alone are often not the answer.
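To make that concrete, here is a simplified, back-of-the-envelope sketch in Python. All numbers are hypothetical assumptions, not measurements from this engagement; the point is that a chatty application designed for the LAN spends most of its time waiting on round trips, which more bandwidth does nothing to shorten.

```python
# Hypothetical numbers: a "chatty" LAN-era application making many
# application turns (round trips) per user action across a WAN link.
ROUND_TRIPS = 200          # app turns per user action (assumed)
RTT_SECONDS = 0.040        # 40 ms WAN round-trip time (assumed)
PAYLOAD_BYTES = 500_000    # total data moved per user action (assumed)

def response_time(bandwidth_bps: float) -> float:
    """Latency cost (round trips x RTT) plus transfer cost (bits / bandwidth)."""
    latency_cost = ROUND_TRIPS * RTT_SECONDS
    transfer_cost = PAYLOAD_BYTES * 8 / bandwidth_bps
    return latency_cost + transfer_cost

for mbps in (10, 100, 1000):
    print(f"{mbps:>5} Mbps -> {response_time(mbps * 1_000_000):.2f} s")
# Prints roughly 8.40 s, 8.04 s, and 8.00 s: a hundred times the bandwidth
# barely moves the needle when round trips dominate the response time.
```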
So there are a number of variables here. User traffic goes through many network components that could be the problem, including the customer’s network, the MPLS provider’s network, and multiple tiers in the application flow.
Is It Always the Network?
The SaaS provider was being blamed for issues on their internal network that were supposedly causing these errors for the customer’s users.
How typical! The network is always the first to get the blame. Poor network. Sometimes it is warranted, but many times, it’s not the network!
And it wasn’t the provider’s network. They had many other customers accessing their application on the same network without the issues this customer was encountering.
The customer was understandably getting fed up. The issue had gone on for months, and the SaaS provider had not been able to resolve it.
The provider’s problem was that not only was their network not at fault, they couldn’t identify what was. They needed some outside expertise to find the culprit, and that’s where I was able to help.
Since the issues occurred intermittently, I knew I needed to capture data continuously until users reported the problem.
Captures, Captures Everywhere!
So capture agents were installed at various tiers and locations to cover every possible culprit. I had agents everywhere:
- On user computers between the users and the data center
- Between the data center and the provider’s firewall
- Between the provider’s load balancer and a switch to the pool of web servers
- On web and database servers on the provider’s network
They were everywhere because when a customer has been trying to solve a problem that has gone unresolved for months, they need to figure it out ASAP.
The tool I used could capture at all of these points at the same time and pull back the data when needed.
With the multiple tiers involved, that coverage was what it took to capture the data and find the culprit as quickly as possible.
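The post doesn’t name the capture tool, but as a rough sketch of the continuous-capture idea, here is how rotating ring-buffer captures might be scripted with Python wrapping tcpdump. The interface names, file sizes, and paths are illustrative assumptions, not the actual deployment.

```python
import subprocess

# Illustrative only: keep a rotating ring of capture files at each point so
# the most recent traffic is always on disk when the intermittent problem
# finally shows up, without filling the drive.
CAPTURE_POINTS = {
    "user-segment": "eth0",   # assumed interface facing the users / MPLS link
    "dc-edge": "eth1",        # assumed interface between data center and provider firewall
}

def start_ring_capture(name: str, interface: str) -> subprocess.Popen:
    """Start tcpdump writing ~100 MB files, keeping at most 20 in the ring."""
    cmd = [
        "tcpdump",
        "-i", interface,
        "-s", "0",                           # capture full packets
        "-C", "100",                         # rotate after ~100 million bytes
        "-W", "20",                          # keep 20 files, then overwrite the oldest
        "-w", f"/var/captures/{name}.pcap",  # assumed output path
    ]
    return subprocess.Popen(cmd)

if __name__ == "__main__":
    procs = [start_ring_capture(n, i) for n, i in CAPTURE_POINTS.items()]
    # Leave the captures running; pull the files for the relevant time window
    # once users report the problem.
    for p in procs:
        p.wait()
```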
In part 3, I talk about the intermittent nature of these application issues.