Note: This post was originally written for the Scalyr blog. You can check out the original here.
The amount of event data that organizations collect has increased dramatically in the last few years. It continues to grow as more companies move to microservices, containers, and the modern infrastructure stack. For many, Elasticsearch has been the go-to solution for handling it.
With more data come some common scaling problems, so you may be considering Elasticsearch alternatives.
Choosing the wrong alternative can be risky. So in this post, you’re going to learn about five Elasticsearch alternatives you should consider. You’ll learn about some of their benefits and drawbacks, and also how they’re priced.
Log management solutions for event data are numerous. But which options are good for your organization? In this section, let’s discuss five options you should consider, ranging from open-source to commercially available.
1. Splunk
Splunk can be considered the 800-pound gorilla when it comes to event data logging solutions. It’s a common alternative for someone considering leaving Elasticsearch. With Splunk, you’re able to aggregate and analyze data coming from your IT systems. Splunk provides a collection agent called a universal forwarder, which collects the event data and sends it to a Splunk deployment.
Like Elasticsearch, Splunk indexes data it receives from a forwarder with what’s called an indexer. The indexer parses data for searching and analysis. Unlike Elasticsearch, Splunk doesn’t have roots in open-source. It’s a fully commercial product, but it, too, offers both self-hosted and cloud solutions.
One Splunk benefit is that it’s more tightly integrated than Elasticsearch. You don’t need to install another component like Kibana or Grafana to visualize your data in Splunk. A Splunk deployment includes what’s called a search head, which you use to search the indexers for visualization, dashboarding, and alerting.
One of the biggest Splunk drawbacks is its price. Splunk has historically charged per ingested byte, so as your event data ingestion needs grow, so does the price. For this reason, others have considered alternatives to Splunk.
A second drawback is that, like Elasticsearch, Splunk indexes ingested data. Indexing takes time and resources. To scale, you’ll need more resources or a bigger Splunk deployment. Otherwise, you should expect slower performance.
You can start with a Splunk Free plan to get going. Test drive Splunk ingesting less than 500 MB per day with this plan. The self-hosted offering, Splunk Enterprise, is available free for 60 days. The Splunk Cloud offering has a free 15-day trial.
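To judge whether a trial workload would stay under a daily cap like the 500 MB per day mentioned above, a back-of-the-envelope estimate can help. The sketch below is only an illustration. The event rate and average event size are made-up inputs, and it assumes 1 MB means 1,000,000 bytes, which may not match how a vendor actually meters ingestion.

```python
SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_MB = 1_000_000  # assumption: decimal megabytes


def daily_ingest_mb(events_per_second, avg_event_bytes):
    """Estimate daily ingest volume in MB for a steady event rate."""
    return events_per_second * avg_event_bytes * SECONDS_PER_DAY / BYTES_PER_MB


def fits_daily_cap(events_per_second, avg_event_bytes, cap_mb=500):
    """Return True if the estimated daily volume stays below the cap."""
    return daily_ingest_mb(events_per_second, avg_event_bytes) < cap_mb


if __name__ == "__main__":
    # Hypothetical workload: 20 events per second at about 250 bytes each.
    volume = daily_ingest_mb(20, 250)
    print(f"{volume:.0f} MB/day, under the cap: {fits_daily_cap(20, 250)}")
```

Real traffic is rarely steady, so treat the result as a rough lower bound and leave headroom for bursts.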
2. Presto
Presto is an SQL query engine that can run commands against various types of data sources. It is distributed, so it can query event data from systems wherever they are in your infrastructure.
Presto has an extensible architecture. A deployment contains a coordinator that users connect to in order to create and run queries. The coordinator hands work to multiple systems, called workers, that execute your queries against the data sources through connectors.
One Presto benefit is that one of the connectors is an Elasticsearch connector. This allows you to run queries against existing Elasticsearch clusters to view the data.
Another benefit is that Presto has no collection agents to install. The commands pull data. So Presto is more like blackbox monitoring, while Elasticsearch is more like whitebox monitoring. Those are two very different ways of operating that could make a difference in your infrastructure.
One drawback with Presto is that your Elasticsearch queries likely need to be recreated. If you’re starting fresh or have a small Elasticsearch deployment, it may not be an issue. But if you have many Kibana queries already, for example, do you want to recreate them? Maybe. You’d have to decide whether it’s worth it for your organization.
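To give a feel for what recreating a query involves, here is a minimal sketch that turns a very small subset of the Elasticsearch Query DSL (only `term` and `range` clauses inside `bool.filter`) into a SQL statement for Presto. Everything specific in it is an assumption for illustration: the catalog name `elasticsearch`, the schema name `default`, the `web_logs` index, and the field names are all hypothetical, and real Kibana queries are usually richer than this translator can handle.

```python
RANGE_OPS = {"gt": ">", "gte": ">=", "lt": "<", "lte": "<="}


def sql_literal(value):
    """Render a Python value as a SQL literal (strings are single-quoted)."""
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def clause_to_sql(clause):
    """Translate one term or range clause into a list of SQL predicates."""
    if "term" in clause:
        ((field, value),) = clause["term"].items()
        return [f'"{field}" = {sql_literal(value)}']
    if "range" in clause:
        ((field, bounds),) = clause["range"].items()
        return [
            f'"{field}" {RANGE_OPS[op]} {sql_literal(bound)}'
            for op, bound in bounds.items()
        ]
    raise ValueError(f"unsupported clause: {clause!r}")


def to_presto_sql(es_query, index, catalog="elasticsearch", schema="default"):
    """Build a SELECT over the table that stands in for an index.

    The catalog and schema defaults are assumptions, not fixed names.
    """
    filters = es_query["query"]["bool"]["filter"]
    predicates = [p for clause in filters for p in clause_to_sql(clause)]
    table = f'"{catalog}"."{schema}"."{index}"'
    return f"SELECT * FROM {table} WHERE " + " AND ".join(predicates)


if __name__ == "__main__":
    # Hypothetical query: server errors with slow responses.
    es_query = {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"status": 500}},
                    {"range": {"response_ms": {"gte": 250, "lt": 1000}}},
                ]
            }
        }
    }
    print(to_presto_sql(es_query, "web_logs"))
```

Even this toy version shows why the effort scales with the number of saved queries: full-text matches, aggregations, and date math each need their own translation rules.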
Presto is open-source, so the direct monetary cost is lower than Elasticsearch. However, open-source doesn’t mean free, so you’ll pay in time for maintenance, support, and of course the direct cost of purchasing infrastructure resources.
3. Sumo Logic
Sumo Logic is a cloud-based solution that can collect and analyze event data from your infrastructure. Similar to Splunk, it’s both Elasticsearch and Kibana in one. But even better than that, there’s no server-side install since it’s cloud-native.
To get going, you install the Sumo Logic collectors on your systems and start shipping your data to Sumo Logic. Using its analytics services, you’re able to query and visualize data across all of your applications and systems.
One advantage of Sumo Logic is that you only need to install one piece of software on your end systems to collect data. Sumo Logic offers installed collectors, which run on your self-hosted machines, and hosted collectors, which collect cloud-hosted data. The type of data these collectors send to Sumo Logic is configurable in the UI. No more installing and configuring Logstash or a bunch of Beats to get system and metrics data.
One big disadvantage of Sumo Logic is the cost. Because it charges on a per-ingested-byte basis, you’ll have to constantly watch how much data you’re collecting. A second drawback is that, like Elasticsearch, Sumo Logic indexes data. So ingest speed and performance will degrade as you collect more event data.
Sumo Logic has a free plan that includes 500 MB/day of ingested data. Once you go beyond that, you’re looking at almost $100/GB. Keep in mind, too, that the pricing is tiered, so the features you get depend on the tier you choose.
4. Humio
Humio is a log aggregation solution for event data used to provide visibility into your entire infrastructure. You can log any amount of data you need and have that data immediately available for searching and analysis in real time because it’s streamed in memory.
Like Elasticsearch, Humio has both self-hosted and cloud-based options. But it doesn’t need a separate installation for analytics and data visualization. You have only one installation for self-hosting. Humio collects data with what it calls data shippers. They include the Beats component of the Elastic Stack, like Filebeat and Metricbeat, and other tools, like StatsD and rsyslog.
One advantage of Humio over Elasticsearch is that it doesn’t index the data it ingests. Because it’s index-free, Humio can collect data at much faster rates. Humio also compresses data as it ingests it, so it uses far fewer resources, such as disk space.
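To make the index-free idea more concrete, here is a toy sketch of the general approach: keep raw log lines in compressed chunks and answer searches by decompressing and scanning them, so ingestion never has to build or update an index. It is not how Humio is actually implemented. The class name, chunk size, and sample log lines are invented for illustration, and it relies only on Python’s standard `zlib` module.

```python
import zlib


class IndexFreeStore:
    """Toy log store: compressed chunks, brute-force scan, no index."""

    def __init__(self, chunk_lines=1000):
        self.chunk_lines = chunk_lines
        self._buffer = []   # lines not yet compressed
        self._chunks = []   # compressed blocks of newline-joined lines
        self.raw_bytes = 0  # total size of ingested lines before compression

    def ingest(self, line):
        """Append a line; ingestion is just buffering plus compression."""
        self._buffer.append(line)
        self.raw_bytes += len(line.encode("utf-8"))
        if len(self._buffer) >= self.chunk_lines:
            self._flush()

    def _flush(self):
        """Compress any buffered lines into a new chunk."""
        if self._buffer:
            data = "\n".join(self._buffer).encode("utf-8")
            self._chunks.append(zlib.compress(data))
            self._buffer = []

    def search(self, needle):
        """Return every stored line containing the substring."""
        self._flush()
        matches = []
        for chunk in self._chunks:
            text = zlib.decompress(chunk).decode("utf-8")
            matches.extend(line for line in text.split("\n") if needle in line)
        return matches

    def stored_bytes(self):
        """Total size of the compressed chunks."""
        self._flush()
        return sum(len(chunk) for chunk in self._chunks)


if __name__ == "__main__":
    store = IndexFreeStore(chunk_lines=500)
    for i in range(2000):
        level = "ERROR" if i % 100 == 0 else "INFO"
        store.ingest(f"host-{i % 5} {level} request {i} handled")
    print(len(store.search("ERROR")), "matching lines")
    print(store.raw_bytes, "raw bytes vs", store.stored_bytes(), "stored bytes")
```

The trade-off is visible even here: writes are cheap, while every search has to touch the data, which is why real index-free systems lean on compression, parallelism, and ways of skipping irrelevant chunks.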
One Humio disadvantage is its pricing. While the pricing is highly competitive, they bill annually. That doesn’t lend itself to you taking it for a test drive. You have to pay up front for any testing while considering it as an alternative. A 30-day free trial can help, but that may not be enough to decide, depending on your Elasticsearch deployment size.
Another drawback is that the self-hosted option requires Kafka and ZooKeeper, since Humio performs message queuing by default. With Elasticsearch, a message queue is an optional addition to help it scale, but Humio relies on these tools out of the box. As a result, you need them installed.
Humio has a free SaaS tier where you get 2 GB per day, but only with seven-day retention. For both self-hosted and SaaS solutions, you pay about $80 a month for 2 GB. Humio doesn’t charge per ingested byte. Instead, it charges a monthly fee, billed annually.
5. Scalyr
Scalyr is a cloud-based log management and observability solution for event data and metrics. With no self-hosted option, there’s nothing to install on the server side. However, as with Elasticsearch, you must install an agent on any machine you want to collect event data from. But unlike Elasticsearch, you have only one agent.
Like Elasticsearch, Scalyr also includes an API for sending data to and getting data from its servers. Using Scalyr’s PowerQueries feature, you can transform your data either before sending it to Scalyr or afterward in the Scalyr UI. So it can help do some of Logstash’s job as well.
Scalyr uses a columnar database instead of indices to store the data collected. Because of that, it’s able to provide fast data ingestion and parsing. This makes your data available more quickly for things like searches and alerts.
One advantage of Scalyr over Elasticsearch is ingest speed. Since Scalyr doesn’t use an index, it can ingest and search data much faster than Elasticsearch.
Another advantage is that you can use Scalyr’s Event Data Cloud service to power your custom applications, which makes it a great Elasticsearch alternative. No need to worry about how you scale and how fast user searches are. You simply worry about sending the data from your application or service, and Scalyr takes care of the rest. With Elasticsearch, even with the Elastic Cloud service, you would still have to keep watch on the number of indices and shards and make the necessary adjustments to maintain them.
A third advantage is that you don’t have to switch to Scalyr all at once. If you want to simply switch over, you can. But if you’ve invested time and money in Elasticsearch, Scalyr offers a connector. The Scalyr Elasticsearch Connector is similar to the Presto connector but much more feature-filled. You can implement it as another part of your Elasticsearch deployment. With that, you’re able to send queries from Scalyr to the Elasticsearch Connector. It then converts your Scalyr query into an Elasticsearch query and waits for a response, just as if you were using Kibana.
One drawback for some could be that Scalyr isn’t open-source. Its agent is open-source, but the product itself is a commercial solution. Another drawback is that, like Sumo Logic, it’s cloud-only. While this is subjective, your organization may have to plan for masking PII data before sending it to Scalyr.
Scalyr doesn’t offer a free plan but offers both a demo and a 30-day free trial. Similar to Humio, you don’t pay per ingested byte. Pricing starts at about $50 a month for 1 GB of data and reduces to as low as about $6/GB as your data grows. Unlike Humio, you can go on a month-to-month plan with Scalyr to give yourself more time for a transition.
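The flat-fee figures above are easier to compare once they are reduced to a price per GB. The short sketch below does that arithmetic using only the approximate numbers quoted in this post. Note that the text doesn’t say whether the GB figures are per day or per month, so the code treats them simply as plan volume, and because the volumes at which Scalyr’s rate drops aren’t given here, the second function can only return a lower and upper bound. Function and variable names are invented for illustration, and vendor prices change, so check current pricing before relying on any of this.

```python
# Approximate entry-level plans quoted in this post:
# vendor -> (USD per month, GB of plan volume).
ENTRY_PLANS = {
    "Humio": (80, 2),
    "Scalyr": (50, 1),
}

# Approximate Scalyr per-GB rates from this post: entry rate and lowest rate.
SCALYR_HIGH_PER_GB = 50
SCALYR_LOW_PER_GB = 6


def price_per_gb(monthly_usd, plan_gb):
    """Effective monthly price per GB of plan volume."""
    return monthly_usd / plan_gb


def scalyr_monthly_bounds(plan_gb):
    """Lower and upper bound on the monthly bill for a given plan volume.

    The real figure depends on tier breakpoints this post does not list.
    """
    return plan_gb * SCALYR_LOW_PER_GB, plan_gb * SCALYR_HIGH_PER_GB


if __name__ == "__main__":
    for vendor, (usd, gb) in ENTRY_PLANS.items():
        print(f"{vendor}: about ${price_per_gb(usd, gb):.2f} per GB at entry level")
    low, high = scalyr_monthly_bounds(100)
    print(f"Scalyr at 100 GB: between ${low} and ${high} per month")
```

At entry level the two land in a similar range, so the billing terms and how the rate falls with volume are likely to matter more than the headline number.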
Considering All Alternatives
Each of the above alternatives offers different advantages over Elasticsearch. You have solutions that are cloud-only, while others offer both cloud and self-hosted options. Even the cloud-based options vary in their deployment.
The choice is up to you. But a couple of things that you should expect from an event data solution are fast data ingestion and quick search queries. Most infrastructure scalability challenges will take money and time, but they can be overcome. The bigger challenge is being able to get the data into your solution fast enough without feeling like you need to take a walk while your search query runs.
This is where an index-free solution is something to strongly consider. It has been shown to be faster than indexing. So Humio and Scalyr jump to the top of the list since they’re the only two of the five that take this approach.
However, Scalyr’s Elasticsearch Connector sets it apart. With Elasticsearch’s footprint, customers may have invested years into their deployment. Maybe you can’t or don’t want to simply switch over. With the Scalyr Elasticsearch Connector, you can have Scalyr run alongside Elasticsearch, and slowly transition over.
Having to migrate from one event data solution to another can be stressful. You have a lot of data you can lose. You’ve seen five viable options you can consider as Elasticsearch alternatives, and you saw the advantages and disadvantages of each.
The best one depends on your organization’s specific needs. Do you only want an open-source solution, or is a commercial vendor OK? Are you ready to just switch and move on to something else, or do you want to slowly transition your data over? These are questions you want to answer and plan for accordingly. With a plan, you ensure a better move to the right Elasticsearch alternative.
So, which alternative interests you most?