Slow ElasticSearch Indexing

This is a stretch, but after days of trying to figure it out, I’ve made zero progress so figured I might as well ask here.

We’re in the middle of migrating our company’s infrastructure for one of our products from Azure to AWS. Part of this has involved migrating our Kafka cluster (housing 100s of millions of events) to MSK. Now we need to index all of that data into the new ES instance using Logstash.

While MirrorMaker was able to saturate our old cluster’s gigabit connection to migrate the Kafka data over, we’re only able to index that data into ES at a rate of ~150 documents/min. We realistically need to be indexing at 50,000/min, or at the very least 10,000/min. Resource utilization on ES is sitting at ~3% CPU and RAM on both the master and data nodes. Sharding is configured identically to our old setup, and Logstash is configured exactly the same too.
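For reference, the pipeline is the standard Kafka-to-Elasticsearch shape. Here’s a minimal sketch of roughly what that config looks like — broker addresses, topic, consumer group, and index names are placeholders, not our actual values:

```
# Minimal sketch of the Logstash pipeline described above (all names are placeholders).
input {
  kafka {
    bootstrap_servers => "b-1.msk.example:9092,b-2.msk.example:9092"  # MSK brokers (placeholder)
    topics            => ["events"]                                   # topic mirrored by MirrorMaker (placeholder)
    group_id          => "logstash-es-migration"                      # consumer group (placeholder)
    consumer_threads  => 4
  }
}

output {
  elasticsearch {
    hosts => ["https://es.internal.example:9200"]  # new ES endpoint on AWS (placeholder)
    index => "events-%{+YYYY.MM.dd}"
  }
}
```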

The only difference between the two clusters is that the one on AWS is running newer versions of everything. I’d appreciate any input that anyone might have.

I’m guessing latency

Latency is approximately 0.7ms. Thanks though.

Still working on this. I disconnected ElasticSearch from Logstash and the speed stayed the same, so the issue is either Kafka (MSK) or Logstash.

Lol, it’s sorted now. It ended up being a misconfiguration in Logstash: I never set auto_offset_reset => earliest, so Logstash’s consumer group started at the latest offsets and was only pushing the new data MirrorMaker was syncing from our old Kafka cluster, not the historical backlog :man_facepalming: Posting it here in case someone somehow ends up running into the same issue. We’re now indexing approximately 2.8 million documents per minute.
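For anyone who does hit this: a Kafka consumer with no committed offsets defaults auto.offset.reset to latest, so a brand-new Logstash consumer group starts at the end of the topic and never sees the existing backlog. The fix is that one setting on the kafka input — a sketch, again with placeholder broker/topic/group names:

```
input {
  kafka {
    bootstrap_servers => "b-1.msk.example:9092,b-2.msk.example:9092"  # placeholders
    topics            => ["events"]
    group_id          => "logstash-es-migration"
    # With no committed offsets the consumer defaults to "latest" and skips
    # everything already in the topic; "earliest" replays from the beginning.
    auto_offset_reset => "earliest"
  }
}
```

Note that this only applies when the group has no committed offsets, so if Logstash has already been running under the old group name you’d also want a fresh group_id (or an offset reset) to re-consume from the start.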


Great. I want to set up an ElasticSearch instance to play with soon. Looks fun :stuck_out_tongue:


I believe @Jarland would also love some cool Elasticsearch setup … :sweat_smile:


It feels inevitable