Software development and beyond

Elasticsearch: What to keep in mind when doing integration testing

When writing integration tests that exchange data between an application and an Elasticsearch cluster, we need to keep a few things in mind.

Tests are simply a series of steps, and we usually want to execute a given test synchronously, one step after another. To do so, we have to wait until each call to Elasticsearch finishes before making the next one. This matters most when we index or remove documents before searching.

In the Java API, all Elasticsearch operations are executed through a Client object. All operations are asynchronous by nature (they either accept a listener or return a future). We can make the calls blocking, so that they behave like synchronous calls, by waiting for their response. To do so, call execute() on the request builder obtained from the client, followed by actionGet() to get the response object. The following example indexes a document of the type docType into the index indexName.

IndexResponse response = client
    .prepareIndex(indexName, docType, docId)
    .setSource(document)
    .execute()
    .actionGet();

You can also call just get(), which is equivalent to .execute().actionGet().
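To illustrate why blocking on the response keeps test steps in order, here is a minimal sketch that uses only the JDK. It is not the Elasticsearch API: CompletableFuture stands in for the future returned by an Elasticsearch call, and the names indexAsync, indexBlocking and store are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class BlockingCallSketch {

    // Hypothetical stand-in for an asynchronous Elasticsearch call:
    // the work happens on another thread and the caller gets a future back.
    static CompletableFuture<String> indexAsync(String docId, List<String> store) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                TimeUnit.MILLISECONDS.sleep(50); // simulated network latency
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            store.add(docId);
            return docId;
        });
    }

    // Waiting for the future makes the call behave synchronously,
    // similar in spirit to calling actionGet() or get() on a request.
    static String indexBlocking(String docId, List<String> store) {
        return indexAsync(docId, store).join();
    }

    public static void main(String[] args) {
        List<String> store = Collections.synchronizedList(new ArrayList<>());
        indexBlocking("1", store); // returns only after "1" has been stored
        indexBlocking("2", store); // starts only after the first call finished
        System.out.println(store); // prints [1, 2]
    }
}
```

Because each call returns only after its future completes, the second step can rely on the result of the first one, which is the property a step-by-step test needs.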

The second problem is that Elasticsearch search is only near real-time. A process called refresh writes and opens a new segment, which makes new documents available for search. By default, the refresh interval is one second, so this is how long we may have to wait after indexing new documents before they become searchable.

For testing, we can use the Refresh API, which can perform a refresh on all indices at once or on a particular index. Note that executing a refresh has a performance penalty, but it is okay to call it more frequently for testing purposes.

So, after new documents are indexed and before they are searched, we can invoke a refresh on our index in the test. Here is an example that refreshes the index indexName using the Java API:

client.admin().indices().prepareRefresh(indexName).get();

The Refresh API can also be used over HTTP, like any other Elasticsearch API:

curl -XPOST localhost:9200/indexName/_refresh

If you are retrieving documents by ID, there is no need to call refresh. You will be able to retrieve documents right away after they are indexed.
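The difference between searching and retrieving by ID can be pictured with a small toy model. This is only a sketch of the visibility rules described above, not how Elasticsearch is implemented; the class ToyIndex and its methods are hypothetical and use nothing but the JDK.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RefreshSketch {

    // Toy model: documents are retrievable by ID right away,
    // but show up in search results only after refresh() is called.
    static class ToyIndex {
        private final Map<String, String> allDocs = new HashMap<>();
        private final Map<String, String> searchableDocs = new HashMap<>();

        void index(String id, String source) {
            allDocs.put(id, source);
        }

        // Get-by-ID sees every indexed document immediately.
        String get(String id) {
            return allDocs.get(id);
        }

        // Refresh publishes the current state to the search view.
        void refresh() {
            searchableDocs.clear();
            searchableDocs.putAll(allDocs);
        }

        // Search only sees what the last refresh published.
        List<String> search(String term) {
            List<String> hits = new ArrayList<>();
            for (Map.Entry<String, String> entry : searchableDocs.entrySet()) {
                if (entry.getValue().contains(term)) {
                    hits.add(entry.getKey());
                }
            }
            return hits;
        }
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.index("1", "hello world");

        System.out.println(index.get("1"));        // prints hello world
        System.out.println(index.search("hello")); // prints []
        index.refresh();
        System.out.println(index.search("hello")); // prints [1]
    }
}
```

In a real test, the refresh() step corresponds to the Refresh API call shown earlier, placed between indexing and searching.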

So whenever you write integration tests using Elasticsearch, check that all calls that need to run one after another behave synchronously, and that shards are refreshed before you search them.

Last updated on 4.5.2017.
