Kafka as a remote array — binary search example

Adam Kotwasinski
2 min read · Jul 26, 2022

Developers who come to me asking to onboard a new Kafka use case often believe that Kafka will immediately solve all their problems. While it is indeed an excellent project, it can also be reduced to a simple abstraction: a “remote record array” with several enrichments (high performance, consumer groups, compaction, multiple integrations, etc.).

At the bottom, however, a Kafka partition can be treated as exactly what it is: a remote byte array that allows append-only writes for producers and random access by offset for consumers.

Figure: Kafka partitions as simple arrays of records

On a lighter note: if Kafka can be treated as a byte array, would it be possible to use it in typical computer science scenarios such as binary search? It turns out that it is very much possible, if we create a partition with sorted data and then use a consumer to access array[i], where i is the record offset.

In this simple example, we implement our own binary search and create an arrayAccess method that provides our Kafka-backed implementation of array[i].
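A minimal sketch of such a hand-rolled search, assuming the partition holds integers sorted in ascending order at contiguous offsets starting from 0. To keep it runnable without a broker, arrayAccess is modeled here as a LongFunction over an in-memory stand-in; with the real Java client it would presumably assign the partition, call KafkaConsumer.seek for the wanted offset, and return the first record of the next poll. The class name and the stand-in data are illustrative and not taken from the original code.

```java
import java.util.function.LongFunction;

// Sketch only: the partition is modeled as a function from offset to record value.
// With the real Kafka client, arrayAccess would seek to the offset and poll one record.
final class PartitionBinarySearch {

    /**
     * Returns the offset that holds target, or -1 if it is absent.
     * Assumes values are sorted ascending over offsets [0, size).
     */
    static long search(LongFunction<Integer> arrayAccess, long size, int target) {
        long low = 0;
        long high = size - 1;
        while (low <= high) {
            long mid = low + (high - low) / 2;  // avoids overflow of low + high
            int value = arrayAccess.apply(mid); // one remote read per iteration
            if (value < target) {
                low = mid + 1;
            } else if (value > target) {
                high = mid - 1;
            } else {
                return mid;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // In-memory stand-in: the record at offset i holds the value 2 * i.
        LongFunction<Integer> arrayAccess = offset -> (int) (2 * offset);
        System.out.println(search(arrayAccess, 1000, 84)); // prints 42
        System.out.println(search(arrayAccess, 1000, 85)); // prints -1
    }
}
```

Each loop iteration costs exactly one arrayAccess call, so the number of remote reads grows logarithmically with the partition size.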

However, in that case we had to implement our own binary search, when we could have used Collections.binarySearch instead. To do this, we need to create a “fun” class, KafkaBackedArray, that implements List’s get method with a consumer, with the rest of the API given to us almost for free.
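One way such a class could look, as a hedged sketch rather than the original implementation: extending AbstractList means only get and size have to be written. The fetchAtOffset function is a hypothetical stand-in for the consumer's seek and poll, and the size is passed in directly, whereas a real version might derive it from the consumer's beginningOffsets and endOffsets. The sketch also assumes contiguous offsets starting at 0 and a partition small enough to be indexed by an int, since List indices are ints while Kafka offsets are longs.

```java
import java.util.AbstractList;
import java.util.Collections;
import java.util.RandomAccess;
import java.util.function.IntFunction;

// Sketch: a read-only List view where get(i) is delegated to a fetch function.
// fetchAtOffset is a hypothetical stand-in for a consumer seek followed by a poll.
final class KafkaBackedArraySketch<T> extends AbstractList<T> implements RandomAccess {
    private final IntFunction<T> fetchAtOffset;
    private final int size;

    KafkaBackedArraySketch(IntFunction<T> fetchAtOffset, int size) {
        this.fetchAtOffset = fetchAtOffset;
        this.size = size;
    }

    @Override
    public T get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("offset " + index + " outside [0, " + size + ")");
        }
        return fetchAtOffset.apply(index); // one remote read per call
    }

    @Override
    public int size() {
        return size;
    }

    public static void main(String[] args) {
        // In-memory stand-in: the record at offset i holds the value 2 * i.
        KafkaBackedArraySketch<Integer> array = new KafkaBackedArraySketch<>(i -> 2 * i, 1000);
        System.out.println(Collections.binarySearch(array, 84)); // prints 42
        System.out.println(Collections.binarySearch(array, 85)); // prints -44
    }
}
```

Implementing the RandomAccess marker interface matters here: for large lists without it, Collections.binarySearch falls back to an iterator-based search that walks the list, which would turn into a long series of sequential reads. A missing key yields -(insertion point) - 1, hence -44 for the value 85.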

In the end it works: as expected, the binary search performs its element accesses, which translate into Kafka poll calls.
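To make the access pattern visible, the sketch below counts how many times get is invoked during one search. It uses an in-memory list in place of a partition, so each counted read only stands in for one poll; the class names are illustrative. For a list of n elements, an index-based binary search needs at most floor(log2 n) + 1 reads, which is 11 for the 1024 elements used here.

```java
import java.util.AbstractList;
import java.util.Collections;
import java.util.RandomAccess;

final class PollCountingSketch {

    /** Sorted in-memory list that counts get() calls; each call stands in for one poll. */
    static final class CountingList extends AbstractList<Integer> implements RandomAccess {
        private final int size;
        int reads = 0;

        CountingList(int size) {
            this.size = size;
        }

        @Override
        public Integer get(int index) {
            reads++;
            return 2 * index; // the record at offset i holds the value 2 * i
        }

        @Override
        public int size() {
            return size;
        }
    }

    public static void main(String[] args) {
        CountingList list = new CountingList(1024);
        int position = Collections.binarySearch(list, 84);
        System.out.println("found at " + position + " after " + list.reads + " reads");
    }
}
```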

Careful readers will notice that I have configured the consumer with max.poll.records=1, as I did not want to poll data just to have it thrown away. Our KafkaBackedArray implementation could be made a little smarter and use an internal cache of already-received records, saving on latency and network I/O.
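A rough sketch of what such a cache might look like, under the same assumptions as before (contiguous offsets from 0, int-sized partition). The remoteRead function and the batchSize parameter are hypothetical: batchSize plays the role of a max.poll.records value larger than 1, and one simulated poll returns the requested record plus the ones that follow it, all of which are kept.

```java
import java.util.AbstractList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.RandomAccess;
import java.util.function.IntFunction;

// Sketch: a List view that keeps every record it has already received.
final class CachingArraySketch<T> extends AbstractList<T> implements RandomAccess {
    private final IntFunction<T> remoteRead; // hypothetical stand-in for reading one record
    private final int size;
    private final int batchSize;             // plays the role of max.poll.records
    private final Map<Integer, T> cache = new HashMap<>();
    private int remoteCalls = 0;

    CachingArraySketch(IntFunction<T> remoteRead, int size, int batchSize) {
        this.remoteRead = remoteRead;
        this.size = size;
        this.batchSize = batchSize;
    }

    @Override
    public T get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("offset " + index + " outside [0, " + size + ")");
        }
        if (!cache.containsKey(index)) {
            remoteCalls++; // one simulated poll, starting at the requested offset
            int end = Math.min(size, index + batchSize);
            for (int i = index; i < end; i++) {
                cache.put(i, remoteRead.apply(i));
            }
        }
        return cache.get(index);
    }

    @Override
    public int size() {
        return size;
    }

    int remoteCalls() {
        return remoteCalls;
    }

    public static void main(String[] args) {
        CachingArraySketch<Integer> array = new CachingArraySketch<>(i -> 2 * i, 1000, 16);
        Collections.binarySearch(array, 84);
        int afterFirst = array.remoteCalls();
        Collections.binarySearch(array, 84); // same probes, now served from the cache
        System.out.println(afterFirst + " remote calls, then " + (array.remoteCalls() - afterFirst));
    }
}
```

The cache in this sketch grows without bound, which is fine for a toy but would need an eviction policy for anything larger. The benefit shows up mainly on repeated searches and on the last few probes of a search, where the candidates lie close together.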