Proxying Kafka with Envoy without changing advertised listeners by using rewrite rules
When I wrote the Kafka broker filter back in 2020, it was just an ordinary TCP filter, which brought the typical requirement that Kafka brokers advertise themselves using the proxy address.
To make the client traffic go through the proxies, the Kafka brokers need to advertise themselves using their proxy addresses, which is done through the advertised.listeners property in each broker's configuration.
Unfortunately, this approach has some drawbacks:
- the Kafka configuration needs to be changed, and only one fleet of proxy instances can proxy the cluster (though in theory this could be avoided by leaving that to DNS),
- failure to update the brokers’ configuration can still leave us with working code if the proxied cluster is reachable, just without the benefits of proxying (the clients would do only the initial API versions + metadata discovery via the proxy, while the real communication would go directly client<->cluster).
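The second drawback is easy to miss, so here is a toy model of it. This is only a sketch of the client's behaviour: the host names, ports and the connections_after_bootstrap helper are all made up for illustration, and a real client does far more than this.

```python
# Toy model of how a Kafka client decides where to connect.
# Every host name and port below is hypothetical.

PROXY_BOOTSTRAP = ("envoy.example", 19092)

# What the brokers advertise when their configuration was NOT updated:
# their own addresses rather than the proxy's.
advertised_by_cluster = {
    1: ("kafka1.internal", 9092),
    2: ("kafka2.internal", 9092),
}


def connections_after_bootstrap(bootstrap, metadata):
    """Return the addresses a client ends up talking to.

    The first connection goes to the bootstrap address; every later
    connection uses whatever the metadata response advertised.
    """
    return [bootstrap] + [metadata[broker_id] for broker_id in sorted(metadata)]


targets = connections_after_bootstrap(PROXY_BOOTSTRAP, advertised_by_cluster)

# Everything except the bootstrap connection goes around the proxy.
bypassing_proxy = [t for t in targets if t[0] != PROXY_BOOTSTRAP[0]]
print(bypassing_proxy)
```

Nothing fails in this scenario as long as the internal addresses are reachable, which is exactly why the misconfiguration can go unnoticed.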
To avoid this kind of situation, with the newest version of the Envoy proxy we can force the filter to change the returned broker responses (docs) so that the addresses of the Kafka brokers do not leak to the clients.
To make the Envoy instance do this, we need to tell it how to rewrite the received responses: the current implementation uses the broker ID property to replace the advertised host:port pair with our value. This unfortunately means that the Envoy owner needs to be aware of the Kafka cluster's topology (how many brokers there are and what their ids are) and replicate this knowledge in Envoy’s configuration.
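The idea of an id-based rewrite can be sketched in a few lines of Python. This is not the filter's actual code (which is C++ and operates on parsed Kafka responses); the Broker type, the rule table and the addresses are assumptions made for the example, as is the choice to leave brokers without a rule untouched.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Broker:
    """Hypothetical stand-in for one broker entry in a metadata response."""
    node_id: int
    host: str
    port: int


# Hypothetical rule table: broker id -> address the proxy exposes for that broker.
REWRITE_RULES = {
    1: ("envoy.example", 19092),
    2: ("envoy.example", 19093),
}


def rewrite_brokers(brokers, rules):
    """Replace host:port for every broker that has a rule.

    Brokers without a matching rule are passed through unchanged
    (an assumption of this sketch).
    """
    rewritten = []
    for broker in brokers:
        if broker.node_id in rules:
            host, port = rules[broker.node_id]
            broker = replace(broker, host=host, port=port)
        rewritten.append(broker)
    return rewritten


response = [
    Broker(1, "kafka1.internal", 9092),
    Broker(3, "kafka3.internal", 9092),  # no rule for id 3
]
for broker in rewrite_brokers(response, REWRITE_RULES):
    print(broker)
```

Broker 3 in the example shows why the proxy owner has to track the topology: in this sketch, a broker added to the cluster without a matching rule would still be advertised under its original address.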
One of the side benefits is that Kafka can be proxied by multiple proxy fleets, each of them potentially carrying a different configuration and rewriting requests/responses in a different way. For example, typical Envoy features such as rate limiting and ACLs could be applied in these filter chains (as well as any future Kafka-specific features).
The other benefit comes from the fact that Kafka does not need to know that it is being proxied. This allows development teams to set up (and change) their proxies without the need to engage the Kafka owners.
Regarding future improvements, right now the rewrite rules are trivial, requiring proxy owners to maintain the information about the original Kafka cluster topology (as they need to provide an id -> host:port function). An additional way of rewriting, something like prefix/suffix or sed-style substitution, might be valuable and easier to maintain (in the end allowing us to rewrite e.g. kafka1.internal:1234 to kafka1.external:2345).
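To make the sed-like idea concrete, here is a minimal sketch using a regular expression. Nothing like this exists in the filter today as far as this post is concerned; the pattern, the replacement template and the rewrite_address name are purely illustrative.

```python
import re

# Hypothetical rule: keep the host's first label, swap the domain and the port.
PATTERN = re.compile(r"^(?P<name>[^.:]+)\.internal:1234$")
REPLACEMENT = r"\g<name>.external:2345"


def rewrite_address(address):
    """Rewrite a 'host:port' string if it matches the rule, else return it unchanged."""
    return PATTERN.sub(REPLACEMENT, address)


print(rewrite_address("kafka1.internal:1234"))  # kafka1.external:2345
print(rewrite_address("other.host:9092"))       # other.host:9092 (no match)
```

A single rule of this shape would cover every broker that follows the naming scheme, so the proxy configuration would no longer need one entry per broker id.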
And I got to use C++’s pointer-to-member!