Wed Oct 05 2022
Protobuf vs JSON for Your Event-Driven Architecture
by Mark Gregan

Choosing a serialization method for your data can be a confusing task, considering the multitude of options out there: JSON, MessagePack, Avro, Thrift, Protobuf, FlatBuffers, etc. Obviously, if you're using gRPC or Thrift RPC to communicate between your services, the decision has already been made for you. But for event-driven architectures, you'll more than likely want to prioritize performance over other considerations.
Today we'll evaluate two of the most common serialization methods for event-driven systems: Protocol Buffers and JSON. Let's start with a quick overview of each:
Protobuf vs JSON: A Quick Overview
JSON
JSON is probably the most obvious candidate for most teams:
- Supported in the standard library of almost any language.
- Human-readable.
- No strict schema to deal with.
Due to its ubiquity, dealing with JSON is going to be the easiest option. Even better, you don't need any tooling to see what's in the messages. Just pop open the Kafdrop web UI and view your messages in plain text, easy-peasy! Depending on your needs and/or point of view, the lack of a strictly defined schema can be a positive or a negative. We'll discuss that further later on.
Protobuf
Protobuf is the less obvious choice of the two. But it definitely has its strengths:
- Schemas for free
- Support for binary fields without the need to base64 back and forth
- Built-in API support with gRPC
- And most importantly: Speed!
There is a bit of initial setup work when dealing with protobuf:
- First, you have to define your events and their schemas ahead of time.
- Then you have to use protoc, which isn't the world's most intuitive tool, to generate the code needed to create/manipulate protobuf events.
- And finally, you have to include that generated code in all of your projects that are handling your events.
That's a few more steps than just calling json.Marshal() and json.Unmarshal()!
Protobuf vs JSON: Schemas
Do I even need a Schema?
Plain JSON is definitely going to be easier and quicker to iterate with. You won't have to update any definitions, regenerate any code, or re-pull any changes into your code bases. Need a new field? Just add it in service A, and update service B to read the new field.
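This loose coupling works because most JSON decoders are lenient: unknown fields are silently ignored, and missing fields just get zero values. Here's a minimal Go sketch of that behavior (the Event struct and payload are hypothetical):

package main

import (
	"encoding/json"
	"fmt"
)

// Service B's view of the event; it hasn't been told about "num" yet.
type Event struct {
	Id string `json:"id"`
}

func main() {
	// Service A has started emitting an extra "num" field.
	payload := []byte(`{"id":"abc-123","num":42}`)

	var evt Event
	// encoding/json silently drops the unknown "num" field; nothing breaks... yet.
	if err := json.Unmarshal(payload, &evt); err != nil {
		panic(err)
	}

	fmt.Println(evt.Id) // abc-123
}

That leniency is what makes ad-hoc field changes so frictionless: no coordination, no codegen, no lockstep redeploys.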
We can see that this is probably not going to scale well, however. Having multiple engineers on multiple teams adding, removing, or editing fields at will is going to result in things breaking at some point. It would be much better to have a single source of truth, a clearly defined schema.
Schema Options for JSON
The two big players in codifying the schema of a JSON message are Avro and JSON Schema. The specifics/benefits of these two are outside the scope of this discussion, but we're including them to let you know that there are options out there.
Schema Options for Protobuf
Here's a key benefit of using protobuf: the definitions already define the schemas. No need to layer any other solution on top!
There are some caveats to be aware of, though. Changing the type on a field will break backward compatibility, preventing you from replaying or reading messages generated before the change. It's best to treat protobuf definitions as additive only. If your field's data type needs to change, add a new field with the new data type, and "remove" the old field by marking it as deprecated (by appending [deprecated = true] to the end of the field definition) or by reserving the field number.
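As a hypothetical sketch (MyEvent and its fields are made up for illustration), evolving a count field from int32 to int64 might look like this:

message MyEvent {
  string id = 1;

  // The old field sticks around so existing messages stay readable,
  // but is flagged so consumers know to migrate off of it:
  int32 count = 2 [deprecated = true];

  // The replacement field with the new type gets a brand-new field number:
  int64 count_64 = 3;

  // Once the old field is eventually deleted, reserve its number so it
  // can never be accidentally reused:
  // reserved 2;
}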
Protobuf vs JSON: Performance
If you're going the event-driven architecture route, you're probably interested in scale and want the best performance you can get out of a serialization method.
Let's write some benchmarks to get an idea of the speed of both. At Streamdal, we're a Golang shop, so we'll use that for our benchmark. If you're using a different language, its implementations of JSON and protobuf and the resulting performance will be different.
First we'll define both the JSON and protobuf events. Our JSON event will look like this:
type MyJSONMessage struct {
	Id      string `json:"id"`
	Message string `json:"message"`
	Num     int    `json:"num"`
}
And our protobuf event will look like this:
events/mypbmessage.proto:
syntax = "proto3";
package events;
option go_package = "github.com/streamdal/pbvsjson/events";

message MyPBMessage {
  string id = 1;
  string message = 2;
  int32 num = 3;
}
We'll compile that proto definition into Go code:
$ protoc -I ./events --go_out=paths=source_relative:events events/*.proto
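Note that protoc delegates to a language-specific plugin to emit Go code, so protoc-gen-go needs to be on your PATH. If it isn't, one way to install it (assuming you have a Go toolchain set up) is:
$ go install google.golang.org/protobuf/cmd/protoc-gen-go@latest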
Now let's benchmark serializing and deserializing our JSON and protobuf events:
main_test.go:
package main

import (
	"encoding/json"
	"math"
	"testing"

	"github.com/golang/protobuf/proto"

	"github.com/streamdal/pbvsjson/events"
)

type MyJSONMessage struct {
	Id      string `json:"id"`
	Message string `json:"message"`
	Num     int    `json:"num"`
}

var msg *MyJSONMessage
var pbMsg *events.MyPBMessage

func init() {
	msg = &MyJSONMessage{
		Id:      "97ec560f-2b14-4930-9a27-f0427f08951c",
		Message: "Let's benchmark!",
		Num:     math.MaxInt32,
	}

	pbMsg = &events.MyPBMessage{
		Id:      "97ec560f-2b14-4930-9a27-f0427f08951c",
		Message: "Let's benchmark!",
		Num:     math.MaxInt32,
	}
}

func BenchmarkJSON(b *testing.B) {
	for i := 0; i < b.N; i++ {
		// First marshal to JSON
		data, err := json.Marshal(msg)
		if err != nil {
			b.Error(err)
		}

		// Now unmarshal back into usable struct
		decoded := &MyJSONMessage{}
		if err := json.Unmarshal(data, decoded); err != nil {
			b.Error(err)
		}
	}
}

func BenchmarkProtobuf(b *testing.B) {
	for i := 0; i < b.N; i++ {
		// First marshal to wire format
		data, err := proto.Marshal(pbMsg)
		if err != nil {
			b.Error(err)
		}

		// Now unmarshal back into usable struct
		tmpMsg := &events.MyPBMessage{}
		if err := proto.Unmarshal(data, tmpMsg); err != nil {
			b.Error(err)
		}
	}
}
And let's see what those benchmarks look like:
$ go test -bench=.
goos: darwin
goarch: amd64
pkg: github.com/streamdal/pbvsjson
cpu: Intel(R) Core(TM) i5-1038NG7 CPU @ 2.00GHz
BenchmarkJSON-8        612400      1731 ns/op
BenchmarkProtobuf-8   3088689     385.9 ns/op
Protobuf is the clear winner here.
Let's dig a little deeper and see why with some more benchmarks:
main_test.go:
func BenchmarkJSON_marshal(b *testing.B) {
	for i := 0; i < b.N; i++ {
		// First marshal to JSON
		_, err := json.Marshal(msg)
		if err != nil {
			b.Error(err)
		}
	}
}

func BenchmarkProtobuf_marshal(b *testing.B) {
	for i := 0; i < b.N; i++ {
		// First marshal to wire format
		_, err := proto.Marshal(pbMsg)
		if err != nil {
			b.Error(err)
		}
	}
}

func BenchmarkJSON_unmarshal(b *testing.B) {
	data, err := json.Marshal(msg)
	if err != nil {
		b.Error(err)
	}

	for i := 0; i < b.N; i++ {
		tmpMsg := &MyJSONMessage{}
		if err := json.Unmarshal(data, tmpMsg); err != nil {
			b.Error(err)
		}
	}
}

func BenchmarkProtobuf_unmarshal(b *testing.B) {
	data, err := proto.Marshal(pbMsg)
	if err != nil {
		b.Error(err)
	}

	for i := 0; i < b.N; i++ {
		tmpMsg := &events.MyPBMessage{}
		if err := proto.Unmarshal(data, tmpMsg); err != nil {
			b.Error(err)
		}
	}
}

These additional benchmarks give us:
$ go test -bench=.
goos: darwin
goarch: amd64
pkg: github.com/streamdal/pbvsjson
cpu: Intel(R) Core(TM) i5-1038NG7 CPU @ 2.00GHz
BenchmarkJSON-8                 612400      1731 ns/op
BenchmarkProtobuf-8            3088689     385.9 ns/op
BenchmarkJSON_marshal-8        3975855     302.7 ns/op
BenchmarkProtobuf_marshal-8    8163854     145.1 ns/op
BenchmarkJSON_unmarshal-8       886795      1290 ns/op
BenchmarkProtobuf_unmarshal-8  5829130     217.1 ns/op
A 2x advantage when serializing protobuf vs JSON. Not bad! But the real win here is when deserializing: the overhead of parsing the text-based JSON format is too significant to ignore. At a 6x speed advantage, protobuf is definitely the way to go for the performance-minded.
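If you're curious where JSON's unmarshal time goes, re-running the suite with Go's -benchmem flag will also report allocations per operation; reflection-based JSON decoding typically allocates quite a bit more than protobuf's generated code:
$ go test -bench=. -benchmem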
Protobuf vs JSON: Message Size
The JSON message clocks in at 91 bytes, and the protobuf message at 62 bytes. Our example events are trivial in structure for the sake of grok-ability, but your events will grow in complexity, along with the number of them transmitted through your infrastructure. Size should definitely be a consideration.
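If you want to verify the sizes yourself, a quick test dropped into the same main_test.go will print both. It reuses the msg and pbMsg values from the benchmark setup (TestSizes is just an illustrative name):

func TestSizes(t *testing.T) {
	jsonData, err := json.Marshal(msg)
	if err != nil {
		t.Fatal(err)
	}

	pbData, err := proto.Marshal(pbMsg)
	if err != nil {
		t.Fatal(err)
	}

	t.Logf("JSON: %d bytes, protobuf: %d bytes", len(jsonData), len(pbData))
}

Run it with go test -run TestSizes -v to see the output.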
Protobuf vs JSON: Observability & Readability
As a text-based format, JSON wins the readability challenge. That's not to say protobuf can't easily be made readable, though! In fact, that's part of what we do here at Streamdal: we make your protobuf events readable and searchable.
Simply upload your protobuf definitions to your Streamdal account, and use our open-source application Plumber to pipe your events in. We'll handle all of the messy decoding, indexing, and storage for you. Give it a shot and let us know how we can do more to help you observe your events!
Don't need search/replay/storage? Plumber can encode/decode protobuf live, using your .proto definitions. Plumber is also 100% free and open-source 🙂
Conclusions
In the world of scale, protobuf is definitely the way to go.
Schemas are a great benefit to microservice architecture, giving you a single source of truth for how your events should be structured. It may seem like a bunch of busywork in the beginning, but the benefits pay off at scale.
Observability takes a hit when it comes to protobuf, but that's something we're trying to fix here at Streamdal!
For hobby projects, JSON is a solid choice for readability and development velocity. But in the world of scale, go binary, go protobuf.