
Wed Oct 05 2022

Protobuf vs JSON for Your Event-Driven Architecture

by Mark Gregan

Choosing a serialization method for your data can be a confusing task considering the multitude of options out there: JSON, MessagePack, Avro, Thrift, Protobuf, FlatBuffers, and so on. If you're using gRPC or Thrift RPC to communicate between your services, the decision has already been made for you. But for an event-driven architecture, you'll more than likely want to prioritize performance over other considerations.

Today we'll evaluate two of the most common serialization methods for event-driven systems: Protocol Buffers (protobuf) and JSON. Let's start with a quick overview of each:

Protobuf vs. JSON: A Quick Overview

JSON

JSON is probably the most obvious candidate for most teams:

  • Supported in the standard library of almost any language.
  • Human-readable.
  • No strict schema to deal with.

Due to its ubiquity, dealing with JSON is going to be the easiest option. Even better, you don't need any tooling to see what's in the messages. Just pop open the Kafdrop web UI and view your messages in plain text, easy-peasy! Depending on your needs and/or point of view, the lack of a strictly defined schema can be a positive or a negative. We'll discuss that further later on.
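To make the readability point concrete: the little event we'll benchmark later in this post serializes to JSON you can read at a glance:

{"id":"97ec560f-2b14-4930-9a27-f0427f08951c","message":"Let's benchmark!","num":2147483647}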

Protobuf

Protobuf is the less obvious choice of the two. But it definitely has its strengths:

  • Schemas for free
  • Support for binary fields without the need to base64-encode back and forth (see the sketch after this list)
  • Built-in API support with gRPC
  • And most importantly: Speed!
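That binary-field point deserves a quick illustration. In JSON, raw bytes have to be base64-encoded into a string and decoded on the way out; in protobuf, a bytes field carries them directly. A minimal sketch (the FileEvent message is hypothetical):

message FileEvent {
  string name    = 1;
  bytes  payload = 2; // raw bytes on the wire, no base64 round trip
}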

There is a bit of initial setup work when dealing with protobuf:

  1. First, you have to define your events and their schemas ahead of time.
  2. Then you have to use protoc, which isn't the world's most intuitive tool, to generate the code needed to create/manipulate protobuf events.
  3. And finally, you have to include that generated code in all of your projects that are handling your events.

That's a few more steps than just calling json.Marshal() and json.Unmarshal()!
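To make those steps concrete, here's a minimal sketch of what working with the generated code looks like, assuming the events package and MyPBMessage message defined later in this post:

package main

import (
	"fmt"

	"github.com/golang/protobuf/proto"
	"github.com/streamdal/pbvsjson/events"
)

func main() {
	msg := &events.MyPBMessage{
		Id:      "97ec560f-2b14-4930-9a27-f0427f08951c",
		Message: "hello",
		Num:     42,
	}

	// Marshal to the compact binary wire format...
	data, err := proto.Marshal(msg)
	if err != nil {
		panic(err)
	}

	// ...and unmarshal back into a typed struct.
	decoded := &events.MyPBMessage{}
	if err := proto.Unmarshal(data, decoded); err != nil {
		panic(err)
	}

	fmt.Println(decoded.GetMessage())
}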

Protobuf vs. JSON: Schemas

Do I Even Need a Schema?

Plain JSON is definitely going to be easier and quicker to iterate with. You won't have to update any definitions, regenerate any code, or re-pull any changes into your code bases. Need a new field? Just add it in service A, and update service B to read the new field.
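For instance, Go's encoding/json simply ignores fields a consumer doesn't know about, so a change like the following needs no coordination at all. This sketch reuses the MyJSONMessage struct shown later in this post; the email field is hypothetical:

// Service A starts emitting an extra "email" field...
data := []byte(`{"id":"1","message":"hi","num":2,"email":"a@example.com"}`)

// ...and an older consumer that doesn't know about the field still
// decodes fine, silently dropping it.
var m MyJSONMessage
if err := json.Unmarshal(data, &m); err != nil {
	panic(err)
}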

It's easy to see that this probably won't scale well, however. Having multiple engineers on multiple teams adding, removing, or editing fields at will is going to result in things breaking at some point. It would be much better to have a single source of truth: a clearly defined schema.

Schema Options for JSON

The two big players in codifying the schema of a JSON message are Avro and JSON Schema. The specifics and benefits of these two are outside the scope of this discussion, but we're including them to let you know that there are options out there.
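As a taste, a minimal JSON Schema for the message we'll benchmark later in this post might look like this:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "MyJSONMessage",
  "type": "object",
  "properties": {
    "id":      { "type": "string" },
    "message": { "type": "string" },
    "num":     { "type": "integer" }
  },
  "required": ["id", "message", "num"]
}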

Schema Options for Protobuf

Here's a key benefit of using protobuf: the definitions are the schema. No need to layer any other solution on top!

There are some caveats to be aware of, though. Changing the type of a field will break backward compatibility, preventing you from replaying or reading messages generated before the change. It's best to treat protobuf definitions as additive only: if a field's data type needs to change, add a new field with the new data type, and "remove" the old field by marking it as deprecated (by appending [deprecated = true] to the end of the field definition) or by reserving the field number.
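For example, if num (from the message defined later in this post) needed to become a 64-bit integer, the evolution might look like this; the num_v2 name is just for illustration:

message MyPBMessage {
  string id = 1;
  string message = 2;
  int32 num = 3 [deprecated = true]; // kept so old messages still decode
  int64 num_v2 = 4;                  // new field with the new type
}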

Protobuf vs. JSON: Performance

If you're going the event-driven architecture route, you're probably interested in scale and want the best performance you can get out of a serialization method.

Let's write some benchmarks to get an idea of the speed of both. At Streamdal, we're a Golang shop, so we'll use Go for our benchmarks. If you're using a different language, its JSON and protobuf implementations, and the resulting performance, will differ.

First we'll define both the JSON and protobuf events. Our JSON event will look like this:

type MyJSONMessage struct {
	Id      string `json:"id"`
	Message string `json:"message"`
	Num     int    `json:"num"`
}

And our protobuf event will look like this:

events/mypbmessage.proto:

syntax = "proto3";

package events;

option go_package = "github.com/streamdal/pbvsjson/events";

message MyPBMessage {
  string id = 1;
  string message = 2;
  int32 num = 3;
}

We'll compile that proto definition into Go code:

$ protoc -I ./events --go_out=paths=source_relative:events events/*.proto
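Note that protoc relies on a separate Go plugin to generate that code. If it complains that protoc-gen-go is missing, one way to install it:

$ go install google.golang.org/protobuf/cmd/protoc-gen-go@latest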

Now let's benchmark serializing and deserializing our JSON and protobuf events:

main_test.go:

package main

import (
	"encoding/json"
	"math"
	"testing"

	"github.com/golang/protobuf/proto"
	"github.com/streamdal/pbvsjson/events"
)

type MyJSONMessage struct {
	Id      string `json:"id"`
	Message string `json:"message"`
	Num     int    `json:"num"`
}

var msg *MyJSONMessage
var pbMsg *events.MyPBMessage

func init() {
	msg = &MyJSONMessage{
		Id:      "97ec560f-2b14-4930-9a27-f0427f08951c",
		Message: "Let's benchmark!",
		Num:     math.MaxInt32,
	}

	pbMsg = &events.MyPBMessage{
		Id:      "97ec560f-2b14-4930-9a27-f0427f08951c",
		Message: "Let's benchmark!",
		Num:     math.MaxInt32,
	}
}

func BenchmarkJSON(b *testing.B) {
	for i := 0; i < b.N; i++ {
		// First marshal to JSON
		data, err := json.Marshal(msg)
		if err != nil {
			b.Error(err)
		}

		// Now unmarshal back into a usable struct
		decoded := &MyJSONMessage{}
		if err := json.Unmarshal(data, decoded); err != nil {
			b.Error(err)
		}
	}
}

func BenchmarkProtobuf(b *testing.B) {
	for i := 0; i < b.N; i++ {
		// First marshal to wire format
		data, err := proto.Marshal(pbMsg)
		if err != nil {
			b.Error(err)
		}

		// Now unmarshal back into a usable struct
		tmpMsg := &events.MyPBMessage{}
		if err := proto.Unmarshal(data, tmpMsg); err != nil {
			b.Error(err)
		}
	}
}

And let's see what those benchmarks look like:

$ go test -bench=.
goos: darwin
goarch: amd64
pkg: github.com/streamdal/pbvsjson
cpu: Intel(R) Core(TM) i5-1038NG7 CPU @ 2.00GHz
BenchmarkJSON-8                 	  612400	      1731 ns/op
BenchmarkProtobuf-8             	 3088689	       385.9 ns/op

Protobuf is the clear winner here.

Let's dig a little deeper and see why with some more benchmarks:

main_test.go:

func BenchmarkJSON_marshal(b *testing.B) {
	for i := 0; i < b.N; i++ {
		// Marshal to JSON only
		_, err := json.Marshal(msg)
		if err != nil {
			b.Error(err)
		}
	}
}

func BenchmarkProtobuf_marshal(b *testing.B) {
	for i := 0; i < b.N; i++ {
		// Marshal to wire format only
		_, err := proto.Marshal(pbMsg)
		if err != nil {
			b.Error(err)
		}
	}
}

func BenchmarkJSON_unmarshal(b *testing.B) {
	// Marshal once up front so the loop measures only unmarshaling
	data, err := json.Marshal(msg)
	if err != nil {
		b.Error(err)
	}

	for i := 0; i < b.N; i++ {
		tmpMsg := &MyJSONMessage{}
		if err := json.Unmarshal(data, tmpMsg); err != nil {
			b.Error(err)
		}
	}
}

func BenchmarkProtobuf_unmarshal(b *testing.B) {
	// Marshal once up front so the loop measures only unmarshaling
	data, err := proto.Marshal(pbMsg)
	if err != nil {
		b.Error(err)
	}

	for i := 0; i < b.N; i++ {
		tmpMsg := &events.MyPBMessage{}
		if err := proto.Unmarshal(data, tmpMsg); err != nil {
			b.Error(err)
		}
	}
}

These additional benchmarks give us:

$ go test -bench=.
goos: darwin
goarch: amd64
pkg: github.com/streamdal/pbvsjson
cpu: Intel(R) Core(TM) i5-1038NG7 CPU @ 2.00GHz
BenchmarkJSON-8                 	  612400	      1731 ns/op
BenchmarkProtobuf-8             	 3088689	       385.9 ns/op
BenchmarkJSON_marshal-8         	 3975855	       302.7 ns/op
BenchmarkProtobuf_marshal-8     	 8163854	       145.1 ns/op
BenchmarkJSON_unmarshal-8       	  886795	      1290 ns/op
BenchmarkProtobuf_unmarshal-8   	 5829130	       217.1 ns/op

A 2x advantage when serializing protobuf vs. JSON. Not bad! But the real win is deserializing: the overhead of parsing the text-based JSON format is too significant to ignore. At a 6x speed advantage, protobuf is definitely the way to go for the performance-minded.

Protobuf vs. JSON: Message Size

The JSON message clocks in at 91 bytes, and the protobuf message at 62 bytes, roughly a third smaller. Our example events are trivial in structure for the sake of grok-ability, but real events will grow in size, as will the number of them transmitted through your infrastructure. Size should definitely be a consideration.
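If you want to check those numbers yourself, here's a minimal sketch of a size check, assuming it lives in the same main_test.go alongside the benchmark setup above:

func TestMessageSizes(t *testing.T) {
	jsonData, err := json.Marshal(msg)
	if err != nil {
		t.Fatal(err)
	}

	pbData, err := proto.Marshal(pbMsg)
	if err != nil {
		t.Fatal(err)
	}

	// On the example message this logs 91 and 62 bytes respectively.
	t.Logf("json: %d bytes, protobuf: %d bytes", len(jsonData), len(pbData))
}

Run it with go test -run TestMessageSizes -v to see the logged sizes.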

Protobuf vs. JSON: Observability & Readability

As a text-based format, JSON wins the readability challenge. That's not to say protobuf can't be made readable, though! In fact, that's part of what we do here at Streamdal: we make your protobuf events readable and searchable.

Simply upload your protobuf definitions to your Streamdal account, and use our open-source application Plumber to pipe your events in. We'll handle all of the messy decoding, indexing, and storage for you. Give it a shot and let us know how we can do more to help you observe your events!

Don't need search/replay/storage? Plumber can encode/decode protobuf live, using your .proto definitions. Plumber is also 100% free and open-source 🙂

Conclusions

In the world of scale, protobuf is definitely the way to go.

Schemas are a great benefit to a microservice architecture, giving you a single source of truth for how your events should be structured. Setting them up may seem like a bunch of busywork in the beginning, but the benefits show at scale.

Observability takes a hit when it comes to protobuf, but that's something we're trying to fix here at Streamdal!

For hobby projects, JSON is a solid choice for readability and development velocity. But in the world of scale, go binary, go protobuf.

Mark Gregan

Engineering Lead

Event-driven enthusiast and Golang fan with over 17 years of experience as a software engineer. Mark leads the development of Streamdal's platform and the open-source message bus tool Plumber, helping engineers observe events flowing through their architecture.
