{"id":15394,"date":"2026-03-29T09:22:46","date_gmt":"2026-03-29T07:22:46","guid":{"rendered":"https:\/\/monodes.com\/predaelli\/?p=15394"},"modified":"2026-03-29T09:22:48","modified_gmt":"2026-03-29T07:22:48","slug":"nobody-gets-fired-for-picking-json-but-maybe-they-should","status":"publish","type":"post","link":"https:\/\/monodes.com\/predaelli\/2026\/03\/29\/nobody-gets-fired-for-picking-json-but-maybe-they-should\/","title":{"rendered":"Nobody Gets Fired for Picking JSON, but Maybe They Should?"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\"><a href=\"https:\/\/mcyoung.xyz\/2024\/12\/10\/json-sucks\">Nobody Gets Fired for Picking JSON, but Maybe They Should?<\/a><\/h2>\n\n\n\n<p>By <a href=\"https:\/\/mcyoung.xyz\/about\/\">Miguel Young de la Sota<\/a><\/p>\n\n\n\n<!--more-->\n\n\n\n<!--nextpage-->\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<h1 class=\"wp-block-heading\"><a href=\"https:\/\/mcyoung.xyz\/2024\/12\/10\/json-sucks\">Nobody Gets Fired for Picking JSON, but Maybe They Should?<\/a><\/h1>\n\n\n\n<p>JSON is extremely popular but deeply flawed. This article discusses the details of JSON\u2019s design, how it\u2019s used (and misused), and how seemingly helpful \u201chuman readability\u201d features cause headaches instead. Crucially, you rarely find JSON-based tools (except dedicated tools like <code class=\"\" data-line=\"\">jq<\/code>) that can safely handle arbitrary JSON documents without a schema\u2014common corner cases can lead to data corruption!<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"what-is-json\"><a href=\"https:\/\/mcyoung.xyz\/2024\/12\/10\/json-sucks\/#what-is-json\">What is JSON?<\/a><\/h2>\n\n\n\n<p>JSON is famously simple. In fact, you can <a href=\"https:\/\/www.flickr.com\/photos\/equanimity\/3763158824\/in\/photostream\/\">fit the entire grammar on the back of a business card<\/a>. 
It\u2019s so omnipresent in REST APIs that you might assume you already know JSON quite well. It has decimal numbers, quoted strings, arrays with square brackets, and key-value maps (called \u201cobjects\u201d) with curly braces. A JSON document consists of any of these constructs: <code class=\"\" data-line=\"\">null<\/code>, <code class=\"\" data-line=\"\">42<\/code>, and <code class=\"\" data-line=\"\">{&quot;foo&quot;:&quot;bar&quot;}<\/code> are all valid JSON documents.<\/p>\n\n\n\n<p>However, the formal definition of JSON is quite complicated. JSON is defined by the IETF document <a href=\"https:\/\/datatracker.ietf.org\/doc\/html\/rfc8259\">RFC8259<\/a> (if you don\u2019t know what the IETF is, it\u2019s the standards body for Internet protocols). However, it\u2019s <em>also<\/em> normatively defined by <a href=\"https:\/\/ecma-international.org\/publications-and-standards\/standards\/ecma-404\/\">ECMA-404<\/a>, which is from ECMA, the standards body that defines JavaScript<sup><a href=\"https:\/\/mcyoung.xyz\/2024\/12\/10\/json-sucks\/#fn:1\">1<\/a><\/sup>.<\/p>\n\n\n\n<p>JavaScript? Yes, JSON (JavaScript Object Notation) is closely linked with JavaScript and is, in fact, (almost) a subset of it. While JSON\u2019s JavaScript ancestry is the main source of its quirks, several other poor design decisions add additional unforced errors.<\/p>\n\n\n\n<p>However, the biggest problem with JSON isn\u2019t any specific design decision but rather the incredible diversity of parser behavior and non-conformance across and within language ecosystems. 
RFC8259 goes out of its way to call this out:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><a href=\"https:\/\/mcyoung.xyz\/2024\/12\/10\/json-sucks\/#ref:1\">reference<\/a><\/p>\n\n\n\n<p>Note, however, that ECMA-404 allows several practices that this specification recommends avoiding in the interests of maximal interoperability.<\/p>\n<\/blockquote>\n\n\n\n<p>The RFC makes many observations regarding interoperability elsewhere in the document. Probably the most glaring\u2014and terrifying\u2014is how numbers work.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"everything-is-implementation-defined\"><a href=\"https:\/\/mcyoung.xyz\/2024\/12\/10\/json-sucks\/#everything-is-implementation-defined\">Everything is Implementation-Defined<\/a><\/h2>\n\n\n\n<p>JSON numbers are encoded in decimal, with an optional minus sign, a fractional part after a decimal point, and a scientific notation exponent. This is similar to how many programming languages define their own numeric literals.<\/p>\n\n\n\n<p>Presumably, JSON numbers are meant to be floats, right?<\/p>\n\n\n\n<p>Wrong.<\/p>\n\n\n\n<p>RFC8259 reveals that the answer is, unfortunately, \u201cwhatever you want.&#8221;<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>This specification allows implementations to set limits on the range and precision of numbers accepted. 
Since software that implements IEEE 754 binary64 (double precision) numbers is generally available and widely used, good interoperability can be achieved by implementations that expect no more precision or range than these provide, in the sense that implementations will approximate JSON numbers within the expected precision.<\/p>\n<\/blockquote>\n\n\n\n<p><code class=\"\" data-line=\"\">binary64<\/code> is the \u201cstandards-ese\u201d name for the type usually known as <code class=\"\" data-line=\"\">double<\/code> or <code class=\"\" data-line=\"\">float64<\/code>. Floats have great dynamic range but often can\u2019t represent exact values. For example, <code class=\"\" data-line=\"\">1.1<\/code> isn\u2019t representable as a float because all floats are fractions of the form <code class=\"\" data-line=\"\">n \/ 2^m<\/code> for integers <code class=\"\" data-line=\"\">n<\/code> and <code class=\"\" data-line=\"\">m<\/code>, but <code class=\"\" data-line=\"\">1.1 = 11\/10<\/code>, which has a factor of 5 in its denominator. The closest <code class=\"\" data-line=\"\">float64<\/code> value is<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code class=\"\" data-line=\"\">2476979795053773 \/ 2^51 = 1.100000000000000088817841970012523233890533447265625\n<\/code><\/pre>\n\n\n\n<p><a href=\"https:\/\/mcyoung.xyz\/2024\/12\/10\/json-sucks\/#code:1\">Plaintext<\/a><\/p>\n\n\n\n<p>Of course, you might think to declare \u201call JSON values map to their closest <code class=\"\" data-line=\"\">float64<\/code> value\u201d. Unfortunately, this value might not be unique. For example, the value <code class=\"\" data-line=\"\">900000000000.00006103515625<\/code> isn\u2019t representable as a <code class=\"\" data-line=\"\">float64<\/code>, and it\u2019s precisely between two exact <code class=\"\" data-line=\"\">float64<\/code> values. 
Depending on the rounding mode, this rounds to either <code class=\"\" data-line=\"\">900000000000<\/code> or <code class=\"\" data-line=\"\">900000000000.0001220703125<\/code>.<\/p>\n\n\n\n<p>IEEE 754 recommends \u201cround ties to even\u201d as the default rounding mode, so for almost all software, the result is <code class=\"\" data-line=\"\">900000000000<\/code>. But remember, floating-point state is a global variable implemented in hardware, and might just happen to be clobbered by some dependency that calls <code class=\"\" data-line=\"\">fesetround()<\/code> or a similar system function.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"data-loss-data-loss\"><a href=\"https:\/\/mcyoung.xyz\/2024\/12\/10\/json-sucks\/#data-loss-data-loss\">Data Loss! Data Loss!<\/a><\/h2>\n\n\n\n<p>You\u2019re probably thinking, \u201cI don\u2019t care about such fussy precision stuff. None of my numbers have any fractional parts.\u201d That is where you would be wrong. The <code class=\"\" data-line=\"\">n<\/code> part of <code class=\"\" data-line=\"\">n \/ 2^m<\/code> only has 53 bits available, but many <code class=\"\" data-line=\"\">int64<\/code> values fall outside of that range. This means that for very large 64-bit integers, such as randomly generated IDs, a JSON parser that converts integers into floats results in <em>data loss.<\/em> Go\u2019s <code class=\"\" data-line=\"\">encoding\/json<\/code> package does this, for example.<\/p>\n\n\n\n<p>How often does this actually happen for randomly generated numbers? 
We can do a little Monte Carlo simulation to find out.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code class=\"\" data-line=\"\">package main\n\nimport (\n\t&quot;fmt&quot;\n\t&quot;math&quot;\n\t&quot;math\/big&quot;\n\t&quot;math\/rand&quot;\n)\n\nconst trials = 5_000_000\nfunc main() {\n\tvar misses int\n\tvar err big.Float\n\tfor range trials {\n\t\tx := int64(rand.Uint64())\n\t\ty := int64(float64(x)) \/\/ Round-trip through binary64.\n\t\tif x != y {\n\t\t\tmisses++\n\t\t\terr.Add(&amp;err, big.NewFloat(math.Abs(float64(x - y))))\n\t\t}\n\t}\n\n\terr.Quo(&amp;err, big.NewFloat(trials))\n\tfmt.Printf(&quot;misses: %d\/%d, avg: %f&quot;, misses, trials, &amp;err)\n}\n\n\/\/ Output:\n\/\/ misses: 4970572\/5000000, avg: 170.638499\n<\/code><\/pre>\n\n\n\n<p><a href=\"https:\/\/mcyoung.xyz\/2024\/12\/10\/json-sucks\/#code:2\">Go<\/a><\/p>\n\n\n\n<p>It turns out that almost all randomly distributed <code class=\"\" data-line=\"\">int64<\/code> values are affected by round-trip data loss. Roughly, the only numbers that are safe are those with at most 16 digits (although not exactly: 9,999,999,999,999,999, for example, gets rounded up to a nice round 10 quadrillion).<\/p>\n\n\n\n<p>How does this affect you? Suppose you have a JSON document somewhere that includes a user ID and a transcript of their private messages with another user. Data loss due to rounding would result in the wrong user ID being associated with the private messages, which could result in leaking PII or incorrect management of privacy consent (such as GDPR requirements).<\/p>\n\n\n\n<p>This isn\u2019t just about <em>your<\/em> user IDs, mind you. Plenty of other vendors\u2019 IDs are nice big integers, which the JSON grammar can technically accommodate and which random tools will mangle. 
Some examples:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>License keys: for example, Adobe uses 24 digits for <a href=\"https:\/\/helpx.adobe.com\/x-productkb\/global\/invalid-revoked-serial-numbers.html\">their serial numbers<\/a>, which may be tempting to store as an integer.<\/li>\n\n\n\n<li>Barcode IDs like the unique serial numbers of medical devices, <a href=\"https:\/\/www.fda.gov\/medical-devices\/unique-device-identification-system-udi-system\/udi-basics\">which are tightly regulated<\/a>.<\/li>\n\n\n\n<li>Visa and Mastercard credit card numbers <em>happen<\/em> to fit in the \u201csafe\u201d range for <code class=\"\" data-line=\"\">binary64<\/code> , which may lull you into a false sense of security, since they\u2019re so common. But not all credit cards have 16 digit numbers: <a href=\"https:\/\/en.wikipedia.org\/wiki\/Payment_card_number#Structure\">some now support 19<\/a>.<\/li>\n<\/ul>\n\n\n\n<p>These are pretty bad compliance consequences purely due to a data serialization format.<\/p>\n\n\n\n<p>This problem is avoidable with care. After all, Go can parse JSON into any arbitrary type using reflection. For example, if we replace the inner loop of the Monte Carlo simulation with something like the following:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code class=\"\" data-line=\"\">for range trials {\n\tx := int64(rand.Uint64())\n\tvar v struct{ N int64 }\n\tjson.Unmarshal(&#091;]byte(fmt.Sprintf(`{&quot;N&quot;:%d}`, x)), &amp;v)\n\ty := v.N\n\tif x != y {\n\t\t\/\/ ...\n\t}\n}\n<\/code><\/pre>\n\n\n\n<p><a href=\"https:\/\/mcyoung.xyz\/2024\/12\/10\/json-sucks\/#code:3\">Go<\/a><\/p>\n\n\n\n<p>We suddenly see that <code class=\"\" data-line=\"\">x == y<\/code> in every trial. This is because with type information, Go\u2019s JSON library knows exactly what the target precision is. 
If we were parsing to an <code class=\"\" data-line=\"\">any<\/code> instead of to a <code class=\"\" data-line=\"\">struct { N int64 }<\/code>, we\u2019d be in deep trouble: the outer object would be parsed into a <code class=\"\" data-line=\"\">map[string]any<\/code>, and the <code class=\"\" data-line=\"\">N<\/code> field would become a <code class=\"\" data-line=\"\">float64<\/code>.<\/p>\n\n\n\n<p>This means that your system probably can\u2019t safely handle JSON documents with unknown fields. Tools like <code class=\"\" data-line=\"\">jq<\/code> must be extremely careful about number handling to avoid data loss. This is an easy mistake for third-party tools to make.<\/p>\n\n\n\n<p>But again, <code class=\"\" data-line=\"\">float64<\/code> isn\u2019t the standard\u2014there is no standard. Some implementations might only have 32-bit floats available, making the problem worse. Some implementations might try to be clever, using a <code class=\"\" data-line=\"\">float64<\/code> for fractional values and an <code class=\"\" data-line=\"\">int64<\/code> for integer values; however, this still imposes arbitrary limits on the parsed values, potentially resulting in data loss.<\/p>\n\n\n\n<p>Some implementations such as Python use bignums, so they appear not to have this problem. However, this can lead to a false sense of security where issues are not caught until it\u2019s too late: some database now contains ostensibly valid but non-interoperable JSON.<\/p>\n\n\n\n<p>Protobuf is forced to deal with this in a pretty non-portable way. To avoid data loss, large 64-bit integers are serialized as quoted strings when serializing to JSON. So, instead of writing <code class=\"\" data-line=\"\">{&quot;foo&quot;:6574404881820635023}<\/code>, it emits <code class=\"\" data-line=\"\">{&quot;foo&quot;:&quot;6574404881820635023&quot;}<\/code>. 
This solves the data loss issue but does not work with other JSON libraries such as Go\u2019s, producing errors like this one:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code class=\"\" data-line=\"\">json: cannot unmarshal string into Go struct field .N of type int64\n<\/code><\/pre>\n\n\n\n<p><a href=\"https:\/\/mcyoung.xyz\/2024\/12\/10\/json-sucks\/#code:4\">Plaintext<\/a><\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"non-finite-values\"><a href=\"https:\/\/mcyoung.xyz\/2024\/12\/10\/json-sucks\/#non-finite-values\">Non-Finite Values<\/a><\/h3>\n\n\n\n<p>The special floating-point values <code class=\"\" data-line=\"\">Infinity<\/code>, <code class=\"\" data-line=\"\">-Infinity<\/code>, and <code class=\"\" data-line=\"\">NaN<\/code> are not representable: it\u2019s the wild west as to what happens when you try to serialize the equivalent of <code class=\"\" data-line=\"\">{x:1.0\/0.0}<\/code>.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Go refuses to serialize, citing <code class=\"\" data-line=\"\">json: unsupported value: +Inf<\/code>.<\/li>\n\n\n\n<li>Protobuf serializes it as <code class=\"\" data-line=\"\">{&quot;x&quot;:&quot;inf&quot;}<\/code> (or should\u2014it\u2019s unclear which implementations get it right).<\/li>\n\n\n\n<li>JavaScript won\u2019t even bother trying: <code class=\"\" data-line=\"\">JSON.stringify({x:Infinity})<\/code> prints <code class=\"\" data-line=\"\">{&quot;x&quot;:null}<\/code>.<\/li>\n\n\n\n<li>Python is arguably the worst offender: <code class=\"\" data-line=\"\">json.dumps({&quot;x&quot;:float(&quot;inf&quot;)})<\/code> prints <code class=\"\" data-line=\"\">{&quot;x&quot;: Infinity}<\/code>, which isn\u2019t even valid JSON per RFC8259.<\/li>\n<\/ul>\n\n\n\n<p>NaN is arguably an even worse offender, because the NaN payload (yes, <a href=\"https:\/\/doc.rust-lang.org\/std\/primitive.f32.html#nan-bit-patterns\">NaNs have a special payload<\/a>) is discarded when converting to <code class=\"\" 
data-line=\"\">&quot;nan&quot;<\/code> or however your library represents it.<\/p>\n\n\n\n<p>Does this affect you? Well, if you\u2019re doing anything with floats, you\u2019re one division-by-zero or overflow away from triggering serialization errors. At best, it\u2019s \u201cbenign\u201d data corruption (JavaScript). At worst, when the data is partially user-controlled, it might result in crashes or unparseable output, which is the makings of a DoS vector.<\/p>\n\n\n\n<p>In comparison, Protobuf serialization can\u2019t fail except due to non-UTF-8 <code class=\"\" data-line=\"\">string<\/code> fields or cyclic message references, both of which are far less likely than a NaN popping up in a calculation.<\/p>\n\n\n\n<p>The upshot is that all the parsers end up parsing a bunch of crazy things for the special floating-point values over time because of <a href=\"https:\/\/en.wikipedia.org\/wiki\/Robustness_principle\">Postel\u2019s law<\/a>. RFC8259 makes no effort to provide suggestions for dealing with such real-world situations beyond \u201ctough luck, not interoperable.\u201d<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"text-encodings-and-invalid-unicode\"><a href=\"https:\/\/mcyoung.xyz\/2024\/12\/10\/json-sucks\/#text-encodings-and-invalid-unicode\">Text Encodings and Invalid Unicode<\/a><\/h2>\n\n\n\n<p>JSON strings are relatively tame, with some marked (but good) divergence from JavaScript. Specifically, JavaScript, being a language of a certain age (along with Java), uses UTF-16 as its Unicode text encoding. Most of the world has realized this is a bad idea (it doubles the size of ASCII text, which makes up almost all of Internet traffic), so JSON uses UTF-8 instead. 
RFC8259 actually specifies that the whole document MUST be encoded in UTF-8.<\/p>\n\n\n\n<p>But when we go to read about Unicode characters in \u00a78.2, we are disappointed: it merely says that it\u2019s <em>really great<\/em> when all quoted strings consist entirely of Unicode characters, which means that unpaired surrogates are allowed. In effect, the spec merely requires that JSON strings be <a href=\"https:\/\/en.wikipedia.org\/wiki\/UTF-8#Surrogates\">WTF-8<\/a>: UTF-8 that permits unpaired surrogates.<\/p>\n\n\n\n<p>What\u2019s an unpaired surrogate? It\u2019s a Unicode value in the range <code class=\"\" data-line=\"\">U+D800<\/code> to <code class=\"\" data-line=\"\">U+DFFF<\/code> that shows up without the matching half of a UTF-16 surrogate pair. This range forms a gap in the Unicode codepoint range. UTF-8\u2019s variable-length integer encoding can encode these values, but their presence in a bytestream makes it invalid UTF-8. WTF-8 is UTF-8 that permits these values.<\/p>\n\n\n\n<p>So, who actually supports parsing (or serializing) these? Consider the document <code class=\"\" data-line=\"\">{&quot;x&quot;:&quot;\\udead&quot;}<\/code>, which contains an unpaired surrogate, <code class=\"\" data-line=\"\">U+DEAD<\/code>.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Go gladly deserializes AND serializes it (Go\u2019s strings are arbitrary byte strings, not UTF-8). 
However, Go serializes a non-UTF-8 string such as <code class=\"\" data-line=\"\">&quot;\\xff&quot;<\/code> as <code class=\"\" data-line=\"\">&quot;\\ufffd&quot;<\/code>, having replaced the invalid byte with a <code class=\"\" data-line=\"\">U+FFFD<\/code> replacement character (this thing: \ufffd).<\/li>\n\n\n\n<li>Most Java parsers seem to follow the same behavior as Go, but there are many different parsers available, and we\u2019ve already learned that different JSON parsers may behave differently.<\/li>\n\n\n\n<li>JavaScript and Python similarly gladly parse unpaired surrogates, but they also serialize them back without converting them into <code class=\"\" data-line=\"\">U+FFFD<\/code>.<\/li>\n\n\n\n<li>Different Protobuf runtimes may not handle this identically, but the reference C++ implementation (whose JSON codec I wrote!) refuses to parse unpaired surrogates.<\/li>\n<\/ul>\n\n\n\n<p>There are other surprising pitfalls around strings: are <code class=\"\" data-line=\"\">&quot;x&quot;<\/code> and <code class=\"\" data-line=\"\">&quot;\\u0078&quot;<\/code> the same string? RFC8259 feels the need to call out that they are, for the purposes of checking that object keys are equal. The fact that they feel the need to call it out indicates that this is also a source of potential problems.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"byte-strings\"><a href=\"https:\/\/mcyoung.xyz\/2024\/12\/10\/json-sucks\/#byte-strings\">Byte Strings<\/a><\/h2>\n\n\n\n<p>What if I don\u2019t want to send text? A common type of byte blob to send is a cryptographic hash that identifies a document in a content-addressed blobstore, or perhaps a digital signature (an encrypted hash). 
JSON has no native way of representing byte strings.<\/p>\n\n\n\n<p>You could send a quoted string full of ASCII and <code class=\"\" data-line=\"\">\\xNN<\/code> escapes (for bytes which are not in the ASCII range), but this is wasteful in terms of bandwidth, and has serious interoperability problems (as noted above, Go actively destroys data in this case). You could also encode it as an array of JSON numbers, which is much worse for bandwidth and serialization speed.<\/p>\n\n\n\n<p>What everyone winds up doing, one way or another, is to rely on base64 encoding. Protobuf, for example, encodes <code class=\"\" data-line=\"\">bytes<\/code> fields into base64 strings in JSON. This has the unfortunate side-effect of defeating JSON\u2019s human-readable property: if the blob contains mostly ASCII, a human reader can\u2019t tell.<\/p>\n\n\n\n<p>Because this isn\u2019t part of JSON, virtually no JSON codec does this decoding for you, particularly because in a schema-less context, there\u2019s nothing to distinguish a byte blob encoded with base64 from an actual textual string that <em>happens<\/em> to contain valid base64, such as an alphanumeric username.<\/p>\n\n\n\n<p>Compared to other problems, this is more like a paper cut, but it\u2019s unnecessary and adds complexity and interop problems. <a href=\"https:\/\/en.wikipedia.org\/wiki\/Base64#Variants_summary_table\">By the way, did you know there are multiple incompatible Base64 alphabets?<\/a><\/p>\n\n\n\n<h1 class=\"wp-block-heading\" id=\"streaming-doesnt-work\"><a href=\"https:\/\/mcyoung.xyz\/2024\/12\/10\/json-sucks\/#streaming-doesnt-work\">Streaming Doesn\u2019t Work<\/a><\/h1>\n\n\n\n<p>A less obvious problem with JSON is that it can\u2019t be streamed. Almost all JSON documents are objects or arrays and are therefore <em>incomplete<\/em> until they reach the closing <code class=\"\" data-line=\"\">}<\/code> or <code class=\"\" data-line=\"\">]<\/code>, respectively. 
This means you can\u2019t send a stream of JSON documents that form a part of a larger document without some additional protocol for combining them in post-processing.<\/p>\n\n\n\n<p><a href=\"https:\/\/jsonlines.org\/\">JSONL<\/a> is the world\u2019s silliest spec that \u201csolves\u201d this problem in the simplest way possible: a JSONL document is a sequence of JSON documents separated by newlines. JSONL <em>is<\/em> streamable, but because it\u2019s done in the simplest way possible, it only supports streaming a giant array. You can\u2019t, for example, stream an object field-by-field or stream an array within that object.<\/p>\n\n\n\n<p>Protobuf doesn\u2019t have this problem: in a nutshell, the Protobuf wire format is as if we removed the braces and brackets from the top-level array or object of a document, and made it so that values with the same key get merged. In the wire format, the equivalent of the JSONL document<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code class=\"\" data-line=\"\">{&quot;foo&quot;: {&quot;x&quot;: 1}, &quot;bar&quot;: &#091;5, 6]}\n{&quot;foo&quot;: {&quot;y&quot;: 2}, &quot;bar&quot;: &#091;7, 8]}\n<\/code><\/pre>\n\n\n\n<p><a href=\"https:\/\/mcyoung.xyz\/2024\/12\/10\/json-sucks\/#code:5\">JSON<\/a><\/p>\n\n\n\n<p>is automatically \u201cmerged\u201d into the single document<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code class=\"\" data-line=\"\">{ &quot;foo&quot;: { &quot;x&quot;: 1, &quot;y&quot;: 2 }, &quot;bar&quot;: &#091;5, 6, 7, 8] }\n<\/code><\/pre>\n\n\n\n<p><a href=\"https:\/\/mcyoung.xyz\/2024\/12\/10\/json-sucks\/#code:6\">JSON<\/a><\/p>\n\n\n\n<p>This forms the basis of the \u201cmessage merge\u201d operation, which is intimately connected to how the wire format was designed. 
We\u2019ll dive into this fundamental operation in a future article.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\" id=\"canonicalization-leads-to-data-loss\"><a href=\"https:\/\/mcyoung.xyz\/2024\/12\/10\/json-sucks\/#canonicalization-leads-to-data-loss\">Canonicalization Leads to Data Loss<\/a><\/h1>\n\n\n\n<p>Thanks to <a href=\"https:\/\/datatracker.ietf.org\/doc\/html\/rfc7519\">RFC7519<\/a> and <a href=\"https:\/\/datatracker.ietf.org\/doc\/html\/rfc7515\">RFC7515<\/a>, which define JSON Web Tokens (JWT) and JSON Web Signatures (JWS), digitally signing JSON documents is a very common operation. However, digital signatures can only sign specific byte blobs and are sensitive to things that JSON isn\u2019t, such as whitespace and key ordering.<\/p>\n\n\n\n<p>This results in specifications like <a href=\"https:\/\/datatracker.ietf.org\/doc\/html\/rfc8785\">RFC8785<\/a> for <em>canonicalization<\/em> of JSON documents. This introduces a new avenue by which existing JSON documents, which accidentally happen to contain non-interoperable (or, thanks to non-conforming implementations such as Python\u2019s, invalid) JSON, must be manipulated and reformatted by third-party tools. RFC8785 itself references ECMA-262 (the JavaScript standard) for how to serialize numbers, meaning that it\u2019s <em>required<\/em> to induce data loss for 64-bit numerical values!<\/p>\n\n\n\n<h1 class=\"wp-block-heading\" id=\"is-json-fixable\"><a href=\"https:\/\/mcyoung.xyz\/2024\/12\/10\/json-sucks\/#is-json-fixable\">Is JSON Fixable?<\/a><\/h1>\n\n\n\n<p>Plainly? No. JSON can\u2019t be fixed because of how extremely popular it is. Common mistakes are baked into the format. Are comments allowed? Trailing commas? Number formats? Nobody knows!<\/p>\n\n\n\n<p>What tools are touching your JSON? Are they aware of all of the rakes they can step on? Do they emit invalid JSON (like Python does)? How do you even begin to audit that?<\/p>\n\n\n\n<p>Thankfully, you don\u2019t have to use JSON. 
There are alternatives\u2014BSON, UBJSON, MessagePack, and CBOR are just a few binary formats that try to replicate JSON\u2019s data model. Unfortunately, many of them have their own problems.<\/p>\n\n\n\n<p>Protobuf, however, has none of these problems, because it was <em>designed<\/em> to fulfill needs JSON couldn\u2019t meet. Using a strongly-typed schema system, like Protobuf, makes all of these problems go away.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Of course, some wise guy will probably want to cite &lt;json.org&gt;. I should underscore: &lt;json.org&gt; is <strong>NOT<\/strong> a standard. It is <strong>NOT<\/strong> normative. The documents produced by the IETF and by ECMA, which are international standards organizations that represent the industry, <strong>ARE<\/strong> normative. When a browser implementer wants to implement JSON to the letter, they go to ECMA, not to some dude\u2019s 90\u2019s ass website.\u00a0<a href=\"https:\/\/mcyoung.xyz\/2024\/12\/10\/json-sucks\/#fnref:1\"><\/a><\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\">Related Posts<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>2025-03-11 \/ <a href=\"https:\/\/mcyoung.xyz\/2025\/03\/11\/formatters\"><\/a>The Art of Formatting Code<\/li>\n\n\n\n<li>2024-12-10 \/ <a href=\"https:\/\/mcyoung.xyz\/2024\/12\/10\/json-sucks\"><\/a>Nobody Gets Fired for Picking JSON, but Maybe They Should?<\/li>\n<\/ul>\n<\/blockquote>\n","protected":false},"excerpt":{"rendered":"<p class=\"excerpt\">Nobody Gets Fired for Picking JSON, but Maybe They Should? 
By Miguel Young de la Sota<\/p>\n<p class=\"more-link-p\"><a class=\"more-link\" href=\"https:\/\/monodes.com\/predaelli\/2026\/03\/29\/nobody-gets-fired-for-picking-json-but-maybe-they-should\/\">Read more &rarr;<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"link","meta":{"inline_featured_image":false,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"activitypub_content_warning":"","activitypub_content_visibility":"","activitypub_max_image_attachments":4,"activitypub_interaction_policy_quote":"anyone","activitypub_status":"federated","footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[50],"tags":[527,84,526],"class_list":["post-15394","post","type-post","status-publish","format-link","hentry","category-javascript","tag-formats","tag-json","tag-parsing","post_format-post-format-link"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p6daft-40i","jetpack-related-posts":[{"id":15395,"url":"https:\/\/monodes.com\/predaelli\/2026\/03\/29\/protobuf-advolvendum-est\/","url_meta":{"origin":15394,"position":0},"title":"ProtoBuf advolvendum est","author":"Paolo Redaelli","date":"2026-03-29","format":false,"excerpt":"The proficient Miguel Young de la Sota suggests to use Protocol Buffers1 instead of JSON in his article \"Nobody Gets Fired for Picking JSON, but Maybe They Should?\" I shall add to my advolvenda2","rel":"","context":"In 
&quot;Advolvenda&quot;","block_context":{"text":"Advolvenda","link":"https:\/\/monodes.com\/predaelli\/category\/eiffel\/liberty-eiffel\/advolvenda\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":924,"url":"https:\/\/monodes.com\/predaelli\/2016\/01\/26\/json-references\/","url_meta":{"origin":15394,"position":1},"title":"JSON references","author":"Paolo Redaelli","date":"2016-01-26","format":false,"excerpt":"Is there a standard way of referencing objects by identity in JSON? For example, so that graphs and other data structures with lots of (possibly circular) references can be sanely serialized\/loaded? From: JSON: Standard way of referencing an object by identity (for, eg, circular references)? - Stack Overflow Then there's\u2026","rel":"","context":"In &quot;Javascript&quot;","block_context":{"text":"Javascript","link":"https:\/\/monodes.com\/predaelli\/category\/javascript\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":11513,"url":"https:\/\/monodes.com\/predaelli\/2024\/03\/24\/duckdb-as-the-new-jq\/","url_meta":{"origin":15394,"position":2},"title":"DuckDB as the New jq","author":"Paolo Redaelli","date":"2024-03-24","format":"quote","excerpt":"Recently, I\u2019ve been interested in the DuckDB project (like a SQLite geared towards data applications). And one of the amazing features is that it has many data importers included without requiring extra dependencies. 
This means it can natively read and parse JSON as a database table, among many other formats.\u2026","rel":"","context":"In &quot;Software Libero&quot;","block_context":{"text":"Software Libero","link":"https:\/\/monodes.com\/predaelli\/category\/software\/software-libero\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":9401,"url":"https:\/\/monodes.com\/predaelli\/2022\/05\/26\/the-simdjson-library\/","url_meta":{"origin":15394,"position":3},"title":"The simdjson library","author":"Paolo Redaelli","date":"2022-05-26","format":false,"excerpt":"The simdjson library Parsing gigabytes of JSON per second JSON is everywhere on the Internet. Servers spend a lot of time parsing it. The simdjson library uses commonly available SIMD instructions and microparallel algorithms to break speed records.","rel":"","context":"In &quot;Agenda&quot;","block_context":{"text":"Agenda","link":"https:\/\/monodes.com\/predaelli\/category\/agenda\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":8667,"url":"https:\/\/monodes.com\/predaelli\/2021\/09\/08\/useful-tools\/","url_meta":{"origin":15394,"position":4},"title":"Useful tools","author":"Paolo Redaelli","date":"2021-09-08","format":false,"excerpt":"htmlq, like jq, but for HTML. Uses CSS selectors to extract bits content from HTML files. Mozilla's MDN has a good reference for CSS selector syntax. jq is a lightweight and flexible command-line JSON processor.","rel":"","context":"In &quot;Software Libero&quot;","block_context":{"text":"Software Libero","link":"https:\/\/monodes.com\/predaelli\/category\/software\/software-libero\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":9252,"url":"https:\/\/monodes.com\/predaelli\/2022\/04\/07\/hyperfine\/","url_meta":{"origin":15394,"position":5},"title":"hyperfine","author":"Paolo Redaelli","date":"2022-04-07","format":false,"excerpt":"hyperfine A command-line benchmarking tool. 
Demo: Benchmarking fd and find: Features Statistical analysis across multiple runs.Support for arbitrary shell commands.Constant feedback about the benchmark progress and current estimates.Warmup runs can be executed before the actual benchmark.Cache-clearing commands can be set up before each timing run.Statistical outlier detection to detect interference\u2026","rel":"","context":"In &quot;Software&quot;","block_context":{"text":"Software","link":"https:\/\/monodes.com\/predaelli\/category\/software\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/monodes.com\/predaelli\/wp-json\/wp\/v2\/posts\/15394","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/monodes.com\/predaelli\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/monodes.com\/predaelli\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/monodes.com\/predaelli\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/monodes.com\/predaelli\/wp-json\/wp\/v2\/comments?post=15394"}],"version-history":[{"count":1,"href":"https:\/\/monodes.com\/predaelli\/wp-json\/wp\/v2\/posts\/15394\/revisions"}],"predecessor-version":[{"id":15396,"href":"https:\/\/monodes.com\/predaelli\/wp-json\/wp\/v2\/posts\/15394\/revisions\/15396"}],"wp:attachment":[{"href":"https:\/\/monodes.com\/predaelli\/wp-json\/wp\/v2\/media?parent=15394"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/monodes.com\/predaelli\/wp-json\/wp\/v2\/categories?post=15394"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/monodes.com\/predaelli\/wp-json\/wp\/v2\/tags?post=15394"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}