09. 01. 2023 William Calliari Development

Static Field Validation in Serde

I recently had to parse the JSON-RPC 2.0 standard and ran into the following problem: The standard requires the field "jsonrpc": "2.0" in the JSON itself, and I wanted to validate that with Serde to ensure the message conforms to the standard. On the other hand I don’t need the field in the actual struct, since once it’s validated it won’t ever and should never change.

Serde Tag

My first idea was to use the tag feature of Serde:

use serde::{Deserialize, Serialize};
use serde_json::Value;

#[derive(Serialize, Deserialize)]
#[derive(Debug, Eq, PartialEq)]
#[serde(tag = "jsonrpc", rename = "2.0")]
struct JsonRpcTagged {
    method: String,
    #[serde(skip_serializing_if = "Option::is_none")]
    params: Option<Value>,
    #[serde(skip_serializing_if = "Option::is_none")]
    id: Option<usize>,
}

So let’s try it out:

let jsonrpc = JsonRpcTagged { method: "my_method".to_owned(), params: None, id: None };

let message = serde_json::to_string(&jsonrpc).unwrap();
assert_eq!(message, r#"{"jsonrpc":"2.0","method":"my_method"}"#);

Wonderful! As expected it adds the jsonrpc tag to the message! Now let’s try de-serializing:

let message = r#"{"jsonrpc":"2.0","method":"my_method"}"#;
let expected = JsonRpcTagged { method: "my_method".to_owned(), params: None, id: None };

let result: JsonRpcTagged = serde_json::from_str(message).unwrap();
assert_eq!(result, expected);

Till now everything worked great and I was optimistic. But what we really want is to fail if the jsonrpc field isn’t given, as that’s what the specification demands. So let’s try that:

let message = r#"{"method":"my_method"}"#;
let expected = JsonRpcTagged { method: "my_method".to_owned(), params: None, id: None };

let result: JsonRpcTagged = serde_json::from_str(message).unwrap();
assert_eq!(result, expected);

Serde still de-serializes this! That’s bad, because I wanted to use it for validation: on a struct, the tag is written during serialization but apparently never checked during de-serialization. This path was obviously a dead end, so I went back to the drawing board.

(Ab)using Enums

Now what if the tag actually means something? What if it’s actually an enum variant and therefore is needed for de-serialization? So I constructed my second attempt:

#[derive(Serialize, Deserialize)]
#[derive(Debug, Eq, PartialEq)]
#[serde(tag = "jsonrpc")]
enum JsonRpcEnum {
    #[serde(rename = "2.0")]
    JsonRpc {
        method: String,
        #[serde(skip_serializing_if = "Option::is_none")]
        params: Option<Value>,
        #[serde(skip_serializing_if = "Option::is_none")]
        id: Option<usize>,
    },
}

Running all the previous tests showed that this works! It serializes what it should and, more importantly, doesn’t de-serialize what it shouldn’t. Even better, the Rust compiler is smart enough to see that the enum has only one possible variant, so it optimizes away the discriminant that an enum normally carries, and the enum ends up the same size as the plain struct:

assert_eq!(std::mem::size_of::<JsonRpcTagged>(), std::mem::size_of::<JsonRpcEnum>());

Now I should be happy, right? I have achieved what I initially set out to do. But my perfectionism wouldn’t let me stop there. First of all, the type looks a bit weird, but more dramatically, we can’t access the fields in an easy way. The easiest way to access the contents of this enum is pretty ugly:

let JsonRpcEnum::JsonRpc { ref method, .. } = some_jsonrpc;

It’s cool to see that Rust actually lets you compile a monstrosity like that, but it’s definitely not something for the weak-spirited. So back to the drawing board I went one last time.
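For completeness, the destructuring could at least be hidden behind accessor methods, at the cost of one hand-written method per field. The following is only a sketch: `method()` and `id()` are hypothetical helpers that are not part of the original attempt, and the Serde derives and the `params` field are left out so the snippet needs nothing beyond the standard library.

```rust
// Trimmed-down stand-in for the enum above (Serde derives and `params` omitted).
#[derive(Debug, PartialEq, Eq)]
enum JsonRpcEnum {
    JsonRpc { method: String, id: Option<usize> },
}

impl JsonRpcEnum {
    /// Hypothetical accessor. The pattern is irrefutable because the enum
    /// has exactly one variant, so a plain `let` is enough.
    fn method(&self) -> &str {
        let JsonRpcEnum::JsonRpc { method, .. } = self;
        method
    }

    /// Same idea for the optional request id.
    fn id(&self) -> Option<usize> {
        let JsonRpcEnum::JsonRpc { id, .. } = self;
        *id
    }
}
```

Callers can then write `msg.method()` instead of repeating the pattern, but the type itself stays just as odd-looking.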

Using Custom De-serialization

First I tried implementing both Serialize and Deserialize for an empty struct, struct JsonRpcVersion;, but that meant every other trait I derive on the JsonRpc struct would also have to be derived on this empty struct. Furthermore, this was not very reusable, as I’d need to replicate the whole logic the next time I needed something similar.

Enter the empty tuple, (). It’s one of my favorite Rust features, because it allows some cool code trickery and compiler optimizations. Look at the implementation of the HashSet (a HashMap<T, ()> under the hood) if you have some time, or read some of the many other great blog posts. It solves my first problem, since () already implements the traits behind all the derive macros I want to use. It’s also zero-sized, meaning we’re not wasting memory space or compute time.
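Both properties of the empty tuple are easy to check with the standard library alone. The sketch below uses two made-up structs, `WithUnit` and `WithoutUnit`, that are not part of the final design; they only show that a `()` field neither blocks the usual derives nor adds to the size of the struct.

```rust
use std::mem::size_of;

// `()` already implements the common derivable traits, so a struct that
// contains it can derive them without any extra work.
#[derive(Debug, Default, Clone, PartialEq, Eq, Hash)]
struct WithUnit {
    version: (),
    method: String,
}

// The same struct without the marker field, for comparison.
#[derive(Debug, Default, Clone, PartialEq, Eq, Hash)]
struct WithoutUnit {
    method: String,
}

/// Returns (size of `()`, size with the unit field, size without it).
fn sizes() -> (usize, usize, usize) {
    (size_of::<()>(), size_of::<WithUnit>(), size_of::<WithoutUnit>())
}
```

The first number is zero, and the other two should come out equal, since a zero-sized field has nothing to store.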

The other savior of my soul was the #[serde(with = "Validator")] attribute. It takes a path and makes Serde handle the field by calling Validator::serialize(..) and Validator::deserialize(..) on it for serialization and de-serialization respectively. Now all I need is a validator struct that provides those two functions, and a trait that does the implementation for me.

So let’s take a look at the trait I came up with:

use serde::de::{Error, Unexpected, Visitor};
use serde::{Deserializer, Serializer};
use std::fmt::Formatter;
use std::marker::PhantomData;

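pub trait Validator: Sized {
    /// The only value the validated field may have.
    const EXPECTED_VALUE: &'static str;

    /// Called by Serde through `#[serde(with = "...")]`; always writes the expected value.
    fn serialize<S: Serializer>(_version: &(), serializer: S) -> Result<S::Ok, S::Error> {
        serializer.serialize_str(Self::EXPECTED_VALUE)
    }

    /// Called by Serde through `#[serde(with = "...")]`; succeeds only if the input matches.
    fn deserialize<'de, D: Deserializer<'de>>(deserializer: D) -> Result<(), D::Error> {
        deserializer.deserialize_str(ValidatorVisitor::<Self> {
            phantom_data: PhantomData,
        })
    }
}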

#[derive(Default)]
struct ValidatorVisitor<T: Validator> {
    phantom_data: PhantomData<T>,
}

impl<'de, T: Validator> Visitor<'de> for ValidatorVisitor<T> {
    type Value = ();

    fn expecting(&self, formatter: &mut Formatter) -> std::fmt::Result {
        formatter.write_str(T::EXPECTED_VALUE)
    }

    fn visit_borrowed_str<E: Error>(self, v: &'de str) -> Result<Self::Value, E> {
        if v == T::EXPECTED_VALUE {
            Ok(())
        } else {
            Err(E::invalid_value(Unexpected::Str(v), &T::EXPECTED_VALUE))
        }
    }
}

It’s a bit of code, but it’s worth understanding, so let me guide you through it.

The trait itself implements the serialize and deserialize functions that will later be called by Serde. The expected value is provided as an associated constant, so every implementor only has to supply that one string, and the provided methods have all the data they need to check the input later on. Shout-out to u/SkiFire13 on the Rust user forum for making me aware of that feature. For the serialization, we can simply write the string to the serializer. But de-serialization gets a bit more complicated.
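The associated-constant pattern described above can be tried in isolation, without Serde. In this sketch, `ExpectedValue`, `check`, `JsonRpcVersion` and `SchemaVersion` are illustrative names only, and `check` is a plain string comparison standing in for the real serialize and deserialize functions.

```rust
/// Simplified stand-in for the `Validator` trait: one associated constant
/// plus one provided method that every implementor inherits.
trait ExpectedValue {
    const EXPECTED_VALUE: &'static str;

    /// Accepts the input only if it equals the associated constant.
    fn check(input: &str) -> Result<(), String> {
        if input == Self::EXPECTED_VALUE {
            Ok(())
        } else {
            Err(format!("expected {:?}, found {:?}", Self::EXPECTED_VALUE, input))
        }
    }
}

// Each implementor supplies nothing but its constant.
struct JsonRpcVersion;

impl ExpectedValue for JsonRpcVersion {
    const EXPECTED_VALUE: &'static str = "2.0";
}

// A second, hypothetical implementor to show the reuse.
struct SchemaVersion;

impl ExpectedValue for SchemaVersion {
    const EXPECTED_VALUE: &'static str = "v1";
}
```

`JsonRpcVersion::check("2.0")` succeeds while `JsonRpcVersion::check("1.0")` returns an error, and adding another fixed value costs only the few lines of a new `impl`.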

To de-serialize data, I need a Visitor. My first instinct was to implement the Visitor for all types T: Validator, to get quick access to the string slice through the trait bound. Sadly, the compiler doesn’t allow blanket implementations of traits that aren’t defined in this crate (the orphan rule), so I had to create a wrapper type.

The ValidatorVisitor is a zero-sized struct that does nothing but hold the type T for the Visitor to access. Since the compiler rejects a struct with an unused generic parameter, we need to add one field to the visitor: a PhantomData<T>, a zero-sized marker that makes the struct behave as if it contained a T, which is all it takes for this implementation to actually compile.
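The wrapper trick is not specific to Serde. As a rough sketch, the snippet below repeats it with `std::fmt::Display` standing in for the foreign `Visitor` trait; `Wrapper`, `V2` and `wrapper_size` are hypothetical names, and the `Validator` trait is cut down to its constant.

```rust
use std::fmt;
use std::marker::PhantomData;
use std::mem::size_of;

trait Validator {
    const EXPECTED_VALUE: &'static str;
}

// A blanket impl of a foreign trait is rejected by the orphan rule:
// impl<T: Validator> fmt::Display for T { /* ... */ }

// `struct Wrapper<T>;` alone is rejected too, because `T` would be unused.
// `PhantomData<T>` "uses" `T` without storing a value of that type.
struct Wrapper<T: Validator> {
    marker: PhantomData<T>,
}

impl<T: Validator> Wrapper<T> {
    fn new() -> Self {
        Wrapper { marker: PhantomData }
    }
}

// Implementing the foreign trait for the local wrapper type is allowed,
// and the impl can still reach the constant through `T`.
impl<T: Validator> fmt::Display for Wrapper<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(T::EXPECTED_VALUE)
    }
}

struct V2;

impl Validator for V2 {
    const EXPECTED_VALUE: &'static str = "2.0";
}

/// The wrapper carries a type, not data, so it occupies no memory.
fn wrapper_size() -> usize {
    size_of::<Wrapper<V2>>()
}
```

Here `Wrapper::<V2>::new().to_string()` yields the constant of `V2`, which is the same route the visitor takes to reach `T::EXPECTED_VALUE`.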

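In the implementation body of the trait we then simply provide the function visit_borrowed_str(..) that returns an empty tuple if it receives the correct value; otherwise it returns an error. One caveat: this method is only called when the deserializer can lend out a string borrowed straight from the input, as serde_json::from_str does for strings without escape sequences. A deserializer that produces a temporary or owned string calls visit_str(..) or visit_string(..) instead, and with their default implementations that ends in a type error, so implementing visit_str(..) would be the more general choice.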

Now what does the final implementation of my JsonRpc struct actually look like? The validator itself is just four lines of code:

struct JsonRpcVersionValidator;

impl Validator for JsonRpcVersionValidator {
    const EXPECTED_VALUE: &'static str = "2.0";
}

That’s what I’m talking about! That’s what clean code should look like! We have one point that statically controls the value, and it’s very re-usable. After we’ve defined the Validator, we can simply use it in the struct as follows:

#[derive(Serialize, Deserialize, Debug, Eq, PartialEq)]
struct JsonRpcValidated {
    #[serde(with = "JsonRpcVersionValidator", rename = "jsonrpc")]
    version: (),
    method: String,
    #[serde(skip_serializing_if = "Option::is_none")]
    params: Option<Value>,
    #[serde(skip_serializing_if = "Option::is_none")]
    id: Option<usize>,
}

And with that we have a struct that matches all the criteria I originally set out. It serializes correctly, it de-serializes what it should, and it rejects what it shouldn’t accept. It’s easy to use and reuse and doesn’t waste any space or computation time.

