fracpack
Psibase specifies a novel data-serialization format built for passing data between psibase services on a single node, between nodes, and between nodes and browsers. This is also the binary format of the objects stored in the database on any particular IP node.
Goals
- Extremely fast data packing and unpacking
- Reading without unpacking, to more efficiently handle large data.
- Forwards and backwards compatibility; it supports adding new optional fields to the end of structs and tuples, even when they are embedded in variable-length vectors, fixed-length arrays, optional, and other structs and tuples.
- Typical implementations shouldn't require external code generators
- Efficient compression when combined with the compression algorithm from Cap'n Proto
Psibase uses fracpack for all of its message formats and uses it for database storage. Wherever psibase uses binary data, it's in fracpack format. There is one slight deviation: cryptographic types, even though they are normally stored in fracpack format, use a different binary format when encoded as a string (e.g. PUB_K1_898DAWuc...
). Encoders and decoders take care of the extra conversion. e.g. when converting action arguments from json to binary, they decode the base-58 string, verify the checksum, then convert the resulting binary into fracpack format.
This document describes fracpack's binary format; it does not describe either C++'s or Rust's reflection, encoding, and decoding facilities.
Fixed-Size vs Variable-Size Objects
fracpack classifies objects into two categories: fixed size and variable size. The format for fixed-size objects is similar to what you'd get if you used memcpy, except subobjects, if any, are packed tightly without padding; fracpack doesn't align data. The format for variable-size objects is more complicated; see Variable-Size Objects.
Fixed-Size Objects
Numeric types
Numeric types are in twos-complement little-endian format. They are unaligned. This is the currently-supported set:
- Boolean: 1 byte; either 0 or 1
- Unsigned integer sizes (bits): 8, 16, 32, 64
- Signed integer sizes (bits): 8, 16, 32, 64
- Floating point: 32 and 64 bits; Intel format
- TODO: void; 0 bytes. This is for tagged-union alternatives which have no payload
Non-Extensible Fixed-Size Structs
A non-extensible struct is one that will never gain new fields in the future. Non-extensibility requires an explicit opt-in; fracpack considers structs extensible by default. fracpack's extensibility mechanism is unrelated to inheritance; fracpack doesn't model inheritance and its C++ implementation doesn't understand it.
A struct which is non-extensible and which contains only fixed-sized subobjects is itself fixed size. Its subobjects are packed in order without padding. A struct is variable size if it is extensible (the default) or if it contains any variable-sized data within it. Variable-size structs have a different encoding.
Fixed-Length Arrays of Fixed-Size Objects
A fixed-length array which contains fixed-size objects is itself fixed size. Its subobjects are packed in order without padding.
Variable-Size Objects
Variable-size objects contain two parts: a fixed-size (but sometimes growable) area and a variable-size area. Child objects live in one (or both) of these areas within the parent.
Variable-Size Structs
A struct is variable size if it is extensible (the default) or if it contains any variable-sized data within it. It has the following layout:
uint16_t fixed_size; // The amount of data in fixed_data
uint8_t[fixed_size] fixed_data; // Fixed-size objects and offsets
// to variable-size objects
uint8_t[] variable_data; // Variable-size inner objects
Fixed-size subobjects live entirely within fixed_data
. Variable-size subobjects have an Offset Pointer in fixed_data
which points into variable_data
. variable_data
MUST be encoded in order with no gaps. fixed_size
, when present, enables decoders to safely skip newly-added optional fields. fixed_size
is only present for extensible (the default) structs. Non-extensible structs don't have it since the fixed size is always known in that case.
Tuples
Tuples have the same encoding as structs, except they don't have a non-extensible option; tuples are always extensible, so they're always variable-size.
Unit (Rust)
Fracpack treats the unit type (()
) as an empty tuple. Be careful since this creates an inconsistency with serde_json
, which renders unit as null
instead of []
. See Empty
, below, for an alternative.
Unit Structs (Rust)
Fracpack doesn't support unit structs. See Empty
, below, for an alternative.
// Not supported
struct MyUnit;
Tuple Structs (Rust)
Fracpack encodes these structs as tuples. This matches serde_json
semantics. Beware of the special cases.
struct Empty(); // Encodes as an empty tuple
struct TupleOfOne((u32,)); // Encodes as a tuple
struct TupleOfTwo(u32, String); // Encodes as a tuple
struct TupleOfThree(u32, String, f64); // Encodes as a tuple
Special Cases (Rust)
Fracpack encodes these structs as their inner items instead of encoding them as a struct or tuple. This matches serde_json
semantics. This can result in either a fixed-size object or a variable-size object.
struct Single1(u32); // Encodes as a u32 (fixed size)
struct Single2(String, ); // Encodes as a String (variable size)
Offset Pointers
When a subobject is variable size, it has an Offset Pointer which points to the beginning of that subobject. Offset pointers are unsigned 32-bit integers which record the difference between the target address and the pointer's address. Values 0-3 point to within the offset pointer itself; they have special meaning:
0
: indicates a vector or string is empty1
: indicates an optional is empty2,3
: reserved for future use
The special values are a space-saving measure. e.g. suppose we have an std::optional<std::string>
. The 3 possible states, nullopt
, empty
, and !empty
, are encoded as 1
, 0
, or an offset. An empty container MUST NOT be encoded with an offset.
Fixed-Length Arrays of Variable-Size Objects
These have the following encoding:
uint8_t[fixed_size] fixed_data; // Offsets to Variable-size objects
uint8_t[] variable_data; // Variable-size inner objects
fixed_size
has a known size, so isn't recorded.
Vectors and Strings
Vectors and Strings have the following encoding:
uint32_t fixed_size; // The amount of data in fixed_data
uint8_t[fixed_size] fixed_data; // Fixed-size objects or offsets to
// Variable-size objects
uint8_t[] variable_data; // Variable-size objects
fixed_size
indirectly encodes the vector's length. If it's a vector of some type T
, and that type is fixed size, then then vector length is fixed_size / T's size
. If T
is variable size, then the vector length is fixed_size / 4
. Note that fixed_size
here is 32 bits instead of 16, which is used in structs.
Optionals
Optionals always use a variable-size encoding, whether their inner data is fixed-size or variable size. An optional represents an empty state by using 1
for its Offset Pointer. It represents non-empty as an Offset Pointer which points to the contained data. If the contained data is variable-size, then it already uses an offset pointer; Optional reuses this pointer to save space.
For optionals embedded within extensible structs and tuples:
- If the last
n
optional fields are empty, then they must be omitted fromfixed_data
.
TODO: Implement these rules in Rust. Verify them in C++.
Tagged Unions
Tagged Unions (std::variant
in C++ or enum
in Rust) have the following encoding:
uint8_t tag; // Identifies the alternative; 0 is first
uint32_t size; // The amount of data
uint8_t[size] data; // Selected inner object
Alternatives are sequentially-numbered starting from 0
. The tag must not be greater than 127.
Safety Checking
When deserializing an object of type T
- If any field that is part of
T
is malformed, validation MUST fail - if the data is not a valid serialization of any type that can be deserialized as
T
, validation SHOULD fail
The following is a non-exhaustive list of requirements to validate:
- bool values MUST be 0 or 1
- Unknown fixed data SHOULD be a multiple of 4 octets and each 4-octet group SHOULD be a valid offset pointer
- Every regular offset pointer MUST equal the end of the preceding object if it is known
- Every regular offset point MUST NOT be before the last known position even if the end of the preceding object is not known.
- Every offset pointer MUST be in bounds
- The size of a vector MUST be an exact multiple of the fixed size of each element
- An offset pointer to an empty vector MUST be represented as 0
- The size of a variant MUST be the same as the size of the inner object if the inner size is known
- The last field of a tuple or extensible struct MUST NOT be an empty optional
TODO: finish this list. Make sure both the C++ and Rust implementations' checkers enforce the rules.
Serialization Compatibility
An serialized object of type T
may be deserialized as type U
if any of the following are true:
T
is the same type asU
T
andU
are both non-extensible structs with the same number of members AND each member ofT
can be deserialized as the corresponding member ofU
T
andU
are both tuples or extensible structs with the same number of members AND each member ofT
can be deserialized as the corresponding member ofU
T
andU
are both tuples or extensible structs ANDT
has fewer members thanU
AND every member ofT
can be deserialized as the corresponding member ofU
AND every trailing member ofU
that does not have a corresponding member ofT
is optional. The trailing members ofU
will be set to empty.T
andU
are both tuples or extensible structs ANDT
has more members thanU
AND every member ofT
that has a corresponding member ofU
can be deserialized as the corresponding member ofU
AND every trailing member ofT
that does not have a corresponding member ofU
is optional. The trailing members ofT
will be dropped.T
andU
are unions ANDU
has at least as many alternatives asT
AND every alternative ofT
can be deserialized as the corresponding alternative ofU
T
andU
are both containers of the same kind AND the value type ofT
can be deserialized as the value type ofU
. The possible kinds of containers are optional, array, and vector.
Reference implementation
A reference implementation can be found here.