JSON Encoding/Decoding
A record is a Python dictionary that can be persisted as a JSON document in a database. Because the record is persisted as a JSON document, the dictionary can by default hold only valid JSON data types (string, number, object, array, boolean and null). Most notably, it cannot hold a Python date object, as in this example:
Record.create({"date": date(2020, 9, 7)})
The call above raises an error.
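The underlying limitation can be reproduced with Python's standard json module alone. This is only an illustration of why a date is not a valid JSON value; the exact exception raised by Record.create depends on the database layer and is not shown here.

```python
import json
from datetime import date

data = {"date": date(2020, 9, 7)}

try:
    json.dumps(data)
    serializable = True
except TypeError:
    # json.dumps raises TypeError for objects it cannot serialize.
    serializable = False

# Converting the date to an ISO 8601 string first makes the dictionary serializable.
encoded = json.dumps({"date": data["date"].isoformat()})
```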
Custom encoder/decoder
Invenio-Records supports customizing the JSON encoder/decoder, which is responsible for converting the dictionary to a JSON document and vice versa. This allows you to support non-JSON data types such as a Python date object.
First, let’s look at the encoder/decoder. An encoder is a simple class with two methods, encode() and decode():
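from datetime import date

class DateEncoder:
    def __init__(self, *keys):
        # Names of the top-level keys that hold date objects.
        self.keys = keys

    def encode(self, data):
        # Convert each date into an ISO 8601 string (in place).
        for k in self.keys:
            if k in data:
                data[k] = data[k].isoformat()

    def decode(self, data):
        # Parse each "YYYY-MM-DD" string back into a date (in place).
        for k in self.keys:
            if k in data:
                s = data[k]
                year, month, day = int(s[0:4]), int(s[5:7]), int(s[8:10])
                data[k] = date(year, month, day)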
The encode() method iterates over the keys and converts each Python date object into a string (a valid JSON data type) using the date's isoformat() method. The decode() method does the reverse: it parses the string and converts it back into a date object.
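A quick round trip shows the in-place behaviour of the two methods. The sketch below repeats the encoder so it runs on its own, and swaps the manual string slicing for date.fromisoformat (available since Python 3.7), which is a substitution made here for brevity rather than part of the original encoder. The sample data and key names are made up for illustration.

```python
from datetime import date


class DateEncoder:
    """Same in-place encoder idea, repeated so the example is self-contained."""

    def __init__(self, *keys):
        self.keys = keys

    def encode(self, data):
        for k in self.keys:
            if k in data:
                data[k] = data[k].isoformat()

    def decode(self, data):
        for k in self.keys:
            if k in data:
                # date.fromisoformat replaces the manual slicing of the string.
                data[k] = date.fromisoformat(data[k])


encoder = DateEncoder("pubdate")
data = {"title": "Example", "pubdate": date(2020, 9, 3)}

encoder.encode(data)             # mutates data in place
encoded_value = data["pubdate"]  # now an ISO 8601 string

encoder.decode(data)             # restores the date object
decoded_value = data["pubdate"]
```

Keys that are not listed in the encoder, such as "title" here, are left untouched in both directions.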
Using the encoder
Next, you can use the encoder by assigning it to a custom model class and using that model class in your custom record. This could look like below:
class MyRecordMetadata(db.Model, RecordMetadataBase):
    __tablename__ = 'myrecord_metadata'
    encoder = DateEncoder('pubdate')

class MyRecord(Record):
    model_cls = MyRecordMetadata
You can now create and get records with a Python date object, which will be properly encoded and decoded on the way to and from the database:
record = MyRecord.create({'pubdate': date(2020, 9, 3)})
record = MyRecord.get_record(record.id)
record['pubdate'] == date(2020, 9, 3)
JSONSchema validation
JSONSchema validation is done on the JSON encoded version of the record. This ensures that the schema validation is actually applied to the JSON document and not the Python dict representation of it, which would involve validating non-JSON data types.
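The order of operations can be sketched as follows. This is not the Invenio-Records implementation: validate_encoded and check_pubdate_is_string are hypothetical helpers, and the check is a plain-Python stand-in for a schema rule such as {"pubdate": {"type": "string"}}.

```python
from copy import deepcopy
from datetime import date


class DateEncoder:
    def __init__(self, *keys):
        self.keys = keys

    def encode(self, data):
        for k in self.keys:
            if k in data:
                data[k] = data[k].isoformat()


def check_pubdate_is_string(document):
    """Hypothetical stand-in for a JSONSchema 'type: string' rule on pubdate."""
    return isinstance(document.get("pubdate"), str)


def validate_encoded(record, encoder, check):
    """Encode a copy of the record, then run the check on the encoded copy."""
    document = deepcopy(record)
    encoder.encode(document)
    return check(document)


record = {"pubdate": date(2020, 9, 3)}

# The encoded copy passes, while the raw Python dictionary would not.
valid_encoded = validate_encoded(record, DateEncoder("pubdate"), check_pubdate_is_string)
valid_raw = check_pubdate_is_string(record)
```

Because the check runs on a copy, the record itself still holds the date object afterwards.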
Internals
It is important to realize that there exist two distinct representations of a record:
The Python dictionary - possibly holding complex Python data types.
The JSON encoded version of the Python dictionary - holding only JSON data types.
The Python dictionary is encoded to the JSON version only when we persist the record to the database. Similarly, we only decode the JSON version when we read the record from the database. This means that the two representations are not kept in sync.
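A small sketch of what "not kept in sync" means in practice. The to_json_version function is a hypothetical stand-in for the encode step that runs at persist time; it is not part of Invenio-Records.

```python
from copy import deepcopy
from datetime import date


def to_json_version(record):
    """Hypothetical stand-in for the encoding done when a record is persisted."""
    document = deepcopy(record)
    document["pubdate"] = document["pubdate"].isoformat()
    return document


record = {"pubdate": date(2020, 9, 3)}
stored = to_json_version(record)      # snapshot taken at "persist" time

record["pubdate"] = date(2021, 1, 1)  # later change to the Python dictionary

# The stored JSON version keeps the old value until the record is encoded again.
refreshed = to_json_version(record)
```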
You should only ever modify the Python dictionary. In simple terms that means:
# DON'T:
record.model.json['mykey']
record.model.json['mykey'] = ...
record.model.json = record
# DO:
record['mykey']
record['mykey'] = ...
If you touch record.model.json, you risk creating a binding between the Python dictionary and its JSON encoded version because of Python’s data model. For example, modifying a nested object in the Python dictionary will also update the JSON version, because both hold a reference to the same nested dict.
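The binding can be reproduced with plain dictionaries. In this sketch model_json is a hypothetical stand-in for record.model.json, and the shallow copy stands in for any assignment that leaves both sides pointing at the same nested objects.

```python
record = {"title": "Example", "authors": {"first": "Ada"}}

# Hypothetical stand-in for record.model.json; a shallow copy shares nested objects.
model_json = dict(record)

record["authors"]["first"] = "Grace"  # nested change through the Python dictionary
record["title"] = "Changed"           # top-level change

# Both dictionaries reference the very same nested dict.
shared_nested = model_json["authors"] is record["authors"]
```

The top-level change stays local to record, but the nested change is visible through model_json as well, which is exactly the kind of accidental coupling to avoid.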