Invenio-Records¶
Invenio-Records is a metadata storage module. A record is a JSON document with revision history, identified by a unique UUID.
Features:
Generic JSON document storage with revision history.
JSONSchema validation of documents.
Records creation, update and deletion.
Administration interface for CRUD operations on records.
Further documentation is available at https://invenio-records.readthedocs.io/
User’s Guide¶
This part of the documentation will show you how to get started in using Invenio-Records.
Installation¶
Invenio-Records can be installed from PyPI. Several installation options are available; for example, to use the SQLite database backend:
pip install invenio-records[sqlite]
The other installation [options] include:
access: for access control capabilities;
docs: for documentation building dependencies;
mysql: to use MySQL database backend;
postgresql: to use PostgreSQL database backend;
sqlite: to use SQLite database backend;
admin: for Flask administration interfaces;
tests: for test dependencies.
Usage¶
Invenio-Records is a metadata storage module.
In short, a record is a structured collection of fields and values (metadata) that provides information about other data.
A record (and each of its revisions) is identified by a unique UUID, like most other entities in Invenio.
Invenio-Records is a core component of Invenio and provides a way to create, update and delete records. Records are versioned, to keep track of modifications and to allow reverting to a specific revision.
When creating or updating a record, if the record contains a schema definition, the record data will be validated against its schema. Moreover, the data format of each field can also be validated.
When deleting a record, two options are available:
soft deletion: the record is deleted but its identifier and history are kept, to ensure that the same record identifier cannot be reused and that older revisions can still be retrieved.
hard deletion: the record is completely deleted together with its history.
Record creation and update can be validated if a schema is provided.
Initialization¶
Create a Flask application:
>>> import os
>>> db_url = os.environ.get('SQLALCHEMY_DATABASE_URI', 'sqlite://')
>>> from flask import Flask
>>> app = Flask('myapp')
>>> app.config.update({
... 'SQLALCHEMY_DATABASE_URI': db_url,
... 'SQLALCHEMY_TRACK_MODIFICATIONS': False,
... })
Initialize Invenio-Records dependencies and Invenio-Records itself:
>>> from invenio_db import InvenioDB
>>> ext_db = InvenioDB(app)
>>> from invenio_records import InvenioRecords
>>> ext_records = InvenioRecords(app)
The following examples need to run in a Flask application context, so let’s push one:
>>> app.app_context().push()
Also, for the examples to work we need to create the database and tables (note, in this example we use an in-memory SQLite database by default):
>>> from invenio_db import db
>>> db.create_all()
CRUD operations¶
Creation¶
Let’s create a very simple record:
>>> from invenio_records import Record
>>> record = Record.create({"title": "The title of the record"})
>>> db.session.commit()
>>> assert record.revision_id == 0
A new row has been added to the database, in the table records_metadata: this corresponds to the record metadata, first version (version 1).
Update¶
Let’s try to update the previously created record with new data. This will create a new version of the previous record with the same uuid but an incremented version/revision id.
Update the record and commit the changes to apply them to the record:
>>> record['title'] = 'The title of the 2nd version of the record'
>>> record = record.commit() # validate new data and store changes
>>> db.session.commit()
>>> assert record.revision_id == 1
A second row has been added, version 2. You can access the different versions as follows:
>>> rec_v1 = record.revisions[0]
>>> rec_v2 = record.revisions[1]
Reverting¶
To restore the first version of the record, revert to revision 0:
>>> record = record.revert(0)
>>> db.session.commit()
>>> assert record.revision_id == 2
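The revision numbering above can be pictured with a small in-memory model. This is only an illustrative sketch using the standard library: `ToyRecord` and its methods are hypothetical names, not the Invenio-Records implementation. It shows why reverting to revision 0 yields revision id 2: in this model a revert stores the old data as a new revision instead of discarding history.

```python
import copy


class ToyRecord(dict):
    """Hypothetical in-memory stand-in for a versioned record."""

    def __init__(self, data):
        super().__init__(data)
        # Each commit appends a snapshot; the list index is the revision id.
        self.revisions = [copy.deepcopy(dict(self))]

    @property
    def revision_id(self):
        return len(self.revisions) - 1

    def commit(self):
        """Store the current data as a new revision."""
        self.revisions.append(copy.deepcopy(dict(self)))
        return self

    def revert(self, revision_id):
        """Restore an old snapshot and store it as a new revision."""
        snapshot = copy.deepcopy(self.revisions[revision_id])
        self.clear()
        self.update(snapshot)
        return self.commit()


toy = ToyRecord({"title": "The title of the record"})
toy["title"] = "The title of the 2nd version of the record"
toy.commit()      # revision id 1
toy.revert(0)     # revision id 2, with the data of revision 0
```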
Patch¶
It is also possible to patch a record to perform multiple operations in one shot:
>>> record = Record.create({"title": "First title"})
>>> db.session.commit()
>>> assert len(record.revisions) == 1
>>> ops = [
... {"op": "replace", "path": "/title", "value": "Title first record"},
... {"op": "add", "path": "/description", "value": "Record description"}
... ]
>>> record = record.patch(ops)
>>> record = record.commit()
>>> db.session.commit()
>>> assert len(record.revisions) == 2
See the JSON Patch documentation for more examples.
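To make the meaning of the two operations concrete, here is a minimal standard-library sketch that applies `replace` and `add` to top-level keys of a plain dict. The helper `apply_simple_patch` is a hypothetical name introduced for illustration; it makes no claim about how Invenio-Records applies patches internally and covers only a small part of JSON Patch (RFC 6902).

```python
import copy


def apply_simple_patch(document, operations):
    """Apply 'add' and 'replace' operations to top-level keys only.

    Hypothetical helper for illustration; a full JSON Patch implementation
    also handles nested paths, arrays and the other operation types.
    """
    result = copy.deepcopy(document)
    for operation in operations:
        key = operation["path"].lstrip("/")
        if operation["op"] == "replace":
            # 'replace' requires the target to exist already.
            if key not in result:
                raise KeyError(f"cannot replace missing key: {key}")
            result[key] = operation["value"]
        elif operation["op"] == "add":
            result[key] = operation["value"]
        else:
            raise ValueError(f"unsupported operation: {operation['op']}")
    return result


original = {"title": "First title"}
ops = [
    {"op": "replace", "path": "/title", "value": "Title first record"},
    {"op": "add", "path": "/description", "value": "Record description"},
]
patched = apply_simple_patch(original, ops)
```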
Deletion¶
Let’s create another record and then soft delete it:
>>> record = Record.create({"title": "Record to be deleted"})
>>> db.session.commit()
>>> record['title'] = 'Record to be deleted version 2'
>>> record = record.commit()
>>> db.session.commit()
>>> deleted = record.delete()
There is only one row left in the database corresponding to this record. Notice that the json column is empty, but the uuid is still there. This ensures uniqueness.
The record can be retrieved by doing:
>>> deleted = Record.get_record(record.id, with_deleted=True)
>>> assert deleted.id == record.id
Let’s hard delete it, completely:
>>> deleted = record.delete(force=True)
Now, trying to retrieve it raises an exception:
>>> Record.get_record(record.id,
... with_deleted=True)
Traceback (most recent call last):
...
NoResultFound: No row was found for one()
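The difference between the two deletion modes can be summarised with a toy in-memory table. This is a sketch under stated assumptions: `ToyStore` and its methods are hypothetical, and the exceptions raised here are plain Python ones rather than the database exception shown above.

```python
import uuid


class ToyStore:
    """Hypothetical in-memory table keyed by record UUID."""

    def __init__(self):
        self.rows = {}

    def create(self, data):
        record_id = uuid.uuid4()
        self.rows[record_id] = dict(data)
        return record_id

    def delete(self, record_id, force=False):
        if force:
            # Hard deletion: the row disappears entirely.
            del self.rows[record_id]
        else:
            # Soft deletion: keep the identifier, drop the metadata.
            self.rows[record_id] = None

    def get(self, record_id, with_deleted=False):
        data = self.rows[record_id]  # KeyError once hard-deleted
        if data is None and not with_deleted:
            raise LookupError("record is soft-deleted")
        return data


store = ToyStore()
record_id = store.create({"title": "Record to be deleted"})
store.delete(record_id)
soft_deleted_visible = record_id in store.rows   # identifier is kept
store.delete(record_id, force=True)
hard_deleted_visible = record_id in store.rows   # identifier is gone
```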
Record validation¶
When creating or updating a record, the input data can be validated to ensure that it conforms to a specified schema and that value formats are respected. The validation is provided by the jsonschema library.
How jsonschema works¶
Format checker: create a custom format checker (or use one of the available ones), for example to validate that the first letter of a string is uppercase:
>>> from jsonschema import FormatChecker
>>> from jsonschema.validators import Draft4Validator
>>> checker = FormatChecker()
>>> f = checker.checks("uppercaseFirstLetter")(lambda value: value[0]
...     .isupper())
>>> validator = Draft4Validator({"format": "uppercaseFirstLetter"},
...     format_checker=checker)
Now, let’s try it out:
>>> validator.validate("Title of the record")
This does not raise any exception because the data is valid: the first letter is uppercase.
>>> validator.validate(
...     "title of the record")
Traceback (most recent call last):
...
ValidationError: 'title of the record' is not a 'uppercaseFirstLetter'
...
This raises a ValidationError exception, because the first letter is lowercase.
Schema validator: create a validator to ensure that the input data structure, fields and types conform to a specific schema.
>>> schema = {
...     'type': 'object',
...     'properties': {
...         'title': { 'type': 'string' },
...         'description': { 'type': 'string' }
...     },
...     'required': ['title']
... }
Try to validate a record without the field title, which is required.
>>> from jsonschema.validators import validate
>>> record = {"description": "Description but no title"}
>>> validate(record, schema)
Traceback (most recent call last):
...
ValidationError: 'title' is a required property
...
If the JSON schema is not defined inside the JSON itself, as in the example, but is defined somewhere else (e.g. by a schema provider service), the record should contain the $ref field with the URI of the schema definition. Record provides a method api.RecordBase.replace_refs() that will resolve the URI in the $ref field and return a new Record with the schema definition injected.
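The idea of resolving a `$ref` can be sketched with plain dictionaries. Everything named here is an assumption made for illustration: `SCHEMA_STORE`, the example URI and the function `replace_refs_sketch` are hypothetical, and the real method may resolve references differently (for instance lazily or over the network).

```python
import copy

# Hypothetical schema store mapping URIs to schema definitions.
SCHEMA_STORE = {
    "https://example.org/schemas/title.json": {"type": "string"},
}


def replace_refs_sketch(node, store):
    """Return a copy of ``node`` with each {"$ref": uri} object replaced."""
    if isinstance(node, dict):
        if set(node) == {"$ref"}:
            # Inject a copy of the referenced definition.
            return copy.deepcopy(store[node["$ref"]])
        return {
            key: replace_refs_sketch(value, store)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [replace_refs_sketch(item, store) for item in node]
    return node


document = {
    "properties": {
        "title": {"$ref": "https://example.org/schemas/title.json"}
    }
}
resolved = replace_refs_sketch(document, SCHEMA_STORE)
```

As in the description above, the input is left untouched and a new structure with the definition injected is returned.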
Invenio-Records validation¶
Let’s put everything together and create a record with validation and format checking: define a schema with a mandatory title field and a validation format for the title field.
>>> from jsonschema import FormatChecker
>>> checker = FormatChecker()
>>> f = checker.checks("uppercaseFirstLetter")(lambda value: value[0]
... .isupper())
>>> schema = {
... 'type':'object',
... 'properties': {
... 'title': {
... 'type':'string',
... 'format': 'uppercaseFirstLetter'
... },
... 'description': {
... 'type':'string'
... }
... },
... 'required': ['title']
... }
Create a new record with an invalid value format for the title field. Notice that the schema must be defined in the record with the field $schema and the format checker must be passed as a keyword argument with the key format_checker, to be taken into account by the jsonschema library.
>>> record = {
... "$schema": schema,
... "title": "title of this record", # first letter is lowercase
... "description": "Description of this record"
... }
>>> rec = Record.create(record,
... format_checker=checker)
Traceback (most recent call last):
...
ValidationError: 'title of this record' is not a 'uppercaseFirstLetter'
...
Create a new record without the title field:
>>> record = {
... "$schema": schema,
... "description": "Description of this record without a title"
... }
>>> rec = Record.create(record,
... format_checker=checker)
Traceback (most recent call last):
...
ValidationError: 'title' is a required property
...
Signals¶
Invenio-Records provides several signals that can be used to react to events, for example to read or modify data before or after an operation.
Events are sent in case of:
record creation, before and after
record update, before and after
record deletion, before and after
record revert, before and after
Let’s modify the record before creation and verify, after creation, that the record has been correctly modified:
>>> from invenio_records.signals import (before_record_insert, \
... after_record_insert)
>>> def before_record_creation_add_flag(sender, *args, **kwargs):
...     record = kwargs['record']
...     record['created_with'] = 'Invenio'
...
>>> listener = before_record_insert.connect(before_record_creation_add_flag)
>>> def after_record_creation(sender, *args, **kwargs):
...     record = kwargs['record']
...     assert 'created_with' in record
...
>>> listener = after_record_insert.connect(after_record_creation)
>>> rec_events = Record.create({"title": "My new record"})
>>> db.session.commit()
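The ordering of the two signals matters: a listener connected to a "before" signal runs before the data is stored, while a listener connected to an "after" signal runs once it has already been stored. The following standard-library sketch illustrates that ordering only. `ToySignal`, `create` and the `database` list are hypothetical stand-ins, not the blinker or Invenio-Records API.

```python
import copy


class ToySignal:
    """Hypothetical minimal signal: a list of listener callables."""

    def __init__(self):
        self.listeners = []

    def connect(self, listener):
        self.listeners.append(listener)
        return listener

    def send(self, sender, **kwargs):
        for listener in self.listeners:
            listener(sender, **kwargs)


before_insert = ToySignal()
after_insert = ToySignal()
database = []  # stands in for the records table


def create(data):
    record = dict(data)
    before_insert.send(None, record=record)
    database.append(copy.deepcopy(record))  # snapshot that gets persisted
    after_insert.send(None, record=record)
    return record


def add_flag(sender, **kwargs):
    kwargs["record"]["created_with"] = "Invenio"


def too_late(sender, **kwargs):
    # Changes the in-memory object only; the stored snapshot is unaffected.
    kwargs["record"]["added_after"] = True


before_insert.connect(add_flag)
after_insert.connect(too_late)
created = create({"title": "My new record"})
stored = database[0]
```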
See API Docs for extensive API documentation.
API Reference¶
If you are looking for information on a specific function, class or method, this part of the documentation is for you.
API Docs¶
Invenio module for metadata storage.
- class invenio_records.ext.InvenioRecords(app=None, **kwargs)[source]¶
Invenio-Records extension.
Extension initialization.
Record API¶
Record API.
- class invenio_records.api.Record(data, model=None, **kwargs)[source]¶
Define API for metadata creation and manipulation.
Initialize instance with dictionary data and SQLAlchemy model.
- Parameters
data – Dict with record metadata.
model – RecordMetadata instance.
- commit(format_checker=None, validator=None, **kwargs)[source]¶
Store changes of the current record instance in the database.
Send a signal invenio_records.signals.before_record_update with the current record to be committed as parameter.
Validate the current record data.
Commit the current record in the database.
Send a signal invenio_records.signals.after_record_update with the committed record as parameter.
- Keyword Arguments
format_checker – An instance of the class jsonschema.FormatChecker, which contains validation rules for formats. See validate() for more details.
validator – A jsonschema.protocols.Validator class that will be used to validate the record. See validate() for more details.
- Returns
The Record instance.
- classmethod create(data, id_=None, **kwargs)[source]¶
Create a new record instance and store it in the database.
Send a signal invenio_records.signals.before_record_insert with the new record as parameter.
Validate the new record data.
Add the new record in the database.
Send a signal invenio_records.signals.after_record_insert with the newly created record as parameter.
- Keyword Arguments
format_checker – An instance of the class jsonschema.FormatChecker, which contains validation rules for formats. See validate() for more details.
validator – A jsonschema.protocols.Validator class that will be used to validate the record. See validate() for more details.
- Parameters
data – Dict with the record metadata.
id_ – Specify a UUID to use for the new record, instead of an automatically generated one.
- Returns
A new Record instance.
- delete(force=False)[source]¶
Delete a record.
If force is False, the record is soft-deleted: record data will be deleted but the record identifier and the history of the record will be kept. This ensures that the same record identifier cannot be used twice, and that you can still retrieve its history. If force is True, then the record is completely deleted from the database.
Send a signal invenio_records.signals.before_record_delete with the current record as parameter.
Delete or soft-delete the current record.
Send a signal invenio_records.signals.after_record_delete with the current deleted record as parameter.
- Parameters
force – if True, completely deletes the current record from the database, otherwise soft-deletes it.
- Returns
The deleted Record instance.
- classmethod get_record(id_, with_deleted=False)[source]¶
Retrieve the record by id.
Raise a database exception if the record does not exist.
- Parameters
id_ – Record ID.
with_deleted – If True then it includes deleted records.
- Returns
The Record instance.
- classmethod get_records(ids, with_deleted=False)[source]¶
Retrieve multiple records by id.
- Parameters
ids – List of record IDs.
with_deleted – If True then it includes deleted records.
- Returns
A list of Record instances.
- patch(patch)[source]¶
Patch record metadata.
- Parameters
patch – Dictionary of record metadata.
- Returns
A new Record instance.
- revert(revision_id)[source]¶
Revert the record to a specific revision.
Send a signal invenio_records.signals.before_record_revert with the current record as parameter.
Revert the record to the revision id passed as parameter.
Send a signal invenio_records.signals.after_record_revert with the reverted record as parameter.
- Parameters
revision_id – Specify the record revision id
- Returns
The Record instance corresponding to the revision id
- property revisions¶
Get revisions iterator.
- send_signals = True¶
Class-level attribute to control if signals should be sent.
- class invenio_records.api.RecordBase(data, model=None, **kwargs)[source]¶
Base class for Record and RecordRevision to share common features.
Initialize instance with dictionary data and SQLAlchemy model.
- Parameters
data – Dict with record metadata.
model – RecordMetadata instance.
- clear_none(key=None)[source]¶
Helper method to clear None, empty dict and list values.
Modifications are done in place.
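As an illustration of what clearing "None, empty dict and list values" in place can mean, here is a standalone sketch. The function `clear_none_sketch` is a hypothetical helper written for this example; the exact rules applied by the real method (for example how nested empty containers cascade) may differ.

```python
def _is_empty(value):
    """Return True for None, an empty dict or an empty list."""
    return value is None or value == {} or value == []


def clear_none_sketch(value):
    """Recursively drop empty values, modifying ``value`` in place."""
    if isinstance(value, dict):
        for key in list(value):  # copy the keys: we delete while looping
            clear_none_sketch(value[key])
            if _is_empty(value[key]):
                del value[key]
    elif isinstance(value, list):
        for item in value:
            clear_none_sketch(item)
        # Slice assignment keeps the same list object (in-place change).
        value[:] = [item for item in value if not _is_empty(item)]
    return value


data = {
    "title": "Kept",
    "subtitle": None,
    "authors": [],
    "imprint": {"place": None},
    "keywords": ["a", None, {}],
}
clear_none_sketch(data)
```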
- property created¶
Get creation timestamp.
- dumper = <invenio_records.dumpers.base.Dumper object>¶
Class-level attribute to specify the default data dumper/loader.
For backward compatibility the dumper used here just produces a deep copy of the record.
- dumps(dumper=None)[source]¶
Make a dump of the record (defaults to a deep copy of the dict).
This method produces a version of a record that can be persisted on storage such as the database, Elasticsearch or other mediums depending on the dumper class used.
- Parameters
dumper – Dumper to use when dumping the record.
- Returns
A dict.
- enable_jsonref = True¶
Class-level attribute to control if JSONRef replacement is supported.
- format_checker = None¶
Class-level attribute to specify a default JSONSchema format checker.
- property id¶
Get model identifier.
- property is_deleted¶
Get the flag indicating whether the record is soft deleted.
- classmethod loads(data, loader=None)[source]¶
Load a record dump.
- Parameters
loader – Loader class to use when loading the record.
- Returns
A new Record instance.
- model_cls¶
- property revision_id¶
Get revision identifier.
- property updated¶
Get last updated timestamp.
- validate(format_checker=None, validator=None, **kwargs)[source]¶
Validate record according to schema defined in $schema key.
- Keyword Arguments
format_checker – A format_checker is an instance of class jsonschema.FormatChecker containing business logic to validate arbitrary formats. For example:
>>> from jsonschema import FormatChecker
>>> from jsonschema.validators import validate
>>> checker = FormatChecker()
>>> checker.checks('foo')(lambda el: el.startswith('foo'))
<function <lambda> at ...>
>>> validate('foo', {'format': 'foo'}, format_checker=checker)
returns None, which means that the validation was successful, while
>>> validate('bar', {'format': 'foo'},
...     format_checker=checker)
Traceback (most recent call last):
...
ValidationError: 'bar' is not a 'foo'
...
raises a jsonschema.exceptions.ValidationError.
validator – A jsonschema.protocols.Validator class used for record validation. It will be used as cls argument when calling jsonschema.validate(). For example
>>> from jsonschema.validators import extend, Draft4Validator
>>> NoRequiredValidator = extend(
...     Draft4Validator,
...     validators={'required': lambda v, r, i, s: None}
... )
>>> schema = {
...     'type': 'object',
...     'properties': {
...         'name': { 'type': 'string' },
...         'email': { 'type': 'string' },
...         'address': {'type': 'string' },
...         'telephone': { 'type': 'string' }
...     },
...     'required': ['name', 'email']
... }
>>> from jsonschema.validators import validate
>>> validate({}, schema, NoRequiredValidator)
returns None, which means that the validation was successful, while
>>> validate({}, schema)
Traceback (most recent call last):
...
ValidationError: 'name' is a required property
...
raises a jsonschema.exceptions.ValidationError.
- validator = None¶
Class-level attribute to specify a JSONSchema validator class.
Configuration¶
Default values for records configuration.
- invenio_records.config.RECORDS_REFRESOLVER_CLS = None¶
Custom JSONSchemas ref resolver class.
Note that when using a custom ref resolver class you should also set
RECORDS_REFRESOLVER_STORE
to point to a JSONSchema ref resolver store.
- invenio_records.config.RECORDS_REFRESOLVER_STORE = None¶
JSONSchemas ref resolver store.
Used together with
RECORDS_REFRESOLVER_CLS
to provide a specific ref resolver store.
- invenio_records.config.RECORDS_VALIDATION_TYPES = {}¶
Pass additional types when validating a record against a schema. For more details, see: https://python-jsonschema.readthedocs.io/en/latest/validate/#validating-types.
Errors¶
Errors for Invenio-Records module.
- exception invenio_records.errors.MissingModelError[source]¶
Error raised when a record has no model.
Models¶
Record models.
- class invenio_records.models.RecordMetadata(**kwargs)[source]¶
Represent a record metadata.
A simple constructor that allows initialization from kwargs.
Sets attributes on the constructed instance using the names and values in kwargs.
Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.
- created¶
- id¶
Record identifier.
- json¶
Store metadata in JSON format.
When you create a new Record the json field value should never be NULL. Default value is an empty dict. NULL value means that the record metadata has been deleted.
- updated¶
- version_id¶
Used by SQLAlchemy for optimistic concurrency control.
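Optimistic concurrency control with a version counter can be sketched without a database. The classes below are hypothetical and only model the idea: a write succeeds only if the version read earlier is still the current one. The exception name mirrors the one SQLAlchemy uses for this situation, but it is defined locally here.

```python
class StaleDataError(Exception):
    """Raised when the row changed since it was read (local stand-in)."""


class ToyTable:
    """Hypothetical single-row table with a version counter."""

    def __init__(self):
        self.row = {"json": {"title": "v1"}, "version_id": 1}

    def read(self):
        return dict(self.row)

    def update(self, new_json, expected_version):
        # Similar in spirit to: UPDATE ... WHERE version_id = :expected
        if self.row["version_id"] != expected_version:
            raise StaleDataError("row was modified by another writer")
        self.row = {"json": new_json, "version_id": expected_version + 1}


table = ToyTable()
first = table.read()    # both writers read version 1
second = table.read()
table.update({"title": "from first writer"}, first["version_id"])
try:
    table.update({"title": "from second writer"}, second["version_id"])
    conflict_detected = False
except StaleDataError:
    conflict_detected = True
```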
- class invenio_records.models.RecordMetadataBase(data=None, **kwargs)[source]¶
Represent a base class for record metadata.
The RecordMetadata object contains a created and an updated property that are automatically updated.
Initialize the model specifically by setting the.
- property data¶
Get data by decoding the JSON.
This allows a subclass to override
- encoder = None¶
Class-level attribute to set a JSON data encoder/decoder.
This allows you to e.g. convert specific entries to complex Python objects. For instance you could convert ISO-formatted datetime strings into Python datetime objects.
- id = Column(None, UUIDType(), table=None, primary_key=True, nullable=False, default=ColumnDefault(<function uuid4>))¶
Record identifier.
- is_deleted¶
Boolean flag to determine if a record is soft deleted.
- json = Column(None, Variant(), table=None, default=ColumnDefault(<function RecordMetadataBase.<lambda>>))¶
Store metadata in JSON format.
When you create a new Record the json field value should never be NULL. Default value is an empty dict. NULL value means that the record metadata has been deleted.
- version_id = Column(None, Integer(), table=None, nullable=False)¶
Used by SQLAlchemy for optimistic concurrency control.
Signals¶
Record module signals.
- invenio_records.signals.after_record_delete = <blinker.base.NamedSignal object at 0x7f1b86392350; 'after-record-delete'>¶
Signal sent after a record is deleted.
When implementing the event listener, the record data can be retrieved from kwargs['record'].
Note
Do not perform any modification to the record here: they will not be persisted.
- invenio_records.signals.after_record_insert = <blinker.base.NamedSignal object at 0x7f1b864a8810; 'after-record-insert'>¶
Signal sent after a record is inserted.
When implementing the event listener, the record data can be retrieved from kwargs['record'].
Note
Do not perform any modification to the record here: they will not be persisted.
- invenio_records.signals.after_record_revert = <blinker.base.NamedSignal object at 0x7f1b86392310; 'after-record-revert'>¶
Signal sent after a record is reverted.
When implementing the event listener, the record data can be retrieved from kwargs['record'].
Note
Do not perform any modification to the record here: they will not be persisted.
- invenio_records.signals.after_record_update = <blinker.base.NamedSignal object at 0x7f1b86392450; 'after-record-update'>¶
Signal sent after a record is updated.
When implementing the event listener, the record data can be retrieved from kwargs['record'].
Note
Do not perform any modification to the record here: they will not be persisted.
- invenio_records.signals.before_record_delete = <blinker.base.NamedSignal object at 0x7f1b86392490; 'before-record-delete'>¶
Signal is sent before a record is deleted.
When implementing the event listener, the record data can be retrieved from kwargs['record'].
- invenio_records.signals.before_record_insert = <blinker.base.NamedSignal object at 0x7f1b864465d0; 'before-record-insert'>¶
Signal is sent before a record is inserted.
When implementing the event listener, the record data can be retrieved from kwargs['record']. Example event listener (subscriber) implementation:
def listener(sender, *args, **kwargs):
    record = kwargs['record']
    # do something with the record

from invenio_records.signals import before_record_insert
before_record_insert.connect(listener)
- invenio_records.signals.before_record_revert = <blinker.base.NamedSignal object at 0x7f1b863922d0; 'before-record-revert'>¶
Signal is sent before a record is reverted.
When implementing the event listener, the record data can be retrieved from kwargs['record'].
- invenio_records.signals.before_record_update = <blinker.base.NamedSignal object at 0x7f1b86392390; 'before-record-update'>¶
Signal is sent before a record is updated.
When implementing the event listener, the record data can be retrieved from kwargs['record'].
Dumpers/Loaders¶
Dumpers used for producing versions of records that can be persisted.
A simple example
You can for instance use a dumper to produce the body of the document to be indexed for Elasticsearch:
dump = Record({...}).dumps(dumper=ElasticsearchDumper())
A dump can be loaded by the dumper as well:
record = Record.loads(dump, loader=ElasticsearchDumper())
Data harmonization
Invenio can read records from the database, Elasticsearch and data files. The master copy is always the database, however for performance reasons, it’s not efficient to always use the master version of a record. For instance, during searches it would come with a big performance impact if we had to read each record in a search result from the database.
The problem is however that a secondary copy of a record (e.g. in the search index) is not identical to the master copy. For instance, usage statistics may have been cached in the Elasticsearch version whereas we don’t persist it in the database. This is again for performance reasons and allows e.g. also having a “sort by most viewed” while not overloading the database with usage statistics updates.
Because the master and secondary copies might not be identical, other Invenio modules would have to “massage” the record depending on where it comes from. Overall, this eventually leads to a confusing data flow in the application.
Dumpers fix this issue by harmonizing data access to a record from multiple different data sources. This way, other Invenio modules always have a standardized version of a record independently of where it was loaded from.
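The harmonization idea can be sketched with plain dictionaries: the index copy carries extra cached data, and loading strips it again so that callers see the same record regardless of its source. `ToyDumper`, `INDEX_ONLY_KEYS` and the `view_count` key are assumptions made for this example, and the method signatures are simplified compared with the Dumper interface documented below.

```python
import copy

# Keys that exist only in the secondary (search index) copy in this sketch.
INDEX_ONLY_KEYS = {"view_count"}


class ToyDumper:
    """Hypothetical dumper working on plain dictionaries."""

    def dump(self, record, data):
        """Build the index document from the master copy."""
        data.update(copy.deepcopy(record))  # deep copy protects the record
        data.setdefault("view_count", 0)
        return data

    def load(self, data):
        """Rebuild the record from an index document."""
        return {
            key: copy.deepcopy(value)
            for key, value in data.items()
            if key not in INDEX_ONLY_KEYS
        }


master = {"title": "My record", "authors": ["A. Author"]}
dumper = ToyDumper()
index_document = dumper.dump(master, {})
index_document["view_count"] = 42  # statistics cached only in the index
loaded = dumper.load(index_document)
```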
- class invenio_records.dumpers.Dumper[source]¶
Interface for dumpers.
- dump(record, data)[source]¶
Dump a record that can be used as a source document for Elasticsearch.
The job of this method is to create a Python dictionary from the record provided in the argument.
If you overwrite this method without calling super, then you should ensure that you make a deep copy of the record dictionary, so that changes to the dump do not affect the record.
- Parameters
record – The record to dump.
data – The initial dump data passed in by record.dumps().
- load(data, record_cls)[source]¶
Load a record from the source document of an Elasticsearch hit.
The job of this method is to create a record of type record_cls based on the input data.
- Parameters
data – A Python dictionary representing the data to load.
record_cls – The record class to be constructed.
- Returns
An instance of record_cls.
- class invenio_records.dumpers.ElasticsearchDumper(extensions=None, model_fields=None)[source]¶
Elasticsearch source dumper.
- dump(record, data)[source]¶
Dump a record.
The method adds the following keys (if the record has an associated model):
uuid - UUID of the record.
version_id - the revision id of the record.
created - Creation timestamp in UTC.
updated - Modification timestamp in UTC.
- load(dump_data, record_cls)[source]¶
Load a record from an Elasticsearch document source.
The method reverses the changes made during the dump. If a model was associated, a model will also be initialized.
Warning
The model is not added to the SQLAlchemy session. If you plan on using the model, you must merge it into the session using e.g.:
db.session.merge(record.model)
Extensions¶
Extensions allow integration of features into a record class.
For instance, the system fields feature is built as an extension.
- class invenio_records.extensions.ExtensionMixin[source]¶
Defines the methods needed by an extension.
- pre_init(record, data, model=None, **kwargs)[source]¶
Called when a new record instance is initialized.
Called when a new record is instantiated (i.e. during all Record({...})). This means it’s also called when e.g. a record is created via Record.create().
- Parameters
data – The dict passed to the record’s constructor.
model – The model class used for initialization.
System Fields¶
System fields provide managed access to the record’s dictionary.
A simple example
Take the following record subclass:
class MyRecord(Record, SystemFieldsMixin):
    test = ConstantField('mykey', 'myval')
The class defines a system field named test of the type ConstantField.
The constant field adds a key (mykey) to the record with the value myval when a record is created:
record = MyRecord({})
The key mykey is part of the record’s dictionary (i.e. you can do record['mykey'] to access the value):
record['mykey'] == 'myval'
The key can however also be accessed through the field (i.e. record.test):
record.test == 'myval'
System fields are thus a way to manage a subpart of a record, and they allow the field to hook into the record API. This is a powerful API that can be used to create fields which provide integration with related objects.
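The mechanism behind such a field can be approximated with a plain Python data descriptor. This sketch is not the Invenio-Records implementation: `ToyConstantField` and `ToyFieldRecord` are hypothetical, and the constructor below merely stands in for the extension hook that runs when a record is initialized.

```python
class ToyConstantField:
    """Hypothetical descriptor mimicking the idea of a constant field."""

    def __init__(self, key, value):
        self.key = key
        self.value = value

    def __get__(self, record, owner=None):
        if record is None:
            return self  # class access returns the field itself
        return record[self.key]  # instance access reads the record dict

    def __set__(self, record, value):
        raise AttributeError("constant fields cannot be reassigned")


class ToyFieldRecord(dict):
    test = ToyConstantField("mykey", "myval")

    def __init__(self, data):
        super().__init__(data)
        # Stand-in for the hook that runs when a record is initialized.
        field = type(self).test
        self.setdefault(field.key, field.value)


toy_record = ToyFieldRecord({})
```

Both access paths described above then return the same value: `toy_record['mykey']` through the dictionary and `toy_record.test` through the field.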
A more advanced example
Imagine the following record subclass using an imaginary PIDField:
class MyRecord(Record, SystemFieldsMixin):
    pid = PIDField(pid_type='recid', object_type='rec')
You could use this field to set a PID on the record:
record.pid = PersistentIdentifier(...)
Or, you could access the PID on a record you get from the database:
record = MyRecord.get_record()
record.pid # would return a PersistentIdentifier object.
The simple example only worked with the record itself. In the more advanced example here, the record is integrated with related objects.
Data access layer
System fields can do a lot, however you should see them as part of the data access layer. This means that they primarily simplify data access between records and related objects.
- class invenio_records.systemfields.ConstantField(key=None, value='')[source]¶
Constant fields add a constant value to a key in the record.
Initialize the field.
- Parameters
key – The key to set in the dictionary (dot notation supported for nested lookup).
value – The value to set for the key.
- class invenio_records.systemfields.DictField(key=None, clear_none=False, create_if_missing=True)[source]¶
Dictionary field.
Provides a shortcut for getting/setting a specific key on a record.
Initialise the dict field.
- Parameters
key – Key to set (dot notation supported).
clear_none – Boolean to control if empty/None values should be removed.
create_if_missing – If a subkey is missing it will be created if this option is set to true.
- __init__(key=None, clear_none=False, create_if_missing=True)[source]¶
Initialise the dict field.
- Parameters
key – Key to set (dot notation supported).
clear_none – Boolean to control if empty/None values should be removed.
create_if_missing – If a subkey is missing it will be created if this option is set to true.
- class invenio_records.systemfields.ModelField(model_field_name=None, dump=True, dump_key=None, dump_type=None, **kwargs)[source]¶
Model field for providing get and set access on a model field.
Initialize the field.
- Parameters
model_field_name – Name of field on the database model.
dump – Set to false to not dump the field.
dump_key – The dictionary key to use in dumps.
dump_type – The data type used to determine how to serialize the model field.
- __init__(model_field_name=None, dump=True, dump_key=None, dump_type=None, **kwargs)[source]¶
Initialize the field.
- Parameters
model_field_name – Name of field on the database model.
dump – Set to false to not dump the field.
dump_key – The dictionary key to use in dumps.
dump_type – The data type used to determine how to serialize the model field.
- property dump_key¶
The dictionary key to use in dump output.
Note, it’s up to the dumper to choose if it respects this name. The name defaults to the model field name.
- property dump_type¶
The data type used to determine how to serialize the model field.
Defaults to none, meaning the dumper will determine how to dump it.
- property model_field_name¶
The name of the SQLAlchemy field on the model.
Defaults to the attribute name used on the class.
- class invenio_records.systemfields.PKRelation(*args, record_cls=None, **kwargs)[source]¶
Primary-key relation type.
Initialize the PK relation.
- class invenio_records.systemfields.RelatedModelField(model, key=None, required=False, load=None, dump=None, context_cls=None)[source]¶
Related model system field.
Initialize the field.
- Parameters
model – Related SQLAlchemy model.
key – Name of key in the record to serialize the related object under.
required – Flag to determine if a related object is required on record commit time.
load – Callable to load the related object from a JSON object.
dump – Callable to dump the related object as a JSON object.
context_cls – The context class is used to provide additional methods on the field itself.
- __init__(model, key=None, required=False, load=None, dump=None, context_cls=None)[source]¶
Initialize the field.
- Parameters
model – Related SQLAlchemy model.
key – Name of key in the record to serialize the related object under.
required – Flag to determine if a related object is required on record commit time.
load – Callable to load the related object from a JSON object.
dump – Callable to dump the related object as a JSON object.
context_cls – The context class is used to provide additional methods on the field itself.
- obj(record)[source]¶
Get the related object.
Uses a cached object if it exists.
IMPORTANT: By default, if the object is loaded from the record JSON object instead of from the database model, it is NOT added to the database session. Thus, the related object will be in a transient state instead of a persistent state. This is useful, for instance, in search queries to avoid hitting the database; however, if you need to perform operations on it, you should add it to the session using:
Record.myattr.session_merge(record)
- class invenio_records.systemfields.RelatedModelFieldContext(field, record_cls)[source]¶
Context for RelatedModelField.
This class implements the class-level methods available on a RelatedModelField. I.e. when you access the field through the class, for instance:
Record.myattr.session_merge(record)
Initialise the field context.
- class invenio_records.systemfields.RelationsField(**fields)[source]¶
Relations field for connections to external entities.
Initialize the field.
- class invenio_records.systemfields.SystemField(key=None)[source]¶
Base class for all system fields.
A system field is a Python data descriptor set on a record class that can also hook into a record via the extensions API (e.g. on record creation, dumping etc.).
See ExtensionMixin for the full interface of methods that a field can override to hook into the record API.
Initialise the field.
- __get__(record, owner=None)[source]¶
Accessing the object attribute.
A subclass that overrides this method should handle two cases:
Class access - If record is None, the field is accessed through the class (e.g. Record.myfield). In this case a field or context should be returned. The purpose of the field context is to allow a field to know from which class it was accessed (as the field may be created on a super class).
Instance access - If record is not None, the field is accessed through an instance of the class (e.g. record.myfield).
A simple example is provided below:
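def __get__(self, record, owner=None):
    if record is None:
        # Class access: return the field itself
        # (or a context instead: SystemFieldContext(self, owner)).
        return self
    # Instance access: look up the value in the record.
    if 'mykey' in record:
        return record['mykey']
    return None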
- Parameters
record – The instance through which the field is being accessed, or None if the field is accessed through the class.
owner – The class which owns the field.
- __set__(record, value)[source]¶
Setting the attribute (instance access only).
This method only handles set operations from an instance (e.g. record.myfield = val). This is opposite to __get__(), which needs to handle both class and instance access.
- __set_name__(owner, name)[source]¶
Inject the class attribute name into the field.
This ensures that a field has access to the attribute name used on the class. In the following example, the attribute name schema would be set in the ConstantField object instance.
class MyRecord(Record, SystemFieldsMixin):
    schema = ConstantField(...)
- property attr_name¶
Property to access the assigned class attribute name.
- Returns
None if the field is not assigned, otherwise the class attribute name.
- get_dictkey(instance)[source]¶
Helper to use a lookup key to get a nested object.
Assumes the key has been set in self.key.
- property key¶
Property to access the dict key name.
Uses the attribute name if the key is not defined.
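The descriptor protocol described above can be illustrated with a minimal, framework-free sketch. DictKeyField and FakeRecord are hypothetical names used only for illustration; they are not part of Invenio-Records, and the real SystemField does more (for example, it hooks into the extensions API).

```python
class DictKeyField:
    """Toy data descriptor that reads/writes one key of a dict-like record."""

    def __init__(self, key=None):
        self._key = key
        self.attr_name = None

    def __set_name__(self, owner, name):
        # Python calls this once, when the owning class body is executed.
        self.attr_name = name

    @property
    def key(self):
        # Fall back to the attribute name if no explicit key was given.
        return self._key or self.attr_name

    def __get__(self, record, owner=None):
        if record is None:
            return self  # class access
        return record.get(self.key)  # instance access

    def __set__(self, record, value):
        record[self.key] = value  # instance access only


class FakeRecord(dict):
    title = DictKeyField()
    year = DictKeyField(key="publication_year")


rec = FakeRecord()
rec.title = "A title"
rec.year = 2020
```

In this sketch, assigning to rec.title writes into the underlying dictionary under the attribute name, while rec.year writes under the explicitly configured key.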
- class invenio_records.systemfields.SystemFieldContext(field, record_cls)[source]¶
Base class for a system field context.
A system field context is created once you access a field’s attribute on a class. As the system field may be defined on a super class, this context allows us to know from which class the field was accessed.
Normally you should subclass this class and implement on it the methods that require you to know the record class.
Initialise the field context.
- property field¶
Access the field to prevent it from being overwritten.
- property record_cls¶
Record class to prevent it from being overwritten.
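As a rough illustration of why a context is useful, the following plain-Python sketch returns a context object on class access so that a field defined on a base class can tell through which subclass it was reached. FieldContext, ToyField, BaseRecord and ChildRecord are hypothetical stand-ins, not the Invenio-Records classes.

```python
class FieldContext:
    """Remembers the field and the class it was accessed through."""

    def __init__(self, field, record_cls):
        self._field = field
        self._record_cls = record_cls

    @property
    def field(self):
        # Read-only, so the field cannot be overwritten by accident.
        return self._field

    @property
    def record_cls(self):
        # Read-only, so the record class cannot be overwritten by accident.
        return self._record_cls


class ToyField:
    def __get__(self, record, owner=None):
        if record is None:
            # Class access: wrap the field together with the accessing class.
            return FieldContext(self, owner)
        return record.get("mykey")


class BaseRecord(dict):
    myfield = ToyField()


class ChildRecord(BaseRecord):
    pass


ctx = ChildRecord.myfield
```

Here ctx.record_cls is ChildRecord even though the field was declared on BaseRecord, which is the information a class-level method would need.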
- class invenio_records.systemfields.SystemFieldsMeta(name, bases, attrs)[source]¶
Metaclass for a record class that integrates system fields.
Create a new record class.
- class invenio_records.systemfields.SystemFieldsMixin[source]¶
Mixin class for records that add system fields capabilities.
This class is primarily syntax sugar for being able to do:
class MyRecord(Record, SystemFieldsMixin):
    pass
instead of:
class MyRecord(Record, metaclass=SystemFieldsMeta):
    pass
There are subtle differences between the two approaches, though, mainly in which classes will execute the __new__() method on the metaclass.
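One such difference can be seen in a generic Python sketch. ToyMeta and ToyMixin below are hypothetical and only mimic the mixin-carries-the-metaclass pattern; they are not the Invenio-Records implementation, and the difference shown is not necessarily the only one the authors have in mind.

```python
class ToyMeta(type):
    """Metaclass that records the name of every class it creates."""

    created = []

    def __new__(mcs, name, bases, attrs):
        cls = super().__new__(mcs, name, bases, attrs)
        mcs.created.append(name)
        return cls


class ToyMixin(metaclass=ToyMeta):
    """Mixin whose only job is to carry the metaclass."""


class Base(dict):
    pass


class ViaMixin(Base, ToyMixin):
    pass


class ViaMeta(Base, metaclass=ToyMeta):
    pass
```

Both ViaMixin and ViaMeta end up with ToyMeta as their metaclass, but with the mixin approach the metaclass __new__() has also run once for the mixin class itself.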
JSON Encoding/Decoding¶
A record is a Python dictionary which can be persisted as a JSON document in a database. Because the record is persisted as a JSON document, the Python dictionary can by default only hold valid JSON data types (string, number, object, array, boolean and null). Most notably, the Python dictionary cannot hold for instance a Python datetime object like this:
Record.create({"date": date(2020, 9, 7)})
The above will raise an error.
Custom encoder/decoder¶
Invenio-Records supports customizing the JSON encoder/decoder which is responsible for converting the dictionary to a JSON document and vice versa. This allows you to support non-JSON data types like for instance a Python date object.
First, let’s look at the encoder/decoder. An encoder is a simple class with two methods, encode() and decode():
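from datetime import date

class DateEncoder:
    def __init__(self, *keys):
        # Names of the top-level keys that hold date objects.
        self.keys = keys

    def encode(self, data):
        # Convert date objects to ISO 8601 strings.
        for k in self.keys:
            if k in data:
                data[k] = data[k].isoformat()

    def decode(self, data):
        # Convert ISO 8601 strings back to date objects.
        for k in self.keys:
            if k in data:
                s = data[k]
                year, month, day = int(s[0:4]), int(s[5:7]), int(s[8:10])
                data[k] = date(year, month, day)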
The encode() method iterates over the keys and converts a Python date object into a string (a valid JSON data type) using the isoformat() method of a date.
The decode() method does the reverse. It parses a string and converts it into a date object.
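The round trip described above can be tried without a database. The following sketch restates the encoder in a self-contained form and runs it against the standard json module. It swaps the manual string slicing for date.fromisoformat (Python 3.7+), and the sample document is made up for illustration.

```python
import json
from datetime import date


class DateEncoder:
    def __init__(self, *keys):
        self.keys = keys

    def encode(self, data):
        for k in self.keys:
            if k in data:
                data[k] = data[k].isoformat()

    def decode(self, data):
        for k in self.keys:
            if k in data:
                data[k] = date.fromisoformat(data[k])


encoder = DateEncoder("pubdate")
doc = {"title": "Test", "pubdate": date(2020, 9, 3)}

encoder.encode(doc)           # dates become ISO 8601 strings, in place
serialized = json.dumps(doc)  # succeeds, only JSON data types remain

restored = json.loads(serialized)
encoder.decode(restored)      # strings become date objects again
```

Without the encode() step, json.dumps() would raise a TypeError for the date value.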
Using the encoder¶
Next, you can use the encoder by assigning it to a custom model class and using that model class in your custom record. This could look like below:
class MyRecordMetadata(db.Model, RecordMetadataBase):
    __tablename__ = 'myrecord_metadata'
    # Use the DateEncoder defined above for the 'pubdate' key.
    encoder = DateEncoder('pubdate')

class MyRecord(Record):
    model_cls = MyRecordMetadata
You can now create and get records with a Python date object, which will be properly encoded/decoded on the way to/from the database:
record = MyRecord.create({'pubdate': date(2020, 9, 3)})
record = MyRecord.get_record(record.id)
record['pubdate'] == date(2020, 9, 3)
JSONSchema validation¶
JSONSchema validation is done on the JSON encoded version of the record. This ensures that the schema validation is actually applied to the JSON document and not the Python dict representation of it, which would involve validating non-JSON data types.
Internals¶
It is important to realize that there exist two distinct representations of a record:
The Python dictionary - possibly holding complex Python data types.
The JSON encoded version of the Python dictionary - holding only JSON data types.
The Python dictionary is encoded to the JSON version only when we persist the record to the database. Similarly, we only decode the JSON version when we read the record from the database. This means that the two representations are not kept in sync.
You should only ever modify the Python dictionary. In simple terms that means:
# DON'T:
record.model.json['mykey']
record.model.json['mykey'] = ...
record.model.json = record
# DO:
record['mykey']
record['mykey'] = ...
If you touch record.model.json you risk creating a binding between the Python dictionary and the JSON encoded version of it because of Python’s data model (e.g. modifying a nested object on the Python dictionary will cause the JSON version to also be updated, because both hold a reference to the same nested dict).
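The binding risk comes from ordinary Python reference semantics and can be reproduced with plain dictionaries. The sketch below uses made-up data and does not touch any Invenio-Records objects.

```python
import copy

stored_json = {"metadata": {"title": "Old"}}

# A shallow copy shares the nested "metadata" dict with the original.
shared = dict(stored_json)
shared["metadata"]["title"] = "New"
leaked = stored_json["metadata"]["title"]

# A deep copy has its own nested objects, so edits stay local.
stored_json = {"metadata": {"title": "Old"}}
isolated = copy.deepcopy(stored_json)
isolated["metadata"]["title"] = "New"
kept = stored_json["metadata"]["title"]
```

In the first case the change leaks into the original document; in the second case the original keeps its value.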
Optimistic concurrency control¶
Invenio makes use of SQLAlchemy’s version counter feature to provide optimistic concurrency control on the records table when the database transaction isolation level is below repeatable read isolation level (e.g. read committed isolation level which is the default in PostgreSQL).
Imagine the following sequence of events for two transactions A and B:
Transaction A reads existing record 1.
Transaction B reads existing record 1.
Transaction A modifies record 1.
Transaction B modifies record 1.
Transaction A commits.
Transaction B commits.
Repeatable read¶
Under either serializable or repeatable read isolation level, transaction B in step 4 will wait until transaction A commits in step 5, and then produce an error and roll back the entire transaction B, i.e. transaction B never commits.
Read committed¶
Under read committed isolation level (which is the default in PostgreSQL), transaction B in step 4 will again wait until transaction A commits in step 5; however, transaction B will then try to update the record with the new value from transaction A.
The JSON document for a record is stored in a single column, thus under read committed isolation level, changes made by transaction A to the JSON document would be overwritten by transaction B.
To prevent this scenario under read committed isolation level, Invenio stores a version counter in the database table. The fields of the records table look like this:
id (uuid)
json (jsonb)
version_id (integer)
created (timestamp)
updated (timestamp)
When transaction A modifies the record in step 3, it does so with an UPDATE statement which looks similar to this:
UPDATE records_metadata
SET json=..., version_id=2
WHERE id=1 AND version_id=1
When transaction B tries to modify the record in step 4, it uses the same UPDATE statement. As described above, transaction B then waits until transaction A commits in step 5. However, now the WHERE condition (id=1 and version_id=1) will no longer match the record’s row in the database (because version_id is now 2). Thus transaction B will update 0 rows, which makes SQLAlchemy throw an error about stale data and afterwards roll back the transaction.
Thus, the version counter prevents scenarios that could cause concurrent transactions to overwrite each other under read committed isolation level.
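The effect of the version check can be sketched with the standard sqlite3 module. This is a single-connection simulation under stated assumptions, not how Invenio or SQLAlchemy implement it: the table is simplified (integer id, text json), the two "transactions" are just two calls that both read version 1, and StaleDataError and update_record are local stand-ins defined here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records_metadata ("
    "id INTEGER PRIMARY KEY, json TEXT, version_id INTEGER)"
)
conn.execute("INSERT INTO records_metadata VALUES (1, '{}', 1)")


class StaleDataError(Exception):
    """Raised when the row was changed since it was read."""


def update_record(conn, record_id, new_json, read_version):
    # The WHERE clause only matches if nobody bumped the version meanwhile.
    cur = conn.execute(
        "UPDATE records_metadata SET json = ?, version_id = ? "
        "WHERE id = ? AND version_id = ?",
        (new_json, read_version + 1, record_id, read_version),
    )
    if cur.rowcount == 0:
        raise StaleDataError(
            f"record {record_id} is no longer at version {read_version}"
        )


# Both "transactions" read version 1 before either of them writes.
version_seen_by_a = version_seen_by_b = 1

update_record(conn, 1, '{"by": "A"}', version_seen_by_a)  # matches one row
try:
    update_record(conn, 1, '{"by": "B"}', version_seen_by_b)
    b_failed = False
except StaleDataError:
    b_failed = True

final_row = conn.execute(
    "SELECT json, version_id FROM records_metadata WHERE id = 1"
).fetchone()
```

The second update matches zero rows because version_id is already 2, so the change from A is preserved.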
Note
The version counter does not prevent concurrent transactions from overwriting each other’s data if you update many records in a single UPDATE statement. Normally this is not possible if you use the Record API.
If, however, you use the low-level SQLAlchemy model RecordMetadata directly, it is possible to execute UPDATE statements that update multiple rows at once, so you should be very careful and aware of the details (or e.g. change your isolation level to repeatable read).
REST API¶
The version counter is also used in the REST API to provide concurrency control. The version counter is provided in an ETag header when a record is retrieved via the REST API. When a client then issues an update of a record and includes the version counter in the If-Match header, it’s checked against the current record’s version and refused if it doesn’t match, thus preventing REST API clients from overwriting each other’s changes.
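The check can be pictured as a small server-side function. This is only a sketch of the general HTTP pattern: check_if_match is a hypothetical helper, and the quoted ETag format and the 200/412 status codes follow common HTTP conventions rather than anything stated about the Invenio REST API here.

```python
def check_if_match(request_headers, current_version):
    """Return an HTTP status code for a conditional update (sketch)."""
    if_match = request_headers.get("If-Match")
    if if_match is None:
        # Assumption: no header means no concurrency check was requested.
        return 200
    # Assumption: the ETag is the version counter wrapped in double quotes.
    if if_match.strip('"') == str(current_version):
        return 200
    return 412  # Precondition Failed: the client holds an outdated version


fresh = check_if_match({"If-Match": '"3"'}, current_version=3)
stale = check_if_match({"If-Match": '"2"'}, current_version=3)
```

A client that read the record at version 2 and tries to update it after someone else has moved it to version 3 would be refused in this sketch.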
Additional Notes¶
Notes on how to contribute, legal information and changes are here for the interested.
Contributing¶
Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.
Types of Contributions¶
Report Bugs¶
Report bugs at https://github.com/inveniosoftware/invenio-records/issues.
If you are reporting a bug, please include:
Your operating system name and version.
Any details about your local setup that might be helpful in troubleshooting.
Detailed steps to reproduce the bug.
Fix Bugs¶
Look through the GitHub issues for bugs. Anything tagged with “bug” is open to whoever wants to implement it.
Implement Features¶
Look through the GitHub issues for features. Anything tagged with “feature” is open to whoever wants to implement it.
Write Documentation¶
Invenio-Records could always use more documentation, whether as part of the official Invenio-Records docs, in docstrings, or even on the web in blog posts, articles, and such.
Submit Feedback¶
The best way to send feedback is to file an issue at https://github.com/inveniosoftware/invenio-records/issues.
If you are proposing a feature:
Explain in detail how it would work.
Keep the scope as narrow as possible, to make it easier to implement.
Remember that this is a volunteer-driven project, and that contributions are welcome :)
Get Started!¶
Ready to contribute? Here’s how to set up invenio-records for local development.
Fork the inveniosoftware/invenio-records repo on GitHub.
Clone your fork locally:
$ git clone git@github.com:your_name_here/invenio-records.git
Install your local copy into a virtualenv. Assuming you have virtualenvwrapper installed, this is how you set up your fork for local development:
$ mkvirtualenv invenio-records
$ cd invenio-records/
$ pip install -e .[all]
Create a branch for local development:
$ git checkout -b name-of-your-bugfix-or-feature
Now you can make your changes locally.
When you’re done making changes, check that your changes pass tests:
$ ./run-tests.sh
The tests will provide you with test coverage and also check PEP8 (code style), PEP257 (documentation), flake8 as well as build the Sphinx documentation and run doctests.
Commit your changes and push your branch to GitHub:
$ git add .
$ git commit -s \
    -m "component: title without verbs" \
    -m "* NEW Adds your new feature." \
    -m "* FIX Fixes an existing issue." \
    -m "* BETTER Improves an existing feature." \
    -m "* Changes something that should not be visible in release notes."
$ git push origin name-of-your-bugfix-or-feature
Submit a pull request through the GitHub website.
Pull Request Guidelines¶
Before you submit a pull request, check that it meets these guidelines:
The pull request should include tests and must not decrease test coverage.
If the pull request adds functionality, the docs should be updated. Put your new functionality into a function with a docstring.
The pull request should work for Python 2.7, 3.3, 3.4 and 3.5. Check https://github.com/inveniosoftware/invenio-records/actions?query=event%3Apull_request and make sure that the tests pass for all supported Python versions.
Changes¶
Version 1.6.2 (released 2022-04-06)
Removes python 3.6 from test suite.
Initializes parent class of ModelField.
Bumps several dependencies (invenio-db, invenio-base, etc.) to support Flask 2.1.
Version 1.6.1 (released 2021-12-04)
Adds support for the post commit life-cycle hook.
Version 1.6.0 (released 2021-10-20)
Adds a new relations system field for managing relations between records. Part of RFC #40.
Adds a new related model system field to serialize/dump a related object into the record JSON.
Adds new configuration variables to allow injecting a custom JSONSchema RefResolver together with a custom JSONSchema store. Part of RFC #42 to simplify JSON Schema resolution and registry management and more easily build composable JSONSchemas.
Deprecated the Record.patch() method.
Version 1.5.0
Not released to avoid polluting Invenio v3.4.
Version 1.4.0 (released 2020-12-09)
Backwards incompatible: By default the versioning table is now disabled in the RecordMetadataBase (the RecordMetadata is still versioned). If you subclass RecordMetadataBase and need versioning, you need to add the following line in your class:
class MyRecordMetadata(db.Model, RecordMetadataBase):
    __versioned__ = {}
Backwards incompatible: The Record.validate() method is now split into two methods, validate() and _validate(). If you overwrote the validate() method in a subclass, you may need to overwrite _validate() instead.
Backwards incompatible: Due to the JSON encoding/decoding support, the Python dictionary representing the record and the SQLAlchemy models are separate objects and updating one won’t automatically update the other. Normally, you should not have accessed record.model.json in your code; however, if you did, you need to rewrite it and rely on the create() and commit() methods to update the model’s json column.
Adds a new is_deleted property to the Records API.
Removes the @ prefix that was used to separate metadata fields from other fields.
Adds a SystemFieldContext which allows knowing the record class when accessing the attribute through the class instead of object instance.
Adds helpers for caching related objects on the record.
Adds support for JSON encoding/decoding to/from the database. This allows e.g. having records with complex data types such as datetime objects. JSONSchema validation happens on the JSON encoded version of the record.
Adds dumpers to support dumping and loading records from secondary copies (e.g. records stored in an Elasticsearch index).
Adds support for record extensions as a more strict replacement of signals. Allows writing extensions (like the system fields) that integrate into the Records API.
Adds support for system fields that are Python data descriptors on the Record which allows for managed access to the Record’s dictionary.
Adds support for disabling signals.
Adds support for disabling JSONRef replacement.
Adds support for specifying JSONSchema format checkers and validator class at a class-level instead of per validate call.
Adds support for specifying class-wide JSONSchema format checkers.
Adds a cleaner definition of what a soft-deleted record is, using the is_deleted hybrid property on the database model.
Adds support for undeleting a soft-deleted record.
Version 1.3.2 (released 2020-05-27)
Fixes a bug causing incorrect revisions to be fetched. If record.commit() was called multiple times prior to a db.session.commit(), there would be gaps in the version ids persisted in the database. This meant that if you used record.revisions[revision_id] to access a revision, it was not guaranteed to return that specific revision id. See #221.
Version 1.3.1 (released 2020-05-07)
Deprecated Python versions lower than 3.6.0. Now supporting 3.6.0 and 3.7.0.
Removed dependency on Invenio-PIDStore and related documentation. Functionality was removed in v1.3.0.
Version 1.3.0 (released 2019-08-01)
Removed deprecated CLI.
Version 1.2.2 (released 2019-07-11)
Fix XSS vulnerability in admin interface.
Version 1.2.1 (released 2019-05-14)
Relax Flask dependency to v0.11.1.
Version 1.2.0 (released 2019-05-08)
Allow to store RecordMetadata in a custom db table.
Version 1.1.1 (released 2019-07-11)
Fix XSS vulnerability in admin interface.
Version 1.1.0 (released 2019-02-22)
Removed deprecated Celery task.
Deprecated CLI.
Version 1.0.2 (released 2019-07-11)
Fix XSS vulnerability in admin interface.
Version 1.0.1 (released 2018-12-14)
Fix CliRunner exceptions.
Fix JSON Schema URL.
Version 1.0.0 (released 2018-03-23)
Initial public release.
License¶
MIT License
Copyright (C) 2015-2021 CERN. Copyright (C) 2021 RERO.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Note
In applying this license, CERN does not waive the privileges and immunities granted to it by virtue of its status as an Intergovernmental Organization or submit itself to any jurisdiction.
Contributors¶
Alizee Pace
Diego Rodriguez Rodriguez
Esteban J. G. Gabancho
Harris Tzovanakis
Jacopo Notarstefano
Jan Aage Lavik
Javier Delgado
Javier Martin Montull
Jiri Kuncar
Jose Benito Gonzalez Lopez
Krzysztof Nowak
Lars Holm Nielsen
Leonardo Rossi
Nicola Tarocco
Nicolas Harraudeau
Orestis Melkonian
Paulina Lach
Rémi Ducceschi
Sami Hiltunen
Tibor Simko
Maximilian Moser