API Docs¶
Invenio module for metadata storage.
- class invenio_records.ext.InvenioRecords(app=None, **kwargs)[source]¶
Invenio-Records extension.
Extension initialization.
Record API¶
Record API.
- class invenio_records.api.Record(data, model=None, **kwargs)[source]¶
Define API for metadata creation and manipulation.
Initialize instance with dictionary data and SQLAlchemy model.
- Parameters
data – Dict with record metadata.
model –
RecordMetadata
instance.
- commit(format_checker=None, validator=None, **kwargs)[source]¶
Store changes of the current record instance in the database.
1. Send a signal invenio_records.signals.before_record_update with the current record to be committed as parameter.
2. Validate the current record data.
3. Commit the current record in the database.
4. Send a signal invenio_records.signals.after_record_update with the committed record as parameter.
- Keyword Arguments
format_checker – An instance of the class jsonschema.FormatChecker, which contains validation rules for formats. See validate() for more details.
validator – A jsonschema.protocols.Validator class that will be used to validate the record. See validate() for more details.
- Returns
The
Record
instance.
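The commit sequence described above can be pictured with a minimal, self-contained sketch. It does not use Invenio: ToySignal, ToyRecord and the in-memory STORE are hypothetical stand-ins for the blinker signals, Record and the database, and validation is reduced to a single required-key check.

```python
import copy


class ToySignal:
    """Hypothetical stand-in for a blinker signal."""

    def __init__(self, name):
        self.name = name
        self.receivers = []

    def connect(self, receiver):
        self.receivers.append(receiver)

    def send(self, sender, **kwargs):
        for receiver in self.receivers:
            receiver(sender, **kwargs)


before_record_update = ToySignal("before-record-update")
after_record_update = ToySignal("after-record-update")

STORE = {}  # stand-in for the database table (assumption)


class ToyRecord(dict):
    send_signals = True  # class-level switch, mirroring the documented attribute

    def __init__(self, data, id_):
        super().__init__(data)
        self.id = id_

    def validate(self):
        # Toy validation rule; real records validate against a JSON schema.
        if "title" not in self:
            raise ValueError("'title' is a required property")

    def commit(self):
        if self.send_signals:
            before_record_update.send(type(self), record=self)  # step 1
        self.validate()                                         # step 2
        STORE[self.id] = copy.deepcopy(dict(self))              # step 3
        if self.send_signals:
            after_record_update.send(type(self), record=self)   # step 4
        return self


events = []
before_record_update.connect(lambda sender, **kw: events.append("before"))
after_record_update.connect(lambda sender, **kw: events.append("after"))

record = ToyRecord({"title": "Example"}, id_="rec-1")
result = record.commit()
```

Because validation runs between the two signals, a record that fails validation triggers the "before" listener but never reaches the store or the "after" listener.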
- classmethod create(data, id_=None, **kwargs)[source]¶
Create a new record instance and store it in the database.
1. Send a signal invenio_records.signals.before_record_insert with the new record as parameter.
2. Validate the new record data.
3. Add the new record in the database.
4. Send a signal invenio_records.signals.after_record_insert with the new created record as parameter.
- Keyword Arguments
format_checker – An instance of the class jsonschema.FormatChecker, which contains validation rules for formats. See validate() for more details.
validator – A jsonschema.protocols.Validator class that will be used to validate the record. See validate() for more details.
- Parameters
data – Dict with the record metadata.
id_ – Specify a UUID to use for the new record, instead of an automatically generated one.
- Returns
A new
Record
instance.
- delete(force=False)[source]¶
Delete a record.
If force is False, the record is soft-deleted: record data will be deleted but the record identifier and the history of the record will be kept. This ensures that the same record identifier cannot be used twice, and that you can still retrieve its history. If force is True, then the record is completely deleted from the database.
1. Send a signal invenio_records.signals.before_record_delete with the current record as parameter.
2. Delete or soft-delete the current record.
3. Send a signal invenio_records.signals.after_record_delete with the current deleted record as parameter.
- Parameters
force – If True, completely deletes the current record from the database, otherwise soft-deletes it.
- Returns
The deleted
Record
instance.
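The difference between the two deletion modes can be illustrated with a toy in-memory model. This is only a sketch of the semantics described above, not Invenio's implementation: STORE, create and delete are hypothetical names, and the "history" list stands in for the record's revision history.

```python
STORE = {}  # hypothetical table: id -> {"json": ..., "history": [...]}


def create(id_, data):
    # An identifier that is still present (even soft-deleted) cannot be reused.
    if id_ in STORE:
        raise KeyError(f"identifier {id_!r} is already in use")
    STORE[id_] = {"json": dict(data), "history": [dict(data)]}


def delete(id_, force=False):
    if force:
        # Hard delete: the row, identifier and history all disappear.
        del STORE[id_]
    else:
        # Soft delete: data is cleared, identifier and history are kept.
        STORE[id_]["json"] = None


create("rec-1", {"title": "A"})
create("rec-2", {"title": "B"})
delete("rec-1")              # soft delete
delete("rec-2", force=True)  # hard delete
```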
- classmethod get_record(id_, with_deleted=False)[source]¶
Retrieve the record by id.
Raise a database exception if the record does not exist.
- Parameters
id_ – Record ID.
with_deleted – If True then it includes deleted records.
- Returns
The
Record
instance.
- classmethod get_records(ids, with_deleted=False)[source]¶
Retrieve multiple records by id.
- Parameters
ids – List of record IDs.
with_deleted – If True then it includes deleted records.
- Returns
A list of
Record
instances.
- patch(patch)[source]¶
Patch record metadata.
- Parameters
patch – Dictionary of record metadata.
- Returns
A new
Record
instance.
- revert(revision_id)[source]¶
Revert the record to a specific revision.
1. Send a signal invenio_records.signals.before_record_revert with the current record as parameter.
2. Revert the record to the revision id passed as parameter.
3. Send a signal invenio_records.signals.after_record_revert with the reverted record as parameter.
- Parameters
revision_id – Specify the record revision id
- Returns
The
Record
instance corresponding to the revision id
- property revisions¶
Get revisions iterator.
- send_signals = True¶
Class-level attribute to control if signals should be sent.
- class invenio_records.api.RecordBase(data, model=None, **kwargs)[source]¶
Base class for Record and RecordRevision to share common features.
Initialize instance with dictionary data and SQLAlchemy model.
- Parameters
data – Dict with record metadata.
model –
RecordMetadata
instance.
- clear_none(key=None)[source]¶
Helper method to clear None, empty dict and list values.
Modifications are done in place.
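A minimal sketch of what such an in-place cleanup could look like is shown below. It is an assumption about the behaviour based only on the description above (None, empty dict and empty list values are removed, recursively), written as a standalone function rather than as the actual method.

```python
def clear_none(value):
    """Remove None values and empty dicts/lists from ``value`` in place."""
    if isinstance(value, dict):
        for key in list(value.keys()):
            child = value[key]
            if isinstance(child, (dict, list)):
                clear_none(child)  # clean nested containers first
            if child is None or child == {} or child == []:
                del value[key]
    elif isinstance(value, list):
        for child in value:
            if isinstance(child, (dict, list)):
                clear_none(child)
        # Slice assignment keeps the same list object (in-place update).
        value[:] = [c for c in value if c is not None and c != {} and c != []]
    return value


data = {
    "title": "A",
    "subtitle": None,
    "authors": [{"name": None}, {"name": "X"}],
    "extra": {"a": {}},
}
clear_none(data)
```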
- property created¶
Get creation timestamp.
- dumper = <invenio_records.dumpers.base.Dumper object>¶
Class-level attribute to specify the default data dumper/loader.
For backward compatibility the dumper used here just produces a deep copy of the record.
- dumps(dumper=None)[source]¶
Make a dump of the record (defaults to a deep copy of the dict).
This method produces a version of a record that can be persisted on storage such as the database, search or other mediums depending on the dumper class used.
- Parameters
dumper – Dumper to use when dumping the record.
- Returns
A
dict
.
- enable_jsonref = True¶
Class-level attribute to control if JSONRef replacement is supported.
- format_checker = None¶
Class-level attribute to specify a default JSONSchema format checker.
- property id¶
Get model identifier.
- property is_deleted¶
Get the flag indicating whether the record is soft-deleted.
- classmethod loads(data, loader=None)[source]¶
Load a record dump.
- Parameters
loader – Loader class to use when loading the record.
- Returns
A new
Record
instance.
- model_cls¶
SQLAlchemy model class defining which table stores the records.
alias of
RecordMetadata
- property revision_id¶
Get revision identifier.
- property updated¶
Get last updated timestamp.
- validate(format_checker=None, validator=None, **kwargs)[source]¶
Validate record according to schema defined in $schema key.
- Keyword Arguments
format_checker – A format_checker is an instance of class jsonschema.FormatChecker containing business logic to validate arbitrary formats. For example:
>>> from jsonschema import FormatChecker
>>> from jsonschema.validators import validate
>>> checker = FormatChecker()
>>> checker.checks('foo')(lambda el: el.startswith('foo'))
<function <lambda> at ...>
>>> validate('foo', {'format': 'foo'}, format_checker=checker)
returns None, which means that the validation was successful, while
>>> validate('bar', {'format': 'foo'},
...          format_checker=checker)
Traceback (most recent call last):
  ...
ValidationError: 'bar' is not a 'foo'
  ...
raises a jsonschema.exceptions.ValidationError.
validator – A jsonschema.protocols.Validator class used for record validation. It will be used as cls argument when calling jsonschema.validate(). For example:
>>> from jsonschema.validators import extend, Draft4Validator
>>> NoRequiredValidator = extend(
...     Draft4Validator,
...     validators={'required': lambda v, r, i, s: None}
... )
>>> schema = {
...     'type': 'object',
...     'properties': {
...         'name': { 'type': 'string' },
...         'email': { 'type': 'string' },
...         'address': {'type': 'string' },
...         'telephone': { 'type': 'string' }
...     },
...     'required': ['name', 'email']
... }
>>> from jsonschema.validators import validate
>>> validate({}, schema, NoRequiredValidator)
returns None, which means that the validation was successful, while
>>> validate({}, schema)
Traceback (most recent call last):
  ...
ValidationError: 'name' is a required property
  ...
raises a jsonschema.exceptions.ValidationError.
- validator = None¶
Class-level attribute to specify a JSONSchema validator class.
Configuration¶
Default values for records configuration.
- invenio_records.config.RECORDS_REFRESOLVER_CLS = None¶
Custom JSONSchemas ref resolver class.
Note that when using a custom ref resolver class you should also set
RECORDS_REFRESOLVER_STORE
to point to a JSONSchema ref resolver store.
- invenio_records.config.RECORDS_REFRESOLVER_STORE = None¶
JSONSchemas ref resolver store.
Used together with
RECORDS_REFRESOLVER_CLS
to provide a specific ref resolver store.
- invenio_records.config.RECORDS_VALIDATION_TYPES = {}¶
Pass additional types when validating a record against a schema. For more details, see: https://python-jsonschema.readthedocs.io/en/latest/validate/#validating-types.
Errors¶
Errors for Invenio-Records module.
- exception invenio_records.errors.MissingModelError[source]¶
Error raised when a record has no model.
Models¶
Record models.
- class invenio_records.models.RecordMetadata(**kwargs)[source]¶
Represent a record metadata.
A simple constructor that allows initialization from kwargs.
Sets attributes on the constructed instance using the names and values in kwargs.
Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.
- created¶
- id¶
Record identifier.
- json¶
Store metadata in JSON format.
When you create a new Record the json field value should never be NULL. Default value is an empty dict. NULL value means that the record metadata has been deleted.
- query: t.ClassVar[Query]¶
A SQLAlchemy query for a model. Equivalent to db.session.query(Model). Can be customized per-model by overriding query_class.
Warning
The query interface is considered legacy in SQLAlchemy. Prefer using session.execute(select()) instead.
- updated¶
- version_id¶
Used by SQLAlchemy for optimistic concurrency control.
- class invenio_records.models.RecordMetadataBase(data=None, **kwargs)[source]¶
Represent a base class for record metadata.
The RecordMetadata object contains created and updated properties that are automatically updated.
Initialize the model specifically by setting the.
- property data¶
Get data by decoding the JSON.
This allows a subclass to override
- encoder = None¶
Class-level attribute to set a JSON data encoder/decoder.
This allows you to e.g. convert specific entries to complex Python objects. For instance, you could convert ISO-formatted datetime strings into Python datetime objects.
- id = Column(None, UUIDType(), table=None, primary_key=True, nullable=False, default=ColumnDefault(<function uuid4>))¶
Record identifier.
- is_deleted¶
Boolean flag to determine if a record is soft deleted.
- json = Column(None, Variant(), table=None, default=ColumnDefault(<function RecordMetadataBase.<lambda>>))¶
Store metadata in JSON format.
When you create a new Record the json field value should never be NULL. Default value is an empty dict. NULL value means that the record metadata has been deleted.
- version_id = Column(None, Integer(), table=None, nullable=False)¶
Used by SQLAlchemy for optimistic concurrency control.
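As a rough illustration of what optimistic concurrency control with a version counter means, the sketch below models a single row in plain Python. It is not SQLAlchemy code: ROW, StaleDataError and update are hypothetical names chosen for the example, and the version check stands in for the condition the database would evaluate on update.

```python
class StaleDataError(Exception):
    """Raised when a writer holds an outdated version of the row (toy)."""


ROW = {"json": {"title": "A"}, "version_id": 1}


def update(row, new_json, expected_version):
    # The write only succeeds if nobody else changed the row in the meantime.
    if row["version_id"] != expected_version:
        raise StaleDataError(
            f"expected version {expected_version}, found {row['version_id']}"
        )
    row["json"] = new_json
    row["version_id"] += 1


# Two writers read the row at the same version ...
version_seen_by_a = ROW["version_id"]
version_seen_by_b = ROW["version_id"]

# ... the first write wins, the second is rejected instead of overwriting.
update(ROW, {"title": "from A"}, version_seen_by_a)
try:
    update(ROW, {"title": "from B"}, version_seen_by_b)
    conflict = False
except StaleDataError:
    conflict = True
```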
Signals¶
Record module signals.
- invenio_records.signals.after_record_delete = <blinker.base.NamedSignal object at 0x7fd43519bb10; 'after-record-delete'>¶
Signal sent after a record is deleted.
When implementing the event listener, the record data can be retrieved from kwargs['record'].
Note
Do not perform any modification to the record here: they will not be persisted.
- invenio_records.signals.after_record_insert = <blinker.base.NamedSignal object at 0x7fd43522ea10; 'after-record-insert'>¶
Signal sent after a record is inserted.
When implementing the event listener, the record data can be retrieved from kwargs['record'].
Note
Do not perform any modification to the record here: they will not be persisted.
- invenio_records.signals.after_record_revert = <blinker.base.NamedSignal object at 0x7fd43519bc10; 'after-record-revert'>¶
Signal sent after a record is reverted.
When implementing the event listener, the record data can be retrieved from kwargs['record'].
Note
Do not perform any modification to the record here: they will not be persisted.
- invenio_records.signals.after_record_update = <blinker.base.NamedSignal object at 0x7fd43519b8d0; 'after-record-update'>¶
Signal sent after a record is updated.
When implementing the event listener, the record data can be retrieved from kwargs['record'].
Note
Do not perform any modification to the record here: they will not be persisted.
- invenio_records.signals.before_record_delete = <blinker.base.NamedSignal object at 0x7fd43519ba50; 'before-record-delete'>¶
Signal is sent before a record is deleted.
When implementing the event listener, the record data can be retrieved from kwargs['record'].
- invenio_records.signals.before_record_insert = <blinker.base.NamedSignal object at 0x7fd4351eced0; 'before-record-insert'>¶
Signal is sent before a record is inserted.
When implementing the event listener, the record data can be retrieved from kwargs['record']. Example event listener (subscriber) implementation:
def listener(sender, *args, **kwargs):
    record = kwargs['record']
    # do something with the record

from invenio_records.signals import before_record_insert
before_record_insert.connect(listener)
- invenio_records.signals.before_record_revert = <blinker.base.NamedSignal object at 0x7fd43519bbd0; 'before-record-revert'>¶
Signal is sent before a record is reverted.
When implementing the event listener, the record data can be retrieved from kwargs['record'].
- invenio_records.signals.before_record_update = <blinker.base.NamedSignal object at 0x7fd43519b990; 'before-record-update'>¶
Signal is sent before a record is updated.
When implementing the event listener, the record data can be retrieved from kwargs['record'].
Dumpers/Loaders¶
Dumpers used for producing versions of records that can be persisted.
A simple example
You can for instance use a dumper to produce the body of the document to be indexed for the search engine:
dump = Record({...}).dumps(dumper=SearchDumper())
A dump can be loaded by the dumper as well:
record = Record.loads(dump, loader=SearchDumper())
Data harmonization
Invenio can read records from the database, search engine and data files. The master copy is always the database, however for performance reasons, it’s not efficient to always use the master version of a record. For instance, during searches it would come with a big performance impact if we had to read each record in a search result from the database.
The problem is however that a secondary copy of a record (e.g. in the search index) is not identical to the master copy. For instance, usage statistics may have been cached in the search engine version whereas we don’t persist it in the database. This is again for performance reasons and allows e.g. also having a “sort by most viewed” while not overloading the database with usage statistics updates.
Because the master and secondary copies might not be identical, this causes trouble for other Invenio modules, which would have to “massage” the record depending on where it comes from. Overall, this eventually leads to a confusing data flow in the application.
The dumpers fix this issue by harmonizing data access to a record from multiple different data sources. This way, other Invenio modules always have a standardized version of a record independently of where it was loaded from.
- class invenio_records.dumpers.Dumper[source]¶
Interface for dumpers.
- dump(record, data)[source]¶
Dump a record that can be used as a source document for the search engine.
The job of this method is to create a Python dictionary from the record provided in the argument.
If you overwrite this method without calling super, then you should ensure that you make a deep copy of the record dictionary, so that changes to the dump do not affect the record.
- Parameters
record – The record to dump.
data – The initial dump data passed in by record.dumps().
- load(data, record_cls)[source]¶
Load a record from the source document of a search engine hit.
The job of this method is to create a record of type record_cls based on the input data.
- Parameters
data – A Python dictionary representing the data to load.
record_cls – The record class to be constructed.
- Returns
An instance of record_cls.
- class invenio_records.dumpers.SearchDumper(extensions=None, model_fields=None)[source]¶
Search source dumper.
- dump(record, data)[source]¶
Dump a record.
The method adds the following keys (if the record has an associated model):
uuid - UUID of the record.
version_id - the revision id of the record.
created - Creation timestamp in UTC.
updated - Modification timestamp in UTC.
- load(dump_data, record_cls)[source]¶
Load a record from a search document source.
The method reverses the changes made during the dump. If a model was associated, a model will also be initialized.
Warning
The model is not added to the SQLAlchemy session. If you plan on using the model, you must merge it into the session using e.g.:
db.session.merge(record.model)
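The dump/load round trip described in this section can be sketched without Invenio. In the example below, ToyModel, ToyRecord and ToySearchDumper are hypothetical stand-ins; the sketch only mirrors the documented idea that dumping adds model-derived keys to a deep copy of the record and loading reverses those changes.

```python
import copy
from datetime import datetime, timezone


class ToyModel:
    """Stand-in for the database model (assumption)."""

    def __init__(self, id_, version_id, created, updated):
        self.id = id_
        self.version_id = version_id
        self.created = created
        self.updated = updated


class ToyRecord(dict):
    def __init__(self, data, model=None):
        super().__init__(data)
        self.model = model


class ToySearchDumper:
    def dump(self, record, data):
        # Deep copy so that changes to the dump never affect the record.
        data.update(copy.deepcopy(dict(record)))
        if record.model is not None:
            data["uuid"] = str(record.model.id)
            data["version_id"] = record.model.version_id
            data["created"] = record.model.created.isoformat()
            data["updated"] = record.model.updated.isoformat()
        return data

    def load(self, data, record_cls):
        data = copy.deepcopy(data)
        model = None
        if "uuid" in data:
            # Reverse the dump: move the extra keys back onto a model.
            model = ToyModel(
                id_=data.pop("uuid"),
                version_id=data.pop("version_id"),
                created=datetime.fromisoformat(data.pop("created")),
                updated=datetime.fromisoformat(data.pop("updated")),
            )
        return record_cls(data, model=model)


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
record = ToyRecord({"title": "A"}, model=ToyModel("1234", 3, now, now))
dump = ToySearchDumper().dump(record, {})
loaded = ToySearchDumper().load(dump, ToyRecord)
```

Consumers of the loaded record see the same dictionary and model attributes regardless of whether it came from the toy "database" object or from the dump.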
Extensions¶
Extensions allow integration of features into a record class.
For instance, the system fields feature is built as an extension.
- class invenio_records.extensions.ExtensionMixin[source]¶
Defines the methods needed by an extension.
- pre_init(record, data, model=None, **kwargs)[source]¶
Called when a new record instance is initialized.
Called when a new record is instantiated (i.e. during every Record({...})). This means it’s also called when e.g. a record is created via Record.create().
- Parameters
data – The dict passed to the record’s constructor.
model – The model class used for initialization.
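A minimal sketch of how such a hook could be invoked is given below. ToyExtension and ToyRecord are hypothetical, as is the `_extensions` attribute and the `$schema` value; the only behaviour taken from the description above is that pre_init runs on every instantiation, including the one performed inside a create-style class method.

```python
class ToyExtension:
    """Hypothetical extension that fills in a default key before init."""

    def pre_init(self, record, data, model=None, **kwargs):
        data.setdefault("$schema", "local://toy-v1.json")


class ToyRecord(dict):
    _extensions = [ToyExtension()]  # attribute name is an assumption

    def __init__(self, data, model=None, **kwargs):
        # Give every extension a chance to adjust the data first.
        for ext in self._extensions:
            ext.pre_init(self, data, model=model, **kwargs)
        super().__init__(data)
        self.model = model

    @classmethod
    def create(cls, data):
        # Creating a record instantiates it, so pre_init runs here too.
        return cls(data)


direct = ToyRecord({})
created = ToyRecord.create({"title": "A"})
```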
System Fields¶
System fields provide managed access to the record’s dictionary.
A simple example
Take the following record subclass:
class MyRecord(Record, SystemFieldsMixin):
    test = ConstantField('mykey', 'myval')
The class defines a system field named test of the type ConstantField.
The constant field adds a key (mykey) to the record with the value myval when a record is created:
record = MyRecord({})
The key mykey is part of the record’s dictionary (i.e. you can do record['mykey'] to access the value):
record['mykey'] == 'myval'
The key can however also be accessed through the field (i.e. record.test):
record.test == 'myval'
System fields are thus a way to manage a subpart of a record, and they allow the field to hook into the record API. This is a powerful API that can be used to create fields which provide integration with related objects.
A more advanced example
Imagine the following record subclass using an imaginary PIDField:
class MyRecord(Record, SystemFieldsMixin):
    pid = PIDField(pid_type='recid', object_type='rec')
You could use this field to set a PID on the record:
record.pid = PersistentIdentifier(...)
Or, you could access the PID on a record you get from the database:
record = MyRecord.get_record()
record.pid # would return a PersistentIdentifier object.
The simple example only worked with the record itself. In the more advanced example, the record is integrated with related objects.
Data access layer
System fields can do a lot, however you should see them as part of the data access layer. This means that they primarily simplify data access between records and related objects.
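To make the simple example more concrete, here is a standalone sketch of a constant-style field built on the Python descriptor protocol. ToyConstantField and ToyRecord are hypothetical and do not use Invenio; in particular, the way the record applies its fields in `__init__` is an assumption made to keep the example short.

```python
class ToyConstantField:
    """Hypothetical descriptor that manages one constant key of a record."""

    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.attr_name = None

    def __set_name__(self, owner, name):
        # Remember the attribute name used on the class (here: "test").
        self.attr_name = name

    def __get__(self, record, owner=None):
        if record is None:
            return self  # class access returns the field itself
        return record.get(self.key)  # instance access reads the dict

    def __set__(self, record, value):
        raise AttributeError(f"{self.attr_name} is a constant field")


class ToyRecord(dict):
    test = ToyConstantField("mykey", "myval")

    def __init__(self, data):
        super().__init__(data)
        # Let every field defined on the class add its key to the record.
        for attr in vars(type(self)).values():
            if isinstance(attr, ToyConstantField):
                self.setdefault(attr.key, attr.value)


record = ToyRecord({})
```

The value is stored in the record's dictionary, while the descriptor gives attribute-style access to the same data, which is the "managed access" the text refers to.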
- class invenio_records.systemfields.ConstantField(key=None, value='')[source]¶
Constant fields add a constant value to a key in the record.
Initialize the field.
- Parameters
key – The key to set in the dictionary (dot notation supported for nested lookup).
value – The value to set for the key.
- class invenio_records.systemfields.DictField(key=None, clear_none=False, create_if_missing=True)[source]¶
Dictionary field.
Provides a shortcut for getting/setting a specific key on a record.
Initialise the dict field.
- Parameters
key – Key to set (dot notation supported).
clear_none – Boolean to control if empty/None values should be removed.
create_if_missing – If a subkey is missing it will be created if this option is set to true.
- __init__(key=None, clear_none=False, create_if_missing=True)[source]¶
Initialise the dict field.
- Parameters
key – Key to set (dot notation supported).
clear_none – Boolean to control if empty/None values should be removed.
create_if_missing – If a subkey is missing it will be created if this option is set to true.
- class invenio_records.systemfields.ModelField(model_field_name=None, dump=True, dump_key=None, dump_type=None, **kwargs)[source]¶
Model field for providing get and set access on a model field.
Initialize the field.
- Parameters
model_field_name – Name of field on the database model.
dump – Set to false to not dump the field.
dump_key – The dictionary key to use in dumps.
dump_type – The data type used to determine how to serialize the model field.
- __init__(model_field_name=None, dump=True, dump_key=None, dump_type=None, **kwargs)[source]¶
Initialize the field.
- Parameters
model_field_name – Name of field on the database model.
dump – Set to false to not dump the field.
dump_key – The dictionary key to use in dumps.
dump_type – The data type used to determine how to serialize the model field.
- property dump_key¶
The dictionary key to use in dump output.
Note, it’s up to the dumper to choose if it respects this name. The name defaults to the model field name.
- property dump_type¶
The data type used to determine how to serialize the model field.
Defaults to none, meaning the dumper will determine how to dump it.
- property model_field_name¶
The name of the SQLAlchemy field on the model.
Defaults to the attribute name used on the class.
- class invenio_records.systemfields.ModelRelation(record_cls, model_field_name, key, keys=None, attrs=None)[source]¶
Define a relation stored as a foreign key on the record’s model.
Constructor.
- result_cls¶
alias of
ModelRelationResult
- class invenio_records.systemfields.MultiRelationsField(**fields)[source]¶
Relations field for connections to external entities.
It allows defining nested relation fields. For example:
class Record:

    relations = MultiRelationsField(
        field_one=PIDListRelation(
            "metadata.field_one",
            ...
        ),
        inner=RelationsField(
            inner_field=PIDListRelation(
                "metadata.inner_field",
                ...
            ),
        )
    )
Initialize the field.
The nested RelationFields will be flattened to the root. In the example above, the relations field has a field (field_one) and a nested RelationsField with a field (inner_field). However, both of them will be accessed through the relations field.
relations.field_one
relations.inner_field  # correct
relations.inner.inner_field  # incorrect
- __init__(**fields)[source]¶
Initialize the field.
The nested RelationFields will be flattened to the root. In the example above, the relations field has a field (field_one) and a nested RelationsField with a field (inner_field). However, both of them will be accessed through the relations field.
relations.field_one
relations.inner_field  # correct
relations.inner.inner_field  # incorrect
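The flattening behaviour can be sketched with two small stand-in classes. ToyRelationsField and ToyMultiRelationsField are hypothetical and use plain strings in place of relation objects; the sketch only demonstrates the documented outcome that nested fields end up at the root.

```python
class ToyRelationsField:
    def __init__(self, **fields):
        self.fields = fields

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        fields = self.__dict__.get("fields", {})
        try:
            return fields[name]
        except KeyError:
            raise AttributeError(name) from None


class ToyMultiRelationsField(ToyRelationsField):
    def __init__(self, **fields):
        flat = {}
        for name, field in fields.items():
            if isinstance(field, ToyRelationsField):
                # Lift the nested relations to the root; the wrapper
                # name ("inner") is dropped.
                flat.update(field.fields)
            else:
                flat[name] = field
        super().__init__(**flat)


relations = ToyMultiRelationsField(
    field_one="relation-one",
    inner=ToyRelationsField(inner_field="relation-inner"),
)
```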
- class invenio_records.systemfields.PKRelation(*args, record_cls=None, **kwargs)[source]¶
Primary-key relation type.
Initialize the PK relation.
- class invenio_records.systemfields.RelatedModelField(model, key=None, required=False, load=None, dump=None, context_cls=None)[source]¶
Related model system field.
Initialize the field.
- Parameters
model – Related SQLAlchemy model.
key – Name of key in the record to serialize the related object under.
required – Flag to determine if a related object is required on record commit time.
load – Callable to load the related object from a JSON object.
dump – Callable to dump the related object as a JSON object.
context_cls – The context class is used to provide additional methods on the field itself.
- __init__(model, key=None, required=False, load=None, dump=None, context_cls=None)[source]¶
Initialize the field.
- Parameters
model – Related SQLAlchemy model.
key – Name of key in the record to serialize the related object under.
required – Flag to determine if a related object is required on record commit time.
load – Callable to load the related object from a JSON object.
dump – Callable to dump the related object as a JSON object.
context_cls – The context class is used to provide additional methods on the field itself.
- obj(record)[source]¶
Get the related object.
Uses a cached object if it exists.
IMPORTANT: By default, if the object is loaded from the record JSON object instead of from the database model, it is NOT added to the database session. Thus, the related object will be in a transient state instead of persistent state. This is useful for instance in search queries to avoid hitting the database, however if you need to make operations on it you should add it to the session using:
Record.myattr.session_merge(record)
- class invenio_records.systemfields.RelatedModelFieldContext(field, record_cls)[source]¶
Context for RelatedModelField.
This class implements the class-level methods available on a RelatedModelField. I.e. when you access the field through the class, for instance:
Record.myattr.session_merge(record)
Initialise the field context.
- class invenio_records.systemfields.RelationsField(**fields)[source]¶
Relations field for connections to external entities.
Initialize the field.
- class invenio_records.systemfields.SystemField(key=None)[source]¶
Base class for all system fields.
A system field is a Python data descriptor set on a record class that can also hook into a record via the extensions API (e.g on record creation, dumping etc).
See ExtensionMixin for the full interface of methods that a field can override to hook into the record API.
Initialise the field.
- __get__(record, owner=None)[source]¶
Accessing the object attribute.
A subclass that overwrites this method should handle two cases:
Class access - If record is None, the field is accessed through the class (e.g. Record.myfield). In this case a field or context should be returned. The purpose of the field context is to allow a field to know from which class it was accessed (as the field may be created on a super class).
Instance access - If record is not None, the field is accessed through an instance of the class (e.g. record.myfield).
A simple example is provided below:
def __get__(self, record, owner=None):
    if record is None:
        # Class access: return the field itself, or alternatively
        # a context such as SystemFieldContext(self, owner).
        return self
    if 'mykey' in record:
        return record['mykey']
    return None
- Parameters
record – The instance through which the field is being accessed or None if the field is accessed through the class.
owner – The class which owns the field.
- __set__(record, value)[source]¶
Setting the attribute (instance access only).
This method only handles set operations from an instance (e.g. record.myfield = val). This is opposite to __get__() which needs to handle both class and instance access.
- __set_name__(owner, name)[source]¶
Inject the class attribute name into the field.
This ensures that a field has access to the attribute name used on the class. In the following example, the attribute name schema would be set in the ConstantField object instance.
class MyRecord(Record, SystemFieldsMixin):
    schema = ConstantField(...)
- property attr_name¶
Property to access the assigned class attribute name.
- Returns
None if field is not assigned, otherwise the class attribute name.
- get_dictkey(instance)[source]¶
Helper to use a lookup key to get a nested object.
Assumes the key has been set in self.key.
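A nested lookup with a dot-notation key can be sketched as follows. The function name toy_dict_lookup and the sample data are hypothetical; the sketch only shows the general idea of walking a nested dictionary one key segment at a time.

```python
def toy_dict_lookup(source, lookup_key):
    """Follow a dot-separated key through nested dictionaries."""
    value = source
    for part in lookup_key.split("."):
        value = value[part]  # raises KeyError if a segment is missing
    return value


record = {"metadata": {"title": {"main": "A title"}}}
main_title = toy_dict_lookup(record, "metadata.title.main")
```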
- property key¶
Property to access the dict key name.
Uses the attribute name if the key is not defined.
- class invenio_records.systemfields.SystemFieldContext(field, record_cls)[source]¶
Base class for a system field context.
A system field context is created once you access a field’s attribute on a class. As the system field may be defined on a super class, this context allows us to know from which class the field was accessed.
Normally you should subclass this class and implement on it the methods that require you to know the record class.
Initialise the field context.
- property field¶
Access the field to prevent it from being overwritten.
- property record_cls¶
Record class to prevent it from being overwritten.
- class invenio_records.systemfields.SystemFieldsMeta(name, bases, attrs)[source]¶
Metaclass for a record class that integrates system fields.
Create a new record class.
- class invenio_records.systemfields.SystemFieldsMixin[source]¶
Mixin class for records that add system fields capabilities.
This class is primarily syntax sugar for being able to do:
class MyRecord(Record, SystemFieldsMixin):
    pass
instead of:
class MyRecord(Record, metaclass=SystemFieldsMeta):
    pass
There are subtle differences though between the two above methods. Mainly which classes will execute the __new__() method on the metaclass.
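One such difference can be observed with plain Python classes. In the sketch below, ToyMeta, ToyMixin and ToyRecord are hypothetical stand-ins for SystemFieldsMeta, SystemFieldsMixin and Record; the metaclass simply logs every class it creates.

```python
class ToyMeta(type):
    created = []  # names of all classes built by this metaclass

    def __new__(mcs, name, bases, attrs):
        cls = super().__new__(mcs, name, bases, attrs)
        mcs.created.append(name)
        return cls


class ToyRecord(dict):
    pass


class ToyMixin(metaclass=ToyMeta):
    pass


# Variant 1: inherit the metaclass through the mixin.
class ViaMixin(ToyRecord, ToyMixin):
    pass


# Variant 2: set the metaclass directly.
class ViaMeta(ToyRecord, metaclass=ToyMeta):
    pass
```

Both record classes end up with the same metaclass, but with the mixin approach the metaclass `__new__()` also runs once for the mixin class itself.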