The import process
The process to import data into Solr is relatively straightforward. There’s a SearchEntity object for each entity type that can be imported, which keeps track of the indexable fields and the model in mbdata for that entity type.
Once it’s known which entity types will be imported, sir.indexing._multiprocessed_import() will successively spawn multiprocessing.Process instances via multiprocessing.pool.
Each of the processes will retrieve one batch of entities from the database via a query built from build_entity_query() and convert them into regular dicts via query_result_to_dict().
The result of the conversion will be passed into a multiprocessing.Queue.
On the other end of the queue, another process running sir.indexing.queue_to_solr() will send the converted dicts to Solr in batches.
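The producer/consumer shape described above can be sketched roughly as follows. This is not sir's code: threads and queue.Queue stand in for multiprocessing.Process and multiprocessing.Queue so that the example runs anywhere, a plain list stands in for Solr, and fetch_batch, produce, consume, run_import and BATCH_SIZE are hypothetical names chosen for illustration.

```python
import queue
import threading

BATCH_SIZE = 3   # hypothetical number of documents per "Solr" request
STOP = object()  # sentinel telling the consumer that no more documents follow


def fetch_batch(start, stop):
    """Hypothetical stand-in for querying the database and converting rows to dicts."""
    return [{"id": i, "name": "entity %d" % i} for i in range(start, stop)]


def produce(start, stop, out_queue):
    """Worker: fetch one batch of entities and put each resulting dict on the queue."""
    for doc in fetch_batch(start, stop):
        out_queue.put(doc)


def consume(in_queue, sent_batches, batch_size=BATCH_SIZE):
    """Consumer: drain the queue and 'send' documents in fixed-size batches."""
    batch = []
    while True:
        doc = in_queue.get()
        if doc is STOP:
            break
        batch.append(doc)
        if len(batch) == batch_size:
            sent_batches.append(batch)  # a real consumer would post this to Solr
            batch = []
    if batch:
        sent_batches.append(batch)  # flush the final, possibly smaller batch


def run_import(total, chunk):
    """Split ids 0..total-1 into chunks, with one worker per chunk and one consumer."""
    docs = queue.Queue()
    sent_batches = []
    consumer = threading.Thread(target=consume, args=(docs, sent_batches))
    consumer.start()
    workers = [
        threading.Thread(target=produce, args=(start, min(start + chunk, total), docs))
        for start in range(0, total, chunk)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    docs.put(STOP)  # all producers are done, so the consumer may stop
    consumer.join()
    return sent_batches


sent = run_import(total=10, chunk=4)
print(sum(len(batch) for batch in sent), "documents in", len(sent), "batches")
```

The sentinel is only put on the queue after every worker has finished, so the consumer cannot stop while documents are still being produced. The order of documents inside the batches depends on thread scheduling.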
Paths
Each SearchEntity is assigned a declarative class via its model attribute and a collection of SearchField objects, each corresponding to a field in the entity’s Solr core. Those fields each have one or more paths that “lead” to the values that will be put into the field in Solr. iterate_path_values() is a function that returns an iterator over all values for a specific path from an instance of a declarative class, and its docstring describes how that works, so here’s a verbatim copy of it:
querying.iterate_path_values(path, obj)
Return an iterator over all values for path on obj, an instance of a declarative class, by first splitting the path into its elements by splitting it on the dots, resulting in a list of path elements. Then, for each element, a call to getattr() is made - the arguments will be the current model (which initially is the model assigned to the SearchEntity) and the current path element. After doing this, there are two cases:
- The path element is not the last one in the path. In this case, the getattr() call returns one or more objects of another model which will replace the current one.
- The path element is the last one in the path. In this case, the value returned by the getattr() call will be returned and added to the list of values for this field.
To give an example, let’s presume the object we’re starting with is an instance of Artist and the path is “begin_area.name”. The first getattr() call will be getattr(obj, "begin_area"), which returns an Area object, on which the call getattr(obj, "name") will return the final value:
>>> from mbdata.models import Artist, Area
>>> artist = Artist(name="Johan de Meij")
>>> area = Area(name="Netherlands")
>>> artist.begin_area = area
>>> list(iterate_path_values("begin_area.name", artist))
['Netherlands']
One-to-many relationships will of course be handled as well:
>>> from mbdata.models import Recording, ISRC
>>> recording = Recording(name="Fortuna Imperatrix Mundi: O Fortuna")
>>> recording.isrcs.append(ISRC(isrc="DEF056730100"))
>>> recording.isrcs.append(ISRC(isrc="DEF056730101"))
>>> list(iterate_path_values("isrcs.isrc", recording))
['DEF056730100', 'DEF056730101']
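The traversal the docstring describes can be approximated in a few lines. This is a sketch under assumptions, not the implementation in sir.querying: iterate_path_values_sketch is a hypothetical name, plain SimpleNamespace objects stand in for mbdata models, and one-to-many relationships are assumed to show up as lists or tuples.

```python
from types import SimpleNamespace


def iterate_path_values_sketch(path, obj):
    """Yield every value reachable from obj by following the dotted path."""
    *intermediate, last = path.split(".")
    current = [obj]  # the objects the next getattr() call is applied to
    for element in intermediate:
        following = []
        for item in current:
            value = getattr(item, element)
            if value is None:
                continue  # an unset relationship contributes no values
            if isinstance(value, (list, tuple)):
                following.extend(value)  # one-to-many: follow every related object
            else:
                following.append(value)  # many-to-one: follow the single object
        current = following
    for item in current:
        yield getattr(item, last)  # last element: the attribute is the value itself


# Stand-ins for an Artist with a begin_area and a Recording with two ISRCs.
artist = SimpleNamespace(name="Johan de Meij",
                         begin_area=SimpleNamespace(name="Netherlands"))
recording = SimpleNamespace(isrcs=[SimpleNamespace(isrc="DEF056730100"),
                                   SimpleNamespace(isrc="DEF056730101")])

area_names = list(iterate_path_values_sketch("begin_area.name", artist))
isrcs = list(iterate_path_values_sketch("isrcs.isrc", recording))
```

Keeping a list of current objects instead of a single one is what lets the same loop cover both the single-object and the one-to-many case.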
sir.schema.SCHEMA is a dictionary mapping core names to SearchEntity objects.
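To make the shape of such a mapping concrete, here is a minimal sketch. It does not use sir's classes: FieldSketch and EntitySketch are hypothetical stand-ins for SearchField and SearchEntity, the Artist class is an empty placeholder for the mbdata model, and the field names and paths are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FieldSketch:
    """Hypothetical stand-in for SearchField: a Solr field name plus its paths."""
    name: str
    paths: List[str]


@dataclass
class EntitySketch:
    """Hypothetical stand-in for SearchEntity: a model plus its fields."""
    model: type
    fields: List[FieldSketch]


class Artist:
    """Empty placeholder for the mbdata model of the same name."""


# Core name -> entity description, mirroring the dictionary described above.
SCHEMA_SKETCH: Dict[str, EntitySketch] = {
    "artist": EntitySketch(
        model=Artist,
        fields=[
            FieldSketch("artist", ["name"]),                # invented field
            FieldSketch("beginarea", ["begin_area.name"]),  # invented field
        ],
    ),
}


def paths_for_core(core_name):
    """Collect every path needed to fill the fields of one core."""
    entity = SCHEMA_SKETCH[core_name]
    return sorted({path for f in entity.fields for path in f.paths})


artist_paths = paths_for_core("artist")
```

Looking up a core name yields everything needed to index that entity type: the model to query and the paths to follow for each field.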