Writing and reading HDF5#

mammos_entity provides support for writing entities and entity collections to HDF5 and reading them back in.

HDF5 provides many options and therefore mammos_entity does not prescribe a complete structure. Instead it focuses on storing entity collections and entity likes inside an HDF5 file:

  • entity collections are stored as HDF5 groups

  • entity likes are stored as HDF5 datasets

In write mode mammos_entity can either create a new file or work on parts of an existing file.

import h5py
import mammos_entity as me
import mammos_units as u

Single entity#

As a first example we create a single entity with four values and save it to disk:

T = me.T([10, 20, 50, 100], "K")
T
ThermodynamicTemperature(value=[ 10.  20.  50. 100.], unit=K)

To save it to a file we specify the file name and the name of the dataset that stores the entity. A new HDF5 file will be created automatically. If a file with the same name exists, it will be overwritten.

T.to_hdf5("test.hdf5", "temperature")
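
Because passing a file name overwrites an existing file, it can be useful to guard against accidental data loss. One option, sketched below with plain h5py, is to open the file in exclusive-creation mode "x", which fails if the file already exists. The open file object could then be passed to to_hdf5 in place of the file name, as done with mode "a" later on this page. The file name is hypothetical and a plain dataset stands in for the entity, so the sketch runs without mammos_entity.

```python
import os
import tempfile

import h5py

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "guarded.hdf5")  # hypothetical file name

    # First write: the file does not exist yet, so mode "x" succeeds.
    with h5py.File(path, "x") as f:
        f.create_dataset("temperature", data=[10.0, 20.0, 50.0, 100.0])

    # Second write: the file exists now, so mode "x" refuses to open it.
    try:
        with h5py.File(path, "x") as f:
            f.create_dataset("temperature", data=[0.0])
        overwritten = True
    except OSError:
        overwritten = False

    # The data from the first write is still intact.
    with h5py.File(path, "r") as f:
        kept = f["temperature"][()].tolist()

print(overwritten, kept)
```

Mode "w", in contrast, truncates an existing file, which matches the overwrite behaviour described above for file names.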

We can inspect the content of the file using h5glance:

!h5glance --attrs test.hdf5
test.hdf5
└temperature	[float64: 4]
  └5 attributes:
    ├description: ''
    ├mammos_entity_version: '0.12.0'
    ├ontology_iri: 'https://w3id.org/em...2_86c6_69e26182a17f'
    ├ontology_label: 'ThermodynamicTemperature'
    └unit: 'K'

We can see that we got a single dataset temperature with data of type float64 and four elements (we don’t see the actual values). Furthermore, the temperature dataset contains metadata attributes for description, ontology information, the unit, and the version of mammos_entity used to write the dataset.
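
The attributes listed by h5glance are ordinary HDF5 attributes, so they can also be read with plain h5py. The following is a minimal sketch. So that it runs without mammos_entity, it first writes a stand-in file with a similar layout. The file name is hypothetical and only a subset of the attributes is written (ontology_iri is left out because the output above truncates it).

```python
import os
import tempfile

import h5py

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "standin.hdf5")  # hypothetical file name

    # Stand-in for the file written above: one dataset with metadata attributes.
    with h5py.File(path, "w") as f:
        dset = f.create_dataset("temperature", data=[10.0, 20.0, 50.0, 100.0])
        dset.attrs["ontology_label"] = "ThermodynamicTemperature"
        dset.attrs["unit"] = "K"

    # Read the values and the attributes back with plain h5py.
    with h5py.File(path, "r") as f:
        dset = f["temperature"]
        values = dset[()]         # NumPy array holding the stored numbers
        attrs = dict(dset.attrs)  # attribute name -> attribute value

print(values, attrs)
```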

We can read the file and get back an entity collection:

content = me.from_hdf5("test.hdf5")
content
EntityCollection(
    description='',
    temperature=Entity(ontology_label='ThermodynamicTemperature', value=array([ 10.,  20.,  50., 100.]), unit='K'),
)

We can access the entity using the name we have chosen when saving the file:

content.temperature
ThermodynamicTemperature(value=[ 10.  20.  50. 100.], unit=K)

We can also open the file ourselves and only read a single dataset. We then directly get the entity:

with h5py.File("test.hdf5") as f:
    print(me.from_hdf5(f["/temperature"]))
ThermodynamicTemperature(value=[ 10.  20.  50. 100.], unit=K)

Entity collection#

To group together multiple entities we use EntityCollections. For HDF5 mammos_entity maps the collection to an HDF5 group.

collection = me.EntityCollection(
    description="intrinsic properties",
    Tc=me.Tc(800, "K"),
    Ms=me.Ms(600, "kA/m"),
)
collection
EntityCollection(
    description='intrinsic properties',
    Tc=Entity(ontology_label='CurieTemperature', value=800.0, unit='K'),
    Ms=Entity(ontology_label='SpontaneousMagnetization', value=600.0, unit='kA / m'),
)

We will show different options for writing the collection.

First, we open the file ourselves and pass the open file object to the to_hdf5 method, which allows us to append to the previously generated file.

We pass a name for the group that will store the collection. It will be created automatically and will keep track of the order of the entities in the collection. Each entity of the collection will be stored as a dataset inside the newly created group. The names of these datasets will be the names of the entities in the collection.

with h5py.File("test.hdf5", "a") as f:  # append to the file created before
    collection.to_hdf5(f, "/properties")
!h5glance --attrs test.hdf5
test.hdf5
├properties
│ ├2 attributes:
│ │ ├description: 'intrinsic properties'
│ │ └mammos_entity_version: '0.12.0'
│ ├Tc	[float64: scalar]
│ │ └4 attributes:
│ │   ├description: ''
│ │   ├ontology_iri: 'https://w3id.org/em...3_a1d6_54c9f778343d'
│ │   ├ontology_label: 'CurieTemperature'
│ │   └unit: 'K'
│ └Ms	[float64: scalar]
│   └4 attributes:
│     ├description: ''
│     ├ontology_iri: 'https://w3id.org/em...b-9c9d-6dafaa17ef25'
│     ├ontology_label: 'SpontaneousMagnetization'
│     └unit: 'kA / m'
└temperature	[float64: 4]
  └5 attributes:
    ├description: ''
    ├mammos_entity_version: '0.12.0'
    ├ontology_iri: 'https://w3id.org/em...2_86c6_69e26182a17f'
    ├ontology_label: 'ThermodynamicTemperature'
    └unit: 'K'

We can see that our HDF5 file now has two top-level elements:

  • the dataset temperature created in the first step

  • the group properties with two datasets Tc and Ms created from the collection. The group attributes contain the collection description and the version of mammos_entity used to write the file. The individual entities of the collection do not record that version, as it can be inferred from the group.
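
The statement that the group keeps track of the entity order can be illustrated with plain h5py, where a group created with track_order=True iterates in creation order. This is a sketch of the underlying HDF5 feature, under the assumption that mammos_entity relies on something similar. It is not taken from its implementation. The file and group names are hypothetical, and the file is kept in memory so nothing is written to disk.

```python
import h5py

# driver="core" with backing_store=False keeps the file purely in memory.
with h5py.File("order-demo.hdf5", "w", driver="core", backing_store=False) as f:
    tracked = f.create_group("tracked", track_order=True)
    default = f.create_group("default")  # no creation-order tracking

    for group in (tracked, default):
        group.attrs["description"] = "intrinsic properties"
        # Same creation order as in the collection: Tc first, then Ms.
        group.create_dataset("Tc", data=800.0).attrs["unit"] = "K"
        group.create_dataset("Ms", data=600.0).attrs["unit"] = "kA / m"

    tracked_names = list(tracked)  # follows creation order
    default_names = list(default)  # follows the name index instead

print(tracked_names, default_names)
```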

We can read the whole file and get two nested collections:

me.from_hdf5("test.hdf5")
EntityCollection(
    description='',
    properties=EntityCollection(
        description='intrinsic properties',
        Tc=Entity(ontology_label='CurieTemperature', value=800.0, unit='K'),
        Ms=Entity(ontology_label='SpontaneousMagnetization', value=600.0, unit='kA / m'),
    ),
    temperature=Entity(ontology_label='ThermodynamicTemperature', value=array([ 10.,  20.,  50., 100.]), unit='K'),
)

We can also open the file first and let mammos_entity only read the group and get a single EntityCollection:

with h5py.File("test.hdf5") as f:
    print(me.from_hdf5(f["/properties"]))
EntityCollection(
    description='intrinsic properties',
    Tc=Entity(ontology_label='CurieTemperature', value=800.0, unit='K'),
    Ms=Entity(ontology_label='SpontaneousMagnetization', value=600.0, unit='kA / m'),
)

If we pass the file name instead of the file object, the previous file content will be overwritten:

collection.to_hdf5("test.hdf5", "/properties-overwritten")
!h5glance --attrs test.hdf5
test.hdf5
└properties-overwritten
  ├2 attributes:
  │ ├description: 'intrinsic properties'
  │ └mammos_entity_version: '0.12.0'
  ├Tc	[float64: scalar]
  │ └4 attributes:
  │   ├description: ''
  │   ├ontology_iri: 'https://w3id.org/em...3_a1d6_54c9f778343d'
  │   ├ontology_label: 'CurieTemperature'
  │   └unit: 'K'
  └Ms	[float64: scalar]
    └4 attributes:
      ├description: ''
      ├ontology_iri: 'https://w3id.org/em...b-9c9d-6dafaa17ef25'
      ├ontology_label: 'SpontaneousMagnetization'
      └unit: 'kA / m'

When writing collections the name argument is optional. If it is not provided, all entities will be added to the file/group root. For example, when creating a new file, all entities and the collection description will be added directly to the root group:

collection.to_hdf5("test.hdf5")
!h5glance --attrs test.hdf5
test.hdf5
├2 attributes:
│ ├description: 'intrinsic properties'
│ └mammos_entity_version: '0.12.0'
├Ms	[float64: scalar]
│ └4 attributes:
│   ├description: ''
│   ├ontology_iri: 'https://w3id.org/em...b-9c9d-6dafaa17ef25'
│   ├ontology_label: 'SpontaneousMagnetization'
│   └unit: 'kA / m'
└Tc	[float64: scalar]
  └4 attributes:
    ├description: ''
    ├ontology_iri: 'https://w3id.org/em...3_a1d6_54c9f778343d'
    ├ontology_label: 'CurieTemperature'
    └unit: 'K'

Multiple writes#

We can add as many datasets and groups as we like, anywhere in the HDF5 file. We are also not limited to writing data with mammos_entity and can add other data as well.

collection = me.EntityCollection(
    description="intrinsic properties",
    Tc=me.Tc(800, "K"),
    Ms=me.Ms(600, "kA/m"),
    q=5 * u.mm**2,  # a hypothetical quantity
)
collection
EntityCollection(
    description='intrinsic properties',
    Tc=Entity(ontology_label='CurieTemperature', value=800.0, unit='K'),
    Ms=Entity(ontology_label='SpontaneousMagnetization', value=600.0, unit='kA / m'),
    q=<Quantity 5. mm2>,
)
with h5py.File("test.hdf5", "w") as f:  # overwrite the file created before
    # store an entity
    me.Hc(300, "kA/m").to_hdf5(f, "/Hc")

    # implicitly create a new group
    me.Entity("Length", 5, "nm", description="edge length x").to_hdf5(f, "/geometry/x")

    # pass a group instead of a file
    me.Entity("Length", 10, "nm", description="edge length y").to_hdf5(f["/geometry"], "y")

    # store a collection
    collection.to_hdf5(f, "intrinsic properties")

    # add additional entity to the collection 'intrinsic properties'
    me.Entity("ExchangeStiffnessConstant", 1e-11, "J/m").to_hdf5(f, "/intrinsic properties/A")

    # additional data, making use of other options available in h5py.File.create_dataset
    f.create_dataset("raw data", data=[0.1, 0.2, 0.3, 0.5, 0.9], dtype="float32")
!h5glance --attrs test.hdf5
test.hdf5
├Hc	[float64: scalar]
│ └5 attributes:
│   ├description: ''
│   ├mammos_entity_version: '0.12.0'
│   ├ontology_iri: 'https://w3id.org/em...8-886b-fa6d6052ce98'
│   ├ontology_label: 'CoercivityHcExternal'
│   └unit: 'kA / m'
├geometry
│ ├x	[float64: scalar]
│ │ └5 attributes:
│ │   ├description: 'edge length x'
│ │   ├mammos_entity_version: '0.12.0'
│ │   ├ontology_iri: 'https://w3id.org/em...1_b27e_2e88db027bac'
│ │   ├ontology_label: 'Length'
│ │   └unit: 'nm'
│ └y	[float64: scalar]
│   └5 attributes:
│     ├description: 'edge length y'
│     ├mammos_entity_version: '0.12.0'
│     ├ontology_iri: 'https://w3id.org/em...1_b27e_2e88db027bac'
│     ├ontology_label: 'Length'
│     └unit: 'nm'
├intrinsic properties
│ ├2 attributes:
│ │ ├description: 'intrinsic properties'
│ │ └mammos_entity_version: '0.12.0'
│ ├Tc	[float64: scalar]
│ │ └4 attributes:
│ │   ├description: ''
│ │   ├ontology_iri: 'https://w3id.org/em...3_a1d6_54c9f778343d'
│ │   ├ontology_label: 'CurieTemperature'
│ │   └unit: 'K'
│ ├Ms	[float64: scalar]
│ │ └4 attributes:
│ │   ├description: ''
│ │   ├ontology_iri: 'https://w3id.org/em...b-9c9d-6dafaa17ef25'
│ │   ├ontology_label: 'SpontaneousMagnetization'
│ │   └unit: 'kA / m'
│ ├q	[float64: scalar]
│ │ └1 attributes:
│ │   └unit: 'mm2'
│ └A	[float64: scalar]
│   └5 attributes:
│     ├description: ''
│     ├mammos_entity_version: '0.12.0'
│     ├ontology_iri: 'https://w3id.org/em...e-8eb8-8a900f2b3b78'
│     ├ontology_label: 'ExchangeStiffnessConstant'
│     └unit: 'J / m'
└raw data	[float32: 5]

In the attributes we can see that the version of mammos_entity is recorded for each outermost object written explicitly with to_hdf5. For example, the entity A records the version despite being part of the group intrinsic properties. The group geometry, in contrast, does not record the version because it was only created implicitly as the parent of entity x.
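
This rule can be checked by walking a file with plain h5py and collecting every object that carries a mammos_entity_version attribute. The sketch below writes a small in-memory stand-in file that mimics part of the layout above, so it runs without mammos_entity. The helper name objects_with_version and the file name are hypothetical.

```python
import h5py

VERSION_KEY = "mammos_entity_version"


def objects_with_version(root):
    """Return the paths of all groups and datasets below root that record a version."""
    found = []

    def visit(name, obj):
        if VERSION_KEY in obj.attrs:
            found.append(name)
        # Returning None tells visititems to continue the walk.

    root.visititems(visit)
    return sorted(found)


with h5py.File("version-demo.hdf5", "w", driver="core", backing_store=False) as f:
    # Entity written explicitly: the dataset carries the version.
    f.create_dataset("Hc", data=300.0).attrs[VERSION_KEY] = "0.12.0"
    # Parent group created implicitly: only the dataset carries the version.
    f.create_dataset("geometry/x", data=5.0).attrs[VERSION_KEY] = "0.12.0"
    # Collection: the version sits on the group, not on its members.
    props = f.create_group("intrinsic properties")
    props.attrs[VERSION_KEY] = "0.12.0"
    props.create_dataset("Tc", data=800.0)
    # Entity added to the group afterwards: it records the version itself.
    props.create_dataset("A", data=1e-11).attrs[VERSION_KEY] = "0.12.0"

    versioned = objects_with_version(f)

print(versioned)
```

The group geometry and the dataset Tc do not appear in the result, mirroring the h5glance output above.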

We can read the whole file as a single nested entity collection. It will read all groups/datasets and choose the most appropriate type:

with h5py.File("test.hdf5") as f:
    content = me.from_hdf5(f)

content
EntityCollection(
    description='',
    Hc=Entity(ontology_label='CoercivityHcExternal', value=300.0, unit='kA / m'),
    geometry=EntityCollection(
        description='',
        x=Entity(ontology_label='Length', value=5.0, unit='nm', description='edge length x'),
        y=Entity(ontology_label='Length', value=10.0, unit='nm', description='edge length y'),
    ),
    intrinsic properties=EntityCollection(
        description='intrinsic properties',
        Tc=Entity(ontology_label='CurieTemperature', value=800.0, unit='K'),
        Ms=Entity(ontology_label='SpontaneousMagnetization', value=600.0, unit='kA / m'),
        q=<Quantity 5. mm2>,
        A=Entity(ontology_label='ExchangeStiffnessConstant', value=1e-11, unit='J / m'),
    ),
    raw data=array([0.1, 0.2, 0.3, 0.5, 0.9], dtype=float32),
)
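
How a reader might choose the most appropriate type can be sketched with plain h5py by dispatching on the node type and its attributes. This is an illustration only, not the implementation used by mammos_entity. The function describe and the labels it returns are hypothetical, and the rules are inferred from the attributes visible in the h5glance output above: entities carry ontology_label, quantities carry only unit, and other datasets carry neither.

```python
import h5py


def describe(node):
    """Classify an HDF5 node by its type and attributes (illustrative rules only)."""
    if isinstance(node, h5py.Group):  # h5py.File is a Group as well
        return {name: describe(child) for name, child in node.items()}
    if "ontology_label" in node.attrs:
        return "entity"
    if "unit" in node.attrs:
        return "quantity"
    return "array"


with h5py.File("dispatch-demo.hdf5", "w", driver="core", backing_store=False) as f:
    # Stand-in data mimicking part of the file above.
    hc = f.create_dataset("Hc", data=300.0)
    hc.attrs["ontology_label"] = "CoercivityHcExternal"
    hc.attrs["unit"] = "kA / m"

    q = f.create_dataset("intrinsic properties/q", data=5.0)
    q.attrs["unit"] = "mm2"

    f.create_dataset("raw data", data=[0.1, 0.2, 0.3, 0.5, 0.9], dtype="float32")

    kinds = describe(f)

print(kinds)
```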

We can access individual elements of the nested structure:

content.Hc
CoercivityHcExternal(value=300.0, unit=kA / m)
content.geometry.x
Length(value=5.0, unit=nm, description='edge length x')

Some of the names are not valid Python identifiers, so we have to use the dict interface of EntityCollection:

content["intrinsic properties"].Tc
CurieTemperature(value=800.0, unit=K)
content["intrinsic properties"].A
ExchangeStiffnessConstant(value=1e-11, unit=J / m)
content["intrinsic properties"].q
<Quantity 5. mm2>
content["raw data"]
array([0.1, 0.2, 0.3, 0.5, 0.9], dtype=float32)
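
Whether a name can be used with attribute access can be checked up front with str.isidentifier and keyword.iskeyword from the standard library. A short sketch using the top-level names stored in the file above; the helper attribute_accessible is hypothetical:

```python
import keyword


def attribute_accessible(name):
    """Return True if name is a valid, non-reserved Python identifier."""
    return name.isidentifier() and not keyword.iskeyword(name)


names = ["Hc", "geometry", "intrinsic properties", "raw data"]
access = {name: attribute_accessible(name) for name in names}

print(access)
```

Names for which the check fails, such as those containing a space, need the dict-style access shown above.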