Usage#

Here is an example of using the pre-configured models in this package.

import uuid

import ctao_datamodel as dm
import ctao_datamodel.models.dataproducts as dp
from astropy.time import Time

Data product metadata is stored in the Product class. The package can print a summary of any model, which gives an overview of the available fields, their types, and which ones are optional. For example, for the ProductType model:

dm.print_model(dp.ProductType)
Element                         : Type                           :Opt: Parent Relation
=====================================================================================
 producttype                    : ProductType                    :   : none                
   level                        : DataLevel                      :   : contains            
   division                     : DataDivision                   :   : contains            
   association                  : DataAssociation                :   : contains            
   type                         : DataType                       :   : contains            

Creating metadata for a data product#

First, we’ll determine the data product type we need by consulting the table in the appendix of the CTAO DataProducts Data Model Specification. In this example, we will describe a category-3 (final) DL3 event list from an observation.

ProductType#

The ProductType represents the “class” of the data product; all products with the same type should contain the same metadata fields and data format.

To fill in the ProductType, we can either use the enumerations directly or, if we know them, pass their string representations:

thetype = dp.ProductType(
    level="DL3", division="Event", association="Subarray", type=dp.DataType.OBSERVATION
)
thetype
ProductType(level=<DataLevel.DL3: 'DL3'>, division=<DataDivision.EVENT: 'Event'>, association=<DataAssociation.SUBARRAY: 'Subarray'>, type=<DataType.OBSERVATION: 'Observation'>)
print(thetype)
DL3/Event/Subarray/Observation

Note that converting it to a string gives a standard slash-separated representation. You can also create a ProductType from such a string:

thetype = dp.ProductType.from_str("DL3/Event/Subarray/Observation")
thetype
ProductType(level=<DataLevel.DL3: 'DL3'>, division=<DataDivision.EVENT: 'Event'>, association=<DataAssociation.SUBARRAY: 'Subarray'>, type=<DataType.OBSERVATION: 'Observation'>)
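
The round trip between the four fields and the slash-separated string can be illustrated with a minimal standard-library sketch. This is not the package's implementation: the class SketchProductType is hypothetical, and the enumerations below contain only the single member used in this example.

```python
from dataclasses import dataclass
from enum import Enum


# Hypothetical, one-member stand-ins for the package's enumerations.
class DataLevel(Enum):
    DL3 = "DL3"


class DataDivision(Enum):
    EVENT = "Event"


class DataAssociation(Enum):
    SUBARRAY = "Subarray"


class DataType(Enum):
    OBSERVATION = "Observation"


@dataclass
class SketchProductType:
    level: DataLevel
    division: DataDivision
    association: DataAssociation
    type: DataType

    def __str__(self) -> str:
        # Join the enum values in a fixed order.
        parts = (self.level, self.division, self.association, self.type)
        return "/".join(part.value for part in parts)

    @classmethod
    def from_str(cls, text: str) -> "SketchProductType":
        # Looking an enum up by value raises ValueError for unknown strings.
        level, division, association, type_ = text.split("/")
        return cls(
            DataLevel(level),
            DataDivision(division),
            DataAssociation(association),
            DataType(type_),
        )


parsed = SketchProductType.from_str("DL3/Event/Subarray/Observation")
round_trip = str(parsed)
```

Looking up an enum member by its value is also what makes it possible, in this sketch, to accept plain strings wherever an enumeration is expected.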

We can use the flatten_model_instance() function to see where we are so far:

dm.flatten_model_instance(thetype)
{'level': 'DL3',
 'division': 'Event',
 'association': 'Subarray',
 'type': 'Observation'}

InstanceIdentifier#

Next we need to know which instance identifier fields are required for this data product. Again, consult the table in the DataProducts Data Model Specification, where we can see that we need the obs_id. Let’s use 1000012345.

instance = dp.InstanceIdentifier(obs_id=1000012345)
dm.flatten_model_instance(instance)
{'id': 'ec5c4160-e688-44ff-8db9-17bf21d81ef8',
 'obs_id': 1000012345,
 'facility_name': 'CTAO'}
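
The id and facility_name fields above were not passed explicitly, so they come from defaults. A minimal sketch of that pattern with a standard-library dataclass follows. The class SketchInstanceIdentifier is hypothetical, and the use of uuid.uuid4 is an assumption, suggested only by the version digit of the ID shown above.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class SketchInstanceIdentifier:
    obs_id: int
    # Assumption: a fresh random (version 4) UUID per instance.
    id: uuid.UUID = field(default_factory=uuid.uuid4)
    # A constant default, matching the value seen in the output above.
    facility_name: str = "CTAO"


first = SketchInstanceIdentifier(obs_id=1000012345)
second = SketchInstanceIdentifier(obs_id=1000012345)
```

Because the factory runs once per instance, first.id and second.id differ even though both describe the same obs_id.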

Curation#

curation = dp.Curation(
    release="CTAO/DL3-DR1", copyright="CTAO gGmbH", rights=dp.DataRights.PUBLIC
)
dm.flatten_model_instance(curation)
{'release': 'CTAO/DL3-DR1',
 'license': 'CC-BY-SA-4.0',
 'license_url': 'https://creativecommons.org/licenses/by-sa/4.0/',
 'copyright': 'CTAO gGmbH',
 'rights': 'public'}

DataModel and Contact#

We also need to describe the data model that was used to serialize the data product, as well as some contact information:

model = dp.DataModel(
    name="GADF",
    version="v0.3",
    url="https://gamma-astro-data-formats.readthedocs.io/en/v0.3/",
)
model
DataModel(name='GADF', version='v0.3', url=AnyUrl('https://gamma-astro-data-formats.readthedocs.io/en/v0.3/'))
contact = dp.Contact(name="CTAO HelpDesk", email="help@ctao.org", organization="CTAO")
contact
Contact(name='CTAO HelpDesk', organization='CTAO', email='help@ctao.org')

Activity Provenance metadata#

The models.dataproducts.Activity metadata describes part of the local provenance of the data product by specifying the software or human process that generated it. Let’s define an example here. The id of the activity should be a unique UUID generated when the activity started, and it allows one to link multiple data products that were generated by the same software.

activity = dp.Activity(
    process=dp.ObservatoryProcess.DATA_PROCESSING,
    name="generate-dl3.cwl",
    description="Workflow that produces DL3 data products for an observation",
    id=uuid.uuid1(),
    start=Time("2025-11-28 13:45:12.62"),
    software=dp.Software(
        name="ctao-datapipe",
        version="v1.1.0",
        url="http://cta-computing.gitlab-pages.cta-observatory.org/dpps/datapipe/datapipe/latest/",
    ),
    configuration_id="hillas-standard",
)
dm.flatten_model_instance(activity)
{'process': 'data_processing',
 'name': 'generate-dl3.cwl',
 'description': 'Workflow that produces DL3 data products for an observation',
 'id': '67a299c2-2158-11f1-9963-467970a44361',
 'start': '2025-11-28T13:45:12.620000000',
 'software.name': 'ctao-datapipe',
 'software.version': 'v1.1.0',
 'software.url': 'http://cta-computing.gitlab-pages.cta-observatory.org/dpps/datapipe/datapipe/latest/',
 'configuration_id': 'hillas-standard'}
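
To illustrate why a shared activity id is useful, the following sketch groups products by the run that produced them. The file names and the record layout are hypothetical and use only the standard library; they are not part of this package.

```python
import uuid
from collections import defaultdict

# One ID per (hypothetical) workflow run, generated when the run starts.
run_a = uuid.uuid1()
run_b = uuid.uuid1()

# Each product records the ID of the activity that wrote it.
records = [
    ("events_a.fits", run_a),
    ("irfs_a.fits", run_a),
    ("events_b.fits", run_b),
]

# Group the products by activity ID to recover what each run produced.
by_activity = defaultdict(list)
for filename, activity_id in records:
    by_activity[activity_id].append(filename)
```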

Optionally, we can also add linked data products that were used as input to the activity. Here we expect the models.dataproducts.ExternalDataProduct id to be the unique instance identifier id of the other data product.

activity.inputs = [
    dp.ExternalDataProduct(
        id="1fcade22-d04d-11f0-84af-acde48001122",
        uri="file:./irf_calibration.fits",
        role="IRF calibration coefficients",
    ),
    dp.ExternalDataProduct(
        id="1e6f6306-d050-11f0-8af5-acde48001122",
        uri="file:./other_input.fits",
        role="Some critical input",
    ),
]

Observation context#

Since this data product is associated with an observation, we can include information about its spatial, temporal, and spectral coverage. This information is optional because it can be looked up through the obs_id we already set. However, including it in the data product metadata itself is often useful for discoverability.

observation = dp.Observation(
    coverage=dp.Coverage(
        time=dp.TimeCoverage(
            t_min="2026-10-02 15:13:21.1", t_max="2026-10-02 15:13:41.244"
        ),
        space=dp.SpaceCoverage(frame="ICRS", ra=129.23, dec=-42.102, field_of_view=6.0),
        energy=dp.EnergyCoverage(energy_unit="TeV", energy_min=0.003, energy_max=300.0),
    ),
)
dm.flatten_model_instance(observation)
{'coverage.time.t_min': '2026-10-02T15:13:21.100000000',
 'coverage.time.t_max': '2026-10-02T15:13:41.244000000',
 'coverage.space.frame': 'ICRS',
 'coverage.space.ra': 129.23,
 'coverage.space.dec': -42.102,
 'coverage.space.field_of_view': 6.0,
 'coverage.energy.energy_unit': 'TeV',
 'coverage.energy.energy_min': 0.003,
 'coverage.energy.energy_max': 300.0}

Full Product metadata#

Finally, let’s build the full Product metadata:

product = dp.Product(
    data=thetype,
    instance=instance,
    description="An example DL3 Event list",
    creation_time=Time("2025-11-28 14:15:16.123"),
    curation=curation,
    model=model,
    contact=contact,
    activity=activity,
    observation=observation,
)
dm.flatten_model_instance(product)
{'ctao_metadata_version': '1.0.0rc3',
 'description': 'An example DL3 Event list',
 'data.level': 'DL3',
 'data.division': 'Event',
 'data.association': 'Subarray',
 'data.type': 'Observation',
 'instance.id': 'ec5c4160-e688-44ff-8db9-17bf21d81ef8',
 'instance.obs_id': 1000012345,
 'instance.facility_name': 'CTAO',
 'curation.release': 'CTAO/DL3-DR1',
 'curation.license': 'CC-BY-SA-4.0',
 'curation.license_url': 'https://creativecommons.org/licenses/by-sa/4.0/',
 'curation.copyright': 'CTAO gGmbH',
 'curation.rights': 'public',
 'model.name': 'GADF',
 'model.version': 'v0.3',
 'model.url': 'https://gamma-astro-data-formats.readthedocs.io/en/v0.3/',
 'creation_time': '2025-11-28T14:15:16.123000000',
 'contact.name': 'CTAO HelpDesk',
 'contact.organization': 'CTAO',
 'contact.email': 'help@ctao.org',
 'activity.process': 'data_processing',
 'activity.name': 'generate-dl3.cwl',
 'activity.description': 'Workflow that produces DL3 data products for an observation',
 'activity.id': '67a299c2-2158-11f1-9963-467970a44361',
 'activity.start': '2025-11-28T13:45:12.620000000',
 'activity.software.name': 'ctao-datapipe',
 'activity.software.version': 'v1.1.0',
 'activity.software.url': 'http://cta-computing.gitlab-pages.cta-observatory.org/dpps/datapipe/datapipe/latest/',
 'activity.configuration_id': 'hillas-standard',
 'activity.inputs.0.uri': 'file:///irf_calibration.fits',
 'activity.inputs.0.role': 'IRF calibration coefficients',
 'activity.inputs.0.id': '1fcade22-d04d-11f0-84af-acde48001122',
 'activity.inputs.1.uri': 'file:///other_input.fits',
 'activity.inputs.1.role': 'Some critical input',
 'activity.inputs.1.id': '1e6f6306-d050-11f0-8af5-acde48001122',
 'observation.coverage.time.t_min': '2026-10-02T15:13:21.100000000',
 'observation.coverage.time.t_max': '2026-10-02T15:13:41.244000000',
 'observation.coverage.space.frame': 'ICRS',
 'observation.coverage.space.ra': 129.23,
 'observation.coverage.space.dec': -42.102,
 'observation.coverage.space.field_of_view': 6.0,
 'observation.coverage.energy.energy_unit': 'TeV',
 'observation.coverage.energy.energy_min': 0.003,
 'observation.coverage.energy.energy_max': 300.0}

Note that a few fields have been filled in automatically, such as ctao_metadata_version, instance.facility_name, curation.license, and instance.id. The instance.id is generated when the metadata is created and should be unique.
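
The dotted keys in the flattened output, including list indices such as activity.inputs.0.uri, can be reproduced with a short recursive function. This is a sketch of the general technique on plain dicts and lists, not the package's flatten_model_instance(), and the helper name flatten is hypothetical.

```python
from typing import Any


def flatten(value: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts and lists into one dict with dotted keys."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        # List elements are addressed by index, e.g. "inputs.0.role".
        items = enumerate(value)
    else:
        # A leaf value: store it under the key path built so far.
        return {prefix: value}

    flat: dict[str, Any] = {}
    for key, child in items:
        child_prefix = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(child, child_prefix))
    return flat


nested = {
    "data": {"level": "DL3", "division": "Event"},
    "activity": {
        "inputs": [
            {"role": "IRF calibration coefficients"},
            {"role": "Some critical input"},
        ]
    },
}
flat = flatten(nested)
```

Unlike the output shown above, this sketch does not drop unset optional fields.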

Conversion to and from FITS style keys#

By default, any field with a fits_keyword mapping attribute is translated to that keyword automatically. Fields without one use the HIERARCH CTAO X X X long-keyword convention, for example HIERARCH CTAO ACTIVITY INPUTS 0 URI.

header = dm.instance_to_fits_header(product)
header
WARNING: VerifyWarning: Card is too long, comment will be truncated. [astropy.io.fits.card]
CTAOMETA= '1.0.0rc3'           / CTAO DataProducts Metadata Version             
TITLE   = 'An example DL3 Event list' / Human-readable description of this data 
DATALEVL= 'DL3     '           / CTAO Data Level, see Top-Level Data Model. Opti
DATADIV = 'Event   '           / Primary data type.  See the CTAO Top-level Data
DATAASSO= 'Subarray'           / The main associated instrument or analysis part
DATATYPE= 'Observation'        / The specific type of the product.  This is used
DATAID  = 'ec5c4160-e688-44ff-8db9-17bf21d81ef8' / A locally-generated unique ID
OBS_ID  =           1000012345 / Unique identifier of the observation block, in 
TELESCOP= 'CTAO    '           / Observatory or facility used to collect the dat
RELEASE = 'CTAO/DL3-DR1'       / Name of the data release (data collection) that
LICENSE = 'CC-BY-SA-4.0'       / License for this data product                  
LICENURL= 'https://creativecommons.org/licenses/by-sa/4.0/' / URL pointing to th
COPYRIGH= 'CTAO gGmbH'         / Copyright holder(s) of this data product       
RIGHTS  = 'public  '           / Availability of the data product, if known at t
MODEL   = 'GADF    '           / Name of the overall data model, which may conta
MODELVER= 'v0.3    '           / Version number of the data model               
MODELURL= 'https://gamma-astro-data-formats.readthedocs.io/en/v0.3/' / URL or DO
CREATED = '2025-11-28T14:15:16.123000000' / UTC Date-time the data product was c
AUTHOR  = 'CTAO HelpDesk'      / Contact name for this data product.            
ORIGIN  = 'CTAO    '           / Contact organization name of this data product.
EMAIL   = 'help@ctao.org'      / Contact's email address                        
ACTPROC = 'data_processing'    / Observatory operational Process.  These are the
ACTIVITY= 'generate-dl3.cwl'   / Name of the activity that produced this data pr
ACTDESC = 'Workflow that produces DL3 data products for an observation' / Human-
ACTID   = '67a299c2-2158-11f1-9963-467970a44361' / Unique identifier of this pro
ACTSTART= '2025-11-28T13:45:12.620000000' / Start time of the activity          
SOFTWARE= 'ctao-datapipe'      / Descriptive name of the software.              
SOFTVER = 'v1.1.0  '           / Version number                                 
SOFTURL = 'http://cta-computing.gitlab-pages.cta-observatory.org/dpps/datapipe&'
CONTINUE  '/datapipe/latest/&'                                                  
CONTINUE  '' / URL or DOI linking to more detail.                               
ANAMODE = 'hillas-standard'    / Identifier for the configuration for the softwa
HIERARCH CTAO ACTIVITY INPUTS 0 URI = 'file:///irf_calibration.fits' / URI of th
HIERARCH CTAO ACTIVITY INPUTS 0 ROLE = 'IRF calibration coefficients' / context 
HIERARCH CTAO ACTIVITY INPUTS 0 ID = '1fcade22-d04d-11f0-84af-acde48001122' / Un
HIERARCH CTAO ACTIVITY INPUTS 1 URI = 'file:///other_input.fits' / URI of the da
HIERARCH CTAO ACTIVITY INPUTS 1 ROLE = 'Some critical input' / context of the da
HIERARCH CTAO ACTIVITY INPUTS 1 ID = '1e6f6306-d050-11f0-8af5-acde48001122' / Un
TSTART  = '2026-10-02T15:13:21.100000000' / Start of time range of the data as e
TSTOP   = '2026-10-02T15:13:41.244000000' / End of time range of the data, as ei
RADESYS = 'ICRS    '           / Standard equatorial coordinate system used for 
RA_PNT  =               129.23 / [deg] ICRS Right ascension of the center of the
DEC_PNT =              -42.102 / [deg] ICRS Declination of the center of the reg
FOV     =                  6.0 / [deg] Approximate diameter (not radius) of the 
EUNIT   = 'TeV     '           / Unit for all energy metadata.                  
EMIN    =                0.003 / [TeV] Approximate minimum energy of the dataset
EMAX    =                300.0 / [TeV] Approximate maximum energy of the dataset
new_instance = dm.fits_header_to_instance(header, dp.Product)
new_instance
Product(ctao_metadata_version='1.0.0rc3', description='An example DL3 Event list', data=ProductType(level=<DataLevel.DL3: 'DL3'>, division=<DataDivision.EVENT: 'Event'>, association=<DataAssociation.SUBARRAY: 'Subarray'>, type=<DataType.OBSERVATION: 'Observation'>), instance=InstanceIdentifier(id=UUID('ec5c4160-e688-44ff-8db9-17bf21d81ef8'), obs_id=1000012345, event_type_group=None, ae_id=None, ae_class=None, subarray_id=None, chunk_id=None, batch_id=None, calibration_service_id=None, observing_night=None, sublevel_id=None, target_id=None, region_id=None, observing_period_id=None, lunar_cycle_id=None, facility_name=<FacilityName.CTAO: 'CTAO'>, site_id=None, particle_pdgid=None, category=None, data_source=None, assembly_name=None), curation=Curation(release='CTAO/DL3-DR1', reference=None, license='CC-BY-SA-4.0', license_url='https://creativecommons.org/licenses/by-sa/4.0/', copyright='CTAO gGmbH', rights=<DataRights.PUBLIC: 'public'>, release_date=None, valid_from=None, valid_to=None), model=DataModel(name='GADF', version='v0.3', url=AnyUrl('https://gamma-astro-data-formats.readthedocs.io/en/v0.3/')), disclaimer=None, creation_time=<Time object: scale='utc' format='isot' value=2025-11-28T14:15:16.123>, contact=Contact(name='CTAO HelpDesk', organization='CTAO', email='help@ctao.org'), activity=Activity(process=<ObservatoryProcess.DATA_PROCESSING: 'data_processing'>, name='generate-dl3.cwl', description='Workflow that produces DL3 data products for an observation', id=UUID('67a299c2-2158-11f1-9963-467970a44361'), start=<Time object: scale='utc' format='isot' value=2025-11-28T13:45:12.620>, end=None, software=Software(name='ctao-datapipe', version='v1.1.0', url=AnyUrl('http://cta-computing.gitlab-pages.cta-observatory.org/dpps/datapipe/datapipe/latest/')), configuration_id='hillas-standard', inputs=[ExternalDataProduct(uri=AnyUrl('file:///irf_calibration.fits'), role='IRF calibration coefficients', id=UUID('1fcade22-d04d-11f0-84af-acde48001122')), 
ExternalDataProduct(uri=AnyUrl('file:///other_input.fits'), role='Some critical input', id=UUID('1e6f6306-d050-11f0-8af5-acde48001122'))]), observation=Observation(coverage=Coverage(time=TimeCoverage(reference=None, t_min=<Time object: scale='utc' format='isot' value=2026-10-02T15:13:21.100>, t_max=<Time object: scale='utc' format='isot' value=2026-10-02T15:13:41.244>), space=SpaceCoverage(frame=<SpatialFrame.ICRS: 'ICRS'>, ra=129.23, dec=-42.102, field_of_view=6.0, region_of_interest=None, moc=None), energy=EnergyCoverage(energy_unit=Unit("TeV"), energy_min=0.003, energy_max=300.0), tracking=None), location=None), acquisition=None)
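
The key translation seen in the header can be sketched as a lookup with a fallback. The three short keywords below are copied from the header printed above. Everything else is an assumption made for illustration: the table SHORT_KEYWORDS and the helpers to_fits_key and from_fits_key are hypothetical names, and the real mapping comes from the models' fits_keyword attributes.

```python
# A few short keywords copied from the header output; not the full mapping.
SHORT_KEYWORDS = {
    "data.level": "DATALEVL",
    "instance.obs_id": "OBS_ID",
    "observation.coverage.time.t_min": "TSTART",
}

HIERARCH_PREFIX = "HIERARCH CTAO "


def to_fits_key(flat_key: str) -> str:
    """Return the short keyword if one is known, else a HIERARCH CTAO key."""
    if flat_key in SHORT_KEYWORDS:
        return SHORT_KEYWORDS[flat_key]
    parts = (part.upper() for part in flat_key.split("."))
    return HIERARCH_PREFIX + " ".join(parts)


def from_fits_key(fits_key: str) -> str:
    """Invert to_fits_key for the keys this sketch knows about."""
    reverse = {short: flat for flat, short in SHORT_KEYWORDS.items()}
    if fits_key in reverse:
        return reverse[fits_key]
    if not fits_key.startswith(HIERARCH_PREFIX):
        raise KeyError(fits_key)
    parts = fits_key[len(HIERARCH_PREFIX):].split(" ")
    return ".".join(part.lower() for part in parts)


long_key = to_fits_key("activity.inputs.0.uri")
short_key = to_fits_key("data.level")
```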

Visualizing#

This code includes a simple wrapper class for visualizing models with PlantUML in a notebook, which can be used as follows:

dm.PlantUMLDiagram(dp.ProductType)

By default, related classes are shown without details. To add more detail, you can combine diagrams:

dm.PlantUMLDiagram(dp.ProductType) + dm.PlantUMLDiagram(dp.DataDivision)

To visualize full diagrams, use the details option:

dm.PlantUMLDiagram(dp.ProductType, details=True)

You can also add custom diagram text by summing diagrams. For example, you can use it to change colors or to add hidden connectors that influence the diagram layout.

preamble = dm.PlantUMLDiagram(
    """
    hide circles
    package CTAO.DataProducts #lightblue-white {
    }
    class CTAO.DataProducts.DataType #red

    CTAO.DataProducts.ProductType -[hidden]u-> CTAO.DataProducts.DataType
    """
)

preamble + dm.PlantUMLDiagram(dp.ProductType)
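
One way to picture how summing diagrams could work is a small class whose + operator concatenates PlantUML text. This is only a sketch of the idea under that assumption, not the package's PlantUMLDiagram. The class SketchDiagram and its source() method are hypothetical.

```python
class SketchDiagram:
    """Holds PlantUML body text; adding two diagrams concatenates the text."""

    def __init__(self, body: str):
        self.body = body.strip()

    def __add__(self, other: "SketchDiagram") -> "SketchDiagram":
        # The left operand's text comes first, so a preamble stays on top.
        return SketchDiagram(self.body + "\n" + other.body)

    def source(self) -> str:
        # @startuml and @enduml delimit a PlantUML diagram.
        return "@startuml\n" + self.body + "\n@enduml"


preamble = SketchDiagram("hide circle")
classes = SketchDiagram("class ProductType\nclass DataType")
combined = preamble + classes
```

With this design, styling directives and hidden connectors are just more diagram text, so they compose with generated diagrams in the same way.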