The siminterface Module

Interacting with Simulations Using SimIM

Once simulations are formatted, SimIM accesses all simulation data through “Handler” classes. The SimHandler class provides an interface to all data from a simulation, while the SnapHandler class provides an interface to data from a single simulation snapshot.

class simim.siminterface.SimHandler(sim, init_snaps=False, in_h_units=False)

Class to handle I/O for subhalo/galaxy catalogs in SimIM format

This class handles basic operations for accessing and analyzing simulation data. It is a wrapper around Handlers for individual snapshots, and in many cases performs operations iteratively over all snapshots in a particular simulation. It is also a convenient wrapper for accessing specific snapshots of a given simulation.

The general philosophy of Handlers is to not load actual properties of a halo into memory until they are requested, and to remove them from memory when they are no longer in use (or at least make it convenient to do so).
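The load-on-request philosophy can be illustrated with a small toy class. This is only a sketch: `LazyPropertyStore` is a hypothetical stand-in that borrows the method names documented for SnapHandler, and a plain dictionary plays the role of the formatted file on disk. It is not SimIM's implementation.

```python
class LazyPropertyStore:
    """Toy stand-in for a Handler: values are loaded only when requested."""

    def __init__(self, on_disk):
        # `on_disk` is a plain dict standing in for the formatted data file
        self._on_disk = on_disk
        self._loaded = {}

    def load_property(self, name):
        # Copy the values into memory only if they are not already there
        if name not in self._loaded:
            self._loaded[name] = list(self._on_disk[name])

    def has_loaded(self, name):
        return name in self._loaded

    def return_property(self, name):
        self.load_property(name)
        return self._loaded[name]

    def unload_property(self, name):
        # Frees the in-memory copy; the "on disk" copy is untouched
        self._loaded.pop(name, None)


store = LazyPropertyStore({"mass": [1e10, 5e11, 2e12]})
loaded_before = store.has_loaded("mass")         # nothing in memory yet
total_mass = sum(store.return_property("mass"))  # triggers the load
loaded_after = store.has_loaded("mass")
store.unload_property("mass")                    # release the memory again
loaded_final = store.has_loaded("mass")
```

Nothing is held in memory until `return_property` is called, and `unload_property` releases it without touching the stored data.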

Methods

delete_property(*property_names)

Remove a property from the saved file on the disk for all simulation snapshots

extract_snap_keys()

Get the fields associated with halos in the simulation

extract_snap_meta(snap)

Get the meta-data for a snapshot

get_mass_index(mass, snap[, in_h_units])

Find the indices above a specified mass

get_snap(snap)

Return a SnapHandler instance for a specified snapshot

get_snap_from_z(z)

Return a SnapHandler instance for the snapshot closest to a requested redshift

initialize_all_snaps([remake])

Initialize SnapHandlers for each snapshot

make_property(property[, rename, kw_remap, ...])

Use a galprops.prop instance to evaluate a new property over all snapshots

number_volumes(volume[, in_h_units])

Compute the number of times a specified volume can fit in the simulation box

set_in_h_units(in_h_units)

Globally set whether units are interpreted to be in little h units

set_property_range([property_name, pmin, ...])

Restrict property range for all snapshots

snap_stat(stat_function, kwargs[, kw_remap, ...])

Evaluate stat_function over every snapshot and return results

z_to_snap(z)

Determine the snapshot corresponding to a particular redshift

Initialize Handler for a specified simulation

Provides a generic interface for interacting with data from any simulation that has been converted to SimIM format and is accessible to memory. Note that simulations must first be downloaded and formatted (see e.g. simim.siminterface.illustris or simim.siminterface.universemachine for code to accomplish this).

Parameters:
sim : string

Name of the simulation to load.

init_snaps : bool, default=False

Setting this as True will create persistent Handler instances for every snapshot, rather than doing so when data from a given handler is called for. This is generally not necessary, but is used when creating properties for all snapshots but NOT writing them to disk.

in_h_units : bool

If True, values will be returned, plotted, etc. in units including little h. If False, little h dependence will be removed. This can be overridden in most method calls.

initialize_all_snaps(remake=False)

Initialize SnapHandlers for each snapshot

Parameters:
remake : bool, default=False

Determines whether snapshots should be re-initialized if this method is called twice

set_in_h_units(in_h_units)

Globally set whether units are interpreted to be in little h units

Changes the default way units are processed

Parameters:
in_h_units : bool

If True, values will, by default, be returned in units including little h. If False, little h dependence will be removed.
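As a rough illustration of what removing the little h dependence means, the sketch below converts values using the standard cosmological convention. The function `remove_little_h` and its `h_power` argument are hypothetical. How SimIM records the h dependence of each field is not described here, so this is an assumption, not the library's code.

```python
def remove_little_h(value, h, h_power):
    """Convert a value carrying a factor h**h_power into h-free units.

    h_power is -1 for quantities quoted in Msun/h or Mpc/h, -3 for
    (Mpc/h)^3, and 0 for quantities with no h dependence.
    """
    return value * h ** h_power


h = 0.5  # illustrative value only, chosen to keep the arithmetic exact

mass_msun = remove_little_h(1.0e12, h, -1)  # 1e12 Msun/h expressed in Msun
edge_mpc = remove_little_h(100.0, h, -1)    # 100 Mpc/h expressed in Mpc
unchanged = remove_little_h(3.0, h, 0)      # no h dependence to remove
```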

number_volumes(volume, in_h_units=None)

Compute the number of times a specified volume can fit in the simulation box

Parameters:
volume : float

The volume to check in units of Mpc^3

in_h_units : bool (default is determined by self.default_in_h_units)

If True the value of volume will be assumed to have units of (Mpc/h)^3
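The calculation presumably amounts to dividing the box volume by the requested volume once both are in the same units. The sketch below assumes the box edge is stored in Mpc/h (as stated for SnapHandler) and returns a float. `count_volumes` is a hypothetical helper, and whether SimIM rounds the result is not specified here.

```python
def count_volumes(box_edge_mpc_h, volume, h, in_h_units=False):
    """Number of times `volume` fits into a cubic box of the given edge.

    box_edge_mpc_h : box edge length in Mpc/h
    volume         : volume in Mpc^3, or (Mpc/h)^3 if in_h_units is True
    """
    box_volume = box_edge_mpc_h ** 3  # (Mpc/h)^3
    if not in_h_units:
        # A length in Mpc/h equals the length in Mpc times h,
        # so a volume in (Mpc/h)^3 equals the volume in Mpc^3 times h^3
        volume = volume * h ** 3
    return box_volume / volume


# h = 0.5 is an illustrative value chosen to keep the arithmetic exact
n_no_h = count_volumes(100.0, 1.0e6, h=0.5)                      # volume in Mpc^3
n_with_h = count_volumes(100.0, 1.0e6, h=0.5, in_h_units=True)   # volume in (Mpc/h)^3
```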

extract_snap_meta(snap)

Get the meta-data for a snapshot

Parameters:
snap : int

Number of snapshot to be extracted

Returns:
snap_meta

The meta data for the requested snapshot

z_to_snap(z)

Determine the snapshot corresponding to a particular redshift

Parameters:
z : float

Redshift to search for

Returns:
snap_ind

The index number of the snapshot matching the requested redshift

extract_snap_keys()

Get the fields associated with halos in the simulation

Parameters:
none
Returns:
keys

The fields of each snapshot

get_mass_index(mass, snap, in_h_units=None)

Find the indices above a specified mass

Parameters:
mass : float

Minimum mass to access in Msun units

snap : int

Number of snapshot to be extracted

in_h_units : bool (default is determined by self.default_in_h_units)

If True, mass will be taken to have units including little h, otherwise, it will be assumed to have units with no h dependence.

Returns:
index : int

The index

get_snap(snap)

Return a SnapHandler instance for a specified snapshot

Parameters:
snap : int

Index-number of the desired snapshot

Returns:
SnapHandler

A SnapHandler instance for the requested snapshot

get_snap_from_z(z)

Return a SnapHandler instance for the snapshot closest to a requested redshift

Parameters:
z : float

The desired redshift for the snap

Returns:
SnapHandler

A SnapHandler instance for the requested snapshot
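Selecting the snapshot closest to a requested redshift can be sketched as a nearest-value search. The helper `closest_snap_index` and the list of snapshot redshifts below are hypothetical. SimIM's own lookup, including how ties are handled, may differ.

```python
def closest_snap_index(z, snap_redshifts):
    """Return the index of the snapshot whose redshift is closest to z."""
    return min(range(len(snap_redshifts)),
               key=lambda i: abs(snap_redshifts[i] - z))


# Made-up snapshot redshifts, ordered from early to late times
snap_redshifts = [3.0, 2.0, 1.0, 0.5, 0.0]

idx_low = closest_snap_index(0.9, snap_redshifts)   # nearest redshift is 1.0
idx_high = closest_snap_index(2.6, snap_redshifts)  # nearest redshift is 3.0
```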

set_property_range(property_name=None, pmin=-inf, pmax=inf, reset=True, in_h_units=None)

Restrict property range for all snapshots

This is a wrapper around SnapHandler.set_property_range that iteratively applies it to all snapshots. Initializing handlers for each snapshot is necessary for this to work.

Parameters:
property_name : str

The name of the field to use

pmin : float

The minimum value of the property to bracket the selected range.

pmax : float

The maximum value of the property to bracket the selected range.

reset : bool, optional

If True, the active indices will be those selected between pmin and pmax. If False, the active indices will be those that satisfy pmin<=p<=pmax and which were previously in the active indices (i.e. this allows selection over multiple properties).

in_h_units : bool (default is determined by self.default_in_h_units)

If True, pmin and pmax will be taken to have units including little h, otherwise, they will be assumed to have units with no h dependence (and have the correct dependency applied before setting cuts for parameters where the stored catalog values are in h units).

make_property(property, rename=None, kw_remap={}, other_kws={}, overwrite=False, use_all_inds=False, write=False, writedtype=None)

Use a galprops.prop instance to evaluate a new property over all snapshots

This is a wrapper around SnapHandler.make_property that iteratively applies it to all snapshots. For this to work, either 1) write must be set to True (resulting in the new property being saved to disk) or 2) SnapHandlers must be initialized for each snapshot, in which case the property can be stored only in memory. The latter is likely to require a significant allocation of memory and should be used carefully.

Parameters:
property : galprops.property instance

The galprops.property instance containing the property information and generating function

rename : list, optional

List of names specifying how to rename the property from the name specified in the galprops.prop instance

kw_remap : dict, optional

A dictionary remapping kwargs of the property generating function to different properties of the simulation. By default, if the function calls for kwarg 'x' it will be evaluated on simulation property 'x', but passing the dictionary {'x': 'y'} will result in the function being evaluated on simulation property 'y'.

other_kws : dict, optional

A dictionary of additional keyword arguments passed directly to the property.prop_function call

overwrite : bool, default=False

Default is False. If a property name is already in use and overwrite is False, an error will be raised. Otherwise the property will be overwritten.

use_all_inds : bool, default=False

If True values will be assigned for all halos, otherwise only active halos will be evaluated, and others will be assigned nan.

write : bool, default=False

If True values of the new property will be written to the disk storage for the simulation snapshots.

writedtype : None or dtype

Specifies the data format to write the new property in.

Returns:
None

delete_property(*property_names)

Remove a property from the saved file on the disk for all simulation snapshots

Parameters:
property_names : str

The name of the field to be deleted, can give multiple

Returns:
None

snap_stat(stat_function, kwargs, kw_remap={}, other_kws={}, give_args_in_h_units=None, use_all_inds=False, snaps=None)

Evaluate stat_function over every snapshot and return results

This is a wrapper around SnapHandler.eval_stat that iteratively applies it to all snapshots.

Parameters:
stat_function : function

Any function which can be applied to data in a snapshot

kwargs : list

List containing the arguments that must be passed to stat_function

kw_remap : dict

Dictionary mapping between function arguments (listed in kwargs) as the keys and the names of handler properties to feed in as values. E.g. to provide handler property 'mass' to stat_function argument 'a' one would use kw_remap={'a': 'mass'}

other_kws : dict, optional

A dictionary of additional keyword arguments passed directly to the stat_function call

use_all_inds : bool, default=False

If True function will be computed using all halos, otherwise only active halos will be evaluated.

give_args_in_h_units : bool (default is determined by self.default_in_h_units)

If True, values will be fed to stat_function in units including little h. If False, little h dependence will be removed.

snaps : list, optional

A list of snapshots on which to evaluate the stat_function. If none is specified all snapshots will be used.

Returns:
vals : list

List containing the value(s) returned by stat_function on each snapshot

redshifts : list

List containing the redshift of each snapshot
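The pattern of evaluating one statistic per snapshot and returning matched lists of values and redshifts can be sketched as follows. The function `stat_over_snaps` and the dictionary layout of each toy snapshot are hypothetical. They only illustrate the shape of the two returned lists.

```python
def stat_over_snaps(snapshots, stat_function, property_name, snaps=None):
    """Apply stat_function to one property of each selected snapshot."""
    indices = range(len(snapshots)) if snaps is None else snaps
    vals, redshifts = [], []
    for i in indices:
        snap = snapshots[i]
        vals.append(stat_function(snap["properties"][property_name]))
        redshifts.append(snap["redshift"])
    return vals, redshifts


# Two made-up snapshots holding a single property each
snapshots = [
    {"redshift": 1.0, "properties": {"mass": [1.0, 2.0]}},
    {"redshift": 0.0, "properties": {"mass": [2.0, 3.0, 5.0]}},
]

all_vals, all_z = stat_over_snaps(snapshots, sum, "mass")
one_val, one_z = stat_over_snaps(snapshots, sum, "mass", snaps=[1])
```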

class simim.siminterface.SnapHandler(path, snap, redshift, cosmo, box_edge, in_h_units=False)

Handler for individual snapshots - see generic Handler documentation.

The simplest way to initialize a SnapHandler instance is probably via a SimHandler instance for the simulation containing the snapshot in question. Then the method SimHandler.get_snap will return a Handler instance for the snapshot with only the snapshot index-number specified.

Methods

delete_property(*property_names)

Remove a property from the saved file on the disk

eval_stat(stat_function, kwargs[, kw_remap, ...])

Evaluate stat_function over the objects in a Handler instance and return the result

extract_keys([set])

Get the fields attached to a file

grid(*property_names[, in_h_units, ...])

Place selected properties into a 3d grid

has_loaded(property_name)

Check whether a property has been loaded into memory

has_property(property_name)

Check whether a property is present in the Handler

hist(*property_names[, use_all_inds, ...])

Make a histogram of a property

load_property(*property_names)

Load a property from file into memory

make_property(property[, rename, kw_remap, ...])

Use a galprops.prop instance to evaluate a new property

plot(xname, *ynames[, use_all_inds, save, ...])

Make a scatter plot of two properties

return_property(property_name[, ...])

Load a property from file and return

set_in_h_units(in_h_units)

Globally set whether units are interpreted to be in little h units

set_property_range([property_name, pmin, ...])

Set a range in a given property to be the active indices.

unload_property(*property_names)

Remove a property from memory (does not erase from file on disk)

write_property(*property_names[, overwrite, ...])

Write a property from object memory onto the saved file on the disk

Initialize Handler for a simulation snapshot

Parameters:
path : string

Path to SimIM formatted file containing the snapshot (most likely [path to SimIM data directory]/[Simulation Name]/data.hdf5)

snap : int

Index-number of the snapshot within the whole simulation

redshift : float

Redshift at which the snapshot was taken

cosmo : dict

Dictionary containing the cosmological parameters for the simulation

box_edge : float

The edge length of the simulation box, should ALWAYS be in units of Mpc/h.

in_h_units : bool

If True, values will be returned, plotted, etc. in units including little h. If False, little h dependence will be removed. This can be overridden in most method calls.

set_in_h_units(in_h_units)

Globally set whether units are interpreted to be in little h units

Changes the default way units are processed

Parameters:
in_h_units : bool

If True, values will, by default, be returned in units including little h. If False, little h dependence will be removed.

grid(*property_names, in_h_units=None, use_all_inds=False, res=None, xlim=None, ylim=None, zlim=None, norm=None)

Place selected properties into a 3d grid

Uses the halo positions (pos_x, pos_y, pos_z) and the values of the requested properties (property_names) to construct a 3d grid. The only required argument is a valid property name or names. Additional arguments can specify the limits and resolution of the grid.

Parameters:
property_names : str

The name or names of properties in the Handler instance

in_h_units : bool (default is determined by self.default_in_h_units)

If True, positions and property values fed to the gridder will be in units including little h. If False, little h dependence will be removed. Defaults to whatever is set globally for the Handler instance.

use_all_inds : bool, default=False

If True, all halos will be gridded; otherwise only active halos will be included.

res : float, optional

The resolution for the grid in Mpc (if in_h_units==False) or Mpc/h (if in_h_units==True). If no value is specified, it will default to 1/100th of the box edge length

xlim, ylim, zlim : tuples, optional

Tuples containing minimum and maximum values of the grid along the x, y, and z axes, in units of Mpc (if in_h_units==False) or Mpc/h (if in_h_units==True). If no values are specified the defaults are (0, box edge length).

norm : None, 'cell_volume', or float

Apply a normalization to the gridded values. Default is None. If 'cell_volume' is specified, each cell will be divided by its volume. If a float is given, each cell will be multiplied by the float.

Returns:
grid : simim.map.grid instance

The gridded properties
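The gridding and normalization options can be illustrated with a minimal pure-Python sketch. The function `grid_property` is hypothetical and returns a sparse dictionary of occupied cells, not a simim.map.grid instance. It assumes grid limits of (0, box edge) on every axis and, as in the description above, a default resolution of 1/100th of the box edge.

```python
from collections import defaultdict


def grid_property(positions, values, box_edge, res=None, norm=None):
    """Sum `values` into cubic cells of side `res`; returns {cell index: total}."""
    if res is None:
        res = box_edge / 100.0
    cells = defaultdict(float)
    for (x, y, z), value in zip(positions, values):
        # Skip objects that fall outside the box
        if not all(0.0 <= c < box_edge for c in (x, y, z)):
            continue
        index = (int(x // res), int(y // res), int(z // res))
        cells[index] += value
    if norm is None:
        factor = 1.0
    elif norm == "cell_volume":
        factor = 1.0 / res ** 3   # divide each cell by its volume
    else:
        factor = float(norm)      # multiply each cell by the given number
    return {index: total * factor for index, total in cells.items()}


# Three made-up halos in a box of edge 4 (arbitrary length units)
positions = [(0.5, 0.5, 0.5), (0.7, 0.5, 0.9), (3.2, 1.1, 2.5)]
values = [1.0, 2.0, 4.0]

raw = grid_property(positions, values, box_edge=4.0, res=1.0)
density = grid_property(positions, values, box_edge=4.0, res=2.0,
                        norm="cell_volume")
```

With `res=1.0` the first two halos share a cell, so their values are summed. With `norm="cell_volume"` the totals become densities.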

delete_property(*property_names)

Remove a property from the saved file on the disk

Parameters:
property_names : str

The name of the field to be deleted, can give multiple

Returns:
None

eval_stat(stat_function, kwargs, kw_remap={}, other_kws={}, use_all_inds=False, give_args_in_h_units=None)

Evaluate stat_function over the objects in a Handler instance and return the result

This can be used to evaluate any function of the properties contained in the simulation, but is generally envisioned as a way to compute ensemble statistics (means, luminosity functions, correlations, etc.)

Parameters:
stat_function : function

Any function which can be applied to data in a Handler instance

kwargs : list

List containing the arguments that must be passed to stat_function

kw_remap : dict

Dictionary mapping between function arguments (listed in kwargs) as the keys and the names of Handler properties to feed in as values. E.g. to provide handler property 'mass' to stat_function argument 'a' one would use kw_remap={'a': 'mass'}

other_kws : dict, optional

A dictionary of additional keyword arguments passed directly to the stat_function call

use_all_inds : bool, default=False

If True function will be computed using all halos, otherwise only active halos will be evaluated.

give_args_in_h_units : bool (default is determined by self.default_in_h_units)

If True, values will be fed to stat_function in units including little h. If False, little h dependence will be removed first. Defaults to whatever is set globally for the Handler instance.

Returns:
vals

The value(s) returned by stat_function

Examples

Compute the total halo mass in a simulation accessed via the variable handler:

>>> handler.eval_stat(np.sum, kwargs=['a'], kw_remap={'a':'mass'})
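How kwargs, kw_remap, and other_kws fit together can be sketched with a stand-alone function operating on a dictionary of properties. `eval_stat_sketch` and `mean_above` are hypothetical names used for illustration. The sketch mirrors the argument mapping described above and is not SimIM's implementation.

```python
def eval_stat_sketch(properties, stat_function, kwargs,
                     kw_remap=None, other_kws=None):
    """Call stat_function with properties mapped onto its arguments."""
    kw_remap = kw_remap or {}
    other_kws = other_kws or {}
    # Each argument named in kwargs is filled from the property of the same
    # name, unless kw_remap points it at a differently named property
    call_kws = {kw: properties[kw_remap.get(kw, kw)] for kw in kwargs}
    return stat_function(**call_kws, **other_kws)


def mean_above(a, threshold=0.0):
    """Mean of the entries of `a` that exceed `threshold`."""
    selected = [x for x in a if x > threshold]
    return sum(selected) / len(selected)


properties = {"mass": [1.0, 2.0, 3.0, 6.0]}

# Feed property 'mass' to argument 'a'; pass `threshold` straight through
result = eval_stat_sketch(properties, mean_above, kwargs=["a"],
                          kw_remap={"a": "mass"},
                          other_kws={"threshold": 2.5})
```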
extract_keys(set='any')

Get the fields attached to a file

Parameters:
set : {'any', 'loaded', 'saved', 'generated'}

What type of keys to return; the default, 'any', returns all keys

Returns:
keys

The fields associated with the snapshot

has_loaded(property_name)

Check whether a property has been loaded into memory

Parameters:
property_name : str

The name of the field to check

Returns:
loaded : bool

True if the property is loaded, otherwise False

has_property(property_name)

Check whether a property is present in the Handler

Parameters:
property_name : str

The name of the field to check

Returns:
exists : bool

True if the property is present, otherwise False

hist(*property_names, use_all_inds=False, logtransform=False, save=None, axkws={}, plotkws={}, in_h_units=None)

Make a histogram of a property

Parameters:
*property_names : str

The name(s) of the field(s) to use, multiple fields can be given and will be plotted on the same axes with the same settings

logtransform : bool, optional

If set to True, will take the log of the property before making the histogram

use_all_inds : bool, optional

If True, all halos will be included in the histogram; otherwise only active halos will be included.

save : str, optional

If specified, the plot will be saved to the given location

axkws : dict, optional

A dictionary of keyword args and values that will be fed to ax.set() when creating the plot axes

plotkws : dict, optional

A dictionary of keyword args and values that will be fed to plt.hist() when creating the plot data

in_h_units : bool (default is determined by self.default_in_h_units)

If True, values will be plotted in units including little h. If False, little h dependence will be removed. Defaults to whatever is set globally for the Handler instance.

Returns:
None

load_property(*property_names)

Load a property from file into memory

Parameters:
property_names : str

The name of the field to be loaded, can give multiple

Returns:
None

make_property(property, rename=None, kw_remap={}, other_kws={}, overwrite=False, use_all_inds=False, write_to_disk=False, overwrite_to_disk=False, dtype_to_disk=None, unload_to_disk=False)

Use a galprops.prop instance to evaluate a new property

Parameters:
property : galprops.prop instance

The galprops.property instance containing the property information and generating function

rename : list, optional

List of names specifying how to rename the property from the name specified in the galprops.prop instance

kw_remap : dict, optional

A dictionary remapping kwargs of the property generating function to different properties of the simulation. By default, if the function calls for kwarg 'x' it will be evaluated on simulation property 'x', but passing the dictionary {'x': 'y'} will result in the function being evaluated on simulation property 'y'.

other_kws : dict, optional

A dictionary of additional keyword arguments passed directly to the property.prop_function call

overwrite : bool, optional

Default is False. If a property name is already in use and overwrite is False, an error will be raised. Otherwise the property will be overwritten.

use_all_inds : bool

If True values will be assigned for all halos, otherwise only active halos will be evaluated, and others will be assigned nan.

write_to_disk : bool

Default is False. If True, write the evaluated property to disk (note use_all_inds must be True)

overwrite_to_disk : bool

Default is False. If a property name is already in use on disk and overwrite_to_disk is False, an error will be raised when trying to write to disk. Otherwise the property will be overwritten. Be careful if you set this to True.

dtype_to_disk : None or data type

Specify the data type to use when writing the data to disk. This is useful for converting to lower precision floats to reduce storage.

unload_to_disk : bool

Default is False. If True, will unload properties after writing them to disk.

Returns:
None
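The interplay of kw_remap, use_all_inds, and overwrite described above can be sketched on a plain dictionary of properties. Everything here is hypothetical, including the function `make_property_sketch`, the explicit list of active indices, and the ValueError raised on a name clash. It only mirrors the documented behaviour of assigning nan to halos that are not evaluated.

```python
import math


def make_property_sketch(properties, active, prop_function, kwargs, name,
                         kw_remap=None, use_all_inds=False, overwrite=False):
    """Evaluate prop_function on selected halos and store the result."""
    if name in properties and not overwrite:
        raise ValueError(f"property {name!r} already exists")
    kw_remap = kw_remap or {}
    n_halos = len(next(iter(properties.values())))
    inds = list(range(n_halos)) if use_all_inds else list(active)
    # Gather the inputs for the selected halos only
    call_kws = {kw: [properties[kw_remap.get(kw, kw)][i] for i in inds]
                for kw in kwargs}
    evaluated = prop_function(**call_kws)
    # Halos that were not evaluated are assigned nan
    new_values = [math.nan] * n_halos
    for i, value in zip(inds, evaluated):
        new_values[i] = value
    properties[name] = new_values


def double_mass(mass):
    return [2.0 * m for m in mass]


properties = {"mvir": [1.0, 2.0, 3.0]}
make_property_sketch(properties, active=[0, 2], prop_function=double_mass,
                     kwargs=["mass"], name="mass2",
                     kw_remap={"mass": "mvir"})
```

Only halos 0 and 2 are active, so the middle entry of `properties["mass2"]` is nan.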
plot(xname, *ynames, use_all_inds=False, save=None, axkws={}, plotkws={}, in_h_units=None)

Make a scatter plot of two properties

Parameters:
xname : str

The name of the field to use as the x-value

*ynames : str

The name(s) of the field(s) to use as the y-value. Multiple fields can be given and will be plotted on the same axes against a single x value

use_all_inds : bool or 'compare', optional

If True, all halos will be plotted; otherwise only active halos will be plotted. If 'compare', both sets of indices will be plotted to allow easy comparison.

save : str, optional

If specified, the plot will be saved to the given location

axkws : dict, optional

A dictionary of keyword args and values that will be fed to ax.set() when creating the plot axes

plotkws : dict, optional

A dictionary of keyword args and values that will be fed to plt.plot() when creating the plot data

in_h_units : bool (default is determined by self.default_in_h_units)

If True, values will be plotted in units including little h. If False, little h dependence will be removed. Defaults to whatever is set globally for the Handler instance.

Returns:
None

return_property(property_name, use_all_inds=False, in_h_units=None)

Load a property from file and return

Parameters:
property_name : str

The name of the field to be loaded

use_all_inds : bool

If True values will be returned for all halos, otherwise only active halos will be returned. Default is False, but by default all halos are active

in_h_units : bool (default is determined by self.default_in_h_units)

If True, values will be returned in units including little h. If False, little h dependence will be removed. Defaults to whatever is set globally for the Handler instance.

Returns:
property_values : array

Values of the requested property

set_property_range(property_name=None, pmin=-inf, pmax=inf, reset=True, in_h_units=None)

Set a range in a given property to be the active indices. If no arguments are passed, this resets the active indices to all halos

Parameters:
property_name : str

The name of the field to use

pmin : float

The minimum value of the property to bracket the selected range.

pmax : float

The maximum value of the property to bracket the selected range.

reset : bool, optional

If True, the active indices will be those selected between pmin and pmax. If False, the active indices will be those that satisfy pmin<=p<=pmax and which were previously in the active indices (i.e. this allows selection over multiple properties).

in_h_units : bool (default is determined by self.default_in_h_units)

If True, pmin and pmax will be taken to have units including little h, otherwise, they will be assumed to have units with no h dependence (and have the correct dependency applied before setting cuts for parameters where the stored catalog values are in h units). Defaults to whatever is set globally for the Handler instance.

Returns:
None
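The effect of reset on the active indices can be sketched with a toy selection class. `ActiveSelection` is hypothetical and ignores unit handling. It only mirrors the behaviour described above: a call with no arguments restores all halos, reset=True starts a fresh selection, and reset=False narrows the existing one.

```python
import math


class ActiveSelection:
    """Toy model of active-index bookkeeping on a dict of properties."""

    def __init__(self, properties):
        self.properties = properties
        self.n_halos = len(next(iter(properties.values())))
        self.active = list(range(self.n_halos))

    def set_property_range(self, property_name=None, pmin=-math.inf,
                           pmax=math.inf, reset=True):
        if property_name is None:
            # No property given: every halo becomes active again
            self.active = list(range(self.n_halos))
            return
        # reset=True considers all halos; reset=False only the active ones
        candidates = range(self.n_halos) if reset else self.active
        values = self.properties[property_name]
        self.active = [i for i in candidates if pmin <= values[i] <= pmax]


sel = ActiveSelection({
    "mass": [1e10, 1e11, 1e12, 1e13],
    "sfr": [0.1, 5.0, 0.5, 20.0],
})

sel.set_property_range("mass", pmin=1e11)
after_mass = list(sel.active)
sel.set_property_range("sfr", pmin=1.0, reset=False)  # combine both cuts
after_both = list(sel.active)
sel.set_property_range("sfr", pmax=1.0)               # fresh selection
after_reset = list(sel.active)
sel.set_property_range()                              # back to all halos
after_clear = list(sel.active)
```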
unload_property(*property_names)

Remove a property from memory (does not erase from file on disk)

Parameters:
property_names : str

The name of the field to be unloaded, can give multiple

Returns:
None

write_property(*property_names, overwrite=False, dtype=None)

Write a property from object memory onto the saved file on the disk

Parameters:
property_names : str

The name of the field to be written, can specify multiple

overwrite : bool, optional

Default is False. If a property name is already in use and overwrite is False, an error will be raised. Otherwise the property will be overwritten. Be careful if you set this to True.

dtype : None or data type

Specify the data type to use when writing the data to disk. This is useful for converting to lower precision floats to reduce storage.

Returns:
None
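The storage saving from writing lower precision floats can be illustrated with the standard library's array module. This is a stand-in for whatever array type SimIM actually writes, and the values below are made up.

```python
from array import array

values = [1.0e12, 3.5e10, 7.25e11]

as_double = array("d", values)  # 8-byte floats
as_single = array("f", values)  # 4-byte floats

bytes_double = as_double.itemsize * len(as_double)
bytes_single = as_single.itemsize * len(as_single)

# The price of halving the storage is a small relative rounding error
max_rel_error = max(abs(s - d) / d for s, d in zip(as_single, as_double))
```

Single precision halves the storage at the cost of a relative error of roughly one part in ten million, which is often acceptable for catalog quantities.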

Downloading and Formatting Simulations

These are tools for downloading and formatting simulation data from various sources.

class simim.siminterface.IllustrisCatalogs(sim, api_key, path='auto', snaps='all', updatepath=True)

Class to download and format Illustris or TNG group catalogs

Methods

clean_raw()

Remove unformatted data for a simulation

download([redownload])

Download Illustris/TNG subhalo (group) catalogs

download_meta([redownload])

Download and generate metadata for the set of snapshots specified when initializing the class

format([remake, overwrite, basic, ...])

Convert the unformatted simulation data into the uniform format used by SimIM for all halo catalogs.

Initialize an interface with the Illustris/TNG data

Parameters:
sim : string

Name of the Illustris/TNG simulation you want to download/format. Options are 'Illustris-1', 'Illustris-2', 'Illustris-3', 'Illustris-1-Dark', 'Illustris-2-Dark', 'Illustris-3-Dark', 'TNG300-1', 'TNG300-2', 'TNG300-3', 'TNG300-1-Dark', 'TNG300-2-Dark', 'TNG300-3-Dark', 'TNG100-1', 'TNG100-2', 'TNG100-3', 'TNG100-1-Dark', 'TNG100-2-Dark', 'TNG100-3-Dark'

api_key : string

An API key for accessing Illustris data; see https://www.tng-project.org/users/register/

path : optional, 'auto' or string

The path for saving/accessing the simulation data. If 'auto', this will be looked up or created in the default SimIM filepaths. Defaults to 'auto' and should probably only be changed if you are making additional copies of the data for some reason.

snaps : optional, 'all' or list of ints

The snapshots to use when downloading/formatting the simulation. Defaults to 'all' which will use all known snapshots.

updatepath : optional, bool

Defaults to True. If True, the path parameter will be saved as the default path to this simulation in future uses.

clean_raw()

Remove unformatted data for a simulation

This method permanently deletes the unformatted data for a simulation. This should only be done if the data is no longer needed - e.g. after the SimIM formatted file has been created and validated.

format(remake=False, overwrite=False, basic=False, realtime_clean_raw=False, realtime_clean_raw_check=True)

Convert the unformatted simulation data into the uniform format used by SimIM for all halo catalogs.

This method makes a data.hdf5 file containing the formatted data for the snapshots pointed to in the SimCatalogs instance. If a file already exists, it will be added to, not overwritten (i.e. snapshots already present in data.hdf5 won’t be reformatted/replaced, but those not present will be added). By setting remake=True the whole file can be overwritten. By setting overwrite=True snaps present in both the data.hdf5 file and the snap_catalogs instance will be reformatted and rewritten. Note: if a data.hdf5 file exists but contains no snapshots, the behavior will be the same as if remake=True.

Set realtime_clean_raw=True to delete the raw data files as they are processed and save space on disk - be careful as this will permanently delete files and may require re-downloading if there’s a problem formatting.

Set basic=True to save only a limited set of halo properties (position, velocity, mass) instead of everything in the raw catalog. This is ignored if appending to an existing file, and the properties in that file are used instead.

Parameters:
remake : bool, default=False

If True, a new file will be created (overwriting any data.hdf5 file that previously existed) and the snaps listed in the SimCatalogs instance will be written to it. If False, only new data will be added.

overwrite : bool, default=False

If True, snaps already present in the data.hdf5 file but also listed in the SimCatalogs instance will be written over with newly formatted versions. If False, the versions already in data.hdf5 will be left untouched.

basic : bool, default=False

If True, only halo positions and masses will be formatted and saved in data.hdf5; if False, all fields will be saved. If a data.hdf5 file already exists, this will be ignored and the fields in the existing data file will be matched (unless remake=True).

realtime_clean_raw : bool, default=False

If set to True, the unformatted simulation data will be deleted as it is formatted and written to data.hdf5. This saves disk space but will require redownloading data if there are problems with the formatting.

realtime_clean_raw_check : bool, default=True

Confirms with the user before starting to delete unformatted snaps when realtime_clean_raw is set to True
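The interaction between remake and overwrite can be summarized as a small decision function. This is a sketch of the rules as described above. `plan_format` is a hypothetical helper that only works out which snapshots would be written and what the file would contain afterwards, without touching any data.

```python
def plan_format(requested, existing, remake=False, overwrite=False):
    """Return (snapshots to write, snapshots in the file afterwards)."""
    requested, existing = set(requested), set(existing)
    if remake or not existing:
        # A new file is started: only the requested snapshots survive
        return sorted(requested), sorted(requested)
    # Otherwise the file is appended to; existing snapshots are only
    # rewritten when overwrite is set
    to_write = requested if overwrite else requested - existing
    return sorted(to_write), sorted(existing | requested)


default_plan = plan_format([10, 11, 12], existing=[5, 10])
overwrite_plan = plan_format([10, 11, 12], existing=[5, 10], overwrite=True)
remake_plan = plan_format([10, 11, 12], existing=[5, 10], remake=True)
empty_plan = plan_format([10, 11, 12], existing=[])
```

In the default case snapshot 10 is left untouched and snapshot 5 stays in the file. With remake=True, or with an empty file, the file ends up holding only the requested snapshots.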

download(redownload=False)

Download Illustris/TNG subhalo (group) catalogs

download_meta(redownload=False)

Download and generate metadata for the set of snapshots specified when initializing the class

Note: the metadata saved depends on the list of snapshots. If you plan to use many snapshots for some applications but have only initialized your IllustrisCatalogs instance with a few, it is probably best to do something like the following:

>>> x = IllustrisCatalogs(...,snaps='all')
>>> x.download_meta()
>>> x = IllustrisCatalogs(...,snaps=[10,11,12])
>>> x.download()
class simim.siminterface.UniversemachineCatalogs(sim, path='auto', snaps='all', updatepath=True)

Initialize an interface with the UniverseMachine Bolshoi/MD catalogs

Parameters:
sim : string

Name of the simulation/UM catalog you want to download/format. Options are ‘UniverseMachine-BolshoiPlanck’, ‘UniverseMachine-SMDPL’, and ‘UniverseMachine-MDPL2’.

path : optional, ‘auto’ or string

The path for saving/accessing the simulation data. If ‘auto’, this will be looked up or created in the default SimIM filepaths. Defaults to ‘auto’ and should probably only be changed if you are making additional copies of the data for some reason.

snaps : optional, ‘all’ or list of ints

The snapshots to use when downloading/formatting the simulation. Defaults to ‘all’, which will use all known snapshots.

updatepath : optional, bool

Defaults to True. If True, the path parameter will be saved as the default path to this simulation in future uses.

Methods

clean_raw()

Remove unformatted data for a simulation

download([redownload])

Download UniverseMachine catalogs

download_meta([redownload])

Download and generate metadata for the set of snapshots specified when initializing the class

format([remake, overwrite, basic, ...])

Convert the unformatted simulation data into the uniform format used by SimIM for all halo catalogs.

clean_raw()

Remove unformatted data for a simulation

This method permanently deletes the unformatted data for a simulation. This should only be done if the data is no longer needed - e.g. after the SimIM formatted file has been created and validated.

format(remake=False, overwrite=False, basic=False, realtime_clean_raw=False, realtime_clean_raw_check=True)

Convert the unformatted simulation data into the uniform format used by SimIM for all halo catalogs.

This method makes a data.hdf5 file containing the formatted data for the snapshots pointed to in the SimCatalogs instance. If a file already exists, it will be added to, not overwritten (i.e. snapshots already present in data.hdf5 won’t be reformatted/replaced, but those not present will be added). By setting remake=True the whole file can be overwritten. By setting overwrite=True snaps present in both the data.hdf5 file and the snap_catalogs instance will be reformatted and rewritten. Note: if a data.hdf5 file exists but contains no snapshots, the behavior will be the same as if remake=True.

Set realtime_clean_raw=True to delete the raw data files as they are processed and save space on disk - be careful as this will permanently delete files and may require re-downloading if there’s a problem formatting.

Set basic=True to save only a limited set of halo properties (position, velocity, mass) instead of everything in the raw catalog. This is ignored if appending to an existing file, and the properties in that file are used instead.

Parameters:
remake : bool, default=False

If True, a new file will be created (overwriting any data.hdf5 file that previously existed) and the snaps listed in the SimCatalogs instance will be written to it. If False, only new data will be added.

overwrite : bool, default=False

If True, snaps already present in the data.hdf5 file but also listed in the SimCatalogs instance will be written over with newly formatted versions. If False, the versions already in data.hdf5 will be left untouched.

basic : bool, default=False

If True, only halo positions and masses will be formatted and saved in data.hdf5; if False, all fields will be saved. If a data.hdf5 file already exists, this is ignored and the fields in the existing data file are matched (unless remake=True).

realtime_clean_raw : bool, default=False

If set to True, the unformatted simulation data will be deleted as it is formatted and written to data.hdf5. This saves disk space but will require redownloading data if there are problems with the formatting.

realtime_clean_raw_check : bool, default=True

If True, confirms with the user before starting to delete unformatted snaps when realtime_clean_raw is set to True.

download_meta(redownload=False)

Download and generate metadata for the set of snapshots specified when initializing the class

Note: the metadata saved depends on the list of snapshots. If you plan to use many snapshots for some applications but have initialized your UniversemachineCatalogs instance with only a few, it is probably best to do something like the following:

>>> x = UniversemachineCatalogs(...,snaps='all')
>>> x.download_meta()
>>> x = UniversemachineCatalogs(...,snaps=[10,11,12])
>>> x.download()

download(redownload=False)

Download UniverseMachine catalogs

Under the Hood

These features are useful for building installers for new simulations, which should work by extending the simim.siminterface._rawsiminterface.SimCatalogs class.

class simim.siminterface._rawsiminterface.SimCatalogs(sim, path='auto', snaps='all', updatepath=True)

Generic class for interacting with simulation halo catalogs and converting them into SimIM’s preferred format.

This class isn’t used directly, but should be extended for any simulation that is to be integrated into SimIM. The modules illustris.py and universemachine.py show how to do this for two simulations.

To construct a new SimCatalogs subclass, a few steps are necessary.

1. Define the __init__ method: this method should first create an array self.allsnaps, which contains the number-index of every snap in the original simulation. __init__ should then call SimCatalogs.__init__ with the relevant arguments (see Parameters for the SimCatalogs.__init__ method). Then it should construct three dictionaries that define the field names in the unformatted halo catalog files and how these names map to fields in the SimIM data structure.

The mapping between unformatted and formatted fields is done with three dictionaries: self.basic_fields for fields that MUST be included in the formatted SimIM catalog (these are pos_x, pos_y, pos_z, and mass, for subhalo positional coordinates and subhalo mass), self.dm_fields for fields describing additional dark matter properties, and self.matter_fields for fields describing baryonic properties. The latter two dictionaries can be left empty if no additional properties are in the unformatted catalogs and/or if none of these other properties will ever be propagated into the SimIM formatted data.

The basic fields dictionary should be structured as follows:

    self.basic_fields = {'[unformatted field name]': [('formatted field name',
                                                       'formatted field dtype',
                                                       'formatted field units',
                                                       'formatted field dependence on hubble constant')]}

Each dictionary value is a list so that a field in the unformatted catalog can consist of a tuple of values. For example, Illustris catalogs store halo positions as a tuple ‘SubhaloPos’. To convert this to SimIM format (pos_x, pos_y, pos_z) the following dictionary entry is used:

    {'SubhaloPos': [('pos_x', 'f', 'Mpc/h', -1),
                    ('pos_y', 'f', 'Mpc/h', -1),
                    ('pos_z', 'f', 'Mpc/h', -1)]}
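As a rough illustration of how such an entry could be consumed, the sketch below unpacks a tuple-valued raw field into separately named columns. The helper split_fields and the toy raw_catalog are hypothetical and not part of SimIM; plain Python lists stand in for whatever array type the formatter actually uses.

```python
# Mapping in the style of self.basic_fields: one raw field maps to a list of
# (name, dtype, units, h-dependence) tuples, one per component.
basic_fields = {
    "SubhaloPos": [
        ("pos_x", "f", "Mpc/h", -1),
        ("pos_y", "f", "Mpc/h", -1),
        ("pos_z", "f", "Mpc/h", -1),
    ],
}


def split_fields(raw, field_map):
    """Hypothetical helper: unpack raw catalog columns into named fields."""
    formatted = {}
    for raw_name, specs in field_map.items():
        rows = raw[raw_name]
        if len(specs) == 1:
            # Scalar field: one value per halo, copied as-is.
            formatted[specs[0][0]] = list(rows)
        else:
            # Tuple field: component i of every halo goes to the i-th name.
            for i, (name, _dtype, _units, _h_dep) in enumerate(specs):
                formatted[name] = [row[i] for row in rows]
    return formatted


# Toy catalog with two halos (made-up values).
raw_catalog = {"SubhaloPos": [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)]}
formatted = split_fields(raw_catalog, basic_fields)
```

Here formatted ends up with one list per formatted field name, each holding one value per halo.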

The ‘formatted field dtype’ should be whatever data format you want the final data to be written as, and the ‘formatted field units’ and ‘formatted field dependence on hubble constant’ should be the units and hubble constant dependence of the data in its final format, after any transformations have been applied (see below). Note that SimIM generally assumes data are saved in ‘little h’ units, so if they are saved in units with no little h, the h dependence should be set to 0.
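To make the h-dependence entry concrete, the following sketch converts stored values to h-free units. It assumes the entry is the power of h carried by the units (so -1 for Mpc/h), which matches the example above but is an interpretation rather than something taken from the SimIM source; remove_little_h is a hypothetical name.

```python
def remove_little_h(values, h_dependence, h):
    """Hypothetical helper: convert values stored in 'little h' units.

    Assumes a quantity stored in units of X * h**h_dependence, so the
    h-free value is stored_value * h**h_dependence.
    """
    return [v * h ** h_dependence for v in values]


# 7 Mpc/h with h = 0.7 corresponds to 7 / 0.7 = 10 Mpc.
positions_mpc = remove_little_h([7.0], -1, 0.7)

# A field with h dependence 0 is returned unchanged.
unchanged = remove_little_h([3.0], 0, 0.7)
```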

The dm_fields and matter_fields dictionaries are constructed similarly.

In addition to the dictionaries specifying keys, a dictionary specifying transformations to apply when formatting the data can be provided. This should be called self.transform_keys and should have keys that match the names of unformatted keys in the simulation catalog and values that specify a function to apply. For example, SimIM generally assumes distance units of Mpc/h, while Illustris uses kpc/h for some fields. A conversion might look like

    {'SubhaloPos': lambda x: x/1000}
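A minimal sketch of how a transform_keys dictionary might be applied to raw data is given below. The helper apply_transforms is hypothetical; SimIM presumably passes whole arrays to the function, whereas this dependency-free version applies it value by value.

```python
transform_keys = {"SubhaloPos": lambda x: x / 1000}  # kpc/h -> Mpc/h


def _apply(func, value):
    """Apply func to a scalar, or to each component of a tuple/list."""
    if isinstance(value, (tuple, list)):
        return tuple(func(c) for c in value)
    return func(value)


def apply_transforms(raw, transforms):
    """Hypothetical helper: transform fields that have an entry in transforms."""
    out = {}
    for name, rows in raw.items():
        func = transforms.get(name)
        if func is None:
            out[name] = list(rows)  # no transform registered for this field
        else:
            out[name] = [_apply(func, row) for row in rows]
    return out


# Toy raw catalog with one halo (made-up values).
raw = {"SubhaloPos": [(1000.0, 2500.0, 0.0)], "SubhaloMass": [3.0]}
converted = apply_transforms(raw, transform_keys)
```

Fields without an entry in transform_keys pass through untouched, which is the behavior one would expect for catalogs that already use SimIM's units.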

Finally, the init function should have a check to verify which snapshots have already been downloaded, and which still need to be:

    # Check whether snapshots have been downloaded
    not_downloaded = []
    for i in self.snaps:
        file_path = [path where snapshot would be saved]
        if not os.path.exists(file_path):
            not_downloaded.append(i)
    if len(not_downloaded) > 0:
        warnings.warn("No data exists for snapshots {} - run .download".format(not_downloaded))

2. The extended class also needs methods to download the halo catalog (self.download) and simulation metadata (self.download_meta). These functions should save the data in the directory listed in self.path. The simulation halo catalogs should be placed in a subdirectory called ‘raw’, and the metadata should be placed in self.meta_path and self.snap_meta_path. Look at existing code for examples of how to do this.

3. The extended class needs a self._loader that takes as arguments the path to the data, a snapshot number, and a list of fields. It returns a dictionary mapping each property name to the values of that property for every halo, along with an integer specifying the number of halos found.
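The loader contract in step 3 can be sketched with a toy example. Everything specific here is an assumption made for illustration: the JSON file format, the snap_NNN.json naming, the name toy_loader, and the order of the two return values. A real loader would read whatever format the simulation's raw catalogs use.

```python
import json
import os
import tempfile


def toy_loader(path, snapshot, fields):
    """Hypothetical loader: read one snapshot, return (properties, n_halos)."""
    file_path = os.path.join(path, "raw", "snap_{:03d}.json".format(snapshot))
    with open(file_path) as f:
        catalog = json.load(f)
    # Keep only the requested fields, keyed by property name.
    properties = {name: catalog[name] for name in fields}
    # Every property holds one entry per halo, so any column gives the count.
    n_halos = len(next(iter(properties.values()))) if properties else 0
    return properties, n_halos


# Write a tiny fake catalog to a temporary directory and load it back.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "raw"))
    fake = {"mass": [1e10, 2e11], "pos": [[0, 0, 0], [1, 2, 3]], "id": [7, 8]}
    with open(os.path.join(tmp, "raw", "snap_010.json"), "w") as f:
        json.dump(fake, f)
    props, n_halos = toy_loader(tmp, 10, ["mass", "pos"])
```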

Methods

clean_raw()

Remove unformatted data for a simulation

format([remake, overwrite, basic, ...])

Convert the unformatted simulation data into the uniform format used by SimIM for all halo catalogs.

Initialize the information needed to download and format a simulation.

Parameters:
sim : string

The string naming the simulation. This string must be in the _acceptedsims list from the _sims.py submodule.

path : optional, ‘auto’ or string

The path for saving/accessing the simulation data. If ‘auto’, this will be looked up or created in the default SimIM filepaths. Defaults to ‘auto’ and should probably only be changed if you are making additional copies of the data for some reason.

snaps : optional, ‘all’ or list of ints

The snapshots to use when downloading/formatting the simulation. Defaults to ‘all’, which will use all known snapshots.

updatepath : optional, bool

Defaults to True. If True, the path parameter will be saved as the default path to this simulation in future uses.

clean_raw()

Remove unformatted data for a simulation

This method permanently deletes the unformatted data for a simulation. This should only be done if the data is no longer needed - e.g. after the SimIM formatted file has been created and validated.

format(remake=False, overwrite=False, basic=False, realtime_clean_raw=False, realtime_clean_raw_check=True)

Convert the unformatted simulation data into the uniform format used by SimIM for all halo catalogs.

This method makes a data.hdf5 file containing the formatted data for the snapshots pointed to in the SimCatalogs instance. If a file already exists, it will be added to, not overwritten (i.e. snapshots already present in data.hdf5 won’t be reformatted/replaced, but those not present will be added). By setting remake=True the whole file can be overwritten. By setting overwrite=True snaps present in both the data.hdf5 file and the snap_catalogs instance will be reformatted and rewritten. Note: if a data.hdf5 file exists but contains no snapshots, the behavior will be the same as if remake=True.

Set realtime_clean_raw=True to delete the raw data files as they are processed and save space on disk - be careful as this will permanently delete files and may require re-downloading if there’s a problem formatting.

Set basic=True to save only a limited set of halo properties (position, velocity, mass) instead of everything in the raw catalog. This is ignored if appending to an existing file, and the properties in that file are used instead.

Parameters:
remake : bool, default=False

If True, a new file will be created (overwriting any data.hdf5 file that previously existed) and the snaps listed in the SimCatalogs instance will be written to it. If False, only new data will be added.

overwrite : bool, default=False

If True, snaps already present in the data.hdf5 file but also listed in the SimCatalogs instance will be written over with newly formatted versions. If False, the versions already in data.hdf5 will be left untouched.

basic : bool, default=False

If True, only halo positions and masses will be formatted and saved in data.hdf5; if False, all fields will be saved. If a data.hdf5 file already exists, this is ignored and the fields in the existing data file are matched (unless remake=True).

realtime_clean_raw : bool, default=False

If set to True, the unformatted simulation data will be deleted as it is formatted and written to data.hdf5. This saves disk space but will require redownloading data if there are problems with the formatting.

realtime_clean_raw_check : bool, default=True

If True, confirms with the user before starting to delete unformatted snaps when realtime_clean_raw is set to True.
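The way remake and overwrite interact can be summarized in a small sketch. The function snaps_to_write is hypothetical and not part of SimIM; it only mirrors the behavior documented for format when deciding which snapshots get written, and it leaves out the basic and realtime_clean_raw options.

```python
def snaps_to_write(requested, existing, remake=False, overwrite=False):
    """Return (start_new_file, sorted list of snapshots to format).

    requested: snapshots listed in the catalogs instance
    existing:  snapshots already present in data.hdf5 (empty if none)
    """
    requested = set(requested)
    existing = set(existing)

    # A file that contains no snapshots behaves as if remake=True.
    if remake or not existing:
        return True, sorted(requested)

    if overwrite:
        # Reformat everything requested, including snaps already on disk.
        return False, sorted(requested)

    # Default: only add snapshots that are not yet in the file.
    return False, sorted(requested - existing)


# Snapshots 10 and 11 are already formatted; only 12 is added by default.
default_plan = snaps_to_write([10, 11, 12], [10, 11])
overwrite_plan = snaps_to_write([10, 11, 12], [10, 11], overwrite=True)
```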

class simim.siminterface._rawsiminterface.Snapshot(index, redshift, metadata)

Class containing information for individual snapshots

This class stores information and does some basic calculations about individual snapshots. It is initialized when loading and formatting simulation data. Users probably don’t need it for anything.

Methods

dif_higherz_snap(other)

Determine redshift, time, distance where snap ends and an earlier one begins.

dif_lowerz_snap(other)

Determine redshift, time, distance where snap ends and a later one begins.

dif_snap(other)

Determine the midpoint between two snaps

Initialization parameters are generally hard coded for the different sims, or otherwise extracted from simulation metadata.

Parameters:
index : int

The numerical index of the snapshot

redshift : float

The redshift of the snapshot

metadata : dict

Dictionary containing the snapshot metadata. This must contain (at a minimum) ‘cosmo_h’, ‘cosmo_omega_matter’, and ‘cosmo_omega_baryon’ (for defining the cosmology used by the sim).

dif_snap(other)

Determine the midpoint between two snaps

Method to determine the redshift and time of the midpoint between this snapshot and another snapshot. The midpoint is defined to be the cosmological age half-way between the ages of the two snapshots.
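The midpoint definition can be illustrated with a self-contained calculation. The sketch below assumes a flat LambdaCDM cosmology with radiation neglected, for which the age-redshift relation has a closed form; SimIM's own implementation may use a different cosmology calculator, and the function names here are hypothetical. Only the metadata keys ‘cosmo_h’ and ‘cosmo_omega_matter’ are taken from the documentation above.

```python
import math


def age_flat_lcdm(z, h, omega_m):
    """Cosmic age in Gyr at redshift z (flat LambdaCDM, no radiation)."""
    omega_l = 1.0 - omega_m
    hubble_time = 977.8 / (100.0 * h)  # 1/H0 in Gyr
    x = math.sqrt(omega_l / omega_m) * (1.0 + z) ** -1.5
    return 2.0 / (3.0 * math.sqrt(omega_l)) * math.asinh(x) * hubble_time


def z_at_age(age, h, omega_m):
    """Invert age_flat_lcdm analytically: redshift at a given age in Gyr."""
    omega_l = 1.0 - omega_m
    hubble_time = 977.8 / (100.0 * h)
    s = math.sinh(1.5 * math.sqrt(omega_l) * age / hubble_time)
    return (math.sqrt(omega_l / omega_m) / s) ** (2.0 / 3.0) - 1.0


def snap_midpoint(z_a, z_b, metadata):
    """Redshift and age half-way (in cosmic age) between two snapshots."""
    h = metadata["cosmo_h"]
    omega_m = metadata["cosmo_omega_matter"]
    t_mid = 0.5 * (age_flat_lcdm(z_a, h, omega_m) + age_flat_lcdm(z_b, h, omega_m))
    return z_at_age(t_mid, h, omega_m), t_mid


# Illustrative cosmology values, not taken from any particular simulation.
meta = {"cosmo_h": 0.7, "cosmo_omega_matter": 0.3}
z_mid, t_mid = snap_midpoint(1.0, 2.0, meta)
```

Because the midpoint is taken in age rather than in redshift, z_mid is generally not the arithmetic mean of the two snapshot redshifts.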

dif_lowerz_snap(other)

Determine redshift, time, distance where snap ends and a later one begins.

If other is 0 it will treat this as the last (in time) snap and assume that it extends to z=0.

dif_higherz_snap(other)

Determine redshift, time, distance where snap ends and an earlier one begins.

If other is ‘max’ it will treat this as the first (in time) snap and assume that it extends to the actual redshift of the snap.