The siminterface Module
Interacting with Simulations Using SimIM
Once simulations are formatted, SimIM interfaces with all simulation data through
“Handler” classes. The SimHandler class provides an interface to all data
from a simulation, while the SnapHandler class provides an interface with
data from a single simulation snapshot.
- class simim.siminterface.SimHandler(sim, init_snaps=False, in_h_units=False)
Class to handle I/O for subhalo/galaxy catalogs in SimIM format
This class handles basic operations for accessing and analyzing simulation data. It is a wrapper around Handlers for individual snapshots, and in many cases performs operations iteratively over all snapshots in a particular simulation. It is also a convenient wrapper for accessing specific snapshots of a given simulation.
The general philosophy of Handlers is to not load actual properties of a halo into memory until they are requested, and to remove them from memory when they are no longer in use (or at least make it convenient to do so).
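The load-on-request behaviour described above can be illustrated with a small toy class. This is only a sketch of the idea and not SimIM's implementation: `LazyPropertyStore` is a hypothetical name, a plain dict stands in for the on-disk file, and the method names merely echo the SnapHandler methods documented below.

```python
class LazyPropertyStore:
    """Toy stand-in for a Handler: values stay 'on disk' until requested."""

    def __init__(self, disk):
        self._disk = disk    # hypothetical on-disk storage (a plain dict here)
        self._loaded = {}    # properties currently held in memory

    def load_property(self, *names):
        # Copy each requested property from "disk" into memory once
        for name in names:
            if name not in self._loaded:
                self._loaded[name] = list(self._disk[name])

    def unload_property(self, *names):
        # Drop the in-memory copy; the "disk" copy is untouched
        for name in names:
            self._loaded.pop(name, None)

    def has_loaded(self, name):
        return name in self._loaded

    def return_property(self, name):
        # Accessing a property triggers the load if it has not happened yet
        self.load_property(name)
        return self._loaded[name]


store = LazyPropertyStore({"mass": [1e10, 5e11, 2e12]})
assert not store.has_loaded("mass")     # nothing has been read yet
values = store.return_property("mass")  # first access triggers the load
store.unload_property("mass")           # free the memory when done
```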
Methods
delete_property(*property_names)Remove a property from the saved file on the disk for all simulation snapshots
extract_snap_keys()Get the fields associated with halos in the simulation
extract_snap_meta(snap)Get the meta-data for a snapshot
get_mass_index(mass, snap[, in_h_units])Find the indices above a specified mass
get_snap(snap)Return a SnapHandler instance for a specified snapshot
get_snap_from_z(z)Return a SnapHandler instance for the snapshot closest to a requested redshift
initialize_all_snaps([remake])Initialize SnapHandlers for each snapshot
make_property(property[, rename, kw_remap, ...])Use a galprops.prop instance to evaluate a new property over all snapshots
number_volumes(volume[, in_h_units])Compute the number of times a specified volume can fit in the simulation box
set_in_h_units(in_h_units)Globally set whether units are interpreted to be in little h units
set_property_range([property_name, pmin, ...])Restrict property range for all snapshots
snap_stat(stat_function, kwargs[, kw_remap, ...])Evaluate stat_function over every snapshot and return results
z_to_snap(z)Determine the snapshot corresponding to a particular redshift
Initialize Handler for a specified simulation
Provides a generic interface for interacting with data from any simulation that has been converted to SimIM format and is accessible to memory. Note that simulations must first be downloaded and formatted (see e.g. simim.siminterface.illustris or simim.siminterface.universemachine for code to accomplish this).
- Parameters:
- simstring
Name of the simulation to load.
- init_snapsbool, default=False
Setting this as True will create persistent Handler instances for every snapshot, rather than doing so when data from a given handler is called for. This is generally not necessary, but is used when creating properties for all snapshots but NOT writing them to disk.
- in_h_unitsbool
If True, values will be returned, plotted, etc. in units including little h. If False, little h dependence will be removed. This can be overridden in most method calls.
- initialize_all_snaps(remake=False)
Initialize SnapHandlers for each snapshot
- Parameters:
- remakebool, default=False
Determines whether snapshots should be re-initialized if this method is called twice
- set_in_h_units(in_h_units)
Globally set whether units are interpreted to be in little h units
Changes the default way units are processed
- Parameters:
- in_h_unitsbool
If True, values will, by default, be returned in units including little h. If False, little h dependence will be removed.
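The interplay between the global default and a per-call override can be sketched as follows. This is an illustrative toy, not SimIM code: `UnitDefaults` is a hypothetical class, and it assumes a mass stored in Msun/h, so removing the little h dependence means dividing by h.

```python
class UnitDefaults:
    """Toy model of a global in_h_units default with per-call override."""

    def __init__(self, h, in_h_units=False):
        self.h = h
        self.default_in_h_units = in_h_units

    def set_in_h_units(self, in_h_units):
        # Change the default used whenever a call passes in_h_units=None
        self.default_in_h_units = in_h_units

    def mass(self, mass_msun_per_h, in_h_units=None):
        # None means "use the global default"; True/False override it
        if in_h_units is None:
            in_h_units = self.default_in_h_units
        if in_h_units:
            return mass_msun_per_h       # keep units of Msun/h
        return mass_msun_per_h / self.h  # convert to Msun


units = UnitDefaults(h=0.5)
no_h = units.mass(1e12)                     # default: h dependence removed
with_h = units.mass(1e12, in_h_units=True)  # per-call override
units.set_in_h_units(True)
new_default = units.mass(1e12)              # default now keeps little h
```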
- number_volumes(volume, in_h_units=None)
Compute the number of times a specified volume can fit in the simulation box
- Parameters:
- volumefloat
The volume to check in units of Mpc^3
- in_h_unitsbool (default is determined by self.default_in_h_units)
If True the value of volume will be assumed to have units of (Mpc/h)^3
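As a rough illustration of the arithmetic, the sketch below assumes the result is simply the ratio of the box volume to the requested volume, with the box edge stored in Mpc/h. The function name and arguments are hypothetical and do not reproduce the SimHandler method.

```python
def volume_ratio(volume, box_edge_mpc_h, h, in_h_units=False):
    """Number of times `volume` fits in a cubic box (simple volume ratio)."""
    if in_h_units:
        box_edge = box_edge_mpc_h      # compare in (Mpc/h)^3
    else:
        box_edge = box_edge_mpc_h / h  # convert the edge to Mpc first
    return box_edge ** 3 / volume


# A 100 Mpc/h box with h = 0.5 has an edge of 200 Mpc
n_no_h = volume_ratio(1e6, box_edge_mpc_h=100.0, h=0.5)
n_with_h = volume_ratio(1e5, box_edge_mpc_h=100.0, h=0.5, in_h_units=True)
```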
- extract_snap_meta(snap)
Get the meta-data for a snapshot
- Parameters:
- snapint
Number of snapshot to be extracted
- Returns:
- snap_meta
The meta data for the requested snapshot
- z_to_snap(z)
Determine the snapshot corresponding to a particular redshift
- Parameters:
- zfloat
Redshift to search for
- Returns:
- snap_ind
The index number of the snapshot matching the requested redshift
- extract_snap_keys()
Get the fields associated with halos in the simulation
- Parameters:
- none
- Returns:
- keys
The fields of each snapshot
- get_mass_index(mass, snap, in_h_units=None)
Find the indices above a specified mass
- Parameters:
- massfloat
Minimum mass to access in Msun units
- snapint
Number of snapshot to be extracted
- in_h_unitsbool (default is determined by self.default_in_h_units)
If True, mass will be taken to have units including little h, otherwise, it will be assumed to have units with no h dependence.
- Returns:
- indexint
The index
- get_snap(snap)
Return a SnapHandler instance for a specified snapshot
- Parameters:
- snapint
Index-number of the desired snapshot
- Returns:
- SnapHandler
A SnapHandler instance for the requested snapshot
- get_snap_from_z(z)
Return a SnapHandler instance for the snapshot closest to a requested redshift
- Parameters:
- zfloat
The desired redshift for the snap
- Returns:
- SnapHandler
A SnapHandler instance for the requested snapshot
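Selecting the snapshot closest to a requested redshift amounts to a nearest-value search. A minimal sketch follows, assuming the snapshot redshifts are available as a mapping from snapshot number to redshift; `nearest_snap` and the toy table are hypothetical, not part of SimIM.

```python
def nearest_snap(z, snap_redshifts):
    """Return the snapshot number whose redshift is closest to z."""
    return min(snap_redshifts, key=lambda snap: abs(snap_redshifts[snap] - z))


# Hypothetical snapshot table: snapshot number -> redshift
redshifts = {0: 3.0, 1: 2.0, 2: 1.0, 3: 0.5, 4: 0.0}
nearest = nearest_snap(0.8, redshifts)
```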
- set_property_range(property_name=None, pmin=-inf, pmax=inf, reset=True, in_h_units=None)
Restrict property range for all snapshots
This is a wrapper around SnapHandler.set_property_range that iteratively applies it to all snapshots. Initializing handlers for each snapshot is necessary for this to work.
- Parameters:
- property_namestr
The name of the field to use
- pminfloat
The minimum value of the property to bracket the selected range.
- pmaxfloat
The maximum value of the property to bracket the selected range.
- resetbool, optional
If True, the active indices will be those selected between pmin and pmax. If False, the active indices will be those that satisfy pmin<=p<=pmax and that were previously in the active indices (i.e. this allows selection over multiple properties).
- in_h_unitsbool (default is determined by self.default_in_h_units)
If True, pmin and pmax will be taken to have units including little h, otherwise, they will be assumed to have units with no h dependence (and have the correct dependency applied before setting cuts for parameters where the stored catalog values are in h units).
- make_property(property, rename=None, kw_remap={}, other_kws={}, overwrite=False, use_all_inds=False, write=False, writedtype=None)
Use a galprops.prop instance to evaluate a new property over all snapshots
This is a wrapper around SnapHandler.make_property that iteratively applies it to all snapshots. For this to work, either 1) write must be set to True (resulting in the new property being saved to disk) or 2) SnapHandlers must be initialized for each snapshot, in which case the property can be stored only in memory. The latter is likely to require a significant allocation of memory and should be used carefully.
- Parameters:
- propertygalprops.prop instance
The galprops.prop instance containing the property information and generating function
- renamelist, optional
List of names specifying how to rename the property from the name specified in the galprops.prop instance
- kw_remapdict, optional
A dictionary remapping kwargs of the property generating function to different properties of the simulation. By default if the function calls for kwarg ‘x’ it will be evaluated on simulation property ‘x’, but passing the dictionary {‘x’:’y’} will result in the function being evaluated on simulation property ‘y’.
- other_kwsdict, optional
A dictionary of additional keyword arguments passed directly to the property.prop_function call
- overwritebool, default=False
Default is False. If a property name is already in use and overwrite is False, an error will be raised. Otherwise the property will be overwritten.
- use_all_indsbool, default=False
If True values will be assigned for all halos, otherwise only active halos will be evaluated, and others will be assigned nan.
- writebool, default=False
If True values of the new property will be written to the disk storage for the simulation snapshots.
- writedtypeNone or dtype
Specifies the data format to write the new property in.
- Returns:
- None
- delete_property(*property_names)
Remove a property from the saved file on the disk for all simulation snapshots
- Parameters:
- property_namesstr
The name of the field to be deleted; multiple names can be given
- Returns:
- None
- snap_stat(stat_function, kwargs, kw_remap={}, other_kws={}, give_args_in_h_units=None, use_all_inds=False, snaps=None)
Evaluate stat_function over every snapshot and return results
This is a wrapper around SnapHandler.eval_stat that iteratively applies it to all snapshots.
- Parameters:
- stat_functionfunction
Any function which can be applied to data in a snapshot
- kwargslist
List containing the arguments that must be passed to stat_function
- kw_remapdict
Dictionary mapping between function arguments (listed in kwargs) as the keys and the names of handler properties to feed in as values. E.g. to provide handler property ‘mass’ to stat_function argument ‘a’ one would use kw_remap={‘a’:’mass’}
- other_kwsdict, optional
A dictionary of additional keyword arguments passed directly to the stat_function call
- use_all_indsbool, default=False
If True function will be computed using all halos, otherwise only active halos will be evaluated.
- give_args_in_h_unitsbool (default is determined by self.default_in_h_units)
If True, values will be fed to stat_function in units including little h. If False, little h dependence will be removed.
- snapslist, optional
A list of snapshots on which to evaluate the stat_function. If none is specified all snapshots will be used.
- Returns:
- valslist
List containing the value(s) returned by stat_function on each snapshot
- redshiftslist
List containing the redshift of each snapshot
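The shape of the return value (one entry per snapshot in vals, with matching redshifts) can be illustrated with a toy loop. This is a sketch under simplifying assumptions: `toy_snap_stat` is hypothetical, snapshots are represented as a dict of (redshift, properties) pairs, and the keyword remapping and unit handling of the real method are omitted.

```python
def toy_snap_stat(stat_function, snapshots, snaps=None):
    """Apply stat_function to each snapshot; return (vals, redshifts).

    snapshots maps snapshot number -> (redshift, dict of property lists).
    """
    if snaps is None:
        snaps = sorted(snapshots)  # default: every snapshot
    vals, redshifts = [], []
    for snap in snaps:
        redshift, properties = snapshots[snap]
        vals.append(stat_function(properties))
        redshifts.append(redshift)
    return vals, redshifts


def total_mass(properties):
    return sum(properties["mass"])


toy = {
    0: (2.0, {"mass": [1.0, 2.0]}),
    1: (1.0, {"mass": [1.0, 2.0, 3.0]}),
    2: (0.0, {"mass": [4.0]}),
}
vals, zs = toy_snap_stat(total_mass, toy)
subset_vals, subset_zs = toy_snap_stat(total_mass, toy, snaps=[2])
```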
- class simim.siminterface.SnapHandler(path, snap, redshift, cosmo, box_edge, in_h_units=False)
Handler for individual snapshots - see generic Handler documentation.
The simplest way to initialize a SnapHandler instance is probably via a SimHandler instance for the simulation containing the snapshot in question. Then the method SimHandler.get_snap will return a Handler instance for the snapshot with only the snapshot index-number specified.
Methods
delete_property(*property_names)Remove a property from the saved file on the disk
eval_stat(stat_function, kwargs[, kw_remap, ...])Evaluate stat_function over the objects in a Handler instance and return the result
extract_keys([set])Get the fields attached to a file
grid(*property_names[, in_h_units, ...])Place selected properties into a 3d grid
has_loaded(property_name)Check whether a property has been loaded into memory
has_property(property_name)Check whether a property is present
hist(*property_names[, use_all_inds, ...])Make a histogram of a property
load_property(*property_names)Load a property from file into memory
make_property(property[, rename, kw_remap, ...])Use a galprops.prop instance to evaluate a new property
plot(xname, *ynames[, use_all_inds, save, ...])Make a scatter plot of two properties
return_property(property_name[, ...])Load a property from file and return
set_in_h_units(in_h_units)Globally set whether units are interpreted to be in little h units
set_property_range([property_name, pmin, ...])Set a range in a given property to be the active indices.
unload_property(*property_names)Remove a property from memory (does not erase from file on disk)
write_property(*property_names[, overwrite, ...])Write a property from object memory onto the saved file on the disk
Initialize Handler for a simulation snapshot
- Parameters:
- pathstring
Path to SimIM formatted file containing the snapshot (most likely [path to SimIM data directory]/[Simulation Name]/data.hdf5)
- snapint
Index-number of the snapshot within the whole simulation
- redshiftfloat
Redshift at which the snapshot was taken
- cosmodict
Dictionary containing the cosmological parameters for the simulation
- box_edgefloat
The edge length of the simulation box, should ALWAYS be in units of Mpc/h.
- in_h_unitsbool
If True, values will be returned, plotted, etc. in units including little h. If False, little h dependence will be removed. This can be overridden in most method calls.
- set_in_h_units(in_h_units)
Globally set whether units are interpreted to be in little h units
Changes the default way units are processed
- Parameters:
- in_h_unitsbool
If True, values will, by default, be returned in units including little h. If False, little h dependence will be removed.
- grid(*property_names, in_h_units=None, use_all_inds=False, res=None, xlim=None, ylim=None, zlim=None, norm=None)
Place selected properties into a 3d grid
Uses the properties of the array to construct a position (pos_x,pos_y,pos_z)- value (property_names) grid. Only required argument is a valid property name or names. Additional arguments can specify the limits and resolution of the grid
- Parameters:
- property_namesstr
The name or names of properties in the Handler instance
- in_h_unitsbool (default is determined by self.default_in_h_units)
If True, positions and property values fed to the gridder will be in units including little h. If False, little h dependence will be removed. Defaults to whatever is set globally for the Handler instance.
- use_all_indsbool, default=False
If True, all halos will be gridded; otherwise only active halos will be included.
- resfloat, optional
The resolution for the grid in Mpc (if in_h_units==False) or Mpc/h (if in_h_units==True). If no value is specified, it will default to 1/100th of the box edge length
- xlim, ylim, zlimtuples, optional
Tuples containing minimum and maximum values of the grid along the x, y, and z axes, in units of Mpc (if in_h_units==False) or Mpc/h (if in_h_units==True). If no values are specified the defaults are (0, box edge length).
- normNone, ‘cell_volume’, float
Apply a normalization to the gridded values. Default is None. If ‘cell_volume’ is specified, each cell will be divided by its volume. If a float is given, each cell will be multiplied by the float.
- Returns:
- gridsimim.map.grid instance
The gridded properties
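The gridding step can be pictured as binning halo positions into cubic cells and summing the property value in each cell. The sketch below is a simplified stand-in and not the simim.map.grid implementation: `toy_grid` is hypothetical, it stores only occupied cells in a dict, and it assumes a grid spanning (0, box edge) on every axis.

```python
def toy_grid(positions, values, box_edge, res, norm=None):
    """Sum `values` into cubic cells of side `res`; returns {(i, j, k): total}."""
    n_cells = round(box_edge / res)
    cells = {}
    for (x, y, z), value in zip(positions, values):
        index = (int(x // res), int(y // res), int(z // res))
        # Skip anything that falls outside the grid limits
        if all(0 <= i < n_cells for i in index):
            cells[index] = cells.get(index, 0.0) + value
    if norm == "cell_volume":
        cells = {k: v / res ** 3 for k, v in cells.items()}
    elif norm is not None:
        cells = {k: v * norm for k, v in cells.items()}
    return cells


positions = [(0.5, 0.5, 0.5), (0.7, 0.2, 0.9), (3.5, 0.1, 0.1)]
masses = [1.0, 2.0, 4.0]
summed = toy_grid(positions, masses, box_edge=4.0, res=2.0)
density = toy_grid(positions, masses, box_edge=4.0, res=2.0, norm="cell_volume")
```

With a 2-unit resolution the first two halos share cell (0, 0, 0) and the third lands in cell (1, 0, 0); the 'cell_volume' normalization then divides each total by 8.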
- delete_property(*property_names)
Remove a property from the saved file on the disk
- Parameters:
- property_namesstr
The name of the field to be deleted; multiple names can be given
- Returns:
- None
- eval_stat(stat_function, kwargs, kw_remap={}, other_kws={}, use_all_inds=False, give_args_in_h_units=None)
Evaluate stat_function over the objects in a Handler instance and return the result
This can be used to evaluate any function of the properties contained in the simulation, but is generally envisioned as a way to compute ensemble statistics (means, luminosity functions, correlations, etc.)
- Parameters:
- stat_functionfunction
Any function which can be applied to data in a Handler instance
- kwargslist
List containing the arguments that must be passed to stat_function
- kw_remapdict
Dictionary mapping between function arguments (listed in kwargs) as the keys and the names of Handler properties to feed in as values. E.g. to provide handler property ‘mass’ to stat_function argument ‘a’ one would use kw_remap={‘a’:’mass’}
- other_kwsdict, optional
A dictionary of additional keyword arguments passed directly to the stat_function call
- use_all_indsbool, default=False
If True function will be computed using all halos, otherwise only active halos will be evaluated.
- give_args_in_h_unitsbool (default is determined by self.default_in_h_units)
If True, values will be fed to stat_function in units including little h. If False, little h dependence will be removed first. Defaults to whatever is set globally for the Handler instance.
- Returns:
- vals
The value(s) returned by stat_function
Examples
Compute the total halo mass in a simulation accessed via the variable handler:
>>> handler.eval_stat(np.sum, kwargs=['a'], kw_remap={'a':'mass'})
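How kwargs, kw_remap and other_kws combine into a single function call can be sketched with a toy function. This is an illustration of the mapping only, under the assumption that each name in kwargs is looked up in the catalog after optional renaming; `toy_eval_stat` and the `properties` argument are hypothetical and the active-index and unit handling are left out.

```python
def toy_eval_stat(stat_function, kwargs, properties, kw_remap=None, other_kws=None):
    """Call stat_function with catalog properties bound to its arguments."""
    kw_remap = kw_remap or {}
    other_kws = other_kws or {}
    # Argument 'a' reads property kw_remap['a'] if remapped, else property 'a'
    call_kws = {kw: properties[kw_remap.get(kw, kw)] for kw in kwargs}
    return stat_function(**call_kws, **other_kws)


def scaled_mean(a, scale=1.0):
    return scale * sum(a) / len(a)


halos = {"mass": [1.0, 2.0, 3.0]}
result = toy_eval_stat(
    scaled_mean,
    kwargs=["a"],
    properties=halos,
    kw_remap={"a": "mass"},     # feed property 'mass' to argument 'a'
    other_kws={"scale": 2.0},   # passed straight through to the function
)
```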
- extract_keys(set='any')
Get the fields attached to a file
- Parameters:
- set{‘any’,’loaded’,’saved’,’generated’}
What type of keys to return, default is all
- Returns:
- keys
The fields associated with the snapshot
- has_loaded(property_name)
Check whether a property has been loaded into memory
- Parameters:
- property_namestr
The name of the field to check
- Returns:
- loadedbool
True if the property is loaded, otherwise, false
- has_property(property_name)
Check whether a property is present
- Parameters:
- property_namestr
The name of the field to check
- Returns:
- existsbool
True if the property is present, otherwise, false
- hist(*property_names, use_all_inds=False, logtransform=False, save=None, axkws={}, plotkws={}, in_h_units=None)
Make a histogram of a property
- Parameters:
- *property_namesstr
The name(s) of the field(s) to use, multiple fields can be given and will be plotted on the same axes with the same settings
- logtransformbool, optional
If set to True, will take the log of the property before making the histogram
- use_all_indsbool, optional
If True, all halos will be included in the histogram; otherwise only active halos will be included.
- savestr, optional
If specified, the plot will be saved to the given location
- axkwsdict, optional
A dictionary of keyword args and values that will be fed to ax.set() when creating the plot axes
- plotkwsdict, optional
A dictionary of keyword args and values that will be fed to plt.hist() when creating the plot data
- in_h_unitsbool (default is determined by self.default_in_h_units)
If True, values will be plotted in units including little h. If False, little h dependence will be removed. Defaults to whatever is set globally for the Handler instance.
- Returns:
- None
- load_property(*property_names)
Load a property from file into memory
- Parameters:
- property_namesstr
The name of the field to be loaded, can give multiple
- Returns:
- None
- make_property(property, rename=None, kw_remap={}, other_kws={}, overwrite=False, use_all_inds=False, write_to_disk=False, overwrite_to_disk=False, dtype_to_disk=None, unload_to_disk=False)
Use a galprops.prop instance to evaluate a new property
- Parameters:
- propertygalprops.prop instance
The galprops.prop instance containing the property information and generating function
- renamelist, optional
List of names specifying how to rename the property from the name specified in the galprops.prop instance
- kw_remapdict, optional
A dictionary remapping kwargs of the property generating function to different properties of the simulation. By default if the function calls for kwarg ‘x’ it will be evaluated on simulation property ‘x’, but passing the dictionary {‘x’:’y’} will result in the function being evaluated on simulation property ‘y’.
- other_kwsdict, optional
A dictionary of additional keyword arguments passed directly to the property.prop_function call
- overwritebool, optional
Default is False. If a property name is already in use and overwrite is False, an error will be raised. Otherwise the property will be overwritten.
- use_all_indsbool
If True values will be assigned for all halos, otherwise only active halos will be evaluated, and others will be assigned nan.
- write_to_diskbool
Default is False. If True, write assessed property to disk (note use_all_inds must be True)
- overwrite_to_diskbool
Default is False. If a property name is already in use on disk and overwrite_to_disk is False, an error will be raised when trying to write to disk. Otherwise the property will be overwritten. Be careful if you set this to True.
- dtype_to_diskNone or data type
Specify the data type to use when writing the data - useful for converting to lower precision floats to use less storage
- unload_to_diskbool
Default is False. If True, will unload properties after writing them to disk.
- Returns:
- None
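The effect of use_all_inds on a newly generated property can be sketched as follows: the generating function is evaluated only for active halos and the remaining entries are filled with nan. This is a toy illustration, assuming a single input property and a list of active indices; `toy_make_property` is a hypothetical name.

```python
import math


def toy_make_property(prop_function, inputs, active, use_all_inds=False):
    """Evaluate prop_function per halo; inactive halos get nan unless use_all_inds."""
    if use_all_inds:
        return [prop_function(x) for x in inputs]
    active = set(active)
    return [
        prop_function(x) if i in active else math.nan
        for i, x in enumerate(inputs)
    ]


masses = [1e10, 1e11, 1e12]
# Only halos 1 and 2 are active, so halo 0 is assigned nan
doubled = toy_make_property(lambda m: 2 * m, masses, active=[1, 2])
doubled_all = toy_make_property(lambda m: 2 * m, masses, active=[1, 2], use_all_inds=True)
```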
- plot(xname, *ynames, use_all_inds=False, save=None, axkws={}, plotkws={}, in_h_units=None)
Make a scatter plot of two properties
- Parameters:
- xnamestr
The name of the field to use as the x-value
- *ynamesstr
The name(s) of the field(s) to use as the y-value. Multiple fields can be given and will be plotted on the same axes against a single x value
- use_all_indsbool or ‘compare’, optional
If True, all halos will be plotted; otherwise only active halos will be plotted. If ‘compare’, both sets of indices will be plotted to allow easy comparison.
- savestr, optional
If specified, the plot will be saved to the given location
- axkwsdict, optional
A dictionary of keyword args and values that will be fed to ax.set() when creating the plot axes
- plotkwsdict, optional
A dictionary of keyword args and values that will be fed to plt.plot() when creating the plot data
- in_h_unitsbool (default is determined by self.default_in_h_units)
If True, values will be plotted in units including little h. If False, little h dependence will be removed. Defaults to whatever is set globally for the Handler instance.
- Returns:
- None
- return_property(property_name, use_all_inds=False, in_h_units=None)
Load a property from file and return
- Parameters:
- property_namestr
The name of the field to be loaded
- use_all_indsbool
If True values will be returned for all halos, otherwise only active halos will be returned. Default is False, but by default all halos are active
- in_h_unitsbool (default is determined by self.default_in_h_units)
If True, values will be returned in units including little h. If False, little h dependence will be removed. Defaults to whatever is set globally for the Handler instance.
- Returns:
- property_valuesarray
Values of the requested property
- set_property_range(property_name=None, pmin=-inf, pmax=inf, reset=True, in_h_units=None)
Set a range in a given property to be the active indices. If no arguments are passed, this resets the active indices to all halos
- Parameters:
- property_namestr
The name of the field to use
- pminfloat
The minimum value of the property to bracket the selected range.
- pmaxfloat
The maximum value of the property to bracket the selected range.
- resetbool, optional
If True, the active indices will be those selected between pmin and pmax. If False, the active indices will be those that satisfy pmin<=p<=pmax and that were previously in the active indices (i.e. this allows selection over multiple properties).
- in_h_unitsbool (default is determined by self.default_in_h_units)
If True, pmin and pmax will be taken to have units including little h, otherwise, they will be assumed to have units with no h dependence (and have the correct dependency applied before setting cuts for parameters where the stored catalog values are in h units). Defaults to whatever is set globally for the Handler instance.
- Returns:
- None
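The reset semantics can be illustrated with a toy selection class: reset=True starts from all halos, reset=False intersects with the current selection, and a call with no arguments restores every halo. This is a sketch, not the SnapHandler code; `ActiveSelection` is hypothetical and unit handling is omitted.

```python
import math


class ActiveSelection:
    """Toy model of active indices restricted by property ranges."""

    def __init__(self, properties):
        self.properties = properties
        self._n = len(next(iter(properties.values())))
        self.active = list(range(self._n))  # all halos active by default

    def set_property_range(self, property_name=None, pmin=-math.inf,
                           pmax=math.inf, reset=True):
        if property_name is None:
            # No arguments: reset the active indices to all halos
            self.active = list(range(self._n))
            return
        candidates = range(self._n) if reset else self.active
        values = self.properties[property_name]
        self.active = [i for i in candidates if pmin <= values[i] <= pmax]


sel = ActiveSelection({"mass": [1, 5, 10, 50], "sfr": [0.1, 2.0, 0.5, 3.0]})
sel.set_property_range("mass", pmin=5)
after_mass = list(sel.active)
sel.set_property_range("sfr", pmin=1.0, reset=False)  # combine with the mass cut
after_both = list(sel.active)
sel.set_property_range("sfr", pmax=1.0)               # reset=True discards the mass cut
after_reset = list(sel.active)
sel.set_property_range()                               # back to all halos
```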
- unload_property(*property_names)
Remove a property from memory (does not erase from file on disk)
- Parameters:
- property_namesstr
The name of the field to be unloaded; multiple names can be given
- Returns:
- None
- write_property(*property_names, overwrite=False, dtype=None)
Write a property from object memory onto the saved file on the disk
- Parameters:
- property_namesstr
The name of the field to be written, can specify multiple
- overwritebool, optional
Default is False. If a property name is already in use and overwrite is False, an error will be raised. Otherwise the property will be overwritten. Be careful if you set this to True.
- dtypeNone or data type
Specify the data type to use when writing the data - useful for converting to lower precision floats to use less storage
- Returns:
- None
Downloading and Formatting Simulations
These are tools for downloading and formatting simulation data from various sources.
- class simim.siminterface.IllustrisCatalogs(sim, api_key, path='auto', snaps='all', updatepath=True)
Class to download and format Illustris or TNG group catalogs
Methods
clean_raw()Remove unformatted data for a simulation
download([redownload])Download Illustris/TNG subhalo (group) catalogs
download_meta([redownload])Download and generate metadata for the set of snapshots specified when initializing the class
format([remake, overwrite, basic, ...])Convert the unformatted simulation data into the uniform format used by SimIM for all halo catalogs.
Initialize an interface with the Illustris/TNG data
- Parameters:
- simstring
Name of the Illustris/TNG simulation you want to download/ format. Options are ‘Illustris-1’,’Illustris-2’,’Illustris-3’, ‘Illustris-1-Dark’,’Illustris-2-Dark’,’Illustris-3-Dark’, ‘TNG300-1’,’TNG300-2’,’TNG300-3’,’TNG300-1-Dark’,’TNG300-2-Dark’, ‘TNG300-3-Dark’,’TNG100-1’,’TNG100-2’,’TNG100-3’,’TNG100-1-Dark’, ‘TNG100-2-Dark’,’TNG100-3-Dark’
- api_keystring
An API key for accessing Illustris data - see here https://www.tng-project.org/users/register/
- pathoptional, ‘auto’ or string
The path for saving/accessing the simulation data. If ‘auto’, this will be looked up or created in the default SimIM filepaths. Defaults to ‘auto’ and should probably only be changed if you are making additional copies of the data for some reason.
- snapsoptional, ‘all’ or list of ints
The snapshots to use when downloading/formatting the simulation. Defaults to ‘all’ which will use all known snapshots.
- updatepathoptional, bool
Defaults to True. If True, the path parameter will be saved as the default path to this simulation in future uses.
- clean_raw()
Remove unformatted data for a simulation
This method permanently deletes the unformatted data for a simulation. This should only be done if the data is no longer needed - e.g. after the SimIM formatted file has been created and validated.
- format(remake=False, overwrite=False, basic=False, realtime_clean_raw=False, realtime_clean_raw_check=True)
Convert the unformatted simulation data into the uniform format used by SimIM for all halo catalogs.
This method makes a data.hdf5 file containing the formatted data for the snapshots pointed to in the SimCatalogs instance. If a file already exists, it will be added to, not overwritten (i.e. snapshots already present in data.hdf5 won’t be reformatted/replaced, but those not present will be added). By setting remake=True the whole file can be overwritten. By setting overwrite=True snaps present in both the data.hdf5 file and the snap_catalogs instance will be reformatted and rewritten. Note: if a data.hdf5 file exists but contains no snapshots, the behavior will be the same as if remake=True.
Set realtime_clean_raw=True to delete the raw data files as they are processed and save space on disk - be careful as this will permanently delete files and may require re-downloading if there’s a problem formatting.
Set basic=True to save only a limited set of halo properties (position, velocity, mass) instead of everything in the raw catalog. This is ignored if appending to an existing file, and the properties in that file are used instead.
- Parameters:
- remakebool, default=False
If True, a new file will be created (overwriting any data.hdf5 file that previously existed) and the snaps listed in the SimCatalogs instance will be written to it. If False, only new data will be added.
- overwritebool, default=False
If True, snaps already present in the data.hdf5 file but also listed in the SimCatalogs instance will be written over with newly formatted versions. If False, the versions already in data.hdf5 will be left untouched.
- basicbool, default=False
If True, only halo positions and masses will be formatted and saved in data.hdf5; if False, all fields will be saved. If a data.hdf5 file already exists this will be ignored and the fields in the existing data file will be matched (unless remake=True).
- realtime_clean_rawbool, default=False
If set to True, the unformatted simulation data will be deleted as it is formatted and written to data.hdf5. This saves disk space but will require redownloading data if there are problems with the formatting.
- realtime_clean_raw_checkbool, default=True
Confirms with the user before starting to delete unformatted snaps when realtime_clean_raw is set to True
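The way remake and overwrite decide which snapshots are written can be summarized in a small planning function. This is a sketch of the rules as described above and not the formatting code itself: `plan_format` is a hypothetical helper that returns the snapshots that would be (re)written and the existing snapshots that would be kept unchanged.

```python
def plan_format(existing, requested, remake=False, overwrite=False):
    """Return (snaps written, existing snaps kept untouched) as sorted lists."""
    existing, requested = set(existing), set(requested)
    if remake or not existing:
        # New file: everything requested is written, nothing old survives.
        # An existing file with no snapshots behaves the same way.
        written, kept = requested, set()
    elif overwrite:
        # Snapshots in both sets are reformatted; other old ones are kept
        written, kept = requested, existing - requested
    else:
        # Default: only add snapshots that are not already in the file
        written, kept = requested - existing, existing
    return sorted(written), sorted(kept)


default_plan = plan_format(existing=[1, 2], requested=[2, 3])
overwrite_plan = plan_format(existing=[1, 2], requested=[2, 3], overwrite=True)
remake_plan = plan_format(existing=[1, 2], requested=[2, 3], remake=True)
```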
- download(redownload=False)
Download Illustris/TNG subhalo (group) catalogs
- download_meta(redownload=False)
Download and generate metadata for the set of snapshots specified when initializing the class
Note: the metadata saved is dependent on the list of snapshots, therefore if you plan to use many snapshots for some applications but have for some reason only initialized your IllustrisCatalogs instance with a few it is probably best to do something like the following:
>>> x = IllustrisCatalogs(...,snaps='all')
>>> x.download_meta()
>>> x = IllustrisCatalogs(...,snaps=[10,11,12])
>>> x.download()
- class simim.siminterface.UniversemachineCatalogs(sim, path='auto', snaps='all', updatepath=True)
Initialize an interface with the UniverseMachine Bolshoi/MD catalogs
- Parameters:
- simstring
Name of the simulation/UM catalog you want to download/ format. Options are ‘UniverseMachine-BolshoiPlanck’, ‘UniverseMachine-SMDPL’,’UniverseMachine-MDPL2’
- pathoptional, ‘auto’ or string
The path for saving/accessing the simulation data. If ‘auto’, this will be looked up or created in the default SimIM filepaths. Defaults to ‘auto’ and should probably only be changed if you are making additional copies of the data for some reason.
- snapsoptional, ‘all’ or list of ints
The snapshots to use when downloading/formatting the simulation. Defaults to ‘all’ which will use all known snapshots.
- updatepathoptional, bool
Defaults to True. If True, the path parameter will be saved as the default path to this simulation in future uses.
Methods
clean_raw()Remove unformatted data for a simulation
download([redownload])Download UniverseMachine catalogs
download_meta([redownload])Download and generate metadata for the set of snapshots specified when initializing the class
format([remake, overwrite, basic, ...])Convert the unformatted simulation data into the uniform format used by SimIM for all halo catalogs.
- clean_raw()
Remove unformatted data for a simulation
This method permanently deletes the unformatted data for a simulation. This should only be done if the data is no longer needed - e.g. after the SimIM formatted file has been created and validated.
- format(remake=False, overwrite=False, basic=False, realtime_clean_raw=False, realtime_clean_raw_check=True)
Convert the unformatted simulation data into the uniform format used by SimIM for all halo catalogs.
This method makes a data.hdf5 file containing the formatted data for the snapshots pointed to in the SimCatalogs instance. If a file already exists, it will be added to, not overwritten (i.e. snapshots already present in data.hdf5 won’t be reformatted/replaced, but those not present will be added). By setting remake=True the whole file can be overwritten. By setting overwrite=True snaps present in both the data.hdf5 file and the snap_catalogs instance will be reformatted and rewritten. Note: if a data.hdf5 file exists but contains no snapshots, the behavior will be the same as if remake=True.
Set realtime_clean_raw=True to delete the raw data files as they are processed and save space on disk - be careful as this will permanently delete files and may require re-downloading if there’s a problem formatting.
Set basic=True to save only a limited set of halo properties (position, velocity, mass) instead of everything in the raw catalog. This is ignored if appending to an existing file, and the properties in that file are used instead.
- Parameters:
- remakebool, default=False
If True, a new file will be created (overwriting any data.hdf5 file that previously existed) and the snaps listed in the SimCatalogs instance will be written to it. If False, only new data will be added.
- overwritebool, default=False
If True, snaps already present in the data.hdf5 file and also listed in the SimCatalogs instance will be overwritten with newly formatted versions. If False, the versions already in data.hdf5 will be left untouched.
- basicbool, default=False
If True, only halo positions and masses will be formatted and saved in data.hdf5; if False, all fields will be saved. If a data.hdf5 file already exists, this is ignored and the fields in the existing data file are matched (unless remake=True).
- realtime_clean_rawbool, default=False
If set to True, the unformatted simulation data will be deleted as it is formatted and written to data.hdf5. This saves disk space but will require redownloading data if there are problems with the formatting.
- realtime_clean_raw_checkbool, default=True
Confirms with the user before starting to delete unformatted snaps when realtime_clean_raw is set to True
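The interaction between remake and overwrite described above can be summarized as a small decision function. This is a minimal sketch and not SimIM's implementation: plan_format and its arguments are hypothetical names, and snapshots are assumed to be identified by integer index.

```python
def plan_format(existing_snaps, requested_snaps, remake=False, overwrite=False):
    """Decide what a format() call would do to data.hdf5 (illustrative only).

    Returns (recreate_file, snaps_to_write), where recreate_file says whether
    the whole file is started from scratch.
    """
    existing = set(existing_snaps)
    requested = sorted(set(requested_snaps))

    # A file with no snapshots behaves as if remake=True
    if remake or not existing:
        return True, requested

    if overwrite:
        # Snapshots present in both places are reformatted and rewritten
        return False, requested

    # Default: only snapshots missing from the file are added
    return False, [s for s in requested if s not in existing]


print(plan_format([10, 11], [10, 11, 12]))
print(plan_format([10, 11], [10, 11, 12], overwrite=True))
print(plan_format([10, 11], [12], remake=True))
```

With the defaults only snapshot 12 is written. With overwrite=True all three requested snapshots are written, and with remake=True the file is recreated with just the requested snapshots.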
- download_meta(redownload=False)
Download and generate metadata for the set of snapshots specified when initializing the class
Note: the saved metadata depends on the list of snapshots. If you plan to use many snapshots for some applications but have initialized your UniversemachineCatalogs instance with only a few, it is probably best to do something like the following:
>>> x = UniversemachineCatalogs(...,snaps='all')
>>> x.download_meta()
>>> x = UniversemachineCatalogs(...,snaps=[10,11,12])
>>> x.download()
- download(redownload=False)
Download UniverseMachine catalogs
Under the Hood
These features are useful for building installers for new simulations, which
should work by extending the simim.siminterface._rawsiminterface.SimCatalogs
class.
- class simim.siminterface._rawsiminterface.SimCatalogs(sim, path='auto', snaps='all', updatepath=True)
Generic class for interacting with simulation halo catalogs and converting them into SimIM’s preferred format.
This class isn’t used directly, but should be extended for any simulation that is to be integrated into SimIM. The modules illustris.py and universemachine.py show how to do this for two examples.
To construct a new SimCatalogs subclass, a few steps are necessary.
1. Define the __init__ method: this method should first create an array self.allsnaps, which contains the numerical index of every snap in the original simulation. __init__ should then call SimCatalogs.__init__ with the relevant arguments (see Parameters for the SimCatalogs.__init__ method). Then it should construct three dictionaries that define the field names in the unformatted halo catalog files and how these names map to fields in the SimIM data structure.
The mapping between unformatted and formatted fields is done with three dictionaries: self.basic_fields for fields that MUST be included in the formatted SimIM catalog (these are pos_x, pos_y, pos_z, and mass, for subhalo positional coordinates and subhalo mass), self.dm_fields for fields describing additional dark matter properties, and self.matter_fields for fields describing baryonic properties. The latter two dictionaries can be left empty if no additional properties are in the unformatted catalogs and/or if none of these other properties will ever be propagated into the SimIM formatted data.
The basic fields dictionary should be structured as follows:
self.basic_fields = {
    '[unformatted field name]': [
        ('formatted field name',
         'formatted field dtype',
         'formatted field units',
         'formatted field dependence on hubble constant')
    ]
}
The values of each dictionary entry are lists to allow for the possibility that a field in the unformatted catalog consists of a tuple of values. For example, Illustris catalogs store halo positions as a tuple ‘SubhaloPos’. To convert this to SimIM format (pos_x, pos_y, pos_z), the following dictionary entry is used:
{'SubhaloPos': [('pos_x', 'f', 'Mpc/h', -1),
                ('pos_y', 'f', 'Mpc/h', -1),
                ('pos_z', 'f', 'Mpc/h', -1)]}
The ‘formatted field dtype’ should be whatever data format you want the final data to be written as, and the ‘formatted field units’ and ‘formatted field dependence on hubble constant’ should be the units and Hubble constant dependence of the data in its final format, after any transformations have been applied (see below). Note that SimIM generally assumes data are saved in ‘little h’ units, so if they are saved in units with no little h, the h dependence should be set to 0.
The dm_fields and matter_fields dictionaries are constructed similarly.
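One way to read the ‘dependence on hubble constant’ entry is as the exponent of h carried by the stored value. The sketch below converts a value out of little-h units under that assumption. remove_little_h is a hypothetical helper rather than a SimIM function, and h = 0.7 is only an example value.

```python
def remove_little_h(value, h_exponent, h):
    """Convert a value stored in little-h units to units without h.

    A field stored in Mpc/h carries h**-1, so h_exponent is -1 and the
    value in Mpc is value * h**-1. An exponent of 0 leaves it unchanged.
    """
    return value * h ** h_exponent


h = 0.7  # example value; in practice this would come from the simulation metadata
pos_x_mpc_per_h = 35.0
pos_x_mpc = remove_little_h(pos_x_mpc_per_h, -1, h)  # roughly 50 Mpc
print(pos_x_mpc)
```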
In addition to the dictionaries specifying keys, a dictionary specifying transformations to apply when formatting the data can be provided. This should be called self.transform_keys and should have keys that match the names of unformatted keys in the simulation catalog and values that specify a function to apply. For example, SimIM generally assumes distance units of Mpc/h, while Illustris uses kpc/h for some fields. A conversion might look like
{'SubhaloPos': lambda x: x/1000}
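To make the roles of the field dictionaries and transform_keys concrete, the following sketch applies them to a tiny in-memory catalog. It is an illustration under stated assumptions, not SimIM's formatting code: format_fields is a hypothetical helper, 'RawMass' is a made-up raw field name, the halo values are invented, and the transformation is applied element by element because the sketch uses plain lists instead of arrays.

```python
def format_fields(raw, field_map, transform_keys=None):
    """Rename and split raw catalog fields following a SimIM-style field map.

    raw            : dict of raw field name -> list of per-halo entries
    field_map      : dict of raw field name -> list of
                     (name, dtype, units, h_dependence) tuples
    transform_keys : dict of raw field name -> function applied to each value
    """
    transform_keys = transform_keys or {}
    formatted = {}
    for raw_name, targets in field_map.items():
        transform = transform_keys.get(raw_name, lambda x: x)
        for column, (name, dtype, units, h_dep) in enumerate(targets):
            if len(targets) == 1:
                # Scalar field: one raw value per halo
                values = [transform(v) for v in raw[raw_name]]
            else:
                # Tuple-valued field: take one component per formatted field
                values = [transform(v[column]) for v in raw[raw_name]]
            formatted[name] = values
    return formatted


basic_fields = {
    'SubhaloPos': [('pos_x', 'f', 'Mpc/h', -1),
                   ('pos_y', 'f', 'Mpc/h', -1),
                   ('pos_z', 'f', 'Mpc/h', -1)],
    'RawMass': [('mass', 'f', 'Msun/h', -1)],  # hypothetical raw field name
}
transform_keys = {'SubhaloPos': lambda x: x / 1000}  # kpc/h -> Mpc/h

# Two invented halos: positions in kpc/h, masses in Msun/h
raw = {
    'SubhaloPos': [(1000.0, 2000.0, 3000.0), (500.0, 250.0, 125.0)],
    'RawMass': [1.0e12, 5.0e11],
}
formatted = format_fields(raw, basic_fields, transform_keys)
print(formatted['pos_x'], formatted['mass'])
```

The tuple-valued 'SubhaloPos' field is split into three formatted fields and rescaled, while the scalar mass field is only renamed.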
Finally, the init function should have a check to verify which snapshots have already been downloaded, and which still need to be:
# Check whether snapshots have been downloaded
not_downloaded = []
for i in self.snaps:
    file_path = [path where snapshot would be saved]
    if not os.path.exists(file_path):
        not_downloaded.append(i)
if len(not_downloaded) > 0:
    warnings.warn("No data exists for snapshots {} - run .download".format(not_downloaded))
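A runnable variant of this check, written as a standalone function, might look like the sketch below. The helper name find_missing_snaps and the file naming scheme snap_NNN.hdf5 are assumptions made for the example. A real subclass would use whatever path its download method writes to.

```python
import os
import tempfile
import warnings


def find_missing_snaps(raw_dir, snaps, filename_template="snap_{:03d}.hdf5"):
    """Return the snapshot indices with no file in raw_dir.

    filename_template is a made-up naming scheme used only for this sketch.
    """
    not_downloaded = []
    for i in snaps:
        file_path = os.path.join(raw_dir, filename_template.format(i))
        if not os.path.exists(file_path):
            not_downloaded.append(i)
    if len(not_downloaded) > 0:
        warnings.warn(
            "No data exists for snapshots {} - run .download".format(not_downloaded)
        )
    return not_downloaded


with tempfile.TemporaryDirectory() as raw_dir:
    # Pretend snapshot 10 has already been downloaded
    open(os.path.join(raw_dir, "snap_010.hdf5"), "w").close()
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # keep the demo output quiet
        missing = find_missing_snaps(raw_dir, [10, 11, 12])
print(missing)
```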
2. The extended class also needs methods to download the halo catalog (self.download) and simulation metadata (self.download_meta). These methods should save the data in the directory listed in self.path. The simulation halo catalogs should be placed in a subdirectory called ‘raw’, and the metadata should be placed in self.meta_path and self.snap_meta_path. Look at existing code for examples of how to do this.
3. The extended class needs a self._loader, which takes as arguments the path to the data, a snapshot number, and a list of fields. It returns a dictionary whose key-value pairs are each property name and the values of that property for every halo, along with an integer specifying the number of halos found.
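The loader contract in step 3 can be illustrated with a toy function that reads a snapshot from disk and returns the (dictionary, halo count) pair. This is only a sketch: example_loader, the JSON file layout, and the field names 'RawMass' and 'RawSFR' are invented for the example, and a real loader would read the simulation's own catalog format.

```python
import json
import os
import tempfile


def example_loader(path, snapshot, fields):
    """Toy loader: read the requested fields for one snapshot.

    Returns (properties, n_halos), where properties maps each field name to
    the list of values for every halo. The JSON layout is invented for this
    sketch; real catalogs would use the simulation's own file format.
    """
    file_path = os.path.join(path, "raw", "snap_{}.json".format(snapshot))
    with open(file_path) as f:
        catalog = json.load(f)
    properties = {field: catalog[field] for field in fields}
    n_halos = len(next(iter(properties.values()))) if properties else 0
    return properties, n_halos


with tempfile.TemporaryDirectory() as path:
    # Write a three-halo toy catalog into the 'raw' subdirectory
    os.makedirs(os.path.join(path, "raw"))
    with open(os.path.join(path, "raw", "snap_10.json"), "w") as f:
        json.dump({"RawMass": [1.0, 2.0, 3.0], "RawSFR": [0.1, 0.0, 0.4]}, f)
    properties, n_halos = example_loader(path, 10, ["RawMass"])
print(properties, n_halos)
```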
Methods
clean_raw()Remove unformatted data for a simulation
format([remake, overwrite, basic, ...])Convert the unformatted simulation data into the uniform format used by SimIM for all halo catalogs.
Initialize the information needed to download and format a simulation.
- Parameters:
- simstring
The string naming the simulation - this string must be in the _acceptedsims list from the _sims.py submodule
- pathoptional, ‘auto’ or string
The path for saving/accessing the simulation data. If ‘auto’, this will be looked up or created in the default SimIM filepaths. Defaults to ‘auto’ and should probably only be changed if you are making additional copies of the data for some reason.
- snapsoptional, ‘all’ or list of ints
The snapshots to use when downloading/formatting the simulation. Defaults to ‘all’ which will use all known snapshots.
- updatepathoptional, bool
Defaults to True. If True, the path parameter will be saved as the default path to this simulation in future uses.
- clean_raw()
Remove unformatted data for a simulation
This method permanently deletes the unformatted data for a simulation. This should only be done if the data is no longer needed - e.g. after the SimIM formatted file has been created and validated.
- format(remake=False, overwrite=False, basic=False, realtime_clean_raw=False, realtime_clean_raw_check=True)
Convert the unformatted simulation data into the uniform format used by SimIM for all halo catalogs.
This method makes a data.hdf5 file containing the formatted data for the snapshots pointed to in the SimCatalogs instance. If a file already exists, it will be added to, not overwritten (i.e. snapshots already present in data.hdf5 won’t be reformatted/replaced, but those not present will be added). By setting remake=True the whole file can be overwritten. By setting overwrite=True snaps present in both the data.hdf5 file and the snap_catalogs instance will be reformatted and rewritten. Note: if a data.hdf5 file exists but contains no snapshots, the behavior will be the same as if remake=True.
Set realtime_clean_raw=True to delete the raw data files as they are processed and save space on disk - be careful as this will permanently delete files and may require re-downloading if there’s a problem formatting.
Set basic=True to save only a limited set of halo properties (position, velocity, mass) instead of everything in the raw catalog. This is ignored if appending to an existing file, and the properties in that file are used instead.
- Parameters:
- remakebool, default=False
If True, a new file will be created (overwriting any data.hdf5 file that previously existed) and the snaps listed in the SimCatalogs instance will be written to it. If False, only new data will be added.
- overwritebool, default=False
If True, snaps already present in the data.hdf5 file and also listed in the SimCatalogs instance will be overwritten with newly formatted versions. If False, the versions already in data.hdf5 will be left untouched.
- basicbool, default=False
If True, only halo positions and masses will be formatted and saved in data.hdf5; if False, all fields will be saved. If a data.hdf5 file already exists, this is ignored and the fields in the existing data file are matched (unless remake=True).
- realtime_clean_rawbool, default=False
If set to True, the unformatted simulation data will be deleted as it is formatted and written to data.hdf5. This saves disk space but will require redownloading data if there are problems with the formatting.
- realtime_clean_raw_checkbool, default=True
Confirms with the user before starting to delete unformatted snaps when realtime_clean_raw is set to True
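The pattern behind realtime_clean_raw and realtime_clean_raw_check can be sketched as a format-then-delete loop guarded by a single confirmation. Everything here is hypothetical: format_with_cleanup, format_one, and confirm are illustrative names, and the choice to keep the raw files when the user declines is an assumption, since SimIM's behavior in that case is not described above.

```python
import os
import tempfile


def format_with_cleanup(raw_files, format_one, realtime_clean_raw=False,
                        realtime_clean_raw_check=True, confirm=input):
    """Format each raw file, optionally deleting it right afterwards.

    format_one is a stand-in for whatever writes one snapshot to data.hdf5.
    confirm is called once, before any deletion, when the check is enabled.
    """
    if realtime_clean_raw and realtime_clean_raw_check:
        answer = confirm("Delete raw files as they are formatted? [y/n] ")
        if answer.strip().lower() != "y":
            realtime_clean_raw = False  # assumption: keep the raw data

    for file_path in raw_files:
        format_one(file_path)
        if realtime_clean_raw:
            os.remove(file_path)  # permanent: re-download needed to recover


with tempfile.TemporaryDirectory() as raw_dir:
    # Create two empty stand-in raw files
    raw_files = []
    for i in (10, 11):
        file_path = os.path.join(raw_dir, "snap_{}.dat".format(i))
        open(file_path, "w").close()
        raw_files.append(file_path)

    formatted = []
    format_with_cleanup(raw_files, formatted.append,
                        realtime_clean_raw=True,
                        confirm=lambda prompt: "y")
    remaining = os.listdir(raw_dir)
print(len(formatted), remaining)
```

Passing confirm as an argument keeps the sketch testable without interactive input.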
- class simim.siminterface._rawsiminterface.Snapshot(index, redshift, metadata)
Class containing information for individual snapshots
This class stores information and does some basic calculations about individual snapshots - it’s initialized when loading and formatting simulation data. Users probably don’t need it for anything.
Methods
dif_higherz_snap(other)Determine redshift, time, distance where snap ends and an earlier one begins.
dif_lowerz_snap(other)Determine redshift, time, distance where snap ends and a later one begins.
dif_snap(other)Determine the midpoint between two snaps
Initialization parameters are generally hard coded for the different sims, or otherwise extracted from simulation metadata.
- Parameters:
- indexint
The numerical index of the snapshot
- redshiftfloat
The redshift of the snapshot
- metadatadict
Dictionary containing the snapshot metadata. This must contain (at a minimum) ‘cosmo_h’, ‘cosmo_omega_matter’, ‘cosmo_omega_baryon’ (for defining the cosmology used by the sim)
- dif_snap(other)
Determine the midpoint between two snaps
Method to determine the redshift and time of the midpoint between this snapshot and another snapshot. The midpoint is defined to be the cosmological age halfway between the ages of the two snapshots.
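The midpoint definition can be worked through with the closed-form age of a flat ΛCDM universe. The sketch below is not how SimIM computes it: it assumes a flat cosmology with no radiation, the function names are hypothetical, and h and omega_m stand in for the ‘cosmo_h’ and ‘cosmo_omega_matter’ metadata entries.

```python
import math


def age_gyr(z, h, omega_m):
    """Age of a flat LambdaCDM universe (no radiation) at redshift z, in Gyr."""
    omega_l = 1.0 - omega_m
    hubble_time = 9.7779 / h  # 1/H0 in Gyr for H0 = 100 h km/s/Mpc
    x = math.sqrt(omega_l / omega_m) * (1.0 + z) ** -1.5
    return hubble_time * 2.0 / (3.0 * math.sqrt(omega_l)) * math.asinh(x)


def redshift_at_age(t_gyr, h, omega_m):
    """Invert age_gyr: redshift at which the universe has age t_gyr."""
    omega_l = 1.0 - omega_m
    hubble_time = 9.7779 / h
    x = math.sinh(1.5 * math.sqrt(omega_l) * t_gyr / hubble_time)
    return (math.sqrt(omega_m / omega_l) * x) ** (-2.0 / 3.0) - 1.0


def midpoint(z_a, z_b, h, omega_m):
    """Redshift and age halfway in cosmic time between two snapshots."""
    t_mid = 0.5 * (age_gyr(z_a, h, omega_m) + age_gyr(z_b, h, omega_m))
    return redshift_at_age(t_mid, h, omega_m), t_mid


# Example cosmology and snapshot redshifts, chosen only for illustration
z_mid, t_mid = midpoint(1.0, 2.0, h=0.7, omega_m=0.3)
print(z_mid, t_mid)
```

Because the midpoint is taken in cosmic time and not in redshift, z_mid generally differs from the arithmetic mean of the two snapshot redshifts.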
- dif_lowerz_snap(other)
Determine redshift, time, distance where snap ends and a later one begins.
If other is 0 it will treat this as the last (in time) snap and assume that it extends to z=0.
- dif_higherz_snap(other)
Determine redshift, time, distance where snap ends and an earlier one begins.
If other is ‘max’ it will treat this as the first (in time) snap and assume that it extends to the actual redshift of the snap.