API Reference¶

Session¶

class radical.analytics.Session(src, stype, sid=None, _entities=None, _init=True)[source]¶

__init__(src, stype, sid=None, _entities=None, _init=True)[source]¶

Create a radical.analytics session for analysis.

The session is created from a set of traces, which usually have been produced by a Session object in the RCT stack, such as radical.pilot or radical.entk. Profiles are accepted in two forms: in a directory, or in a tarball (of such a directory). In the latter case, the tarball are extracted into $TMP, and then handled just as the directory case.

If no sid (session ID) is specified, that ID is derived from the directory name.

concurrency(state=None, event=None, time=None, sampling=None)[source]¶

This method accepts the same set of parameters as the ranges() method, and will use the ranges() method to obtain a set of ranges. It will return a time series, counting the number of tasks which are concurrently matching the ranges filter at any point in time.

The additional parameter sampling determines the exact points in time for which the concurrency is computed, and thus determines the sampling rate for the returned time series. If not specified, the time series will contain all points at which the concurrency changed. If specified, it is interpreted as second (float) interval at which, after the starting point (begin of first event matching the filters) the concurrency is computed.

Returned is an ordered list of tuples:

[ [time_0, concurrency_0],
  [time_1, concurrency_1],
  ...
  [time_n, concurrency_n] ]

where time_n is represented as float, and concurrency_n as int.

Example:

session.filter(etype='task').concurrency(state=[rp.AGENT_EXECUTING,
    rp.AGENT_STAGING_OUTPUT_PENDING])

consistency(mode=None)[source]¶

Performs a number of data consistency checks, and returns a set of UIDs for entities which have been found to be inconsistent. The method accepts a single parameter mode which can be a list of strings defining what consistency checks are to be performed. Valid strings are:

state_model: check if all entity states are in adherence to the respective entity state model
event_model: check if all entity events are in adherence to the respective entity event model
timestamps: check if events and states are recorded with correct ordering in time.

If not specified, the method will execute all three checks.

After this method has been run, each checked entity will have more detailed consistency information available via:

entity.consistency['state_model'] (bool)
entity.consistency['event_model'] (bool)
entity.consistency['timestamps' ] (bool)
entity.consistency['log' ]        (list of strings)

The boolean values each indicate consistency of the respective test, the log will contain human readable information about specific consistency violations.

duration(state=None, event=None, time=None, ranges=None)[source]¶

This method accepts the same set of parameters as the ranges() method, and will use the ranges() method to obtain a set of ranges. It will return the sum of the durations for all resulting & collapsed ranges.

Example:

session.duration(state=[rp.NEW, rp.FINAL]))

where rp.FINAL is a list of final task states.

ranges(state=None, event=None, time=None, collapse=True)[source]¶

Gets a set of initial and final conditions, and computes time ranges in accordance to those conditions from all session entities. The resulting set of ranges is then collapsed to the minimal equivalent set of ranges covering the same set of times.

Please refer to the Entity.ranges documentation on detail on the constrain parameters.

Setting ‘collapse’ to ‘True’ (default) will prompt the method to collapse the resulting set of ranges.

rate(state=None, event=None, time=None, sampling=None, first=False)[source]¶

This method accepts the same parameters as the timestamps() method: it will count all matching events and state transitions as given, and will return a time series of the rate of how many of those events and/or transitions occurred per second.

The additional parameter sampling determines the exact points in time for which the rate is computed, and thus determines the sampling rate for the returned time series. If not specified, the time series will contain all points at which and event occurred, and the rate value will only be determined by the time passed between two consecutive events. If specified, it is interpreted as second (float) interval at which, after the starting point (begin of first event matching the filters) the rate is computed.

Returned is an ordered list of tuples:

[ [time_0, rate_0] ,
  [time_1, rate_1] ,
  ...
  [time_n, rate_n] ]

where time_n is represented as float, and rate_n as int.

The time parameter is expected to be a single tuple, or a list of tuples, each defining a pair of start and end time which are used to constrain the resulting time series.

The ‘first’ is defined, only the first matching event fir the selected entities is considered viable.

Example:

session.filter(etype='task').rate(state=[rp.AGENT_EXECUTING])

timestamps(state=None, event=None, time=None, first=False)[source]¶

This method accepts a set of conditions, and returns the list of timestamps for which those conditions applied, i.e., for which state transitions or events are known which match the given ‘state’ or ‘event’ parameter. If no match is found, an empty list is returned.

Both state and event can be lists, in which case the union of all timestamps are returned.

The time parameter is expected to be a single tuple, or a list of tuples, each defining a pair of start and end time which are used to constrain the matching timestamps.

If first is set to True, only the timestamps for the first matching events (per entity) are returned.

The returned list will be sorted.

tzero(t)[source]¶: Setting a tzero timestamp will shift all timestamps for all entities in this session by that amount. This simplifies the alignment of multiple sessions, or the focus on specific events.

usage(alloc_entity, alloc_events, block_entity, block_events, use_entity, use_events)[source]¶

This method creates a dict with three entries: alloc, block, use. Those three dict entries in turn have a a dict of entity IDs for all entities which have blocks in the respective category, and foreach of those entity IDs the dict values will be a list of rectangles.

A resource is considered:

alloc (allocated) when it is owned by the RCT application;
block (blocked) when it is reserveed for a specific task;
use (used) when it is utilized by that task.

Each of the rectangles represents a continuous block of resources which is alloced/blocked/used:

x_0 time when alloc/block/usage begins;
x_1 time when alloc/block/usage ends;
y_0 lowest index of a continuous block of resource IDs;
y_1 highest index of a continuous block of resource IDs.

Any specific entity (pilot, task) can have a set of such resource blocks, for example, a task might be placed over multiple, non-consecutive nodes:

gpu and cpu resources are rendered as separate blocks (rectangles).

Parameters:

alloc_entity (Entity) – Entity instance which allocates resources
alloc_events (list) – event tuples which specify allocation time
block_entity (Entity) – Entity instance which blocks resources
block_events (list) – event tuples which specify blocking time
use_entity (Entity) – Entity instance which uses resources
use_events (list) – event tuples which specify usage time

Example:

usage('pilot', [{ru.STATE: None, ru.EVENT: 'bootstrap_0_start'},
                {ru.STATE: None, ru.EVENT: 'bootstrap_0_stop' }],
      'task' , [{ru.STATE: None, ru.EVENT: 'schedule_ok'      },
                {ru.STATE: None, ru.EVENT: 'unschedule_stop'  }],
      'task' , [{ru.STATE: None, ru.EVENT: 'exec_start'       },
                {ru.STATE: None, ru.EVENT: 'exec_stop'        }])

Entity¶

class radical.analytics.Entity(_uid, _profile, _details)[source]¶

__init__(_uid, _profile, _details)[source]¶

Parameters:	uid (str) – an ID assumed to be unique in the scope of an RA Session profile – a list of profile events for this entity details – a dictionary of complementary information on this entity

duration(state=None, event=None, time=None, ranges=None)[source]¶: This method accepts a set of initial and final conditions, interprets them as documented in the ranges() method (which has the same signature), and then returns the difference between the resulting timestamps.

ranges(state=None, event=None, time=None, expand=False, collapse=True)[source]¶

This method accepts a set of initial and final conditions, in the form of range of state and or event specifiers:

entity.ranges(state=[['INITIAL_STATE_1', 'INITIAL_STATE_2'],
                      'FINAL_STATE_1',   'FINAL_STATE_2'  ]],
              event=[[ initial_event_1,   initial_event_2 ]
                     [ final_event_1,     final_event_2   ]],
              time =[[2.0, 2.5], [3.0, 3.5]])

More specifically, the state and event parameter are expected to be a tuple, where the first element defines the initial condition, and the second element defines the final condition. The time parameter is expected to be a single tuple, or a list of tuples, each defining a pair of start and end time which are used to constrain the resulting ranges. States are expected as strings, events as full event tuples:

[ru.TIME,  ru.NAME, ru.UID,  ru.STATE, ru.EVENT, ru.MSG,  ru.ENTITY]

where empty fields are not applied in the filtering - all other fields must match exactly. The events can also be specified as dictionaries, which then don’t need to have all fields set.

The method will:

determine the earliest timestamp when any of the given initial conditions have been met, which can be either an event or a state;

determine the next timestamp when any of the given final conditions have been met (when expand is set to False [default]) OR

determine the last timestamp when any of the given final conditions have been met (when expand is set to True)

From that final point in time the search for the next initial condition applies again, which may result in another time range to be found. The method returns the set of found ranges, as a list of [start, end] time tuples.

The resulting ranges are constrained by the time constraints, if such are given.

Note that with expand=True, at most one range will be found.

Setting ‘collapse’ to ‘True’ (default) will prompt the method to collapse the resulting set of ranges.

The returned ranges are time-sorted

Example:

task.ranges(state=[rp.NEW, rp.FINAL]))
task.ranges(event=[{ru.NAME : 'exec_start'},
                   {ru.NAME : 'exec_ok'}])

timestamps(state=None, event=None, time=None)[source]¶

This method accepts a set of conditions, and returns the list of timestamps for which those conditions applied, i.e., for which state transitions or events are known which match the given ‘state’ or ‘event’ parameter. If no match is found, an empty list is returned.

Both state and event can be lists, in which case the union of all timestamps are returned.

The time parameter is expected to be a single tuple, or a list of tuples, each defining a pair of start and end time which are used to constrain the matching timestamps.

The returned list will be sorted.

Experiment¶

class radical.analytics.Experiment(sources, stype)[source]¶

__init__(sources, stype)[source]¶

This class represents an RCT experiment, i.e., a series of RA sessions which are collectively analyzed.

sources is expected to be a list of tuples of session source paths pointing to tarballs or session directories. The order of tuples in the list determines the default order used in plots etc.

The session type stype will be uniformely applied to all sessions.

utilization(metrics, rtype='cpu', udurations=None)[source]¶

return five dictionaries:

provided resources
consumed resources
absolute stats
relative stats
information about resource utilization

The resource dictionaries have the following structures:

provided = {
    <session_id> : {
        'metric_1' : {
            'uid_1'        : [float, list],
            'uid_2'        : [float, list],
            ...
        },
        'metric_2' : {
            'uid_1'        : [float, list],
            'uid_2'        : [float, list],
            ...
        },
        ...
    },
    ...
}
consumed = {
    <session_id> : {
        'metric_1' : {
            'uid_1'         : [float, list]
            'uid_2'         : [float, list],
            ...
        },
        'metric_2' :         {
            'uid_1'         : [float, list],
            'uid_2'         : [float, list],
            ...
        },
        ...
    },
    ...
}

float is always in tasks of resource * time, (think core-hours), list is a list of 4-tuples [t0, t1, r0, r1] which signify at what specific time interval (t0 to t1) what specific resources (r0 to r1) have been used. The task of the resources are here dependent on the session type: only RP sessions are supported at the moment where those resource values are indexes in to the list of cores used in that specific session (offset over multiple pilots, if needed).

utils¶

radical.analytics.get_plotsize(width, fraction=1, subplots=(1, 1))[source]¶

Sets aesthetic figure dimensions to avoid scaling in latex.

Parameters:	width (float) – Width in points (pts). fraction (float) – Fraction of the width which you wish the figure to occupy. subplots (tuple) – Number of raws and number of columns of the plot.
Returns:	fig_dim – Dimensions of figure in inches.
Return type:	tuple

radical.analytics.get_mplstyle(name)[source]¶

Returns the installation path of a Matplotlib style.

Parameters:	name (string) – Filename ending in .txt.
Returns:	path – Normalized path.
Return type:	string

radical.analytics.stack_transitions(series, tresource, to_stack)[source]¶

Creates data frames for each metric and combines them into one data frame for alignment. Since transitions obviously happen at arbitrary times, the timestamps for metric A may see no transitions for metric B. When using a combined timeline, we end up with NaN entries for some metrics on most timestamp, which in turn leads to gaps when plotting. So we fill the NaN values with the previous valid value, which in our case holds until the next transition happens.

Parameters:	series (dict) – Pairs of timestamps for each metric of each type of resource. E.g. series[‘cpu’][‘term’] = [[0.0, 0.0], [302.4374113082886, 100.0], [304.6761999130249, 0.0]]. tresource (string) – Type of resource. E.g., ‘cpu’ or ‘gpu’. to_stack (list) – List of metrics to stack. E.g., [‘bootstrap’, ‘exec_cmd’, ‘schedule’, ‘exec_rp’, ‘term’, ‘idle’].
Returns:	stacked – Columns: time and one for each metric. Rows: timestamp and percentage / amount of resource utilization for each metric at that point in time.
Return type:	pandas.DataFrame

radical.analytics.get_pilot_series(session, pilot, tmap, resrc, percent=True)[source]¶

Derives the series of pilot resource transition points from the metrics.

Parameters:

session (ra.Session) – The Session object of RADICAL-Analytics created from a RCT sandbox.
pilot (ra.Entity) – The pilot object of session.
tmap (dict) – Map events to transition points in which a metric changes its owner. E.g., [{1: ‘bootstrap_0_start’}, ‘system’, ‘bootstrap’] defines bootstrap_0_start as the event in which resources pass from the system to the bootstrapper.
resrc (list) – Type of resources. E.g., [‘cpu’, ‘gpu’].
percent (bool) – Whether we want to return resource utilization as percentage of the total resources available or as count of a type of resource.

Returns:

p_resrc (dict) – Amount of resources in the pilot.
series (dict) – List of time series per metric and resource type. E.g., series[‘cpu’][‘term’] = [[0.0, 0.0], [302.4374113082886, 100.0], [304.6761999130249, 0.0]].
x (dict) – Mix and max value of the X-axes.

radical.analytics.get_plot_utilization(metrics, consumed, t_zero, sid)[source]¶

Calculates the resources utilized by a set of metrics. Utilization is calculated for each resource without stacking and aggregation. May take hours or days with >100K tasks, 100K resource items. Use get_pilot_series and stack_transitions instead.

Parameters:

metrics (list) – Each element is a list with name, metrics and color. E.g., [‘Bootstrap’, [‘boot’, ‘setup_1’], ‘#c6dbef’].
consumed (dict) – min-max timestamp and resource id range for each metric and pilot. E.g., {‘boot’: {‘pilot.0000’: [[2347.582849740982, 2365.6164498329163, 0, 167]}.
t_zero (float) – Start timestamp for the pilot.
sid (string) – Identifier of a ra.Session object.

Returns:

legend (dict) – keys: Type of resource (‘cpu’, ‘gpu’); values: list of matplotlib.lines.Line2D objects for the plot’s legend.
patches (dict) – keys: Type of resource (‘cpu’, ‘gpu’); values: list of matplotlib.patches.Rectangle. Each rectangle represents the utilization for a set of resources.
x (dict) – Mix and max value of the X-axes.
y (dict) – Mix and max value of the Y-axes.

radical.analytics.get_pilots_zeros(ra_exp_obj)[source]¶

Calculates when a set of pilots become available.

Parameters:	ra_exp_obj (ra.Experiment) – RADICAL-Analytics Experiment object with all the pilot entity objects for which to calculate the starting timestamp.
Returns:	p_zeros – Session ID, pilot ID and starting timestamp. E.g., {‘re.session.login1.lei.018775.0005’: {‘pilot.0000’: 2347.582849740982}}.
Return type:	dict

radical.analytics.to_latex(data)[source]¶

Transforms the input string(s) so that it can be used as latex compiled plot label, title etc. Escapes special characters with a slash.

Parameters:	data (list or str) – An individual string or a list of strings to transform.
Returns:	data – Transformed data.
Return type:	list of str

radical.analytics.tabulate_durations(durations)[source]¶

Takes a dict of durations as defined in rp.utils (e.g., rp.utils.PILOT_DURATIONS_DEBUG) and returns a list of durations with their start and stop timestamps. That list can be directly converted to a panda.df.

Parameters:	durations (dict) – Dict of lists of dicts/lists of dicts. It contains details about states and events.
Returns:	data – List of dicts, each dict containing ‘Duration Name’, ‘Start Timestamp’ and ‘Stop Timestamp’.
Return type:	list