MergeConsecutiveOp

class MergeConsecutiveOp(parameters)[source]

Merge consecutive rows of a columnar file with same column value.

Required remodeling parameters:
  • column_name (str): name of column whose consecutive values are to be compared (the merge column).

  • event_code (str or int or float): the particular value in the match column to be merged.

  • set_durations (bool): If true, set the duration of the merged event to the extent of the merged events.

  • ignore_missing (bool): If true, missing match_columns are ignored.

Optional remodeling parameters:
  • match_columns (list): A list of columns whose values have to be matched for two events to be the same.

Notes

This operation is meant for time-based tabular files that have an onset column.

Methods

MergeConsecutiveOp.__init__(parameters)

Constructor for the merge consecutive operation.

MergeConsecutiveOp.do_op(dispatcher, df, name)

Merge consecutive rows with the same column value.

MergeConsecutiveOp.validate_input_data(...)

Verify that the column name is not in match columns.

Attributes

MergeConsecutiveOp.NAME

MergeConsecutiveOp.PARAMS

MergeConsecutiveOp.__init__(parameters)[source]

Constructor for the merge consecutive operation.

Parameters:

parameters (dict) – Actual values of the parameters for the operation.

MergeConsecutiveOp.do_op(dispatcher, df, name, sidecar=None)[source]

Merge consecutive rows with the same column value.

Parameters:
  • dispatcher (Dispatcher) – Manages the operation I/O.

  • df (DataFrame) – The DataFrame to be remodeled.

  • name (str) – Unique identifier for the dataframe – often the original file path.

  • sidecar (Sidecar or file-like) – Not needed for this operation.

Returns:

A new dataframe after processing.

Return type:

Dataframe

Raises:

ValueError

  • If dataframe does not have the anchor column and ignore_missing is False.

  • If a match column is missing and ignore_missing is False.

  • If the durations were to be set and the dataframe did not have an onset column.

  • If the durations were to be set and the dataframe did not have a duration column.

static MergeConsecutiveOp.validate_input_data(parameters)[source]

Verify that the column name is not in match columns.

Parameters:

parameters (dict) – Dictionary of parameters of actual implementation.

MergeConsecutiveOp.NAME = 'merge_consecutive'
MergeConsecutiveOp.PARAMS = {'additionalProperties': False, 'properties': {'column_name': {'description': 'The name of the column to check for repeated consecutive codes.', 'type': 'string'}, 'event_code': {'description': 'The event code to match for duplicates.', 'type': ['string', 'number']}, 'ignore_missing': {'description': 'If true, missing match columns are ignored.', 'type': 'boolean'}, 'match_columns': {'description': 'List of columns whose values must also match to be considered a repeat.', 'items': {'type': 'string'}, 'type': 'array'}, 'set_durations': {'description': 'If true, then the duration should be computed based on start of first to end of last.', 'type': 'boolean'}}, 'required': ['column_name', 'event_code', 'set_durations', 'ignore_missing'], 'type': 'object'}