SummarizeColumnValuesOp

class SummarizeColumnValuesOp(parameters)

Summarize the values in the columns of a columnar file.

Required remodeling parameters:
  • summary_name (str): The name of the summary.

  • summary_filename (str): Base filename of the summary.

Optional remodeling parameters:
  • append_timecode (bool): If True, append a timecode to the summary filename so that each run has a unique name (default: False).

  • max_categorical (int): Maximum number of unique values to include in summary for a categorical column.

  • skip_columns (list): Names of columns to skip in the summary.

  • value_columns (list): Names of columns to treat as value columns rather than categorical columns.

  • values_per_line (int): The number of values output per line in the summary.

This operation produces a summary of the values found in each column of a tabular file.
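As an illustration of how the parameters listed above fit together, the following sketch builds one operation entry as a Python dictionary and prints it as JSON. The parameter names come from the lists above. The surrounding `operation` / `description` / `parameters` wrapper is an assumption about the remodeling file layout, and the summary names and column names are hypothetical.

```python
import json

# One entry of a remodeling operation list.
# ASSUMPTION: the "operation"/"description"/"parameters" wrapper reflects the
# remodeling file layout; only the parameter names are taken from this page.
summarize_op = {
    "operation": "summarize_column_values",
    "description": "Summarize the column values in the event files.",
    "parameters": {
        # Required parameters
        "summary_name": "example_column_values",      # hypothetical name
        "summary_filename": "example_column_values",  # hypothetical base filename
        # Optional parameters
        "append_timecode": False,
        "max_categorical": 50,
        "skip_columns": ["onset", "duration"],        # hypothetical column names
        "value_columns": ["response_time"],           # hypothetical column name
        "values_per_line": 5,
    },
}

# A remodeling file holds a list of such entries.
print(json.dumps([summarize_op], indent=4))
```

Both list-valued parameters are given at least one entry here because the schema in `PARAMS` sets `minItems` to 1 for them.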

Methods

SummarizeColumnValuesOp.__init__(parameters)

Constructor for the summarize column values operation.

SummarizeColumnValuesOp.do_op(dispatcher, ...)

Create a summary of the column values in df.

SummarizeColumnValuesOp.validate_input_data(...)

Perform additional validation of the operation parameters that the JSON schema validator does not cover.

Attributes

SummarizeColumnValuesOp.MAX_CATEGORICAL

SummarizeColumnValuesOp.NAME

SummarizeColumnValuesOp.PARAMS

SummarizeColumnValuesOp.SUMMARY_TYPE

SummarizeColumnValuesOp.VALUES_PER_LINE

SummarizeColumnValuesOp.__init__(parameters)

Constructor for the summarize column values operation.

Parameters:

parameters (dict) – Dictionary with the parameter values for required and optional parameters.

SummarizeColumnValuesOp.do_op(dispatcher, df, name, sidecar=None)

Create a summary of the column values in df.

Parameters:
  • dispatcher (Dispatcher) – Manages the operation I/O.

  • df (DataFrame) – The DataFrame to be remodeled.

  • name (str) – Unique identifier for the dataframe – often the original file path.

  • sidecar (Sidecar or file-like) – Not needed for this operation.

Returns:

A copy of df.

Return type:

DataFrame

Side effect:

Updates the relevant summary.
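To make the distinction between categorical columns, value columns, and skipped columns concrete, here is a minimal standalone sketch of the kind of counting such a summary involves. It is not the library's implementation: `summarize_columns` and `format_counts` are hypothetical helpers, the rows are made-up data, and the default arguments simply mirror the `MAX_CATEGORICAL` and `VALUES_PER_LINE` class constants.

```python
from collections import Counter


def summarize_columns(rows, skip_columns=(), value_columns=()):
    """Count column values in a list of row dicts (hypothetical helper)."""
    categorical = {}   # column name -> Counter of each unique value
    value_counts = {}  # column name -> number of entries seen
    for row in rows:
        for column, value in row.items():
            if column in skip_columns:
                continue  # skipped columns never reach the summary
            if column in value_columns:
                # Value columns are only tallied, not broken out by value.
                value_counts[column] = value_counts.get(column, 0) + 1
            else:
                categorical.setdefault(column, Counter())[value] += 1
    return categorical, value_counts


def format_counts(counts, max_categorical=50, values_per_line=5):
    """Render at most max_categorical unique values, values_per_line per line."""
    items = [f"{value} ({count})" for value, count in sorted(counts.items())]
    items = items[:max_categorical]
    lines = [" ".join(items[i:i + values_per_line])
             for i in range(0, len(items), values_per_line)]
    return "\n".join(lines)


# Made-up rows standing in for the contents of df.
rows = [
    {"onset": "0.5", "trial_type": "go", "response_time": "0.43"},
    {"onset": "2.0", "trial_type": "stop", "response_time": "n/a"},
    {"onset": "3.5", "trial_type": "go", "response_time": "0.51"},
]
categorical, value_counts = summarize_columns(
    rows, skip_columns=["onset"], value_columns=["response_time"])
print(format_counts(categorical["trial_type"], values_per_line=2))
```

In this sketch the input rows are left unmodified and the counts are accumulated separately, which parallels the description above: the operation returns a copy of df and updates the summary as a side effect.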

static SummarizeColumnValuesOp.validate_input_data(parameters)

Perform additional validation of the operation parameters that the JSON schema validator does not cover.

SummarizeColumnValuesOp.MAX_CATEGORICAL = 50
SummarizeColumnValuesOp.NAME = 'summarize_column_values'
SummarizeColumnValuesOp.PARAMS = {
    'additionalProperties': False,
    'properties': {
        'append_timecode': {
            'description': 'If true, the timecode is appended to the base filename so each run has a unique name.',
            'type': 'boolean'},
        'max_categorical': {
            'description': 'Maximum number of unique column values to show in text description.',
            'type': 'integer'},
        'skip_columns': {
            'description': 'List of columns to skip when creating the summary.',
            'items': {'type': 'string'},
            'minItems': 1,
            'type': 'array',
            'uniqueItems': True},
        'summary_filename': {
            'description': 'Name to use for the summary file name base.',
            'type': 'string'},
        'summary_name': {
            'description': 'Name to use for the summary in titles.',
            'type': 'string'},
        'value_columns': {
            'description': 'Columns to be annotated with a single HED annotation and placeholder.',
            'items': {'type': 'string'},
            'minItems': 1,
            'type': 'array',
            'uniqueItems': True},
        'values_per_line': {
            'description': 'Number of items per line to display in the text file.',
            'type': 'integer'}},
    'required': ['summary_name', 'summary_filename'],
    'type': 'object'}
SummarizeColumnValuesOp.SUMMARY_TYPE = 'column_values'
SummarizeColumnValuesOp.VALUES_PER_LINE = 5
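
The `PARAMS` schema above encodes three rules that are easy to check by hand: two parameters are required, no parameters outside `properties` are allowed, and each parameter has a declared type. The sketch below illustrates those rules with a hand-rolled check. It is not how the remodeling tools validate parameters (a JSON Schema validator would normally do this), `check_parameters` is a hypothetical helper, and `SCHEMA` is an abbreviated copy of `PARAMS` that keeps only the types.

```python
# Abbreviated copy of PARAMS: descriptions and item constraints are omitted.
SCHEMA = {
    "properties": {
        "append_timecode": {"type": "boolean"},
        "max_categorical": {"type": "integer"},
        "skip_columns": {"type": "array"},
        "summary_filename": {"type": "string"},
        "summary_name": {"type": "string"},
        "value_columns": {"type": "array"},
        "values_per_line": {"type": "integer"},
    },
    "required": ["summary_name", "summary_filename"],
    "additionalProperties": False,
}

# JSON Schema type names mapped to the Python types used in this sketch.
TYPE_MAP = {"string": str, "boolean": bool, "integer": int, "array": list}


def check_parameters(parameters, schema):
    """Return a list of problems found in parameters (hypothetical helper)."""
    errors = []
    for key in schema["required"]:
        if key not in parameters:
            errors.append(f"missing required parameter: {key}")
    for key, value in parameters.items():
        spec = schema["properties"].get(key)
        if spec is None:
            if not schema.get("additionalProperties", True):
                errors.append(f"unexpected parameter: {key}")
            continue
        expected = TYPE_MAP[spec["type"]]
        # bool is a subclass of int in Python, so reject it for integers.
        wrong_bool = expected is int and isinstance(value, bool)
        if wrong_bool or not isinstance(value, expected):
            errors.append(f"{key} should be of type {spec['type']}")
    return errors


print(check_parameters({"summary_name": "demo", "values_per_line": "5"}, SCHEMA))
```

The example call reports both a missing `summary_filename` and a `values_per_line` given as a string rather than an integer.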