SummarizeColumnValuesOp

class SummarizeColumnValuesOp(parameters)

Summarize the values in the columns of a tabular file.

Required remodeling parameters:
  • summary_name (str): The name of the summary.

  • summary_filename (str): Base filename of the summary.

  • skip_columns (list): Names of columns to skip in the summary.

  • value_columns (list): Names of columns to treat as value columns rather than categorical columns.

Optional remodeling parameters:
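  • append_timecode (bool): If True, append a timecode to the base filename when the summary is saved.

  • max_categorical (int): Maximum number of unique values to include in the summary for a categorical column.

  • values_per_line (int): Number of values to display per line in the text summary.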

The purpose is to produce a summary of the values in a tabular file.
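As an illustration, the parameters listed above might be assembled as a Python dictionary and wrapped in a remodeling-file entry. This is only a sketch: the column names (`onset`, `duration`, `response_time`), the summary names, and the `description` key of the surrounding entry are assumptions made for the example, not values taken from this page.

```python
import json

# Hypothetical parameter values; the keys mirror the required and
# optional remodeling parameters documented for this operation.
parameters = {
    "summary_name": "column_values_summary",
    "summary_filename": "column_values_summary",
    "skip_columns": ["onset", "duration"],   # assumed column names
    "value_columns": ["response_time"],      # assumed column name
    "max_categorical": 20,                   # optional parameter
}

# Assumed layout of a single entry in a JSON remodeling file.
operation_entry = {
    "operation": "summarize_column_values",
    "description": "Summarize the column values of the event files.",
    "parameters": parameters,
}

# A remodeling file is assumed here to be a JSON list of such entries.
remodel_json = json.dumps([operation_entry], indent=4)
print(remodel_json)
```

Only the `parameters` dictionary would be passed to the constructor; the outer entry is shown to indicate where the operation name `summarize_column_values` is likely to appear.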

Methods

hed.tools.remodeling.operations.summarize_column_values_op.SummarizeColumnValuesOp.__init__(...)

Constructor for the summarize column values operation.

hed.tools.remodeling.operations.summarize_column_values_op.SummarizeColumnValuesOp.check_parameters(...)

Verify that the parameters meet the operation specification.

hed.tools.remodeling.operations.summarize_column_values_op.SummarizeColumnValuesOp.do_op(...)

Create a summary of the column values in df.

Attributes

hed.tools.remodeling.operations.summarize_column_values_op.SummarizeColumnValuesOp.MAX_CATEGORICAL

hed.tools.remodeling.operations.summarize_column_values_op.SummarizeColumnValuesOp.PARAMS

hed.tools.remodeling.operations.summarize_column_values_op.SummarizeColumnValuesOp.SUMMARY_TYPE

hed.tools.remodeling.operations.summarize_column_values_op.SummarizeColumnValuesOp.VALUES_PER_LINE

SummarizeColumnValuesOp.__init__(parameters)

Constructor for the summarize column values operation.

Parameters:

parameters (dict) – Dictionary with the parameter values for required and optional parameters.

Raises:
  • KeyError

    • If a required parameter is missing.

    • If an unexpected parameter is provided.

  • TypeError

    • If a parameter has the wrong type.

SummarizeColumnValuesOp.check_parameters(parameters)

Verify that the parameters meet the operation specification.

Parameters:

parameters (dict) – Dictionary of parameters for this operation.

Raises:
  • KeyError

    • If a required parameter is missing.

    • If an unexpected parameter is provided.

  • TypeError

    • If a parameter has the wrong type.
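The checks described under Raises can be sketched in plain Python against a specification shaped like the PARAMS attribute. The function name `check_parameters_sketch` and the error messages are hypothetical; this is not the library's implementation, only one way the documented behavior could be reproduced.

```python
# Hypothetical specification in the same shape as the PARAMS attribute.
SPEC = {
    "operation": "summarize_column_values",
    "required_parameters": {
        "summary_name": str,
        "summary_filename": str,
        "skip_columns": list,
        "value_columns": list,
    },
    "optional_parameters": {
        "append_timecode": bool,
        "max_categorical": int,
        "values_per_line": int,
    },
}


def check_parameters_sketch(spec, parameters):
    """Raise KeyError or TypeError if parameters do not match spec."""
    required = spec["required_parameters"]
    optional = spec["optional_parameters"]

    # A required parameter is missing.
    missing = [key for key in required if key not in parameters]
    if missing:
        raise KeyError(f"Missing required parameters: {missing}")

    # An unexpected parameter is provided.
    allowed = {**required, **optional}
    unexpected = [key for key in parameters if key not in allowed]
    if unexpected:
        raise KeyError(f"Unexpected parameters: {unexpected}")

    # A parameter has the wrong type.
    for key, value in parameters.items():
        expected = allowed[key]
        if not isinstance(value, expected):
            raise TypeError(
                f"{key} should be {expected.__name__}, got {type(value).__name__}"
            )


good = {
    "summary_name": "s",
    "summary_filename": "s",
    "skip_columns": [],
    "value_columns": [],
}
check_parameters_sketch(SPEC, good)  # passes silently
```

Passing a string where a list is expected, for example `skip_columns="onset"`, would raise TypeError in this sketch, and omitting `summary_name` would raise KeyError.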

SummarizeColumnValuesOp.do_op(dispatcher, df, name, sidecar=None)

Create a summary of the column values in df.

Parameters:
  • dispatcher (Dispatcher) – Manages the operation I/O.

  • df (DataFrame) – The DataFrame to be summarized.

  • name (str) – Unique identifier for the dataframe – often the original file path.

  • sidecar (Sidecar or file-like) – Not needed for this operation.

Returns:

A copy of df.

Return type:

DataFrame

Side effect:

Updates the relevant summary.
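To make the distinction between skipped, value, and categorical columns concrete, the following dependency-free sketch tallies a small table. It uses a list of row dictionaries instead of a DataFrame, and the helper `summarize_rows`, the sample rows, and the choice to record only an entry count for value columns are all assumptions for illustration; the actual summary produced by the operation may differ.

```python
from collections import Counter


def summarize_rows(rows, skip_columns, value_columns):
    """Return (categorical_counts, value_counts) for a list of row dicts."""
    categorical = {}  # column name -> Counter of unique values
    values = {}       # column name -> number of entries seen
    for row in rows:
        for column, value in row.items():
            if column in skip_columns:
                continue  # skipped columns are not summarized at all
            if column in value_columns:
                # Value columns: count entries, not unique values.
                values[column] = values.get(column, 0) + 1
            else:
                # Everything else is treated as categorical.
                categorical.setdefault(column, Counter())[value] += 1
    return categorical, values


# Hypothetical event rows.
rows = [
    {"onset": "0.5", "trial_type": "go", "response_time": "0.41"},
    {"onset": "1.5", "trial_type": "stop", "response_time": "0.52"},
    {"onset": "2.5", "trial_type": "go", "response_time": "0.39"},
]

categorical, values = summarize_rows(
    rows, skip_columns=["onset"], value_columns=["response_time"]
)
print(categorical)
print(values)
```

In this sketch the input rows are left unmodified, which mirrors the documented behavior that the operation returns a copy of df and only updates the summary as a side effect.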

SummarizeColumnValuesOp.MAX_CATEGORICAL = 50
SummarizeColumnValuesOp.PARAMS = {
    'operation': 'summarize_column_values',
    'optional_parameters': {
        'append_timecode': <class 'bool'>,
        'max_categorical': <class 'int'>,
        'values_per_line': <class 'int'>
    },
    'required_parameters': {
        'skip_columns': <class 'list'>,
        'summary_filename': <class 'str'>,
        'summary_name': <class 'str'>,
        'value_columns': <class 'list'>
    }
}
SummarizeColumnValuesOp.SUMMARY_TYPE = 'column_values'
SummarizeColumnValuesOp.VALUES_PER_LINE = 5