SummarizeColumnValuesOp
- class SummarizeColumnValuesOp(parameters)
Summarize the values in the columns of a columnar file.
- Required remodeling parameters:
summary_name (str): The name of the summary.
summary_filename (str): Base filename of the summary.
- Optional remodeling parameters:
append_timecode (bool): If True, append a timecode to the summary filename so that each run has a unique name (default False).
max_categorical (int): Maximum number of unique values to include in summary for a categorical column.
skip_columns (list): Names of columns to skip in the summary.
value_columns (list): Names of columns to treat as value columns rather than categorical columns.
values_per_line (int): The number of values output per line in the summary.
The purpose is to produce a summary of the values in a tabular file.
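As an illustration of how these parameters fit together, the following Python sketch builds one operation entry and serializes it to JSON. The wrapper keys "operation" and "description" and the list-of-operations layout are assumptions about the remodeling file format, and the column names (onset, duration, response_time) are hypothetical.

```python
import json

# Hypothetical column names; substitute the columns of your own tabular files.
operation = {
    "operation": "summarize_column_values",  # matches SummarizeColumnValuesOp.NAME
    "description": "Summarize the column values in the events files.",
    "parameters": {
        # Required parameters
        "summary_name": "column_values_summary",
        "summary_filename": "column_values_summary",
        # Optional parameters
        "append_timecode": False,
        "max_categorical": 20,
        "skip_columns": ["onset", "duration"],
        "value_columns": ["response_time"],
        "values_per_line": 5,
    },
}

# Assumption: a remodeling file is a JSON list of such operation entries.
remodel_json = json.dumps([operation], indent=4)
print(remodel_json)
```

Only summary_name and summary_filename are required; the remaining keys can be dropped if the defaults are acceptable.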
Methods
__init__(parameters) | Constructor for the summarize column values operation.
do_op(dispatcher, df, name[, sidecar]) | Create a summary of the column values in df.
validate_input_data(parameters) | Perform additional validation of the operation parameters that the JSON schema validator does not cover.
Attributes
- SummarizeColumnValuesOp.__init__(parameters)
Constructor for the summarize column values operation.
- Parameters:
parameters (dict) – Dictionary with the parameter values for required and optional parameters.
- SummarizeColumnValuesOp.do_op(dispatcher, df, name, sidecar=None)
Create a summary of the column values in df.
- Parameters:
dispatcher (Dispatcher) – Manages the operation I/O.
df (DataFrame) – The DataFrame to be remodeled.
name (str) – Unique identifier for the dataframe – often the original file path.
sidecar (Sidecar or file-like) – Not needed for this operation.
- Returns:
A copy of df.
- Return type:
DataFrame
- Side effect:
Updates the relevant summary.
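To make the distinction between skipped, value, and categorical columns concrete, here is a simplified stand-in written with the standard library only. It is not the library's implementation: the summarize function, the sample rows, and the choice to record only a count for value columns are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical rows standing in for a small tabular file.
rows = [
    {"onset": 0.5, "trial_type": "go", "response_time": 0.431},
    {"onset": 2.1, "trial_type": "stop", "response_time": 0.612},
    {"onset": 3.8, "trial_type": "go", "response_time": 0.398},
]
skip_columns = ["onset"]
value_columns = ["response_time"]


def summarize(rows, skip_columns, value_columns):
    """Count unique values for categorical columns and totals for value columns."""
    categorical = {}          # column name -> Counter of unique values
    value_counts = Counter()  # column name -> number of entries seen
    for row in rows:
        for column, value in row.items():
            if column in skip_columns:
                continue  # skipped columns do not appear in the summary
            if column in value_columns:
                value_counts[column] += 1  # only the count is kept
            else:
                categorical.setdefault(column, Counter())[value] += 1
    return categorical, dict(value_counts)


categorical, values = summarize(rows, skip_columns, value_columns)
print(categorical)
print(values)
```

Treating a numeric column such as response_time as a value column keeps the summary from listing every distinct number as if it were a category.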
- static SummarizeColumnValuesOp.validate_input_data(parameters)
Perform additional validation of the operation parameters that the JSON schema validator does not cover.
- SummarizeColumnValuesOp.MAX_CATEGORICAL = 50
- SummarizeColumnValuesOp.NAME = 'summarize_column_values'
- SummarizeColumnValuesOp.PARAMS = {
    'additionalProperties': False,
    'properties': {
        'append_timecode': {
            'description': 'If true, the timecode is appended to the base filename so each run has a unique name.',
            'type': 'boolean'},
        'max_categorical': {
            'description': 'Maximum number of unique column values to show in text description.',
            'type': 'integer'},
        'skip_columns': {
            'description': 'List of columns to skip when creating the summary.',
            'items': {'type': 'string'},
            'minItems': 1,
            'type': 'array',
            'uniqueItems': True},
        'summary_filename': {
            'description': 'Name to use for the summary file name base.',
            'type': 'string'},
        'summary_name': {
            'description': 'Name to use for the summary in titles.',
            'type': 'string'},
        'value_columns': {
            'description': 'Columns to be annotated with a single HED annotation and placeholder.',
            'items': {'type': 'string'},
            'minItems': 1,
            'type': 'array',
            'uniqueItems': True},
        'values_per_line': {
            'description': 'Number of items per line to display in the text file.',
            'type': 'integer'}},
    'required': ['summary_name', 'summary_filename'],
    'type': 'object'}
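The 'required' and 'additionalProperties' entries of PARAMS mean that both summary parameters must be present and that unknown keys are rejected. The sketch below mimics just those two rules with plain Python. The helper check_parameters is hypothetical and is not part of the library, which relies on a JSON schema validator instead.

```python
# Key names copied from the PARAMS schema above.
REQUIRED = ["summary_name", "summary_filename"]
ALLOWED = {
    "append_timecode", "max_categorical", "skip_columns",
    "summary_filename", "summary_name", "value_columns", "values_per_line",
}


def check_parameters(parameters):
    """Return a list of problems found; an empty list means the keys look valid.

    Hypothetical helper: it checks key names only, not value types.
    """
    problems = []
    for key in REQUIRED:
        if key not in parameters:
            problems.append(f"missing required parameter: {key}")
    for key in sorted(set(parameters) - ALLOWED):
        problems.append(f"unexpected parameter: {key}")
    return problems


print(check_parameters({"summary_name": "cols", "summary_filename": "cols"}))
print(check_parameters({"summary_name": "cols", "colour": "red"}))
```

A full check would also enforce the types, 'minItems', and 'uniqueItems' constraints, which is what a JSON schema validator provides.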
- SummarizeColumnValuesOp.SUMMARY_TYPE = 'column_values'
- SummarizeColumnValuesOp.VALUES_PER_LINE = 5