SummarizeColumnValuesOp
- class SummarizeColumnValuesOp(parameters)
Summarize the values in the columns of a tabular file.
- Required remodeling parameters:
summary_name (str): The name of the summary.
summary_filename (str): Base filename of the summary.
skip_columns (list): Names of columns to skip in the summary.
value_columns (list): Names of columns to treat as value columns rather than categorical columns.
- Optional remodeling parameters:
append_timecode (bool): If True, append a timecode to the summary filename.
max_categorical (int): Maximum number of unique values to include in the summary for a categorical column.
values_per_line (int): Number of values output per line in the summary.
The purpose is to produce a summary of the values in a tabular file.
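As an illustration of the parameter list above, a parameter dictionary for this operation might look like the following sketch. The key names and types come from the PARAMS attribute; the values, and the wrapping of the parameters in an "operation"/"parameters" entry, are assumptions made for the example.

```python
import json

# Hypothetical parameter values; only the key names and types come from PARAMS.
parameters = {
    "summary_name": "column_values_overview",
    "summary_filename": "column_values_overview",
    "skip_columns": ["onset", "duration"],
    "value_columns": ["response_time"],
    "max_categorical": 20,  # optional
}

# Assumed layout of a remodeling entry: operation name plus its parameters.
operation = {"operation": "summarize_column_values", "parameters": parameters}

print(json.dumps(operation, indent=4))
```

Columns named in skip_columns are left out of the summary, and columns named in value_columns are treated as value columns rather than categorical ones.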
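Methods
__init__(parameters) | Constructor for the summarize column values operation.
check_parameters(parameters) | Verify that the parameters meet the operation specification.
do_op(dispatcher, df, name[, sidecar]) | Create a summary of the column values in df.
Attributes
MAX_CATEGORICAL
PARAMS
SUMMARY_TYPE
VALUES_PER_LINE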
- SummarizeColumnValuesOp.__init__(parameters)
Constructor for the summarize column values operation.
- Parameters:
parameters (dict) – Dictionary with the parameter values for required and optional parameters.
- Raises:
KeyError – If a required parameter is missing or an unexpected parameter is provided.
TypeError – If a parameter has the wrong type.
- SummarizeColumnValuesOp.check_parameters(parameters)
Verify that the parameters meet the operation specification.
- Parameters:
parameters (dict) – Dictionary of parameters for this operation.
- Raises:
KeyError – If a required parameter is missing or an unexpected parameter is provided.
TypeError – If a parameter has the wrong type.
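The checks described above can be sketched as a standalone function. This is not the library's implementation: check_parameters_sketch is a hypothetical name, and the order of the checks and the error messages are assumptions. Only the specification dictionary mirrors the PARAMS attribute.

```python
# Specification copied from the PARAMS attribute of the class.
PARAMS = {
    "operation": "summarize_column_values",
    "required_parameters": {
        "summary_name": str,
        "summary_filename": str,
        "skip_columns": list,
        "value_columns": list,
    },
    "optional_parameters": {
        "append_timecode": bool,
        "max_categorical": int,
        "values_per_line": int,
    },
}


def check_parameters_sketch(parameters, spec=PARAMS):
    """Hypothetical validator: raise KeyError or TypeError on a bad parameter."""
    required = spec["required_parameters"]
    optional = spec["optional_parameters"]

    # Every required parameter must be present.
    for key in required:
        if key not in parameters:
            raise KeyError(f"Missing required parameter: {key}")

    for key, value in parameters.items():
        # Parameters outside the specification are rejected.
        if key not in required and key not in optional:
            raise KeyError(f"Unexpected parameter: {key}")
        # Each value must have the type listed in the specification.
        expected = required[key] if key in required else optional[key]
        if not isinstance(value, expected):
            raise TypeError(f"Parameter {key} should be of type {expected.__name__}")


valid = {
    "summary_name": "overview",
    "summary_filename": "overview",
    "skip_columns": ["onset"],
    "value_columns": [],
}
check_parameters_sketch(valid)  # passes silently
```

Passing a dictionary that lacks summary_name, or one with an extra key, raises KeyError in this sketch; passing skip_columns as a string raises TypeError.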
- SummarizeColumnValuesOp.do_op(dispatcher, df, name, sidecar=None)
Create a summary of the column values in df.
- Parameters:
dispatcher (Dispatcher) – Manages the operation I/O.
df (DataFrame) – The DataFrame to be remodeled.
name (str) – Unique identifier for the dataframe – often the original file path.
sidecar (Sidecar or file-like) – Not needed for this operation.
- Returns:
A copy of df.
- Return type:
DataFrame
- Side effect:
Updates the relevant summary.
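To make the distinction between categorical columns, value columns, and skipped columns concrete, here is a minimal standard-library sketch of a column-value summary. It does not use the Dispatcher or a DataFrame and is not how do_op is implemented; summarize_rows is a hypothetical helper, and keeping the most frequent values when applying max_categorical is an assumption.

```python
from collections import Counter

MAX_CATEGORICAL = 50  # Same value as the class attribute; its use here is an assumption.


def summarize_rows(rows, skip_columns, value_columns, max_categorical=MAX_CATEGORICAL):
    """Count column values for a list of row dictionaries (hypothetical helper)."""
    categorical = {}
    value_counts = {}
    for row in rows:
        for column, value in row.items():
            if column in skip_columns:
                continue  # Skipped columns never appear in the summary.
            if column in value_columns:
                # Value columns: record only how many entries were seen.
                value_counts[column] = value_counts.get(column, 0) + 1
            else:
                # Categorical columns: count each distinct value.
                categorical.setdefault(column, Counter())[value] += 1
    # Keep at most max_categorical distinct values per categorical column.
    trimmed = {
        column: dict(counts.most_common(max_categorical))
        for column, counts in categorical.items()
    }
    return {"categorical": trimmed, "value": value_counts}


rows = [
    {"onset": 0.0, "trial_type": "go", "response_time": 0.41},
    {"onset": 1.5, "trial_type": "stop", "response_time": 0.52},
    {"onset": 3.0, "trial_type": "go", "response_time": 0.39},
]
summary = summarize_rows(rows, skip_columns=["onset"], value_columns=["response_time"])
print(summary)
```

In this sketch, trial_type is summarized by its distinct values, response_time only by its number of entries, and onset is omitted.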
- SummarizeColumnValuesOp.MAX_CATEGORICAL = 50
- SummarizeColumnValuesOp.PARAMS = {'operation': 'summarize_column_values', 'optional_parameters': {'append_timecode': <class 'bool'>, 'max_categorical': <class 'int'>, 'values_per_line': <class 'int'>}, 'required_parameters': {'skip_columns': <class 'list'>, 'summary_filename': <class 'str'>, 'summary_name': <class 'str'>, 'value_columns': <class 'list'>}}
- SummarizeColumnValuesOp.SUMMARY_TYPE = 'column_values'
- SummarizeColumnValuesOp.VALUES_PER_LINE = 5