ColumnValueSummary

class ColumnValueSummary(sum_op): Manager for summaries of column contents for columnar files.

Methods

`ColumnValueSummary.__init__`(sum_op)	Constructor for column value summary manager.
`ColumnValueSummary.dump_summary`(filename, ...)
`ColumnValueSummary.get_details_dict`(summary)	Return a dictionary with the summary contained in a TabularSummary.
`ColumnValueSummary.get_individual`(...[, ...])	Return a dictionary of the individual file summaries.
`ColumnValueSummary.get_list_str`(lst)	Return a str version of a list with items separated by a blank.
`ColumnValueSummary.get_summary`([...])	Return a summary dictionary with the information.
`ColumnValueSummary.get_summary_details`([...])	Return a dictionary with the details for individual files and the overall dataset.
`ColumnValueSummary.get_text_summary`([...])	Return a complete text summary by assembling the individual pieces.
`ColumnValueSummary.get_text_summary_details`([...])	Return a text summary of the information represented by this summary.
`ColumnValueSummary.merge_all_info`()	Create a TabularSummary containing the overall dataset summary.
`ColumnValueSummary.partition_list`(lst, n)	Partition a list into lists of n items.
`ColumnValueSummary.save`(save_dir[, ...])	Save the summaries using the format indicated.
`ColumnValueSummary.save_visualizations`(save_dir)	Save summary visualizations, if any, using the format indicated.
`ColumnValueSummary.sort_dict`(count_dict[, ...])
`ColumnValueSummary.update_summary`(new_info)	Update the summary for a given tabular input file.

Attributes

`ColumnValueSummary.DISPLAY_INDENT`
`ColumnValueSummary.INDIVIDUAL_SUMMARIES_PATH`

ColumnValueSummary.__init__(sum_op)

Constructor for column value summary manager.

Parameters: sum_op (BaseOp) – Operation associated with this summary.

static ColumnValueSummary.dump_summary(filename, summary)

ColumnValueSummary.get_details_dict(summary)

Return a dictionary with the summary contained in a TabularSummary.

Parameters: summary (TabularSummary) – Summary object holding the merged summary information.
Returns: Dictionary with the information in a form suitable for printing.
Return type: dict

ColumnValueSummary.get_individual(summary_details, separately=True)

Return a dictionary of the individual file summaries.

Parameters:

summary_details (dict) – Dictionary of the individual file summaries.
separately (bool) – If True (the default), each individual summary has a header for separate output.

static ColumnValueSummary.get_list_str(lst)

Return a str version of a list with items separated by a blank.

Returns: String version of list.
Return type: str
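
As a rough illustration of the behavior described above, the following stand-alone sketch joins list items with a single blank. It is not the library's implementation: the function name `list_to_str` is hypothetical, and converting each item with `str` is an assumption.

```python
def list_to_str(lst):
    """Join the items of lst into one string, separated by single blanks."""
    # Assumption: non-string items are converted with str().
    return " ".join(str(item) for item in lst)


print(list_to_str(["red", "green", 3]))  # red green 3
```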

ColumnValueSummary.get_summary(individual_summaries='separate')

Return a summary dictionary with the information.

Parameters: individual_summaries (str) – One of “separate”, “consolidated”, or “none”.
Returns: Dictionary with “Dataset” and “Individual files” keys.
Return type: dict

Notes: The individual_summaries value is processed as follows:

“separate” means that the individual summaries are in separate files.
“consolidated” means that the individual summaries are in the same file as the overall summary.
“none” means that only the overall summary is produced.

ColumnValueSummary.get_summary_details(include_individual=True)

Return a dictionary with the details for individual files and the overall dataset.

Parameters: include_individual (bool) – If True, summaries for individual files are included.
Returns: Dictionary with ‘Dataset’ and ‘Individual files’ keys.
Return type: dict

Notes

The ‘Dataset’ value is either a string or a dictionary with the overall summary.
The ‘Individual files’ value is a dictionary whose keys are file names and whose values are
the corresponding summaries.

Users are expected to provide merge_all_info and get_details_dict functions to support this.
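
The division of labor in the note above can be sketched as follows. This is a hypothetical stand-in, not the library's base class: the names `SketchSummary`, `RowCountSummary`, and `summary_dict` are invented for illustration, and keeping an empty ‘Individual files’ dictionary when `include_individual` is False is an assumption.

```python
class SketchSummary:
    """Hypothetical stand-in showing how the two user-provided hooks could be combined."""

    def __init__(self):
        # Maps file name -> per-file summary object (plain values in this sketch).
        self.summary_dict = {}

    def merge_all_info(self):
        """Combine the per-file summaries into one overall summary."""
        raise NotImplementedError

    def get_details_dict(self, summary):
        """Convert a summary object into a plain dictionary."""
        raise NotImplementedError

    def get_summary_details(self, include_individual=True):
        details = {
            "Dataset": self.get_details_dict(self.merge_all_info()),
            "Individual files": {},
        }
        if include_individual:
            for name, summary in self.summary_dict.items():
                details["Individual files"][name] = self.get_details_dict(summary)
        return details


class RowCountSummary(SketchSummary):
    """Toy subclass: each per-file summary is just a row count."""

    def merge_all_info(self):
        return sum(self.summary_dict.values())

    def get_details_dict(self, summary):
        return {"Total rows": summary}


counts = RowCountSummary()
counts.summary_dict = {"a.tsv": 3, "b.tsv": 5}
print(counts.get_summary_details())
```

The subclass supplies only the two hooks; the assembly of the ‘Dataset’ and ‘Individual files’ entries stays in one place.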

ColumnValueSummary.get_text_summary(individual_summaries='separate')

Return a complete text summary by assembling the individual pieces.

Parameters: individual_summaries (str) – One of the values “separate”, “consolidated”, or “none”.
Returns: Complete text summary.
Return type: str

Notes: The options are:

“none”: Has only the “Dataset” key.
“consolidated”: Has “Dataset” and “Individual files” keys; the value of each is a string.
“separate”: Has “Dataset” and “Individual files” keys; the value of “Individual files” is a dict.
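
The three shapes listed above can be illustrated with a small stand-alone sketch. The function `shape_text_summary` and its inputs are hypothetical and are not part of the library; joining the per-file texts with newlines and raising `ValueError` for an unknown option are assumptions.

```python
def shape_text_summary(dataset_text, individual_texts, individual_summaries="separate"):
    """Arrange already-rendered text pieces according to the option.

    dataset_text: str holding the overall summary (hypothetical input).
    individual_texts: dict mapping file name to its text summary (hypothetical input).
    """
    if individual_summaries == "none":
        return {"Dataset": dataset_text}
    if individual_summaries == "consolidated":
        # Assumption: the per-file texts are joined into one string.
        joined = "\n".join(f"{name}:\n{text}" for name, text in individual_texts.items())
        return {"Dataset": dataset_text, "Individual files": joined}
    if individual_summaries == "separate":
        return {"Dataset": dataset_text, "Individual files": dict(individual_texts)}
    raise ValueError(f"Unknown individual_summaries value: {individual_summaries!r}")


pieces = {"a.tsv": "3 rows", "b.tsv": "5 rows"}
for option in ("none", "consolidated", "separate"):
    print(option, sorted(shape_text_summary("8 rows", pieces, option)))
```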

ColumnValueSummary.get_text_summary_details(include_individual=True)

Return a text summary of the information represented by this summary.

Parameters: include_individual (bool) – If True (the default), individual summaries are in “Individual files”.

ColumnValueSummary.merge_all_info()

Create a TabularSummary containing the overall dataset summary.

Returns: The summary object for column values.
Return type: TabularSummary

static ColumnValueSummary.partition_list(lst, n)

Partition a list into lists of n items.

Parameters:

lst (list) – List to be partitioned.
n (int) – Number of items in each sublist.

Returns:

List of lists of n elements; the last sublist might have fewer.

Return type:

list
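
The partitioning described above can be sketched with list slicing. This is an independent illustration, not the library's code: the name `partition` is hypothetical, and rejecting a non-positive `n` is an assumption.

```python
def partition(lst, n):
    """Split lst into consecutive sublists of at most n items."""
    if n <= 0:
        # Assumption: a non-positive chunk size is treated as an error.
        raise ValueError("n must be a positive integer")
    return [lst[i:i + n] for i in range(0, len(lst), n)]


print(partition([1, 2, 3, 4, 5], 2))  # [[1, 2], [3, 4], [5]]
```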

ColumnValueSummary.save(save_dir, file_formats=['.txt'], individual_summaries='separate', task_name='')

Save the summaries using the format indicated.

Parameters:

save_dir (str) – Name of the directory to save the summaries in.
file_formats (list) – List of file formats to use for saving.
individual_summaries (str) – One of “separate”, “consolidated”, or “none”; determines whether one file or multiple files are saved.
task_name (str) – If this summary corresponds to files from a task, the task_name is used in filename.

ColumnValueSummary.save_visualizations(save_dir, file_formats=['.svg'], individual_summaries='separate', task_name='')

Save summary visualizations, if any, using the format indicated.

Parameters:

save_dir (str) – Name of the directory to save the summaries in.
file_formats (list) – List of file formats to use for saving.
individual_summaries (str) – One of “separate”, “consolidated”, or “none”; determines whether one file or multiple files are saved.
task_name (str) – If this summary corresponds to files from a task, the task_name is used in filename.

static ColumnValueSummary.sort_dict(count_dict, reverse=False)

ColumnValueSummary.update_summary(new_info)

Update the summary for a given tabular input file.

Parameters: new_info (dict) – A dictionary with the parameters needed to update a summary.

Notes

The summary information is kept in separate TabularSummary objects for each file.
The new_info dictionary needs a “name” str and a “df”.
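
The per-file bookkeeping described in these notes can be sketched with standard-library containers. Everything here is a hypothetical stand-in: `ValueCounts` plays the role of a per-file summary object, `summaries` is an invented module-level store, and the “df” value is modeled as a dict mapping column names to lists of values because its actual type is not stated here.

```python
from collections import Counter


class ValueCounts:
    """Hypothetical stand-in for a per-file summary object."""

    def __init__(self):
        self.columns = {}  # column name -> Counter of the values seen

    def update(self, table):
        # Count every value in every column of the table.
        for column, values in table.items():
            self.columns.setdefault(column, Counter()).update(values)


summaries = {}  # file name -> ValueCounts


def update_summary(new_info):
    """Record the table in new_info["df"] under the key new_info["name"]."""
    name = new_info["name"]
    summaries.setdefault(name, ValueCounts()).update(new_info["df"])


update_summary({"name": "run-1", "df": {"trial_type": ["go", "stop", "go"]}})
update_summary({"name": "run-2", "df": {"trial_type": ["go"]}})
print(summaries["run-1"].columns["trial_type"]["go"])  # 2
```

Keeping one object per file name means an overall summary can later be built by merging the stored objects, while the per-file counts remain available.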

ColumnValueSummary.DISPLAY_INDENT = ' '

ColumnValueSummary.INDIVIDUAL_SUMMARIES_PATH = 'individual_summaries'