ColumnValueSummary
- class ColumnValueSummary(sum_op)
Manager for summaries of column contents for columnar files.
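Methods
- __init__(sum_op) – Constructor for the column value summary manager.
- dump_summary(filename, summary)
- get_details_dict(summary) – Return a dictionary with the summary contained in a TabularSummary.
- get_individual(summary_details[, separately]) – Return a dictionary of the individual file summaries.
- get_list_str(lst) – Return a string version of a list with items separated by a space.
- get_summary([individual_summaries]) – Return a summary dictionary with the information.
- get_summary_details([include_individual]) – Return a dictionary with the details for individual files and the overall dataset.
- get_text_summary([individual_summaries]) – Return a complete text summary by assembling the individual pieces.
- get_text_summary_details([include_individual]) – Return a text summary of the information represented by this summary.
- merge_all_info() – Create a TabularSummary containing the overall dataset summary.
- partition_list(lst, n) – Partition a list into lists of n items.
- save(save_dir[, file_formats, ...]) – Save the summaries using the format indicated.
- save_visualizations(save_dir[, file_formats, ...]) – Save summary visualizations, if any, using the format indicated.
- update_summary(new_info) – Update the summary for a given tabular input file.
Attributes
- DISPLAY_INDENT
- INDIVIDUAL_SUMMARIES_PATH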
- ColumnValueSummary.__init__(sum_op)
Constructor for the column value summary manager.
- Parameters:
sum_op (SummarizeColumnValuesOp) – Operation associated with this summary.
- static ColumnValueSummary.dump_summary(filename, summary)
- ColumnValueSummary.get_details_dict(summary)
Return a dictionary with the summary contained in a TabularSummary.
- Parameters:
summary (TabularSummary) – Merged summary information.
- Returns:
Dictionary with the information in a form suitable for printout.
- Return type:
dict
- ColumnValueSummary.get_individual(summary_details, separately=True)
Return a dictionary of the individual file summaries.
- Parameters:
summary_details (dict) – Dictionary of the individual file summaries.
separately (bool) – If True (the default), each individual summary has a header for separate output.
- static ColumnValueSummary.get_list_str(lst)
Return a string version of a list with items separated by a space.
- Returns:
String version of list.
- Return type:
str
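A minimal sketch of the described behavior, assuming "separated by a blank" means joining the str() of each item with a single space. This is an illustration, not the library's source.

```python
def get_list_str(lst):
    """Join the string form of each item with a single space (hypothetical stand-in)."""
    return " ".join(str(item) for item in lst)


print(get_list_str(["n/a", 3, 4.5]))  # n/a 3 4.5
```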
- ColumnValueSummary.get_summary(individual_summaries='separate')
Return a summary dictionary with the information.
- Parameters:
individual_summaries (str) – One of “separate”, “consolidated”, or “none”.
- Returns:
dict - dictionary with “Dataset” and “Individual files” keys.
- Notes: The individual_summaries value is processed as follows:
“separate” means that the individual summaries are in separate files.
“consolidated” means that the individual summaries are in the same file as the overall summary.
“none” means that only the overall summary is produced.
- ColumnValueSummary.get_summary_details(include_individual=True)
Return a dictionary with the details for individual files and the overall dataset.
- Parameters:
include_individual (bool) – If True, summaries for individual files are included.
- Returns:
dict - a dictionary with ‘Dataset’ and ‘Individual files’ keys.
Notes
The ‘Dataset’ value is either a string or a dictionary with the overall summary.
The ‘Individual files’ value is a dictionary whose keys are file names and whose values are their corresponding summaries.
Users are expected to provide merge_all_info and get_details_dict methods to support this.
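The note above describes a hook pattern: the shared get_summary_details logic calls merge_all_info and get_details_dict, which a concrete summary supplies. The sketch below illustrates that pattern under stated assumptions: SummaryBase, ColumnNameSummary, and the summary_dict attribute are hypothetical names, and the toy per-file summary is just a set of column names rather than a TabularSummary.

```python
from abc import ABC, abstractmethod


class SummaryBase(ABC):
    """Hypothetical base class: stores per-file summaries, defers merging/formatting to hooks."""

    def __init__(self):
        self.summary_dict = {}  # file name -> per-file summary object

    @abstractmethod
    def merge_all_info(self):
        """Return one object summarizing the whole dataset."""

    @abstractmethod
    def get_details_dict(self, summary):
        """Convert a summary object into a plain dictionary."""

    def get_summary_details(self, include_individual=True):
        # 'Dataset' always holds the merged summary; 'Individual files' maps file name -> details.
        details = {"Dataset": self.get_details_dict(self.merge_all_info()),
                   "Individual files": {}}
        if include_individual:
            for name, summary in self.summary_dict.items():
                details["Individual files"][name] = self.get_details_dict(summary)
        return details


class ColumnNameSummary(SummaryBase):
    """Toy subclass: each per-file summary is a set of column names."""

    def merge_all_info(self):
        return set().union(*self.summary_dict.values())

    def get_details_dict(self, summary):
        return {"Columns": sorted(summary)}


s = ColumnNameSummary()
s.summary_dict = {"a_events.tsv": {"onset", "duration"},
                  "b_events.tsv": {"onset", "response"}}
print(s.get_summary_details(include_individual=False))
# {'Dataset': {'Columns': ['duration', 'onset', 'response']}, 'Individual files': {}}
```

Keeping the traversal in the base class means a new summary type only has to say how to merge and how to format, which is the division of labor the note describes.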
- ColumnValueSummary.get_text_summary(individual_summaries='separate')
Return a complete text summary by assembling the individual pieces.
- Parameters:
individual_summaries (str) – One of the values “separate”, “consolidated”, or “none”.
- Returns:
Complete text summary.
- Return type:
str
- Notes: The options are:
“none”: Has only the “Dataset” key.
“consolidated”: Has “Dataset” and “Individual files” keys, and each value is a string.
“separate”: Has “Dataset” and “Individual files” keys, and the value of “Individual files” is a dict.
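The notes describe the result in terms of “Dataset” and “Individual files” keys. The sketch below models those three layouts as a dictionary. The helper name assemble_text_summary, its arguments, and the separator used to join consolidated summaries are all assumptions made for illustration.

```python
def assemble_text_summary(dataset_text, individual_texts, individual_summaries="separate"):
    """Hypothetical helper mirroring the three documented layouts.

    dataset_text: text of the overall summary.
    individual_texts: dict mapping file name -> text of that file's summary.
    """
    if individual_summaries == "none":
        return {"Dataset": dataset_text}
    if individual_summaries == "consolidated":
        # One string holding every individual summary, one after another.
        combined = "\n\n".join(f"{name}:\n{text}" for name, text in individual_texts.items())
        return {"Dataset": dataset_text, "Individual files": combined}
    if individual_summaries == "separate":
        # Keep a dict so each file's text can later be written to its own output file.
        return {"Dataset": dataset_text, "Individual files": dict(individual_texts)}
    raise ValueError(f"Unknown individual_summaries value: {individual_summaries!r}")


texts = {"a.tsv": "x", "b.tsv": "y"}
print(assemble_text_summary("D", texts, "consolidated"))
```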
- ColumnValueSummary.get_text_summary_details(include_individual=True)
Return a text summary of the information represented by this summary.
- Parameters:
include_individual (bool) – If True (the default), individual summaries are in “Individual files”.
- ColumnValueSummary.merge_all_info()
Create a TabularSummary containing the overall dataset summary.
- Returns:
TabularSummary - the summary object for column values.
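Conceptually, building the overall dataset summary means combining the per-file tallies of column values. The sketch below shows that merge with collections.Counter. It is a hypothetical stand-in: plain {column: Counter} dictionaries replace the TabularSummary objects, and the function name merge_column_counts is invented for illustration.

```python
from collections import Counter


def merge_column_counts(per_file):
    """Merge {file: {column: Counter(values)}} into one {column: Counter} for the dataset.

    Hypothetical stand-in for merging per-file TabularSummary objects.
    """
    merged = {}
    for columns in per_file.values():
        for column, counts in columns.items():
            # Counter.update adds counts rather than replacing them.
            merged.setdefault(column, Counter()).update(counts)
    return merged


per_file = {
    "sub-01_events.tsv": {"trial_type": Counter({"go": 2, "stop": 1})},
    "sub-02_events.tsv": {"trial_type": Counter({"go": 1}), "response": Counter({"left": 1})},
}
print(merge_column_counts(per_file))
```

Columns that appear in only some files are still included, with counts from just those files.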
- static ColumnValueSummary.partition_list(lst, n)
Partition a list into lists of n items.
- Parameters:
lst (list) – List to be partitioned.
n (int) – Number of items in each sublist.
- Returns:
List of sublists with n elements each; the last sublist may have fewer.
- Return type:
list
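A minimal sketch of the partitioning using list slicing with a step of n. The ValueError guard for a non-positive n is an assumption, since the reference does not say how such values are handled.

```python
def partition_list(lst, n):
    """Split lst into consecutive sublists of n items; the last may be shorter."""
    if n <= 0:
        raise ValueError("n must be a positive integer")  # assumed behavior
    return [lst[i:i + n] for i in range(0, len(lst), n)]


print(partition_list([1, 2, 3, 4, 5], 2))  # [[1, 2], [3, 4], [5]]
```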
- ColumnValueSummary.save(save_dir, file_formats=['.txt'], individual_summaries='separate', task_name='')
Save the summaries using the format indicated.
- Parameters:
save_dir (str) – Name of the directory to save the summaries in.
file_formats (list) – List of file formats to use for saving.
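individual_summaries (str) – One of “separate”, “consolidated”, or “none”; determines whether the individual summaries are saved in separate files, in the same file as the overall summary, or not at all.
task_name (str) – If this summary corresponds to files from a task, the task_name is used in the filename.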
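To make the interplay of save_dir, file_formats, individual_summaries, and task_name concrete, the sketch below lists the paths one plausible layout would produce. The exact naming scheme is not documented here, so the filename pattern, the summary_name and file_names arguments, and the planned_paths helper are all hypothetical. Only the subdirectory name comes from the INDIVIDUAL_SUMMARIES_PATH attribute.

```python
import os

INDIVIDUAL_SUMMARIES_PATH = "individual_summaries"  # value of ColumnValueSummary.INDIVIDUAL_SUMMARIES_PATH


def planned_paths(save_dir, summary_name, file_names, file_formats=(".txt",),
                  individual_summaries="separate", task_name=""):
    """Hypothetical: list the output paths that one assumed naming scheme would produce."""
    suffix = f"_{task_name}" if task_name else ""
    # One overall summary file per requested format.
    paths = [os.path.join(save_dir, f"{summary_name}{suffix}{ext}") for ext in file_formats]
    if individual_summaries == "separate":
        # Assumed: separate per-file summaries go in a subdirectory of save_dir.
        ind_dir = os.path.join(save_dir, INDIVIDUAL_SUMMARIES_PATH)
        for file_name in file_names:
            stem = os.path.splitext(file_name)[0]
            paths += [os.path.join(ind_dir, f"{summary_name}_{stem}{suffix}{ext}")
                      for ext in file_formats]
    return paths


print(planned_paths("out", "column_values", ["sub-01_events.tsv"]))
```

A tuple default for file_formats sidesteps the shared-mutable-default pitfall of a list literal such as ['.txt'].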
- ColumnValueSummary.save_visualizations(save_dir, file_formats=['.svg'], individual_summaries='separate', task_name='')
Save summary visualizations, if any, using the format indicated.
- Parameters:
save_dir (str) – Name of the directory to save the summaries in.
file_formats (list) – List of file formats to use for saving.
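individual_summaries (str) – One of “separate”, “consolidated”, or “none”; determines whether the individual summaries are saved in separate files, in the same file as the overall summary, or not at all.
task_name (str) – If this summary corresponds to files from a task, the task_name is used in the filename.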
- ColumnValueSummary.update_summary(new_info)
Update the summary for a given tabular input file.
- Parameters:
new_info (dict) – A dictionary with the parameters needed to update a summary.
Notes
The summary information is kept in a separate TabularSummary object for each file.
The new_info dictionary must contain a “name” (str) and a “df” (the file's contents as a DataFrame).
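The sketch below illustrates per-file bookkeeping keyed by new_info["name"]. It is hypothetical: ToyColumnValueSummary and summary_dict are invented names, and a Counter of values per column stands in for a TabularSummary. Repeated calls with the same name accumulate in this sketch. The loop uses df.items(), which works on a plain dict of column lists (used in the example so it runs without pandas) and on a pandas DataFrame, whose items() also yields (column name, Series) pairs.

```python
from collections import Counter


class ToyColumnValueSummary:
    """Hypothetical stand-in: one per-column value tally per file, keyed by new_info['name']."""

    def __init__(self):
        self.summary_dict = {}  # file name -> {column: Counter of values}

    def update_summary(self, new_info):
        name = new_info["name"]
        df = new_info["df"]  # dict of column -> values, or a pandas DataFrame
        file_summary = self.summary_dict.setdefault(name, {})
        for column, values in df.items():
            file_summary.setdefault(column, Counter()).update(values)


s = ToyColumnValueSummary()
s.update_summary({"name": "sub-01_events.tsv",
                  "df": {"trial_type": ["go", "stop", "go"], "response": ["left", "right", "left"]}})
print(s.summary_dict["sub-01_events.tsv"]["trial_type"])
```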
- ColumnValueSummary.DISPLAY_INDENT = ' '
- ColumnValueSummary.INDIVIDUAL_SUMMARIES_PATH = 'individual_summaries'