TabularSummary

class TabularSummary(value_cols=None, skip_cols=None, name='')

Summarize the contents of tabular files.

Methods

hed.tools.analysis.tabular_summary.TabularSummary.__init__([...])

Constructor for a BIDS tabular file summary.

hed.tools.analysis.tabular_summary.TabularSummary.extract_sidecar_template()

Extract a BIDS sidecar-compatible dictionary.

hed.tools.analysis.tabular_summary.TabularSummary.extract_summary(...)

Create a TabularSummary object from a serialized summary.

hed.tools.analysis.tabular_summary.TabularSummary.get_columns_info(...)

Extract unique value counts for columns.

hed.tools.analysis.tabular_summary.TabularSummary.get_number_unique([...])

Return the number of unique values in columns.

hed.tools.analysis.tabular_summary.TabularSummary.get_summary([...])

hed.tools.analysis.tabular_summary.TabularSummary.make_combined_dicts(...)

Return combined and individual summaries.

hed.tools.analysis.tabular_summary.TabularSummary.update(data)

Update the counts based on data.

hed.tools.analysis.tabular_summary.TabularSummary.update_summary(tab_sum)

Add TabularSummary values to this object.

TabularSummary.__init__(value_cols=None, skip_cols=None, name='')

Constructor for a BIDS tabular file summary.

Parameters:
  • value_cols (list, None) – List of columns to be treated as value columns.

  • skip_cols (list, None) – List of columns to be skipped.

  • name (str) – Name associated with the dictionary.
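How the constructor arguments interact is not spelled out here. As a minimal sketch, assuming that columns named in skip_cols are ignored, columns named in value_cols are handled as value columns, and every remaining column is summarized as categorical, the partition of a table's columns might look like this. classify_columns is a hypothetical helper, not part of the library, and giving skip_cols precedence over value_cols is also an assumption of the sketch.

```python
from typing import Iterable, Optional


def classify_columns(columns: Iterable[str],
                     value_cols: Optional[list] = None,
                     skip_cols: Optional[list] = None) -> dict:
    """Hypothetical helper: split column names into skip, value, and categorical groups."""
    value_set = set(value_cols or [])
    skip_set = set(skip_cols or [])
    groups = {"skip": [], "value": [], "categorical": []}
    for col in columns:
        if col in skip_set:
            groups["skip"].append(col)          # assumed: skipped columns are not summarized
        elif col in value_set:
            groups["value"].append(col)         # explicitly listed as a value column
        else:
            groups["categorical"].append(col)   # assumed default for all other columns
    return groups


print(classify_columns(["onset", "duration", "trial_type", "response_time"],
                       value_cols=["response_time"], skip_cols=["onset"]))
```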

TabularSummary.extract_sidecar_template()

Extract a BIDS sidecar-compatible dictionary.
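A sidecar template is a JSON-ready dictionary keyed by column name. The sketch below assumes a layout that follows common BIDS sidecar conventions (Description, Levels and HED keys, with the HED # placeholder for value columns); the exact entries produced by extract_sidecar_template may differ. sidecar_template is a hypothetical helper, and Label/# is only a placeholder tag an annotator would replace.

```python
import json


def sidecar_template(categorical: dict, value_cols: list) -> dict:
    """Hypothetical helper: build a BIDS-style sidecar skeleton.

    categorical maps column name -> {value: count}; value_cols lists value columns.
    """
    sidecar = {}
    for col in sorted(categorical):
        levels = sorted(str(v) for v in categorical[col])
        sidecar[col] = {
            "Description": f"Description for {col}",
            # One placeholder description per distinct value found in the column.
            "Levels": {level: f"Description for {level}" for level in levels},
            # Empty HED strings, to be filled in by the annotator.
            "HED": {level: "" for level in levels},
        }
    for col in sorted(value_cols):
        sidecar[col] = {
            "Description": f"Description for {col}",
            # '#' marks where each cell's value is substituted into the HED string.
            "HED": "Label/#",
        }
    return sidecar


template = sidecar_template({"trial_type": {"go": 10, "stop": 4}}, ["response_time"])
print(json.dumps(template, indent=2))
```

Only the distinct values matter for a template; the counts are ignored, which is why the sketch sorts the keys of each inner dictionary.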

static TabularSummary.extract_summary(summary_info)

Create a TabularSummary object from a serialized summary.

Parameters:

summary_info (dict or str) – A JSON string or a dictionary containing contents of a TabularSummary.

Returns:

A TabularSummary object containing the information in summary_info.

Return type:

TabularSummary
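Because summary_info may be either a JSON string or a dictionary, a natural first step is to normalize it to a dictionary before rebuilding the object. A minimal sketch of that normalization only; the helper name load_summary_info and the "Name" key in the example are illustrative, not taken from the library.

```python
import json
from typing import Union


def load_summary_info(summary_info: Union[dict, str]) -> dict:
    """Hypothetical helper: normalize a serialized summary to a dictionary."""
    if isinstance(summary_info, str):
        # A str is treated as JSON text and parsed.
        summary_info = json.loads(summary_info)
    if not isinstance(summary_info, dict):
        raise TypeError("summary_info must be a dict or a JSON object string")
    return summary_info


print(load_summary_info('{"Name": "demo"}'))
```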

static TabularSummary.get_columns_info(dataframe, skip_cols=None)

Extract unique value counts for columns.

Parameters:
  • dataframe (DataFrame) – The DataFrame to be analyzed.

  • skip_cols (list) – List of names of columns to be skipped in the extraction.

Returns:

A dictionary with keys that are column names and values that are dictionaries of unique value counts.

Return type:

dict
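The counting itself can be illustrated without pandas. This sketch substitutes a list of row dictionaries, parsed from a small BIDS-style TSV with the standard csv module, for the DataFrame; with pandas, the per-column counts would typically come from Series.value_counts(). The helper columns_info is hypothetical.

```python
import csv
import io
from collections import Counter


def columns_info(rows: list, skip_cols=None) -> dict:
    """Count occurrences of each distinct value in every non-skipped column.

    rows is a list of dicts (one per table row), standing in for a DataFrame.
    """
    skip = set(skip_cols or [])
    info = {}
    for row in rows:
        for col, value in row.items():
            if col in skip:
                continue
            info.setdefault(col, Counter())[value] += 1
    # Plain dicts to match the documented "dictionaries of unique value counts".
    return {col: dict(counts) for col, counts in info.items()}


tsv = "onset\ttrial_type\tresponse\n0.5\tgo\tleft\n1.5\tstop\tn/a\n2.5\tgo\tright\n"
rows = list(csv.DictReader(io.StringIO(tsv), delimiter="\t"))
print(columns_info(rows, skip_cols=["onset"]))
# {'trial_type': {'go': 2, 'stop': 1}, 'response': {'left': 1, 'n/a': 1, 'right': 1}}
```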

TabularSummary.get_number_unique(column_names=None)

Return the number of unique values in columns.

Parameters:

column_names (list, None) – A list of column names to analyze, or None to analyze all columns.

Returns:

A dictionary whose keys are column names and whose values are the number of unique values in each column.

Return type:

dict
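Given per-column counts in the shape returned by get_columns_info, the number of unique values is the size of each inner dictionary. A minimal sketch; number_unique is hypothetical, and silently ignoring requested names that are not present is a design choice of this sketch, not documented behavior.

```python
from typing import Optional


def number_unique(counts: dict, column_names: Optional[list] = None) -> dict:
    """Map each requested column to its number of distinct values.

    counts maps column name -> {value: count}; None means every column.
    """
    names = list(counts) if column_names is None else column_names
    return {name: len(counts[name]) for name in names if name in counts}


counts = {"trial_type": {"go": 2, "stop": 1}, "response": {"left": 1, "n/a": 1, "right": 1}}
print(number_unique(counts))                # {'trial_type': 2, 'response': 3}
print(number_unique(counts, ["response"]))  # {'response': 3}
```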

TabularSummary.get_summary(as_json=False)

static TabularSummary.make_combined_dicts(file_dictionary, skip_cols=None)

Return combined and individual summaries.

Parameters:
  • file_dictionary (FileDictionary) – Dictionary mapping file name keys to full file paths.

  • skip_cols (list) – List of names of columns to be skipped.

Returns:

  • TabularSummary: Summary of the file dictionary.

  • dict: Individual TabularSummary objects, one per file.

Return type:

tuple
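The documented return value is a pair: one combined summary plus the individual per-file summaries. Treating each per-file summary as a get_columns_info-style dictionary (an assumption; the real method works with TabularSummary objects and a FileDictionary), combining amounts to adding counts column by column. The helper combine_file_counts and the file keys below are hypothetical.

```python
from collections import Counter


def combine_file_counts(per_file: dict) -> tuple:
    """Return (combined, individual) counts.

    per_file maps a file key -> {column: {value: count}} (an assumed representation).
    """
    combined = {}
    for file_counts in per_file.values():
        for col, counts in file_counts.items():
            # Counter.update adds counts rather than replacing them.
            combined.setdefault(col, Counter()).update(counts)
    return {col: dict(c) for col, c in combined.items()}, per_file


per_file = {
    "sub-01_task-stop_events": {"trial_type": {"go": 2, "stop": 1}},
    "sub-02_task-stop_events": {"trial_type": {"go": 1}, "response": {"left": 1}},
}
combined, individual = combine_file_counts(per_file)
print(combined)  # {'trial_type': {'go': 3, 'stop': 1}, 'response': {'left': 1}}
```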

TabularSummary.update(data, name=None)

Update the counts based on data.

Parameters:
  • data (DataFrame, str, or list) – The data whose values are added to the counts.

  • name (str) – Name of the summary.
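Because update can be called repeatedly, the summary behaves as a running total. The class below is a hypothetical stand-in that keeps counts in collections.Counter objects and accepts a list of row dictionaries instead of a DataFrame, str, or list; it illustrates incremental accumulation under those assumptions, not the library's implementation.

```python
from collections import Counter
from typing import Optional


class RunningCounts:
    """Hypothetical stand-in for a summary that accumulates counts across updates."""

    def __init__(self, skip_cols: Optional[list] = None):
        self.skip_cols = set(skip_cols or [])
        self.counts = {}     # column name -> Counter of values
        self.total_rows = 0
        self.sources = []    # names passed to update(), in call order

    def update(self, rows: list, name: Optional[str] = None) -> None:
        """Add the value counts from rows (a list of dicts) to the running totals."""
        for row in rows:
            self.total_rows += 1
            for col, value in row.items():
                if col not in self.skip_cols:
                    self.counts.setdefault(col, Counter())[value] += 1
        if name is not None:
            self.sources.append(name)


summary = RunningCounts(skip_cols=["onset"])
summary.update([{"onset": "0.5", "trial_type": "go"}], name="run-1")
summary.update([{"onset": "1.0", "trial_type": "stop"}], name="run-2")
print(dict(summary.counts["trial_type"]), summary.total_rows)
```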

TabularSummary.update_summary(tab_sum)

Add TabularSummary values to this object.

Parameters:

tab_sum (TabularSummary) – A TabularSummary to be combined.

Notes

  • The value_cols and skip_cols are updated as long as they are not contradictory.

  • A new skip column cannot be used.
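The notes leave the exact rule open. One reading, assumed here rather than taken from the library, is that tab_sum may not skip a column this summary does not already skip, and that no column may be a value column in one summary and a skip column in the other. The hypothetical merge_column_roles applies that reading to plain lists of column names.

```python
def merge_column_roles(value_a, skip_a, value_b, skip_b):
    """Hypothetical check-and-merge of column roles from two summaries.

    Raises ValueError if summary b skips a column that summary a does not
    (a "new skip column"), or if a column is a value column in one summary
    and a skip column in the other.
    """
    value_a, skip_a, value_b, skip_b = map(set, (value_a, skip_a, value_b, skip_b))
    new_skips = skip_b - skip_a
    if new_skips:
        raise ValueError(f"new skip columns not allowed: {sorted(new_skips)}")
    conflicts = (value_a & skip_b) | (value_b & skip_a)
    if conflicts:
        raise ValueError(f"contradictory column roles: {sorted(conflicts)}")
    # Value columns are merged; skip columns stay those of summary a.
    return sorted(value_a | value_b), sorted(skip_a)


print(merge_column_roles(["rt"], ["onset"], ["rt", "acc"], ["onset"]))
```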