ColumnMapper
- class ColumnMapper(sidecar=None, tag_columns=None, column_prefix_dictionary=None, optional_tag_columns=None, warn_on_missing_column=False)
Maps the columns of a base input file to HED tags.
Notes
All column numbers are 0 based.
Methods
__init__ – Constructor for ColumnMapper.
check_for_blank_names – Validate there are no blank column names.
check_for_mapping_issues – Find all issues given the current column_map, tag_columns, etc.
get_column_mapping_issues – Get all the issues with finalizing column mapping (duplicate columns, missing required columns, etc.).
get_def_dict – Return def dicts from every column description.
get_tag_columns – Return the column numbers or names that are mapped to be HedTags.
get_transformers – Return the transformers to use on a dataframe.
set_column_map – Set the column number to name mapping.
set_column_prefix_dictionary – Set the column prefix dictionary.
set_tag_columns – Set tag columns and optional tag columns.
Attributes
column_prefix_dictionary – Return the column_prefix_dictionary with numbers turned into names where possible.
sidecar_column_data – Pass through to get the sidecar ColumnMetadata.
tag_columns – Return the known tag and optional tag columns with numbers as names when possible.
- ColumnMapper.__init__(sidecar=None, tag_columns=None, column_prefix_dictionary=None, optional_tag_columns=None, warn_on_missing_column=False)
Constructor for ColumnMapper.
- Parameters:
sidecar (Sidecar) – A sidecar to gather column data from.
tag_columns (list) – A list of ints or strings identifying the columns that contain the HED tags. Sidecar column definitions take precedence if there is a conflict with tag_columns.
column_prefix_dictionary (dict) – Dictionary whose keys are column numbers or names and whose values are HED tag prefixes to prepend to the tags in that column before processing.
optional_tag_columns (list) – A list of ints or strings identifying the columns that contain the HED tags. If such a column is otherwise unspecified, its type is converted to HedTags. It is not an error if these columns are missing.
warn_on_missing_column (bool) – If True, issue mapping warnings on column names that are missing from the sidecar.
Notes
All column numbers are 0 based.
- The column_prefix_dictionary may be deprecated/renamed in the future.
The values are no longer applied as prefixes; instead, each entry is converted to a value column. For example, {“key”: “Description”, 1: “Label/”} becomes {“key”: “Description/#”, 1: “Label/#”}. In this example, it is a validation issue if column 1 is also named “key”. As a result, these columns must contain only the value portion of the tag.
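The conversion described in the note can be sketched in plain Python. The helper name prefixes_to_value_columns is hypothetical and is not part of the ColumnMapper API; it only reproduces the example mapping above, assuming a trailing slash on the prefix is optional.

```python
def prefixes_to_value_columns(column_prefix_dictionary):
    """Hypothetical helper: map each prefix to a value-column template ending in "/#"."""
    converted = {}
    for column, prefix in column_prefix_dictionary.items():
        # Strip a trailing slash so "Label/" and "Label" both become "Label/#".
        converted[column] = prefix.rstrip("/") + "/#"
    return converted


print(prefixes_to_value_columns({"key": "Description", 1: "Label/"}))
# {'key': 'Description/#', 1: 'Label/#'}
```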
- static ColumnMapper.check_for_blank_names(column_map, allow_blank_names)
Validate there are no blank column names.
- Parameters:
column_map (iterable) – A list of column names.
allow_blank_names (bool) – Only find issues if False.
- Returns:
A list of dicts, one per issue.
- Return type:
list
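A minimal sketch of the blank-name check, assuming a blank name is None or a string that is empty after stripping whitespace. The function name and the issue dictionary keys (code, column_number) are hypothetical; the real issue dicts come from the library's own error reporting and will differ.

```python
def find_blank_names(column_map, allow_blank_names):
    """Hypothetical stand-in for check_for_blank_names (issue keys are assumed)."""
    if allow_blank_names:
        # Blank names are permitted, so there is nothing to report.
        return []
    issues = []
    for column_number, name in enumerate(column_map):
        # Assumption: None or whitespace-only names count as blank.
        if name is None or not str(name).strip():
            issues.append({"code": "BLANK_COLUMN_NAME", "column_number": column_number})
    return issues


print(find_blank_names(["onset", "", "HED"], allow_blank_names=False))
# [{'code': 'BLANK_COLUMN_NAME', 'column_number': 1}]
```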
- ColumnMapper.check_for_mapping_issues(allow_blank_names=False)
Find all issues given the current column_map, tag_columns, etc.
- Parameters:
allow_blank_names (bool) – Only flag blank names if False.
- Returns:
All issues found as a list of dicts.
- Return type:
list of dict
- ColumnMapper.get_column_mapping_issues()
Get all the issues with finalizing column mapping (duplicate columns, missing required columns, etc.).
Notes
This is deprecated and now a wrapper for “check_for_mapping_issues()”.
- Returns:
A list of dictionaries describing all issues found when mapping column names to numbers.
- Return type:
list
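The two kinds of mapping issue named above, duplicate columns and missing required columns, can be illustrated with a small sketch. It assumes integer entries in tag_columns are 0-based positions and string entries are column names; the function name and issue keys are hypothetical. Optional tag columns are deliberately not checked here, since a missing optional column is not an error.

```python
from collections import Counter


def find_mapping_issues(column_names, tag_columns):
    """Hypothetical sketch: report duplicate names and missing required tag columns."""
    issues = []
    # Any name that appears more than once is a duplicate.
    for name, count in Counter(column_names).items():
        if count > 1:
            issues.append({"code": "DUPLICATE_COLUMN", "column": name})
    for column in tag_columns:
        if isinstance(column, int):
            # Integers are 0-based column positions.
            present = 0 <= column < len(column_names)
        else:
            present = column in column_names
        if not present:
            issues.append({"code": "MISSING_REQUIRED_COLUMN", "column": column})
    return issues


for issue in find_mapping_issues(["onset", "HED", "HED"], ["HED", "response", 5]):
    print(issue)
```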
- ColumnMapper.get_def_dict(hed_schema, extra_def_dicts=None)
Return def dicts from every column description.
- Parameters:
hed_schema (Schema) – A HED schema object to use for extracting definitions.
extra_def_dicts (list, DefinitionDict, or None) – Extra dicts to add to the list.
- Returns:
A single definition dict containing the definitions from all column descriptions, plus any extra def dicts.
- Return type:
DefinitionDict
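The way extra_def_dicts may be a list, a single dictionary, or None can be sketched with plain dicts standing in for DefinitionDict objects. The function name is hypothetical, and in this sketch a later definition silently overwrites an earlier one with the same name; the real DefinitionDict does not necessarily handle duplicates that way.

```python
def merge_definitions(column_def_dicts, extra_def_dicts=None):
    """Hypothetical sketch: plain dicts stand in for DefinitionDict objects."""
    if extra_def_dicts is None:
        extra_def_dicts = []
    elif isinstance(extra_def_dicts, dict):
        # A single dictionary is wrapped so it can be treated like a list.
        extra_def_dicts = [extra_def_dicts]
    merged = {}
    for def_dict in list(column_def_dicts) + list(extra_def_dicts):
        # Later entries overwrite earlier ones with the same key in this sketch.
        merged.update(def_dict)
    return merged


column_defs = [{"def1": "first"}, {"def2": "second"}]
print(merge_definitions(column_defs, {"def3": "third"}))
# {'def1': 'first', 'def2': 'second', 'def3': 'third'}
```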
- ColumnMapper.get_tag_columns()
Return the column numbers or names that are mapped to be HedTags.
Note: This is not the same as the tag_columns or optional_tag_columns parameters, although those parameters contribute to it.
- Returns:
A list of column numbers or names whose type is ColumnType.HedTags. Columns are given as 0-based numbers if integer-based, otherwise as column names.
- Return type:
list
- ColumnMapper.get_transformers()
Return the transformers to use on a dataframe.
- Returns:
dict({str or int: func}): The functions to use to transform each column.
need_categorical (list of int): A list of columns to treat as categorical.
- Return type:
tuple(dict, list)
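The (dict, list) return value could be consumed roughly as in the sketch below, which uses a list of row dictionaries keyed by column number in place of a pandas DataFrame. The function name, the sample transformer, and the choice to represent "treat as categorical" by collecting each column's distinct values are all assumptions made for illustration.

```python
def apply_transformers(rows, transformers, need_categorical):
    """Hypothetical sketch: rows is a list of dicts standing in for a dataframe."""
    transformed = []
    for row in rows:
        new_row = dict(row)
        # Apply each column's transform function where that column exists.
        for column, func in transformers.items():
            if column in new_row:
                new_row[column] = func(new_row[column])
        transformed.append(new_row)
    # Assumption: "categorical" is modeled as the sorted set of distinct values.
    categories = {
        column: sorted({row[column] for row in transformed if column in row})
        for column in need_categorical
    }
    return transformed, categories


rows = [{0: 1.0, 1: " Red "}, {0: 2.0, 1: "Blue"}]
transformers, need_categorical = {1: str.strip}, [1]
print(apply_transformers(rows, transformers, need_categorical))
# ([{0: 1.0, 1: 'Red'}, {0: 2.0, 1: 'Blue'}], {1: ['Blue', 'Red']})
```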
- ColumnMapper.set_column_map(new_column_map=None)
Set the column number to name mapping.
- Parameters:
new_column_map (list or dict) – Either an ordered list of column names or a dictionary mapping column numbers to column names. In both cases, column numbers start at 0.
- Returns:
List of issues. Each issue is a dictionary.
- Return type:
list
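Both accepted forms of new_column_map reduce to the same number-to-name mapping. This hypothetical helper shows that normalization; treating None as an empty mapping is an assumption based only on the parameter's default value.

```python
def normalize_column_map(new_column_map=None):
    """Hypothetical sketch: return {column_number: column_name} with 0-based numbers."""
    if new_column_map is None:
        # Assumption: no map means no known column names.
        return {}
    if isinstance(new_column_map, dict):
        # Already keyed by column number.
        return {int(number): name for number, name in new_column_map.items()}
    # An ordered list: each name's position is its 0-based column number.
    return dict(enumerate(new_column_map))


print(normalize_column_map(["onset", "duration", "HED"]))
# {0: 'onset', 1: 'duration', 2: 'HED'}
print(normalize_column_map({2: "HED"}))
# {2: 'HED'}
```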
- ColumnMapper.set_column_prefix_dictionary(column_prefix_dictionary, finalize_mapping=True)
Set the column prefix dictionary.
- ColumnMapper.set_tag_columns(tag_columns=None, optional_tag_columns=None, finalize_mapping=True)
Set tag columns and optional tag columns.
- Parameters:
tag_columns (list) – A list of ints or strings identifying the columns that contain the HED tags. If None, clears the existing tag_columns.
optional_tag_columns (list) – A list of ints or strings identifying the columns that contain the HED tags; it is not an error if these columns are missing. If None, clears the existing optional_tag_columns.
finalize_mapping (bool) – If True, regenerate the internal mapping immediately; otherwise the changes have no effect until the mapping is finalized.
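The effect of finalize_mapping, and of passing None, can be illustrated with a small hypothetical class. It is not the hed-python implementation: the class name, the finalize method, and the mapping attribute are assumptions used only to show that settings stored with finalize_mapping=False have no effect until the mapping is regenerated.

```python
class TagColumnSettings:
    """Hypothetical class mirroring the set_tag_columns parameters."""

    def __init__(self):
        self._tag_columns = []
        self._optional_tag_columns = []
        self.mapping = []  # what a finalized mapping would contain

    def set_tag_columns(self, tag_columns=None, optional_tag_columns=None, finalize_mapping=True):
        # Passing None clears the corresponding list.
        self._tag_columns = list(tag_columns or [])
        self._optional_tag_columns = list(optional_tag_columns or [])
        if finalize_mapping:
            self.finalize()

    def finalize(self):
        # Combine required and optional columns, keeping order and dropping repeats.
        self.mapping = list(dict.fromkeys(self._tag_columns + self._optional_tag_columns))


settings = TagColumnSettings()
settings.set_tag_columns(["HED"], [2], finalize_mapping=False)
print(settings.mapping)  # [] because the mapping has not been regenerated yet
settings.finalize()
print(settings.mapping)  # ['HED', 2]
```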
- ColumnMapper.column_prefix_dictionary
Return the column_prefix_dictionary with numbers turned into names where possible.
- Returns:
A column_prefix_dictionary with column labels as keys.
- Return type:
dict
- ColumnMapper.sidecar_column_data
Pass through to get the sidecar ColumnMetadata.
- Returns:
The column metadata defined by this sidecar.
- Return type:
dict({str: ColumnMetadata})
- ColumnMapper.tag_columns
Return the known tag and optional tag columns with numbers as names when possible.
- Returns:
A list of all tag and optional tag columns as labels.
- Return type:
list of str or int
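The "numbers as names when possible" behavior of tag_columns can be sketched as a lookup against an ordered list of column names; applying the same lookup to dictionary keys would give the analogous behavior of column_prefix_dictionary. The function name is hypothetical, and returning numbers without a known name unchanged is an assumption consistent with "when possible".

```python
def columns_as_names(columns, column_map):
    """Hypothetical sketch: column_map is an ordered list of names (0-based)."""
    labels = []
    for column in columns:
        if isinstance(column, int) and 0 <= column < len(column_map):
            # A known 0-based position is replaced by its column name.
            labels.append(column_map[column])
        else:
            # Names, and numbers without a known name, are kept as-is.
            labels.append(column)
    return labels


print(columns_as_names([0, "HED", 7], ["onset", "duration", "HED"]))
# ['onset', 'HED', 7]
```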