df_util
Utilities for assembly and conversion of HED strings to different forms.
Functions
convert_to_form | Convert all tags in the underlying dataframe to the specified form (in place).
expand_defs | Expand any def tags found in the dataframe (in place).
filter_series_by_onset | Return the series, with rows that have the same onset combined.
process_def_expands | Gather def-expand tags in the strings and compare them with known definitions to find any differences.
replace_ref | Replace a column ref in a string with a new value.
shrink_defs | Shrink (in place) any def-expand tags found in the specified columns in the dataframe.
sort_dataframe_by_onsets | Sort the dataframe by its onset column.
split_delay_tags | Sort the series based on Delay tags, so that the onsets are in order after the delays are applied.
- convert_to_form(df, hed_schema, tag_form, columns=None)
Convert all tags in the underlying dataframe to the specified form (in place).
- Parameters:
df (pd.DataFrame or pd.Series) – The dataframe or series to modify.
hed_schema (HedSchema) – The schema to use to convert tags.
tag_form (str) – The HedTag property to convert tags to.
columns (list or None) – The columns to modify on the dataframe.
- expand_defs(df, hed_schema, def_dict, columns=None)
Expand any def tags found in the dataframe. The conversion is done in place.
- Parameters:
df (pd.DataFrame or pd.Series) – The dataframe or series to modify.
hed_schema (HedSchema or None) – The schema to use to identify defs.
def_dict (DefinitionDict) – The definitions to expand.
columns (list or None) – The columns to modify on the dataframe.
- filter_series_by_onset(series, onsets)
Return the series, with rows that have the same onset combined.
- Parameters:
series (pd.Series or pd.DataFrame) – The series to filter. If a dataframe is given, its “HED” column is filtered.
onsets (pd.Series) – The onset column to filter by.
- Returns:
The series, with rows that share an onset combined.
- Return type:
pd.Series or pd.DataFrame
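To make "rows that have the same onset combined" concrete, here is a minimal pure-Python sketch of one plausible reading. It is not the library's implementation: the helper name `combine_rows_by_onset` is hypothetical, and it assumes that the combined string is placed in the first row for each onset, that later rows with the same onset are left empty, and that every onset is numeric.

```python
def combine_rows_by_onset(hed_strings, onsets):
    """Hypothetical helper: merge rows that share an onset into the first such row.

    Assumption: the result keeps the original length, and rows that were
    merged into an earlier row are left as empty strings.
    """
    combined = [""] * len(hed_strings)
    first_row_for_onset = {}
    for row, (text, onset) in enumerate(zip(hed_strings, onsets)):
        key = float(onset)  # compare onsets numerically, not as text
        if key not in first_row_for_onset:
            first_row_for_onset[key] = row
            combined[row] = text
        elif text:
            first = first_row_for_onset[key]
            combined[first] = f"{combined[first]},{text}" if combined[first] else text
    return combined


result = combine_rows_by_onset(["Red", "Blue", "Green"], ["1.0", "1.0", "2.5"])
```

In this sketch the first two rows share onset 1.0, so their strings are joined with a comma in the first row and the second row becomes empty.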
- process_def_expands(hed_strings, hed_schema, known_defs=None, ambiguous_defs=None)
Gather def-expand tags in the strings and compare them with known definitions to find any differences.
- Parameters:
hed_strings (list or pd.Series) – A list of HED strings to process.
hed_schema (HedSchema) – The schema to use.
known_defs (DefinitionDict or list or str or None) – A DefinitionDict or anything its constructor takes. These are the known definitions going in; they must match exactly.
ambiguous_defs (dict) – A dictionary containing ambiguous definitions. The format is not yet final; currently each key is a def name and each value is a list of lists of HED tags.
- Returns:
A tuple containing the DefinitionDict, ambiguous definitions, and errors.
- Return type:
tuple
- replace_ref(text, oldvalue, newvalue='n/a')
Replace the column ref oldvalue in text with newvalue. If newvalue is n/a, delete the ref along with any extra commas and parentheses.
- Parameters:
text (str) – The input string containing the ref enclosed in curly braces.
oldvalue (str) – The full tag or ref to replace.
newvalue (str) – The replacement value for the ref.
- Returns:
The modified string with the ref replaced or removed.
- Return type:
str
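The behaviour described for replace_ref can be illustrated with a small standalone sketch. This is a simplified approximation written for this page, not the library's code: the name `replace_ref_sketch` is hypothetical, and the cleanup rules (drop empty groups, stray commas next to parentheses, doubled commas, and leading or trailing commas) are assumptions about what "extra commas/parentheses" covers.

```python
import re


def replace_ref_sketch(text, ref, new_value="n/a"):
    """Hypothetical approximation of replacing a {ref} in a HED string."""
    if new_value != "n/a":
        # A real value is substituted directly.
        return text.replace(ref, new_value)

    # n/a means "no value": remove the ref, then tidy what is left behind.
    result = text.replace(ref, "")
    previous = None
    while previous != result:  # repeat until no rule changes the string
        previous = result
        result = re.sub(r"\(\s*\)", "", result)      # empty groups: "()"
        result = re.sub(r"\(\s*,\s*", "(", result)   # comma right after "("
        result = re.sub(r"\s*,\s*\)", ")", result)   # comma right before ")"
        result = re.sub(r",\s*,", ",", result)       # doubled commas
    result = re.sub(r"^\s*,\s*|\s*,\s*$", "", result)  # leading/trailing commas
    return result.strip()


filled = replace_ref_sketch("Red, {color}, Blue", "{color}", "Green")
removed = replace_ref_sketch("Red, {color}, Blue", "{color}")
```

Here `filled` keeps the surrounding commas untouched, while `removed` collapses the gap left by the missing ref.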
- shrink_defs(df, hed_schema, columns=None)
Shrink (in place) any def-expand tags found in the specified columns in the dataframe.
- Parameters:
df (pd.DataFrame or pd.Series) – The dataframe or series to modify.
hed_schema (HedSchema or None) – The schema to use to identify defs.
columns (list or None) – The columns to modify on the dataframe.
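As a rough picture of what shrinking means for a single string, the sketch below rewrites each expanded group of the form `(Def-expand/Name, ...)` as the short tag `Def/Name`. It is a simplified illustration only: the helper `shrink_def_expands` is hypothetical, it works on one plain string rather than a dataframe, it ignores the schema, and it matches the `Def-expand` prefix case-sensitively.

```python
def shrink_def_expands(hed_string):
    """Hypothetical helper: turn "(Def-expand/Name, ...)" groups into "Def/Name"."""
    marker = "(Def-expand/"
    out = []
    i = 0
    while i < len(hed_string):
        if hed_string.startswith(marker, i):
            # Walk forward to the parenthesis that closes this group.
            depth = 0
            j = i
            while j < len(hed_string):
                if hed_string[j] == "(":
                    depth += 1
                elif hed_string[j] == ")":
                    depth -= 1
                    if depth == 0:
                        break
                j += 1
            group = hed_string[i + 1:j]  # text between the outer parentheses
            # The name (and any value after it) ends at the first comma.
            name = group[len("Def-expand/"):].split(",", 1)[0].strip()
            out.append("Def/" + name)
            i = j + 1
        else:
            out.append(hed_string[i])
            i += 1
    return "".join(out)


shrunk = shrink_def_expands("Red, (Def-expand/MyDef, (Blue, Square))")
```

Expanding is the reverse direction and needs the definitions themselves, which is why expand_defs takes a def_dict while shrink_defs does not.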
- sort_dataframe_by_onsets(df)
Sort the dataframe by its onset column.
- Parameters:
df (pd.DataFrame) – Dataframe to sort.
- Returns:
The sorted dataframe, or the original dataframe if it didn’t have an onset column.
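The return rule above (sorted if there is an onset column, otherwise the original object) can be sketched without pandas. In this illustration rows are plain dictionaries and `sort_rows_by_onset` is a hypothetical stand-in; comparing onsets as numbers rather than as text is an assumption of the sketch.

```python
def sort_rows_by_onset(rows):
    """Hypothetical stand-in: sort row dicts by numeric onset, if they have one."""
    if not rows or any("onset" not in row for row in rows):
        return rows  # no onset column: hand back the original object
    # sorted() is stable, so rows with equal onsets keep their relative order.
    return sorted(rows, key=lambda row: float(row["onset"]))


events = [{"onset": "10.5", "HED": "Blue"}, {"onset": "2", "HED": "Red"}]
ordered = sort_rows_by_onset(events)
no_onsets = [{"HED": "Green"}]
```

Sorting the onsets as text would place "10.5" before "2", which is why the sketch converts them to floats first.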
- split_delay_tags(series, hed_schema, onsets)
Sort the series based on Delay tags, so that the onsets are in order after the delays are applied.
- Parameters:
series (pd.Series or None) – The series of tags to split and sort.
hed_schema (HedSchema) – The schema to use to identify tags.
onsets (pd.Series or None) – The onset values for the rows of the series.
- Returns:
If onsets were provided, a dataframe with three columns:
- “HED”: The HED strings (still str).
- “onset”: The updated onsets.
- “original_index”: The original source line. Multiple rows can have the same original source line.
- Return type:
sorted_df (pd.DataFrame or None)
Note: This dataframe may be longer than the original series, but it will never be shorter.
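The sketch below shows, under simplifying assumptions, why the result can grow but never shrink: each top-level group carrying a Delay tag is moved to its own row with a shifted onset, while the remaining tags stay on a row at the original onset. The helpers `split_top_level` and `split_delays` are hypothetical, rows are plain dictionaries instead of a dataframe, the schema is ignored, and delays are assumed to be given in seconds.

```python
import re

# Assumption: delays look like "Delay/2.0 s" and are always in seconds.
DELAY = re.compile(r"Delay/(\d+(?:\.\d+)?)\s*s?", re.IGNORECASE)


def split_top_level(hed_string):
    """Split a HED string on commas that are not inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in hed_string:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return [part for part in parts if part]


def split_delays(hed_strings, onsets):
    """Hypothetical sketch: one row per delayed group, plus one per source row."""
    rows = []
    for index, (text, onset) in enumerate(zip(hed_strings, onsets)):
        undelayed = []
        for part in split_top_level(text):
            match = DELAY.search(part) if part.startswith("(") else None
            if match:
                rows.append({"HED": part,
                             "onset": float(onset) + float(match.group(1)),
                             "original_index": index})
            else:
                undelayed.append(part)
        # Every source row keeps a row of its own, so the output never shrinks.
        rows.append({"HED": ",".join(undelayed), "onset": float(onset),
                     "original_index": index})
    return sorted(rows, key=lambda row: row["onset"])


rows = split_delays(["Red, (Delay/2.0 s, (Blue))", "Green"], [1.0, 2.0])
```

In this example the two input rows become three output rows, and the delayed group from the first row ends up last because its onset moves from 1.0 to 3.0.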