df_util

Utilities for assembly and conversion of HED strings to different forms.

Functions

convert_to_form(df, hed_schema, tag_form[, ...])

Convert all tags in the underlying dataframe to the specified form (in place).

expand_defs(df, hed_schema, def_dict[, columns])

Expand any Def tags found in the dataframe (in place).

filter_series_by_onset(series, onsets)

Return the series, with rows that have the same onset combined.

process_def_expands(hed_strings, hed_schema)

Gather Def-expand tags in the strings and compare them with known definitions to find any differences.

replace_ref(text, oldvalue[, newvalue])

Replace a column ref in text with newvalue.

shrink_defs(df, hed_schema[, columns])

Shrink (in place) any def-expand tags found in the specified columns in the dataframe.

sort_dataframe_by_onsets(df)

Sort the dataframe by its onset column.

split_delay_tags(series, hed_schema, onsets)

Sort the series based on Delay tags so that the onsets are in order after the delay is applied.

convert_to_form(df, hed_schema, tag_form, columns=None)

Convert all tags in the underlying dataframe to the specified form (in place).

Parameters:
  • df (pd.DataFrame or pd.Series) – The dataframe or series to modify.

  • hed_schema (HedSchema) – The schema to use to convert tags.

  • tag_form (str) – HedTag property to convert tags to.

  • columns (list or None) – The columns to modify on the dataframe.
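The conversion can be pictured with a minimal, self-contained sketch. It is not the hedtools implementation: TOY_LONG_FORMS, convert_tag, and convert_rows are hypothetical names, the long-form paths are made up rather than taken from a real HED schema (the real function looks tags up in a HedSchema), a list of dicts stands in for the dataframe, the form names "short_tag" and "long_tag" are assumed, the short form is taken to be the last path segment, and tags that carry values are not handled.

```python
import re

# Hypothetical toy schema: short tag name -> long form.
# These paths are illustrative only and are NOT taken from a real HED schema.
TOY_LONG_FORMS = {
    "Ball": "Item/Object/Ball",
    "Red": "Property/Color/Red",
}

# In this sketch a tag token is a run of letters, digits, '_', '-' or '/'.
# Tags that carry values (with spaces or units) are not handled.
TAG_TOKEN = re.compile(r"[\w/-]+")


def convert_tag(tag, tag_form):
    """Convert one value-free tag to 'short_tag' or 'long_tag' form."""
    short = tag.rsplit("/", 1)[-1]  # the short form is taken to be the last path segment
    if tag_form == "short_tag":
        return short
    if tag_form == "long_tag":
        return TOY_LONG_FORMS.get(short, tag)  # unknown tags are left unchanged
    raise ValueError(f"unsupported tag_form: {tag_form!r}")


def convert_rows(rows, tag_form, columns):
    """Convert every tag in the given columns of each row dict, in place."""
    for row in rows:
        for column in columns:
            row[column] = TAG_TOKEN.sub(lambda m: convert_tag(m.group(0), tag_form), row[column])


rows = [{"onset": "1.0", "HED": "Red, (Ball)"}]
convert_rows(rows, "long_tag", ["HED"])
print(rows[0]["HED"])  # Property/Color/Red, (Item/Object/Ball)
```

Only the listed columns are touched, which mirrors the role of the columns parameter; commas, spaces, and parentheses pass through unchanged because they never match a tag token.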

expand_defs(df, hed_schema, def_dict, columns=None)

Expand any Def tags found in the dataframe (in place).

Parameters:
  • df (pd.DataFrame or pd.Series) – The dataframe or series to modify.

  • hed_schema (HedSchema or None) – The schema to use to identify defs.

  • def_dict (DefinitionDict) – The definitions to expand.

  • columns (list or None) – The columns to modify on the dataframe.
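Expansion replaces each Def/<name> tag with a (Def-expand/<name>, <contents>) group built from the matching definition. The sketch below illustrates this on a plain list of strings. It assumes a plain dict keyed by lowercase definition name in place of a DefinitionDict, does not handle definitions with placeholder (#) values, and uses hypothetical names (DEF_TAG, expand_defs_sketch) that are not part of hedtools.

```python
import re

# Matches a Def tag and captures the definition name, e.g. "Def/MyDef".
DEF_TAG = re.compile(r"\bDef/([\w-]+)")


def expand_defs_sketch(hed_strings, def_dict):
    """Expand each Def/<name> into (Def-expand/<name>, <contents>), in place.

    def_dict is a hypothetical stand-in for a DefinitionDict: it maps a
    lowercase definition name to its contents group, e.g. "(Red, Square)".
    Definitions with placeholder values (#) are not handled.
    """
    def expand(match):
        name = match.group(1)
        contents = def_dict.get(name.lower())
        if contents is None:
            return match.group(0)  # unknown definition: leave the Def tag as-is
        return f"(Def-expand/{name}, {contents})"

    for i, hed in enumerate(hed_strings):
        hed_strings[i] = DEF_TAG.sub(expand, hed)


column = ["Def/MyDef, Onset", "Blue"]
expand_defs_sketch(column, {"mydef": "(Red, Square)"})
print(column[0])  # (Def-expand/MyDef, (Red, Square)), Onset
```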

filter_series_by_onset(series, onsets)

Return the series, with rows that have the same onset combined.

Parameters:
  • series (pd.Series or pd.DataFrame) – The series to filter. If a dataframe is given, its “HED” column is filtered.

  • onsets (pd.Series) – The onset column to filter by.

Returns:

The series with rows that share an onset combined.

Return type:

pd.Series or pd.DataFrame
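A minimal sketch of combining rows by onset, using plain lists instead of pandas objects. The details are assumptions, not documented behavior: filter_by_onset_sketch is a hypothetical name, combined strings are joined with ",", onsets are compared as floats, and the result keeps the input length by placing each combined string on the first row with that onset and leaving the other rows empty.

```python
def filter_by_onset_sketch(hed_strings, onsets):
    """Combine HED strings whose rows share an onset.

    The combined string is placed on the first row with that onset and the
    other rows become empty, so the result has the same length as the input.
    """
    result = [""] * len(hed_strings)
    first_row = {}  # onset value -> index of the first row with that onset
    for i, (hed, onset) in enumerate(zip(hed_strings, onsets)):
        j = first_row.setdefault(float(onset), i)
        if hed:  # skip empty strings so no stray commas appear
            result[j] = f"{result[j]},{hed}" if result[j] else hed
    return result


print(filter_by_onset_sketch(["Red", "Blue", "Square"], ["1.0", "1", "2.5"]))
# ['Red,Blue', '', 'Square']
```

Keeping the output the same length keeps it aligned with the original row indices, which is one reasonable design choice; returning only the combined rows would be another.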

process_def_expands(hed_strings, hed_schema, known_defs=None, ambiguous_defs=None)

Gather Def-expand tags in the strings and compare them with known definitions to find any differences.

Parameters:
  • hed_strings (list or pd.Series) – A list of HED strings to process.

  • hed_schema (HedSchema) – The schema to use.

  • known_defs (DefinitionDict or list or str or None) – A DefinitionDict or anything its constructor takes. These are the known definitions going in, that must match perfectly.

  • ambiguous_defs (dict) – A dictionary containing ambiguous definitions. The format is TBD; currently the keys are definition names and the values are lists of lists of HED tags.

Returns:

A tuple containing the DefinitionDict, ambiguous definitions, and errors.

Return type:

tuple
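The gather-and-compare step can be sketched on plain strings as follows. This is an illustration under simplifying assumptions, not the library's algorithm: find_def_expands and process_def_expands_sketch are hypothetical names, plain dicts stand in for DefinitionDict, contents are compared as exact strings, and a name counts as ambiguous when it appears with more than one distinct contents group (the library's own criteria may differ).

```python
import re

# Matches the start of a Def-expand group and captures the definition name.
DEF_EXPAND_START = re.compile(r"\(Def-expand/([\w-]+),\s*")


def find_def_expands(hed):
    """Yield (name, contents) for each '(Def-expand/<name>, (<contents>))' group."""
    for match in DEF_EXPAND_START.finditer(hed):
        start = match.end()
        if start >= len(hed) or hed[start] != "(":
            continue  # no contents group follows the Def-expand tag
        depth = 0
        for end in range(start, len(hed)):
            if hed[end] == "(":
                depth += 1
            elif hed[end] == ")":
                depth -= 1
                if depth == 0:
                    yield match.group(1), hed[start:end + 1]
                    break


def process_def_expands_sketch(hed_strings, known_defs=None):
    """Return (definitions, ambiguous, errors) gathered from Def-expand groups."""
    known = {name.lower(): contents for name, contents in (known_defs or {}).items()}
    seen = {}  # lowercase name -> distinct contents strings found in the data
    errors = []
    for row, hed in enumerate(hed_strings):
        for name, contents in find_def_expands(hed):
            key = name.lower()
            if key in known:
                if known[key] != contents:  # exact string comparison only
                    errors.append(f"row {row}: Def-expand/{name} differs from the known definition")
                continue
            versions = seen.setdefault(key, [])
            if contents not in versions:
                versions.append(contents)
    definitions = dict(known)
    definitions.update({key: v[0] for key, v in seen.items() if len(v) == 1})
    ambiguous = {key: v for key, v in seen.items() if len(v) > 1}
    return definitions, ambiguous, errors
```

As with the documented function, the result is a tuple of definitions, ambiguous definitions, and errors; here known definitions must match exactly, echoing the requirement that they "must match perfectly".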

replace_ref(text, oldvalue, newvalue='n/a')

Replace a column ref in text with newvalue. If newvalue is “n/a”, remove the ref and delete any extra commas or parentheses it leaves behind.

Parameters:
  • text (str) – The input string containing the ref enclosed in curly braces.

  • oldvalue (str) – The full tag or ref to replace.

  • newvalue (str) – The replacement value for the ref.

Returns:

The modified string with the ref replaced or removed.

Return type:

str
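The replace-or-remove behavior can be illustrated with a regex-based sketch. It is not the library's code: replace_ref_sketch is a hypothetical name, and the cleanup rules (drop empty groups, commas before a comma or ")", commas right after "(", and leading or trailing commas) are this sketch's own assumptions about what "extra commas/parentheses" means.

```python
import re


def replace_ref_sketch(text, oldvalue, newvalue="n/a"):
    """Replace oldvalue (e.g. '{response}') in text with newvalue.

    If newvalue is 'n/a', the ref is removed and any commas or empty
    parentheses it leaves behind are cleaned up.
    """
    if newvalue != "n/a":
        return text.replace(oldvalue, newvalue)
    text = text.replace(oldvalue, "")
    while True:  # repeat until no cleanup rule changes the string
        cleaned = re.sub(r"\(\s*\)", "", text)          # drop empty groups
        cleaned = re.sub(r",\s*(?=,|\))", "", cleaned)  # drop a comma before ',' or ')'
        cleaned = re.sub(r"\(\s*,\s*", "(", cleaned)    # drop a comma right after '('
        if cleaned == text:
            break
        text = cleaned
    # Finally drop a leading or trailing comma left at the ends of the string.
    return re.sub(r"^\s*,\s*|\s*,\s*$", "", text).strip()


print(replace_ref_sketch("Red, {response}, Blue", "{response}"))         # Red, Blue
print(replace_ref_sketch("(Red, {response})", "{response}", "Label/Go"))  # (Red, Label/Go)
```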

shrink_defs(df, hed_schema, columns=None)

Shrink (in place) any def-expand tags found in the specified columns in the dataframe.

Parameters:
  • df (pd.DataFrame or pd.Series) – The dataframe or series to modify.

  • hed_schema (HedSchema or None) – The schema to use to identify defs.

  • columns (list or None) – The columns to modify on the dataframe.
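Shrinking is the reverse of expansion: each (Def-expand/<name>, <contents>) group collapses back to Def/<name>. The sketch below finds the end of each group by counting parentheses. It works on a plain list of strings, and shrink_defs_sketch is a hypothetical name, not the hedtools API; unlike the real function, it needs no schema.

```python
import re


def shrink_defs_sketch(hed_strings):
    """Replace each (Def-expand/<name>, <contents>) group with Def/<name>, in place."""
    pattern = re.compile(r"\(Def-expand/([\w-]+)")
    for i, hed in enumerate(hed_strings):
        while (match := pattern.search(hed)) is not None:
            depth = 0
            for end in range(match.start(), len(hed)):
                if hed[end] == "(":
                    depth += 1
                elif hed[end] == ")":
                    depth -= 1
                    if depth == 0:
                        break  # found the parenthesis closing the Def-expand group
            else:
                break  # unbalanced parentheses: leave the rest of the string untouched
            hed = hed[:match.start()] + f"Def/{match.group(1)}" + hed[end + 1:]
        hed_strings[i] = hed


column = ["(Def-expand/MyDef, (Red, Square)), Onset", "Blue"]
shrink_defs_sketch(column)
print(column)  # ['Def/MyDef, Onset', 'Blue']
```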

sort_dataframe_by_onsets(df)

Sort the dataframe by its onset column.

Parameters:
  • df (pd.DataFrame) – The dataframe to sort.

Returns:

The sorted dataframe, or the original dataframe if it didn’t have an onset column.
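The documented contract (sort by onset, or hand back the original when there is no onset column) fits in a few lines. In this sketch a list of row dicts stands in for the dataframe and sort_rows_by_onset is a hypothetical name. Rows with equal onsets keep their original order because Python's sort is stable; whether the library guarantees the same is not stated. Non-numeric onsets such as "n/a" would raise ValueError here.

```python
def sort_rows_by_onset(rows):
    """Return rows sorted by their numeric 'onset' value.

    If the rows have no 'onset' column, the original list is returned unchanged.
    Python's sort is stable, so rows with equal onsets keep their original order.
    """
    if not rows or "onset" not in rows[0]:
        return rows
    return sorted(rows, key=lambda row: float(row["onset"]))


rows = [{"onset": "2.0", "HED": "B"}, {"onset": "0.5", "HED": "A"}, {"onset": "2.0", "HED": "C"}]
print([row["HED"] for row in sort_rows_by_onset(rows)])  # ['A', 'B', 'C']
```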

split_delay_tags(series, hed_schema, onsets)

Sort the series based on Delay tags so that the onsets are in order after the delay is applied.

Parameters:
  • series (pd.Series or None) – The series of HED strings to split and sort.

  • hed_schema (HedSchema) – The schema to use to identify tags.

  • onsets (pd.Series or None) – The onset of each row in series.

Returns:

If onsets were provided, a dataframe with three columns:

  • “HED”: the HED strings (still str).

  • “onset”: the updated onsets.

  • “original_index”: the original source row. Multiple rows can share the same original source row.

Return type:

sorted_df (pd.DataFrame or None)

Note: The returned dataframe may have more rows than the original series, but never fewer.
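A minimal sketch of the split-and-sort idea, using plain lists and dicts instead of pandas. Several points are assumptions rather than documented behavior: a delayed group is taken to be a top-level group that starts with a Delay tag (for example "(Delay/1.5 s, (Green))"), its value is read as seconds, the whole group moves to a new row at onset plus delay, and ties are broken by original_index. The names split_top_level and split_delay_sketch are hypothetical.

```python
import re

# Matches a top-level group that starts with a Delay tag, e.g. "(Delay/1.5 s, (Red))".
# The numeric value is assumed to be in seconds; the unit text is ignored.
DELAY_GROUP = re.compile(r"^\(\s*Delay/(\d*\.?\d+)")


def split_top_level(hed):
    """Split a HED string on commas that are not inside parentheses."""
    parts, current, depth = [], "", 0
    for ch in hed:
        if ch == "," and depth == 0:
            parts.append(current.strip())
            current = ""
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        current += ch
    parts.append(current.strip())
    return [part for part in parts if part]


def split_delay_sketch(hed_strings, onsets):
    """Move each Delay group to its own row at onset + delay, then sort by onset."""
    if hed_strings is None or onsets is None:
        return None
    rows = []
    for index, (hed, onset) in enumerate(zip(hed_strings, onsets)):
        onset = float(onset)
        undelayed, delayed = [], []
        for element in split_top_level(hed):
            match = DELAY_GROUP.match(element)
            if match:
                delayed.append({"HED": element, "onset": onset + float(match.group(1)),
                                "original_index": index})
            else:
                undelayed.append(element)
        # Every input row keeps one row, so the output is never shorter than the input.
        rows.append({"HED": ", ".join(undelayed), "onset": onset, "original_index": index})
        rows.extend(delayed)
    rows.sort(key=lambda row: (row["onset"], row["original_index"]))
    return rows


result = split_delay_sketch(["Red, (Delay/1.5 s, (Green))", "Blue"], ["0.0", "1.0"])
print([(row["HED"], row["onset"], row["original_index"]) for row in result])
```

The delayed group from row 0 lands after row 1 in the output, and its original_index still points back to row 0, which is why several output rows can share one source row.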