df_util

Utilities for assembly and conversion of HED strings to different forms.

Functions

`convert_to_form`(df, hed_schema, tag_form[, ...])	Convert all tags in underlying dataframe to the specified form (in place).
`expand_defs`(df, hed_schema, def_dict[, columns])	Expands any def tags found in the dataframe.
`filter_series_by_onset`(series, onsets)	Return the series, with rows that have the same onset combined.
`process_def_expands`(hed_strings, hed_schema)	Gather def-expand tags in the strings/compare with known definitions to find any differences.
`replace_ref`(text, oldvalue[, newvalue])	Replace the column ref oldvalue in text with newvalue.
`shrink_defs`(df, hed_schema[, columns])	Shrink (in place) any def-expand tags found in the specified columns in the dataframe.
`sort_dataframe_by_onsets`(df)	Sort the dataframe by its onset column, if it has one.
`split_delay_tags`(series, hed_schema, onsets)	Sorts the series based on Delay tags, so that the onsets are in order after delay is applied.

convert_to_form(df, hed_schema, tag_form, columns=None)

Convert all tags in underlying dataframe to the specified form (in place).

Parameters:

df (pd.DataFrame or pd.Series) – The dataframe or series to modify.
hed_schema (HedSchema) – The schema to use to convert tags.
tag_form (str) – HedTag property to convert tags to.
columns (list) – The columns to modify on the dataframe.

expand_defs(df, hed_schema, def_dict, columns=None)

Expands any def tags found in the dataframe. The conversion is done in place.

Parameters:

df (pd.DataFrame or pd.Series) – The dataframe or series to modify.
hed_schema (HedSchema or None) – The schema to use to identify defs.
def_dict (DefinitionDict) – The definitions to expand.
columns (list or None) – The columns to modify on the dataframe.

filter_series_by_onset(series, onsets)

Return the series, with rows that have the same onset combined.

Parameters:

series (pd.Series or pd.DataFrame) – The series to filter. If a dataframe is passed, its “HED” column is filtered.
onsets (pd.Series) – The onset column to filter by.

Returns:

The series, with rows that share an onset combined.

Return type:

pd.Series or pd.DataFrame
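
The combining step can be illustrated with plain pandas. The sketch below is not the library's implementation: `combine_rows_by_onset` is a hypothetical helper, and it assumes that "combined" means the strings of rows sharing an onset are joined with commas into the first such row while the remaining rows are blanked, so the series keeps its length.

```python
import pandas as pd


def combine_rows_by_onset(series, onsets):
    """Hypothetical sketch: merge the strings of rows that share an onset.

    Assumption: the merged string is stored in the first row of each group
    and the other rows of the group become empty strings.
    """
    # Onsets are often read from TSV files as strings, so compare them as numbers.
    numeric_onsets = pd.to_numeric(onsets, errors="coerce")
    result = series.copy()
    # Rows whose onset is NaN are skipped by groupby and stay untouched.
    for _, group in series.groupby(numeric_onsets):
        if len(group) < 2:
            continue
        non_empty = [text for text in group if text and text != "n/a"]
        result.loc[group.index[0]] = ",".join(non_empty)
        result.loc[group.index[1:]] = ""
    return result


hed = pd.Series(["Red", "Blue", "Green"])
onset_column = pd.Series([1.0, 1.0, 2.5])
combined = combine_rows_by_onset(hed, onset_column)
print(combined.tolist())  # ['Red,Blue', '', 'Green']
```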

process_def_expands(hed_strings, hed_schema, known_defs=None, ambiguous_defs=None)

Gather def-expand tags in the strings/compare with known definitions to find any differences.

Parameters:

hed_strings (list or pd.Series) – A list of HED strings to process.
hed_schema (HedSchema) – The schema to use.
known_defs (DefinitionDict or list or str or None) – A DefinitionDict or anything its constructor takes. These are the definitions known in advance; any def-expand tag that uses one of them must match it exactly.
ambiguous_defs (dict) – A dictionary containing ambiguous definitions. The format is not yet final; currently each key is a def name and each value is a list of lists of HED tags.

Returns:

A tuple containing the DefinitionDict, ambiguous definitions, and errors.

Return type:

tuple

replace_ref(text, oldvalue, newvalue='n/a')

Replace the column ref oldvalue in text with newvalue. If newvalue is n/a, the ref is removed and any commas or parentheses left over are deleted.

Parameters:

text (str) – The input string containing the ref enclosed in curly braces.
oldvalue (str) – The full tag or ref to replace
newvalue (str) – The replacement value for the ref.

Returns:

The modified string with the ref replaced or removed.

Return type:

str
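
The behavior described above can be approximated with the standard library. This is a minimal sketch, not the library's code: `replace_ref_sketch` is a hypothetical name, and the clean-up rules (drop empty groups, stray commas next to parentheses, and doubled commas) are assumptions about what "extra commas/parentheses" covers.

```python
import re


def replace_ref_sketch(text, oldvalue, newvalue="n/a"):
    """Hypothetical sketch: replace a ref such as "{column}" or remove it."""
    if newvalue != "n/a":
        # A real value is substituted directly.
        return text.replace(oldvalue, newvalue)

    # Remove the ref, then tidy up whatever punctuation it leaves behind.
    result = text.replace(oldvalue, "")
    previous = None
    while previous != result:
        previous = result
        result = re.sub(r"\(\s*\)", "", result)     # empty groups: "()"
        result = re.sub(r"\(\s*,\s*", "(", result)  # comma right after "("
        result = re.sub(r"\s*,\s*\)", ")", result)  # comma right before ")"
        result = re.sub(r",\s*,", ",", result)      # doubled commas
    # Strip commas and spaces left at either end of the string.
    return re.sub(r"^[\s,]+|[\s,]+$", "", result)


template = "Red, {stim_type}, (Blue, {stim_type})"
print(replace_ref_sketch(template, "{stim_type}", "Face"))  # Red, Face, (Blue, Face)
print(replace_ref_sketch(template, "{stim_type}"))          # Red, (Blue)
```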

shrink_defs(df, hed_schema, columns=None)

Shrink (in place) any def-expand tags found in the specified columns in the dataframe.

Parameters:

df (pd.DataFrame or pd.Series) – The dataframe or series to modify.
hed_schema (HedSchema or None) – The schema to use to identify defs.
columns (list or None) – The columns to modify on the dataframe.

sort_dataframe_by_onsets(df)

Sort the dataframe by its onset column, if it has one.

Parameters:

df (pd.DataFrame) – Dataframe to sort.

Returns:

The sorted dataframe, or the original dataframe if it didn’t have an onset column.
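
A rough pandas equivalent of this behavior is sketched below. `sort_by_onset_sketch` is a hypothetical helper, and it assumes the column is named "onset" and that its values may be stored as strings, so they are compared numerically rather than lexically.

```python
import pandas as pd


def sort_by_onset_sketch(df):
    """Hypothetical sketch: sort rows by a numeric view of the "onset" column."""
    if "onset" not in df.columns:
        # Nothing to sort by, so hand back the original dataframe.
        return df
    # Compare onsets as numbers but leave the stored values unchanged.
    numeric_onsets = pd.to_numeric(df["onset"], errors="coerce")
    # mergesort is stable, so rows with equal onsets keep their relative order.
    order = numeric_onsets.sort_values(kind="mergesort").index
    return df.loc[order]


events = pd.DataFrame({"onset": ["10.0", "2.5", "7"], "HED": ["A", "B", "C"]})
print(sort_by_onset_sketch(events)["HED"].tolist())  # ['B', 'C', 'A']
```

A plain string sort would place "10.0" before "2.5", which is why the sketch converts to numbers first.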

split_delay_tags(series, hed_schema, onsets)

Sorts the series based on Delay tags, so that the onsets are in order after delay is applied.

Parameters:

series (pd.Series or None) – The series of tags to split/sort.
hed_schema (HedSchema) – The schema to use to identify tags.
onsets (pd.Series or None) – The onsets that correspond to the rows of the series.

Returns:

If we had onsets, a dataframe with 3 columns:

“HED”: The HED strings (still str).
“onset”: The updated onsets.
“original_index”: The original source line. Multiple lines can have the same original source line.

Return type:

pd.DataFrame or None

Note: This dataframe may be longer than the original series, but it will never be shorter.
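
To show why the result can grow, here is a simplified sketch of the idea that uses no schema. It is not the library's algorithm: `split_delay_sketch` and `split_top_level` are hypothetical helpers, and the sketch assumes that a delayed group is a top-level parenthesized group containing a tag of the form Delay/N (optionally followed by "s"), that delays are in seconds, and that the group is kept intact in its new row.

```python
import re

import pandas as pd

# Assumed tag shape: "Delay/2.0" or "Delay/2.0 s" (seconds only).
DELAY_TAG = re.compile(r"Delay/(\d+(?:\.\d+)?)(?:\s*s)?", re.IGNORECASE)


def split_top_level(hed_string):
    """Split on commas that are not nested inside parentheses."""
    items, current, depth = [], [], 0
    for char in hed_string:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
        if char == "," and depth == 0:
            items.append("".join(current).strip())
            current = []
        else:
            current.append(char)
    items.append("".join(current).strip())
    return [item for item in items if item]


def split_delay_sketch(series, onsets):
    """Hypothetical sketch: move delayed groups to rows at onset + delay."""
    rows = []
    for original_index, (hed_string, onset) in enumerate(zip(series, onsets)):
        by_delay = {0.0: []}  # the undelayed row is always kept
        for item in split_top_level(hed_string):
            delay = 0.0
            if item.startswith("(") and item.endswith(")"):
                for tag in split_top_level(item[1:-1]):
                    match = DELAY_TAG.fullmatch(tag)
                    if match:
                        delay = float(match.group(1))
            by_delay.setdefault(delay, []).append(item)
        for delay in sorted(by_delay):
            rows.append({"HED": ", ".join(by_delay[delay]),
                         "onset": float(onset) + delay,
                         "original_index": original_index})
    result = pd.DataFrame(rows, columns=["HED", "onset", "original_index"])
    # Stable sort so rows with equal onsets keep their source order.
    return result.sort_values("onset", kind="mergesort").reset_index(drop=True)


hed = pd.Series(["Red, (Delay/2.0 s, (Blue))", "Green"])
split = split_delay_sketch(hed, pd.Series([0.0, 1.0]))
print(split["original_index"].tolist())  # [0, 1, 0]
```

In this example the first source line yields two rows, one at onset 0.0 and one at 2.0, so the output has three rows for two input lines.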