ontology_util

Utility functions for saving as an ontology or dataframe.

Functions

assign_hed_ids_section(df, unused_tag_ids)

Adds missing HedIds to dataframe.

convert_df_to_omn(dataframes)

Convert the dataframe format schema to omn format.

get_all_ids(df)

Returns a set of all unique HedIds in the dataframe.

get_prefixes(dataframes)

merge_dfs(dest_df, source_df)

Merges extra columns from source_df into dest_df, adding the extra columns from the ontology to the schema df.

update_dataframes_from_schema(dataframes, schema)

Write out schema as a dataframe, then merge in extra columns from dataframes.

assign_hed_ids_section(df, unused_tag_ids)[source]

Adds missing HedIds to dataframe.

Parameters:
  • df (pd.DataFrame) – The dataframe to add IDs to.

  • unused_tag_ids (set of int) – The possible HED IDs to assign from.
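
The idea of filling blank IDs from a pool of unused numbers can be illustrated with a small pure-Python sketch. This is not the library's implementation: it uses a list of dicts in place of a pandas dataframe, and the helper name `assign_missing_ids`, the `hedId` column name, and the `HED_` prefix with seven zero-padded digits are all assumptions made for illustration.

```python
def assign_missing_ids(rows, unused_ids, id_key="hedId"):
    """Fill blank IDs in place, drawing the smallest unused numbers first.

    Hypothetical helper: `rows` is a list of dicts standing in for dataframe rows.
    """
    available = sorted(unused_ids)  # sort so the assignment order is deterministic
    for row in rows:
        if not row.get(id_key):  # blank or missing ID
            if not available:
                raise ValueError("No unused IDs left to assign")
            # Assumed ID format: "HED_" followed by a 7-digit zero-padded number
            row[id_key] = f"HED_{available.pop(0):07d}"
    return rows


rows = [
    {"rdfs:label": "Event", "hedId": "HED_0012001"},
    {"rdfs:label": "Sensory-event", "hedId": ""},
    {"rdfs:label": "Agent-action", "hedId": ""},
]
assign_missing_ids(rows, {12003, 12002})
```

Rows that already carry an ID are left untouched; only the blank entries receive a number from the pool.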

convert_df_to_omn(dataframes)[source]

Convert the dataframe format schema to omn format.

Parameters:

dataframes (dict) – A set of dataframes representing a schema, potentially including extra columns

Returns:

omn_file (str): A combined string representing (most of) a schema omn file.

omn_data (dict): A dict of DF_SUFFIXES:str, representing each .tsv file in omn format.

Return type:

tuple

get_all_ids(df)[source]

Returns a set of all unique HedIds in the dataframe.

Parameters:

df (pd.DataFrame) – The dataframe

Returns:

None if the dataframe has no HED column, otherwise the set of all unique ID numbers.

Return type:

set or None
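
The behavior described here (None when the ID column is absent, otherwise a set of numbers) can be sketched in pure Python. This is only an illustration under stated assumptions, not the library's code: the helper name `collect_ids`, the `hedId` column name, and the `HED_` prefix are hypothetical, and a list of dicts stands in for the dataframe.

```python
def collect_ids(rows, id_key="hedId", prefix="HED_"):
    """Return the set of numeric IDs found in `rows`, or None if no row has the ID column.

    Hypothetical helper: `rows` is a list of dicts standing in for dataframe rows.
    """
    if not any(id_key in row for row in rows):
        return None  # the ID column does not exist at all
    ids = set()
    for row in rows:
        value = row.get(id_key) or ""
        number = value[len(prefix):]
        # Skip blank or malformed entries; keep only "<prefix><digits>"
        if value.startswith(prefix) and number.isdigit():
            ids.add(int(number))
    return ids


with_ids = collect_ids([
    {"hedId": "HED_0012001"},
    {"hedId": ""},
    {"hedId": "HED_0012001"},
    {"hedId": "HED_0012005"},
])
without_column = collect_ids([{"rdfs:label": "Event"}])
```

Duplicate and blank entries collapse naturally because the result is a set.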

get_prefixes(dataframes)[source]

merge_dfs(dest_df, source_df)[source]

Merges extra columns from source_df into dest_df, adding the extra columns from the ontology to the schema df.

Parameters:
  • dest_df – The dataframe to add extra columns to

  • source_df – The dataframe to get extra columns from
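
A rough pure-Python sketch of copying extra columns from one table into another is shown below. It is not the library's implementation. The helper name `merge_extra_columns` is hypothetical, rows are matched on an assumed `rdfs:label` key, and lists of dicts stand in for the two dataframes. The sketch also assumes that values already present in the destination are kept.

```python
def merge_extra_columns(dest_rows, source_rows, key="rdfs:label"):
    """Return copies of dest_rows with any columns they lack filled in from source_rows.

    Hypothetical helper: rows are matched on `key`; existing destination values win.
    """
    source_by_key = {row[key]: row for row in source_rows}
    merged = []
    for row in dest_rows:
        combined = dict(row)  # copy so the input rows are not mutated
        extra = source_by_key.get(row[key], {})
        for column, value in extra.items():
            combined.setdefault(column, value)  # only add columns that are missing
        merged.append(combined)
    return merged


dest = [{"rdfs:label": "Event", "hedId": "HED_0012001"}]
source = [{"rdfs:label": "Event", "hedId": "HED_9999999", "Notes": "from ontology"}]
merged = merge_extra_columns(dest, source)
```

In this sketch the destination keeps its own `hedId`, and only the `Notes` column is carried over from the source.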

update_dataframes_from_schema(dataframes, schema, schema_name='', get_as_ids=False, assign_missing_ids=False)[source]

Write out schema as a dataframe, then merge in extra columns from dataframes.

Parameters:
  • dataframes (dict) – A full set of schema spreadsheet formatted dataframes

  • schema (HedSchema) – The schema to write into the dataframes.

  • schema_name (str) – The name to use to find the schema id range.

  • get_as_ids (bool) – If True, replace all known references with HedIds

  • assign_missing_ids (bool) – If True, replace any blank (new) HedIds with valid ones.

Returns:

The updated dataframes. These dataframes can potentially have extra columns.

Return type:

dict of str: pd.DataFrame