ontology_util

Utility functions for saving as an ontology or dataframe.

Functions

assign_hed_ids_section(df, unused_tag_ids)

Adds missing HedIds to dataframe.

convert_df_to_omn(dataframes)

Convert the dataframe format schema to omn format.

create_empty_dataframes()

Returns the default empty dataframes.

get_all_ids(df)

Returns a set of all unique HedIds in the dataframe.

get_attributes_from_row(row)

Get the tag attributes from a line.

get_library_name_and_id(schema)

Get the library name ("Standard" for the standard schema) and the first ID of the schema's ID range.

merge_dfs(dest_df, source_df)

Merges extra columns from source_df into dest_df, adding the extra columns from the ontology to the schema df.

remove_prefix(text, prefix)

save_dataframes(base_filename, dataframe_dict)

Writes out the dataframes using the provided suffixes.

update_dataframes_from_schema(dataframes, schema)

Write out schema as a dataframe, then merge in extra columns from dataframes.

assign_hed_ids_section(df, unused_tag_ids)

Adds missing HedIds to dataframe.

Parameters:
  • df (pd.DataFrame) – The dataframe to add IDs to.

  • unused_tag_ids (set of int) – The available HedIds to assign from.
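
As a rough illustration of the idea (not the library's implementation), the sketch below fills blank IDs from a pool of unused integers. It uses plain dicts in place of dataframe rows. The helper name fill_missing_ids, the "hedId" key, and the ascending assignment order are all assumptions made for this example.

```python
def fill_missing_ids(rows, unused_ids, id_key="hedId"):
    """Assign an unused ID to every row whose ID is blank (None or "").

    Hypothetical helper: rows is a list of dicts standing in for dataframe rows.
    """
    available = sorted(unused_ids)  # assumption: hand out the lowest IDs first
    next_index = 0
    for row in rows:
        if row.get(id_key) in (None, ""):
            if next_index >= len(available):
                raise ValueError("Ran out of unused IDs")
            row[id_key] = available[next_index]
            next_index += 1
    return rows


example_rows = [
    {"name": "A", "hedId": 5},
    {"name": "B", "hedId": ""},
    {"name": "C", "hedId": None},
]
filled_rows = fill_missing_ids(example_rows, {12, 10, 11})
```

Rows that already carry an ID are left untouched; only the blank entries consume values from the pool.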

convert_df_to_omn(dataframes)

Convert the dataframe format schema to omn format.

Parameters:

dataframes (dict) – A set of dataframes representing a schema, potentially including extra columns

Returns:
  • omn_file (str) – A combined string representing (most of) a schema omn file.

  • omn_data (dict) – A dict of DF_SUFFIXES:str, representing each .tsv file in omn format.

Return type:

tuple

create_empty_dataframes()

Returns the default empty dataframes.

get_all_ids(df)

Returns a set of all unique HedIds in the dataframe.

Parameters:

df (pd.DataFrame) – The dataframe

Returns:

All unique HedId numbers as a set, or None if the dataframe has no hed column.

Return type:

set or None
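
The return contract above can be sketched as follows. This is an illustration only: it uses plain dicts instead of a dataframe, and the helper name collect_ids, the "hedId" key, and the "HED_" text prefix on stored IDs are assumptions, not details confirmed by this page.

```python
def collect_ids(rows, id_key="hedId", prefix="HED_"):
    """Return the set of unique numeric IDs, or None if no row has the ID key.

    Hypothetical helper: rows is a list of dicts standing in for dataframe rows.
    """
    if not any(id_key in row for row in rows):
        return None  # mirrors "None if this has no hed column"
    ids = set()
    for row in rows:
        value = row.get(id_key)
        if not value:
            continue  # skip blank IDs
        if value.startswith(prefix):
            value = value[len(prefix):]
        ids.add(int(value))
    return ids


found_ids = collect_ids([
    {"hedId": "HED_0000012"},
    {"hedId": "HED_0000012"},  # duplicate, counted once
    {"hedId": ""},
    {"hedId": "HED_0000100"},
])
no_column = collect_ids([{"name": "x"}])
```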

get_attributes_from_row(row)

Get the tag attributes from a line.

Parameters:

row (pd.Series) – A tag line.

Returns:

Dictionary of attributes.

Return type:

dict

get_library_name_and_id(schema)

Get the library name (“Standard” for the standard schema) and the first ID of the schema's ID range.

Parameters:

schema (HedSchema) – The schema to check

Returns:
  • library_name (str) – The capitalized library name.

  • first_id (int) – The first ID for the given library.

Return type:

tuple

merge_dfs(dest_df, source_df)

Merges extra columns from source_df into dest_df, adding the extra columns from the ontology to the schema df.

Parameters:
  • dest_df – The dataframe to add extra columns to

  • source_df – The dataframe to get extra columns from

remove_prefix(text, prefix)
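
The source gives no description for this function. Assuming it follows the usual meaning of the name (drop prefix from the start of text when present, otherwise return text unchanged), the behavior can be sketched as below. The helper name strip_leading_prefix and the sample strings are hypothetical.

```python
def strip_leading_prefix(text, prefix):
    """Return text without a leading prefix; unchanged if the prefix is absent.

    Hypothetical stand-in illustrating the assumed behavior of remove_prefix.
    """
    if prefix and text.startswith(prefix):
        return text[len(prefix):]
    return text


with_prefix = strip_leading_prefix("HED_0012345", "HED_")
without_prefix = strip_leading_prefix("0012345", "HED_")
```

On Python 3.9 and later, str.removeprefix provides the same behavior for strings.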

save_dataframes(base_filename, dataframe_dict)

Writes out the dataframes using the provided suffixes.

Does not validate contents or suffixes.

If base_filename has a .tsv suffix, save directly to the indicated location. If base_filename is a directory (does NOT have a .tsv suffix), save the contents into a directory of that name, with the files inside named after the directory, e.g. HED8.3.0/HED8.3.0_Tag.tsv.

Parameters:
  • base_filename (str) – The base filename to use. Output is {base_filename}_{suffix}.tsv. See DF_SUFFIXES for all expected names.

  • dataframe_dict (dict of str: pd.DataFrame) – The dataframes to save out. No validation is done.
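
The naming rule described above can be sketched as a small path calculation. This is one reading of the description, not the library's code: the helper planned_output_paths is hypothetical, it writes nothing to disk, and it assumes that a base_filename ending in .tsv has that extension dropped before the suffix is appended.

```python
import posixpath


def planned_output_paths(base_filename, suffixes):
    """Map each suffix to the path its dataframe would be written to.

    Hypothetical helper; uses POSIX-style separators for predictable output.
    """
    if base_filename.lower().endswith(".tsv"):
        # File case: write next to the indicated location.
        stem = base_filename[: -len(".tsv")]
    else:
        # Directory case: files inside are named after the directory.
        directory = base_filename.rstrip("/")
        stem = posixpath.join(directory, posixpath.basename(directory))
    return {suffix: f"{stem}_{suffix}.tsv" for suffix in suffixes}


directory_case = planned_output_paths("HED8.3.0", ["Tag"])
file_case = planned_output_paths("out/HED8.3.0.tsv", ["Tag"])
```

Under these assumptions the directory case yields HED8.3.0/HED8.3.0_Tag.tsv, matching the example in the description.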

update_dataframes_from_schema(dataframes, schema, schema_name='', get_as_ids=False, assign_missing_ids=False)

Write out schema as a dataframe, then merge in extra columns from dataframes.

Parameters:
  • dataframes (dict) – A full set of schema spreadsheet formatted dataframes

  • schema (HedSchema) – The schema to write into the dataframes.

  • schema_name (str) – The name to use to find the schema ID range.

  • get_as_ids (bool) – If True, replace all known references with HedIds.

  • assign_missing_ids (bool) – If True, replace any blank (new) HedIds with valid ones.

Returns:

The updated dataframes. These dataframes can potentially have extra columns.

Return type:

dict of str: pd.DataFrame