ontology_util

Utility functions for saving as an ontology or dataframe.

Functions

`assign_hed_ids_section`(df, unused_tag_ids)	Adds missing HedIds to the dataframe.
`convert_df_to_omn`(dataframes)	Convert the dataframe format schema to omn format.
`create_empty_dataframes`()	Returns the default empty dataframes.
`get_all_ids`(df)	Returns a set of all unique hedIds in the dataframe.
`get_attributes_from_row`(row)	Get the tag attributes from a line.
`get_library_name_and_id`(schema)	Get the library name ("Standard" for the standard schema) and the first id of its schema id range.
`merge_dfs`(dest_df, source_df)	Merges extra columns from source_df into dest_df, adding the extra columns from the ontology to the schema df.
`remove_prefix`(text, prefix)
`save_dataframes`(base_filename, dataframe_dict)	Writes out the dataframes using the provided suffixes.
`update_dataframes_from_schema`(dataframes, schema)	Write out the schema as dataframes, then merge in extra columns from the given dataframes.

assign_hed_ids_section(df, unused_tag_ids)

Adds missing HedIds to the dataframe.

Parameters:

df (pd.DataFrame) – The dataframe to add ids to.
unused_tag_ids (set of int) – The unused HED ids to assign from.
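A minimal sketch of the id-filling step, assuming ids live in a `hedId` column as strings of the form `HED_` plus seven digits, and that a blank cell is either an empty string or a missing value. `assign_missing_ids_sketch` is a hypothetical name, and handing out the smallest unused ids first is an illustrative choice, not necessarily what `assign_hed_ids_section` does.

```python
import pandas as pd

# Assumed column name and id format; neither is confirmed by this page.
HED_ID_COLUMN = "hedId"
ID_PREFIX = "HED_"


def assign_missing_ids_sketch(df: pd.DataFrame, unused_tag_ids: set) -> None:
    """Hypothetical sketch: fill blank id cells in place, smallest unused id first."""
    column = df[HED_ID_COLUMN]
    # Treat both missing values (None/NaN) and empty strings as blank.
    blank = column.isna() | (column.astype(str).str.strip() == "")
    blank_rows = df.index[blank.to_numpy()]
    available = sorted(unused_tag_ids)
    if len(blank_rows) > len(available):
        raise ValueError("Not enough unused ids to fill every blank row")
    for row_label, new_id in zip(blank_rows, available):
        df.at[row_label, HED_ID_COLUMN] = f"{ID_PREFIX}{new_id:07d}"
        unused_tag_ids.discard(new_id)  # an id can only be handed out once


df = pd.DataFrame({HED_ID_COLUMN: ["HED_0012001", "", None]})
ids = {12005, 12003, 12010}
assign_missing_ids_sketch(df, ids)
print(df[HED_ID_COLUMN].tolist())  # ['HED_0012001', 'HED_0012003', 'HED_0012005']
print(ids)  # {12010}
```

The set of unused ids is consumed as ids are assigned, so calling the sketch again on another section cannot reuse an id.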

convert_df_to_omn(dataframes)

Convert the dataframe-format schema to OMN (OWL Manchester Syntax) format.

Parameters: dataframes (dict) – A set of dataframes representing a schema, potentially including extra columns.
Returns: omn_file (str): a combined string representing most of a schema .omn file; omn_data (dict): a dict mapping each DF_SUFFIXES key to a str holding that .tsv file's content in OMN format.
Return type: tuple

create_empty_dataframes()

Returns the default empty dataframes.

get_all_ids(df)

Returns a set of all unique hedIds in the dataframe.

Parameters: df (pd.DataFrame) – The dataframe.
Returns: None if the dataframe has no hedId column; otherwise, a set of all unique id numbers.
Return type: set of int, or None
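A sketch of how the unique ids might be collected, assuming ids are stored in a `hedId` column with a `HED_` prefix (both are assumptions, not confirmed here). `get_all_ids_sketch` is a hypothetical name; values that do not parse as numbers, such as blank cells, are skipped.

```python
from typing import Optional, Set

import pandas as pd

HED_ID_COLUMN = "hedId"  # assumed column name
ID_PREFIX = "HED_"  # assumed id prefix


def get_all_ids_sketch(df: pd.DataFrame) -> Optional[Set[int]]:
    """Hypothetical sketch: collect the numeric part of every id in the dataframe."""
    if HED_ID_COLUMN not in df.columns:
        return None
    # Drop the prefix, then coerce to numbers; blanks and junk become NaN.
    digits = df[HED_ID_COLUMN].astype(str).str.replace(f"^{ID_PREFIX}", "", regex=True)
    numbers = pd.to_numeric(digits, errors="coerce").dropna()
    return {int(n) for n in numbers}


df = pd.DataFrame({HED_ID_COLUMN: ["HED_0012001", "", None, "HED_0012001", "HED_0000042"]})
print(sorted(get_all_ids_sketch(df)))  # [42, 12001]
```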

get_attributes_from_row(row)

Get the tag attributes from a row.

Parameters: row (pd.Series) – A row describing a single tag.
Returns: A dictionary of attributes.
Return type: dict

get_library_name_and_id(schema)

Get the library name (“Standard” for the standard schema) and the first id of its schema id range.

Parameters: schema (HedSchema) – The schema to check.
Returns: A tuple (library_name, first_id): library_name (str) is the capitalized library name, and first_id (int) is the first id for that library.
Return type: tuple

merge_dfs(dest_df, source_df)

Merges extra columns from source_df into dest_df, adding the extra columns from the ontology to the schema df.

Parameters:

dest_df – The dataframe to add extra columns to.
source_df – The dataframe to get extra columns from.
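The page does not say how rows are matched between the two dataframes. The sketch below assumes a hypothetical key column, `rdfs:label`, present and unique in both, and returns a new dataframe instead of modifying `dest_df` in place; the real `merge_dfs` may differ on both counts. `merge_dfs_sketch` is a hypothetical name.

```python
import pandas as pd

KEY_COLUMN = "rdfs:label"  # hypothetical join key, not confirmed by this page


def merge_dfs_sketch(dest_df: pd.DataFrame, source_df: pd.DataFrame,
                     key: str = KEY_COLUMN) -> pd.DataFrame:
    """Hypothetical sketch: append source_df's extra columns to dest_df, matched on key."""
    extra = [c for c in source_df.columns if c not in dest_df.columns]
    if not extra:
        return dest_df.copy()
    # A left join keeps every dest row in its original order; unmatched rows get NaN.
    # Columns present in both frames keep dest_df's values.
    return dest_df.merge(source_df[[key] + extra], on=key, how="left")


schema_df = pd.DataFrame({"rdfs:label": ["Event", "Agent"],
                          "hedId": ["HED_0012001", "HED_0012002"]})
ontology_df = pd.DataFrame({"rdfs:label": ["Agent", "Event"],
                            "hedId": ["HED_0012002", "HED_0012001"],
                            "notes": ["an agent", "an event"]})
print(merge_dfs_sketch(schema_df, ontology_df)["notes"].tolist())  # ['an event', 'an agent']
```

A left join is used so that every schema row survives even when the ontology has no matching entry.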

remove_prefix(text, prefix)
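No description is given for `remove_prefix`. Judging only by its name and parameters, it presumably strips `prefix` from the start of `text`; the hypothetical `remove_prefix_sketch` below implements that assumed behavior and passes non-string values (for example an empty dataframe cell) through unchanged.

```python
from typing import Any


def remove_prefix_sketch(text: Any, prefix: str) -> Any:
    """Hypothetical sketch: drop a leading prefix from strings; return anything else as-is."""
    if isinstance(text, str) and text.startswith(prefix):
        return text[len(prefix):]
    return text


print(remove_prefix_sketch("HED_0012001", "HED_"))  # 0012001
print(remove_prefix_sketch("Event", "HED_"))  # Event
```

On Python 3.9 and later, `str.removeprefix` covers the plain-string case; the sketch adds only the pass-through for non-strings.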

save_dataframes(base_filename, dataframe_dict)

Writes out the dataframes using the provided suffixes.

Does not validate contents or suffixes.

If base_filename has a .tsv suffix, the files are saved directly to the indicated location. If base_filename is a directory (that is, it does NOT have a .tsv suffix), the files are saved into a directory with that name, and each file inside it is named after the directory, e.g. HED8.3.0/HED8.3.0_Tag.tsv.

Parameters:

base_filename (str) – The base filename to use. Output is {base_filename}_{suffix}.tsv; see DF_SUFFIXES for all expected names.
dataframe_dict (dict of str: pd.DataFrame) – The dataframes to save, keyed by suffix. No validation is done.
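The file-naming rule can be sketched as a pure path computation. `output_paths_sketch` is a hypothetical helper, not part of the library; it assumes that a `.tsv` base has its extension stripped before `_{suffix}.tsv` is appended, and it only computes paths, whereas `save_dataframes` also writes the files.

```python
import os
from typing import Dict, Iterable


def output_paths_sketch(base_filename: str, suffixes: Iterable[str]) -> Dict[str, str]:
    """Hypothetical sketch: map each suffix to its output .tsv path."""
    if base_filename.lower().endswith(".tsv"):
        # "out/HED8.3.0.tsv" -> files written alongside it, e.g. "out/HED8.3.0_Tag.tsv"
        stem = base_filename[: -len(".tsv")]
    else:
        # "HED8.3.0" is treated as a directory -> "HED8.3.0/HED8.3.0_Tag.tsv"
        base_dir = os.path.normpath(base_filename)  # also drops a trailing separator
        stem = os.path.join(base_dir, os.path.basename(base_dir))
    return {suffix: f"{stem}_{suffix}.tsv" for suffix in suffixes}


print(output_paths_sketch("HED8.3.0", ["Tag"]))  # on POSIX: {'Tag': 'HED8.3.0/HED8.3.0_Tag.tsv'}
```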

update_dataframes_from_schema(dataframes, schema, schema_name='', get_as_ids=False, assign_missing_ids=False)

Write out the schema as dataframes, then merge in any extra columns from the given dataframes.

Parameters:

dataframes (dict) – A full set of schema dataframes in spreadsheet format.
schema (HedSchema) – The schema to write into the dataframes.
schema_name (str) – The name to use to find the schema id range.
get_as_ids (bool) – If True, replace all known references with HedIds.
assign_missing_ids (bool) – If True, replace any blank (new) HedIds with valid ones.

Returns:

The updated dataframes. These dataframes can potentially have extra columns.

Return type:

dict of str: pd.DataFrame