ontology_util
Utility functions for saving as an ontology or dataframe.
Functions
assign_hed_ids_section – Adds missing HedIds to dataframe.
convert_df_to_omn – Convert the dataframe format schema to omn format.
Returns the default empty dataframes.
get_all_ids – Returns a set of all unique hedIds in the dataframe.
get_attributes_from_row – Get the tag attributes from a line.
get_library_name_and_id – Get the library ("Standard" for the standard schema) and first id for a schema range.
merge_dfs – Merges extra columns from source_df into dest_df, adding the extra columns from the ontology to the schema df.
save_dataframes – Writes out the dataframes using the provided suffixes.
update_dataframes_from_schema – Write out schema as a dataframe, then merge in extra columns from dataframes.
- assign_hed_ids_section(df, unused_tag_ids)
Adds missing HedIds to dataframe.
- Parameters:
df (pd.DataFrame) – The dataframe to add HedIds to.
unused_tag_ids (set of int) – The pool of unused HedIds to assign from.
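The idea of filling blank ids from a pool of unused ones can be sketched with the standard library alone. This is an illustrative sketch, not the library's implementation: plain dicts stand in for dataframe rows, and the helper name `assign_missing_ids`, the `hedId` key, the `HED_` prefix with seven digits, and the smallest-id-first order are all assumptions made for the example.

```python
def assign_missing_ids(rows, unused_ids, id_key="hedId", prefix="HED_"):
    """Fill blank ids in place, taking the smallest unused id each time.

    rows: list of dicts standing in for dataframe rows (assumption).
    unused_ids: set of int; ids that get assigned are removed from it.
    """
    pool = sorted(unused_ids)
    for row in rows:
        if not row.get(id_key):
            if not pool:
                raise ValueError("No unused ids left to assign")
            new_id = pool.pop(0)
            # Assumed id format: prefix plus a zero-padded 7-digit number.
            row[id_key] = f"{prefix}{new_id:07d}"
            unused_ids.discard(new_id)
    return rows


rows = [
    {"name": "Event", "hedId": "HED_0012001"},
    {"name": "Sensory-event", "hedId": ""},
    {"name": "Agent-action", "hedId": ""},
]
unused = {12002, 12003, 12004}
assign_missing_ids(rows, unused)
```

Rows that already carry an id are left untouched, so the operation is safe to repeat on a partially filled table.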
- convert_df_to_omn(dataframes)
Convert the dataframe format schema to omn format.
- Parameters:
dataframes (dict) – A set of dataframes representing a schema, potentially including extra columns
- Returns:
omn_file (str): A combined string representing (most of) a schema omn file.
omn_data (dict): A dict of DF_SUFFIXES:str, representing each .tsv file in omn format.
- Return type:
tuple
- get_all_ids(df)
Returns a set of all unique hedIds in the dataframe.
- Parameters:
df (pd.DataFrame) – The dataframe
- Returns:
numbers (set or None): None if the dataframe has no hedId column, otherwise all unique id numbers as a set.
- Return type:
set or None
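The return contract described above (None when the id column is absent, otherwise a set of unique numbers) can be sketched as follows. This is a hedged illustration rather than the library's code: a dict of column lists stands in for the dataframe, and the helper name `collect_ids`, the `hedId` column name, and the `HED_` prefix are assumptions.

```python
def collect_ids(table, id_column="hedId", prefix="HED_"):
    """Return the unique numeric ids in a column, or None if the column is absent.

    table: dict mapping column name -> list of cell strings (assumption).
    """
    if id_column not in table:
        return None
    ids = set()
    for value in table[id_column]:
        # Skip blank cells and anything not in the assumed "HED_<digits>" form.
        if value and value.startswith(prefix):
            ids.add(int(value[len(prefix):]))
    return ids


with_ids = collect_ids({"hedId": ["HED_0012001", "", "HED_0012001", "HED_0012005"]})
without_column = collect_ids({"name": ["Event"]})
```

Returning None rather than an empty set lets a caller distinguish "no id column at all" from "an id column with no ids filled in yet".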
- get_attributes_from_row(row)
Get the tag attributes from a line.
- Parameters:
row (pd.Series) – A tag line.
- Returns:
Dictionary of attributes.
- Return type:
dict
- get_library_name_and_id(schema)
Get the library name ("Standard" for the standard schema) and the first id for a schema range.
- Parameters:
schema (HedSchema) – The schema to check
- Returns:
The capitalized library name first_id(int): the first id for a given library
- Return type:
library_name(str)
- merge_dfs(dest_df, source_df)
Merges extra columns from source_df into dest_df, adding the extra columns from the ontology to the schema df.
- Parameters:
dest_df – The dataframe to add extra columns to
source_df – The dataframe to get extra columns from
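As a rough illustration of merging extra columns from one table into another, the sketch below uses lists of dicts in place of dataframes. It is not the library's implementation: the helper name `merge_extra_columns`, matching rows on a `hedId` key, the `annotation` column, and filling unmatched rows with an empty string are all assumptions made for the example.

```python
def merge_extra_columns(dest_rows, source_rows, key="hedId"):
    """Copy columns that exist only in source_rows into dest_rows, matching on key.

    Columns already present in dest_rows are never overwritten.
    """
    dest_columns = {col for row in dest_rows for col in row}
    # Columns found only in the source, in first-seen order.
    extra_columns = []
    for row in source_rows:
        for col in row:
            if col not in dest_columns and col not in extra_columns:
                extra_columns.append(col)
    source_by_key = {row[key]: row for row in source_rows if row.get(key)}
    for dest in dest_rows:
        match = source_by_key.get(dest.get(key), {})
        for col in extra_columns:
            # Unmatched rows still get the column, filled with an empty string.
            dest[col] = match.get(col, "")
    return dest_rows


dest = [
    {"hedId": "HED_0012001", "name": "Event"},
    {"hedId": "HED_0012002", "name": "Sensory-event"},
]
source = [{"hedId": "HED_0012001", "name": "Old-name", "annotation": "note"}]
merge_extra_columns(dest, source)
```

Only the extra column is carried over; the `name` value in the destination wins over the one in the source.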
- save_dataframes(base_filename, dataframe_dict)
Writes out the dataframes using the provided suffixes.
Does not validate contents or suffixes.
If base_filename has a .tsv suffix, save directly to the indicated location. If base_filename is a directory (does NOT have a .tsv suffix), save the contents into a directory of that name, with each file named after the directory, e.g. HED8.3.0/HED8.3.0_Tag.tsv.
- Parameters:
base_filename (str) – The base filename to use. Output is {base_filename}_{suffix}.tsv. See DF_SUFFIXES for all expected names.
dataframe_dict (dict of str: pd.DataFrame) – The dataframes to save, keyed by suffix. No validation is done.
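The naming rule above can be sketched as a small path helper. This is one interpretation of the description, not the library's code: the function name `output_path` is hypothetical, the .tsv extension is assumed to be stripped before the suffix is appended, and `posixpath` is used so the separator is always a forward slash.

```python
import posixpath


def output_path(base_filename, suffix):
    """Build the output path for one suffix, following the rule described above."""
    if base_filename.endswith(".tsv"):
        # File form: drop the extension, then append _{suffix}.tsv in place.
        stem = base_filename[: -len(".tsv")]
    else:
        # Directory form: files live inside the directory and reuse its name.
        folder = base_filename.rstrip("/")
        stem = posixpath.join(folder, posixpath.basename(folder))
    return f"{stem}_{suffix}.tsv"


directory_form = output_path("HED8.3.0", "Tag")
file_form = output_path("out/HED8.3.0.tsv", "Tag")
```

With these inputs the directory form reproduces the HED8.3.0/HED8.3.0_Tag.tsv example from the description.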
- update_dataframes_from_schema(dataframes, schema, schema_name='', get_as_ids=False, assign_missing_ids=False)
Write out schema as a dataframe, then merge in extra columns from dataframes.
- Parameters:
dataframes (dict) – A full set of schema spreadsheet formatted dataframes
schema (HedSchema) – The schema to write into the dataframes.
schema_name (str) – The name to use to find the schema id range.
get_as_ids (bool) – If True, replace all known references with HedIds.
assign_missing_ids (bool) – If True, replace any blank (new) HedIds with valid ones.
- Returns:
dataframes (dict of str: pd.DataFrame): The updated dataframes. These dataframes can potentially have extra columns.
- Return type:
dict of str: pd.DataFrame