df_util

Functions

calculate_attribute_type(attribute_entry)

Returns the type of this attribute (annotation, object, or data).

convert_filenames_to_dict(filenames)

Infers filename meaning based on suffix, e.g. _Tag for the tags sheet.

create_empty_dataframes()

Returns the default empty dataframes

get_attributes_from_row(row)

Get the tag attributes from a line.

get_library_name_and_id(schema)

Get the library ("Standard" for the standard schema) and first id for a schema range.

load_dataframes(filenames)

Load the dataframes from the source folder or series of files.

merge_dataframe_dicts(df_dict1, df_dict2[, ...])

Create a new dictionary of DataFrames where dict2 is merged into dict1.

merge_dataframes(df1, df2, key)

Create a new dataframe where df2 is merged into df1 and duplicates are eliminated.

remove_prefix(text, prefix)

save_dataframes(base_filename, dataframe_dict)

Writes out the dataframes using the provided suffixes.

calculate_attribute_type(attribute_entry)

Returns the type of this attribute (annotation, object, or data).

Returns:

“annotation”, “object”, or “data”.

Return type:

str

convert_filenames_to_dict(filenames)

Infers filename meaning based on suffix, e.g. _Tag for the tags sheet.

Parameters:

filenames (str or None or list or dict) – The filenames to convert to a dict. If a string with a .tsv suffix: save to that location, adding the suffix to each .tsv file. If a string with no .tsv suffix: save to that folder, with the contents being the separate .tsv files.

Returns:

The required suffix-to-filename mapping.

Return type:

dict of str: str
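The suffix-based naming described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper name sketch_filenames_to_dict is hypothetical, only the two string forms of filenames are handled, and the suffix list is passed in explicitly instead of being read from DF_SUFFIXES.

```python
import posixpath


def sketch_filenames_to_dict(base, suffixes):
    """Hypothetical helper: map each suffix to a .tsv path built from base."""
    if base.endswith(".tsv"):
        # File form: "out/HED8.3.0.tsv" gives the stem "out/HED8.3.0".
        stem = base[: -len(".tsv")]
    else:
        # Folder form: files inside the folder reuse the folder's own name.
        folder = base.rstrip("/")
        stem = posixpath.join(folder, posixpath.basename(folder))
    return {suffix: f"{stem}_{suffix}.tsv" for suffix in suffixes}


print(sketch_filenames_to_dict("HED8.3.0", ["Tag"]))
# {'Tag': 'HED8.3.0/HED8.3.0_Tag.tsv'}
```

The folder form reproduces the HED8.3.0/HED8.3.0_Tag.tsv layout mentioned under save_dataframes; the list and dict input forms are left out of this sketch.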

create_empty_dataframes()

Returns the default empty dataframes

get_attributes_from_row(row)

Get the tag attributes from a line.

Parameters:

row (pd.Series) – A tag line.

Returns:

Dictionary of attributes.

Return type:

dict

get_library_name_and_id(schema)

Get the library (“Standard” for the standard schema) and first id for a schema range.

Parameters:

schema (HedSchema) – The schema to check

Returns:

A tuple (library_name, first_id): the capitalized library name and the first id for the given library.

Return type:

tuple of (str, int)

load_dataframes(filenames)

Load the dataframes from the source folder or series of files.

Parameters:

filenames (str or None or list or dict) – The input filenames. If a string with a .tsv suffix: load from that location, adding the suffix to each .tsv file. If a string with no .tsv suffix: load from that folder, with the contents being the separate .tsv files.

Returns:

The suffix-to-dataframe dict.

Return type:

dict of str: pd.DataFrame

merge_dataframe_dicts(df_dict1, df_dict2, key_column='rdfs.label')

Create a new dictionary of DataFrames where dict2 is merged into dict1.

Does not validate contents or suffixes.

Parameters:
  • df_dict1 (dict of str: pd.DataFrame) – dataframes to use as the merge destination.

  • df_dict2 (dict of str: pd.DataFrame) – dataframes to use as a merge element.

  • key_column (str) – name of the column that is treated as the key when dataframes are merged
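
A dictionary-level merge of this kind might look like the sketch below. It is an assumption-laden illustration, not the library's code: sketch_merge_dicts is a hypothetical name, the sample rows are made up, a suffix present in only one dictionary is assumed to be carried over unchanged, and on a repeated key the row from the first dictionary is assumed to win.

```python
import pandas as pd


def sketch_merge_dicts(df_dict1, df_dict2, key_column="rdfs.label"):
    """Hypothetical helper: merge df_dict2 into a new dict based on df_dict1."""
    merged = {}
    # Visit suffixes in first-seen order across both dictionaries.
    for suffix in dict.fromkeys([*df_dict1, *df_dict2]):
        frames = [d[suffix] for d in (df_dict1, df_dict2) if suffix in d]
        combined = pd.concat(frames, ignore_index=True)
        # Assumption: on a repeated key, the row from df_dict1 is kept.
        deduplicated = combined.drop_duplicates(subset=key_column, keep="first")
        merged[suffix] = deduplicated.reset_index(drop=True)
    return merged


dict1 = {"Tag": pd.DataFrame({"rdfs.label": ["Event"], "note": ["from dict1"]})}
dict2 = {"Tag": pd.DataFrame({"rdfs.label": ["Event", "Action"],
                              "note": ["from dict2", "from dict2"]})}
result = sketch_merge_dicts(dict1, dict2)
print(result["Tag"])
```

Neither input dictionary is modified; the function builds and returns a new one, matching the "create a new dictionary" wording above.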

merge_dataframes(df1, df2, key)

Create a new dataframe where df2 is merged into df1 and duplicates are eliminated.

Parameters:
  • df1 (pd.DataFrame) – dataframe to use as the merge destination.

  • df2 (pd.DataFrame) – dataframe to use as a merge element.

  • key (str) – name of the column that is treated as the key when dataframes are merged

Returns:

The merged dataframe.

Return type:

pd.DataFrame
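
One common way to get "merge, then eliminate duplicates" behavior in pandas is to concatenate and drop repeated keys. The sketch below shows that technique under stated assumptions; it is not taken from the library source. The name sketch_merge is hypothetical, the data is invented, and keeping the df1 row when a key repeats is an assumption.

```python
import pandas as pd


def sketch_merge(df1, df2, key):
    """Hypothetical helper: append df2 to df1, keeping one row per key value."""
    combined = pd.concat([df1, df2], ignore_index=True)
    # Assumption: when a key appears in both frames, the df1 row is kept.
    return combined.drop_duplicates(subset=key, keep="first").reset_index(drop=True)


df1 = pd.DataFrame({"rdfs.label": ["A", "B"], "value": ["1", "2"]})
df2 = pd.DataFrame({"rdfs.label": ["B", "C"], "value": ["9", "3"]})
merged = sketch_merge(df1, df2, "rdfs.label")
print(merged)
```

Here the key "B" occurs in both inputs, so the merged frame has three rows (A, B, C), with the "B" row taken from df1.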

remove_prefix(text, prefix)

save_dataframes(base_filename, dataframe_dict)

Writes out the dataframes using the provided suffixes.

Does not validate contents or suffixes.

If base_filename has a .tsv suffix, save directly to the indicated location. If base_filename is a directory (does NOT have a .tsv suffix), save the contents into a directory of that name. The subfiles are named the same, e.g. HED8.3.0/HED8.3.0_Tag.tsv.

Parameters:
  • base_filename (str) – The base filename to use. Output is {base_filename}_{suffix}.tsv. See DF_SUFFIXES for all expected names.

  • dataframe_dict (dict of str: pd.DataFrame) – The dataframes to save out, keyed by suffix. No validation is done.
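
As a rough illustration of the {base_filename}_{suffix}.tsv naming, the sketch below writes a dictionary of dataframes as tab-separated files and reads one back. It is not the library's implementation: sketch_save is a hypothetical name, the sample frame is made up, and the directory form of base_filename is not handled.

```python
import os
import tempfile

import pandas as pd


def sketch_save(base_filename, dataframe_dict):
    """Hypothetical helper: write each frame to {base_filename}_{suffix}.tsv."""
    written = []
    for suffix, frame in dataframe_dict.items():
        path = f"{base_filename}_{suffix}.tsv"
        # Tab-separated output without the pandas index column.
        frame.to_csv(path, sep="\t", index=False)
        written.append(path)
    return written


with tempfile.TemporaryDirectory() as tmp:
    frames = {"Tag": pd.DataFrame({"rdfs.label": ["Event"]})}
    paths = sketch_save(os.path.join(tmp, "HED8.3.0"), frames)
    # Read the file back as strings to confirm the round trip.
    reloaded = pd.read_csv(paths[0], sep="\t", dtype=str)

print(os.path.basename(paths[0]))
```

Like the documented function, the sketch performs no validation of the suffixes or the frame contents.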