hed.models.BaseInput

class BaseInput(file, file_type=None, worksheet_name=None, has_column_names=True, mapper=None, def_mapper=None, name=None)

Represents a spreadsheet file.

__init__(file, file_type=None, worksheet_name=None, has_column_names=True, mapper=None, def_mapper=None, name=None)

Constructor for the BaseInput class.

Parameters
  • file (str or file-like) – An xlsx or tsv file to open.

  • file_type (str) – “.xlsx” for Excel, “.tsv” or “.txt” for tsv data. Derived from file if file is a str.

  • worksheet_name (str or None) – The name of the Excel workbook worksheet that contains the HED tags. Not applicable to tsv files.

  • has_column_names (bool) – True if the file has column names, in which case validation skips the first line of the file. False otherwise.

  • mapper (ColumnMapper) – A built column mapper (see HedInput or EventsInput for examples), or None to retrieve all columns as HED tags.

  • name (str or None) – Optional name used to identify this file when reporting errors.
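
As a rough illustration of how file_type can be derived from file when file is a str, the sketch below infers the type from the file extension. derive_file_type is a hypothetical helper, not part of hed, and the errors it raises are assumptions about reasonable behavior. The extension lists mirror the EXCEL_EXTENSION and TEXT_EXTENSION class attributes.

```python
import os

# Extension lists copied from the BaseInput class attributes on this page.
EXCEL_EXTENSION = ['.xlsx']
TEXT_EXTENSION = ['.tsv', '.txt']


def derive_file_type(file, file_type=None):
    """Hypothetical helper (not part of hed): choose the extension that selects the reader."""
    if file_type is None:
        if not isinstance(file, str):
            # Assumption: a stream has no usable name, so the caller must state the type.
            raise ValueError("file_type is required when file is not a str")
        file_type = os.path.splitext(file)[1]
    file_type = file_type.lower()
    if file_type not in EXCEL_EXTENSION + TEXT_EXTENSION:
        # Assumption: unknown extensions are rejected rather than guessed at.
        raise ValueError(f"unsupported file type: {file_type!r}")
    return file_type


print(derive_file_type("events.tsv"))
```

An explicit file_type always wins in this sketch, which matches the documented rule that the type is only derived from file when file is a str.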

Methods

__init__(file[, file_type, worksheet_name, ...])

Constructor for the BaseInput class.

convert_to_long(hed_schema[, error_handler])

Converts all tags in a given spreadsheet to long form

convert_to_short(hed_schema[, error_handler])

Converts all tags in a given spreadsheet to short form

extract_definitions([error_handler])

Gathers and validates all definitions found in this spreadsheet

get_def_and_mapper_issues(error_handler[, ...])

Returns formatted issues found with definitions and columns.

get_worksheet([worksheet_name])

Returns the requested worksheet from the workbook by name

iter_dataframe([mapper, return_row_dict, ...])

Generates a list of parsed rows based on the given column mapper.

iter_raw([hed_ops, error_handler])

Generates an iterator that goes over every row in the file without modification.

reset_mapper(new_mapper)

Set the column mapper to the passed in one, allowing you to view the file differently.

set_cell(row_number, column_number, ...[, ...])

Sets the text of the cell at the given row and column.

to_csv([file, output_processed_file])

Returns the file as a csv string.

to_excel(file[, output_processed_file])

Saves the file as an Excel spreadsheet at the given location.

update_definition_mapper_with_file(def_dict)

Adds label definitions gathered from the given list of inputs if this has a definition mapper.

validate_file(hed_ops[, name, ...])

Run the given hed_ops on all columns and rows of the spreadsheet

Attributes

COMMA_DELIMITER

EXCEL_EXTENSION

FILE_EXTENSION

FILE_INPUT

STRING_INPUT

TAB_DELIMITER

TEXT_EXTENSION

dataframe

has_column_names

loaded_workbook

name

worksheet_name

class BaseInput(file, file_type=None, worksheet_name=None, has_column_names=True, mapper=None, def_mapper=None, name=None)

Bases: object

Represents a spreadsheet file.

COMMA_DELIMITER = ','
EXCEL_EXTENSION = ['.xlsx']
FILE_EXTENSION = ['.tsv', '.txt', '.xlsx']
FILE_INPUT = 'file'
STRING_INPUT = 'string'
TAB_DELIMITER = '\t'
TEXT_EXTENSION = ['.tsv', '.txt']
convert_to_long(hed_schema, error_handler=None)

Converts all tags in a given spreadsheet to long form

Parameters
  • hed_schema (HedSchema) – The schema to use to convert tags.

  • error_handler (ErrorHandler) – The error handler to use for context. Uses a default one if None.

Returns

issues_list – A list of issues found during conversion

Return type

list of dict

convert_to_short(hed_schema, error_handler=None)

Converts all tags in a given spreadsheet to short form

Parameters
  • hed_schema (HedSchema) – The schema to use to convert tags.

  • error_handler (ErrorHandler) – The error handler to use for context. Uses a default one if None.

Returns

issues_list – A list of issues found during conversion

Return type

list of dict

property dataframe
extract_definitions(error_handler=None)

Gathers and validates all definitions found in this spreadsheet

Parameters

error_handler (ErrorHandler) – The error handler to use for context. Uses a default one if None.

Returns

def_dict – Contains all the definitions located in the file

Return type

DefDict

get_def_and_mapper_issues(error_handler, check_for_warnings=False)

Returns formatted issues found with definitions and columns.

Parameters
  • error_handler (ErrorHandler) – The error handler to use

  • check_for_warnings (bool) – If True this will check for and return warnings as well

Returns

issues_list – A list of definition and mapping issues.

Return type

list of dict

get_worksheet(worksheet_name=None)

Returns the requested worksheet from the workbook by name

Parameters

worksheet_name (str) – The name of the worksheet to return. If no name is passed in, the first worksheet is returned.

Return type

worksheet

property has_column_names
iter_dataframe(mapper=None, return_row_dict=False, hed_ops=None, run_string_ops_on_columns=False, error_handler=None, expand_defs=False, remove_definitions=True, **kwargs)

Generates a list of parsed rows based on the given column mapper.

Parameters
  • mapper (ColumnMapper) – The column name to column number mapper

  • return_row_dict (bool) – If True, this returns the full row_dict including issues. If False, returns just the HedStrings for each column

  • error_handler (ErrorHandler) – The error handler to use for context. Uses a default one if None.

  • hed_ops ([func or HedOps] or func or HedOps) – A list of HedOps or funcs to apply to the hed strings before returning

  • run_string_ops_on_columns (bool) – If true, run all tag and string ops on columns, rather than columns then rows.

  • expand_defs (bool) – If True, this will expand def tags into def-expand groups

  • remove_definitions (bool) – If true, this will remove all definition tags found.

  • kwargs – See models.hed_ops.translate_ops or the specific hed_ops for additional options

Yields
  • row_number (int) – The current row number

  • row_dict (dict) – A dict containing the parsed row, including: “HED”, “column_to_hed_tags”, and possibly “column_issues”
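
To make the yielded shape concrete, here is a small sketch that consumes (row_number, row_dict) pairs and reports which rows carry issues. It does not call hed: example_rows is stand-in data with plain strings in place of real HedString objects, rows_with_column_issues is a hypothetical helper, and the list-of-dicts layout of "column_issues" is an assumption. With the real class, such pairs would presumably come from iter_dataframe(return_row_dict=True).

```python
def rows_with_column_issues(rows):
    """Return the row numbers whose row_dict has a non-empty "column_issues" entry."""
    flagged = []
    for row_number, row_dict in rows:
        # "column_issues" is only possibly present, so use .get() rather than indexing.
        if row_dict.get("column_issues"):
            flagged.append(row_number)
    return flagged


# Stand-in data shaped like the documented yield (not produced by hed).
example_rows = [
    (1, {"HED": "Red", "column_to_hed_tags": {2: "Red"}}),
    (2, {"HED": "Blu", "column_to_hed_tags": {2: "Blu"},
         "column_issues": [{"message": "unknown tag"}]}),
]

print(rows_with_column_issues(example_rows))  # prints [2]
```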

iter_raw(hed_ops=None, error_handler=None, **kwargs)

Generates an iterator that goes over every row in the file without modification.

This is primarily for altering or re-saving the original file (e.g., converting short tags to long).

Parameters
  • hed_ops ([func or HedOps] or func or HedOps) – A list of HedOps or funcs to apply to the hed strings before returning

  • error_handler (ErrorHandler) – The error handler to use for context. Uses a default one if None.

  • kwargs – See models.hed_ops.translate_ops or the specific hed_ops for additional options

Yields
  • row_number (int) – The current row number

  • column_to_hed_tags_dictionary (dict) – A dict mapping each column number to the cell at that position.
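
The raw iteration described above can be approximated with the standard library alone. This is a minimal sketch under stated assumptions, not the library's implementation: iter_raw_rows is a hypothetical function, row numbers are taken to be 0-based with the header occupying row 0, and column numbers are taken to be 1-based.

```python
import csv
import io


def iter_raw_rows(text, has_column_names=True):
    """Yield (row_number, {column_number: cell}) for each data row of tab-separated text."""
    reader = csv.reader(io.StringIO(text), delimiter='\t')
    for row_number, row in enumerate(reader):
        if has_column_names and row_number == 0:
            continue  # skip the header line, as has_column_names=True implies
        # Assumption: columns are numbered from 1.
        yield row_number, {number: cell for number, cell in enumerate(row, start=1)}


sample = "onset\tHED\n1.0\tRed\n2.0\tBlue\n"
for row_number, cells in iter_raw_rows(sample):
    print(row_number, cells)
```

Because the cells are returned unmodified, a loop like this is a natural place to rewrite cell text before re-saving the file.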

property loaded_workbook
property name
reset_mapper(new_mapper)

Set the column mapper to the passed in one, allowing you to view the file differently.

Parameters

new_mapper (ColumnMapper) – The column mapper to use in place of the current one.

set_cell(row_number, column_number, new_string_obj, include_column_prefix_if_exist=False, tag_form='short_tag')

Sets the text of the cell at the given row and column.

Parameters
  • row_number (int) – The row number of the spreadsheet to set

  • column_number (int) – The column number of the spreadsheet to set

  • new_string_obj (HedString) – Text to put in the given cell

  • include_column_prefix_if_exist (bool) – If true and the column matches one from mapper _column_prefix_dictionary, remove the name_prefix

  • tag_form (str) – The form of the tags to use from the hed string (short_tag, long_tag, base_tag, etc.). Any attribute of a HedTag that returns a string is valid.
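
Since tag_form names an attribute, the lookup amounts to a getattr call on each tag. The sketch below shows that idea with a stand-in class. StandInTag and cell_text are hypothetical, and the long-form path is illustrative rather than taken from a real schema.

```python
from dataclasses import dataclass


@dataclass
class StandInTag:
    """Hypothetical stand-in for HedTag with two string-valued attributes."""
    short_tag: str
    long_tag: str


def cell_text(tags, tag_form="short_tag"):
    """Join the requested form of each tag into the text for one cell."""
    # getattr raises AttributeError if tag_form is not an attribute of the tag.
    return ",".join(getattr(tag, tag_form) for tag in tags)


tags = [StandInTag("Red", "Example/Color/Red"), StandInTag("Square", "Example/Shape/Square")]
print(cell_text(tags))              # prints Red,Square
print(cell_text(tags, "long_tag"))  # prints Example/Color/Red,Example/Shape/Square
```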

to_csv(file=None, output_processed_file=False)

Returns the file as a csv string.

Parameters
  • file (str or file-like or None) – Location to save this file. Can be a file path or a stream/file-like object.

  • output_processed_file (bool) – Replace all definitions and labels in HED columns as appropriate. Also fills in things like categories.

to_excel(file, output_processed_file=False)

Saves the file as an Excel spreadsheet at the given location.

Parameters
  • file (str or file-like) – Location to save this file. Can be a file path or a stream/file-like object.

  • output_processed_file (bool) – Replace all definitions and labels in HED columns as appropriate. Also fills in things like categories.

update_definition_mapper_with_file(def_dict)

Adds label definitions gathered from the given list of inputs if this has a definition mapper.

Parameters

def_dict (DefDict) – The gathered definitions to add to the mapper.

validate_file(hed_ops, name=None, error_handler=None, check_for_warnings=True, **kwargs)

Run the given hed_ops on all columns and rows of the spreadsheet

Parameters
  • hed_ops ([func or HedOps] or func or HedOps) – A list of HedOps or funcs to apply.

  • name (str) – If present, this is used as the filename for context instead of the actual filename. Useful for temp filenames.

  • error_handler (ErrorHandler or None) – Used to report errors. Uses a default one if none passed in.

  • check_for_warnings (bool) – If True this will check for and return warnings as well

  • kwargs – See models.hed_ops.translate_ops or the specific hed_ops for additional options

Returns

validation_issues – The list of validation issues found

Return type

list of dict
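
The hed_ops argument accepts either a single op or a list of ops. The sketch below shows one way such an argument could be normalized and applied, covering plain funcs only. normalize_ops, run_ops, and flag_empty are hypothetical names, and the convention that each func takes a cell and returns a list of issue dicts is an assumption rather than the documented contract of models.hed_ops.translate_ops.

```python
def normalize_ops(hed_ops):
    """Turn None, a single func, or a list of funcs into a list of funcs."""
    if hed_ops is None:
        return []
    if callable(hed_ops):
        return [hed_ops]
    return list(hed_ops)


def run_ops(cells, hed_ops):
    """Apply every op to every cell and gather the issue dicts each op returns."""
    issues = []
    for cell in cells:
        for op in normalize_ops(hed_ops):
            issues.extend(op(cell))
    return issues


def flag_empty(cell):
    """Example op: report an issue for an empty cell."""
    return [] if cell else [{"message": "empty cell"}]


print(run_ops(["Red", ""], flag_empty))  # prints [{'message': 'empty cell'}]
```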

property worksheet_name