SpreadsheetInput¶
- class SpreadsheetInput(file=None, file_type=None, worksheet_name=None, tag_columns=None, has_column_names=True, column_prefix_dictionary=None, name=None)[source]
A spreadsheet of HED tags.
Methods
|
Constructor for the SpreadsheetInput class. |
|
Assembles the hed strings |
|
Get the metadata for each column |
|
Combines all columns in the given dataframe into a single HED string series, |
|
Convert all tags in underlying dataframe to the specified form. |
|
Convert all tags in underlying dataframe to long form. |
|
Convert all tags in underlying dataframe to short form. |
|
Shrinks any def-expand found in the underlying dataframe. |
|
Returns a list of column refs for this file. |
|
Returns the definition dict for this file |
|
Get the requested worksheet. |
|
Set mapper to a different view of the file. |
Replace the specified cell with transformed text. |
|
|
Shrinks any def-expand found in the underlying dataframe. |
|
Write to file or return as a string. |
|
Output to an Excel file. |
Creates a SpreadsheetValidator and returns all issues with this fil |
Attributes
|
|
|
|
|
|
|
|
Returns a list of the column names. |
|
The underlying dataframe. |
|
Return the assembled dataframe |
|
|
True if dataframe has column names. |
|
The underlying loaded workbooks. |
Name of the data. |
|
Returns the onset column if it exists |
|
Return the assembled dataframe as a series |
|
|
Return the assembled dataframe as a series, with rows that have the same onset combined |
|
The worksheet name. |
- SpreadsheetInput.__init__(file=None, file_type=None, worksheet_name=None, tag_columns=None, has_column_names=True, column_prefix_dictionary=None, name=None)[source]¶
Constructor for the SpreadsheetInput class.
- Parameters:
file (str or file like) – An xlsx/tsv file to open or a File object.
file_type (str or None) – “.xlsx” for excel, “.tsv” or “.txt” for tsv. data. If file is a string, the
worksheet_name (str or None) – The name of the Excel workbook worksheet that contains the HED tags. Not applicable to tsv files. If omitted for Excel, the first worksheet is assumed.
tag_columns (list) – A list of ints containing the columns that contain the HED tags. The default value is [1] indicating only the second column has tags.
has_column_names (bool) – True if file has column names. Validation will skip over the first line of the file if the spreadsheet as column names.
column_prefix_dictionary (dict) – Dictionary with keys that are column numbers/names and values are HED tag prefixes to prepend to the tags in that column before processing.
Notes
column_prefix_dictionary may be deprecated/renamed. These are no longer prefixes, but rather converted to value columns. eg. {“key”: “Description”, 1: “Label/”} will turn into value columns as {“key”: “Description/#”, 1: “Label/#”} It will be a validation issue if column 1 is called “key” in the above example. This means it no longer accepts anything but the value portion only in the columns.
- Raises:
file is blank
An invalid dataframe was passed with size 0
An invalid extension was provided
A duplicate or empty column name appears
OSError –
Cannot open the indicated file
KeyError –
The specified worksheet name does not exist
- SpreadsheetInput.assemble(mapper=None, skip_curly_braces=False)¶
Assembles the hed strings
- Parameters:
mapper (ColumnMapper or None) – Generally pass none here unless you want special behavior.
skip_curly_braces (bool) – If True, don’t plug in curly brace values into columns.
- Returns:
the assembled dataframe
- Return type:
Dataframe
- SpreadsheetInput.column_metadata()¶
Get the metadata for each column
- Returns:
number/ColumnMeta pairs
- Return type:
dict
- static SpreadsheetInput.combine_dataframe(dataframe)¶
- Combines all columns in the given dataframe into a single HED string series,
skipping empty columns and columns with empty strings.
- Parameters:
dataframe (Dataframe) – The dataframe to combine
- Returns:
the assembled series
- Return type:
Series
- SpreadsheetInput.convert_to_form(hed_schema, tag_form)¶
Convert all tags in underlying dataframe to the specified form.
- Parameters:
hed_schema (HedSchema) – The schema to use to convert tags.
tag_form (str) – HedTag property to convert tags to. Most cases should use convert_to_short or convert_to_long below.
- SpreadsheetInput.convert_to_long(hed_schema)¶
Convert all tags in underlying dataframe to long form.
- Parameters:
hed_schema (HedSchema or None) – The schema to use to convert tags.
- SpreadsheetInput.convert_to_short(hed_schema)¶
Convert all tags in underlying dataframe to short form.
- Parameters:
hed_schema (HedSchema) – The schema to use to convert tags.
- SpreadsheetInput.expand_defs(hed_schema, def_dict)¶
Shrinks any def-expand found in the underlying dataframe.
- Parameters:
hed_schema (HedSchema or None) – The schema to use to identify defs
def_dict (DefinitionDict) – The definitions to expand
- SpreadsheetInput.get_column_refs()¶
Returns a list of column refs for this file.
Default implementation returns none.
- Returns:
A list of unique column refs found
- Return type:
column_refs(list)
- SpreadsheetInput.get_def_dict(hed_schema, extra_def_dicts=None)¶
Returns the definition dict for this file
Note: Baseclass implementation returns just extra_def_dicts.
- Parameters:
hed_schema (HedSchema) – used to identify tags to find definitions(if needed)
extra_def_dicts (list, DefinitionDict, or None) – Extra dicts to add to the list.
- Returns:
A single definition dict representing all the data(and extra def dicts)
- Return type:
DefinitionDict
- SpreadsheetInput.get_worksheet(worksheet_name=None)¶
Get the requested worksheet.
- Parameters:
worksheet_name (str or None) – The name of the requested worksheet by name or the first one if None.
- Returns:
The workbook request.
- Return type:
openpyxl.workbook.Workbook
Notes
If None, returns the first worksheet.
- Raises:
KeyError –
The specified worksheet name does not exist
- SpreadsheetInput.reset_mapper(new_mapper)¶
Set mapper to a different view of the file.
- Parameters:
new_mapper (ColumnMapper) – A column mapper to be associated with this base input.
- SpreadsheetInput.set_cell(row_number, column_number, new_string_obj, tag_form='short_tag')¶
Replace the specified cell with transformed text.
- Parameters:
row_number (int) – The row number of the spreadsheet to set.
column_number (int) – The column number of the spreadsheet to set.
new_string_obj (HedString) – Object with text to put in the given cell.
tag_form (str) – Version of the tags (short_tag, long_tag, base_tag, etc)
Notes
Any attribute of a HedTag that returns a string is a valid value of tag_form.
- Raises:
ValueError –
There is not a loaded dataframe
KeyError –
the indicated row/column does not exist
AttributeError –
The indicated tag_form is not an attribute of HedTag
- SpreadsheetInput.shrink_defs(hed_schema)¶
Shrinks any def-expand found in the underlying dataframe.
- Parameters:
hed_schema (HedSchema or None) – The schema to use to identify defs
- SpreadsheetInput.to_csv(file=None)¶
Write to file or return as a string.
- Parameters:
file (str, file-like, or None) – Location to save this file. If None, return as string.
- Returns:
None if file is given or the contents as a str if file is None.
- Return type:
None or str
- Raises:
OSError –
Cannot open the indicated file
- SpreadsheetInput.to_excel(file)¶
Output to an Excel file.
- Parameters:
file (str or file-like) – Location to save this base input.
- Raises:
ValueError –
if empty file object was passed
OSError –
Cannot open the indicated file
- SpreadsheetInput.validate(hed_schema, extra_def_dicts=None, name=None, error_handler=None)¶
Creates a SpreadsheetValidator and returns all issues with this fil
- Parameters:
hed_schema (HedSchema) – The schema to use for validation
extra_def_dicts (list of DefDict or DefDict) – all definitions to use for validation
name (str) – The name to report errors from this file as
error_handler (ErrorHandler) – Error context to use. Creates a new one if None
- Returns:
A list of issues for hed string
- Return type:
issues (list of dict)
- SpreadsheetInput.COMMA_DELIMITER = ','¶
- SpreadsheetInput.EXCEL_EXTENSION = ['.xlsx']¶
- SpreadsheetInput.FILE_EXTENSION = ['.tsv', '.txt', '.xlsx']¶
- SpreadsheetInput.FILE_INPUT = 'file'¶
- SpreadsheetInput.STRING_INPUT = 'string'¶
- SpreadsheetInput.TAB_DELIMITER = '\t'¶
- SpreadsheetInput.TEXT_EXTENSION = ['.tsv', '.txt']¶
- SpreadsheetInput.columns¶
Returns a list of the column names.
Empty if no column names.
- Returns:
the column names
- Return type:
columns(list)
- SpreadsheetInput.dataframe¶
The underlying dataframe.
- SpreadsheetInput.dataframe_a¶
- Return the assembled dataframe
Probably a placeholder name.
- Returns:
the assembled dataframe
- Return type:
Dataframe
- SpreadsheetInput.has_column_names¶
True if dataframe has column names.
- SpreadsheetInput.loaded_workbook¶
The underlying loaded workbooks.
- SpreadsheetInput.name¶
Name of the data.
- SpreadsheetInput.onsets¶
Returns the onset column if it exists
- SpreadsheetInput.series_a¶
Return the assembled dataframe as a series
- Returns:
the assembled dataframe with columns merged
- Return type:
Series
- SpreadsheetInput.series_filtered¶
Return the assembled dataframe as a series, with rows that have the same onset combined
- Returns:
the assembled dataframe with columns merged, and the rows filtered together
- Return type:
Series
- SpreadsheetInput.worksheet_name¶
The worksheet name.