RemapColumnsOp
- class RemapColumnsOp(parameters)
Map values in m columns of a columnar file into new combinations in n columns.
- Required remodeling parameters:
source_columns (list): The key columns to map (m key columns).
destination_columns (list): The destination columns to have the mapped values (n destination columns).
map_list (list): A list of lists with the mapping.
ignore_missing (bool): If True, entries whose key column values are not in map_list are ignored; if False, such entries raise a ValueError.
- Optional remodeling parameters:
integer_sources (list): Source columns that should be treated as integers rather than strings.
Notes
Each element of map_list is itself a list of length m + n: the m key column values followed by the n mapped values.
TODO: Allow wildcards
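As an illustration of the m + n layout described above, here is a minimal sketch of a parameter dictionary with m = 2 key columns and n = 1 destination column. The column names and values (response, stimulus, outcome) are hypothetical and only serve to show the shape of map_list.

```python
# Hypothetical parameters for a remap_columns operation.
# Column names and values are made up for illustration.
params = {
    "source_columns": ["response", "stimulus"],   # m = 2 key columns
    "destination_columns": ["outcome"],           # n = 1 destination column
    "map_list": [
        # key values (m of them) followed by mapped values (n of them)
        ["left", "target", "hit"],
        ["left", "distractor", "false_alarm"],
        ["right", "target", "miss"],
    ],
    "ignore_missing": True,
}

m = len(params["source_columns"])
n = len(params["destination_columns"])
# Every map_list entry should contain exactly m + n items.
entry_lengths_ok = all(len(entry) == m + n for entry in params["map_list"])
```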
Methods
__init__(parameters) | Constructor for the remap columns operation.
do_op(dispatcher, df, name[, sidecar]) | Remap new columns from combinations of others.
validate_input_data(parameters) | Validates whether operation parameters meet op-specific criteria beyond that captured in json schema.
Attributes
- RemapColumnsOp.__init__(parameters)
Constructor for the remap columns operation.
- Parameters:
parameters (dict) – Parameter values for required and optional parameters.
- RemapColumnsOp.do_op(dispatcher, df, name, sidecar=None)
Remap new columns from combinations of others.
- Parameters:
dispatcher (Dispatcher) – Manages the operation I/O.
df (DataFrame) – The DataFrame to be remodeled.
name (str) – Unique identifier for the dataframe – often the original file path.
sidecar (Sidecar or file-like) – Not needed for this operation.
- Returns:
A new dataframe after processing.
- Return type:
DataFrame
- Raises:
ValueError – If ignore_missing is False and source values from the data are not in the map.
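The behavior documented for do_op can be approximated with plain pandas. The sketch below is not the library's implementation: remap_columns_sketch is a hypothetical helper, it compares keys as strings, it does not handle integer_sources, and it assumes that unmatched rows are filled with "n/a" when ignore_missing is True.

```python
import pandas as pd


def remap_columns_sketch(df, source_columns, destination_columns, map_list,
                         ignore_missing, fill_value="n/a"):
    """Illustrative stand-in for the remapping logic (hypothetical helper)."""
    m = len(source_columns)
    # Lookup table: tuple of key values (as strings) -> list of mapped values.
    lookup = {tuple(str(v) for v in entry[:m]): entry[m:] for entry in map_list}
    # One key tuple per row of the DataFrame, built from the source columns.
    keys = [tuple(str(v) for v in row)
            for row in df[source_columns].itertuples(index=False, name=None)]
    missing = sorted({key for key in keys if key not in lookup})
    if missing and not ignore_missing:
        raise ValueError(f"Source values not in map: {missing}")
    default = [fill_value] * len(destination_columns)
    mapped = [lookup.get(key, default) for key in keys]
    result = df.copy()  # leave the input DataFrame untouched
    for i, column in enumerate(destination_columns):
        result[column] = [values[i] for values in mapped]
    return result


# Usage with made-up data: the third row has no matching key.
df = pd.DataFrame({"response": ["left", "right", "left"],
                   "stimulus": ["target", "target", "other"]})
out = remap_columns_sketch(
    df, ["response", "stimulus"], ["outcome"],
    [["left", "target", "hit"], ["right", "target", "miss"]],
    ignore_missing=True,
)
```

Calling the same helper with ignore_missing=False on this data raises ValueError, mirroring the documented error condition.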
- static RemapColumnsOp.validate_input_data(parameters)
Validates whether operation parameters meet op-specific criteria beyond that captured in json schema.
Example: A check to see whether two input arrays are the same length.
- Notes: The minimum implementation should return an empty list to indicate that no errors were found.
If additional validation is necessary, the method should perform it and return a list of user-friendly error strings.
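A check of this kind might look like the following sketch, which verifies that each map_list entry has m + n items and returns a list of error strings. The function name validate_map_lengths and the message wording are hypothetical; the actual checks performed by validate_input_data are not specified here.

```python
def validate_map_lengths(parameters):
    """Hypothetical check: each map_list entry must have m + n items."""
    expected = (len(parameters["source_columns"])
                + len(parameters["destination_columns"]))
    errors = []
    for index, entry in enumerate(parameters["map_list"]):
        if len(entry) != expected:
            errors.append(
                f"map_list entry {index} has {len(entry)} items, expected {expected}"
            )
    # An empty list signals that no problems were found.
    return errors


good = {"source_columns": ["a"], "destination_columns": ["b"],
        "map_list": [["x", "y"]]}
bad = {"source_columns": ["a"], "destination_columns": ["b"],
       "map_list": [["x", "y"], ["x"]]}
good_errors = validate_map_lengths(good)
bad_errors = validate_map_lengths(bad)
```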
- RemapColumnsOp.NAME = 'remap_columns'
- RemapColumnsOp.PARAMS = {
    'additionalProperties': False,
    'properties': {
        'destination_columns': {'description': 'The columns to insert new values based on a key lookup of the source columns.', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array'},
        'ignore_missing': {'description': 'If true, insert missing source columns in the result, filled with n/a, else error.', 'type': 'boolean'},
        'integer_sources': {'description': 'A list of source column names whose values are to be treated as integers.', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array', 'uniqueItems': True},
        'map_list': {'description': 'An array of k lists each with m+n entries corresponding to the k unique keys.', 'items': {'items': {'type': ['string', 'number']}, 'minItems': 1, 'type': 'array'}, 'minItems': 1, 'type': 'array', 'uniqueItems': True},
        'source_columns': {'description': 'The columns whose values are combined to provide the remap keys.', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array'}},
    'required': ['source_columns', 'destination_columns', 'map_list', 'ignore_missing'],
    'type': 'object'}