Defining Transformations

How to define and run transformations

Transformation parameters (transform_params) define the set of transformations, LLM-based or otherwise, that you want to apply to your data.
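As a minimal sketch, transform_params can be assembled in Python as a plain dictionary and serialized to JSON with the standard library. The values are placeholders and the operations list is left empty; this page does not specify how the payload is submitted.

```python
import json

# Top-level shape of transform_params: an engine name, a processing mode,
# and a list of operation objects (left empty in this sketch).
transform_params = {
    "model": "trellis-premium",
    "mode": "document",
    "operations": [],
}

# Serialize to the same JSON form used in the example on this page.
payload = json.dumps(transform_params, indent=2)
print(payload)
```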

transform_params Overview

The transform_params object contains the configuration settings for running data transformations with large language models (LLMs). It includes the following parameters:

model

  • Type: string
  • Description: Specifies the LLM engine to use for the transformation. The options are "trellis-premium", "trellis-enterprise", and "trellis-scale", each offering a different level of speed and accuracy.
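A client-side check such as the one below can catch a misspelled engine name before a transformation is submitted. check_model is a hypothetical helper, not part of any Trellis SDK, and it assumes the three names listed above are the complete set of options.

```python
# The three engine names documented for the "model" parameter.
SUPPORTED_MODELS = {"trellis-premium", "trellis-enterprise", "trellis-scale"}


def check_model(model: str) -> str:
    """Return the model name unchanged, or raise if it is not a documented option."""
    if model not in SUPPORTED_MODELS:
        raise ValueError(
            f"unknown model {model!r}; expected one of {sorted(SUPPORTED_MODELS)}"
        )
    return model


print(check_model("trellis-scale"))
```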

mode

  • Type: string
  • Description: Specifies how the data is processed. The default is "document".
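If you build transform_params programmatically, a small helper can fill in "document" when the caller leaves mode out. with_default_mode is a hypothetical convenience function that only mirrors the guidance above; it says nothing about how the service itself treats a missing mode.

```python
def with_default_mode(params: dict) -> dict:
    """Return a copy of params with "mode" set to "document" if it was omitted."""
    completed = dict(params)  # shallow copy so the caller's dict is left untouched
    completed.setdefault("mode", "document")
    return completed


print(with_default_mode({"model": "trellis-premium", "operations": []}))
```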

operations

  • Type: list of objects
  • Description: The list of operations to perform on the dataset. Each operation is an object that specifies the target column, that column's data type, the type of transformation to apply, and a description of the task.

Each object in the operations list contains the following parameters:

a. column_name

  • Type: string
  • Description: The name of the dataset column that the operation targets, i.e., the data point that will be transformed or extracted.

b. column_type

  • Type: string
  • Description: The data type of the target column. Any PostgreSQL data type is accepted, as documented in the PostgreSQL documentation (https://www.postgresql.org/docs/current/datatype.html). Examples include text for strings, text[] for arrays of text, numeric for numbers, and date for dates.

c. transform_type

  • Type: string
  • Description: The transformation or extraction method to apply to the target column. An "extraction" operation retrieves specific pieces of information from the data; the example on this page also uses "classification", which assigns each record one of a fixed set of labels.

d. task_description

  • Type: string
  • Description: A clear, human-readable explanation of what the operation should achieve. For example, an operation that extracts URLs from text should describe which URLs to extract and why.
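The four parameters above map naturally onto a small Python dataclass. The sketch below is hypothetical client-side validation, not part of any Trellis SDK. Its type allowlist covers only the PostgreSQL types named above plus integer and boolean, and it accepts only the two transform_type values used in the example on this page, so both sets would need widening in practice.

```python
from dataclasses import dataclass, asdict

# Illustrative allowlists, not the service's rules: PostgreSQL has many more
# types, and transform_type values other than these two may exist.
BASE_TYPES = {"text", "numeric", "date", "integer", "boolean"}
KNOWN_TRANSFORM_TYPES = {"extraction", "classification"}


@dataclass
class Operation:
    """One entry in the operations list (parameters a-d above)."""
    column_name: str       # column the operation targets
    column_type: str       # PostgreSQL type, e.g. "text" or "text[]"
    transform_type: str    # e.g. "extraction" or "classification"
    task_description: str  # plain-language statement of the task

    def __post_init__(self):
        # Every parameter is a string and none of them should be blank.
        for name, value in asdict(self).items():
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"{name} must be a non-empty string")
        # Array types carry a trailing "[]"; strip it to get the base type.
        base = self.column_type.strip().lower().removesuffix("[]")
        if base not in BASE_TYPES:
            raise ValueError(f"column_type {self.column_type!r} is outside this sketch's allowlist")
        if self.transform_type not in KNOWN_TRANSFORM_TYPES:
            raise ValueError(f"unrecognized transform_type {self.transform_type!r}")

    @property
    def is_array(self) -> bool:
        """True when column_type is an array type such as text[]."""
        return self.column_type.strip().endswith("[]")


op = Operation(
    column_name="names_list",
    column_type="text[]",
    transform_type="extraction",
    task_description="List of person names mentioned in the email",
)
print(asdict(op), op.is_array)
```

Requires Python 3.9 or later for str.removesuffix.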

Here is an example of transform_params:

{
  "model": "trellis-premium",
  "mode": "document",
  "operations": [
    {
      "column_name": "names_list",
      "column_type": "text[]",
      "transform_type": "extraction",
      "task_description": "List of person names mentioned in the email"
    },
    {
      "column_name": "is_investor",
      "column_type": "text",
      "transform_type": "classification",
      "task_description": "Is the email from an investor?",
      "output_values": {"yes": "this email is from investor", "no": "this email is not from investor"}
    }
  ]
}
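Before submitting, it can help to confirm that the JSON parses and lists the operations you expect. This standard-library sketch does that for a copy of the example above; it checks syntax only, not whether the service accepts the values.

```python
import json

# A copy of the example transform_params above.
EXAMPLE = """
{
  "model": "trellis-premium",
  "mode": "document",
  "operations": [
    {
      "column_name": "names_list",
      "column_type": "text[]",
      "transform_type": "extraction",
      "task_description": "List of person names mentioned in the email"
    },
    {
      "column_name": "is_investor",
      "column_type": "text",
      "transform_type": "classification",
      "task_description": "Is the email from an investor?",
      "output_values": {"yes": "this email is from investor", "no": "this email is not from investor"}
    }
  ]
}
"""

# json.loads raises json.JSONDecodeError on syntax errors such as trailing commas.
params = json.loads(EXAMPLE)
for op in params["operations"]:
    print(f'{op["column_name"]} ({op["column_type"]}): {op["transform_type"]}')
# names_list (text[]): extraction
# is_investor (text): classification
```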

Once you have defined the transformation, go to Initiate transforms to start the transformation run.