What is Data Transformation - Core of ETL

Every business needs to transform its data to meet various reporting and audit requirements. Businesses can't use data directly from the source systems, so they use ETL tools such as InfoSphere DataStage. Transformation operations change one set of data values into other values when the mapping specifications are generated and run as jobs. Data transformation is a process that lets you select source data through an application method, convert that data, and map it to the format required by the target systems. Developers manipulate this data to bring it into compliance with business, domain, and integrity rules, as well as with other data in the target environment. In simple terms, data transformation means converting data according to mapping specifications.
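To make the select, convert, and map steps concrete, here is a minimal sketch in plain Python. It is not DataStage code, where jobs are designed on a canvas. The mapping specification format (a dictionary of target column to source column plus conversion function) and all column names are hypothetical, chosen only to illustrate the idea.

```python
from datetime import datetime


def to_iso_date(value):
    """Convert a DD/MM/YYYY string into an ISO 8601 date string."""
    return datetime.strptime(value, "%d/%m/%Y").date().isoformat()


# Hypothetical mapping specification: target column -> (source column, conversion).
MAPPING_SPEC = {
    "customer_id": ("CUST_NO", int),
    "full_name": ("CUST_NAME", str.strip),
    "signup_date": ("SIGNUP_DT", to_iso_date),
}


def apply_mapping(source_rows, spec):
    """Select the mapped source columns, convert them, and emit target-format rows."""
    return [
        {target: convert(row[source]) for target, (source, convert) in spec.items()}
        for row in source_rows
    ]


# Unmapped source columns (LEGACY_FLAG here) are simply not selected.
source_rows = [
    {"CUST_NO": "0042", "CUST_NAME": "  Jane Doe ", "SIGNUP_DT": "05/03/2021", "LEGACY_FLAG": "Y"},
]
print(apply_mapping(source_rows, MAPPING_SPEC))
# [{'customer_id': 42, 'full_name': 'Jane Doe', 'signup_date': '2021-03-05'}]
```

Keeping the specification as data, separate from the code that applies it, mirrors how a mapping specification is written once and then turned into a job.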

A transformation rule is a set of instructions given to a developer, who converts them into a job based on the mapping specification. Transformation rules describe the current state of the information and what needs to be done to it to produce a particular result. A business analyst might add the business rules and collaborate with the developers to turn them into a job. Transformations can convert data to business standards, apply consistent representations to data, correct misspellings, and incorporate business or industry standards.
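As a rough illustration of rules that apply consistent representations and correct misspellings, the sketch below attaches one small function to each column it governs. The reference values, column names, and the "U" default code are assumptions for the example, not rules from any real project.

```python
# Hypothetical reference data: known variants and misspellings -> standard value.
COUNTRY_STANDARD = {
    "usa": "United States",
    "u.s.a.": "United States",
    "united sates": "United States",    # misspelling
    "uk": "United Kingdom",
    "great britian": "United Kingdom",  # misspelling
}
GENDER_STANDARD = {"m": "M", "male": "M", "f": "F", "female": "F"}


def standardize_country(value):
    """Map known variants to one representation; pass unknown values through trimmed."""
    cleaned = value.strip()
    return COUNTRY_STANDARD.get(cleaned.lower(), cleaned)


def standardize_gender(value):
    """Apply a consistent code set; anything unrecognized becomes 'U' (unknown)."""
    return GENDER_STANDARD.get(value.strip().lower(), "U")


# Each rule is attached to the column it governs.
RULES = {"country": standardize_country, "gender": standardize_gender}


def apply_rules(record, rules):
    """Apply the rule for each column that has one; leave other columns unchanged."""
    return {col: rules[col](val) if col in rules else val for col, val in record.items()}


print(apply_rules({"id": 7, "country": " United Sates", "gender": "female"}, RULES))
# {'id': 7, 'country': 'United States', 'gender': 'F'}
```

Because the rules are plain data plus small functions, a business analyst can review the reference values while the developer owns the code that applies them.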

Transformation rules can be chosen from a set of imported rules to transform the source data and produce the result that the end business application needs. The result of a transformation must be a value whose type fits the type of the target object. A transformation can include the following items:
  • Converting from one data type to another
  • Resolving inconsistencies
  • Converting currencies for monetary calculations
  • Reducing redundant or duplicated data
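The four items above can be combined in a single pass. This is a minimal sketch, assuming string-typed source rows and fixed, made-up exchange rates (not real market data). It uses Python's `Decimal` so monetary values stay exact, and it keeps the first occurrence of each order ID as its deduplication rule, which is one possible choice among several.

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumed, fixed exchange rates for illustration only.
RATES_TO_USD = {"USD": Decimal("1"), "EUR": Decimal("1.10"), "GBP": Decimal("1.25")}
CENT = Decimal("0.01")


def transform(rows):
    seen = set()
    out = []
    for row in rows:
        order_id = int(row["order_id"])             # data type conversion: str -> int
        if order_id in seen:                        # reduce duplicated data
            continue
        seen.add(order_id)
        currency = row["currency"].strip().upper()  # resolve inconsistent codes
        amount = Decimal(row["amount"])             # str -> exact decimal
        # Currency conversion, rounded to cents so the type fits a money column.
        usd = (amount * RATES_TO_USD[currency]).quantize(CENT, rounding=ROUND_HALF_UP)
        out.append({"order_id": order_id, "amount_usd": usd})
    return out


rows = [
    {"order_id": "1", "amount": "100.00", "currency": "usd"},
    {"order_id": "2", "amount": "50.00", "currency": " EUR"},
    {"order_id": "2", "amount": "50.00", "currency": "eur"},  # duplicate
    {"order_id": "3", "amount": "19.99", "currency": "GBP"},
]
print(transform(rows))
# [{'order_id': 1, 'amount_usd': Decimal('100.00')}, {'order_id': 2, 'amount_usd': Decimal('55.00')}, {'order_id': 3, 'amount_usd': Decimal('24.99')}]
```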
A transformation can take the form of functions, join operations, lookup statements, expressions, or annotated business rules, and it can operate on a single column or on multiple columns. Tools such as IBM InfoSphere DataStage and InfoSphere Information Server make life easier for developers by providing a canvas on which to map the specifications and design jobs. InfoSphere FastTrack further speeds up the mapping process.
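The difference between a single-column and a multi-column transformation shows up clearly when you combine a lookup with an expression. The sketch below is plain Python rather than a DataStage Lookup stage. The reference table, column names, and "Unknown" default are hypothetical.

```python
# Hypothetical reference table used by the lookup, keyed by product code.
PRODUCT_LOOKUP = {"P1": "Hardware", "P2": "Software"}


def enrich(orders, lookup):
    """Add a looked-up column and a derived column to each order row."""
    enriched = []
    for order in orders:
        enriched.append({
            **order,
            # Lookup on a single column; keys with no match get a default,
            # so every input row is kept (left-outer-join behavior).
            "category": lookup.get(order["product"], "Unknown"),
            # Expression that combines multiple columns into one.
            "line_total": order["qty"] * order["unit_price"],
        })
    return enriched


orders = [
    {"product": "P1", "qty": 3, "unit_price": 20},
    {"product": "P9", "qty": 1, "unit_price": 5},
]
for row in enrich(orders, PRODUCT_LOOKUP):
    print(row)
```

Returning a default for unmatched keys is a design choice; a stricter job might reject such rows instead, which would behave like an inner join.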
 
-Ritesh
Disclaimer: The postings on this site are my own and don't necessarily represent IBM's positions, strategies or opinions