Wednesday, June 29, 2011

What Is Data Transformation: The Core of ETL

Every business needs to transform its data to meet various reporting and audit requirements. A business can't use data directly from its sources, which is why ETL tools like InfoSphere DataStage are used. Transformation operations change one set of data values into other values when mapping specifications are generated and run as jobs. Data transformation is a process that lets you select source data through some application method, convert that data, and map it to the format that the target systems require. Developers manipulate this data to bring it into compliance with business, domain, and integrity rules, as well as with other data in the target environment. In simple terms, data transformation means transforming data according to mapping specifications.

A transformation rule is simply a set of instructions given to a developer, who converts them into a job based on the mapping specification. Transformation rules describe the current state of the information and what needs to be done to it to produce a particular result. A business analyst might add the business rules and collaborate with the developers to turn those rules into a job. Rules can convert data types to match business standards, apply consistent representations to data, correct misspellings, and incorporate business or industry standards.
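To make the idea of a transformation rule a little more concrete, here is a minimal Python sketch, assuming rules are kept as simple data (source column, target column, function) rather than built as a DataStage job. The `TransformationRule` class, the column names, and the misspelling lookup are hypothetical illustrations, not part of any IBM product.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TransformationRule:
    """Hypothetical rule: read one source column, write one target column."""
    source_column: str
    target_column: str
    transform: Callable[[str], str]
    description: str


# Illustrative lookup used to correct known misspellings.
CITY_CORRECTIONS = {"Hydrabad": "Hyderabad", "Bangalor": "Bangalore"}


def standardize_gender(value: str) -> str:
    """Map free-text gender values to one consistent M/F/U representation."""
    v = value.strip().upper()
    if v in ("M", "MALE"):
        return "M"
    if v in ("F", "FEMALE"):
        return "F"
    return "U"


def correct_city(value: str) -> str:
    """Replace a known misspelling; leave other values unchanged."""
    v = value.strip()
    return CITY_CORRECTIONS.get(v, v)


RULES = [
    TransformationRule("GENDER", "gender_code", standardize_gender, "Consistent M/F/U codes"),
    TransformationRule("CITY", "city", correct_city, "Correct known misspellings"),
]


def apply_rules(row: dict[str, str], rules: list[TransformationRule]) -> dict[str, str]:
    """Build a target record by applying each rule to its source column."""
    return {r.target_column: r.transform(row[r.source_column]) for r in rules}
```

For example, `apply_rules({"GENDER": "female", "CITY": "Hydrabad"}, RULES)` yields `{"gender_code": "F", "city": "Hyderabad"}`. Keeping the rules as data mirrors the idea that the analyst describes *what* should happen, while the job only executes it.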

Transformation rules can be chosen from a set of imported rules to transform the source data and produce the result that the end business application needs. The result of a transformation must be a value whose type fits the type of the target object. A transformation can include the following items:
  • Converting from one data type to another
  • Resolving inconsistencies
  • Converting currencies for monetary calculations
  • Reducing redundant or duplicated data
A transformation can take the form of functions, join operations, lookup statements, expressions, or annotated business rules. Transformations can operate on a single column or on multiple columns. Tools like IBM InfoSphere DataStage and InfoSphere Information Server make life easier for developers by providing a canvas on which to map the specifications and design jobs. InfoSphere FastTrack further speeds up the mapping process.
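The transformation items listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names, the accepted date formats, and the exchange rates are made up for the example (they are not real market rates), and a tool like DataStage would express these steps as stages rather than hand-written code.

```python
from datetime import date, datetime
from decimal import Decimal, ROUND_HALF_UP

# Illustrative exchange rates to USD (assumed values, not real market data).
RATES_TO_USD = {"USD": Decimal("1"), "EUR": Decimal("1.40"), "INR": Decimal("0.022")}

# Date formats assumed to appear in the hypothetical source systems.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d-%b-%Y")


def to_date(text: str) -> date:
    """Convert a string to a date, resolving inconsistent source formats."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date: {text!r}")


def to_usd(amount: str, currency: str) -> Decimal:
    """Convert a text amount to a USD Decimal rounded to cents."""
    value = Decimal(amount) * RATES_TO_USD[currency.strip().upper()]
    return value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def transform(rows: list[dict[str, str]]) -> list[tuple[str, date, Decimal]]:
    """Apply type conversion, currency conversion and de-duplication."""
    seen: set[str] = set()
    result = []
    for row in rows:
        order_id = row["order_id"].strip()
        if order_id in seen:  # duplicate order: keep only the first occurrence
            continue
        seen.add(order_id)
        result.append((order_id, to_date(row["order_date"]), to_usd(row["amount"], row["currency"])))
    return result
```

Duplicate detection here keys on the trimmed order ID and keeps the first occurrence; a real job might instead keep the most recent record, which is exactly the kind of choice a mapping specification should state.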
 
-Ritesh
Disclaimer: The postings on this site are my own and don't necessarily represent IBM's positions, strategies or opinions

Thursday, June 16, 2011

A century of IBM: Technology pioneer continues to ‘Think’

Happy Birthday, IBM!
On June 16, 1911, companies that made scales, punch clocks for work, and other machines merged to form the Computing-Tabulating-Recording Company, which was renamed IBM in 1924 and went on to give us our own Watson in 2010. IBM has come a long way, which is why we always sum IBM up in one word: "Think".

IBM started out making sense of millions of punch-card records, and today it sees future innovation in analyzing the billions and billions of bits of data being transmitted in the 21st century. The inventions from IBM's early days are key to future generations and trends: data from multiple sources, data integration, real-time data processing, and business analytics all have their roots in those initial inventions.

As Watson is put to real-world use as a medical diagnostic tool that can understand plain language and analyze mountains of information, we can see where IBM's thinking and technology are headed. Based on this, we can say that IBM and other technology companies are focusing on data and its analytics. Data, especially huge volumes of raw data, needs to be processed, and its analysis is going to generate many new businesses in the future.

-Ritesh

Wednesday, June 15, 2011

From ETL to T-ETL: Its Advantages

In June 2010 I started my blog with ETL and ELT. Today I am taking it to the next level and looking at T-ETL: what it is and how it can benefit enterprises.
A federated approach can work alongside the traditional method of data consolidation. Consolidated data stores, which are typically populated and maintained through extract, transform, load (ETL) or replication processes, are becoming the standard choice for information integration today. In today's world, ETL tools, and in certain cases data stores and streaming data together, are becoming the best way to achieve fast, highly available, and integrated access to related information. By combining data consolidation with federation, businesses achieve the flexibility and responsiveness that today's fast-paced environment requires.

What do we achieve if we integrate InfoSphere DataStage and InfoSphere Federation Server to perform data consolidation? When the two are integrated, InfoSphere Federation Server can act as a data pre-processor, performing initial transformations on the data either at the source or during data extraction. In other words, we introduce a transformation step before the real ETL, which is why the approach is named T-ETL. The T-ETL architecture can use federation to join, aggregate, and filter data before it enters InfoSphere DataStage, which can then use its parallel engine to perform more complex transformations and maintain the target.
The architecture draws on the strengths of both products, producing a flexible and highly efficient solution for data consolidation: WebSphere Federation Server for its joining and SQL processing capabilities, and WebSphere DataStage for its parallel data flow and powerful transformation logic. The WebSphere Federation Server cost-based optimizer also allows the T-ETL architecture to react dynamically to changes in data volumes and patterns without the need to modify the job.
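As a rough analogy for the T-ETL flow, the Python sketch below uses SQLite's `ATTACH DATABASE` so that one SQL statement can join tables from two separate databases, loosely standing in for the single SQL view a federation server provides over heterogeneous sources. It is not the Federation Server or DataStage API: the table names, the data, and the GOLD/STANDARD tiering rule are hypothetical. The SQL query plays the leading "T" (join, filter, aggregate), and the Python function plays the later ETL transformation on the already-reduced data.

```python
import sqlite3


def build_sources() -> sqlite3.Connection:
    """Stand-in for two source systems: 'main' holds orders, 'crm' holds customers."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS crm")
    conn.execute("CREATE TABLE orders (order_id INTEGER, cust_id INTEGER, amount REAL, status TEXT)")
    conn.execute("CREATE TABLE crm.customers (cust_id INTEGER, region TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
        (1, 10, 250.0, "SHIPPED"),
        (2, 10, 100.0, "SHIPPED"),
        (3, 20, 75.0, "CANCELLED"),
        (4, 20, 300.0, "SHIPPED"),
        (5, 30, 50.0, "SHIPPED"),
    ])
    conn.executemany("INSERT INTO crm.customers VALUES (?, ?)", [
        (10, "APAC"), (20, "EMEA"), (30, "APAC"),
    ])
    return conn


# The leading "T": join across both databases, filter, and aggregate in SQL,
# so only a small summary reaches the ETL step.
PRE_TRANSFORM_SQL = """
SELECT c.region, COUNT(*) AS order_count, SUM(o.amount) AS revenue
FROM orders AS o
JOIN crm.customers AS c ON c.cust_id = o.cust_id
WHERE o.status = 'SHIPPED'
GROUP BY c.region
ORDER BY c.region
"""


def etl_transform(rows):
    """The main ETL step: apply further (hypothetical) business logic."""
    return [
        {"region": region, "orders": count, "revenue": revenue,
         "tier": "GOLD" if revenue >= 350 else "STANDARD"}
        for region, count, revenue in rows
    ]


if __name__ == "__main__":
    conn = build_sources()
    reduced = conn.execute(PRE_TRANSFORM_SQL).fetchall()
    print(etl_transform(reduced))
```

The design point is the division of labor: the set-oriented work (join, filter, aggregate) happens where the SQL engine can optimize it, and the ETL engine only handles the rows that survive.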

Transformation followed by ETL (T-ETL) is not a new concept; it is as old as ETL and ELT. Many ETL jobs already perform some form of transformation while extracting the data, such as filtering and aggregating data, or joining two source tables that reside in the same source database. However, the restriction that the source objects must exist in the same data source has severely limited the scope of T-ETL solutions to date. InfoSphere Federation Server removes this limitation and extends this initial transformation stage to the heterogeneous data sources it supports.
-Ritesh

Why Use InfoSphere DataStage for ETL Processes

IBM InfoSphere Information Server is a suite, or an umbrella, of multiple integrated solutions with features for profiling, cleansing, extraction, transformation, de-identification, and loading. InfoSphere DataStage is an ETL tool and part of the IBM Information Platforms Solutions suite. It uses a graphical notation to construct data integration solutions, making life easier for the ETL process developer.
IBM InfoSphere DataStage provides a series of benefits:

A flexible development environment for ETL process developers. With its feature-rich Designer, developers can build their processes the way they want and can even plan components for reuse. Features like running multiple instances of a single process let teams share processes across the enterprise and remove redundancy. ETL developers can build data integration processes quickly and can make use of extensible objects and functions, in addition to implementing and using their own customized functions.

With InfoSphere DataStage, ETL developers can not only retrieve data from heterogeneous applications but also join data at the source level or at the DataStage level, and apply any business transformation rule from within the Designer without having to write any procedural code.

With the introduction of Information Server, a common data infrastructure (metadata repository, parallel processing framework, and development environment) is used for both data movement and data quality, and it provides complete data lineage. Of course, all this comes along with the capability of executing the ETL process in parallel mode, scaling with the available hardware and making full use of hardware resources.

-Ritesh

IBM's Centennial and a Community Experience by ISL Hyderabad

It is a proud moment for IBMers as they celebrate 100 years of IBM's vision. IBMers worldwide are doing their bit to help their local communities, going beyond their regular hectic job routines. In the age of the modern corporation, community service is a must: everyone needs to push the boundaries of science and technology and, along with their responsibilities toward IBM, help make their communities successful.

Today the technical community at the Software Laboratories in Hyderabad is celebrating the freedom to work by taking part in a community experience. Technology experts are facing a new challenge today while working with, or I should say mentoring, young kids from various schools in Hyderabad, including a few in the interior areas. It is a 'real-time' challenge and a completely different experience for these experts as they share their ideas with children. It will be real fun to see how these experts convince and explain things to our next-generation kids, who see technology from a different perspective. From my perspective, it is more of a learning exercise for the experts than for the young kids, because kids are real innovators in every field: they know how to handle problems by trying multiple times, without getting stressed.
I will come back and share more about the experience, as I am sure it is going to be real fun.