Wednesday, December 29, 2010

A Vacation is good for Relationship and Professional Life

Separating work from life, or simply enjoying life, is difficult in today's challenging world. I have taken vacations every year for the last 14 years, but never with a free mind or without my mobile, TP/computer, or email. In all those years I never had a day where I slept without work-related tension, even during vacation.



This year I visited Chennai, India, and for the first time in the last 10 years I went without my TP and without email. In fact, I kept my mobile on silent mode as well. On my return I feel I have a fresh and recharged mind, as it was a few years ago. The trip gave me another perspective on life beyond the office and on using time efficiently.


I never realized how wonderful it is to have a break from the daily routine, have new experiences, enjoy time with people I like, and make new friends. During this relaxed time I learnt many new things, especially from my kids and their perspective on everything they like.

In short, I decided to share this for my friends who take vacations but in reality do not. They should set aside at least a few days for personal use. It will help them not only in their day-to-day personal life but also professionally.


I wish and hope everyone gets similar time every year. Happy New Year to all.

Friday, October 15, 2010

InfoSphere Information Server 8.5 Released

The wait for an optimized and enhanced IBM InfoSphere Information Server 8.5 is finally over: IBM has announced the general availability of Information Server 8.5. Before I dive deep into its highlights and new features, here is a listing of the products and platforms available in this release.

It is released on the AIX, Windows, and Linux operating systems.

It contains InfoSphere DataStage, QualityStage, CDC, BG (Business Glossary), FT (Fast Track) and IA (Information Analyzer), apart from Workbench.

By default, InfoSphere Information Server supports a repository on DB2 9.7, SQL Server 2008, or Oracle 11g, and the application server WAS version 7.0.0.11.

Here is the formal IBM announcement and other details.
What does IBM® InfoSphere™ Information Server V8.5 contain?

It provides a comprehensive data integration platform designed to simplify our clients' data integration platform while achieving new levels of integration acceleration, addressing the most demanding needs of your enterprise, and improving operational management of data integration projects.

New capabilities available in InfoSphere Information Server 8.5 include:
  • Advances in balanced optimization
  • Advanced metadata flows and automation across tools
  • Support for high availability
  • Simplified installation, configuration, and patching tools
  • Migration tooling for upgrades from earlier versions
  • New Blueprint Director capabilities to build actionable information blueprints
I shall return soon with more details on each of these items, and more on 8.5 and its impact on the industry.
-Ritesh

 Disclaimer: "The postings on this site are my own and don't necessarily represent IBM's positions, strategies or opinions." 

Tuesday, August 31, 2010

Intelligent ETL or is it Analytics and ETL

For quite some time I have been exploring why ETL is a "must have" and how the technologies around it can make it more relevant to business users: how ETL and those technologies can come together to make more sense to enterprises and to society as a whole.

While doing so I came across a few examples where technology has played a major role in India. I focus on India because of its complexity, culture, multiple languages, and many festivals.

Passport and single UID are, to me, more complex than India's Moon mission, because of their impact and the involvement of people at large, spread across 28 states and 7 UTs (if that has not changed recently). It has been some time since Passport Seva started in India, which allows a passport to be issued in 3 days. That is quite an achievement given the complexities involved in India and the manner in which information is stored in various departments. The same applies to a single UID for all Indians. It is the biggest challenge that the brains of the software industry, and various officers in the Government of India, have faced in recent times: how do you establish identity in such a large and complex system, where even data formats are not defined? Various companies have put in their best to arrive at a possible solution. Why am I saying this? It led me to analytics. While digging more into analytics, I realized it is nothing but having data available in some form and using various techniques (intelligent ones, of course) to establish relations and then predict results. So the basic requirement here, too, is data.

Businesses and enterprises today implement unique solutions to protect their customers, employees, assets, or even their brand. Companies even do background checks before they hire or issue a contract. In short, the primary goal is to establish the correct identity, that is, to know exactly who they are dealing with.
For this analysis you definitely need the stored data, whether for business intelligence, government compliance, or even future predictions.

This makes me ask why I say ETL and not Intelligent ETL, where analytics and ETL talk to each other in the same process and provide comprehensive value while the data is in flow. These two solutions in India will definitely push various companies to think and focus more on integration. Integration is a better approach than developing new products. Even new innovations will happen around integration.

I think Intelligent ETL will allow many improvements in the available solutions for health care, financial systems, insurance, and law enforcement. When I look at enterprises, everyone is indeed looking for real-time analysis, but they also need the data.

So let me dig more into Intelligent ETL; I promise to come back to this topic in a couple of months.
I will definitely come back to integration soon, as it looks to me that I should discuss it first.

-Ritesh 


Friday, June 25, 2010

ETL or ELT Where to Go?

Gone are the days when developers fought over C vs C++ or C++ vs Java, and even over compilers. As times change, we argue more about applications: what solutions are available and how one is better than another.

Until recently, transformation was supposed to happen outside the data warehouse, as warehouses were not considered capable of handling complex and extensive business transformations and mappings. Extract, Transform and Load (ETL) was the only option explored and was considered the effective way to transform data and load the processed information into the data warehouse.

In the ETL field, multiple tools such as IBM InfoSphere DataStage, Ab Initio, Informatica PowerCenter and many others played a major role and established a new segment around data transfer between source systems and the data warehouse. Everything was going fine for these products, and for many years they competed on features and performance rather than on real innovation.

On the sidelines, as new vendors arrived with advanced technologies combined with specific hardware, a new term, "push-down", came into existence: in simple terms, Extract, Load and then Transform, or ELT. It means pushing part or all of the business transformation to the database. Netezza changed the DW market completely. It forced people to think about innovation in the current ETL model as well, and opened another segment commonly known as ELT. Netezza consolidated analytic activity in the appliance, right where the data resides.

This change in the ETL segment started the debate: should I use ETL or ELT? Which is better? Which tool gives me better options? Does DataStage provide push-down, or does Informatica have better options? And so on.

In this process we simply forgot the reason why we have a DW. A DW is meant to be a base foundation that provides standard data which I can use in my reports or analysis, instead of transforming it further. ELT deviates somewhat from the concept of a DW, because different people then end up with different data, as each interprets it differently. Pushing business transformations or business logic to a lower layer complicates business intelligence (BI), as it then depends on multiple resources or on in-house or third-party tools.

Netezza still lacks many features, and in the meantime ETL tools have started offering some push-down mechanisms. Teradata is another platform capable of doing large-scale business transformation on the machine. Doesn't IBM's Balanced Optimizer do something like that: rearrange the queries, push them down to the DW, and give a mix of ETL and ELT based on the requirement, dynamically deciding what is best?

Didn't this ELT approach arrive long before Netezza? Before-SQL and After-SQL in DataStage, and similar features in Informatica, allow the user to execute stored procedures. If we do the transformation via a stored procedure, it is nothing but ELT, even though it runs through an ETL tool.

In ETL, we do all kinds of transformations before we load consumption-ready data into the target. In the ELT approach, we do some transformation prior to the load and then load the data into staging or temporary tables. After this, we execute steps to do joins, sorting, or even indexing, and make the data available for further consumption. It is always better to avoid an extra trip to the DW, but the trade-off has to be worthwhile: if I save one trip but spend 4 times longer on processing, the saving is not worth it.
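
To make the difference concrete, here is a minimal sketch of the two flows. It uses Python and an in-memory SQLite database as a stand-in for the warehouse; the table names, the sample rows, and the functions run_etl and run_elt are all made up for illustration and have nothing to do with how DataStage, Netezza, or Teradata implement this.

```python
import sqlite3

# Hypothetical source rows: (customer, amount as raw text, country code).
SOURCE_ROWS = [
    ("alice", " 10.50", "in"),
    ("bob", "4.25 ", "us"),
    ("alice", "1.00", "in"),
]


def run_etl(conn):
    """ETL: clean and aggregate in the tool (here, Python), load only final rows."""
    totals = {}
    for customer, amount, country in SOURCE_ROWS:
        key = (customer, country.upper())
        totals[key] = totals.get(key, 0.0) + float(amount.strip())
    conn.execute("CREATE TABLE sales_etl (customer TEXT, country TEXT, total REAL)")
    conn.executemany(
        "INSERT INTO sales_etl VALUES (?, ?, ?)",
        [(cust, ctry, total) for (cust, ctry), total in totals.items()],
    )


def run_elt(conn):
    """ELT: load raw rows into a staging table, then transform with SQL in the database."""
    conn.execute("CREATE TABLE staging (customer TEXT, amount TEXT, country TEXT)")
    conn.executemany("INSERT INTO staging VALUES (?, ?, ?)", SOURCE_ROWS)
    conn.execute(
        """
        CREATE TABLE sales_elt AS
        SELECT customer,
               UPPER(country) AS country,
               SUM(CAST(TRIM(amount) AS REAL)) AS total
        FROM staging
        GROUP BY customer, UPPER(country)
        """
    )


def fetch(conn, table):
    """Read back the consumption-ready rows in a stable order."""
    return conn.execute(
        f"SELECT customer, country, total FROM {table} ORDER BY customer"
    ).fetchall()


conn = sqlite3.connect(":memory:")
run_etl(conn)
run_elt(conn)
etl_rows = fetch(conn, "sales_etl")
elt_rows = fetch(conn, "sales_elt")
print(etl_rows == elt_rows)
```

Both paths end with the same consumption-ready table. What changes is where the work happens: in the tool before the load, or inside the database after it.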

ELT is definitely a possible solution when ETL becomes a bottleneck, but it can't be a complete solution. The push-down mechanism can be used in specific cases, but it should not be adopted blindly. Let's take a small example: I receive customer issues with huge log files. It is not really practical to process them with a DB query. A possible option is to use an ETL tool to split the file based on the requirement and process it. On the other hand, if I have to join two fields and get a specific outcome that further relies on an aggregation of two other fields, then ELT mode definitely works faster.
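
The two cases above can be sketched roughly as follows, again in Python with SQLite standing in for the warehouse. The log format, the table layout, and the function split_by_level are assumptions for illustration only; a real ETL tool would do the split with its own stages.

```python
import io
import sqlite3
from collections import defaultdict

# Hypothetical log with varied-length lines; the format is an assumption.
LOG = io.StringIO(
    "2010-06-25 ERROR job=load_orders failed: timeout after 30s\n"
    "2010-06-25 INFO job=load_orders started\n"
    "2010-06-25 ERROR job=load_items failed\n"
)


def split_by_level(lines):
    """ETL-style: stream the file once and route each line by its level field."""
    buckets = defaultdict(list)
    for line in lines:
        text = line.rstrip("\n")
        parts = text.split(" ", 2)
        level = parts[1] if len(parts) > 1 else "UNKNOWN"
        buckets[level].append(text)
    return dict(buckets)


buckets = split_by_level(LOG)

# ELT-style: structured rows are joined and aggregated inside the database.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, qty INTEGER, price REAL);
    CREATE TABLE customers (customer_id INTEGER, region TEXT);
    INSERT INTO orders VALUES (1, 10, 2, 5.0), (2, 10, 1, 3.0), (3, 20, 4, 2.5);
    INSERT INTO customers VALUES (10, 'South'), (20, 'North');
    """
)
revenue = conn.execute(
    """
    SELECT c.region, SUM(o.qty * o.price) AS revenue
    FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id
    GROUP BY c.region
    ORDER BY c.region
    """
).fetchall()

print(sorted(buckets), revenue)
```

The free-form log lines are easier to handle line by line outside the database, while the join and aggregation over structured columns is exactly the kind of work a database engine is built for.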

I can still achieve some amount of ELT via the DataStage push-down approach, but I can't use Netezza or Teradata ELT capabilities to transform a log file consisting of varied-length data. ELT also has an issue when data comes from different sources, as I then need to wait for all loads to complete before running a report. ETL vs ELT depends on what you need to do with the data. The person deciding should understand what the DWH contains, how big the data is, and what time frame is expected for processing, because at the end of the day everything costs money and we rely on ROI. We should explore the best options and use them, and if that requires a combination, so be it.

Please do comment and give feedback to take this discussion forward.


Wednesday, June 23, 2010

My First Blog

Before I start blogging and exchanging information, let me say something about myself. I am addicted to the Internet and to books (online, of course); that is my favorite pastime. If I get some free time from this, I watch cartoon channels or any kind of movie where I do not have to use my brain. If I get some time from these, I work for a while as well :-). Short experiments with technology are where I spend my time when not having fun with my kids. I read multiple blogs daily and sometimes comment on them, but now I am getting into the habit myself and starting my own blog.

My single-point agenda for this blog is to share technology while having fun, and to learn from the people who read and comment. I will take it forward and come back with more.

I am trying to learn and to share the technology I know and understand through this blog.