Thursday, September 29, 2011

InfoSphere DataStage and Teradata Sync Tables

Teradata sync tables are created by DataStage Teradata stages. The Teradata Enterprise stage creates a sync table named terasync, which is shared by all Teradata Enterprise jobs that load into the same database. The name of the sync table created by the Teradata Connector is supplied by the user, and that table can either be shared by other Teradata Connector jobs (with each job using a unique Sync ID as its key into that table) or each Teradata Connector job can have its own sync table.
These sync tables are a necessity due to requirements imposed by Teradata's parallel bulk load and export interfaces. These interfaces require a certain amount of synchronization at the start and end of a load or export and at every checkpoint in between. The interface requires a sequence of method calls to be made in lock step: after a player process has called the first method in the sequence, it cannot call the next method until all player processes have finished calling the first one. The sync table is therefore used as a means of communication between the player processes.
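
The lock-step behaviour described above can be illustrated with a small Python sketch. It is only an analogy, not the DataStage or Teradata implementation: the SyncTable class is an in-memory stand-in for a real sync table, threads stand in for player processes, and the step names (begin_load, checkpoint, end_load) are made up for illustration.

```python
import threading
import time


class SyncTable:
    """In-memory stand-in for a sync table (hypothetical, not Teradata's schema).

    It holds one row per player with the number of the last step that
    player has finished calling.
    """

    def __init__(self, num_players):
        self._lock = threading.Lock()
        self._rows = {player: 0 for player in range(num_players)}

    def mark_done(self, player, step_no):
        # A player records that it has finished calling step `step_no`.
        with self._lock:
            self._rows[player] = step_no

    def all_reached(self, step_no):
        # True once every player has finished step `step_no` (or a later one).
        with self._lock:
            return all(done >= step_no for done in self._rows.values())


def run_player(player, table, steps, log, log_lock):
    for step_no, name in enumerate(steps, start=1):
        # "Call" the interface method; here we only record the call.
        with log_lock:
            log.append((name, player))
        table.mark_done(player, step_no)
        # Poll the sync table until all players have finished this step.
        while not table.all_reached(step_no):
            time.sleep(0.001)


def run_job(num_players=4, steps=("begin_load", "checkpoint", "end_load")):
    """Run all players to completion and return the ordered call log."""
    table = SyncTable(num_players)
    log, log_lock = [], threading.Lock()
    threads = [
        threading.Thread(target=run_player, args=(p, table, steps, log, log_lock))
        for p in range(num_players)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

In the log returned by run_job(4), every begin_load entry appears before any checkpoint entry, and every checkpoint entry before any end_load entry, because no player moves on until the table shows that all players have finished the current step. With a single player the wait is satisfied immediately, which loosely mirrors why a stage that runs sequentially does not need a sync table.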

In Teradata Enterprise, you cannot avoid using the terasync table. In the Teradata Connector, you can avoid using the sync table by setting the Parallel synchronization property to No; however, the stage is then forced to run sequentially.

Disclaimer: The postings on this site are my own and don't necessarily represent IBM's positions, strategies or opinions

Big Data - Changing approach towards BI and Analytics

Data, more data, and then an n-fold multiplication of that data. Data is growing fast across enterprises, irrespective of the country or city you are in. Its growth is explosive. The digital universe has expanded, and so has the data within organizations. But not all of this data is relevant. We don't really work on zettabytes of data for the time being, but we still need to deal with unprecedented data growth from a wide variety of sources and systems.

Recently, a new term, "big data", has emerged to describe this growth, along with the systems and technology required to leverage it. Generally speaking, big data represents data sets that can no longer be easily managed or analyzed with traditional or common data management tools, methods and infrastructures. Big data is characterized by high velocity, high volume and a variety of data structures. This brings new challenges to data analysis, search, data integration, information discovery and exploration, reporting and system maintenance.

Here is a very nice article from Shawn Rogers which discusses various issues and available systems around "big data", including Hadoop. It also discusses how big data is impacting BI and analytics.

Requirements Gathering - How Critical Is It for a Project?

Requirements gathering is 'THE CRITICAL' foundational activity for building the data model(s) for a business intelligence environment. I agree that not all requirements can be captured and documented at the start, but a detailed requirements gathering approach is still likely to yield information that helps in developing the initial design. Of course, it also helps to extend the design when additional needs are identified. It also helps to map the "real requirements" and explore them instead of being sidetracked by whatever happens to be provided. The 'real world' is driven by perception, and we have to live in this real world.
Before we capture requirements, we should prepare ourselves. This includes understanding the scope and the business issues, collecting relevant documents, and knowing who is going to provide information and how.

Project Scope - A document that describes what needs to be delivered and what can be ignored. It also states the minimal project timeline and the tentatively available resources. Of course, it also lists major issues and assumptions in order to document risks.

Business Issues - The data analyst should have an in-depth understanding of the information gathered and how it maps to corporate strategies. The analyst should understand which problems need to be addressed.

Relevant Documents - The analyst should collect samples of decision support reports and spreadsheets as a starting point, or to set the baseline. These can be used to identify deficiencies in process and format. They also provide information on the available data that can be used to meet the business needs.

Who and How - Information needs to be solicited from a variety of people in different roles, including the sponsors, steering committee members, business SME(s), business analysts and end users. Beyond these, include people involved in providing decision support information and people familiar with possible data sources. Once these people are identified, talk with each individual or group and collect the information in the manner they are most comfortable with. Information gathering is a technique, and it needs to be tailored to each participant. Sessions need to be well planned, with crisp questions, so that the opportunity is not missed in the time allotted, yet flexible enough to handle the situation and accept answers in a different format. Discussions should be documented, and any follow-up commitments should be completed.

Sunday, September 18, 2011

Cloud Computing: How It Is Growing and Its Impact

New research indicates that enterprises have started adopting cloud computing, and the upward trend is expected to continue in the near future. Here is a detailed discussion of the various areas where cloud computing is being used extensively, covering levels of maturity, trends and best practices in organizations' use of business data in the cloud.
Please visit Ventana Research for details.

What Does It Take to Justify Data Modeling in an Organization?

Data management is a key aspect of, and a major challenge for, any organization. Professionals involved in managing data understand the negative impact of poor data architecture practices. A data model is not just a piece of design that looks nice on the wall, yet we often treat it that way without considering its practical implications. This leads to confusion, misinterpretation or assumptions, ultimately wasting time and effort and directly impacting the bottom line.
Jason Tiret offers a detailed discussion of how "what motivates your upper management will make data modeling justification much easier":
R-E-S-P-E-C-T for Data Modeling 