Information Server - IBM InfoSphere DataStage with Spark Runtime (Engine)

December 2017 saw renewed focus on IBM Unified Governance and Integration ("UG&I"), with an emphasis on modernization, especially of DataStage. IBM showcased, and later released, Data Flow Designer, a web-based client for DataStage. It addressed a major, long-standing requirement from IBM's clients and the large community of Data Engineers.

December 2018 was no different, with IBM bringing another addition to its "UG&I" portfolio, this time focused on Spark. IBM recently released support for Spark as an alternate runtime for Data Flow Designer (DataStage jobs), or, put simply, DataStage with Spark. Clients now have a choice: use IBM's proprietary PX engine or Spark as the runtime (engine) to process their data. PX has been discussed at length by many people over the years, including me, so here I'll focus on what the current release (Spark support) brings to the table for DataStage developers and users.
  • Data Flow Designer supports a new job type, Spark (alongside existing job types such as Parallel and Sequencer). Users can use Data Flow Designer to create a DataStage job targeting the Spark runtime, compile it, and even run it on Spark.
  • To use Spark, the user needs to provide Spark cluster details or YARN details.
[Screenshot: Spark/YARN runtime details]
  • DataStage jobs created for Spark are also visible, with the new job type, in the Jobs Dashboard.
[Screenshot: Jobs Dashboard showing the Spark job type and YARN view]
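
The exact fields Data Flow Designer prompts for aren't covered here, but the "Spark cluster details or YARN details" mentioned above boil down to the same parameters a standard `spark-submit` invocation needs: a master URL, a deploy mode, and optional configuration properties. A minimal sketch, assuming hypothetical names (`build_spark_submit` and `job.jar` are illustrative, not actual product settings):

```python
# Illustrative only: assembles a spark-submit command line from the kind of
# cluster details a Spark runtime typically requires. Field names are
# hypothetical, not the actual Data Flow Designer settings.

def build_spark_submit(master, deploy_mode, app_jar, app_args=(), conf=None):
    """Build a spark-submit command as a list of arguments."""
    cmd = ["spark-submit", "--master", master, "--deploy-mode", deploy_mode]
    # Extra Spark properties are passed as repeated --conf key=value pairs.
    for key, value in (conf or {}).items():
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app_jar)          # the compiled job artifact
    cmd += list(app_args)        # any arguments for the job itself
    return cmd

# Standalone Spark cluster: point at the master's URL directly.
standalone = build_spark_submit("spark://spark-master:7077", "client", "job.jar")

# YARN-managed cluster: the master is simply "yarn"; Hadoop config on the
# client machine tells Spark where the ResourceManager lives.
yarn = build_spark_submit("yarn", "cluster", "job.jar",
                          conf={"spark.executor.memory": "4g"})
print(" ".join(yarn))
```

The two variants mirror the two options in the bullet above: a direct Spark cluster endpoint versus YARN, where resource management is delegated to Hadoop.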
I'll share more details in an upcoming blog post, and possibly a YouTube video after Think 2019, along with a few relevant use cases. I'm sure IBM will soon release additional features for DataStage on the Spark runtime. :-)

-Ritesh
Disclaimer: “The postings on this site are my own and don’t necessarily represent IBM’s positions, strategies or opinions.”