Features of DataStage

What are the main features of DataStage?

DataStage has the following features to aid the design and processing required to build a data warehouse:
  1. Uses graphical design tools. With simple point-and-click techniques you can draw a scheme to represent your processing requirements.
  2. Extracts data from any number of databases, of any type.
  3. Handles all the metadata definitions required to define your data warehouse. You can view and modify the table definitions at any point during the design of your application.
  4. Aggregates data. You can modify the SQL SELECT statements used to extract data.
  5. Transforms data. DataStage has a set of predefined transforms and functions you can use to convert your data. You can easily extend this functionality by defining your own transforms.
  6. Loads the data warehouse.
Related Topics:

Datastage Scenario Based Questions with Answers


Datastage Scenario Based Questions with Answers:

Question1 : 
Input :
source,destination,km
hyd,bang,1000
delhi,chennai,1500
chennai,bang,600
bang,hyd,1000
bombay,pune,1000
bang,chennai,600
Output :
source,destination,km
hyd,bang,1000
delhi,chennai,1500
chennai,bang,600
bombay,pune,1000

Here the hyd-to-bang distance is 1000 km, and another row holds the reverse route, bang to hyd, also 1000 km. Rows like this, whose reverse route is already present, need to be deleted.

Solution :
There are many different ways to solve this problem; the following is one of them:
Source => Sort => RemoveDup => Transformer => Lookup => Filter => Target

Steps:

1. Read the Source Data.
2. Sort the Data on initial 2 columns.
3. Remove duplicate values (If Any) on initial 2 columns.

4. Pass 2 output links from the Transformer.
One Link -> Master Link
Col1 => Source
Col2 => Destination
Col3 => Distance
2nd Link -> Reference Link
Col1 => Destination
Col2 => Source
Col3 => LkpChk (with default value = 1)

5. Look up the master link against the reference link,
with Col1 = Col1 and Col2 = Col2.
The output from the lookup is
One Link -> Master Link
Col1 => Source
Col2 => Destination
Col3 => Distance
Col4 => LkpChk (1 if the lookup found a match, else Null)

6. Filter the data only for (LkpChk <> 1)

7. You will get your required values.
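The intent of the stage flow above can be sketched in plain Python: a row is dropped when its reverse (destination, source) route has already been kept. This is an illustration only, not DataStage job code; the lookup against the reversed reference link is approximated here with a set of already-kept routes, and the function name is hypothetical.

```python
# Sketch of the reverse-route dedup logic (illustration only; the real
# job uses Sort / RemoveDup / Transformer / Lookup / Filter stages).

def dedup_reverse_routes(rows):
    """Keep a route unless its reverse (destination, source) was already kept."""
    seen = set()
    output = []
    for source, destination, km in rows:
        if (destination, source) in seen:
            continue  # reverse route already in the output, drop this row
        seen.add((source, destination))
        output.append((source, destination, km))
    return output

routes = [
    ("hyd", "bang", 1000),
    ("delhi", "chennai", 1500),
    ("chennai", "bang", 600),
    ("bang", "hyd", 1000),
    ("bombay", "pune", 1000),
    ("bang", "chennai", 600),
]
print(dedup_reverse_routes(routes))
```

Running this on the source data drops the bang->hyd and bang->chennai rows, matching the expected output above.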


Related Topics:

20 Datastage Interview Questions for Experienced PDF Download

1) What is Datastage ?
  • Design jobs for Extraction, Transformation and Loading (ETL).
  • Ideal tool for data integration projects such as data warehouses, data marts and system migrations.
  • Import, export, create and manage metadata for use within jobs.
  • Schedule, run and monitor jobs, all within DataStage.
  • Administer your DataStage development and execution environments.
2)   How did you handle reject data?
Typically a reject link is defined and the rejected data is loaded back into the data warehouse. A reject link has to be defined for every output link from which you wish to collect rejected data. Rejected data is typically bad data, such as duplicates of primary keys or null rows where data is expected.


3) If you worked with DS 6.0 and later versions, what are Link Partitioner and Link Collector used for?

Link Partitioner – used for partitioning the data.
Link Collector – used for collecting the partitioned data.


4) What are OConv() and IConv() functions and where are they used?

IConv() – converts a string to an internal storage format.

OConv() – converts an expression to an output format.

5) What are Routines and where/how are they written and have you written any routines before?

Routines are stored in the Routines branch of the DataStage Repository, where you can create, view or edit them. The following are the different types of routines:


1) Transform functions


2) Before-after job subroutines


3) Job Control routines

6) How did you connect to DB2 in your last project?


Using DB2 ODBC drivers.

7)  Explain METASTAGE?

MetaStage is used to handle the metadata, which is very useful for data lineage and data analysis later on. Metadata defines the type of data we are handling. These data definitions are stored in the repository and can be accessed with the use of MetaStage.

8) How do you merge two files in DS?

Either use the Copy command as a before-job subroutine if the metadata of the two files is the same, or create a job to concatenate the two files into one if the metadata is different.

9) Do you know about the INTEGRITY/QUALITY stage?
QualityStage can be integrated with DataStage. QualityStage provides stages such as Investigate, Match and Survivorship, so that we can do quality-related work; to integrate it with DataStage we need the QualityStage plug-in.

10) Explain the differences between Oracle 8i and 9i?

Oracle 8i does not support the pseudo-column sysdate, but 9i does.
In Oracle 8i we can create 256 columns in a table, but in 9i we can create up to 1000 columns (fields).

11) What is DS Designer used for?
You use the Designer to build jobs by creating a visual design that models the flow and transformation of data from the data source through to the target warehouse. The Designer graphical interface lets you select stage icons, drop them onto the Designer work area, and add links.

12) What are conformed dimensions?

A conformed dimension is a single, coherent view of the same piece of data throughout the organization. The same dimension is used in all subsequent star schemas defined. This enables reporting across the complete data warehouse in a simple format.

13)Why is the fact table in normal form?
Basically the fact table consists of the index keys of the dimension/lookup tables and the measures. Whenever a table holds only such keys and measures, that itself implies that the table is in normal form.

14)What is DS Administrator used for?

The Administrator enables you to set up DataStage users, control the purging of the Repository, and, if National Language Support (NLS) is enabled, install and manage maps and locales.

15)What is DS Director used for?

DataStage Director is used to run and validate jobs. We can go to DataStage Director from DataStage Designer itself.

16)What is DS Manager used for?
The Manager is a graphical tool that enables you to view and manage the contents of the DataStage Repository.

17)What is a linked cube?

A cube can be stored on a single analysis server and then defined as a linked cube on other Analysis servers. End users connected to any of these analysis servers can then access the cube. This arrangement avoids the more costly alternative of storing and maintaining copies of a cube on multiple analysis servers. linked cubes can be connected using TCP/IP or HTTP. To end users a linked cube looks like a regular cube.

18)What are Static Hash files and Dynamic Hash files?
The names themselves suggest what they mean. In general we use Type-30 dynamic hash files. The data file has a default size of 2 GB, and the overflow file is used if the data exceeds the 2 GB size.

19)What is Hash file stage and what is it used for?

Used for Look-ups. It is like a reference table. It is also used in-place of ODBC, OCI tables for better performance.

20)How are the Dimension tables designed?

Find where the data for this dimension is located. Figure out how to extract this data. Determine how to maintain changes to this dimension. Change the fact table and DW population routines accordingly.


Datastage Scenario Based Questions - Part 3


1. Consider the following product types data as the source.

Product_id, product_type
------------------------
10,          video
10,          Audio
20,          Audio
30,          Audio
40,          Audio
50,          Audio
10,          Movie
20,          Movie
30,          Movie
40,          Movie
50,          Movie
60,          Movie

Assume that only 3 product types are available in the source. The source contains 12 records, and you don't know how many products are available in each product type.

Q1. Create a job to select 9 products in such a way that 3 products are selected from video, 3 products from Audio and the remaining 3 products from Movie.

Q2. In problem Q1, if the number of products in a particular product type is less than 3, you won't get a total of 9 records in the target table (see the video type in the source data). Now design the job in such a way that even if a product type has fewer than 3 products, the shortfall is made up from the other product types. For example, if the number of video products is 1, the remaining 2 records should come from Audio or Movie. The total number of records in the target table should always be 9.
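The grouping logic behind Q1 and Q2 can be sketched in plain Python (function name and data are illustrative; in an actual job this would typically be done with a Sort stage, a Transformer stage variable counting rows per type, and a Filter):

```python
from collections import defaultdict

def pick_per_type(rows, per_type=3, total=9):
    """Take up to `per_type` rows from each product type; if the total is
    short, top it up with leftover rows from other types (the Q2 behaviour)."""
    by_type = defaultdict(list)
    for product_id, product_type in rows:
        by_type[product_type].append((product_id, product_type))

    picked, leftovers = [], []
    for ptype, items in by_type.items():
        picked.extend(items[:per_type])   # Q1: first `per_type` per type
        leftovers.extend(items[per_type:])

    # Q2: fill any shortfall from the remaining rows of other types.
    picked.extend(leftovers[: max(0, total - len(picked))])
    return picked

data = [(10, "video"), (10, "Audio"), (20, "Audio"), (30, "Audio"),
        (40, "Audio"), (50, "Audio"), (10, "Movie"), (20, "Movie"),
        (30, "Movie"), (40, "Movie"), (50, "Movie"), (60, "Movie")]
result = pick_per_type(data)
print(len(result))  # 9
```

With this source data, video contributes only 1 row, so 2 extra rows are pulled from the leftovers to reach 9.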

2. Create a job to convert column data into row data.
The source data looks like

col1, col2, col3
----------------
a,    b,    c
d,    e,    f

The target table data should look like

Col
---
a
b
c
d
e
f
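The column-to-row conversion above is what a Pivot stage does; the underlying logic is just flattening each record, sketched here in plain Python (names are illustrative):

```python
def columns_to_rows(rows):
    """Flatten each (col1, col2, col3) record into single-column rows."""
    return [value for row in rows for value in row]

source = [("a", "b", "c"), ("d", "e", "f")]
print(columns_to_rows(source))  # ['a', 'b', 'c', 'd', 'e', 'f']
```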

3. Create a job to convert row data into column data.

The source data looks like

id, value
---------
10,  a
10,  b
10,  c
20,  d
20,  e
20,  f

The target table data should look like

id, col1, col2, col3
--------------------
10,  a,   b,     c
20,  d,   e,     f
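The reverse conversion, rows to columns, amounts to grouping the values by id and spreading them across col1..colN. A plain-Python sketch (names are illustrative):

```python
from collections import defaultdict

def rows_to_columns(rows):
    """Group values by id and spread them across col1..colN."""
    grouped = defaultdict(list)
    for key, value in rows:
        grouped[key].append(value)
    return [(key, *values) for key, values in grouped.items()]

source = [(10, "a"), (10, "b"), (10, "c"), (20, "d"), (20, "e"), (20, "f")]
print(rows_to_columns(source))  # [(10, 'a', 'b', 'c'), (20, 'd', 'e', 'f')]
```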

Recommended Reading

More Scenarios Based Interview Questions on Datastage at
Datastage Scenario Based Questions - Part 1
Datastage Scenario Based Questions - Part 2

Datastage Scenario Based Questions - Part 2


1. Consider the following employees data as the source.

employee_id, salary
-------------------
10,          1000
20,          2000
30,          3000
40,          5000

Q1. Create a job to load the cumulative sum of salaries of employees into target table?
The target table data should look like as

employee_id, salary, cumulative_sum
-----------------------------------
10,          1000,   1000
20,          2000,   3000
30,          3000,   6000
40,          5000,   11000
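In a job this cumulative sum is typically kept in a Transformer stage variable that accumulates the salary row by row; the logic itself is just a running total, sketched here in plain Python (names are illustrative):

```python
def with_cumulative_sum(rows):
    """Append a running total of salary to each (employee_id, salary) row."""
    total = 0
    output = []
    for employee_id, salary in rows:
        total += salary
        output.append((employee_id, salary, total))
    return output

employees = [(10, 1000), (20, 2000), (30, 3000), (40, 5000)]
print(with_cumulative_sum(employees))
```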

Q2. Create a job to get the previous row's salary for the current row. If no previous row exists for the current row, then the previous row salary should be displayed as null.

The output should look like as

employee_id, salary, pre_row_salary
-----------------------------------
10,          1000,    Null
20,          2000,    1000
30,          3000,    2000
40,          5000,    3000

Q3. Create a job to get the next row salary for the current row. If there is no next row for the current row, then the next row salary should be displayed as null.

The output should look like as

employee_id, salary, next_row_salary
------------------------------------
10,          1000,       2000
20,          2000,       3000
30,          3000,       5000
40,          5000,       Null
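Q2 and Q3 both amount to shifting the salary column by one row in either direction. A plain-Python sketch, with None standing in for Null (names are illustrative):

```python
def with_neighbour_salaries(rows):
    """Attach the previous and next row's salary to each row (None = Null)."""
    salaries = [salary for _, salary in rows]
    prev_salaries = [None] + salaries[:-1]   # shift down by one row
    next_salaries = salaries[1:] + [None]    # shift up by one row
    return [(emp, sal, prev, nxt)
            for (emp, sal), prev, nxt in zip(rows, prev_salaries, next_salaries)]

employees = [(10, 1000), (20, 2000), (30, 3000), (40, 5000)]
for row in with_neighbour_salaries(employees):
    print(row)
```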

Q4. Create a job to find the sum of salaries of all employees and this sum should repeat for all the rows.

The output should look like as

employee_id, salary, salary_sum
-------------------------------
10,          1000,   11000
20,          2000,   11000
30,          3000,   11000
40,          5000,   11000

2. Consider the following employees table as source

department_no, employee_name
----------------------------
20,            R
10,            A
10,            D
20,            P
10,            B
10,            C
20,            Q
20,            S

Q1. Create a job to load a target table with the following values from the above source?

department_no, employee_list
--------------------------------
10,            A
10,            A,B
10,            A,B,C
10,            A,B,C,D
20,            A,B,C,D,P
20,            A,B,C,D,P,Q
20,            A,B,C,D,P,Q,R
20,            A,B,C,D,P,Q,R,S

Q2. Create a job to load a target table with the following values from the above source?

department_no, employee_list
----------------------------
10,            A
10,            A,B
10,            A,B,C
10,            A,B,C,D
20,            P
20,            P,Q
20,            P,Q,R
20,            P,Q,R,S

Q3. Create a job to load a target table with the following values from the above source?

department_no, employee_names
-----------------------------
10,            A,B,C,D
20,            P,Q,R,S
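The growing employee list in Q2 is a per-department string accumulation (in a real job, a Transformer stage variable fed by sorted input); the Q3 output is simply the last accumulated row per department. A plain-Python sketch (names are illustrative):

```python
from collections import defaultdict

def running_employee_lists(rows):
    """Q2: emit a growing comma-separated employee list per department.
    Rows are processed in sorted order, as a Sort stage would deliver them."""
    lists = defaultdict(list)
    output = []
    for dept, name in sorted(rows):
        lists[dept].append(name)
        output.append((dept, ",".join(lists[dept])))
    return output

source = [(20, "R"), (10, "A"), (10, "D"), (20, "P"),
          (10, "B"), (10, "C"), (20, "Q"), (20, "S")]
for row in running_employee_lists(source):
    print(row)
# The last row per department gives the Q3 output:
# (10, 'A,B,C,D') and (20, 'P,Q,R,S')
```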

Recommended Reading

More Scenarios Based Interview Questions on Datastage at
Datastage Scenario Based Questions - Part 3
Datastage Scenario Based Questions - Part 1

Datastage Scenario Based Questions and Answers for Freshers and Experienced



1. Create a job to load the first 3 records from a flat file into a target table?

2. Create a job to load the last 3 records from a flat file into a target table?

3. Create a job to load the first record from a flat file into one table A, the last record from a flat file into table B and the remaining records into table C?
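Questions 1 to 3 are all head/tail splits of the input file. The routing logic can be sketched in plain Python (names are illustrative; a real job would use head/tail-style stages or row-number logic in a Transformer):

```python
def split_first_last_rest(rows):
    """Route the first record to table A, the last record to table B
    and the remaining records to table C (question 3 above)."""
    table_a = rows[:1]                            # first record
    table_b = rows[-1:] if len(rows) > 1 else []  # last record
    table_c = rows[1:-1]                          # everything in between
    return table_a, table_b, table_c

records = ["r1", "r2", "r3", "r4", "r5"]
a, b, c = split_first_last_rest(records)
print(a, b, c)  # ['r1'] ['r5'] ['r2', 'r3', 'r4']
```

Questions 1 and 2 are the analogous slices rows[:3] (first 3 records) and rows[-3:] (last 3 records).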

4. Consider the following products data which contain duplicate records.

A
B
C
C
B
D
B

Answer the below questions

Q1. Create a job to load all unique products into one table and the duplicate rows into another table.

The first table should contain the following output

A
D

The second target should contain the following output

B
B
B
C
C

Q2. Create a job to load each product once into one table and the remaining products which are duplicated into another table.

The first table should contain the following output

A
B
C
D

The second table should contain the following output

B
B
C
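Both questions hinge on counting occurrences: Q1 routes each row by whether its value's total count is 1, while Q2 routes by whether the current row is the first occurrence of its value. A plain-Python sketch (names are illustrative; outputs are sorted to match the tables above):

```python
from collections import Counter

def route_products(rows):
    """Q1: values occurring once -> unique; every occurrence of a duplicated
    value -> dupes. Q2: first occurrence of each value vs. the repeats."""
    counts = Counter(rows)
    unique = sorted(v for v in rows if counts[v] == 1)  # Q1, first table
    dupes = sorted(v for v in rows if counts[v] > 1)    # Q1, second table

    seen = set()
    firsts, repeats = [], []
    for v in rows:
        (repeats if v in seen else firsts).append(v)
        seen.add(v)
    return unique, dupes, sorted(firsts), sorted(repeats)  # Q2 tables

products = ["A", "B", "C", "C", "B", "D", "B"]
unique, dupes, firsts, repeats = route_products(products)
print(unique)   # ['A', 'D']
print(dupes)    # ['B', 'B', 'B', 'C', 'C']
print(firsts)   # ['A', 'B', 'C', 'D']
print(repeats)  # ['B', 'B', 'C']
```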

Recommended Reading

More Scenarios Based Interview Questions on Datastage at
Datastage Scenario Based Questions - Part 2
Datastage Scenario Based Questions - Part 3

10 DataStage Interview Questions and Answers - Download PDF



1) What is Datastage ?
2) What are the differences between Datastage and Informatica
3) How do you generate Sequence number in Datastage?
4) What are the components of DataStage?
5) What is a merge in DataStage?
6) What are system variables and sequencers in DataStage
7) What is the difference between operational data stage (ODS) and data warehouse?
8) Purpose of Pivot Stage and types of containers in DataStage
9) What can we do with DataStage Director?
10) What is the difference between data warehousing and OLAP?



1) What is Datastage ?


  • Design jobs for Extraction, Transformation and Loading (ETL).
  • Ideal tool for data integration projects such as data warehouses, data marts and system migrations.
  • Import, export, create and manage metadata for use within jobs.
  • Schedule, run and monitor jobs, all within DataStage.
  • Administer your DataStage development and execution environments.

2) What are the differences between Datastage and Informatica


  • DataStage supports parallel processing.
  • Links are treated as objects in DataStage.
  • Links are port-to-port connectivity in Informatica.
  • Implementing Slowly Changing Dimensions is easy in Informatica but complex in DataStage.
  • Complete error handling is not supported by Informatica, but is supported by DataStage.

3) How do you generate Sequence number in Datastage?


Sequence numbers can be generated in Datastage using certain routines. They are

  • KeyMgtGetNextVal
  • KeyMgtGetNextValConn

4) What are the components of DataStage?



Datastage has two components
  • Client Components – Designer, Director, Manager and Administrator
  • Server Components – Server, Repository and Plug-ins.

5) What is a merge in DataStage?



Merge in DataStage :

  • is available in both parallel and server jobs as a specific stage
  • is used for the purpose of joining two inputs, such as tables or datasets
  • is done on a key field which is mandatory in both the master and update datasets

6) What are system variables and sequencers in DataStage



 System variables:

  • System variables are used to get system information
  • They are accessed by a transformer or by a routine
  • They have @ as prefix and are read only

Sequencers:

  • Sequencers allow for synchronizing the control flow of various activities in a single job sequence.
  • The number of input and output triggers can be multiple.

7) What is the difference between operational data stage (ODS) and data warehouse?



Data warehouse

  • It is a decision support database system for the purpose of organizational needs.
  • It is a non-volatile, integrated and time-variant collection of data.

Operational Data Source
  • It is an integrated collection of information.
  • It can contain at most 90 days of information.
  • ODS supports dynamic data.

8) Purpose of Pivot Stage and types of containers in DataStage


What is the purpose of Pivot Stage?

Pivot Stage is used to convert columns to rows

What are the types of Containers?

There are two types of containers namely, Local Container and Shared Container


9) What can we do with DataStage Director?

  • Validating
  • Scheduling
  • Executing
  • Monitoring server jobs

10) What is the difference between data warehousing and OLAP?


  • A data warehouse refers to the organization and storage of data from various resources.
  • Data warehousing is used for data analysis.
  • OLAP deals with data analysis using software.
  • OLAP is used for managing aggregations and partitioning information for in-depth analysis.
  • Business intelligence is performed with OLAP.