Features of DataStage

What are the main features of DataStage?

DataStage has the following features to aid the design and processing required to build a data warehouse:

  1. Uses graphical design tools. With simple point-and-click techniques you can draw a scheme to represent your processing requirements.
  2. Extracts data from any number or type of database.
  3. Handles all the metadata definitions required to define your data warehouse. You can view and modify the table definitions at any point during the design of your application.
  4. Aggregates data. You can modify the SQL SELECT statements used to extract data.
  5. Transforms data. DataStage has a set of predefined transforms and functions you can use to convert your data. You can easily extend this functionality by defining your own transforms.
  6. Loads the data warehouse.
Datastage Scenario Based Questions with Answers

Question 1:
Input:
source,destination,km
hyd,bang,1000
delhi,chennai,1500
chennai,bang,600
bang,hyd,1000
bombay,pune,1000
bang,chennai,600
Output:
source,destination,km
hyd,bang,1000
delhi,chennai,1500
chennai,bang,600
bombay,pune,1000

Note that the distance from hyd to bang is 1000 km, and another row gives bang to hyd as 1000 km as well. Rows that merely reverse an existing route like this need to be deleted.

Solution:
There are many different ways to solve this problem. This is one way to get the required output (an equivalent SQL sketch follows the steps):
Source => Sort => RemoveDup => Transformer => Lookup => Filter => Target

Steps:

1. Read the source data.
2. Sort the data on the first 2 columns.
3. Remove duplicate values (if any) on the first 2 columns.

4. Pass 2 output links from the transformer.
One link -> Master link
Col1 => Source
Col2 => Destination
Col3 => Distance
2nd link -> Reference link
Col1 => Destination
Col2 => Source
Col3 => LkpChk (with default value = 1)

5. Look up the master link against the reference link
with Col1 = Col1 and Col2 = Col2.
The output from the lookup is:
One link -> Master link
Col1 => Source
Col2 => Destination
Col3 => Distance
Col4 => LkpChk (1 if the lookup found a match, else NULL)

6. Filter the data for (LkpChk <> 1).

7. You will get your required values.
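
For reference, here is a minimal SQL sketch of the same dedup logic, assuming the input has been staged in a table named routes (the table and column names are illustrative). Normalising each pair with LEAST/GREATEST makes a route and its reverse collapse onto the same key, so one row per key can be kept; which row survives depends on the ORDER BY, so order on a file line number there if the first occurrence must win:

  -- keep one row per unordered (source, destination) pair
  SELECT source, destination, km
  FROM (SELECT source, destination, km,
               ROW_NUMBER() OVER (PARTITION BY LEAST(source, destination),
                                               GREATEST(source, destination)
                                  ORDER BY source) AS rn
        FROM routes)
  WHERE rn = 1;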


20 Datastage Interview Questions for Experienced
1) What is DataStage?
  • Design jobs for Extraction, Transformation and Loading (ETL).
  • An ideal tool for data integration projects such as data warehouses, data marts and system migrations.
  • Import, export, create and manage metadata for use within jobs.
  • Schedule, run and monitor jobs, all within DataStage.
  • Administer your DataStage development and execution environments.
2) How did you handle reject data?
Typically a reject link is defined and the rejected data is loaded back into the data warehouse. A reject link has to be defined for every output link from which you wish to collect rejected data. Rejected data is typically bad data, such as duplicated primary keys or null rows where data is expected.


3) If you worked with DS6.0 and later versions, what are Link-Partitioner and Link-Collector used for?

Link Partitioner – used for partitioning the data.
Link Collector – used for collecting the partitioned data.


4) What are Oconv() and Iconv() functions and where are they used?

Iconv() – converts a string to an internal storage format.
Oconv() – converts an expression to an output format.

5) What are Routines and where/how are they written, and have you written any routines before?

Routines are stored in the Routines branch of the DataStage Repository, where you can create, view or edit them. The following are the different types of routines:

1) Transform functions
2) Before/after job subroutines
3) Job control routines

6) How did you connect to DB2 in your last project?


Using DB2 ODBC drivers.

7) Explain MetaStage?

MetaStage is used to handle the metadata, which is very useful for data lineage and data analysis later on. Metadata defines the type of data we are handling. These data definitions are stored in the repository and can be accessed with the use of MetaStage.

8) How do you merge two files in DS?

Either use the Copy command as a before-job subroutine if the metadata of the 2 files is the same, or create a job to concatenate the 2 files into one if the metadata is different.

9) Do you know about the INTEGRITY/QUALITY stage?
QualityStage can be integrated with DataStage. QualityStage provides stages such as Investigate, Match and Survivorship for data quality work; to integrate it with DataStage you need the QualityStage plugin.

10) Explain the differences between Oracle 8i/9i?

Oracle 8i does not support the sysdate pseudo column but 9i does.
In Oracle 8i we can create 256 columns in a table, but in 9i we can have up to 1000 columns (fields).

11) What is DS Designer used for?
You use the Designer to build jobs by creating a visual design that models the flow and transformation of data from the data source through to the target warehouse. The Designer graphical interface lets you select stage icons, drop them onto the Designer work area, and add links.

12) What are conformed dimensions?

A conformed dimension is a single, coherent view of the same piece of data throughout the organization. The same dimension is used in all subsequent star schemas defined. This enables reporting across the complete data warehouse in a simple format.

13) Why is the fact table in normal form?
Basically the fact table consists of the index keys of the dimension/lookup tables and the measures. Whenever a table holds only keys and measures, that itself implies that the table is in normal form.

14) What is DS Administrator used for?

The Administrator enables you to set up DataStage users, control the purging of the Repository, and, if National Language Support (NLS) is enabled, install and manage maps and locales.

15) What is DS Director used for?

DataStage Director is used to run and validate jobs. You can go to the DataStage Director from the DataStage Designer itself.

16) What is DS Manager used for?
The Manager is a graphical tool that enables you to view and manage the contents of the DataStage Repository.

17) What is a linked cube?

A cube can be stored on a single analysis server and then defined as a linked cube on other analysis servers. End users connected to any of these analysis servers can then access the cube. This arrangement avoids the more costly alternative of storing and maintaining copies of a cube on multiple analysis servers. Linked cubes can be connected using TCP/IP or HTTP. To end users, a linked cube looks like a regular cube.

18) What are Static Hash files and Dynamic Hash files?
As the names suggest, a static hash file has its size fixed when it is created, while a dynamic hash file grows as data is added. In general we use Type-30 dynamic hash files. The data file has a default size limit of 2 GB, and the overflow file is used if the data exceeds that size.

19) What is the Hash file stage and what is it used for?

Used for lookups. It is like a reference table. It is also used in place of ODBC or OCI tables for better performance.

20) How are the Dimension tables designed?

Find where the data for the dimension is located. Figure out how to extract this data. Determine how to maintain changes to this dimension. Change the fact table and DW population routines accordingly.


Datastage Scenario Based Questions - Part 3


1. Consider the following product types data as the source.

Product_id, product_type
------------------------
10,          video
10,          Audio
20,          Audio
30,          Audio
40,          Audio
50,          Audio
10,          Movie
20,          Movie
30,          Movie
40,          Movie
50,          Movie
60,          Movie

Assume that only 3 product types are available in the source. The source contains 12 records, and you don't know how many products are available in each product type.

Q1. Create a job to select 9 products in such a way that 3 products should be selected from video, 3 products should be selected from Audio and the remaining 3 products should be selected from Movie.

Q2. In the above problem Q1, if the number of products in a particular product type is less than 3, then you won't get the total of 9 records in the target table. For example, see the video type in the source data. Now design the job in such a way that even if the number of products in a particular product type is less than 3, you still get the missing records from other product types. For example: if the number of video products is 1, then the remaining 2 records should come from audio or movies. So, the total number of records in the target table should always be 9.
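
A minimal SQL sketch of both variants, assuming the source is staged in a table named products (names are illustrative). Q1 ranks products within each type and keeps the top 3; Q2 orders by that per-type rank first, so the first pick of every type is taken before anyone's fourth pick, and the first 9 rows top up automatically from the other types:

  -- Q1: 3 products per product type
  SELECT product_id, product_type
  FROM (SELECT product_id, product_type,
               ROW_NUMBER() OVER (PARTITION BY product_type
                                  ORDER BY product_id) AS rn
        FROM products)
  WHERE rn <= 3;

  -- Q2: always 9 rows, topping up when a type runs short
  SELECT product_id, product_type
  FROM (SELECT product_id, product_type,
               ROW_NUMBER() OVER (PARTITION BY product_type
                                  ORDER BY product_id) AS rn
        FROM products
        ORDER BY rn, product_type)
  WHERE ROWNUM <= 9;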

2. Create a job to convert column data into row data.
The source data looks like

col1, col2, col3
----------------
a,    b,    c
d,    e,    f

The target table data should look like

Col
---
a
b
c
d
e
f
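
In SQL this is a simple unpivot via UNION ALL (a Pivot stage does the equivalent inside a DataStage job; the staging table name src is assumed):

  SELECT col1 AS col FROM src
  UNION ALL
  SELECT col2 FROM src
  UNION ALL
  SELECT col3 FROM src;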

3. Create a job to convert row data into column data.

The source data looks like

id, value
---------
10,  a
10,  b
10,  c
20,  d
20,  e
20,  f

The target table data should look like

id, col1, col2, col3
--------------------
10,  a,   b,     c
20,  d,   e,     f
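
A sketch of the same pivot in SQL, assuming a staging table named src and at most 3 values per id:

  SELECT id,
         MAX(CASE WHEN rn = 1 THEN value END) AS col1,
         MAX(CASE WHEN rn = 2 THEN value END) AS col2,
         MAX(CASE WHEN rn = 3 THEN value END) AS col3
  FROM (SELECT id, value,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY value) AS rn
        FROM src)
  GROUP BY id;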

Recommended Reading

More Scenarios Based Interview Questions on Datastage at
Datastage Scenario Based Questions - Part 1
Datastage Scenario Based Questions - Part 2

Datastage Scenario Based Questions - Part 2

1. Consider the following employees data as the source.

employee_id, salary
-------------------
10,          1000
20,          2000
30,          3000
40,          5000

Q1. Create a job to load the cumulative sum of employees' salaries into a target table.
The target table data should look as follows:

employee_id, salary, cumulative_sum
-----------------------------------
10,          1000,   1000
20,          2000,   3000
30,          3000,   6000
40,          5000,   11000
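
The SQL equivalent is a running-total analytic function (in a DataStage transformer, a stage variable that accumulates the salary achieves the same; the table name employees is assumed):

  SELECT employee_id, salary,
         SUM(salary) OVER (ORDER BY employee_id
                           ROWS UNBOUNDED PRECEDING) AS cumulative_sum
  FROM employees;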

Q2. Create a job to get the previous row's salary for the current row. If no previous row exists for the current row, then the previous row salary should be displayed as null.

The output should look as follows:

employee_id, salary, pre_row_salary
-----------------------------------
10,          1000,    Null
20,          2000,    1000
30,          3000,    2000
40,          5000,    3000
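
The SQL equivalent is LAG (in a transformer, a stage variable holding the prior row's salary does the same):

  SELECT employee_id, salary,
         LAG(salary) OVER (ORDER BY employee_id) AS pre_row_salary
  FROM employees;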

Q3. Create a job to get the next row's salary for the current row. If there is no next row for the current row, then the next row salary should be displayed as null.

The output should look as follows:

employee_id, salary, next_row_salary
------------------------------------
10,          1000,       2000
20,          2000,       3000
30,          3000,       5000
40,          5000,       Null
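
The SQL equivalent is LEAD:

  SELECT employee_id, salary,
         LEAD(salary) OVER (ORDER BY employee_id) AS next_row_salary
  FROM employees;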

Q4. Create a job to find the sum of the salaries of all employees, and this sum should repeat for all rows.

The output should look as follows:

employee_id, salary, salary_sum
-------------------------------
10,          1000,   11000
20,          2000,   11000
30,          3000,   11000
40,          5000,   11000
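
In SQL this is an unpartitioned analytic SUM (in a job, an Aggregator followed by a lookup back onto the detail rows gives the same result):

  SELECT employee_id, salary,
         SUM(salary) OVER () AS salary_sum
  FROM employees;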

2. Consider the following employees table as the source.

department_no, employee_name
----------------------------
20,            R
10,            A
10,            D
20,            P
10,            B
10,            C
20,            Q
20,            S

Q1. Create a job to load a target table with the following values from the above source.

department_no, employee_list
--------------------------------
10,            A
10,            A,B
10,            A,B,C
10,            A,B,C,D
20,            A,B,C,D,P
20,            A,B,C,D,P,Q
20,            A,B,C,D,P,Q,R
20,            A,B,C,D,P,Q,R,S

Q2. Create a job to load a target table with the following values from the above source.

department_no, employee_list
----------------------------
10,            A
10,            A,B
10,            A,B,C
10,            A,B,C,D
20,            P
20,            P,Q
20,            P,Q,R
20,            P,Q,R,S

Q3. Create a job to load a target table with the following values from the above source (SQL sketches for these questions follow the output below).

department_no, employee_names
-----------------------------
10,            A,B,C,D
20,            P,Q,R,S
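
Hedged SQL sketches for Q2 and Q3, assuming an employees table with department_no and employee_name columns. Q3 is a straightforward LISTAGG (Oracle 11gR2 and later); Q2's running list can be built by chaining rows with CONNECT BY, and Q1 is the same idea without restarting the chain at each department:

  -- Q3: one comma-separated list per department
  SELECT department_no,
         LISTAGG(employee_name, ',')
           WITHIN GROUP (ORDER BY employee_name) AS employee_names
  FROM employees
  GROUP BY department_no;

  -- Q2: running list that restarts at each department
  SELECT department_no,
         LTRIM(SYS_CONNECT_BY_PATH(employee_name, ','), ',') AS employee_list
  FROM (SELECT department_no, employee_name,
               ROW_NUMBER() OVER (PARTITION BY department_no
                                  ORDER BY employee_name) AS rn
        FROM employees)
  START WITH rn = 1
  CONNECT BY department_no = PRIOR department_no
         AND rn = PRIOR rn + 1;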

Recommended Reading

More Scenarios Based Interview Questions on Datastage at
Datastage Scenario Based Questions - Part 3
Datastage Scenario Based Questions - Part 1

Datastage Scenario Based Questions and Answers for Freshers and Experienced


1. Create a job to load the first 3 records from a flat file into a target table.

2. Create a job to load the last 3 records from a flat file into a target table.

3. Create a job to load the first record from a flat file into table A, the last record into table B, and the remaining records into table C.
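
A SQL sketch for questions 1 and 2, assuming the file has been staged into a table named lines with a line_no column that preserves file order (both names are illustrative; inside a job you would typically constrain on a row counter instead):

  -- first 3 records
  SELECT * FROM lines WHERE line_no <= 3;

  -- last 3 records
  SELECT * FROM lines
  WHERE line_no > (SELECT MAX(line_no) - 3 FROM lines);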

4. Consider the following products data, which contains duplicate records.

A
B
C
C
B
D
B

Answer the below questions

Q1. Create a job to load all unique products into one table and the duplicate rows into another table.

The first table should contain the following output

A
D

The second target should contain the following output

B
B
B
C
C

Q2. Create a job to load each product once into one table and the remaining duplicated products into another table. (A SQL sketch covering both questions follows the expected outputs.)

The first table should contain the following output

A
B
C
D

The second table should contain the following output

B
B
C
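
A hedged SQL sketch of both splits, assuming a single-column staging table named products:

  -- Q1: products that occur exactly once (invert the predicate for the duplicates table)
  SELECT product
  FROM (SELECT product, COUNT(*) OVER (PARTITION BY product) AS cnt
        FROM products)
  WHERE cnt = 1;          -- cnt > 1 gives B, B, B, C, C

  -- Q2: each product once (rn > 1 gives the remaining duplicates B, B, C)
  SELECT product
  FROM (SELECT product,
               ROW_NUMBER() OVER (PARTITION BY product ORDER BY product) AS rn
        FROM products)
  WHERE rn = 1;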

Recommended Reading

More Scenarios Based Interview Questions on Datastage at
Datastage Scenario Based Questions - Part 2
Datastage Scenario Based Questions - Part 3

What is Partition in Oracle 11g with Example





PARTITIONS

Partitioning is powerful functionality that allows tables, indexes, and index-organized tables to be subdivided into smaller pieces, enabling these database objects to be managed and accessed at a finer level of granularity. Oracle provides a comprehensive range of partitioning schemes to address every business requirement. Moreover, since it is entirely transparent in SQL statements, partitioning can be used with any application, from packaged OLTP applications to data warehouses.


When to Partition a Table?
  • Tables greater than 2 GB should always be considered candidates for partitioning.
  • Tables containing historical data, in which new data is added to the newest partition. A typical example is a historical table where only the current month's data is updatable and the other 11 months are read-only.
  • When the contents of a table need to be distributed across different types of storage devices.
TYPES
1. Range partitions
2. List partitions
3. Hash partitions
4. Sub-partitions (composite)
  
ADVANTAGES OF PARTITIONS
  • Reduced downtime for scheduled maintenance: maintenance operations can be carried out on selected partitions while other partitions remain available to users.
  • Reduced downtime due to data failure: failure of a particular partition in no way affects other partitions.
  • Partition independence allows concurrent use of the various partitions for various purposes.
What is the advantage of storing partitions in different tablespaces?
1. It reduces the possibility of data corruption across multiple partitions.
2. Backup and recovery of each partition can be done independently.
Partitioning Key
Each row in a partitioned table is unambiguously assigned to a single partition. The partitioning key comprises one or more columns that determine the partition where each row will be stored.
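
For example, the key can span more than one column (the table and partition names below are illustrative):

SQL> Create table Sales(sale_year number(4), sale_month number(2), amount number) partition by range(sale_year, sale_month) (partition p_2023_h1 values less than(2023,7), partition p_2023_h2 values less than(2024,1), partition p_max values less than(maxvalue,maxvalue));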


1. RANGE PARTITIONS

Definition: A table that is partitioned by range is partitioned in such a way that each partition contains rows for which the partitioning expression value lies within a given range.

 Creating a range partitioned table
 SQL> Create table Employee(emp_no number(5),emp_name varchar2(10)) partition by range(emp_no) (partition p1 values less than(100), partition p2 values less than(200), partition p3 values less than(300), partition p4 values less than(maxvalue));
Inserting records into the range partitioned table
     SQL> Insert into Employee values(99,'a');       -- this will go to p1
     SQL> Insert into Employee values(199,'b');      -- this will go to p2
     SQL> Insert into Employee values(299,'c');      -- this will go to p3
     SQL> Insert into Employee values(401,'d');      -- this will go to p4 (maxvalue)
 Selecting records from the range partitioned table
     SQL> Select * from Employee;
     SQL> Select * from Employee partition(p1);
 Adding a partition
     Because p4 is bounded by maxvalue, a new partition cannot simply be added above it (ORA-14074); split the maxvalue partition instead:
     SQL> Alter table Employee split partition p4 at(400) into (partition p5, partition p4);
 Dropping a partition
    SQL> Alter table Employee drop partition p1;
 Renaming a partition
     SQL> Alter table Employee rename partition p3 to p6;
 Truncate a partition
     SQL> Alter table Employee truncate partition p5;
 Splitting a partition
    SQL> Alter table Employee split partition p2 at(120) into (partition p21,partition p22);
 Exchanging a partition (Employee_x must be an existing non-partitioned table with the same structure)
  SQL> Alter table Employee exchange partition p2 with table Employee_x;
 Moving a partition
     SQL> Alter table Employee move partition p21 tablespace ABC_TBS;

 

 2. LIST PARTITIONS

Definition: List partitioning enables you to explicitly control how rows map to partitions by specifying a list of discrete values for the partitioning key in the description for each partition.
 Creating a list partitioned table
SQL> Create table Employee(Emp_no number(2),Emp_name varchar2(10)) partition by list(Emp_no) (partition p1 values(1,2,3,4,5), partition p2 values(6,7,8,9,10), partition p3 values(11,12,13,14,15), partition p4 values(16,17,18,19,20));
 Inserting records into the list partitioned table
      SQL> Insert into Employee values(4,'xxx');     -- this will go to p1
      SQL> Insert into Employee values(8,'yyy');     -- this will go to p2
      SQL> Insert into Employee values(14,'zzz');    -- this will go to p3
      SQL> Insert into Employee values(19,'bbb');    -- this will go to p4
 Selecting records from the list partitioned table
     SQL> Select * from Employee;
     SQL> Select * from Employee partition(p1);
 Adding a partition
     SQL> Alter table Employee add partition p5 values(21,22,23,24,25);
 Dropping a partition
     SQL> Alter table Employee drop partition p5;
 Renaming a partition
     SQL> Alter table Employee rename partition p4 to p6;
 Truncate a partition
     SQL> Alter table Employee truncate partition p5;
 Exchanging a partition
    SQL> Alter table Employee exchange partition p1 with table Employee_x;
 Moving a partition
    SQL> Alter table Employee move partition p2 tablespace ABC_TBS;

 

3. HASH PARTITIONS

Definition: Hash partitioning maps data to partitions based on a hashing algorithm that Oracle applies to the partitioning key that you identify.
Creating a hash partitioned table
     SQL> Create table Employee(emp_no number(2),emp_name varchar2(10)) partition by hash(emp_no) partitions 5;
     Here Oracle automatically generates partition names such as
                                                SYS_P1
                                                SYS_P2
                                                SYS_P3
                                                SYS_P4
                                                SYS_P5
 Inserting records into the hash partitioned table (the target partition is determined by the hash function)
     SQL> Insert into Employee values(5,'a');
     SQL> Insert into Employee values(8,'b');
     SQL> Insert into Employee values(14,'c');
     SQL> Insert into Employee values(19,'d');
 Selecting records from the hash partitioned table
     SQL> Select * from Employee;
     SQL> Select * from Employee partition(SYS_P2);
 Adding a partition
     SQL> Alter table Employee add partition p9;
 Renaming a partition
    SQL> Alter table Employee rename partition p9 to p10;
 Truncate a partition
     SQL> Alter table Employee truncate partition p10;
 Exchanging a partition
 SQL> Alter table Employee exchange partition SYS_P1 with table Employee_X;
 Moving a partition
     SQL> Alter table Employee move partition SYS_P1 tablespace ABC_TBS;

Related Topics:

Difference between Informatica and Datastage