Datastage – Trim a special character

There is a field which finishes with a CR character (/n).

Following options were tried, but didn’t work.

1) Tried the following function Trim(FIELD,’/n’, ‘B’), but it doesn’t work.
2)Tried to use the Char function in order to retrieve the CR from its ASCII code and trim it after (Trimb(Trimf(Trim(NullToValue(SR_2_TR.OPCUA2,”),Char(13),’T’)))) but it doesn’t work too.

Finally, what worked –

Using Convert function in Transformer. Convert Char(13) and Char(10) in the input string to ‘’.

Datastage – Checking client connections

Command to check that no client connections are open

# netstat grep dsrpcd

This command produces no output when all dsrpcd connections are closed.

More information can be found in the Datastage Administrator’s Guide

Datastage – Operational Metadata

What is operational Metadata?
— As found on
http://publib.boulder.ibm.com/infocenter/iisinfsv/v8r0/index.jsp?topic=/com.ibm.swg.im.iis.opmeta.doc/topics/c_OM_OperationalMetadata.html

Operational metadata describes the events and processes that occur and the objects that are affected when you run a WebSphere® DataStage® and QualityStage job.
After you run a job, you can store and report on the following information about the job run:

When the job started and finished, and how long it took to run
How many rows were read, written, or referenced
Which tables or files were read from, written to, or referenced
Which stages and links were used
Whether the job failed or had warnings
The project the job was in
Any parameters used by the job
The invocation ID of the job
Any notes about running the job

Datastage – Reading Reserved characters

Set the Environment Variable DS_ENABLE_RESERVED_CHAR_CONVERT to TRUE.

It is necessary to read reserved characters Ex- Column Name – CAS# (# is a Reserved Character in Datastage)

Datastage – Trim a special character

There is a field which finishes with a CR character (/n).

Following options were tried, but didn’t work.

1) Tried the following function Trim(FIELD,’/n’, ‘B’), but it doesn’t work.
2)Tried to use the Char function in order to retrieve the CR from its ASCII code and trim it after (Trimb(Trimf(Trim(NullToValue(SR_2_TR.OPCUA2,”),Char(13),’T’)))) but it doesn’t work too.

Finally, what worked –

Using Convert function in Transformer. Convert Char(13) and Char(10) in the input string to ‘’.

Datastage Best Practices

Few are listed here taken from it-toolbox – will be listing more in later posts….

http://it.toolbox.com/blogs/datastage-journey/14-good-design-tips-in-datastage-801-38894


1) When you need to run the same sequence of jobs again and again, better create a sequencer with all the jobs that you need to run. Running this sequencer will run all the jobs. You can provide the sequence as per your requirement.

2) If you are using a copy or a filter stage either immediately after or immediately before a transformer stage, you are reducing the efficiency by using more stages because a transformer does the job of both copy stage as well as a filter stage

3) Use Sort stages instead of Remove duplicate stages. Sort stage has got more grouping options and sort indicator options.

4) Turn off Runtime Column propagation wherever it’s not required.

5) Make use of Modify, Filter, and Aggregation, Col. Generator etc stages instead of Transformer stage only if the anticipated volumes are high and performance becomes a problem. Otherwise use Transformer. It is very easy to code a transformer than a modify stage.

6)Avoid propagation of unnecessary metadata between the stages. Use Modify stage and drop the metadata. Modify stage will drop the metadata only when explicitly specified using DROP clause.

7)Add reject files wherever you need reprocessing of rejected records or you think considerable data loss may happen. Try to keep reject file at least at Sequential file stages and writing to Database stages.

8)Make use of Order By clause when a DB stage is being used in join. The intention is to make use of Database power for sorting instead of Data Stage resources. Keep the join partitioning as Auto. Indicate don’t sort option between DB stage and join stage using sort stage when using order by clause.

9)While doing Outer joins, you can make use of Dummy variables for just Null checking instead of fetching an explicit column from table.

10)Data Partitioning is very important part of Parallel job design. It’s always advisable to have the data partitioning as ‘Auto’ unless you are comfortable with partitioning, since all Data Stage stages are designed to perform in the required way with Auto partitioning.

11) Do remember that Modify drops the Metadata only when it is explicitly asked to do so using KEEP/DROP clauses.

12) Range Look-up: Range Look-up is equivalent to the operator between. Lookup against a range of values was difficult to implement in previous Data Stage versions. By having this functionality in the lookup stage, comparing a source column to a range of two lookup columns or a lookup column to a range of two source columns can be easily implemented.

13) Use a Copy stage to dump out data to intermediate peek stages or sequential debug files. Copy stages get removed during compile time so they do not increase overhead

14)Where you are using a Copy stage with a single input and a single output, you should ensure that you set the Force property in the stage editor TRUE. This prevents DataStage from deciding that the Copy operation is superfluous and optimizing it out of the job.

Datastage – Editing a ODBC Connection

‘You need to edit three files to set up the required ODBC connections. These are:
dsenv
.odbc.ini
uvodbc.config
All three are located in the $DSHOME directory. Copies of uvodbc.config are also placed in the project directories.

dsenv fileThe WebSphere DataStage server has a centralized file for storing environment variables called dsenv in $DSHOME. $DSHOME identifies the WebSphere DataStage installation directory. The default directory is /opt/IBM/InformationServer/Server/DSEngine.

.odbc.ini fileThe .odbc.ini files gives information about connecting to the database (wire protocol drivers) or the database client (non-wire protocol drivers). If your system uses a mix of drivers, your .odbc.ini file will contain a mix of entry types.

uvodbc.config fileUse the uvodbc.config file to specify the DSNs for the databases that you are connecting to through ODBC.

Check this link for more information
http://publib.boulder.ibm.com/infocenter/iisinfsv/v8r0/index.jsp?topic=/com.ibm.swg.im.iis.productization.iisinfsv.install.doc/tasks/wsisinst_Configuring_ODBC_Connections.html

This information is basically from datastage Installation guide.

I think the below points from the Parallel Job Developer’s Guide will also be relevant


You need to be running Oracle 8 or better, Enterprise Edition in order
to use the Oracle Enterprise Stage.
You must also do the following:


1) Create the user defined environment variable ORACLE_HOME and
set this to the $ORACLE_HOME path (e.g., /disk3/oracle9i).


2) Create the user defined environment variable ORACLE_SID and
set this to the correct service name (e.g., ODBCSOL).


3) Add ORACLE_HOME/bin to your PATH and ORACLE_HOME/lib to
your LIBPATH, LD_LIBRARY_PATH, or SHLIB_PATH.