Archive for 2011

Alternatives for ODIWaitForLogData

While implementing Changed Data Capture in our environment we had a trouble getting OdiWaitForLogData to work. Changes in the Journalized table were not exiting the OdiWaitForLogData step. We came up with the following alternative to to do a near real time Data Capture.

We used OdiWaitForData tool instead of OdiWaitForLogData.

Configure the OdiWaitForData following the below guidelines for the parameters mentioned:
  1. Table Name: Wait on Data on Table J$<SourceTableNAme> : This is the table which records the changes, created as a part of Journalizing infrastructure. It is created in the work schema of the source table's physical schema.  NOTE: Do not use <%odiRef.getObjectName()%> function to get the object name it does not work for this tool. Please use the complete physical table name.
  2. Logical Schema: Is the schema where this table is present. This is the work schema of the Physical Schema where source table is present
  3. SQL Filter: Will contain information about the subscriber you are consuming. 
    1. For example, you are subscribing to the changes tagged by subscriber 'ChampsLoan2CFDR' we should add the SQL statement JRN_SUBSCRIBER = 'ChampsLoan2CFDR'. If you are consuming changes by this subscriber, you should add this filter, if you have only one subscriber you can ignore this part
    2. If you want to trigger only on Insert changes recorded by  'ChampsLoan2CFDR'. The filter will be JRN_SUBSCRIBER = 'ChampsLoan2CFDR' and JRN_FLAG = 'I'
    3. Each time record consumed  record is tagged with 1 and non-consumed records are tagged with  0 (at the time of insertion into the table). We must wait on only non-consumed records. So its safe to add another condition: JRN_SUBSCRIBER = 'ChampsLoan2CFDR' and JRN_FLAG = 'I' and JRN_CONSUMED = '0'
  4. Object Type: Please follow the instructions T for table.
  5. Increment Detection: No, for performance related purpose.
Rest of the parameters can be configured according to your requirement.

Here is a sample command: 
OdiWaitForData "-GLOBAL_ROWCOUNT=1" "-LSCHEMA=GG_ODI_TEMP" "-POLLINT=1000" "-SQLFILTER=JRN_SUBSCRIBER = 'ChampsLoan2CFDR' and JRN_FLAG = 'I' and JRN_CONSUMED = '0'" "-TIMEOUT=0" "-TIMEOUT_WITH_ROWS_OK=YES" "-UNIT_ROWCOUNT=1" "-INCREMENT_DETECTION=NO" "-TABLE_NAME=J$P_QRM_LOANS"  "-OBJECT_TYPE=T" 



Thursday, December 22, 2011
Posted by Arjun Lagisetty

Performance tuning tips in ODI


There are many ETL (extract, transform and load) tools out there but Oracle Data Integrator (ODI) has gained a lot of ground because of its ability to perform event-driven, high-volume batch uploads from disparate data sources and to the variety of target applications/data storage technologies.
To maximize benefits of ODI, it needs to be tuned (with emphasis) and designed well. Here are couple of tips (besides the best practices already covered by many) that I have learnt while designing the packages/interfaces:
  1. Set truncate mode ON for temporary tablesIn development mode, switching off the deletion of temporary tables in the staging area is quite handy for debugging purposes. But these can cost heavily if carried over in the debug mode to production especially if ‘append’ mode is on for the tables.
  2. Tune JVM heap options: ODI uses java runtime environment for most of the knowledge modules and message-driven functionality. The heap settings for ODI are in the odiparams.bat script file and  the default values for the ODI_INIT_HEAP and ODI_MAX_HEAP properties are 32M and 256M. In most of the implementations where ODI is used, these settings are relatively low and result in OutofMemoryError exception  when the packages/interfaces are run. Per documentation the recommended settings are 256M for ODI_INIT_HEAP and 1024M for ODI_MAX_HEAP. Per my experience, 256M of init setting is the optimal one but max heap setting should be based on how many packages and the volume of task performed by those packages.
  3. Optimize the batch load size: When uploading to an RDBMS data storage, batch upload size can be a very effective setting and improve the runtime of an interface/package significantly. Most of the relational databases have two properties that can be set in the Physical Schema definition of the database technology under Topology Manager. The two settings are Array Fetch Size  and Batch Upload Size. These settings are typically quite low and should be increased to the optimal values to make the loads faster.
  4. Remote server URI and access: Many times the packages/interfaces have to access files from the remote servers. One obvious way to handle this is to share the path in the remote server, map the drive to the remote server path where required files are and then use the mapped drive in the package. But I found that ODI was not able to work with mapped drives. You need to give the fully qualified Universal Resource Identification (URI) path. Secondly, the access given to the path should be to a service account that is registered in the domain and can be accessed from within any server in the network. This service account should also be used as the logon account for ODI agent.
  5. Sunopsis Memory Engine vs. External Database: ODI provides Sunopsis Memory Engine as an in-memory database based on HSQLDB that serves quite well as a high-performing temporary data storage for intermediary data transformations. But, this engine is good for not too large set of data because it uses the memory from JVM that ODI runs on. For transformations of large data sets in interfaces, rather use an external RDBMS as the target or staging data-storage with truncate mode switched on (as mentioned in point 1). There is a good series of external blog on Sunopsis Memory Engine that also addresses this point in detail and more (Under the hood of the Sunopsis Memory Engine).
Wednesday, November 16, 2011
Posted by Arjun Lagisetty

Popular Post

Labels

Blog Archive

Copyright © ODI Pundits - Oracle Data Integrator - Maintained by Arjun Lagisetty