Category Archives: Analytics

Exploring ODI Graphical Interface – ODI Studio (Lesson 2)

In this blog, you will get insights into ODI's graphical user interface, ODI Studio. So what is ODI Studio?

As per the Oracle documentation, Oracle Data Integrator Studio is a developer's interface for configuring and managing ODI. It is installed as part of the standalone or enterprise installation option under the specified ORACLE_HOME. ODI Studio provides four graphical Navigators for managing ODI artifacts: Designer, Operator, Topology, and Security.

A one-line overview of the above four Navigators in layman's terms: Designer is where developers design mappings and generate code (scenarios) as per business requirements. Operator is used to run those scenarios and monitor the jobs. Topology is for admin purposes, to connect to data sources and define contexts (environments). Security, finally, is again for admins to define roles and grant privileges or access.

To get deeper insights there is a lot to learn, so let's start in chronological order:

Step 1: Topology (Lesson 4)

Step 2: Designer (Lesson 5)

Step 3: Operator (Lesson 6)

Step 4: Security (Lesson 7)

Click on the links above to study each Navigator in more depth.

Click here to go to Lesson 3, i.e. learn about the ODI component Agents.

Stay tuned!


Generate DDL script for all tables of a schema in SQL Developer

Hello Friends,

I am back after a long vacation and thought I'd add some pointers to my blog. I searched a lot on this topic and found very little information, so here is a step-by-step process to generate DDL scripts for all tables in a schema in SQL Developer, without much work.

If you just want all the DDL scripts together in one view, you can run the query below in SQL Developer, connected to the schema you want the output for:

SELECT DBMS_METADATA.GET_DDL('TABLE', TABLE_NAME) FROM USER_TABLES;
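Optionally, if the generated DDL comes out cluttered with storage and segment clauses, the DBMS_METADATA session transform parameters can trim it down before you run the same query. This is just a convenience tweak, not a required step:

BEGIN
  -- skip the STORAGE/TABLESPACE segment clauses so the generated DDL stays portable
  DBMS_METADATA.SET_TRANSFORM_PARAM(DBMS_METADATA.SESSION_TRANSFORM, 'SEGMENT_ATTRIBUTES', FALSE);
  -- end each generated statement with a semicolon so the output can be re-run as a script
  DBMS_METADATA.SET_TRANSFORM_PARAM(DBMS_METADATA.SESSION_TRANSFORM, 'SQLTERMINATOR', TRUE);
END;
/

SELECT DBMS_METADATA.GET_DDL('TABLE', TABLE_NAME) FROM USER_TABLES;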

The next way, which gives you separate DDL scripts per table, is to:

  1. Go to File -> Data Modeler -> Export -> DDL File.
  2. A new pop-up window appears.
  3. Click on the Generate button.
  4. Another pop-up window appears.
  5. Click on "Generate DDL scripts in Separate Files", at the bottom right of the screen.
  6. Now go to the "Include TABLE DDL scripts" tab.
  7. Select the tables you want DDL scripts for.
  8. Click OK. It will ask for the location where the files should be stored. Set that, and you are good to go.


Happy Working 🙂


ETL ODI Error | ORA-01792: MAXIMUM NUMBER OF COLUMNS IN A TABLE OR VIEW IS 1000

Are you seeing the error "ORA-01792: MAXIMUM NUMBER OF COLUMNS IN A TABLE OR VIEW IS 1000" even though the original query does not select that many columns?

Cause
——–

The problem matches unpublished bugs, as per the Oracle documents:

Bug 19653859 – CI BACKPORT OF BUG 19509982 FOR INCLUSION IN DATABASE BP 12.1.0.2.2
Bug 19509982 – DISABLE FIX FOR RAISING ORA-1792 BY DEFAULT

Solutions:

  1. Oracle suggests altering the query and/or view definitions to avoid the error.
  2. However, in cases where the SQL cannot be adjusted, the check can be disabled with:
    SQL> alter system set "_fix_control"='17376322:OFF';
  3. Alternatively, an interim patch (patch 19509982) can be applied to disable the error by default.

References:

• Bug 19509982 – Disable raising of ORA-1792 by default

You can click here to read more

Examples:

SQL> alter session set "_fix_control"='17376322:OFF';

or at the system level:

SQL> alter system set "_fix_control"='17376322:OFF';
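If you want to confirm the current state of this fix control afterwards, a quick check (assuming you have access to the v$ views) is to query v$system_fix_control:

SQL> select bugno, value, description from v$system_fix_control where bugno = 17376322;

A VALUE of 0 here means the fix is switched off.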

Or:

Apply Patch 19509982 if it is available for your database version and platform.

To download the patch, please follow the steps below:
1) Go to MOS (My Oracle Support)
2) Click on Patches & Updates
3) In the Patch Search section, enter your patch number <19509982> in the Patch Name or Number field
4) Click Search
5) Click the patch that matches your DB version and platform
6) Read any applicable notes before downloading, then click the Download button


Information courtesy: Google / Oracle Docs

How to Google any column value from OBIEE

Nowadays end users demand a lot of GUI interactivity in OBIEE reports. They are really happy if the reports are full of helpful functionality, and one such feature is being able to Google the value they click on. This was actually my business requirement a few days back. It is not very tough to implement and requires no config or RPD changes.

Steps to do it:

  1. Create a new Action: go to New -> Action -> Navigate to a Web Page.
  2. Type http://www.google.com/search?q= in the URL section of the window.
  3. Then click on Define Parameters on the right side of the window and remove the "Optional" mark for the parameter.
  4. Save this action with any name you will remember.
  5. Now go to your analysis, where you want to call this action through a column interaction.
  6. Under Column Properties -> Interaction -> Primary Interaction, select Action Links.
  7. Click on the + sign to add an Action Link.
  8. In the new pop-up window, click on Select existing action.
  9. Pick your saved action and click OK.
  10. Now edit this Action Link and select the column value as the parameter value.
  11. Then select the relevant column and tick the boxes for the "Hidden" (and "Fixed") options. Click OK to save.
  12. Click OK and check your results. When you click on any mapped column value, a Google window will open, searching for that value.


Follow for more!


How to get YYYYMM format from calendar date without using CAST as Char

Today I am here to discuss a scenario with the CAST function, applicable to both OBIEE 11g and 12c. We always tend to reach for the CAST function for conversions to Char/Int/Date etc., but an issue appears when we need a different format out of the date functions.

For example, I had a scenario where the user wanted to see a report in pivot form, with a prompt date range such as 2017 / 01 to 2017 / 04.

I first tried getting the above values in the prompt by converting the calendar date to Char with the CAST function. But when I passed this prompt value to my report, it did not work as expected: if I selected the date range 2017 / 01 to 2017 / 04, the report would only show data for 2017 / 01, 2017 / 02 and 2017 / 03. It would omit 2017 / 04.

To resolve this issue, we used a simple piece of logic in the report prompt, as below. It gives the output in a slightly different format, but no CAST is used, so there is no conversion to Char and no issue passing the value.

Using the idea below, we now get the output in a proper format, and it passes correctly to the report and filters it.

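The original screenshot of the prompt formula is not available here, but one formula that matches the idea (a numeric YYYYMM value, with no CAST to Char) is sketched below; the "Time"."Calendar Date" column name is only an example and should be replaced with your own date column:

YEAR("Time"."Calendar Date") * 100 + MONTH("Time"."Calendar Date")

For January 2017 this returns 201701, which still reads as year-plus-month but sorts and filters as a number, so it passes cleanly to the report filter.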

The output then appears in the expected YYYYMM style, and the report filters correctly.

Follow for more 🙂

How to navigate to external links from obiee reports

Sometimes we get business requirements which are meaningful but a little tricky to implement in OBIEE. Last week I got a requirement from the business to add a hyperlink on a report column which, when clicked, goes to an external link; along with it I had to pass a CodePin and a Number.

I tried many ways of passing parameters in the Go Nav / GO URL format, but they did not work. Finally <a href> worked, so I am sharing my solution with you.

Step 1: Edit the column formula and place your code in the format below. Concatenate all the parameters to be passed, set a target and give it a name.
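The screenshot with the exact formula is not included here, so the snippet below only illustrates the pattern; the URL, the "Dim"."Code Pin" and "Dim"."Number" columns, and the target name are placeholders to replace with your own:

'<a href="http://www.externalsite.com/lookup?pin=' || "Dim"."Code Pin" || '&num=' || CAST("Dim"."Number" AS CHAR) || '" target="ExternalWindow">' || "Dim"."Code Pin" || '</a>'

If the columns are already character data the CAST is not needed; numeric columns generally need it before concatenation.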

Step 2: Modify the same column and change its data format so that the text is treated as HTML and renders as a clickable link.

Let me know if you face any issues.

Follow for more !!


How to Remove Null Value from Dashboard prompts in OBIEE

Recently I faced a weird issue in dashboard prompts in OBIEE 12c. Invalid NULL values that are not present in the database were showing up in prompt columns in OBIEE. For some of the columns a NULL value is valid, but for others it is not, so the business wanted the NULL value displayed in a prompt only if it is valid and appears in the DB column as well.

There are two ways to fix this issue:

a) Edit the prompt and check the 'Require user input' property (this does not work most of the time).

b) At the RPD level, uncheck the Nullable option under the physical column, as shown below:

[Screenshot: physical column properties in the RPD with the Nullable option unchecked]

Then restart/recycle all the servers; until then the change will not take effect. Do this only for the necessary columns, i.e. the ones that will never have a NULL value in the DB.

Follow for more on my FB page

or my OBIEE group

Hadoop Ecosystem Components contd…(Tutorial Day 6)

Continuing from the previous post, here we will discuss some more components of the Hadoop ecosystem.

Data Integration or ETL Components of Hadoop Ecosystem

Sqoop (SQL-to-Hadoop) is a big data tool that can extract bulk data from non-Hadoop relational databases (like MySQL, Oracle, Teradata, PostgreSQL), transform the data into a form usable by Hadoop, and then load it into HDFS, HBase or Hive. This process is similar to Extract, Transform, and Load. It parallelizes data transfer for fast performance, copies data quickly from external systems into Hadoop, and makes data analysis more efficient.

It is batch oriented and not suitable for low-latency interactive queries. It provides a scalable processing environment for both structured and non-structured data.

Sqoop Import

The import tool imports individual tables from RDBMS to HDFS. Each row in a table is treated as a record in HDFS. All records are stored as text data in text files or as binary data in Avro and Sequence files.

Sqoop Export

The export tool exports a set of files from HDFS back to an RDBMS. The files given as input to Sqoop contain records, which become rows in the target table; they are read and parsed into a set of records according to a user-specified delimiter.

Sqoop Use Case-

Coupons.com and Apollo Group use the Sqoop component of the Hadoop ecosystem to enable transmission of data between Hadoop and their data warehouses.

  • Flume-

Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of streaming or log data into the Hadoop Distributed File System (HDFS). It is used for collecting data from its origin and delivering it to its resting place (HDFS). Flume accomplishes this by outlining data flows that consist of three primary structures: sources, channels and sinks. The processes that run the data flow in Flume are known as agents, and the units of data that flow through Flume are known as events.

Flume helps to collect data from a variety of sources, like logs, JMS, directories etc.
Multiple Flume agents can be configured to collect a high volume of data.
It scales horizontally and is stream oriented; it provides high throughput with low latency and is fault tolerant.

Both Sqoop and Flume pull data from a source and push it to a sink. The main difference is that Flume is event driven, while Sqoop is not.

Flume Use Case –

A Twitter source connects through the streaming API and continuously downloads tweets (as events). These tweets are converted into JSON format and sent to the downstream Flume sinks for further analysis of tweets and retweets to engage users on Twitter.
Goibibo uses Flume to transfer logs from its production systems to HDFS.

Data Storage Component of Hadoop Ecosystem

HBase

HBase is an open-source, distributed, sorted-map data store. It is a column-oriented NoSQL database solution, similar to Google's BigTable framework. It supports random reads as well as batch computations using MapReduce. With the HBase NoSQL database, an enterprise can create large tables with millions of rows and columns on commodity hardware. The best practice is to use HBase when there is a requirement for random read or write access to big datasets. HBase's important advantages are that it supports updates on large tables and fast lookups. The HBase data store supports linear and modular scaling. HBase stores data as a distributed, multidimensional map, and batch operations on it can run as parallel MapReduce tasks.

It is well integrated with Pig, Hive and Sqoop. In terms of the CAP theorem, it is a consistent and partition-tolerant (CP) system.

HBase Use Case-

Facebook is one of the largest users of HBase, having built its messaging platform on top of HBase in 2010.

Cassandra

Apache Cassandra is a free and open-source distributed database management system designed to handle large amounts of data across many commodity servers. This database is the right choice when you need scalability and high availability without compromising performance. Linear scalability and proven fault tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data. Cassandra's support for replicating across multiple data centers is best in class, providing lower latency for your users and the peace of mind of knowing that you can survive regional outages.

Use Cases:

For Cassandra, Twitter is an excellent example. We know that, like most sites, user information (screen name, password, email address, etc.) is kept for everyone, and that those entries are linked to one another to map friends and followers. And it wouldn't be Twitter if it weren't storing tweets, which in addition to their 140 characters of text are also associated with metadata like the timestamp and the unique id that we see in the URLs.

Monitoring, Management and Orchestration Components of Hadoop Ecosystem- Oozie and Zookeeper

  • Oozie-

Oozie is a workflow scheduler in which workflows are expressed as Directed Acyclic Graphs. Oozie runs in a Java servlet container (Tomcat) and makes use of a database to store all the running workflow instances, their states and variables, along with the workflow definitions, in order to manage Hadoop jobs (MapReduce, Sqoop, Pig and Hive). The workflows in Oozie are executed based on data and time dependencies.

Oozie Use Case:

The American video game publisher Riot Games uses Hadoop and the open source tool Oozie to understand the player experience.

  • Zookeeper-

Zookeeper is the king of coordination and provides simple, fast, reliable and ordered operational services for a Hadoop cluster. Zookeeper is responsible for synchronization service, distributed configuration service and for providing a naming registry for distributed systems.

Zookeeper Use Case-

Found by Elastic uses ZooKeeper comprehensively for resource allocation, leader election, high-priority notifications and discovery. The entire Found service is built up of various systems that read from and write to ZooKeeper.

Several other common Hadoop ecosystem components include: Avro, Cassandra, Chukwa, Mahout, HCatalog, Ambari and Hama. By implementing Hadoop using one or more of the Hadoop ecosystem components, users can personalize their big data experience to meet the changing business requirements. The demand for big data analytics will make the elephant stay in the big data room for quite some time.

Data Serialisation (Data Interchange Protocols)

AVRO: Apache Avro is a language-neutral data serialization system, developed within the Apache Hadoop project. Data serialization is a mechanism to translate data in a computer environment (like a memory buffer, data structures or object state) into a binary or textual form that can be transported over the network or stored on some persistent storage media. Java and Hadoop provide serialization APIs, which are Java based; Avro, by contrast, is not only language independent but also schema based. Once the data is transported over the network or retrieved from persistent storage, it needs to be deserialized again. Serialization is also termed marshalling and deserialization is termed unmarshalling.

Avro uses JSON format to declare the data structures. Presently, it supports languages such as Java, C, C++, C#, Python, and Ruby. Avro has a schema-based system: a language-independent schema is associated with its read and write operations.

Like Avro, there are other serialization mechanisms in Hadoop, such as Sequence Files, Protocol Buffers, and Thrift. Avro creates a self-describing file named the Avro Data File, in which it stores data along with its schema in the metadata section. Avro is also used in Remote Procedure Calls (RPCs); during an RPC, client and server exchange schemas in the connection handshake.

To serialize Hadoop data, there are two ways −

  • You can use the Writable classes, provided by Hadoop’s native library.
  • You can also use Sequence Files which store the data in binary format.

The main drawback of these two mechanisms is that Writables and SequenceFiles have only a Java API and they cannot be written or read in any other language.

Therefore files created in Hadoop with the above two mechanisms cannot be read by other, third-party languages, which makes Hadoop a limited box. To address this drawback, Doug Cutting created Avro, which is a language-independent data format.

Use Case:

Avro provides rich data structures. For example, you can create a record that contains an array, an enumerated type, and a sub-record. These datatypes can be created in any language, can be processed in Hadoop, and the results can be fed to a third language.

Content credit: http://www.tutorialspoint.com


Thrift:

Thrift is a lightweight, language-independent software stack with an associated code generation mechanism for RPC. Thrift provides clean abstractions for data transport, data serialization, and application level processing. Thrift was originally developed by Facebook and now it is open sourced as an Apache project. Apache Thrift is a set of code-generation tools that allows developers to build RPC clients and servers by just defining the data types and service interfaces in a simple definition file. Given this file as an input, code is generated to build RPC clients and servers that communicate seamlessly across programming languages.

Thrift supports a variety of languages including C++, Java, Python, PHP, Ruby.

To learn more about Hadoop, keep reading these tutorials; every day we try to bring something new and interesting to all my readers!

Now we will start learning about each ecosystem component in more detail. Click here to read about how the MapReduce algorithm works, with an easy example.

Hadoop Ecosystem Components contd…(Tutorial Day 5)

Continuing from the previous post: vendors that provide Hadoop-based platforms include Cloudera, Hortonworks, MapR, Greenplum, IBM, and Amazon. Here we will discuss more components of the Hadoop ecosystem.

Data Access Components of Hadoop Ecosystem

  • Pig-

Apache Pig is a high-level platform for creating programs that run on Apache Hadoop. It was developed by Yahoo for analyzing huge data sets efficiently and easily. The high-level data flow language for this platform is called Pig Latin. Pig can execute its Hadoop jobs in MapReduce, Apache Tez, or Apache Spark. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets.

At the present time, Pig's infrastructure layer consists of a compiler that produces sequences of Map-Reduce programs, for which large-scale parallel implementations already exist (e.g., the Hadoop subproject). Pig's language layer currently consists of a textual language called Pig Latin. Pig is an open-source project under the Apache Software Foundation, so you can learn about it online.

Pig Latin is basically used to construct dataflows: to have a scheduled job periodically crunch the massive data in HDFS and transfer the summarized data into a relational database for reporting and ad-hoc analyses. Hive, in contrast, is used for simple ad-hoc analytical queries on the data in HDFS, as Hive queries are a lot faster to write for those cases. Pig is generally used by Yahoo, Twitter etc. to process web logs, images, maps etc.

Usage of Apache Pig:

  • Using Pig Latin, programmers can perform MapReduce tasks easily without having to type complex code in Java. It uses a multi-query approach, thereby reducing the length of the code. For example, an operation that would require you to type 200 lines of code (LoC) in Java can be done by typing as few as 10 LoC in Apache Pig. Ultimately, Apache Pig reduces development time by almost 16 times.
  • Pig Latin is a SQL-like language, and it is easy to learn Apache Pig when you are familiar with SQL.
  • Apache Pig provides many built-in operators to support data operations like joins, filters, ordering, etc. In addition, it also provides nested data types like tuples, bags, and maps that are missing from MapReduce.

Pig Use Case-

Here is one of my favourite use cases of the Pig Latin language; you can read more about it on Slideshare:

Scenario: you have user data in one file and website data in another, and you want to find the top 5 most-visited pages for users aged 18-25. For this scenario the MapReduce program is a full page of code, but in Pig Latin it is a small, easily understandable script.

[Pig Latin code example omitted; code credit: Nick Dimiduk]
  • Hive-

Hive is a data warehouse system layer built on Hadoop. It allows you to define a structure over unstructured big data and query that data using a SQL-like language called HiveQL. It was developed by Facebook and makes querying faster through indexing.
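Since HiveQL stays close to ordinary SQL, a short sketch gives the flavour; the table name, columns and HDFS path below are made up purely for illustration:

-- define a structure over delimited files already sitting in HDFS (hypothetical path and columns)
CREATE EXTERNAL TABLE page_views (
  view_time TIMESTAMP,
  user_id   BIGINT,
  page_url  STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/logs/page_views';

-- a typical ad-hoc aggregation; Hive compiles this into MapReduce (or Tez/Spark) jobs
SELECT page_url, COUNT(*) AS hits
FROM page_views
GROUP BY page_url
ORDER BY hits DESC
LIMIT 10;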

Hive Use Case-

Hive simplifies Hadoop at Facebook with the execution of 7500+ Hive jobs daily for Ad-hoc analysis, reporting and machine learning.


Read my next blog on more Hadoop ecosystem components (tutorial Day 6)

Always Inspiring Creations!!