
What is Big Data? Is it Only Hadoop? (Tutorial day 1)

Big Data, the new buzzword in today’s technology, is gaining importance because of its high rewards. A systematic and focused approach to adopting Big Data allows one to derive maximum value from it.

It is nothing but a new framework or system for gaining insight into existing data in its different forms, giving researchers and analysts the power to get more out of existing systems.

As BG Univ says, “Big data is about the application of new tools to do MORE analytic on MORE data for More people.”

The lifecycle of data can be defined as:


People often confuse Big Data and Hadoop, treating them as the same thing. But Big Data is not only Hadoop.

Big Data is not a tool or a single technique. It is actually a platform or framework with various components, such as data warehouses (providing OLAP and historical data), real-time data systems, and Hadoop (providing insight into structured, semi-structured, or unstructured data).

Examples of Big Data include traffic data, flight data, and search engine data.

Thus Big Data involves huge volume, high velocity, and a wide variety of data. The data in it will be of three types:

a) Structured data: Relational data.
b) Semi Structured data: XML data.
c) Unstructured data: Word, PDF, Text, Media Logs.

Big Data can be characterized by the 3 V’s:

1) Velocity -> Batch processing data, real time
2) Variety -> Structured, semi-structured, unstructured and polymorphic data
3) Volume -> Terabytes to Petabytes

Big Data puts traditional systems in trouble because, as data grows, its complexity, security needs, maintenance effort, and processing time grow as well. This is where distributed processing systems come into the picture: they use multiple systems and disks to process the data in parallel.
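To make the idea of parallel processing a little more concrete, here is a minimal sketch in JavaScript. It only simulates the approach inside a single process: the input is split into chunks, each chunk is counted on its own (as a separate machine would do), and the partial results are merged at the end. The function names `splitIntoChunks`, `mapChunk` and `reduceCounts` and the sample records are assumptions made up for this illustration, not part of any particular product.

```javascript
// Sketch only: simulates distributed processing inside one process.
// splitIntoChunks, mapChunk and reduceCounts are hypothetical helper names.

// Split the input records into chunks, one per (simulated) worker.
function splitIntoChunks(records, workerCount) {
  const chunks = Array.from({ length: workerCount }, () => []);
  records.forEach((record, i) => chunks[i % workerCount].push(record));
  return chunks;
}

// Map step: each worker counts the words in its own chunk only.
function mapChunk(chunk) {
  const counts = {};
  for (const line of chunk) {
    for (const word of line.toLowerCase().split(/\s+/).filter(Boolean)) {
      counts[word] = (counts[word] || 0) + 1;
    }
  }
  return counts;
}

// Reduce step: merge the partial counts from all workers into one result.
function reduceCounts(partials) {
  const total = {};
  for (const partial of partials) {
    for (const [word, count] of Object.entries(partial)) {
      total[word] = (total[word] || 0) + count;
    }
  }
  return total;
}

// Made-up sample records.
const records = ["traffic data", "flight data", "search engine data"];
const partials = splitIntoChunks(records, 2).map(mapChunk);
const wordCounts = reduceCounts(partials);
console.log(wordCounts);
```

In a real cluster the map step would run on different machines at the same time; here the chunks are simply processed one after another to show the shape of the computation.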

There are various tools and technologies in the market from different vendors, including IBM, Microsoft, etc., to handle Big Data. A few of them are:

1) NoSQL Big Data systems are designed to provide operational capabilities for real-time, interactive workloads where data is primarily captured and stored. They allow massive computations to be run inexpensively and efficiently. This makes operational Big Data workloads much easier to manage, and cheaper and faster to implement. An example is MongoDB.
2) MPP and MapReduce provide analytical capabilities for complex analysis over large amounts of data. Hadoop, Hive, Pig and Impala are based on them.
3) Storage (HDFS, i.e. Hadoop Distributed File System)
4) Servers (Google App Engine)

There are major challenges with Big Data.

Read the Day 2 tutorial to understand further, and bookmark this page for future reference.

Creating Word / Tag Clouds in OBIEE 11g

As discussed in my other blog post on D3 visualizations in OBIEE 11g, there is another data visualization technique called “word clouds”, which is used to graphically show word distributions in unstructured data sets through font sizes and orientations. For example, the word cloud shown below displays the company names in different sizes/fonts.

It can be implemented in OBIEE 11g reports without any extra tools. We just have to create a dummy column, using any logic that yields values ranging from 5 to 70, and then use this column in the Narrative View as the font size of the Company column.

Try the steps below:

  1. Add the column that needs to appear in the cloud, e.g. Product name.
  2. Create a dummy column called Font. Edit it and write any logic that yields values in the range 5 to 70 (you can adjust the range depending on the font sizes required in the cloud), like ((count of Transaction) * 5) / Max(count of Transaction), or, if a Char column is available, a formula like (count(name) * 1000) / 30 * 9.5, or anything similar.
  3. Edit the column properties of the Font column to set the data format to a number without decimals.
  4. Create one more dummy column called Color. Edit it to write a formula such as: Case when Font is between 1 and 10 then ‘Red’ when Font is between 11 and 30 then ‘Green’, etc.
  5. Go to Results and check how the table view appears. If everything looks fine, save the report. Then add a Narrative View to it to display the Company/Product name column, like below, and it’s done.
<span style="font-size:@2pt; color:@3;">@1</span>
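The logic behind the Font and Color columns can also be sketched outside OBIEE. The JavaScript below is only an illustration under assumptions: it uses a simple linear scaling into the 5 to 70 range (not the exact formulas from step 2), an assumed third color band ‘Blue’, made-up sample rows, and hypothetical function names `fontSizeFor`, `colorFor` and `toCloudSpan`.

```javascript
// Sketch only: fontSizeFor, colorFor and toCloudSpan are hypothetical names,
// and sampleRows is made-up data.
const MIN_FONT = 5;
const MAX_FONT = 70;

// Scale a count linearly into the 5..70 range and drop the decimals
// (the role played by the Font column in steps 2 and 3).
function fontSizeFor(count, maxCount) {
  if (maxCount <= 0) return MIN_FONT;
  return Math.round(MIN_FONT + (count / maxCount) * (MAX_FONT - MIN_FONT));
}

// Pick a color band from the font size (the role of the Color column in step 4).
// The 'Blue' band for sizes above 30 is an assumption.
function colorFor(fontSize) {
  if (fontSize <= 10) return "Red";
  if (fontSize <= 30) return "Green";
  return "Blue";
}

// Build the markup for one word of the cloud.
function toCloudSpan(name, count, maxCount) {
  const size = fontSizeFor(count, maxCount);
  return `<span style="font-size:${size}pt; color:${colorFor(size)};">${name}</span>`;
}

const sampleRows = [
  { name: "Laptop", count: 200 },
  { name: "Phone", count: 50 },
  { name: "Cable", count: 10 },
];
const maxCount = Math.max(...sampleRows.map((r) => r.count));
const cloudHtml = sampleRows
  .map((r) => toCloudSpan(r.name, r.count, maxCount))
  .join(" ");
console.log(cloudHtml);
```

The most frequent name ends up with the largest font, and rarely occurring names fall into the smallest band, which is the effect the word cloud relies on.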

Do try it.

Data Visualization (D3) in OBIEE 11g

One of the great features of Oracle’s Business Intelligence 11g foundation is the ability to integrate external applications through JavaScript libraries. Today we’re going to expand on this functionality by integrating a third-party (open source) JavaScript library for data visualization and data manipulation, i.e. D3. The name stands for Data-Driven Documents.

Oracle have added powerful new data visualization capabilities that turn raw data into insightful information. D3 helps you bring data to life using HTML, SVG, and CSS. D3’s emphasis on web standards gives you the full capabilities of modern browsers without tying yourself to a proprietary framework, combining powerful visualization components and a data-driven approach to DOM manipulation. For example, you can use D3 to generate an HTML table from an array of numbers, or use the same data to create an interactive SVG bar chart with smooth transitions and interaction.

You can download the D3 files (HTML) or get them online. These files contain the basic code for creating visual designs like Bar Column, Candlestick, Pie, Doughnut, Funnel, etc. After downloading the file, we can either make an external call to it directly or place it on the OBIEE server at the following location (check with your admin to see whether the file also needs to be placed in some other location):


In order to embed a D3 visualization in OBIEE, you’ll first need to create a sample report and then use a Narrative View to display, for example, the product transaction count by Region. This gives you access to the data needed to drive the visualization through the @n substitution variables, where n equals the column position in the criteria or in the array.

So let’s create a simple report of Region, Product and its selling count. In the Narrative View, we will then try to create a doughnut design displaying the Region-wise count spread.
In the Prefix section at the top, we will declare a JavaScript array variable called “n” that will contain the data from the analysis, like this:

var n=[];

This array will hold data elements like below:

The Narrative section should contain the following code:

n.push({Count:@2, LegendText:"@1", indexLabel:@2});

where @1 and @2 stand for Region and Transaction Count respectively and dynamically generate the JavaScript that populates our array. The LegendText property is used to show the Regions in the legend, and indexLabel is used to show the data value as the index label.

Now, in the Postfix section, we have to write a fair amount of JavaScript code, in which we pass the array to a function, set the font style/color/size, decide the legend style, etc. It should look similar to below. You can modify it further as required.
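To see what the Prefix and Narrative sections add up to, the substitution can be imitated in plain JavaScript. This is only a sketch of the idea, not OBIEE’s actual implementation: `expandNarrative` is a hypothetical helper, and the rows are made-up sample data with Region in column 1 and Transaction Count in column 2.

```javascript
// Sketch only: expandNarrative is a hypothetical stand-in for the @n
// substitution done by the Narrative View; sampleRows is made-up data.

// Replace each @n in the template with the value of column n (1-based) of one row.
function expandNarrative(template, row) {
  return template.replace(/@(\d+)/g, (match, position) => {
    const value = row[Number(position) - 1];
    return value === undefined ? match : String(value);
  });
}

const prefix = "var n=[];";
const narrative = 'n.push({Count:@2, LegendText:"@1", indexLabel:@2});';
const sampleRows = [
  ["East", 120],
  ["West", 80],
  ["North", 45],
];

// The Prefix is emitted once, then the Narrative line once per result row.
const generated = [
  prefix,
  ...sampleRows.map((row) => expandNarrative(narrative, row)),
].join("\n");
console.log(generated);

// Evaluating the generated script yields the populated array that the
// Postfix code would then hand to the charting function.
const n = new Function(generated + "\nreturn n;")();
console.log(n);
```

Each result row becomes one `n.push(...)` statement, so by the time the Postfix section runs, `n` holds one object per Region.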

Now if you see the final output: