
What is Big Data? Is it Only Hadoop? (Tutorial Day 1)

Big Data, the new buzzword in today's technology, is gaining importance because of the value it can deliver. A systematic and focused approach to adopting Big Data lets an organization derive maximum value from it.

It is essentially a new framework or system for gaining insight from the different forms of existing data, giving researchers and analysts the power to get more out of existing systems.

As BG Univ says, “Big data is about the application of new tools to do MORE analytic on MORE data for More people.”

The lifecycle of data can be defined as:

 

People often confuse Big Data and Hadoop as the same thing. They are not: Big Data is not only Hadoop.

Big Data is not a tool or a single technique. It is a platform or framework with various components, such as data warehouses (providing OLAP and historical data), real-time data systems, and Hadoop (providing insight into structured, semi-structured, and unstructured data).

Examples of Big Data include traffic data, flight data, and search engine data.

Thus Big Data involves huge volume, high velocity, and an extensible variety of data. The data falls into three types:

a) Structured data: Relational data.
b) Semi-structured data: XML data.
c) Unstructured data: Word, PDF, Text, Media Logs.
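
As a rough illustration of the three data types listed above, the sketch below reads one record in each form using only the Python standard library. The sample records (the name, city, and phone number) are made up for this example and are not taken from any real data set.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: fixed columns, like a row in a relational table.
structured = "id,name,city\n1,Asha,Pune\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: self-describing tags, but fields may vary per record.
semi_structured = (
    "<customer id='1'><name>Asha</name><phone>555-0100</phone></customer>"
)
customer = ET.fromstring(semi_structured)
fields = {child.tag: child.text for child in customer}

# Unstructured: free text with no schema; we can only tokenize or search it.
unstructured = "Asha called about a delayed flight from Pune."
words = unstructured.lower().rstrip(".").split()

print(rows[0]["city"])
print(fields)
print(words[:3])
```

The structured row can be addressed by column name, the XML record has to be walked tag by tag, and the free text only yields a list of words, which is why unstructured data usually needs extra processing before analysis.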

Big Data can be characterized by the 3 V's:

1) Velocity -> Batch processing to real-time data
2) Variety -> Structured, semi-structured, unstructured, and polymorphic data
3) Volume -> Terabytes to petabytes

Big Data puts traditional systems under strain because, as data grows, its complexity, security needs, maintenance effort, and processing time grow as well. This is where distributed processing systems come into the picture: they use multiple systems and disks for parallel processing.
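
The idea of splitting data across several workers and then combining their partial results can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: threads on one machine stand in for separate systems, the function names are invented for this example, and the sample log lines are made up. It is not how Hadoop itself is implemented.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split a list into roughly equal chunks, one per worker."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def count_words(lines):
    """'Map' step: count words in one chunk independently of the others."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def parallel_word_count(lines, n_workers=3):
    """Count words per chunk in parallel, then merge the partial counts."""
    chunks = split_into_chunks(lines, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(count_words, chunks))
    total = Counter()
    for partial in partials:  # 'Reduce' step: combine partial results
        total.update(partial)
    return total


logs = [
    "flight delayed",
    "flight on time",
    "traffic heavy",
    "traffic light",
    "flight cancelled",
]
totals = parallel_word_count(logs)
print(totals["flight"], totals["traffic"])
```

Each chunk is counted without knowing about the others, so in a real cluster the chunks could live on different disks or machines, and only the small partial counts would need to travel over the network.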

Various tools and technologies are available from different vendors, including IBM and Microsoft, to handle Big Data. A few of them are:

1) NoSQL Big Data systems are designed to provide operational capabilities for real-time, interactive workloads where data is primarily captured and stored. They allow massive computations to be run inexpensively and efficiently, which makes operational Big Data workloads easier to manage, cheaper, and faster to implement. Example: MongoDB.
2) MPP and MapReduce provide analytical capabilities for complex analysis over large amounts of data. Hadoop, Hive, Pig, and Impala are based on them.
3) Storage (HDFS, i.e. the Hadoop Distributed File System)
4) Servers (Google App Engine)

There are major challenges with Big Data.

Read the Day 2 tutorial to understand further, and bookmark this page for future reference.

Creating Word / Tag Clouds in OBIEE 11g

As discussed in my other blog post on D3 visualizations in OBIEE 11g, there is another data visualization technique called “word clouds”, which graphically shows word distributions in unstructured data sets through font sizes and orientations. For example, the word cloud shown below displays company names in different sizes and fonts.

It can be implemented in OBIEE 11g reports without any extra tools. We just have to create a dummy column, using any logic, that yields values between 5 and 70, and then in the Narrative View use this column as the font size of the Company column.

Try the steps below:

  1. Add the column that needs to appear in the cloud, e.g. Product name.
  2. Create a dummy column named Font. Edit it and write any logic that yields values from 5 to 70 (adjust the range to the font sizes you want in the cloud), such as ((count of Transaction) * 5) / Max(count of Transaction), or, for a Char column, a formula like (count(name) * 1000) / 30 * 9.5, or anything similar.
  3. Edit the column properties of the Font column and set the data format to a number with no decimals.
  4. Create one more dummy column named Color. Edit it and write a formula such as CASE WHEN Font BETWEEN 1 AND 10 THEN 'Red' WHEN Font BETWEEN 11 AND 30 THEN 'Green' (and so on for the remaining ranges) END.
  5. Go to Results and check how the table view appears. If everything looks fine, save the report. Then add a Narrative View to display the Company/Product name column, as below, and it's done.
<a style="font-size:@2pt; color:@3;">
@1
</a>
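
To make the logic behind the Font and Color columns easier to follow, here is a small Python sketch of the same idea outside OBIEE. It is only an illustration under assumptions: the linear scaling into the 5 to 70 range is one possible choice of "any logic", the `Blue` color for sizes above 30 and the function names are invented for this example, and the company names and counts are made up.

```python
from html import escape


def font_size(count, max_count, smallest=5, largest=70):
    """Scale a count linearly into the smallest..largest point range."""
    if max_count <= 0:
        return smallest
    return round(smallest + (largest - smallest) * count / max_count)


def font_color(size):
    """Bucket a font size into a color, like the CASE formula in step 4."""
    if size <= 10:
        return "Red"
    if size <= 30:
        return "Green"
    return "Blue"  # assumed color for the remaining sizes


def word_cloud_html(counts):
    """Build one <a> tag per word, as the Narrative View does per row."""
    max_count = max(counts.values(), default=0)
    tags = []
    for word, count in counts.items():
        size = font_size(count, max_count)
        tags.append(
            f'<a style="font-size:{size}pt; color:{font_color(size)};">'
            f"{escape(word)}</a>"
        )
    return "\n".join(tags)


# Hypothetical company names with made-up transaction counts.
sample = {"Acme": 40, "Globex": 10, "Initech": 2}
print(word_cloud_html(sample))
```

Here `font_size` plays the role of the Font column (@2), `font_color` the Color column (@3), and the word itself the first column (@1) in the narrative snippet.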

Do try it.