How to get YYYYMM format from calendar date without using CAST as Char

Today I want to discuss the CAST function in OBIEE 11g and 12c. We tend to reach for CAST whenever we need to convert to Char, Int, Date and so on. But problems appear when we need a different format from a date.

For example, I had a scenario where the user wanted to see a report in pivot form, with a prompt date range such as 2017 / 01 to 2017 / 04.

I first tried to produce these values in the prompt by converting the calendar date to Char with the CAST function. But when I passed this prompt value to my report, it did not work as expected. If I selected the date range 2017 / 01 to 2017 / 04, the report showed data only for 2017 / 01, 2017 / 02 and 2017 / 03. It omitted 2017 / 04.

To resolve this issue, we used simple logic in the report prompt, as below. This gives the output in a slightly different format, but no CAST is used, so there is no conversion to Char and no trouble passing the value.

Using the idea below, we now get the output in the proper format, and it passes correctly to the report and filters it.
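
The exact prompt formula is not reproduced in this text, so the following is only a sketch of one common way to get a YYYYMM value without any Char conversion: plain arithmetic, year * 100 + month. In OBIEE the analogous expression would be something like YEAR("Calendar Date") * 100 + MONTH("Calendar Date"), where the column name is hypothetical. The Scala below just illustrates the arithmetic and shows that an inclusive range check on the numeric key keeps the last month (the object and method names are made up for this example).

```scala
import java.time.LocalDate

object YearMonthKey {
  // Build a numeric YYYYMM key with plain arithmetic, e.g. 2017-04-15 becomes 201704.
  def yyyymm(d: LocalDate): Int = d.getYear * 100 + d.getMonthValue

  // Inclusive range check on the numeric key, similar to a "between" prompt filter.
  def inRange(d: LocalDate, from: Int, to: Int): Boolean = {
    val key = yyyymm(d)
    key >= from && key <= to
  }

  def main(args: Array[String]): Unit = {
    val dates = Seq(
      LocalDate.of(2017, 1, 10),
      LocalDate.of(2017, 4, 30), // last day of the final month in the range
      LocalDate.of(2017, 5, 1)   // just outside the range
    )
    dates.foreach { d =>
      println(s"$d -> ${yyyymm(d)} inRange=${inRange(d, 201701, 201704)}")
    }
  }
}
```

Because the key is a number, every day of 2017 / 04 maps to 201704 and falls inside the 201701 to 201704 range.
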


Output shows below:

Follow for more 🙂

How to navigate to external links from obiee reports

Sometimes we get business requirements that are meaningful but a little tricky to implement in OBIEE. Last week I got a requirement from the business to add a hyperlink on a report column which, when clicked, goes to an external link. Along with it, I had to pass the CodePin and Number.

I tried many ways of passing parameters in the Go Nav and Go URL formats, but they didn’t work. Finally <a href> worked, so I am sharing my solution with you.

Step 1: Edit the column formula and place your code in the format below. Concatenate all the parameters to be passed. Set the target and give it a name.

Step 2: Modify the same column and set its data format as below.
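
The formula and data format settings are not reproduced in this text, so here is a rough sketch of the string that the two steps are meant to produce: an <a href> tag with the parameters concatenated into the query string. The base URL, the label and the choice of target="_blank" are assumptions for illustration only; CodePin and Number are the parameter names from the requirement. In OBIEE you would build the same string by concatenating column values in the column formula and, presumably, treating the column text as HTML.

```scala
import java.net.URLEncoder

object ExternalLink {
  // URL-encode a parameter value so spaces and symbols survive in the query string.
  private def enc(s: String): String = URLEncoder.encode(s, "UTF-8")

  // Concatenate the base URL and both parameters into an <a href> tag.
  def anchor(baseUrl: String, codePin: String, number: String, label: String): String = {
    val href = s"$baseUrl?CodePin=${enc(codePin)}&Number=${enc(number)}"
    s"""<a href="$href" target="_blank">$label</a>"""
  }

  def main(args: Array[String]): Unit = {
    // https://example.com/lookup is a placeholder, not a real endpoint.
    println(anchor("https://example.com/lookup", "AB 12", "42", "Open"))
  }
}
```

Encoding each value separately keeps a space or an ampersand inside CodePin from breaking the link.
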

Let me know if you face any issues.

Follow for more !!


What are Resilient Distributed Datasets (RDDs)? (Day 3)

Spark’s primary core abstraction is the Resilient Distributed Dataset, or RDD. It is designed to support in-memory data storage, distributed across a cluster in a manner that is resilient, fault-tolerant and efficient. RDDs are resilient because they rely on a lineage graph: whenever there is a failure in the system, they can recompute themselves using that prior information. Similarly, fault tolerance is achieved, in part, by tracking the lineage of transformations applied to coarse-grained sets of data. Efficiency is achieved through parallelization of processing across multiple nodes in the cluster, and minimization of data replication between those nodes.

In layman’s terms, an RDD is a representation of the data coming into your system in an object format, which allows you to do computation on it.

Spark RDDs can reference a dataset in an external storage system, such as a shared filesystem, HDFS, HBase, or any data source offering a Hadoop InputFormat. We can also define an RDD simply as a distributed collection of elements that is parallelized across the cluster. Once data is loaded into an RDD, two basic types of operation can be carried out: transformations and actions.

Transformations do not return a result value; they return a new RDD. In fact, nothing is evaluated when a transformation is defined. It just describes a new RDD derived from the original through processes such as mapping, filtering, and more. The fault tolerance aspect of RDDs allows Spark to replay the transformations recorded in the lineage to get back lost data.

Actions are when the transformations get evaluated, along with the action that is called on that RDD. Actions return values. For example, you can do a count on an RDD to get the number of elements in it, and that value is returned.

The original RDD remains unchanged throughout. The chain of transformations from RDD1 to RDDn is logged, and can be repeated in the event of data loss or the failure of a cluster node. RDDs can be persisted in order to cache a dataset in memory across operations. This allows future actions to be much faster, often by as much as ten times.
Spark’s cache is fault-tolerant in that if any partition of an RDD is lost, it will automatically be recomputed using the original transformations. For example, let’s say a node goes offline. All Spark needs to do is re-evaluate the graph up to the point where the data was lost. Caching is provided with Spark to enable the processing to happen in memory. If the data does not fit in memory, Spark can spill it to disk, depending on the storage level chosen.
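
To make the lineage idea concrete, here is a toy sketch in plain Scala (no Spark required). MiniRdd is a made-up class, not a Spark API: it stores a recipe for computing its data, caches the result, and replays the recipe if the cached copy is lost. Real RDDs track lineage per partition and across a cluster; this only illustrates the principle.

```scala
object LineageSketch {
  // Toy stand-in for an RDD: it keeps a recipe for its data, not just the data.
  final class MiniRdd[A](compute: () => Seq[A]) {
    private var cache: Option[Seq[A]] = None
    var computations: Int = 0 // how many times the recipe was replayed

    // Transformations return a new MiniRdd whose recipe extends this one.
    def map[B](f: A => B): MiniRdd[B] = new MiniRdd(() => this.collect().map(f))
    def filter(p: A => Boolean): MiniRdd[A] = new MiniRdd(() => this.collect().filter(p))

    // Action: use the cached data if present, otherwise replay the recipe.
    def collect(): Seq[A] = cache.getOrElse {
      computations += 1
      val data = compute()
      cache = Some(data)
      data
    }

    // Simulate losing the in-memory copy, for example when a node fails.
    def loseCache(): Unit = { cache = None }
  }

  def main(args: Array[String]): Unit = {
    val base = new MiniRdd(() => Seq(1, 2, 3, 4, 5, 6))
    val result = base.map(_ * 10).filter(_ % 20 == 0)
    println(result.collect()) // first evaluation
    result.loseCache()
    println(result.collect()) // rebuilt from the recorded transformations
    println(s"recipe replayed ${result.computations} times")
  }
}
```

The second collect() returns the same elements as the first even though the cached copy was dropped, because the chain of transformations is still known.
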
An interesting thing about Spark is its lazy evaluation. An RDD is not loaded into the system as and when the system encounters it, but only when an action is to be performed. To understand this concept, let’s take an example:
  • We read a text file and load the data into a newly created RDD ‘m’   {scala>  val m = sc.textFile("abc.txt")  } . Spark interprets this step and creates a DAG that tells it to read data from the file and put it into RDD format. An RDD is made of multiple partitions. By default, the minimum number of partitions in an RDD will be two. However, this is customizable and will differ between vendor distributions of Spark. For example, when creating an RDD out of an HDFS file, each block in the file feeds one RDD partition, so a file with 30 unique blocks will create an RDD with 30 partitions. Or in Cassandra, every 100,000 rows get loaded into one RDD partition, so a Cassandra table with 1 million rows will generate an RDD with 10 partitions.
  • The next step is to display the first item in this RDD.  {scala>  m.first() }
  • Now let’s use the .filter() transformation on the ‘m’ RDD to return a new RDD named “linesWithApache”, which will contain a subset of the items in the file (only the ones containing the string “Apache”):  {scala> val linesWithApache = m.filter(line => line.contains("Apache")) }
  • Now let’s use an action to find the number of lines containing the word Apache.  {scala> linesWithApache.count() }
  • To see these lines, you can use the .collect() action.  {scala> linesWithApache.collect() }
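
The same lazy behaviour can be seen without a Spark installation. The sketch below uses a plain Scala collection view as a stand-in for an RDD (an analogy only, not Spark itself): defining the filter does no work, and the predicate runs only when a result is requested. The object name and the sample lines are made up for the example.

```scala
object LazySketch {
  // Returns (lines inspected after defining the filter, count, lines inspected after counting).
  def demo(): (Int, Int, Int) = {
    val lines = List("Apache Spark", "Hadoop YARN", "Apache Mesos")
    var inspected = 0 // how many lines the predicate has actually looked at

    // "Transformation": on a view this only records the filter; nothing runs yet.
    val linesWithApache = lines.view.filter { line =>
      inspected += 1
      line.contains("Apache")
    }
    val inspectedBeforeAction = inspected

    // "Action": asking for the size forces the filter to run.
    val count = linesWithApache.size
    (inspectedBeforeAction, count, inspected)
  }

  def main(args: Array[String]): Unit = {
    val (before, count, after) = demo()
    println(s"inspected before action: $before")
    println(s"count: $count, inspected after action: $after")
  }
}
```
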


Spark Components..Architecture (Day 2)

So here we are today, in the Day 2 tutorial for learning Spark. As we all know, Spark is a top-level project of the Apache Software Foundation, designed to be used with a range of programming languages and on a variety of architectures. Spark’s speed, simplicity, and broad support for existing development environments and storage systems make it increasingly popular with a wide range of developers, and relatively accessible to those learning to work with it for the first time.

To make Spark easy to learn and to incorporate into existing applications as straightforwardly as possible, it has been developed to support many programming languages, such as Java, Python, Scala, SQL and R. Spark is easy to download and install on a laptop or virtual machine. Spark was built to run in a couple of different ways: standalone, or as part of a cluster. For production workloads that operate at scale, Spark needs to run on a big data cluster. These clusters are often also used for Hadoop jobs, and Hadoop’s YARN resource manager will generally be used to manage that Hadoop cluster (including Spark). Spark can also run just as easily on clusters controlled by Apache Mesos. A series of scripts bundled with current releases of Spark simplifies the process of launching Spark on Amazon Web Services’ Elastic Compute Cloud (EC2).

Spark Architecture

The Spark architecture, or stack, currently comprises Spark Core and four libraries that are optimized to address the requirements of four different use cases. Individual applications will typically require Spark Core and at least one of these libraries.

What are Spark Components?

Spark Core: It is a general-purpose system providing basic functionality such as task scheduling, distribution, fault recovery, interaction with storage systems and monitoring of applications across a cluster. Spark Core is also home to the API that defines resilient distributed datasets (RDDs), which are Spark’s main programming abstraction.

Then you have the components on top of the core, which are designed to interoperate closely. The benefit of such a stack is that all the higher-layer components inherit the improvements made at the lower layers. Example: an optimization to Spark Core will speed up the SQL, streaming, machine learning and graph processing libraries as well.

  1. Spark Streaming: This module enables scalable and fault-tolerant processing of streaming data, and can integrate with established sources of data streams like Flume. Examples of data streams include log files generated by production web servers, or queues of messages containing status updates posted by users of a web service.
  2. Spark SQL: This module is for working with structured data. It allows querying data via SQL as well as the Apache Hive variant of SQL, called the Hive Query Language (HQL), and it supports many sources of data, including Hive tables, Parquet, and JSON. Spark SQL also supports JDBC and ODBC connections, enabling a degree of integration with existing databases, data warehouses and business intelligence tools.
  3. GraphX: It supports analysis of and computation over graphs of data (e.g., a social network’s friend graph) and performs graph-parallel computations. Like Spark Streaming and Spark SQL, it extends the Spark RDD API, allowing us to create a directed graph with arbitrary properties attached to each vertex and edge. It provides various operators for manipulating graphs (e.g., subgraph and mapVertices) and a library of common graph algorithms (e.g., PageRank and triangle counting).
  4. Spark MLlib: Spark comes with a library containing common machine learning (ML) functionality, called MLlib. It provides multiple types of machine learning algorithms, including classification, regression, clustering, and collaborative filtering, as well as supporting functionality such as model evaluation and data import.


What is Spark? Is it replacing Hadoop? (Day 1)

Apache Spark is an open-source cluster-computing framework built for fast computation. It is free software, and should not be confused with the similarly named Spark Java web framework, which is an alternative to Java web application frameworks such as JAX-RS and Spring MVC. Apache Spark was started in 2009 at UC Berkeley.


To define it, Spark is a cluster-computing framework, which means that it competes more with MapReduce than with the entire Hadoop ecosystem. It extends the MapReduce model to support more kinds of computation, such as interactive and iterative algorithms, queries, stream processing and graph processing. It is designed to be highly accessible, offering simple APIs in languages like Python, Java, Scala and SQL. One of the main features Spark offers for speed is the ability to run computations in memory, but the system is also more efficient than MapReduce for complex applications running on disk.

Is Spark a Hadoop module?

We see Spark listed as a module on Hadoop’s project page, but Spark also has its own page because, while it can run in Hadoop clusters through YARN, it also has a standalone mode. So Spark is independent. By default there is no storage mechanism in Spark, so to store data it needs a fast and scalable file system. Hence it uses S3, HDFS or any other file system; if you already use Hadoop, HDFS is a very low-cost choice.

Spark runs on Hadoop, Mesos, standalone, or in the cloud. It can access diverse data sources including HDFS, Cassandra, HBase, and S3. You can run Spark using its standalone cluster mode, on EC2, on Hadoop YARN, or on Apache Mesos. However, as time goes on, some big data scientists expect Spark to diverge from and perhaps replace Hadoop, especially in instances where faster access to processed data is critical.

(Source: Internet)

Hadoop vs Spark

A direct comparison of Hadoop and Spark is difficult because they do many of the same things, but are also non-overlapping in some areas. The most important thing to remember about Hadoop and Spark is that their use is not an either-or scenario, because they are not mutually exclusive. Nor is one necessarily a drop-in replacement for the other. The two are compatible with each other, and that makes their pairing an extremely powerful solution for a variety of big data applications. We can compare them on the points below:

  1. Data Processing Engine/Operators: Hadoop was originally designed to handle crawling and searching billions of web pages and collecting their information into a database. For this it uses MapReduce, which is a batch-processing engine. MapReduce operates in sequential steps by reading data from the cluster, performing its operation on the data, writing the results back to the cluster, reading updated data from the cluster, performing the next data operation, writing those results back to the cluster and so on. Spark is a cluster-computing framework that performs similar operations, but it does so in a single step and in memory. It reads data from the cluster, performs its operations (filter/map/join/groupBy) on the data, and then writes it back to the cluster.
  2. File System: Spark has no file management and therefore must rely on Hadoop’s Distributed File System (HDFS) or some other solution such as S3 or Tachyon.
  3. Speed/Performance: Spark’s in-memory processing makes it very fast (up to 100 times faster than Hadoop MapReduce). Spark can also perform batch processing; however, it really excels at streaming workloads, interactive queries, and machine learning. The reason Spark is so fast is that it processes data in memory, and it can also use disk for data that doesn’t all fit into memory, whereas MapReduce is strictly disk-based. Examples: Internet of Things sensors, log monitoring and security analytics all benefit from Spark’s faster computation.
  4. Storage: MapReduce uses persistent storage, and Spark uses Resilient Distributed Datasets (RDDs).
  5. Ease of Use: Spark is well known for its performance, but it’s also somewhat well known for its ease of use, in that it comes with user-friendly APIs for Scala (its native language), Java, Python, and Spark SQL. Spark has an interactive mode so that developers and users can run queries. MapReduce has no interactive mode, but add-ons such as Hive and Pig make working with MapReduce a little easier for developers.
  6. Costs: Both MapReduce and Spark are Apache projects, which means that they’re open source and free software products. While there’s no cost for the software, there are costs associated with running either platform, in personnel and in hardware. Both products are designed to run on commodity hardware, such as low-cost, so-called white box server systems. However, Spark systems cost more because of the large amounts of RAM required to run everything in memory. But what’s also true is that Spark’s technology reduces the number of required systems. So you have significantly fewer systems that each cost more. There’s probably a point at which Spark actually reduces costs per unit of computation even with the additional RAM requirement.
  7. APIs: Spark also includes its own graph computation library, GraphX. GraphX allows users to view the same data as graphs and as collections. Users can also transform and join graphs with Resilient Distributed Datasets (RDDs).
  8. Fault Tolerance: Hadoop uses replicated blocks of data to maintain this feature. TaskTrackers send heartbeats to the JobTracker, so if a heartbeat is missed, the JobTracker reschedules all pending and in-progress operations to another TaskTracker. This effectively provides fault tolerance. Spark uses Resilient Distributed Datasets (RDDs), which are fault-tolerant collections of elements that can be operated on in parallel. RDDs can reference a dataset in an external storage system, such as a shared filesystem, HDFS, HBase, or any data source offering a Hadoop InputFormat. Spark can create RDDs from any storage source supported by Hadoop, including local filesystems or one of those listed previously.
  9. Scalability: Both MapReduce and Spark scale out using HDFS.
  10. Security: Hadoop supports Kerberos authentication. HDFS supports access control lists (ACLs) and a traditional file permissions model. For user control in job submission, Hadoop provides Service Level Authorization, which ensures that clients have the right permissions. Spark’s security is a bit sparse, currently supporting only authentication via shared secret (password authentication). The security bonus that Spark can enjoy is that if you run Spark on HDFS, it can use HDFS ACLs and file-level permissions. Additionally, Spark can run on YARN, giving it the capability of using Kerberos authentication.


Mughal Dine-esty…the fine Dining Restaurant !!

Mughal Dine-esty is a fine dining restaurant on the ground floor of the Time Square Building in Sushant Lok Phase 1 in Gurgaon. It’s an add-on venture of Dine-esty, which has been a famous Chinese restaurant in Gurgaon. This restaurant serves authentic Mughlai food. The place has a common entrance and 2 different set-ups inside. The interiors are exquisite and really aesthetic. You can see a Chinese theme in the seating area.

The restaurant offers a good variety of food options in both Chinese and Mughlai cuisines. I visited for Sunday brunch with my husband and friends. We tried both vegetarian and non-vegetarian dishes.

We started off with the starters, along with some signature drinks like Italian Smooch, Silver Lining, Blue Sea and Food Punch (5/5). Both the cocktails and mocktails were full of flavour and tasted good. In starters, we had Methi corn ke seekh kebab (5/5) – full of flavour and yum; Kale channe ki Tikki (4.5/5) – nicely cooked, with a strong flavour of cardamom, and quite crispy; Daahi ke Sholley (5/5) – very crispy, full of hung curd and properly fried; and Achari Paneer Tikka (5/5), which was the one I liked best. The paneer was very soft and full of flavour.

After drinks and starters, it was time to order the main course. We ordered Daal Makhani, RawalPindi Cholle, Keere ka raita, rice and naans. The food was delicious (5/5) and the portion size was also good. For dessert we had gulab jamuns, which were quite soft and served hot.


The staff is pretty courteous and the service was prompt. Pricing-wise it is a little costly, but worth a visit.

Highly recommended, as the place is ideal for fine dining, especially with your family.


To know more about Mughal Dine-esty, click here:

Mughal Dine-Esty Menu, Reviews, Photos, Location and Info - Zomato

La Pino’z..What more can you ask from a pizza !!

La Pino’z Pizza is a delivery-only outlet serving Mexican, Italian and Chinese delicacies. It’s located at Golf Course Road, Gurgaon. Their priority is to serve high-quality, freshly prepared, hot meals delivered on time, every time, to all of their customers.

They serve authentic pizzas at a very reasonable price. The menu has a lot of options, including veg and non-veg. Our order was delivered on time and arrived hot. The packaging was proper: cardboard boxes with the required accompaniments. La Pino’z Pizza is one place where you can try the Giant Slice if you’re looking to try more than one pizza or do not have the appetite to eat a whole one. You can also ask them to put two flavours on the same pizza. They also offer a Monster pizza for large groups of people.

We ordered Cheese garlic bread (5/5), which was super yummy, full of cheese and adequately flavoured. The Corn Tacos (4.5/5) were excellent, full of cheese and corn, properly cooked and crispy. We also had 1 large Veg Tamer pizza and 1 paneer pizza (5/5). The pizzas were really good, with generous veggie toppings like onion, tomatoes, capsicum, paneer and mushrooms. It was super yum. The crust was not too thick and quite crispy, and it was served with delicious dips and sauces.


I shared my order with my parents. Generally, we don’t eat out or order ready meals, so I was skeptical about how good the food would taste. I feared that my parents would not like it, but I was soooo happy when my father said he just loved the pizza. Everything was delicious and lip-smacking!!

Loved the food and would highly recommend it!!

What the ratings stand for: 5 = Excellent, 4 = Very Good, 3 = Good, 2 = Fair, 1 = Disaster.

Find it on Zomato at :

La Pino'z Pizza Menu, Reviews, Photos, Location and Info - Zomato