Category Archives: analysis

Understanding GL Account types

October 31, 2017 creativeme1807

General Ledger Accounts

A general ledger (or accounting ledger) is a record that contains account summaries for the accounts used by a company. In other words, it details all business accounts and the account activity during a period: the accounts used to sort and store transactions are found in the company’s general ledger. Ledgers come in several types, such as accounting ledgers and subsidiary ledgers. Accounting ledgers are arranged according to the following seven classifications.

Assets (Cash, Accounts Receivable, Land, Equipment)
Liabilities (Loans Payable, Accounts Payable, Bonds Payable)
Stockholders’ equity (Common Stock, Retained Earnings)
Operating revenues (Sales, Service Fees)
Operating Expenses (Salaries Expense, Rent Expense, Depreciation Expense)
Non-operating revenues and gains (Investment Income, Gain on Disposal of Truck)
Non-operating expenses and losses (Interest Expense, Loss on Disposal of Equipment)

Balance Sheet Accounts

The first three classifications are referred to as balance sheet accounts since the balances in these accounts are reported on the financial statement known as the balance sheet.

Balance sheet accounts
- Assets
- Liabilities
- Stockholders’ (or Owner’s) equity

The balance sheet accounts are also known as permanent accounts (or real accounts) since the balances in these accounts will not be closed at the end of an accounting year. Instead, these account balances are carried forward to the next accounting year.

Income Statement Accounts

The four remaining classifications of accounts are referred to as income statement accounts since the amounts in these accounts will be reported on the financial statement known as the income statement.

Income statement accounts
- Operating revenues
- Operating expenses
- Non-operating revenues and gains
- Non-operating expenses and losses

The income statement accounts are also known as temporary accounts since the balances in these accounts will be closed at the end of the accounting year. Each income statement account is closed in order to begin the next accounting year with a zero balance.

The year-end balances from all of the income statement accounts will be combined and entered as a single net amount in Retained Earnings (a balance sheet account within stockholders’ equity) or in a proprietor’s capital account.
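The year-end closing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the account names follow the examples in this post, but the balances and the `close_year` helper are hypothetical and do not come from any accounting system.

```python
# Hypothetical year-end balances (illustrative numbers only).
balance_sheet = {"Cash": 50_000, "Loans Payable": 20_000, "Retained Earnings": 10_000}

# Temporary (income statement) accounts: name -> (kind, balance shown as a positive amount).
income_statement = {
    "Sales": ("revenue", 80_000),
    "Investment Income": ("revenue", 2_000),
    "Salaries Expense": ("expense", 45_000),
    "Interest Expense": ("expense", 1_500),
}


def close_year(balance_sheet, income_statement):
    """Combine temporary account balances into one net amount, add it to
    Retained Earnings, and reset every temporary account to zero."""
    net = 0
    for kind, amount in income_statement.values():
        net += amount if kind == "revenue" else -amount

    new_balance_sheet = dict(balance_sheet)          # permanent accounts carry forward
    new_balance_sheet["Retained Earnings"] += net    # single net amount

    new_income_statement = {name: (kind, 0) for name, (kind, _) in income_statement.items()}
    return new_balance_sheet, new_income_statement, net


new_bs, new_is, net_income = close_year(balance_sheet, income_statement)
print(net_income, new_bs["Retained Earnings"])
```

The permanent accounts other than Retained Earnings keep their balances, while every income statement account starts the next year at zero.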

Note: If an account has not had any activity in the current or recent periods, it is often omitted from the current general ledger.

Chart of Accounts

The chart of accounts is simply a list of all of the accounts that are available for recording transactions. This means that the number of accounts in the chart of accounts will usually be greater than the number of accounts in the general ledger. (The reason is that accounts with zero balances and no recent entries are often omitted from the general ledger until there is a transaction for the account.) The chart of accounts is organized similarly to the general ledger: balance sheet accounts followed by the income statement accounts. However, the chart of accounts does not contain any entries or account balances.

Put another way, general ledger accounts fall into five account types (the operating and non-operating classifications above collapse into Revenue and Expense), and these types form two groups:
Profit and loss accounts or income statement accounts:

Expense
Revenue

Balance sheet accounts:

Asset
Liability
Owner’s equity

The major difference between these two groups is what happens at year end. When you open the first period of a new year, the income statement account balances are rolled up into the retained earnings account, whereas the balance sheet account balances are rolled forward to the same code combination.

analysis, Analytics, OBIEE, OBIEE, Obiee 11.1.1.9.0, OBIEE 11g, Obiee 12c, OBIEE Reports, OBIEE11G

How to Google any column value from OBIEE

May 9, 2017 creativeme1807

Nowadays end users demand a lot of GUI interactivity in OBIEE reports, and they are happy when reports are full of helpful functionality. One such feature is searching Google for the value they click on, which was my business requirement a few days back. It is not hard to implement and requires no config or RPD changes.

Steps:

Create a new Action: go to New -> Action -> Navigate to a Web Page.
Type http://www.google.com/search?q=: in the URL section of the window.
Click Define Parameters on the right side of the window, then remove the “optional” mark for the parameter.
Save this action under a name you will remember.
Go to the Analysis where you want to call this action through a column interaction.
Under Column Properties -> Interaction -> Primary Interaction, select Action Links.
Click the + sign to add an Action Link.
In the new pop-up window, click Select existing action.
Find your saved action and click OK.
Edit this Action Link and select “column value” as the parameter value.
Select the relevant column, tick the boxes for the “hidden” (and “fixed”) options, and click OK to save.
Click OK and check your results. When you click on any mapped column value, a Google window opens and searches for that value.
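What the action link does at click time can be sketched outside OBIEE: the clicked column value is appended to the Google search URL as the `q` parameter. The snippet below is a minimal Python illustration, not OBIEE code; the `google_url` helper and the sample value are hypothetical.

```python
from urllib.parse import urlencode

# Base search URL used by the action in the steps above.
GOOGLE_SEARCH = "http://www.google.com/search"


def google_url(column_value):
    """Build the search URL for a clicked column value, URL-encoding
    spaces and special characters so the query survives intact."""
    return f"{GOOGLE_SEARCH}?{urlencode({'q': column_value})}"


url = google_url("Accounts Payable & Co")
print(url)
```

Encoding matters for values that contain spaces or characters such as `&`, which would otherwise be read as the start of a new URL parameter.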

Follow For more !!

analysis, Big Data, Data Analyst, Hadoop, Hadoop architecture, Hadoop Clusters, Hadoop common, Spark

What is Spark..is it replacing Hadoop ? (Day1)

April 19, 2017 creativeme1807

Apache Spark is a free and open-source cluster-computing framework built for fast computation. It should not be confused with the similarly named Spark Framework, a simple Java web framework that is an alternative to other Java web application frameworks such as JAX-RS and Spring MVC. Apache Spark was started in 2009 at UC Berkeley.

Overview:

Spark is a cluster-computing framework, which means that it competes more with MapReduce than with the entire Hadoop ecosystem. It extends the MapReduce model to support more kinds of computation, such as interactive and iterative algorithms, queries, stream processing, and graph processing. It is designed to be highly accessible, offering simple APIs in Python, Java, Scala, and SQL. One of the main features Spark offers for speed is the ability to run computations in memory, but the system is also more efficient than MapReduce for complex applications running on disk.

Is Spark a Hadoop module?

Spark is listed on Hadoop’s project page, but it also has its own page because, while it can run in Hadoop clusters through YARN, it also has a standalone mode. So Spark is independent. Spark has no storage mechanism of its own, so to store data it needs a fast and scalable file system. It therefore uses S3, HDFS, or another file system; with Hadoop, storage is very low cost.

Spark runs in its standalone cluster mode, on Hadoop YARN, on Apache Mesos, or in the cloud (for example on EC2). It can access diverse data sources including HDFS, Cassandra, HBase, and S3. However, as time goes on, some big data scientists expect Spark to diverge from and perhaps replace Hadoop, especially where faster access to processed data is critical.

(Source: Internet)

Hadoop vs Spark

A direct comparison of Hadoop and Spark is difficult because they do many of the same things but do not overlap in some areas. The most important thing to remember is that their use is not an either-or scenario, because they are not mutually exclusive. Nor is one necessarily a drop-in replacement for the other. The two are compatible with each other, and that makes their pairing an extremely powerful solution for a variety of big data applications. We can compare them on the points below:

Data Processing Engine/Operators: Hadoop was originally designed to handle crawling and searching billions of web pages and collecting their information into a database. For this it uses MapReduce, which is a batch-processing engine. MapReduce operates in sequential steps: it reads data from the cluster, performs its operation on the data, writes the results back to the cluster, reads the updated data from the cluster, performs the next operation, writes those results back to the cluster, and so on. Spark is a cluster-computing framework that performs similar operations, but it does so in a single step and in memory. It reads data from the cluster, performs its operations (filter/map/join/groupBy) on the data, and then writes the result back to the cluster.
File System: Spark has no file management of its own and therefore must rely on the Hadoop Distributed File System (HDFS) or some other solution such as S3 or Tachyon.
Speed/Performance: Spark’s in-memory processing makes it very fast (reportedly up to 100 times faster than Hadoop MapReduce for some workloads). Spark can also perform batch processing, but it really excels at streaming workloads, interactive queries, and machine learning. The reason Spark is so fast is that it processes data in memory and uses disk only for data that does not fit into memory, whereas MapReduce is strictly disk-based. Example: Internet of Things sensors, log monitoring, and security analytics all benefit from Spark’s faster computation.
Storage: MapReduce uses persistent storage, whereas Spark uses Resilient Distributed Datasets (RDDs).
Ease of Use: Spark is well known for its performance, but it is also known for its ease of use: it comes with user-friendly APIs for Scala (its native language), Java, Python, and Spark SQL. Spark has an interactive mode so that developers and users can run queries. MapReduce has no interactive mode, but add-ons such as Hive and Pig make working with MapReduce a little easier for developers.
Costs: Both MapReduce and Spark are Apache projects, which means that they are open source and free software products. While there is no cost for the software, there are costs associated with running either platform, in personnel and in hardware. Both products are designed to run on commodity hardware, such as low-cost, so-called white box server systems. However, Spark systems cost more because of the large amounts of RAM required to run everything in memory. It is also true that Spark’s technology reduces the number of required systems, so you have significantly fewer systems that each cost more. There is probably a point at which Spark actually reduces the cost per unit of computation even with the additional RAM requirement.
APIs: Spark also includes its own graph computation library, GraphX. GraphX allows users to view the same data as graphs and as collections. Users can also transform and join graphs with Resilient Distributed Datasets (RDDs).
Fault Tolerance: Hadoop uses replicated blocks of data to provide fault tolerance. TaskTrackers send heartbeats to the JobTracker; if a heartbeat is missed, the JobTracker reschedules all pending and in-progress operations to another TaskTracker. Spark uses Resilient Distributed Datasets (RDDs), which are fault-tolerant collections of elements that can be operated on in parallel. RDDs can reference a dataset in an external storage system, such as a shared filesystem, HDFS, HBase, or any data source offering a Hadoop InputFormat. Spark can create RDDs from any storage source supported by Hadoop, including local filesystems or one of those listed previously.
Scalability: Both MapReduce and Spark scale out using HDFS.
Security: Hadoop supports Kerberos authentication. HDFS supports access control lists (ACLs) and a traditional file permissions model. For user control in job submission, Hadoop provides Service Level Authorization, which ensures that clients have the right permissions. Spark’s security is a bit sparse: at the time of writing it supports only authentication via a shared secret (password authentication). The security bonus that Spark can enjoy is that if you run Spark on HDFS, it can use HDFS ACLs and file-level permissions. Additionally, Spark can run on YARN, which gives it the capability of using Kerberos authentication.
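The difference described under the Data Processing Engine point can be illustrated with a toy example in plain Python. This is a sketch of the idea only, not Hadoop or Spark code: `staged_on_disk` imitates a pipeline that writes every intermediate result to storage, `chained_in_memory` imitates chaining the same filter and map in memory, and both function names and the sample data are made up.

```python
import json
import os
import tempfile

records = [3, 8, 1, 12, 7, 10]


def staged_on_disk(data, workdir):
    """MapReduce-style: each step reads its input from storage and
    writes its output back before the next step starts."""
    path = os.path.join(workdir, "step0.json")
    with open(path, "w") as f:
        json.dump(data, f)

    steps = [
        lambda xs: [x for x in xs if x > 5],  # filter
        lambda xs: [x * 2 for x in xs],       # map
    ]
    for i, step in enumerate(steps, start=1):
        with open(path) as f:                 # read the previous result
            current = json.load(f)
        path = os.path.join(workdir, f"step{i}.json")
        with open(path, "w") as f:            # write this step's result
            json.dump(step(current), f)

    with open(path) as f:
        return json.load(f)


def chained_in_memory(data):
    """Spark-style: the same filter and map are chained and evaluated
    in one pass without touching storage in between."""
    return [x * 2 for x in data if x > 5]


with tempfile.TemporaryDirectory() as workdir:
    disk_result = staged_on_disk(records, workdir)
memory_result = chained_in_memory(records)
print(disk_result, memory_result)
```

Both versions produce the same answer; the staged version simply pays for a write and a read between every pair of steps, which is the overhead the comparison above refers to.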

Learn Spark Architecture by clicking here.

algorithm, analysis, Big Data, bottlenecks, distributed system, Hadoop, issues, Map, parallel, Processing, Reduce

Big Data Architecture & its Challenges (Tutorial Day2)

January 25, 2017 creativeme1807

Big Data, as the name suggests, is data that is too large to process using traditional methods. It originated with companies that had the problem of querying very large, distributed, structured or semi-structured data. Google developed MapReduce to support distributed computing on large data sets on computer clusters. As discussed in an earlier post, a few examples of Big Data are:

Petabytes of data
Billions of records
Distributed data
Flat files (which cannot be seen in a relational DB)
Semi-structured data like log files
Video messages

Applications that generate Big Data can be:

Transactional/operational (CRM, ERP, Sales, HR)
Analytics (IT logs, Call Centre)

Big Data architecture is a set of components joined to each other, as shown in the image below. Hadoop sits in the middle tier of this structure, but it is not a mandatory requirement.

[Image: big data components, operational data graph]

I will discuss the components further in the next tutorial posts.

Bottlenecks with Big Data are :

Storage
Transfer
Sharing
Analysis
Processing
Visualization
Security

Big data is not just about size:
- It finds insights from complex, noisy, heterogeneous, longitudinal, and voluminous data
- It aims to answer questions that were previously unanswered

In the existing traditional approach, we use a data warehouse to store data (OLTP to OLAP) in a structured format, process it, do data mining, and build reports for further high-level analysis. This approach works fine for applications that process a volume of data small enough to be accommodated by standard database servers, or up to the limit of the processor that is processing the data.

But when it comes to dealing with huge and growing amounts of data, processing it with this traditional approach becomes a problem. Transactional Big Data projects are also a poor fit for Hadoop, as it is batch-oriented rather than real-time.

For transactional systems that do not need database transactions with ACID properties (Atomicity, Consistency, Isolation, Durability), NoSQL databases can be used, though there are constraints such as restricting transactions to a single data item.

Big Data transactional systems that need SQL databases with ACID properties have fewer options.

This is where distributed systems came into the picture for Big Data, most notably through MapReduce technology. For example, one machine with 4 I/O channels can process 1 terabyte of data in approximately 42 minutes if each channel runs at 100 MB/s (1 TB / 400 MB/s = 2,500 s).
But a distributed system of 100 machines, each with 4 I/O channels at 100 MB/s, would take only about 25 seconds to process the same data.
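The arithmetic behind these figures can be checked with a short calculation. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), perfectly parallel I/O, and no coordination overhead; the `processing_seconds` helper is hypothetical.

```python
TB = 10**12  # bytes, decimal units assumed
MB = 10**6   # bytes


def processing_seconds(data_bytes, machines, channels_per_machine, channel_bytes_per_sec):
    """Time to stream the data once, assuming all channels work in parallel."""
    throughput = machines * channels_per_machine * channel_bytes_per_sec
    return data_bytes / throughput


single = processing_seconds(TB, machines=1, channels_per_machine=4, channel_bytes_per_sec=100 * MB)
cluster = processing_seconds(TB, machines=100, channels_per_machine=4, channel_bytes_per_sec=100 * MB)

print(f"1 machine:    {single:.0f} s (about {single / 60:.1f} min)")
print(f"100 machines: {cluster:.0f} s")
```

Real clusters will not reach this ideal because of scheduling, network transfer, and uneven data distribution, but the order of magnitude explains why distributing the work is attractive.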

To make use of a distributed system, the MapReduce (MR) algorithm was used. This algorithm divides the task into small parts, assigns them to many computers (a cluster), and collects their results, which, when integrated, form the output data set.
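The divide, assign, and collect pattern can be sketched with the classic word-count example. The code below runs on a single machine in plain Python and only imitates the phases; in a real cluster each chunk would be handled by a different node. The function names and input text are illustrative assumptions.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: turn one chunk of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Shuffle: group all values emitted for the same key together."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values of each key into one result."""
    return {key: sum(values) for key, values in groups.items()}


# The input is split into small parts; each part could go to a different machine.
chunks = ["big data is big", "data is distributed"]
mapped = [map_phase(chunk) for chunk in chunks]
counts = reduce_phase(shuffle(mapped))
print(counts)
```

Because each call to `map_phase` depends only on its own chunk, the map work can be spread across as many machines as there are chunks, and only the grouped results need to be brought together.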

Image Courtesy: google

Using the above solution, Doug Cutting and his team developed an open-source project called Hadoop.

To proceed further and understand Hadoop, its components, and its architecture, please read my next blog post.
