
It’s an exciting time for business. Companies have either amassed a tremendous amount of data or are ramping up their efforts to ingest the data available from the digital ecosystem we are all part of, hoping to find value and new profit streams. However we look at it, there is a need to make sense of all this data.

It is time to stop spending precious time defining Big Data and start actually doing something with the data available. Before we do, though, we need vision. In an earlier post I added a 5th V for Big Data: Vision. It is an integral component of staying on the path to profitability. Enter the five ways to get to insights fast.

  1. Modernize data access

As we engage in this strategy, the need for data engineering keeps growing; it is perhaps one of the most important layers. Unfortunately, Big Data has many moving parts, and because the ecosystem is built on open source frameworks, the learning curve is steep. A good practice is to start with an assessment so that you have the roadmap blueprinted.

  2. Data Management & data processing

This step depends on factors such as organizational structure and the capability to ingest and process a variety of data at velocity. Chances are that, in order to be efficient and to accommodate business demands, you will have to move swiftly, and for that you will need solutions. A good model is to look for managed services. It is important to take as much of the burden off stakeholders as possible.

  3. Predictive insights

A managed services model, as described above, lets you focus on how to improve the business. Many companies today focus on reporting. That is all well and good, but elaborate spreadsheets leave you no time to explore where the hidden revenue streams are. Time to put the thinking cap on: the constraint here is human capability, so predictive modeling is where the effort should be focused. A good rule is that this is a continuous process, so don't treat this step as one and done.
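To make the contrast with reporting concrete, here is a minimal, hypothetical sketch of a predictive model in Python with pandas and scikit-learn; the file name, feature columns, and revenue target are illustrative assumptions, not a prescribed stack.

```python
# Illustrative sketch only: estimating next period's revenue instead of reporting last period's.
# "orders.csv" and its columns are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

orders = pd.read_csv("orders.csv")
features = orders[["ad_spend", "discount_pct", "site_visits"]]   # assumed drivers
target = orders["monthly_revenue"]                               # assumed outcome

X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.2)

model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# A report tells you what revenue was; the model estimates what it is likely to be next.
print("holdout R^2:", model.score(X_test, y_test))
```

The point is not the particular algorithm but the loop: retrain and re-evaluate as new data arrives, which is exactly why this step is never one and done.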

  4. Problem solving selection

Big Data methodology can help you solve many problems: improving product recommendations to clients, adjusting content creation based on what you ingest from social feeds, preventive maintenance, fighting diseases, customizing the customer experience, and so on. If you are just starting, you need to test the waters. By this point you have heard the terminology: POC (Proof of Concept) and use case (the larger version of the problem you have to solve). A good rule is to start with a modest problem that can lead to the full use case solution.

  5. Anticipate Scalability

Go back to the vision. You will grow, and not all companies have the same capabilities. Your historical data will grow, the demands from the business units will grow, and your team will grow. At this stage, keep in mind that Big Data should be tailored to the exact needs of your organization. Stay away from one-size-fits-all models. Choose nimble technology partners, because there is a learning curve and, more importantly, a high degree of customization.

In conclusion, there is a dependency on external partners and a need for a high degree of teamwork. You have to keep asking whether, at the end of the day, your problems are really Big Data problems; don't do Big Data because it's fashionable, but because you need to extract value for your business. In my next post I will dig deeper into the 5th V of Big Data, so stay tuned.

I recently read yet another article about how the amount of data is expected to grow to 44 zettabytes (what exactly is that? roughly 44 trillion gigabytes) by 2020, which represents ten times the volume from 2013.

I get it: these are large numbers, even though most of us have only a vague notion of exactly how large. With this kind of growth, one can't help wondering whether the digital world is going to explode. Surely we can't sustain this kind of growth into the next decade, let alone the long term?

Marketers, journalists, and everyone trying to run a business on the fear such numbers inspire keep talking about this, showing beautiful graphs and reports on why Big Data is important.

Is it Truly Big Data?

If we pause and think about it, we quickly realize that not all data is created equal. Consider this example.

Advanced kitchen appliances may help me track my energy use every second of every day. If I drill down into these beautiful charts and see a short spike at, say, 1 PM yesterday, what does that tell me? What can I do about it? Do I really need this level of detail?

If instead I am shown average consumption per day, over a month, that will tell me what the trend is.

In this example, then, having all of the per-second data is useless; it is a grain of sand rather than a pearl of wisdom. We can reduce both the amount of data collected (by sampling much less frequently) and the amount of data stored by focusing on the business use case for the data.
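As a rough illustration of sampling less and aggregating more, here is a small pandas sketch that collapses per-second meter readings into a daily average; the file and column names are assumptions for the example.

```python
# Sketch: turn per-second smart-meter readings into a daily trend (file/column names assumed).
import pandas as pd

readings = pd.read_csv(
    "energy_readings.csv",            # hypothetical per-second export from the appliance
    parse_dates=["timestamp"],
    index_col="timestamp",
)

# ~86,400 readings per day collapse into one number; the trend survives, the noise does not.
daily_avg = readings["watts"].resample("D").mean()

# Store a month of daily figures instead of millions of per-second points.
daily_avg.tail(30).to_csv("daily_energy_trend.csv")
```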

If the data is not useful, it is not Big Data. Big Data requires that we consider what insights we need to run our businesses (or the world) and then build the intelligence to gather, store and analyze the right data that can be put to use. These are the pearls. Discard the rest.


As organizations look to become more data driven, many of the stakeholders embarking on this journey are assessing the Big Data domain. Terminology and expertise can vary, as this field has a lot of moving parts. It is almost as much fun as researching and learning nautical terminology.

So how do you get to be a Big Data admiral? What does one have to do to successfully navigate data oceans or data lakes, depending on the size of your company? Views vary on the expertise required and on what one should look for when creating such a program.

A significant number of organizations are still daunted by undertaking such a task. Enter the three pillars of your Big Data foundation:


  • Modernize Data Warehouse

Traditional data warehouses are not a good fit for Big Data: the domain's unstructured data needs alone make storage very expensive to continue to support. Other areas to evaluate are ETL functionality and disaster recovery.

  • Data Plenum and Data Juncture

In physics, a plenum is a space filled with matter (in our case, data). Here the essential planning is around a platform that can ingest data from across the company effectively, in order to reach the juncture stage. At the data juncture stage, you should have all of the analytical functions finalized so you can start formulating decisions.

  • Insights

At the end of the day, the purpose of Big Data is to help companies uncover hidden KPIs and insights that lead to gains and profits for the organization. The main areas to consider are the development of dashboards and applications for the executives in your organization, and reports driven by predictive analytics to help them get to the promised land.

In conclusion, early success on this type of journey is very important. The key is to be open to collaborative ways of working and to new ideas in order to achieve your company's goals. Address the use cases that matter to your company, assess whether they are truly Big Data problems first, and then act. Most important, be proactive: don't wait for the business to be in a bad state before beginning this journey.

NOTE: This is a guest post by Jenny Richards. Contact her through her website.

Any discussion of "Big Data" or "Hadoop functionality" today should be led by CIOs or IT heads rather than CEOs and CFOs. From selecting the right Hadoop service provider to implementation and ongoing maintenance, the technical input of CIOs can go a long way toward managing the expectations of the latter, who usually understand just one language: the bottom line.


However, given that the main goal of implementing Hadoop for big data analytics is to gain actionable insights that give enterprises an edge in an increasingly competitive marketplace, the decision to implement Hadoop cannot be made in isolation from the enterprise's business minds.

It is no secret that Hadoop and cloud computing are two of the biggest trends to dominate the world of technology over the last couple of years, with no signs of slowing down in the years to come. There has been much debate about the coexistence of the two, with many arguing that big data analytics is out of place in the cloud. Is there any truth to that? What's the real deal?

Hadoop vs. the Cloud – Overview

Many analysts have expressed skepticism regarding the usability of Hadoop in a cloud-based environment, which would have to be the case for organizations looking to implement cost-effective remote DBA support services. The major premise of their argument is that cloud computing, by its very nature, does not lend itself to adding Hadoop clusters.

According to Richard Fichera from Forrester, here are three reasons Hadoop cannot be used with a cloud computing environment.

  1. Heavy and Increasing Workloads

Given how many enterprises operate, Hadoop clusters are usually run at high utilization, necessitating capacity increases to match an ever-growing need for storage resources. This means Hadoop clusters fill with data predictably and at a steady pace, whether slow or fast depending on the organization. That predictability essentially negates the need for the elasticity usually sought through cloud deployments.

  2. Growing Data Sets

Hadoop clusters amass up to 10 times the amount of data collected by legacy transactional environments. By nature, this is data that customer-centric stakeholders and data scientists will be hard pressed to get rid of, which makes storage and access inherently more expensive with cloud-based solutions. Also, given the unpredictability of access requirements, on-premise storage may be more favorable where cloud access times are sub-optimal, which happens more often than is thought.

  3. Performance vs. Locality

Deploying Hadoop clusters in the cloud makes sense for data that already exists in the cloud, such as social media analytical data. However, for real-time data originating from business-customer interactions in multiple locations, the better option is to deploy Hadoop on-premise with deterministic latency and bandwidth to reduce end-to-end application latency.

Deploying Hadoop in the Cloud for Cloud-based Data

Well, whether or not the above points hold water is a different matter for each enterprise, because there is more than one side to the story. What cloud-based Hadoop cynics seem to be doing is downplaying the 'data gravity' quagmire, which is the main reason enterprises opt to deploy Hadoop in the cloud.

Presently, it is too soon to tell where data, and hence its analytical applications, will sit over the long term. The guiding idea, however, is that Hadoop needs to be deployed where the data lives. If your data is stored in S3 buckets, for instance, it makes sense to deploy Hadoop in the cloud.
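As a hypothetical sketch of deploying the compute where the data already lives, the PySpark snippet below reads directly from an S3 bucket through the s3a connector; the bucket name and schema are placeholders, and the hadoop-aws libraries are assumed to be available on the cluster.

```python
# Sketch: analyze data that already lives in S3 without pulling it on-premise.
# Bucket, paths, and column names are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("s3-resident-analytics")
    .getOrCreate()            # assumes s3a support is already configured on the cluster
)

events = spark.read.json("s3a://example-clickstream-bucket/2015/09/*.json")

# Aggregate next to the data; only the small summary leaves the cloud.
daily_counts = events.groupBy("event_date", "event_type").count()
daily_counts.write.mode("overwrite").parquet(
    "s3a://example-clickstream-bucket/summaries/daily_counts"
)
```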

For now, there isn't enough evidence to make a long-term trend analysis and finally put the matter to bed. But it would be short-sighted to imagine that Hadoop will stay tied to where data sits today. A more likely prediction is that Hadoop will pervade everywhere, a sensible premise given that data is also pervading everywhere.

As data moves everywhere, its gravity will be exerted, bringing Hadoop clusters with it. It is not wild to imagine that Hadoop workloads will soon be present in vehicles, on wireless base stations, and on other emerging edges of IT infrastructure. With data coming from and going everywhere, Hadoop cannot, and will not, remain cloistered in data centers.

Hadoop in the cloud actually makes sense

More likely, deployments will span Hadoop in the cloud and in on-premise data centers, following the multiple sites of data gravity. As Hadoop architectures and markets continue to mature, cloud-based Hadoop deployment will become more significant.

In fact, cloud deployment for historical data will become even more compelling as a more economical storage location for enterprises that still wish to maintain historical data for reporting purposes: something like the tapes of yesteryear, but with higher availability and easier access.

Hadoop is a natural fit for cloud-based environments. The reasons for it outnumber the reasons against it, and include:

  • Faster procurement and deployment of large scale resources
  • Lower costs of handling innovations
  • More efficient schemes to handle batch workloads
  • Ability to handle variable resource requirements seamlessly
  • Simplifying Hadoop operations

Think about it: it is far easier to say "We need 20 more servers" than to actually buy them, make space for them, keep them cool, and deploy security to keep them safe. You can list all the reasons why Hadoop is better on-premise, but in the current big data environment, cloud deployment is not something enterprises can sidestep for long. The economics are simply too compelling.

The truth is that the theoretical case for Hadoop in physical data centers is far removed from the practicality of it. While Hadoop presently cannot be considered a perfect match for cloud infrastructure, practicality strongly favors adopting Hadoop clusters in the cloud.

Conclusion

What Forrester seems to have overlooked in its argument for on-premise Hadoop implementation is that many enterprises' data resources are increasingly live, as is the hardship of meeting expanding data requirements in an on-premise environment.

Another major reason for Hadoop cloud deployment is the need for skilled, experienced people to handle this sophisticated technology through deployment, configuration, scaling, and ongoing management. Unless you have the resources for a full in-house team, cloud deployment, which facilitates remote administration, simply makes better sense.

Orzota is excited to announce our first webinar to be conducted jointly with our partner Cignex Datamatics on October 1, 2015 at 11:00 AM PST. Augment your Data Warehouse with Hadoop to create a modern data architecture with many advantages. Although there is some information available on the benefits of Data Warehouse augmentation with Hadoop, every enterprise is different and needs an architecture and plan that is tailored to its use cases.

This webinar will walk through a real use case of a Fortune 100 company that led a pioneering effort to augment its Teradata-based Enterprise Data Warehouse (EDW), and highlight some of the unique challenges we faced during the project. We will also cover other, less common reasons why enterprises are augmenting existing Data Warehouses with Hadoop, such as to:

  • dramatically improve efficiency of the Enterprise Data Warehouse
  • truly integrate unstructured and semi-structured data for deeper business insights
  • lower cost and improve TCO

Please register for the webinar.

Hadoop, as we know it, has the ability to spread data and processing across many nodes. It can process very large amounts of data using collections of commodity hardware, and the same can be accomplished with a collection of remote virtual servers (leveraging cloud services like Amazon AWS).

The two key components of Hadoop are:

  1. Hadoop Distributed File System (HDFS), and
  2. MapReduce, a framework to split up a computing job across multiple processors.

However, for iterative processing the Hadoop MapReduce framework is time-consuming, which makes Hadoop jobs inherently batch-oriented.
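To see why the model feels batch-oriented, here is a hypothetical Hadoop Streaming word count in Python: the logic must be split into a separate map step and reduce step, and each full pass over the data goes back through disk. Both scripts are shown in one file for brevity.

```python
# Sketch: word count split across separate map and reduce phases (Hadoop Streaming style).
# Run as: hadoop jar hadoop-streaming.jar -mapper "wc.py map" -reducer "wc.py reduce" ...
import sys
from itertools import groupby

def mapper():
    # Emit (word, 1) pairs, one per line, for the shuffle phase to sort.
    for line in sys.stdin:
        for word in line.split():
            print("%s\t1" % word)

def reducer():
    # Input arrives sorted by word, so equal keys are adjacent.
    pairs = (line.rstrip("\n").split("\t") for line in sys.stdin)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        print("%s\t%d" % (word, sum(int(count) for _, count in group)))

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```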

MapReduce vs Spark

The researchers at UC Berkeley's AMPLab realized this and developed Spark as an alternative to MapReduce. Spark takes better advantage of memory across the distributed set of machines and greatly reduces the need for disk I/O. Its in-memory processing can improve data-processing times by 10-50x or more compared to MapReduce. It also offers the added incentive of being much easier to program: with Spark, developers do not need to split up and coordinate their logic across separate Map and Reduce routines, and can seamlessly combine them into complex workflows.
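For contrast, here is the equivalent word count as a hypothetical PySpark sketch: the map and reduce logic chain together in one expression, and cache() keeps the input in memory so that a second, iterative pass avoids re-reading from disk. Paths are placeholders.

```python
# Sketch: the same word count expressed as one chained Spark workflow (paths are placeholders).
from pyspark import SparkContext

sc = SparkContext(appName="wordcount-sketch")

lines = sc.textFile("hdfs:///data/books/*.txt").cache()   # keep in memory for reuse

counts = (
    lines.flatMap(lambda line: line.split())
         .map(lambda word: (word, 1))
         .reduceByKey(lambda a, b: a + b)
)
counts.saveAsTextFile("hdfs:///data/books/word_counts")

# Because `lines` is cached, an additional pass does not go back to disk.
print("total lines:", lines.count())
```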

Spark extends the value of Hadoop

[Figure: Spark Components]

The key platform components of Spark include:

  • Spark SQL (interactive real-time query tool)
  • Spark Streaming (streaming analytics engine)
  • MLlib (machine learning library)
  • GraphX (graph analysis engine)

Spark does not include its own file system for organizing files, which is why many organizations install it on top of Hadoop. Spark's advanced analytics applications can make use of data stored within the Hadoop Distributed File System (HDFS), enabling organizations to do deeper analytics with less coding and faster response times than typical MapReduce applications. Spark thus plays a very important part in extending the value of Hadoop.

Spark for Data Scientists

Spark is becoming a key data science tool for many iterative modeling challenges. It enhances data scientists’ productivity by enabling them to leverage existing HDFS data. In addition, it can access and process data stored in HBase, Cassandra, and any other Hadoop-supported storage system. Spark can combine SQL, streaming, and graph analytics within cloud analytics applications.
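As a hypothetical illustration of mixing SQL with existing HDFS data, the sketch below registers a Parquet dataset as a temporary view and queries it with Spark SQL; the paths, table, and columns are assumptions for the example.

```python
# Sketch: Spark SQL over data already sitting in HDFS (schema and paths assumed).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-sql-sketch").getOrCreate()

orders = spark.read.parquet("hdfs:///warehouse/orders")   # existing HDFS data
orders.createOrReplaceTempView("orders")

# Familiar SQL and the programmatic DataFrame API can be mixed in a single job.
top_customers = spark.sql("""
    SELECT customer_id, SUM(amount) AS total_spend
    FROM orders
    GROUP BY customer_id
    ORDER BY total_spend DESC
    LIMIT 10
""")
top_customers.show()
```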

Spark’s three-fold value:

  1. Runtime processing environment,
  2. Development framework for in-memory advanced analytics, and
  3. Next-generation, cluster-computing solution.

Explore Spark's capabilities today for use cases within your business. Take them beyond experimentation and apply them to the business problems and opportunities you encounter daily. Spark the fire within your organization!

We can help you realize quantum value from Hadoop and Spark for your business; please contact us.

The valley gets very excited about new technologies and applications. Adoption of technologies tends to be rapid in Silicon Valley-based companies, especially startups. This is understandable considering that they can iterate faster and generally have the technical skills to emulate the trend-setters. Of late, the pace of innovation, especially in the Big Data space, has been so rapid that sometimes it is necessary to hit the pause button. This is especially true for larger enterprises, both in Silicon Valley and elsewhere.

If you are just getting started on Big Data or Hadoop, you have plenty of company. Read my interview with Syncsort guest blogger Mark Underwood, which will hopefully put things in perspective. What are the trends, where should you begin, and how can you accelerate your adoption of Big Data?

Find out.

And if you need help, we at Orzota most likely can provide you the expertise and support. So contact us.

Over the last few months, there seems to be some rethinking, with emphasis shifting away from the hype and toward the value of Big Data. The cycle started with large Silicon Valley tech companies such as Yahoo!, Google, and Facebook creating and deploying Big Data applications to great success. However, as large enterprises started down the path, it became clear that the technologies were not yet mature, especially in the realm of security. The early enterprise adopters struggled, taking well over a year to complete their first projects.

Things have improved in the last couple of years, with robust integrations across the stack and bundled distributions. The next wave of enterprise IT adopters found it easier to build their big data warehouses, yet many of those efforts remain development projects. The most successful projects have been those where Hadoop is used as a data repository and/or for backup purposes. Many early adopters did this just because it was the "next cool thing" they could impress their bosses with, or worse, to pad their resumes! But where were the promised "new" insights? What is the true value of Big Data?

It is no wonder, then, that many analysts and other industry veterans have started beating the drums, claiming the end of the Big Data hype.

What is Big Data anyway?

Consider this: most enterprises have barely scratched the surface of this technology revolution. Some think these technologies are irrelevant; others worry that if they don't jump on the bandwagon, they will be left behind. But in many cases, the basic question that customers continue to ask is this: What can I do with this technology? What is the value of Big Data to my organization?

This is where we in the Big Data technology community must step up our efforts. We need to stop talking about the Yahoos and the Googles and instead realize that most businesses are nothing like tech companies: they don't need 1000-node clusters and don't have petabytes of data. How can they still benefit from this revolution that lets them integrate multiple structured and unstructured data sources? What is the easiest way for them to genuinely gain insights from their data?

The industry should be focusing on answering these questions, rather than trying to invent yet another platform or API to shave off a few milliseconds on a query. Do we really need yet another NoSQL database?

Off my soapbox. Thanks for reading.

Orzota, Inc., a Big Data solutions company, today announces the availability of the Orzota Big Data Management Platform for beta users.

The platform offers Big Data as a Service on AWS and is targeted at companies that are getting started on Hadoop. By automating many of the tasks necessary for a Hadoop project from data ingestion to creating workflows and reports, the Orzota platform can significantly reduce the time to develop and launch Hadoop applications.

“In my experience, it can take at least 4-6 months to get started on Hadoop. I expect the Orzota Big Data Management Platform to save at least four months of that time” said Bharath Mundlapudi, President and CTO of Orzota.

“Eat your own dog food” is a standard mantra for technology companies. Orzota uses its platform to deliver Big Data solutions for its customers. When enterprises start on a new project, the first ask is always for a Proof-of-Concept (PoC). The enterprise wants to see how it can gain better insights or better integrate data that was previously not being used. A “Big Data as a Service” offering such as the Orzota platform can make quick work of a PoC, allowing the developer to focus on solving the business problem, rather than Hadoop plumbing.

Please contact us to schedule a demo.