Implementing Big Data

The conversation has shifted from trying to define what Big Data is, but we’re still stuck in the layers. Is Big Data fun, like eating an orange full of vitamin C? Or is it more like an onion, which makes you cry when you try to peel it but turns sweet when you cook it properly? I lean towards the orange, and I will tell you why.

One important area is the learning curve. We are seeing efforts take six months or more without even addressing any of the use cases, let alone figuring them out. There is a lot of innovation at the moment, but with it comes a lot of new terminology that also takes some effort to keep up with. More important is the use case. How are we going to evaluate what we need without knowing what the use cases are? A real-time data streaming scenario is very different from one where speed is not important because we can afford to see the results the next day.

Three Steps to get started with Big Data

  1. Start by identifying your vertical, finding out what the demands are, and knowing who you serve as your customer. This will guide you in deciding on the Proof-of-Concept PoC(s) and therefore the use case(s). As a follow-up step, we then look at the technological assets that we will need. From there, the next step is how to launch the PoC quickly. For companies that consider their data sensitive, follow this rule: decide what data you can live with having in the cloud, as you’re only testing the hypothesis at this stage and the cloud is the fastest route.
  2. Seamless integration. At this stage, you need a solution that is responsive. Why? Because multiple business units will come with requirements, and you need to be able to accommodate such demands. I can’t stress enough the value of a managed services approach until you can support the platform internally: the learning curve is huge and the return on investment (ROI) is far better. Many fall into the trap of thinking that because a super enterprise takes a certain approach, the same approach is feasible for them. It will only yield frustration, as it will take longer. At Orzota BigForce we’re working hard to accelerate this stage.
  3. Insights. At the end of the day, you are doing all this for the insights, and more specifically for the predictive and hidden potential. Most likely you have a ton of reporting going on. Reporting is not the issue here; if you cannot get proper reports right now, then you need some serious help. Insights can be discovered internally but also externally, so kindly remind your hardest critics that it is not about how much data you have, but how much is out there from which you can derive these crucial insights for your business. A good example that puts this argument to bed is social listening and, by extension, sentiment analysis.

In conclusion, if you think about it, there isn’t much overlap if you start from the use case and Proof-of-Concept (PoC) approach while following the above steps. Starting small will also allow you to get buy-in and then expand. Partners may seem at first to offer similar services, but to stay in pole position you need to get more for less. Lastly, always keep in mind that for the majority the dynamics are different: there isn’t much data analysis talent out there, so a managed services or hybrid approach can accelerate your environment and your team. Finally, the orange correlation in three easy steps: pick, peel, and consume. You can enjoy it and get all the vitamin C and its benefits.


We are pleased to announce a free trial of the Orzota BigForce Social solution. This solution, built on top of the Orzota BigForce platform, provides the capability to analyze text streams from Twitter, media sites, blog sites, etc. With search capability, the solution gives data scientists a means of exploring social media data, with a focus on sentiment analysis. The free trial will let you quickly determine whether such a solution can meet your needs.

Unlike many other sentiment analysis solutions, the Orzota BigForce Social solution can be customized to meet your needs. So sign up for the trial and reach out to us to understand how we can make this work for you!
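For readers new to the topic, here is a minimal Python sketch of what lexicon-based sentiment scoring looks like. The word lists are toy examples; this is just an illustration of the idea, not the BigForce Social implementation.

```python
# Minimal lexicon-based sentiment scorer (illustrative only; production
# solutions use trained models and far richer vocabularies).
POSITIVE = {"great", "love", "excellent", "happy", "good"}
NEGATIVE = {"bad", "hate", "terrible", "awful", "poor"}

def sentiment(text: str) -> str:
    """Classify text as positive, negative, or neutral by word counts."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I love this great product"))   # positive
print(sentiment("terrible service, just bad"))  # negative
```

In practice, such a scorer would run over each incoming tweet or post in the stream, with the per-message labels aggregated into the trends a dashboard displays.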


Internships don’t always involve slaving away, getting coffee, or doing menial tasks for your boss. In fact, my experience at Orzota didn’t involve any of that. During the summer months of July and August, I helped with Orzota’s digital marketing, focusing on social media and specifically Twitter marketing. This post captures a small glimpse of that journey.



It is often believed that if anything consequential is happening in business or society, it has to be trending on Twitter. My main goal was to increase Orzota’s Twitter marketing footprint during the summer, and more specifically to increase the number of followers and the engagement on Orzota’s Twitter profile. My job was to find the optimal methods to accomplish this goal with the influencers that matter to Orzota.

Getting Started with Twitter

As I was not extremely familiar with Twitter and its tools before this experience, I was forced to use the good old trial-and-error method. I started out by following as many relevant people as I could find in the big data Twitter community using the appropriate hashtags. While I got a few follow-backs, it was definitely not the most efficient way to do things and I quickly became aware of that.

When I finally gained the confidence a couple of days later to begin tweeting independently, I sent out my first tweet and BAM! I immediately gained four new followers. I was genuinely astounded at how much engagement one tweet received. I made it an objective to tweet around five times a day and ended up lining several of them up so they would be scheduled for the most effective time slots.

Twitter marketing is more art than science. One hindrance was the fact that the Twitter caption had to sell the message in 140 characters or fewer. A captivating sentence followed by a link to an article or picture makes the most effective tweet. All I can say is that the art of it comes with practice; from what I observed, including appealing words that make the reader feel a certain emotion, whether shock, excitement, or worry, is essential to hook the reader.

Engaging Twitter Followers

As I went through this process daily, I made sure to record the previous day’s metrics so I could keep track of my progress. I was able to acquire quite a bit of knowledge about widespread topics such as the Internet of Things (IoT), cloud, data science, and of course big data. I was gaining many followers a day by tweeting regularly; however, I quickly encountered another issue: several people would not follow back, assuming that the Orzota Twitter handle was a bot. I eventually realized that it was necessary to engage with my followers. So, as I was advised, I began replying to people, favoriting tweets, and retweeting articles I enjoyed. Although this didn’t make a dramatic impact, it definitely got more people to follow me. As for the types of articles people seemed to like, I noticed that most people enjoyed articles that had an impact on their personal lives. The more they related to an article, the more engagement the tweet got.

Twitter Tools

Here is a list of the top four free resources (in no particular order) I found to be particularly useful:

TweetDeck—this tool allows a clear view of all activity happening on your account so nothing gets missed; it is also great for managing multiple accounts at once

Twitter Analytics—this is a feature embedded in Twitter itself; it records your activity on Twitter while keeping track of all the exact numbers (in terms of impressions, new followers, and top tweets); permits one to see how their numbers have progressed over the months

WhoUnfollowedMe?—this catches any followers who are only temporary and decide to unfollow later on

Crowdfire—this is just a general tool for social media growth; it includes a variety of features including automating direct messaging, accounting for all new followers and unfollowers, and searching for potentially interested followers

Twitter Marketing

What exactly is Twitter marketing anyway? Besides just gaining more Twitter followers, how does it benefit the actual company, its product, or its brand? Well, for one, when people hear about a company and come across its Twitter page, they look at the frequency of the tweets and the number of followers to determine its credibility. Another benefit is that some of these followers are prospective customers. Thus, segmenting and categorizing followers and sending them targeted, personalized direct messages based on their preferences may help spark an interest in the company’s product(s).


All in all, this was an extremely meaningful experience for me as well as for my employer, with Orzota gaining over 200 new Twitter followers since my internship began. I had fun with words, unleashing my inner creativity and a hitherto unknown marketing capability. I gained new insight into how to directly market not only a product but also a company brand as a whole, figured out the best way to maneuver through Twitter, and, thanks to Orzota, became acquainted with today’s hottest technology topics.

We recently worked with a leading hi-tech manufacturing company to design and implement a brand new, scalable, and efficient workforce analytics solution targeted at the mobile workforce.

The solution is designed to raise the workers’ confidence bar and to minimize the effort required to train them. It also improved manpower utilization by optimizing inventory adjustments with higher accuracy while fulfilling orders, and it reduced the learning curve for workers, resulting in a substantial reduction in training hours.

Workforce Analytics Solution Overview

The Workforce Analytics solution was built on a Common Data Analytics Platform leveraging Hortonworks HDP 2.4 and used the following technologies: Kafka, Storm, HBase, HDFS, Hive, Knox, Ranger, Spark and Oozie.

The platform collects real time data from the application on mobile devices, stores it, and runs analytics with better performance and lower latency compared to their prior legacy system.

The HDP components at a glance:
Workforce Analytics Solution HDP Components

Workforce Analytics Architecture

The operational real-time data is collected using Kafka and ingested into HDFS and HBase in parallel using Storm (see diagram below). HBase acts as the primary data store for the analytics application. The data in HDFS is encrypted and reserved for other applications. Based on the business logic, the data stored in HBase is processed using Spark on a daily, weekly, monthly, and yearly basis, and stored back into HBase as a feed for Spark Analytics (Spark SQL). Spark Analytics is used to run jobs that generate specific insights. The output from Spark Analytics is stored in Hive as a temporary table. Hive Thrift Server is used to execute queries against Hive and retrieve the results for visualization and exploration using Tableau. Custom dashboards were also built for business users to help them track higher-level metrics.
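To make the rollup step concrete, here is a plain-Python sketch of the daily and weekly aggregation logic. The record shape and field names are hypothetical, and the production jobs run this kind of aggregation in Spark over HBase rather than in Python.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw events as ingested via Kafka/Storm: one record per
# batch of tasks completed by a worker.
events = [
    {"worker": "w1", "day": date(2016, 5, 2), "tasks": 12},
    {"worker": "w1", "day": date(2016, 5, 3), "tasks": 15},
    {"worker": "w2", "day": date(2016, 5, 2), "tasks": 9},
]

def daily_rollup(events):
    """Total tasks per (worker, day); in the real pipeline this is a
    Spark job reading from HBase and writing the result back to HBase."""
    totals = defaultdict(int)
    for e in events:
        totals[(e["worker"], e["day"])] += e["tasks"]
    return dict(totals)

def weekly_rollup(daily):
    """Roll the daily totals up to (worker, ISO week number)."""
    totals = defaultdict(int)
    for (worker, day), tasks in daily.items():
        totals[(worker, day.isocalendar()[1])] += tasks
    return dict(totals)

daily = daily_rollup(events)
print(weekly_rollup(daily))  # {('w1', 18): 27, ('w2', 18): 9}
```

The monthly and yearly rollups follow the same pattern with a different grouping key, which is why storing intermediate results back into HBase as a feed for the next stage keeps each job simple.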

Workforce Analytics - Architecture

To address security requirements, Apache Knox and Apache Ranger were used for perimeter security and access control, respectively. Both are included as a part of HDP 2.4 and are configured in the Access Node.

Workforce Analytics Physical Architecture

The figure below shows the physical layout of the services on the various servers used. The architecture comprises Edge Nodes, Master Nodes, and Slave Nodes. Each set of nodes runs a variety of services.

Workforce Analytics Physical Architecture

Issues and Solutions

While implementing this solution, we ran into a variety of issues. We outline some of them here in the hope that they may help others who are designing similar architectures with the Apache Hadoop or Hortonworks HDP ecosystem of components. Table creation, user permissions, and workflows were the common problem areas.

HBase Table Creation

We ran into permission issues with HBase table creation.

Solution: In Apache Ranger, update the HBase policy to give the appropriate read, write, and create permissions to the defined user.

Connection to Hive Thrift Server

Another issue we ran into involved connections to Hive Thrift Server for a particular user “ABC”.

Solution: Ensure that the below properties are added to $HADOOP_CONF/core-site.xml
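A likely candidate for this class of error is the Hadoop proxy-user (impersonation) configuration; for the user ABC it would look roughly like the following, with the wildcard values as placeholders to tighten for production:

```xml
<property>
  <name>hadoop.proxyuser.ABC.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.ABC.groups</name>
  <value>*</value>
</property>
```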



Oozie Workflow Job Submission

Permission errors continued to plague the project while creating workflows in Oozie.

Solution: An additional entry needs to exist in the action definition of the corresponding job in workflow.xml, within the

<shell xmlns="uri:oozie:shell-action:0.2">

element.
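A common fix for Oozie shell-action permission errors is to make the action run as the submitting workflow user via an env-var entry; a hedged sketch, with the ellipses standing for the action’s other elements:

```xml
<shell xmlns="uri:oozie:shell-action:0.2">
  ...
  <env-var>HADOOP_USER_NAME=${wf:user()}</env-var>
  ...
</shell>
```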

Oozie Workflow Job Stuck in PREP State

When re-running an Oozie workflow job after a period of time, it went into the PREP state and did not execute. When we tried to kill the job via the CLI, the Oozie log showed that the job was successfully killed:

USER [test] GROUP[-] TOKEN[-] APP[-] JOB[ABCDEF] ACTION[] User test killed the WF job ABCDEF-oozie-oozi-W

However, in the Oozie UI, the job is still shown to be in PREP state.

Solution: Further research showed that the backend Oozie database (Derby by default) was corrupted and was not representing the correct state of the jobs.

For longer-term stability, we decided to migrate from Derby to MySQL as the backend database for Oozie. After this migration, we did not run into the issue again.
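For reference, the Oozie backend database is selected through the `oozie.service.JPAService.jdbc.*` properties in oozie-site.xml; a sketch for MySQL, with host, database name, and credentials as placeholders:

```xml
<property>
  <name>oozie.service.JPAService.jdbc.driver</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.url</name>
  <value>jdbc:mysql://dbhost:3306/oozie</value>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.username</name>
  <value>oozie</value>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.password</name>
  <value>oozie</value>
</property>
```

The Oozie database then needs to be created in MySQL and the schema initialized before restarting the service.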


Big data projects can grow and evolve rapidly. It’s important to realize that the solution chosen must offer the flexibility to scale up or down to meet business needs. Today, in addition to commercial platform distributions such as Hortonworks and Cloudera, higher level tools and applications simplify the process of developing big data applications. However, as seen by some of the issues we describe above, expertise in the underlying technologies is still crucial for timely completion of projects. Orzota can help. Please contact us.

Surveys, research and retailers’ own measurements show that personalization works. According to Internet Retailer, 80% of consumers say they like it when brands’ e-mails recommend products based on their previous purchases.

Personalization Statistics

Personalization for Retail is Hard

The incessant media attention on the personalization and customer intelligence of companies such as Amazon and Google hides the fact that 80% of marketers fail at personalization. Knowing who each consumer is and what his or her likes and dislikes are is a major challenge. There are two major problems: data collection and data integration.

For retailers, the challenge is even more significant. It is not just about omnichannel integration across devices, but integrating online and retail store traffic and purchases as well. For instance, how can you know if a user’s purchase in the store is the result of an email marketing campaign?  How can you even tell if a user was driven by the campaign to the store if she does not make a purchase?

Additionally, understanding what metrics are important, how to define and measure them, let alone how to make them actionable are all difficult problems that many retailers grapple with. Traditional analytics fail to provide the necessary data, metrics and insights needed.

A Plethora of Solutions

There are dozens of tools and solutions that solve certain parts of the problem. Some are focused on conversion metrics, others on marketing campaigns, still others on building a 360-degree view of the customer (or so they claim). Some address e-commerce only; others tackle omnichannel traffic. Some target beacon solutions in retail stores; others claim to predict which users will buy and how much they will spend.

It is easy to get lost in the hype and marketing statements and lose track of the real problem to be solved. At Orzota, our focus is always the customer and their pain points. Our hybrid approach of customized, expertly managed solutions aims to provide exactly what the customer wants – no more, no less.

Contact us for more information on our retail solutions.

Here is another article that gives some tips on improving the customer experience.

On April 4, 2016, Orzota’s Founder and CEO, Shanti Subramanyam was honored with the 2016 WoM2M award by Connected World Magazine for making significant contributions in the IoT space.
The editors of Connected World magazine, along with IoT (Internet of Things) leaders and professionals, gathered on the evening of April 4th to honor the prestigious 2016 Women of M2M (WoM2M) at an awards dinner in San Francisco, at the culmination of the inaugural Peggy Smedley Institute.

Shanti Subramanyam receiving 2016 WoM2M award from Peggy Smedley

The Women of M2M list recognizes the most powerful and influential women in the world of IoT and M2M, selected because they each bring a unique perspective to their companies and have helped push connected technologies forward.

The women featured represent a variety of positions and industries demonstrating grace and tenacity. What makes these women so exceptional is that they have set the bar very high for the next generation of women who are to follow them.

Orzota congratulates Shanti on winning this award and thanks Connected World Magazine for this honor.




It’s no secret that analyzing data is a key initiative for all businesses. Some say we have been doing it for years. Yes, that’s true, but new Big Data methodologies give us the opportunity to be more effective and to discover new sources of revenue. Today, we can use technology at very low cost to do more, understand more, and reach more consumers, all while having a unified idea of what they are looking for.

We can shape and mold everything that we do based on data, from research to marketing and everything in between. Think about it: if you know you’re shooting only 18% from 3-point range, why would you keep shooting 3-pointers?

Big Data comes with two major idiosyncrasies: complexity, because of the open source frameworks, and a shortage of available talent. Don’t let open source scare you – it actually means savings! The lack of talent is also easy to overcome with big data management.

Here are some benefits of managing big data:

  • Taking out the cost. If you’re not a large, tech-savvy enterprise like Facebook or the other giants, chances are you’re struggling with this decision. First is the technology part, including data engineering, hardware, and technology solutions. Then comes finding the right people (not the one overseeing everything, but the actual crew). Let’s take a closer look. Here are some of the roles required to build your first big data solution: software engineer, data engineer, data scientist, QA and release engineers, data architect, devops,… have you had enough?
  • Taking out the complexity. Big Data gives us the opportunity to select the tools and tailor them to the exact use cases we’re looking to implement, all in an economical fashion. Open source frameworks come with a learning curve but also provide great value. How can one exploit the advantages while minimizing the complexity? Big data management is the answer. Coupled with experienced services offerings that can help build out your solution, you can enjoy the benefits of deep data insights in a matter of weeks at a fraction of the cost.
  • Show me the money. To be competitive, every enterprise today must exploit its data with advanced analytics. There is also a need for a sidekick. Lastly, speed and agility are important. Big data management will give you the edge to get the scale of a super enterprise without breaking the bank.

In conclusion, I was inspired by articles on the complexity of Big Data and on the Big Data talent gap. These are major issues that can simply be avoided by looking at a big data management option. This approach allows for analytics on large amounts of data, with the possibility of uncovering new sources of revenue. It’s not about the size of the data; what matters is solving your problems fast and economically.

Does your Analytics journey look like this?

Big Data and Analytics are now moving to midsized businesses (MSBs) at a faster pace than before. The potential value of analytics, once thought useful only to large global enterprises, has quickly moved downstream. As a consequence, spending in this space is also expected to grow faster than at large-scale enterprises.

Midsize businesses across domains can now gain a significant competitive advantage, get valuable insights to identify potential markets, and form the basis to improve customer experience and operational efficiency.

However, there’s more money spent on coping with the massive influx of available data than on the analytical value that the technology offers. To influence business results meaningfully, MSBs must take a multi-dimensional view of analytics value, considering it across the following three dimensions:

Intent, Commitment, and Clarity of Business Impact

It is imperative to understand the purpose and the focus areas where Big Data and Analytics have the most potential. Here are a few key questions:

  • What do we want to use big data for? Strategic? Tactical?
  • How can we monetize the data streams in terms of customer loyalty, revenue growth and/or cost reduction?
  • What are the areas for Business Impact (Customer, Product/Service, Operations, Supplier/Partner, Finance, Risk)?
  • Where within these areas does Big Data and Analytics provide the most sustainable value?
  • How can we use data-driven customer intelligence to understand customer behavior?
  • What specific customer-centric and operational-centric KPIs or metrics provide insights into a particular component of our business? e.g.: Propensity to Buy, Customer Lifetime Value (CLV)
  • How will better insights and information help overcome the most pressing challenges in our business?
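To make one of these metrics concrete, Customer Lifetime Value is often approximated as average order value × purchase frequency × expected customer lifespan. A simplified Python sketch (real models also discount future revenue and subtract acquisition and servicing costs):

```python
def customer_lifetime_value(avg_order_value: float,
                            orders_per_year: float,
                            expected_years: float) -> float:
    """Simplified CLV: average order value x purchase frequency x lifespan."""
    return avg_order_value * orders_per_year * expected_years

# A customer spending $80 per order, 5 orders a year, retained about 4 years:
print(customer_lifetime_value(80.0, 5, 4))  # 1600.0
```

Even this crude estimate is useful for segmenting customers, for example to decide how much a campaign can spend to retain a high-CLV segment.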

Solution Options Built on a Foundation of Analytics

Most midsized businesses lack an understanding of the various solution options and tools available, and hence lack confidence and hesitate to employ them. It is imperative to select diligently from the plethora of analytics solutions and tools with an eye to cost efficiency, process improvement, data governance, and technology.

From a process perspective, they must be able to collect enough internal data, normalize and combine this data with external data sources to identify patterns and behaviors.

A study by IDC revealed that organizations that use diverse data sources, analytical tools (e.g., predictive analytics) and the right set of metrics are five times more likely to succeed and exceed expectations for their projects than those who don’t use these big data strategies.

Skills Training, Gap Analysis, and Lessons Learned

Having a successful initial implementation or Proof of Concept (more is better; timing is of the essence) within 6-8 months is critical. Shortening decision paths and leveraging domain-centric Big Data and Analytics partners and solution providers is essential.

However, identifying and realizing analytics value for midsized businesses takes more than just working with external partners. They must know the critical questions to answer based on the data. The business must understand the analytics terminology and technology, as well as possess some internal statistical knowledge. We see a few midsized businesses adding a new role, a Data Strategist, to help with their growth strategies, streamline business operations, and integrate technology so the business operates more efficiently.

Please feel free to reach out to learn how Orzota has helped organizations across the above dimensions to:

  • Build models targeted to specific use cases that can be implemented swiftly, with a clear business focus
  • Select and deploy targeted data analytics solutions
  • Adopt the analytical solutions and tools
  • Realize analytics value faster


Since the explosion of Big Data, we have early adopters and also a fan base that is still in the evaluation phase. Everyone is familiar with the 4 V’s of Big Data: Volume, Variety, Veracity, and Velocity. These are fantastic fundamentals, but we need a 5th V.

Introducing the 5th V of Big Data

It seems that many are stuck in the definition stage and in endless discussions that don’t lead anywhere. There is a plethora of innovation and all kinds of companies providing solutions to the many moving parts of Big Data. In an earlier post, I wrote about the three pillars of Big Data: modernization of the data warehouse, data plenum and data juncture, and insights. All good to this point, but…

Why Do We Need the 5th V (Vision) in Big Data?

  • Vision helps us identify where to start.

We have to envision the complexity and, ultimately, where we need to take this program. And yes, folks, it is a program. By spending a minute to look into the next few months, we can identify, at least internally at this stage, an initial framework of action items. Remember that, depending on the size of your company, you have to be a team player on this.

  • We can get a better understanding of the playing field.

Big Data has complexity, and this is why it’s so much fun to work in this domain. Your Vision should be centered on the goals of your company as a whole! This means that it should serve the needs of your internal and external business efforts. Let’s take an example for a B2C company. If you provide products or offerings to consumers, your needs will revolve around understanding your customers better: how they shop online vs. in the store, marketing, and predictive insights. When you are envisioning, take a minute to understand how you will get there by serving the needs of the business requirements.

  • Building the program

One of the final elements of the Vision is to initiate the program. Remember that it is going to be an effort that requires external assistance and internal coordination. A major accomplishment in this area is being able to quickly accommodate the business requirements, thus providing big-data-driven solutions to the company’s users. I can’t stress enough the importance of a managed services model, and here is why: first, you eliminate the complexity, and second, you can deliver faster to the business users. Also, remember that there is a need for nurturing and continuous modeling. Finally, the team structure. In team sports, we often hear the term “franchise player”: the person(s) the team is built around. Whether you like the NFL or the NBA, you can’t go anywhere without a good quarterback or, in basketball, a point guard. Then you build from there: wide receivers, a center, you get the point.

In conclusion, Vision, the 5th V of Big Data, is a catalyst that initiates the steps that get you to successfully build the process for Volume, Velocity, Variety, and Veracity. Remember that we are all Big Data analysts and that analytics, in one way or another, are ingrained in our human system. From a young age we played sports, so we looked at stats and knew what to do, or we knew how much time to spend to improve our game; the same goes for business. One last point: it takes a team to win the championship, keep that in mind! There is no need to feel that we’re drowning in or intimidated by data, as there is plenty of innovation out there at the moment. So take a minute to use the 5th V (Vision) and keep moving forward to derive value from Big Data.

It’s an exciting time for the Orzota team. During February, we will hold some interesting conversations with the local business community in Chicago.

Orzota’s Founder and CTO, Bharath Mundlapudi, and our Director of Sales, Ilias Farfouris, will be presenting at and hosting two events. The topics are Big Data in IoT and Modern Data Architecture using Flink and Hadoop.

Chicago and the Midwest are communities taking huge strides in technology within the Big Data domain. Orzota’s mission and past use-case expertise make it a dynamic contributor, sharing best practices with these groups.

On February 15th, we will be a part of the IoT business community in Chicago. Please find the info and join us for a great evening of technological conversation that fuses Big Data and IoT. Here’s the link

On February 16th, we will be a part of the Chicago Apache Flink community. This is a technology that is growing, and the group is starting efforts in New York and internationally. Our mission as a big data company is to contribute to new and exciting upcoming technologies. We’re excited to showcase our domain and fuse it with the new kid on the block, Flink. Here’s the link

Come join us for these fun sessions and interact with peers in the Big Data and IoT community. Special thanks to the organizers of the IoT Chicago and Flink Chicago communities, and thank you for having us!

See you there!