We recently worked with a leading Hi-Tech manufacturing company to design and implement a brand new, scalable and efficient workforce analytics solution targeted at the mobile workforce.

The solution is designed to raise workers’ confidence and to minimize the effort required to train them. It also improved manpower utilization by making inventory adjustments more accurate during order fulfillment, and it shortened the learning curve for workers, resulting in a substantial reduction in training hours.

Workforce Analytics Solution Overview

The Workforce Analytics solution was built on a Common Data Analytics Platform leveraging Hortonworks HDP 2.4 and used the following technologies: Kafka, Storm, HBase, HDFS, Hive, Knox, Ranger, Spark and Oozie.

The platform collects real-time data from the application on mobile devices, stores it, and runs analytics with better performance and lower latency than the company’s prior legacy system.

The HDP components at a glance:
Workforce Analytics Solution HDP Components

Workforce Analytics Architecture

The operational real-time data is collected using Kafka and ingested into HDFS and HBase in parallel using Storm (see diagram below). HBase acts as the primary data store for the analytics application. The data in HDFS is encrypted and reserved for other applications. Based on the business logic, the data stored in HBase is processed using Spark on a daily, weekly, monthly and yearly basis, and stored back into HBase as a feed for Spark Analytics (Spark SQL). Spark Analytics is used to run jobs that generate specific insights. The output from Spark Analytics is stored in Hive as a temporary table. Hive Thrift Server is used to execute queries against Hive and retrieve the results for visualization and exploration using Tableau. Custom dashboards were also built for business users to help them track higher-level metrics.

Workforce Analytics - Architecture

To address security requirements, Apache Knox and Apache Ranger were used for perimeter security and access control, respectively. Both are included as a part of HDP 2.4 and are configured in the Access Node.

Workforce Analytics Physical Architecture

The figure below shows the physical layout of the services on the various servers used. The architecture comprises Edge Nodes, Master Nodes and Slave Nodes. Each set of nodes runs a variety of services.

Workforce Analytics Physical Architecture

Issues and Solutions

While implementing this solution, we ran into a variety of issues. We outline some of them here in the hope that they may help others who are designing similar architectures with the Apache Hadoop or Hortonworks HDP ecosystem of components. Table creation, user permissions and workflows were the common focus areas.

HBase Table Creation

We ran into permission issues with HBase table creation.

Solution: In Apache Ranger, update the HBase policy to grant the appropriate read, write and create permissions to the defined user.

Connection to Hive Thrift Server

Another issue we ran into involved connections to Hive Thrift Server for a particular user “ABC”.

Solution: Ensure that the following properties are added to $HADOOP_CONF/core-site.xml:

hadoop.proxyuser.ABC.groups=*

hadoop.proxyuser.ABC.hosts=*
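In core-site.xml itself, each of these key=value pairs is written as a property element. The fragment below is a minimal sketch of that form. It keeps the placeholder user ABC and the wildcard values given above, and belongs inside the file's existing configuration element.

```xml
<!-- core-site.xml: allow user ABC (placeholder) to impersonate other users. -->
<!-- "*" permits any group and any host, as in the settings listed above. -->
<property>
  <name>hadoop.proxyuser.ABC.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.ABC.hosts</name>
  <value>*</value>
</property>
```

The affected Hadoop services typically need to be restarted, or their proxy-user configuration refreshed, before the change takes effect.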

Oozie Workflow Job Submission

Permission errors continued to plague the project while creating workflows in Oozie.

Solution: The following element needs to exist in the <shell xmlns="uri:oozie:shell-action:0.2"> section of the corresponding job definition in workflow.xml:

<env-var>HADOOP_USER_NAME=ABC</env-var>
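Put together, a shell action carrying this variable might look like the sketch below. Only the shell namespace and the env-var line come from the fix described above. The action name, the script name, the ${jobTracker} and ${nameNode} parameters, and the transition targets are hypothetical placeholders, not the project's actual workflow.

```xml
<!-- Hypothetical shell action inside workflow.xml; all names are illustrative. -->
<action name="run-script">
  <shell xmlns="uri:oozie:shell-action:0.2">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <!-- Script to execute; shipped to the task via the <file> element below. -->
    <exec>script.sh</exec>
    <!-- Run the script's Hadoop commands as user ABC (placeholder user). -->
    <env-var>HADOOP_USER_NAME=ABC</env-var>
    <file>script.sh#script.sh</file>
  </shell>
  <ok to="end"/>
  <error to="fail"/>
</action>
```

In the 0.2 shell-action schema, env-var comes after exec (and any argument elements) and before file, so element order matters when adding it to an existing action.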

Oozie Workflow Job Stuck in PREP State

When we re-ran an Oozie workflow job after a period of time, it went into the PREP state and did not execute. When we tried to kill the job via the CLI, the Oozie log showed that the job had been killed successfully:

USER [test] GROUP[-] TOKEN[-] APP[-] JOB[ABCDEF] ACTION[] User test killed the WF job ABCDEF-oozie-oozi-W

However, in the Oozie UI, the job was still shown to be in the PREP state.

Solution: Further research showed that the Oozie database at the backend (Derby by default) was corrupted, and was not representing the correct state of the jobs.

We decided, for longer term stability, to migrate from Derby to MySQL as the backend database for Oozie. After this migration, we did not run into this issue again.
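For readers attempting a similar migration, the sketch below shows the standard Oozie JPAService properties in oozie-site.xml that select the backend database. These are not the exact settings used on this project: the host name, port, database name, user and password are hypothetical placeholders, and on an Ambari-managed HDP cluster the same values would normally be set through Ambari rather than by editing the file directly.

```xml
<!-- oozie-site.xml: point the Oozie JPA service at MySQL instead of Derby. -->
<!-- Host, port, database name, user and password are placeholders. -->
<property>
  <name>oozie.service.JPAService.jdbc.driver</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.url</name>
  <value>jdbc:mysql://mysql-host.example.com:3306/oozie</value>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.username</name>
  <value>oozie</value>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.password</name>
  <value>CHANGE_ME</value>
</property>
```

The MySQL JDBC connector JAR also has to be available to the Oozie server, and the Oozie schema has to be created in the new database before the server is restarted.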

Conclusion

Big data projects can grow and evolve rapidly. It’s important to realize that the solution chosen must offer the flexibility to scale up or down to meet business needs. Today, in addition to commercial platform distributions such as Hortonworks and Cloudera, higher-level tools and applications simplify the process of developing big data applications. However, as some of the issues described above show, expertise in the underlying technologies is still crucial for timely completion of projects. Orzota can help. Please contact us.

Does your Analytics journey look like this?

Big Data and Analytics is now moving to midsized businesses (MSBs) at a faster pace than before. The potential “Value” of Analytics, once thought useful only to larger global enterprises, has quickly moved downstream. As a consequence, spend in this space is expected to grow faster among MSBs than among large-scale enterprises.

Midsize businesses across domains can now gain a significant competitive advantage, get valuable insights to identify potential markets, and form the basis to improve customer experience and operational efficiency.

However, more money is spent on coping with the massive influx of available data than on applying the “Analytical Value” that the technology offers. To influence business results meaningfully, MSBs must take a multi-dimensional view of the Analytics Value. They must consider the Value across the following three dimensions:

Intent, Commitment, and Clarity of Business Impact

It is imperative to understand the purpose and focus areas where Big Data and Analytics have the most potential. Here are a few key questions:

  • What do we want to use big data for? Strategic? Tactical?
  • How can we monetize the data streams in terms of customer loyalty, revenue growth and/or cost reduction?
  • What are the areas for Business Impact (Customer, Product/Service, Operations, Supplier/Partner, Finance, Risk)?
  • Where within these areas does Big Data and Analytics provide the most sustainable value?
  • How can we use data-driven customer intelligence to understand customer behavior?
  • What specific customer-centric and operational-centric KPIs or metrics (e.g., Propensity to Buy, Customer Lifetime Value (CLV)) provide insights into a particular component of our business?
  • How will better insights and information help overcome the most pressing challenges in our business?

Solution Options Built on a Foundation of Analytics

Most midsized businesses lack an understanding of the various Solution Options and Tools available, and hence hesitate to employ them. It is imperative to select diligently from the plethora of Analytics Solutions and Tools for cost efficiencies, process improvements, data governance, and technology.

From a process perspective, they must be able to collect enough internal data, normalize it, and combine it with external data sources to identify patterns and behaviors.

A study by IDC revealed that organizations that use diverse data sources, analytical tools (e.g., predictive analytics) and the right set of metrics are five times more likely to succeed and exceed expectations for their projects than those who don’t use these big data strategies.

Skills Training, Gap Analysis, and Lessons Learned

Having a successful initial implementation or Proof of Concept within 6-8 months is critical (more is better, and timing is of the essence). Shortening decision paths and leveraging domain-centric Big Data and Analytics partners and solution providers are essential.

However, identifying and realizing Analytics Value for midsized businesses takes more than just working with external partners. They must know the critical questions to answer based on the data. The business must understand the Analytics terminology and technology, as well as possess some internal statistical knowledge. We see a few midsized businesses adding a new role, the Data Strategist, to help with their growth strategies, streamline business operations and integrate technology so that the business operates more efficiently.

Please feel free to reach out to learn how Orzota has helped organizations across the above dimensions to:

  • Build models targeted to specific use-cases that can be implemented swiftly, with clear business focus
  • Select and deploy targeted data-analytics solutions
  • Adopt the Analytical solutions and tools
  • Realize Analytics Value Faster