(ARTICLE 1 in the Series)
The Rise of Big Data
Big Data technologies made it possible for enterprises to capture and integrate diverse sets of data. No longer constrained to the data warehouse for analytics and reporting, enterprises could integrate third-party syndicated data sets and social media data such as tweets and blogs. Big Data also helped break down silos between the various divisions within the enterprise, democratizing data access and helping teams gain new insights from data.
These enriched big data sets can be used not just to understand the past, but also to make predictions about the future: which customers are likely to churn, which customers or equipment are most likely to generate new claims, which products are most likely to succeed, and so on.
We are now in the next wave of deriving value from data using AI-powered applications. The big breakthrough for this wave is the ability to use AI-powered neural networks to solve a wide variety of problems, including autonomous vehicles, natural language understanding, and image recognition. Translating these technological advancements into real business use cases will result in significant operational benefits: reducing costs and providing faster customer service while creating new business models and sources of revenue.
Let’s look at some of the use cases for AI in insurance.
Underwriting, or new application processing, is the first pillar of any type of insurance: evaluating applications for new policies. The process can be complicated depending on the type, size, prior history, and other components of the application used to evaluate the risk and enroll the client. It involves communication among multiple parties: the client, the agent, and the underwriter. It has traditionally been a manual process, as it requires reviewing many different types of documents from diverse carriers, with no standardization that would allow easy automation. Further, many carriers still receive paper documents that are faxed or scanned (or worse, sent via snail mail!).
AI-powered systems can help this step in multiple ways:
- Natural Language Processing (NLP) systems and chatbots can streamline communication between the parties
- AI-driven document extraction systems (DocuAI) can automate the processing of the various documents using AI and Big Data
- Data from documents can then be analyzed by AI-powered analytics to help the underwriter assess risk
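To make the document-extraction step concrete, here is a minimal sketch that pulls a few underwriting fields out of free-form application text. The field names and regex patterns are invented for this illustration; a production document extraction system would use trained NLP models rather than hand-written patterns.

```python
import re

# Illustrative patterns only; real systems learn fields from labeled documents.
FIELD_PATTERNS = {
    "applicant": r"Applicant:\s*(.+)",
    "prior_claims": r"Prior claims:\s*(\d+)",
    "coverage": r"Coverage requested:\s*\$?([\d,]+)",
}

def extract_application_fields(text):
    """Extract a few underwriting fields from free-form application text."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text, re.IGNORECASE)
        fields[name] = match.group(1).strip() if match else None
    return fields

application = """Applicant: Jane Doe
Coverage requested: $250,000
Prior claims: 2"""
print(extract_application_fields(application))
# → {'applicant': 'Jane Doe', 'prior_claims': '2', 'coverage': '250,000'}
```

Once fields like these are structured, they can flow directly into the risk-assessment analytics described above.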
Claims processing forms the core of the business for insurance carriers. When a claim is processed in a timely manner, it improves customer satisfaction and retention. Simultaneously, the processing has to minimize financial loss due to fraud or other factors to maximize profitability. Most companies have focused their energies on improving the claims process using technology.
Many software applications already automate workflows, ensuring timely processing and smooth communication with all parties involved. Mobile apps allow users to easily submit claims along with documentation such as photos of the incident, the claim form, invoices, etc.
Yet major parts of the process remain heavily manual. Claims adjusters frequently have to go out in the field to make assessments. Even for smaller claims, the adjuster may manually review documents and photos.
How can AI-powered systems help claims processing?
- Image recognition algorithms can help identify and automatically categorize various pieces of information in claim evidence photos, such as vehicle license plates, insurance cards, and various types of damage
- AI-driven document extraction systems (DocuAI) can automate the analysis and categorization of line items in medical records, body shop estimates, contractor reports, etc. Using NLP and deep learning techniques allows these systems to recognize a wide variety of content.
- Robotic Process Automation (RPA) can automate many parts of the processing workflow, in combination with the image recognition and document extraction systems above
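As a toy illustration of the line-item categorization mentioned above, here is a keyword-based sketch. The category names and keyword lists are invented for this example; the NLP and deep learning systems described would learn such categories from data rather than from hand-picked keywords.

```python
# Invented keyword lists for illustration; real systems learn categories from data.
CATEGORY_KEYWORDS = {
    "parts": ["bumper", "fender", "windshield", "door panel"],
    "labor": ["labor", "paint", "installation", "repair"],
    "medical": ["x-ray", "mri", "consultation", "physical therapy"],
}

def categorize_line_item(description):
    """Assign a claim line item to a coarse category via keyword matching."""
    desc = description.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in desc for keyword in keywords):
            return category
    return "other"

for item in ["Replace front bumper", "Paint and labor, 3 hrs", "MRI of left knee"]:
    print(item, "->", categorize_line_item(item))
```

Even a crude first-pass categorization like this can route line items to the right reviewer; learned models then handle the long tail of phrasing that keywords miss.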
Fraud detection is usually part of claims processing, ensuring that no opportunistic fraud has taken place. Fraud is among the biggest sources of loss for insurance companies. Many larger carriers already use predictive analytics to help identify potential fraud in claims. These machine learning models use not just a carrier’s own data but also databases shared across companies to flag potential fraud.
AI-powered systems can take this a step further. They can use the vast amounts of accumulated data and images to detect more subtle instances of fraud as well as previously intractable ones. With the cost of running these models dropping dramatically, even small claims can be analyzed to detect patterns.
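One simple way to see how even small claims can be screened cheaply is a statistical outlier check on claim amounts. This is a minimal sketch with an arbitrary threshold; production fraud models combine many features, shared cross-carrier data, and learned patterns rather than a single z-score.

```python
from statistics import mean, stdev

def flag_outlier_claims(amounts, threshold=2.0):
    """Return claim amounts lying more than `threshold` standard deviations
    from the batch mean. A crude first-pass filter, not a fraud verdict."""
    if len(amounts) < 2:
        return []
    mu, sigma = mean(amounts), stdev(amounts)
    if sigma == 0:
        return []
    return [a for a in amounts if abs(a - mu) / sigma > threshold]

claims = [100, 120, 110, 105, 115, 5000]
print(flag_outlier_claims(claims))
# → [5000]
```

Flagged claims would then be escalated to richer models or human investigators, keeping the expensive analysis focused on the suspicious few.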
Improving customer service is the goal of every organization. With Big Data and AI, it is possible to automate the analysis of customer service calls and emails allowing customer service agents to proactively address complaints and issues.
AI-driven chatbots are now pervasive on websites and web portals. They provide an easy way of answering customers’ questions while reserving human interaction for more complex issues. Mobile apps that answer spoken natural-language queries are now possible using voice assistants such as Siri and Alexa, backed by the same knowledge base used by chatbots and customer service agents.
New Business Models
With IoT enabling the gathering of fine-grained data (how many miles do I drive every day, what is my average trip length, how many hours is the property unoccupied), insurance companies are seizing the opportunity to come up with new ways of underwriting policies. AI-powered systems can provide better risk analysis for determining premiums, resulting in new personalized products. These new products can be offered at attractive premiums, driving new business.
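To illustrate how fine-grained telematics data could feed a personalized premium, here is a deliberately simple sketch. The 30-miles-per-day baseline and all weights are made-up illustration values, not actuarial figures; a real pricing model would be built and validated by actuaries on large data sets.

```python
def usage_based_premium(base_premium, miles_per_day, hard_brakes_per_100mi):
    """Adjust a base premium using telematics signals.

    The baseline and weights below are invented for this sketch,
    not actuarial values.
    """
    risk_factor = 1.0
    risk_factor += 0.002 * max(0.0, miles_per_day - 30.0)  # mileage above baseline
    risk_factor += 0.01 * hard_brakes_per_100mi            # harsh-braking signal
    return round(base_premium * risk_factor, 2)

print(usage_based_premium(1000.0, miles_per_day=50.0, hard_brakes_per_100mi=5.0))
# → 1090.0
```

The point of the sketch is the shape of the product: premiums that respond to observed behavior rather than broad demographic buckets.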
I will be giving a talk titled “Anomaly Detection for Predictive Maintenance” at the Global Artificial Intelligence Conference in Seattle on April 27, 2018. If you are going to the conference, please do reach out.
Detecting anomalies in sensor events is a requirement for a wide variety of use cases in the industrial IoT, from predicting failures of HVAC systems and elevators in property management to identifying potential signals of malfunction in aircraft engines so that preventive maintenance can be scheduled. When the number of sensors runs into the tens of thousands or more, as is common in large IoT installations, a scalable model for preventive maintenance is needed.
Unlike prediction models for customer churn, inventory forecasts, etc. that rely on multiple sources of data and a wide range of domain-specific parameters, it is possible to detect anomalies for many types of time-series data using statistical techniques alone.
In this session, we will discuss a step-by-step process for anomaly detection, with examples that provide quick insights for building preventive maintenance models.
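As a taste of the statistical techniques involved, here is a minimal rolling z-score detector over a single sensor stream. The window size and threshold are arbitrary illustration values; choosing them well, and scaling to tens of thousands of sensors, is the harder part of the problem.

```python
from statistics import mean, stdev

def detect_anomalies(readings, window=10, threshold=3.0):
    """Return indices of readings deviating more than `threshold`
    standard deviations from the mean of the trailing window."""
    anomalies = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(readings[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies

# Ten normal temperature readings followed by a spike at index 10.
temps = [20.0, 20.1, 19.9, 20.0, 20.2, 19.8, 20.0, 20.1, 19.9, 20.0, 35.0]
print(detect_anomalies(temps))
# → [10]
```

Because it needs only a short trailing window per sensor, a detector of this shape parallelizes naturally across very large sensor fleets.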
I will be giving a talk titled “Time-series analysis in minutes” at the Global Data Science Conference in Santa Clara on April 2nd at 3:30 PM.
The focus of the talk will be on understanding why and how to analyze time-series data quickly and efficiently. You can read the full abstract here.
An interview given as part of this conference is also available at the conference website.
If you are going to the conference and would like to connect, I would be happy to meet with you.
How Big a Deal Is IoT? Much Bigger than Big Data
A quick heads-up that I’ll be participating in a DM Radio podcast on IoT (March 1, 2018) to talk about IoT and its future. You can listen in on the live podcast at 3 PM Eastern / 12 PM Pacific (and I expect there’ll be a recording available, too). Click here to read more.
The Internet of Things really started humming in 2018, but the best has yet to come. Sure, there were some hiccups along the way, with refrigerators hijacked in massive distributed denial of service attacks, but for the most part, the IoT experience has gone pretty well. What does the future hold? So, so much more! Check out this episode of DM Radio to learn more. Host @eric_kavanagh will interview big data legend Kirk Borne of Booz Allen, and yours truly, Naren Gokul, along with several expert guests!
There have been many articles written and talks given over the last several years on abandoning the Enterprise Data Warehouse (EDW) in favor of an Enterprise Data Lake, with some passionately promoting the idea and others just as passionately denying that it is achievable. In this article, I would like to take a more pragmatic approach and lay out a process that enterprises should consider for a data management architecture.
The focus is on data lakes for enterprises, referred to as Enterprise Data Lake to distinguish it from data lakes created by internet, ad-tech or other technology companies that have different types of data and access requirements.
The Enterprise Data Warehouse
The much-reviled and beleaguered Data Warehouse has been a mainstay of enterprises for over 20 years, supporting business reports and dashboards and allowing analysts to understand how the business is functioning. Data Warehouses, when built right, provide robust security, auditing, and governance, which is especially critical given the increase in cyber-attacks today.
Alas, many data warehouse projects are so complex that they are never finished! Further, the strict, hierarchical governance that many IT departments created around the warehouse caused a great deal of frustration, as business analysts and researchers could not explore the data freely.
The Hadoop Phenomenon
When Hadoop entered the mainstream, the big attraction for business analysts and data scientists was the ability to store and access data outside the restrictive bounds of IT! This raised the exciting possibility of finding new insights into business operations, optimizing spend and finding new revenue streams.
3 Requirements for the Enterprise Data Lake
James Dixon coined the term Data Lake in 2010 to mean data flowing from a single source and stored in its natural state. We have come a long way from that definition; the most common definition of a Data Lake today is a repository for many different types and sources of data, structured or unstructured, internal or external, that facilitates different ways of accessing and analyzing the data. The Data Lake is typically built on Hadoop, with the data stored in HDFS across a cluster of systems.
The 3 requirements for the Enterprise Data Lake are:
- It must collect and store data from one or more sources in its original, raw form and optionally, its various processed forms.
- It must allow flexible access to the data from different applications; for example, structured access to tables and columns as well as unstructured access to files.
- Entity and transaction data must have strong governance defined to prevent the lake from becoming a swamp.
Enterprise Data Lake Architecture
The diagram below shows an Enterprise Data Lake that ingests data from many typical systems such as CRM, ERP and other transactional systems. In addition, it is fed unstructured data from web logs, social media, IoT devices, third-party sites (such as DMP, D&B) creating a data repository. This rich data eco-system can now support combining multiple sources of data for more accurate analytics and never-before possible insights into business operations.
With technologies such as BigForce SNAP, it is possible to run existing enterprise Business Intelligence (BI) tools as well as perform exploratory analysis with visualization tools such as Tableau.
Enterprise Data Lake Governance
More importantly, the Hadoop eco-system now supports data governance through technologies like Ranger, Knox and Sentry. In combination with Kerberos, and enterprise identity management systems such as Active Directory (AD) or other LDAP frameworks, it is possible to implement strong security and governance rules. See “Implementing Hadoop Security” for details.
The Modern Enterprise Data Architecture
But what if you already have an existing EDW with hundreds of applications, some of which use complex analytics functions? How best can you leverage the EDW while also moving to a modern data architecture that allows new data sources to be integrated and empower your data scientists to integrate, enrich and analyze lots of data without the restrictions of the EDW?
A happy compromise between the data lake and data warehouse does exist and data architects and businesses have realized that it IS possible to build on the strengths of each system.
In this architecture, the data lake serves as the repository for all raw data, ingested from all the relevant data sources of an organization. Optionally, the data lake can also store cleansed and integrated data which is then also fed into the data warehouse. This way, newer BI applications can be built directly on the enterprise data lake while existing applications can continue to run on the EDW.
Data Governance in the Enterprise Data Lake
Data governance policies for enterprise data in the EDW should, in most cases, also apply to the same data within the Enterprise Data Lake. Otherwise, security holes and data inconsistencies may arise between the two systems. If careful consideration is not given to governance, the data lake will turn into a data swamp!
However, since the data lake consists of all the raw data from operational systems as well as new data sources, it is possible to now provide data scientists and other analysts access to these data sets for new exploratory analytics.
Designing a modern data architecture requires a thorough understanding of the requirements, existing applications, and future needs and goals of the enterprise. Especially important to consider are master data and metadata management, governance and security, and the choice of the right technologies.
At Orzota, we have built data lakes for a variety of businesses and have a methodology in place to ensure success. Contact us for more information.