During the summer of 2016, we had a high school student intern with us. He knew some Java from the AP Computer Science course but was very interested in using machine learning to predict health outcomes. We were skeptical at first: teaching a teenager (even a very smart one) the fundamentals of ML along with a new programming language, and then having him apply them to a real data set, all in the span of a summer internship, seemed like a Herculean task. But seeing how keen he was, we decided to take him on.
Sushant Thyagaraj (that was his name) proved us wrong! He learned R within the first week and quickly followed that with various ML algorithms through tutorials and sample exercises. He researched publicly available data sets that might suit his work and went through several iterations with a couple of them before finally settling on predicting survival for lung cancer patients after thoracic surgery.
He continued fine-tuning his results and wrote a full paper detailing his work (I should add that this last part was done after school began). We are pleased to present his paper: Using Machine Learning to Predict the Post-Operative Life Expectancy of Lung Cancer Patients.
The topic of data science has been on the rise within the tech industry. Often, we see techies conversing and sharing articles about data science on social media and we hear professionals discussing it as part of their business plan. By now, most of us are aware that it exists and have an inkling about what it does. But can you answer the following questions?
Do You Need a Data Scientist?
In the past, data scientists were mostly found at large, technologically advanced companies (Facebook, LinkedIn, Google, etc.). Now, however, non-technology businesses are hiring data scientists too. For example, retailers are using data science for everything from understanding customers to managing inventory. Data science allows companies in many fields to gain insights from their data and ultimately improve forecasting and decision making.
What Does a Data Scientist Do?
According to Dr. Steve Hanks, there are three major capabilities that data scientists must have: (1) They must understand that data has meaning, (2) They must understand the problem that you need to solve and how the data relates to that, and (3) They must understand the engineering.
A data scientist, in very general terms, investigates a set of data and then comes up with different ways to answer previously posed questions. Along the way, the data scientist may analyze historical data to develop analytical models and dashboards that provide insights and improve decision making. For example, a data scientist for a large retailer like Macy’s may look not just at past seasons’ data but also at current economic and weather conditions to make predictions for the upcoming season. Retail executives use these predictions to improve sales, revenue, marketing strategies, advertising efforts, and so on.
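To make the retail example a little more concrete, here is a minimal sketch of what "combining past seasons' data with weather conditions" can look like in code. Everything in it is an assumption made for illustration: the numbers are invented (they are not real retail data), the two features (`last_season_sales` and `rainy_days`) are hypothetical, and a plain least-squares linear model stands in for whatever model a real team would actually choose.

```python
def fit_linear(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    X = [[1.0, *row] for row in rows]  # prepend 1.0 for the intercept term
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(n)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def predict(weights, row):
    """Apply the fitted intercept and coefficients to one new observation."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], row))


# Invented history: (last_season_sales, rainy_days) -> this_season_sales
history = [(50, 2), (60, 5), (55, 1), (70, 4), (65, 0)]
sales = [194, 205, 207, 228, 230]

weights = fit_linear(history, sales)
forecast = predict(weights, (80, 3))  # hypothetical upcoming season
```

The point of the sketch is only that the forecast depends on more than one input: change the assumed number of rainy days and the predicted sales move with it.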
How Do You Build a Strong Data Science Team?
Choosing people who are knowledgeable and skilled in the areas that fit your company’s needs is essential. An article from Datafloq says, “The team needs to take the data and understand how it can affect different areas of the company and help those areas implement positive changes.” Not all the skills of a data scientist can be taught; it is important to have a natural affinity for data analytics and the drive to produce insights that answer your company’s needs. Data scientists are not only computer scientists and statisticians; they must have a solid understanding of the business as well.
Should You Outsource Your Data?
Because this field of work is both complex and intimidating, there is a shortage of skilled professionals in the industry. Advanced analytics requires a specific skill set to develop and run machine learning models. Instead of spending the money and effort to build a team with the necessary skills internally, you can outsource and speed up your path to data science. For small-to-medium businesses, having their own data science team can be cost-prohibitive. There is also data engineering work that must be done before a data scientist can develop models, and hiring both data engineers and data scientists may not be an efficient use of resources for a small-to-medium business.
Shanti Subramanyam, CEO at Orzota, says, “Deciding to outsource reflects the core competency of your business. If you don’t have the financial resources or the capacity to focus on it, outsourcing is a faster and more efficient way to stand up a capability.”
If you’re overwhelmed by these questions, don’t be. Although the idea of data science and big data may seem complex, it is important to understand at least the basics. If you can articulate your business pain-points, it will be easier to answer these questions and find the best solutions to fit your company’s needs. Orzota is here to explain further, answer your questions, and offer services to help you feel comfortable with understanding and fulfilling your data science needs.
IoT and Big Data are becoming essential to market growth and customer success. Enter Orzota’s call for partners. There are initiatives in all major verticals, such as Manufacturing, Oil & Gas, Transportation, Retail, Financial, Insurance, and Life Science/Healthcare. Additionally, there are many pieces to the puzzle, spanning data architecture, technology, and the resources needed to deliver a successful Big Data & IoT program that will benefit business users.
At Orzota we sit at the intersection of IoT and Big Data. As a Silicon Valley based company, we aim to provide solutions that can transform the way companies collaborate and derive value from data, whether it comes from sensors, machines, ERPs, websites, industry boards, social media, or beyond. To do so, we bring platforms for Big Data and IoT that are flexible and accelerate the delivery of solutions, and we support them through our Managed Services models. We’re harnessing open stacks and cloud technologies that provide the elasticity and economics to quickly generate ROI. Lastly, we augment projects with verified Data Architects, Data Engineers, and Data Scientists.
As a partner, you’re a technology or consulting provider that serves the mid-market and is looking to augment your service portfolio while adopting the latest in open stack technologies for Big Data and IoT. Apart from an excellent solution, you can expect the support of an experienced team that has worked on the technology side of this domain at Yahoo and brings project experience from companies like Netflix, Boeing, and Bank of America, to name a few.
We’re all about making it easy, so email us at firstname.lastname@example.org
The conversation has shifted away from trying to define what Big Data is, but we’re still stuck in the layers. Is Big Data fun, like eating an orange full of vitamin C? Or is it more like an onion, which makes you cry when you peel it but turns out sweet when you cook it properly? I lean towards the orange, and I will tell you why.
One important area is the learning curve. We are seeing efforts take six months or more without addressing, or even identifying, any use cases. There is a lot of innovation at the moment, but with it comes a lot of new terminology that takes effort to keep up with. More important is the use case. How can we evaluate what we need without knowing what the use cases are? A real-time data streaming scenario is very different from one where speed is not important and we can afford to see the results the next day.
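The difference between the two scenarios can be shown in a few lines. The sketch below is illustrative only: the readings are made-up values, and `batch_average` and `RunningAverage` are hypothetical names rather than part of any particular platform. The batch function waits until all the data has landed and computes the answer once; the streaming class keeps an up-to-date answer after every single event.

```python
from statistics import mean


def batch_average(readings):
    """Batch style: wait until all data has landed, then compute once."""
    return mean(readings)


class RunningAverage:
    """Streaming style: refresh the answer as each event arrives."""

    def __init__(self):
        self.count = 0
        self.value = 0.0

    def update(self, reading):
        # Incremental mean: no need to keep the full history in memory.
        self.count += 1
        self.value += (reading - self.value) / self.count
        return self.value


readings = [21.0, 23.0, 22.0, 26.0]  # made-up sensor values

stream = RunningAverage()
live = [stream.update(r) for r in readings]  # one answer per event
overnight = batch_average(readings)          # one answer at the end
```

Both approaches end at the same final average; what differs is when the answer is available and how much infrastructure is needed to keep it current, which is why the use case has to be known before the technology is chosen.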
Three Steps to get started with Big Data
- Start by identifying your vertical, what its demands are, and who your customers are. This will guide you in choosing the Proof-of-Concept (PoC) and therefore the use case(s). As a follow-up step, look at the technological assets you will need. From there, the next step is to launch the PoC quickly. For companies that consider their data sensitive, follow this rule: decide which data you can live with having in the cloud, since you are only testing a hypothesis at this stage and the cloud is the fastest route.
- Seamless integration. At this stage, you need a solution that is responsive. Why? Because multiple business units will come with requirements, and you need to be able to accommodate them. I can’t stress enough the value of a managed services approach until you can support the solution internally. Why? Because the learning curve is huge and the return on investment (ROI) is far better. Many fall into the trap of thinking that because a super enterprise does something a certain way, the same approach is feasible for them. That will only yield frustration, as it will take longer. At Orzota BigForce we’re working hard to accelerate this stage.
- Insights. At the end of the day, you are doing all this for the insights, and more specifically for the predictive and hidden potential. Most likely you have a ton of reporting going on. Reports are not the issue here; if you cannot get the proper reports right now, then you need some serious help. Insights can be discovered internally but also externally, so kindly remind your hardest critics that it is not about how much data you have but about how much data is out there from which you can derive these crucial insights for your business. A good example to put this argument to bed is social listening and, with it, sentiment analysis.
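For readers who have not seen sentiment analysis before, here is a deliberately tiny sketch of the idea. It is a toy under stated assumptions: the word lists and the sample posts are invented for illustration, and simple word counting stands in for the trained models a production system would normally use. It is not a description of how any particular product works.

```python
import re

# Hypothetical mini-lexicons; real systems use far larger ones or trained models.
POSITIVE = {"love", "great", "good", "happy", "excellent"}
NEGATIVE = {"hate", "bad", "terrible", "slow", "broken"}


def sentiment_score(text):
    """Count positive words minus negative words in a piece of text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def label(text):
    """Map the numeric score to a coarse sentiment label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Invented sample posts standing in for a social media feed
posts = [
    "I love the new store layout, great job!",
    "Checkout was slow and the app is broken.",
    "Picked up my order today.",
]
labels = [label(p) for p in posts]
```

Even this crude version shows why external data matters: none of these posts would ever appear in an internal report, yet each says something about how customers feel.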
In conclusion, there isn’t much overlap if you start from the use case and Proof-of-Concept (PoC) approach and follow the steps above. Starting small will also allow you to get buy-in and then expand. Partners may seem at first to offer similar services, but from pole position you need to get more with less. Keep in mind that for the majority of companies the dynamics are different, as there isn’t much data analysis talent out there; a managed services or hybrid approach can therefore accelerate your environment and team. Finally, the orange analogy in 3 easy steps: pick, peel and consume. You can enjoy it while getting all the vitamin C and its benefits.
We are pleased to announce a free trial of the Orzota BigForce Social solution. This solution, built on top of the Orzota BigForce platform, provides the capability to analyze text streams from Twitter, media sites, blog sites, etc. With its search capability, the solution gives data scientists a means of exploring social media data, with a focus on sentiment analysis. The sentiment analysis free trial will let you quickly determine whether such a solution can meet your needs.
Unlike many other sentiment analysis solutions, the Orzota BigForce Social solution can be customized to meet your needs. So sign up for the trial and reach out to us to understand how we can make this work for you!