The conversation about business outreach (service first) and infrastructure elasticity (cloud) does not feel complete without including…
Every generation of technology has, in some way, created the need for the next. User-created content and social media initially drove the need for big data techniques, but the drivers behind this movement multiplied quickly once businesses understood what big data analysis and prediction could do for them.
A Quick Introduction
Big data commonly refers to the mining and analysis of extremely high volumes of data. The data in question is typically unstructured, since it is collected from a variety of sources, many of which do not follow any standard storage format or schema. The data is also described by its characteristics, primarily volume, variety, velocity, variability and complexity. The techniques for analyzing this data involve algorithms that engage multiple processors and servers, distributing the data to be processed in a logical way. MapReduce is one of the most popular programming models for this kind of processing, and Hadoop is one of the most popular implementations of MapReduce.
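As a rough illustration of the MapReduce idea, here is a minimal word-count sketch in Python. The phase names and sample documents are made up for illustration; a real framework such as Hadoop runs these phases distributed across many machines.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    # Map: emit a (key, value) pair for every word in one document.
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    # Shuffle: group values by key, as the framework does between nodes.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: combine the grouped values for each key into a final result.
    return {key: sum(values) for key, values in groups.items()}

documents = ["big data needs big tools", "big tools need data"]
mapped = chain.from_iterable(map_phase(d) for d in documents)
counts = reduce_phase(shuffle_phase(mapped))
print(counts["big"])  # "big" appears 3 times across both documents
```

The point of the pattern is that the map and reduce phases are independent per key, which is what lets the framework spread the work over many servers.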
Big data techniques are not something that only corporations collecting social media dust need; they are something every business needs to look into, sooner rather than later. Whether every business needs to factor in social media data in some form is a separate topic; even setting that aside, the volume of structured data is increasing by the day.
Keeping all that in mind, it is important to explain how well big data fits the elasticity of the cloud. Imagine an operation where data needs to be segregated by some specific parameter onto different servers. These servers might run some processing depending on the type of the data, or simply store it to improve access time. A true cloud environment is the perfect host for such an operation: you can spin up new servers with a specific configuration, at run time, with just a few lines of script.
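The segregation step described above can be sketched in a few lines. The sample records and the number of buckets here are hypothetical; in a real deployment each bucket would map to a server spun up through the cloud provider's scripting API.

```python
import zlib

def partition(records, key, num_servers):
    """Assign each record to a server bucket by hashing the chosen parameter."""
    buckets = {i: [] for i in range(num_servers)}
    for record in records:
        # crc32 is deterministic, so the same key always lands on the same server.
        bucket = zlib.crc32(str(record[key]).encode()) % num_servers
        buckets[bucket].append(record)
    return buckets

orders = [
    {"id": 1, "region": "east"},
    {"id": 2, "region": "west"},
    {"id": 3, "region": "east"},
]
buckets = partition(orders, "region", 4)
# Every "east" record lands on the same server, so region-scoped
# processing or lookups touch a single node.
```

Because the mapping from key to server is deterministic, both queries and background processing can find a record's server without any central lookup.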
Where are we heading
In 2011 Google Fiber announced Kansas City as the first city to receive 1 gigabit per second internet speed, followed by Austin, TX and Provo, UT. As per the company’s announcement in February 2014, Google Fiber will be reaching another 34 cities. AT&T stepped up by announcing that its new service, GigaPower, will provide gigabit internet speed to as many as 100 municipalities in 25 metropolitan areas. Besides Google and AT&T, many other large and small providers are working on targeted areas to provide superfast internet speed, such as Cox, Greenlight Network, Canby Telcom, CenturyLink and Sonic.Net.
Considering this new scale of bandwidth, the way application technology works is going to change, especially the part that involves mobile and cloud. It will be much more convenient to have a memory- and processor-intensive operation running in a cloud environment, streaming status and results to the browser on your laptop or a small handheld mobile device.
Moving the heavy lifting to the cloud and keeping the controls on low-resource devices is not something that is going to happen someday; it is happening now, and only its scale and outreach are going to increase exponentially. Everyone connected to this field should pay attention to the changes and keep a strategy for the future, be they providers, consumers, decision makers, technology workers, business users or consultants.
Power BI is Microsoft’s cloud-based service that leverages Excel to enable self-service business intelligence. The term Power BI has also been used generically to reference the components and technologies that comprise the Microsoft BI suite of tools: specifically, Power Pivot, Power View, Power Query, Power Map, Question & Answer (Q&A) and now Forecasting. The Q&A and Forecasting features are currently supported only in Office 365 and SharePoint Online. The other features are fully supported in the desktop (Office Professional Plus) and Office 365 versions of Excel 2013.
The latest incarnation of Office 365 implements time series analysis to provide forecasting capabilities. It is this version and its forecasting capabilities that will be discussed in this article. The description and definition of the specific time series algorithms related to forecasting are beyond the scope of this discussion, but the implications of providing this capability are not.
The methods and techniques for time series analysis are well documented and understood in academia and in the field of statistics, but this capability is now being placed in the hands of the masses, who may or may not have a thorough understanding of the associated techniques or how to interpret the results. This may present a change management issue for an organization, but with some planning a great deal of benefit and insight can be obtained that would otherwise not be realized.
From a change management perspective, it is imperative that a consistent approach be defined and implemented to ensure consistent results when developing an analytics solution. This should also include a training program on terminology, techniques, methods and practices.
Let’s take a detailed look at the process that will lead to obtaining useful insights from a forecasting exercise and then how this process applies to an example implemented in Power BI.
- Business Understanding – Understanding the project objectives and requirements from a business perspective, and what the specific outcomes should be. This may also include an initial reference to an analytic methodology or approach (forecasting, classification, etc.).
- Data Understanding – Understanding of the traits/personality (profile) of the data. Are there data quality issues? What are the valid domains of attribute values? Are there obvious patterns?
- Data Preparation – Does the data need to be reformatted? How will missing values be handled? What are the relevant attributes or subsets of data?
- Modeling – Identify potential modeling techniques to meet the requirements of the business solution and its objectives.
- Evaluation – Evaluate the model and determine its fitness for use. How accurate is the model? Does it address the business requirements? Have new insights been exposed that change the understanding of the data?
- Deployment – Present the model results. Make sure the appropriate visualization is used to present the results. Does the deployment require a simple report, or is a new process required to close the analytic loop?
This process is depicted in the following diagram:
The above process steps define the CRISP-DM data mining methodology, which provides an excellent foundational approach and process for the development and deployment of predictive analytic and data mining solutions. It has been around for some time, but its basic tenets are still very applicable. Let’s now look at an example of how Power BI forecasting can be leveraged and how the process steps are implemented.
The following data represents new and used car sales from 2002-2014, stored by month. Examining the raw data is the opportune moment to address business understanding and to identify the business problem and requirements. In this case the business problem is to forecast future new car sales to help better manage inventory. Understanding the nature and characteristics of the data should also be accomplished at this point, via data profiling (min, max, null counts, standard deviation, etc.) and through data visualization. It would also help to have a domain expert available to provide additional insights. With regard to data preparation for Power BI Forecasting, there must be an attribute that can be used for time series analysis. In this case, a new attribute named [Period Ending] is created by combining [Year] and [Month], represented internally as a date.
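The [Period Ending] preparation step can be sketched as follows. In the workbook itself this would be a calculated column; the sales figures below are made up, and the first of the month is used purely to keep the sketch simple.

```python
from datetime import date

# A few made-up monthly rows in the shape described above.
rows = [
    {"Year": 2013, "Month": 11, "NewCarSales": 1180},
    {"Year": 2013, "Month": 12, "NewCarSales": 1245},
    {"Year": 2014, "Month": 1,  "NewCarSales": 1102},
]
for row in rows:
    # Combine [Year] and [Month] into a single date attribute that a
    # time series algorithm can order and difference.
    row["Period Ending"] = date(row["Year"], row["Month"], 1)

print(rows[0]["Period Ending"])  # 2013-11-01
```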
The above data was loaded into a PowerPivot workbook and uploaded to Power BI where some visualizations were applied. The line chart shows new car sales units over time. This line chart will be our candidate for time series analysis (forecasting). Note that there appears to be a cyclical pattern in the data. This is a good reason to generate a visualization to provide insights into the nature of the data.
Currently, to perform forecasting, Power BI must be placed in HTML5 mode. This is accomplished via an icon in the lower right corner of the web page. Once that has been done, then hovering over the chart will expose a caret that indicates forecasting may be performed.
Clicking the caret produces a forecast and displays an additional panel that contains adjustable sliders for confidence interval and seasonality. The forecasting algorithm will attempt to detect seasonality and display the calculated cycle in terms of units. The seasonality slider allows manually setting the number of periods over which cycles repeat. For example, if domain knowledge indicates that the seasonality is different from what is calculated, it can be adjusted accordingly, which may change the forecasted values. In this case, the seasonality is detected to be 12 units (1 year).
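Power BI does not document its internal detection algorithm, but the general idea of detecting a seasonal cycle can be sketched with autocorrelation: compute the correlation of the series with a shifted copy of itself at each candidate lag, and pick the lag where it peaks. The synthetic series below is made up, with a known 12-period cycle.

```python
import math

def autocorrelation(series, lag):
    """Correlation of the series with itself shifted by `lag` periods."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[i] - mean) * (series[i + lag] - mean)
              for i in range(n - lag))
    return cov / var

# Synthetic monthly sales with a 12-period cycle over six years.
series = [100 + 20 * math.sin(2 * math.pi * m / 12) for m in range(72)]

# Skip lag 1, which is trivially high for any smooth series.
best_lag = max(range(2, 25), key=lambda lag: autocorrelation(series, lag))
print(best_lag)  # 12, the seasonal cycle length
```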
The confidence interval slider displays a shaded area that indicates the range within which forecasted values are expected to fall, expressed as a number of standard deviations. If there is a need for forecasts to track the model very closely, select one standard deviation. The band width is also an indication of how well the forecast model fits the data. The nature and requirements of the business problem and the user will determine an acceptable value for the confidence interval. For this data, 68% of expected values fall within one standard deviation.
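The relationship between standard deviations and coverage can be made concrete with a small calculation. The forecast and residual figures below are invented; only the coverage percentages (roughly 68% within one standard deviation, 95% within 1.96) are standard normal-distribution facts.

```python
from statistics import NormalDist

forecast = 1200.0     # hypothetical forecasted new car sales for a month
residual_std = 80.0   # hypothetical standard deviation of model errors

for z in (1.0, 1.96):
    low, high = forecast - z * residual_std, forecast + z * residual_std
    # Share of a normal distribution that falls within z standard deviations.
    coverage = NormalDist().cdf(z) - NormalDist().cdf(-z)
    print(f"z={z}: [{low:.0f}, {high:.0f}] covers {coverage:.0%}")
```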
There is also the ability to perform a hindcast. A hindcast produces a model that uses historical data up to a selected past point in time to predict the values that followed it. New predictions are generated that show how the current predictions would have looked had they been generated at that past point in time.
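The hindcast idea can be sketched with a deliberately simple stand-in model: pretend the series ended 12 months ago, forecast those 12 months by repeating the previous year's values (a seasonal-naive model, not Power BI's actual algorithm), then compare against what really happened. The series is made up.

```python
def seasonal_naive_forecast(history, horizon, season=12):
    """Forecast by repeating the value from one season earlier."""
    return [history[-season + (h % season)] for h in range(horizon)]

actual = [100, 120, 90, 110] * 6        # 24 months of made-up sales
cutoff = len(actual) - 12               # the selected past point in time
hindcast = seasonal_naive_forecast(actual[:cutoff], horizon=12)

# Compare what the model would have predicted with what actually happened.
errors = [abs(a - f) for a, f in zip(actual[cutoff:], hindcast)]
mean_abs_error = sum(errors) / len(errors)
print(mean_abs_error)  # 0.0 here, since the made-up series repeats exactly
```

A large hindcast error is a signal that the model, or its seasonality setting, does not fit the data well.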
Prior to this point, the appropriate model would have been selected (time series), applied and evaluated. Within Power BI, the option to select a specific time series model is not available. With regard to model evaluation, adjusting the confidence interval and hindcasting provide the ability to evaluate the overall fitness of the model.
Finally, the model is deployed and can be used for re-evaluation. This can be done by exporting the model along with its data to Excel and running it through the forecasting model again.
It has been demonstrated how Power BI forecasting can be leveraged using the CRISP-DM methodology and how advanced analytics can be placed in the hands of the masses. Power BI as a solution is simple to understand, uses existing technologies and is straightforward to implement. Over time, more and more advanced analytic capabilities will be exposed to the masses, and to be successful, a well-defined process, approach and appropriate training must be used to ensure that proper results and insights are obtained.
Questions and comments can be addressed directly to:
Director, Data Management – Strategist
Continuing the simplification of mobile first, cloud first from the previous post…
Let’s highlight the two big objectives achieved in the last post by separating core business services from platform-specific clients:
- Platform and device outreach – Since HTTP is understood by all modern devices, your service is consumable by any device that can host a client application and speak the language of the web.
- Heavy lifting done on the server – With the separation between a client app and the business running as a service on a server somewhere, all the heavy lifting is done on the server, whereas the user’s device mostly handles user interaction. Heavy lifting generally refers to complex computations that consume a lot of hardware resources, such as CPU and RAM, which are limited on small mobile devices.
Now let’s talk a little about the server.
Is your application business ready or feature ready?
So now we have built our application in a RESTful manner to reach a broad spectrum of devices, and we have moved the heavy lifting to the server. At this point our business idea can either take off or send us back to the drawing board. In either case, the load on the server doing the complex operations is going to fluctuate.
The question here is: is the application infrastructure elastic enough to support that, or will scaling the infrastructure up and down come at a heavy cost?
It is a difficult question for any developer to answer: how many users (or how much traffic) can the current server infrastructure hold? The best answer you will get is a very careful calculation, based perhaps on stress testing and padded generously with seasoned wisdom. In fact, for a new application, or one rewritten with newer frameworks, it is very difficult to estimate the ideal infrastructure requirement until the rubber hits the road. To be on the safe side, every team tends to overestimate.
Cloudy with heavy awesomeness
Moving the infrastructure to the cloud helps you achieve such elasticity. You do not really need to worry about contacting data centers; you can spin up new servers and shut them down when not needed, using a few lines of script. Depending on the service you are using, you can perform many infrastructure operations through a self-service portal and be charged only for the infrastructure you use, for the duration you use it.
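To make the elasticity concrete, here is a small sketch of the scaling decision such a script automates. The capacity figures and utilization target are invented, and the actual provisioning step would go through your provider's CLI or API at the marked point.

```python
import math

def scale_decision(requests_per_sec, capacity_per_server, target_utilization=0.7):
    """Return the number of servers needed to keep utilization near the target."""
    needed = math.ceil(requests_per_sec / (capacity_per_server * target_utilization))
    return max(needed, 1)  # always keep at least one server running

current_servers = 5
needed = scale_decision(requests_per_sec=900, capacity_per_server=100)
delta = needed - current_servers
# At this point a real script would ask the provider's API to spin up
# `delta` new servers (or shut down idle ones when delta is negative).
print(needed, delta)  # 13 servers needed, so 8 more to spin up
```

Running a check like this on a schedule, and paying only while the extra servers exist, is exactly the pay-for-what-you-use model described above.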
Suppose that after launching our application, we found that our target customers are in a specific geographic location, like the east coast or some other part of the world our analysts never imagined. Can you quickly respond to the newfound opportunity? Most cloud service providers allow you to select the geographic location of your infrastructure, letting you place more servers closer to the customer for an optimized user experience.
Global cloud providers are large organizations that have invested heavily in infrastructure over the years, providing you high security and availability. Therefore, there are many benefits your business gets by moving to the cloud that might be difficult to estimate beforehand.
The device and service situation
Over the last decade, the tech industry has seen exponential growth in the variety of devices (laptops, mobile phones, tablets, gaming consoles sensitive to touch/voice/motion/gestures). This is only going to get more diversified, whether we talk about ten new devices in the 5.75-inch to 6.85-inch segment, consoles and wristbands that will replace medical equipment, or watches that talk to phones, glasses and TVs. Yes, the forecast is intense.
Relax, take some REST.
Given that we can only partially foresee which devices will be available in the future, how do we maintain consistent delivery of business functionality to devices that are yet to be developed? Well, build a service that can communicate with all the existing devices and can also take care of the REST.
For decades we were happy with XML and SOAP messages communicating across applications. But this communication has now grown beyond traditional applications and devices: from remote servers hosted somewhere in the cloud to smart touchscreens, from set-top boxes to gaming consoles. Some of those devices understand SOAP, but many do not. All of them, however, positively understand HTTP. HTTP is what connects everything to the “cloud”, and staying away from either might not be a good idea.
Although the term “cloud” is heavily misused for sales purposes (I will come back to it a little later), let’s first talk about…
What is Service First?
So once you have identified that your business functionality will be delivered via HTTP, you should build your logic and let it be consumed via an HTTP service. HTTP services are more commonly known as RESTful services, REST services or Web APIs. REST is an architectural pattern that embraces HTTP as the transport protocol; RESTful services are basically services built to be consumed over the web via HTTP.
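A minimal sketch of such a service, using only Python's standard WSGI machinery to stay self-contained: business data exposed over HTTP as JSON, consumable by anything that speaks the web's language. The route and product payload are made up for illustration.

```python
import json
from wsgiref.simple_server import make_server

PRODUCTS = {"1": {"id": "1", "name": "widget", "price": 9.99}}

def app(environ, start_response):
    # A single resource route: GET /api/products/<id> returns JSON.
    path = environ["PATH_INFO"]
    if path.startswith("/api/products/"):
        product = PRODUCTS.get(path.rsplit("/", 1)[-1])
        if product is not None:
            start_response("200 OK", [("Content-Type", "application/json")])
            return [json.dumps(product).encode()]
    start_response("404 Not Found", [("Content-Type", "application/json")])
    return [b'{"error": "not found"}']

# To serve for real: make_server("", 8000, app).serve_forever()
```

Because the contract is just HTTP and JSON, the same endpoint serves a browser, a phone app, a set-top box or any device still to be invented.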
Once the functionality is ready and available to be consumed across the spectrum, we can go ahead and create client apps for as many platforms and devices as we want. As far as maintenance is concerned, any change in the business logic is one change to your service, and a consistent UX will represent the brand.
Let me start this post with a small incident that happened in our organization a couple of years back. We had switched our insurance provider, and a representative of the new provider was giving a presentation on the benefit plan. Then came a slide that mentioned, among other things, their “mobile app”. While they talked about how one could look up health care providers and other information via the app, someone mentioned that they couldn’t find the app in the store. After a few seconds of confusion, the presenter promised to follow up with information about the app.
The disconnect was that the app they were talking about was a mobile web site, whereas people were instinctively searching the app store. This brings out many interesting points, one of the most important being the discoverability of your application. When you look for an app that provides some functionality, do you open the browser and search online? Maybe, depending on what you are searching for, but it is highly likely that you will look in the app store of your device.
Which leads us straight to the ultimate option of targeting multiple devices and platforms:
Publishing the application in the app store of the platform.
There are a few ways to go about it –
One would be to choose a device/platform and rewrite the app using that platform’s API and language. But given that we are talking about targeting multiple devices and platforms, I would not recommend it.
The second would be to abstract the business logic/functionality into a RESTful service and write multiple UI clients (a.k.a. apps) using each native platform. That is a decent option; however, there is a cost associated with hiring multiple developers, each with their own platform skills (Objective-C for iOS, Java for Android, C# for Windows 8/Phone, etc.). The cost of development and maintenance adds up every time a new version of a platform is released or newer devices stop supporting older platform versions.
The ultimate goal is to implement this option using existing skills and without rewriting the entire application. This can be achieved in two ways, but to achieve either, the application has to be well structured. Any well-written application defines the boundaries of at least the three traditional layers: UI, business layer and data access layer. To be able to reuse most of the code across apps, it is essential to separate at least the UI layer from the rest of the application logic.
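The separation described above is what makes reuse possible: the business layer knows nothing about any UI, so the same logic can back a web client, a mobile app or a RESTful service. A minimal sketch, with illustrative class and method names:

```python
class QuoteBusinessLayer:
    """Business logic: no UI concerns, reusable by every client app."""
    TAX_RATE = 0.08  # hypothetical rate, for illustration only

    def quote_total(self, subtotal):
        return round(subtotal * (1 + self.TAX_RATE), 2)

class ConsoleClient:
    """One of many thin UI layers over the same business logic."""
    def __init__(self, business):
        self.business = business

    def show_quote(self, subtotal):
        return f"Total due: ${self.business.quote_total(subtotal):.2f}"

logic = QuoteBusinessLayer()
print(ConsoleClient(logic).show_quote(100.0))  # Total due: $108.00
```

A second client (say, a mobile app or web page) would wrap the same `QuoteBusinessLayer`, so a change to the tax rule is made exactly once.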
Let’s take a look at these options now:
1 – Speak the Universal language – HTML5
To get the full benefit, the reusable business and data access logic should be abstracted into a RESTful service, consumed by an HTML5 application that is packaged for each platform using the PhoneGap framework.
2 – Hire an interpreter – Xamarin
Xamarin allows you to write native iOS, Android and Windows Store apps using a single application platform (Microsoft .NET) and language (C#). Applications built in Xamarin enjoy full native access to platform APIs and get native performance and user interfaces. Xamarin provides the ultimate ease of development and maintenance by integrating its tools right into Visual Studio and extending Microsoft tooling. Microsoft released the Portable Class Library (PCL) to develop functionality in C# that can be used across a variety of platforms. It was initially created to support Microsoft platforms such as .NET, Windows 8 Store and Windows Phone; in 2013 the PCL feature was extended to iOS, Android and Mac via Xamarin. Since a PCL is independent of the platforms, it does not support some platform-specific features, like certain encryption. Also, Entity Framework 6.x is not supported in PCLs, but as per the forums, EF 7 will be.
That said, you can create the shared business logic of the application using PCLs, which each separate client application can use. The new release of Xamarin includes a new feature, Xamarin.Forms, which allows sharing UI logic across platforms too.
This is the final part of a 3-part series on Power BI. Part 1 discussed the tenets of self-service analytics and how Power BI can be leveraged. Part 2 provided a more in-depth discussion of Power BI capabilities. Part 3 will look at Power BI deployment scenarios.
Before deploying Power BI, you must clearly understand who your content creators and content consumers are. The following personas typically exist within an organization:
The Executive needs information that is highly aggregated to give a high-level picture of the state of a business or functional area. Often presented in the form of a dashboard or scorecard, information for the executive is intended to spark questions and drive strategic decision making.
The Analyst needs raw or lightly summarized data to consume for the purpose of creating a detailed analysis of a specific business problem or opportunity. Analysts present data in the form of spreadsheets, presentations or ad hoc reports that have limited use beyond the specific problem they are tasked to solve.
The Manager needs information that provides a detailed analysis of a specific business area or function. Data for the manager is used in planning future activities or assessing past performance. It is at a level of detail that can help define specific actions, such as a territorial sales plan or marketing activities for a product launch. Data for the manager is generally provided as reports or detailed scorecards.
Operational users deal with data presented at a transactional level. Examples of an operational report may include invoice registers created by accounting or a daily production plan published by a production planner. Operational data has been traditionally presented in the form of reports. Increasingly, data for the operational user is being presented on-line via Intranets or mobile devices.
The consumption modes are desktop applications, web browsers, and smartphone/tablet applications.
The mix of personas, audiences and consumption modes along with core business requirements provides input into deciding on how best to deploy Power BI as a whole or its individual components.
Deployment Scenario 1 : Desktop Excel 2013
This is the simplest scenario and can service the needs of individuals accessing local and remote data sources. In this scenario an Analyst is typically trying to address a specific domain problem related to an assigned task. The output of the solution may provide insights into a related issue or it may be a one-time solution. At this level Power Pivot, Power View, Power Maps and/or Power Query may be leveraged to address the specific analysis. One important characteristic of this scenario is that it may never venture far from the Analysts’ desk. Also, the content creators and consumers may be the same individual. The consumption mode is generally at the desktop level using Excel.
Deployment Scenario 2 : Desktop Excel 2013 Power Pivot workbooks deployed via SharePoint
This scenario expands on deployment scenario 1. It also facilitates using multiple disparate data sources via Power Pivot and Power Query. A specific domain problem has been identified, but the value of the analysis reaches far beyond an individual analyst’s desk; the problem may address a department need or the broader need of a functional area. SharePoint is leveraged for the dissemination of information to a wider audience. Data can be refreshed on a regular basis using Power Pivot refresh. Since the data is stored in the Excel/Power Pivot workbook, it is possible to drill down into the transactions that support the analysis. This deployment scenario can support the operational, managerial, analytic and executive personas for domain-specific problems. Its limitations relate to managing the volume of data stored in an individual workbook and to preventing duplication of effort within an organization. The consumption mode is via desktop applications or web browsers, including smartphone and tablet based web browsers.
Deployment Scenario 3 : Power View, Excel workbooks deployed via SharePoint Leveraging Tabular/Multidimensional mode Analysis Services
In this scenario, the analysis can be much more detailed, and encompass much larger volumes of data. The physical data is typically external but may also be embedded in an Excel/Power Pivot workbook. Disparate data sources have been integrated via a Tabular model or an Analysis Services cube. Complex solutions can be deployed via Power View in SharePoint or Excel Services. This deployment scenario supports all of the personas and provides a great deal of flexibility in how the solution can be processed and delivered as well as a rich interactive user experience. The consumption mode is via desktop applications or Web Browsers. This includes smartphone and tablet based web browsers.
Deployment Scenario 4 : Power BI via Office 365
Power BI via Office 365 provides the ability to service all of the above personas as well as all of the consumption modes. As in deployment scenario 3, the analysis can be much more detailed and encompass much larger volumes of data. This includes full access to all of the Power BI components (Power Map, Power Query, Power View, and Power Pivot). Additional information consumption methods are available above and beyond the base interactive functionality. For instance, there may be a situation where a specific analysis does not exist. Power BI via Office 365 provides a question and answer mode in which natural language queries may be executed against the associated Excel/Power Pivot workbook data source (e.g. “Show total sales by product category”). A dictionary of business terms (synonyms) should be created to fully leverage this capability; the synonyms map information consumer concepts to the underlying analytic model. Information may also be consumed via a Power BI app available from the Windows App Store. The Power BI app is linked to an Office 365 site supporting the deployed Excel/Power Pivot workbooks. One item to note is that the Office 365 BI capabilities (question and answer) are cloud-based as a component of the Office 365 subscription. In deployment scenarios 2-3, on-premises or off-premises SharePoint can be used for information delivery. This raises the additional question of whether the deployment mechanism for BI content should be on or off premises. That is in itself a complex topic and cannot be fully addressed without understanding an organization’s specific requirements.
The above deployment scenarios by no means cover all of the possible options available for the deployment of Power BI, but they do represent the more common ones. If you are considering using and deploying Power BI the above scenarios can be used as a starting point and modified as required to fit an organization’s specific requirements.