anne_milley

The well-appointed analytic workbench

What do I mean by “analytic workbench”? Basically, the compute-resource environment in which data analysis takes place. How would you describe some of the analytic workbenches in your organization? Not everyone is a power analyst, so not everyone requires power tools. But all of us deal with data at some level, and all of us can benefit from making sense of our data more efficiently and effectively. See if there are opportunities to make the analytic workbenches in your organization more productive.

Memory and Storage

Do you have lots of memory available on your desktops and/or servers? Do you have plenty of storage locally and remotely? More memory creates more analytic bandwidth, not just through more efficient use of compute resources, but also through less wait time on the part of analysts, who are increasingly among the more expensive talent. Memory and storage, on the other hand, have become quite inexpensive, as the JMP Graph Builder chart below shows, using data from John C. McCallum.

[Graph Builder chart: historical memory and storage prices, data from John C. McCallum]
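If you want to reproduce that trend outside of JMP, a minimal sketch follows. It assumes McCallum's price table has been saved locally as memory_prices.csv with year and usd_per_mb columns (hypothetical names), since the exact layout of his data is not specified here.

```python
# A minimal sketch of the price-decline chart, assuming McCallum's
# historical memory prices have been saved locally as memory_prices.csv
# with columns "year" and "usd_per_mb" (hypothetical names).
import pandas as pd
import matplotlib.pyplot as plt

prices = pd.read_csv("memory_prices.csv")

fig, ax = plt.subplots()
ax.plot(prices["year"], prices["usd_per_mb"], marker="o")
ax.set_yscale("log")  # the decline spans many orders of magnitude
ax.set_xlabel("Year")
ax.set_ylabel("US$ per MB (log scale)")
ax.set_title("Memory prices over time (data: John C. McCallum)")
plt.show()
```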

Form and Function

Easy-to-use, well-designed interfaces help you stay “in flow.” Even the savviest coders and spreadsheet users can benefit from interactive, dynamic visualization. Staying “in flow” has been one of the guiding principles for JMP developers: minimize the drudgery and complexity while speeding up discovery.

Big operational risks are incurred by using spreadsheets to do things they were never intended to do. In 2008, just before the global financial crisis, an analyst at a large bank had been using a complicated spreadsheet to calculate the bank’s risk-based capital requirements. This was a hairy, multitab spreadsheet containing numerous formulas and referenced cells, with logic that only the person who built it could follow. When the analyst left for another job, it took many person-hours to calculate the capital requirements for the next reporting period. That wasted time and effort can be avoided if organizations wean themselves off an over-reliance on spreadsheets and use well-documented code or applications that are more easily repeated and maintained. To facilitate this, the JMP Add-In for Microsoft Excel lets you keep the familiarity of Excel’s interface with the option to do the calculations in JMP, minimizing the potential for errors (both in the accuracy of computations and in fat-fingered entries).
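To make the contrast concrete, here is a hedged sketch of what “well-documented code” looks like next to buried spreadsheet formulas. The function, risk weights, and inputs are entirely hypothetical (not the bank’s actual logic), but the point carries: the calculation is explicit, versionable, and testable by whoever inherits it.

```python
# Hypothetical illustration: the kind of buried spreadsheet logic that
# stranded the bank, rewritten as explicit, documented, testable code.
# The risk weights and inputs are invented for this example.
RISK_WEIGHTS = {
    "sovereign": 0.00,
    "mortgage": 0.50,
    "corporate": 1.00,
}
CAPITAL_RATIO = 0.08  # e.g., a Basel-style 8% minimum

def required_capital(exposures):
    """Risk-based capital requirement for {asset_class: exposure} amounts."""
    rwa = sum(RISK_WEIGHTS[asset] * amount for asset, amount in exposures.items())
    return CAPITAL_RATIO * rwa

# Anyone who inherits this can read, run, and unit-test it:
print(required_capital({"sovereign": 1e9, "mortgage": 5e8, "corporate": 2e8}))
# -> 0.08 * (0 + 2.5e8 + 2e8) = 3.6e7
```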

Most analysts prefer a fairly robust analytic workbench to use the right tool for the job, yet tend to gravitate to their favorite tool – or their analytic hub – for most of their work. Typically, this is the tool that lets them stay "in flow" while concentrating on solving problems in innovative ways. Good analysts are open to learning new things to increase their bandwidth — whether that’s more advanced training with tools they already use or introducing new tools to hone their workbench.

Compute Environment

At SAS, as at most technology companies, we are fortunate to have adequately powered hardware, good throughput, and ample memory and storage. Because JMP runs on both Windows and Mac, I even have a Mac with Parallels so I can run it on both operating systems. Many organizations have old hardware running operating systems that are soon to be, or already are, no longer supported. Analysis is typically CPU-intensive, and as sample sizes grow to keep up with big data, investing in an adequate compute environment is a must.

Many organizations have invested in their servers and ignored the desktop. However, when your expensive analytic talent is twiddling their thumbs because of slow network bandwidth or an outage, and they are left with nothing more than a spreadsheet on their desktop (if they even have a desktop), opportunity costs can be high. The analytically advanced should be provided with compute options on desktops, on servers, and remotely. With these analysts juggling multiple projects in different phases, they can be much more productive with both adequate desktop and server compute resources. Especially in the exploratory phase, local, in-memory visual tools with instantaneous response take advantage of greater visual bandwidth: analysis at the speed of sight, revealing the data’s structure more quickly.

Collaboration and Presentation

When the analyst achieves a useful model, she is not done. For the results to matter, they usually have to be communicated and acted upon. With so much data, we are all analysts at some level. Even when results are shared, we may ask more questions, which may lead to more analysis and insight. Because of this collaborative, iterative approach, many analysts prefer the power of visually and interactively showing the results to colleagues and to decision makers. How many times have you heard this scenario described? An executive asks for something that requires you to build a predictive model. You present the results to the executive, whereupon he says, “What if …?” If you show the results live, you have the potential to be more efficient with everyone’s time.

Suppose, specifically, that you were asked how you might improve the quality of wine. Having modeled the relationship of factors affecting quality, you can show those relationships, and the effects of changing them, using the Profiler in JMP. Below, if you drag the vertical red line for alcohol to increase it, you can see that it increases quality (up to a point), as well as the effects that change has on other factors affecting quality. You can switch from red to white wine by clicking the vertical red line for color. For either color, you can see that lowering density improves quality. A great deal more can be done with profiling, but the point is that you can show “what would happen if…” live and see the consequences. We have actually heard reactions like “For the first time in my life, I actually understand regression” when people see results presented this way.

[Interactive Wine Profiler embedded here: /content/jmp/files/2013/05/Profiler_wine.html]

Open Wine Profiler in a new window

(The data used for this example are from the UCI Machine Learning Repository.)
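If you work in code rather than in JMP, you can approximate the same what-if exercise. The sketch below, a crude non-interactive stand-in for the Profiler, fits a simple linear model to the UCI red wine data and sweeps alcohol while holding the other factors at their means; it assumes scikit-learn and the semicolon-separated winequality-red.csv file from the repository.

```python
# A rough, non-interactive stand-in for the JMP Profiler: fit a model to
# the UCI wine quality data and sweep one factor while holding the rest
# at their means. Assumes winequality-red.csv from the UCI repository.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

wine = pd.read_csv("winequality-red.csv", sep=";")
X = wine.drop(columns="quality")
y = wine["quality"]

model = LinearRegression().fit(X, y)

# Profile "alcohol": vary it over its observed range, fix everything else.
profile = pd.DataFrame([X.mean()] * 25)
profile["alcohol"] = np.linspace(X["alcohol"].min(), X["alcohol"].max(), 25)

for alc, pred in zip(profile["alcohol"], model.predict(profile)):
    print(f"alcohol={alc:5.2f}  predicted quality={pred:4.2f}")
```

A linear fit won’t capture the “up to a point” curvature the Profiler reveals, but the sweep conveys the same what-would-happen-if reading.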

Data

While data are not strictly part of the workbench, they are the raw material and thus part of the workbench experience.

A good analyst is like a good chef. To attract the best, having a well-appointed kitchen is obvious. But would you then provide poor-quality ingredients? Organizations that ignore data quality pay, and pay, and pay. The most pernicious form of payment is what it does to the culture. I’ve seen analysts run from projects that require them to work with poor data. The data are so bad that they know any time spent on them will be fruitless, that any recommendation they make will be unjustifiable, and that they could’ve spent their time more productively. If most of the workday is spent on projects involving poor data, it can be very demotivating, and good analysts may well leave. The other obvious payment for poor data quality is in the form of sub-optimal decisions and their consequences. Data quality is ultimately a shared responsibility of everyone who uses the data. Organizations with good data have factored that shared responsibility into the culture.

Difficult access to data is another often-overlooked problem. If you are paying competitively for your analytic talent, you don’t want large chunks of that talent’s time spent fetching and preparing data. Ideally, most of the data needed are centrally maintained and routinely updated. Any further data retrieval and manipulation for analysis should be made easy, visually where possible.
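One concrete pattern for making that easy: keep the canonical data in a central store and give analysts a single documented function to pull it, so nobody re-implements the fetch-and-clean step. A minimal sketch, assuming a central PostgreSQL warehouse reachable via SQLAlchemy (the connection string, table, and columns are hypothetical):

```python
# A minimal sketch of easy, centralized data access: one documented
# function instead of ad hoc fetching. The DSN and table are hypothetical.
import pandas as pd
from sqlalchemy import create_engine, text

ENGINE = create_engine("postgresql://analytics:readonly@warehouse/prod")

def load_sales(start_date, end_date):
    """Pull routinely updated sales data from the central warehouse."""
    query = text(
        "SELECT order_date, region, revenue "
        "FROM sales WHERE order_date BETWEEN :start AND :end"
    )
    return pd.read_sql(query, ENGINE, params={"start": start_date, "end": end_date})

df = load_sales("2014-01-01", "2014-06-30")
```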

Keeping Up with Technology

It is a balance. If you chase after the latest technology, you will have less time for solving problems and adding value. If you stand still, your analytic infrastructure becomes less agile and can be unduly limiting. Devote some portion of your time to trying new things and assessing where investment in new infrastructure or an upgrade would have the highest return. From what I’ve seen, many organizations are data-challenged in various ways: overexposed to spreadsheet risk, underexposed to visual paradigms (both for discovery and for presentation), and underpowered in various parts of their compute environments. If data and analytics are considered strategic assets in your organization, I encourage you to assess the potential for retooling some of your analytic workbenches.

Note: A version of this post first appeared in the International Institute for Analytics blog.