Experts weigh in on how to build analytic cultures of excellence

Treating data as a core asset, cultivating curiosity, encouraging experimentation, continuing to invest in training and mentoring – these are just some of the many important takeaways from the plenary and panel discussion featuring Loren Perlman, VP of Science at Riffyn; Andre Argenton, VP of Core Research and Development at Dow; and Kumar Subramanyan, Director of Data Science at Unilever. Watch the on-demand version of this episode of Statistically Speaking.

Here's an excerpt on data as a core asset from Loren's plenary talk:

[video]

We had many good questions from the audience that we weren’t able to answer at the time. Our featured guests have kindly provided answers to some of the questions. We thank them for sharing more of their wisdom.

Can you talk more about FAIR data practices?

Loren: FAIR stands for Findable, Accessible, Interoperable, and Reusable, defined as follows:

  • Findable means that there are search and organization terms associated with data sets that make them findable when using relevant search terms, such as project, experimental purpose, etc.
  • Accessible means that once the data is located, it is obtainable for use – I can literally download it. This may come with associated authorization or authentication requirements, depending on a given organization’s data access policies.
  • Interoperable means that the data sets can operate with applications or workflows for analysis, storage, or additional processing.
  • Reusable means that the data is well documented with associated metadata to enable recombination in different settings. In other words, the data and associated metadata use shared vocabularies (ontologies) that enable an understood relationship between data sets – defining how they are related via the data or metadata contained within.

More thorough information about FAIR can be found here: https://www.go-fair.org/fair-principles. A small illustration of what FAIR-style metadata can look like follows below.
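As a purely illustrative sketch (not from Loren's talk), the record below shows what FAIR-oriented metadata for a single data set might look like; every field name, identifier, and vocabulary term is hypothetical rather than part of a standard schema.

```python
# Hypothetical sketch of a FAIR-style data set record; all fields and values
# are illustrative, not a standard schema.
import json

dataset_record = {
    # Findable: an identifier plus descriptive tags so relevant searches hit it
    "id": "DS-2021-0042",
    "title": "Fermentation titer screen, strain A vs. strain B",
    "tags": ["fermentation", "titer", "screening", "project-alpha"],
    # Accessible: where the data lives and what authorization is required
    "access": {"uri": "s3://example-bucket/ds-2021-0042.csv", "auth": "internal-sso"},
    # Interoperable: an open format and a shared vocabulary (ontology) for terms
    "format": "text/csv",
    "vocabulary": {"titer": "g/L", "temperature": "degC"},
    # Reusable: enough provenance and metadata to recombine with other data sets
    "provenance": {"instrument": "HPLC-03", "protocol": "SOP-117", "date": "2021-06-14"},
}

print(json.dumps(dataset_record, indent=2))
```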

How have you addressed pushback on purchasing JMP licenses over the “why can't we just use Excel” argument? Any tips for proving to non-users that JMP is worth it?

Kumar: This is precisely the argument many organizations are facing. Microsoft Excel is pervasive in organizations, but increasingly, you have folks getting trained in R and Python, so the question becomes: What is the value of investing in JMP? The open source point is relatively easy to address – R, Python, and other open source analysis tools require a certain level of expertise and training. As a result, they are not the route to democratizing analytics, even though they may be a route to democratizing specific analytic solutions in organizations. The conversion from Excel to JMP is the right question to address, and the following points will help this journey:

  • Use Excel as a starting point to get people on the analytics journey, if needed. Basic data management/organization and analysis can be taught in Excel.
  • In the next phase of up-skilling, embed JMP as a key enabler. Showcase its unique capabilities and advantages over Excel (for example, design of experiments, visualization, handling larger data sets, and a significantly larger number of methods and models); a brief scripted-DOE sketch follows this list.
  • Build communities of JMP users around application areas; this works better than targeting individual users.
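To make the "beyond Excel" point concrete, here is a minimal sketch (not from the panel) of a scripted two-level full factorial design; the factor names and levels are made up, and in practice JMP's DOE platform generates and augments such designs interactively.

```python
# Illustrative only: a two-level full factorial design built in a few lines of
# code, a task that quickly becomes awkward to maintain in a spreadsheet.
from itertools import product

# Hypothetical factors with (low, high) levels
factors = {
    "Temperature_degC": (150, 180),
    "Pressure_bar": (1.0, 2.5),
    "Catalyst": ("A", "B"),
}

# Every combination of factor levels: 2^3 = 8 runs
runs = [dict(zip(factors, levels)) for levels in product(*factors.values())]

for i, run in enumerate(runs, start=1):
    print(f"Run {i}: {run}")
```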

Andre: There are really two approaches, and the choice of the right one depends on many factors. One approach is the corporatewide one. It requires someone in the company’s leadership to understand the value and buy in to the idea that a transformation happens faster when there is top-down commitment and support. If you do not have a champion at the top, then you should consider the second approach, which is one of growing support through success cases. For this approach, you need to literally run an experiment and have a subset of the organization operating as a test case. This trial group would be given the chance to operate with the right tools, such as JMP, as well as the right support structure. Consider it a case study to show value. The success of this group would feed the future investments to expand.

Loren: It’s a fair question from management, as they are trying to demonstrate ROI (return on investment) for any money spent on new tools or strategies for data management. Excel is admittedly a fast and easy tool for limited-scale data sets and basic analyses. The problem appears when either the data sets become substantial in scale (and substantial means different things to different people) or the modeling needs become significant. There is simply no comparison between Excel and the power of JMP as an analytics platform. I think Kumar and Andre covered the right approaches – establish a core capability and provide support so that users can demonstrate clear value from implementing the tool.

What are specific ways to incentivize change and growth toward a culture of analytics?

Kumar: I have a few suggestions:

  • Invest in the right tools (systems) to gain user conversion/adoption. Focus on a great user experience and seamless data access.
  • Identify early adopters/business champions and use them to drive further adoption.
  • Implement incentives, such as learning credits, that can contribute to future career opportunities. Redefine roles to include analytics as a key skill set.

Andre: Celebrate small victories and celebrate behavior. When a researcher uses new skills, a novel approach to analyze the data, or a different approach to visualize the data, that example should be celebrated as much as the discovery of new material.

Loren: As the cliché states: Money talks. Offering appropriate rewards for a change in culture goes a long way. Maybe offer a lunch for the team that demonstrates the cleverest use of analysis, or a cash reward for problem solving that might not have been accomplished without a data- and analytics-first approach. I would say that it also starts during the hiring process – we need to hire people who have certain skills or understand they will be expected to develop them on the job. I do think we tend to silo skill sets a bit (for example, a data scientist does the modeling; a bench scientist does the experimentation). Why can’t people be expected to upskill and continuously grow in multiple dimensions?

I've been hearing a lot lately that we should make data a core asset, and I agree. My question is how do we do that, and what does that look like?

Kumar: This needs to be addressed at all levels of the organization to be fully realized:

  • At the top level, there needs to be a data strategy that defines what core data is, how it will be managed (acquired, stored, protected, accessed), and how it will be used to create value.
  • At the lower levels of the organization, two things are required: 1) instilling a culture of pride in data and data as a shared asset in the organization; 2) operational processes and tools to enable good quality data to be collected and managed as a shared asset.

Loren: We see this a lot in deployment of Riffyn Nexus. The short answer is that it needs to be easy to collect high-quality data and organize it according to FAIR data practices. If there are barriers to data collection and management, then users will simply revert to existing behaviors, such as siloed data in locally saved spreadsheets.

If your organization has a very minimal data culture – little to no alignment between business processes, data, and business decisions – what do you focus on first?

Kumar: Focus on business processes that are generating large amounts of data. Show the value of analyzing these data in terms of new insights that lead to better decisions or operational efficiencies from automating analysis/reporting. Once the impact is recognized, move to systematize data capture, address data quality issues, and show improvements in the quality of insights and decisions. Lastly, identify opportunities for business process improvements that will invariably come out of the analytics. These improvements are usually much harder to implement unless their benefits are significant.

Andre: In alignment to Kumar’s point, I would focus on those areas where you can effectively make the most impact and where you can deliver value quickly. Say you have two opportunities: 1) a $100 million opportunity that will take multiple years to be realized and that requires massive effort, or 2) a $10 million opportunity where data is largely available, stakeholders are more committed, and analytics has an obvious and large role to play. Choose the $10 million opportunity. Deliver on it, gain credibility and trust, and work your way toward the point where the organization can address the larger opportunities.

Loren: This intersects tightly with the cultural change and incentivization question. It requires a top-down investment and vision, along with bottom-up tools and attitude to make it happen. I will note it is not easy, but it is deeply rewarding at every level within an organization. Management needs to create the space and incentive for this to happen, while providing the infrastructure to drive it.

Have there been any evolutions in HR to support building a culture of analytics, such as job descriptions, incentives for up-skilling, etc.?

Kumar: Yes. HR needs to play, and is playing, a critical role in building an analytics culture in organizations. That role spans including data and analytics as a specific skill, creating analytics roles, building a recruitment network, and championing the up-skilling of employees through learning opportunities.

Many organizations have used the concept of citizen data scientists to promote/elevate the base of the organization to a higher level of analytics awareness and culture. They create incentives ranging from certification and learning credits that further career opportunities to reverse mentoring for senior management.

How do you aggregate data across a large group? What tools can be used to take the data from one problem so that it has relevance to the greater, aggregated data set?

Kumar: For a group that has a common domain (such as life sciences or manufacturing), the key is to standardize a process for data capture, which will drive efficiency and, more importantly, data quality. The biggest hurdle to data sharing is poor data quality, like missing data or lack of metadata. Improving data quality can be done via a tool such as a standardized data capture template or a user-friendly workflow (Electronic Lab Notebooks work well). For groups across domains, it can be more challenging. It works best if there is a defined use case that needs to access data across these domains. Assuming this has been established, bringing together these data sets in a cloud environment for analysis is the standard operational method today. The key is to allow search and discovery of the data across these groups to facilitate analytics solutions.
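As a small, hypothetical sketch of what a standardized data capture template can enforce, the check below flags missing fields, missing values, and unrecognized units before a record enters the shared data set; the field names and allowed units are placeholders.

```python
# Hypothetical sketch: validate incoming records against a standard capture
# template so data quality problems are caught at the point of entry.
REQUIRED_FIELDS = {"sample_id", "operator", "instrument", "timestamp", "value", "unit"}
ALLOWED_UNITS = {"g/L", "mg/mL", "pH"}

def validate_record(record: dict) -> list:
    """Return a list of data quality problems; an empty list means the record passes."""
    problems = []
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    if record.get("unit") not in ALLOWED_UNITS:
        problems.append(f"unrecognized unit: {record.get('unit')!r}")
    if record.get("value") in (None, ""):
        problems.append("empty measurement value")
    return problems

record = {"sample_id": "S-0042", "operator": "jdoe", "instrument": "HPLC-03",
          "timestamp": "2021-06-14T10:30:00", "value": 12.4, "unit": "g/L"}
print(validate_record(record) or "record accepted")
```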

Loren: I want to echo Kumar’s excellent comment about standardizing processes for data capture. This means providing the tools and ontologies that create consistency and alignment within groups. Driving data aggregation between groups can be achieved by understanding how data and materials flow between groups. It means aligning on simple things, like sample naming schemes, so that data can be joined in a data lake or a tool like Riffyn Nexus.
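As an illustration of that point, the sketch below (assuming pandas, with made-up column names and values) shows two groups' data sets joining cleanly because both use the same sample naming scheme.

```python
# Illustrative sketch: a shared sample naming scheme makes aggregation across
# groups a simple join (column names and values are hypothetical).
import pandas as pd

process_data = pd.DataFrame({
    "sample_id": ["S-0041", "S-0042", "S-0043"],
    "temperature_degC": [30, 32, 34],
})
assay_data = pd.DataFrame({
    "sample_id": ["S-0042", "S-0043", "S-0044"],
    "titer_g_per_L": [12.4, 11.8, 13.1],
})

# Only samples present in both tables survive an inner join
combined = process_data.merge(assay_data, on="sample_id", how="inner")
print(combined)
```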

How do you get people to trust the results of analysis when they don't fully understand math or statistics?

Kumar: This is indeed one of the challenges to creating a pervasive culture of analytics in organizations; therefore, the importance of change management cannot be overstated. We talk about the need for raising the floor of the organization, which refers to raising the awareness of data and analytics and its value across all levels of the organization. There is also a need to target the managers and leaders who need to make the business decisions. We are increasingly seeing data and digital transformation training for leaders. In doing the above, there is also a need to address the hype vs. reality dichotomy that exists among non-experts. There are some who believe analytics (AI/ML) is a panacea for all business problems, while others will cite the glorious failures to support their position that the value of analytics (especially AI/ML) is over-hyped. The truth is, of course, in the middle, so the goal of all training must be to showcase this as the art of the possible, laced with a sense of realism, to drive the right behavior and outcomes in organizations.

Andre: I agree with Kumar’s point about raising the floor, and that starts with continuous education. I also believe that a center of excellence with highly respected individuals and subject matter experts in the field will provide the assurance people need to accept the results. With that acceptance, trust will naturally build within an organization.

The key to generating value is taking action from insights and analytics. How should organizations drive action along with analytics?

Kumar: Clearly, a transformation to a culture of analytics will fail unless there is value created in turning insights into actions. Therefore, it is important that a very clear set of actions is agreed upon at the beginning of every analytics project. Every result or prediction must have a specific outcome for the business, whether it is a measurable improvement in efficiency or an impact on a business process or decision.

Could you please touch on how ethics and sustainability aspects can be addressed in an industrial setting?

Andre: I believe that the meaning of ethics and sustainability in this question is regarding the ethical treatment of data and long-term sustainability of the digital solutions proposed (as opposed to general ethics in industry and general sustainability challenges to the planet).

The ongoing debate about ethics in data and data treatment across academia and industry is a very important one. In an ideal world, data generated is transparent, and analyses of the data are reproducible, auditable, and self-explanatory for future users. Professor Philip Stark from Berkeley talks about the concept of “preproducibility,” which addresses this topic well. I see industry and academia evolving toward a system of data transparency within a specific field and a specific group (that is, everyone in an organization who can create value from that data should have access to the raw data and its associated metadata).

Addressing the question of sustainability of systems and processes, industry always has to make a decision when developing a system and a data generation and analysis workflow. Do you choose rapid development that serves one project really well but is not scalable? Or do you choose a longer and more expensive development of a system and data generation workflow that is holistic and applicable across all of the workflows? The answer, in my view, is that we need balance. Some of the fundamental pieces of data structure and processes in research that will impact most of the workflows should be treated holistically, since it is cheaper and better to do it this way. However, unique systems, unique projects and unique needs will always exist, so it is important to have the flexibility to accept those in your data architecture and data culture. In other words, a pragmatic approach should always be considered.
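Returning to the point above about analyses being reproducible and auditable, here is a small hypothetical sketch of one way to record provenance alongside an analysis result so a future user can audit and rerun it; the file name and parameters are placeholders.

```python
# Hypothetical sketch: capture enough provenance with each analysis that a
# future user can audit it and reproduce the result.
import hashlib
import platform
from datetime import datetime, timezone

def file_sha256(path: str) -> str:
    """Fingerprint the input data so later users can confirm they have the same file."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def provenance(input_path: str, parameters: dict) -> dict:
    """Bundle the facts needed to audit and rerun an analysis."""
    return {
        "input_file": input_path,
        "input_sha256": file_sha256(input_path),
        "parameters": parameters,
        "python_version": platform.python_version(),
        "run_at": datetime.now(timezone.utc).isoformat(),
    }

# Example usage (placeholder file name and parameters):
# record = provenance("experiment_042.csv", {"model": "linear", "alpha": 0.05})
# print(record)
```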

 

Watch this episode of Statistically Speaking via our website.
