Data connectivity key to insight discovery

Data connectivity is crucial for organizations as they try to adapt to quickly changing economic conditions.

Beginning with the onset of the COVID-19 pandemic in early 2020 and extending through supply chain disruptions, spiking inflation and now the war in Ukraine, economic conditions have been in constant flux.

That uncertainty has forced organizations to change the way they act and react. A mere three years ago, they could use past performance to do monthly, quarterly and even yearly forecasts to plot out their actions over the course of long, seemingly predictable periods.

Now, they need to be able to analyze data in real time to understand and react to changes, and they need to do scenario planning to determine ahead of time how they'll act and react if one potential reality plays out versus multiple others.

Analytics has been at the heart of that real-time analysis. It is no longer the luxury it was for certain organizations before the pandemic but instead a necessity, first for surviving and then perhaps for succeeding and even thriving amid uncertainty.

"Digital transformation got an acceleration two years ago and continues, and data is an inevitable portion of that," Stijn Christiaens, co-founder and chief data citizen at data management vendor Collibra, said recently as part of a panel discussion during Graph + AI Summit, the spring edition of the biannual open conference hosted virtually by graph data and analytics vendor TigerGraph.

But simply deciding to become data-informed, hiring a team of data scientists and deploying an analytics platform is not nearly enough.

Organizations have plenty of data that can be used to inform the decisions that could be the difference between survival and failure, but it's often wildly disconnected, with different departments using different systems and, in some cases, decades of data isolated in myriad on-premises repositories.

What they need is data connectivity.

They need to connect data relevant to their business -- both the data they collect on their own and external data -- to give it context and make it usable and meaningful. And they need to connect it with both manual processes and technologies that discover relationships between data points and automatically connect them.

And only then can they use data to inform the decision-making process.

"You need to be able to bring data together from various different places," said Antoine Amend, senior director of financial services at AI and data lakehouse vendor Databricks. "That contextualization of information, that linkage, is what leads to actionable insights."

Avoiding initial missteps

While making data operational is a vital step organizations need to take before they can become data-informed, staffing is important as well.

Too often, organizations err by hiring a team of data scientists at the start of their attempt to transform. Data scientists are indeed the people needed to build augmented intelligence and machine learning models and do predictive analysis to help determine the path an organization should take, but their skills are wasted if there isn't already connectivity between their organizations' data.

For that, they need data engineers.

They're the ones whose role it is to prepare data, take it from its various sources, connect it and organize it in data catalogs and indexes where it can be found and put to use by data scientists and data analysts.

"Organizations are hearing about AI and machine learning and they want to do that, so they hire data scientists and PhDs in physics, and the first thing they wind up asking is, 'Where is the data?'" Christiaens said. "There's no place for the data scientist to even start searching, let alone even bring it together, so a lot of the work they wind up doing is digging like a pig for truffles trying to find the data."

Similarly, Simon Walker, managing partner of Kubrick Group, a firm specializing in technology staffing, said that he also sees organizations looking to hire data scientists before they're ready to give data scientists anything to do, though he noted it was more common when Kubrick was first getting started about five years ago.

"We were speaking to clients they said they need data scientists, but we knew there was nothing for them to analyze," he said.

Some of the organizations Kubrick was helping were 100 years old and had gone through a dozen or more mergers through the years, Walker continued.

"There was no [data connectivity]," he said. "Data scientists would have spent their lives doing data engineering. Engineers have to go in there and start connecting the data and give the data scientists a platform to actually come along and do the smart stuff."

Once they have prepared data to attain a degree of data connectivity, organizations can progress toward the analysis that leads to insight and action. And then the data scientists can even build more connectivity, not simply connecting data points but connecting the models and insights they develop to continuously improve their organization's decision-making.

Advancing technology

For many organizations, data connectivity -- at least initially -- has to do with engineering previously collected data from disparate on-premises sources and cataloging it.

But data connectivity also relates to connecting new data coming in from various sources as well as bringing in external data and using it to add broader context.

Meanwhile, the amount of data organizations are collecting is increasing exponentially. According to Statista, the volume of data created, captured, copied and consumed worldwide in 2010 was two zettabytes. By 2020, it was 64.2 zettabytes, and by 2025 it's expected to reach 181 zettabytes.
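To put those figures in perspective, the implied yearly growth can be worked out with the standard compound annual growth rate formula. The short Python sketch below is illustrative only: the function name is made up for this example, and the inputs are simply the Statista figures cited above. It suggests growth of a little over 40% a year between 2010 and 2020, and about 23% a year in the forecast through 2025.

```python
def annual_growth_rate(start_value, end_value, years):
    """Compound annual growth rate between two values (hypothetical helper)."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures in zettabytes, as cited from Statista in the article.
past = annual_growth_rate(2.0, 64.2, 2020 - 2010)
projected = annual_growth_rate(64.2, 181.0, 2025 - 2020)

print(f"2010-2020: {past:.1%} per year")
print(f"2020-2025: {projected:.1%} per year")
```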

That means organizations need to find ways to harness the increasing amounts of data they're collecting to make it actionable.

And traditional software tools weren't built to handle the exponential increase in data.

In particular, traditional relational databases could become too cumbersome to meet the growing data needs of organizations while the graph database technology that TigerGraph specializes in has the potential to be an analytics enabler, according to the panelists.

Graph technology enables data connectivity between multiple data points simultaneously within databases, while data points stored in relational databases can only connect to one other data point at a time. And that ability to connect to multiple data points at once can lead to the discovery of relationships that users might not have otherwise discovered -- or that would have taken them far longer to discover -- and results in faster, more in-depth data analysis.
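The idea of following connections across several data points at once can be sketched in a few lines of plain Python. This is not TigerGraph's or any other vendor's API; it is a minimal breadth-first search over an in-memory graph, and the customers, accounts and device are invented for illustration. It shows how two customers with no direct link turn out to be related through a shared device, the kind of multi-hop relationship that in a relational store would usually take a chain of joins to surface.

```python
from collections import deque

# Hypothetical records, invented for illustration: each pair links two entities.
edges = [
    ("customer:alice", "account:1001"),
    ("customer:bob", "account:2002"),
    ("account:1001", "device:phone-7"),
    ("account:2002", "device:phone-7"),
    ("customer:carol", "account:3003"),
]


def build_graph(edge_list):
    """Build an undirected adjacency map from a list of (a, b) pairs."""
    graph = {}
    for a, b in edge_list:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the shortest chain of entities, or None."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in sorted(graph[node]):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


graph = build_graph(edges)
print(shortest_path(graph, "customer:alice", "customer:bob"))
print(shortest_path(graph, "customer:alice", "customer:carol"))
```

In this toy data, the first search returns a chain running from Alice through her account, the shared device and Bob's account to Bob, while the second returns None because Carol shares nothing with Alice.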

In addition to TigerGraph, Neo4j is a prominent graph data and analytics vendor, while tech giants including AWS and Microsoft offer graph database options.

"Conventional analytics will fall short of doing everything we need to be able to do today," Christiaens said. "I'm a big fan of graph. Graph offers a lot of interesting algorithms that, if you had to do conventionally, you would be spending a lot of time rewriting algorithms."

Because of graph's ability to handle scale and discover relationships between data, Gartner predicted in October 2021 that 80% of data and analytics innovations will be made using graph technology by 2025, up from 10% in late 2021.

"With graph, you can ask different types of questions of data, like 'What if?'" Amend said. "It finds relationships that were hidden."

Those hidden relationships are essentially the product of sophisticated data connectivity, and better enable organizations to navigate constant change.

"Companies are looking to data to adapt, and they're looking to data increasingly to support their decision-making," said Harry Powell, head of industry solutions at TigerGraph. "And that requires them to connect data together more than they've ever needed to do before."