By Kedar Samant
If you ask data analysts what data they need, many will answer "All of it," in the belief that more data means better answers to business problems.
They’re not wrong. Organizations need information to address important challenges. UPS, for example, has been using GPS tracking devices to collect data on its delivery vehicles since 2010 in order to improve efficiency and customer service. Meanwhile, a typical organization using fraud detection relies on at least 10 data sources.
Apart from the benefits of obtaining additional data, companies can also gain by extracting more insights, signals, and features from existing data, which in turn provides the context to generate even more.
But these opportunities create their own challenges. Why? Because generating data, analyzing it, and converting it into intelligence all take time and money, which makes these processes easier said than done.
Here are three challenges that hold companies back from using more data and three difficulties of doing more with that data:
Obstacles to Acquiring New Data
New data feeds can come from third-party sources like an external blacklist or whitelist. They can also originate from richer datasets such as mobile device engagement, user behavior, or cross-channel information derived from technologies like interactive voice response (IVR).
Some challenges to the “all of it” solution arise from the difficulty of assessing informational value, business-related roadblocks (legal, political, or organizational), and unusable or stale data.
1. Difficulty Assessing Informational Value
Proof-of-value (PoV) assessment of data acquisition is never an easy task. That’s especially the case when the information is incomplete or impossible to evaluate without the whole dataset. This leads to hard-to-answer questions. For example, how much lift would a fraud machine-learning model get from a new third-party identity data provider?
Many businesses conclude that anything beyond a limited PoV is impossible without first acquiring the complete data. So they invest in the project and spend the next months or years acquiring a feed. Then they conduct a full PoV, only to realize that the value of the data supply doesn't justify its acquisition. By then, it's too late: the organization can't recover the time and money it has already invested, so it holds off from adding other data feeds.
2. Organizational Friction
Any number of business challenges could stop an organization from adding a data feed. For example, the legal department could say that the organization can’t add a certain type of information because it’s not specified in the end-user agreement, or the political cost of different departments reaching out from their own silos might be too great. These roadblocks all eat up time and money, just as integration work, limited domain understanding, insufficient training, and a lack of data linkages do.
3. Unusable or Stale Data
Organizations might be able to acquire a new data feed, but not in a way that helps them. Perhaps the data’s format isn’t easily convertible, or maybe it’s available only online. Alternatively, the information could be stale. Companies need to make sure they can acquire fresh, quality data in a format that lets them contextualize it within the existing data lake.
Difficulties of Doing More with Data
As organizations acquire additional information, the value of that data eventually reaches a plateau. This prompts companies to generate even more data until the value of those facts and figures levels off again, perpetuating the cycle.
In terms of ROI, organizations are likely to get more value from exploiting the data they already have than from acquiring more data feeds. While expensive initially, the costs of doing more with data eventually pay off. These improvements might include better data stitching, deeper data analytics, stronger feature engineering, and contextualizing information within the existing data lake.
Obstacles that prevent organizations from doing more with data include limited platforms, lack of expertise, and immature processes.
1. Limited Platforms
Companies might want to do more with their existing data, but their platforms might not allow them to do so. The visualizations might not convey significant insights, for example, or might not provide access to vertical-specific modules such as anomaly detection for fraud management. In both cases, the platform fails organizations and limits what they can do with their data feeds.
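For a sense of what such a vertical-specific module does, here is a toy anomaly detector: a generic z-score rule flagging transaction amounts far from the mean. This is a stand-in for illustration only, not any particular platform's method, and the sample amounts are invented.

```python
import statistics

def flag_anomalies(amounts, threshold=3.0):
    """Return indices of amounts more than `threshold` population
    standard deviations away from the mean."""
    mean = statistics.fmean(amounts)
    stdev = statistics.pstdev(amounts)
    if stdev == 0:
        return []
    return [i for i, a in enumerate(amounts)
            if abs(a - mean) / stdev > threshold]

amounts = [20, 25, 22, 19, 24, 21, 23, 500]  # one obvious outlier
print(flag_anomalies(amounts, threshold=2.0))
```

Real fraud-management modules are far more sophisticated, but the platform question is the same: does the tooling expose this kind of capability over your existing data, or not?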
2. Lack of Expertise
Even with a supportive platform, organizations might not have the proper people in place to facilitate such engagement. Companies need people with the right domain and engineering expertise to get the most out of existing data, and someone who can do something meaningful with information is not always easy to find.
3. Immature Processes
Organizations might have the necessary platform and people in place to do more with data. But the absence of mature processes could limit their ability to launch a new data-centric project.
For instance, an enterprise might try to improve its data analytics by listening to the advice of a single employee rather than using a more predictable method. Such ad-hoc arrangements make it difficult for organizations to justify the value of a proposed project. They also steer enterprises towards business decisions that limit their ability to leverage more data for better insights.
The Way Forward
Organizations can benefit from handling more data and doing more with their existing data, even when obstacles stand in their way, because these challenges are manageable.
Enterprises can use techniques like better data stitching to get more out of their available information while retaining the flexibility to keep adding new feeds on an ongoing basis. Organizations can also run a proof-of-value assessment of a data feed before committing to acquiring it. They just need the proper tools.
About Kedar Samant
Kedar Samant is co-founder and CTO of Simility. A fraud detection industry veteran, Samant is responsible for the technology underlying Simility’s algorithm engine and end-to-end enterprise fraud management platform.