The recent paradigm shift in data generation has seen structured data and, more significantly, unstructured data grow massively. Companies use structured data every day through relational databases and spreadsheets, where analysis can easily be conducted. Unstructured data, on the other hand, which comes in the form of news articles, research reports, regulatory filings, legal contracts and other client documents and images, represents a source of untapped opportunity in the enterprise world.
For example, financial information has historically been consumed either in structured form (as security prices or economic indicators, which are processed using standard statistical methods and stored in relational databases) or in unstructured form (as news articles, research reports, etc.), in which case the underlying information is not processed directly by software but is read, digested and generally handled by a human. The financial services and insurance industries are highly dependent on data-driven predictive analytics, where normalised structured data is critical, and there is enormous value in using all types of data sources to extract information.
It is very hard for companies to manage and extract value from the influx of unstructured data. It is estimated that 80% of the world’s data is unstructured, but businesses are only able to gain visibility into a portion of that data because it is simply very hard to understand and find meaning in data that is text-heavy:
1 | Processing these huge volumes of data efficiently is complex – billions of gigabytes of data are created per day, and businesses struggle to keep up with this ever-growing amount of information.
2 | Extracting meaningful information is even harder – even when organisations tap into unstructured data, whether because they realise it can help them run their business more efficiently or simply because they have to, they usually rely on manual processes that are error-prone and do not scale.
However, recent advances in machine learning and processing capabilities now make it possible to extract structured information from vast amounts of unstructured data, which could transform the industry. By using dark data extraction, companies can create actual structured datasets, analyse more data than ever before and build a competitive advantage in the market.
This is the rationale behind sc.io.