How SCIO works

Automate Information Extraction

SCIO uses sophisticated machine learning to automatically turn “dark” data into machine-readable datasets. Dark data is unstructured data buried in text, tables, or figures that existing software and analytics platforms cannot process. It is found in documents such as legal and commercial contracts, regulatory filings, web pages, news articles, and annual reports. SCIO automates complex screening and analysis processes by extracting the relevant data points and producing a predefined structured output, replacing tasks that would otherwise require tremendous human effort.

What makes SCIO special

Quality at scale: SCIO achieves better-than-human extraction quality while scaling to very large numbers of documents.

Extraction process automation via machine learning: SCIO leverages the latest advances in ML models to extract complex document-level information, whether it is expressed as free text, in tables, or in visually distinctive ways.

Easy to set up: SCIO is designed as an end-to-end integrated workflow, from data collection to the production of structured results, accessible via simple REST API calls.

How does SCIO work?


Define the specific data points (name, date, entity, tables, etc.) that you need to retrieve


Train an ML model on a subset of the documents (text, PDFs, articles, web pages, etc.)


Once the ML model is ready and deployed, send the documents to our hosted infrastructure or process them locally


Retrieve a JSON/XML result file containing the extracted data points in a structured form via an API call
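
The steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the base URL, the endpoint path, the model identifier, and the JSON field names (data_points, name, value) are all hypothetical, since the text does not specify the actual API. The sketch builds the request without sending it and parses a made-up response body in place of a live call.

```python
import json
import urllib.request

# Hypothetical base URL: the real service address is not given in the text.
BASE_URL = "https://scio.example.com/api"

# Step 1: the data points to retrieve (names and types are illustrative).
DATA_POINTS = [
    {"name": "counterparty", "type": "entity"},
    {"name": "effective_date", "type": "date"},
]


def build_extraction_request(model_id: str, document: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST request submitting one document
    to a trained, deployed model. The path layout is an assumption."""
    url = f"{BASE_URL}/models/{model_id}/extract"
    return urllib.request.Request(
        url,
        data=document,
        headers={"Content-Type": "application/pdf", "Accept": "application/json"},
        method="POST",
    )


def parse_result(payload: str) -> dict:
    """Map each requested data point name to its extracted value.
    Data points missing from the response map to None."""
    result = json.loads(payload)
    extracted = {item["name"]: item["value"] for item in result.get("data_points", [])}
    return {dp["name"]: extracted.get(dp["name"]) for dp in DATA_POINTS}


# A made-up response body standing in for the JSON result file.
SAMPLE_RESPONSE = json.dumps({
    "document_id": "doc-001",
    "data_points": [
        {"name": "counterparty", "value": "Acme Corp"},
        {"name": "effective_date", "value": "2021-03-01"},
    ],
})

request = build_extraction_request("contracts-v1", b"%PDF-1.4 ...")
fields = parse_result(SAMPLE_RESPONSE)
```

Against a real deployment, the request would be sent with urllib.request.urlopen(request) and the response body passed to parse_result. Keeping request construction and result parsing in separate functions lets each be checked without network access.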