Designed to Be Breached – Automated Document Consumption



By David Neuman, Senior Analyst, TAG Cyber

The use of automated document consumption and data extraction processes presents the opportunity for greater business efficiency, lower cost of process ownership, positive customer experience, and…wait for it…risk of cyber exploitation and material business impact. Automated document consumption is a process of importing or extracting valuable information from varied manual formats. Companies spend an extraordinary amount of time on manual form processes that are costly, inaccurate, and time-consuming. Imagine the environment where a business integrated workflows and consumed or extracted data from documents with great efficiency. There are many industries on this journey.

The Value of Automated Document Consumption 

In the healthcare industry, many forms have free-form text, dense paragraphs, checkboxes, and tables. Perhaps the most sensitive of these are prescriptions. Fields that contain potentially lifesaving details on medicines or other medical treatments must stay accurate and protected, which is critical to patient safety, especially given the many prescriptions that travel digitally between different entities.

Imagine a patient's experience during a doctor's appointment when they are asked to fill out a paper form, knowing that the data already exists in their medical record. If they need to see a specialist, they need information that exists in other electronic documents, a data lake of images, or medical systems. Dealing with similar data in different forms, updated by different entities, is not only a miserable experience but potentially dangerous to patient safety and well-being.

Automated document consumption could revolutionize that experience and deliver a higher quality of care while significantly lowering costs. This is evident with the rise in adoption of cloud data and digital content management services to capture, share, and use data across organizations and collaborate with third-party business partners. 

Other industries, such as chemicals and utilities, have been keeping records for decades; those documents contain valuable information about business operations, maintenance records, and safety standards. The documents have a mix of text and images, making the production of a document pipeline a challenge. In insurance, fields such as estimates for repairs, property addresses, case identity numbers from sections of a document, or classification documents on claims between parties are areas of opportunity to improve processes.

What Are the Risks of Automated Document Consumption? 

All these examples make the opportunities for automated document consumption very enticing, but as with other technologies and digital transformations, it is not without its risks – nothing is without risk. Most security problems are data problems, which makes this area ripe for cyberattack and exploitation. Some of the risks are traditional, such as data corruption through uploaded malware or compromised access to sensitive data. These can also lead to ransomware attacks on large data repositories.

Unbeknownst to customers, many service platforms that provide data sharing, collaboration, and orchestration are not scanning for these kinds of threats natively – and many threats are undetectable via scanning anyway. It is the responsibility of the customer to protect their data in shared security models that are long overdue to be updated. This risk becomes even more complex when you start dealing with third- and fourth-party relationships.   

These technologies and process transformations also present challenges for developers, particularly those who rely on vast data lakes in the cloud, where files uploaded to the developer space need to be cleaned before ingestion into the data lake:

  • Who is responsible for that activity?  
  • What designs are necessary for ensuring data is secure and maintains its integrity?   
  • Who responds when deviations or compromises occur, especially in multi-organizational (even multi-national) environments?   
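One way to make these questions concrete is a gate between the staging area and the data lake. The following is a minimal sketch, not a prescribed design: the `StagedFile` record, its fields, and the `ready_for_ingestion` check are hypothetical names introduced for illustration. It assumes a file is promoted only when a responsible owner is recorded, the file has been sanitized, and its SHA-256 digest still matches the one recorded at sanitization time.

```python
import hashlib
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class StagedFile:
    """A file waiting in the staging area (hypothetical record layout)."""
    name: str
    content: bytes
    owner: str                    # team accountable for cleaning this file
    sanitized: bool               # set by whatever cleaning step is in place
    sha256_at_sanitization: str   # digest recorded when the file was cleaned


def digest(content: bytes) -> str:
    """Return the SHA-256 hex digest of the content."""
    return hashlib.sha256(content).hexdigest()


def ready_for_ingestion(staged: StagedFile) -> Tuple[bool, str]:
    """Decide whether a staged file may be promoted into the data lake."""
    if not staged.owner:
        return False, "no responsible owner recorded"
    if not staged.sanitized:
        return False, "file has not been sanitized"
    if digest(staged.content) != staged.sha256_at_sanitization:
        return False, "content changed after sanitization"
    return True, "ok"
```

The returned reason string gives whoever responds to a deviation a starting point, and the owner field forces the responsibility question to be answered before any file moves.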

It’s not sufficient to land a solution for automated consumption, especially for sensitive information, that doesn’t have clarity on the protection capabilities in process and technology. All these challenges are based on today’s environment. Sometimes a single solution can create previously unknown issues for tomorrow. 

As document consumption is automated at scale, it has the potential to change the stateful condition of information. For example, an application process for a position of high trust in an organization may require information on a person’s credit, criminal background, current and previous addresses, financial investments, friends, family, acquaintances and so on. Together, the stateful condition of this application is highly sensitive. All this information needs to be verified, which will likely spawn other processes (for example, breaking up the information into smaller pieces, possibly as other forms given to third parties).

Suppose the address validation is given to a third party or service to verify. That information is far less sensitive, but suppose a deviation is discovered and another form is completed, identifying potentially damaging information. The stateful condition then returns to a highly sensitive state. This is a byproduct of greater efficiency in document consumption that could lead to previously unidentified risks.
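The address-validation example can be sketched as a classification that is recomputed whenever the contents of a work item change. This is an illustrative sketch only: the `Sensitivity` levels, the field names, and the per-field ratings in `FIELD_SENSITIVITY` are assumptions, not taken from any standard. A work item is rated at the level of its most sensitive field, and unknown fields are treated as highly sensitive so that the check fails closed.

```python
from enum import IntEnum
from typing import Iterable


class Sensitivity(IntEnum):
    LOW = 1
    MODERATE = 2
    HIGH = 3


# Hypothetical per-field ratings, for illustration only.
FIELD_SENSITIVITY = {
    "current_address": Sensitivity.LOW,
    "previous_addresses": Sensitivity.LOW,
    "financial_investments": Sensitivity.MODERATE,
    "credit": Sensitivity.HIGH,
    "criminal_background": Sensitivity.HIGH,
    "address_deviation_report": Sensitivity.HIGH,
}


def classify(fields: Iterable[str]) -> Sensitivity:
    """Rate a work item by its most sensitive field; unknown fields count as HIGH."""
    levels = [FIELD_SENSITIVITY.get(name, Sensitivity.HIGH) for name in fields]
    return max(levels, default=Sensitivity.LOW)


# The address check handed to a third party starts out low...
third_party_task = {"current_address", "previous_addresses"}
before = classify(third_party_task)

# ...and returns to a highly sensitive state once a deviation form is attached.
after = classify(third_party_task | {"address_deviation_report"})
```

Recomputing the rating on every change, rather than assigning it once when the task is created, is what catches the return to a highly sensitive state.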

Solution Considerations in Planning the Use of Automated Document Consumption 

Does the CDR Solution Integrate with Content Collaboration Tools and Platforms? 

Content collaboration platforms such as Box, O365, S3, and Slack remain out of reach for many content disarm and reconstruction (CDR) vendors. Alternatively, vendors that provide an open API can integrate CDR support into various software solutions, incorporating CDR protection every time users share files.

How Does it Handle Compromised Vendors? 

Business email compromise (BEC) and vendor email compromise (VEC) happen, giving attackers direct access to send emails from apparently legitimate addresses. Many anti-phishing training programs teach users to look for abnormal domains in the email address and to trust messages from known vendors. When VEC occurs, the user receives a malicious email from an otherwise trusted domain, often leading to malware infections. Blocking only when compromises happen leads to reactive, manual processes. Disarming and rebuilding every file by default removes that effort and reduces the load on staff, while reducing the risk from malicious files delivered through BEC and VEC.

How Does CDR Handle Password-Protected Files? 

Working with password-protected and encrypted files is a challenge for any CDR. Because the content is inaccessible to the CDR by default, these files are difficult to assess appropriately. With an advanced CDR solution, the file is held in temporary storage until the recipient provides a password or decryption key. Once the user provides it, the CDR assesses the file like any other, then rebuilds it for the user from only its safe components. This process passes no bad elements into the organization while only temporarily obstructing transmission.
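The hold-and-release flow described above might look roughly like the sketch below. It is not any vendor's implementation: `QuarantineQueue`, `HeldFile`, and the two callables passed in are hypothetical, and `toy_decrypt` and `toy_sanitize` are deliberately trivial stand-ins for real decryption and real disarm-and-rebuild logic. The point is the control flow: the file stays held until a working password arrives, and only the rebuilt content is ever returned.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class HeldFile:
    file_id: str
    recipient: str
    payload: bytes  # still encrypted, so it cannot be inspected yet


class QuarantineQueue:
    """Holds encrypted files until the recipient supplies a working password."""

    def __init__(
        self,
        decrypt: Callable[[bytes, str], Optional[bytes]],
        disarm_and_rebuild: Callable[[bytes], bytes],
    ) -> None:
        self._held: Dict[str, HeldFile] = {}
        self._decrypt = decrypt
        self._sanitize = disarm_and_rebuild

    def hold(self, held: HeldFile) -> None:
        self._held[held.file_id] = held

    def is_held(self, file_id: str) -> bool:
        return file_id in self._held

    def release(self, file_id: str, password: str) -> Optional[bytes]:
        """Return the rebuilt file, or None if the password does not work."""
        held = self._held[file_id]  # raises KeyError for unknown ids
        plaintext = self._decrypt(held.payload, password)
        if plaintext is None:
            return None  # wrong password: the file stays in temporary storage
        del self._held[file_id]
        return self._sanitize(plaintext)


def toy_decrypt(payload: bytes, password: str) -> Optional[bytes]:
    """Stand-in only: 'decrypts' payloads of the form b'<password>:<content>'."""
    prefix = password.encode() + b":"
    return payload[len(prefix):] if payload.startswith(prefix) else None


def toy_sanitize(content: bytes) -> bytes:
    """Stand-in only: drops a marker that represents active content."""
    return content.replace(b"<macro>", b"")
```

Passing the decryption and sanitization steps in as callables keeps the queue independent of any particular file format or CDR engine.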

Does Your CDR Work with Remote Browser Isolation (RBI)? 

By creating an isolated environment, RBI allows users to navigate the web safely with a buffer between their system and online threats. Having a CDR solution integrated with an RBI partner allows users to get all the benefits of the RBI for generalized web browsing while adding on the protection of a CDR. 

Can it Provide Security Metrics? 

A good CDR solution helps provide security metrics by removing threats at the earliest stages described in the MITRE ATT&CK framework. The framework catalogs adversary tactics and techniques across the stages of an attack, and stopping threats in the earliest stages of exposure reduces their potential impact. Eliminating them at initial access rather than after infection dramatically reduces detection time, improving operational performance metrics. Additionally, integrating this data into security operations helps provide leading intelligence indicators on attack vectors against an organization.

What Analytics Does the Platform Provide? 

Given the volume of data orchestrated by a CDR platform, analytics should be able to identify trends in the throughput, sizes, and types of files. For example, they can reveal huge files being transferred directly instead of through cloud storage. In addition, analytics can surface insights on threats found within files that the platform blocks. Highlighting suspicious files coming from a specific source or through a specific application can be an important indicator for cyber defenders or threat hunters.
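As a rough illustration of these analytics, the sketch below aggregates a list of per-file records. The `FileEvent` layout, the 100 MiB threshold for a "huge" file, and the two-threat cutoff for a suspicious source are all assumptions chosen for the example, not values from any CDR product.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FileEvent:
    """One file processed by the platform (hypothetical record layout)."""
    source: str         # sender or upstream application
    file_type: str      # e.g. "pdf", "docx"
    size_bytes: int
    threat_found: bool  # True if the platform removed or blocked something


def summarize(
    events: List[FileEvent],
    large_file_bytes: int = 100 * 1024 * 1024,  # assumed threshold: 100 MiB
    min_threats: int = 2,                       # assumed cutoff per source
) -> Dict[str, object]:
    """Aggregate volume by type, flag oversized transfers, and list risky sources."""
    count_by_type = Counter(e.file_type for e in events)
    bytes_by_type: Dict[str, int] = defaultdict(int)
    threats_by_source: Counter = Counter()
    for e in events:
        bytes_by_type[e.file_type] += e.size_bytes
        if e.threat_found:
            threats_by_source[e.source] += 1
    return {
        "count_by_type": dict(count_by_type),
        "bytes_by_type": dict(bytes_by_type),
        "large_files": [e for e in events if e.size_bytes >= large_file_bytes],
        "suspicious_sources": sorted(
            s for s, n in threats_by_source.items() if n >= min_threats
        ),
    }
```

In practice the thresholds would be tuned to the organization's normal traffic; fixed defaults are used here only to keep the example small.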

Deploying CDR solutions within various platforms and against diverse threats isn’t just an option; it’s a necessity. The dynamic relationship between CDR and collaboration tools, compromised vendors, password-protected files, Remote Browser Isolation, and security metrics illuminates a path towards proactive and adaptive security. It emphasizes not just defense but a transformation in the way we approach cybersecurity, moving from reactive measures to anticipatory strategies.

By considering these aspects, businesses are not only fortifying their current security measures but also investing in a resilient future, turning potential vulnerabilities into opportunities for enhanced protection and intelligence. This shift reflects a maturing and forward-thinking cyber defense culture, pivotal in the age of relentless digital innovation and threats. 
