What is File Sanitization? Everything You Need to Know



Ransomware continues to plague companies as remote and hybrid work models open new attack vectors. One study reported a 715% year-over-year increase in ransomware during 2020. Because most disaster recovery and business continuity plans now include robust data backup strategies, attackers have shifted to a double-extortion approach that both encrypts and steals data. This data exfiltration model means organizations need more proactive data security protections.

Understanding what file sanitization is, how it can help mitigate ransomware attacks, and where you need additional controls can be the first step to protecting your environment. 

File Sanitization 101

File sanitization goes by many names, including file reconstruction, content sanitization, content disarm, and file cleansing. Each one refers to the same basic process: mitigating malware threats by parsing files, identifying active content, removing that active code, and recreating the file without it.
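The four steps above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions. The `Part` model, the part kinds, and the `ACTIVE_KINDS` set are hypothetical stand-ins for a real file parser. They do not reflect any product's internals.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Part:
    """Hypothetical, simplified model of one component of a file."""
    name: str
    kind: str  # e.g. "text", "image", "macro"
    data: bytes

# Illustrative set of kinds treated as active content (an assumption).
ACTIVE_KINDS = {"macro", "activex", "ole_object", "data_connection", "external_link"}

def parse(raw_parts):
    """Step 1: turn raw (name, kind, data) tuples into Part objects."""
    return [Part(name, kind, data) for name, kind, data in raw_parts]

def identify_active(parts):
    """Step 2: find the parts that carry active content."""
    return [p for p in parts if p.kind in ACTIVE_KINDS]

def remove_active(parts, active):
    """Step 3: drop every part identified as active."""
    active_names = {p.name for p in active}
    return [p for p in parts if p.name not in active_names]

def recreate(parts):
    """Step 4: build a new file (here, a name -> bytes mapping) from what is left."""
    return {p.name: p.data for p in parts}

def sanitize(raw_parts):
    parts = parse(raw_parts)
    return recreate(remove_active(parts, identify_active(parts)))

clean = sanitize([
    ("body.xml", "text", b"<p>Q3 report</p>"),
    ("vbaProject.bin", "macro", b"\x00payload"),
])
print(sorted(clean))  # ['body.xml']
```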

One way to think of file sanitization is to consider the way you might redact files (if you weren’t using a Data Detection and Response platform). For example, when you send someone a PDF that contains sensitive information, you can use redaction tools to black out details such as your name, address, birth date, or Social Security number.

File sanitization tools do something similar, but at the file metadata level. They don’t just hide potentially risky code; they eliminate it entirely, which, depending on your tool of choice, can be a good or a bad thing.

File Sanitization – aka Content Disarm and Reconstruction (CDR)

Another term for the content sanitization process is Content Disarm and Reconstruction (CDR). As with file sanitization solutions, CDR products vary in their approach. This article focuses on the most advanced version of CDR, Level 3. For transparency, here are all three levels of content disarm and reconstruction:

CDR Type 1: Converting files to PDF

Level 1 CDR uses file conversion or transformation to render the file safe. It converts all files into PDF format, which eliminates the possibility of a hacker activating malicious code when a user clicks a link or opens the document. However, this produces a flattened document without critical features such as macros. In effect, the file becomes unusable and productivity suffers.
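Producing a real PDF requires a rendering library, so the sketch below swaps in a simpler flattening target, plain text, to show the trade-off Level 1 makes. It assumes a `.docx` (Office Open XML) input whose body lives in `word/document.xml`. The function name is hypothetical. The function discards everything except the visible text, including any macro part and all formatting.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the WordprocessingML main part (word/document.xml).
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def flatten_docx_to_text(docx_bytes: bytes) -> str:
    """Keep only the visible paragraph text; every other part is discarded."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    lines = []
    for para in root.iter(W_NS + "p"):
        # Join the text runs (<w:t>) that make up each paragraph.
        lines.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return "\n".join(lines)

# Usage with a toy in-memory package (not a complete, openable .docx).
doc_xml = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    "<w:body><w:p><w:r><w:t>Quarterly </w:t></w:r><w:r><w:t>report</w:t></w:r></w:p>"
    "</w:body></w:document>"
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("word/document.xml", doc_xml)
    zf.writestr("word/vbaProject.bin", b"macro bytes")  # lost in the conversion
print(flatten_docx_to_text(buf.getvalue()))  # Quarterly report
```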

CDR Type 2: Stripping out active code and embedded objects

Level 2 CDR aims to improve on Level 1 by stripping out only certain types of content, such as embedded objects or other potentially active content, to make each file safe. However, the file still loses functionality such as links, essential macros, and business logic. Level 2 can also leave potentially vulnerable templates inside the document, which keeps organizations exposed to attack.
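A Level 2 approach can be sketched as a denylist over the parts of an Office Open XML package, which is a ZIP archive. The path patterns below are illustrative assumptions covering a few common active-content locations. They are not exhaustive, and they are not any vendor's actual rules.

```python
import io
import re
import zipfile

# Illustrative denylist (an assumption, not any vendor's actual rules) of OOXML
# part paths that commonly carry active or embedded content.
DENYLIST = [
    re.compile(r"(^|/)vbaProject\.bin$"),  # VBA macro project
    re.compile(r"(^|/)activeX/"),          # ActiveX controls
    re.compile(r"(^|/)embeddings/"),       # embedded OLE objects
    re.compile(r"^xl/connections\.xml$"),  # Excel data connections
    re.compile(r"^xl/externalLinks/"),     # links to other workbooks
]

def strip_active_parts(package: bytes) -> tuple[bytes, list[str]]:
    """Copy every ZIP part except denylisted ones; return the new package and removed names."""
    removed = []
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package)) as src:
        with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
            for name in src.namelist():
                if any(p.search(name) for p in DENYLIST):
                    removed.append(name)
                    continue
                dst.writestr(name, src.read(name))
    return out.getvalue(), removed

# Usage with a toy in-memory package (not a complete, openable .docx).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("word/document.xml", "<doc/>")
    zf.writestr("word/vbaProject.bin", b"macro bytes")
    zf.writestr("word/embeddings/oleObject1.bin", b"ole bytes")
clean, removed = strip_active_parts(buf.getvalue())
print(removed)  # ['word/vbaProject.bin', 'word/embeddings/oleObject1.bin']
```

The design has a clear weakness. Anything not on the denylist passes through, so a new kind of active content is missed. Meanwhile, the removed macro's functionality is simply gone. The sketch also leaves `[Content_Types].xml` and relationship entries pointing at the deleted parts, and a rebuild from a clean template would not carry that structure over.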

CDR Type 3: Advanced CDR preserving file functionality

Level 3 is the most advanced form of CDR, and we call it Positive Selection® technology. This level focuses on template-based reconstruction that preserves all features and full functionality. Level 3 CDR rebuilds the document as it was originally intended to be used. It copies only known-good, positively selected content onto a safe template. As a result, business continues as normal.
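The key difference from Level 2 is that Level 3 uses an allowlist instead of a denylist, applied on top of a trusted template. The sketch below is a generic illustration of that idea and is not Votiro's Positive Selection® implementation. The `ALLOWLIST` patterns, the template contents, and `positive_select` are all hypothetical. A real Level 3 engine would also rebuild the content of each part element by element rather than copying whole parts.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

# Hypothetical allowlist: the only parts ever copied from the incoming file.
ALLOWLIST = [
    re.compile(r"^word/document\.xml$"),
    re.compile(r"^word/styles\.xml$"),
    re.compile(r"^word/media/[\w-]+\.(png|jpe?g)$"),
]

def positive_select(package: bytes, template: dict[str, bytes]) -> bytes:
    """Start from a trusted template and copy in only allowlisted, well-formed parts."""
    parts = dict(template)  # known-good skeleton supplied by the defender
    with zipfile.ZipFile(io.BytesIO(package)) as src:
        for name in src.namelist():
            if not any(p.match(name) for p in ALLOWLIST):
                continue  # unknown parts are never copied, whatever they contain
            data = src.read(name)
            if name.endswith(".xml"):
                ET.fromstring(data)  # raises ET.ParseError if the part is malformed
            parts[name] = data
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for name, data in parts.items():
            dst.writestr(name, data)
    return out.getvalue()

# Usage with a toy package; the template and part contents are placeholders.
template = {"[Content_Types].xml": b"<Types/>"}
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("[Content_Types].xml", "<Types><tampered/></Types>")
    zf.writestr("word/document.xml", "<doc>Q3 report</doc>")
    zf.writestr("word/vbaProject.bin", b"macro bytes")
rebuilt = zipfile.ZipFile(io.BytesIO(positive_select(buf.getvalue(), template)))
print(sorted(rebuilt.namelist()))  # ['[Content_Types].xml', 'word/document.xml']
```

Because unknown parts are never copied, the result does not depend on recognizing what they contain. Here, `xml.etree.ElementTree` serves only as a well-formedness check. Untrusted input in production would call for a hardened XML parser.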

A Look at the Gaps

Unlike antivirus tools, CDR technology does not use detection or scan files in an effort to identify malware. Instead, content disarm technology aims to reconstruct the known-good components of a file onto a clean file template, leaving malware nowhere to be found.

For example, consider a typical ransomware attack. A cybercriminal uses social engineering, sending end users emails that prompt them to download a file or click a link, and then uses that action to install any number of malware types. Simple CDR tools remove all active content in the email attachment, along with any metadata that contains active content. Advanced CDR tools instead parse the same files and replicate only the known-good elements. No malicious code is carried over into the new, clean version of the file, while the file's active content remains safe and usable.

Active content includes: 

  • Macros
  • Add-ins
  • Data connections
  • ActiveX controls
  • Spreadsheet links
  • Color-theme files
  • Cascading style sheet (CSS) files
  • Links to external pictures
  • XML expansion packs
  • Media files
  • XML manifests
  • Smart documents
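Some items on this list, such as links to external pictures and links to other workbooks, are stored in Office Open XML packages as relationships marked `TargetMode="External"` inside `.rels` parts. The sketch below inventories those targets. The function name and the sample relationship `Type` values are placeholders chosen for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of OPC relationship parts (*.rels).
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"

def external_targets(package: bytes) -> list[tuple[str, str]]:
    """Return (rels part, target) pairs for every relationship with TargetMode="External"."""
    found = []
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        for name in zf.namelist():
            if not name.endswith(".rels"):
                continue
            root = ET.fromstring(zf.read(name))
            for rel in root.iter(REL_NS + "Relationship"):
                if rel.get("TargetMode") == "External":
                    found.append((name, rel.get("Target")))
    return found

# Usage with a toy package; the Type URIs are placeholders, not real OOXML types.
rels = (
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="urn:example:image" Target="media/image1.png"/>'
    '<Relationship Id="rId2" Type="urn:example:image" '
    'Target="http://attacker.example/pixel.png" TargetMode="External"/>'
    "</Relationships>"
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("word/_rels/document.xml.rels", rels)
print(external_targets(buf.getvalue()))
# [('word/_rels/document.xml.rels', 'http://attacker.example/pixel.png')]
```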

What is File Metadata?

Almost every file contains metadata. Metadata is commonly grouped into three types:

  • Descriptive: information about the content, such as title, author, publication date, subject, publisher, description
  • Structural: information about how the digital media’s components relate to one another including types, versions, relationships, file format, and size
  • Administrative: information about the file’s technical aspects such as technical information about decoding and rendering, preservation information for long-term archiving, and rights information like usage rights
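In Office Open XML files, descriptive metadata such as title and author usually sits in `docProps/core.xml` as Dublin Core elements. Structural facts such as part names and sizes come from the ZIP package itself. The sketch below uses a hypothetical `describe_metadata` helper to collect those two categories. It does not cover administrative metadata.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Dublin Core namespace used for descriptive fields in docProps/core.xml.
DC = "{http://purl.org/dc/elements/1.1/}"

def describe_metadata(package: bytes) -> dict:
    """Collect descriptive (core.xml) and structural (ZIP part sizes) metadata."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        descriptive = {}
        if "docProps/core.xml" in zf.namelist():
            core = ET.fromstring(zf.read("docProps/core.xml"))
            for field in ("title", "creator", "subject", "description"):
                el = core.find(DC + field)
                if el is not None and el.text:
                    descriptive[field] = el.text
        structural = {info.filename: info.file_size for info in zf.infolist()}
    return {"descriptive": descriptive, "structural": structural}

# Usage with a toy package (not a complete, openable Office file).
core_xml = (
    "<cp:coreProperties "
    'xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:title>Q3 Report</dc:title><dc:creator>Alice</dc:creator>"
    "</cp:coreProperties>"
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("docProps/core.xml", core_xml)
    zf.writestr("word/document.xml", "<x/>")
meta = describe_metadata(buf.getvalue())
print(meta["descriptive"])  # {'title': 'Q3 Report', 'creator': 'Alice'}
```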

Because metadata is embedded in the digital asset itself, many users don’t know it exists. For example, if you’ve worked with an Excel spreadsheet, you might have used its macro capability. Macros record keystrokes and mouse clicks so that you can repeat a process without starting from scratch every time. Every time you create a macro, you are doing some basic coding.

Malicious software works the same way. Cybercriminals hide malicious code inside a downloadable asset so that it executes when someone opens the document. This means any metadata where such code can hide is a risk.

How File Sanitization is Different from Detection-based Solutions

Traditional antivirus tools work by maintaining a database of known malware signatures and comparing each file’s code against that database. The problem is that detection is ultimately guesswork, and even the most advanced artificial intelligence (AI) tools can only predict so much. Let’s take a closer look at why file sanitization differs from detection-based solutions:

Why Antivirus Is Not Enough

Antivirus protection based on detecting malicious code comes with several limitations. First, for AI to predict computer virus mutations effectively, it needs an enormous data set, because machine learning (ML) algorithms are only effective when they can ingest as much data as possible. However, collecting and accurately labeling that much data is difficult. For detection-based AI/ML to work, you also need to understand how the tool collects and monitors that information.

Second, no matter how much data an AI collects and analyzes, human attackers tend to stay one step ahead. AI/ML detection tools offer predictive capabilities, but they can only take a backward-looking approach to predicting the future; they can’t think creatively like a cybercriminal. With the rise of generative AI tools, however, this might not always be the case.
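The backward-looking limitation is easiest to see in the simplest form of detection, an exact signature match. In this sketch, the "signature database" is a hypothetical set of SHA-256 hashes. Real antivirus engines also use byte patterns, heuristics, and ML models. Even so, the lookup shows why a sample that was never seen before goes unflagged.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a file's bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical signature database: hashes of samples already known to be malicious.
KNOWN_BAD = {sha256_hex(b"known ransomware dropper v1")}

def is_known_malware(data: bytes) -> bool:
    """Exact-match lookup: flags a file only if its hash is already in the database."""
    return sha256_hex(data) in KNOWN_BAD

print(is_known_malware(b"known ransomware dropper v1"))  # True
print(is_known_malware(b"known ransomware dropper v2"))  # False: one byte differs
```

Changing a single byte produces a different hash, so the variant passes the check even though it is nearly identical to the known sample.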

File Sanitization Doesn’t Guess, It Just Removes

With file sanitization, you never have to worry about whether your predictive analytics will fail you. File sanitization removes all risky elements contained in files without looking for specific indicators. So while AV alone can take days, weeks, months, or even years to detect a hidden exploit, CDR stops malware on day zero.

File sanitization leaves all malware behind, whether it’s a known variant or something entirely new. The solution reconstructs the file with only safe elements, so you keep all the information you need without any of the risk. If an employee accidentally downloads a file that a cybercriminal intended as a ransomware delivery method, you don’t need to worry, because the malicious code has already been removed.

Votiro’s Advanced File Sanitization Keeps You Safe from File-borne Threats

Votiro’s Positive Selection® technology provides continuous, proactive threat prevention. Rather than relying on an outdated database that can miss new and innovative malware, Votiro removes every file element that cybercriminals can weaponize. Instead of worrying about what new phishing attacks cybercriminals come up with next, you can rest easy knowing that files are free from malware and ransomware variants, even newly created ones. Best of all, Votiro sanitizes files in real time, while they’re still in motion, before the user ever receives them.

With Votiro, security teams no longer need to worry about false positives created by antivirus, because only the non-malicious elements of each file are preserved. By eliminating any potential hidden threat in files, file sanitization reduces alert fatigue and frees up your security team to focus on other threat vectors. And we don’t stop there. Votiro also protects sensitive data in real time via policy-based data masking. Our Zero Trust Data Detection and Response platform is built on a foundation of advanced CDR and antivirus, and it is enhanced to cover both sides of the data security conundrum.

If you’d like to learn more about how to get started with Votiro, schedule a demo with us today!
