What is File Sanitization? Everything to Know

May 7, 2021

Ransomware continues to plague companies as remote and hybrid workforce models open new attack vectors. Research notes a 715% year-over-year increase in ransomware during 2020. Now that most disaster recovery and business continuity plans include robust data backup strategies, cyber attackers use a combined ransomware approach that both encrypts and steals data. Because a backup cannot undo stolen data, this exfiltration model means that organizations need more proactive data security protections. 

Understanding what file sanitization is, how it can help mitigate ransomware attacks, and where you need additional controls can be the first step to protecting your environment. 

What is File Sanitization?

File content sanitization mitigates malware threats by removing potentially malicious code from files. File sanitization solutions vary in their approach: many scan files, identify active content, and remove all of it, even when it is benign. The most advanced file sanitization solutions can identify and keep non-malicious active content (such as useful business macros) and remove only the malicious active content. Finally, the file is recreated without the potentially dangerous code.

One way to think of file sanitization is to consider the way you redact files. For example, when you send someone a PDF that contains sensitive information, you often use the redact tools to black out information such as your name, address, birth date, or Social Security number. 

File sanitization tools do something similar but at the file metadata level. They don’t just hide the potentially risky code; they eliminate it entirely. 

File Sanitization Can Also be Known as Content Disarm and Reconstruction (CDR)

Another term for the content sanitization process is Content Disarm and Reconstruction (CDR). Like file sanitization solutions, CDR vendors vary in their approach. For the purposes of this article, we will focus on the most advanced version of CDR, Level 3 CDR. 

Unlike antivirus tools, CDR technology does not use detection or scan files in an effort to identify malware. Instead, CDR technology reconstructs the known-good components of a file onto a clean file template, so that malware is left behind.
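The reconstruction idea can be illustrated with a minimal sketch. It is a toy model, not any vendor's implementation: the `Element` and `Document` types, the `KNOWN_GOOD_KINDS` allowlist, and the `reconstruct` function are all hypothetical names invented for this example. The point it shows is that nothing is inspected for malware; only element kinds on the allowlist are copied onto a fresh document.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical allowlist of element kinds treated as known-good.
KNOWN_GOOD_KINDS = {"text", "image", "table"}


@dataclass
class Element:
    kind: str      # for example "text", "image", or "macro"
    payload: str


@dataclass
class Document:
    elements: List[Element] = field(default_factory=list)


def reconstruct(original: Document) -> Document:
    """Copy only allowlisted elements onto a fresh, empty document."""
    clean = Document()  # stands in for the clean file template
    for element in original.elements:
        if element.kind in KNOWN_GOOD_KINDS:
            # Build a new element instead of reusing the original object.
            clean.elements.append(Element(element.kind, element.payload))
    return clean


incoming = Document([
    Element("text", "Quarterly report"),
    Element("macro", "Shell('evil.exe')"),
    Element("image", "chart.png"),
])
rebuilt = reconstruct(incoming)
kinds = [element.kind for element in rebuilt.elements]
```

In this sketch the macro never reaches `rebuilt` because its kind is not on the allowlist, regardless of whether its payload matches any known threat.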

For example, consider the average ransomware attack method. Cybercriminals use social engineering tactics, send end-users emails that prompt them to download a file or click a link, and then use this action to install the malware. CDR tools process these files and replicate only known-good content and metadata, so that no malicious code is carried over into the new, clean version of the file. Less advanced CDR technology removes all active content and any metadata containing it. That also produces a safe document, but the document may no longer work as intended, which frustrates users. 

What is Metadata?

Almost every file contains metadata. Metadata comes in three types: 

  • Descriptive: information about the content, such as title, author, publication date, subject, publisher, description
  • Structural: information about how the digital media’s components relate to one another including types, versions, relationships, file format, and size
  • Administrative: information about the file’s technical aspects such as technical information about decoding and rendering, preservation information for long-term archiving, and rights information like usage rights

Since metadata is coded into the digital asset, many users don’t know it exists. For example, if you’ve ever worked with an Excel spreadsheet, you might have used the macro capability. These macros record keystrokes and mouse clicks so that you can repeat the processes without starting over from scratch every time. Every time you create a macro, you use some basic coding. 

Malicious software works the same way. Cybercriminals hide malicious code inside a downloadable asset, and that code can execute when someone opens, or even just downloads, the document. This means that any metadata where this code can hide is risky. 

How File Sanitization is Different from Detection-Based Solutions

Traditional anti-virus tools work by maintaining a database of known malware signatures and comparing each file’s contents against it. The problem is that a signature database only covers threats someone has already seen, and even the most advanced artificial intelligence (AI) tools can only do so much guessing. Let’s take a closer look at the reasons why file sanitization is different from detection-based solutions.
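To make the database comparison concrete, here is a deliberately simplified sketch that assumes a signature is just the SHA-256 hash of a known sample. Real anti-virus engines use richer signatures and heuristics, so treat this as an illustration of the principle only. `SIGNATURE_DB`, `is_known_malware`, and the sample byte strings are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical database built from one previously seen sample.
known_sample = b"malicious payload v1"
SIGNATURE_DB = {sha256_hex(known_sample)}


def is_known_malware(data: bytes) -> bool:
    """Flag a file only if its hash is already in the database."""
    return sha256_hex(data) in SIGNATURE_DB


# A slightly modified variant produces a different hash.
variant = b"malicious payload v2"

seen_before = is_known_malware(known_sample)
seen_variant = is_known_malware(variant)
```

Under this assumption the original sample is flagged, while the variant, which differs by a single character, is not, because its hash has never been added to the database.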

Anti-virus is not enough

Anti-virus protection based on detecting malicious code comes with several limitations. First, for AI to be effective in predicting computer virus “mutations,” you need an incredibly large data set. Machine learning (ML) algorithms are only effective when they can ingest as much data as possible. However, ingesting this much data and appropriately labeling it can be difficult. For detection-based AI/ML to work, you need to understand how the tool collects and monitors the information. 

Second, no matter how much data an AI collects and analyzes, human malicious actors will always be one step ahead. AI/ML detection-based tools offer predictive capabilities, but they can only take a backward-looking approach to predict the future; they can’t think creatively like a cybercriminal. 

File sanitization doesn’t guess, it just removes

With file sanitization, you never have to worry about whether your predictive analytics are going to fail you. That’s because file sanitization removes all risky elements contained in files, without looking for specific indicators. 

Traditional content sanitization offers protection from malware by removing all active content, even if it doesn’t contain malware. However, this could render the document unusable to the user, who may need a specific function within the document to complete a work task. Advanced content sanitization keeps benign active content and removes only what is malicious, allowing end users to keep full file functionality and integrity. Active content includes: 

  • Macros
  • Add-ins
  • Data connections
  • ActiveX controls
  • Spreadsheet links
  • Color-theme files
  • Cascading style sheet (CSS) files
  • Links to external pictures
  • XML expansion packs
  • Media files
  • XML manifests
  • Smart documents
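As one illustration of where active content lives, macro-enabled Office Open XML files (such as .docm or .xlsm) are ZIP containers that store VBA macros in a part named `vbaProject.bin`. The sketch below lists such parts using only Python's standard library. The `find_macro_parts` and `build_sample` helpers are hypothetical, and the sample archive is a stand-in with placeholder bytes, not a valid Word document. A real sanitizer would also have to handle the other kinds of active content listed above.

```python
import io
import zipfile
from typing import List


def find_macro_parts(ooxml_bytes: bytes) -> List[str]:
    """Return the names of archive parts that hold a VBA macro project."""
    with zipfile.ZipFile(io.BytesIO(ooxml_bytes)) as archive:
        return [
            name for name in archive.namelist()
            if name.lower().endswith("vbaproject.bin")
        ]


def build_sample(with_macro: bool) -> bytes:
    """Build a tiny in-memory ZIP that mimics an Office container layout."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", "<document/>")
        if with_macro:
            # Placeholder bytes only; this is not real VBA content.
            archive.writestr("word/vbaProject.bin", b"placeholder")
    return buffer.getvalue()


macro_file_parts = find_macro_parts(build_sample(with_macro=True))
plain_file_parts = find_macro_parts(build_sample(with_macro=False))
```

Here `macro_file_parts` contains the macro part name and `plain_file_parts` is empty, which shows how a tool can locate one kind of active content by structure alone, without judging whether the macro is malicious.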

File sanitization strips out all malware, whether it’s a known variant or something entirely new. The solution reconstructs the file with only safe elements so that you have all the information you need without any of the risks. If an employee accidentally downloads a file that a cybercriminal intended as a ransomware delivery method, you don’t need to worry because the malicious code has been removed. 

Votiro’s Positive Selection Technology Can Help

Votiro’s Positive Selection technology provides continuous proactive threat prevention. Instead of working from an outdated database that can miss new and innovative malware, Votiro removes all file elements that cybercriminals can weaponize. Instead of worrying about what new phishing attacks cybercriminals come up with, you can rest easy knowing that all files are free from any malware or ransomware variants, even if they have just been created. 

We process incoming files, identify the good elements in each file, and then reconstruct the file from those elements, leaving any malicious code behind, so that users can safely download the data. We do this in real time, while the data is in motion and before the user receives the file, without causing productivity problems or employee frustration.

Security teams no longer need to worry about potential anti-virus false positives because only non-malicious elements of the file have been preserved. By eliminating any potential hidden threat in files, file sanitization reduces alert fatigue and frees up your security team to focus on other threat vectors. 

The file sanitization process means that your data stays clean and secure. In a world where data is currency, you need to protect your files in order to protect your systems, networks, software, data, and corporate reputation.

Votiro’s Positive Selection solution offers a proactive risk mitigation approach that eliminates virus, malware, and ransomware risks. If you’d like to learn more about how to get started, schedule a demo with us today!