How to encrypt large files

In our last post, we showed that CipherStor is extremely fast. It is so fast that other system bottlenecks, such as SSD storage IO or network IO, begin to surface. In light of that, this post looks at how to encrypt large files efficiently.

We recently discussed a design in which a customer used the CipherStor API to write a worker process (let’s call it “CipherStorWorker”) that consumed the output of a separate business logic process (let’s call it “BusinessLogicWorker”). Conceptually, this reminds me of pipelined vs non-pipelined microprocessor architectures, so I’ll borrow those terms. The customer's design is shown in the diagram below.

Non-pipelined approach

[Diagram: CipherStor Non-pipelined Performance]

Of course, when illustrated this way, it’s easy to spot the issue. We’re hitting storage three times: the BusinessLogicWorker writes the plaintext file, the CipherStorWorker reads it back, and the CipherStorWorker then writes the encrypted file. Since storage bandwidth is typically the system bottleneck, this approach takes that severe bottleneck and makes it three times worse (!). The BusinessLogicWorker and CipherStorWorker will both sit idle, waiting for storage to catch up, which leads to significantly longer end-to-end job execution times. In fact, Crypteron cannot even start until the BusinessLogicWorker has processed the entire file AND written it to the storage medium.
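
To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes the job is purely storage-bound and ignores the small size overhead of the encrypted output; the file size and bandwidth are made-up illustrative values, not measurements.

```python
def storage_bound_seconds(file_bytes: int, bytes_per_sec: float, passes: int) -> float:
    """Lower bound on job time when each pass moves the whole file through storage."""
    return passes * file_bytes / bytes_per_sec

FILE_BYTES = 12_000_000_000   # hypothetical 12 GB file
STORAGE_BPS = 400_000_000     # hypothetical 400 MB/s storage bandwidth

# Non-pipelined: write plaintext, read plaintext back, write ciphertext.
non_pipelined = storage_bound_seconds(FILE_BYTES, STORAGE_BPS, passes=3)
# Pipelined: only the ciphertext is written.
pipelined = storage_bound_seconds(FILE_BYTES, STORAGE_BPS, passes=1)

print(f"non-pipelined: {non_pipelined:.0f} s, pipelined: {pipelined:.0f} s")
# prints: non-pipelined: 90 s, pipelined: 30 s
```

This is only a lower bound. In the non-pipelined design the two workers also run back to back, so the real gap can be wider.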

Pipelined approach

A far better approach is to incorporate the CipherStor APIs directly into the BusinessLogicWorker, so each block is encrypted before it ever hits the storage systems. This is conceptually shown below.

[Diagram: CipherStor Pipelined Performance]

By using the streaming API to pass raw data from the business logic core to Crypteron’s CipherStor, you bypass the intermediate storage IO entirely. As soon as the business logic core has processed a block, it immediately hands that block off to Crypteron. The business logic core then moves on to the next block, and this clockwork continues until the last block flows through the system. So instead of waiting for the entire file to be written out first, Crypteron begins its work almost immediately.

Not only is this approach faster, it’s also significantly more secure, since a plaintext version of the file never touches the storage systems.
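
The pipelined flow can be sketched in Python as a bounded producer/consumer queue: the business logic stage hands each block to an encryption thread as soon as it is ready, and only the encrypted output is written to the sink. This is a minimal sketch under stated assumptions, not the CipherStor API. `business_logic_blocks` and the `encrypt_block` hook are hypothetical placeholders, and the byte reversal in the example is NOT encryption; it only stands in for the real call so the data flow is visible. With a real streaming encryption API you would typically write each block into the library's encrypting stream inside the consumer instead.

```python
import io
import queue
import threading
from typing import BinaryIO, Callable, Iterable, Iterator, List, Optional

def business_logic_blocks(source: BinaryIO, block_size: int = 1 << 20) -> Iterator[bytes]:
    """Hypothetical BusinessLogicWorker: yields each processed block as soon as it is ready."""
    while True:
        block = source.read(block_size)
        if not block:
            return
        yield block.upper()  # placeholder for the real business logic

def pipelined_encrypt(
    blocks: Iterable[bytes],
    encrypt_block: Callable[[bytes], bytes],  # hypothetical hook for the encryption call
    sink: BinaryIO,
    depth: int = 4,
) -> None:
    """Encrypt blocks on a second thread while the producer keeps generating new ones.

    Plaintext only lives in the bounded in-memory queue; only ciphertext reaches `sink`.
    """
    q: "queue.Queue[Optional[bytes]]" = queue.Queue(maxsize=depth)  # caps memory use
    errors: List[BaseException] = []

    def consumer() -> None:
        while True:
            item = q.get()
            if item is None:  # sentinel: the producer is done
                return
            if errors:
                continue  # keep draining after a failure so the producer never blocks forever
            try:
                sink.write(encrypt_block(item))
            except Exception as exc:
                errors.append(exc)

    worker = threading.Thread(target=consumer)
    worker.start()
    try:
        for block in blocks:  # business logic runs here, overlapping with encryption
            q.put(block)
    finally:
        q.put(None)
        worker.join()
    if errors:
        raise errors[0]

if __name__ == "__main__":
    src = io.BytesIO(b"hello world " * 1000)
    out = io.BytesIO()
    # b[::-1] is NOT encryption; it only stands in for the real encryption call.
    pipelined_encrypt(business_logic_blocks(src, block_size=64), lambda b: b[::-1], out)
    print(len(out.getvalue()))
```

The bounded queue is the design choice that matters here: it lets the two stages overlap while capping how much plaintext is held in memory, and if encryption falls behind, `q.put` simply blocks the producer rather than spilling anything to disk.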

Storage Bottleneck gone?

It’s worth noting that the encrypted data is ultimately still written to storage, possibly over the network (e.g. SANs, Amazon S3, Azure Blob Storage). So there is no escaping those bottlenecks entirely, but by pipelining the processing of the entire file, we at least avoid amplifying them. If you consistently see storage or network bandwidth slowing things down, the obvious step is to upgrade to faster storage or more bandwidth. If that’s not an option, another approach is to run additional BusinessLogicWorkers in parallel across multiple VMs (i.e. ‘scale out’), so your aggregate outgoing network bandwidth scales up too. A full discussion of possible architectures is outside the scope of this blog post, but if you’re interested, you might want to read more about Netflix’s cloud architecture.

by Sid Shetye