Frequently Asked Questions
How It Works
How is Anacode CS related to existing AWS Cloud storage?
Anacode CS is a proxy storage service. Anacode CS users simply direct GET and PUT requests to the Anacode CS endpoint, instead of to an AWS S3 or Glacier endpoint. Anacode encoded (compressed) data is still stored in the customer’s own S3 bucket, just in compressed form. Instead of accessing your data from your Cloud storage vendor’s endpoint, you access it via an Anacode endpoint, using the same read (GET) and write (PUT) calls that you use today.
Anacode uses the Cloud vendor’s microservices and containers to encode and decode data. You write your data via PUT or POST API commands, and then read Anacode encoded data via GET API commands (with Byte Range [startByte, NBytes] support).
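The calls above can be sketched as plain HTTP requests. This is a minimal illustration: the endpoint URL, bucket, and key below are hypothetical placeholders, and the real endpoint would come from your Anacode CS account.

```python
# Sketch: redirecting existing S3-style calls to an Anacode CS endpoint.
# ANACODE_ENDPOINT, BUCKET, and KEY are hypothetical placeholders.
from urllib.request import Request

ANACODE_ENDPOINT = "https://cs.anacode.example"      # hypothetical endpoint
BUCKET, KEY = "my-bucket", "logs/2020/01/data.bin"   # hypothetical names

def put_request(data: bytes) -> Request:
    """Write (PUT) an object via the Anacode CS endpoint."""
    return Request(f"{ANACODE_ENDPOINT}/{BUCKET}/{KEY}", data=data, method="PUT")

def get_request(start_byte=None, n_bytes=None) -> Request:
    """Read (GET) an object; an optional [startByte, NBytes] range
    becomes a standard HTTP Range header for random access."""
    req = Request(f"{ANACODE_ENDPOINT}/{BUCKET}/{KEY}", method="GET")
    if start_byte is not None:
        req.add_header("Range", f"bytes={start_byte}-{start_byte + n_bytes - 1}")
    return req

req = get_request(start_byte=1024, n_bytes=4096)
print(req.get_header("Range"))   # bytes=1024-5119
```

The point is that no new client API is needed: existing S3-style reads and writes work unchanged once they point at the new endpoint.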
How does Anacode accelerate data transfers between Cloud storage and server nodes?
Each Cloud data center contains tens of thousands of servers and tens of thousands of storage nodes (HDDs and SSDs). These server and storage nodes are connected via a huge Ethernet network that typically uses 100 Gbps Ethernet links to route data through a network of top-of-rack (ToR) switches and routers. Because Anacode CS transfers lossless-compressed data between storage and server nodes, those transfers finish faster by the compression ratio of each Anacode data block: 2:1 Anacode encoded data transfers 2x faster from S3 storage to EC2 servers, and 10:1 Anacode encoded data transfers 10x faster.
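A quick back-of-envelope check of the speed-up claim, using the link speed and ratios quoted above (illustrative figures, not measurements):

```python
# Back-of-envelope: over a fixed link, a block compressed R:1 takes 1/R
# of the original transfer time. Figures are from the text, not measured.
LINK_MB_PER_SEC = 100 * 1000 / 8        # 100 Gbps Ethernet ~ 12,500 MB/sec

def transfer_time_sec(original_mb, compression_ratio):
    """Time to move one block across the link in compressed form."""
    return (original_mb / compression_ratio) / LINK_MB_PER_SEC

t_raw = transfer_time_sec(1000, 1.0)    # uncompressed 1 GB
t_2x = transfer_time_sec(1000, 2.0)     # 2:1 Anacode block
t_10x = transfer_time_sec(1000, 10.0)   # 10:1 Anacode block
print(round(t_raw / t_2x, 3), round(t_raw / t_10x, 3))
```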
How much does Anacode storage cost?
Instead of paying AWS S3 prices, you pay Anacode 35% less. For example, if your cloud storage bill was $10,000 per month with AWS S3, your new Anacode payment would be $6,500 per month, saving $3,500 per month!
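The saving works out directly from the monthly bill; the figures below are the example from the text:

```python
# The example bill above: Anacode charges 65% of the AWS S3 price.
def anacode_monthly_cost(s3_monthly_cost):
    return s3_monthly_cost * 65 / 100    # i.e. 35% less

bill = anacode_monthly_cost(10_000)
print(bill, 10_000 - bill)   # 6500.0 3500.0
```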
When will Anacode CS be available on Microsoft Azure and Google Cloud Platform?
Anacode CS will be available for Microsoft Azure (“Azure”) and Google Cloud Platform (“GCP”) eventually. Customer demand will influence Anacode CS availability on non-AWS platforms. So if you want Anacode CS on Azure and GCP, give feedback to Microsoft and Google. Please contact us if you’d like to be an Anacode CS beta customer on Azure or GCP.
Anacode CS Encoding & Decoding
How does Anacode CS compression compare to gzip (Lempel-Ziv)?
On all datasets available to Anacode (and Anacode has lots of available data), Anacode CS compresses more, and faster, than gzip. Since Anacode CS encoding and decoding are pleasingly parallel (see the next FAQ question), Anacode CS processing is orders of magnitude faster than gzip. Also, Anacode CS is a storage service, while gzip is just a compression and decompression utility.
What is “pleasingly parallel,” and why do I care?
In computer science, the equivalent terms “embarrassingly parallel” and “pleasingly parallel” describe algorithms that achieve a linear processing speed-up (higher throughput) with each added processing element. For example, if 10 processing elements running a “pleasingly parallel” algorithm operate at S MB/sec, then 100 processing elements will run that same algorithm at 10S MB/sec. Anacode’s compressors and decompressors are both pleasingly parallel algorithms.
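A small demonstration of the idea, using Python’s zlib as a stand-in compressor: each block compresses independently, so the blocks can be spread across any number of workers with no coordination between them.

```python
# Each 64 kB block compresses independently, so the work parallelizes
# with no coordination between workers (zlib is a stand-in compressor).
import zlib
from concurrent.futures import ThreadPoolExecutor

data = bytes(range(256)) * 4096                 # ~1 MB of sample data
BLOCK = 64 * 1024
blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]

with ThreadPoolExecutor(max_workers=8) as pool:
    parallel = list(pool.map(zlib.compress, blocks))  # any worker, any block

serial = [zlib.compress(b) for b in blocks]
assert parallel == serial    # identical results, independent of worker count
```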
The original gzip (Lempel-Ziv) algorithm is inherently sequential: it uses a 32 kB sliding window that slides across ALL input data from the first Byte to the last, so every compressed gzip Byte depends on ALL of the compressed Bytes that came before it. In contrast, Anacode CS blocks are encoded and decoded independently, in parallel.
Parallel algorithms run faster as more processing elements are applied. Modern Cloud processing technologies like microservices (AWS Lambda) and containers (Docker) make thousands of processing threads available in parallel, at low cost. Anacode CS uses microservices and containers to deliver data cheaper and faster to Anacode CS customers.
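The contrast with gzip’s sliding window can be sketched as follows (again with zlib standing in for the Anacode encoder): because each block is compressed independently, one byte can be recovered by decompressing only its own block, rather than everything before it.

```python
# Random access into independently compressed blocks: decoding one byte
# touches only the block that contains it (zlib is a stand-in codec).
import zlib

BLOCK = 64 * 1024
data = b"0123456789" * 100_000          # 1 MB sample file

blocks = [zlib.compress(data[i:i + BLOCK])
          for i in range(0, len(data), BLOCK)]

def read_byte(offset):
    # Only ONE block is decompressed -- a gzip stream would force us to
    # decompress every byte before `offset`.
    return zlib.decompress(blocks[offset // BLOCK])[offset % BLOCK]

assert read_byte(700_003) == data[700_003]
```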
How does Anacode encoding work?
- Anacode CS encoding is triggered by data writes via RESTful POST and PUT requests.
- Anacode CS encoding divides input data writes into independent blocks of a few MB. As expected, since Anacode’s blocks are significantly larger than gzip’s blocks, Anacode compression ratios are significantly better (higher) than gzip compression ratios. Because Anacode doesn’t use a sliding window like gzip, Anacode compressed data can be accessed randomly, with Byte accuracy, just as fseek() supports. Also see #8 below.
- Anacode blocks are then encoded and decoded in parallel. Thus Anacode encoding and decoding are pleasingly parallel operations. Each Anacode encoding and decoding thread is assigned to a microservice (AWS Lambda) instance or a (Docker) container instance.
- Since microservices and containers automatically scale to match demand, overall Anacode encoding and decoding speeds are proportional to the number of simultaneous software threads. Anacode CS blocks are encoded and decoded using tens, hundreds, or thousands of AWS Lambda and Docker instances. These Anacode CS scaling details are transparent to Anacode CS users. AWS Lambda scales automatically, using AWS-internal mechanisms, while Docker container scaling is controlled by Kubernetes.
- Using tens, hundreds, or thousands of AWS Lambda and Docker instances, Anacode CS typically encodes and decodes at many GB/sec.
- Anacode CS encoding automatically applies datatype-specific encoders to the data in each Anacode CS block. Anacode automatically recognizes four data types: already-compressed data, binary integers, floating-point values, and other Byte sequences (such as text).
- Since Anacode CS detects blocks containing already-compressed data, Anacode CS doesn’t waste time (or MIPS or memory) trying to compress already-compressed data.
- The Anacode CS encoder creates an index that subsequent Anacode CS decoder GET (read) transactions use to support Byte-accurate random access.
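The steps above can be sketched roughly as follows. This is an illustrative sketch, not Anacode’s actual implementation: zlib stands in for the datatype-specific encoders, and the block size and “already compressed” test are simplified.

```python
# Illustrative encoder: split input into independent blocks, fall back
# to a copy for incompressible blocks, and record an index so decoders
# can later map original offsets to stored spans (random access).
import zlib

BLOCK = 4 * 1024 * 1024                  # "a few MB" per block

def encode(data):
    encoded, index, pos = [], [], 0
    for i in range(0, len(data), BLOCK):
        raw = data[i:i + BLOCK]
        comp = zlib.compress(raw)
        copied = len(comp) >= len(raw)   # already-compressed block: copy it
        if copied:
            comp = raw
        index.append((i, pos, len(comp), copied))  # original offset -> stored span
        encoded.append(comp)
        pos += len(comp)
    return b"".join(encoded), index

payload = b"hello world " * 500_000      # ~6 MB -> two independent blocks
stream, index = encode(payload)
print(len(payload), len(stream))
```

Because the loop body has no dependency between iterations, each block could equally be handed to its own Lambda or container instance.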
How do I access my data on Anacode CS?
Access your data via the Anacode CS GET request in your SDK, which can include a [startByte, NBytes] Byte range request that specifies random access. The Anacode decoding process uses the Anacode index (created automatically by the Anacode CS encoder) to support Byte-accurate random access into Anacode CS lossless-compressed blocks.
During Anacode GET API requests, if the customer app that generated the GET request asks for a specific Byte range, Anacode fetches only those Anacode blocks required to satisfy the GET request. Anacode encoded blocks are then transferred from the storage medium (HDD or SSD), in encoded form, to the server instance that generated the GET API request. Once the Anacode encoded blocks arrive at that server instance, the Anacode container decodes the Anacode encoded block(s) and returns the decoded data to the customer app that generated the GET request.
Generally, each Anacode decode container operates at about 200 MB/sec. So, for example, if an AWS EC2 server requested 100 MB of data via a GET request, and the Anacode encoded version of that 100 MB encoded to 40 MB (2.5:1 compression), the Anacode encoded data traverses the Cloud provider’s Ethernet network 2.5x faster than the original data would. Once the Anacode encoded data reaches the server instance that generated the GET API request, the Anacode decoder decodes that data in about 100 MB / 200 MB/sec = 0.5 seconds.
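Checking the example’s arithmetic:

```python
# The example above, computed: 100 MB requested, stored as 40 MB (2.5:1),
# decoded by one container at 200 MB/sec.
requested_mb, ratio, decode_mb_per_sec = 100, 2.5, 200

transferred_mb = requested_mb / ratio              # bytes over the network
decode_sec = requested_mb / decode_mb_per_sec      # decode runs on output size
print(transferred_mb, decode_sec)   # 40.0 0.5
```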
What if all of my data is already compressed (like Netflix, Amazon Video, or .docx/.xlsx/.jpg/.mp3 files)?
Anacode CS encoding will NEVER expand data. The Anacode encoder identifies Anacode blocks that contain already-compressed data and simply copies those blocks. In Anacode’s experience, less than 10% of Cloud data (often WAY less than 10%) is already compressed, unless you happen to work for Netflix, Spotify, or YouTube. For Anacode blocks whose data is identified as integers, floats, or Byte sequences, the Anacode encoder checks that the resulting encoded size never exceeds the original size. If it ever would, the Anacode encoder simply treats the input data as “already compressed” and copies it, Byte for Byte, to the output block.
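The never-expand rule can be sketched as follows, with zlib standing in for the Anacode encoder (the fallback test is the illustrative part):

```python
# Never-expand sketch: if encoding would grow a block, store the raw
# bytes plus a flag instead (zlib is a stand-in encoder).
import os
import zlib

def encode_block(raw):
    comp = zlib.compress(raw)
    if len(comp) < len(raw):
        return True, comp        # genuinely compressed
    return False, raw            # incompressible: copy Byte for Byte

for raw in (b"A" * 100_000,       # highly compressible
            os.urandom(100_000)): # models already-compressed data
    _, stored = encode_block(raw)
    assert len(stored) <= len(raw)   # stored size never exceeds original
```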
How fast must Anacode encoding run?
Anacode encoding doesn’t need to run faster than the Cloud network can transfer data from the EC2 server making the Anacode PUT or POST (write) request. For Amazon AWS, many EC2 instances are attached to the AWS Cloud via a 1 Gbps Ethernet link (~100 MB/sec). Some EC2 instances use faster Ethernet links (10 Gbps to 25 Gbps) to the AWS Cloud.
How fast is Anacode CS encoding?
Anacode encoding speed varies from about 10 MB/sec to about 500 MB/sec per Anacode CS parallel thread. Already-compressed data encodes the fastest, because Anacode CS simply copies blocks that contain already-compressed data rather than trying to compress them further (it’s typically impossible to compress already-compressed data). Binary integer (IoT) and floating-point (HPC/AI/ML) data compresses at about 50 MB/sec per Anacode CS thread. All other data (such as text) compresses at about 10 MB/sec per Anacode CS thread.
How fast must Anacode decoding run?
Some AWS users are surprised that AWS S3 typically transfers data at only about 100 MB/sec across 1 Gbps Ethernet links to an AWS EC2 server. A single thread of Anacode CS decoding (at 200 MB/sec per decoder) is roughly 2x faster than that.
Anacode decoding doesn’t need to run faster than the Cloud network can deliver data to the EC2 server that made the Anacode GET (read) request. On Amazon AWS, the slowest link between AWS S3 and an EC2 instance runs at only about 1 Gbps (~100 MB/sec); some EC2 instances use 10 Gbps or 25 Gbps Ethernet links to AWS S3. Since each Anacode decoder container operates at 200 MB/sec (~2 Gbps) or faster, and since Anacode CS GET requests support Byte Ranges (random access), each Anacode CS decoder runs only long enough to decode the requested Anacode CS data. For example, if a user requests 100 MB of (decoded) data, four Anacode CS decoders decode that 100 MB in ~100 msec.
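The decode-time example works out as follows:

```python
# The example above, computed: 100 MB of decoded output split across
# four decoder threads at 200 MB/sec each.
requested_mb, decoders, per_thread_mb_per_sec = 100, 4, 200

time_msec = (requested_mb / decoders) / per_thread_mb_per_sec * 1000
print(time_msec)   # 125.0 -- on the order of the ~100 msec quoted above
```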
Please contact Anacode support if ~GB/sec decoding is too slow for your application. Anacode CS can be made to run faster.
How fast is Anacode decoding?
Anacode decoding speed varies from about 200 MB/sec to about 500 MB/sec per container thread, depending on the type of data being decoded.
Anacode CS Security, Reliability, and Durability
Is Anacode encoded data secure?
Since Anacode CS stores its compressed data in AWS S3, Anacode CS data is at least as secure as any other AWS data written to S3. Since AWS S3 data at rest is encrypted by default, Anacode CS data at rest is also encrypted by default.
How does Anacode ensure that its customers won’t lose any data?
[Answer for techies] Every Anacode CS block includes TWO Cyclic Redundancy Checks (CRC-32 or a similar error-detection mechanism): one on the encoded data, one on the decoded (original) data. These CRCs will detect the (exceedingly rare) event in which AWS S3 storage somehow causes at least one bit-flip in either the encoded or the decoded (original) data. When that happens (again, expected to be exceedingly rare), Anacode will file a trouble ticket with AWS and will then access one of the other two copies of this Anacode CS data. In short order, Anacode expects that AWS will replace the flawed copy of the Anacode CS data with a corrected copy.
AWS achieves its “eleven nines” of durability by making three copies of everything it stores in AWS S3. Since Anacode CS uses AWS S3 to store compressed data, and adds two CRCs per Anacode CS block, Anacode CS is at least as durable and reliable (i.e. won’t lose data) as AWS S3.
[Answer for C-level execs] Since Anacode CS still stores its data in AWS S3, Anacode CS reliability and durability are equivalent to AWS S3 reliability and durability.
Why should large companies trust their data to Anacode CS, a tiny little start-up?
Anacode’s founder, Al Wegener, has been in tech for 35+ years. If you met him, you’d know that he is a dependable, reliable guy. His goal since founding Anacode Labs, Inc. in 2015 has been to make Cloud storage compressed storage, everywhere. Anacode plans to become a high-revenue enterprise with great name recognition in less than 12 months.
Al has received this question multiple times from equally dependable, reliable, informed people (VCs, technologists, entrepreneurs), so he realized he’d have to address it. Sadly, people and companies have been burned before, and they are right to be suspicious of new offerings, however much they hope those offerings are trustworthy.
So here’s what Anacode will do (for free, and guaranteed in writing) for users that sign up for Anacode CS in 2020:
- [Answer for techies] Anacode CS encoding of every block will immediately be followed by Anacode CS decoding of every block. If the decoded data block ever mis-matches the input data block, Anacode CS will simply COPY the input data to the Anacode CS data block, as if it were already-compressed data. This ensures that all Anacode CS data blocks will NEVER fail to decode due to an Anacode CS encoding or decoding software bug.
- [Answer for C-level execs] Anacode CS will, without additional charge, store a copy of the Anacode CS user’s original (uncompressed) data in a Cloud storage service like AWS Glacier, Azure Archive, Backblaze, or Wasabi.
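The encode-then-verify step in the techie answer can be sketched like this (zlib stands in for the real codec; the fallback path mirrors the bullet above):

```python
# Encode-then-verify sketch: decode every block right after encoding it;
# on any mismatch, fall back to a plain copy (zlib is a stand-in codec).
import zlib

def encode_verified(raw):
    comp = zlib.compress(raw)
    if zlib.decompress(comp) != raw:   # roundtrip check against the input
        return False, raw              # mismatch: store a Byte-for-Byte copy
    return True, comp                  # verified: safe to store compressed

ok, stored = encode_verified(b"customer data " * 10_000)
assert ok and zlib.decompress(stored) == b"customer data " * 10_000
```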
NOTE to late adopters: By the time you read this, Anacode CS will have been in business for years, saving its customers 35% every month on Cloud storage. In that time, not a single Anacode CS customer will have lost any data, or had any of their data held for ransom by Anacode CS. All that time, you’ve been paying too much for slower Cloud storage.
NOTE to everyone: Anacode’s founder encourages Cloud storage users to think about this: Is anyone else offering you 35% lower monthly storage costs, on the IaaS Cloud platform you’re already using? If not, try Anacode CS.
How does Anacode ensure that the Anacode encoded data, and the decoded (decompressed) data, are correct?
Anacode generates and stores two checksums with every Anacode encoded block:
- A CRC-32 checksum on the Anacode encoded data block (detects potential errors in the Anacode encoded data)
- A CRC-32 checksum on the original data block (detects potential errors in the decoded data)
During decoding of every Anacode encoded block, both checksums are evaluated and verified. If either checksum ever mismatches, the Anacode CS decoder reports a decoding error. This condition is expected to be extremely rare (far rarer than the “eleven nines” of durability that AWS S3 guarantees). Anacode expects these extremely rare events will be much more likely to be caused by a problem with AWS S3 storage than by a problem with Anacode CS.
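The two-checksum scheme can be sketched with Python’s zlib.crc32 (zlib again standing in for the real codec; the block layout is illustrative):

```python
# Two CRC-32s per block: one over the stored (encoded) bytes, one over
# the original bytes (zlib is a stand-in codec; layout is illustrative).
import zlib

def encode_block(raw):
    comp = zlib.compress(raw)
    return comp, zlib.crc32(comp), zlib.crc32(raw)

def decode_block(comp, crc_encoded, crc_original):
    if zlib.crc32(comp) != crc_encoded:        # bit-flip in stored data?
        raise IOError("encoded-data CRC mismatch")
    raw = zlib.decompress(comp)
    if zlib.crc32(raw) != crc_original:        # bit-flip after decode?
        raise IOError("decoded-data CRC mismatch")
    return raw

original = b"sensor,42,3.14\n" * 10_000
assert decode_block(*encode_block(original)) == original
```

The first CRC catches corruption of the stored bytes before any decode work is done; the second catches any error that survives decoding.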
Why isn’t Anacode CS a compression service?
Compression is used to achieve two complementary, related goals:
- To reduce storage costs, or
- To transfer data faster.
Anacode uses powerful, purpose-built encoding to directly reduce storage costs and to directly speed up data transfers. Encoding just happens to be the way we deliver those benefits. By delivering Anacode CS as a storage service, Anacode takes care of needed functions like automatic datatype detection and random access. Compression services don’t provide such functions.
Finally, compression services don’t deliver a 35% discount on storage costs (on compressible data) like Anacode CS does.