SQLite Equals ETL Heavy
Actian Corporation
May 18, 2020
Two weeks ago, I likened the performance of SQLite to that of a banana slug. Now, some may consider that a bit of hyperbole (and some UC Santa Cruz alumni may feel that I impugned the good name of their mascot, which was not my intent) but the numbers don’t lie. The measurable difference in local processing performance between SQLite and a modern edge data management system like Actian Zen is two to three orders of magnitude. So, the cheetah-to-banana slug comparison is quantitatively accurate.
In fact, strap in—because I’m doubling down on the banana slug analogy—but I’m going to swap out the cheetah (with its top speed of 70 mph) for a peregrine falcon, with a top speed of 240 mph. The reason? In considering modern edge data management for IoT, or any edge-to-cloud environment for that matter, you have to consider performance in terms of distributed data reads and writes—across devices, gateways, on-premises workstations, and servers, and, of course, the cloud. Such distribution poses complications that SQLite simply cannot overcome with anything remotely resembling speed.
Let me give you an example: SQLite only works as a serverless database, which mandates integration with, and therefore transformation of its data into, a client-server database. You’ll often see SQLite paired with Microsoft SQL Server, MySQL, Oracle, or Postgres. Additionally, there are stealth pairings in which SQLite is present but seemingly invisible. You don’t see SQLite paired with MongoDB or Couchbase, for example, but the mobile client version of both these databases is really SQLite. “Sync servers” between the mobile client and the database servers perform the required extract, transform, and load (ETL) functions to move data from SQLite into the main MongoDB or Couchbase databases.
But wait, you say: Isn’t the whole point about modern edge data management that data at the edge is going to be captured, shared, and processed by smarter components at the edge? Moreover, that some of that data is going to be sent from devices at the edge up to servers in the cloud? Why are you picking on SQLite?
So, in order, the response to your objections is yes, yes, and I’ll tell you.
Modern Edge Data Management is Shared and Distributed
Yes, we should all take it as a given that IoT and mobile environments will send data from devices and IoT sensors at the edge up to servers in the cloud. And, yes, new network standards like 5G (and industrial variants of 4G-LTE), coupled with AI running on more edge devices, will lead to more local and peer-to-peer processing. That will bring device and metadata management out of the cloud/data center to edge gateways on on-premises servers. Both scenarios share and distribute massive amounts of data, and, where SQLite is involved, will entail an explosion of ETL because SQLite isn’t going to be running on the larger servers at the edge or in the cloud. That’s where you’re seeing SQL Server, MySQL, Oracle, Postgres, and others (including Actian Zen Edge, enterprise, and cloud editions).
Which brings us to the question of why ETL matters. When you think about the characteristics of the systems that will be sharing and distributing all this data, three key things stand out: performance, integration, and security. We’ve already discussed the actual processing performance characteristics of our banana slug when it comes to local data operations. When we look closely at SQLite in the broader context of data sharing and distribution, it becomes apparent that the use of SQLite can have a profound impact on operational performance and security.
It’s All About the “T” in ETL
From a data management system standpoint, the transform action in ETL is the most critical element of that initialism. Unlike the E and L which aren’t impacted by data management systems as data transfer is a function of the virtual machine, operating system, hardware abstraction layers and of course I/O subsystems, the data management implementations dictate if, when, and how data transformations will occur.. When moving data from one database or file management system format to another, it is always necessary to reformat the data so that the system receiving the data can read it. SQLite touts the consistency of its underlying file system on all platforms, which would suggest that moving data from one platform to another requires no transformation. For an SQLite application operating as a simple data cache in a mobile device or moderately trafficked web sites this may be true. But that’s not what a shared and distributed IoT environment looks like. Today’s modern edge data management environments are fully managed, secure, and built to perform complex data processing and analysis on a variety of systems in a variety of places—on device, at the edge, and in the cloud. These are environments replete with data aggregation, peer-to-peer sharing, and other data management operations that require a transformation from a SQLite format into something else—quite possibly several something else’s.
And You Thought the Banana Slug Was Slow
That’s the second dimension where SQLite simply becomes sluggish. Actian conducted a series of tests comparing the transformative performance of Zen Core and SQLite. One set of tests compared performance of data transfers between SQLite and MS SQL Server to the same data transfer between Zen Core and Zen Enterprise Server. Both the SQLite and Zen Core serverless clients ran on a Raspberry Pi device while SQL Server and Zen Enterprise ran on a Windows server-based system.
The performance results are eye-popping: Taking a block of 25K rows from Zen Core and inserting it into Zen Enterprise took an average of 3 ms. Taking the same block from SQLite and inserting it in Microsoft SQL Server took 73 ms, or roughly 24X more time. Other tests, comparing Indexed and non-indexed updates, reads, and deletes all had similar results. Why? Because of the transformations required. In moving data between SQLite and SQL Server, the data from SQLite had to be transformed into a format that SQL Server, which has a different format and different data model, can read. When moving the data from Zen Core to Zen Enterprise Server, which rely on the same format and data model, no such transformation is necessary.
So Much for Faster, Better, Cheaper
Zen isn’t the only database with a common architecture stretching from client to server. Microsoft SQL Server has such an architecture, but it only runs on Windows-based devices. Actian Zen runs on pretty much everything—from Android-based IoT and mobile devices to Windows-based edge devices, to data center and cloud servers running a wide range of Linux implementations. Zen has a single, secure, modular architecture that allows the serverless version to interact with the Edge, Enterprise and Cloud versions using the same APIs, data formats, and file system, removing any need for transformations.
And that’s really where the distinction between the peregrine falcon and the banana slug becomes palpably real. If SQLite were capable of interacting directly with other elements in the modern edge data management environment, everyone happily using SQLite could avoid data transformations and heavy ETL. But that’s not the world in which we operate. SQLite will always involve heavy ETL, and a banana slug it will remain.
There’s an age-old tradeoff in the world of engineering development that goes like this: We can give you faster, better, or cheaper. Pick two. SQLite promises faster but in practice delivers slower, as the benchmarks above prove. That leaves better and cheaper—except that, as we’ll see, with SQLite we don’t even get better or cheaper. Stay tuned for the next post in this series, where we’ll discuss why SQLite is not better. After that, we’ll take a sharp, falcon-like look at total cost of ownership.
You can learn more about Actian Zen. Or, you can just kick the tires for free with Zen Core which is royalty-free for development and distribution.
Subscribe to the Actian Blog
Subscribe to Actian’s blog to get data insights delivered right to you.
- Stay in the know – Get the latest in data analytics pushed directly to your inbox
- Never miss a post – You’ll receive automatic email updates to let you know when new posts are live
- It’s all up to you – Change your delivery preferences to suit your needs