
Friday, November 1, 2019

Tribute to Netezza (Did the cloud kill Netezza?)



Now that IBM has finally "killed" Netezza (or PureData System for Analytics, as IBM's marketing "painted it blue"), I think it is time to pay tribute to one of the most amazing systems I have had the pleasure to work with.

Netezza claimed to have invented the data warehouse appliance market, and given that even Larry Ellison credited them with forcing him to "create" the Exadata machines, I tend to think that they were right in that claim and that they really changed the DWH market.

I think IBM made a huge mistake by discontinuing (that is, killing) those systems, so I will explain here why I think this was wrong. In a second part of the article I will take a more technical approach and explain how I think Netezza should have evolved under the reins of IBM.


Did the cloud kill Netezza?


I don't think so. There are many articles claiming that this was the reason why IBM killed it (you can look for them on the Internet; here are a couple of examples: The Register & NextPathway). But I completely disagree with them. Indeed, Netezza was perfectly prepared for the cloud world, and I knew of a few customers that bought Netezza systems to provide database-as-a-service to their own customers. Yes, a pure database-in-the-cloud offering based on a Netezza system.

You have to consider that even in the cloud your software must be installed on a piece of hardware. In the case of Netezza it all came together in a truly plug-and-play system: even the most complex systems, composed of several racks, took us a couple of days at most to install, including unloading from the truck and unpacking, installation in the data center, cabling, configuration and tests. And voilà! Your system was ready to use and prepared to load data instantly, with no need to spend more time configuring. Its simplicity made it an ideal approach for any kind of user, and thus ideal for the cloud.
One thing those systems lacked was a complete quota system to distribute and effectively restrict the use of storage space in a machine. An attempt to add something like that arrived with the latest releases of the software: you could buy and pay for a half-rack system but have a full rack installed and limited by software. This way, in the future you could simply pay IBM more and they would release more hardware by entering a code into the system. I was told that this was a common practice in mainframe systems, so you could increase computational power on demand without migrating the hardware. You could also limit the resources allocated to a given user or group of users, so it was entirely possible to provide a multitenant system with different performance levels depending on the subscription. As I said, we had several customers that put Netezza in their own private clouds and sold the service to their own clients, proving that it was indeed ready for the cloud.

You may argue that another issue with the hardware was its need for FPGA chips to accelerate the queries.
Well, this was one of the distinctive features of the system, but again I don't think it was a limitation for a cloud-based system.
If this argument were true, all the current GPU-based databases (like Kinetica or MapD, since rebranded as OmniSci) would be doomed too, and this is not the case. Indeed, they all claim to be cloud ready, and they require cloud systems equipped with GPUs to deliver the promised performance.
So, again, the FPGA factor was not a limitation: in any case you have to provide hardware to run your database, so you could just put your Netezza racks in your data center and start providing the service.
Indeed, this is an advantage from my point of view, as many customers, for the most diverse reasons, still don't allow the use of the cloud for certain data loads because they consider the information too sensitive to risk exposing: banks, governments, intelligence agencies, etc. Yes, every day there are fewer of them, and every day more of them happily embrace the cloud, but there are still many customers that will appreciate the possibility of installing their DWH on their own premises, with a complementary cloud installation if any.

An all software Netezza


But let's assume that being tied to a hardware infrastructure based on FPGAs and a certain architecture is a limitation. The thing is that a software-only version of Netezza already existed. It was created at first just as an educational environment that could run in a virtual machine (I still have some of them to play with on my laptop), but it later evolved so that the most basic systems, those not provided with an FPGA, could be sold to small and medium enterprises at an affordable price. From there, the evolution to a pure, scalable cloud system is clear, and the path was already defined. It would have been an easy task for IBM to offer two flavours of Netezza on their own cloud: an FPGA-based offering and a standard offering, both capable of scaling and increasing capacity on demand, as Redshift or Google BigQuery do right now, but with the advantage of many years of experience in PostgreSQL and FPGA development to offer better performance at a fraction of the cost.

A PureData System for Analytics N3001-001 with no FPGAs

The return of the non-dead


Now that we have just celebrated Halloween, maybe it is the right time to comment on IBM's return of Netezza... Yes, you read that correctly: maybe Netezza is not as dead as we all thought. Or is it?
The thing is that someone at IBM apparently realized that by acquiring Red Hat they had also acquired a huge number of PostgreSQL users. And then they remembered that Netezza was also based, at the very beginning, on PostgreSQL, although heavily modified for the purpose pursued by the Netezza engineers.
So, they are offering a new product called Performance Server that is also capable of using the Cloud Pak engine to offer a Netezza-like service. I haven't studied this new offering very closely, but apparently you can install it on-premises or in the cloud, and it is based on the same Netezza code but with SSD storage. There is no mention of FPGA accelerators, so I'd say that it is probably based on the Netezza software offering I mentioned before.
So, IBM is validating my line of thinking. Thanks for that, guys!
You can find a summary of this in this IBM blog entry.

In any case, although this is a way to continue the life of Netezza, I would have envisioned a much bolder future for the system. I will talk about what I would have done if I had had the power to decide in another entry (this one is already too long!).

So, why did they kill it in the first place?


Well, I don't have the answer; I can only speculate and give my opinion. And my opinion is that it was a mistake. They are trying to repair it now with the Performance Server mentioned above, but I think it is too late (the same happened to Sun Microsystems when they decided to revive Solaris x86, or to HP with HP-UX: once killed, it is almost impossible to bring things back to life).
As I said, in my opinion all the rumours about Netezza not being ready for the cloud and its need for special hardware are just excuses.
IBM had the right to do what they did, and probably the real reason was just to save money and concentrate resources on just one product, DB2, as their strategy was to provide a unified and universal database: a one-size-fits-all approach (exactly what we at Netezza said didn't work... we used this reasoning to attack Exadata, and very successfully, by the way).
I heard rumours claiming that it was the "DB2 mafia" that simply couldn't allow any other database in the neighbourhood, that 9 out of every 10 DWHs sold by IBM were Netezza versus 1 for DB2 or other databases, and that they killed it anyway because they were jealous. I don't know if this is true. It would be interesting to have the real sales numbers to get a clearer picture of what happened and of how successful the IIAS that superseded the Netezza and DB2 systems really is.

In any case, an era is gone, and I'm proud to have had the chance to work with such beautiful, powerful and interesting systems and to have helped lots of customers with them.