Write Corda Next-Gen Once, Run Anywhere
A couple of months back, a brand new shiny ARM-based Apple Silicon Mac dropped onto my desk here at R3. Figuratively, that is. First impressions were that it’s a little more “robust” than the previous generation of Intel Macs, so I suspect that had it been literally dropped, the desk would have suffered more than the laptop. But truth be told, I do tend to cart my laptops around a fair bit and some robustness never hurt. I found it more than a little ironic that technology designed to rid us of the shackles of cross-platform compatibility and inconsistent environment problems became the stumbling block in running Corda Next-Gen on my new Mac hardware. But a little perseverance and some pull requests later, I am now happily using this Mac as my day-to-day development machine on the Corda Next-Gen project.
I have been a Mac user for well over a decade and I can reveal that I have a bit of a soft spot for ARM processors too. A good chunk of my career has been spent building all sorts of interesting ARM-based devices. ARM’s history as a company dates back to the vibrant UK home-computing scene of the 1980s, when many of us realized we’d probably end up doing this for a living one day.
So the combination of the two was something that I was very keen to try out. And by the way, Amazon is not far behind in thinking that there’s a future for ARM processors outside embedded devices. Plus, Corda Next-Gen is pretty versatile, so maybe one day we will see a use case for running it on embedded devices. More speed with less power: what’s not to like? Well, for one, you have to rebuild all your native code. That shouldn’t be a problem for Corda as it’s JVM based, so write once, run anywhere. Well, it turns out that’s sort of true.
Some time back, the distributed and web service-oriented software world started releasing its software via a thing called Docker Containers. Docker promises “Run Anywhere” and it’s just about the first statement that you see when you visit their site. There is truth in that statement and there’s also no doubt that running software in containers has made immeasurable improvements in the ease of distribution, deployment, and orchestration of web services and other software components.
Docker containers, for those who are unaware of them, are based around Linux kernel features. They create sandbox environments on a host Linux machine in which you can run services as if they are running on the same machine, in the same state, every time you launch them anywhere. They also facilitate scalability quite nicely. Imagine you have a simple stateless service responding with some immutable data on request. Put that service in a container and a load balancer in front of a bunch of such containers and you now have multiple instances of the exact same service, each processing a share of the work in exactly the same way. Of course, the architecture needs to become more complicated as the functionality provided by those services becomes less simple and data becomes mutable, but the principle remains similar.
To Docker’s advantage, back in the hedonistic days of 2020, if you were building software in between watching episodes of Tiger King, just about any mainstream computer that you were building it on would likely have an Intel architecture chipset. That meant with advances in virtualization, even the OS became a formality when it came to building containers for software deployment. Whilst typically containers used in production were deployed on cloud VMs running Linux, it was also no problem for development purposes to fire up Docker Desktop on a Mac and have Docker create virtualized Linux VMs seamlessly to run your containers in.
So how does this relate to Corda Next-Gen?
One thing the containerization world probably didn’t consider a priority when it set out is the uptake of an alternative processor architecture for desktop computers.
My colleagues at R3 have already touched on the importance of high availability and horizontal scalability with regard to the Corda Next-Gen architecture in other blog posts. It should be relatively clear by now that a product that utilizes container orchestration to realize both of those goals is, to an extent, at the mercy of the processor architecture of those containers. If you tried to follow the Corda Next-Gen quick start guide a few months ago, you would have got so far and then started running into problems trying to publish and launch some of the containers on an ARM device. The good news is that many of those problems have already been resolved in the Corda Next-Gen build system. The other good news is that those that haven’t been are in the hands of our cluster management team and will be resolved shortly. The OK news is that right now you can already work around the remaining issues with just a few tweaks.
How does Docker handle platform architecture?
This leap into cross-platform architecture puts us back in a world where we suddenly have to care about how we deploy software again. That is less because of the way we write and build it (even C++ is sort of portable if you squint a bit and avoid the bits of Boost which are written in assembler) and more because of the processor architecture on which we execute it. We are left with a situation where Docker Desktop is fully functional on Apple Silicon Macs; however, it cannot virtualize Intel Linux virtual machines, only ARM ones. That’s because virtualization is not emulation.
Before we delve into what does and doesn’t work in Corda Next-Gen, it would be helpful to learn a little about how Docker handles ARM support. The best source of information for this is, unsurprisingly, the Docker page on Apple Silicon support and their page on Docker manifests. In short, Docker manifests are files that act as an abstraction between the request made to Docker Hub and the image that will be pulled down. The manifest contains information about the various Docker images available in Docker Hub, and which image is pulled down locally is determined by filtering that information. When an ARM host attempts to pull an image via a manifest, and the manifest provides details of an ARM image, the Docker client will pull that ARM image instead of the Intel-based one. If no ARM image exists, Docker will pull the Intel one and run it under the emulation provided by QEMU under the hood.
Whilst this emulation will work to a point, it is really a stop-gap solution to get something running. Docker does not advise relying on it in your products, given they know a number of features will be unsupported. I will save you the trouble of trying this yourself and tell you that if you were to attempt to run Corda Next-Gen Docker images under QEMU, they would likely not start up. We will not dwell on why here; the goal should be to get Corda Next-Gen running on ARM images.
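If you want to know which side of that fallback a given image will land on, you can look at its manifest before pulling anything. What follows is only a rough sketch, not an official procedure: the helper names host_docker_platform and manifest_architectures are invented for illustration, and it assumes your Docker CLI provides the docker manifest inspect command.

```shell
# Hypothetical helpers; the names are illustrative, not part of Docker or Corda.

# Map a machine name (as printed by `uname -m`) to a Docker platform string.
# With no argument, the current host's machine name is used.
host_docker_platform() {
  case "${1:-$(uname -m)}" in
    arm64|aarch64) echo "linux/arm64" ;;
    x86_64|amd64)  echo "linux/amd64" ;;
    *)             echo "unknown" ;;
  esac
}

# Print the distinct architectures advertised by an image's manifest list.
# May print nothing if the image publishes a single-platform manifest.
manifest_architectures() {
  docker manifest inspect "$1" \
    | grep -o '"architecture": *"[^"]*"' \
    | cut -d '"' -f 4 \
    | sort -u
}
```

Calling manifest_architectures with an image reference of your choice and looking for arm64 in the output tells you whether an ARM host would get a native image or fall back to the emulated Intel one.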
What works out of the box in Corda Next-Gen and what doesn’t?
You are able to build Corda Next-Gen on an ARM host in exactly the same way that you can build it on an Intel host. Build in this context means a compilation of the Kotlin code so that it runs on a JVM. Corda Next-Gen requires that you build it on a Java 11 JDK. The quick start guide explains that our chosen JDK is the one from Azul. Azul provides ARM-compiled JDKs for both Linux and Apple Silicon Macs from their download page. So all is good in this regard.
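A quick way to confirm that the JDK on your path really is an ARM build, rather than an Intel build running under translation, is to ask the JVM for its os.arch property. This is a small sketch under the assumption that java on your PATH is the JDK you intend to build with; the function names are made up for this example.

```shell
# Hypothetical helpers; the names are illustrative only.

# Read `java -XshowSettings:properties` output on stdin and print the
# value of the os.arch property.
parse_os_arch() {
  sed -n 's/^ *os\.arch = *//p'
}

# Ask the JDK on the PATH which architecture it was built for.
# The settings listing is written to stderr, hence the redirect.
jdk_arch() {
  java -XshowSettings:properties -version 2>&1 | parse_os_arch
}
```

On an Apple Silicon Mac with an ARM JDK, jdk_arch should report aarch64; a value of x86_64 would suggest an Intel JDK has been picked up instead.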
The next good news is that running Corda Next-Gen as a single process for development (using an in-process message bus rather than Kafka) works in exactly the same way on an ARM host as it does on an Intel host. We call this single development process the “combined worker.” I won’t dwell on using the combined worker here, but you can find it in the IntelliJ run configurations, which are present when you open the Corda Next-Gen project in IntelliJ.
Following some work on the build system, and with some alignment with our friends at Azul, you can also now publish Corda Next-Gen, i.e. run the publishOSGiImage task, on an Apple Silicon Mac just as described in the local development guide. The reason you can do this is that Azul has built and published their own ARM Docker images to Docker Hub, based on their ARM JDK v11, at the request of R3! When you publish Corda Next-Gen Docker images using that Gradle task, you are basing those Corda images on a base image that contains a working JDK provided by Azul. Had Azul not provided ARM versions of the JDK via a Docker manifest, Docker Desktop would have pulled down Intel images and run them under emulation. In fact, this is what happened before attention was given to ARM support for Corda Next-Gen, and hence where this journey started.
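Once an image is in your local Docker image store, you can check which architecture it was built for. The sketch below wraps docker image inspect; the function names are invented for illustration, and the image reference you pass in is whatever your publish step produced, which is not spelled out here.

```shell
# Hypothetical helpers; the names are illustrative only.

# Print "os/architecture" for an image already present locally.
image_platform() {
  docker image inspect --format '{{.Os}}/{{.Architecture}}' "$1"
}

# Succeed only if the image was built for 64-bit ARM Linux, i.e. it
# would not need emulation on an ARM host.
is_arm_image() {
  [ "$(image_platform "$1")" = "linux/arm64" ]
}
```

If is_arm_image fails for one of your published images on an Apple Silicon Mac, that image is an Intel one and would be run under emulation.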
Now we get to the part where the not-so-good news starts. What you cannot yet do is follow that local development guide to get the Corda “prerequisites” required to run a Corda Next-Gen Kubernetes cluster.
So, what are the Corda Next-Gen prerequisites?
Corda Next-Gen workers in a running cluster, by default, rely on Apache Kafka as a message bus and a Postgres database for persistence internally. These entities are what we refer to as “prerequisites” in the developer guide. In the real world, customers bring their own versions of these entities, configured based on their business needs. R3 provides instructions on how to get stock versions of these up and running for local development to ease the path into Corda’s distributed application SDK (CorDapp) development.
Unfortunately, the provider of the images used in the Corda developer guide prerequisites does not provide ARM images. This is where Apple Silicon Mac or other ARM host users must deviate.
ARM Yourself! (aka just tell me how to get it working)
This blog should be read in conjunction with the “Install Corda prerequisites” section from the local development guide. In addition to that particular section, please generally follow all instructions given in the local development guide.
Some basic Kubernetes and Helm knowledge is assumed here. For a start, you should at least know what those things are and also have some knowledge of how to interact with them from the command line. That is true of Corda Next-Gen development generally.
These instructions assume that you already have a running Kubernetes cluster on your Mac. The steps were tested against the Kubernetes cluster that comes with Docker Desktop, so it is recommended that you use the same.
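Before going further, it can be worth confirming that kubectl is talking to the cluster you expect and that its nodes report an ARM architecture. This is a minimal sketch with invented function names; it assumes kubectl is installed and its current context points at your local cluster.

```shell
# Hypothetical helpers; the names are illustrative only.

# Print the CPU architecture reported by each node in the current context,
# separated by spaces.
cluster_architectures() {
  kubectl get nodes -o jsonpath='{.items[*].status.nodeInfo.architecture}'
}

# Succeed only if at least one node exists and every node reports arm64.
all_nodes_arm() {
  archs="$(cluster_architectures)"
  [ -n "$archs" ] || return 1
  for arch in $archs; do
    [ "$arch" = "arm64" ] || return 1
  done
}
```

Running all_nodes_arm && echo "ARM cluster" is a cheap sanity check before installing anything into the cluster.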
Firstly, we cannot use the Kafka part of the prerequisites Helm chart provided in the corda-dev-helm repository. The Bitnami chart has no ARM support and, although this is a much-requested feature, there is no movement on it right now.
Corda Next-Gen puts no special requirements on a Kafka deployment, so pointing it at an off-the-shelf deployment that runs on an ARM-based Mac is sufficient. When Corda Next-Gen starts up, it creates all the topics that it needs for operation as part of an InitContainer. By default, Corda will try to connect to a Kafka bootstrap server on prereqs-kafka port 9092, but even this is configurable by applying the correct value to the Corda Helm chart. The following declaration, saved as kafka.yaml, deploys Kafka and ZooKeeper from images that do have ARM builds:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kafka-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      containers:
        - name: kafka
          image: ubuntu/kafka:edge
          env:
            - name: ZOOKEEPER_HOST
              value: zookeeper-service
          args: ["/etc/kafka/server.properties", "--override", "advertised.listeners=PLAINTEXT://prereqs-kafka:9092"]
          ports:
            - containerPort: 9092
              name: kafka
              protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  name: prereqs-kafka
  labels:
    app: kafka
spec:
  ports:
    - port: 9092
  selector:
    app: kafka
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: zookeeper-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: zookeeper
  template:
    metadata:
      labels:
        app: zookeeper
    spec:
      containers:
        - name: zookeeper
          image: ubuntu/zookeeper:edge
          ports:
            - containerPort: 2181
              name: zookeeper
              protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  name: zookeeper-service
  labels:
    app: zookeeper
spec:
  ports:
    - port: 2181
  selector:
    app: zookeeper
If you don’t know what this does and want to know, the Kubernetes documentation is the correct place to start. Kafka requires an external service called ZooKeeper to be running, which is why you see references to that in this declaration. ZooKeeper is a generic service for administering metadata about distributed systems. If you want to know about ZooKeeper and how Kafka uses it, the Kafka documentation will explain all. A few minutes of reading around these topics should enable you to understand what this YAML file is telling Kubernetes to do. It is fairly standard stuff.
To apply this to your Kubernetes cluster, now type:
kubectl apply -f kafka.yaml -n corda
Again, reading the Kubernetes documentation will explain what is happening here. After successful completion, you now have Kafka on your cluster.
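If you would rather not take “successful completion” on trust, you can wait for both deployments from the declaration to finish rolling out. The sketch below uses the deployment names from the YAML above; the function name and the two-minute timeout are arbitrary choices made for this example.

```shell
# Hypothetical helper; the name and timeout are illustrative only.

# Block until the ZooKeeper and Kafka deployments report a completed
# rollout, or fail if either does not become ready in time.
# Usage: wait_for_kafka [namespace]   (namespace defaults to "corda")
wait_for_kafka() {
  namespace="${1:-corda}"
  kubectl rollout status deployment/zookeeper-deployment -n "$namespace" --timeout=120s &&
  kubectl rollout status deployment/kafka-deployment -n "$namespace" --timeout=120s
}
```

ZooKeeper is checked first only because Kafka depends on it; if wait_for_kafka returns a non-zero status, kubectl describe and kubectl logs on the offending pod are the usual next steps.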
As with Kafka, Corda Next-Gen does not impose any special requirements on a Postgres deployment. Here you are shown how to use the official Postgres image, although we still utilize the Bitnami chart, which will rather handily configure everything for us too.
Just like Kafka, there is no Bitnami ARM image. However, the Bitnami Helm chart allows you to replace their own Postgres image with an official Postgres image if you so choose, and official ARM Postgres images are already published. For this reason, you should still follow the local development guide where it describes cloning the R3 corda-dev-helm repository and explains how to add the Bitnami repository to your Helm installation.
The bit that you need to ignore is the part where it tells you to type helm install prereqs -n corda charts/corda-dev. Instead, you are going to run that command with some extra options to tell the Helm chart to ignore Kafka completely (we installed this already) and to use the official Postgres image instead of the Bitnami one. The official image requires some extra configuration too, so all this needs to be passed to Helm when executing the install command.
What this means in practice is that you need to execute the following from the command line:
helm install prereqs -n corda \
  charts/corda-dev \
  --set kafka.enabled=false \
  --set postgresql.image.repository=postgres,postgresql.image.tag=10.6 \
  --set postgresql.postgresqlDataDir=/var/lib/postgresql/data/pgdata \
  --set postgresql.persistence.mountPath=/var/lib/postgresql/data \
  --set postgresql.volumePermissions.image.repository=alpine \
  --set postgresql.volumePermissions.image.tag="3.10" \
  --set postgresql.primary.initdb.scripts=null \
  --render-subchart-notes \
  --timeout 10m \
  --wait
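To confirm that the overrides took effect, you can list the image each pod in the namespace is actually running. This is a small sketch with invented function names; the exact image strings you see may carry a registry prefix, which is why the check below is a substring match rather than an exact comparison.

```shell
# Hypothetical helpers; the names are illustrative only.

# List every pod in a namespace alongside the container image(s) it runs,
# one pod per line, tab separated. Namespace defaults to "corda".
pod_images() {
  kubectl get pods -n "${1:-corda}" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'
}

# Succeed if any pod in the namespace runs an image whose reference
# contains the given text.
# Usage: namespace_runs_image <namespace> <image-substring>
namespace_runs_image() {
  pod_images "$1" | grep -qF "$2"
}
```

For example, namespace_runs_image corda postgres:10.6 should succeed once the install has finished, and namespace_runs_image corda ubuntu/kafka should succeed if you applied the Kafka declaration earlier.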
Deploying Corda Next-Gen
Whilst Corda Next-Gen is built and published in the same way on an ARM host as on an Intel host, we must tweak its installation into the cluster a little. This is only because we’ve played around with the prerequisites above, and we need to specifically tell the Corda Helm chart that our Canonical Kafka deployment does not use TLS or SASL. Where the guide tells you to execute helm install corda -n corda charts/corda --values values.yaml --wait from the root of the corda-runtime-os repository on your local machine, you must instead execute it with two extra parameters:
helm install corda -n corda \
  charts/corda \
  --values values.yaml \
  --set kafka.tls.enabled=false \
  --set kafka.sasl.enabled=false \
  --wait
So now you have Kafka, Postgres, and Corda running in your cluster. You’ll probably want to jump back into the standard Corda Next-Gen developer guide now and get going on building and installing a CorDapp into your cluster. The process for doing this is the same on an ARM host as on an Intel host.
A number of developers working on Corda Next-Gen here at R3 are already using Apple Silicon Macs as their day-to-day development machines and are following these guidelines alongside the usual Corda Next-Gen ones. Please do not be deterred from giving this a try: thanks to the work already done on Corda Next-Gen, only a few relatively minor tweaks are required to get you up and running.
Corda Next-Gen ARMed and Dangerous
So hopefully you now have a running Corda Next-Gen cluster on your Mac and, because all the ARM puns have likely been used up, this is a good place to wrap up and explain our status relating to ARM support at R3.
- The Corda Next-Gen Cluster Management team is busy working out how to align Intel and ARM host images for the prerequisites, so one day soon the instructions outlined in this blog will become moot. Users developing against Corda Next-Gen on an Apple Silicon Mac will then have the exact same experience as on any Intel-based device.
- The Corda Infrastructure team now builds Corda Next-Gen on an ARM instance on AWS as a nightly build, so we now officially support ARM at the build stage. Any ARM-specific problems not picked up during local development will additionally be flagged to R3 developers if this nightly build fails.
The rest is open to our customers. If you have an ARM-based project that you feel might benefit from using Corda Next-Gen, I would encourage you to give it a go. Our product managers would love to hear from anyone who’d like official support for Corda on ARM. For instance, at CordaCon 2022, two collaborating third-party organizations gave a talk on using Corda to enable pay-per-use transactions on embedded devices in the manufacturing sector. Perhaps your project could run Corda directly on such devices, reducing the need for external Intel-based hardware. Perhaps you have other ideas for it that we’ve not yet thought of. Maybe you just like playing with dev boards and think this sounds cool. Corda Next-Gen is open-source, so if you can play with it and invent new ways of using it for yourselves, please do so. Then be sure to let us know!