Behind the scenes of a non-producing
EOS block producer
In May 2018, we started working on our block producer infrastructure. We were committed day and night, trying to follow the pace on a global scale. It was exhausting, but thanks to several bottles of wine and beer, baguettes and our famous French cheese, we survived!
Our whole story started in Paris in November 2017 with BitConseil. We were deeply impressed by EOS and Dan Larimer's vision. ADAPP, as an infrastructure provider for many blockchain nodes, was ready to deliver this kind of infrastructure. We talked about how Ethereum's smart contracts ensure their own termination through the gas mechanism and resource allocation on a peer-to-peer network; but beyond that, we talked about EOS as a game changer. EOS introduces concepts and technologies that no other project was dealing with in production: WebAssembly (WASM), Ricardian contracts, low-latency block confirmation…
Being able to interrupt a smart contract, or any job, in case of bad behavior, whether passively, actively or even through a court, is something interesting. Because once people know exactly how consensus is reached, some of them will try to trick the system. If that happens without a procedure (as we have already seen), or with a procedure not directly under control, it is much harder to handle. Most block producers are focused on simplifying this process. This is the well-known "code is law" paradox, best summed up as "do I let code handle it or not?"
This question brings us to another, even more interesting one: how do you manage today's software? With a day-to-day software operation style, you interrupt, patch or update, apply changes live, or define a procedure to improve a service. Direction and execution stay inside your production team. With Ethereum's smart contract system, you cannot terminate a smart contract. With EOS smart contracts, you cannot stop one until you have elected someone to take a position on its liveness.
With this in mind, if all the actors running nodes for the network know each other and are not evenly distributed around the globe, the project is close to a distributed database managed by a single admin, or even worse, similar to what you can see with European SEPA running on Cassandra.
franceos is still a newborn for the moment. We wanted to build a block producing infrastructure: a French actor executing the community's choices on what should live on EOS and what should not.
As a BP candidate, it is our role to describe our tech and to give you a better understanding of what we do as a block producer.
We started digging into EOSIO, the Jungle testnet repository, and the hkeos/bp_infrastructure repos. Based on their work with LXC technology, we decided to dig deeper and to avoid using it…
We admired their beautiful automation work, for example tool choices like WireGuard (thanks to Jason A. Donenfeld's work) and HAProxy (from the eponymous French company), both of which we were already using daily. We were also happy to give Patroneos a try.
On the other hand, we preferred not to include MongoDB in the first steps.
Firstly, franceos relies on dedicated servers. Just as EOS Canada has a great partnership with Google, we have a great partnership with OVH, as one of our team members worked on load balancing solutions for this company. We know they are not perfect, like every hosting provider, but OVH's Chief Technology Officer is transparent and very communicative when he has to explain the reasons behind failures, and always has solid action plans to avoid ending up in the same situation again. We cannot really say the same about AWS :-).
Secondly, OVH is not bound by the Patriot Act, since our datacenter is based in France. OVH protected the famous WikiLeaks during its exile, until the French government put its hands on it. Our EOS infrastructure will be easy to operate thanks to automation (we automate everything we do). Cloud servers are an affordable investment for us, and since we automate everything, we will be able to switch from one provider to another quickly, or even move from cloud to bare metal infrastructure very fast.
If this cloud does not deliver what the EOS network needs, we are ready to move.
Technically, we rely on 5 dedicated servers (bare metal, but hosted by OVH): four with 64 GB of RAM, and one with 256 GB.
- One server is dedicated to peering and API;
- One server is dedicated to secure peering;
- One server is dedicated to peering and testing;
- Two servers (master & slave) are dedicated to block production;
- N small servers are dedicated to VPN, monitoring, alerting, access management, EOS Report services, and auto-building new releases and deployments.
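To give an idea of what the monitoring side of those small servers looks like, here is a minimal sketch of a health check against a peering node's chain API. The `/v1/chain/get_info` call is the standard nodeos chain API; the endpoint URL and the lag threshold are illustrative assumptions, not our production values.

```python
import json
from datetime import datetime, timezone

# Illustrative endpoint, not our real one. In production, an HTTP GET on
# <node>/v1/chain/get_info returns the JSON fields used below.
API_URL = "https://api.example.org/v1/chain/get_info"
MAX_LAG_SECONDS = 30  # illustrative alert threshold

def head_block_lag(info: dict, now: datetime) -> float:
    """Seconds between 'now' and the node's reported head block time.

    nodeos reports head_block_time as an ISO timestamp without a
    timezone suffix, e.g. "2018-08-01T12:34:56.500" (UTC implied).
    """
    head_time = datetime.fromisoformat(info["head_block_time"]).replace(
        tzinfo=timezone.utc
    )
    return (now - head_time).total_seconds()

def is_healthy(info: dict, now: datetime) -> bool:
    """Healthy when the head block is recent enough."""
    return head_block_lag(info, now) <= MAX_LAG_SECONDS

if __name__ == "__main__":
    # Canned get_info response so the sketch is self-contained; a real
    # check would fetch API_URL instead.
    sample = json.loads(
        '{"head_block_num": 123456, "head_block_time": "2018-08-01T12:00:00.000"}'
    )
    now = datetime(2018, 8, 1, 12, 0, 10, tzinfo=timezone.utc)
    print("healthy" if is_healthy(sample, now) else "ALERT: node lagging")
```

A real deployment would wire this into the alerting stack rather than printing, but the staleness logic stays the same.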
If you want to know how much those main servers cost us, just check the OVH website: it is self-explanatory! All of this is self-funded.
Nothing fancy for a block producer infrastructure, but it is always better than nothing. We also run a Jungle testnet peering node on the same machine as our backup server. And we still plan to build our own infrastructure in the meantime.
In our infrastructure assumptions, we identified several key points that any cryptocurrency community is aware of. But first, let's see why hardware is an endless discussion.
BARE METAL POINT
Usually, when you have to propose a bare metal infrastructure, you consider a purely dedicated setup: a safe place, known as a datacenter, providing the network infrastructure and the physical hardware. But as soon as you operate on a global scale, you have to introduce multiple layers of firewalls and protect your ISP links. For us, relying on a single infrastructure is a single point of failure for a block producer.
The argument of some block producers is well known in the community: servers should not be hosted by a cloud company, because EOS needs to stay independent and resilient in case of censorship. Thanks to this philosophy, we have our last guardians, like LibertyBlock. And it is not the first time: French communities, and our team, have had to deal with unfair politics and to use privacy tools built by people who deeply respect privacy (remember the ProtonMail attacks). But do you trust your equipment provider, like HP or Dell? Do you trust your electricity provider?
In our technical design specification, we also had to acknowledge that hosting providers are currently better equipped to give strong availability guarantees, like a 99.98% SLA, and world-class networking through connections to a huge number of ISPs; they also have deals with electricity suppliers.
We found that, with our current resources, the most pragmatic solution is to get our own servers with a housing solution. But we are also planning to install these servers in the Swiss mountains. It is not a fairy tale: we have been committed to this since the beginning and are reviewing different options.
Starting with virtual machines in 2000 (VMware, Citrix Xen, Red Hat KVM), virtualization is now widely used. The benefit perceived by cloud companies was to abstract resources and deliver server-like access to shared hosts, with performance penalties due to virtualization. These companies invested heavily in securing their hypervisors against attackers, and virtualization is now widely regarded as a secure solution. Indeed, with virtualization you emulate a whole new computer and, as said earlier, you protect yourself against compromised hosts. That is why it became our third layer of security. We used and tested widely known virtualization technologies to deliver fine-grained enough security.
In 2013, containers were the next wide adoption wave in the cloud computing ecosystem, heavily tied to the Docker company. Docker shares several qualities with LXC, but offers more flexibility for application lifecycle management. Containers started as a technique introduced in the Linux kernel to box the resources needed by a process. Inspired, as far as we know, by BSD jails, a myriad of tools came into the Linux tree: namespaces, network management, sandboxing.
Two aspects are now part of the container security world: bad-behaviour detection services, and syscall filtering for the processes running inside the so-called container. We rely on CoreOS' solution (rkt) for our second layer of security.
What we call active security is about working with the EOS daemon, known as nodeos. Specifically, today there is the producer plugin, dedicated to the BP; the chain plugin, which does all the work related to chain management; and the history plugin, which deals with transactions and actions. We have to detect new or unknown behaviour from the daemon and be ready to react. We actively monitor our processes and back up our history daily. As part of the Jungle testnet, we test new releases and deploy them when we are confident enough in the quality of the binaries.
This is the first mandatory layer of security.
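One concrete example of such an active check, sketched under our own assumptions: the chain API's `get_info` reports both `head_block_num` and `last_irreversible_block_num`, and if the gap between them keeps growing, finality has stalled and someone should look at the node. The threshold below is an illustrative value, not a recommendation.

```python
# Detect stalled finality from /v1/chain/get_info fields. EOS produces
# one block every 0.5 s; the gap threshold here is illustrative only.
MAX_IRREVERSIBLE_GAP = 400  # blocks

def irreversibility_gap(info: dict) -> int:
    """Blocks between the head block and the last irreversible block."""
    return info["head_block_num"] - info["last_irreversible_block_num"]

def finality_stalled(info: dict) -> bool:
    """True when irreversibility lags too far behind the head block."""
    return irreversibility_gap(info) > MAX_IRREVERSIBLE_GAP

if __name__ == "__main__":
    # Canned get_info fields; in production they would come from the
    # chain API of the monitored node.
    sample = {"head_block_num": 10_000, "last_irreversible_block_num": 9_670}
    gap = irreversibility_gap(sample)
    print("gap:", gap, "stalled" if finality_stalled(sample) else "ok")
```

Run periodically against each node, this kind of check catches a fork or a loss of producer quorum well before users notice it.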
API & BP JSON
We decided not to use Cloudflare protection. Think about it: Cloudflare's own chief says he has too much power over the Internet. Even if that power is used in good faith, we know that centralisation leads to arbitrary decisions and possible leaks. We use standard Let's Encrypt certificates and plan to get our own EV certificate. We use OVH's anti-DDoS protection. Local communities use our API endpoints, and we want to enhance them with new calls to enable more development.
EOS comes with different tools. As usual in cryptocurrencies, you have multisig to manage this kind of infrastructure and share responsibility between all the people involved in the project. Thanks to EOS features, like the permissioned role system for keys, we have developed our own management process for the owner, active, signer and claiming keys. This nice feature is already used by our team to manage upgrades, and is part of our process for switching to bare metal servers in case of bad events.
Concerning hardware and the securing of signing keys, we looked at HSM modules and YubiKeys, but we are still evaluating our own solution.
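To illustrate what such a shared-responsibility permission looks like, here is a minimal sketch of the "authority" object that EOS's `updateauth` action expects, with a hypothetical 2-of-3 policy. The key strings and the threshold are made-up examples, not our real keys or policy.

```python
# Build the JSON "authority" object used by EOS's updateauth action.
# The keys and the 2-of-3 policy below are fictional examples.

def make_authority(threshold: int, keys_with_weights: list) -> dict:
    """An EOS authority is satisfied when the summed weights of the
    signing keys reach the threshold. 'accounts' and 'waits' (delegation
    to other accounts, time delays) are left empty in this sketch."""
    return {
        "threshold": threshold,
        "keys": [
            {"key": k, "weight": w}
            # EOS requires the keys list to be sorted in ascending order.
            for k, w in sorted(keys_with_weights)
        ],
        "accounts": [],
        "waits": [],
    }

def can_sign(authority: dict, present_keys: set) -> bool:
    """Check whether a set of available keys satisfies the authority."""
    total = sum(e["weight"] for e in authority["keys"] if e["key"] in present_keys)
    return total >= authority["threshold"]

if __name__ == "__main__":
    # Hypothetical 2-of-3 active permission shared across a team.
    auth = make_authority(
        2, [("EOS6Alice...", 1), ("EOS7Bob...", 1), ("EOS8Carol...", 1)]
    )
    print(can_sign(auth, {"EOS6Alice...", "EOS7Bob..."}))  # two signers suffice
    print(can_sign(auth, {"EOS8Carol..."}))                # one signer does not
```

Splitting the owner and active permissions across team members this way means no single compromised machine, or person, can sign alone.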
PLANNING THE UNPLANNED
What about Spectre, Meltdown and all the related attacks to come? We consider that servers facing the Internet with TCP/IP access are, by design, vulnerable and unsafe. So our setup is similar to other BPs', with a block producing node totally independent of its Internet connection. We make wide use of private networking and of all the suggestions we receive from our community.
Black swans, and planning the unplanned, are part of our job. As described by Nassim Nicholas Taleb, we tend to handle it the way the aviation industry does. We started by defining a basic redundancy of 3 everywhere: 3 BPs with separate versions, 3+ firewalls, 3+ peers, 3+ IPs, etc. It has a cost, for sure, and we select those layers of redundancy one by one.
The same is true for CPU and RAM: as a resource provider for our community, we have to plan resource distribution. You may have read the story of Intel's 3D memory, known as Optane: we are working on using it for EOS.
Lately, we introduced EOS Report to provide useful insights and network information to our community. This tool is still in its early stages, but we want to open it up more and more. We want to share with the community more ways to measure the EOS network's healthiness, robustness and responsiveness.
Several challenges are on the way. Winter is coming. We are still here, and we are here to stay!
Win 1 000 franciumtoken airdrops: the first 42, August 2018.