If you can’t stand the heat, get out of the kitchen

Don Travlos
Sep 10, 2018

Never have I felt this more than in the last few weeks, since launching our delegation service, You loaf: We bake (YL:WB). But as I always tell people, I have never been beaten by a computer, no matter how tough the going gets.

Along the way, I am pleased to say, we have learnt a lot of lessons and no, it’s not too hot for me in the Tezos kitchen! Our philosophy is this: Tezos is in beta, so we push the system now so that when the launch takes place, the ecosystem is strong enough to handle all eventualities. We’ve put in a bunch of backup systems, so we can carry on baking even if part of our infrastructure falls over.

At the outset, we also said we’d only start charging fees after Cycle 25, on the understanding that there would obviously be glitches. YL:WB is the only delegation service which still has a 0% fee. It will stay this way until we’ve ironed out all the kinks. We’ve learnt a lot along the way and can now reach that magical 100% uptime milestone. The challenge has been balancing the cost of adequate redundancy against potential returns. The Tezos baking system is calibrated to a 5.5% annualised return, which leaves little room for manoeuvre if we are to keep a positive return.

I am grateful to all our delegates who stayed on despite a rocky start.

I think we’ve had bad luck: in my 30-plus years of working with computers, I’ve never had such a run of system failures. It started with a hard drive failing, so I decided to run two nodes and ordered a second computer. I now have one machine baking actively and the other running in the background, ready to take over baking if the first fails. It’s important to set this up correctly though, because the punishment for double baking is severe.

No sooner had I commissioned my second node than I had a power failure in the middle of the night. Not a major power cut affecting our whole neighbourhood. No… the 13A fuse in the plug blew. The timing wasn’t great: I only picked it up in the morning, and by then I had lost some endorsements.

I’ve also been getting to grips with the nitty-gritty of the node software, tinkering with the P2P system. I realised that the longer your node is online, the more robust it becomes. So running two nodes continuously is good practice. If you only start the second node when the first falls over, you risk missing some action while it links into the network. With a single node you also risk it being corrupted when you start up again. It can also take time to download the whole chain, which means lost endorsements and a missed block. Not to mention ISPs monitoring data activity, interpreting high volume as illegal streaming and throttling you. This happened to us, but fortunately it was resolved with a single call to our service provider.

Another point of failure that I identified was the internet connection. I’d be more than happy to pay for fibre, but Manx Telecom does not offer this in my area — despite being located less than 200m from the exchange. (I throw my hands in the air and cry — Island Life!!!). Having two internet connections is like running two nodes — redundancy.

If that wasn’t enough, while I caught up on my “zzzzz” at 23:38 on Friday night, the online baking machine froze. I checked all the logs: it wasn’t the Tezos node, there was no internet failure and no hack. I’m not running a rinky-dink, no-name brand machine: it’s an Asus with Samsung RAM and a Samsung 1TB SSD which is nowhere near full. I ran a full diagnostic on the RAM: no problems. How do you see if the machine has frozen without physically eyeballing it or pinging it every hour? That isn’t realistic for what should be a relatively passive delegation service after initial setup. The best solution is human intervention, but I am only human and do need to sleep! These types of hardware errors should wash out over time. Honestly… what’s the probability that it keeps failing, and always in the middle of the night? During the day, I would have picked it up much faster as I regularly check.
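One way to spot a frozen machine without eyeballing it is a small watchdog running on a different machine. What follows is only a minimal sketch, not what I actually run: the host, port, five-minute threshold and `alert` callback are all hypothetical placeholders, and it only checks that a TCP port still accepts connections, which a machine with a hung application but a live kernel might still do.

```python
import socket
import time
from datetime import datetime, timedelta, timezone


def is_stale(last_seen: datetime, now: datetime, max_age: timedelta) -> bool:
    """Return True when the last sign of life is older than max_age."""
    return now - last_seen > max_age


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


def watch(host: str, port: int,
          max_age: timedelta = timedelta(minutes=5),
          interval: float = 30.0,
          alert=print) -> None:
    """Probe host:port forever; call alert() once it has been silent too long.

    host, port, max_age and interval are hypothetical values to adapt.
    alert() is called on every failed probe after the threshold is passed.
    """
    last_seen = datetime.now(timezone.utc)
    while True:
        now = datetime.now(timezone.utc)
        if port_is_open(host, port):
            last_seen = now
        elif is_stale(last_seen, now, max_age):
            alert(f"{host}:{port} unreachable since {last_seen.isoformat()}")
        time.sleep(interval)
```

In practice `alert` would be swapped for something that wakes a human up, such as an SMS or a phone notification. Restarting the baker automatically is deliberately left out because of the double baking risk.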

It’s becoming increasingly evident to me how different this is from mining, where my machines seemed to hash faster when I hard reset them often, and I could set them to restart automatically. This is risky with Tezos because of double bakes. I could run the service from a tier 4 data centre, but there’s no guarantee of 100% uptime there either. Basically we have to accept that 100% uptime is not achievable, just like in PoW mining. Realistically, the longer you go, the more likely you are to have an outage.
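To put a rough number on "the longer you go, the more likely you are to have an outage", here is a back-of-the-envelope sketch. It assumes each cycle carries the same small, independent chance of an outage, which is a simplification, and the 2% figure is purely illustrative, not a measured value.

```python
def p_at_least_one_outage(p_per_cycle: float, cycles: int) -> float:
    """Chance of one or more outages over a number of cycles.

    Assumes every cycle has the same, independent outage probability.
    The chance of zero outages is (1 - p) ** cycles, so we take the complement.
    """
    return 1.0 - (1.0 - p_per_cycle) ** cycles


if __name__ == "__main__":
    # 0.02 is a made-up per-cycle outage probability for illustration only.
    for cycles in (1, 10, 50, 100):
        chance = p_at_least_one_outage(0.02, cycles)
        print(f"{cycles:>3} cycles: {chance:.1%} chance of at least one outage")
```

However small the per-cycle risk, the result creeps towards 100% as the number of cycles grows, which is why redundancy only reduces the odds and never removes them.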

Anyway, the good news: I paid out the first rewards last week and until I have this thing bedded down, there will be zero fees.

So where have we got to? I now have two machines running nodes, both on a UPS which can run for a few hours, with a backup internet service. I have also dusted off a laptop which we were no longer using, to run a third node with its own inbuilt UPS. That way if my kid trips his gaming computer in the attic, it doesn’t impact my Tezos baking and there is no risk of starting the Third World War!! I also have the option of moving the laptop to a second site with its own independent power and internet supply!

Apart from the many hours dedicated to hardening our system and spending a bit of fiat, what did we lose? Two blocks and about ten endorsements in all. Against the roughly 900 Tz we have earned since launch, it’s peanuts. And remember, we still aren’t charging a fee. Don’t be fooled by other services that are not open about having issues. I am pushing the system, trying to break it. Break it I am, but I am coming out stronger, with a deeper understanding of potential issues.

I’ll leave you with a few cheesy puns… At YL:WB we’re making sure we’re not running on any half-baked systems. We won’t sugarcoat it — we’re transparent about all the problems we’ve had so far. Baking on Tezos is not a piece of cake. With all our redundancy we’re certainly not putting all our eggs in one basket and we’re committed to bringing home the bacon for our loafers!!!
