Murray's Blog

Installing Home Batteries - Part 3

2024-08-26T14:00:00.000Z

Running on battery power in my home.

Background

I have installed batteries in my house, which will run low power devices overnight: fridge, freezer, networking gear, the server hosting this blog.

I’ve been running them for a month, so lets see how well it works (or doesn’t)!

Does it Work?

TL;DR: Yes it does!

I am able to run lighting, fridge, freezer, networking gear, servers, phone & laptop chargers, and a few other devices.During the day, the AC200L and B300 units charge when the sun is shining.And, at night, all the above run on battery, making my overnight consumption practically zero!

Gotchas

Of course, there are a few gotchas and teething problems I needed to solve along the way.

Washing Machine

Our washing machine only accepts cold water, and has an internal heating element for heating water.I’ve always used cold cycles for washing clothes (with the exception of cloth nappies from when our children were young). I had measured consumption of a washing cycle and found it was quite low (~0.2kWH).So, I had thoughts of running the washing machine on battery.

Alas, there were two problems.

First, even on a cold cycle the machine runs the heating element for a minute or two when the cycle starts.And this manages to pull very close to 10A.With only a few other appliances on battery, this is enough to trip breakers and overload the AC200L’s inverter.

Second, my original measurements didn’t take power factor into account.Turns out a washing machine is mostly a big inductive motor.And that makes for a poor power factor - so 0.2kWH is actually more like 0.5kWH.

So, washing machine stays grid powered.Oh well.

Timing Charging

While the AC200L accepts DC input from photovoltaic sources, my solar panels remain connected to the GoodWe inverter installed with the panels.This means I’m not charging from solar as optimally as I could.Instead, I need to time when the sun is up and I can charge, vs when the sun goes down and I run on battery.

Fortunately, sunrise and sunset are quite predictable, so timing is pretty easy.Either based on some moderately complex maths, or simply looking at the graphs from Home Assistant each week and adjusting as required.

Note that Bluetti has a whole house system which is a proper hybrid inverter, able to balance solar input, batteries and the grid.Alas, such a system is more expensive than what I just bought - and these batteries were expensive enough!

Sunny vs cloudy days

While sunrise and sunset are easy to predict, cloudy or raining days are much harder.

Well, its easy to check the weather each morning, but the effort required to modify schedules each day (and the risk I’ll mess something up) means I don’t bother.I just accept that a cloudy or rainy day is more expensive than a sunny one.

The more annoying thing is the way some light cloud cover can actually raise my solar output during the middle of the day.This is because of shading which lasts a few hours.

In full sun, the output drops to ~500W.But in light overcast, the output actually increases to ~1200W!

I suspect solar panels perform best when there is uniform solar radiation across the whole panel.Full sun makes for harsh shadows.While light overcast acts as a giant diffuser, and shadows become less pronounced.

Note that rain is terrible for solar - output is as little as 20% when its raining all day.

Death By 1000 Conversions

There are a lot of losses in my current setup:

Solar power is DC
Which is converted by the inverter to 230V AC, to integrate with the grid
Which is converted back to DC to charge batteries
Which is converted back to AC when batteries discharge
Which is converted back to DC for many appliances (servers, routers, TV, etc)

The saving grace is, once you pay to install solar, energy is effectively free.So, even if I can only use half the photons which hit my roof, that half is cheaper than buying from the grid!

Another part to inefficiencies is power factor.The clamp meters I used to estimate usage do measure power factor, but they didn’t include it in the power consumed (their Watts measurement).On the other hand, the clamp meters do (they match the data from my smart meter).

DC devices like routers, computers and TVs have quite poor power factors (well, at least mine do).Often around 50%!Which means yet more power isn’t used to its full effectiveness, and the batteries discharge faster than I thought.

Clamp Gotchas

A few days after the original installation, I installed all the clamp meters.This let me monitor pretty much every circuit in the house, so I have an excellent idea of where power is being consumed.Wonderful!

I thought it would be best to plug all 6 clamps into one of the batteries, so that, if there was a power outage, they would continue to run.

Only problem is, I got some really strange measurements:

While I was running on battery, the clamps measuring total consumption and solar generation went crazy!Showing negative and positive wattage.

It got stranger!One of the clamps measures a sine wave on a 2 hour period:

According to that, the bedroom and living circuit oscillated between positive and negative 400W over a two hour period!That just shouldn’t be possible!

Turns out I didn’t understand how AC power works.

The clamp meters have 3 inputs.Two are the physical clamps themselves.the other is AC 230V, which is used to supply power for the clamp firmware and WiFi, and also to measure voltage.

Power = Current x Voltage

That’s easy for DC circuits.But more complex for AC power, because the voltage of AC is varying from positive to negative 230V fifty times per second (50Hz).

The clamp measures voltage to calculate power.But the voltage of the grid is not exactly 50Hz, it varies by a fraction over time.And when the battery inverter is running, it is generating its own 50Hz waveform, slightly different to the grid.So, if the clamp is measuring voltage based on an inverter, and current going to the grid, these will be out of phase.So, sometimes the clamp thinks the voltage is negative when it is really positive, and thus thinks power is flowing in reverse.

The sine wave is when a clamp is using voltage from one battery to measure power from a second battery.Although identical models, the inverters are very slightly out of phase.Based on the 2 hour sine wave, they differ by ~0.000003Hz.Which I think is pretty impressive - but still enough to mess with my clamp meters!

The fix is easy: power clamp meters from the same circuit they are measuring.Then I get nice clean measurements:

Leaking Power

I originally configured batteries to run on a schedule within their own app.For the duration when I know my solar is producing power, they are configured as “off peak” and will charge from the grid (which is really my solar).When my solar is not producing power, they are configured as “peak” and will run on battery only.So the “peak” time is shortly before sunset until a little after sunrise.

My clamps (after I fixed the out-of-phase problem) showed a small, but consistent, export of power overnight - around 90W.And this was confirmed via my utility company smart meter data.

Exporting overnight is nice, if you have the capacity.Unfortunately, my little 5kWH batteries don’t have enough capacity to last overnight with my internal load.So, adding an extra 90W makes a significant impact overnight.

I’m told by others who have batteries, this is normal.Even when batteries are “off”, they aren’t really off.

However, if you physically disconnect my batteries from the grid, they will operate as a UPS.When there is no physical connection to the grid, they cannot export anything!

My solution was to use two of my plugin meters as timed switches.They can act as a “smart switch”, which is a fancy label for an on/off switch controlled by an app.Just so happens you can configure the switch on a schedule.

My actual schedule is a combination of Bluetti and Tuya configurationThe batteries never operate in “peak” mode, only “UPS” or “off-peak” modes.And the Tuya schedule controls when the grid is on or off.So the three states are:

Grid Off: overnight when grid power is expensive and my solar isn’t running, so run on batteries.
UPS: early morning and late afternoon where I consume my solar, but there isn’t enough power to charge batteries.
Off Peak: during the day when solar is producing and power is cheap, this is when I charge batteries.

Not Enough Energy

My bedroom and living room circuit powers my network gear, servers, a bunch of phone chargers, a desktop PC and our TV.All together, these might draw up to 400W, and has base load of ~220W overnight.I don’t want to run the batteries down to zero; 20% is what I’d like to keep them above.And that load is too much to run from ~4pm until 8am.The maths says just the base load would use ~65% of battery capacity, and reality says it will be worse.

My preferred solution would be to purchase another 3kWH B300 expansion battery.Unfortunately, my budget has already been spent, so I’m stuck with what I’ve got.

Instead, I will charge overnight for an hour, when prices are cheapest (around 2am).That is enough to keep the battery above 20% charge, still makes good use of day time solar, and avoids drawing from the grid in peak times (just before sunrise, just after sunset).

Servers

I have a number of servers which run 24/7 and contribute to the 220W base load on the bedroom and living circuit.Some are quite low power - they are actually laptops!Others are more powerful desktop PCs - they include a Minecraft server, and a TrueNAS box.

The TrueNAS, in particular, draws ~80W on its own (about one third of the base load).

I tried to re-arrange background tasks to run during the day, automatically shut the server down at night, and automatically restart it in the morning.

However, for some reason, the BIOS power-on timer never worked reliably.And I ended up with a server which was drawing 80W, and yet wasn’t running - the worst of both worlds!

It just runs 24/7 now.

Limited High Draw Appliances

While my primary use case is running low draw appliances, I have experimented with a few high draw ones.

I already mentioned the washing machine, which was a fail.

But my laser printer (which is usually a big no-no for powering off a UPS) works very nicely on 100% battery power.It does dim the lights for a moment when printing starts, but everything works fine after that.

Many kitchen appliances can also run off the batteries.But only one at a time.Things like the kettle, air fryer, induction cook top, and sandwich press have all been tested OK.They draw up to 1800W, for up to 15 minutes.That does hit the state of charge a little, but not enough to impact operation over night!

I’ve run an extension lead into the kitchen so we can use appliances in a blackout or when power prices are high in the evening.

Micromanagement

My biggest problem is the urge to micromanage everything!

I’m a programmer and computer nerd, so optimising is second nature.When I can tweak settings and schedules and appliance usage to save a few Watt-Hours, its very tempting!

I. Must. Resist.

Until the day I get a hybrid inverter which dynamically controls grid, batteries and solar, I will never have a perfectly optimised system.I need to accept that better isn’t perfect.But its still better.

Do They Save Me Money?

Maybe.

If they do, it isn’t all that much using a regular energy company which charges a flat rate for consumption & export.That is around $100 per month, in Winter (and a bit less in summer).

July was my lowest consumption ever, according to my smart meter data.(Even if my usage is regularly low anyway).

I’m definitely being friendlier to the grid.Because I’m not drawing as much power during the dawn and dusk peak periods.And I’m drawing practically nothing over night (with the exception of ~1.2kWH to partially charge one battery).

To try and save a bit more, I have switched to Amber Electric, which passes wholesale prices onto consumers.In theory, that means charging during the day is dirt cheap (prices are low), and discharging overnight is also cheap (because batteries).We’ll see if that helps or not over the next few months.

Of course, on the day I changed to Amber, the wholesale price jumped to scary levels!!

So, my regular energy company would cost a few dollars per day.But the first day on Amber cost $34!On the other hand, Amber usually costs less on other days.

I know Amber is best for people who export power during the peak periods.I’m not really trying to do that, and my batteries only export by accident.

We’ll see how it all averages out.

Conclusion

I’ve purchased and installed batteries which run my low power, but long running appliances.The installation is done, it looks great, and I can run my chosen circuits on batteries alone.I’ve even switched over to a different power company.

I seem to use less power. I might even be spending less money.

I’m definitely being friendlier to the grid - avoiding using power in peak periods when prices are high.

It might also be because I’m thinking more about power and so micromanage and optimise. But I guess that’s a good thing!

Installing Home Batteries - Part 2

2024-08-01T14:00:00.000Z

Second step to installing batteries in my home: buying batteries.

Background

I want to install batteries in my house.

In the previous article, I described how I used Home Assistant and energy meters to gather data about my energy usage.This has given me the information to chose the right batteries, which give the capacity required for my needs.

Just a reminder, I want to power low power devices overnight.Things like fridge, freezer, networking gear, and server hosting this blog.

Choosing Batteries

The maximum load of the devices I’d like to power is 500W, and overnight is about 14 hours in winter time, which is ~6kWh.That means I need a small inverter and large batteries, because that 500W load adds up over 14 hours.

I have been researching the idea of batteries for over 18 months, coming at it from several perspectives.

The most obvious option is the “whole house” batteries.This is your Tesla Powerwall, or Sungrow Batteries, or others.When I got solar panels installed, my utility company quoted about $10,000 for a 9.6kWh Sungrow battery.

The big problem I have with these is they are big and expensive and aren’t always upgradable.This means big capital purchases for physically large batteries which are difficult for me to find a home for.

So, I tried approaching the problem as an IT nerd: I already have battery backups, they’re called a UPS.The $150 UPSs I’ve bought to protect my networking gear and severs have cheap sealed lead acid (SLA) batteries.But, because the load is so low, they often last ~2 hours in a blackout.

Maybe I can scale up to a big UPS?

Turns out this isn’t a great idea.The SLA batteries have a lifetime of ~3-5 years before they basically don’t work at all.While they are pretty cheap to replace, I’d need to replace them more often than I’d like.There were a handful of lithium based UPSs, and they had a 2-3x price premium over their SLA counterparts.

I did some research and found some large UPS systems.They were still expensive, big, tricky to install, and they didn’t even look like they had enough capacity to last overnight.

Finally, large scale UPS systems are designed to run on batteries for a short period of time (a brownout or 15 minute blackout) before a generator kicks in.And they charge slowly - so slowly I wasn’t convinced they would recharge during the day after discharging overnight.

My final, and most crazy idea was: built it myself!

That lasted for about 30 minutes.I’m no electrical engineer, and the idea of working with large lithium ion batteries, busbars carrying 50A+, hooking up charging circuitry, connecting an inverter, and dealing with 230V outputs was just too scary.

I didn’t have the time, and also didn’t have the expertise.

Eventually, I found a curious product: the portable power station.

Portable power stations are pitched at camping, worksites and off-grid living.They have inverters ranging from ~1kW to up 5kWs.They use lithium chemestries.They are designed for running long periods on batteries alone.And they don’t cost as much as “professional” UPSs.

Essentially, they’re a big lithium based UPS, which seem perfectly designed for my use case!

This eventually lead me to Bluetti, which had a great combination of price, features and upgrades.

Bluetti AC200L + B300

I chose the Bluetti AC200L plus a B300 expansion battery.On paper, this gives me ~5kWhs of lithium iron phosphate (LiFePO) battery capacity, a 2.4kW inverter, USB and DC outputs, all the charging smarts from AC or DC inputs, and an app to control it all.

There are a bunch of videos about the AC200L out there.The best one goes into considerable detail about all capabilities, specs, benchmarks and gotchas. Just a note: that’s a 90 minute video!

Here are some unboxing pictures:

And how the unit looks, when I was testing it (before any electrical work).

Oh… and I decided I’d need two to safely cover my needs!

The AC200L cost $2,400, and the B300 cost $2,500, with some end of financial year discounts. Two of each total to $9,800 for ~10kWh of nameplate capacity, or a little under $1 per Wh, which is slightly better than average in Australia.

Specs and Capabilities

Let’s walk through what the power station can do!

AC and DC Outputs

The core of any portable power station or UPS is AC outputs.The AC200L has 4 x 230V, 50Hz AU plug outputs, and can supply up to 2400W in total.My needs are usually in the 200-400W range, but there are occasional spikes (eg: using a laser printer).

On the DC side, there are 2 USB-A outputs rated at ~30W between them and a USB-C output which can supply up to 100W.These are nice to charge phones and tablets overnight.And, I’ve run my son’s laptop off the USB-C output while playing some games.

Both the AC and DC outputs have an “eco mode”.Which is a timer to turn the outputs off after a few hours when the output falls below a configured level.This is because there is some overhead running the AC inverter & DC circitry, even if nothing is plugged in.As I’m running these overnight to replace the grid, the AC eco mode got disabled.But DC is set to an hour, which lets me charge devices without the risk of over charging.

There is a 12V DC output, which I’m unlikely to use.And a magic 48V DC output, which is very tempting to run to my appliances, as about half of what I’m powering would prefer DC over AC (think computers, smart TV, and network gear), and I’m sure there are significant losses converting between AC and DC several times over.Alas, it requires a proprietary Bluetti dongle thing, which isn’t available in Australia.

AC and DC Inputs

On the input side, you can connect various DC sources - 12V batteries or solar panels via an MPPT controller.I’m not likely to ever use those though.You can also connect to the grid using an AC cable (not a standard IEC cable - I suspect this is so Bluetti can charge at up to 15A on 110V grids).

One tiny gotcha I found was turning the device off: you must switch off the grid input and disconnect before the AC200L will power off.

Status Screen

You can see status via a nice bright LCD.This isn’t a full graphical LCD like a phone, but fixed function - like what you’d see on a microwave.But it is clear, bright, colourful, and more than functional.Current level of charge, time to full charge (when charging) or zero charge (when discharging), and current AC and DC input and output (in Watts) are all clearly visible.

There are three simple buttons to control the AC inverter, DC, and USB outputs.Press any button once and the display lights up, and a second time to turn it on / off.

B300 Expansion

The B300 can be used as a stand alone DC battery, with USB and 12V outputs, plus an MPPT controller for charging.However, I’ve always connected the B300 as an expansion battery, and the main AC200L controls it automatically.

The AC200L can use 2 x B300 expansions.Other Bluetti units can link with up to 4 x B300s, or use different expansion units.I may take advantage of an extra B300 in the future, but for now, I’m running one AC200L and one B300 together.

Weight and Size

The units are heavy, and just barely portable by one person around my house (~15m).Exact weight specs are on Bluetti’s website.If I was taking these camping or moving them around, I’d want two people to lift them, or even a trolley (Bluetti actually sell a trolley!)

The size is just about perfect to fit under some steel shelves in my garage, near my switch board.The B300 is wider and flatter, while the AC200L is a bit taller and narrower.Exact size specs are on Bluetti’s website.

App

The AC200L has app connectivity.This is via BlueTooth, which is entirely offline (essential if you’re taking the unit camping), or via WiFi, which hooks in to Bluetti’s cloud service and requires an Internet connection.

Many other sources have the basic app operation, so I won’t repeat many details here.Suffice it to say, pretty much all the on-screen data and on / off buttons are duplicated in the app.

One gotcha: you can only update device firmware via BlueTooth.I only connected via WiFi for the first few weeks, only to find there were a bunch of updates when I first checked via BlueTooth.

My usage is geared toward controlling how and when the units charge and discharge, so here’s more detail about that.

App - Charging Mode

Charging Mode has three options - Standard, Turbo, and Silent. Turbo tries to charge as quickly as possible, even if this might reduce battery life - I’d rather protect my expensive batteries so haven’t tried this mode. Silent restricts charging to ~600W; it isn’t entirely silent, but definitely runs fans less often. Standard is what I settled on: it charges at ~1200W from the grid. I may change over to Silent in Summer when there are more hours of sun in the day.

App - Working Mode

This is where all the action is.There are a few options, but I chose Custom because it enables all the possible settings.

The SOC Setting controls what percentage the battery will charge and discharge to.

The Time of Use slider enables the Schedule, which (as far as I’m concerned) is the most interesting part of the app.

Schedule lets you add 6 scheduled daily durations. Each of those durations can be either Off-Peak, or Peak. Any remaining time is blank, but I’ll refer to it as UPS. Here’s what they do (assuming plugged in to grid):

Off-Peak = the unit will charge as fast as other settings allow (Charging Mode and Max Charging Current of Grid are the relevant settings).
Peak = the unit will run on battery, not attempting to charge at all.
UPS = the unit will pass power from the grid, but not attempt to charge. Essentially, it will be a UPS.

There is a YouTube video which details all possible combinations of Working Mode with AC & DC charging, if you’re interested.Note: 50 minute video.

App - Advanced Settings > Max Charging Current of Grid

This has been detailed on YouTube videos, and is essentially a way to restrict (or increase) the maximum current the device will pull from the grid. In Australia, we can safely pull 10A from standard power outlets, but you can ask Bluetti Support for a magic code to increase that to 12A (I don’t have a need to charge that fast, so haven’t bothered).

I experimented with reducing the maximum current for a few weeks, but have ended up putting it back to the default 10A.

App - Data Logging

Not long after I purchased the devices (and perhaps coinciding with firmware updates), several time-series graphs appeared:

Daily Power Profile: this measures Watts for AC & DC inputs, plus AC & DC outputs.
Energy Statistics: this measures kWhs for AC & DC inputs, plus AC & DC outputs.
SOC Trend: this measures the percentage of charge.

I tend to keep an eye on SOC Trend, and ignore the others, because Home Assistant gives me that data.

Tear Down

I’m not about to void warranties or destroy my brand new batteries. But someone else has - yay for YouTube!

Oh, and I’m very glad I didn’t try to roll my own after watching that video. There are a lot of smarts in the AC200L.

Electrical Work

It’s all good to have batteries.But I need to connect them into my home circuits for them to be used.While its possible to run various appliances from the power stations directly, that gets pretty tedious.It also means I cannot run important things on batteries like lighting.

So, I engaged an electrician to replace the switch board, add a change over switch between mains and batteries, and add outlets to charge and connect the power stations to some circuits.

A big shout out to Daniel from Lighting Electrical and Communications, who did this work very professionally and to a very high quality.And put up with my nerd-ness doing work which was a bit unusual!

There are three circuits which can be powered from batteries, and one last area which runs directly from the power station via extension lead:

Lighting (UPS 1)
Fridge (UPS 1)
Freezer and home office (aka, the garage, powered directly from UPS 1)
Living room and bed room (UPS 2)

The main switch board was upgraded to be twice the size, allowing me to add clamp meters on almost all circuits.The only thing I couldn’t clamp was the wire coming from the grid - I might have another go at that in the future.

The change over switch allows switching between mains and the batteries for the above three circuits.This is a manual switch, and I’ve operated it exactly once - its now permanently set to “UPS”.Although I plan to always be running from batteries, it is important to have a bypass switch, in cast maintenance needs to happen (and I’m sure it will be needed at some point).

Finally, there’s a dedicated 20A circuit going to a power outlet to supply the power stations.And two sockets which run back up to the switchboard.The supplied cables are used for grid charging, and some short extension leads run from one of the AC outlets back to the switchboard.

The installation took a day and a half.Most of that first day the power was off while the electrician did his thing.And I was able to run everything from the AC200L!

Installed Batteries

Here is a picture of the whole setup.

Although the power stations think they are always charging from grid, I use the schedule feature charge when the solar panels are generating.This is entirely based on timing, but the sun rises and sets on a pretty reliable cycle, and all I need to do is check every week or two and adjust as required.

And the boring part is: it Just Works™.

Through the day, the batteries charge.In the evening, they power our lights, fridge, freezer, servers, network gear, and phone chargers.

OK… there were some gotchas and glitches, but I’ll address those in the next article.

Conclusion

I’ve purchased and installed batteries which run my low power, but long running appliances.The installation is done, it looks great, and I can run my chosen circuits on batteries alone.

So far, so good!

The real question is: does my setup actually save power and money?

I’ll answer that in the next post!

Installing Home Batteries - Part 1

2024-07-09T14:00:00.000Z

First step to installing home batteries: gathering data.

Background

In Australia, it looks like the future is electric and powered by renewables like solar and wind, backed by storage (hydro and batteries), with a bit of gas in the mix as well. Possibly even nuclear, if the Liberal Party wins the next election.

In any case, coal is on the way out. Gas and petrol are getting more expensive.And electric vehicles are (slowly) gaining popularity.

I got PV solar installed on my roof top in January 2023, and would love to get some batteries to store that solar energy for use over night.Alas, PV panels and an inverter are relatively cheap, but batteries are expensive.

But recently, something changed: the authority which manages most of the Australian energy market - AEMO - has hinted at supply shortages as coal power plans are phased out. As early as this coming summer (2024-2025).

We rarely have blackouts in Sydney, but on the odd occasion we do, its really inconvenient.Pretty much everything (except for heating water) in my house depends on electricity!(And water heating will be electrified when my current gas fired system fails).

When my mother-in-law generously gifted us a large sum of money, I thought now is the time to get started with some home backup batteries!

Goal

Install some batteries as a partial backup system in the event of a power outage.And, to be able to charge said batteries during the day when the sun is shining (effectively for free).And, to consume that stored power over night, when I’d otherwise be paying for electricity.

Even with my mother-in-law’s substantial gift, I can’t afford full house backup (and there are other things we also need to use that money for aside from my pet projects).I’d like to cover a number of important items:

Lighting
Fridge and freezer
Router and servers

Other low draw appliances are nice to have (TV, charging laptops and phones), but not essential.

Out of scope are all high draw appliances:

Air Conditioning
Heating
Cooking

These all require a substantially larger inverter and batteries than I can afford.

I don’t want something I need to build myself, but I would like something I can maintain.I’m not a licensed electrician, so don’t want to be messing with 230V AC or large lithium battery cells - out of the box is quite important.

Having said that, I know I’ll need to get an electrician involved, because the batteries need to be wired into existing circuits.

First up, I’m a nerd, so I need to gather data!

Plugin and Clamp Meters

So I purchased some devices to log power usage.Both were from Aliexpress: a plug in meter and a clamp meter.The former lets you plug in any appliance and measure power consumption, and the latter is a sensor which clamps around the active electrical wire in your switch board to do the same thing.

I used plugin meters to monitor things like my fridge, TV and servers.While the clamp meters were for lighting, air conditioning and all my appliance circuits.

These are all WiFi enabled Internet of Things devices, which use the Tuya backend.That is, they are relatively cheap devices based on a Chinese IoT platform with some standards for data gathering.

Aside: I created an isolated IoT network on my Mikrotik AP.Because the “s” in IoT stands for security, and I don’t want cheap devices, which will almost certainly end up with security issues in their lifetime, sitting on my main network.

Fun fact: every single IoT device I’ve purchased (from the above devices, to my solar inverter, and even the battery I finally chose) only supports 2.4GHz WiFi.I guess they don’t need much bandwidth - but I’m glad every other device in my house is 5GHz.

Home Assistant

Now, I initially purchased these devices knowing there a C# library out there which can link into the Tyua Internet of Things world.But turns out I didn’t have time to write a data logger application, keep it running 24/7, or visualise the results.

So I installed Home Assistant.

There are already instructions out there for installing Home Assistant, so I won’t bother doing a step by step here.I chose the “core” installation for my Debian server without Docker - apparently this is advanced stuff, even if its what I’ve been doing for the last ~15 years.The most annoying thing is it doesn’t auto-update, so every month or two I need to manually follow their update instructions - the hardest part is remembering.

I used the LocalTuya plugin to integrate with the Tuya devices.And within an hour, I had some pretty graphs on screen!

The Data!

Finally, some data!

I spent a good 6 months gathering data, so I could see any patterns in Summer vs Winter.And here are some things I found.

Solar Power is great in Summer, and sucks in Winter

(Note that Home Assistant deletes high precision data after ~14 days and replaces it with an hourly approximation, so the Summer data looks less detailed in these graphs).In Summer, there are 3 more usable hours of sun as compared to Winter. And, that sun is more intense, give more usable kWHs of power and a higher, longer peak.And, there is less shading over my panels (that big dip in the middle of Winter).

In Winter, I might generate 15kWH on a good day, while Summer can exceed 45kW.

In short, if I can charge batteries in Winter, then I should have no problems in Summer!

Plugin Meters Show Low Draw Appliances

This graph shows usage from a number of low draw (low wattage) devices including my fridge (orange), freezer and home office setup (cyan), servers (purple), and TV and network (blue).These devices rarely exceed 100W individually, and total about 500W at their peak (around 6-7pm).

They are my prime targets to power by batteries overnight, after said batteries are charged from solar during the day.

I’m going to lump lighting in under the low draw banner as well, although it was measured via a clamp meter.

If we assume a conservative draw of 500W continually, and 16 hours in Winter when we need to run batteries, we get around 8kWH required each night.This number will be important when choosing batteries.

Clamp Meters Show High Draw Appliances

These graphs are for appliances (covering two 20A circuits for all plugin appliances), and air conditioning.Unlike the low draw appliances, these have some very high peaks (over 4kW for appliances), and sometimes sustained high draw (air conditioning in reverse cycle runs at 1-2kW for several hours).

Note that the appliance graph actually covers all the plugin meters! But their usage is, essentially, base loadwhich peaks at 800W, and is more usually under 500W.

These appliances are mostly in our kitchen - the kettle, toaster, air fryer, microwave, and induction cooker.They are used for short periods of time, but draw 1.5kW+ each.

These won’t be targeted for batteries (at least not right now), because that will cost double what I’m prepared to spend now!

Evening Peak

All these graphs show peak usage is between 4pm and 9pm.When everyone is home after school & work, and cooking dinner, and relaxing before going to bed.

Unfortunately, this peak is just after the sun sets in Winter, and has only partial overlap in Summer.

There’s some great data from AEMO and other sources, which show energy demand, and supply by source, and pricing.This shows a common pattern in NSW: power is cheap during the day because solar is ridiculously cheap (sometimes the price even goes negative). But the sun goes down right when demand spikes in the evening, leading to high wholesale prices from 5pm-9pm.

This is the prime use case for batteries - store power during the day when the sun delivers cheap energy during the day, and use that power in the evening.

Conclusion

So far, I’ve gathered data about the details of my power usage.From this information, I have chosen the high value, low draw appliances which I’d like battery backup for: lighting, fridge & freezer, networking, and servers.And, I know approximately how much battery capacity I need to run overnight: at least 8kWhs.

Next up: choosing and purchasing batteries.

Building a CPRNG called Terninger - Part 17 Persistent State and Entropy Source

2024-02-26T13:00:00.000Z

Enhancing PingStatsSource with persistent state.

Background

You can read other Terninger posts which outline my progress building a the Fortuna CPRNG, or see the source code.

So far, I’ve put Terninger into production in makemeapassword.ligos.net. And persistent state is working in the core generator

But I’d really like to use persistent state in entropy sources for fun and profit!In particular, PingStatsSource has a limitation which can be overcome using persistent state: there is a hard coded list of servers it pings, which needs regular maintenance.

Goal

Wire up persistent state with any entropy source. The source will be initialised on load, and included when state is saved.
Enhance PingStatsSource to track which servers are working using persistent state.
Enhance PingStatsSource to discover new servers by randomly scanning the Internet.

Wire Up Persistent State to Entropy Sources

The convention is: any IEntropySource which also implements IPersistentStateSource will have persistent state loaded & saved automatically by PooledEntropyCprngGenerator.

There is already support to initialise objects from state on loading, so this should be pretty easy!Simply call this method during initialisation:

private void InitialiseEntropySourcesFromPersistentState(PersistentItemCollection persistentState)
{
if (persistentState == null)
return;
Logger.Trace("Initialising entropy sources from persistent state.");

foreach (var source in _EntropySources)
{
if (source is IPersistentStateSource sourceForPersistentState)
{
sourceForPersistentState.Initialise(source.Name);
Logger.Trace("Initialised entropy source '{0}' from persistent state.", source.Name);
}
}
}

That’s easy!

Except, you can add additional IEntropySource objects to the generator after it starts.And these should also be initialised.Exactly once.

That’s a bit more tricky.

Fortunately, there was already SourceAndMetadata, which combines an entropy source with some additional data (eg: has it thrown exceptions, does it complete synchronously or asyncronously).So I added an IsExternalStateInitialised field, and wrapped the initialisation into InitialiseFromExternalState().That gets called every time we poll the source, and will return early if its already been initialised.

public void InitialiseFromExternalState(PersistentItemCollection state)
{
if (IsExternalStateInitialised)
return false;
if (Source is IPersistentStateSource sourceForPersistentState) {
sourceForPersistentState.Initialise(state.Get(Name));
}
IsExternalStateInitialised = true;
}

To save state, I added GetPersistentStateOrNull(PersistentEventType eventType) to SourceAndMetadata.And simply called it when saving all the other persistent state, remembering to put each source into a namespace:

foreach (SourceAndMetadata sm in _EntropySources)
{
var state = sm.GetPersistentStateOrNull(eventType);
if (state != null) {
persistentState.SetNamespace(sm.Name, state);
}
}

And with that, all IEntropySources are wired up and ready to persist some state!

Tracking Targets

PingStatsSource sends ICMP echo requests (aka, pings) to a hard coded list of servers.There is a way to configure a custom list of servers, but I personally don’t use it.

Unfortunately, even though I chose DNS servers as the list, which shouldn’t change very often, they do change occasionally.And so, I need to update the list every now and then.

Well, now we have persistent state, we can keep the list of servers there.If a server disappears off the internet, we simply remove it from the list and move on with life.The internal list becomes an initial seed, rather than the canonical list for all time.

OK, so how does this work?

First thing, we need to wire up IPersistentStateSource to load and save.I won’t show the code, because its very similar to last time where we create an array-like structure.This is what ends up being saved:

PingStatsSourceTargetCountUtf8Text1024
PingStatsSourceTarget.1Utf8Text1.0.0.1
PingStatsSourceTarget.2Utf8Text1.1.1.1
PingStatsSourceTarget.3Utf8Text1.241.94.128
PingStatsSourceTarget.4Utf8Text100.11.201.175
PingStatsSourceTarget.5Utf8Text101.201.69.196
PingStatsSourceTarget.6Utf8Text103.1.206.179
PingStatsSourceTarget.7Utf8Text104.132.20.107
...

The core internal state is a list of PingTarget.We’ll add and remove to this over the lifetime of the object, including via IPersistentStateSource.(And, I’ll come back to why the abstract PingTarget rather than simple IPAddress).

private List _Targets = new List();

The first time we get entropy, we check if there are any targets.If not, we load up the internal seed list to get started.

if (_Targets.Count == 0)
// No prior persisted state: load a seed list to get started.
await InitialiseTargetsFromSeedSource();

When we gather entropy, we track which targets fail (timeout, network error, etc). Normally, we ping a target 6 times.Well, if we get 6 failures, we assume the target is offline, and remove it.

List targetsToSample = ...;
// Gather entropy
// ...

var failedTargets = targetsToSample
.Where(x => x.Failures == _PingsPerSample)
.Select(x => x.Target)
.ToList();

// Other logic here...

foreach (var t in failedTargets) {
_Targets.Remove(t);
}

That’s it!Targets which fail all 6 pings will be removed, never to be seen again.

Unfortunately, given enough time, we’ll remove all the targets.Better do something about that!

Discover New Targets

In order to discover new targets, we need to find a valid IP address, and then send some pings to confirm it works.Fortunately, the IPv4 address space is so full of reachable targets, that any random 32 bit number has a good chance of being valid.

private Task DiscoverTargets(int targetCount)
{
var targets = new List(targetCount);
var bytes = new byte[4];

while (targets.Count < targetCount)
{
// IPv4 address space is so full we can pick random bytes and its pretty likely we'll hit something.
_Rng.FillWithRandomBytes(bytes);

if (bytes[0] == 0
|| bytes[0] == 127
|| (bytes[0] >= 224 && bytes[0] <= 239)
|| bytes[0] >= 240
)
// 0.x.x.x is reserved for "this" network
// 127.x.x.x is localhost and won't give useful timings
// 224-239.x.x.x is multicast
// 240-255.x.x.x is reserved for future use (probably never)
continue;

var ip = new IPAddress(bytes);
if (_Targets.Any(x => ip.Equals(x.IPAddress)))
// Let's not add the same target twice!
continue;

targets.Add(new PingTarget(ip));
}

// Discovery runs 3 pings.
// If any one of the pings returns OK, the target will be added.
var targetsToSample = targets.Select(x => new PingAndStopwatch(x, _Timeout)).ToList();

await Task.WhenAll(targetsToSample.Where(x => x.Failures >= 0).Select(x => x.ResetAndRun()).ToArray());
await Task.WhenAll(targetsToSample.Where(x => x.Failures >= 1).Select(x => x.ResetAndRun()).ToArray());
await Task.WhenAll(targetsToSample.Where(x => x.Failures >= 2).Select(x => x.ResetAndRun()).ToArray());

var toAdd = targetsToSample.Where(x => x.Failures < 3).Select(x => x.Target).ToList();
_Targets.AddRange(toAdd);
}

We want to add n new targets (where n is 8 by default).So we generate n random IP addresses, excluding several ranges which we know are invalid up front.We then send up to 3 pings to each target.So long as any one ping succeeds, we add it to the list.

The IPv6 address space is, for all practical purposes, empty.So randomly picking 128 bit numbers isn’t going to work.For now, I’m putting IPv6 discovery in the too-hard-basket and marking it with a great big TODO.

Moving on, let’s add a few properties to the configuration so users have the option to turn off discovery (if they want) and to control the desired number of targets (TargetsPerSample was already there):

public class Configuration
{
/// 
/// Number of targets to ping from the list each sample.
/// Default: 8. Minimum: 1. Maximum: 100.
/// 
public int TargetsPerSample { get; set; } = 8;

/// 
/// Automatically discover new targets to ping by randomly scanning the Internet.
/// Default: true.
/// 
public bool DiscoverTargets { get; set; } = true;

/// 
/// Count of targets to accumulate when discovering.
/// Default: 1024. Minimum: 1. Maximum: 65536.
/// Each target will be recorded in persistent state.
/// 
public int DesiredTargetCount { get; set; } = 1024;
}

Finally, we wire this new method up to GetInternalEntropyAsync(), after we gather entropy:

if (_EnableTargetDiscovery && _Targets.Count < _DesiredTargetCount)
{
await DiscoverTargets(_TargetsPerSample);
}

Now we remove targets which don’t work, and automatically discover new targets which do.And anything in _Targets is persisted, so we don’t start from scratch next time around.

TCP Ping

One last feature: TCP ping.

ICMP echo request is the technical name for what we refer to as ping.But routers and firewalls can be configured to ignore pings (which might provide some tiny improvement in security by pretending you aren’t on the Internet).

But there are lots of web servers out there, listening on port 80 and 443.They cannot ignore requests to those ports, because that is what web servers do.Which means there are potentially more targets out there to be discovered.

Instead of an ICMP echo request, we will do the initial 3 way TCP handshake, which establishes a new TCP connection, and then immediately drop said connection.This is roughly equivalent to a regular ICMP ping, just using TCP instead.

The code to achieve this is quite simple:

public async Task<(bool isSuccess, object error)> TcpPing(IPAddress address, int port, TimeSpan timeout)
{
using (var socket = new Socket(IPAddress.AddressFamily, SocketType.Stream, ProtocolType.Tcp))
using (var cancel = new CancellationTokenSource())
{
try
{
cancel.CancelAfter(timeout);
await socket.ConnectAsync(address, port, cancel.Token);
return (true, SocketError.Success);
}
catch (OperationCanceledException)
{
return (false, SocketError.TimedOut);
}
catch (SocketException ex)
{
return (false, ex.SocketErrorCode);
}
}
}

This works wonderfully, but we now have two kinds of PingTarget.There is IcmpTarget and TcpTarget.The ICMP target only needs an IP address to function, but the TCP target needs a port as well.

Actually, there are three kinds!The third one is IpAddressTarget is just an IP address; we don’t know if its ICMP or which TCP port to try.But, we can run the discovery process on this address to convert it into IcmpTargets and TcpTargets.These “naked” IP addresses are the seed list.

Here’s some simple inheritance to model this:

abstract class PingTarget { 
public IPAddress IPAddress { get; }
}
sealed class IcmpTarget : PingTarget {
}
sealed class TcpTarget : PingTarget {
public int Port { get; }
}
sealed class IpAddressTarget : PingTarget {
}

How will we serialise and parse these?

The easiest is IpAddressTarget, it’s interchangeable with a regular IpAddress:

1.1.1.1
2606:4700:4700::1111

Now we need a way to represent either a port number, or an ICMP ping.There’s a standard way to encode a TCP Endpoint, and we’ll use something similar for ICMP:

1.1.1.1:80
1.1.1.1:ICMP
[2606:4700:4700::1111]:80
[2606:4700:4700::1111]:ICMP

These can all be unambiguously parsed and serialised.IPv4 is easy to split on the :, and IPv6 a bit more complex because we need to find matching square brackets.For simplicity, I use .ToString() for serialisation (that’s not always a good idea, but good enough in my case).And the parser a static method following the Try...() pattern common in C#, the out parameter will be one of IcmpTarget, TcpTarget or IpAddressTarget.

class PingTarget {
  public static bool TryParse(string s, out PingTarget result) {
...
  }
}

We also need a Ping() method.I’m not usually a fan of inheritance, but in this case its very effective because each type can implement ICMP or TCP ping, as required:

class PingTarget {
public abstract Task<(bool isSuccess, object error)> Ping(TimeSpan timeout);
}

The final piece of the puzzle is how to convert from the seed list containing IpAddressTarget into IcmpTarget and TcpTarget?Well, as part of the main GetInternalEntropyAsync() method, we pick a few IpAddressTargets, and run discovery on them:

var forDiscovery = _Targets.OfType().Take(_TargetsPerSample).ToList();
if (forDiscovery.Any())
{
await DiscoverTargets(forDiscovery);
}

That’s All For Now!

We now have persistent state wired up to any entropy source which needs it.And made meaningful improvements to PingStatsSource so it requires less of my attention.It automatically removes servers when they go offline, and discovers new ones.

You can see the actual Terninger code in GitHub. And the main NuGet package.

After a long time developing Terninger on and off, I’m going to stop posting about it.Because the core functionality is all done!

Building a CPRNG called Terninger - Part 16 Worker Loop State

2024-01-17T13:00:00.000Z

Wiring up the worker loop to load and save state.

Background

There’s been a short (well, long) delay getting this post up. Sorry about that. 🙁

You can read other Terninger posts which outline my progress building a the Fortuna CPRNG, or see the source code.

So far, I’ve put Terninger into production in makemeapassword.ligos.net.

And we’re up to the third and final part of persistent state:

Saving and loading the state.
Getting and setting the state from components which make up Terninger.
Integrating points 1 and 2 with the main entropy gathering worker loop (this post).

Goal

We have the interfaces and implementations to load and save state from disk, and get and set that state on in-memory objects.

We now need to wire up the main PooledEntropyCprngGenerator loop to load state on start up, and save when required.

Details

Here are the requirements in detail:

Load previous state from disk (if it exists) on start up.
Set internal object state based on loaded data.
Periodically read internal object state, and save to disk.
When stopping, save final object state.

Main Worker Loop

Terninger has a main worker loop.It is a single thread which keeps running for the lifetime of a Terninger PooledEntropyCprngGenerator to gather entropy.Basically, a giant while() loop.

Well, now it needs a beginning and an end.To load and save state at the start and finish.

It will look roughly like:

LoadState();
WorkerLoop();
SaveState();

Let’s look at the load, save and loop changes in a bit more detail.

Load / Set

Before the worker loop starts, we load persistent state from disk, and initialise related objects.

At the top level, there isn’t much exciting going on.The only notable thing is .GetAwaiter().GetResult(), because we are running in a top level thread and can’t do await.

var persistentState = TryLoadPersistentState().GetAwaiter().GetResult();
InitialiseInternalObjectsFromPersistentState(persistentState);

TryLoadPersistentState() does the actual load, wrapped in an exception handler in case of errors.Failures can be safely ignored and the generator will act as if it was a brand new instance.

private async Task TryLoadPersistentState()
{
if (_PersistentStateReader == null)
return null;

try
{
return await _PersistentStateReader.ReadAsync();
}
catch (Exception ex)
{
Logger.WarnException("Unable to load persistent state from. Brand new terninger instance will be initialised.", ex);
return null;
}
}

Once we have loaded our collection of items, we need to set related objects.

private void InitialiseInternalObjectsFromPersistentState(PersistentItemCollection persistentState)
{
if (persistentState == null)
return;

((IPersistentStateSource)this).Initialise(persistentState.Get(nameof(PooledEntropyCprngGenerator)));

var prngAsPeristentStateSource = _Prng as IPersistentStateSource;
if (prngAsPeristentStateSource != null)
{
  prngAsPeristentStateSource.Initialise(persistentState.Get(nameof(PooledEntropyCprngGenerator) + ".PRNG"));
}

((IPersistentStateSource)_Accumulator).Initialise(persistentState.Get(nameof(PooledEntropyCprngGenerator) + ".Accumulator"));

// Remove each namespace from collection so entropy sources cannot observe internal state.
persistentState.RemoveNamespace(nameof(PooledEntropyCprngGenerator));
persistentState.RemoveNamespace(nameof(PooledEntropyCprngGenerator) + ".PRNG");
persistentState.RemoveNamespace(nameof(PooledEntropyCprngGenerator) + ".Accumulator");

_Accumulator.ResetPoolZero();
}

Setting simply involves casting each object to IPersistentStateSource and calling Initialise().With a little namespacing going on to keep nested objects separate.

Although I am not initialising any IEntropySource objects yet, I have removed any internal state relating to PooledEntropyCprngGenerator, so that any future IEntropySource objects can’t peak at potentially sensitive data.Safety first!

The call to ResetPoolZero() is important.We’ll return to it shortly.

Get / Save

Moving onto the getting / saving process which runs when Terninger is stopping.There’s just one method at the top level:

GatherAndWritePeristentStateIfRequired(PersistentEventType.Stopping).GetAwaiter().GetResult();

Yeah, gotta look inside that method to see what it does:

private async Task GatherAndWritePeristentStateIfRequired(PersistentEventType eventType)
{
if (_PersistentStateWriter == null)
return;
if (!ShouldWritePersistentState(eventType))
return;

var persistentState = new PersistentItemCollection();

// Always accumulate internal objects last, so anyone trying to impersonate global namespaces gets overwritten.

persistentState.SetNamespace(nameof(PooledEntropyCprngGenerator), ((IPersistentStateSource)this).GetCurrentState(eventType));

var prngAsPeristentStateSource = _Prng as IPersistentStateSource;
if (prngAsPeristentStateSource != null)
{
persistentState.SetNamespace(nameof(PooledEntropyCprngGenerator) + ".PRNG", prngAsPeristentStateSource.GetCurrentState(eventType));
}

persistentState.SetNamespace(nameof(PooledEntropyCprngGenerator) + ".Accumulator", ((IPersistentStateSource)_Accumulator).GetCurrentState(eventType));

// Save.
try
{
await _PersistentStateWriter.WriteAsync(persistentState);
_LastPersistentStateWriteUtc = DateTime.UtcNow;
}
catch (Exception ex)
{
Logger.WarnException("Unable to write persistent state to.", ex);
}
}

This combines gathering all the state, and the actual save, into a single method.

We’ll come back to ShouldWritePersistentState() later.But its safe to assume when we call with PersistentEventType.Stopping, it returns true.

We gather all the state by creating an empty PersistentItemCollection as an accumulator, then casting related objects to IPersistentStateSource and calling GetCurrentState().Again, there’s a bit of namespacing going on to keep separate data separate.

And, there’s a reminder to myself that the internal state is accumulated last, so IEntropySources can’t do anything naughty.

Finally, there’s a similar WriteAsync() call wrapped in an exception handler.

Worker Loop

Terninger instances can last for a long time (makemeapassword.ligos.net runs for a month before a reboot; and it could run for much longer if needed). And there’s no guarantee it will be stopped cleanly (maybe the server crashes, or maybe the programmer simply doesn’t Dispose() the object).So, every now and then, the worker loop will save state.

Here are the relevant lines of WorkerLoop():

this.PollSources(syncSources, asyncSources).GetAwaiter().GetResult();
bool didReseed = MaybeReseedGenerator();

var writeEvent = didReseed ? PersistentEventType.Reseed : PersistentEventType.Periodic;
GatherAndWritePeristentStateIfRequired(writeEvent).GetAwaiter().GetResult();

We reuse GatherAndWritePeristentStateIfRequired(), but pass a different PersistentEventType depending on if we reseed the generator or not.

Time to return to ShouldWritePersistentState()!It returns true when we Reseed, because that’s when entropy pools will be updated.But only returns true for Periodic if a certain duration has elapsed since we last saved (5 minutes by default).

So, we save state whenever the generator reseeds (which may take a while if the generator isn’t being used), or every 5 minutes.

Security: Entropy Pool

OK, time to deal, once and for all, with the remaining security problem that external state raises:

An attacker can read the persistent state on disk and be able to predict future random numbers: we’ve already addressed that in a previous post.
An attacker can write to persistent state when the generator is stopped and poison the generator such that they can predict future random numbers.

That second problem is a tricky one to solve.While it’s very desirable to retain the state of entropy pools (it’s a core function of Fortuna), it also allows an attacker a way to influence the internal state of the generator.

And that could completely subvert the generator.

And there’s no way for Terninger to know the persistent state is genuine or malicious.Even if you encrypt, or sign, or hash the persistent state file, the Terninger code needs to be able to verify that crypto without human intervention.That means an attacker could, read crypto keys out of terninger code, and write a poisoned file.Or, if the end user could configure those keys in the config file, then the attacker would just read the keys from the config file!Or, the attacker could use Terninger’s own code to sign their own malicious file.

Cryptography does not solve this problem.

However, there are two, relatively simple approaches to mitigate this problem.

We haven’t looking into the implementation of IPersistentStateSource.Initialise() for all objects.In particular, EntropyPool adds some additional entropy when loading:

void IPersistentStateSource.Initialise(IDictionary<string, NamespacedPersistentItem> state)
{
// As we are reading external (and potentially untrusted) persisted state, we include some additional entropy.
if (state.TryGetValue("EntropyHash", out var entropyHashValue))
{
AccumulateBlock(entropyHashValue.Value, entropyHashValue.Value.Length);
var extraEntropy = PortableEntropy.Get32();
AccumulateBlock(extraEntropy, extraEntropy.Length);
}
}

We don’t just accumulate the saved hash, we also add additional entropy.

Now, PortableEntropy.Get32() isn’t a very good source of entropy.It will only add 16-32 bits of real entropy.That’s better than nothing, but not good enough on its own.

Saved Entropy is Just One Source of Many

After we load all external state, there’s a call to EntropyAccumulator.ResetPoolZero().That puts the generator into high priority mode to gather enough entropy for a reseed.That usually completes within two seconds.With the default settings, that accumulates 384 bits of new entropy in pool zero.And should also gather a similar amount of entropy for all other pools.(384 bits is the best case, an attacker might reasonably predict some of those bits, but its very difficult to predict all of them).

So, we initialise each pool based on prior state.But then add more entropy on top of that prior state.

Essentially, the external state becomes just one source of entropy out of many.

Now an attacker also needs to deal with at least 128 bits of entropy (probably much more) on top of whatever is loaded from disk.

And, within a few minutes, it’s likely there will be another reseed.So the window of opportunity for an attacker to make use of a poisoned file is quite low.

Next Up

We now have persistent state implemented! This is the last major piece of functionality in Terninger to fully implement Fortuna!And, because I’ve been very slack with my blog, it’s been in production for ~18 months!

You can see the actual Terninger code in GitHub. And the main NuGet package.

Next up: a major improvement to PingStatsSource which uses persisted state.

Building a CPRNG called Terninger - Part 15 Object State

2024-01-15T13:00:00.000Z

Getting and setting state on objects.

Background

There’s been a short (well, long) delay getting this post up. Sorry about that. 🙁

You can read other Terninger posts which outline my progress building a the Fortuna CPRNG, or see the source code.

So far, I’ve put Terninger into production in makemeapassword.ligos.net.

And we’re up to the second part of persistent state:

Saving and loading the state.
Getting and setting the state from components which make up Terninger (this post).
Integrating points 1 and 2 with the main entropy gathering worker loop.

Goal

Now we are able to load and save state to a file on disk, we need a way to gather that state from in-memory objects (before we save), and to set state on in-memory objects (after we load).

Essentially, we need a way to get and set properties / fields on objects based on the file.

Details

Here are the requirements in detail:

Define an interface to represent getting and setting persistent state from an object.
Implementing that interface on an object should be enough to enable usage of persistent state.
Objects should have no knowledge of how their state is saved / loaded; they just parse or return bundles of state objects.
Getting and setting should be implemented for core PooledEntropyCprngGenerator and related classes, such that entropy state is persisted across server restarts.

Interface

Unlike loading and saving, getting and setting will always be implemented on the same objects, so it makes sense to only have one interface.The simplest thing that could possibly work is:

public interface IPersistentStateSource {
  void Initialise(IDictionary<string, NamespacedPersistentItem> state);

  IEnumerable GetCurrentState();
}

The simplest thing for a getter and setter is… well… a getter and a setter!

Initialise() will only be called once, when the implemented object is loading, after reading from an IPersistentStateReader implementation.While GetCurrentState() will be called regularly over the lifetime of the generator, as state will change over time.The result of GetCurrentState() can be passed to an implementation of IPersistentStateWriter to save.

Use Cases

A short digression is in order at this point.

There are two use cases I have in mind for persistent state:

The core state in a PooledEntropyCprngGenerator, such that we accumulate entropy across machine restarts.
State which is useful in other objects, such as entropy sources (IEntropySource).

Because the state in a PooledEntropyCprngGenerator is always changing as entropy accumulates, the simple interface will work fine.

But entropy sources may not update their persistent state very often.Many sources execute on a period measured in minutes or hours, so nothing may have changed since the last save.

HasUpdates

To support slightly more efficient operation with entropy sources, I add a HasUpdates property.

public interface IPersistentStateSource {
  ...
  bool HasUpdates { get; }
}

This allows any objects which supports persistent state to communicate if calling GetCurrentState() will returns something different since last time.And means that entropy sources which rarely change state can stay dormant for longer.

PersistentEventType

Finally, there is a context enum passed to GetCurrentState():

public enum PersistentEventType
{
    // A reseed has just occurred.
    Reseed = 1,

    // A regular periodic interval for writing state.
    Periodic = 2,

    // The pooled generator is stopping. This is the last opportunity to write persistent state.
    Stopping = 3,
}

This gives the object some context about what is happening, and hints at what state it should return.In particular, Stopping is the last opportunity to save state before the generator stops - so you should return something in at least that case!

This enum is similar to EntropyPriority, which allows an entropy source to decide how aggressively it returns entropy, depending on the needs of the generator.

However, after completing all the persistent state functionality, I can’t think why GetCurrentState() would not return every piece of state every time it is called.HasUpdates is a better way to signal “I have nothing new”.

Oh well, it’s in the API now.

Final IPersistentStateSource

Here is the final interface:

public interface IPersistentStateSource {
  void Initialise(IDictionary<string, NamespacedPersistentItem> state);
  bool HasUpdates { get; }
  IEnumerable GetCurrentState(PersistentEventType eventType);
}

Implementing

Just defining an interface is the easy part. We also need to implement it on required classes!

Simple Implementation: PooledEntropyCprngGenerator

The main PooledEntropyCprngGenerator has two persistent fields.There are a number of other fields, but they are contained in other classes.

A UniqueId which is a Guid.
A BytesRequested counter, which is an Int128.

The implementation goes like so:

void Initialise(IDictionary<string, NamespacedPersistentItem> state)
{
  if (state.TryGetValue(nameof(UniqueId), out var uniqueIdValue)
      && uniqueIdValue.Value.Length == 16)
  {
    UniqueId = new Guid(uniqueIdValue.Value);
  }

  if (state.TryGetValue(nameof(BytesRequested), out var bytesRequestedValue)
      && Int128.TryParse(bytesRequestedValue.ValueAsUtf8Text, out var bytesRequested))
  {
    BytesRequested = bytesRequested;
  }
}

bool HasUpdates => true;

IEnumerable GetCurrentState(PersistentEventType eventType)
{
    yield return NamespacedPersistentItem.CreateBinary(nameof(UniqueId), UniqueId.ToByteArray());
    yield return NamespacedPersistentItem.CreateText(nameof(BytesRequested), BytesRequested.ToString("d", formatProvider: CultureInfo.InvariantCulture));
}

Initialising is a repetitive parsing process. Try read a field from the state dictionary, try parse the value, and if everything succeeds, set the appropriate property. Other classes have more state, but similar repetitive defensive code.If anything cannot be parsed correctly, it is simply ignored.

HasUpdates always returns true. That’s a bit a lie, but who cares about efficiency with just two fields.

GetCurrentState simply returns a collection of each field, either as a byte[] or string.

Nested State: EntropyAccumulator / EntropyPool

The PooledEntropyCprngGenerator has an EntropyAccumulator member, which is the really important part of the generator.However, the accumulator is made up of many EntropyPool objects.This is a nested array of complex objects, so how do we serialise it?

We do a bunch of copying and create some “array like” keys.Which is a bit hacky, but its the only place we need to deal with nested arrays.

We save values to define the pool counts and then array like keys for nested data:

Pooled...Generator.Accumulator  LinearPoolCount  Utf8Text  20
Pooled...Generator.Accumulator  RandomPoolCount  Utf8Text  12

Pooled...Generator.Accumulator  LinearPool.0.TotalEntropyBytes            Utf8Text  3674
Pooled...Generator.Accumulator  LinearPool.0.EntropyBytesSinceLastDigest  Utf8Text  16
Pooled...Generator.Accumulator  LinearPool.0.EntropyHash                  Hex       5DC1F9...
Pooled...Generator.Accumulator  LinearPool.1.TotalEntropyBytes            Utf8Text  4179
Pooled...Generator.Accumulator  LinearPool.1.EntropyBytesSinceLastDigest  Utf8Text  4
Pooled...Generator.Accumulator  LinearPool.1.EntropyHash                  Hex       C5D344...
...

Pooled...Generator.Accumulator  RandomPool.0.TotalEntropyBytes            Utf8Text  3290
Pooled...Generator.Accumulator  RandomPool.0.EntropyBytesSinceLastDigest  Utf8Text 
Pooled...Generator.Accumulator  RandomPool.0.EntropyHash                  Hex       F82DF0...
Pooled...Generator.Accumulator  RandomPool.1.TotalEntropyBytes            Utf8Text  3824
Pooled...Generator.Accumulator  RandomPool.1.EntropyBytesSinceLastDigest  Utf8Text  280
Pooled...Generator.Accumulator  RandomPool.1.EntropyHash                  Hex       194BA2...
...

Each EntropyPool returns its 3 fields directly:

IEnumerable GetCurrentState(PersistentEventType eventType)
{
  yield return NamespacedPersistentItem.CreateText("TotalEntropyBytes", TotalEntropyBytes.ToString("d", CultureInfo.InvariantCulture));
  yield return NamespacedPersistentItem.CreateText("EntropyBytesSinceLastDigest", EntropyBytesSinceLastDigest.ToString("d", CultureInfo.InvariantCulture));
  yield return NamespacedPersistentItem.CreateBinary("EntropyHash", GetCurrentDigest());
}

And the EntropyAccumulator copies that data into array like keys:

IEnumerable GetCurrentState(PersistentEventType eventType)
{
  yield return NamespacedPersistentItem.CreateText("LinearPoolCount", _LinearPools.Length.ToString(CultureInfo.InvariantCulture));
  yield return NamespacedPersistentItem.CreateText("RandomPoolCount", _RandomPools.Length.ToString(CultureInfo.InvariantCulture));

  for (int i = 0; i < _LinearPools.Length; i++)
  {
      var pool = _LinearPools[i];
      var stateSource = (IPersistentStateSource)pool;
      foreach (var item in stateSource.GetCurrentState(eventType))
      {
          yield return $"LinearPool.{i}.{item.Key}");
      }
  }
  // And again for _RandomPools.
}

It’s a bit of work, but effective.If there were more nested arrays or objects, I’d consider a more robust approach.

Security: Entropy Pool

OK, time to deal with the security problems that external state raises:

An attacker can read the persistent state on disk and be able to predict future random numbers.
An attacker can write to persistent state when the generator is stopped and poison the generator such that they can predict future random numbers.

In simpler terms: reading or writing persistent state is working with untrusted and potentially tainted data.

The first problem is dealt with in EntropyPool.GetCurrentState(): we don’t save the current pool (hash) state, instead we save a hash of the hash.This is fine when we re-read the hash into the pool, because the hash of the hash is just as random as the original hash.But it hides the internal state of the generator because a hash function cannot be (easily) reversed.

IEnumerable GetCurrentState(PersistentEventType eventType)
{
    var digest = GetCurrentDigest();
    AccumulateBlock(digest, digest.Length);
    var digestToPersist = GetCurrentDigest();
    AccumulateBlock(digest, digest.Length);
    yield return NamespacedPersistentItem.CreateBinary("EntropyHash", digestToPersist);
}

I’ll address the second problem in the next post.

But I will note one thing which won’t work: file security bits / ACLs.While using some kind of file system based security might mitigate the problem, it can’t be relied upon - perhaps the state is being stored somewhere with no security.Or, perhaps the attacker has the same (or higher) security context than Terninger, so they can write to the file anyway.

Future Work

Potential points for improvement:

Better support for arrays and other complex object graphs. I’ll come back to that if there’s a compelling need.
Some kind of auto-serialisation - similar to how most JSON serialisers work.
And we’ll see how things pan out when I implement IPersistentStateSource for an IEntropySource. I have a particular use case in mind.

Next Up

We now have a way to get and set persistent state on relevent objects.

You can see the actual Terninger code in GitHub. And the main NuGet package.

Next up: we’ll wire up the various IPersistentStateSources and the IPersistentStateReader / IPersistentStateWriter in PooledEntropyCprngGenerator to load state as part of initialisation, and periodically save it.

Building a CPRNG called Terninger - Part 14 Persistent State File

2022-10-27T13:00:00.000Z

Saving state to a file.

Background

You can read other Terninger posts which outline my progress building a the Fortuna CPRNG, or see the source code.

So far, I’ve put Terninger into production in makemeapassword.ligos.net.

Goal

It’s been a long while since my last Terninger post, but it’s been working well enough and my time has been spent in other places.

There is one major feature of Fortuna which I never implemented: persistent state.

That is, the ability for PooledEntropyCprngGenerator to save its internal state to disk.This state would include digests of all pools which have gathered entropy (plus various other information).

Without this feature, every time I restart makemeapassword.ligos.net, Terninger needs to start reading entropy from scratch.As I reboot my servers each week to avoid memory leaks and other random badness, that means Terninger can only accumulate entropy for 7 days before it has to start again.

With this feature, the accumulated entropy should increase forever (as long as the file on disk remains).And I’m serious about the forever part - each pool accumulates using SHA512, and 2^512 is a really big number.

Sub Goals

The reason persistent state took so long to implement (other than me getting distracted with other projects), is it has a number of moving parts.Rather than one giant post, I’ll split this up into smaller ones:

Saving and loading the state (this post).
Getting and setting the state from components which make up Terninger.
Integrating points 1 and 2 with the main entropy gathering worker loop.

Details

Drilling into point 1 in a bit more detail, here’s what I want to achieve:

The C# interfaces to load and save.
An out of the box solution for persistent state - the simplest solution is a text file on disk.
The out of the box solution should have no external dependencies - as I’m targeting netstandard 1.3, that rules out JSON and XML.
A way to extend Terninger to load and save to other locations - if someone wants to save an encrypted file or to a database then they should be able to.
The C# data structures required to represent discrete pieces of data in memory: key + value pairs work very nicely.
A way to keep data from different objects separate - that is, some kind of namespace or nesting.
Values must have strong support for binary data - because the primary use case is storing SHA512 digests.

File Format

Data always survives longer than code.So I think long and hard about the on-disk and in-memory format of any kind of persistent state.

Once I’ve worked out what the data looks like, other code and interfaces become relatively obvious.

Creating a namespaced key value pair is easy enough:

public readonly struct NamespacedPersistentItem {
  public readonly string Namespace;
  public readonly string Key;
  public readonly byte[] Value;
}

Saving that to a text file is easy: pick a delimiter (tab works well), base64 encode the Value, and store each item on a separate line. Eg:

ANamespace  AKey            WusWRBaOzm7zX3KQzdNhVpS+6aJHvpCXO8P1yJq3Zi0=
Terninger   UniqueId        V3VzV1JCYU96bTd6WDNLUXpkTmhWcFMrNmFKSHZwQ1hPOFAxeUpxM1ppMD0
Terninger   BytesRequested  VmpOV2VsW
Terninger   InternalState   VjNWelYxSkNZVTk2YlRkNldETkxVWHBrVG1oV2NGTXJObUZLU0had1ExaFBPRkF4ZVVweE0xcHBNRDA

However, I found pretty quickly that it wasn’t just binary data that needed to be stored.There were plenty of numbers (some Int64s and also Int128s), guids and strings which don’t need to be base64 encoded at all (so long as they don’t contain the delimiter).Base64 encoding everything makes the file really hard for a human to read.

If I can’t understand the content of the persistent state file, I’m probably going to get it wrong.So I added a way to encode the binary value in different ways:

public readonly struct NamespacedPersistentItem {
  public readonly string Namespace;
  public readonly string Key;
  public readonly ValueEncoding Key;
  public readonly byte[] Value;
}

public enum ValueEncoding {
  Base64,
  Hex,
  Utf8Text,
}

This allows easier to understand string encodings of binary values, particularly for strings or numbers.For example, all these encode the value 42, and the last one is easiest for a human to read:

Terninger   BytesRequestedAsBinary  Base64    KgAAAA==
Terninger   BytesRequestedAsHex     Hex       2A000000
Terninger   BytesRequestedAsUtf8    Utf8Text  42

Note the ValueEncoding doesn’t affect the content of Value in memory.It’s more of a recommendation of how to save that Value in a way humans can read it (relatively) easily.

The last part of any file format is a header, because storing a big tab separated file with no context or metadata is likely to cause problems in future.The Terninger file header is a single, tab delimited line with the following fields:

A magic number - the constant UTF8 text TngrData. Which also happens to fit in a UInt64.
The file version number. We’re starting with version 1!
An SHA256 checksum of the contents of the file (excluding the header line). If the file is damaged, this will prevent us loading corrupt data. Note this doesn’t stop malicious actors seeding a poisoned file.
The number of lines / records in the file. This isn’t required to parse the file, but helpful anyway.

An example header:

TngrData  1  UDpxL5ZiKhda8ok3/asKFbmdaihfvAzJmVhxzBP/SaI=  3

This represents a simple to read and write data format capable of storing all the state Terninger requires. It also is extendable (via namespaces) to be used by IEntropySource implementations, if they need to store persistent state.

Here’s an example file from a unit test:

TngrData  1  DBvlW8Nt/XTVKr/aMGWZd8N6KQ9nb8d+BNBWbfzSs8A=  6
Namespace      Key      Utf8Text  Data
Namespace      Key2     Utf8Text  Otherdata
Namespace      Integer  Hex       2A000000
Global         Thing    Base64    AAECAwQFBgcJCgsMDQ4P
Global         Key      Utf8Text  Data
SomeNamespace  aKey     Utf8Text  value

API

When in memory, the persistent state is represented as a PersistentItemCollection.It allows getting, setting and removing single items or whole namespaces of items.Internally, it is a dictionary of namespace > items, and within each namespace a dictionary of key > value.When getting a whole namespace, it will return an IDictionary, which is the structure used by consumers of the collection.

public class PersistentItemCollection {
  public IDictionary<string, NamespacedPersistentItem> Get(string itemNamespace);

  public void SetNamespaceItems(string itemNamespace, IDictionary<string, NamespacedPersistentItem> items);
  public void SetNamespace(string itemNamespace, IEnumerable items);
  public void SetItem(NamespacedPersistentItem item);
}

There are two interfaces to read and write the in-memory data:

public interface IPersistentStateReader {
  Task ReadAsync();
}
public interface IPersistentStateWriter {
  Task WriteAsync(PersistentItemCollection items);
}

I don’t think it gets simpler.We have a collection of NamespacedPersistentItems in a PersistentItemCollection, and can read the content of a whole file into memory, and then write an entire collection to file.Might not be the most efficient algorithm, but we aren’t going to be reading / writing very often, nor will be writing MBs of data.

There are two implementations of these interfaces:

TextStreamReader and TextStreamWriter, which are able to read / write the Terninger file format to a Stream.
And TextFileReaderWriter, which uses the stream reader / writer implementations and writes to a file on disk.

Extending the API

As we have simple interfaces, anyone can implement a reader / writer that works differently.For example, you may want to store persistent state in a database, or a web service, or in an encrypted file, etc.In all cases, the implementation is relatively easy, and you can then pass your reader & writer to any Terninger instance.

If you happen to be reading / writing a Stream and are happy with the delimited format, then you can use TextStremReader and TextStreamWriter to look after that part.

Namespaces and Data Isolation

The primary reason data is stored in namespaces is to isolate different parts of Terninger from each other.The EntropyAccumulator is a security sensitive area of Terninger, because if you can observe the pool of entropy, it is possible you can predict future random numbers - which kinda breaks everything!And if you can write a MaliciousEntropySource which spies on other persistent state, that’s bad.

So any one component of Terninger can only see data for its namespace, and not other components.The main PooledEntropyCprngGenerator class will ensure a component can only see its own key-value-pair list of data.This isolation mitigates the security risk.It also makes it easier to implement persistence within each component, as it only needs to worry about its own data.

Security

Persistent state represents a huge security risk.

If you can read the persistent state on disk then you may be able to predict future random numbers.
If you can write to persistent state you can poison the generator and influence future random numbers.

For now, I’m just going to acknowledge the risk.I’ll discuss mitigation in a future post.

Why not?

There are some alternative implementations I didn’t go with.

Why not arbitrary nesting instead of namespaces?

Because arbitrary nesting is harder than a fixed two level hierarchy.And, even after implementing everything, I’ve only found one use case where nesting would have been helpful, and there was a simple (if tedious) work around.

Why isn’t the file encrypted, or signed, or somehow protected from the Bad Guys™?

Because it won’t help.

Terninger itself needs to read the file, and if the file is encrypted then Terninger needs to know the key.If you have a hard coded key baked into Terninger, any malicious attacker can reverse engineer Terninger to find the key (or just find the key on Github).

Perhaps you could store the key somewhere else, and that keeps the key out of the hands of our malicious attacker.That might help, but the attacker could still find the key, and then game over.Also, that’s something else for the user of Terninger to manage - a persistent state file & a separate key.

Maybe you encrypt the key, which encrypts the persistent state.Oh dear! We’re now in infinite recursion!

Getting that kind of encryption right (and actually ensuring it provides meaningful benefits) is really hard.And there are other mitigations I will describe in future posts.

Anyone can implement their own IPersistentState[Reader|Writer] if they really want this feature.

Future Work

After writing the above interfaces and code, I found that separating the reader and writer as separate interfaces makes Terninger slightly difficult to configure.Because you have to pass the same instance twice:

var readerWriter = new TextFileReaderWriter("/some/path/terninger.txt");
var terninger = PooledEntropyCprngGenerator.Create(
  ...
  , persistentStateReader: readerWriter
  , persistentStateWriter: readerWriter
);

I’m not sure if I’ll ever bother to change this, but it was a bit annoying that I couldn’t do this:

var terninger = PooledEntropyCprngGenerator.Create(
  ...
  , persistentStateReaderWriter: new TextFileReaderWriter("/some/path/terninger.txt");
);

Conclusion

We have the data structure to keep persistent state on disk.And the code required to load and save it.And the API meets my non-functional requirements.

You can see the actual Terninger code in GitHub. And the main NuGet package.

Next up

Define an interface to get and set state from components. That is, how we add / remove from the PersistentItemCollection.

The Reliability of Optical Disks

2022-04-01T13:00:00.000Z

How long will burned CDs / DVDs / BluRays last?

Background

I just finished an extended series on Long Term Backups and Archives.A major shift in my personal and professional backup strategy is optical media, in particular, BluRay disks.

In 2009, I said goodbye to my DVD based backup strategy, because it was taking multiple disks to do a weekly snapshot of my documents and data.And photos were already overflowing many DVDs.

In 2017 my backup strategy involved:

Cloud based backups using ~~CrashPlan~~ BackBlaze.
Windows File History backups to a NAS.
Copy+Paste archives to external (offline) disks.

Five years later, in 2022, my strategy has changed to:

Cloud based backups using OneDrive & BackBlaze.
Windows File History to my TrueNAS for particular content that doesn’t live on OneDrive.
Archives to triplicate BluRay disks, indexed via WinCatalog, two copies stored offsite.

Goal

Discuss the reasons why I’ve moved back to optical disks as a key part of my backup strategy.With particular attention on the reliability of optical disks - BluRay disks and M-Disks.

History

My oldest backups are from the end of the year 2000, and are now on DVD+Rs.Originally, they were burned to CD-Rs, but I migrated all my CDs to DVDs at some point and discarded the original CDs.The oldest CD I can find is from 2003, containing a snapshot of all my documents at that time.Its hard to tell the oldest DVDs, but I think 2004 or 2005 was when I moved from CDs to DVDs for backups.

Finally, the DVD era came to an end in 2009, giving way to HDDs and the cloud.

What I didn’t realise at the time, was that all these CDs and DVDs would become a grand experiment of reliability and longevity.When I read data from these backup disks in 2021, I had a 100% success rate!

That’s a perfect record after being stored for 12-18 years, in semi-controlled conditions (darkness, but no temperature or humidity control) and zero maintenance!

Discussion

This high reliability is what drove me back to optical disks in 2021 - BluRay disks in particular.

The 45+ year storage requirement for church compliance data made me re-think how my backups would survive in the long term.I wasn’t comfortable with HDDs surviving that long, nor anything stored in the cloud.Tapes are cheap, but their drives are expensive and my only experience with them is from 2002.It was only when I tested these CDs and DVDs that I realised optical was a contender!And further research showed that BluRay disks were readily available at a reasonable price.

So why chose optical disks over the alternatives (tapes, HDDs, cloud)?

As I’ve mentioned, having hard data of their long term reliability was a big factor.Getting any kind of real world reliability data of storage mediums is really, really hard.Backblaze releases HDD stats, which is the only public information I’m aware of.Otherwise, you have to trust the manufacturer’s “mean time between failure” figure.And, when the time scales you’re looking at is 45+ years, there is no real world data because no consumer digital storage technology has been invented for that long (tape have been around longer, but its not aimed at consumers).

So, having a few hundred optical disks of 10+ year age that I could test is a huge plus.Real world data always trumps theory.

Optical disks are write once.When it comes to storing compliance data, or long term backups that’s a big plus.Because the only way data can be tampered with is by replacing an entire disk (not impossible, but tricky).While most storage mediums have some kind of “write protect” switch, write once optical media physically cannot be written to multiple times.

Optical disks have less moving parts than HDDs, and thus less that can break.A HDD contains the physical media (disk platters), electronics to read said media, and software to make it all work.If any one of those parts fails, it can be difficult, expensive or impossible to recover data.Optical disks are just the physical media - the electronics and software are in a separate package (the reader).If your reader breaks, you buy a new one for $200 and move on.And if you media fails, well, you’re no worse off than with HDDs.

HDDs, especially NAS disks, have an “always on” assumption - the disks are always online.Indeed, the Backblaze data is all about disks that run 24/7.On one hand, that’s great because the NAS can scrub disks to automatically detect and correct errors.On the other hand, that costs electricity.And if you ever wanted to take disks offline and store them on a shelf, you don’t really know how long they’ll survive - unless you plug them in every now and then.

Optical disks, by definition, are “always offline”.Once burned, they must remain stable without any scrubbing, error checking or automation.They will be stored in a jewel case or a spindle, and will rarely (possibly never) be read.And yet, the expectation is, that you will be able to read the disks without problem - even with zero maintenance.Indeed, that was the outcome of my ~15 year experiment!

The write once and offline properties combine for another benefit: optical disks are ransomware proof.As long as your backups are connected to a network and writable, it’s possible they could be encrypted and held to ransom (or simply deleted).That includes NAS servers, and the cloud.But, because optical disks are offline and immutable, no remote hacker or malware can touch it - the only way they could be held to ransom is via physical theft (very possible, but not the current strategy of Internet Bad Guys™).

Finally, BluRay disks improved the failure modes compared to CDs and DVDs.Their physical spec includes improvements such as a hard coating to reduce scratches, non-organic substrates, improved error correction, and improved track addressing.See references below for several white papers on BluRay physical specifications.(And I note these are theoretical improvements, only time will tell if they yield greater longevity).

There are certainly problems with optical disks though:

Their capacity isn’t great compared to HDDs (or tape).In the physical space of two x 4 TB HDDs, you might be able to fit 10 x 12cm optical disks.The highest capacity BluRays are 128GB per disk, which is around 1.3TB in the same space.For my purposes, I’m not generating enough data for this to be a problem.But if you’re dealing with 1080p or 4k video, you’ll be filling many, many spindles of optical disks each year.

Optical disks are slow to read.Their sequential read and random read speeds are 10-100x worse than even the slowest, cheapest HDD.My experience is that as optical disks age they become harder to read, which makes them slower still.So you don’t want to be reading from them frequently, or doing a restore with the clock ticking.Given they’re designed as long term media, this isn’t a big problem - but something to be aware of.

A big risk with optical disks is they are becoming a niche technology.That is, they aren’t as mainstream as they used to be in the 2000s.Most laptops don’t come with optical drives any more, and no one really misses them.Software and content is delivered by streaming rather than disks.So, its entirely possible they will go the way of floppy disks and become obsolete and difficult to purchase.As of 2022, it is possible to buy brand new BluRay readers and media - although I note eBay is your friend if you want to buy a wide variety of media.

Because optical disks are offline media, you really need to index or catalogue their content.That is, without some kind of catalogue you can browse or search, it’s somewhere between hard and impossible to find what you need.And putting 100 disks into a reader, one at a time, and slowly searching each of them really sucks (I tried).In the 2000s, I never bothered with this, but I’ve become more disciplined this time around and am using WinCatalog to catalogue all optical media.

Finally, optical disks are more expensive than HDDs - at least in cost per GB.A 4TB NAS branded HDD costs ~AU$160, which is ~4c/GB.My last BluRay purchase was for 3 x 50 spindles of 25GB disks costing AU$330, which works out to be ~9c/GB.Obviously, you need a computer for that HDD, a reader for the BluRays, and factor in things like electricity and maintenance - a full total-cost-of-ownership comparison is more complex.But in raw capacity, HDDs are cheaper.Note that M-Disc BluRays are around 4x more expensive than regular BluRays, costing ~33c/GB.

Test to Destruction

Given the primary reason to chose optical media over HDDs is long term reliability, I decided I should put them to the test.I tested a DVD, 3 brands of BluRays (BD-R disks), and a BluRay M-Disc to destruction.

There are four things that will destroy any kind of media: light, heat, moisture and time.

I’ve already tried time (at least for ~15 years) and found CDs and DVDs are pretty resilient!So I moved onto light and heat (I didn’t test against moisture).

My light test consisted of placing the disk in an east facing window that would receive ~4 hours of direct sunlight each day.While this isn’t entirely scientific because I wasn’t testing all the disks at the same time (some were tested in Summer and others in Winter), it’s still a place to start.I tested all disks this way.

My heat test is placing disks in a) my car (which is parked such that it has ~4 hours per day of sunlight) which acts as a greenhouse, b) my ceiling cavity (which is not insulated and can reach over 50℃ in Summer), and c) my freezer (which should be around -18℃).I only tested the BluRay M-Discs for heat.

Updated in January 2024: Note that heat tests expose disks to some indirect light; while the cold test is stored in a dark freezer - I suspect this means the freezer disk will end up lasting longer.Only time will tell.

The TL;DR results: keep optical disks out of direct sunlight and you should be good for a long time.

Results for direct sunlight:

Disk	Days Before Failure	Failure Mode
DVD	< 90	Completely unreadable; computer reports no disk when inserted. I didn’t check very diligently, so not sure exactly when it failed.
BluRay (Ritek)	38 days	Some sectors have errors; disk partially readable.
BluRay (Verbatum)	38 days	Some sectors have errors; disk partially readable.
BluRay (Verbatum M-Disc)	260 days	Some sectors have errors; disk readable after multiple attempts.

Note the M-Disc was tested over Winter rather than Summer.

Direct sunlight is definitely something to avoid.Keeping your optical media in darkness is your number one priority in storage.

Comparing longevity of regular vs M-Disc BluRay media, there’s a factor of ~6x difference.The marketing claims of M-Disc is they should last for “at least 100 years”.If we assume regular BluRays will last for the same 15 years as my CDs and DVDs, then an M-Disc should last ~90 years.That’s not quite what the manufacturer claims, but close enough - and confirms M-Disc media lasts longer.

Others have done similar test to destruction for M-disc media which support my results.

Tests for heat / cold started in May 2021, and remain ongoing without failure (the 2021-2022 Summer was nowhere near as hot as previous year, so I suspect this test will continue for another year at least. Temperature statistics for Sydney).

Last updated January 2024:

Location	Has it Failed?	Days Before Failure	Failure Mode
Freezer (cold)	No	977+	N/a
Car (heat)	No	977+	N/a
Ceiling Cavity (heat)	No	977+	N/a

I’ll update this table from time to time, as I check the disks.

The take away is: keep optical disks out of direct sunlight; even better in total darkness.Heat / cold seems to be less critical.

Always Test

I’m promoting optical media, and particularly BluRay M-Disc media, as a zero maintenance solution for long term data storage.However, given enough time, every form of digital media will eventually fail.

As long as we a) have multiple copies of the data, and b) can make new copies faster than failures, all is well.That means we must have some kind of maintenance schedule in place to detect failures and make new copies.

Data stored on a NAS server has a big advantage here: any NAS will automatically check for errors, and notify if problems are found.TrueNAS (via ZFS) will automatically correct errors.

But checking optical media for errors cannot be automated (unless you can afford a robot / jukebox) because the disks are stored separately to your computer.

Because I’m confident optical media, when stored away from direct light, will survive for 10 years, I’m going to check them every 5 years.At least until I get some failures, so have some idea of when failures are likely to happen.

References

Conclusion

Optical media, and BluRay M-Discs in particular, are the most reliable way to do long term, offline data storage.CDs and DVDs have lasted for 10-20 years and can still be read successfully.BluRay media offers disk capacity of 25-128GB, and should have similar longevity.The claims of special M-Disc media lasting 100+ years seems plausible - unfortunately, it will take another 99 years before we can confirm it!

Anyone who wants an offline, ransomware proof, 20+ year backup should consider BluRay optical media.

Long Term Archiving - Part 8 - Organising Data

2022-02-24T13:00:00.000Z

Files you can find in 100 years.

You can read the full series of Long Term Archiving posts which discusses the strategy for personal and church data archival for between 45 and 100 years.

Background

So far, we have considered the problem and overall strategy, and got to my chosen implementation.

The last point I’ll consider is: how do we organise our files and data so we can find stuff in 10, 20, 50 or even 100 years time?

Goal

Develop a structure, guidelines and processes to organise files / data such that specific data can be found in reasonable time.

This structure needs to be self-discoverable, as the original creator of the structure will not be available in 45 years.

This structure can apply to digital files and data, or physical documents. There are advantages to digital data storage, but we’ve considered a number of risks as well. Any structure should work with the physical as well as digital with minimal changes.

Concepts

Before we go any further, we should remind ourselves that backups and archives are infrequently accessed.We should always optimise for the common case, which is adding data to the archive.Retrieving is far less common, so its acceptable if it takes a bit longer.(Caveat: always check with stakeholders how long is acceptable).

With that out of the way, the way we organise data is dictated by how we need to access it.

If the access pattern is “restore everything”, then the structure should reflect how the data appears in our regular systems.Any additional structure just gets in the way.However, “restore everything” is just one possible access pattern.

There are also certain queries we might want to ask our archives.For example: find all the work photos from 2010-2016, or find that funny video of my kids from their first day of school, or find all the documents that refer to Mr Bloggs when he lead youth group.

Each of those queries has an explicit or implicit time dimension (Mr Bloggs only lead youth group from 2018-2023), plus various other parameters (file type, file content, and category).While unlikely, it is possible the time range is “forever”, in which case we just need to trawl everything - that will suck, but there’s not much we can do about it.

Queries usually have some kind of category or context to them.In the above examples, “work photos” or “kids first day at school” or “youth group”.These are the kinds of categories that can be incorporated into file structures to make things easier to find.For example, we might decide to keep all youth group documents together, and all work photos in one place, and keep personal videos separate from work related ones.

There are additional levels of categorisation as well.Perhaps work photos are also categorised by job number or location.Personal videos might be kept by event.And the “youth group” documents are sub-categorised into “permission forms”, “lesson plans”, “attendance” and “general resources”.

The categorisation I’ve mentioned can be augmented by tagging.Most systems (particularly the “physical document” system) can only put a file or document into one category.That is, all your “attendance” documents can only physically exist in one place, the youth group attendance 2020 folder.But you can tag folders, documents or files with additional keywords.Perhaps the youth leaders names are recorded on the front of the folder containing all the attendance documents.Many photo apps support tagging people (often automatically via facial recognition) and geo-location.And you can label a document with important keywords (either manually, or using an automated algorithm).Tags can make it much quicker to find data of interest, without reading the entire document.

File type is usually straight forward - most computer file types are trivially identifiable from the end of their name (eg: jpg or docx or mp4).And if not, the beginning of the file usually has a particular fingerprint (often referred to as magic bytes).

Finally, most computer systems support security rights, so that only authorised persons have access to particular files.The simplest way to apply these rights are at a top level, so everyone involved with work job 41354 has access to all the job data, or everyone involved in youth ministry has access to all youth related data.While it is possible to grant or revoke access at a more granular level, that brings additional complexity that I won’t consider too deeply here.Access to physical documents can be controlled in a similar way: different keys give access to different storage rooms or filing cabinets.

With those observations, we are ready to create a simple but effective structure for personal and church data.This structure will form a primary index, or a way to locate specific files.Additional secondary indexes will be listed as well, however they will always point to files that need to be retrieved via the primary index.

Organisation / Primary Index

Here are the principals I follow for the primary index, or how files are physically organised on disk:

Part of the structure must be time series (eg: each year). This doesn’t have to be the very top level, but closer to the top is better.
I choose some broad categories and sub-categories. These are repeated inside each year.
Three or four levels of nesting is usually enough, more becomes difficult to find. However, having too many files in one folder makes it equally hard to find things: there’s a balance here.
Its worth spending time naming files well. Word documents and PDFs can (sometimes) be indexed, but scans or photographs cannot. A well named file can make it much easier to identify a document.
Consistency is key. Whatever conventions you come up with are not so bad if you follow them each year.
Self-discoverability is important. While a secondary index is very helpful, if you can’t make sense of the files as they are physically organised, you’re doing it wrong.

Personal Examples

The top level provides very broad categories. Pictures, Music, Documents, etc.And then various sub-categories within there. Often by person.

Pictures is the most structured area: 12 months after my first digital camera, I was already struggling to organise photos.And it hasn’t got any easier.I quickly adopted a strict time series approach to storing photos, and created scripts to automate the process of getting photos from my camera (and more recently phone) into the Library folder.As all my cameras are actually WiFi connected phones these days, the process is fully automated: once I connect to local WiFi, photos are automatically synchronised, post-processed and copied to the right folder.I’ve used a number of secondary indexes for photos down the years - they’ve all ended up obsolete for one reason or another. Currently, I just click through photos month by month in Windows Explorer.

Music has always been managed by media players.Rip content from CDs directly to an album, and let the media player index and organise it for me.There’s never been enough content to warrant time series here.

Videos has never had enough files for structured time series.I generally have folders with a rough category / description + year.Since COVID there’s been lots more material added here, but not enough I care to re-arrange it.

Documents are again pretty ad-hoc.Particularly other family members.

- Pictures
  - 3rd Party
    - Grandma's Trip to Europe 2001
    - Frodo's Photos from the Beach 2099
    ...
  - Library
    - 2020
      - 01
      - 02
      ...
      - 12
    - 2021
      - 01
      ...
  - Scans
    - Murray's Parents
    - Catherine's Parents
    - Someone Else's
    ...
- Music
  - Album 1
  - Album 2
  ...
  - Album N
- Videos
  - Church Tech Training 2021
  - Church Recordings during COVID
  - DVDs
  - Family Christmas 2010
  - Old VHS
  - School Concert 2016
- Documents
  - Murray
  - Catherine
  - Child 1
  - Child 2
  - Child N
- Backup
  - GMail
    - Murray
    - Catherine
    ...
  - OneDrive
    - Murray
    - Catherine
    ...

Church Examples

Most of our church data is being stored on OneDrive, because of its combination of ease of use, price and functionality.Even data that isn’t primarily on OneDrive (on various other cloud based systems) gets exported and stored on OneDrive.Its our single source of truth.

The top level category is based on security roles.That is, different people are granted access to different areas as required for their ministry at church.

Then, there are more specific ministry categories.For example, within the broad “children’s ministry” category, we have “kids for Jesus” (our Sunday School) and “play time” (preschool).

Then, there is our time series structure.A folder for each year which contains lesson plans, meeting documents, attendance rolls, etc.

In some cases, there are additional folders for multi-year resources, or other buckets for files.

Finally, we need to take the extra time to name files descriptively.Well, we try to, it doesn’t always happen.

OneDrive
  - Admin
    - Advertising
      - 2019
      - 2020
    - Church Directory
    - Policy
    - Templates
  - Children's Ministry
    - Kids for Jesus
      - 2020
      - 2021
    - Playtime
      - 2020
      - 2021
    - SRE (Scripture)
      - 2020
      - 2021
    - Music Resources
  - Music
    - Sheet Music
    - Powerpoint Slides
    - MP3s
    - Videos
    - Copyright
  - Parish Council
    - Budget
    - Correspondence
      - 2010
      - 2011
    - Meetings
      - 2010
      - 2011
      ...
      - 2019
    - Policy
    - Property
  - Sunday Meetings
    - 2018
      - 2018-02-13
        - Powerpoint slides / runsheets.
    - Master Slides
    - Training Resources
  - Talk Recordings
    - 2018
      - 2018-02-13
        - MP3 / MP4 recordings
  - Youth Ministry
    - ROCK
      - 2018
      - 2019

Secondary Index(es)

I’m running several secondary indexes against both personal and church data.

Manifest Files

The first are the “manifest files”, generated by my ManifestMaker app.These are simple tab separated text files which list the contents of each disk burned.They include filename, size, created and modified dates, plus a content hash. And can be read by Excel.

While their primary purpose is integrity, they also provide a very crude way to search disk content without access to physical disks.

WinCatalog

The more featured index is WinCatalog.This app shows a graphical view of disk content (and live / working files), with the same core attributes as manifest files (name, size, dates, hash).In addition, it takes thumbnails of PDFs, Word documents and images, so you have basic visibility into file content.And will index some file specific information, such as EXIF details from photos, ID3 metadata from audio and video media, and metadata from e-books.It also allows you to tag disks, folders and files with arbitrary tags and user defined fields (although data entry for these is a manual process).

WinCatalog has a reasonably powerful search function.Allowing you to search by file date, size, name, type, location in catalog.It also lets you search for duplicates by name, size and hash.

As an aside, I don’t mind duplicate files.If the same file ends up on multiple disks, that’s extra redundancy!And, having indexes by file hash mean you can instantly determine if the file is a true byte-for-byte duplicate, or just a file with the same name.

One search function I found WinCatalog lacks is to find files which do not appear on backup disks.So, if I index all the “live data” and compare it to all backup disks, I would like a list of everything in “live” which is not in “backups”.That is, data that needs to be backed up!

While WinCatalog is a proprietary application, the underlying data is stored in an SqlLite database.And I’ve created a CatalogQuerier app to implement my “find data that isn’t backed up anywhere” search.I find this invaluable to ensure absolutely everything gets backed up.

The final search function lacking in WinCatalog is full-text search.It does not (as of 2022) let you search for text within Word documents or PDFs, etc.That would be a killer feature, particularly for church document searches!

Media Player / Photo Databases

I’ve used various databases for photos, video and audio down the years.All have eventually become obsolete or I’ve just run out of time to manage them.

Some really old Adobe software I can’t remember the name of.
IMatch - a digital asset management system.
Windows Photo Gallery - part of the (now obsolete) Windows Live suite.
The built in Windows Photo / Music apps in Windows 10.

These have various features like tags, facial recognition, geo-location.

Windows Search

Windows has a full-text search feature.This works very well to find content within a file, as long as the documents are on your local computer.Fortunately, documents / PDFs are small enough that it is feasible to keep them all.

Apple and other vendors have their own search functions as well.

References

The Sydney Anglican Diocese has some good content about structuring data and retaining records. I have issues with their over reliance of cloud based systems, but otherwise very good information.

Conclusion

You are keeping backups and archives so you can retrieve data from them in the future.Possibly the very far future.You need a structure in place to make it reasonably easy to find particular files.Even if you have inherited the archive from someone else, who inherited it from their predecessor.

Three or four levels of categorisation works quite well.At least one of those level must be time series.And files descriptively named.

If possible, it is highly recommended to keep one or more external secondary indexes.This provides a centralised search functionality that can see the entire archive, even as it is broken into many disks.And the ability to search using other criteria (eg: file content, thumbnails, and others).

This is the last part in my long term archiving series. I may report back in a few years about how my archives are going.

Read the full series of Long Term Archiving posts.

Long Term Archiving - Part 7 - File Formats

2021-11-27T13:00:00.000Z

Files to survive 100 years.

You can read the full series of Long Term Archiving posts which discusses the strategy for personal and church data archival for between 45 and 100 years.

Background

So far, we have considered the problem and overall strategy, and got to my chosen implementation.

One key point to consider is: what file formats will stand the test of time?

Goal

Decide on preferred file formats when saving data onto long term archives.Such formats should have a high likelihood of being read in 45-100 years.

Concepts

The longevity of any given file format sits on a spectrum:

Short Term <------------------------> Long Term

On the left, there are propriety formats that require expensive applications (and even specific hardware) to work with.These are undocumented (sometimes even within the creating company), involve restrictions to make public or open implementations difficult (patents, non-disclosure agreements), are only used in a narrow domains, and can be very complex.

Industrial and medical applications often fall in the category: highly specific, backed by expensive R&D (which means patents to protect that investment), and unavailable outside one product or company.

On the right, there are simple, open formats.These have a public specification, no legal impediments to making sense of the data, and are in common use by millions or billions of people across many different domains.

UTF8 plain text, PDF documents, and PNG images are examples of highly open formats.

And there are plenty of formats in the middle.Word documents, H.264 encoded videos, and even HTML have dangers as very long term formats.

Criteria

Here’s the criteria I see as important for long term file formats:

Open with a specification widely available. In the end, all digital files and data are a sequence of bytes. If you know how to interpret those bytes, you can make sense of the data.
Current usage. Generally, once applications can view a file, and many people make use of the format, application makers have an incentive to retain their implementation.
Age. Older file formats that have stood the test of time and continue to be used are likely to remain available in the future (even if they one day become legacy).
Multiple Implementations. If there’s only one app that can view a file then you have a single point of failure. Two is better. And many is better again.
Open Implementations. Open source apps are best for long term archiving, as they include the instructions for interpreting the file. Free apps are next best. Then commercial implementations from large companies. And finally, you really want to avoid a commercial app from a single, niche company.
An Export Function. Commonly available apps which can translate from one format to another are highly valuable when a migration is required. Being able to export your data from the cloud is crucial.
Simplicity. Simple formats are always easier to implement (or re-implement) than complex ones
Patents. These are a hindrance to free / open implementations, so the fewer patents the better.
Security. A file format that has had security vulnerabilities is dangerous. If there is no way to mitigate security problems, a file format may be removed from applications as a safety measure.

There’s plenty of criteria to evaluate there, but I have a very simple rule of thumb:

If web browsers can view the file format (without extra plugins), its likely to be safe.

That is, if you can drag the file onto Chrome, Firefox or another web browser, and it Just Works™, its likely to be supported into the future.Web browsers are about the most ubiquitous software available, and an excellent lowest common denominator.

Specific File Formats

Now, let’s consider common file formats and how safe they are in the long term.Only formats rated 4 or 5 will be used in my archives.

Plain Text

Plain text files have no formatting.They are about the simplest form of data you can store on a computer.

File Type	Rating	Comments
ASCII Plain Text (txt)	5/5	There’s nothing simpler than ASCII text, as long as you only speak English.
UTF8 Plain Text (txt)	5/5	UTF8 covers 98% of plain text data on the Internet. All languages are covered. This data should be easily readable in 45+ years. As long as you don’t need formatting, all is well.
Other encoding Plain Text (txt)	2/5	Yes, there are other text encodings. Best not to bother with unusual standards, they just make it harder to read. And, because plain text files are not self-describing, it can be difficult to know the correct endcoding.

Structured Data

Structured data is designed to be readable by both computers and humans - although with a priority to computers.

File Type	Rating	Comments
JSON	5/5	JavaScript Object Notation is mostly human readable in any text editor, and widely readable by computers. A schema is optional and rarely used (which means you usually need to reverse engineer an unfamiliar file). Most software development apps and advanced text editors can “pretty print” JSON.
XML	5/5	The Extensible Markup Language is more complicated than JSON, but otherwise very similar in terms of outcomes. Schemas are more common. And apps are widely available too.
CSV / TSV	5/5	While JSON and XML are document orientated, tab and comma separated files are tables of data. Again, they don’t have a schema built in, but its usually pretty obvious what the data means. Most spreadsheet apps can read CSV or TSV files.

Documents

When people think about storing “data” they are usually thinking of documents with text, formatting, images, etc.I’m including spreadsheets and presentations here too - so the core office productivity apps.

File Type	Rating	Comments
DOCX / XLSX / PPTX	4/5	Microsoft’s core Office formats are open, have multiple implementations and widely used. Deduct one point because they can’t be natively displayed in a web browser, and they do slowly evolve and change. While Microsoft has published a spec, I don’t view these are truly open formats. On the other hand they are used pervasively.
ODT / ODS / ODP	5/5	OpenDocument file formats are… well… open. Personally, I use Microsoft’s formats, but would be perfectly happy keeping these ones instead. As they are explicitly open, they score one point more than Microsoft’s formats, although it’s worth noting they are much less widely used.
PDF	5/5	The Portable Document Format is the gold standard for printable documents. As they are (usually) read-only, they are a great way to keep snapshots at a point in time.
HTML	4/5	While the whole Internet is built on HTML, it doesn’t get used very much for offline or editable documents (minus 1 point). Web browsers speak HTML natively, of course. It also isn’t well designed to save a document as a single file.
RTF	4/5	Rich Text Format is like DOCX and ODT, but simpler and it hasn’t changed in years. Its slightly more likely to be readable in the far future. However, it is a proprietary Microsoft standard.

Still Images

Family photos is the majority of my personal data.Many businesses will scan documents as still images or PDFs.

File Type	Rating	Comments
JPEG	5/5	JPEG images are the gold standard for lossy stills. While there are alternative digital negative formats that professional photographers may use, JPEG is readable pretty much everywhere, and has been since the mid 1990s.
PNG	5/5	Portable Network Graphics are loss-less images. They are ubiquitous on the Internet and viewable everywhere.
TIFF	5/5	Tagged Image File Format is associated with scanners. It’s a bit more obscure than the above formats, but been around longer. Its very stable and widely readable.
WebP	4/5	The WebP format is aiming to be a PNG successor. Version 1 was published in 2010, making it much younger than other formats (so minus one point). Modern web browsers support it, but it its usage is minimal compared to JPEG and PNG.
SVG	4/5	Scalable Vector Graphics is the most open vector format around. All the others listed are bitmaps. Vector graphics are great for icons, fonts and logos that need to grow and shrink. Web Browsers can view SVGs, but they are not as widely supported as the bitmap formats. Various Office apps can export graphics as SVGs, and it is a good long term format for computer aided design files.

Audio

Music and recordings are important to keep into the far future.Partly because we love music.And also because recordings may be of important events (eg: office meetings, police recordings, etc).At church, we keep audio recordings of each Sunday’s Bible talk.

File Type	Rating	Comments
MP3	5/5	MP3s are the gold standard of lossy audio compression. They have been playable since the mid 1990s in many, many apps. A very safe choice for long term storage.
WMA	3/5	Windows Media Audio was a Microsoft specific technology which improves on MP3. While its widely supported, its proprietary and not recommended for new recordings.
OGG	4/5	While less common than MP3, Ogg allows storage of lossy and loss-less audio, that is generally of higher quality for the same file size. Unlike WMA, it’s an open standard. It’s widely supported, but not as wide as MP3. No patents.
AAC	4/5	Advanced Audio Coding has a similar intent to OGG - improvements over MP3. Although less popular, it is commonly used in mobile devices. No patents.
WAV	5/5	Uncompressed audio is wonderfully simple and easy to understand in the future. Unfortunately, WAV files are several times larger than the equivalent MP3. Definitely readable; but not practical, and loss-less alternatives exist.

Video

Full motion video with audio is everywhere these days.Family videos are important to keep.And businesses care as well, as they may record teaching material, meetings, etc.Since 2020 at church, we keep video recordings of each Sunday’s Bible talk.

I’m dividing these into two sub-categories: containers and codecs.Containers are usually the file extension, but they just say how the audio and video is packaged.Codecs are the way you decode and display the video.

Container	Rating	Comments
MP4	5/5	MP4s are the most common video container at the moment. And are widely supported.
AVI	5/5	AVIs are more common on Windows and are an older container.
MKV	4/5	Matroska files are a bit less common, and frequently found in live streaming applications because they file is still readable even if it is stopped unexpectedly (eg: crash or interruption).
MOV	3/5	MOV files are common in the Apple world, based on QuickTime. While readable by many applications, it is not an open format (so minus points).

Note that modern video applications are capable of playing all the above containers.This was not always the case in 1990s and 2000s.

Codec	Rating	Comments
MPEG2	5/5	MPEG2 is the codec used on DVDs and video CDs from the 1990s, and still used in lower quality over the air digital TV broadcasts. Due to its age, it is readable pretty much everywhere. While it was patented, those have now expired. Not recommended for new content as there are better options.
H.264	5/5	Advanced Video Coding (AVC) is a more advanced codec and used on Blu-Ray disks, OTA TV and streaming services. This produces smaller files than MPEG2, but at a higher quality. All modern devices can play H.264 encoded videos, and it’s a great choise for long term archival. Royalties are not payable for non-commercial use.
H.265	4/5	High Efficiency Video Coding (HVEC) is superior again. It’s the codec for 4K and 8K broadcasts and many streaming services. It’s relatively new and has patents that cause legal issues (minus one point).
AV1	4/5	AV1 is an open, patent free, codec that competes with H.265. Technically, the two are quite similar, AV1’s big plus is you don’t need to pay royalties to use it. However, it’s not as widely supported as H.265 (minus one point).

It’s worth noting that the video encoding space has evolved faster than still or audio files.This is because the tech behind still images and audio files invented in the 1990s and 2000s is more than good enough - the quality is acceptable, and file sizes small.Video, on the other hand, has gone from low definition to standard def, high def, 4K and 8K - and the tech has needed to improve to keep file sizes manageable.

What that means is its quite likely there will be a new (and superior) video codec invented in the next 10-20 years.But there have been a number of new still image and audio formats invented over the last 20 years, but none were so much better than existing tech to take over - so much less likely for a newcomer.

Email

There’s a stack of data tied up in Email.All kinds of communication happens via email and it’s often important to capture for the long term.Personally, I prefer to save important emails (or email chains) as a PDF if it needs to be kept for the long term.And I don’t tend to pay as much attention to the file formats used by my email apps.

File Format	Rating	Comments
EML	5/5	EML files are used by many apps for individual emails.
MSG	4/5	MSG files are a Microsoft thing used for individual emails by MS Outlook. Minus one point for proprietary, although most modern email apps will read them.
PST	4/5	A PST file is what MS Outlook uses to store a whole mail box (many emails). While there are various apps to read a PST file, it’s still rather proprietary.
MBOX	5/5	MBOX files are how mail boxes were stored on older UNIX systems. They have carried on into various non-Microsoft email apps. The format is simple and open, so good for reading in 45+ years.

Databases

There are a stack of database technologies out there.And an even wider number of implementations such as MySQL, SQLite, MongoDB, LevelDB, and many others.

The data in these systems are used by all manner of apps in personal and business contexts.Our church keeps some records relating to Safe Ministry in a MySQL backed web application.So keeping this data available in the long term is really important.

Unfortunately, the only reliable way of doing this is to keep upgrading your database system every few years.Because there’s considerable research and development in the database field to improve performance and reduce storage requirements.Basically, it’s in the interests of large companies to improve their data processing.And that means file formats are constantly evolving.

Generally, it’s not too hard to upgrade from version 1 to version 2.Things get more complicated to go from v2 to v5 though - many systems only support upgrades for one or two versions different (so v2 -> v3 or v4 would be OK, but not v2 -> v5).Instead, you need to do a multi-step upgrade like v2 -> v4 -> v5.

For this reason, if you want to keep a snapshot of your database available into the far future, the best approach is to export to one of the structured formats above (JSON, XML, CSV or TSV).

While many database systems allow you to make backups, these backups are often very closely related to their main file formats, and come with similar restrictions to the upgrading process (eg: v5 can only restore v4 and v3, but not v2).

The only other option is to maintain your database system and keep it current.While that’s usually a desirable thing, it doesn’t always work with compliance requirements like “what did you data look at on 34th Smarch 2312”.

Other Apps

The above lists cover off my requirements.But many other apps are out there and used for mission critical business scenarios.I’m not going to make recommendations here, there are simply too many options.

In general, the database recommendations of doing regular snapshots is the best approach.And sometimes that means big exports, or lots of PDFs.

References

The National Archives of Australia have good file format recommendations for digital formats.They also have details about analogue formats, which isn’t my focus here, but may be of interest.

Conclusion

It’s no good to keep your data for 45-100 years, only to find there is no app to read and process it.Wisely choosing file formats is an important part of your archiving strategy.

Fortunately, the ubiquity of audio, video, still image and documents in our digital lives mean that common files are very likely to be readable in the far future.

Next up: In the last part to this series, I will discuss how to organise files on archival disks so they are easy (well, less difficult) to find.

Read the full series of Long Term Archiving posts.

Long Term Archiving - Part 6 - Implementation

2021-10-28T13:00:00.000Z

The Interesting Part.

You can read the full series of Long Term Archiving posts which discusses the strategy for personal and church data archival for between 45 and 100 years.

Background

So far, we have considered the problem and overall strategy, possible failure modes, how we will capture the required data, likely access patterns of the backups, and finally, listed possible options for backups and archives.

With all the due diligence out of the way, it’s time to describe the implementations chosen.

Goal

Describe the implementation of my chosen long term archiving strategy (45+ years) for personal and church data.

Personal

My family’s personal data is split across four areas:

TrueNAS server - photos, videos, music, file history from Windows devices, and full disk images.
OneDrive - general documents, selected photos.
Google - email.
Other cloud - there’s plenty of stuff I’ve forgotten, and I don’t really care.

The TrueNAS is a frankenstein computer of parts from down the ages (oldest is ~10 years).It has a 2 Core AMD CPU, 16GB RAM and 6TB usable storage (mirrored disks).It is powered via a small UPS, which is designed to protect against a 5 minute outage and allow a safe shutdown (electricity is extremely reliable in Sydney, but thunderstorms happen in Summer).Despite the low end (and second hand) hardware, it is one of the most reliable computers I’ve come across.

I consider OneDrive and Google very reliable cloud providers.But I take a weekly snapshot of OneDrive using RClone.GMail is more problematic to backup automatically, so I’m content to download a snapshot every couple of years.

Data on the TrueNAS is backed up to BackBlaze B2 cloud storage.TrueNAS has a web front end to RClone that makes it much easier to understand and use.Cost for B2 is ~AUD $6 / month with my current usage of ~700GB.

BluRays are used for offline backups and archives.So far, I’m sticking to single layer 25GB BDR disks as they are cheapest per GB and simplest (read: least ways for them to fail), though I’m experimenting with larger capacity disks as well.All important data is stored with triple redundancy (3 copies of each disk), and two copies are stored off-site.I’m also using 3 different brands of disk, in case there’s a systematic failure from a factory.BluRay disks are the cheapest offline backup system for consumers (tape is out of my price range).And optical media has the highest longevity I’m aware of in consumer hardware, which is good for archiving data for at least 20 years using standard disks.

Data on TrueNAS and BluRays are indexed using WinCatalog.This gives an explorer-like view across all disks, and facilitates searches and finding duplicates.Unfortunately, it doesn’t have a “find files that are NOT on a BluRay disk” feature - but the underlying database is Sqlite, so I have written my own utility to find missing files.I also have written a console app to generate hashes of each file on a BluRay disk (the disk manifest), and that gets signed using PGP and KeyBase keys - which gives high confidence of reading data correctly.The WinCatalog index & manifest files are stored separately on TrueNAS.

Finally, every year or two, I manually gather up all data from cloud services and local storage and burn BluRays of them all.This gives an occasional snapshot of all documents, email, etc.I also make a snapshot on a hard disk of photos, videos, etc (ie: larger data that is also on BluRays).

Every 5-10 years, I get dissatisfied with some aspect of my backup system, so I re-visit and re-work it.(This post outlines my latest iteration; previously I’ve used DVDs, HDDs, and cloud based systems).This is an informal “review” process to evaluate if I should change due to hardware / software obsolescence (and unfortunately involves data migrations).

This strategy satisfies the 3-2-1 backup rule:

Three copies of important data: TrueNAS, BackBlaze / OneDrive, BluRays.
At least two backup media: TrueNAS, BackBlaze, BluRays.
At least one copy off-site: 1 cloud copy + 2 BluRay copies.

BluRay disks are expected to be readable in 20 years, likely more.And provides an off-line, air-gapped, off-site archive.

Church

Processes for church data are slower in taking effect, however we’ve done all the planning (and I’ve tested pretty much everything in a personal context anyway).

The main difference is church data is primarily on the cloud, to facilitate sharing.Systems we’re using include:

OneDrive - for documents, recordings, etc.
BitBucket - website git repo.
YouTube - online videos & streaming platform.
Office365 - email.
Specialised cloud systems - in particular, the Sydney Anglican Diocese has website for capturing records required for Safe Ministry.
Hardcopies - we’ll never be fully electronic.

The stronger use of cloud systems is because church members need to share data with each other.We certainly have many computers on-site, but most members are volunteers who do their work from home, and sometimes need to access that content on-site (eg: for presentations, printing or post-processing).

Backup systems:

We view OneDrive as a system to store data we use on a day-to-day basis, as well as a backup system.It provides features to assist sharing documents, and also retaining them in longer term.It’s more reliable than anything I could build on a limited budget for backups.It’s also quite simple to use, which is a big plus for church volunteers who might not be very technologically savvy.

Data on OneDrive is mirrored to a local server (in case OneDrive disappears for some reason).Currently, that server is an even older frankenstein than my home TrueNAS box.It was cobbled together at short notice (to replace a failed server) from very old parts.It’s running Ubuntu server and has no web UI like TrueNAS does, so all admin is via SSH - which makes simple admin tasks more complex than they need to be.It is using ZFS to ensure data integrity.There are plans to migrate to TrueNAS (possibly even first party TrueNAS hardware).

There is no additional backups to other cloud systems (eg: BackBlaze or AWS).Due to our limited budget, and to keep things simple, we’re classifying OneDrive as both our day-to-day storage and a cloud backup system.

We take periodic snapshots from all systems.Some are automated (where possible) and others are manual.

BluRays are also used for offline backups and archives, in a very similar way to my personal backups.The main difference is church BluRays will use M-Discs - these are archive grade media designed to survive for “hundreds of years”.We’re also planning to store an additional copy (ie: 4 in total) at the Sydney Diocesan Archives - which have a better environment for storing disks long term.

We’re planning on using WinCatalog & manifests to index disks.No changes from personal strategy here.

One big difference from personal backups is a much more structured approach to procedures and reviews.Because there are legal compliance requirements we need to meet (particular data must be available for at least 45 years), we need to regularly check we are actually meeting those requirements.So there are template reviews drafted that will be done annually, and report back to our church’s board of directors to ensure compliance.These reviews include people focused questions - are people using the systems we’ve provided, is the data we need being stored. As well as technical questions - are backups working, can I read the media successfully, is the technology still viable.And even the manual processes - so we remember to do them!

This strategy satisfies the 3-2-1 backup rule:

Three copies of important data: OneDrive / and other cloud systems, on-prem server, BluRays.
At least two backup media: OneDrive, on-prem server, BluRays.
At least one copy off-site: 1 cloud copy + 3 BluRay copies.

M-Disk BluRay disks are expected to be readable in 45+ years, possibly over 100 years (if the advertising proves correct).And provides an off-line, air-gapped, off-site archive.

Possible Single Points of Failures

Very long term backups need to have as few single points of failure as possible.If there is a single link in the chain that can break and cause loss of ALL data, that is entirely unacceptable.

Encryption

The biggest single risk is encryption.

If your backup is encrypted, it is impossible to restore unless you have the password / encryption key.Of course, you want your backups encrypted because there’s likely to be sensitive data in them.

There is a fundamental tension here:

Backups should be encrypted because they contain sensitive data.
Backups should not be encrypted because its a single point of failure that can prevent a restore when in duress.

My approach is: when making backups, I only encrypt cloud backups.

That is, data on the public cloud is encrypted (and RClone makes that easy).But offline backups / archives are not encrypted.That is, anyone who gets their hands on my BluRay disks can read everything.

Which is by design.

Because archives a) are often old enough the sensitive data has lost its value, b) designed to be the last resort when restoring, so need to be easily accessible, c) more likely to be read by someone after I’m dead (eg: grand kids, archeologists, etc).

I can manage security of BluRay disks by controlling physical access to them.But if someone gets access to them in 100 years time, I’d prefer they can see their content rather than be thwarted by a password.

Aside: there are ways of keeping a backup password safe by distributing it to many people.Shamir’s secret sharing algorithm is a way to do this such that a quorum of people are required to recover a password.Or a “dumb” approach: have a long passphrase and give parts of it to different people.Both likely introduce a delay if you’re going to the backup of last resort, as you need to contact several people.

Simplicity

Other than encryption, overly complex recovery processes are the most likely way reading data would fail.

There are three ways to mitigate:

Document the restore process.
Do test restores on a regular basis.
Remove un-needed complexity.

Items 1 and 2 are in place for church archives via compliance reports.Less so for personal archives, but I still occasionally test the disks are readable.

Item 3 is my main focus here: keep things simple.

My BluRay disks are, as much as possible, just a bunch of files burned to a disk.

If you put them in any BluRay drive connected to a laptop / desktop computer, you can browse them using your favourite app, and open them using whatever apps are available.There’s no requirement for Windows, or Microsoft products (although that’s where much of the data originates).The disks could be read on a Mac, or a Linux machine (or some new OS that comes out in 50 years time, as long as it can talk to a BluRay reader and supports UDF).And the files should be readable using many applications (JPEG photos, MP4 videos, MP3 music, DOCX documents, PDF documents, XLSX spreadsheets, etc).

In particular, I avoid compressing data.The logic being: a single error has a higher chance of doing extensive damage to compressed data - but would only break a single file if not compressed.And, most large files I deal with (video, audio, photos) are already highly compressed; documents and spreadsheets are small enough that it doesn’t matter.

That is, the requirements to read my archive disks are a) the disks themselves, b) a BluRay drive, and c) a computer.

Special backup software should NEVER be required for long term archives.It adds a layer of complexity that may cause difficulty when trying to restore data.And you don’t know what the scenario is when the disk is read (it might be after your house burned down and you have absolutely nothing beyond an off-site backup, or it might be your great grand kids in 100 years time, or it might be an archaeologist in 500+ years time).

If you only have one disk (because all the rest were damaged beyond repair somehow), you should be able to read everything from that one disk without dependencies on others.

The biggest layer of complexity I’m happy to add is for large files to span multiple disks.This is pretty rare as I don’t often work with files over 25GB.But full disk images are the one exception.

Having said all that, I’m happy to include extra data on each disk.For example, the manifest file is not required to read anything on the disk - although it forms an index that may help someone find what they’re looking for more quickly, and provides a hash to verify the file integrity.I usually include MultiPar parity data - that includes additional checksums to verify integrity, and may help recover a damaged disk.

However, none of that “extra data” is required to read the content on disk.

Integrity

Also, I’d like to ensure the integrity of any data cannot be tampered with.That is, if someone edits or replaces a file (or an entire disk), you should be able to clearly tell something has changed.

First off, all BluRay disks I use are write-once.So it is technically impossible to accidentally or maliciously modify data on a disk.However, a bad guy could make a copy of the disk with changes and replace the original with the copy.Unless they are very careful, this would leave different date stamps or different media brands which could be noticed.

The WinCatalog index includes an SHA256 hash of each file, and the manifest files include SHA384 hashes.Both are stored separately from disks, so even if someone replaced a disk with a new one (with dodgy data), that could be detected.The bad guy would need to a) replace all disks in all physical locations, and b) update the index & manifest files which are stored separately to the disks.

I’m also signing the manifest files, so the signature would no longer be valid if the bad guy is tampering with things.The bad guy could generate new PGP / KeyBase keys that look like mine, but are not.However, KeyBase keys are public by default, so that should be very difficult.(PGP keys can be published as well, but there is no central authority so nothing stopping an attacker doing exactly the same thing).

If I was implementing this in a larger corporate environment, I might ask many people to observe the process to create disks, inspect the disk contents and then ALL sign the manifest.That is, you might have 2 or 3 or more people attesting to the correctness of a disk.If the private keys for this process were stored on a hardware token (eg: Yubikey), then the difficulty for an attacker to modify data without detection becomes extreme.

If I was really concerned about bad guys trying to alter data in deep archives, I could publish the original manifest files to a public location (like a blockchain), when the disks are created.As blockchains are effectively append-only databases, an attacker would need to re-create the whole blockchain to change hashes.

For my use case, write-once media + hashes + signatures is more than enough.

Conclusion

It took 5 posts and about 12 months of thinking to come to a reasonably simple (if overly redundant) backup strategy that can meet the 45+ year requirement.

By using on-prem (TrueNAS), cloud (BackBlaze / OneDrive) and offline storage (BluRays).And keeping copies off-site.And using two external indexing systems.And keeping signed hashes of all files.

I am very confident my data will survive well into the future.Even confident it will survive to my 45 year goal!

(And yes, I realise most of the 10+ year part is met via M-Disc BluRays. And 20+ years is met via “review backup technology and migrate if required”.Insert something about the journey being more important than the destination).

Next up: We aren’t finished yet! I will discuss which file formats are suitable for long term archiving.

Read the full series of Long Term Archiving posts.

Long Term Archiving - Part 5 - Platform Options

2021-10-08T13:00:00.000Z

So many choices for backups!

You can read the full series of Long Term Archiving posts which discusses the strategy for personal and church data archival for between 45 and 100 years.

Background

So far, we have considered the problem and overall strategy, possible failure modes, how we will capture the required data, and likely access patterns of the backups.

Now we’re up to the fun part!Time to research the options available to do backups and consider how well they meet our criteria.

Goal

List common backup platforms or technologies, and evaluate them based on the criteria we’ve identified over the last few posts.

Remember, I’m planning to backup personal data and church data (not-for-profit organisation).These evaluations mostly apply to a small business (20 or less employees), but less so for medium or large organisations - they will be processing orders of magnitude more data.

Cost - capital, ongoing, price per unit storage.
Reliability - how frequently should we expect failures.
Longevity - how long should we expect the backup to survive.
Access - how easy & fast is it to access data.
Scale - how large can you grow your data.
Simplicity - what technical skill is required.
Automated - to what degree can you automate.
Security - how secure is your data.
Recommendations - what is the technology most suited to.

Disclaimer: Some of the criteria are pretty arbitrary and subjective. Others will be based on other studies or maths.As always, do your own evaluations to determine if any service or technology is suitable for you.

Cloud

The Cloud is a fantastic place for backups and archives.It enables individuals and small businesses to access the same scale of storage as multi-national corporations.

Remember, the cloud is a euphemism for “renting someone else’s computer”.It is relatively cheap and highly reliable - essentially, the cloud provider takes responsibility for all the boring aspects of storing data.But in accessing those features, you give up ultimate control of your data.

So this advice applies to all cloud based backups: have an off-line copy as well.

Cloud: Object Storage

AWS S3 popularised “cloud storage”.It works by storing key-value pairs: some kind of name, and a blob of data.It has conventions for creating a filesystem-like view.And adds permissions, storage tiers, and various other features.You can “put” data into a “bucket”, and then retrieve it later by its name.

Alternatives: Azure Blob Storage, Backblaze B2.

Capital Cost: $0

Ongoing Cost / GB / Year (AUD): 30c (AWS), 25c (Azure), 6c (Backblaze). Plus network costs, API usage, and who knows what else.

Calculating costs of cloud storage is incredibly difficult; there are any number of pricing tiers, levels of redundancy and additional costs beyond raw data storage (eg: network uploads / downloads, API usage).The cloud promised “only pay for what you use”, but delivered “our pricing model is so complex, until you actually use our service, you have no idea what it will cost”.You should use the “pricing calculators” provided by each cloud service to get a rough estimate of cost.

Cost to Store 1TB for 1 year (AUD): ~$310 (AWS) (plus network / API / etc)

Reliability: All cloud providers use redundant storage within individual servers, data centres and can even replicate data between different geographical regions. While there are occasional outages due to network issues, your raw data is incredibly safe.Even if there are internal errors or failures (and there will be), the provider has automated systems to detect and correct them.

Effectively, you can safely assume you will never see a failure when using the cloud.This is by far the biggest advantage of the cloud: it is really expensive to achieve similar reliability by rolling your own.

Longevity: All cloud providers have long term archive options (eg: S3 Glacier, Azure Archive), and the same principals behind their high reliability mean your data is safe over 10+ years.So long as the provider itself remains in business, it is safe to assume your data is available.

Access: Is generally limited by your own Internet connection - as with any cloud solution, if your Internet connection is poor, the cloud will perform badly.S3 is the industry standard protocol for cloud object storage, and there are many apps available to upload / download / browse your data.Many backup solutions have built in support for S3 storage.

Scale: Object storage allows for Petabyte level storage (1 PB is 1 million GB).For personal usage, small or medium business, you can assume there are no technical limits.The first thing that will break is your credit card!

Simplicity: Any nerd or technically minded individual will have little trouble using object storage.However, cloud providers pitch this technology at technical people; your mom-or-pop is going to struggle signing up for these cloud providers, let along configuring their devices.

Automated: Cloud providers are available 24/7, and were primarily designed to be accessed by machines rather than humans.Their support for automation is excellent.Low level APIs are available (if you’re a programmer), graphic clients are available (for interactive access), command line clients are available (for automation via scripts).

Security: It is a vested interest of cloud providers to ensure privacy of your data, and security via access permissions & user authentication.Having said that, most cloud providers can peek at your data if they choose (although have strict policies prohibiting that) - you should configure any backup software to encrypt your data.And it is common for permissions to be accidentally set to “public” and allow anyone to download your data.

Recommendations: Object storage is an excellent candidate for backups and long term archiving.The only caveats are, 1) you need a nerd to get started, and 2) you have to trust they won’t go out of business in the next 50 years.

Criteria	Rating
Capital Costs	5/5
Ongoing Costs	3/5
Reliability	5/5
Longevity	4/5
Access	5/5
Scale	5/5
Simplicity	3/5
Automation	5/5
Security	4/5
Overall Suitability for Backups	5/5
Overall Suitability for Archives	3/5

Cloud: Sync Service

DropBox is the original cloud sync service.With similar services provided by OneDrive, Google Drive, Sync.com and others.

It is by far the simplest way of backing up data from your devices.You keep your files in designated folders, and the synchronisation service magically copies files to the cloud.When another device makes changes, they are magically copied to your device.Indeed, its so simple that “backups” in Windows 10 are “keep your files on OneDrive” - all the older backup features like File History or Backup and Restore are second class citizens.

Most cloud sync providers are able to see the contents of your data.Several providers make a point of confidentiality, by encrypting data on your computer before it is uploaded (zero knowledge cloud storage).This may be a desirable characteristic when making backups.Providers include: pCloud, Tresorit, and SpiderOak

Capital Cost: $0

Ongoing Cost / Year (AUD): $150-$200 for at least 1TB of storage.

Dropbox ~$190 for 2TB
Google Drive ~$150 for 1TB
OneDrive ~$130 for 6TB (6 users * 1TB)
Sync.com ~$140 for 2TB
pCloud ~$140 for 2TB
Tresorit ~$180 for 500GB
SpiderOak ~$200 for 2TB

Pricing above is for personal accounts; most services offer a business or professional level account which is more expensive and has more business orientated features.At the end of the day, if you want to backup data, it doesn’t matter; personal, professional or business is all the same.If you want to share files with other people, the professional accounts may be of interest.

Reliability: All cloud providers use redundant storage within individual servers, data centres and can even replicate data between data centres. While there are occasional outages due to network issues, your actually data is incredibly safe.Even if there are internal errors or failures, the provider has automated systems to detect and correct these.

Longevity: So long as the provider itself remains in business, it is safe to assume your data is available.Note that these consumer orientated cloud services don’t have the same guarantees about long term storage - that is, AWS S3 offers tiers specifically for retaining data for 10+ years for compliance purposes; none of the consumer services make such claims.

Access: Is generally limited by your own Internet connection - as with any cloud solution, if your Internet connection is poor, the cloud will perform badly.All these services have an app you need to install for the best connectivity, most (all?) offer a web interface as well.Most apps support iOS, Android, Windows, and MacOS. Linux is more hit and miss.

Scale: Most consumer cloud storage tops out around 5TB.Google Drive offers up to 30TB.If you want more storage, you’ll need to sign up for another account.This level of scale is fine for documents or photos, but if you’re recording 4k video you will hit the 5TB limit pretty quickly.

Simplicity: These services are aimed at every-day users.They are usable by pretty much anyone.

Automated: Cloud providers are available 24/7.But these consumer services are designed for humans rather than computers.At least, you will need to have a device with a person logged into it (so they won’t work on headless services).Having said that, there is software available which allow automation via scripts.

Security: It is a vested interest of cloud providers to ensure privacy of your data, and security via access permissions & user authentication.Most cloud providers can peek at your data if they choose (although have strict policies prohibiting that).It is difficult to encrypt data when using cloud sync apps.Fortunately, access permissions are private by default.

Recommendations: Cloud Sync based storage is a very good candidate for backups, particularly for everyday users.But not as good for long term archiving.And, as with any cloud provider, you need to trust they won’t go out of business.

Criteria	Rating
Capital Costs	5/5
Ongoing Costs	4/5
Reliability	5/5
Longevity	3/5
Access	5/5
Scale	4/5
Simplicity	5/5
Automation	4/5
Security	4/5
Overall Suitability for Backups	5/5
Overall Suitability for Archives	3/5

Hybrid

Hybrid systems allow many of the advantages of cloud storage, but you host the service on your own servers.Essentially, a cloud-like system, but using your own disks and hardware for storage.

If there is data that you can’t store in the public cloud (perhaps its too sensitive or you are prohibited by law) but still want a cloud-like interface to access it, then hybrid is the way to go.You retain ultimate control over your data, but need to take responsibility for maintaining the systems hosting said data.

Hybrid: Cloud Like

There are a number of Cloud Sync services that can be self-hosted.

OwnCloud / NextCloud are very similar services that behave like DropBox.SyncThing / Resilio are more like a writable version of BitTorrent.

All can be used as a backup, as long as you provide your own hardware.

Capital Cost (AUD): All need a server of some kind. Some need more powerful servers than others.

All the above services can function on a Raspberry Pi, which puts the entry cost at ~$400 including a disk.But this provides no redundancy if your disk fails.
See below for NAS devices, which are the cheapest devices available with redundant disks, starting from ~$600.
An desktop can be a functional server (even if the quality of parts might not be as high), starting from ~$1000.
A computer with “server” written on it will cost at least $2000.

Ongoing Cost / Year (AUD):

All services listed have free options, although that may be limited for personal use only.Most have business / enterprise pricing per user per month. You’re looking at $400 - $1000 per year for 5 users, depending on the service.

Fortunately, because these companies are selling you a product, their pricing is much easier to understand than AWS S3 or Azure Blob Storage!

Reliability: Because these are self-hosted, their reliability depends on the hardware you purchase and Internet connection available.The entry level costs (above) are NOT going to give you high reliability; cheapest is not best if you want reliability.Purchasing 3 of everything is a great way to improve reliability!But that means your capital costs just tripled.I’ll discuss reliability of hard disks in a NAS below in more detail.

SyncThing and Resilio are designed to scale out as you add more devices; OwnCloud and NextCloud not so much.

If you’re only using these devices at home or at business, your LAN may be plenty reliable for your needs.But, I’m assuming the “hybrid” part means you will want to access data or devices remotely, so a reliable Internet connection is important.

In Sydney Australia, I’ve found personal Internet via Internode more than reliable enough to host my own website.However, this may not be true in all part of the world (or even all parts of Sydney)!

Longevity: Again, I’ll discuss how long you can expect your hard disks to last for below.Your server(s) will last as long as you maintain / replace them on failure.

Access: These services require their own apps to run, which generally makes them easy to use.Otherwise, access to data is similar to other cloud sync providers.But with one important difference: you can always connect to the server directly if you need the data and the app isn’t working right.

Scale: I’m not aware of inbuilt limits for these services.OwnCloud / NextCloud will scale up to the size of your server.SyncThing / Resilio are distributed, so you can store more and more data as you add more and more servers.

Simplicity: “Self-hosted” means you need at least a computer nerd to get you started, possibly an IT professional.These services are moderately difficult to install, and pretty easy to use, but are certainly not aimed at mom-and-pop users.

Automated: All services can be automated within their own apps - generally this assumes a human logged onto a computer.Outside their apps, there is good scope for scripting and automation - “self-hosted” allows a high degree of flexibility in this department, if you have the expertise available.

Security: Data is in your own hands, so the security and privacy of your hybrid solutions are equally in your hands.All software listed have built in security and encryption - so the main point of failure is human: incorrect configuration or simply forgetting to revoke access to ex-employees.Also, make sure you keep software up to date - bugs and security vulnerabilities are found frequently, updates are key.

Recommendations: Hybrid Cloud Sync storage is a good candidate for backups and long term archiving (because you control the underlying hardware).Even if the parent company goes out of business, you’ll have whatever you last installed.Perhaps their best use case is to bridge between the public cloud and your own servers; which makes them a really good fit in the business world.

Criteria	Rating
Capital Costs	3/5
Ongoing Costs	4/5
Reliability	4/5
Longevity	4/5
Access	4/5
Scale	4/5
Simplicity	2/5
Automation	4/5
Security	4/5
Overall Suitability for Backups	4/5
Overall Suitability for Archives	3/5

Hybrid: Distributed Object Store

There are a number of “S3 compatible” services available, the two most popular are MinIO and Cyph, but there are plenty of others out there.Because they are “S3 compatible”, anything that can backup to AWS S3 can be configured to backup to these services.They need to be self-hosted.

Although not S3 compatible, the Interplanetory File System is a promising distributed system, which can use public providers, or self-hosted servers.The big feature of IPFS is “immutable content based addressing”, which is a fancy way of saying “you can’t every change something you upload on IPFS”.When archiving data for 45+ years, that is a very good property.On the other hand, it is relatively new and somewhat experimental.And the big gotcha is: everything is public on IPFS, which is a very bad property when keeping sensitive or confidential data - encryption is a must.

Capital Cost (AUD): All need a server of some kind. See above for starting costs.

Ongoing Cost / Year (AUD):

All services listed have free (open source) options.MinIO has commercial licensing options.

Generally, your ongoing costs are going to be related to the hardware more than software.As these are distributed solutions, they work best on many servers.At some point, if you install enough servers, you’ll have a data centre like AWS and Azure operate!

IPFS has a public cloud that let’s you “pin” content on other servers - the rough equivalent of uploading your data.Costs range ~$1-2 per GB per year (significantly higher than AWS / Azure).

Reliability: Because these are self-hosted, their reliability depends on the hardware you purchase and Internet connection available.The entry level costs (above) are NOT going to give you high reliability; cheapest is not best if you want reliability.I’ll discuss reliability of hard disks in a NAS below in more detail.

All these services are distributed and designed to scale out as you add more devices.And distributed systems mean you should probably have 5 or 7 of everything (or more).

Longevity: Again, I’ll discuss how long you can expect your hard disks to last for below.Your server(s) will last as long as you maintain / replace them on failure.

Access: MinIO and Ceph are S3 compatible, so its no harder than AWS to access data.IPFS runs its own service and provides command line, web based and virtual file system access.Because they are distributed services, the raw data on disk is not easy to read - data is split and copied between servers automatically.So direct access to servers is less useful.

Scale: I’m not aware of inbuilt limits for these services; because they are distributed, they are designed to scale up as you add more servers.MinIO and Ceph are designed for 10TB and up.IPFS is designed for effectively unlimited storage (though its relative immaturity means that hasn’t been extensively tested).

Simplicity: These services are even harder to use than “bring your own server, install this service, off you go”.Public IPFS is close to that level (if a bit experimental).MinIO and Cyph are designed to be integrated as part of other server infrastructure.It is possible to create your own private IPFS network, but that is quite technical.However, once your IT department looks after all the technical stuff, scripted backups should be nice and simple.

Automated: As with the “real” AWS S3, these services have excellent APIs and support for automation.MinIO and Cyph should work with any S3 compatible backup software.IPFS has command line scripting support.

Recommendations: Creating your own S3 Compatible object store is the ultimate hybrid cloud - having all the features of S3 but on servers you control.This is the kind of setup that medium or large business may find attractive, but it’s going to be out of reach of individuals and small business.

IPFS feels like it could be a fantastic solution for long term archiving.But its quite complex and expensive compared to other options.

Criteria	Rating
Capital Costs	3/5
Ongoing Costs	4/5
Reliability	5/5
Longevity	5/5
Access	4/5
Scale	5/5
Simplicity	1/5
Automation	4/5
Security	4/5
Overall Suitability for Backups	4/5
Overall Suitability for Archives	4/5

On-Premises

The traditional way to do backups and archives is to do it yourself.

Unlike the cloud, we can’t take advantage of economies of scale, nor the ultra high reliability.But we do retain ultimate control of our data - there is no external 3rd party who can cut us off from our precious data.No account that might be hacked, or locked.And no cloud provider that might go out of business.

We have ultimate control and ultimate responsibility with on-prem backups.

On-Prem: NAS (online disks)

Pretty much everything in IT runs on servers with disks.

Whether its the largest cloud provider or a tiny website, the service you access needs to run on real hardware.There might be many layers of virtual machines and services between the website and the hardware, but make no mistake, everything runs on servers with disks eventually.

For backups, we’re interested in many cheap disks.And the simplest way to achieve that is Network Attached Storage.

A NAS device is a small server that optimises for lots of disks (as opposed to CPU power).The ones we’re interested in have multiple disks, to allow redundant storage.So if one disk fails, your data remains intact.

Key players include Synology, QNAP, and TrueNAS.TrueNAS is the one I use because it uses ZFS for storage, but it’s more expensive than other brands.I don’t have direct experience with Synology or QNAP.

Capital Cost (AUD):

The cheapest NAS supporting 2 disks start around $400. And 4 disk models from $500.

You need to add disks for the NAS to be useful. 1TB disks are ~$100ea. 4TB looks to be the best value for money at ~$160ea. 8TB jumps to ~$350ea.

So, a basic NAS with 2 x 1TB will cost ~$600. A decent NAS with 4 x 4TB disks is ~$1200. Or a high end model with 8 x 8TB disks is ~$5000.

The TrueNAS software is available for free, but you need to supply your own hardware.My estimate is $1500-$2000 if you want to DIY with quality parts and 2 x 4TB disks.Genuine TrueNAS hardware starts in a similar range (and Australian buyers pay a premium for shipping, unfortunately).

The estimated life time of your NAS is 5-10 years.

Ongoing Cost / Year (AUD): Once you have purchased your NAS there are two main ongoing costs: electricity and network access. And don’t forget to add a maintenance allowance.

My electricity costs ~21c / kWH in Sydney.Your NAS will be running 24/7, and will consume 60-120W (depending on size).My math for this works out to an annual cost of ~$110 for a 60W NAS and $220 for a 120W NAS.

I’m assuming you want Internet access to your NAS (perhaps to mirror its content off-site).I pay $110 / month for 100/40Mbps Internet with a static IP in Sydney.Obviously, I use that for more than just my NAS, but it means I’m paying $1,320 per year to ensure it is online.The static IP and upgrade to 40Mbps upload is $20 per month, so let’s say that’s the special “NAS” part of my Internet, which is $240 / year.

Finally, maintenance.Disks do fail, and you need to allow a budget to replace them (the cloud providers do).I’m going with 7.5% per year of the original purchase price, which should be enough to buy a replacement disk after a few years.That’s $90 / year for our $1200 NAS.

A quick comparison with AWS shows a NAS is similar in cost once you include ongoing costs:

4TB $1500 NAS costed over 5 years + electricity + Internet + maintenance: $300 + $110 + $240 + $90 = $740 / year.
2TB stored on AWS: $600 / year.

Reliability: Backblaze publishes best public statistics on HDD failure rates.

There’s a 1-2% chance of any hard disk failing each year (assuming data centre conditions; assume worse environmental conditions for your NAS).So it’s quite likely the disks in your NAS will survive 10 or more years.

On top of that, all NAS devices employ some kind of technology to detect and correct failures on a regular basis, and notify you when that failure happens.That means there is an automated system checking if your disks are working or not, so there should be a very short time between an actual failure and when you can take corrective action.

All this means, disks in a server are very, very reliable.Not quite as reliable as the cloud, but still very good.

Longevity: The NAS itself should last 5-10 years, at which point you’ll need to migrate data to a new device.

Disks should last forever, so long as you can afford timely replacements.That is, the automated monitoring built into NAS devices is really important at keeping your data safe.

The underlying technology of NAS is Ethernet + various file transfer protocols. While they may become obsolete in 10-20 years, I don’t see them disappearing entirely in that time frame.Every time you buy a new NAS (say every 10 years) you are automatically be upgrading this core tech.

Access: NAS devices support various file transfer protocols for Windows, Mac and Linux devices, so no problems accessing.Mobile device support is not as good, because mobile devices are “cloud first” platforms.

Access outside your local network is dependent on your Internet connection.While my residential connection might have a few minutes of down time each month (which I rarely notice), it’s no where near as good as cloud providers.

Scale: NAS devices support a fixed number of disks.Once you install all those disks, your choices are a) buy a new (bigger) NAS to scale up, b) buy a second NAS to scale out, c) get into clustered file systems - which are expensive and require IT experts.There’s only so many disks you can fit in a single server.

For personal and small business use, ~70TB is a reasonable upper limit for an 8 disk NAS with 12TB disks.Larger NAS devices are available supporting 16 disks (plus another 16 disk expansion), which gives ~320TB.

Scaling out and buying more NAS devices also works.But then you need to think of a way to split your storage up between each device.

Simplicity: Running your own hardware is always more complex than using “the cloud”; you need a higher degree of technical knowledge to get it right.Having said that, NAS devices are the easiest way to add reliable on-prem storage.Most consumer orientated devices will have wizards and walk-throughs to get you started.

And, if you’re a business that needs to store more than 50TB of data, you’ll likely have professional help available.

Automated: NAS devices run 24/7 and should be always accessible on your local network.Combined with a wide variety of storage protocols, pretty much any non-mobile device should be able to automate backups with your NAS.

That gets more complex if you need connections from outside your local network, depending on your Internet connection.

Security: Data is in your own hands, so the security and privacy of on-prem solutions are equally in your hands.All software listed have built in security and encryption - so the main point of failure is human: incorrect configuration or simply forgetting to revoke access to ex-employees.Also, make sure you keep your NAS up to date - bugs and security vulnerabilities are found frequently, updates are key.

Recommendations: NAS devices are a great way to store backups.They have good reliability and longevity, plus are competitive with the cloud on cost, and pretty easy to configure.If you need global access to your data, they might not be as good, depending on your Internet connection.

Criteria	Rating
Capital Costs	3/5
Ongoing Costs	4/5
Reliability	4/5
Longevity	4/5
Access	4/5
Scale	3/5
Simplicity	3/5
Automation	4/5
Security	4/5
Overall Suitability for Backups	5/5
Overall Suitability for Archives	3/5

On-Prem: External HDDs (offline disks)

Disks (either hard disks or solid state drives) can be purchased in an external enclosure with USB connection and stored in a safe place (possibly an actual safe).

In many ways, this is simpler than NAS devices.Buy a disk, copy data on it, stick it in a safe, done.

Capital Cost (AUD): $80ea (1TB), $100ea (2TB), $160ea (4TB).

You absolutely 100% must without exception buy multiple disks for redundancy.Data should be copied onto at least 2, preferably 3 disks.And then stored in different locations.

If you want to store them in a real safe you might need buy one. Costs start at $500 and can reach $3,000 for larger fire proof safes.

If you don’t care for a safe, storing disks on a bookshelf is nice and cheap (if not very fire resistent).

Ongoing Cost / GB (AUD): ~5c (2TB drive).

There’s no electricity being used, and no Internet required, so no ongoing costs for existing media.

OK, we should allow some maintenance because these disks will fail.However, we’ve already factored 2x or 3x redundancy in capital costs.

Cost per raw GB is 5c.You need to multiply that by your desired level of redundancy.

Reliability: While Backblaze publishes HDD failure rates, these do not apply to disks stored offline.

In my failure modes article, I looked for good statistics about the reliability and longevity of disks stored offline.There’s nothing remotely comparable to Backblazes’ data.

My anecdotal data: I used external disks for backups for ~5 years.The biggest source of failures was me dropping them accidentally.You can get rugged external disks which can mitigate this risk, but the “physical factor” is much more important when you’re physically moving disks around.

Longevity: As with reliability, there’s minimal data in this area.

Checking the table on my failure modes article, 5 years looks very safe, 10 years is possible, and 20 years is the upper limit.Solid state disks have a shorter life time (and we have even less data about them).

The advantages a NAS has in this area (automated reliability checks and notifications) doesn’t apply.You need to manually pull disks out of your safe on a regular basis, and test for correct operation.

USB should be around for another 10-20 years in some form, so that’s relatively safe.

Access: Offline devices are harder to access by definition.You need to manually retrieve the device, and connect it to a computer to read data.

An external catalogue of disk contents (or at least a good labelling system for the physical disks) is highly recommended.If you need to check every file on every disk, it might take a long time to find what you’re looking for.

Scale: Boxes of external disks scale up really easily: just keep buying more disks (and boxes).This assumes you can divide your data up logically (eg: by year or month).

Kinda interesting that a NAS has an upper limit because all the disks need to be running in the same device at once.While if you’re happy for your data to sit offline, the only limit to scale is your wallet and size of warehouse.

Simplicity: On one hand, “just copy data to disks and stick them in a safe” is about as simple as you can get. But retrieving that data can be extremely painful if you don’t have a catalogue or index of your disks.

Automated: By definition, offline / physical operations cannot be entirely automated.They can certainly be supported by scripts to copy data, reminders to move disks to the safe, and maintenance schedules.But any process that can’t be 100% automated can be forgotten, or done inconsistently.

The biggest risk is testing old disks.We have minimal data about how long we can leave a hard disk powered down and still be able to read data from it.So those tests are incredibly important.And also the most likely thing to be neglected or forgotten.

Security: Data is in your own hands, so the security and privacy of on-prem solutions are equally in your hands.Offline devices require physical access, which is much easier to understand - no key to the safe means no access.No hacker from the other side of the world can touch them.And (with the exception of when disks are attached to a computer) they cannot be wiped or encrypted by malware like Cryptolocker.

Recommendations: External disks are a reasonable offline storage mechanism.However, NAS devices are better for backups (because disk maintenance and backups can be 100% automated).And there are better options for long term archives (see below).

In spite of my negative recommendation, if the other options are unsuitable for your scenario, don’t make perfect the enemy of good.External disks are ∞% better than no disks at all.

Criteria	Rating
Capital Costs	4/5
Ongoing Costs	5/5
Reliability	3/5
Longevity	3/5
Access	4/5
Scale	4/5
Simplicity	5/5
Automation	3/5
Security	5/5
Overall Suitability for Backups	3/5
Overall Suitability for Archives	3/5

On-Prem: Optical Media

Writable CDs and DVDs are the most common forms of optical media.But I’m only going to consider Blu-ray disks here (because CDs and DVDs simply don’t have the capacity needed in 2021).Blu-ray capacity ranges from 25GB to 128GB.

The technological development of optical media has been left behind due to NAS devices and high speed Internet connections.But the Archival Disk is a Blu-ray successor designed explicitly for 50 year life time.(It also costs over $10,000 for drives, so out of reach for personal and small business scenarios).

Capital Cost (AUD): ~$200 for Blu-ray burner.

Assumption: you have a computer available to plug it into.Internal SATA and external USB burners are available.

As with external hard disks, you should buy multiple burners for redundancy.And you may need a safe, bookshelf or small warehouse for storage.

Ongoing Cost / GB (AUD): ~9c.

There’s no electricity being used, and no Internet required.

Single layer Blu-ray disks store 25GB and cost ~$2.15ea (on average).That works out to ~9c per GB.As with external hard disks, you need to factor your desired level of redundancy (minimum 2x, recommended 3x).

Note that I found Blu-ray media a little hard to find via Australian vendors.I resorted to e-Bay to import direct from the US or Japan with good results.

This is more expensive than external hard disks, but quite competitive with the cloud.

Reliability: Once burned and verified, I’ve found optical disks have very high reliability.Unfortunately, that’s based on my experience, not published data.

In my failure modes article, I outlined my anecdotal evidence for CD and DVD based backups still accessible after 10-20 years, even when there was no maintenance or regular tests done on the disks.This was a giant experiment that I didn’t realise I was running!But it shows a 99.9% success rate for optical media.

I’ve also done some “test to destruction” tests for Blu-ray disks: the real killer is direct sunlight.Every disk exposed to extended sunlight showed failures within 1 month.Heat and cold are less of a problem.

Scratches are a concern.Blu-ray has made improvements to disk coatings to mitigate scratches.But care when handling disks is still important.

Note that some Blu-ray drives support surface error scanning which can estimate if a disk is degrading and will fail soon.Apparently mine doesn’t (and ones that do are hard to come by).I found a reasonable proxy is the read speed: if a disk reads at high speed, it’s probably OK, if it reads very slowly and has a number of retries, it’s likely to fail soon.

Longevity: As with reliability, there’s minimal data in this area.

I consider optical disks a better way to store data offline, as compared to hard disks.Optical disks have separate reader and media: as long as your disks are OK, you can always buy another reader.Modern hard disks integrate the physical media and reading interface: so the data might be OK, but if the disk firmware or motor fails, its very expensive to read the data.

Blu-ray disks also use non-organic material.The organic dyes used with writable CDs and DVDs were a big concern (although I never observed failures).With Blu-rays, this isn’t an issue any more.Hard disks use a magnetic basis for storing data; this will decay over time if the drive isn’t powered on (although its unclear how quickly).

Finally, there is a Blu-ray M-disc technology which claims “a projected lifetime of several hundred years”.As far as I’m aware, there is no other consumer technology that makes such a claim.(And my test-to-destruction tests of M-disk Blu-rays has yet to cause a failure; 6 months and counting)!

As with external disks, you need to manually pull your Blu-rays out of your safe on a regular basis, and test for correct operation.

As long as you can purchase a new Blu-ray reader, you should be able to read data from the disks.Given that CD readers have been available for ~30 years and are still sold today, we’re reasonably safe here.

Access: Offline devices are harder to access by definition.You need to manually retrieve the device, and connect it to a computer to read data.

An external catalogue of disk contents (or at least a good labelling system for the physical disks) is highly recommended - and even more so for optical media as it is significantly slower than hard disks, and has lower capacity (so more disks).If you need to check every file on every disk, it might take a very long time to find what you’re looking for.

Scale: Boxes of Blu-ray disks scale up really easily: just keep buying more disks (and boxes).This assumes you can divide your data up logically (eg: by year or month).

Blu-ray disk capacity starts at 25GB for single layer disks.50GB duel layer, 100GB triple layer and 128GB quad layer disks are available.Be aware that the 100GB and 128GB disks use a slight different technique when burning, which makes them incompatible with older readers.

There are even disk library systems available (for $call) which store up to 50TB of data.

Simplicity: Optical disks are more difficult to write than external hard disks.Modern operating systems generally make this straight forward, but its more involved than “just copy data to disks”.Remember that retrieving data can be extremely painful if you don’t have a catalogue or index of your disks.

The biggest risk is testing old disks.Although I’m more confident about longevity of optical media as compared to external hard disks, we still don’t have much data on the topic.So those tests are incredibly important.And also the most likely thing to be neglected or forgotten.

Security: Data is in your own hands, so the security and privacy of on-prem solutions are equally in your hands.Offline devices require physical access, which is much easier to understand - no key to the safe means no access.No hacker from the other side of the world can touch them.And write-once optical media cannot ever be wiped or encrypted by malware like Cryptolocker.

Recommendations: Optical media is an excellent offline storage mechanism for TB scales of data.The best use case is for long term offline archives.NAS devices are better for short term backups (because they are easier to automate).

Criteria	Rating
Capital Costs	5/5
Ongoing Costs	5/5
Reliability	5/5
Longevity	5/5
Access	3/5
Scale	4/5
Simplicity	4/5
Automation	3/5
Security	5/5
Overall Suitability for Backups	3/5
Overall Suitability for Archives	5/5

On-Prem: Magnetic Tape

Magnetic tape has been around longer than hard disks, and is very well understood as a long term data storage medium.It’s capacities are significantly higher than optical media, and similar to external hard disks (1.5TB for LTO-5 tape).

While there are many standards for magnetic tape, Linear Tape-Open is the most common.

Note: my personal experience with tape is very limited (I used it for business client backups in ~2004).

Capital Cost (AUD): $1,500 to $7,000. And sometimes $call.

Lenovo, HP and Dell all sell new LTO-6, LTO-7 and LTO-8 tape drives.However, many don’t publish prices on the Internet.Finding the drives via other Australian vendors is also an exercise in futility.

These devices are available second-hand on eBay for $300-$2000.Although they are usually older (LTO-4, LTO-5, LTO-6).

Ongoing Cost / GB (AUD): ~1c.

There’s no electricity being used, and no Internet required.

Tape cartridges are slightly easier to find pricing for, and are available for $100-200ea.Interestingly, there isn’t a significant premium for newer cartridges; LTO-6, LTO-7 and LTO-8 are priced within $50 of each other.And when the capacity of those are 2.5TB, 6TB and 12TB respectively, the cost per GB is really good!

As with external hard disks, you need to factor your desired level of redundancy (minimum 2x, recommended 3x).

Reliability: As I’ve had no recent experience with tapes, it’s hard to know how reliable they are.

The reliability of magnetic data tape suggests it is very good.And given that tape (indeed any offline storage) is the last line of defence, high reliability is very important.

Given the low cost of tapes cartridges, it would seem very silly to only have one copy.The usual 2x or 3x redundant copies should apply to tape to ensure reliability.

Longevity: LTO tape is designed for 15-30 years of archival storage.

As with external disks, you need to manually pull your tapes out of your safe on a regular basis, and test for correct operation.

As an individual consumer, finding tape drives is quite difficult.I assume if I were a medium or large business, I’d have a direct line to a large vendor who would make this process very easy.And given that there are many LTO manufacturers, I’m assuming this is a relatively safe technology.

One thing I noticed was that a drive only supports the current tech, and previous 2.So an LTO-8 drive can read/write LTO-7 and LTO-8 media, and read LTO-6 media, but can’t touch LTO-5 and earlier.That’s not a great property.

Access: Offline devices are harder to access by definition.You need to manually retrieve the device, and connect it to a computer to read data.

An external catalogue of disk contents (or at least a good labelling system for the physical disks) is highly recommended - and even more so for tape media as it is significantly slower than hard disks.If you need to check every file on every tape, it might take a very long time to find what you’re looking for.

Scale: Boxes of tapes scale up really easily: just keep buying more disks (and boxes).This assumes you can divide your data up logically (eg: by year or month).There are even tape libraries that make it easy to work with many tapes (just don’t expect to be able to afford one in your home).

LTO-4 capacity: 800GB
LTO-5 capacity: 1.5TB
LTO-6 capacity: 2.5TB
LTO-7 capacity: 6.0TB
LTO-8 capacity: 12TB

Simplicity: Tapes are even more complex and unusual than optical media.Because I haven’t had any recent experience with tapes, “not simple” is all I can say here.

Security: Data is in your own hands, so the security and privacy of on-prem solutions are equally in your hands.Offline devices require physical access, which is much easier to understand - no key to the safe means no access.No hacker from the other side of the world can touch them.There are write-once LTO tapes (although I understand that’s based on tape firmware rather than a physical proprty of the tape cartridge), and write-once media cannot ever be wiped or encrypted by malware like Cryptolocker.

Recommendations: Magnetic tape is an excellent offline storage mechanism for multi-TB scales of data.The best use case is for long term offline archives.NAS devices are better for short term backups (because they are easier to automate).

Criteria	Rating
Capital Costs	2/5
Ongoing Costs	5/5
Reliability	5/5
Longevity	5/5
Access	3/5
Scale	5/5
Simplicity	3/5
Automation	3/5
Security	5/5
Overall Suitability for Backups	4/5
Overall Suitability for Archives	5/5

Software

I’ll make brief mention of some useful software, when doing backups or archiving.

RClone

RClone is a command line app which can copy and synchronise data between many different cloud storage providers.In short, it can be used to mirror data from your NAS to AWS S3, or between AWS and Azure, etc.Its rather difficult to configure at first, but once working, its a fantastic way to ensure you have backups on both the cloud and an on-prem NAS.You do your backups to either the cloud OR your NAS, then use RClone to mirror to the other.

Cyberduck / mountainduck

Cyberduck, and its more powerful cousin Mountain Duck, are my go-to tool for GUI / interactive use of cloud storage.

Cyberduck is similar to an FTP client, letting you explore your data on most cloud providers via a powerful interface.

Mountain Duck lets you mount your cloud data as if it were a local disk drive.So you can explore work work with data using the same tools you use for disks or NAS data.

WinCatalog

Whenever I’ve discussed offline storage (hard disks, optical disks, tapes), I’ve recommended some kind of catalog or index, so you don’t need to inspect all your disks to find what you’re looking for.WinCatalog is such software.It’s interface feels a bit dated, but it is extremely effective at keeping a searchable catalogue of your external media.And that’s a huge improvement over “hmm… maybe what I’m looking for is on this disk… nope, let’s try the next one”.

This one is Windows only, and costs AUD $30 (although there are frequent discounts).

MultiPar

One risk when archiving data is the disk will only be partially readable, so certain files can’t be recovered.MultiPar lets you add redundant parity data to a disk to mitigate this risk.

While I use MultiPar to ensure the integrity of files (via hash / checksum), my primary way to mitigate partial disk failures is to make multiple redundant disks!External hard disks, optical media and tapes are relatively cheap - if you care about your data, just make 2 (or more) copies.

Conclusion

There’s lots of text, so here’s the TL;DR:

If you care about your data, you will have a copy on the cloud AND on-premises.

Cloud:

The simplest backup solution is Cloud Sync. Excellent pricing, easy to use. Best for individuals.
Cloud Object Store is best for business. Pricing is a pain, but otherwise there’s not much to dislike.

On-Prem:

Network Attached Storage is the go-to for on-premises backups. And a great platform for a bridge between cloud and on-prem. Running costs make it less attractive for very long term archiving.
Optical Storage (Blu-ray) is great for long term archives for individuals, small and medium business. Tape is for larger business.

Next up: I will outline my own choices of technology for personal and church backups (which you can probably guess based on my conclusions)!

Read the full series of Long Term Archiving posts.

Long Term Archiving - Part 4 - Access Patterns

2021-08-06T14:00:00.000Z

Optimise for the common case.

You can read the full series of Long Term Archiving posts which discusses the strategy for personal and church data archival for between 45 and 100 years.

Background

Last time, we listed the failure modes possible when making long term backups and archives. Also remember the broad strategy.

The last thing we will consider before we get into the how of backups & archives is how we might need to access said backups & archives.

On one hand, this will require a certain amount of guess work.On the other hand, it’s very educated guess work.And there are some strategies which will help even if we guess wrong.

Goal

List the likely ways I need to access the backups & archives.How often that might happen.And how that influences my choice of technology.

Some Observations

I’ll start with some personal observations:

Our minimum retention period is 45 years. That’s longer than I’ve lived!
I’ve occasionally needed to go to a backup to restore data, but its a very infrequent action.
I’ve done exactly one bare metal restore under duress.

And some implications:

Backups & archives need to scale with time. Often you need to break a large dataset into small chunks. And just as often, you won’t have all the data at the start; it will be created as you go.
Restoring from backups is a rare event. Optimise for the common case (a backup that will never be touched). But be very, very careful you don’t optimise the backup out of existence.

Let’s think about these in more detail.

Time Series Data

The only way to structure long term backups is as time-series data. That is, data must be grouped by year (which works well for financial transactions) or when it was created, or last modified. That is, you store all the files, documents and records for 2020 on one disk, and all the data for 2021 on another, 2022 on another. and so on.

Nothing else works.Nothing else scales.Particularly when you have 45+ years of data to retain.

The good thing is there’s only so much data you can create or modify in a given time period.And unless you’re Google or Facebook or Twitter, you can always backup everything that changed in the last year / month / week / day (choose whichever works best).If you end up with a particularly large year / month / week / day, you can usually break it up into smaller chunks.Or, in the worst case, split into multiple chunks (eg: A..K and L..Z, or first 100GB, second 100GB, etc).That is, when I say “store all data on a disk”, that may be “a set of disks” (2020 might only be 1 disk, but 2021 might be 2: January to June, and July to December).

Your backup media needs to scale with time as well.And time is big; 45 years is a long time.Some media does this better than others: hard disks in a server will eventually run out physical space in the server, while external hard disks can just pile up until your warehouse is full (and you can get very big warehouses).Even if you decide to convert your warehouse into a data center and with lots of servers, it will be much more expensive than the raw media - the servers themselves, people to maintain them, electricity to run them: they all cost money.The cloud is very good as scaling up, if you chose the right storage product: “object storage” is effectively limitless, “block storage” has an upper limit.

(Aside: time also impacts media longevity, as we discussed in part 2’s failure modes. I’ll consider that in more detail in the next post).

Access Scenarios

There are five access scenarios to consider:

You add more data to your backup.
You verify your backup media still works.
There is a catastrophic failure of your main system (eg: fire, theft, crash) and you need to restore everything (or almost everything).
There is a localised failure (eg: accidental deletion, file corruption) and you need to restore one thing (or a few).
Old data is needed from an archive (after being deleted from the main system) and you need to restore a few things.

Scenarios 1 and 2 are the regular operations of creating and maintaining backups.Scenarios 4 and 5 are pretty much the same.So, when you need to restore from backups, we’re down to 1) everything, and 2) a few things.

If the everything scenario happens, you’re going to grab all your backups and restore everything from them in sequence (or parallel if you can load multiple disks at once).There’s no worry about “do we need this or not?” - you need everything so the restore is done in bulk.Access pattern is sequential, and all media is really good at sequential.

If the few things scenario happens, you need to be more targeted in which backup disks you restore from.You need some way of identifying which disk(s) are of interest.So, at minimum, you should keep a list of the files on each disk separately.Even better, an index or table of contents that you can look at without loading every disk.Also, some backup technologies are much better at random access than others - HDDs, optical disks and the cloud are all good at reading one thing; tapes not so much.

Access Frequency

Backups and archives are accessed, by nature, infrequently.Here are the operations you perform on backups, in order of frequency (most frequent first):

Adding to your backups. Automated backups will be doing this daily (at minimum). And manual backups will be happening on a regular schedule (maybe every week or year).
Verifying your backups. Again, automated backups should do this daily or weekly. And manual backups monthly or annually.
Restoring from backups. People asking to restore an accidentally deleted file used to be pretty common. But these days most storage systems have automatic “versioning”, so you can easily “go back in time” to a copy from last week (before you accidentally hit delete). That is, most day-to-day storage systems are so reliable that a catastrophic failure is the only reason to restore. That is, you may go several years before you truly need a backup; indeed, you may never need to restore anything.

Always optimise for your common operations.There’s no point making sure you can restore an individual file in under 5 seconds if it takes a week to back it up in the first place.Making backups & archives need to be quick & painless (and automated whenever possible).Verifying should also be straight forward.

Generally, people assume that pulling data from a backup doesn’t happen instantly.So if it takes an hour or even a day to complete a restore, that’s OK.(Of course, always ensure your users understand and agree to any time frames).

A Special Case: Bare Metal Restore (Under Duress)

I’ve just said its OK if restores take time.Well, this is a special case where it’s not OK.

If your one and only server crashes, every minute longer the restore takes is a minute of lost productivity multiplied by every user (in a business, you can easily put a dollar figure on this; it gets big very quickly).

If you need disaster recovery, you need it fast!So, you should a) plan your backups so a restore can happen fast, b) do practise runs so you understand exactly what needs to happen, and c) optimise & automate so that it happens faster!

Fortunately, not everyone needs disaster recovery - if my personal TrueNAS server fails and I can’t get it running again in under 24 hours, it will be a headache for me, but its not like I’ll lose a million dollars or get fired.

A Special Case: Legal Request

For the data Wenty Anglican needs to retain for Safe Ministry purposes, there is one additional access scenario: a legal request.

I’d expect it to go something like:

“B Bloggs has allegations of made against him / her. As the police / prosecution / defence team, we require all relevant ministry documents from Wenty Anglican pertaining to B Bloggs in the ministry role of youth and children’s leader from January 2027 to December 2030.”

I’m hoping that will never happen, but history shows that some people, given power over another, will abuse it some of the time (Christians call that “sin”).So I’m expecting it will happen one day.

And that day will suck if I’m still in charge of church backups & archives.

From a data access point of view, I have a date range, so I can get any disks that have data for that period of time easily (time series data).

There are two additional criteria: the person, and the their role.Ideally, I want some way to identify documents or data based on those criteria.So that I don’t need to trawl through 3 years of everything.

For now, I won’t answer that question.Part 8 will look into how to structure data within backups & archives, and how to create good indexes to find things within offline media.But its something to keep in mind.

And you should consider if there are special access scenarios you need to optimise for in your particular situation.Otherwise, you get to trawl everything.

Conclusion

In this post, we’ve established all backups & archives need to be time-series data, broken down by year or month.We’ve identified our core access scenarios: everything & a few things, and know we will need some kind of index for the few things scenario.And we’ve identified how frequently we need to access our backups: very rarely - so we should optimise creating & verifying rather than restoring (unless you have a special case that demands otherwise).

Now that we’ve covered failure modes, identified what needs to be on the backup, and the ways we need to access the data, we’re in a position to make intelligent decisions about what backup technology to use!

Next up: I will list different technology options for backups & archives. And discuss pros and cons of each, based on the criteria I’ve listed.

Read the full series of Long Term Archiving posts.

Be Your Own Certificate Authority

2021-06-25T14:00:00.000Z

All the SSL certificates you want for free!!

Background

I’ve used LetsEncrypt to generate publicly trusted certificates for any websites I’m running.And used InstantSSL to generate similar S/MIME certificates for my email.These are all free services, which is fantastic.

But there are limitations to them: LetsEncrypt requires a level of automation for maintenance - you can’t install a certificate and forget about it.And it works best if you have shell / console access to the machine you want the certificate on, and that machine has public Internet access.

There are other places I’d like certificates, like internal only websites, or routers - they are using plain HTTP, and browsers get irritated at this “non-HTTPS” thing these days.And there’s more you can use certificates for than just HTTPS: I’d like to have a go at EAP WiFi using certificates, due to an increasing list of security gotchas and issues with WPA2 and WPA3 (EAP is the enterprise equivalent, and seems to have held up better security-wise).

For internal use, I could mint Self Signed Certificates, but they aren’t trusted by devices - they encrypt your data but don’t provide any clear identity for the service you’re connecting to.And if you have to click through all the security warnings, you’re teaching your users the wrong thing.If I had one root certificate to sign the certs installed on my services, I could trust that one certificate to rule them all and my devices would be happy!

And this is exactly what a Certificate Authority (aka, the companies who sell you SSL certificates) does!They have a root certificate, trusted by your browser, operating system or device, and then follow special rules to make sure they only mint certificates for the right people.

If I could be my own Certificate Authority (CA), I could make whatever certificates I wanted!Of course, they’d only be trusted by my own computers and devices, but I can live with that.

Indeed, there’s a sense in which creating my own certificates is more secure than paying someone else to.After all, the magic certificates and keys never leave my network.

I’d always thought creating my own certificates would be just too hard.Then there was a work project that… well… encouraged me to just do it.

Turns out a few Power Shell commands is all I need.

Goal

Be my own Certificate Authority.That is:

Create a root signing certificate, suitable for signing.
Mint at least one certificate and install it on an internal web server.
Install the root CA so browsers trust my internal certificates.

How Do These Certificate Things Work Anyway?

Before we get to certificates, we start with asymmetric cryptography.This is a bunch of magic math which let you encrypt and decrypt data - but only in one direction.“Asymmetric” comes because the key has two parts: public and private.The public half is available to all and sundry, and lets you encrypt data or verify signatures.The private half is secret to the owner only, and lets you decrypt data and create signatures.The public half can never decrypt or sign, and the private half can never encrypt or verify, so they’re a bit like one-way mirrors.

Data -> Public Key  -> Encrypted / Signature
Data <- Private Key <- Encrypted / Signature

Asymmetric Cryptography is used in a number of computing applications and contexts.The best known is SSL / TLS and HTTPS.But it’s also used by SSH, PGP and the infamous Bitcoin.

While asymmetric cryptography is wonderful, but it’s just maths.And maths can be used for lots of things, not all of which are useful.So, we need to impose rules on what different key pairs can do, when they are valid, what contexts they are valid in, and so on.

In particular, the maths allow us to be very confident of a secret conversation with another party - that’s wonderful and a big part of what makes HTTPS “secure”.However, on it’s own, it doesn’t help identify the other party - so we might be having a very secure conversation with the Bad Guys™, because we couldn’t confirm their identiy.

X.509

Enter X.509.

“SSL Certificates” are actually X.509 certificates.These are horribly complicated things which define a bunch of properties and rules on top of your public / private key pair.In the context of HTTPS, they enable reasonably high confidence in the identity of the other computer.

One of the rules is “what servers is this certificate valid for” - which corresponds to the name you type into your browser’s address bar.My blog is blog.ligos.net, so the certificate must also be valid for blog.ligos.net for web browsers to accept it.

So, the question becomes: how do you get a certificate for blog.ligos.net?Or more specifically, how can someone else validate Murray is really the owner of blog.ligos.net?Or, in the negative, how does the validation process prevent the Bad Guys™ get a certificate for blog.ligos.net?

There’s a standard for that.If you want to be a Certificate Authority, there are processes you need to follow to check identities before issuing certificates.

There are two common ways, and a third complex one:

Send an email to a “special” email address (eg: postmaster@ligos.net) with a magic code. If I own ligos.net then I can get access to that code.
Let’s Encrypt requires creation of the magic file in a well defined location on the web server. If I own blog.ligos.net then I can create that file.
Conduct one or more manual processes to verify identity, including a voice & video call, inspection of a passport, checking documents, call backs, etc.

The first two ways simply validate someone (or something) controls the domain name or web server.The third way is a stricter validation of the actual person (or company) identity.

And in practice, all three ways can be faked if you try hard enough.None are fool proof, but they present enough difficulty to the Bad Guys™ that the system works most of the time.

Certificate Chaining

One thing I didn’t explain is how the Certificate Authority communicates to end users that it successfully validated the blog.ligos.net certificate.That is, if every person who visits blog.ligos.net needs to send me an email to verify I own that domain, the whole internet would break very quickly!

The Certificate Authority signs the blog.ligos.net certificate to say “yes, this is valid”.As long as you trust the CA, you trust anything the CA has signed, so you trust blog.ligos.net.

The Certificate Authority has a root certificate, which is the thing your web browser knows about.That certificate might chain to zero or more intermediate certificates.Before finally blog.ligos.net is signed at the very bottom.

This “chaining” allows a small number of trusted root certificates to scale out to the whole Internet.

PowerShell Commands

OK, enough theory, let’s make certificates!

Certificate Authority

First up, we need to create a root certificate.This is what will pretend to our very own Certificate Authority.

PS> New-SelfSignedCertificate 
    -Subject "CN=Grant Root CA 2021,OU=certs@ligos.net,O=Murray Grant,DC=ligos,DC=net,S=NSW,C=AU" 
    -FriendlyName "Grant Root CA 2021" 
    -NotAfter (Get-Date).AddYears(50) 
    -KeyUsage CertSign,CRLSign,DigitalSignature 
    -TextExtension "2.5.29.19={text}CA=1&pathlength=1"
    -KeyAlgorithm "RSA" 
    -KeyLength 4096 
    -HashAlgorithm 'SHA384' 
    -KeyExportPolicy Exportable 
    -CertStoreLocation cert:\CurrentUser\My 
    -Type Custom

There are many options here, let’s walk through them all:

Subject: the official name of the entity / person. It is a list of key-value pairs, where the most specific is the left, and least specific on the right. C = country, S = state, DC are parts of domain names (ligos.net in my case), O = organisation, OU = organisation unit, and CN = common name. Given we’re inventing a CA, you can put whatever you like here!
FriendlyName: is what most browsers display to the user. Best to make it the same as “common name” (CN).
NotAfter: indicates when the certificate expires. I’ve set mine to expire in 50 years, because I only want to create one root certificate (and I’m not expecting to be issuing certs in 50 years time).
KeyUsage: a list of things the certificate is allowed to do, all variations of “signing”.
TextExtension: some magic which says “this is a root certificate”. This is essential for all browsers to trust your certificate as a true certificate authority.
KeyAlgorithm: RSA is the most common, and oldest.
KeyLength: the RSA key size. 4096 is the largest, which is best practise for the root certificate.
HashAlgorithm: SHA384 is higher than the usual 256 bit version. Again, biggest is usually better for root certificates.
KeyExportPolicy: tells Windows we are allowed to export (and backup) the private key. Yes, you need to backup your certificate key!
CertStoreLocation: tells Windows to save the generated certificate in your “Personal” store. More about that below.
Type: there are pre-defined types of certificates. Root certificates are not one of them.

After you run the command, Powershell will tell you the thumbprint for your brand new root certificate. Make a note of this, because you will need it when issuing certificates.

Thumbprint                                Subject
----------                                -------
BCCD1A6260025347F3302F10ED1A23CC2DAC75A4  CN=Grant Root CA 2021, OU=certs@ligos.net, O=Murray Grant,...

Your private key is currently accessible to any application you run.Which means, if you get malware on your computer, the Bad Guys™ could create their own certificate that your computer trusts.Potentially letting them impersonate any website (eg: your bank).

To stop this, you should export the certificate including the private key (which goes somewhere very safe as a backup).Then re-import it with certificate protection.This requires a password to be entered each time create a new certificate using your root.

Steps to Export

Search for “Manage User Certificates” to open Certificate Manager.Expand “Personal” > “Certificates”.

Right click your new certificate > All Tasks > Export.Make sure you “export the private key”.And tick “Export all extended properties”.

Give you certificate a password and save it.

Finally, delete the certificate from Certificate Manager!

Steps to Import

Double click the file you saved.Import for “Current User”.

Ensure “Enable strong private key protection” is ticked. And “Mark this key as exportable” is unticked.

Each time you create a new certificate using your root CA, you will be prompted for it’s password.(And you should make 200% sure you have that certificate file backed up; because if you lose it, you have to start again).

Trusting the Root Certificate

You need to load your root certificate into your operating system certificate store.Only then will it trust it.

First, repeat the above process to export your certificate without the private key:

This file can (and should) be redistributed publically.Anyone who installs it will trust certificates you create.The onus is on them to verify your identity and decide to trust you (or not).

Import the root certificate into the “Trusted Root Certificate Authorities” store by double clicking and then “Install Certificate”.Be sure to place the certificate in the “Trusted Root Certificate Authorities” store:

You will need to repeat this process on every device that you own.

You may also need to load the certificate into application specific stores, for example, Firefox has its own certificate store that you can find in Settings.

HTTPS Certificate

Now, your device & applications should trust any certificates issued by your brand new Certificate Authority!Let’s make one:

PS> New-SelfSignedCertificate 
    -DnsName @("countdooku.ligos.local", "countdooku.ligos.net", "192.168.0.2") 
    -Type SSLServerAuthentication 
    -Signer Cert:\CurrentUser\My\BCCD1A6260025347F3302F10ED1A23CC2DAC75A4
    -NotAfter (Get-Date).AddYears(10) 
    -KeyAlgorithm "RSA" 
    -KeyLength 2048 
    -HashAlgorithm 'SHA256' 
    -KeyExportPolicy Exportable 
    -CertStoreLocation cert:\CurrentUser\My

I’ll outline the major differences:

DnsName: this is a special case of “subject”. We use a powershell array to list all DNS names we might access this server by. In this example, there’s an internal DNS name, a public name, and an IP address. The first name becomes the “common name”, others are known as “alternate names”.
Type: unlike root certificates, there’s a well known type for HTTPS.
Signer: this is the thumbprint of your root certificate.
NotAfter: 10 year expiry. I expect my server will be replaced before then. Be careful setting a longer lifetime than your root certificate.

When you run this command, Windows prompts you for the root certificate password (hopefully, making it difficult for Bad Guys™ to get their hands on your precious root cert):

Thumbprint                                Subject
----------                                -------
2368DCAD54D5043EFAF3D8179B843A2E53B436DF  CN=countdooku.ligos.local

Once again, your new certificate will be accessible in Certificate Manager.I’m not as paranoid about backing up HTTPS certificates I create.They cost me 10 minutes of my time - if I lose one or muck it up, I can just create another.

(But just to remind everyone, your root certificate MUST, without fail or exception be backed up)!

After deploying my new certificate, Firefox now trusts my connection to my TrueNAS server! (Even if it has a small disclaimer).

Code Signing Certificate

The final type of certificate is a “code signing certificate”.Developers may be interested in this to do code signing of executables and installers.

PS> New-SelfSignedCertificate 
    -Subject "CN=Murray Grant Code Signing,OU=murray.grant@ligos.net,ST=NSW,C=AU" 
    -FriendlyName "Murray Grant Code Signing 2021" 
    -Type CodeSigningCert 
    -Signer Cert:\CurrentUser\My\BCCD1A6260025347F3302F10ED1A23CC2DAC75A4
    -NotAfter (Get-Date).AddYears(10) 
    -KeyAlgorithm "RSA" 
    -KeyLength 2048 
    -HashAlgorithm 'SHA256' 
    -KeyExportPolicy Exportable 
    -CertStoreLocation cert:\CurrentUser\My

There are not many differences:

Subject and FriendlyName: we’re back to the convention used in the root certificate.
Type: there’s a well known type for code signing.

Export and Convert

I’ve outlined the process to export certificate using Certificate Manager from the Windows Certificate Store.When you include the private key, you will get a pfx file.

Different servers use the key pairs and certificates in different formats.Some can use pfx with a password, others require a pem file with no password.They’re all a bit different.

So we need to convert the pfx into other formats.Unfortunately, I’m not aware of a powershell command for this, so we resort to using openssl:

openssl pkcs12 -in certificate.pfx -out private_key_with_password.key
openssl rsa -in private_key_with_password.pem -out private_key_without_password.key

The first command extracts the private key and certificate from a pfx file, and saves it in a password protected file.

The second command reads from an encrypted pem file, and saves the private key with no password.

You may need to open the files produced by openssl, and copy+paste the contents (to get the exact certificate / key you’re interested in), but all the data is available.

Why PowerShell And Not OpenSSL?

Because OpenSSL is too complicated!

I originally set out to write this article using OpenSSL on a Linux server.And was confronted by this document outlining how to do certificates using OpenSSL.

If you thought this post is long, that link has 7 chapters and about 4400 words of “how to configure openssl” (and very little about how certificates work)!

Quite simply, I don’t need revocation servers and serial numbers and all the rest.I want just enough certificate to make browsers happy when connecting to my TrueNAS server or SyncThing or Mikrotik router.

Other Resources

The following resources were used to create this post:

Microsoft Reference for New-SelfSignedCertificate - this gives a number of useful examples.
Further Microsoft Examples for New-SelfSignedCertificate - this had the magic text required for a root CA.
Build Your own Public Key Infrastructure - for kernel code signing, but has good diagrams and examples.
How to Create a Self Signed Certificate with Powershell - simplified versions of this post.
How to Convert SSL Certificate to various formats - I don’t pretend to understand openssl!

Conclusion

You are now your very own Certificate Authority!And can create certificates trusted by… well… whoever you can convince to install your root certificate.

For use within a household, family or small business, this is fine.And a darn sight cheaper than “real” certificates.

Web browsers will stop nagging you about untrusted and unsecure connections.

(Have I mentioned you need to backup your root certificate enough yet)?

Long Term Archiving - Part 3 - Capturing Data

2021-06-03T14:00:00.000Z

What to store? How to get it?

You can read the full series of Long Term Archiving posts which discusses the strategy for personal and church data archival for between 45 and 100 years.

Background

Last time, we listed the failure modes possible when making long term backups and archives. Also remember the broad strategy.

Before we consider the how of backups & archives, we need to ensure we can get our hands on the data we need!After all, it’s rather pointless to have a robust strategy for keeping 45+ years of data safe, if we forget to include crucial files or documents.

Goal

List the data I need to store on backups & archives.Then ensure I have access to said data.

Part 0 - General Principals

Make a list of everything you need to backup.

The simplest list is “everything” - that way you won’t forget!Often “everything” ends up being too big and you have to choose, but we can cross that bridge later.

It might be too abstract to work out a meaningful list of “data”.so, you could check all devices you own / control and inspect the files on them.You could list applications used and the files they use.You could list all your cloud accounts to check for data in the cloud.And don’t forget hard copies.

Now you have a list of data (files, photos, videos, recordings, databases, financials, records, etc).Figure out what devices they reside on.Its possible you have a centralised server (or servers), or they could be stored on each device, or perhaps in the cloud.Write down how you can access them.Write down how large each category is (MB, GB, TB, etc) and how much it grows each year - often one or two categories will make up 90% or more of the total data size.And finally, how you might include them in backups & archives (preferably via an automated process).

If you want, you can make the data list in priority order, and ensure most important things are backed up first.The relative size of each category might mean those are backed up less frequently.

You could also define a lifetime - for my purposes, the lifetime is 45+ years for everything.But it’s possible some data only needs to be retained for a few months or years - that might indicate different backup strategies are required.

Part 1 - Personal Data

Enough abstract principals, lets make some lists!

The list of personal data I want backed up:

Photos and videos - from cameras / phones. ~350GB, growing by ~30GB / yr.
Larger videos - these go for over 20 minutes, sometimes from 3rd party sources. ~80GB, growing by ~10GB / yr.
Music - ripped CDs. ~10GB, growing by ~100MB / year.
Documents / source code - from devices, cloud & File History. 30GB, growing by 500MB / year.
Occasional disk images - taken when I decommission a device. ~50-150GB per image, one image every 2-5 years.
Email - stored on GMail. ~5GB, growing by 250MB / year.
Cloud - OneDrive is the only thing I care about. ~30GB, growing by 500MB / year.
Hard copies - only of very important documents (birth certificates, university certificates, etc).

The list of personal devices:

A few PC computers - laptops and desktops (slowly shinking in number)
A few servers - Linux boxes & TrueNAS server
Android Mobile phones / tablets (slowly increasing in number)

I have passwords and admin rights to everything!So no problem with access.And all devices have OneDrive and / or Syncthing to automatically copy data to well known locations.

How does the data on each device get to a backup?

The rule is: if I want it backed up, it should end up on my TrueNAS server.Otherwise, it should be in the Microsoft or Google clouds.I can manage backups from all these locations, either via automated or manual processes.

Photos and videos: any WiFi enabled Android device runs Syncthing which automatically mirrors photos to my TrueNAS server. A handful of cameras don’t have WiFi, so they need to be manually copied.
Larger videos: stored on TrueNAS
Music: stored on TrueNAS
Documents / source code: stored on individual devices + OneDrive. File History makes sure these get copied to TrueNAS regularly. Anything not copied automatically is captured in my bi-annual “manually export everything from the cloud to a local backup” (see below).
Disk images: copied to TrueNAS on creation
Email: bi-annual manual export.
Cloud: bi-annual manual export.
Hard copies: scanned / photographed as required. Ends up on TrueNAS.

Assumption: things stored in the cloud are pretty safe; I’m happy to do a manual bi-annual export.I had an automated GMail backup to local files, but it broke years ago and I never fixed it.

A note about cloud data: Google, Microsoft and Facebook all have an export all your data function.The format it is exported in is often mediocre, but it’s better than nothing.For other cloud services, you will need to search for an “export” function.If one is not available, that’s a big risk - if the provider goes out of business you will likely lose your data.

My photos and videos category is both the largest and fastest growing.It’s also my highest priority to survive any disaster or data loss event.

Part 2 - Church Data

Church is considerably more complex.The main reason is data is stored on various devices owned by volunteers; centralised digital storage is a relatively new thing.

The list of church data I need to backup:

OneDrive - this is our main church digital storage. Sub divided as follows:
- Specific Ministries - eg: childrens, youth, young adults, etc. ~1-5GB each, growing by ~100MB / year / ministry
- Sunday Meetings - eg: Powerpoints / videos shown during church meetings. 5-10GB per year.
- Music - eg: Song slide masters, recordings used when live music isn’t possible. 5GB, growing by ~250MB / year.
- Recordings - eg: audio and video recordings of church meetings. ~60GB per year.
- Documents - relating to committee meetings, management, policies, financials, etc. 4GB, growing by ~250MB / year.
Email - Office365 for staff, personal emails for volunteers. ~5GB, growing by ~500MB / year. No idea about personal email size.
Hard copies - eg: records of weddings, baptisms, burials. Youth & children permission. Attendance rolls. ~250 documents / year.
Databases - eg: Safe Ministry Compliance (cloud based SaaS), church contact details (MS Access). Under 1GB, growing by ~10MB / year.
Website - wentyanglican.org.au, which is stored on git in BitBucket. Under 1GB.

The list of devices I’ll need to get data from:

Small number of church owned PCs / laptops. All Windows PCs.
Large number of personal devices owned by members / leaders / staff. PCs, Android, Mac. I really don’t know how many of these are out there; it’s not feasible to get access to all of them.
I have admin access to church devices, but not personal ones. I need to provide a way for people to make this data available.

How does the data on each device get to a backup?

This is the main complexity of our church environment.I need to provide a way (probably via OneDrive) for people to store / submit data to church controlled systems.That’s a change to how people conduct their regular church ministry / work, so it’s not trivial - I need to provide processes, documents and technical support to assist non-technical people in this transition.

OneDrive: I can download everything from OneDrive. Or use something like rsync or rclone to mirror to local or cloud storage.
Email: Manual process to download from Office365. Personal email is more difficult - best option I’ve come up with so far is a special “backup” mail box in Office365 that people can send important messages to. That’s clearly far from fool proof.
Hard copies: Stored in filing cabinets / archive boxes. Scans made to OneDrive.
Databases: Manual exports from cloud, or database backups for on-prem. Under 1GB, growing by ~10MB / year.
Website: Automated git pulls to a local server.

Key to this strategy is to move more ministry related data to cloud storage.The more data on servers / services I can access without asking, the easier I can automate backups.

One option I am toying with is Nextcloud, which is an “on-site DropBox”.Basically, something like OneDrive, but on our own hardware.The main reason is to increase our control over data with personally identifiable or sensitive information.It just so happens we have an existing Linux server with a few hundred GB of storage, which should be plenty enough for storing small documents.

The single largest category is church meeting recordings.Since November 2020, we’ve been live streaming and doing video recordings of all Sunday meetings (plus various other events), which is ~1GB per meeting.Previously, it was audio only recordings, which weighed in at 50MB per meeting.These video recordings dwarf all other categories of data, so they’ll need special treatment.However, in terms of surviving 45+ years, they are only of historical importance - compliance data is what we really need to keep long term.

Conclusion

We’ve identified the categories of data needed to be backed up, where they are stored and how we can get this data to a backup (at least at a very high level).Essentially, we’ve identified how to get access or control of any data we need to backup.

Which boils down to: what devices do I need access to?And: how can I export from my cloud service providers?

So make your lists and check them twice!

Next up: how will we need to access data? That is, access patterns will drive the storage technology chosen.

Read the full series of Long Term Archiving posts.

Long Term Archiving - Part 2 - Failure Modes

2021-05-12T14:00:00.000Z

How many ways can a backup fail? Let’s count!

You can read the full series of Long Term Archiving posts which discusses the strategy for personal and church data archival for between 45 and 100 years.

Background

So far, we have a broad strategy for making long term backups and archives.

To implement a viable technical solution, we need to be aware of why it won’t work.That is, we need to think of all the ways backups might fail over 100 years.That is, we need to know exactly how robust we need to be.

That is, failure modes.

Goal

Define likely (and unlikely) failure modes for data storage over 45 - 100 years.Remembering that over 100 years, even very unlikely failures become possible or even common.

I’ll discuss the various failure modes below, and give some examples.

Insta-Fails

The first group I call “insta-fail”.Which means, your backup was never viable in the first place.

Many other failure modes involve time: they become more likely over time, or your data degrades over time.Insta-fails are instant - your data is gone in the blink of an eye!

Examples:

Your backup didn’t actually work. Perhaps the backup disk wasn’t plugged in. Perhaps you didn’t run a manual process. Perhaps you don’t even have a backup!

If you never had a backup to start with well… you’ll have nothing tomorrow, let alone in 45 years. Insta-fail!

A variation: your backup didn’t include the files you need to restore. Perhaps you didn’t configure your backup correctly (missing includes, wrong excludes).Perhaps some files couldn’t be copied because they were in use - remember that important databases and financial records are often in use 24/7.

If you never backed up the files you need well… you’ll have nothing to restore tomorrow, let alone in 45 years.Insta-fail!

Is your backup encrypted? Make sure you never lose the password / encryption key!Modern encryption is built so that you need the exact password to decrypt your data - one character wrong is the same as everything wrong.If you forget the password to your backup, or lose the paper you wrote it down on, or can’t access your password manager - your backups are gone.Well, the data might be perfectly preserved, but you’ll never be able to read it.Insta-fail!

This brings up a tricky question when storing data for 45+ years: should you encrypt it or not?On one hand, there is almost certainly personally identifiable information in your backup, so you should encrypt it.On the other hand, how do you backup the password to your backups?Clearly you can’t use your normal backups for the password, but how do you make sure the password survives 45+ years?I’ll discuss that in more detail in a future post.

There’s a joke that goes: “backups never fail, but restores do”.That’s a jaded way of saying “restoring data is the important thing, backups are an incidental process along the way”.Don’t forget to test you can restore from your backups on a regular (if infrequent) basis.

Media Failure

Media failure is the most common thing people think of when storing data for a long time.Your hard disks, or CDs, or tapes, or whatever slowly degrade over time to the point where they can no longer be read reliably.

However, your backup needs to survive a long time before the media itself cannot be read!There are a few variations of this one, so lets think about examples:

Your backups are lost.Perhaps your backups are on some USB hard disks, and you misplace them in some “safe” place.Or you move house once, or twice, or thrice and they disappear (maybe into the trash, maybe into… well… somewhere).Or they are filed into some system which makes no sense and they end up in some giant warehouse with the wrong label and no hope of finding them without inspecting all 1,000,000 items.

Your backups are stolen.A variation on “lost” - you are robbed and your precious backup on USB disk that is connected to your laptop is pilfered along with your computer.Remember that thieves don’t discriminate: computer gear is computer gear is computer gear, and backups look just the same as any other computer gear.

Your backups are destroyed.This is “lost” into tiny little bits.Fire, flood, earthquake, tornado, and so on.Note that it doesn’t need to be a catastrophic event - a car crash while taking your backup hard disk home might be just as destructive as a fire.I’ve had a number of USB disks fail simply because I dropped them once too often.And if you are taking your backups home or off-site (which is a good thing) accidents become more likely.

In all these cases, if your backups are on physical media, you need to have that media in your hands to get the data off it.If it’s lost, stolen or destroyed - you have no backup.

Your backups survive for years, but simply degrade over time.OK, now we’re into the 45+ year realm!Nothing bad happened, but given enough time, even the best media will fail.

It’s an open question how long this will take, and depends on lots of environmental factors.But hard disks are rated for, say, 100,000 hours of use - which is around 11 ½ years.How might that change if the disks are in cold storage and never powered up?What about solid state disks?Or tapes?Or optical media?

I’ve put together a basic table of different media and approximate life time, based on the Internet.The “Refresh Interval” is the frequency you’d need to power up media and “scrub” for errors to achieve the “Life Time” reliably.Note that I found it quite difficult to find hard data on long term media life time; most is speculation and guess work, with the occasional anectdote.The best source is BackBlazes’ hard drive report, but that is for running and active drives, not cold storage.

This reflects a cold, hard reality: no consumer media has survived 45 years, because none of this media was available 45 years ago.

Media	Life Time	Refresh Interval	Sources
Hard Disk	8-20 years	1-2 years	Source 1, Source 2, Source 3, Source 4, Source 5, Source 6, Source 7
Solid State Disk / SD Card	5-10 years	6-18 months	Source 1, Source 2, Source 3
Optical (CD / DVD / BluRay)	7-30 years	None	Source 1, Source 2, personal experience
Magnetic Tape	15-50 years	???	Source 1, Source 2

A short story about the “personal experience” for optical media:I made backups from 2000-2010 on CDs and DVDs (stopping when my weekly backups exceeded capacity of single layer DVDs), burning data and leaving the media on spindles.There was zero maintenance - disks went on spindles each week and were left in a cupboard.Occasionally, I made two copies and stored the other copy at my parent’s house.I dug these disks out recently and was able to read every disk, except one from 1999!There was definitely some degradation of older disks (reading was very slow), but only one hard failure.

So, ~50 CDs and DVDs tested out of ~200 burned.Age: 10-20 years.No maintenance.No special environmental control: I kept them away from direct light and water, but temperature would range from 10°C to an only-in-Australian-Summer 40°C.

And 99.9% success!

Cloud Provider Unavailable

Cloud providers like AWS, Azure and Backblaze claim crazy reliable data availability of 99.9999% or more.And when they spend billions of dollars each year, they can probably do a better job than I can on a budget of $500.But there are a few failure modes that can catch you unaware.

Your Internet is down.Pretty obvious that you can’t access the cloud when the Internet is down.

Your Provider has a Temporary Outage.It’s possible the provider has a serious network outage - though this is much less likely because they have multiple redundant connections.What is more likely is an application issue, or an authorisation problem, or some other transient outage.These usually only last a few hours, and only happen once or twice a year, but they do happen.

Your Cloud Provider Disappears Forever.Yes, the cloud can disappear.And much faster than you think!Companies go out of business all the time, or decide cloud backups aren’t a profitable business model.While it is unlikely Amazon or Google or Microsoft will go out of business, over a 45-100 year time line who knows what might happen!Remember, “the cloud” is a trendy way of saying “renting someone else’s server” - rental agreements last a few years at most, not 45+ years.

Your Cloud Account is UnavailablePersonally, I think this is the scariest thing about using the cloud for long term archiving.If you forget or lose your account password, your data is gone.If the provider decides to block access to your account, your data is gone.If a government takes legal action against a cloud provider, your data may be seized and unavailable.

Basically, there are things completely outside of your control that could block access to your data in the cloud.

Media Obsolete

Technology gets old very fast.And the new and shiny quickly replaces last year’s amazing storage tech.When you’re thinking about a 45+ year time scale, whatever you use to store data today is definitely, 100%, without a doubt going to be obsolete when you really to read it.

Floppy disks are a nice example.I haven’t touched a floppy disk since… I can’t remember!And I don’t own a computer capable of reading one any more.If my backups are on floppy disks, I’m in trouble.

I have backups on optical media (CDs and DVDs).Optical drives are not as popular as they once were, but some (not all) of my computers still have optical drives.Perhaps optical drives will go the way of floppy disks in 10 years.

SATA is the standard interface for consumer hard disk drives.20 years ago it was IDE with 40 pin ribbon cables.35 years ago there were ST506 controllers and MFM drives - and yes, I remember using them when I was a kid.NVMe is becoming more popular, perhaps it will surpass SATA in the next 40-ish years, rendering all today’s HDDs unreadable?

USB is everywhere today.But will it be in 40 years? 60 years? 100 years?You need a laptop or desktop device to read a USB disk; but mobile phones and tablets are more popular, yet cannot read a USB disk.If pocket computers completely replace desktops and laptops, how will you read your precious backups?

Using a NAS appliance with an Ethernet UTP cable to store your backups?I remember attending LAN parties in the mid-90’s with 10BASE2 coax cable.Wired ethernet seems to have stagnated in the consumer space recently, perhaps your next NAS will be WiFi only (I hope not, but who knows)!

Pretty much every storage technology in common use today didn’t exist 45 years ago.If you’re storing data for 45+ years, be ready to migrate from old to new technology.

Fortunately, all my doom-saying isn’t all that bad.Almost all the tech I’ve mentioned above is still available, it just might require some eBay purchases to acquire niche equipment.

Application Unavailable

Data can be available in open or proprietary formats.Open formats like PDF, RTF, JPG or MP3 are readable by many applications.Proprietary files are only readable by one application.If you can’t use that app any more, the data is also gone.

This kind of thing is very common in medical or industrial settings, less so for every day documents, pictures and videos.So this is more applicable to businesses using niche or specialised software.

I looked through some old backups from late 90’s and found pm6 files.For various reasons, most of my work in high school was done using Adobe PageMaker.The last update for PageMaker was in 2001; I don’t have the disks any more, and even if I did, Wikipedia says it doesn’t work on Windows 10.So I have no way of reading those files - that data is gone.

An insidious form of this is proprietary backups.Imagine you purchase “Acme Backup”, which saves your data in acme files that only it can read.One day, Acme goes bust and your backup product is no longer supported.No problem, you continue using Acme because “not supported” doesn’t mean “it stops working”.But eventually, after a few computer upgrades it does stop working: Acme Backup is not compatible with Windows 2040.Even if your acme backups files are available, you lack the software to restore from them.

Perhaps the proprietary application is still available, but you just lost the license code required to use it.Best case, you need to buy a new license.Worst case, the application is not sold any more and you’re stuck.

File Format Obsolete

This is a variation on Application Unavailable, but 100x worse.Not only is the application obsolete, the data format is also obsolete.That is, nothing out there can read your files.

Let me say, this is really, really, REALLY unlikely to happen.Applications almost never remove support for such core functionality.

But imagine if someone came up with something better than JPEG - images could be stored without any loss of quality, in just a few kB, encoded and decoded with minimal CPU usage.It is such a good format that everyone stops using JPEG.Cameras, phones, even the Internet all switch to this magical new image format.Eventually, application developers decide supporting JPEG is too difficult, too time consuming, and brings no benefit.So they remove JPEG support.And all those JPEG family photos from the early 2000s are unreadable.

(Note that we already tried to come up with a better JPEG - it didn’t take off).

Some of the core standard file formats include: JPEG, MP3, ZIP, PDF, UTF8 text.And let’s be honest, none of them are going to be obsolete any time soon.But 100 years is a long time.

Perhaps the file system on your disk isn’t supported any more.NTFS, ZFS, UFS, Ext4 are all in common use - and again, support for these isn’t likely to disappear.But 100 years is a long time.

One real world example of an obsolete standard (albeit not a file format) is SSL.The thing everyone calls SSL is actually TLS - Transport Layer Security.The current version of TLS is 1.3.Version 1.2 is also in common use.And 1.0 and 1.1 are considered a security problem, so many servers are disabling them.Poor old SSL is even older, being deprecated in 2015 as a security hazard.

So, if you’re using an old version of Netscape Navigator from the late 90’s, you cannot access most of the Internet.And if you’re sticking to Android 4, Windows XP, or any version of Internet Explorer before 11 (so 15-ish years ago), you’re in the same boat.All those backups in the cloud are inaccessible!

Maintainer Unavailable

People are a key point of failure in any organisation.

It could be as simple as someone leaves the organisation and doesn’t leave passwords required to access backups.Or that person was responsible for the backup procedures, and never bothered to train a successor.Or that person physically has the backup media.If the person’s gone, the backup may have gone with them.

Perhaps backups are still available, but their content was organised in a very unusual way.Without the “librarian” who knows how it all works, content is lost in a maze of twisty backup disks, all alike.Or there’s a “computer guy” who just knows how the backups work - only he’s gone.

On a long enough time line, the survival rate of everyone drops to zero.People die.Sometimes suddenly, sometime with lots of warning.Either way, any knowledge about backups solely in their head is gone (eg: passwords, procedures, places).

In 45 years, I don’t expect to be maintaining backups at Wenty Anglican.In 100 years, I expect to be with the Lord.

Without key people, backups may be totally useless.They need to pass their knowledge, expertise and passwords onto a successor.

Fundamental Changes

Finally, there might be fundamental changes to undermine long term backups and archives.Things that break our assumptions about how the world works.

The English language will change.Probably not so much that we can’t understand today’s documents in 2121, but probably enough that they will be confusing or ambiguous.A few hundred years and it’s quite possible the English of 2021 won’t be recognisable or understandable.English might end up as a dead language.

Perhaps the Internet will change radically.Maybe someone will undermine how HTTPS works and “the cloud” will no longer be a secure place to store data.Maybe the global Internet will break into multiple Internets that can’t access each other - China is already trying pretty hard to segregate itself.It would suck if your cloud backups were in the other Internet, or behind a great firewall.

Perhaps digital storage isn’t a thing any more.It could be due to a shortage of materials and chips, or a lack of rare earth materials used in high-tech devices, or simply storage stops getting cheaper.Maybe a significant global event (pandemic anyone?) makes digital devices a luxury item and we can’t afford to use them for archives.

Electricity is rather fundamental to digital storage - heck, even hard copies rely on printers, copiers and lighting.No electricity, no digital anything, and definitely no backups.One hundred fifty years ago, in 1871, electricity was well understood from scientific and engineering points of view, but not widely available to general population.One hundred years ago, in 1921, electricity was a luxury that was available to upper classes only.It seems unlikely that the power would go off permanently, but we need to remember its a relatively recent invention when trying to store data for 100 years.

COVID has reminded us that disasters, natural or otherwise, can cause significant social and economic disruption which may impact long term backups.Few will maintain archives or keep passwords if they’re in fear of their lives!COVID has turned into a long disaster, lasting several years (even when vaccines being deployed at break neck speed).While an earthquake or flood has an immediate impact, a pandemic is longer and more drawn out.And requires a different approach to ensure archives survive.

(In case you’re wondering, I’m not going to consider how to mitigate these fundamental changes. I’m just listing them to illustrate how hard long term data storage is).

A Word About Hard Copies

I’ve focused on digital media all through this article.But it’s worth thinking how the failure cases apply to physical hard copies of documents (ie: paper).

Many failure cases are specific to digital data, and just don’t apply to hard copies:

Insta-fails
The cloud: closest analogy is renting a storage unit.
Media obsolescence: paper from 500 years ago is just as readable today.
Software obsolescence: the mk1 eyeball is all you need to read.
File format obsolescence: papers collected in a folder hasn’t really changed.

Some failures apply equally to both:

Media failures: paper degrades just like disks do.
Fundamental changes: shifts in language are the same for digital or hard copies.

Hard copies are affected by some even more:

Maintainer unavailable: without a good index, hard copies are really hard to search through. Even slow computers can search hundreds of documents per second. If your librarian is missing, your hard copies are a needle in a haystack.
Natural disasters: paper is very vulnerable to moisture, light, fire, flood, etc. Storing them is trickier than digital.

The biggest disadvantage of hard copies is: they are hard to copy.Computers are really good at making perfect copies over and over, really quickly.That’s why the solution for digital archiving is to just make lots of copies and compare them every now and then.Hard copies are physically bigger, harder to copy and trickier to compare.So although you can apply the same principals, its 100x more difficult in practice.

Conclusion

Well, there certainly are a lot of ways data can be lost!(And I’m not even claiming this is an exhaustive list).

I haven’t really discussed how to stop these events, but that will come in the future.And I expect many readers will already have answers in mind.

For now, let’s just admit many things could go wrong.

Some are entirely within our control (so letting them go wrong is just dumb), others are predictable and preventable with appropriate maintenance, others are outside our control and we need to take special steps to mitigate them.And some are really tricky to deal with - indeed, so hard that I simply can’t address them on my $500 annual budget.

Next up: what data I’m interested in collecting (and what I’m not), and how I’ll collect it.

Read the full series of Long Term Archiving posts.

Long Term Archiving - Part 1 - The Problem

2021-04-10T14:00:00.000Z

100 Years is a long time to store data.

You can read the full series of Long Term Archiving posts which discusses the strategy for personal and church data archival for between 45 and 100 years.

TL;DR: skip to the Implementation (part 6).

Background

In mid 2020, right as our church was working through what needed to happen to be COVID Safe and resume face-to-face meetings, we got a nasty surprise:

We have to keep records relating to “Safe Ministry” forever.That is, any records or documents that might be needed for a court case involve sexual abuse cannot be deleted.Ever.

Reliable and comprehensive Safe Ministry Records will be an important part of building a case against an alleged abuser of children in our churches, so it is vital that the correct information is recorded in a manner that is able to be kept indefinitely – in other words no Safe Ministry Record information can ever be deleted or thrown away. Source

After some reading of the Royal Commission into Child Sexual Abuse I found a recommendation for storing records for a minimum of 45 years:

We also recommend that institutions that engage in child-related work retain, for at least 45 years, records relating to child sexual abuse that has occurred or is alleged to have occurred. This is to allow for delayed disclosure of abuse by victims and to take account of limitation periods for civil actions for child sexual abuse (see Recommendations 8.1 to 8.3).

I asked: “is there any government or diococen assistance?” And found the answer is “No”.

My initial response was: “Are. You. Serious??!?!?There is no way this is possible!”

And the problem was parked until we had more breathing room post-COVID.

Well, in Sydney, we’re doing pretty well with COVID at the moment, so time to deal with this storing-data-forever problem.

Goal

My mission (which I have no choice but to accept - yay for government complience) is to develop a long term data archival strategy for Wenty Anglican Church.The data must be readable in 45 years, and is desirable to be readable in 100 years (the approximate lifetime of a person).

This must be accomplished with off-the-shelf technology, implemented by myself in my spare time, be supported by non-technical volunteer users, and has a maximum budget of a few hundred dollars per year.

Bonus points if we are able to search the data and find relevent information in any way other than “trawling through everything year-by-year”.

Sub-goal: accomplish the same aim for my own family.If I can adopt a strategy that works for me, I have some hope of church doing the same.

Aside:

Although I’m focusing this series on the technical requirement of “long term data archival”, it’s important to note that in a church context this requirement is part of wider policies and procedures to ensure the safety of everyone who comes on our property.That includes church staff, volunteer workers, regular members, occasional visitors, one-off guests, contractors, and anyone else who might walk through our front door (or back gate).It addresses physical, emotional, and spiritual safety.It is particularly geared to protect minorities and vulnerable people (who have been terribly abused in church contexts in the past).

That is, this is not a box ticking exercise for government compliance.It is part of our church’s desire to keep people safe, as we seek to share the good news of Jesus.

Is This Even Possible?!?

My initial reaction to this requirement of 45+ year data retention was: this isn’t possible!

The government is asking volunteer organisations (not just churches) to collect data in a systematic way, store it securely (as many records will identify people; thus raising privacy issues), and ensure it is still available in at least 45 years.

As so much data is digital these days, we need to come up with a digital solution.Only thing is, 45 years ago (1976) the personal computer was not a thing.The cutting edge of digital storage was the cassette tape and could store perhaps 100kB.

In other words, we’re being asked to do something that has literally never been done before, because the technology has not existed long enough yet!

However, I’m not one to be dissuaded by “impossible” goals.

While the digital technology has not existed to retain records for 45-100 years, the analog technology certainly has.

My church has paper records going back to 1919 (when the building was completed).Governments have records going back hundreds of years.And archaeology has been able to recover documents - OK clay tablets - from thousands of years ago.

At church, we see the Bible as the supreme authority in matters of salvation.It also happens to be a collection of documents that have been handed down over many generations - so a fitting yardstick for my current project!

The New Testament was collected from various sources into its final form in 325AD at the Council of Nicaea.And while there is plenty of debate how old the original source material is, the New Testament can be no younger than 1700 years, and is likely closer to 1900 years old (the latest material written in ~120AD).The Old Testament is messier (mostly because it’s older) but consensus is it was essentially what we have today in 132BC when the Greek Septuagint translation was finalised.And the original sources must be older (just how old is a subject of much debate that isn’t relevant for my data storage project).

The point is: the Bible is a written document, originally created in an oral culture, written on materials that naturally decay, and often propagated and copied by volunteers.Yet it has survived remarkably well for around two thousand years.

So storing records for 100 years is certainly not an easy task, but it’s far from impossible.

Strategy

I’ll leave the details of long term archiving for future posts.This is my overall strategy:

Point 1: the data I store today will outlive me.

In 45 years time I’m not likely to be maintaining records at Wenty Anglican.I might not even be a member there.Heck, I might not be alive.

So I MUST, without fail, be able to hand data on to a successor.I need one (or more) people in-training who can take over after I stop looking after the data.

The data itself (however and wherever its stored) must be documented enough that someone could pick up archiving even if I’m not around.That is, storage needs to be simple, and self documenting.

If someone randomly comes across one piece of the archive (say a hard disk, DVD or cloud backup), they should be able to find their way to other parts of the archive.That is, even if I disappear without handing on to a successor, the poor archivist who has to take over can piece things together from any one part of the archive.

Point 2: whatever choices I make now will be wrong in 45 years.

The technical details of backups and archiving will change over time.And 45 years is a long time.The decisions I make today will become obsolete, or wrong, or be superseded.

In 2000, my first backups were on burned CDs.Later I moved to DVDs.Then to hard disks and network attached storage.And eventually the cloud.Most recently, I’ve started using BluRay disks.

So I MUST, without fail, take a big step back and review my backups & archives every 10 years.I need to be prepared to migrate before technologies become obsolete.I need to look for new and better ways of storing data.Worst of all, I need to migrate from old file formats to new ones (and I’m really not looking forward to that).

In other words, the technical details will definitely change over time.

Point 3: as long as one copy is readable, all is well.

Ultimately, long term archives are a distributed data problem.And that has a well known solution:

Make many redundant copies of data. Distributed to multiple locations to prevent systematic failure.
Check the copies are still good on a regular basis.
When any go bad, make new copies from the good ones.

As long as one copy can be read, the data has survived.

Step 2 is the weak point, because it implies maintenance.If maintenance is not automated (or at least scheduled) it won’t happen.And, given a long enough time line, all backups & archives will be lost - there is no media that will reliably survive 100+ years (even the 45 year minimum is a stretch).

So I MUST, without fail, have some kind of maintenance program to detect failures and replace them BEFORE all copies fail.

Incidently, this is how the Bible - in particular the New Testament - survived so well.People just kept making more and more copies of it.Even though the originals were lost, the copies (of copies, of copies…) survived.

Future Articles

OK, you’ve come here not to read about some hand-waving high-level strategy, but concrete technical plans to achieve 45+ year data storage.In the context of my own personal data, and for Wenty Anglican church (and definitely NOT for some big corporate organisation).

Here’s what I plan to discuss in coming posts:

Failure Modes: the different ways data can be lost.
Collecting Data: what data will I archive and how will I get my hands on it.
Access Patterns: the way you to need to access data determines how it gets stored.
Storage Options: the many ways you can store data - hard copies, on-site, cloud, archives.
Implementation: the actual way I’ll be storing data.
File Formats: which ones are likely to survive 45+ years.
Organising Data: how to organise data so its self-documenting and can be connected to other parts.

Conclusion

Long term archiving of data for 45+ years is tricky.It’s a goal that is longer than I’ve been alive!But it is not impossible if you have many copies, maintain them, and are prepared to change (possibly radically).

I’m signing up for the long haul. Assuming it works, my grandkids will be reading this in 2121!

Next up: a discussion of what things can go wrong with backups over 100 years.In other words - failure modes.

Read the full series of Long Term Archiving posts.

A/V Video for Church

2021-02-25T13:00:00.000Z

Training People in the Age of Live Streaming

Background

COVID has forced churches around the world online. While we previously were happy meeting in person, we were suddenly forced (by law) to provide content for Wenty Anglican online.

So, phone calls, Zoom meetings, pre-recorded sermons and live streams became normal.All in the space of a few weeks in 2020.

Now the COVID threat is slowly dissipating, we’ve decided to continue live streaming church.(Zoom, on the other hand, is significantly less popular and no one is rushing to keep it)!

I’m pretty technical and learned “the new normal” quickly.And, being the go-to technical guy at Wenty Anglican, I had to implement live streaming in late 2020.I’ve learned a lot about OBS Studio, how unreliable WiFi can be, YouTube copyright and video cameras in very short time!

Now, I’m trying to train others to run our church live streams to a reasonable level of quality.

New People and New Tricks

My main goal is to train new people how to do A/V work in general, and also to train existing people how to run our Sunday meeting live streams.

The way we’ve always done this kind of training is ad-hoc and “on the job”.That is, the person who knows how it’s done (me) tells the people rostered on particular roles how to do their job on Sunday morning.Usually that means there’s one day they’ll watch me do it, then next time (which might be 2-4 weeks later) I put them in the driving seat while I supervise.

This has a number of draw backs, including a) there’s a limited time for people to prepare for our church meeting (30-45 minutes) and that doesn’t allow much time for training. b) Most people want to have some level of training before hand, so they know what they’re up against.

Some changes we made to our meetings for COVID purposes meant our morning meetings were “tech heavy” - many people were already trained for A/V duties. While our evening meetings were “tech light” - only a handful of people were trained and had to be rostered on pretty much every week.We want to transfer those skills around so more people can do more A/V roles.

Strategy

The first step was to work out very clearly in my mind what needed to be done, and how best to do it.This involved things like configuring OBS Studio for the very simple scenes we needed.And then acquiring and installing the required hardware (a camera with accessories and HDMI to USB converters).

We’ve been streaming since November 2020, and I wasn’t doing all the work myself along the way.There was plenty of on-the-job training.But only recently I completed all the changes I wanted for the minimum level of quality I was aiming for.

To get training to people in bulk, I recorded a number of training videos and screen casts.These demonstrated what people needed to do each Sunday.I won’t comment about those videos here, you can watch them yourself if you want.

Then we conducted a “training day”, which was basically a few hours where people could practise, experiment and do what they need to do on a Sunday.Some of that was ad-hoc experimenting and learning.Some was more structured - following a runsheet for a regular Sunday church meeting.

Just not on a Sunday.And in an environment where there was no pressure to get it right.

OK, it wasn’t just practise, there was a little bit of theory as well:

Always hit your cues. That is, keep concentrating and don’t miss things.
Anyone can fix things, even if you aren’t rostered on a particular task. That is, multi-skilling people.
Keep improving. That is, always try to do better.

Recordings

The equipment and software I used for creating these videos was pretty much the same as the start of COVID.

Video recordings were done on my Nexus 5X. It’s certainly not the newest phone, but it is fit for purpose (although the battery isn’t holding charge like it used to).
Screencasts were recorded using OBS Studio. Occasionally, I used TightVNC to show the actual computers at church while recording from home.
Editing with the stock Microsoft “Video Editor” app. I’ve figured out most of it’s limitations and can work around them well enough.
Re-encoding was done with Handbrake. Nothing much more to say here.
Finally, the videos were distributed on my personal web server. They are static content, and I use UUIDs to keep the files mostly private (although I’ve just shared these ones with the world)!

Conclusion

Training people for A/V takes time.And it works best if you teach in different ways - theory, demonstrations and practical.

Most of all, you need to be clear what you are teaching. Otherwise people will learn nothing.

My aim is that our church will have a good number of technically trained people, so we can live stream at a reasonable quality on into the future.

CMS Summer School Networking 2021

2021-01-23T13:00:00.000Z

Live events run on Ethernet

Background

Each year my family and I attend a Christian missionary convention CMS Summer School.The focus of the conference is to hear a series of in-depth Bible talks (5 x 45 min), receive updates from CMS missionaries serving around the world, and to support said missionaries in prayer and financially.It is attended by ~4,000 people over 6 days.

In short, its the biggest church event I attend in a year.

A few years ago I volunteered for their “tech team”, which does major work setting up the various infrastructure required for the conference.This ranges from power and lighting, to audio and visual (and many other things in-between).

The “tech team” is the goto team for troubleshooting of any vaguely technical issue, plus operating cameras, sound desks, and making recordings.

As per other church meetings, it’s effectively a big live event. Indeed, the biggest live event I have responsibilities at.

For the last two years, I’ve been responsible for implementing networking (among other things).

Goal

To provide networking infrustructure for the conference.This includes:

Audio (via Dante devices)
Live Streaming
General Internet Access & WiFi
Connectivity to Video Switcher

Other than in-ear communications and raw video from cameras, pretty much everything runs over ethernet.

Let’s drill into a few of those in more detail.

Dante Audio

Several team members work for Audinate, which created the Dante audio protocol.This is a high quality, low latency protocol to deliver uncompressed digital audio over IP and ethernet.It achieves extremely tight latency between devices: ~300 microsecond latency is normal.And is commonly used in the A/V industry.

From a networking point of view, it needs very low latency gigabit switches.

While the mixing desks and amplifiers use standard ethernet, we make heavy use of Avios, which are an analog to Dante audio adapter.Most Avios require PoE switches.

I’m not an audiophile by any means, but I can understand the technical side of Dante.It’s basically software controlled audio (similar to the usual software controlled things I’m used in to my day job).

Live Streaming

2020 was the year of COVID and the year of virtual everything.CMS Summer School runs in January, and due to a number of Sydney COVID cases in late December, the conference had to pivot from ~500 people in-person to essential persons on-site only (max of 200) live streamed conference with a two week warning.

We always knew live streaming would be our primary audience this year, but something like 95% of our audience ended up being virtual.

So high quality live streaming was very important.

Streaming was done via Vimeo using a Teradek Vidiu Go hardware device.Obviously, high speed broadband internet is required.

Internet Access & WiFi

The network needed Internet access.The KCC Conference Centre, which hosts CMS Summer School, already has Internet access.We just need to tap into it.And provide WiFi APs so that wireless devices can connect to the network (there are apps which act as simplified mixing desks for Dante audio).

Nothing special here.

Video Switchers

Video at CMS Summer School is delivered via 3 HD cameras over SDI to a BlackMagic ATEM Video Switcher.A matching video controller is used for live vision control.

Although the raw video does not run over ethernet, the control channel to the video switcher does.

Implementation

While we tried various VLANs, trunking and other solutions to create isolated networks for Internet vs audio vs other data.In the end, the best solution was 3 PoE switches in a flat config.The most complexity was some bridging to create an isolated secondary network (to satisfy a Dante audio requirement).

3 x CRS328-24P-4S+RM: 24 port PoE switches
1 x hAP2: border router (also doubled as an AP)
3 x hAP2: WiFi APs

The border router does NAT and some shaping using simple queues.It has the complex firewall rules.We thought we might need to do other complex things on this device (running HDMI over ethernet through it) but didn’t need to.It is also a WiFi AP, but only because of physical proximity to some of our equipment.

Border Router config

The videoland switch is connected to the border router.Videoland is where the video switcher lives and the live streaming happens, plus a few minor sub-title / graphics roles (powered by laptops with HDMI outputs).There are no firewall rules on the switch; our goal is purely hardware switching for minimal latency.Usually with Mikrotik devices, I use VLAN interfaces, but we found they are implemented in software and introduce additional latency that Dante could detect.

Between the border router & videoland switch, we have 24 ports of PoE ethernet + WiFi.

Live streaming had a dedicated NBN connection (100Mb down / 40Mb up), plus a 4G / LTE backup.These were patched by the owners of the auditorium into a network switch; we just needed to run patch leads to the Teradek Vidiu devices.The 4G backup functioned via WiFi: Teradek > WiFi > switch > external 4G modems.Why? The Teradeks do an automatic fail over from ethernet to WiFi; so if the NBN were to fail, the stream would fail over automatically to 4G via WiFi.

In the end, the NBN never failed and the 4G was never used in anger (although we did manage to crash the Teradek due to a particular visual we used at one point).

The foldback land switch patches from video land.Foldback land is behind the stage controls audio so the band can hear themselves.It’s also where we have all the wireless microphone receivers, amps, etc.And there’s an X32 mixing desk which is considered our master device.This switch is where DHCP runs from; so it is closest to our master mixing desk.

Dante audio requires duel, redundant and independent networks to function correctly.And you can’t fool it by simply connecting the secondary interface to your main switch, or “forgetting” to connect the secondary interface..However, you can fool it by creating two, separate networks with no bridge between them.So we do that for the ~4 devices which require it.

Once again, there’s 24 PoE ethernet ports + WiFi.

Foldback Land Switch config

Note we only run 5GHz WiFi on the APs.And even then, it’s configured on narrow 20MHz channels.We’re in an auditorium which has Ubiquity APs all over the place with guest networks - the 2.4GHz spectrum is completely full and useless for us.And we aren’t trying to push bulk data over WiFi, just ~100kB/sec of control data.So its multiple APs on narrow, non-overlapping 5GHz channels.

AP config

Finally, our front of house switch patches from foldback land for both primary and secondary networks.Other than the secondary network, there’s nothing extra here.

In Action

When in action, we see up to 60Mbps of traffic and 30kpps:

But that can vary, depending on device:

And around 40 devices on DHCP:

[admin@sw01-foldback-poe] /ip dhcp-server lease> print
 #   ADDRESS                    MAC-ADDRESS       HOST-NAME          SERVER           LAST-SEEN                    
 1   192.168.16.61              34:F6:4B:00:00:00 AdmiralAckbar      dhcp-primary     24m31s                       
 2 D 192.168.16.198             00:15:64:00:00:00 X32-00-53-EB       dhcp-primary     1h35m3s                      
 3 D 192.168.16.194             00:1D:C1:00:00:00                    dhcp-primary     1h35m2s                      
 4 D 192.168.16.193             00:1D:C1:00:00:00                    dhcp-primary     1h35m1s                      
 5 D 192.168.16.191             00:0E:DD:00:00:00                    dhcp-primary     1h17m36s                     
 6 D 192.168.16.190             00:1D:C1:00:00:00                    dhcp-primary     1h34m59s                     
 7 D 192.168.16.189             00:1D:C1:00:00:00                    dhcp-primary     1h35m1s                      
 8 D 192.168.16.188             00:1D:C1:00:00:00                    dhcp-primary     1h35m1s                      
 9 D 192.168.16.187             00:1D:C1:00:00:00                    dhcp-primary     1h35m1s                      
10 D 192.168.16.186             00:1D:C1:00:00:00                    dhcp-primary     1h33m39s                     
11 D 192.168.16.185             00:1D:C1:00:00:00                    dhcp-primary     1h35m1s                      
12 D 192.168.16.184             00:1D:C1:00:00:00                    dhcp-primary     1h34m59s                     
13 D 192.168.16.181             58:B0:35:00:00:00 audio-laptop       dhcp-primary     39m54s                       
14 D 192.168.16.179             00:1D:C1:00:00:00                    dhcp-primary     1h15m37s                     
15 D 192.168.16.182             00:1D:C1:00:00:00                    dhcp-primary     1h15m30s                     
16 D 192.168.16.177             00:15:64:00:00:00 X32-04-33-A7       dhcp-primary     1h15m47s                     
17 D 192.168.16.176             00:15:64:00:00:00 X32P-01-5E-AC      dhcp-primary     1h15m42s                     
18 D 192.168.16.175             00:1D:C1:00:00:00                    dhcp-primary     1h34m59s                     
19 D 192.168.16.174             00:1D:C1:00:00:00                    dhcp-primary     1h34m59s                     
20 D 192.168.17.179             00:0E:DD:00:00:00                    dhcp-primary...  1h17m39s                     
21 D 192.168.17.178             00:1D:C1:00:00:00                    dhcp-primary...  1h35m2s                      
22 D 192.168.16.173             00:0E:DD:00:00:00                    dhcp-primary     1h17m31s                     
23 D 192.168.17.177             00:1D:C1:00:00:00                    dhcp-primary...  1h15m41s                     
24 D 192.168.17.176             00:1D:C1:00:00:00                    dhcp-primary...  1h15m38s                     
25 D 192.168.16.169             D0:37:45:00:00:00 Record-B           dhcp-primary     4m23s                        
26 D 192.168.16.167             40:6C:8F:00:00:00 packer-mac         dhcp-primary     1h34m58s                     
27 D 192.168.16.166             00:1D:C1:00:00:00                    dhcp-primary     1h35m1s                      
28 D 192.168.16.164             2C:36:F8:00:00:00 StageIO-Switch     dhcp-primary     1h34m55s                     
29 D 192.168.16.161             C8:BC:C8:00:00:00 Record-A           dhcp-primary     4m6s                         
30 D 192.168.16.160             6C:70:9F:00:00:00 iPad               dhcp-primary     18m14s                       
31 D 192.168.16.159             58:EF:68:00:00:00 CMSs-MBP           dhcp-primary     17m55s                       
32 D 192.168.16.152             00:1D:C1:00:00:00                    dhcp-primary     1h35m                        
33 D 192.168.16.199             C8:5B:76:00:00:00 LAPTOP-R6B5QC28    dhcp-primary     1h17m30s                     
34 D 192.168.16.144             2C:59:8A:00:00:00                    dhcp-primary     2m39s                        
35 D 192.168.16.141             00:E0:4C:00:00:00 Attila             dhcp-primary     sometime                     
36 D 192.168.16.140             82:4B:30:00:00:00                    dhcp-primary     30m16s                       
37 D 192.168.16.150             C6:F8:C0:00:00:00 Toms-iPhone        dhcp-primary     47m52s                       
38 D 192.168.16.139             C4:65:16:00:00:00 L19AU-33639        dhcp-primary     12m35s                       
39 D 192.168.16.138             00:E0:4C:00:00:00 Joshuas-MBP        dhcp-primary     1h34m55s                     
40 D 192.168.16.137             F2:6F:E9:00:00:00 Jymz               dhcp-primary     14s                          
41 D 192.168.16.135             00:E0:4C:00:00:00                    dhcp-primary     31m26s                       
42 D 192.168.16.134             74:E2:F5:00:00:00 Beth-iPad          dhcp-primary     27m18s                       
43 D 192.168.16.133             00:1D:C1:00:00:00                    dhcp-primary     9m1s

A couple of interesting things to point out:

The maximum required bandwidth for our 32 channel mixers is ~60Mbps. And many ports are operating as low as 4Mbps (AVIOs).
The bridge interface is not being used directly; the audio packets are entirely in the switch ASIC. This is rather unusual for me - all my home networking hits the bridge interface.
There are more discrete devices on the network then I thought (almost 50 configured via DHCP).

Future

We couldn’t do any VLAN trunking over single cables, because the software VLAN implementation caused Dante audio latency problems.Fortunately, we didn’t need to - we had enough network outlets to patch two flat networks.However, if we want this feature, we’d need to work out how to implement it using hardware switch rules.There was even talk of converting the switches to SwOS, to minimise the chance of using features implemented in software - but everyone on team is so familiar with RouterOS we’re very hesitent.

We may move DHCP off the foldback land switch to the border router.It’s something in software on latency critical switches, and it really doesn’t need to be there.The leases are deliberately long enough to cover our live sessions, but short enough to expire before the next session begins.

Although not required this year, we’ve run HDMI over ethernet to get video to other parts of the property - around 150m away from video land (where the signal originates).This is painful because a) we don’t have enough cable runs to where the video needs to get to, b) its video only; we have to run Dante audio separately, c) it consumes too much bandwidth.Given we had plenty of success with streaming via RTMP this year, I’m considering using it to replace HDMI over ethernet.Rather than streaming out to YouTube or Facebook or wherever, and then back in again (with the 30 second delay and external bandwidth hit) we could run an internal RTMP service via nginx which can be consumed by any location with ethernet (even WiFi) on-site.The main advantage is it runs combined audio & video at a high quality using 8.5Mbps, and solves most of our HDMI over ethernet problems.The disadvantage is we need a smart device (Raspberry Pi, Android device or Android TV) to play the RTMP stream.That and I haven’t tested it.

We have a bunch of analog comms (think video director talking to camera operators), which I think is the very last bit of analog gear we use.I’d like to get rid of that and run digital comms over Ethernet.No idea what this involves though.

Conclusion

CMS Summer School is a reasonably large live event.And pretty much everything A/V runs over ethernet at live events.Mikrotik devices are cheap, powerful and meet our requirements for pushing ~100Mbps around for Dante audio.Along with our more minor networking needs, Mikrotik has us covered.

And that means speakers and missionaries can get on with their thing: sharing Jesus.

Covid Compliance

2020-12-19T13:00:00.000Z

Why I’m so busy in 2020.

It’s been a while since my last blog post.Which is because I’ve not had enough time this year.

Because I’ve been working to get Wenty Anglican Church COVID Safe.

Background

OK, you don’t really need much background to COVID-19.It’s been the dominant event of 2020.

In my part of the world in Sydney, Australia, we got off pretty lightly.We had one lock down in March through to May, which was moderately severe, but not as hard as in other parts of Australia or the world.And since then, Sydney has slowly been rolling back restrictions.And managed to avoid a second wave.(Our friends in Melbourne were not so fortunate).

From June 2020, our church started working toward being COVID Safe.

The Timeline to COVID Safe Church

Places of worship were the subject of several COVID clusters early in the pandemic, so there were some pretty strict conditions required to re-open.

Wenty Anglican had been doing online meetings since the March lock down came into effect.After a month or so, we had got into something of a new rhythm: pre-recorded talks, YouTube songs, Zoom meetings.Although it was horribly impersonal, it was as good as we could do.And meant we could continue encouraging each other to follow Jesus even when we could not meet in person.

By June, the NSW government had rolled back restrictions to the point where churches and places of worship could re-open for up to 50 people, if they had a COVID Safe plan in place.At this point we decided to wait - the online meetings were going OK and there was a lot of work required to be COVID Safe.And, the restrictions meant a) no singing and b) limited mingling.Which felt like face-to-face meetings would almost be worse than online ones.

In July, our regular Parish Council meeting spent considerable time working out what going back to face-to-face meetings would look like under the COVID Safe requirements.As a warden, my responsibility was to ensure compliance with these requirements, and the safety of people coming on our property.And we also started reaching out to church members to gauge how many people wanted to return to in-person meetings.

This was the start of crazy busy time for me.There were regular meetings between church wardens to discuss what COVID Safe would look like.And many pages of draft policy documents written.

By August things had changed significantly.In NSW, COVID was relatively under control.But the local Wentworthville area was a COVID hot-spot (our local government area had ~50 active cases and was one of the worst areas in Sydney), and the beginning of the second wave was hitting Melbourne.This gave us pause.There was no immediate need to return physical meetings, and we had to consider the possibility that Sydney could have a second wave just like Melbourne.

While we did not commit to a date to return, the wardens continued working hard to be COVID Safe.

In September, it was clear Sydney wasn’t heading to another lock down.So we committed to going back before Christmas - with an early November soft launch, followed by a few weeks online to debrief and make any changes, before a public re-launch in late November.

And that’s when the work stepped up a gear!We put together a Trello board listing off all the things we needed to do before relaunch day.There were 32 specific items on that list, ranging from moving the wooden pews, to be socially distant, to putting policies together addressing all the government requirements.The biggest single item in my orbit was training people in all the new processes.

At that point, we worked backwards from the due date.Training needed to happen in the week prior to relaunch.All the physical work needed to be completed before training (because any on-site training was effectively testing all our COVID Safe policies) - signage, moving of furniture, purchasing equipment, etc.And we needed to decide all our policies before everything.So September was mostly policy discussions and making final decisions.

In October, everyone was crazy busy implementing everything we decided for the early November relaunch date.While this is one line in a blog, remember that all the wardens are volunteers, so we were working most evenings and weekends to make it all happen.Crazy busy was an understatement!

In November, we actually launched!Our first in-person church meeting was on the 1st of November.There was a lot of nerves after we did our first live meeting since March.And plenty of awkwardness with social distancing.And way too much paperwork to ensure our compliance.

There were three weeks that we went back to online meetings.That was time where I could relax a little.

There was also some debriefing, which lead to a few tweaks and improvements to our processes and policies, which took up some time.

Finally, on 29th of November, we went back to face-to-face church meetings permanently, much to everyone’s delight! (Assuming no further COVID outbreaks occur in Sydney).

By December, we were getting back into the groove of in-person meetings.The NSW government unexpectedly relaxed the restrictions for places of worship.While this did not affect us significantly, it did require us to update our paperwork and policies.The process of these minor updates is now pretty straight forward, so the amount of work is minimal.

Which brings us to mid-December, where I have finally got enough time to write an article!

Only to have a new infection cluster emerge after 6 weeks of zero cases, and see restrictions become stricter once again!

What Does COVID Safe Look Like?

While all places of worship in NSW had to follow the same set of Covid Safe rules, we had some freedom of how to implement them.The following list is what we did (taken from our COVID Safe training slides).

Don’t come if you are sick. Part of that is temperature checks on entry.
Capacity limits to our buildings (main auditorium is limited to 50).
1.5m social distancing applies.
No sharing - so no hand outs or Bibles; the Lords Supper is pre-packaged.
Masks are recommended (but not required).
No singing (this changed from 13th December).
Lots of posters reminding people to stay distant, hand santise and maximum numbers allowed in a room. The government provided most of these, we just had to print them.
Publish our conditions of entry via posters and online. Plus all our COVID related policies.

Those are the dot points. Plus a few more items I’ll going into more detail.

Record Attendance

Contact tracing has been a big part of the success of NSW Health’s containment of COVID - whenever a case appeared, lots of effort goes into working out where that person has been when potentially infectious and aggressively testing anyone they may have had contact with.While this isn’t so effective when there are hundreds or thousands of active cases, NSW rarely got beyond 100 active cases.

At church, we have to keep contact tracing records for 28 days.Our church database for the directory we publish for members is in MS Access.It’s not the most advanced technology, but it’s effective enough.The biggest change was to ensure the database is available on cloud storage so many people could access it, while also being secured.

We built a simple MS Access Report to produce a church roll for regular members - tick a box to indicate the person is here.Plus an A5 sheet to capture details of any vistors.These details are retained in our church safe (and hopefully never needed).

This was built based on my experience managing elections in Australia, which is entirely based on paper rolls and ballots (and righly so IMO).Quickly identify and mark off the majority of people, and have a mechanism for everyone else.

No “Mingling”

This was the most annoying requirement we had, because it went completely against what church meetings are about: we want to talk to people.Be they people who are regular members, who we can encourage as they walk with the Lord Jesus.Or if they are irregulars, who we want to re-connect with.Or visitors, who we want to welcome and extend the news of salvation in Jesus.Church is about people.And the no mingling rule made that really hard.

Our instructions:

Following the Premier’s advice, we should ensure that members of our congregations do not mingle before, during or after the service. Where morning tea is served after the service, provision should be made for seating persons 1.5m apart, discouraging people from mingling or walking around.

We delayed going back to face-to-face meetings because we knew this would be a) unpopular, and b) very difficult to enforce.

Our policy on this was to enforce 1.5m distancing, and to encourage people to stay in their seats before and after the meeting.

However, in practise, there are a number of people who need to be moving around (ushers, musos, leaders, techs, etc).So it becomes very difficult to enforce “stay in your seats” when half the people present have a reason (or perhaps “excuse” is a better word) to be moving around.And while the number of cases were low, the risk factors meant enforcing this would do more harm than good.(This is changing since the cases appearing in mid-December).

Online Option

The government requires churches to have an online option, so every church has (by rule of law) become tele-evangelists!!

I have been using OBS Studio quite a bit through 2020 to record training material at work.So it was my choice of streaming software.And we were doing online church via our Wenty Anglican YouTube channel, so YouTube was our broadcast platform.We did a number of tests in the lead up to our first meeting - including streaming our training material.

As of December, it’s usable, if a little unprofessional (with the powerpoint slides appearing as part of the frame).IMO the most important part is the audio feed; if the video isn’t perfect its no big deal, but if you can’t hear then you might as well not bother streaming at all.

The main technical changes were:

Purchase an HDMI to USB converter so we could get video from a camera.
Purchase a decent camera so we could… well… have video!
Re-arrange our existing computers so the most powerful desktop was doing streaming.
Configure OBS Studio so it was easy for our regular sound operators to use it.

We have a few simple scenes: three static slides, plus the live stream itself.And there are the Start and Stop buttons.And that’s it - it’s designed to be simple.

The operators click “Start”, then just before our meeting begins they click “LIVE STREAM”.At the end they click “Thanks”, then wait a few minutes before clicking “Stop”.

Our first lesson was to both watch and listen to the live stream during the meeting to ensure it’s working as expected.There were several times when there was no audio (because of incorrect OBS config), or no stream (because the YouTube stream somehow was marked “private”).

The trickiest technical things were:

Making sure OBS picked up the right audio input device - it has a habit of changing back to the “default audio device”.
Removing background noise using appropriate filters in OBS. I’m running a noise gate for now, because the noise removal made for beautiful speech, but murdered any songs or music.
Getting the audio and video in sync (which still isn’t 100% right).

The main improvements we’re planning are:

Connect the streaming computer via wired ethernet instead of WiFi. We have had several drop outs relating to WiFi.
Purchase an additional HDMI to USB converter so we can run the powerpoint slides as a direct feed. The current way of framing the speaker + screen on the wall isn’t ideal.

Cleaning

Cleaning was one of the big COVID requirements: we need to clean any regularly touched surface after every meeting to remove the COVID virus.Again, simplicity is key to making sure this happens and is effective.

Many churches have taken to issuing alcohol wipes to each person so they can clean their seat after the meeting.We took a slightly different route: Glen 20.

After the meeting, we ask a few people to spray down all the wooden pew seats with Glen 20.It takes about 5 minutes and ensures more uniform cleaning as trained people are doing it.And a few others are responsible for other parts of the building.

There is a long list of other things we need to clean as well (tables, lectern, door knobs, benches, light switches, etc), but Glen 20 is effective on almost all of them.Electrical equipment is our biggest problem - as spraying 60% alcohol into electronic devices is bound to break things.

For everything else, we quarantine for 96 hours.

Training

My biggest time sink in the lead up to face-to-face meetings was training.We needed to get all our core members up to speed quickly, and as many other regulars as well.Although we were used to doing “COVID things” in other public places and our own homes, church counts as a “business”, so we need to be consistent and meet a higher standard.

There were two sides to this: written material, which contained extra details for specific cases and ministries. And a recording for general training. And then a summary recording so people could see exactly what to expect as they arrive.

The written material is available on our church’s website.We ended up calling it our COVID Safe Playbook, which contains all our policies and procedures required.It makes for pretty dry reading (as is the case for most compliance documents).

For a less boring approach, I recorded a screen cast summarising the playbook using powerpoint slides based on the playbook.This is similar to what I’ve done several times at work.It went for 60 minutes, and was still pretty boring (only slightly less bad than the written version).I also did the same training material as a live stream on two occasions, to give people maximum chance of hearing it - in both live stream cases, it was done to an empty auditorium!

Finally, we recorded a short what to expect video, focused on what someone just walking in the door should know.That one is under 3 minutes!

After we went back to in-person meetings, we found we needed more people trained using the sound desk and computers.So I’ve recorded a few tech training videos walking people through the minimum requirements for making church audible and broadcast in 2020.These ones were done via my phone and edited using the Windows Photos app (which is just barely suitable for the task at hand).

Conclusion

COVID compliance: This is where my last six months has gone.

I’ve learned many new technical things (live streaming, video editing, YouTube).I’ve applied my previous knowledge of creating policy documents.I’ve found how alcohol kills viruses.I’ve used paper to record attendance.And tried to train people how to be COVID Safe.

Hopefully, it will actually stop at least one person getting COVID, and maybe even save a life.

(But at the moment, it feels like fifty+ hours of my time wasted on bureaucracy and paperwork).

Murray's Blog

Installing Home Batteries - Part 3

Background

Does it Work?

Gotchas

Washing Machine

Timing Charging

Sunny vs cloudy days

Death By 1000 Conversions

Clamp Gotchas

Leaking Power

Not Enough Energy

Servers

Limited High Draw Appliances

Micromanagement

Do They Save Me Money?

Conclusion

Installing Home Batteries - Part 2

Background

Choosing Batteries

Bluetti AC200L + B300

Specs and Capabilities

AC and DC Outputs

AC and DC Inputs

Status Screen

B300 Expansion

Weight and Size

App

App - Charging Mode

App - Working Mode

App - Advanced Settings > Max Charging Current of Grid

App - Data Logging

Tear Down

Electrical Work

Installed Batteries

Conclusion

Installing Home Batteries - Part 1

Background

Goal

Plugin and Clamp Meters

Home Assistant

The Data!

Solar Power is great in Summer, and sucks in Winter

Plugin Meters Show Low Draw Appliances

Clamp Meters Show High Draw Appliances

Evening Peak

Conclusion

Building a CPRNG called Terninger - Part 17 Persistent State and Entropy Source

Background

Goal

Wire Up Persistent State to Entropy Sources

Tracking Targets

Discover New Targets

TCP Ping

That’s All For Now!

Building a CPRNG called Terninger - Part 16 Worker Loop State

Background

Goal

Details

Main Worker Loop

Load / Set

Get / Save

Worker Loop

Security: Entropy Pool

Add Extra Entropy When Loading

Saved Entropy is Just One Source of Many

Next Up

Building a CPRNG called Terninger - Part 15 Object State

Background

Goal

Details

Interface

Use Cases

HasUpdates

PersistentEventType

Final IPersistentStateSource

Implementing

Simple Implementation: PooledEntropyCprngGenerator

Nested State: EntropyAccumulator / EntropyPool

Security: Entropy Pool