Scaling for MHHS: how Octopus Energy achieved a 50x cost reduction in margin data engineering

May 26, 2026

The energy transition has a data problem

The UK’s energy grid is in the middle of its most significant structural transformation in decades. As renewables like wind and solar take a larger share of electricity generation, intermittency becomes a first-class problem: energy is cheap when the sun shines and expensive when it doesn’t.

The existing settlement model – built on monthly meter reads and averaged consumption profiles – cannot price that signal accurately. And if you cannot price it accurately, you can’t pass the signal to consumers, and demand never shifts to match supply.

Market-wide Half-Hourly Settlement (MHHS) is the regulatory response. Every household in Great Britain moves from two meter reads per month to 48 reads per day. That is not an incremental change. For a supplier like Octopus Energy serving over 8 million customers, it is a 48x increase in the data points driving every margin calculation, every settlement obligation, and every commercial decision.

The data engineering implication is direct: without re-architecture, the infrastructure cost to run Octopus Energy’s margin pipelines was projected to balloon by $1 million annually.

Why throwing compute at this does not work

The instinct when data volumes increase 48x is to provision more infrastructure. For Octopus Energy’s margin data team, that instinct quickly proved untenable. The projected cost per settlement date under the legacy architecture was $23.63 – a 33x increase from historic norms. Multiply that across settlement windows, and the bill compounds fast.

However, the deeper problem was not compute cost – it was architecture mismatch. The legacy pipeline had been built around a single grain: monthly. Billing ran monthly. Settlement ran monthly. The entire pipeline was monolithic by design.

MHHS introduced a fundamental split. Industry cost data now arrives at half-hourly granularity – 48 data points per customer per day. Smart tariff customers with EVs and heat pumps need half-hourly revenue calculations. Standard tariff customers still settle monthly. Running all three through a single monolithic pipeline meant processing the entire dataset on every run, regardless of what had actually changed.

As Saad Ali, Lead of the Margin Data Team at Octopus Energy, framed it: “You can’t just throw more compute at a problem like this. You have to rebuild and rethink your logic from the ground up.”

The architecture: three streams, one source of truth

The team re-architected around three specialised streams, each optimised independently for its natural grain:

Settlement – Half-hourly granularity for regulatory settlement and cost allocation. Industry cost data arrives at 48 data points per day; this stream matches that grain exactly.

Half-Hourly – Half-hourly processing for smart tariff customers: EV drivers, heat pump users, and time-of-use products where the half-hourly price signal is the entire commercial proposition.

Monthly – Monthly processing for standard tariff customers, unchanged in grain but now reconcilable against the half-hourly data.

A “Job of Jobs” orchestration pattern manages dependencies and parallel execution across all three streams. Each stream is independently tunable – what works as a Spark optimisation for Settlement is not necessarily right for the Monthly (non-half-hourly, NHH) stream.
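The shape of a “Job of Jobs” can be sketched in a few lines of plain Python. This is an illustration only, not the team’s implementation: on Databricks the same idea would normally be expressed as a parent job whose tasks trigger child jobs. The function names, the three stand-in stream jobs, and the `consumption@v42` snapshot label are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def run_job_of_jobs(
    upstream: Callable[[], str],
    streams: Dict[str, Callable[[str], str]],
) -> Dict[str, str]:
    """Run one shared upstream step, then fan out to independent stream jobs."""
    # Dependency: every stream waits for the shared upstream step to finish.
    snapshot = upstream()
    with ThreadPoolExecutor(max_workers=max(1, len(streams))) as pool:
        # Parallel execution: each stream is submitted as its own child job.
        futures = {name: pool.submit(job, snapshot) for name, job in streams.items()}
        # result() re-raises a child's exception, so a failed stream fails the parent.
        return {name: future.result() for name, future in futures.items()}


# Hypothetical stand-ins for the three stream jobs described above.
def settlement(snapshot: str) -> str:
    return f"settlement done on {snapshot}"


def half_hourly(snapshot: str) -> str:
    return f"half-hourly done on {snapshot}"


def monthly(snapshot: str) -> str:
    return f"monthly done on {snapshot}"


results = run_job_of_jobs(
    upstream=lambda: "consumption@v42",  # hypothetical snapshot label
    streams={"settlement": settlement, "half_hourly": half_hourly, "monthly": monthly},
)
print(results)
```

Keeping each stream behind its own callable is what makes independent tuning possible: one stream’s settings can change without touching the other two.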

Underpinning all three is the upstream consumption layer: a unified, multi-grain source of truth consolidating meter reads, smart meter data, and industry flows at multi-terabyte scale. This layer is the reconciliation bridge between monthly billing and half-hourly settlement – and it became the site of the single highest-leverage optimisation in the project.
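To make the “reconciliation bridge” idea concrete, here is a minimal plain-Python sketch of rolling half-hourly reads up to a monthly grain and comparing them with billed consumption. The record layout, the customer identifiers, and the tolerance are assumptions made for illustration; the article does not describe the real schema.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple

# Assumed record shape: (customer_id, day, settlement period 1..48, kWh).
HalfHourlyRead = Tuple[str, date, int, float]
MonthKey = Tuple[str, str]  # (customer_id, "YYYY-MM")


def monthly_totals(reads: Iterable[HalfHourlyRead]) -> Dict[MonthKey, float]:
    """Aggregate half-hourly consumption up to one total per customer per month."""
    totals: Dict[MonthKey, float] = defaultdict(float)
    for customer_id, day, _period, kwh in reads:
        totals[(customer_id, day.strftime("%Y-%m"))] += kwh
    return dict(totals)


def reconcile(
    from_half_hourly: Dict[MonthKey, float],
    billed: Dict[MonthKey, float],
    tolerance_kwh: float = 0.01,
) -> Dict[MonthKey, float]:
    """Return the kWh difference for every key where the two grains disagree."""
    mismatches: Dict[MonthKey, float] = {}
    for key in from_half_hourly.keys() | billed.keys():
        diff = from_half_hourly.get(key, 0.0) - billed.get(key, 0.0)
        if abs(diff) > tolerance_kwh:
            mismatches[key] = round(diff, 6)
    return mismatches


# Hypothetical data: customer A has two full days, customer B has one.
reads = [("A", date(2026, 5, d), p, 0.25) for d in (1, 2) for p in range(1, 49)]
reads += [("B", date(2026, 5, 1), p, 0.25) for p in range(1, 49)]
billed = {("A", "2026-05"): 24.0, ("B", "2026-05"): 10.0}

totals = monthly_totals(reads)
mismatches = reconcile(totals, billed)
print(mismatches)  # {('B', '2026-05'): 2.0}
```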

Incremental processing: 98.8% fewer rows

The naive approach to the upstream consumption tables – reprocessing the entire multi-terabyte dataset on every run – would have meant unsustainable compute costs at the new volume.

Delta Lake’s Change Data Feed (CDF) made true incremental processing viable at this grain. Instead of full overwrites, the pipeline now reads only records that have actually changed since the last run. The result: rows processed per run dropped from 25 billion to 300 million – a 98.8% reduction.
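A CDF-based incremental read might look like the sketch below. It assumes an active SparkSession passed in as `spark`, a Delta table with the `delta.enableChangeDataFeed` table property set to true, and that the pipeline records the last table version it processed. The helper names are hypothetical, and this is not the team’s actual code.

```python
from typing import Dict


def cdf_read_options(last_processed_version: int) -> Dict[str, str]:
    """Reader options asking Delta for changes committed after the last processed version."""
    return {
        "readChangeFeed": "true",
        "startingVersion": str(last_processed_version + 1),
    }


def read_changed_rows(spark, table_name: str, last_processed_version: int):
    """Return only rows inserted or updated since the last run.

    `spark` is an active SparkSession; `table_name` must have the change
    data feed enabled, otherwise the read fails.
    """
    changes = (
        spark.read.format("delta")
        .options(**cdf_read_options(last_processed_version))
        .table(table_name)
    )
    # Keep new rows and the post-update image of changed rows.
    # Pre-update images and deletes are dropped here and would need
    # their own handling in a real pipeline.
    return changes.filter("_change_type IN ('insert', 'update_postimage')")


print(cdf_read_options(41))
```

A production version would also need to handle the run where no new commits exist since the last processed version, and to persist the new high-water mark only after the downstream write succeeds.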

Data freshness improved from weekly to daily. For the commercial team, that shift means margin visibility at the grain where pricing decisions are actually made – every morning, not once a week.

Note: the $1M annualised savings figure cited below excludes the additional savings from this move to incremental processing on upstream tables. The total efficiency gain is larger.

Spark & Delta optimisation – and what to remove

With 48x more data flowing through the system, the team applied targeted optimisations, each validated by measurement, across three categories:

Lineage and I/O reduction

Simplified lineage by consolidating data early in the pipeline, reducing downstream joins and shuffle operations
Data pruning: selected only the columns strictly necessary for settlement and pruned rows at the earliest possible stage, reducing I/O overhead before expensive transformations

Join and partition tuning

Broadcast joins for reference tables under 500MB, eliminating expensive shuffle operations on complex multi-key joins with date ranges
Liquid clustering was enabled across several tables for columns frequently used in filters and joins. It dynamically co-locates related data on the specified clustering keys without requiring fixed partition boundaries, and it avoids the small-file problem, higher memory consumption, and I/O overhead that come from over-partitioning.
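The two join and partition techniques above can be sketched as follows. The 500MB limit comes from the text; everything else is an assumption for illustration: the helper names, the left join, the idea of passing in a known reference-table size, and the example table and column names. The join is a plain equi-join, so it does not reproduce the date-range conditions mentioned above.

```python
from typing import Sequence

BROADCAST_LIMIT_BYTES = 500 * 1024 * 1024  # the 500MB threshold quoted above


def should_broadcast(table_size_bytes: int, limit_bytes: int = BROADCAST_LIMIT_BYTES) -> bool:
    """Decide whether a reference table is small enough to broadcast."""
    return table_size_bytes < limit_bytes


def join_with_reference(facts, reference, keys: Sequence[str], reference_size_bytes: int):
    """Join a large DataFrame to a reference DataFrame, hinting a broadcast when it is small."""
    # Imported lazily so the pure-Python helpers work without Spark installed.
    from pyspark.sql.functions import broadcast

    right = broadcast(reference) if should_broadcast(reference_size_bytes) else reference
    # A broadcast join ships the small side to every executor,
    # so the large side does not have to be shuffled.
    return facts.join(right, on=list(keys), how="left")


def cluster_by_sql(table_name: str, columns: Sequence[str]) -> str:
    """Build the Databricks SQL statement that sets liquid clustering keys on a table."""
    return f"ALTER TABLE {table_name} CLUSTER BY ({', '.join(columns)})"


# Hypothetical table and column names, for illustration only.
statement = cluster_by_sql("margin.consumption", ["settlement_date", "customer_id"])
print(statement)
```

The statement would be run with `spark.sql(statement)` on a workspace that supports liquid clustering. As the next section argues, either technique is worth keeping only if a measured run shows it helps.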

Trusted the optimiser

In several cases, Spark’s Adaptive Query Execution (AQE) outperformed hand-tuned logic. The team removed custom optimisation code and let AQE do its job.

That last point bears emphasis: removing unjustified compute operations was as impactful as adding new optimisations. If you are running Z-ordering or ANALYZE without measuring their effect, they may be costing you more than they are saving.

Serverless as a development accelerator

Databricks Serverless made the three-month delivery window viable. Zero cluster startup time meant the team could iterate rapidly – write, run, measure, adjust – without waiting for infrastructure to provision.

The Serverless UI enabled side-by-side run comparisons, making it practical to isolate the effect of individual optimisations.

In the team’s own words: “The testing and development process couldn’t have been done without serverless. Using the serverless UI helped us to identify bottlenecks and make easy comparisons between different runs.”

Results

Metric	Before	After	Change
Rows processed per run	25 billion	300 million	98.8% reduction
Cost per settlement date (projected MHHS)	$23.63	$0.48	~50x reduction
Cost per settlement date (vs legacy)	$0.71	$0.48	~1.5x more efficient
Savings per month-end run	–	~$83,000	vs unoptimised projection
Annualised cost avoidance	–	~$1,000,000	excludes upstream savings
Data freshness	Weekly	Daily	7x improvement
Build time	–	3 months	Team of 3

The $0.48 per settlement date is not just a 50x reduction from the MHHS projected cost – it is roughly 1.5x cheaper than the legacy system’s $0.71, despite processing 48x more data points. Re-architecture delivered regulatory compliance and made the system materially more efficient than the one it replaced.
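The headline ratios can be recomputed directly from the figures in the table. The snippet below is only that arithmetic check; the annualised line assumes twelve month-end runs a year, which the article implies but does not state.

```python
# Figures taken from the results table above.
projected_cost = 23.63   # USD per settlement date, legacy architecture under MHHS
optimised_cost = 0.48    # USD per settlement date, new architecture
legacy_cost = 0.71       # USD per settlement date, legacy architecture before MHHS
rows_before = 25_000_000_000
rows_after = 300_000_000
saving_per_month_end_run = 83_000  # USD

vs_projection = projected_cost / optimised_cost        # reduction vs MHHS projection
vs_legacy = legacy_cost / optimised_cost               # efficiency vs legacy
projection_vs_legacy = projected_cost / legacy_cost    # projected increase over legacy
row_reduction = 1 - rows_after / rows_before           # share of rows no longer processed
annualised_saving = saving_per_month_end_run * 12      # assumes 12 month-end runs a year

print(f"vs projection: {vs_projection:.1f}x")              # 49.2x
print(f"vs legacy: {vs_legacy:.2f}x")                      # 1.48x
print(f"projection vs legacy: {projection_vs_legacy:.1f}x")  # 33.3x
print(f"row reduction: {row_reduction:.1%}")               # 98.8%
print(f"annualised saving: ${annualised_saving:,}")        # $996,000
```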

What this means beyond energy

MHHS is a UK energy regulation. However, the pattern it represents – a regulatory or business event that multiplies data volume at a finer grain – is not unique to energy. Any time a system moves from monthly to daily, daily to real-time, or aggregate to transactional, the same dynamics apply.

Four transferable takeaways from the Octopus Energy experience:

Grain misalignment is the hidden cost driver. When a pipeline processes everything at the finest grain regardless of business need, you pay for it in compute, freshness, and maintenance complexity. Identify the natural grains in your data and align processing to them.
Incremental processing transforms pipeline economics. The 98.8% row reduction came from CDF-based incremental logic, not Spark tuning. Start there – and remember the full savings are larger than the headline figure.
Remove before you add. Audit existing optimisation decisions before assuming you need more compute. Z-ordering, ANALYZE, and custom shuffle logic applied without measurement may be costing you more than they save.
Trust the optimiser. AQE outperformed hand-coded logic in several cases. Before writing custom optimisation, test whether Spark already handles your case.

The bigger picture

In the words of Saad: “By making our systems faster and more efficient, we can offer smarter tariffs that help our customers use energy when it's cheapest and cleanest.”

The reduced cost base does something specific: it removes the economic barrier to high-frequency data processing. That makes grid balancing viable as a product. That makes smart tariffs commercially sustainable. That is how data engineering at scale connects to the energy transition – not as infrastructure overhead, but as the commercial foundation for it.

MHHS compliance was the mandate. Making sustainable energy the affordable option is the mission. The data engineering is what connects the two.

———

Saad Ali is Lead of the Margin Data Team at Octopus Energy. Ismail Makhlouf, David Poulet, and Daniel Taylor are Solutions Architects at Databricks.
