On February 15, 2021, as a polar vortex pushed temperatures across Texas to levels not seen in decades, the Electric Reliability Council of Texas (ERCOT) began ordering rolling blackouts. Within hours, those rolling blackouts became sustained outages affecting more than 4.5 million homes and businesses. The outages lasted, in some cases, for four consecutive days. At least 246 people died, according to the Texas Department of State Health Services' official count, though independent analyses have estimated the true figure at several times that number. The economic damage, as assessed by the Federal Energy Regulatory Commission's joint inquiry, reached an estimated $130 billion to $195 billion. The vulnerability itself was no surprise: engineers had long warned that the Texas grid was exposed to extreme cold. What surprised planners was the scale and duration of the failure, which exceeded what most of them had modelled.
Winter Storm Uri was the most economically damaging grid failure in U.S. history, but it was not an isolated event. In December 2022, Winter Storm Elliott caused widespread outages across the Eastern Interconnection, with NERC's post-event analysis finding that generators failed to perform at levels consistent with their capacity commitments. The UK experienced its own grid stress events: in June 2023, record temperatures combined with low wind generation pushed the National Grid ESO to issue multiple warnings about system margins, and the winter of 2023-2024 saw several periods where gas-fired generation was running at near-maximum output to cover shortfalls. Australia's National Electricity Market has dealt with repeated summer stress events, including a period in January 2024 when South Australia experienced demand peaks coinciding with low wind and solar output, requiring emergency demand response activation.
These events share a common pattern: a centralised power system, designed around large generating units connected by a high-voltage transmission network, fails in a correlated way when a system-wide stress event occurs. The failure is not random -- it is structural. And it points toward a different approach to reliability.
The Fragility of Centralised Systems
Modern electricity grids were designed on a principle of centralised redundancy: build more generating capacity than you need, connect it through a robust transmission network, and maintain reserves that can be called upon when demand exceeds expectations or generators trip offline. This model served well for decades when the primary risks were isolated generator outages (a single power plant going offline) or localised transmission faults (a line going down in one area). The reserve margin -- the excess of available capacity over peak demand, expressed as a percentage of that peak demand -- was the primary metric of system reliability, and maintaining it at levels of 15-20% was generally considered adequate.
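The reserve margin calculation is simple enough to state in a few lines. The sketch below uses entirely hypothetical figures (an 80,000 MW peak, 92,000 MW of available capacity, and a 1,000 MW unit, none taken from any real system) to show how the metric is computed and how little a single isolated unit trip moves it.

```python
def reserve_margin(available_mw: float, peak_demand_mw: float) -> float:
    """Reserve margin as a fraction of peak demand."""
    if peak_demand_mw <= 0:
        raise ValueError("peak demand must be positive")
    return (available_mw - peak_demand_mw) / peak_demand_mw


# Hypothetical system, for illustration only.
PEAK_DEMAND_MW = 80_000
AVAILABLE_MW = 92_000

baseline = reserve_margin(AVAILABLE_MW, PEAK_DEMAND_MW)

# An isolated outage: one hypothetical 1,000 MW unit trips offline.
after_single_trip = reserve_margin(AVAILABLE_MW - 1_000, PEAK_DEMAND_MW)

print(f"Baseline margin:      {baseline:.1%}")
print(f"After one unit trips: {after_single_trip:.1%}")
```

With these assumed numbers the margin starts at 15% and falls to a little under 14% after the trip, which is the kind of isolated contingency the traditional model was built to absorb.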
The problem is that the risks facing modern grids are increasingly correlated rather than isolated. Extreme heat events reduce the output of thermal generators (which lose efficiency at high temperatures) while simultaneously increasing air conditioning demand across entire regions. Cold snaps can freeze natural gas infrastructure, disable wind turbines, and increase heating demand all at once. These are not independent failures that can be managed through simple reserve margins; they are system-wide stresses that can affect dozens of generators simultaneously.
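A small calculation illustrates why correlation matters more than the average outage rate. The sketch below is a toy model, not a resource adequacy method. It assumes 100 identical generators, a reserve that can absorb the loss of 15 of them, and an average unit outage probability of 5%. Those numbers, along with the 2% chance of a system-wide stress event during which each unit fails with 30% probability, are illustrative assumptions. Both scenarios are tuned to the same average outage rate; only the correlation differs.

```python
from math import comb


def prob_more_than(n_units: int, p_fail: float, k: int) -> float:
    """P(more than k of n_units fail) when each fails independently with p_fail."""
    return sum(
        comb(n_units, i) * p_fail**i * (1 - p_fail) ** (n_units - i)
        for i in range(k + 1, n_units + 1)
    )


# All values below are illustrative assumptions.
N_UNITS = 100            # identical generators
RESERVE_UNITS = 15       # reserve covers the loss of up to 15 units
AVG_P_FAIL = 0.05        # average outage probability per unit
P_STRESS_EVENT = 0.02    # chance of a system-wide stress event
P_FAIL_IN_STRESS = 0.30  # per-unit outage probability during that event

# Outage rate in normal conditions, chosen so the overall average is AVG_P_FAIL.
P_FAIL_NORMAL = (AVG_P_FAIL - P_STRESS_EVENT * P_FAIL_IN_STRESS) / (1 - P_STRESS_EVENT)

# Case 1: failures are fully independent.
independent = prob_more_than(N_UNITS, AVG_P_FAIL, RESERVE_UNITS)

# Case 2: failures share a common cause (the stress event).
correlated = (
    (1 - P_STRESS_EVENT) * prob_more_than(N_UNITS, P_FAIL_NORMAL, RESERVE_UNITS)
    + P_STRESS_EVENT * prob_more_than(N_UNITS, P_FAIL_IN_STRESS, RESERVE_UNITS)
)

print(f"Shortfall probability, independent failures: {independent:.2e}")
print(f"Shortfall probability, correlated failures:  {correlated:.2e}")
```

Under these assumptions the shortfall probability in the correlated case is dominated by the stress event and comes out more than a hundred times higher than in the independent case, even though average unit availability is identical.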
The North American Electric Reliability Corporation's 2023 Long-Term Reliability Assessment warned that much of North America faces elevated risk of electricity shortfalls during extreme weather events, noting that resource adequacy assessments based on traditional probabilistic methods may not adequately capture the tail risks associated with correlated, widespread generator outages. NERC identified two-thirds of North America as being at elevated or high risk of insufficient resources during peak demand conditions.
The traditional response to this problem is to build more centralised capacity and more transmission. But this approach faces increasing challenges. New transmission lines take 7-12 years to permit and build in the United States, according to DOE transmission studies. Natural gas peaker plants, long the workhorse of grid reliability, face uncertain economics in a decarbonising grid and growing public opposition. New nuclear capacity takes a decade or more to develop. The grid needs reliability solutions that can be deployed faster and at smaller scale.
Distributed Resilience: A Different Architecture
Distributed energy resources -- rooftop solar, behind-the-meter batteries, small-scale combined heat and power, and EV chargers with vehicle-to-grid capability -- offer a fundamentally different reliability architecture. Instead of concentrating generation in a few large plants and relying on the transmission network to deliver power where it is needed, distributed resources generate and store power at or near the point of consumption. When the grid goes down, a building with solar and battery storage can continue to operate, at least partially, on its own power. When the grid is stressed but not failed, distributed resources can reduce demand on the central grid by supplying local loads and, in aggregate, provide system-level services that help maintain grid stability.
This is not a theoretical proposition. During Winter Storm Uri, buildings in Texas equipped with solar-plus-battery systems with islanding capability maintained power throughout the event while their neighbours went dark. The Solar Energy Industries Association documented cases of homes with Tesla Powerwall and Enphase IQ battery systems maintaining critical loads for 2-4 days during the outage, drawing on stored energy and, when conditions permitted, charging from rooftop solar during the brief periods of sunlight between storm bands.
The examples extend well beyond individual homes. The Blue Lake Rancheria, a tribal community in Northern California, built a microgrid comprising solar panels, battery storage, and intelligent controls that has kept the community powered through multiple Pacific Gas and Electric Company (PG&E) Public Safety Power Shutoffs (PSPS). These shutoffs, in which the utility deliberately de-energises power lines during high wildfire risk conditions, have affected millions of Californians. The Blue Lake Rancheria microgrid demonstrated that a community-scale distributed energy system could island from the grid and maintain full operations for extended periods, including powering a gas station and a community centre that served as an emergency shelter.
At the grid level, virtual power plants (VPPs) -- aggregations of distributed resources that are coordinated to behave like a single, dispatchable power plant -- are demonstrating system-level resilience contributions. In South Australia, Tesla's Virtual Power Plant, which aggregates thousands of residential Powerwall batteries, has repeatedly responded to grid contingency events, injecting stored power into the grid within milliseconds of detecting a frequency disturbance. During a September 2023 grid event in which two large generators tripped simultaneously, the South Australia VPP delivered over 100 MW of response within seconds -- faster than any conventional generator could have ramped up.
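The coordination behind such a response can be pictured as many small batteries each following the same simple rule. The sketch below shows one common form of that rule, a linear frequency droop with a deadband, applied across a hypothetical fleet. It is not the control logic of the South Australia VPP, which is not described here. The deadband, the full-response threshold, the 5 kW per-unit rating, and the fleet size of 20,000 are all assumptions chosen for illustration. The 50 Hz nominal frequency is that of the Australian grid.

```python
NOMINAL_HZ = 50.0  # nominal frequency of the Australian grid


def droop_response_kw(
    freq_hz: float,
    max_kw: float,
    deadband_hz: float = 0.15,       # assumed: no response inside this band
    full_response_hz: float = 0.50,  # assumed: full output at this deviation
) -> float:
    """Power one battery injects when grid frequency sags (under-frequency only)."""
    deviation = NOMINAL_HZ - freq_hz  # positive when frequency is low
    if deviation <= deadband_hz:
        return 0.0
    fraction = (deviation - deadband_hz) / (full_response_hz - deadband_hz)
    return max_kw * min(fraction, 1.0)


def fleet_response_mw(freq_hz: float, n_units: int, unit_max_kw: float) -> float:
    """Aggregate response of n identical units, in MW."""
    return n_units * droop_response_kw(freq_hz, unit_max_kw) / 1000.0


# Hypothetical fleet: 20,000 batteries rated at 5 kW each.
for freq in (49.95, 49.675, 49.5):
    mw = fleet_response_mw(freq, n_units=20_000, unit_max_kw=5.0)
    print(f"{freq:.3f} Hz -> {mw:.1f} MW")
```

Because each unit acts on its own local frequency measurement, no central dispatch signal is needed in this sketch: the aggregate response scales from zero inside the deadband to the fleet's full 100 MW at the assumed threshold.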
The Economics of Resilience
Historically, resilience has been treated as a luxury -- something that hospitals, data centres, and military bases pay for, but that most commercial and residential customers do not value highly enough to justify the cost. This calculus is changing as extreme weather events become more frequent and grid outages more common.
The cost of power outages is substantial and often underestimated. The U.S. Department of Energy has estimated that power outages cost the American economy $150 billion or more annually in lost economic activity, spoiled inventory, productivity losses, and emergency response costs. For commercial and industrial customers, the costs are highly concentrated: a manufacturing facility that loses power for 8 hours may lose an entire production run; a grocery store that loses refrigeration for 24 hours must discard perishable inventory; a hotel that cannot operate during a power outage loses room revenue and incurs reputational damage.
Lawrence Berkeley National Laboratory has developed detailed estimates of the "value of lost load" (VOLL) across different customer segments, finding that the average cost of an unserved kilowatt-hour ranges from roughly $10 for residential customers to over $100 for commercial and industrial customers, depending on the duration and timing of the outage. These figures mean that even modest improvements in reliability can be economically justified. A 250 kWh battery system that prevents a commercial facility from experiencing 8 hours of outage per year is avoiding $10,000 to $50,000 or more in economic losses -- often comparable to or greater than the annual debt service on the battery itself.
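The arithmetic behind that range can be made explicit. The sketch below multiplies unserved energy by a value of lost load. It assumes the battery's full 250 kWh stands in for load that would otherwise go unserved over the 8 hours, and it uses VOLL figures of $40, $100, and $200 per kWh. The $40 and $200 endpoints are not taken from the LBNL estimates; they are back-solved here so that the result spans the $10,000 to $50,000 range quoted above.

```python
def avoided_outage_cost(unserved_kwh: float, voll_per_kwh: float) -> float:
    """Economic loss avoided by serving energy that would otherwise go unserved."""
    return unserved_kwh * voll_per_kwh


BATTERY_KWH = 250              # battery size from the example above
OUTAGE_HOURS = 8               # annual outage duration from the example above
ASSUMED_VOLL = (40, 100, 200)  # $/kWh, illustrative assumptions

# Average load the battery can carry for the whole outage.
average_load_kw = BATTERY_KWH / OUTAGE_HOURS

costs = {voll: avoided_outage_cost(BATTERY_KWH, voll) for voll in ASSUMED_VOLL}

print(f"Average load covered: {average_load_kw} kW")
for voll, cost in costs.items():
    print(f"VOLL ${voll}/kWh -> avoided loss ${cost:,.0f}")
```

The calculation also shows the implied facility size: a 250 kWh battery covers an 8-hour outage only if the protected load averages about 31 kW, so larger sites would need either more storage or a smaller set of critical loads.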
The economic case for distributed resilience is further strengthened by the fact that the same assets that provide resilience also provide everyday economic value. A battery that serves as backup power during outages spends most of its time performing demand charge management, energy arbitrage, and grid services -- generating revenue that offsets its cost. This "stacking" of resilience value on top of everyday operational value fundamentally changes the cost-benefit calculation. The customer is not paying for a backup generator that sits idle 99% of the time; they are paying for an active energy management system that also happens to provide backup power.
From Preventing Failure to Designing for Graceful Degradation
The traditional approach to grid reliability is binary: the system works or it does not. A building has power or it is in blackout. The grid maintains frequency and voltage within specifications or it collapses. This binary framing drives a design philosophy focused entirely on preventing failure -- building enough redundancy that the probability of total failure is acceptably low.
Distributed energy resources enable a different design philosophy: graceful degradation. A building with solar, storage, and intelligent load management can operate at reduced capacity when the grid is down, powering critical loads (lighting, refrigeration, communications, security) while shedding non-essential loads (aesthetic lighting, non-critical HVAC zones). The building does not have full power, but it has enough to continue functioning. This is fundamentally different from both "full power" and "total blackout," and it is a much more realistic and cost-effective reliability target for most facilities.
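The load management logic this implies can be sketched as a priority-ordered selection. The example below is a minimal illustration, not a description of any particular building controller. The load names follow the categories above, but the kW figures, the priority tiers, and the 18 kW of available islanded capacity are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Load:
    name: str
    kw: float
    priority: int  # 1 = most critical; larger numbers are shed first


def plan_islanded_operation(loads, available_kw):
    """Serve loads in priority order until local capacity is used up.

    Returns (served, shed). A load that does not fit is shed, and smaller
    lower-priority loads may still be served with whatever capacity remains.
    """
    served, shed = [], []
    remaining_kw = available_kw
    for load in sorted(loads, key=lambda l: (l.priority, l.kw)):
        if load.kw <= remaining_kw:
            served.append(load)
            remaining_kw -= load.kw
        else:
            shed.append(load)
    return served, shed


# Hypothetical building loads; the kW values are illustrative only.
BUILDING_LOADS = [
    Load("communications", 2.0, priority=1),
    Load("security", 1.0, priority=1),
    Load("refrigeration", 8.0, priority=2),
    Load("lighting", 5.0, priority=2),
    Load("non-critical HVAC", 20.0, priority=3),
    Load("aesthetic lighting", 4.0, priority=4),
]

# Assume solar plus storage can sustain 18 kW while islanded.
served, shed = plan_islanded_operation(BUILDING_LOADS, available_kw=18.0)
print("Served:", [load.name for load in served])
print("Shed:  ", [load.name for load in shed])
```

With 18 kW available, the four critical loads are served and the two non-essential ones are shed. A real controller would also track battery state of charge and re-plan as solar output changes, which this sketch leaves out.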
The concept is well established in other engineering domains. Aircraft are designed with multiple redundant systems so that the loss of one hydraulic system or one engine does not result in a total loss of aircraft control. Data centres use redundant servers and load balancing so that the failure of individual machines does not take down the entire service. The energy industry has been slower to adopt this thinking, partly because the centralised grid model created the illusion that reliability was a system-level problem that could be solved at the system level. The reality, increasingly evident from recent events, is that system-level reliability must be supplemented by distributed, local resilience.
Behind-the-meter assets create what engineers call "natural redundancy" -- not because any individual system is particularly reliable (solar panels produce nothing at night; batteries have finite capacity), but because a diverse portfolio of small resources has different failure modes than a concentrated portfolio of large ones. When a single 500 MW gas plant fails, 500 MW of capacity disappears instantaneously. When one of a thousand distributed battery systems fails, 0.1% of the distributed capacity is lost. The statistical properties of large numbers of small, independent resources provide inherent stability that concentrated resources cannot match.
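The statistical claim can be checked with the standard formulas for a sum of independent on/off units. The sketch below compares one 500 MW plant with a thousand 0.5 MW units of the same total capacity. The 5% outage probability is an assumption, and so is independence itself: a common stress that disables many small units at once would narrow the gap.

```python
import math


def capacity_stats(n_units: int, unit_mw: float, outage_prob: float):
    """Mean and standard deviation of available capacity in MW.

    Assumes every unit is either fully available or fully out, and that
    units fail independently of one another.
    """
    mean = n_units * unit_mw * (1 - outage_prob)
    std = unit_mw * math.sqrt(n_units * outage_prob * (1 - outage_prob))
    return mean, std


P_OUT = 0.05  # assumed outage probability per unit, illustrative only

plant_mean, plant_std = capacity_stats(1, 500.0, P_OUT)
fleet_mean, fleet_std = capacity_stats(1000, 0.5, P_OUT)

# Share of total capacity lost when a single unit fails.
plant_single_loss = 500.0 / 500.0
fleet_single_loss = 0.5 / 500.0

print(f"Single plant: mean {plant_mean:.1f} MW, std {plant_std:.1f} MW")
print(f"1,000 units:  mean {fleet_mean:.1f} MW, std {fleet_std:.1f} MW")
print(f"Loss from one failure: {plant_single_loss:.1%} vs {fleet_single_loss:.1%}")
```

Expected capacity is the same in both cases, but the spread around it shrinks by a factor of the square root of the number of units, roughly 32 for a fleet of 1,000.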
The Path Forward
Resilience is not a feature that can be retrofitted onto a centralised grid through incremental improvements. The fundamental vulnerability -- the dependence on a small number of large generators and long-distance transmission lines, all subject to correlated failure under system-wide stress -- is architectural. Addressing it requires a gradual but meaningful shift toward a more distributed architecture in which generation, storage, and intelligent load management are deployed throughout the distribution network, not just at a handful of central stations.
This does not mean abandoning the centralised grid. Large-scale generation and high-voltage transmission will remain essential for decades, and most electricity will continue to flow through the existing network. But the margin of resilience -- the capacity that keeps the lights on when things go wrong -- is increasingly likely to come from distributed resources. The economics support it: behind-the-meter solar and storage generate everyday savings that subsidise their resilience function. The technology enables it: batteries, inverters, and intelligent controllers are mature, declining in cost, and capable of autonomous operation during grid outages. And the evidence demands it: every major grid failure of the past five years has demonstrated that centralised redundancy, however well-engineered, has limits that distributed resources can help address.
The question is no longer whether distributed resources will play a role in grid resilience, but how quickly the institutional and financial structures can adapt to deploy them at the scale needed. That is a question of policy, market design, and -- critically -- the operational infrastructure needed to manage thousands of small assets as a coherent system. The technology is ready. The economics work. The remaining challenge is execution.