RealAI

Real-Time Anomaly Detection: Turning Noisy Telemetry into Predictive Maintenance

RealAI · Jul 10, 2024 · 11 min read
Oil & Gas · Operations
[Figure: flowmeter telemetry with an alert threshold and a flagged anomaly]

Your production facility is streaming thousands of data points every second. Flowmeter readings, pressure transducers on the manifold, vibration sensors bolted to pump shafts, bearing-temperature logs from rotating equipment. Years of history sit in the SCADA historian. And today, a compressor bearing began degrading silently. By the time the on-site tech heard the noise, an unplanned shutdown had already started cascading across the platform.

This is the operational reality of legacy brownfield plants. Sensors are everywhere. The signal-to-noise ratio is terrible. Baselines shift across equipment types, operating conditions, and seasons. And the gap between "something is wrong" and "something is measurably wrong enough to act on" is exactly where unplanned downtime is born.

Real-time anomaly detection, a hybrid of statistical process control (SPC) and deep-learning autoencoders tuned to your specific telemetry noise, closes that gap. It catches degradation within seconds of emergence, routes the alert into your existing SCADA workflow, and does so with a false-positive rate low enough that your crews will actually believe the signal. In the engagements behind this work, that combination cut unplanned downtime by 40% and lowered maintenance cost by 25%.

Exhibit 1. The forecast is only as good as the meter. From a jagged raw flowmeter history, two forecasts project: a naive forecast off the raw noise balloons into a wide band (about ±12 index) that droops off-centre, while an AI forecast off the cleaned signal stays tight (about ±4) and hugs the realised actuals. The naive band is so wide it tells you nothing; only the tight AI band is plannable.

The Signal Hidden in the Noise

The first problem anomaly detection has to solve is not detecting faults. It is distinguishing normal operational variation from a real fault on equipment that was never meant to work the same way twice.

A flowmeter on an offshore platform does not stream at a constant rate. Production ramps. Well pressure shifts with depletion. Sand gets into the line and the flow profile gets noisier. Seasonal temperature swings shift baseline readings. Add to that the hardware reality: different flowmeter types (Coriolis vs. orifice plate) have different noise characteristics. A pump running near nameplate capacity looks different from the same pump running well below it. And a brownfield facility has many overlapping "normal" regimes layered on top of each other.

Legacy alarm systems respond to this chaos with fixed thresholds. Flow above a set point triggers an alert. Vibration above a set point triggers an alert. The result is either alarm fatigue from constant false positives, with crews who stop trusting the signals within weeks, or thresholds set so high that real degradation stays buried in the noise until it is already severe.

Anomaly detection sidesteps that trap by learning what normal looks like per asset, per operating condition. A statistical process control layer establishes a moving baseline — the mean and variance of flowmeter readings over a rolling window, separated by equipment type. An autoencoder — a deep-learning architecture that compresses telemetry into a latent representation and reconstructs it — learns the temporal patterns that show up in healthy equipment and flags deviations.
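A minimal sketch of the SPC layer, assuming one rolling window per (equipment type, tag) pair and a k-sigma control limit. The class name `RollingSPC`, the window length, and k = 3 are illustrative choices, not values from the deployments described here.

```python
from collections import defaultdict, deque
from math import sqrt


class RollingSPC:
    """Rolling-window SPC baseline with k-sigma control limits.

    One window is kept per key, e.g. (equipment_type, tag), so a Coriolis
    meter and an orifice-plate meter never share a baseline.
    """

    def __init__(self, window: int = 300, k: float = 3.0, min_points: int = 30):
        self.window = window
        self.k = k
        self.min_points = min_points
        self._buffers = defaultdict(lambda: deque(maxlen=self.window))

    def update(self, key, value: float):
        """Score `value` against the current window, then add it to the window.

        Returns (is_anomaly, z_score); z_score is None during warm-up.
        """
        buf = self._buffers[key]
        result = (False, None)
        if len(buf) >= self.min_points:
            n = len(buf)
            mean = sum(buf) / n
            var = sum((x - mean) ** 2 for x in buf) / (n - 1)  # sample variance
            std = sqrt(var) if var > 0 else 1e-9  # guard against a flat signal
            z = (value - mean) / std
            result = (abs(z) > self.k, z)
        buf.append(value)  # flagged readings are folded in too (see note below)
        return result


spc = RollingSPC(window=60)
key = ("coriolis", "FT-101")  # hypothetical equipment type and tag
for i in range(60):
    spc.update(key, 100.0 + (0.5 if i % 2 else -0.5))
print(spc.update(key, 110.0))
```

Each reading is scored before it joins the window, so the baseline keeps moving with the asset. This sketch also folds flagged readings into the window; a production system might hold them out so a developing fault does not drag the baseline toward itself.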

The hybrid matters. SPC is fast and explainable — your operations team can read the reason a reading triggered an alert — but it fails on subtle, multivariate patterns. An autoencoder catches patterns across many sensor streams at once, but on its own it can flag an anomaly nobody understands. Together, they partition the detection space: SPC catches sharp swings and drift. Autoencoders catch slow degradation and coupled failures.
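To make the partition concrete, here is a sketch that pairs per-sensor SPC limits with a reconstruction-error check. It swaps in a linear autoencoder (equivalent to projecting onto the top principal components of healthy data) for the deep autoencoder, purely to keep the example short; `HybridDetector`, its parameters, and the synthetic sensor data are hypothetical.

```python
import numpy as np


class HybridDetector:
    """Per-sensor SPC limits plus a reconstruction-error check.

    Assumption: a linear autoencoder (projection onto the top principal
    components of healthy data) stands in for the deep autoencoder.
    """

    def __init__(self, n_components: int = 1, k_sigma: float = 3.0, err_quantile: float = 0.99):
        self.n_components = n_components
        self.k_sigma = k_sigma
        self.err_quantile = err_quantile

    def fit(self, healthy: np.ndarray, names):
        self.names = list(names)
        self.mean = healthy.mean(axis=0)
        self.std = healthy.std(axis=0) + 1e-9
        z = (healthy - self.mean) / self.std
        # Rows of vt are principal directions; the top ones act as the tied
        # encoder/decoder weights of the linear autoencoder.
        _, _, vt = np.linalg.svd(z, full_matrices=False)
        self.components = vt[: self.n_components]
        # Alert threshold: a high quantile of reconstruction error on healthy data.
        self.err_threshold = float(np.quantile(self._recon_error(z), self.err_quantile))
        return self

    def _recon_error(self, z: np.ndarray) -> np.ndarray:
        latent = z @ self.components.T    # encode
        recon = latent @ self.components  # decode
        return ((z - recon) ** 2).mean(axis=1)

    def score(self, x: np.ndarray) -> list:
        """Return human-readable reasons; an empty list means no alert."""
        z = (x - self.mean) / self.std
        reasons = [f"SPC: {n} at {v:+.1f} sigma"
                   for n, v in zip(self.names, z) if abs(v) > self.k_sigma]
        err = float(self._recon_error(z[None, :])[0])
        if err > self.err_threshold:
            reasons.append(f"AE: reconstruction error {err:.3f} > {self.err_threshold:.3f}")
        return reasons


rng = np.random.default_rng(0)
base = rng.normal(size=2000)  # shared driver: the three sensors normally move together
healthy = np.column_stack([base + 0.05 * rng.normal(size=2000),
                           2 * base + 0.05 * rng.normal(size=2000),
                           -base + 0.05 * rng.normal(size=2000)])
det = HybridDetector().fit(healthy, ["flow", "discharge_pressure", "suction_pressure"])
print(det.score(det.mean + det.std * np.array([1.5, -1.5, 1.5])))  # coupling broken
print(det.score(det.mean + det.std * np.array([5.0, 5.0, -5.0])))  # large but coupled swing
```

A reading that breaks the usual coupling between sensors, while staying inside every individual limit, trips only the reconstruction check; a large swing along the normal coupling trips only SPC. The reason strings are what keep the alert readable for operators.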

The feature engineering is where the detection actually happens. Instead of feeding in raw telemetry at one data point per second, the pipeline extracts temporal features at multiple timescales: rolling means and variances over short windows, longer-horizon trends, autocorrelation at a given lag, and the spectral power in the vibration band where bearing wear first shows. A degrading bearing shows up first in the high-frequency vibration signature, before it appears in mean temperature. A clogging flowmeter shows rising pressure before flow drops. Those multi-timescale signatures are the patterns the autoencoder learns to spot.
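The feature extraction can be sketched as below, assuming a single sensor stream sampled at `fs` Hz. The function names, the 1 s / 10 s / 60 s windows, and the 1–4 kHz "bearing band" are illustrative assumptions; the right band depends on bearing geometry and shaft speed.

```python
import numpy as np


def window_features(x, fs: float, band=(1000.0, 4000.0), lag: int = 1) -> dict:
    """Features for one window of one sensor stream sampled at fs Hz."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x)) / fs
    centered = x - x.mean()
    energy = float(np.dot(centered, centered))
    # Autocorrelation at `lag` samples (normalised so lag 0 would be 1).
    acf = float(np.dot(centered[:-lag], centered[lag:]) / energy) if energy > 0 else 0.0
    # Linear trend (units per second) over the window.
    slope = float(np.polyfit(t, x, 1)[0])
    # Share of spectral power inside the assumed bearing-defect band.
    power = np.abs(np.fft.rfft(centered)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    total = float(power.sum())
    band_frac = float(power[in_band].sum()) / total if total > 0 else 0.0
    return {"mean": float(x.mean()), "var": float(x.var()), "slope": slope,
            f"acf_lag{lag}": acf, "band_power_frac": band_frac}


def multi_timescale(x, fs: float, windows_s=(1.0, 10.0, 60.0), band=(1000.0, 4000.0)) -> dict:
    """Features over the most recent w seconds, for each timescale w."""
    x = np.asarray(x, dtype=float)
    feats = {}
    for w in windows_s:
        n = int(w * fs)
        if 2 <= n <= len(x):  # skip timescales longer than the available history
            for name, value in window_features(x[-n:], fs, band=band).items():
                feats[f"{name}@{w:g}s"] = value
    return feats


fs = 10_000.0
t = np.arange(int(fs)) / fs                           # 1 s of vibration data
healthy = np.sin(2 * np.pi * 200 * t)                 # shaft-rate tone only
worn = healthy + 0.5 * np.sin(2 * np.pi * 2500 * t)   # plus an in-band tone
print(window_features(healthy, fs)["band_power_frac"], window_features(worn, fs)["band_power_frac"])
```

Feeding these feature vectors, rather than raw samples, to the autoencoder is what lets it learn signatures that only show up at a particular timescale.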

The result: a system that can say, "Yes, your pump is running below nominal flow and temperature is slightly above the seasonal average — that is normal for these operating conditions. But the vibration spectrum just shifted into a degradation signature we see ahead of rotating-equipment failures. Alert."

Brownfield Integration: No Rip-and-Replace

The second problem is harder than detection: how to land a real-time anomaly system in a decades-old plant without shutting down production.

What you already have is a SCADA system and a historian: probably OSIsoft PI, maybe Wonderware, possibly in-house legacy infrastructure. You cannot touch the control layer. You cannot rip out the historians. The data is coming whether or not the anomaly detector is there. The operator interface is what it is. And every outage of those systems costs you a shutdown.

The integration pattern that works: the anomaly system sits on top of the data layer, not inside it. It reads historians via standard OPC-UA or PI polling interfaces. It consumes the same historian that the operators' dashboards consume. It scores telemetry in real time as it streams in. When it detects an anomaly, it writes back into the SCADA alarm queue — the same place that pressure-relief-valve triggers and high-level shutdowns land — so the alert surfaces in the operator's existing workflow.
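The read-score-write loop can be sketched independently of any particular client library. `HistorianReader`, `AlarmWriter`, and `poll_once` are hypothetical names; in practice the two adapters would wrap whatever OPC-UA or PI client your site already uses, and nothing here calls a real vendor API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Protocol


@dataclass
class Sample:
    tag: str
    timestamp: float  # seconds since epoch
    value: float


class HistorianReader(Protocol):
    """Hypothetical read-only adapter around an OPC-UA or PI client."""
    def read_since(self, cursor: float) -> Iterable[Sample]: ...


class AlarmWriter(Protocol):
    """Hypothetical adapter that posts records into the existing SCADA alarm queue."""
    def raise_alarm(self, tag: str, timestamp: float, message: str) -> None: ...


Scorer = Callable[[Sample], Optional[str]]


def poll_once(reader: HistorianReader, writer: AlarmWriter, score: Scorer, cursor: float) -> float:
    """One cycle: read samples newer than `cursor`, score them, write alarms back.

    The loop only reads from the historian and writes alarm records; control
    logic is never touched. Returns the advanced cursor for the next cycle.
    """
    for sample in reader.read_since(cursor):
        message = score(sample)
        if message:
            writer.raise_alarm(sample.tag, sample.timestamp, message)
        cursor = max(cursor, sample.timestamp)
    return cursor
```

A service would call `poll_once` on a fixed interval and persist the cursor, so a restart neither replays nor skips historian data.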

No rip-and-replace. No new sensors. The historians stay where they are. The control logic stays unchanged. The anomaly system is a passenger on the data bus, pulling signals and adding intelligence.

That light-touch integration is what made the system deployable across multiple facilities without a capital project. A facility's IT team did not have to redesign network segmentation. Cybersecurity did not have to certify new control pathways. Operations did not have to retrain on a new interface. The detector plugged into the historian, and alerts began arriving in the alarm console the crews already monitored.

The Detection / Trust Tradeoff

For an anomaly system to survive the first month in production, it has to make a specific trade: sensitivity vs. false positives.

A detector tuned for maximum sensitivity — catching every possible degradation pattern — will alert constantly. Crews will ignore it. They will create standing rules to suppress the most common alerts. Within weeks, the system is neutered.

The systems that held in production achieved 95% detection at under 2% false positives. That discipline is the whole game: when the overwhelming majority of alerts turn out to be genuine, crews investigate them. When they investigate them and find real issues, they start to trust the detector. That trust is what changes behavior on the floor — and it is exactly what fixed thresholds can never buy.
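One caution when reading the 95% / 2% figures: the share of alerts that turn out to be genuine also depends on how often faults actually occur among the windows being scored. If 2% is a per-window false-positive rate, Bayes' rule gives the genuine share as (detection rate × prevalence) / (detection rate × prevalence + false-positive rate × (1 − prevalence)). The prevalence values below are hypothetical.

```python
def alert_precision(detection_rate: float, false_positive_rate: float, prevalence: float) -> float:
    """Share of alerts that are genuine faults, by Bayes' rule.

    detection_rate      = P(alert | fault window)
    false_positive_rate = P(alert | healthy window)
    prevalence          = P(fault) among scored windows (assumed, per asset)
    """
    true_alerts = detection_rate * prevalence
    false_alerts = false_positive_rate * (1.0 - prevalence)
    return true_alerts / (true_alerts + false_alerts)


for prevalence in (0.20, 0.05, 0.01):  # hypothetical fault prevalences
    share = alert_precision(0.95, 0.02, prevalence)
    print(f"prevalence {prevalence:.0%}: {share:.0%} of alerts genuine")
```

At the stated rates, roughly 92%, 71% and 32% of alerts are genuine at 20%, 5% and 1% prevalence, which is one reason the false-positive budget has to be tuned per asset.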

Getting to that tradeoff point requires tuning the autoencoder to your specific asset and your specific tolerance for false alerts. A critical compressor warrants a tighter false-positive budget than a flowmeter that operators can isolate. That tuning is not a one-time fire-and-forget calibration. It lives in the Sustain phase.
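A minimal way to turn a per-asset false-positive budget into an alert threshold, assuming you have anomaly scores from held-out healthy telemetry: choose the score that all but the budgeted fraction of healthy windows stay at or below. The function name, the budget values, and the stand-in scores are illustrative.

```python
import math


def threshold_for_fp_budget(healthy_scores, fp_budget: float) -> float:
    """Alert threshold (alert when score > threshold) such that at most
    `fp_budget` of the held-out healthy scores would have alerted."""
    if not 0.0 <= fp_budget < 1.0:
        raise ValueError("fp_budget must be in [0, 1)")
    scores = sorted(healthy_scores)
    if not scores:
        raise ValueError("need at least one healthy score")
    n = len(scores)
    allowed = math.floor(fp_budget * n)  # healthy windows we may flag
    # At most `allowed` scores lie strictly above the (n - allowed)-th smallest.
    return scores[n - allowed - 1]


# Hypothetical per-asset budgets: a critical compressor gets a tighter budget
# than a flowmeter that operators can isolate.
budgets = {"critical_compressor": 0.005, "isolatable_flowmeter": 0.02}
# Stand-in scores; in practice each asset is thresholded on its own healthy history.
healthy_scores = [i / 100 for i in range(100)]
thresholds = {asset: threshold_for_fp_budget(healthy_scores, b) for asset, b in budgets.items()}
print(thresholds)
```

Rerunning this on fresh healthy scores, rather than fixing the threshold once, is what makes the tuning an ongoing Sustain activity.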

Process flow: hybrid anomaly detection from telemetry to maintenance

A Ranked Roadmap: Where Downtime Actually Hides

Not every equipment failure costs the same. A compressor bearing can take down the whole platform. A single flowmeter can be isolated. A pump casing crack spreads to corrosion. A valve drift is a slow financial bleed.

The Assess phase starts with a telemetry audit. You map every streaming asset: which ones have the longest mean time to recovery (MTTR) when they fail, which failure modes have the highest production impact, which equipment types are candidates for condition-based maintenance (rotating equipment, yes; mechanical seals, maybe; pressure transducers, no). You establish per-asset baselines by drilling into the historian — what does healthy look like for this pump on this flowline? — and pull ground-truth failure records. When did that compressor last fail? How much production did it cost? How would earlier detection have helped?

That forensic inventory produces a ranked list. Deploy anomaly detection first on the assets where earlier detection buys the most downtime savings, which usually means rotating equipment: compressors, pumps, turbines. That ranked list is the roadmap the 4–6 week Assess phase delivers.

95%: detection at <2% false positives
40%: less unplanned downtime
25%: lower maintenance cost
4–6 wks: to a ranked roadmap

Staying Trusted as Operating Conditions Drift

Here is where brownfield AI usually breaks: the model works the first month, then production patterns shift, sensor drift creeps in, and a rising false-positive rate erodes the crew's trust.

A seasonal shift in well pressure moves the baseline. You add a new production line and the flowmeter load profile changes. A sensor ages and its output drifts. The crew starts running the facility differently — instead of steady-state operations, now it is intermittent, cycling to clear sand. All of that moves the "normal" distribution that the SPC baseline and autoencoder were trained on.

The Sustain phase is real-time monitoring for drift. The system watches the false-positive rate — if it starts climbing above the baseline you tuned to, retraining is triggered. It watches for seasonal transitions and pre-retrains on the new operating regime before false positives spike. It logs every alert that turned out to be noise and uses those to refine the decision threshold. When a new flowmeter type gets installed, the model is re-baselined on historical data from that meter type before it goes live.
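The false-positive watch can be sketched as a rolling tally of alert dispositions. This assumes crews mark each alert as genuine or noise, and it tracks the share of recent alerts that were noise as a practical proxy for the false-positive rate; `FalsePositiveMonitor`, the window, and the tolerance are hypothetical.

```python
from collections import deque


class FalsePositiveMonitor:
    """Rolling tally of alert dispositions with a retraining trigger.

    Hypothetical policy: retraining is due when the share of recent alerts
    marked as noise exceeds the tuned baseline by more than `tolerance`.
    """

    def __init__(self, baseline_noise_share: float, tolerance: float = 0.05, window: int = 50):
        self.baseline = baseline_noise_share
        self.tolerance = tolerance
        self.outcomes = deque(maxlen=window)  # True = alert turned out to be noise
        self.noise_examples = []              # kept to refine the decision threshold

    def record(self, alert_id: str, was_noise: bool, score: float) -> None:
        self.outcomes.append(was_noise)
        if was_noise:
            self.noise_examples.append((alert_id, score))

    def noise_share(self) -> float:
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else 0.0

    def retrain_due(self, min_alerts: int = 20) -> bool:
        # Require a minimum number of dispositions so a handful of alerts cannot trigger it.
        return (len(self.outcomes) >= min_alerts
                and self.noise_share() > self.baseline + self.tolerance)
```

The logged `noise_examples` are the alerts that can be fed back into threshold selection once retraining runs.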

This is not glamorous. It is the work that keeps a system in production rather than stranded as a pilot project. It is also where the shift from reactive to proactive maintenance is sustained across years of operation.

The systems that hold crews' trust are not the ones with the highest detection rate — they are the ones that alert on genuine faults and stay silent when the equipment is just being noisy.

Where to Start

The first 4–6 weeks focus on three concrete outputs: asset baseline establishment, failure-mode prioritization, and SCADA integration mapping.

Start by selecting one high-impact asset class — usually a production platform's centrifugal compressor, or the main lift-gas compressor on a deepwater facility. Pull a long, clean stretch of telemetry from the historian for that equipment from periods when it was operating healthily. Establish the per-operating-condition baselines: what do the temperature, vibration, and pressure readings look like near nameplate capacity? At partial load? During a pressure ramp? That baseline is the ground truth the anomaly models will train against.
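Per-operating-condition baselines amount to grouping healthy history by regime before computing statistics. A sketch, assuming each historian row carries load as a fraction of nameplate and a pressure rate of change; the regime names, the 0.9 load cut-off, and the ramp limit are illustrative assumptions to tune per asset.

```python
from collections import defaultdict
from statistics import mean, stdev


def operating_regime(load_frac: float, pressure_rate: float, ramp_limit: float = 0.5) -> str:
    """Hypothetical regime labels; cut-offs are assumptions to tune per asset."""
    if abs(pressure_rate) > ramp_limit:
        return "pressure_ramp"
    return "near_nameplate" if load_frac >= 0.9 else "partial_load"


def regime_baselines(rows, sensor_names=("temperature", "vibration", "pressure")):
    """rows: dicts from periods the asset is known to have been healthy.

    Returns {regime: {sensor: (mean, stdev)}}.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        regime = operating_regime(row["load_frac"], row["pressure_rate"])
        for name in sensor_names:
            grouped[regime][name].append(row[name])
    return {
        regime: {name: (mean(vals), stdev(vals)) for name, vals in by_sensor.items() if len(vals) >= 2}
        for regime, by_sensor in grouped.items()
    }
```

Scoring a new reading against the baseline for its own regime is what stops partial-load operation from looking like a fault.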

Second, pull the failure history: when did similar equipment fail in your fleet? What were the observable telemetry signatures in the hours before failure? Did vibration start climbing slowly, or did it spike overnight? Did pressure gradients across the compressor shift? Those failure signatures are the patterns the autoencoder is built to catch.
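Once failure dates are known, the history can be labeled for training and evaluation, and the onset shape can be checked. Both helpers below are hypothetical sketches; the 48-hour lookback and the spike ratio are assumptions you would set from your own failure records.

```python
from datetime import datetime, timedelta


def label_pre_failure(timestamps, failure_times, lookback=timedelta(hours=48)):
    """1 for samples inside the lookback window before a recorded failure, else 0."""
    return [1 if any(f - lookback <= t < f for f in failure_times) else 0
            for t in timestamps]


def onset_kind(values, spike_ratio: float = 0.5) -> str:
    """Crude onset shape: 'spike' if one step accounts for most of the rise."""
    rise = values[-1] - values[0]
    if rise <= 0:
        return "no_rise"
    biggest_step = max(b - a for a, b in zip(values, values[1:]))
    return "spike" if biggest_step >= spike_ratio * rise else "slow_climb"


failures = [datetime(2024, 1, 3)]  # hypothetical failure record
stamps = [datetime(2023, 12, 31) + timedelta(hours=12 * i) for i in range(8)]
print(label_pre_failure(stamps, failures))
```

Separating slow climbs from overnight spikes matters because the two call for different feature windows and different alert lead times.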

Third, map your SCADA architecture: where is the historian server? What protocols does it expose (OPC-UA, PI SDK, REST)? How is the operator console configured? What is the ingestion latency for new alarms? That integration map determines whether alerts land in real time or with a delay — a material difference for rotating equipment.

The output of Assess is a prioritized roadmap: which assets to deploy anomaly detection on first (ordered by downtime impact × detection feasibility), what baseline and ground-truth data you need to pull and clean, and what integration work the IT team needs to prepare. Most oil & gas operations see first-wave value, genuine faults caught and acted on early, on rotating equipment, with maintenance cost lowered 25% and unplanned downtime cut 40% as the first assets are baselined and the crews come to trust the signal.
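The ordering rule can be written down directly. This sketch assumes downtime hours and cost per hour come from the failure records, and that detection feasibility is a 0–1 judgement about sensor coverage and failure-mode observability; `AssetCandidate`, `rank`, and the fleet figures are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class AssetCandidate:
    name: str
    downtime_hours_per_year: float  # from ground-truth failure records
    cost_per_hour: float            # lost production value per downtime hour
    detection_feasibility: float    # 0-1 judgement: sensor coverage, observability

    @property
    def priority(self) -> float:
        # Downtime impact x detection feasibility.
        return self.downtime_hours_per_year * self.cost_per_hour * self.detection_feasibility


def rank(candidates):
    """Highest priority first."""
    return sorted(candidates, key=lambda a: a.priority, reverse=True)


fleet = [  # made-up figures for illustration only
    AssetCandidate("injection pump", 60, 20_000, 0.7),
    AssetCandidate("lift-gas compressor", 120, 50_000, 0.8),
    AssetCandidate("flowmeter", 10, 50_000, 0.9),
]
for asset in rank(fleet):
    print(f"{asset.name}: {asset.priority:,.0f}")
```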

Then the real work: wiring the alerts into the maintenance procedures, tuning the false-positive rate until the crews act on every alert, and building the monitoring that keeps the system honest as conditions shift across the seasons and the platform ages.

