Technical Approaches and Sensible Tradeoffs

July 15, 2025


In the world of monitoring software, how you process telemetry data can significantly impact your ability to derive insights, troubleshoot issues, and manage costs.

There are two main use cases for how telemetry data is leveraged:

Radar (monitoring of systems) usually falls into the bucket of known knowns and known unknowns. This leads to scenarios where some data is almost ‘pre-determined’ to behave, or be plotted, in a certain way, because we know what we’re looking for.
Blackbox (debugging, RCA, etc.) use cases, on the other hand, have more to do with unknown unknowns: the things we don’t know and may need to hunt for to build an understanding of the system.

Understanding Telemetry Data Challenges

Before diving into processing approaches, it’s important to understand the unique challenges of telemetry data:

Volume: Modern systems generate enormous amounts of telemetry data

Velocity: Data arrives in continuous, high-throughput streams

Variety: Multiple formats across metrics, logs, traces, profiles and events

Time-sensitivity: Value often decreases with age

Correlation needs: Data from different sources must be linked together

These characteristics create specific considerations when choosing between ETL and ELT approaches.

ETL for Telemetry: Transform-First Architecture

Technical Architecture

In an ETL approach, telemetry data undergoes transformation before reaching its final destination:

Fig. 1 — ETL for Telemetry

A typical implementation stack might include:

Collection: OpenTelemetry, Prometheus, Fluent Bit

Transport: Kafka, Kinesis, or an in-memory queue as the buffering layer

Transformation: Stream processing

Storage: Time-series databases (Prometheus), specialized indices, or object storage (S3)

Key Technical Components

Aggregation Strategies

Pre-aggregation significantly reduces data volume and query complexity. A typical pre-aggregation flow looks like this:

Fig. 2 — Aggregation Strategies

This transformation condenses raw data into 5-minute summaries, dramatically reducing storage requirements and improving query performance.

Example: For a gaming application handling millions of requests per day, raw request latency metrics (potentially billions of data points) can be grouped by service and endpoint, then aggregated into 5-minute (or 1-minute) windows. A single API call that generates 100 latency data points per second (8.64 million per day) is reduced to just 288 aggregated entries per day (one per 5-minute window), while still preserving the critical p50/p90/p99 percentiles needed for SLA monitoring.
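The windowing step described above can be sketched in a few lines of Python. This is a minimal illustration rather than a production stream processor: the sample tuple layout, the service and endpoint names, and the nearest-rank percentile method are all assumptions made for the example.

```python
import math
from collections import defaultdict

WINDOW_SECONDS = 300  # 5-minute windows, matching the example in the text


def percentile(sorted_values, q):
    """Nearest-rank percentile of a sorted, non-empty list (q is an integer 0..100)."""
    rank = max(1, math.ceil(q * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def pre_aggregate(samples):
    """Summarize (timestamp, service, endpoint, latency_ms) samples per 5-minute window."""
    buckets = defaultdict(list)
    for ts, service, endpoint, latency_ms in samples:
        window_start = int(ts) - int(ts) % WINDOW_SECONDS
        buckets[(service, endpoint, window_start)].append(latency_ms)

    summaries = []
    for (service, endpoint, window_start), values in sorted(buckets.items()):
        values.sort()
        summaries.append({
            "service": service,
            "endpoint": endpoint,
            "window_start": window_start,
            "count": len(values),
            "p50": percentile(values, 50),
            "p90": percentile(values, 90),
            "p99": percentile(values, 99),
        })
    return summaries


# Synthetic input (hypothetical names): one sample per second for 10 minutes.
samples = [(t, "matchmaking", "/join", float(t % 100)) for t in range(600)]
summaries = pre_aggregate(samples)

# The arithmetic behind the example: raw points per day vs. one row per window.
raw_points_per_day = 100 * 86_400                # 100 points/s for 24 hours
windows_per_day = 86_400 // WINDOW_SECONDS       # number of 5-minute windows
```

Ten minutes of input collapse into two summary rows here. Note that percentiles computed per window cannot be averaged later into an exact daily percentile, which is one concrete form of the granularity loss that comes with pre-aggregation.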

Cardinality Management

High-cardinality metrics can break time-series databases. The cardinality management process follows this pattern:

Fig. 3 — Cardinality Management

Effective strategies include:

Label filtering and normalization

Strategic aggregation of specific dimensions

Hashing techniques for high-cardinality values while preserving query patterns

Example: A microservice tracking HTTP requests includes user IDs and request paths in its metrics. With 50,000 daily active users and thousands of unique URL paths, this creates millions of unique label combinations. The cardinality management system filters out user IDs entirely (configurable, too high cardinality), normalizes URL paths by replacing dynamic segments with placeholders (e.g., /users/123/profile becomes /users/{id}/profile), and applies consistent categorization to errors. This reduces unique time series from millions to hundreds, allowing the time-series database to function efficiently.
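As a rough sketch of these strategies, the Python below drops a deny-listed label, normalizes numeric path segments, collapses status codes into classes, and hashes one high-cardinality value into a fixed number of buckets. The label names (user_id, path, status, session_id), the deny-list, and the bucket count are hypothetical choices for illustration, not settings taken from any particular tool.

```python
import hashlib
import re

DROPPED_LABELS = {"user_id"}             # assumed deny-list; configurable in practice
NUMERIC_SEGMENT = re.compile(r"^\d+$")   # treat purely numeric path segments as IDs


def normalize_path(path):
    """Replace dynamic (numeric) path segments with a placeholder."""
    parts = ["{id}" if NUMERIC_SEGMENT.match(p) else p for p in path.split("/")]
    return "/".join(parts)


def categorize_status(status_code):
    """Collapse individual status codes into a class such as '5xx'."""
    return f"{status_code // 100}xx"


def hash_bucket(value, buckets=16):
    """Map a high-cardinality value onto a fixed number of stable buckets."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return f"bucket-{int(digest, 16) % buckets}"


def reduce_labels(labels):
    """Apply filtering, normalization, categorization, and hashing to one label set."""
    out = {k: v for k, v in labels.items() if k not in DROPPED_LABELS}
    if "path" in out:
        out["path"] = normalize_path(out["path"])
    if "status" in out:
        out["status"] = categorize_status(int(out["status"]))
    if "session_id" in out:  # hypothetical high-cardinality label
        out["session_bucket"] = hash_bucket(out.pop("session_id"))
    return out


# 1,000 users x 50 profile IDs collapse into a single series after reduction.
raw = [
    {"user_id": f"u{u}", "path": f"/users/{i}/profile", "status": "200"}
    for u in range(1000)
    for i in range(50)
]
unique_series = {tuple(sorted(reduce_labels(r).items())) for r in raw}
```

Hashing bounds the number of series but gives up the ability to query by the original value, so it suits labels that are only ever used for grouping or sharding.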

Fig. 4 — Real-time Enrichment

Real-time Enrichment

Adding context to metrics during the transformation phase involves integrating external data sources:

This process adds critical business and operational context to raw telemetry data, enabling more meaningful analysis and alerting based on service importance, customer impact, and other factors beyond pure technical metrics.

Example: A payment processing service emits basic metrics like request counts, latencies, and error rates. The enrichment pipeline joins this telemetry with service registry data to add metadata about the service tier (critical), SLO targets (99.99% availability), and team ownership (payments-team). It then incorporates business context to tag transactions with their type (subscription renewal, one-time purchase, refund) and estimated revenue impact. When an incident occurs, alerts are automatically prioritized based on business impact rather than just technical severity, and routed to the appropriate team with rich context.
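A minimal sketch of the enrichment join might look like the following. The in-memory registry, the field names, and the tier-to-priority mapping are assumptions for illustration; a real pipeline would typically read this metadata from a service catalog and keep it cached.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceInfo:
    tier: str
    slo_target: float
    owner: str


# Hypothetical service registry; the "payments" values mirror the example in the text.
SERVICE_REGISTRY = {
    "payments": ServiceInfo(tier="critical", slo_target=99.99, owner="payments-team"),
    "avatars": ServiceInfo(tier="best-effort", slo_target=99.0, owner="profile-team"),
}

# Assumed mapping from service tier to alert priority (1 is most urgent).
TIER_PRIORITY = {"critical": 1, "standard": 2, "best-effort": 3}


def enrich(metric, registry=SERVICE_REGISTRY):
    """Return a copy of the metric with registry metadata attached."""
    enriched = dict(metric)
    info = registry.get(metric["service"])
    if info is None:
        # Unknown services are still emitted, just flagged for follow-up.
        enriched.update(tier="unknown", owner="unassigned")
        return enriched
    enriched.update(tier=info.tier, slo_target=info.slo_target, owner=info.owner)
    return enriched


def alert_priority(enriched):
    """Derive an alert priority from business context rather than raw severity."""
    return TIER_PRIORITY.get(enriched["tier"], 4)


payment_metric = enrich({"service": "payments", "name": "error_rate", "value": 0.02})
unknown_metric = enrich({"service": "legacy-batch", "name": "error_rate", "value": 0.02})
```

Both metrics carry the same error rate, but only the payments one maps to the top priority and a named owning team, which is the behavior the example describes.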

Technical Advantages

Query performance: Pre-calculated aggregates eliminate computation at query time

Predictable resource usage: Both storage and query compute are controlled

Schema enforcement: Data conformity is guaranteed before storage

Optimized storage formats: Data can be stored in formats optimized for specific access patterns

Technical Limitations

Loss of granularity: Some detail is permanently lost

Schema rigidity: Adapting to new requirements requires pipeline changes

Processing overhead: Real-time transformation adds complexity and resource demands

Transformation-time decisions: Analysis paths must be known in advance

ELT for Telemetry: Raw Storage with Flexible Transformation

Technical Architecture

ELT architecture prioritizes getting raw data into storage, with transformations performed at query time:

Fig. 5 — ELT for Telemetry

A typical implementation might include:

Collection: OpenTelemetry, Prometheus, Fluent Bit

Transport: Direct ingestion without complex processing

Storage: Object storage (S3, GCS) or data lakes in Parquet format

Transformation: SQL engines (Presto, Athena), Spark jobs, or specialized OLAP systems

Key Technical Components

Fig. 6 — Efficient Raw Storage

Efficient Raw Storage

Optimizing for long-term storage of raw telemetry requires careful consideration of file formats and storage organization:

This approach leverages columnar storage formats like Parquet with appropriate compression (ZSTD for traces, Snappy for metrics), dictionary encoding, and optimized column indexing based on common query patterns (trace_id, service, time ranges).

Example: A cloud-native application generates 10TB of trace data daily across its distributed services. Instead of discarding or heavily sampling this data, the complete trace information is captured using OpenTelemetry collectors and converted to Parquet format with ZSTD compression. Key fields like trace_id, service name, and timestamp are indexed for efficient querying. This approach reduces the storage footprint by 85% compared to raw JSON while maintaining query performance. When a critical customer-impacting issue occurred, engineers were able to access complete trace data from 3 months prior, identifying a subtle pattern of intermittent failures that would have been lost with traditional sampling.
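Parquet writers handle column layout and dictionary encoding internally, so the sketch below is not Parquet. It is a plain-Python illustration, under simplified assumptions, of why a columnar layout with dictionary encoding helps for repetitive fields such as service names. The record fields and service names are made up for the example, and every row is assumed to carry the same keys.

```python
def to_columns(rows):
    """Pivot row-oriented records into one list per field (columnar layout).

    Assumes every row has the same keys.
    """
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


def dictionary_encode(values):
    """Store each distinct value once and replace occurrences with small integer codes."""
    dictionary = []
    index = {}
    codes = []
    for value in values:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    """Rebuild the original column from the dictionary and the codes."""
    return [dictionary[code] for code in codes]


# Hypothetical span records: trace_id is unique per row, service repeats heavily.
services = ["checkout", "cart", "search"]
rows = [
    {"trace_id": f"t{i}", "service": services[i % 3], "duration_ms": i}
    for i in range(9)
]

columns = to_columns(rows)
dictionary, codes = dictionary_encode(columns["service"])
restored = dictionary_decode(dictionary, codes)
```

The service column shrinks to three strings plus nine small integers, while a near-unique column such as trace_id gains nothing from the same treatment. That difference is why encoding choices are usually made per column.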

Partitioning Strategies

Effective partitioning is crucial for query performance against raw telemetry. A well-designed partitioning strategy follows this hierarchy:

Fig. 7 — Partitioning Strategies

This partitioning approach enables efficient time-range queries while also allowing filtering by service and tenant, which are common query dimensions. The partitioning strategy is designed to:

Optimize for time-based retrieval (most common query pattern)

Enable efficient tenant isolation for multi-tenant systems

Allow service-specific queries without scanning all data

Separate telemetry types for optimized storage formats per type

Example: A SaaS platform with 200+ enterprise customers uses this partitioning strategy for its observability data lake. When a high-priority customer reports an issue that occurred last Tuesday between 2-4pm, engineers can immediately query just those specific partitions: /year=2023/month=11/day=07/hour=1[4-5]/tenant=enterprise-x/*. This approach reduces the scan size from potentially petabytes to just a few gigabytes, enabling responses in seconds rather than hours. When comparing current performance against historical baselines, the time-based partitioning enables efficient month-over-month comparisons by scanning only the relevant time partitions.
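To make the pruning concrete, here is a small sketch that expands a time range and tenant into the hourly partition prefixes a query would need to touch. The function name is hypothetical and the layout simply follows the path shown in the example; a query engine would normally derive this from predicates on the partition columns rather than from hand-built paths.

```python
from datetime import datetime, timedelta


def partition_prefixes(start, end, tenant):
    """Return hourly partition prefixes covering [start, end) for one tenant."""
    current = start.replace(minute=0, second=0, microsecond=0)
    prefixes = []
    while current < end:
        prefixes.append(
            f"/year={current:%Y}/month={current:%m}/day={current:%d}"
            f"/hour={current:%H}/tenant={tenant}/"
        )
        current += timedelta(hours=1)
    return prefixes


# The incident window from the example: Tuesday 7 November 2023, 2pm to 4pm.
incident_start = datetime(2023, 11, 7, 14)
incident_end = datetime(2023, 11, 7, 16)
prefixes = partition_prefixes(incident_start, incident_end, "enterprise-x")

for prefix in prefixes:
    print(prefix)
```

Only two hourly partitions are listed for the two-hour window, which is what lets the scan skip every other hour, day, and tenant in the lake.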

Query-time Transformations

SQL and analytical engines provide powerful query-time transformations. The query processing flow for on-the-fly analysis looks like this (see Fig. 8).

This query flow demonstrates how complex analysis like calculating service latency percentiles, error rates, and usage patterns can be performed entirely at query time without needing pre-computation. The analytical engine applies optimizations like predicate pushdown, parallel execution, and columnar processing to achieve reasonable performance even against large raw datasets.

Fig. 8 — Query-time Transformations

Example: A DevOps team investigating a performance regression discovered it only affected premium customers using a specific feature. Using query-time transformations against the ELT data lake, they wrote a single query that first filtered to the affected time period, joined customer tier information, extracted relevant attributes about feature usage, calculated percentile response times grouped by customer segment, and identified that premium customers with high transaction volumes were experiencing degraded performance only when a specific optional feature flag was enabled. This analysis would have been impossible with pre-aggregated data since the customer segment + feature flag dimension hadn’t been previously identified as important for monitoring.
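The shape of such an investigation can be sketched with Python's built-in sqlite3 module standing in for a lake query engine. The table names, columns, and synthetic rows are assumptions for illustration. Because a percentile aggregate is not assumed to be available in SQLite, the filter and join run in SQL and the p99 is computed in Python with the nearest-rank method.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE requests (ts INTEGER, customer_id TEXT, feature_flag INTEGER, latency_ms REAL);
CREATE TABLE customers (customer_id TEXT PRIMARY KEY, tier TEXT);
""")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [("c1", "premium"), ("c2", "free")])

# Synthetic raw data: only premium traffic with the flag enabled is slow.
rows = []
for ts in range(100):
    rows.append((ts, "c1", 1, 900.0 + ts))
    rows.append((ts, "c1", 0, 100.0 + ts))
    rows.append((ts, "c2", 1, 110.0 + ts))
    rows.append((ts, "c2", 0, 105.0 + ts))
conn.executemany("INSERT INTO requests VALUES (?, ?, ?, ?)", rows)


def p99_by_segment(conn, start_ts, end_ts):
    """Filter by time, join the customer tier, then compute p99 per (tier, flag)."""
    cursor = conn.execute(
        """
        SELECT c.tier, r.feature_flag, r.latency_ms
        FROM requests AS r
        JOIN customers AS c ON c.customer_id = r.customer_id
        WHERE r.ts >= ? AND r.ts < ?
        ORDER BY c.tier, r.feature_flag, r.latency_ms
        """,
        (start_ts, end_ts),
    )
    groups = {}
    for tier, flag, latency in cursor:
        groups.setdefault((tier, bool(flag)), []).append(latency)
    # Values arrive sorted within each group, so nearest-rank indexing is valid.
    return {
        key: values[max(1, math.ceil(99 * len(values) / 100)) - 1]
        for key, values in groups.items()
    }


result = p99_by_segment(conn, 0, 100)
worst_segment = max(result, key=result.get)
```

The grouping dimension (tier plus feature flag) is chosen at query time, after the fact, which is the flexibility the example relies on. None of it had to be anticipated when the data was written.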

Technical Advantages

Schema flexibility: New dimensions can be analyzed without pipeline changes

Cost-effective storage: Object storage is significantly cheaper than specialized DBs

Retroactive analysis: Historical data can be examined with new perspectives

Technical Limitations

Query performance challenges: Interactive analysis may be slow on large datasets

Resource-intensive analysis: Compute costs can be high for complex queries

Implementation complexity: Requires more sophisticated query tooling

Storage overhead: Raw data consumes significantly more space

Technical Implementation: The Hybrid Approach

Core Architecture Components

Implementation Strategy

Dual-path processing

Fig. 10 — Dual-path processing

Example: A global ride-sharing platform implemented a dual-path telemetry system that routes service health metrics and customer experience indicators (ride wait times, ETA accuracy) through the ETL path for real-time dashboards and alerting. Meanwhile, all raw data including detailed user journeys, driver activities, and application logs flows through the ELT path to cost-effective storage. When a regional outage occurred, operations teams used the real-time dashboards to quickly identify and mitigate the immediate issue. Later, data scientists used the preserved raw data to perform a comprehensive root cause analysis, correlating multiple factors that wouldn’t have been visible in pre-aggregated data alone.

Smart data routing

Fig. 11 — Smart Data Routing

Example: A financial services company deployed a smart routing system for their telemetry data. All data is preserved in the data lake, but critical metrics like transaction success rates, fraud detection signals, and authentication service health metrics are immediately routed to the real-time processing pipeline. Additionally, any security-related events such as failed login attempts, permission changes, or unusual access patterns are immediately sent to a dedicated security analysis pipeline. During a recent security incident, this routing enabled the security team to detect and respond to an unusual pattern of authentication attempts within minutes, while the complete context of user journeys and application behavior was preserved in the data lake for subsequent forensic analysis.
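Both ideas, keeping everything on the raw path while fanning selected signals out to faster pipelines, reduce to a small routing decision per event. The sketch below assumes a simple dict-shaped event with kind and name fields. The rule sets and destination names are hypothetical and would normally live in configuration.

```python
# Hypothetical rule sets, loosely following the example in the text.
CRITICAL_METRICS = {"transaction_success_rate", "fraud_signal", "auth_service_health"}
SECURITY_EVENTS = {"failed_login", "permission_change", "unusual_access"}


def route(event):
    """Return the destinations for one telemetry event."""
    destinations = ["data_lake"]  # raw path: every event is preserved
    kind = event.get("kind")
    name = event.get("name")
    if kind == "metric" and name in CRITICAL_METRICS:
        destinations.append("realtime_pipeline")
    if kind == "event" and name in SECURITY_EVENTS:
        destinations.append("security_pipeline")
    return destinations


events = [
    {"kind": "metric", "name": "transaction_success_rate", "value": 0.997},
    {"kind": "event", "name": "failed_login", "user": "alice"},
    {"kind": "log", "name": "checkout_debug", "message": "cart loaded"},
]
routed = {e["name"]: route(e) for e in events}
```

Because the data lake is always the first destination, a routing rule that turns out to be wrong only delays analysis. It does not lose data.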

Unified query interface

Fig. 12 — Unified query interface

Real-world Implementation Example

A specific engineering implementation at last9.io demonstrates how this hybrid approach works in practice:

For a large-scale Kubernetes platform with hundreds of clusters and thousands of services, we implemented a hybrid telemetry pipeline with:

Critical-path metrics processed through a pipeline that:

- Performs dimensional reduction (limiting label combinations)

- Pre-calculates service-level aggregations

- Computes derived metrics like success rates and latency percentiles

Raw telemetry stored in a cost-effective data lake:

- Partitioned by time, data type, and tenant

- Optimized for typical query patterns

- Compressed with appropriate codecs (Zstd for traces, Snappy for metrics)

Unified query layer that:

- Routes dashboard and alerting queries to pre-aggregated storage

- Redirects exploratory and ad-hoc analysis to the data lake

- Manages correlation queries across both systems

This approach delivered both the query performance needed for real-time operations and the analytical depth required for complex troubleshooting.
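The routing behavior of such a query layer can be illustrated with a short sketch. It is not a description of the actual implementation mentioned above: the Query type, the purpose values, and the backend names are all hypothetical.

```python
from dataclasses import dataclass

PREAGGREGATED = "preaggregated_store"   # hypothetical backend names
DATA_LAKE = "data_lake"


@dataclass(frozen=True)
class Query:
    purpose: str      # "dashboard", "alerting", "exploratory", "ad-hoc", or "correlation"
    expression: str   # opaque query text, passed through untouched


def choose_backends(query):
    """Pick the storage tier(s) a query should run against, based on its purpose."""
    if query.purpose in {"dashboard", "alerting"}:
        return [PREAGGREGATED]
    if query.purpose in {"exploratory", "ad-hoc"}:
        return [DATA_LAKE]
    if query.purpose == "correlation":
        # Correlation needs both tiers; results would be joined by the query layer.
        return [PREAGGREGATED, DATA_LAKE]
    raise ValueError(f"unknown query purpose: {query.purpose!r}")


dashboard_backends = choose_backends(Query("dashboard", "p99 latency by service"))
adhoc_backends = choose_backends(Query("ad-hoc", "latency by customer tier and flag"))
correlation_backends = choose_backends(Query("correlation", "alerts joined with traces"))
```

In practice the purpose would more likely be inferred from query metadata, such as the caller, time range, or requested dimensions, than declared explicitly by the client.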

Decision Framework

When architecting telemetry pipelines, these technical considerations should guide your approach:

Decision Factor	Use ETL	Use ELT
Query latency requirements	< 1 second	Can wait minutes
Data retention needs	Days/Weeks	Months/Years
Cardinality	Low/Medium	Very high
Analysis patterns	Well-defined	Exploratory
Budget priority	Compute	Storage

Conclusion

The technical realities of telemetry data processing demand thinking beyond simple ETL vs. ELT paradigms. Engineering teams should architect tiered systems that leverage the strengths of both approaches:

ETL-processed data for operational use cases requiring immediate insights

ELT-processed data for deeper analysis, troubleshooting, and historical patterns

Metadata-driven routing to intelligently direct queries to the appropriate tier

This engineering-centric approach balances performance requirements with cost considerations while maintaining the flexibility required in modern observability systems.

About the author: Nishant Modak is the founder and CEO of Last9, a high cardinality observability platform company backed by Sequoia India (now PeakXV). He’s been an entrepreneur and working with large scale companies for nearly 20 years.
