Axiom is blazing fast. This page explains how you can further improve performance in Axiom.
| Practice | Severity | Impact |
|---|---|---|
| Mixing unrelated data in datasets | Critical | Combining unrelated data inflates the schema and slows queries |
| Excessive backfilling (big difference between `_time` and `_sysTime`) | Critical | Creates overlapping blocks, breaks time-based indexing |
| Large number of fields in a dataset | High | Very high dimensionality slows down query performance |
| Failing to use `_time` | High | No efficient time-based filtering |
| Overly wide queries (`project *`) | High | Returns massive amounts of unneeded data |
| Mixed data types in the same field | Moderate | Reduces compression, complicates queries |
| Using regex when simpler filters suffice | Moderate | More CPU-heavy scanning |
| Overusing runtime JSON parsing (`parse_json`) | Moderate | CPU overhead, no indexing on nested fields |
| Virtual fields for simple transformations | Low | Extra overhead for trivial conversions |
| Poor filter order in queries | Low | Suboptimal scanning of data |
## Mixing unrelated data in datasets

Combining unrelated event types in a single dataset inflates the schema and slows queries. For example, some services may log `user_id` as a string, while others store it as a number in the same `user_id` field. Likewise, a field may be `null` for one event type or typed differently for another.

Keep unrelated data in separate datasets: for example, keep `k8s_logs` separate from `web_traffic`.
## Excessive backfilling: `_time` vs. `_sysTime` gaps

The `_time` index is critical for query performance. Ideally, incoming events for a block lie in a closely bounded time range. However, backfilling large amounts of historical data after the fact (especially out of chronological order) creates wide time overlaps in blocks. If `_time` is far from `_sysTime` (the time the event was ingested), the effectiveness of Axiom's time index is weakened.
## Failing to use the `_time` field for event timestamps

Axiom uses `_time` for indexing and time-based queries. If you store event timestamps in a different field (for example, `timestamp` or `created_at`) and use that field in time filters, Axiom's time-based optimizations are not leveraged.
- Map event timestamps to `_time`: Configure your ingest pipelines so that Axiom sets `_time` to the actual event timestamp. If your source data uses `created_at`, rename it to `_time` at ingest.
- Filter on `_time`: Use `where _time >= ... and _time <= ...` or the built-in time range selectors in the query UI, as in the sketch below.
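A minimal sketch of an explicit `_time` filter; the dataset name and time bounds are illustrative:

```kusto
['sample-http-logs']
// Bound the scan to a narrow, explicit time range so the time index can prune blocks
| where _time >= datetime(2024-05-01) and _time <= datetime(2024-05-02)
```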
## Mixed data types in the same field

Storing values of different types in the same field reduces compression and complicates queries (forcing extra `tostring()` calls, etc.). Normalize each field to a single type at ingest, as in the `user_id` example above.

## Overly wide queries (`project *`)

Returning all fields (`project *`) for each matching event can produce large amounts of unneeded data, especially in wide datasets with many fields.
- Use `project` or `project-keep`: Specify exactly which fields you need, as in the sketch below.
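A minimal sketch, assuming a dataset named `['sample-http-logs']` with `status` and `uri` fields:

```kusto
['sample-http-logs']
| where _time >= ago(1h)
// Return only the fields this query actually needs
| project _time, status, uri
```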
- Use `project-away` if you only need to exclude a few fields: if you need 90% of the fields but want to drop the largest ones, see the sketch below.
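A sketch assuming hypothetical oversized fields named `request_body` and `response_body`:

```kusto
['sample-http-logs']
| where _time >= ago(1h)
// Keep everything except the largest fields
| project-away request_body, response_body
```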
- Use a small `limit` value (such as 10) instead of the default 1000 when you only need a sample of results.
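For instance, assuming the same sample dataset:

```kusto
['sample-http-logs']
// Return a handful of events instead of the default 1000
| limit 10
```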
## Using regex when simpler filters suffice

Regular expression filters (`matches`, `regex`) can be powerful, but they are also expensive to evaluate, especially on large datasets.
- Use `search` for substring search: To find `foobar` in all fields, use the query shown below.
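A minimal sketch, assuming a dataset named `['sample-http-logs']`:

```kusto
['sample-http-logs']
// Substring search across all fields
| search "foobar"
```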
`search` matches text in all fields. To find text in a specific field, a more efficient solution is to filter on that field directly, as in the sketch below.
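A sketch assuming the text lives in a field named `uri` (illustrative):

```kusto
['sample-http-logs']
// Case-sensitive substring match restricted to one field
| where uri contains_cs "foobar"
```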
Here, the `_cs` suffix stands for case-sensitive.
## Overusing runtime JSON parsing (`parse_json`)

Storing JSON as raw strings and parsing it at query time with `parse_json()` is both CPU-intensive and slower than columnar operations.
- Avoid `parse_json()` in queries: If your JSON cannot be flattened entirely, ingest it into a map field, then query subfields directly, as in the sketch below.
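A sketch assuming a map field named `attributes` with a nested `status` key; the dataset name, field names, and Kusto-style bracket access are illustrative assumptions:

```kusto
['app-logs']
// Query the map's subfield directly instead of calling
// parse_json() on a raw string at query time
| where attributes["status"] == "error"
```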
## Virtual fields for simple transformations

Virtual fields let you compute values at query time (for example, `extend converted = toint(some_field)`). While sometimes necessary, every additional virtual field imposes overhead, and using `extend` for trivial or frequently repeated operations can add up. Where possible, perform simple conversions at ingest instead.

## Poor filter order in queries

Axiom does not automatically order `where` clauses optimally. This means the sequence of filters in your query can matter.
Put the most selective filters first: if `user_id == 1234` discards most rows, apply it before `log_level == "ERROR"`, as in the sketch below.
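A sketch with illustrative field names:

```kusto
['app-logs']
// Most selective filter first: discards the bulk of rows early
| where user_id == 1234
// Broader filter second
| where log_level == "ERROR"
```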