This page explains how to use the topk aggregation function in APL.

The `topk` aggregation in Axiom Processing Language (APL) allows you to identify the top `k` results based on a specified field. This is especially useful when you want to quickly analyze large datasets and extract the most significant values, such as the top-performing queries, most frequent errors, or highest latency requests.

Use `topk` to find the most common or relevant entries in datasets, especially in log analysis, telemetry data, and monitoring systems. This aggregation helps you focus on the most important data points, filtering out the noise.

The `topk` aggregation in APL is a statistical aggregation that returns estimated results. The estimation comes with the benefit of speed at the expense of accuracy. This means that `topk` is fast and light on resources even on a large or high-cardinality dataset, but it doesn’t provide precise results.

For completely accurate results, use the `top` operator.

Splunk SPL users

Splunk SPL has no equivalent of the `topk` function. You can achieve similar results with SPL’s `top` command, which is equivalent to APL’s `top` operator. The `topk` function in APL behaves similarly by returning the top `k` values of a specified field, but its syntax is unique to APL.

The main difference between `top` (supported by both SPL and APL) and `topk` (supported only by APL) is that `topk` is estimated. This means that APL’s `topk` is faster and less resource-intensive, but less accurate, than SPL’s `top`.

ANSI SQL users

In ANSI SQL, finding the top `k` rows often involves using the `ORDER BY` and `LIMIT` clauses. While the logic remains similar, APL’s `topk` simplifies this process by directly returning the top `k` values of a field in an aggregation.

The main difference between SQL’s solution and APL’s `topk` is that `topk` is estimated. This means that APL’s `topk` is faster and less resource-intensive, but less accurate, than SQL’s combination of `ORDER BY` and `LIMIT` clauses.

`topk` takes the following parameters:

- `Field`: The field or expression to rank the results by.
- `k`: The number of top results to return.

`topk` returns the top `k` values based on the specified field.

Use the `topk` function to find the top 5 most frequent HTTP status codes.

status | count_ |
---|---|
200 | 1500 |
404 | 400 |
500 | 200 |
301 | 150 |
302 | 100 |
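
A query along the following lines could produce a table like the one above. This is a sketch rather than the exact query behind the example: the dataset name `['sample-http-logs']` is an assumption, and the `status` field name is taken from the output columns.

```kusto
// Assumed dataset name; replace it with your own dataset.
['sample-http-logs']
// Estimate the 5 most frequent values of the status field.
| summarize topk(status, 5)
```

Because `topk` is estimated, the counts it reports can differ from the true totals.
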
- […] `k`), making it useful when you’re unsure how many top values to retrieve.
- […] `k` results without filtering. Use `topk` when you do not need to restrict your analysis to a subset.
- […] `k` values.
- […] `topk` to create custom rankings.
- […] `topk` to find the most common values.
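
For comparison, an exact version of the status-code ranking can be sketched with the `top` operator, which this page recommends when completely accurate results are needed. The dataset name `['sample-http-logs']` and the `status` field are assumptions carried over from the status-code example, and `count_` is assumed to be the default column name produced by `count()`, as shown in that example's output table.

```kusto
// Assumed dataset name; replace it with your own dataset.
['sample-http-logs']
// Count every event per status code exactly.
| summarize count() by status
// Keep the 5 status codes with the highest exact counts.
| top 5 by count_
```

This form gives up the speed of `topk` in exchange for precise counts.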