Splunk Quick Reference Guide 6.x


Add Fields
Set velocity to distance / time.
… | eval velocity=distance/time
Extract "from" and "to" fields using regular expressions. If a raw event contains "From: Susan To: David", then from=Susan and to=David.
… | rex field=_raw "From: (?<from>.*) To: (?<to>.*)"
Save the running total of "count" in a field called "total_count".
… | accum count as total_count
For each event where 'count' exists, compute the difference between count and its previous value and store the result in 'countdiff'.
… | delta count as countdiff
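As a rough illustration of what the rex, accum, and delta examples compute, here is a minimal Python sketch. It is not Splunk code: the sample events and count values are hypothetical, and Python spells named groups as (?P<name>...) rather than (?<name>...).

```python
import re
from itertools import accumulate

# Hypothetical raw events; only the "From: ... To: ..." shape comes from the guide.
raw_events = ["From: Susan To: David", "From: Ann To: Bob"]

# rex: pull named fields out of each raw event.
pattern = re.compile(r"From: (?P<from>.*) To: (?P<to>.*)")
extracted = [pattern.search(raw).groupdict() for raw in raw_events]

# Hypothetical "count" values, one per event, in arrival order.
counts = [3, 5, 4, 10]

# accum: running total of count.
total_count = list(accumulate(counts))

# delta: difference between each value and the previous one.
# The first event has no previous value, so it gets no result (None here).
countdiff = [None] + [cur - prev for prev, cur in zip(counts, counts[1:])]

print(extracted, total_count, countdiff)
```

The sketch processes values in list order; in a real search the order is whatever order the events reach the command.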
Filter Results
Filter results to only include those with "fail" in their raw text and status=0.
… | search fail status=0
Remove duplicates of results with the same host value.
… | dedup host
Keep only search results whose "_raw" field contains IP addresses in the non-routable class A (10.0.0.0/8).
… | regex _raw="(?<!\d)10\.\d{1,3}\.\d{1,3}\.\d{1,3}(?!\d)"
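The lookbehind and lookahead in the IP pattern stop it from matching inside a longer run of digits. The following Python sketch shows that behavior on a few hypothetical events; it uses Python's re module as a stand-in for the regex command and escapes the dot after "10" so that it only matches a literal dot.

```python
import re

# Same shape as the guide's pattern, with every dot escaped.
CLASS_A_PRIVATE = re.compile(r"(?<!\d)10\.\d{1,3}\.\d{1,3}\.\d{1,3}(?!\d)")

# Hypothetical raw events.
events = [
    "src=10.1.2.3 action=allow",     # kept: address starts with 10.
    "src=110.1.2.3 action=allow",    # dropped: "10" is preceded by a digit
    "src=192.168.0.1 action=deny",   # dropped: different network
    "src=10.0.0.1234 action=deny",   # dropped: last group is followed by a digit
]

kept = [e for e in events if CLASS_A_PRIVATE.search(e)]
print(kept)
```

Note that the pattern checks digit counts only; it does not verify that each octet is 255 or less.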
REGEX          NOTE                                     EXAMPLE                        EXPLANATION
\s             white space                              \d\s\d                         digit space digit
\S             not white space                          \d\S\d                         digit non-whitespace digit
\d             digit                                    \d\d\d-\d\d-\d\d\d\d           SSN
\D             not digit                                \D\D\D                         three non-digits
\w             word character (letter, number, or _)    \w\w\w                         three word chars
\W             not a word character                     \W\W\W                         three non-word chars
[...]          any included character                   [a-z0-9#]                      any char that is a thru z, 0 thru 9, or #
[^...]         no included character                    [^xyz]                         any char but x, y, or z
*              zero or more                             \w*                            zero or more word chars
+              one or more                              \d+                            integer
?              zero or one                              \d\d\d-?\d\d-?\d\d\d\d         SSN with dashes being optional
|              or                                       \w|\d                          word or digit character
(?P<var> ...)  named extraction                         (?P<ssn>\d\d\d-\d\d-\d\d\d\d)  pull out a SSN and assign to 'ssn' field
(?: ... )      logical or atomic grouping               (?:[a-zA-Z]|\d)                alphabetic character OR a digit
^              start of line                            ^\d+                           line begins with at least one digit
$              end of line                              \d+$                           line ends with at least one digit
{...}          number of repetitions                    \d{3,5}                        between 3-5 digits
\              escape                                   \[                             escape the [ char
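Several rows of the table can be tried out directly in Python, whose re module accepts the same (?P<name>...) spelling. This is a small sketch on a hypothetical input string, not a Splunk search.

```python
import re

# Named extraction with optional dashes, combining two rows of the table.
SSN = re.compile(r"(?P<ssn>\d\d\d-?\d\d-?\d\d\d\d)")

text = "employee ssn 123-45-6789 and legacy id 987654321"  # hypothetical input
found = [m.group("ssn") for m in SSN.finditer(text)]

# {3,5} repetition: between 3 and 5 digits, tested against whole strings.
three_to_five = [s for s in ["12", "123", "12345", "123456"]
                 if re.fullmatch(r"\d{3,5}", s)]

print(found, three_to_five)
```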
SEARCH EXAMPLES
Overview
Index-time Processing: Splunk reads data from a source, such as a file or port, on
a host (e.g. "my machine"), classifies that source into a sourcetype (e.g., "syslog",
"access_combined", "apache_error", ...), then extracts timestamps, breaks up the
source into individual events (e.g., log events, alerts, …), which can be a single line
or multiple lines, and writes each event into an index on disk, for later retrieval with
a search.
Search-time Processing: When a search starts, matching indexed events are
retrieved from disk, fields (e.g., code=404, user=david,...) are extracted from
the event's text, and the event is classified by matching against eventtype definitions
(e.g., 'error', 'login', ...). The events returned from a search can then be
powerfully transformed using Splunk's search language to generate reports that live
on dashboards.
Events
An event is a single entry of data. In the context of a log file, this is an event in a Web
activity log:
173.26.34.223 - - [01/Jul/2009:12:05:27 -0700] "GET /trade/app?action=logout HTTP/1.1" 200 2953
More specifically, an event is a set of values associated with a timestamp. While
many events are short and only take up a line or two, others can be long, such as a
whole text document, a config file, or whole java stack trace. Splunk uses line-
breaking rules to determine how it breaks these events up for display in the search
results.
Sources/Sourcetypes
A source is the name of the file, stream, or other input from which a particular event
originates, for example /var/log/messages or UDP:514. Sources are classified into
sourcetypes, which can either be well known, such as access_combined (HTTP Web
server logs), or can be created on the fly by Splunk when it sees a source with data
and formatting it hasn’t seen before. Events with the same sourcetype can come
from different sources: events from the file /var/log/messages and from a syslog
input on udp:514 can both have sourcetype=linux_syslog.
Hosts
A host is the name of the physical or virtual device where an event originates. Host
provides an easy way to find all data originating from a given device.
Indexes
When you add data to Splunk, Splunk processes it: it breaks the data into individual
events, timestamps them, and then stores them in an index, so that the data can be
searched and analyzed later. By default, data you feed to Splunk is stored in the "main"
index, but you can create and specify other indexes for Splunk to use for different
data inputs.
Fields
Fields are searchable name/value pairings in event data. As Splunk processes events
at index time and search time, it automatically extracts fields. At index time, Splunk
extracts a small set of default fields for each event, including host, source, and
sourcetype. At search time, Splunk extracts what can be a wide range of fields from
the event data, including user-defined patterns as well as obvious field name/value
pairs such as user_id=jdoe.
Tags
Tags are aliases to field values. For example, if there are two host names that refer
to the same computer, you could give both of those host values the same tag (e.g.,
"hal9000"), and then if you search for that tag (e.g., "hal9000"), Splunk will return
events involving both host name values.
Eventtypes
Eventtypes are cross-referenced searches that categorize events at search time.
For example, if you have defined an eventtype called "problem" that has a search
definition of "error OR warn OR fatal OR fail", any time you do a search where a result
contains error, warn, fatal, or fail, the event will have an eventtype field/value with
eventtype=problem. So, for example, if you were searching for "login", the logins
that had problems would get annotated with eventtype=problem. Eventtypes
are essentially dynamic tags that get attached to an event if it matches the search
definition of the eventtype.
Reports/Dashboards
Search results with formatting information (e.g., as a table or chart) are informally
referred to as reports, and multiple reports can be placed on a common page, called
a dashboard.
Apps
Apps are collections of Splunk configurations, objects, and code, allowing you to
build different environments that sit on top of Splunk. You can have one app for
troubleshooting email servers, one app for web analysis, and so on.
Permissions/Users/Roles
Saved Splunk objects, such as savedsearches, eventtypes, reports, and tags,
enrich your data, making it easier to search and understand. These objects have
permissions and can be kept private or shared with other users, via roles (e.g.,
"admin", "power", "user"). A role is a set of capabilities that you can define, like
whether or not someone is allowed to add data or edit a report. Splunk with a Free
License does not support user authentication.
Transactions
A transaction is a set of events grouped into one event for easier analysis. For
example, given that a customer shopping at an online store would generate web
access events with each click that each share a SessionID, it could be convenient to
group all of that customer's events together into one transaction. Once events are
grouped into one transaction event, it’s easier to generate statistics like how long
shoppers shopped, how many items they bought, which shoppers bought items and
then returned them, etc.
Forwarder/Indexer
A forwarder is a version of Splunk that allows you to send data to a central Splunk
indexer or group of indexers. An indexer provides indexing capability for local and
remote data.
CONCEPTS
Lookup Tables
Lookup the value of each event's 'user' field in the lookup table usertogroup, setting the event's 'group' field.
… | lookup usertogroup user output group
Write the search results to the lookup file "users.csv".
… | outputlookup users.csv
Read in the lookup file "users.csv" as search results.
… | inputlookup users.csv
Group Results
Cluster results together, sort by their "cluster_count" values, and then return the 20 largest clusters (in data size).
… | cluster t=0.9 showcount=true | sort limit=20 -cluster_count
Group results that have the same "host" and "cookie", occur within 30 seconds of each other, and do not have a pause greater than 5 seconds between each event into a transaction.
… | transaction host cookie maxspan=30s maxpause=5s
Group results with the same IP address (clientip) and where the first result contains "signon", and the last result contains "purchase".
… | transaction clientip startswith="signon" endswith="purchase"
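To make the maxspan and maxpause constraints more concrete, here is a simplified Python model of grouping events into transactions. It is a sketch under stated assumptions, not how Splunk implements the transaction command: the function name, the event dictionaries, and the use of plain seconds for time are all hypothetical.

```python
from itertools import groupby

def transactions(events, maxspan=30, maxpause=5):
    """Group events with the same host and cookie into lists (one per transaction)."""
    key = lambda e: (e["host"], e["cookie"])
    ordered = sorted(events, key=lambda e: (key(e), e["time"]))
    result = []
    for _, group in groupby(ordered, key=key):
        current = []
        for event in group:
            # Close the open transaction if adding this event would exceed
            # the total span or the allowed pause between events.
            too_long = current and event["time"] - current[0]["time"] > maxspan
            paused = current and event["time"] - current[-1]["time"] > maxpause
            if too_long or paused:
                result.append(current)
                current = []
            current.append(event)
        result.append(current)
    return result

# Hypothetical events; "time" is in seconds.
events = [
    {"time": 0, "host": "a", "cookie": "x"},
    {"time": 3, "host": "a", "cookie": "x"},
    {"time": 20, "host": "a", "cookie": "x"},  # 17 s pause, so a new transaction
    {"time": 4, "host": "b", "cookie": "y"},
]
grouped = transactions(events)
print([len(t) for t in grouped])
```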
Order Results
Return the rst 20 results. … | head 20
Reverse the order of a result set. … | reverse
Sort results by "ip" value (in ascending
order) and then by "url" value
(in descending order).
… | sort ip, -url
Return the last 20 results
(in reverse order). … | tail 20
Multi-Valued Fields
Combine the multiple values of the recipients field into a single value.
… | nomv recipients
Separate the values of the "recipients" field into multiple field values, displaying the top recipients.
… | makemv delim="," recipients | top recipients
Create new results for each value of the multivalue field "recipients".
… | mvexpand recipients
For each result that is identical except for that RecordNumber, combine them, setting RecordNumber to be a multi-valued field with all the varying values.
… | fields EventCode, Category, RecordNumber | mvcombine delim="," RecordNumber
Find the number of recipient values.
… | eval to_count = mvcount(recipients)
Find the first email address in the recipient field.
… | eval recipient_first = mvindex(recipient,0)
Find all recipient values that end in .net or .org.
… | eval netorg_recipients = mvfilter(match(recipient, "\.net$") OR match(recipient, "\.org$"))
Find the combination of the values of foo, "bar", and the values of baz.
… | eval newval = mvappend(foo, "bar", baz)
Find the index of the first recipient value that matches "\.org$".
… | eval orgindex = mvfind(recipient, "\.org$")
Reporting
Return events with uncommon values.
… | anomalousvalue action=filter pthresh=0.02
Return the maximum "delay" by "size", where "size" is broken down into a maximum of 10 equal sized buckets.
… | chart max(delay) by size bins=10
Return max(delay) for each value of foo split by the value of bar.
… | chart max(delay) over foo by bar
Return max(delay) for each value of foo.
… | chart max(delay) over foo
Remove all outlying numerical values.
… | outlier
Remove duplicates of results with the same "host" value and return the total count of the remaining results.
… | stats dc(host)
Return the average for each hour, of any unique field that ends with the string "lay" (e.g., delay, xdelay, relay, etc).
… | stats avg(*lay) by date_hour
Calculate the average value of "CPU" each minute for each "host".
… | timechart span=1m avg(CPU) by host
Create a timechart of the count of events from "web" sources by "host".
… | timechart count by host
Return the 20 most common values of the "url" field.
… | top limit=20 url
Return the least common values of the "url" field.
… | rare url
Modify Fields
Rename the "_ip" field as "IPAddress".
… | rename _ip as IPAddress
Change any host value that ends with "localhost" to "mylocalhost".
… | replace *localhost with mylocalhost in host
Filter Fields
Keep the "host" and "ip" fields, and display them in the order: "host", "ip".
… | fields + host, ip
Remove the "host" and "ip" fields.
… | fields - host, ip
REGULAR EXPRESSIONS (REGEXES)
Copyright © 2013 Splunk Inc. All rights reserved.
COMMON SPLUNK STRPTIME FORMATS
strptime formats are useful for eval functions strftime() and strptime(), and for timestamping of event data.
Time
%H    24 hour (leading zeros) (00 to 23)
%I    12 hour (leading zeros) (01 to 12)
%M    Minute (00 to 59)
%S    Second (00 to 61)
%N    subseconds with width (%3N = millisecs, %6N = microsecs, %9N = nanosecs)
%p    AM or PM
%Z    Time zone (EST)
%z    Time zone offset from UTC, in hour and minute: +hhmm or -hhmm. (-0500 for EST)
%s    Seconds since 1/1/1970 (1308677092)
Days
%d    Day of month (leading zeros) (01 to 31)
%j    Day of year (001 to 366)
%w    Weekday (0 to 6)
%a    Abbreviated weekday (Sun)
%A    Weekday (Sunday)
Months
%b    Abbreviated month name (Jan)
%B    Month name (January)
%m    Month number (01 to 12)
Years
%y    Year without century (00 to 99)
%Y    Year (2008)
Examples
%Y-%m-%d                 1998-12-31
%y-%m-%d                 98-12-31
%b %d, %Y                Jan 24, 2003
%B %d, %Y                January 24, 2003
%d %b '%y = %Y-%m-%d     25 Feb '03 = 2003-02-25
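Most of these directives also exist in Python's datetime module, which makes it easy to check a format string before using it in a search. The sketch below reproduces a few of the examples; it assumes the default C/English locale for month names. Not every directive carries over (for instance, Python has no %N and uses %f for microseconds).

```python
from datetime import datetime

stamp = datetime(1998, 12, 31, 23, 5, 9)

iso_date = stamp.strftime("%Y-%m-%d")      # four-digit year
short_date = stamp.strftime("%y-%m-%d")    # year without century
clock = stamp.strftime("%H:%M:%S")         # 24-hour clock with leading zeros

# Parse "25 Feb '03" with one format, then render it with another.
parsed = datetime.strptime("25 Feb '03", "%d %b '%y")
normalized = parsed.strftime("%Y-%m-%d")

print(iso_date, short_date, clock, normalized)
```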
Regular expressions are useful in multiple areas: search commands regex and rex;
eval functions match() and replace(); and in field extraction.
COMMON SEARCH COMMANDS
COMMAND          DESCRIPTION
chart/timechart  Returns results in a tabular output for (time-series) charting.
dedup            Removes subsequent results that match a specified criterion.
eval             Calculates an expression. (See EVAL FUNCTIONS table.)
fields           Removes fields from search results.
head/tail        Returns the first/last N results.
lookup           Adds field values from an external source.
rename           Renames a specified field; wildcards can be used to specify multiple fields.
replace          Replaces values of specified fields with a specified new value.
rex              Specifies regular expression named groups to extract fields.
search           Filters results to those that match the search expression.
sort             Sorts search results by the specified fields.
stats            Provides statistics, grouped optionally by fields.
top/rare         Displays the most/least common values of a field.
transaction      Groups search results into transactions.
FUNCTION            DESCRIPTION   EXAMPLES
searchmatch(X)      Returns true if the event matches the search string X.   Example: searchmatch("foo AND bar")
split(X,"Y")        Returns X as a multi-valued field, split by delimiter Y.   Example: split(foo, ";")
sqrt(X)             Returns the square root of X.   Example: sqrt(9)
strftime(X,Y)       Returns epochtime value X rendered using the format specified by Y.   Example: strftime(_time, "%H:%M")
strptime(X,Y)       Given a time represented by a string X, returns value parsed from format Y.   Example: strptime(timeStr, "%H:%M")
substr(X,Y,Z)       Returns a substring of field X from start position (1-based) Y for Z (optional) characters.   Example: substr("string", 1, 3) + substr("string", -3)
time()              Returns the wall-clock time with microsecond resolution.   Example: time()
tonumber(X,Y)       Converts input string X to a number, where Y (optional, defaults to 10) defines the base of the number to convert to.   Example: tonumber("0A4",16)
tostring(X,Y)       Returns a field value of X as a string. If the value of X is a number, it reformats it as a string; if a Boolean value, either "True" or "False". If X is a number, the second argument Y is optional and can either be "hex" (convert X to hexadecimal), "commas" (formats X with commas and 2 decimal places), or "duration" (converts seconds X to readable time format HH:MM:SS).   Example (returns foo=615 and foo2=00:10:15): … | eval foo=615 | eval foo2 = tostring(foo, "duration")
trim(X,Y)           Returns X with the characters in Y trimmed from both sides. If Y is not specified, spaces and tabs are trimmed.   Example: trim(" ZZZZabcZZ ", " Z")
typeof(X)           Returns a string representation of its type.   Example (returns "NumberStringBoolInvalid"): typeof(12) + typeof("string") + typeof(1==2) + typeof(badfield)
upper(X)            Returns the uppercase of X.   Example: upper(username)
urldecode(X)        Returns the URL X decoded.   Example: urldecode("http%3A%2F%2Fwww.splunk.com%2Fdownload%3Fr%3Dheader")
validate(X,Y,…)     Given pairs of arguments, Boolean expressions X and strings Y, returns the string Y corresponding to the first expression X that evaluates to False and defaults to NULL if all are True.   Example: validate(isint(port), "ERROR: Port is not an integer", port >= 1 AND port <= 65535, "ERROR: Port is out of range")
FUNCTION            DESCRIPTION   EXAMPLES
abs(X)              Returns the absolute value of X.   Example: abs(number)
case(X,"Y",…)       Takes pairs of arguments X and Y, where X arguments are Boolean expressions that, when evaluated to TRUE, return the corresponding Y argument.   Example: case(error == 404, "Not found", error == 500, "Internal Server Error", error == 200, "OK")
ceil(X)             Ceiling of a number X.   Example: ceil(1.9)
cidrmatch("X",Y)    Identifies IP addresses that belong to a particular subnet.   Example: cidrmatch("123.132.32.0/25",ip)
coalesce(X,…)       Returns the first value that is not null.   Example: coalesce(null(), "Returned val", null())
exact(X)            Evaluates an expression X using double precision floating point arithmetic.   Example: exact(3.14*num)
exp(X)              Returns e^X.   Example: exp(3)
floor(X)            Returns the floor of a number X.   Example: floor(1.9)
if(X,Y,Z)           If X evaluates to TRUE, the result is the second argument Y. If X evaluates to FALSE, the result evaluates to the third argument Z.   Example: if(error==200, "OK", "Error")
isbool(X)           Returns TRUE if X is Boolean.   Example: isbool(field)
isint(X)            Returns TRUE if X is an integer.   Example: isint(field)
isnotnull(X)        Returns TRUE if X is not NULL.   Example: isnotnull(field)
isnull(X)           Returns TRUE if X is NULL.   Example: isnull(field)
isnum(X)            Returns TRUE if X is a number.   Example: isnum(field)
isstr(X)            Returns TRUE if X is a string.   Example: isstr(field)
len(X)              Returns the character length of a string X.   Example: len(field)
like(X,"Y")         Returns TRUE if and only if X is like the SQLite pattern in Y.   Example: like(field, "foo%")
ln(X)               Returns the natural log of X.   Example: ln(bytes)
log(X,Y)            Returns the log of the first argument X using the second argument Y as the base. Y defaults to 10.   Example: log(number,2)
lower(X)            Returns the lowercase of X.   Example: lower(username)
ltrim(X,Y)          Returns X with the characters in Y trimmed from the left side. Y defaults to spaces and tabs.   Example: ltrim(" ZZZabcZZ ", " Z")
match(X,Y)          Returns TRUE if X matches the regex pattern Y.   Example: match(field, "^\d{1,3}\.\d$")
max(X,…)            Returns the max.   Example: max(delay, mydelay)
md5(X)              Returns the MD5 hash of a string value X.   Example: md5(field)
min(X,…)            Returns the min.   Example: min(delay, mydelay)
mvcount(X)          Returns the number of values of X.   Example: mvcount(multifield)
mvfilter(X)         Filters a multi-valued field based on the Boolean expression X.   Example: mvfilter(match(email, "net$"))
mvindex(X,Y,Z)      Returns a subset of the multivalued field X from start position (zero-based) Y to Z (optional).   Example: mvindex(multifield, 2)
mvjoin(X,Y)         Given a multi-valued field X and string delimiter Y, joins the individual values of X using Y.   Example: mvjoin(foo, ";")
now()               Returns the current time, represented in Unix time.   Example: now()
null()              Takes no arguments and returns NULL.   Example: null()
nullif(X,Y)         Given two arguments, fields X and Y, returns X if the arguments are different; returns NULL otherwise.   Example: nullif(fieldA, fieldB)
pi()                Returns the constant pi.   Example: pi()
pow(X,Y)            Returns X^Y.   Example: pow(2,10)
random()            Returns a pseudo-random number ranging from 0 to 2147483647.   Example: random()
relative_time(X,Y)  Given epochtime time X and relative time specifier Y, returns the epochtime value of Y applied to X.   Example: relative_time(now(),"-1d@d")
replace(X,Y,Z)      Returns a string formed by substituting string Z for every occurrence of regex string Y in string X.   Example (returns date with the month and day numbers switched, so if the input was 1/12/2009 the return value would be 12/1/2009): replace(date, "^(\d{1,2})/(\d{1,2})/", "\2/\1/")
round(X,Y)          Returns X rounded to the amount of decimal places specified by Y. The default is to round to an integer.   Example: round(3.5)
rtrim(X,Y)          Returns X with the characters in Y trimmed from the right side. If Y is not specified, spaces and tabs are trimmed.   Example: rtrim(" ZZZZabcZZ ", " Z")
SEARCH LANGUAGE
A search is a series of commands and arguments, each chained together with a "|"
(pipe) character that takes the output of one command and feeds it into the next
command on the right.
search-args | cmd1 cmd-args | cmd2 cmd-args | ...
Search commands are used to take indexed data and filter unwanted information,
extract more information, calculate values, transform, and statistically analyze. The
search results retrieved from the index can be thought of as a dynamically created
table. Each search command redefines the shape of that table. Each indexed event
is a row, with columns for each field value. Columns include basic information about
the data as well as columns that are dynamically extracted at search-time.
At the head of each search is an implied search-the-index-for-events command,
which can be used to search for keywords (e.g., error), boolean expressions
(e.g., (error OR failure) NOT success), phrases (e.g., "database error"),
wildcards (e.g., fail* will match fail, fails, failure, etc.), field values (e.g.,
code=404), inequality (e.g., code!=404 or code>200), and a field having any value
or no value (e.g., code=* or NOT code=*). For example, the search:
sourcetype="access_combined" error | top 10 uri
will retrieve indexed access_combined events from disk that contain the term
"error" (ANDs are implied between search terms), and then for those events,
report the top 10 most common URI values.
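One way to picture the pipeline is as a chain of functions, each taking a table of rows and returning a new table. The Python sketch below mimics the example search on a handful of hypothetical events. The search() and top() helpers are illustrative stand-ins, and the plain substring test is a simplification of how Splunk matches terms.

```python
from collections import Counter

# Hypothetical indexed events, one dict per row.
events = [
    {"sourcetype": "access_combined", "_raw": "GET /a error 500", "uri": "/a"},
    {"sourcetype": "access_combined", "_raw": "GET /b error 500", "uri": "/b"},
    {"sourcetype": "access_combined", "_raw": "GET /a error 404", "uri": "/a"},
    {"sourcetype": "access_combined", "_raw": "GET /c ok 200", "uri": "/c"},
    {"sourcetype": "syslog", "_raw": "disk error", "uri": None},
]

def search(rows, sourcetype, term):
    """Keep rows with the given sourcetype whose raw text contains the term (implied AND)."""
    return [r for r in rows if r["sourcetype"] == sourcetype and term in r["_raw"]]

def top(rows, limit, field):
    """Return the most common values of a field with their counts."""
    return Counter(r[field] for r in rows).most_common(limit)

# sourcetype="access_combined" error | top 10 uri
result = top(search(events, "access_combined", "error"), 10, "uri")
print(result)
```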
Subsearches
A subsearch is an argument to a command that runs its own search, returning those
results to the parent command as the argument value. Subsearches are contained
in square brackets. For example, finding all syslog events from the user that had the
last login error:
sourcetype=syslog [ search login error | return 1 user ]
Relative Time Modiers
Besides using the custom-time ranges in the user-interface, you can specify in
your search the time ranges of retrieved events with the latest and earliest
search modiers. The relative times are specied with a string of characters that
indicate amount of time (integer and unit) and, optionally, a "snap to" time unit:
[+|-]<time_integer><time_unit>@<snap_time_unit>
For example: "error earliest=-1d@d latest=-1h@h" will retrieve events
containing "error" that occurred from yesterday (snapped to midnight) to the
last hour (snapped to the hour).
Time Units: specied as second (s), minute(m), hour(h), day(d), week(w),
month(mon), quarter(q), year(y). "time_integer" defaults to 1 (e.g., "m" is the same as
"1m").
Snapping: indicates the nearest or latest time to which your time amount rounds
down. Snaps rounds down to the latest time not after the specied time. For
example, if it is 11:59:00 and you "snap to" hours (@h), you will snap to 11:00 not
12:00. You can "snap to" a specic day of the week: use @w0 for Sunday, @w1 for
Monday, etc.
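The offset-then-snap arithmetic can be sketched with Python's datetime module. The helper functions and the fixed "now" value are hypothetical; the point is only that the offset is applied first and the result is then rounded down, never up.

```python
from datetime import datetime, timedelta

def snap_to_hour(t):
    """Round down to the start of the hour (@h)."""
    return t.replace(minute=0, second=0, microsecond=0)

def snap_to_day(t):
    """Round down to midnight (@d)."""
    return t.replace(hour=0, minute=0, second=0, microsecond=0)

def snap_to_sunday(t):
    """Round down to midnight of the most recent Sunday (@w0)."""
    # datetime.weekday() counts Monday as 0, so Sunday is 6.
    days_since_sunday = (t.weekday() + 1) % 7
    return snap_to_day(t - timedelta(days=days_since_sunday))

now = datetime(2013, 6, 21, 11, 59, 0)  # hypothetical "now", a Friday

earliest = snap_to_day(now - timedelta(days=1))    # -1d@d
latest = snap_to_hour(now - timedelta(hours=1))    # -1h@h

print(earliest, latest, snap_to_hour(now), snap_to_sunday(now))
```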
Optimizing Searches
The key to fast searching is to limit the data that needs to be pulled off disk to an
absolute minimum, and then to filter that data as early as possible in the search so
that processing is done on the minimum data necessary.
Partition data into separate indexes, if you’ll rarely perform searches across multiple
types of data. For example, put web data in one index, and firewall data in another.
• Search as specifically as you can (e.g. fatal_error, not *error*)
• Limit the time range to only what’s needed (e.g., -1h not -1w)
• Filter out unneeded fields as soon as possible in the search.
• Filter out results as soon as possible before calculations.
• For report generating searches, use the Advanced Charting view, and
not the Flashtimeline view, which calculates timelines.
• On Flashtimeline, turn off 'Discover Fields' when not needed.
• Use summary indexes to pre-calculate commonly used values.
• Make sure your disk I/O is the fastest you have available.
EVAL FUNCTIONS
The eval command calculates an expression and puts the resulting value into a field (e.g. "...| eval force = mass * acceleration").
The EVAL FUNCTIONS tables list the functions eval understands, in addition to basic arithmetic operators (+ - * / %), string concatenation
(e.g., '...| eval name = last . ", " . first'), and boolean operations (AND OR NOT XOR < > <= >= != = == LIKE).
COMMON STATS FUNCTIONS
Common statistical functions used with the chart, stats, and timechart commands. Field names
can be wildcarded, so avg(*delay) might calculate the average of the delay and xdelay fields.
FUNCTION     DESCRIPTION
avg(X)       Returns the average of the values of field X.
count(X)     Returns the number of occurrences of the field X. To indicate a specific field value to match, format X as eval(field="value").
dc(X)        Returns the count of distinct values of the field X.
first(X)     Returns the first seen value of the field X. In general, the first seen value of the field is the chronologically most recent instance of the field.
last(X)      Returns the last seen value of the field X.
list(X)      Returns the list of all values of the field X as a multi-value entry. The order of the values reflects the order of input events.
max(X)       Returns the maximum value of the field X. If the values of X are non-numeric, the max is found from lexicographic ordering.
median(X)    Returns the middle-most value of the field X.
min(X)       Returns the minimum value of the field X. If the values of X are non-numeric, the min is found from lexicographic ordering.
mode(X)      Returns the most frequent value of the field X.
perc<X>(Y)   Returns the X-th percentile value of the field Y. For example, perc5(total) returns the 5th percentile value of a field "total".
range(X)     Returns the difference between the max and min values of the field X.
stdev(X)     Returns the sample standard deviation of the field X.
stdevp(X)    Returns the population standard deviation of the field X.
sum(X)       Returns the sum of the values of the field X.
sumsq(X)     Returns the sum of the squares of the values of the field X.
values(X)    Returns the list of all distinct values of the field X as a multi-value entry. The order of the values is lexicographical.
var(X)       Returns the sample variance of the field X.
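The difference between the sample and population variants (stdev vs. stdevp) is easiest to see on a small data set. The Python sketch below computes several of the listed statistics with the standard library; the delay values are hypothetical, and the dictionary keys simply borrow the Splunk function names.

```python
import statistics

delay = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values of a "delay" field

summary = {
    "avg": statistics.mean(delay),
    "dc": len(set(delay)),                # count of distinct values
    "range": max(delay) - min(delay),
    "stdev": statistics.stdev(delay),     # sample: divides by n - 1
    "stdevp": statistics.pstdev(delay),   # population: divides by n
    "sum": sum(delay),
    "sumsq": sum(v * v for v in delay),
    "var": statistics.variance(delay),    # sample variance
}

for name, value in summary.items():
    print(name, value)
```

With these values the population standard deviation is exactly 2, while the sample standard deviation is slightly larger because it divides by n - 1.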
EVAL FUNCTIONS (continued)