2 Notational Conventions and Generic Grammar
All of the mechanisms specified in this document are described in
both prose and an augmented Backus-Naur Form (BNF) similar to that
used by RFC 822 [9]. Implementors will need to be familiar with the
notation in order to understand this specification. The augmented BNF
includes the following constructs:
name = definition
The name of a rule is simply the name itself (without any
enclosing "<" and ">") and is separated from its definition by the
equal "=" character. White space is only significant in that
indentation of continuation lines is used to indicate a rule
definition that spans more than one line. Certain basic rules are
in uppercase, such as SP, LWS, HT, CRLF, DIGIT, ALPHA, etc. Angle
brackets are used within definitions whenever their presence will
facilitate discerning the use of rule names.
"literal"
Quotation marks surround literal text.
Unless stated otherwise,
the text is case-insensitive.
rule1 | rule2
Elements separated by a bar ("|") are alternatives, e.g., "yes |
no" will accept yes or no.
(rule1 rule2)
Elements enclosed in parentheses are treated as a single element.
Thus, "(elem (foo | bar) elem)" allows the token sequences "elem
foo elem" and "elem bar elem".
*rule
The character "*" preceding an element indicates repetition. The
full form is "<n>*<m>element" indicating at least <n> and at most
<m> occurrences of element. Default values are 0 and infinity so
that "*(element)" allows any number, including zero; "1*element"
requires at least one; and "1*2element" allows one or two.
[rule]
Square brackets enclose optional elements; "[foo bar]" is
equivalent to "*1(foo bar)".
Fielding, et al. Standards Track [Page 14]
RFC 2616 HTTP/1.1 June 1999
N rule
Specific repetition: "<n>(element)" is equivalent to
"<n>*<n>(element)"; that is, exactly <n> occurrences of (element).
Thus 2DIGIT is a 2-digit number, and 3ALPHA is a string of three
alphabetic characters.
#rule
A construct "#" is defined, similar to "*", for defining lists of
elements. The full form is "<n>#<m>element" indicating at least
<n> and at most <m> elements, each separated by one or more commas
(",") and OPTIONAL linear white space (LWS). This makes the usual
form of lists very easy; a rule such as
   ( *LWS element *( *LWS "," *LWS element ))
can be shown as
   1#element
Wherever this construct is used, null elements are allowed, but do
not contribute to the count of elements present. That is,
"(element), , (element) " is permitted, but counts as only two
elements. Therefore, where at least one element is required, at
least one non-null element MUST be present. Default values are 0
and infinity so that "#element" allows any number, including zero;
"1#element" requires at least one; and "1#2element" allows one or
two.
; comment
A semi-colon, set off some distance to the right of rule text,
starts a comment that continues to the end of line. This is a
simple way of including useful notes in parallel with the
specifications.
implied *LWS
The grammar described by this specification is word-based. Except
where noted otherwise, linear white space (LWS) can be included
between any two adjacent words (token or quoted-string), and
between adjacent words and separators, without changing the
interpretation of a field. At least one delimiter (LWS and/or
separators) MUST exist between any two tokens (see the definition
of "token" below), since they would otherwise be interpreted as a
single token.
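
As an illustration of the "#rule" list construct described above, the
following is a minimal Python sketch, not a normative algorithm. The
function name parse_list and its minimum/maximum parameters are
hypothetical, standing in for <n> and <m>. It assumes that elements
never contain commas, so a comma inside a quoted-string would be split
incorrectly.

```python
def parse_list(value, minimum=0, maximum=None):
    """Split a field value written with the "<n>#<m>element" construct.

    Null elements (nothing but LWS between commas) are dropped and do
    not count toward the bounds. Assumption: no element contains a comma.
    """
    # Strip optional LWS (SP, HT, and folded CRLF) around each element.
    parts = [part.strip(" \t\r\n") for part in value.split(",")]
    elements = [part for part in parts if part]  # discard null elements
    if len(elements) < minimum:
        raise ValueError("fewer than %d non-null elements" % minimum)
    if maximum is not None and len(elements) > maximum:
        raise ValueError("more than %d elements" % maximum)
    return elements
```

With this sketch, a value such as "gzip, , deflate " yields two
elements, and a value consisting only of commas fails a "1#element"
check because it has no non-null element.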
The following rules are used throughout this specification to
describe basic parsing constructs. The US-ASCII coded character set
is defined by ANSI X3.4-1986 [21].
OCTET   = <any 8-bit sequence of data>
CHAR    = <any US-ASCII character (octets 0 - 127)>
UPALPHA = <any US-ASCII uppercase letter "A".."Z">
LOALPHA = <any US-ASCII lowercase letter "a".."z">
ALPHA   = UPALPHA | LOALPHA
DIGIT   = <any US-ASCII digit "0".."9">
CTL     = <any US-ASCII control character
          (octets 0 - 31) and DEL (127)>
CR      = <US-ASCII CR, carriage return (13)>
LF      = <US-ASCII LF, linefeed (10)>
SP      = <US-ASCII SP, space (32)>
HT      = <US-ASCII HT, horizontal-tab (9)>
<">     = <US-ASCII double-quote mark (34)>
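
The basic rules above map directly onto octet ranges. The following
Python sketch shows one possible encoding as predicates over integer
octet values; the function and constant names are illustrative
assumptions and are not defined by this specification.

```python
# Octet values for the single-character rules, taken from the table above.
CR, LF, SP, HT, DQUOTE = 13, 10, 32, 9, 34


def is_char(octet):
    """CHAR: any US-ASCII character (octets 0 - 127)."""
    return 0 <= octet <= 127


def is_upalpha(octet):
    return ord("A") <= octet <= ord("Z")


def is_loalpha(octet):
    return ord("a") <= octet <= ord("z")


def is_alpha(octet):
    """ALPHA = UPALPHA | LOALPHA"""
    return is_upalpha(octet) or is_loalpha(octet)


def is_digit(octet):
    return ord("0") <= octet <= ord("9")


def is_ctl(octet):
    """CTL: octets 0 - 31 and DEL (127)."""
    return 0 <= octet <= 31 or octet == 127
```

Working on integer octets rather than decoded text keeps the checks
independent of any character-set decoding applied to the message.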
HTTP/1.1 defines the sequence CR LF as the end-of-line marker for all
protocol elements except the entity-body (see appendix 19.3 for
tolerant applications). The end-of-line marker within an entity-body
is defined by its associated media type, as described in section 3.7.
CRLF = CR LF
HTTP/1.1 header field values can be folded onto multiple lines if the
continuation line begins with a space or horizontal tab. All linear
white space, including folding, has the same semantics as SP. A
recipient MAY replace any linear white space with a single SP before
interpreting the field value or forwarding the message downstream.
LWS = [CRLF] 1*( SP | HT )
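
The replacement of linear white space by a single SP, permitted above,
can be sketched in Python as follows. This is a minimal illustration;
the names LWS_PATTERN and normalize_lws are assumptions introduced
here, and the pattern is a direct transcription of the LWS rule.

```python
import re

# LWS = [CRLF] 1*( SP | HT )
LWS_PATTERN = re.compile(r"(?:\r\n)?[ \t]+")


def normalize_lws(field_value):
    """Replace each run of LWS, including line folding, with one SP."""
    return LWS_PATTERN.sub(" ", field_value)
```

For example, a value folded after a comma with CRLF, HT and SP is
rewritten so that the fold collapses to a single space.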
The TEXT rule is only used for descriptive field contents and values
that are not intended to be interpreted by the message parser. Words
of *TEXT MAY contain characters from character sets other than
ISO-8859-1 [22] only when encoded according to the rules of RFC 2047
[14].
TEXT = <any OCTET except CTLs,
       but including LWS>
A CRLF is allowed in the definition of TEXT only as part of a header
field continuation. It is expected that the folding LWS will be
replaced with a single SP before interpretation of the TEXT value.
Hexadecimal numeric characters are used in
several protocol elements.
HEX = "A" | "B" | "C" | "D" | "E" | "F"
    | "a" | "b" | "c" | "d" | "e" | "f" | DIGIT
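
As a small illustration of the HEX rule, the Python sketch below checks
that a string consists of one or more HEX characters. The names
HEX_CHARS and is_hex are hypothetical. An explicit character set is
used because Python's int(text, 16) also accepts forms that HEX does
not, such as a "0x" prefix, underscores, or surrounding white space.

```python
# Exactly the characters admitted by HEX: "A".."F", "a".."f", and DIGIT.
HEX_CHARS = frozenset("0123456789ABCDEFabcdef")


def is_hex(text):
    """Return True if text matches 1*HEX."""
    return len(text) > 0 and all(ch in HEX_CHARS for ch in text)
```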
Many HTTP/1.1 header field values consist of words separated by LWS
or special characters. These special characters MUST be in a quoted
string to be used within a parameter value (as defined in section
3.6).
token      = 1*<any CHAR except CTLs or separators>
separators = "(" | ")" | "<" | ">" | "@"
           | "," | ";" | ":" | "\" | <">
           | "/" | "[" | "]" | "?" | "="
           | "{" | "}" | SP | HT
Comments can be included in some HTTP header fields by surrounding
the comment text with parentheses. Comments are only allowed in
fields containing "comment" as part of their field value definition.
In all other fields, parentheses are considered part of the field
value.
comment = "(" *( ctext | quoted-pair | comment ) ")"
ctext   = <any TEXT excluding "(" and ")">
A string of text is parsed as a single word if it is quoted using
double-quote marks.
quoted-string = ( <"> *(qdtext | quoted-pair ) <"> )
qdtext        = <any TEXT except <">>
The backslash character ("\") MAY be used as a single-character
quoting mechanism only within quoted-string and comment constructs.
quoted-pair = "\" CHAR
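
The token, quoted-string, comment, and quoted-pair rules above can be
sketched in Python as follows. This is an illustrative sketch under
stated assumptions, not a complete parser: the function names are
hypothetical, input is assumed to be already decoded into a str, and
the qdtext and ctext checks (exclusion of CTLs) are omitted for
brevity.

```python
# separators, transcribed from the rule above (19 characters incl. SP, HT).
SEPARATORS = frozenset('()<>@,;:\\"/[]?={} \t')


def is_token(text):
    """token = 1*<any CHAR except CTLs or separators>"""
    return bool(text) and all(
        32 <= ord(ch) < 127 and ch not in SEPARATORS for ch in text
    )


def parse_quoted_string(text, start=0):
    """Return (unquoted value, index just past the closing quote)."""
    if text[start:start + 1] != '"':
        raise ValueError("quoted-string must start with a double quote")
    out = []
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch == '"':
            return "".join(out), i + 1
        if ch == "\\":
            # quoted-pair = "\" CHAR: keep the escaped character as-is.
            if i + 1 >= len(text) or ord(text[i + 1]) > 127:
                raise ValueError("backslash must be followed by a CHAR")
            out.append(text[i + 1])
            i += 2
        else:
            out.append(ch)
            i += 1
    raise ValueError("unterminated quoted-string")


def skip_comment(text, start=0):
    """Return the index just past the comment opening at text[start]."""
    if text[start:start + 1] != "(":
        raise ValueError("comment must start with '('")
    depth = 0
    i = start
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            if i + 1 >= len(text):
                raise ValueError("dangling backslash in comment")
            i += 2  # a quoted-pair never opens or closes a comment
            continue
        if ch == "(":
            depth += 1  # comments nest
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise ValueError("unterminated comment")
```

The depth counter reflects the recursive "comment" alternative in the
comment rule, and the backslash handling in both functions follows the
restriction that quoted-pair applies only within quoted-string and
comment constructs.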