-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathgrammar.tex
More file actions
430 lines (384 loc) · 17.6 KB
/
grammar.tex
File metadata and controls
430 lines (384 loc) · 17.6 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
%!TEX root = reference.tex
\chapter{Grammar}
\label{Grammar}
The grammar of \Sr is based on a notation which makes extensibility easier to achieve. Thus, at the core, the grammar is very simple and straightforward -- it is based on an \emph{operator precedence grammar}.
\begin{aside}
This choice gives us two key benefits: it is simple to understand and it is simple to extend. The latter is particularly important in \Sr as a significant part of its functionality is derived from \emph{profiles} which are similar to macros.
However, it also makes certain other aspects more challenging. In particular, an operator precedence grammar 'knows less' about the program as it is parsed. This means that syntax errors are liable to less informative.
\end{aside}
\section{Operator Precedence Grammar}
An operator grammar allows us to write expressions like:
\begin{lstlisting}
X * Y + X / Y
\end{lstlisting}
and to know that this means the equivalent of:
\begin{lstlisting}
(X * Y) + (X / Y)
\end{lstlisting}
or more specifically:
\begin{lstlisting}
+(*(X, Y), /(X, Y))
\end{lstlisting}
Operator precedence grammars are often used to capture arithm\-etic-style expressions. In \Sr we extend the concept to cover the entire language.
For example, an equation such as:
\begin{lstlisting}
fun double(X) is X*X
\end{lstlisting}
can be interpreted -- by treating \q{fun} and \q{is} as operators -- as:
\begin{lstlisting}
fun(is(double(X),*(X,X)))
\end{lstlisting}
Of course, this is merely a \emph{parse} of the equation. The real task of the compiler is to interpret this abstract syntax as an equation rather than as an attempt to apply the \q{is} function.
%The entire operator precedence grammar (not including tokenization) is very succinct, as can be seen in Program~\vref{srOpPrecGrammar}.
\section{Standard Operators}
\label{standardOperators}
A key input to the grammar is the table of operators. \Sr starts with a number of standard operators, but this can be extended via the use of \emph{package} extensions to the language (see Chapter~\vref{packages}).
The standard operators that are part of the core language and the base extensions are listed in Table~\vref{StandardOps}. Operators in this table are listed in order of priority. Together with a priority, operators can also be considered to \q{prefix}, \q{infix}, \q{postfix}, or some combination of the three.
The priority of an operator is the indication of the `importance' of the operator: the higher the priority the nearer the top of the abstract syntax tree the corresponding structure will be. Priorities are numbers in the range 1..2000; by convention, priorities in the range 1..899 refer to entities that normally take the role of expressions, priorities in the range 900..1000 refer to predicates and predicate-level connectives and priorities in the range 1001..2000 refer to entries that have a statement or program level interpretation.
\begin{longtable}{|lll|lll|}
\caption{Standard Operators}\label{StandardOps}\\
\hline
Operator&Priority&Assoc.&Operator&Priority&Assoc.\\
\hline
\endfirsthead
\multicolumn{6}{c}{
{Table \ref{StandardOps} Standard Operators (cont.)}}\\
\hline
Operator&Priority&Assoc.&Operator&Priority&Assoc.\\
\hline
\endhead
\hline\multicolumn{6}{r}{\small\emph{continued\ldots}}\
\endfoot
\hline
\endlastfoot
\tt ;&2000&right&\tt ;&2000&postfix\\
\tt \#&1350&prefix&\tt :-&1347&infix\\
\tt -->&1347&infix&\tt ==>&1347&infix\\
\tt :|&1345&right&\tt :\&&1344&right\\
\tt :!&1343&prefix&\tt \#\#&1342&infix\\
\tt ::&1341&infix&\tt |*&1340&infix\\
\tt :*&1340&infix&\tt :+&1340&infix\\
\tt ;*&1340&infix&\tt private&1320&assoc prefix\\
\tt def&1300&prefix&\tt java&1300&prefix\\
\tt var&1300&prefix&\tt contract&1300&prefix\\
\tt fun&1300&prefix&\tt open&1300&prefix\\
\tt prc&1300&prefix&\tt on&1300&prefix\\
\tt implementation&1300&prefix&\tt ptn&1300&prefix\\
\tt case&1290&prefix&\tt |&1290&right\\
\tt type&1250&prefix&\tt counts\spce{}as&1200&infix\\
\tt else&1200&right&\tt is&1200&infix\\
\tt do&1200&right&\tt remove&1200&prefix\\
\tt then&1180&infix&\tt if&1175&prefix\\
\tt while&1175&prefix&\tt for&1175&prefix\\
\tt to&1130&infix&\tt from&1130&infix\\
\tt :=&1120&infix&\tt perform&1120&prefix\\
\tt merge&1100&prefix&\tt extend&1100&prefix\\
\tt notify&1100&prefix&\tt default&1100&postfix\\
\tt assert&1100&prefix&\tt valis&1100&prefix\\
\tt try&1100&prefix&\tt update&1100&prefix\\
\tt delete&1100&prefix&\tt ignore&1100&prefix\\
\tt //&1100&infix&\tt ,..&1099&infix\\
\tt ..,&1098&infix&\tt catch&1050&infix\\
\tt request&1050&prefix&\tt with&1050&infix\\
\tt switch&1020&prefix&\tt has\spce{}type&1020&infix\\
\tt such\spce{}that&1010&right&\tt exists&1005&assoc prefix\\
\tt for\spce{}all&1005&assoc prefix&\tt ,&1000&right\\
\tt default&1000&infix&\tt raise&1000&prefix\\
\tt query&1000&prefix&\tt import&1000&prefix\\
\tt memo&999&prefix&\tt without&999&prefix\\
\tt computation&999&infix&\tt ./&999&infix\\
\tt with&999&prefix&\tt :&960&right\\
\tt :&960&postfix&\tt group\spce{}by&960&left\\
\tt when&950&prefix&\tt ?&950&right\\
\tt \tlda{}&950&right&\tt order\spce{}descending\spce{}by&950&infix\\
\tt waitfor&950&prefix&\tt spawn&950&prefix\\
\tt order\spce{}by&950&infix&\tt descending\spce{}by&950&infix\\
\tt where&940&infix&\tt any\spce{}of&935&prefix\\
\tt all&935&prefix&\tt otherwise&930&right\\
\tt or&930&right&\tt implies&920&right\\
\tt and&920&right&\tt =>&910&right\\
\tt \$=>&910&right&\tt not&910&prefix\\
\tt <=>&910&right&\tt let&909&prefix\\
\tt using&908&left&\tt in&908&infix\\
\tt 's&907&infix&\tt 'n&906&right\\
\tt or\spce{}else&900&right&\tt matches&900&infix\\
\tt !=&900&infix&\tt <&900&infix\\
\tt =&900&infix&\tt >&900&infix\\
\tt substitute&900&infix&\tt <=&900&infix\\
\tt ref&900&prefix&\tt =<&900&infix\\
\tt kind&900&prefix&\tt is\spce{}tuple&900&postfix\\
\tt ->&900&infix&\tt has\spce{}kind&900&infix\\
\tt bound\spce{}to&900&infix&\tt has\spce{}value&900&infix\\
\tt >=&900&infix&\tt implements&900&right\\
\tt fn&900&prefix&\tt on&900&infix\\
\tt over&900&right&\tt instance\spce{}of&900&infix\\
\tt determines&895&infix&\tt ++&850&right\\
\tt of&840&right&\tt reduction&830&prefix\\
\tt matching&800&infix&\tt +&720&left\\
\tt -&720&left&\tt \%&700&left\\
\tt *&700&left&\tt /&700&left\\
\tt **&650&infix&\tt valof&500&prefix\\
\tt on\spce{}abort&475&infix&\tt cast&420&infix\\
\tt as&420&infix&\tt unique&400&prefix\\
\tt @@&200&infix&\tt @&200&infix\\
\tt \#@&200&infix&\tt .&175&left\\
\tt ?.&175&left&\tt !&150&prefix\\
\tt +&100&prefix&\tt -&100&prefix\\
\tt \$&75&prefix&\tt \%&75&prefix\\
\tt ?&75&prefix&\tt \%\%&75&prefix\\
\tt \#\$&50&prefix&\tt \#*&50&prefix\\
\tt \#+&50&right&\tt \#:&50&prefix\\
\tt \$\$&50&prefix&\tt \$\$&50&right\\
\tt \#\tlda{}&50&prefix&&&\\
\end{longtable}
\section{Defining new Operators}
\label{newOperator}
\index{defining new operators}
\index{operators!defining new}
\begin{figure}[htbp]
\begin{eqnarray*}
\ntDef{OperatorDeclaration}&\arrow&\q{\#} [\q{force}]\ ( \ntRef{PrefixOperator}\
\choice\ \ntRef{InfixOperator}\
\choice\ \ntRef{PostfixOperator}\\
&\choice&\ntRef{BracketDeclaration})\\
\ntDef{PrefixOperator}&\arrow&\q{prefix(}\ntRef{OperatorName}\q{,}\ntRef{Integer}[\q{,}\ntRef{Integer}]\q{)}\\
&\choice&\q{prefixAssoc(}\ntRef{OperatorName}\q{,}\ntRef{Integer}[\q{,}\ntRef{Integer}]\q{)}\\
\ntDef{InfixOperator}&\arrow&\q{left(}\ntRef{OperatorName}\q{,}\ntRef{Integer}[\q{,}\ntRef{Integer}]\q{)}\\
&\choice&\q{infix(}\ntRef{OperatorName}\q{,}\ntRef{Integer}[\q{,}\ntRef{Integer}]\q{)}\\
&\choice&\q{right(}\ntRef{OperatorName}\q{,}\ntRef{Integer}[\q{,}\ntRef{Integer}]\q{)}\\
\ntDef{PostfixOperator}&\arrow&\q{postfix(}\ntRef{OperatorName}\q{,}\ntRef{Integer}[\q{,}\ntRef{Integer}]\q{)}\\
&\choice&\q{postfixAssoc(}\ntRef{OperatorName}\q{,}\ntRef{Integer}[\q{,}\ntRef{Integer}]\q{)}\\
\ntDef{BracketDeclaration}&\arrow&\q{pair(}\ntRef{OperatorName}\q{,}\ntRef{OperatorName}\q{,}\ntRef{Integer}\q{)}\\
\ntDef{OperatorName}&\arrow&\ntRef{StringLiteral}\\
\end{eqnarray*}
\caption{Operator Declaration}
\label{operatorDeclarationFig}
\end{figure}
A new operator is defined using an operator definition statements:
\begin{description}
\item[\tt infix]
A statement of the form:
\begin{lstlisting}
#infix("myOp",730);
\end{lstlisting}
defines the operator \q{myOp} to be an infix operator, with priority 730.
\begin{aside}
Defining an operator does \emph{not} define anything about its semantics -- except that in the case of an \q{infix} operator, it has two arguments.
\end{aside}
\item[\tt left]
A statement of the form:
\begin{lstlisting}
#left("lftOp",730);
\end{lstlisting}
defines the operator \q{lftOp} to be a left-associative infix operator, with priority 730. That means that expression such as
\begin{lstlisting}
A lftOp B lftOp C lftOp D
\end{lstlisting}
will be parsed as though written:
\begin{lstlisting}
((A lftOp B) lftOp C) lftOp D
\end{lstlisting}
\item[\tt right]
A statement of the form:
\begin{lstlisting}
#right("rgtOp",730);
\end{lstlisting}
defines the operator \q{rgtOp} to be a right associate infix operator, with priority 730. Exressions such as
\begin{lstlisting}
A rgtOp B rgtOp C rgtOp D
\end{lstlisting}
will be parsed as though written:
\begin{lstlisting}
(A rgtOp (B rgtOp (C rgtOp D)))
\end{lstlisting}
\item[\tt prefix]
A statement of the form:
\begin{lstlisting}
#prefix("prOp",730);
\end{lstlisting}
defines the operator \q{prOp} to be a prefix operator, with priority 730.
\item[\tt prefixAssoc]
A statement of the form:
\begin{lstlisting}
#prefixAssoc("prOp",730);
\end{lstlisting}
defines the operator \q{prOp} to be an \emph{associative} prefix operator, with priority 730. That means that expressions such as:
\begin{lstlisting}
prOp prOp prOp A
\end{lstlisting}
are permitted, and have interpretation:
\begin{lstlisting}
(prOp (prOp (prOp A)))
\end{lstlisting}
\item[\tt postfix]
A statement of the form:
\begin{lstlisting}
#postfix("psOp",730);
\end{lstlisting}
defines the operator \q{psOp} to be a postfix operator, with priority 730.
\item[\tt postfixAssoc]
A statement of the form:
\begin{lstlisting}
#postfixAssoc("psOp",730);
\end{lstlisting}
defines the operator \q{psOp} to be an \emph{associative} postfix operator, with priority 730. That means that expressions such as:
\begin{lstlisting}
A psOp psOp psOp
\end{lstlisting}
are permitted, and have interpretation:
\begin{lstlisting}
(((A psOp) psOp) psOp)
\end{lstlisting}
\end{description}
An operator declaration may only take place in a \q{package} body. Its scope is from the declaration statement to the end of the \q{package} body. In the latter case, when a \q{profile} is imported via the \q{use} statement, any operator definitions are also made available to the importing context.
\subsection{Forced Operator Declaration}
Normally, any attempt to re-declare an operator will result in a syntax error being raised. However, there may be situations where it is important to be able to change an existing operator declaration.
\begin{aside}
Note that a given symbol may be defined as a prefix operator, an infix operator and a postfix operator. Each of these are treated separately.
\end{aside}
The \q{force} directive is used in this situation:
\begin{lstlisting}
#force(infix("as",200));
\end{lstlisting}
has the effect of \emph{changing} the existing operator priority of the \q{as} operator as an infix operator to 200 -- whatever its previous priority was.
\subsection{Symbolic Operators}
\label{symbolicOperators}
An operator may consist of a single \ntRef{Identifier}, a sequence of \ntRef{Identifier}s or it may consist of a \ntRef{StringLiteral} containing a sequence of so-called symbolic characters. In this form, the first character of the operator may not be a digit or a letter. In addition, none of the characters may be a space or other white-space character.
However, other than these constraints the characters in the operator declaration may be any legal unicode character.
\begin{aside}
For the sake of programmers' sanity we strongly suggest not using characters that overlap with other categories. For example, do not include a parenthesis in the operator name.
\end{aside}
For example, the declaration:
\begin{lstlisting}
#infix("**",700)
\end{lstlisting}
declares \q{**} to be a new infix operator.
The lexical analyzer is able to incorporate the newly declared operator as a distinct token. Thus, for example, with the \q{**} declaration above, \q{**} becomes a distinct token to the normal symbol for multiplication.
\subsection{Multi-word Operators}
\label{multiWordOperators}
\index{multi-word operators}
\index{operators!defining multi-word operators}
A multi-word operator defines a new \ntRef{MultiWordIdentifier}; i.e., a special combination of alpha numeric words that form a single logical identifier.
Multi-word operators are defined like regular operators, except that their names contain spaces. For example, the operator declaration:
\begin{lstlisting}
#infix("no more",500);
\end{lstlisting}
defines the combination of words ``\q{no}'' followed by ``\q{more}'' as a single operator of priority 500.
A multi-word operator is only an operator when all of its constituent words are present. If one or more of the constituent words are not present (or have other tokens intervening) then the sequence is not interpreted as a single operator but is parsed separately. For example, in the text:
\begin{lstlisting}
5 no more 10
\end{lstlisting}
is interpreted as the equivalent of:
\begin{alltt}
no\spce{}more(5,10)
\end{alltt}
but the text
\begin{lstlisting}
5 no (more) 10
\end{lstlisting}
is not, and, in this case, will be reported as a syntax error.
\begin{aside}
It \emph{is} permissible to interpose comments between the words of a multi-word operator. Thus:
\begin{lstlisting}
5 no /* way */ more 10
\end{lstlisting}
is legal.
\end{aside}
\begin{aside}
A given word can be an operator in its own right, as well as participating in a multi-word operator. The combination may have different priorities to the individual pieces.
For example, in the standard operator declarations:
\begin{lstlisting}
#prefix("type",1250);
#infix("has type",1020);
\end{lstlisting}
the word \q{type} is a prefix operator when it appears by itself, and is part of the infix operator \q{has\spce{}type} in combination.
\end{aside}
\subsection{Minimum Priorities}
In some circumstances, it becomes important to control the extent to which a name is interpreted as an operator. Recall that the priority of an operator defines not only the circumstances in which it occurs legally but also the expected priorities of terms on the left or right (depending on the form of the operator).
When an operator is defined, it is possible to also specify a \emph{minimum} priority as well as a maximum priority for the operator. Specifying a minimum priority for an operator has the effect of suppressing the operator definition when the expected priority of a fragment is lower than the minimum.
For example, the \q{type} standard operator is defined to be a prefix operator with priority 1250 and a minimum priority of 1200:
\begin{lstlisting}
#prefix("type",1250,1200)
\end{lstlisting}
This means that \q{type} is an operator of priority 1250 -- unless the expected priority is less than 1200 in which case it is not an operator.
Thus, in the fragment:
\begin{lstlisting}
type foo has kind type
\end{lstlisting}
the first occurrence of \q{type} is as a prefix operator, but the second occurrence is as a simple identifier -- because the priority of \q{kind} is 900 which is lower than \q{type}'s minimum priority.
By default, the minimum priority of an operator is zero, which means that it is always an operator.
\begin{aside}
Setting a minimum priority on an operator should be done sparingly.
\end{aside}
\subsection{Bracketing pairs}
The \Sr grammar also permits a special feature that may be used to support language extensions -- \emph{defined bracket pairs}.
A regular bracket pair is a pair of tokens such as \q{()} which are used to 'protect' expressions where there may be an operator precedence clash -- the classic example being
\begin{lstlisting}
(2+3)*4
\end{lstlisting}
which has a different meaning to
\begin{lstlisting}
2+3*5
\end{lstlisting}
Declaring bracket operators allows new forms of syntax. For example, the statement:
\begin{lstlisting}
#pair("begin","end",2000)
\end{lstlisting}
can be used to all programmers to use Algol-style \q{begin}\ldots\q{end} statements.
Program~\vref{beginEndProgram} shows a collection of macro definitions
that permits a pascal-style form of procedure definition, such as:
\begin{lstlisting}[language=Pascal]
procedure iFact(N)
var F := 1;
var Ix := 1;
while Ix < N do
begin
F := F*Ix;
Ix := Ix+1;
end;
return F;
end;
\end{lstlisting}
\begin{program}
\begin{lstlisting}
#prefix("return",1200);
#pair("procedure","end",2000);
#pair("begin","end",2000);
#procedure ?Tmpl ; ?body end :: statement :- body;*action;
#begin ?B end :: action :- B;*action;
#begin end :: action;
#procedure ?Tmpl ; ?body./#(return ?E)# end ==>
fun Tmpl is valof {body./#(valis E)#};
#procedure ?Tmpl ; ?body end ==> prc Tmpl do body ;
#begin ?B end ==> {B};
#begin end ==> {};
\end{lstlisting}
\caption{Macros that implement Pascal-style programs\label{beginEndProgram}}
\end{program}
\begin{longtable}{|llll|}
\caption{Standard Brackets}\label{StandardBrackets}\\
\hline
Begin&End&Inner Priority&Description\\
\hline
\endfirsthead
\multicolumn{4}{c}{
{Table \ref{StandardBrackets} Standard Brackets (cont.)}}\\
\hline
Begin&End&InnerPriority&Description\\
\hline
\endhead
\hline\multicolumn{4}{r}{\small\emph{continued\ldots}}\
\endfoot
\hline
\endlastfoot
\tt (&\tt)&1200&expression\\
\tt \{&\tt\}&2000&brace expression\\
\tt [&\tt]&1000&index expression\\
\tt \#(&\tt)\#&2000&meta parentheses\\
\tt <|&\tt |>&2000"ing parentheses\\
\tt \#<&\tt >\#&2000¯o tupling
\end{longtable}
%\section{Standard Grammar}
%\label{standardGrammar}
%
%Althou