diff --git a/flang/documentation/ParserCombinators.md b/flang/documentation/ParserCombinators.md index 1c797425f3b5..20a47e272e7b 100644 --- a/flang/documentation/ParserCombinators.md +++ b/flang/documentation/ParserCombinators.md @@ -45,7 +45,10 @@ These objects and functions are (or return) the fundamental parsers: * `cut` is a trivial parser that always fails silently. * `guard(pred)` returns a parser that succeeds if and only if the predicate expression evaluates to true. -* `nextChar` returns the next character, and fails at EOF. +* `nextCh` consumes the next character and returns its location, + and fails at EOF. +* `"xyz"_ch` succeeds if the next character consumed matches any of those + in the string (ignoring case), and returns its location. ### Combinators These functions and operators combine existing parsers to generate new parsers. @@ -96,6 +99,8 @@ collect the values that they return. * `construct{}(p1, p2, ...)` matches zero or more parsers in succession, collecting their results and then passing them with move semantics to a constructor for the type T if they all succeed. +* `sourced(p)` matches p, and fills in its `source` data member with the + locations of the cooked character stream that it consumed * `applyFunction(f, p1, p2, ...)` matches one or more parsers in succession, collecting their results and passing them as rvalue reference arguments to some function, returning its result. @@ -109,19 +114,15 @@ These are non-advancing state inquiry and update parsers: * `getColumn` returns the 1-based column position. * `inFixedForm` succeeds in fixed form Fortran source. -* `setInFixedForm` sets the fixed form flag, returning its prior value. -* `columns` returns the 1-based column number after which source is clipped. -* `setColumns(c)` sets the column limit and returns its prior value. ### Token Parsers Last, we have these basic parsers on which the actual grammar of the Fortran is built. All of the following parsers consume characters acquired from -`nextChar`. +`nextCh`. * `spaces` always succeeds after consuming any spaces or tabs * `digit` matches one cooked decimal digit (0-9) * `letter` matches one cooked letter (A-Z) -* `CharMatch<'c'>{}` matches one specific cooked character. * `"..."_tok` match the content of the string, skipping spaces before and after, and with multiple spaces accepted for any internal space. (Note that the `_tok` suffix is optional when the parser appears before diff --git a/flang/documentation/extensions.md b/flang/documentation/extensions.md new file mode 100644 index 000000000000..5807a109d554 --- /dev/null +++ b/flang/documentation/extensions.md @@ -0,0 +1,57 @@ +Extensions, deletions, and legacy features supported +==================================================== + +* Tabs in source +* `<>` as synonym for `.NE.` and `/=` +* `$` and `@` as legal characters in names +* `.T.` and `.F.` +* Initialization in type declaration statements using `/values/` +* Kind specification with `*`, e.g. `REAL*4` +* `DOUBLE COMPLEX` +* Signed complex literal constants +* `.XOR.` as predefined operator (can be overridden) +* `.N.`, `.A.`, `.O.`, `.X.` predefined operator synonyms +* `STRUCTURE`, `RECORD`, `UNION`, and `MAP` +* Structure field access with `.field` +* `NCHARACTER` type and `NC` Kanji character literals +* `BYTE` as synonym for `INTEGER(KIND=1)` +* Quad precision REAL literals with `Q` +* `X` prefix/suffix as synonym for `Z` on hexadecimal literals +* `B`, `O`, `Z`, and `X` accepted as suffixes as well as prefixes +* Triplets allowed in array constructors +* Old-style `PARAMETER pi=3.14` statement (no parentheses) +* `%LOC`, `%VAL`, and `%REF` +* Leading comma allowed before I/O item list +* Empty parentheses allowed in `PROGRAM P()` +* Missing parentheses allowed in `FUNCTION F` +* Cray based `POINTER(p,x)` +* Arithmetic `IF`. (Which branch with NaN take?) +* `ASSIGN` statement, assigned `GO TO`, and assigned format +* `PAUSE` statement +* Hollerith literals and edit descriptors +* `NAMELIST` allowed in the execution part +* Omitted colons on type declaration statements with attributes +* COMPLEX constructor expression, e.g. `(x+y,z)` +* `+` and `-` before all primary expressions, e.g. `x*-y` +* `.NOT. .NOT.` accepted +* `NAME=` as synonym for `FILE=` +* `DISPOSE=` +* Data edit descriptors without width or other details +* Backslash escape character sequences in quoted character literals +* `D` lines in fixed form as comments or debug code + +Extensions and legacy features deliberately not supported +--------------------------------------------------------- +* `.LG.` as synonym for `.NE.` +* `REDIMENSION` +* Allocatable `COMMON` +* Expressions in formats +* `ACCEPT` as synonym for `READ *` +* `ARRAY` as synonym for `DIMENSION` +* `VIRTUAL` as synonym for `DIMENSION` +* `ENCODE` and `DECODE` as synonyms for internal I/O +* `IMPLICIT AUTOMATIC`, `IMPLICIT STATIC` +* Default exponent of zero, e.g. `3.14159E` +* Characters in defined operators that are neither letters nor digits +* `B` suffix on unquoted octal constants +* `Z` prefix on unquoted hexadecimal constants diff --git a/flang/documentation/parsing.md b/flang/documentation/parsing.md index a40a0a86cb4b..ef1e756f96b3 100644 --- a/flang/documentation/parsing.md +++ b/flang/documentation/parsing.md @@ -51,6 +51,9 @@ by a CookedSource class instance, in which: * preprocessing directives have been implemented * preprocessing macro invocations have been expanded * legacy `D` lines in fixed form source have been omitted or included +* except for the payload in character literals, Hollerith constants, + and character and Hollerith edit descriptors, all letters have been + normalized to lower case Lines in the cooked character stream can be of arbitrary length. diff --git a/flang/lib/parser/basic-parsers.h b/flang/lib/parser/basic-parsers.h index 2804196f32c5..72f41b0abc2d 100644 --- a/flang/lib/parser/basic-parsers.h +++ b/flang/lib/parser/basic-parsers.h @@ -14,6 +14,7 @@ // This header defines the fundamental parser template classes and helper // template functions. See parser-combinators.txt for documentation. +#include "char-block.h" #include "idioms.h" #include "message.h" #include "parse-state.h" @@ -81,7 +82,7 @@ public: MessageContext context{state->context()}; ParseState backtrack{*state}; std::optional result{parser_.Parse(state)}; - if (result) { + if (result.has_value()) { // preserve any new messages messages.Annex(state->messages()); state->messages()->swap(messages); @@ -233,6 +234,7 @@ inline constexpr auto operator/(const PA &pa, const PB &pb) { template class AlternativeParser { public: using resultType = typename PA::resultType; + static_assert(std::is_same_v); constexpr AlternativeParser(const AlternativeParser &) = default; constexpr AlternativeParser(const PA &pa, const PB &pb) : pa_{pa}, pb_{pb} {} std::optional Parse(ParseState *state) const { @@ -306,6 +308,7 @@ inline constexpr auto operator||(const AlternativeParser &papb, template class RecoveryParser { public: using resultType = typename PA::resultType; + static_assert(std::is_same_v); constexpr RecoveryParser(const RecoveryParser &) = default; constexpr RecoveryParser(const PA &pa, const PB &pb) : pa_{pa}, pb_{pb} {} std::optional Parse(ParseState *state) const { @@ -1197,20 +1200,22 @@ private: inline constexpr auto guard(bool truth) { return GuardParser(truth); } -// nextChar is a parser that succeeds if the parsing state is not -// at the end of its input, returning the next character and +// nextCh is a parser that succeeds if the parsing state is not +// at the end of its input, returning the next character location and // advancing the parse when it does so. -constexpr struct NextCharParser { - using resultType = char; - constexpr NextCharParser() {} - std::optional Parse(ParseState *state) const { - std::optional ch{state->GetNextChar()}; - if (!ch) { +constexpr struct NextCh { + using resultType = const char *; + constexpr NextCh() {} + std::optional Parse(ParseState *state) const { + if (state->IsAtEnd()) { state->PutMessage("end of file"_en_US); + return {}; } - return ch; + const char *at{state->GetLocation()}; + state->UncheckedAdvance(); + return {at}; } -} nextChar; +} nextCh; // If a is a parser for nonstandard usage, extension(a) is a parser that // is disabled in strict conformance mode and otherwise sets a violation flag @@ -1226,7 +1231,7 @@ public: } auto at = state->GetLocation(); auto result = parser_.Parse(state); - if (result) { + if (result.has_value()) { state->set_anyConformanceViolation(); if (state->warnOnNonstandardUsage()) { state->PutMessage(at, "nonstandard usage"_en_US); @@ -1274,6 +1279,29 @@ template inline constexpr auto deprecated(const PA &parser) { return DeprecatedParser(parser); } +// Parsing objects with "source" members. +template class SourcedParser { +public: + using resultType = typename PA::resultType; + constexpr SourcedParser(const SourcedParser &) = default; + constexpr SourcedParser(const PA &parser) : parser_{parser} {} + std::optional Parse(ParseState *state) const { + const char *start{state->GetLocation()}; + auto result = parser_.Parse(state); + if (result.has_value()) { + result->source = CharBlock{start, state->GetLocation()}; + } + return result; + } + +private: + const PA parser_; +}; + +template inline constexpr auto sourced(const PA &parser) { + return SourcedParser{parser}; +} + constexpr struct GetUserState { using resultType = UserState *; constexpr GetUserState() {} @@ -1300,15 +1328,6 @@ constexpr struct GetColumn { return {state->column()}; } } getColumn; - -constexpr struct GetProvenance { - using resultType = Provenance; - constexpr GetProvenance() {} - static std::optional Parse(ParseState *state) { - return {state->GetProvenance()}; - } -} getProvenance; - } // namespace parser } // namespace Fortran #endif // FORTRAN_PARSER_BASIC_PARSERS_H_ diff --git a/flang/lib/parser/char-block.h b/flang/lib/parser/char-block.h new file mode 100644 index 000000000000..85db0f92aad7 --- /dev/null +++ b/flang/lib/parser/char-block.h @@ -0,0 +1,80 @@ +#ifndef FORTRAN_PARSER_CHAR_BLOCK_H_ +#define FORTRAN_PARSER_CHAR_BLOCK_H_ + +// Describes a contiguous block of characters; does not own their storage. + +#include "interval.h" +#include +#include +#include +#include +#include + +namespace Fortran { +namespace parser { + +class CharBlock { +public: + CharBlock() {} + CharBlock(const char *x, std::size_t n = 1) : interval_{x, n} {} + CharBlock(const char *b, const char *e) + : interval_{b, static_cast(e - b)} {} + CharBlock(const std::string &s) : interval_{s.data(), s.size()} {} + CharBlock(const CharBlock &) = default; + CharBlock(CharBlock &&) = default; + CharBlock &operator=(const CharBlock &) = default; + CharBlock &operator=(CharBlock &&) = default; + + bool empty() const { return interval_.empty(); } + std::size_t size() const { return interval_.size(); } + const char *begin() const { return interval_.start(); } + const char *end() const { return interval_.start() + interval_.size(); } + const char &operator[](std::size_t j) const { return interval_.start()[j]; } + + bool IsBlank() const { + for (char ch : *this) { + if (ch != ' ' && ch != '\t') { + return false; + } + } + return true; + } + + std::string ToString() const { + return std::string{interval_.start(), interval_.size()}; + } + + bool operator<(const CharBlock &that) const { return Compare(that) < 0; } + bool operator<=(const CharBlock &that) const { return Compare(that) <= 0; } + bool operator==(const CharBlock &that) const { return Compare(that) == 0; } + bool operator!=(const CharBlock &that) const { return Compare(that) != 0; } + bool operator>=(const CharBlock &that) const { return Compare(that) >= 0; } + bool operator>(const CharBlock &that) const { return Compare(that) > 0; } + +private: + int Compare(const CharBlock &that) const { + std::size_t bytes{std::min(size(), that.size())}; + int cmp{std::memcmp(static_cast(begin()), + static_cast(that.begin()), bytes)}; + if (cmp != 0) { + return cmp; + } + return size() < that.size() ? -1 : size() > that.size(); + } + + Interval interval_{nullptr, 0}; +}; +} // namespace parser +} // namespace Fortran + +// Specializations to enable std::unordered_map &c. +template<> struct std::hash { + std::size_t operator()(const Fortran::parser::CharBlock &x) const { + std::size_t hash{0}, bytes{x.size()}; + for (std::size_t j{0}; j < bytes; ++j) { + hash = (hash * 31) ^ x[j]; + } + return hash; + } +}; +#endif // FORTRAN_PARSER_CHAR_BLOCK_H_ diff --git a/flang/lib/parser/char-buffer.cc b/flang/lib/parser/char-buffer.cc index 185bcd92b459..2a3c017dc6bf 100644 --- a/flang/lib/parser/char-buffer.cc +++ b/flang/lib/parser/char-buffer.cc @@ -1,12 +1,13 @@ #include "char-buffer.h" #include "idioms.h" #include +#include #include namespace Fortran { namespace parser { -char *CharBuffer::FreeSpace(size_t *n) { +char *CharBuffer::FreeSpace(std::size_t *n) { int offset{LastBlockOffset()}; if (blocks_.empty()) { blocks_.emplace_front(); @@ -20,16 +21,16 @@ char *CharBuffer::FreeSpace(size_t *n) { return last_->data + offset; } -void CharBuffer::Claim(size_t n) { +void CharBuffer::Claim(std::size_t n) { if (n > 0) { bytes_ += n; lastBlockEmpty_ = false; } } -void CharBuffer::Put(const char *data, size_t n) { - size_t chunk; - for (size_t at{0}; at < n; at += chunk) { +void CharBuffer::Put(const char *data, std::size_t n) { + std::size_t chunk; + for (std::size_t at{0}; at < n; at += chunk) { char *to{FreeSpace(&chunk)}; chunk = std::min(n - at, chunk); Claim(chunk); diff --git a/flang/lib/parser/char-buffer.h b/flang/lib/parser/char-buffer.h index 7ea6419cb883..1347f82b54df 100644 --- a/flang/lib/parser/char-buffer.h +++ b/flang/lib/parser/char-buffer.h @@ -4,6 +4,7 @@ // Defines a simple expandable buffer suitable for efficiently accumulating // a stream of bytes. +#include #include #include #include @@ -29,7 +30,7 @@ public: return *this; } - size_t size() const { return bytes_; } + std::size_t size() const { return bytes_; } void clear() { blocks_.clear(); @@ -38,15 +39,15 @@ public: lastBlockEmpty_ = false; } - char *FreeSpace(size_t *); - void Claim(size_t); - void Put(const char *data, size_t n); + char *FreeSpace(std::size_t *); + void Claim(std::size_t); + void Put(const char *data, std::size_t n); void Put(const std::string &); void Put(char x) { Put(&x, 1); } private: struct Block { - static constexpr size_t capacity{1 << 20}; + static constexpr std::size_t capacity{1 << 20}; char data[capacity]; }; @@ -76,7 +77,7 @@ public: ++*this; return result; } - iterator &operator+=(size_t n) { + iterator &operator+=(std::size_t n) { while (n >= Block::capacity - offset_) { n -= Block::capacity - offset_; offset_ = 0; @@ -110,7 +111,7 @@ private: int LastBlockOffset() const { return bytes_ % Block::capacity; } std::forward_list blocks_; std::forward_list::iterator last_{blocks_.end()}; - size_t bytes_{0}; + std::size_t bytes_{0}; bool lastBlockEmpty_{false}; }; } // namespace parser diff --git a/flang/lib/parser/characters.cc b/flang/lib/parser/characters.cc index 8ead0b7955a3..7953fb98d447 100644 --- a/flang/lib/parser/characters.cc +++ b/flang/lib/parser/characters.cc @@ -1,4 +1,6 @@ #include "characters.h" +#include +#include namespace Fortran { namespace parser { @@ -52,9 +54,9 @@ std::optional EUC_JPCharacterBytes(const char *p) { return {}; } -std::optional CountCharacters( - const char *p, size_t bytes, std::optional (*cbf)(const char *)) { - size_t chars{0}; +std::optional CountCharacters( + const char *p, std::size_t bytes, std::optional (*cbf)(const char *)) { + std::size_t chars{0}; const char *limit{p + bytes}; while (p < limit) { ++chars; diff --git a/flang/lib/parser/characters.h b/flang/lib/parser/characters.h index 4e1e1d0bdf8a..d0a837e7c15e 100644 --- a/flang/lib/parser/characters.h +++ b/flang/lib/parser/characters.h @@ -5,6 +5,7 @@ // conversions here to avoid dependences upon and // also to accomodate Fortran tokenization. +#include #include #include @@ -160,8 +161,8 @@ void EmitQuotedChar(char ch, const NORMAL &emit, const INSERTED &insert, std::optional UTF8CharacterBytes(const char *); std::optional EUC_JPCharacterBytes(const char *); -std::optional CountCharacters( - const char *, size_t bytes, std::optional (*)(const char *)); +std::optional CountCharacters( + const char *, std::size_t bytes, std::optional (*)(const char *)); } // namespace parser } // namespace Fortran #endif // FORTRAN_PARSER_CHARACTERS_H_ diff --git a/flang/lib/parser/debug-parser.h b/flang/lib/parser/debug-parser.h index 664d029c1807..f474a8cf9870 100644 --- a/flang/lib/parser/debug-parser.h +++ b/flang/lib/parser/debug-parser.h @@ -7,6 +7,7 @@ #include "basic-parsers.h" #include "parse-state.h" +#include #include #include #include @@ -18,7 +19,8 @@ class DebugParser { public: using resultType = Success; constexpr DebugParser(const DebugParser &) = default; - constexpr DebugParser(const char *str, size_t n) : str_{str}, length_{n} {} + constexpr DebugParser(const char *str, std::size_t n) + : str_{str}, length_{n} {} std::optional Parse(ParseState *state) const { if (auto context = state->context()) { context->Emit(std::cout, *state->cooked().allSources()); @@ -31,10 +33,10 @@ public: private: const char *const str_; - size_t length_; + std::size_t length_; }; -constexpr DebugParser operator""_debug(const char str[], size_t n) { +constexpr DebugParser operator""_debug(const char str[], std::size_t n) { return DebugParser{str, n}; } } // namespace parser diff --git a/flang/lib/parser/grammar.h b/flang/lib/parser/grammar.h index ffcaa84463c4..f32155d2ab63 100644 --- a/flang/lib/parser/grammar.h +++ b/flang/lib/parser/grammar.h @@ -78,7 +78,6 @@ constexpr Parser signedIntLiteralConstant; // R707 constexpr Parser intLiteralConstant; // R708 constexpr Parser kindParam; // R709 constexpr Parser realLiteralConstant; // R714 -constexpr Parser exponentPart; // R717 constexpr Parser charLength; // R723 constexpr Parser charLiteralConstant; // R724 constexpr Parser initialization; // R743 & R805 @@ -179,15 +178,15 @@ template using statementConstructor = construct>; template inline constexpr auto unterminatedStatement(const PA &p) { - return skipMany("\n"_tok) >> statementConstructor{}(getProvenance, - maybe(label), isLabelOk, spaces >> p); + return skipMany("\n"_tok) >> + sourced(statementConstructor{}(maybe(label), isLabelOk, spaces >> p)); } -constexpr auto endOfLine = CharMatch<'\n'>{} / skipMany("\n"_tok) || - fail("expected end of line"_en_US); +constexpr auto endOfLine = "\n"_ch / skipMany("\n"_tok) || + fail("expected end of line"_en_US); constexpr auto endOfStmt = spaces >> - (CharMatch<';'>{} / skipMany(";"_tok) / maybe(endOfLine) || endOfLine); + (";"_ch / skipMany(";"_tok) / maybe(endOfLine) || endOfLine); template inline constexpr auto statement(const PA &p) { return unterminatedStatement(p) / endOfStmt; @@ -273,9 +272,6 @@ TYPE_PARSER( construct{}(indirect(Parser{})) || construct{}(indirect(Parser{}))) -// R516 keyword -> name -constexpr auto keyword = name; - // R604 constant -> literal-constant | named-constant // Used only via R607 int-constant and R845 data-stmt-constant. TYPE_PARSER(construct{}(literalConstant) || @@ -298,8 +294,11 @@ constexpr auto namedIntrinsicOperator = ".LT." >> ".OR." >> pure(DefinedOperator::IntrinsicOperator::OR) || ".EQV." >> pure(DefinedOperator::IntrinsicOperator::EQV) || ".NEQV." >> pure(DefinedOperator::IntrinsicOperator::NEQV) || - extension(".XOR." >> pure(DefinedOperator::IntrinsicOperator::XOR)); -// TODO: .N./.A./.O./.X. abbreviations? + extension(".XOR." >> pure(DefinedOperator::IntrinsicOperator::XOR) || + ".N." >> pure(DefinedOperator::IntrinsicOperator::NOT) || + ".A." >> pure(DefinedOperator::IntrinsicOperator::AND) || + ".O." >> pure(DefinedOperator::IntrinsicOperator::OR) || + ".X." >> pure(DefinedOperator::IntrinsicOperator::XOR)); constexpr auto intrinsicOperator = "**" >> pure(DefinedOperator::IntrinsicOperator::Power) || @@ -588,33 +587,21 @@ constexpr auto executionPart = inContext("execution part"_en_US, many(executionPartConstruct)); // R602 underscore -> _ -constexpr CharMatch<'_'> underscore; +constexpr auto underscore = "_"_ch; +// R516 keyword -> name // R601 alphanumeric-character -> letter | digit | underscore +// R603 name -> letter [alphanumeric-character]... // N.B. Don't accept an underscore if it is immediately followed by a // quotation mark, so that kindParameter_"character literal" is parsed properly. -constexpr auto otherIdCharacter = - underscore / !(CharMatch<'\''>{} || CharMatch<'"'>{}) || - extension( - CharMatch<'$'>{} || // PGI/ifort (and Cray/gfortran, but not first) - CharMatch<'@'>{}); // Cray - -constexpr auto nonDigitIdCharacter = letter || otherIdCharacter; - -// R603 name -> letter [alphanumeric-character]... -static inline Name listToString(std::list &&chlist) { - Name result; - for (auto ch : chlist) { - result += ToLowerCaseLetter(ch); - } - return result; -} - -constexpr auto rawName = applyFunction(listToString, - applyFunction(prepend, nonDigitIdCharacter, - many(nonDigitIdCharacter || digit))); - -TYPE_PARSER(spaces >> rawName) +// PGI and ifort accept '$' in identifiers, even as the initial character. +// Cray and gfortran accept '$', but not as the first character. +// Cray accepts '@' as well. +constexpr auto otherIdChar = underscore / !"'\""_ch || extension("$@"_ch); +constexpr auto nonDigitIdChar = letter || otherIdChar; +constexpr auto rawName = nonDigitIdChar >> many(nonDigitIdChar || digit); +TYPE_PARSER(spaces >> sourced(attempt(rawName) >> construct{})) +constexpr auto keyword = construct{}(name); // R605 literal-constant -> // int-literal-constant | real-literal-constant | @@ -716,7 +703,8 @@ TYPE_PARSER(construct{}( extension(construct{}( construct{}("*" >> digitString)))) -// R707 signed-int-literal-constant -> [sign] int-literal-constant +// R710 signed-digit-string -> [sign] digit-string +// N.B. Not a complete token -- no spaces are skipped. static inline std::int64_t negate(std::uint64_t &&n) { return -n; // TODO: check for overflow } @@ -725,11 +713,13 @@ static inline std::int64_t castToSigned(std::uint64_t &&n) { return n; // TODO: check for overflow } -TYPE_PARSER(spaces >> - construct{}( - CharMatch<'-'>{} >> applyFunction(negate, digitString) || - maybe(CharMatch<'+'>{}) >> applyFunction(castToSigned, digitString), - maybe(underscore >> kindParam))) +constexpr auto signedDigitString = "-"_ch >> + applyFunction(negate, digitString) || + maybe("+"_ch) >> applyFunction(castToSigned, digitString); + +// R707 signed-int-literal-constant -> [sign] int-literal-constant +TYPE_PARSER(spaces >> sourced(construct{}( + signedDigitString, maybe(underscore >> kindParam)))) // R708 int-literal-constant -> digit-string [_ kind-param] TYPE_PARSER(construct{}( @@ -739,16 +729,10 @@ TYPE_PARSER(construct{}( TYPE_PARSER(construct{}(digitString) || construct{}(scalar(integer(constant(name))))) -// R710 signed-digit-string -> [sign] digit-string -// N.B. Not a complete token -- no spaces are skipped. -constexpr auto signedDigitString = CharMatch<'-'>{} >> - applyFunction(negate, digitString) || - maybe(CharMatch<'+'>{}) >> digitString; - // R712 sign -> + | - // Not a complete token. -constexpr auto sign = CharMatch<'+'>{} >> pure(Sign::Positive) || - CharMatch<'-'>{} >> pure(Sign::Negative); +constexpr auto sign = "+"_ch >> pure(Sign::Positive) || + "-"_ch >> pure(Sign::Negative); // R713 signed-real-literal-constant -> [sign] real-literal-constant constexpr auto signedRealLiteralConstant = spaces >> @@ -758,37 +742,23 @@ constexpr auto signedRealLiteralConstant = spaces >> // significand [exponent-letter exponent] [_ kind-param] | // digit-string exponent-letter exponent [_ kind-param] // R715 significand -> digit-string . [digit-string] | . digit-string -// N.B. Preceding spaces are not skipped. -TYPE_CONTEXT_PARSER("REAL literal constant"_en_US, - construct{}(some(digit), - CharMatch<'.'>{} >> - !(some(letter) >> CharMatch<'.'>{}) >> // don't misinterpret 1.AND. - many(digit), - maybe(exponentPart), maybe(underscore >> kindParam)) || - construct{}(CharMatch<'.'>{} >> some(digit), - maybe(exponentPart), maybe(underscore >> kindParam)) || - construct{}( - some(digit), exponentPart, maybe(underscore >> kindParam))) - // R716 exponent-letter -> E | D // Extension: Q -// Not a complete token. -inline constexpr bool isEorD(char ch) { - ch = ToLowerCaseLetter(ch); - return ch == 'e' || ch == 'd'; -} - -inline constexpr bool isQ(char ch) { return ToLowerCaseLetter(ch) == 'q'; } - -constexpr CharPredicateGuardParser exponentEorD{ - isEorD, "expected exponent letter"_en_US}; -constexpr CharPredicateGuardParser exponentQ{ - isQ, "expected exponent letter"_en_US}; - // R717 exponent -> signed-digit-string -// Not a complete token. -TYPE_PARSER(construct{}( - extension(exponentQ) || exponentEorD, signedDigitString)) +// N.B. Preceding spaces are not skipped. +constexpr auto exponentPart = + ("ed"_ch || extension("q"_ch)) >> signedDigitString; + +TYPE_CONTEXT_PARSER("REAL literal constant"_en_US, + construct{}( + sourced( + (digitString >> "."_ch >> + !(some(letter) >> "."_ch /* don't misinterpret 1.AND. */) >> + maybe(digitString) >> maybe(exponentPart) >> ok || + "."_ch >> digitString >> maybe(exponentPart) >> ok || + digitString >> exponentPart >> ok) >> + construct{}), + maybe(underscore >> kindParam))) // R718 complex-literal-constant -> ( real-part , imag-part ) TYPE_CONTEXT_PARSER("COMPLEX literal constant"_en_US, @@ -844,8 +814,7 @@ TYPE_PARSER(construct{}(parenthesized(typeParamValue)) || // N.B. charLiteralConstantWithoutKind does not skip preceding spaces. // N.B. the parsing of "name" takes care to not consume the '_'. constexpr auto charLiteralConstantWithoutKind = - CharMatch<'\''>{} >> CharLiteral<'\''>{} || - CharMatch<'"'>{} >> CharLiteral<'"'>{}; + "'"_ch >> CharLiteral<'\''>{} || "\""_ch >> CharLiteral<'"'>{}; TYPE_CONTEXT_PARSER("CHARACTER literal constant"_en_US, construct{}( @@ -1161,7 +1130,7 @@ TYPE_PARSER(construct{}(declarationTypeSpec, optionalListBeforeColons(Parser{}), nonemptyList(entityDecl)) || // PGI-only extension: don't require the colons - // TODO: The standard requires the colons if the entity + // N.B.: The standard requires the colons if the entity // declarations contain initializers. extension(construct{}(declarationTypeSpec, defaulted("," >> nonemptyList(Parser{})), @@ -1205,7 +1174,7 @@ TYPE_PARSER(construct{}(objectName, maybe(arraySpec), maybe(coarraySpec), maybe("*" >> charLength), maybe(initialization))) // R806 null-init -> function-reference -// TODO: confirm that NULL still intrinsic +// TODO: confirm in semantics that NULL still intrinsic in this scope TYPE_PARSER("NULL ( )" >> construct{}) // R807 access-spec -> PUBLIC | PRIVATE @@ -1503,8 +1472,8 @@ TYPE_PARSER(construct{}(declarationTypeSpec, // R865 letter-spec -> letter [- letter] TYPE_PARSER(spaces >> (construct{}(letter, maybe("-" >> letter)) || - construct{}(extension(otherIdCharacter), - construct>{}))) + construct{}(otherIdChar, + construct>{}))) // R867 import-stmt -> // IMPORT [[::] import-name-list] | @@ -1661,8 +1630,10 @@ TYPE_PARSER(construct{}( // R1023 defined-binary-op -> . letter [letter]... . // R1414 local-defined-operator -> defined-unary-op | defined-binary-op // R1415 use-defined-operator -> defined-unary-op | defined-binary-op -TYPE_PARSER(construct{}(applyFunction(listToString, - spaces >> CharMatch<'.'>{} >> some(letter) / CharMatch<'.'>{}))) +// N.B. The name of the operator is captured without the periods around it. +TYPE_PARSER(spaces >> "."_ch >> + construct{}(sourced(some(letter) >> construct{})) / + "."_ch) // R911 data-ref -> part-ref [% part-ref]... // R914 coindexed-named-object -> data-ref @@ -1672,7 +1643,7 @@ constexpr struct DefinedOperatorName { static std::optional Parse(ParseState *state) { if (std::optional n{definedOpName.Parse(state)}) { if (const auto *user = state->userState()) { - if (user->IsDefinedOperator(n->v)) { + if (user->IsDefinedOperator(n->v.source)) { return {Success{}}; } } @@ -2084,7 +2055,7 @@ static constexpr struct EquivOperand { // R1017 level-5-expr -> [level-5-expr equiv-op] equiv-operand // R1021 equiv-op -> .EQV. | .NEQV. // Logical equivalence is left-associative. -// Extension: .XOR. as synonym for .NEQV. (TODO: is this the right precedence?) +// Extension: .XOR. as synonym for .NEQV. constexpr struct Level5Expr { using resultType = Expr; constexpr Level5Expr() {} @@ -3423,7 +3394,7 @@ constexpr struct NoteOperatorDefinition { if (op.has_value()) { if (auto ustate = state->userState()) { if (const auto *name = std::get_if(&op->u)) { - ustate->NoteDefinedOperator(name->v); + ustate->NoteDefinedOperator(name->v.source); } } } @@ -3726,7 +3697,7 @@ TYPE_CONTEXT_PARSER("PAUSE statement"_en_US, // R1221 dtv-type-spec -> TYPE ( derived-type-spec ) | // CLASS ( derived-type-spec ) // -// There requirement productions are defined and used, but need not be +// These requirement productions are defined and used, but need not be // defined independently here in this file: // R771 lbracket -> [ // R772 rbracket -> ] diff --git a/flang/lib/parser/interval.h b/flang/lib/parser/interval.h new file mode 100644 index 000000000000..fc238b27f5b1 --- /dev/null +++ b/flang/lib/parser/interval.h @@ -0,0 +1,74 @@ +#ifndef FORTRAN_PARSER_INTERVAL_H_ +#define FORTRAN_PARSER_INTERVAL_H_ + +// Defines a generalized template class Interval to represent +// the half-open interval [x .. x+n). + +#include "idioms.h" +#include +#include + +namespace Fortran { +namespace parser { + +template class Interval { +public: + using type = A; + Interval() {} + Interval(const A &s, std::size_t n = 1) : start_{s}, size_{n} {} + Interval(A &&s, std::size_t n = 1) : start_{std::move(s)}, size_{n} {} + Interval(const Interval &) = default; + Interval(Interval &&) = default; + Interval &operator=(const Interval &) = default; + Interval &operator=(Interval &&) = default; + + bool operator==(const Interval &that) const { + return start_ == that.start_ && size_ == that.size_; + } + + const A &start() const { return start_; } + std::size_t size() const { return size_; } + bool empty() const { return size_ == 0; } + + bool Contains(const A &x) const { return start_ <= x && x < start_ + size_; } + bool Contains(const Interval &that) const { + return Contains(that.start_) && Contains(that.start_ + (that.size_ - 1)); + } + bool ImmediatelyPrecedes(const Interval &that) const { + return NextAfter() == that.start_; + } + void Annex(const Interval &that) { + size_ = (that.start_ + that.size_) - start_; + } + bool AnnexIfPredecessor(const Interval &that) { + if (ImmediatelyPrecedes(that)) { + size_ += that.size_; + return true; + } + return false; + } + + std::size_t MemberOffset(const A &x) const { + CHECK(Contains(x)); + return x - start_; + } + A OffsetMember(std::size_t n) const { + CHECK(n < size_); + return start_ + n; + } + + A Last() const { return start_ + (size_ - 1); } + A NextAfter() const { return start_ + size_; } + Interval Prefix(std::size_t n) const { return {start_, std::min(size_, n)}; } + Interval Suffix(std::size_t n) const { + CHECK(n <= size_); + return {start_ + n, size_ - n}; + } + +private: + A start_; + std::size_t size_{0}; +}; +} // namespace parser +} // namespace Fortran +#endif // FORTRAN_PARSER_INTERVAL_H_ diff --git a/flang/lib/parser/message.cc b/flang/lib/parser/message.cc index 85c629fd177c..055275332770 100644 --- a/flang/lib/parser/message.cc +++ b/flang/lib/parser/message.cc @@ -1,5 +1,6 @@ #include "message.h" #include +#include #include #include @@ -7,7 +8,7 @@ namespace Fortran { namespace parser { std::ostream &operator<<(std::ostream &o, const MessageFixedText &t) { - for (size_t j{0}; j < t.size(); ++j) { + for (std::size_t j{0}; j < t.size(); ++j) { o << t.str()[j]; } return o; @@ -41,7 +42,7 @@ MessageFixedText MessageExpectedText::AsMessageFixedText() const { if (chars[1] == '\0') { // one-time initialization of array used for permanant single-byte string // pointers - for (size_t j{0}; j < sizeof chars; ++j) { + for (std::size_t j{0}; j < sizeof chars; ++j) { chars[j] = j; } } diff --git a/flang/lib/parser/message.h b/flang/lib/parser/message.h index b58664530338..4a6fbd01885a 100644 --- a/flang/lib/parser/message.h +++ b/flang/lib/parser/message.h @@ -6,6 +6,7 @@ #include "idioms.h" #include "provenance.h" +#include #include #include #include @@ -19,7 +20,7 @@ namespace parser { class MessageFixedText { public: MessageFixedText() {} - constexpr MessageFixedText(const char str[], size_t n) + constexpr MessageFixedText(const char str[], std::size_t n) : str_{str}, bytes_{n} {} constexpr MessageFixedText(const MessageFixedText &) = default; MessageFixedText(MessageFixedText &&) = default; @@ -27,17 +28,17 @@ public: MessageFixedText &operator=(MessageFixedText &&) = default; const char *str() const { return str_; } - size_t size() const { return bytes_; } + std::size_t size() const { return bytes_; } bool empty() const { return bytes_ == 0; } std::string ToString() const; private: const char *str_{nullptr}; - size_t bytes_{0}; + std::size_t bytes_{0}; }; -constexpr MessageFixedText operator""_en_US(const char str[], size_t n) { +constexpr MessageFixedText operator""_en_US(const char str[], std::size_t n) { return MessageFixedText{str, n}; } @@ -55,14 +56,14 @@ private: // Represents a formatted rendition of "expected '%s'"_en_US on a constant text. class MessageExpectedText { public: - MessageExpectedText(const char *s, size_t n) : str_{s}, bytes_{n} {} + MessageExpectedText(const char *s, std::size_t n) : str_{s}, bytes_{n} {} explicit MessageExpectedText(char ch) : singleton_{ch} {} MessageFixedText AsMessageFixedText() const; private: const char *str_{nullptr}; char singleton_; - size_t bytes_{1}; + std::size_t bytes_{1}; }; class Message; @@ -132,7 +133,7 @@ public: const AllSources &allSources() const { return allSources_; } Message &Put(Message &&m) { - CHECK(m.provenance() < allSources_.size()); + CHECK(allSources_.IsValid(m.provenance())); if (messages_.empty()) { messages_.emplace_front(std::move(m)); last_ = messages_.begin(); diff --git a/flang/lib/parser/parse-state.h b/flang/lib/parser/parse-state.h index abed10390966..a6e6915bfe15 100644 --- a/flang/lib/parser/parse-state.h +++ b/flang/lib/parser/parse-state.h @@ -11,6 +11,7 @@ #include "idioms.h" #include "message.h" #include "provenance.h" +#include #include #include #include @@ -53,7 +54,7 @@ public: } void swap(ParseState &that) { - constexpr size_t bytes{sizeof *this}; + constexpr std::size_t bytes{sizeof *this}; char buffer[bytes]; std::memcpy(buffer, this, bytes); std::memcpy(this, &that, bytes); diff --git a/flang/lib/parser/parse-tree-visitor.h b/flang/lib/parser/parse-tree-visitor.h index d89dc961d98a..27608096509e 100644 --- a/flang/lib/parser/parse-tree-visitor.h +++ b/flang/lib/parser/parse-tree-visitor.h @@ -2,6 +2,7 @@ #define FORTRAN_PARSER_PARSE_TREE_VISITOR_H_ #include "parse-tree.h" +#include #include #include #include @@ -17,7 +18,7 @@ namespace Fortran { namespace parser { -// Default case for visitation of non-class data members (and strings) +// Default case for visitation of non-class data members and strings template typename std::enable_if || std::is_same_v>::type @@ -45,7 +46,7 @@ template void Walk(const std::list &x, V &visitor) { Walk(elem, visitor); } } -template +template void ForEachInTuple(const T &tuple, Func func) { if constexpr (I < std::tuple_size_v) { func(std::get(tuple)); @@ -130,6 +131,12 @@ template void Walk(const Statement &x, V &visitor) { } } +template void Walk(const Name &x, V &visitor) { + if (visitor.Pre(x)) { + visitor.Post(x); + } +} + template void Walk(const AcSpec &x, V &visitor) { if (visitor.Pre(x)) { Walk(x.type, visitor); @@ -240,13 +247,16 @@ template void Walk(const ReadStmt &x, V &visitor) { } template void Walk(const RealLiteralConstant &x, V &visitor) { if (visitor.Pre(x)) { - Walk(x.intPart, visitor); - Walk(x.fraction, visitor); - Walk(x.exponent, visitor); + Walk(x.real, visitor); Walk(x.kind, visitor); visitor.Post(x); } } +template void Walk(const RealLiteralConstant::Real &x, V &visitor) { + if (visitor.Pre(x)) { + visitor.Post(x); + } +} template void Walk(const StructureComponent &x, V &visitor) { if (visitor.Pre(x)) { Walk(x.base, visitor); diff --git a/flang/lib/parser/parse-tree.cc b/flang/lib/parser/parse-tree.cc index cf98176a6a56..315edefaabd6 100644 --- a/flang/lib/parser/parse-tree.cc +++ b/flang/lib/parser/parse-tree.cc @@ -6,33 +6,6 @@ namespace Fortran { namespace parser { -// R714 real-literal-constant -// R715 significand -static std::string charListToString(std::list &&cs) { - std::string result; - for (auto ch : cs) { - result += ch; - } - return result; -} - -RealLiteralConstant::RealLiteralConstant(std::list &&i, - std::list &&f, std::optional &&expo, - std::optional &&k) - : intPart{charListToString(std::move(i))}, fraction{charListToString( - std::move(f))}, - exponent(std::move(expo)), kind(std::move(k)) {} - -RealLiteralConstant::RealLiteralConstant(std::list &&f, - std::optional &&expo, std::optional &&k) - : fraction{charListToString(std::move(f))}, exponent(std::move(expo)), - kind(std::move(k)) {} - -RealLiteralConstant::RealLiteralConstant( - std::list &&i, ExponentPart &&expo, std::optional &&k) - : intPart{charListToString(std::move(i))}, exponent(std::move(expo)), - kind(std::move(k)) {} - // R867 ImportStmt::ImportStmt(Kind &&k, std::list &&n) : kind{k}, names(std::move(n)) { @@ -67,7 +40,7 @@ ProcedureDesignator Designator::ConvertToProcedureDesignator() { }, [](Substring &) -> ProcedureDesignator { CHECK(!"can't get here"); - return {Name{""}}; + return {Name{}}; }}, u); } @@ -158,7 +131,7 @@ ActualArg SectionSubscript::ConvertToActualArgument() { }, [](SubscriptTriplet &) -> ActualArg { CHECK(!"can't happen"); - return {Name{"bad"}}; + return {Name{}}; }}, u); } diff --git a/flang/lib/parser/parse-tree.h b/flang/lib/parser/parse-tree.h index e5bab7fdfc67..0caf8d65e79f 100644 --- a/flang/lib/parser/parse-tree.h +++ b/flang/lib/parser/parse-tree.h @@ -9,6 +9,8 @@ // run-time I/O support library have been isolated into a distinct header file // (viz., format-specification.h). +#include "char-block.h" +#include "characters.h" #include "format-specification.h" #include "idioms.h" #include "indirection.h" @@ -39,14 +41,19 @@ CLASS_TRAIT(UnionTrait); CLASS_TRAIT(TupleTrait); // Most non-template classes in this file use these default definitions -// for their move constructor and move assignment operator=. -#define BOILERPLATE(classname) \ +// for their move constructor and move assignment operator=, and disable +// their copy constructor and copy assignment operator=. +#define COPY_AND_ASSIGN_BOILERPLATE(classname) \ classname(classname &&) = default; \ classname &operator=(classname &&) = default; \ - classname() = delete; \ classname(const classname &) = delete; \ classname &operator=(const classname &) = delete +// Almost all classes in this file have no default constructor. +#define BOILERPLATE(classname) \ + COPY_AND_ASSIGN_BOILERPLATE(classname); \ + classname() = delete + // Empty classes are often used below as alternatives in std::variant<> // discriminated unions. #define EMPTY_CLASS(classname) \ @@ -229,8 +236,10 @@ struct AssignStmt; struct AssignedGotoStmt; struct PauseStmt; +// Cooked character stream locations +using Location = const char *; + // Implicit definitions of the Standard -using Keyword = std::string; // R403 scalar-xyz -> xyz // These template class wrappers correspond to the Standard's modifiers @@ -290,10 +299,10 @@ using Label = std::uint64_t; // validated later, must be in [1..99999] // A wrapper for xzy-stmt productions that are statements, so that // source provenances and labels have a uniform representation. template struct Statement { - Statement(Provenance &&at, std::optional &&lab, bool &&accept, A &&s) - : provenance(at), label(std::move(lab)), isLabelInAcceptableField{accept}, + Statement(std::optional &&lab, bool &&accept, A &&s) + : label(std::move(lab)), isLabelInAcceptableField{accept}, statement(std::move(s)) {} - Provenance provenance; + CharBlock source; std::optional