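I can't tell exactly how your lexer works, so to test parsing of embedded identifiers I quickly wrote a crude lexer of my own.
Brackets can be handled either in the lexer or in the parser. In this example I handle them in the lexer, which keeps the spaces inside a bracketed name such as [my identifier] intact; the lexer discards whitespace between tokens, so a parser would no longer see them.
First, declare the following types: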
public enum TokenType
{
    Error,
    Identifier,
    Number,
    Etc, // placeholder for further token kinds
}

public class Token
{
    public TokenType TokenType { get; set; }
    public string Text { get; set; } = "";
    public long Number { get; set; }

    // Numbers print their value; all other tokens print their kind and text.
    public override string ToString() =>
        TokenType == TokenType.Number ? $"Number = {Number}" : $"{TokenType} = \"{Text}\"";
}
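If tokens never change after the lexer creates them, the mutable Token class could also be written as an immutable positional record, which adds value equality for free (handy when comparing expected and actual tokens in tests). A sketch under that assumption, not the code used in the rest of this answer:

```csharp
// Same token kinds as above; Etc is a placeholder for further kinds.
public enum TokenType { Error, Identifier, Number, Etc }

// Immutable alternative to the Token class: properties are set once via the constructor,
// and records compare by value, so two tokens with the same fields are equal.
public sealed record Token(TokenType TokenType, string Text = "", long Number = 0)
{
    // Same output format as the class version.
    public override string ToString() =>
        TokenType == TokenType.Number ? $"Number = {Number}" : $"{TokenType} = \"{Text}\"";
}
```

For example, new Token(TokenType.Number, Number: 10) prints as Number = 10, exactly like the class version.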
Then the lexer class:
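using System;
using System.Collections.Generic;

public class Lexer
{
    private readonly string _input;
    private int _index;
    private char _ch; // current character; (char)0 marks the end of the input

    public Lexer(string input)
    {
        _input = input;
        _index = 0;
        GetChar();
    }

    // Advances to the next character, or sets _ch to (char)0 at the end of the input.
    private void GetChar()
    {
        _ch = _index < _input.Length ? _input[_index++] : (char)0;
    }

    public IEnumerable<Token> EnumerateTokens()
    {
        while (_ch != (char)0)
        {
            while (_ch != (char)0 && Char.IsWhiteSpace(_ch))
            {
                GetChar();
            }
            if (_ch == (char)0)
            {
                yield break; // only trailing whitespace was left
            }
            switch (_ch)
            {
                case >= '0' and <= '9':
                    string numString = "";
                    do
                    {
                        numString += _ch;
                        GetChar();
                    } while (_ch is >= '0' and <= '9'); // ASCII digits only
                    if (Int64.TryParse(numString, out long num))
                    {
                        yield return new Token { TokenType = TokenType.Number, Number = num };
                    }
                    else
                    {
                        // The digits are plain ASCII, so this is only reached on overflow.
                        yield return new Token { TokenType = TokenType.Error, Text = "Invalid number" };
                    }
                    break;
                case '[': // identifier in brackets: spaces are allowed
                    GetChar(); // skip '['
                    var identToken = GetIdentifier(allowSpaces: true);
                    if (_ch == ']')
                    {
                        GetChar(); // skip ']' (also after an empty name, so "[]" yields one error)
                        yield return identToken;
                    }
                    else if (identToken.TokenType == TokenType.Error)
                    {
                        yield return identToken;
                    }
                    else
                    {
                        yield return new Token { TokenType = TokenType.Error, Text = "Missing ']'" };
                    }
                    break;
                case > ' ' and < (char)127 and not '.' and not '!' and not '`' and not ']':
                    yield return GetIdentifier(allowSpaces: false); // identifier without brackets
                    break;
                default:
                    char bad = _ch;
                    GetChar(); // skip the bad character so the loop makes progress
                    yield return new Token { TokenType = TokenType.Error, Text = $"Unexpected character '{bad}'" };
                    break;
            }
        }
    }

    // Reads identifier characters starting at _ch (7-bit ASCII, excluding . ! ` [ ]).
    // Spaces are accepted only when the identifier is enclosed in brackets.
    private Token GetIdentifier(bool allowSpaces)
    {
        string ident = "";
        while (_ch is > ' ' and < (char)127 and not '.' and not '!' and not '`' and not '[' and not ']'
               || (allowSpaces && _ch == ' '))
        {
            ident += _ch;
            GetChar();
        }
        return ident.Length > 0
            ? new Token { TokenType = TokenType.Identifier, Text = ident }
            : new Token { TokenType = TokenType.Error, Text = "Empty identifier" };
    }
}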
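The bracketed-name rule is the part most worth testing on its own. As a sketch, it can be pulled out into a pure function over a string and an index; BracketedName and TryReadBracketedName are hypothetical names, and the character rule is the same 7-bit ASCII assumption the lexer makes:

```csharp
public static class BracketedName
{
    // Same rule as the lexer's bracketed case (assumed 7-bit ASCII):
    // printable ASCII including the space, excluding . ! ` [ ]
    private static bool IsNameChar(char c) =>
        c is >= ' ' and < (char)127 and not '.' and not '!' and not '`' and not '[' and not ']';

    // Hypothetical helper: reads "[name]" where text[start] is expected to be '['.
    // On success returns true, the name without brackets, and the index just past ']'.
    public static bool TryReadBracketedName(string text, int start, out string name, out int next)
    {
        name = "";
        next = start;
        if (start < 0 || start >= text.Length || text[start] != '[') return false;

        int i = start + 1;
        while (i < text.Length && IsNameChar(text[i])) i++;

        // Reject an empty name, a missing ']' and any disallowed character before ']'.
        if (i == start + 1 || i >= text.Length || text[i] != ']') return false;

        name = text.Substring(start + 1, i - start - 1);
        next = i + 1;
        return true;
    }
}
```

For example, TryReadBracketedName("[my identifier] x", 0, out var name, out var next) returns true with name "my identifier" and next 15, while "[]", "[abc" and "[a.b]" are all rejected. Returning false instead of an error token keeps the function independent of the Token type.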
Test code:
using System;

var lexer = new Lexer("10 myid\r\n[myid]\r\n[my identifier]");
foreach (Token token in lexer.EnumerateTokens())
{
    Console.WriteLine(token);
}
Console.ReadKey();
Running the test code above prints:
Number = 10
Identifier = "myid"
Identifier = "myid"
Identifier = "my identifier"
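Note: the documentation Erik A linked does not say which character set valid identifiers may use. Here I assumed 7-bit ASCII; a more modern implementation may need to extend this to 8-bit ANSI or to Unicode. However, Unicode also contains many non-printing characters above U+00FF, for example format characters such as U+200B (zero-width space), so a simple numeric range check is no longer sufficient.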
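If the character set is widened to Unicode, one option is to classify characters by Unicode category instead of by code range. A sketch, assuming the forbidden characters stay . ! ` [ ] and the space is allowed only inside brackets; which categories to reject (here also non-breaking spaces, private-use and unassigned code points) is my own choice, not something the documentation specifies. UnicodeNameChars.IsNameChar is a hypothetical helper that could replace the range checks in the lexer:

```csharp
using System.Globalization;

public static class UnicodeNameChars
{
    // Characters the assumed naming rules forbid anywhere in a name.
    private const string Forbidden = ".!`[]";

    // spaceAllowed: true inside [...], false for a bare identifier.
    public static bool IsNameChar(char c, bool spaceAllowed)
    {
        if (c == ' ') return spaceAllowed;
        if (Forbidden.IndexOf(c) >= 0) return false;

        switch (char.GetUnicodeCategory(c))
        {
            case UnicodeCategory.Control:            // Cc: C0 controls, DEL, C1 controls
            case UnicodeCategory.Format:             // Cf: e.g. U+200B, U+FEFF
            case UnicodeCategory.SpaceSeparator:     // spaces other than ' ', e.g. U+00A0
            case UnicodeCategory.LineSeparator:
            case UnicodeCategory.ParagraphSeparator:
            case UnicodeCategory.Surrogate:          // surrogate halves (see note below)
            case UnicodeCategory.PrivateUse:
            case UnicodeCategory.OtherNotAssigned:
                return false;
            default:
                return true;                         // letters, digits, punctuation, symbols, marks
        }
    }
}
```

Because it looks at single UTF-16 char values, this sketch rejects surrogate halves and therefore every character outside the Basic Multilingual Plane; supporting those would mean iterating over System.Text.Rune values instead.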