Forth是一种基于栈的编程语言,以其简洁性和高效性而闻名。它采用逆波兰表示法(后缀表示法),所有操作都通过操作栈来完成。在 Exercism 的 “forth” 练习中,我们需要实现一个简化版的Forth解释器,支持基本的算术运算、栈操作和用户自定义词汇。这不仅能帮助我们掌握解释器设计,还能深入学习Rust中的错误处理、数据结构和解析技术。

什么是Forth?

Forth是一种基于栈的编程语言,具有以下特点:

  1. 基于栈:所有操作都通过操作栈完成
  2. 逆波兰表示法:操作符后置于操作数(如:1 2 +)
  3. 交互式:支持即时执行命令
  4. 可扩展:允许用户定义新词汇(函数)

例如:

1 2 +    // 将1和2压入栈,然后执行加法操作
3 4 -    // 将3和4压入栈,然后执行减法操作,结果为-1

让我们先看看练习提供的结构和函数签名:

pub type Value = i32;
pub type Result = std::result::Result<(), Error>;

pub struct Forth;

#[derive(Debug, PartialEq)]
pub enum Error {
    DivisionByZero,
    StackUnderflow,
    UnknownWord,
    InvalidWord,
}

impl Forth {
    pub fn new() -> Forth {
        unimplemented!()
    }

    pub fn stack(&self) -> &[Value] {
        unimplemented!()
    }

    pub fn eval(&mut self, input: &str) -> Result {
        unimplemented!("result of evaluating '{}'", input)
    }
}

我们需要实现一个完整的Forth解释器,支持基本操作和用户自定义词汇。

设计分析

1. 核心组件

  1. :存储数据值
  2. 词汇表:存储内置和用户定义的词汇
  3. 解析器:解析输入字符串
  4. 执行器:执行解析后的命令

2. 错误处理

需要处理以下错误情况:

  • DivisionByZero:除零错误
  • StackUnderflow:栈下溢错误
  • UnknownWord:未知词汇错误
  • InvalidWord:无效词汇错误

完整实现

1. 基础结构实现

use std::collections::HashMap;

pub type Value = i32;
pub type Result = std::result::Result<(), Error>;

#[derive(Debug, PartialEq)]
pub enum Error {
    DivisionByZero,
    StackUnderflow,
    UnknownWord,
    InvalidWord,
}

pub struct Forth {
    stack: Vec<Value>,
    words: HashMap<String, Vec<String>>,
}

impl Forth {
    pub fn new() -> Forth {
        Forth {
            stack: Vec::new(),
            words: HashMap::new(),
        }
    }

    pub fn stack(&self) -> &[Value] {
        &self.stack
    }

    pub fn eval(&mut self, input: &str) -> Result {
        let tokens: Vec<&str> = input.split_whitespace().collect();
        self.eval_tokens(&tokens)
    }
    
    fn eval_tokens(&mut self, tokens: &[&str]) -> Result {
        let mut i = 0;
        while i < tokens.len() {
            let token = tokens[i];
            
            // 检查是否是定义词汇的开始
            if token.eq_ignore_ascii_case(":") {
                i = self.define_word(&tokens[i+1..])?;
                continue;
            }
            
            // 检查是否是数字
            if let Ok(num) = token.parse::<Value>() {
                self.stack.push(num);
                i += 1;
                continue;
            }
            
            // 检查是否是内置命令
            if self.eval_builtin(token)? {
                i += 1;
                continue;
            }
            
            // 检查是否是用户定义的词汇
            if let Some(word_def) = self.words.get(&token.to_ascii_lowercase()) {
                let word_tokens: Vec<&str> = word_def.iter().map(|s| s.as_str()).collect();
                self.eval_tokens(&word_tokens)?;
                i += 1;
                continue;
            }
            
            // 未知词汇
            return Err(Error::UnknownWord);
        }
        
        Ok(())
    }
    
    fn define_word(&mut self, tokens: &[&str]) -> Result<usize> {
        if tokens.is_empty() {
            return Err(Error::InvalidWord);
        }
        
        // 获取词汇名称
        let word_name = tokens[0];
        
        // 检查名称是否是数字
        if word_name.parse::<Value>().is_ok() {
            return Err(Error::InvalidWord);
        }
        
        // 查找分号
        let semicolon_pos = tokens.iter().position(|&t| t == ";").ok_or(Error::InvalidWord)?;
        
        // 获取定义内容
        let definition: Vec<String> = tokens[1..semicolon_pos].iter().map(|s| s.to_string()).collect();
        
        // 存储定义
        self.words.insert(word_name.to_ascii_lowercase(), definition);
        
        // 返回下一个位置
        Ok(semicolon_pos + 1)
    }
    
    fn eval_builtin(&mut self, token: &str) -> std::result::Result<bool, Error> {
        match token.to_ascii_lowercase().as_str() {
            "+" => {
                if self.stack.len() < 2 {
                    return Err(Error::StackUnderflow);
                }
                let b = self.stack.pop().unwrap();
                let a = self.stack.pop().unwrap();
                self.stack.push(a + b);
                Ok(true)
            }
            "-" => {
                if self.stack.len() < 2 {
                    return Err(Error::StackUnderflow);
                }
                let b = self.stack.pop().unwrap();
                let a = self.stack.pop().unwrap();
                self.stack.push(a - b);
                Ok(true)
            }
            "*" => {
                if self.stack.len() < 2 {
                    return Err(Error::StackUnderflow);
                }
                let b = self.stack.pop().unwrap();
                let a = self.stack.pop().unwrap();
                self.stack.push(a * b);
                Ok(true)
            }
            "/" => {
                if self.stack.len() < 2 {
                    return Err(Error::StackUnderflow);
                }
                let b = self.stack.pop().unwrap();
                let a = self.stack.pop().unwrap();
                if b == 0 {
                    return Err(Error::DivisionByZero);
                }
                self.stack.push(a / b);
                Ok(true)
            }
            "dup" => {
                if self.stack.is_empty() {
                    return Err(Error::StackUnderflow);
                }
                let val = self.stack[self.stack.len() - 1];
                self.stack.push(val);
                Ok(true)
            }
            "drop" => {
                if self.stack.is_empty() {
                    return Err(Error::StackUnderflow);
                }
                self.stack.pop();
                Ok(true)
            }
            "swap" => {
                if self.stack.len() < 2 {
                    return Err(Error::StackUnderflow);
                }
                let len = self.stack.len();
                self.stack.swap(len - 1, len - 2);
                Ok(true)
            }
            "over" => {
                if self.stack.len() < 2 {
                    return Err(Error::StackUnderflow);
                }
                let val = self.stack[self.stack.len() - 2];
                self.stack.push(val);
                Ok(true)
            }
            _ => Ok(false), // 不是内置命令
        }
    }
}

测试用例分析

通过查看测试用例,我们可以更好地理解需求:

#[test]
fn no_input_no_stack() {
    assert_eq!(Vec::<Value>::new(), Forth::new().stack());
}

新创建的Forth实例应该有空栈。

#[test]
fn numbers_just_get_pushed_onto_the_stack() {
    let mut f = Forth::new();
    assert!(f.eval("1 2 3 4 5").is_ok());
    assert_eq!(vec![1, 2, 3, 4, 5], f.stack());
}

数字应该被压入栈中。

#[test]
fn can_add_two_numbers() {
    let mut f = Forth::new();
    assert!(f.eval("1 2 +").is_ok());
    assert_eq!(vec![3], f.stack());
}

加法运算应该正确执行。

#[test]
fn addition_error() {
    let mut f = Forth::new();
    assert_eq!(Err(Error::StackUnderflow), f.eval("1 +"));
    assert_eq!(Err(Error::StackUnderflow), f.eval("+"));
}

栈下溢时应该返回错误。

#[test]
fn errors_if_dividing_by_zero() {
    let mut f = Forth::new();
    assert_eq!(Err(Error::DivisionByZero), f.eval("4 0 /"));
}

除零时应该返回错误。

#[test]
fn can_consist_of_built_in_words() {
    let mut f = Forth::new();
    assert!(f.eval(": dup-twice dup dup ;").is_ok());
    assert!(f.eval("1 dup-twice").is_ok());
    assert_eq!(vec![1, 1, 1], f.stack());
}

应该支持用户自定义词汇。

#[test]
fn can_use_different_words_with_the_same_name() {
    let mut f = Forth::new();
    assert!(f.eval(": foo 5 ;").is_ok());
    assert!(f.eval(": bar foo ;").is_ok());
    assert!(f.eval(": foo 6 ;").is_ok());
    assert!(f.eval("bar foo").is_ok());
    assert_eq!(vec![5, 6], f.stack());
}

应该支持词汇的重新定义。

#[test]
fn defining_a_number() {
    let mut f = Forth::new();
    assert_eq!(Err(Error::InvalidWord), f.eval(": 1 2 ;"));
}

不应该允许用数字作为词汇名称。

性能优化版本

考虑性能的优化实现:

use std::collections::HashMap;

pub type Value = i32;
pub type Result = std::result::Result<(), Error>;

#[derive(Debug, PartialEq)]
pub enum Error {
    DivisionByZero,
    StackUnderflow,
    UnknownWord,
    InvalidWord,
}

pub struct Forth {
    stack: Vec<Value>,
    words: HashMap<String, Vec<String>>,
}

impl Forth {
    pub fn new() -> Forth {
        Forth {
            stack: Vec::with_capacity(32), // 预分配容量
            words: HashMap::new(),
        }
    }

    pub fn stack(&self) -> &[Value] {
        &self.stack
    }

    pub fn eval(&mut self, input: &str) -> Result {
        // 预分配tokens向量
        let mut tokens = Vec::with_capacity(input.len() / 2);
        for token in input.split_whitespace() {
            tokens.push(token);
        }
        self.eval_tokens(&tokens)
    }
    
    fn eval_tokens(&mut self, tokens: &[&str]) -> Result {
        let mut i = 0;
        while i < tokens.len() {
            let token = tokens[i];
            
            // 检查是否是定义词汇的开始
            if token.eq_ignore_ascii_case(":") {
                i = self.define_word(&tokens[i+1..])?;
                continue;
            }
            
            // 检查是否是数字
            if let Ok(num) = token.parse::<Value>() {
                self.stack.push(num);
                i += 1;
                continue;
            }
            
            // 检查是否是内置命令
            match self.eval_builtin(token) {
                Ok(true) => {
                    i += 1;
                    continue;
                }
                Ok(false) => {
                    // 继续检查用户定义的词汇
                }
                Err(e) => return Err(e),
            }
            
            // 检查是否是用户定义的词汇
            if let Some(word_def) = self.words.get(&token.to_ascii_lowercase()) {
                // 为了避免递归调用导致栈溢出,我们直接展开定义
                let mut expanded_tokens = Vec::with_capacity(word_def.len());
                for word in word_def {
                    expanded_tokens.push(word.as_str());
                }
                self.eval_tokens(&expanded_tokens)?;
                i += 1;
                continue;
            }
            
            // 未知词汇
            return Err(Error::UnknownWord);
        }
        
        Ok(())
    }
    
    fn define_word(&mut self, tokens: &[&str]) -> Result<usize> {
        if tokens.is_empty() {
            return Err(Error::InvalidWord);
        }
        
        // 获取词汇名称
        let word_name = tokens[0];
        
        // 检查名称是否是数字
        if word_name.parse::<Value>().is_ok() {
            return Err(Error::InvalidWord);
        }
        
        // 查找分号
        let semicolon_pos = tokens.iter().position(|&t| t == ";").ok_or(Error::InvalidWord)?;
        
        // 获取定义内容
        let definition: Vec<String> = tokens[1..semicolon_pos].iter().map(|s| s.to_string()).collect();
        
        // 存储定义
        self.words.insert(word_name.to_ascii_lowercase(), definition);
        
        // 返回下一个位置
        Ok(semicolon_pos + 1)
    }
    
    fn eval_builtin(&mut self, token: &str) -> std::result::Result<bool, Error> {
        match token.to_ascii_lowercase().as_str() {
            "+" => {
                if self.stack.len() < 2 {
                    return Err(Error::StackUnderflow);
                }
                let b = self.stack.pop().unwrap();
                let a = self.stack.pop().unwrap();
                self.stack.push(a + b);
                Ok(true)
            }
            "-" => {
                if self.stack.len() < 2 {
                    return Err(Error::StackUnderflow);
                }
                let b = self.stack.pop().unwrap();
                let a = self.stack.pop().unwrap();
                self.stack.push(a - b);
                Ok(true)
            }
            "*" => {
                if self.stack.len() < 2 {
                    return Err(Error::StackUnderflow);
                }
                let b = self.stack.pop().unwrap();
                let a = self.stack.pop().unwrap();
                self.stack.push(a * b);
                Ok(true)
            }
            "/" => {
                if self.stack.len() < 2 {
                    return Err(Error::StackUnderflow);
                }
                let b = self.stack.pop().unwrap();
                let a = self.stack.pop().unwrap();
                if b == 0 {
                    return Err(Error::DivisionByZero);
                }
                self.stack.push(a / b);
                Ok(true)
            }
            "dup" => {
                if self.stack.is_empty() {
                    return Err(Error::StackUnderflow);
                }
                let val = self.stack[self.stack.len() - 1];
                self.stack.push(val);
                Ok(true)
            }
            "drop" => {
                if self.stack.is_empty() {
                    return Err(Error::StackUnderflow);
                }
                self.stack.pop();
                Ok(true)
            }
            "swap" => {
                if self.stack.len() < 2 {
                    return Err(Error::StackUnderflow);
                }
                let len = self.stack.len();
                self.stack.swap(len - 1, len - 2);
                Ok(true)
            }
            "over" => {
                if self.stack.len() < 2 {
                    return Err(Error::StackUnderflow);
                }
                let val = self.stack[self.stack.len() - 2];
                self.stack.push(val);
                Ok(true)
            }
            _ => Ok(false), // 不是内置命令
        }
    }
}

内存优化版本(防止分配攻击)

考虑内存使用的优化实现:

use std::collections::HashMap;

pub type Value = i32;
pub type Result = std::result::Result<(), Error>;

#[derive(Debug, PartialEq)]
pub enum Error {
    DivisionByZero,
    StackUnderflow,
    UnknownWord,
    InvalidWord,
}

pub struct Forth {
    stack: Vec<Value>,
    words: HashMap<String, WordDefinition>,
}

// 使用枚举来存储词汇定义,避免在定义时展开
#[derive(Debug, Clone)]
enum WordDefinition {
    BuiltIn(BuiltInWord),
    Custom(Vec<Token>),
}

#[derive(Debug, Clone)]
enum Token {
    Number(Value),
    Word(String),
}

#[derive(Debug, Clone)]
enum BuiltInWord {
    Add,
    Subtract,
    Multiply,
    Divide,
    Dup,
    Drop,
    Swap,
    Over,
}

impl Forth {
    pub fn new() -> Forth {
        let mut forth = Forth {
            stack: Vec::with_capacity(32),
            words: HashMap::new(),
        };
        
        // 初始化内置词汇
        forth.init_builtins();
        forth
    }

    pub fn stack(&self) -> &[Value] {
        &self.stack
    }

    pub fn eval(&mut self, input: &str) -> Result {
        let tokens = self.tokenize(input)?;
        self.eval_tokens(&tokens)
    }
    
    fn init_builtins(&mut self) {
        self.words.insert("+".to_string(), WordDefinition::BuiltIn(BuiltInWord::Add));
        self.words.insert("-".to_string(), WordDefinition::BuiltIn(BuiltInWord::Subtract));
        self.words.insert("*".to_string(), WordDefinition::BuiltIn(BuiltInWord::Multiply));
        self.words.insert("/".to_string(), WordDefinition::BuiltIn(BuiltInWord::Divide));
        self.words.insert("dup".to_string(), WordDefinition::BuiltIn(BuiltInWord::Dup));
        self.words.insert("drop".to_string(), WordDefinition::BuiltIn(BuiltInWord::Drop));
        self.words.insert("swap".to_string(), WordDefinition::BuiltIn(BuiltInWord::Swap));
        self.words.insert("over".to_string(), WordDefinition::BuiltIn(BuiltInWord::Over));
    }
    
    fn tokenize(&self, input: &str) -> Result<Vec<Token>> {
        let mut tokens = Vec::new();
        
        for word in input.split_whitespace() {
            if let Ok(num) = word.parse::<Value>() {
                tokens.push(Token::Number(num));
            } else {
                tokens.push(Token::Word(word.to_ascii_lowercase()));
            }
        }
        
        Ok(tokens)
    }
    
    fn eval_tokens(&mut self, tokens: &[Token]) -> Result {
        for token in tokens {
            match token {
                Token::Number(value) => {
                    self.stack.push(*value);
                }
                Token::Word(word) => {
                    if word == ":" {
                        // 这里应该处理定义,但在tokenize阶段已经处理了
                        // 实际实现中需要更复杂的解析逻辑
                        return Err(Error::InvalidWord);
                    }
                    
                    self.eval_word(word)?;
                }
            }
        }
        
        Ok(())
    }
    
    fn eval_word(&mut self, word: &str) -> Result {
        // 处理定义命令
        if word == ":" {
            return Err(Error::InvalidWord); // 定义应该在更高层处理
        }
        
        // 查找词汇定义
        match self.words.get(word) {
            Some(WordDefinition::BuiltIn(builtin)) => {
                self.eval_builtin(builtin.clone())
            }
            Some(WordDefinition::Custom(tokens)) => {
                // 递归执行自定义词汇
                for token in tokens {
                    match token {
                        Token::Number(value) => {
                            self.stack.push(*value);
                        }
                        Token::Word(word) => {
                            self.eval_word(word)?;
                        }
                    }
                }
                Ok(())
            }
            None => Err(Error::UnknownWord),
        }
    }
    
    fn define_word(&mut self, tokens: &[&str]) -> Result<usize> {
        if tokens.is_empty() {
            return Err(Error::InvalidWord);
        }
        
        // 获取词汇名称
        let word_name = tokens[0];
        
        // 检查名称是否是数字
        if word_name.parse::<Value>().is_ok() {
            return Err(Error::InvalidWord);
        }
        
        // 查找分号
        let semicolon_pos = tokens.iter().position(|&t| t == ";").ok_or(Error::InvalidWord)?;
        
        // 获取定义内容并转换为Token
        let mut definition = Vec::new();
        for token in &tokens[1..semicolon_pos] {
            if let Ok(num) = token.parse::<Value>() {
                definition.push(Token::Number(num));
            } else {
                definition.push(Token::Word(token.to_ascii_lowercase()));
            }
        }
        
        // 存储定义
        self.words.insert(word_name.to_ascii_lowercase(), WordDefinition::Custom(definition));
        
        // 返回下一个位置
        Ok(semicolon_pos + 1)
    }
    
    fn eval_builtin(&mut self, builtin: BuiltInWord) -> Result {
        match builtin {
            BuiltInWord::Add => {
                if self.stack.len() < 2 {
                    return Err(Error::StackUnderflow);
                }
                let b = self.stack.pop().unwrap();
                let a = self.stack.pop().unwrap();
                self.stack.push(a + b);
                Ok(())
            }
            BuiltInWord::Subtract => {
                if self.stack.len() < 2 {
                    return Err(Error::StackUnderflow);
                }
                let b = self.stack.pop().unwrap();
                let a = self.stack.pop().unwrap();
                self.stack.push(a - b);
                Ok(())
            }
            BuiltInWord::Multiply => {
                if self.stack.len() < 2 {
                    return Err(Error::StackUnderflow);
                }
                let b = self.stack.pop().unwrap();
                let a = self.stack.pop().unwrap();
                self.stack.push(a * b);
                Ok(())
            }
            BuiltInWord::Divide => {
                if self.stack.len() < 2 {
                    return Err(Error::StackUnderflow);
                }
                let b = self.stack.pop().unwrap();
                let a = self.stack.pop().unwrap();
                if b == 0 {
                    return Err(Error::DivisionByZero);
                }
                self.stack.push(a / b);
                Ok(())
            }
            BuiltInWord::Dup => {
                if self.stack.is_empty() {
                    return Err(Error::StackUnderflow);
                }
                let val = self.stack[self.stack.len() - 1];
                self.stack.push(val);
                Ok(())
            }
            BuiltInWord::Drop => {
                if self.stack.is_empty() {
                    return Err(Error::StackUnderflow);
                }
                self.stack.pop();
                Ok(())
            }
            BuiltInWord::Swap => {
                if self.stack.len() < 2 {
                    return Err(Error::StackUnderflow);
                }
                let len = self.stack.len();
                self.stack.swap(len - 1, len - 2);
                Ok(())
            }
            BuiltInWord::Over => {
                if self.stack.len() < 2 {
                    return Err(Error::StackUnderflow);
                }
                let val = self.stack[self.stack.len() - 2];
                self.stack.push(val);
                Ok(())
            }
        }
    }
}

错误处理和边界情况

考虑更多边界情况的实现:

use std::collections::HashMap;

pub type Value = i32;
pub type Result = std::result::Result<(), Error>;

#[derive(Debug, PartialEq)]
pub enum Error {
    DivisionByZero,
    StackUnderflow,
    UnknownWord,
    InvalidWord,
}

pub struct Forth {
    stack: Vec<Value>,
    words: HashMap<String, Vec<ParsedToken>>,
    state: ParserState,
}

#[derive(Debug, Clone)]
enum ParsedToken {
    Number(Value),
    Word(String),
}

#[derive(Debug)]
enum ParserState {
    Normal,
    Defining { name: String, body: Vec<ParsedToken> },
}

impl Forth {
    pub fn new() -> Forth {
        Forth {
            stack: Vec::with_capacity(32),
            words: HashMap::new(),
            state: ParserState::Normal,
        }
    }

    pub fn stack(&self) -> &[Value] {
        &self.stack
    }

    pub fn eval(&mut self, input: &str) -> Result {
        // 重置状态
        self.state = ParserState::Normal;
        
        let tokens = self.tokenize(input)?;
        self.eval_tokens(&tokens)
    }
    
    fn tokenize(&self, input: &str) -> Result<Vec<ParsedToken>> {
        let mut tokens = Vec::new();
        
        for word in input.split_whitespace() {
            if let Ok(num) = word.parse::<Value>() {
                tokens.push(ParsedToken::Number(num));
            } else {
                tokens.push(ParsedToken::Word(word.to_string()));
            }
        }
        
        Ok(tokens)
    }
    
    fn eval_tokens(&mut self, tokens: &[ParsedToken]) -> Result {
        for token in tokens {
            match &self.state {
                ParserState::Normal => {
                    match token {
                        ParsedToken::Number(value) => {
                            self.stack.push(*value);
                        }
                        ParsedToken::Word(word) => {
                            if word.eq_ignore_ascii_case(":") {
                                self.state = ParserState::Defining { name: String::new(), body: Vec::new() };
                                continue;
                            }
                            
                            self.eval_word(word)?;
                        }
                    }
                }
                ParserState::Defining { name, body } => {
                    if name.is_empty() {
                        // 这是定义的名称
                        if let Ok(_) = word.parse::<Value>() {
                            return Err(Error::InvalidWord);
                        }
                        
                        self.state = ParserState::Defining { 
                            name: word.clone(), 
                            body: Vec::new() 
                        };
                    } else if word == ";" {
                        // 定义结束
                        let definition = body.clone();
                        self.words.insert(name.to_ascii_lowercase(), definition);
                        self.state = ParserState::Normal;
                    } else {
                        // 添加到定义体中
                        let mut new_body = body.clone();
                        match token {
                            ParsedToken::Number(value) => {
                                new_body.push(ParsedToken::Number(*value));
                            }
                            ParsedToken::Word(word) => {
                                new_body.push(ParsedToken::Word(word.clone()));
                            }
                        }
                        
                        self.state = ParserState::Defining { 
                            name: name.clone(), 
                            body: new_body 
                        };
                    }
                }
            }
        }
        
        // 检查是否仍在定义状态
        if matches!(self.state, ParserState::Defining { .. }) {
            return Err(Error::InvalidWord);
        }
        
        Ok(())
    }
    
    fn eval_word(&mut self, word: &str) -> Result {
        // 查找词汇定义
        if let Some(definition) = self.words.get(&word.to_ascii_lowercase()) {
            // 执行自定义词汇
            for token in definition {
                match token {
                    ParsedToken::Number(value) => {
                        self.stack.push(*value);
                    }
                    ParsedToken::Word(word) => {
                        self.eval_builtin(word)?;
                    }
                }
            }
            return Ok(());
        }
        
        // 执行内置命令
        self.eval_builtin(word)
    }
    
    fn eval_builtin(&mut self, word: &str) -> Result {
        match word.to_ascii_lowercase().as_str() {
            "+" => {
                if self.stack.len() < 2 {
                    return Err(Error::StackUnderflow);
                }
                let b = self.stack.pop().unwrap();
                let a = self.stack.pop().unwrap();
                self.stack.push(a + b);
                Ok(())
            }
            "-" => {
                if self.stack.len() < 2 {
                    return Err(Error::StackUnderflow);
                }
                let b = self.stack.pop().unwrap();
                let a = self.stack.pop().unwrap();
                self.stack.push(a - b);
                Ok(())
            }
            "*" => {
                if self.stack.len() < 2 {
                    return Err(Error::StackUnderflow);
                }
                let b = self.stack.pop().unwrap();
                let a = self.stack.pop().unwrap();
                self.stack.push(a * b);
                Ok(())
            }
            "/" => {
                if self.stack.len() < 2 {
                    return Err(Error::StackUnderflow);
                }
                let b = self.stack.pop().unwrap();
                let a = self.stack.pop().unwrap();
                if b == 0 {
                    return Err(Error::DivisionByZero);
                }
                self.stack.push(a / b);
                Ok(())
            }
            "dup" => {
                if self.stack.is_empty() {
                    return Err(Error::StackUnderflow);
                }
                let val = self.stack[self.stack.len() - 1];
                self.stack.push(val);
                Ok(())
            }
            "drop" => {
                if self.stack.is_empty() {
                    return Err(Error::StackUnderflow);
                }
                self.stack.pop();
                Ok(())
            }
            "swap" => {
                if self.stack.len() < 2 {
                    return Err(Error::StackUnderflow);
                }
                let len = self.stack.len();
                self.stack.swap(len - 1, len - 2);
                Ok(())
            }
            "over" => {
                if self.stack.len() < 2 {
                    return Err(Error::StackUnderflow);
                }
                let val = self.stack[self.stack.len() - 2];
                self.stack.push(val);
                Ok(())
            }
            _ => Err(Error::UnknownWord),
        }
    }
}

扩展功能

基于基础实现,我们可以添加更多功能:

use std::collections::HashMap;

pub type Value = i32;
pub type Result = std::result::Result<(), Error>;

#[derive(Debug, PartialEq)]
pub enum Error {
    DivisionByZero,
    StackUnderflow,
    UnknownWord,
    InvalidWord,
}

pub struct Forth {
    stack: Vec<Value>,
    words: HashMap<String, WordDefinition>,
    debug_mode: bool,
}

#[derive(Debug, Clone)]
enum WordDefinition {
    BuiltIn(fn(&mut Forth) -> Result),
    Custom(Vec<String>),
}

impl Forth {
    pub fn new() -> Forth {
        let mut forth = Forth {
            stack: Vec::with_capacity(32),
            words: HashMap::new(),
            debug_mode: false,
        };
        
        // 初始化内置词汇
        forth.init_builtins();
        forth
    }

    pub fn stack(&self) -> &[Value] {
        &self.stack
    }

    pub fn eval(&mut self, input: &str) -> Result {
        let tokens: Vec<&str> = input.split_whitespace().collect();
        self.eval_tokens(&tokens)
    }
    
    pub fn set_debug_mode(&mut self, enabled: bool) {
        self.debug_mode = enabled;
    }
    
    pub fn clear(&mut self) {
        self.stack.clear();
        // 保留用户定义的词汇
    }
    
    fn init_builtins(&mut self) {
        self.words.insert("+".to_string(), WordDefinition::BuiltIn(add));
        self.words.insert("-".to_string(), WordDefinition::BuiltIn(subtract));
        self.words.insert("*".to_string(), WordDefinition::BuiltIn(multiply));
        self.words.insert("/".to_string(), WordDefinition::BuiltIn(divide));
        self.words.insert("dup".to_string(), WordDefinition::BuiltIn(dup));
        self.words.insert("drop".to_string(), WordDefinition::BuiltIn(drop));
        self.words.insert("swap".to_string(), WordDefinition::BuiltIn(swap));
        self.words.insert("over".to_string(), WordDefinition::BuiltIn(over));
        self.words.insert(".".to_string(), WordDefinition::BuiltIn(print_top));
        self.words.insert(".s".to_string(), WordDefinition::BuiltIn(print_stack));
    }
    
    fn eval_tokens(&mut self, tokens: &[&str]) -> Result {
        let mut i = 0;
        while i < tokens.len() {
            let token = tokens[i];
            
            if self.debug_mode {
                println!("Executing: {}", token);
                println!("Stack: {:?}", self.stack);
            }
            
            // 检查是否是定义词汇的开始
            if token.eq_ignore_ascii_case(":") {
                i = self.define_word(&tokens[i+1..])?;
                continue;
            }
            
            // 检查是否是数字
            if let Ok(num) = token.parse::<Value>() {
                self.stack.push(num);
                i += 1;
                continue;
            }
            
            // 查找并执行词汇
            match self.words.get(&token.to_ascii_lowercase()) {
                Some(WordDefinition::BuiltIn(func)) => {
                    func(self)?;
                    i += 1;
                    continue;
                }
                Some(WordDefinition::Custom(definition)) => {
                    let def_tokens: Vec<&str> = definition.iter().map(|s| s.as_str()).collect();
                    self.eval_tokens(&def_tokens)?;
                    i += 1;
                    continue;
                }
                None => return Err(Error::UnknownWord),
            }
        }
        
        Ok(())
    }
    
    fn define_word(&mut self, tokens: &[&str]) -> Result<usize> {
        if tokens.is_empty() {
            return Err(Error::InvalidWord);
        }
        
        // 获取词汇名称
        let word_name = tokens[0];
        
        // 检查名称是否是数字
        if word_name.parse::<Value>().is_ok() {
            return Err(Error::InvalidWord);
        }
        
        // 查找分号
        let semicolon_pos = tokens.iter().position(|&t| t == ";").ok_or(Error::InvalidWord)?;
        
        // 获取定义内容
        let definition: Vec<String> = tokens[1..semicolon_pos].iter().map(|s| s.to_string()).collect();
        
        // 存储定义
        self.words.insert(word_name.to_ascii_lowercase(), WordDefinition::Custom(definition));
        
        // 返回下一个位置
        Ok(semicolon_pos + 1)
    }
}

// 内置函数实现
fn add(forth: &mut Forth) -> Result {
    if forth.stack.len() < 2 {
        return Err(Error::StackUnderflow);
    }
    let b = forth.stack.pop().unwrap();
    let a = forth.stack.pop().unwrap();
    forth.stack.push(a + b);
    Ok(())
}

fn subtract(forth: &mut Forth) -> Result {
    if forth.stack.len() < 2 {
        return Err(Error::StackUnderflow);
    }
    let b = forth.stack.pop().unwrap();
    let a = forth.stack.pop().unwrap();
    forth.stack.push(a - b);
    Ok(())
}

fn multiply(forth: &mut Forth) -> Result {
    if forth.stack.len() < 2 {
        return Err(Error::StackUnderflow);
    }
    let b = forth.stack.pop().unwrap();
    let a = forth.stack.pop().unwrap();
    forth.stack.push(a * b);
    Ok(())
}

fn divide(forth: &mut Forth) -> Result {
    if forth.stack.len() < 2 {
        return Err(Error::StackUnderflow);
    }
    let b = forth.stack.pop().unwrap();
    let a = forth.stack.pop().unwrap();
    if b == 0 {
        return Err(Error::DivisionByZero);
    }
    forth.stack.push(a / b);
    Ok(())
}

fn dup(forth: &mut Forth) -> Result {
    if forth.stack.is_empty() {
        return Err(Error::StackUnderflow);
    }
    let val = forth.stack[forth.stack.len() - 1];
    forth.stack.push(val);
    Ok(())
}

fn drop(forth: &mut Forth) -> Result {
    if forth.stack.is_empty() {
        return Err(Error::StackUnderflow);
    }
    forth.stack.pop();
    Ok(())
}

fn swap(forth: &mut Forth) -> Result {
    if forth.stack.len() < 2 {
        return Err(Error::StackUnderflow);
    }
    let len = forth.stack.len();
    forth.stack.swap(len - 1, len - 2);
    Ok(())
}

fn over(forth: &mut Forth) -> Result {
    if forth.stack.len() < 2 {
        return Err(Error::StackUnderflow);
    }
    let val = forth.stack[forth.stack.len() - 2];
    forth.stack.push(val);
    Ok(())
}

fn print_top(forth: &mut Forth) -> Result {
    if forth.stack.is_empty() {
        return Err(Error::StackUnderflow);
    }
    let val = forth.stack[forth.stack.len() - 1];
    print!("{} ", val);
    std::io::stdout().flush().unwrap();
    Ok(())
}

fn print_stack(forth: &mut Forth) -> Result {
    print!("Stack: ");
    for val in &forth.stack {
        print!("{} ", val);
    }
    println!();
    std::io::stdout().flush().unwrap();
    Ok(())
}

use std::io::Write;

实际应用场景

Forth在实际开发中有以下应用:

  1. 嵌入式系统:由于其小巧和高效,常用于嵌入式系统
  2. 实时系统:Forth的交互性和高效性使其适用于实时系统
  3. 教育工具:用于教授计算机科学和编程概念
  4. 系统编程:早期的操作系统和引导程序使用Forth
  5. 硬件控制:用于控制硬件设备和机器人
  6. 脚本语言:作为应用程序的扩展语言
  7. 游戏开发:某些游戏使用Forth作为脚本语言

算法复杂度分析

  1. 时间复杂度

    • 数字压栈:O(1)
    • 基本运算:O(1)
    • 用户定义词汇:O(n),其中n是词汇定义的长度
    • 整体执行:O(m),其中m是所有tokens的数量
  2. 空间复杂度

    • 栈空间:O(n),其中n是栈中元素数量
    • 词汇表:O(w×d),其中w是词汇数量,d是平均定义长度

与其他实现方式的比较

// 使用宏实现的简化版本
macro_rules! forth {
    ($($token:tt)*) => {
        {
            let mut stack = Vec::new();
            forth_impl!(stack, $($token)*);
            stack
        }
    };
}

macro_rules! forth_impl {
    // 数字
    ($stack:expr, $num:expr $($rest:tt)*) => {
        $stack.push($num);
        forth_impl!($stack, $($rest)*);
    };
    
    // 加法
    ($stack:expr, + $($rest:tt)*) => {
        let b = $stack.pop().unwrap();
        let a = $stack.pop().unwrap();
        $stack.push(a + b);
        forth_impl!($stack, $($rest)*);
    };
    
    // 结束
    ($stack:expr,) => {};
}

// 使用外部库实现
use nom::{IResult, branch::alt, combinator::map, character::complete::{digit1, space1}, multi::many0};

#[derive(Debug, Clone)]
enum ForthToken {
    Number(i32),
    Word(String),
}

fn parse_number(input: &str) -> IResult<&str, ForthToken> {
    map(digit1, |s: &str| {
        ForthToken::Number(s.parse().unwrap())
    })(input)
}

fn parse_word(input: &str) -> IResult<&str, ForthToken> {
    map(nom::character::complete::alpha1, |s: &str| {
        ForthToken::Word(s.to_string())
    })(input)
}

fn parse_token(input: &str) -> IResult<&str, ForthToken> {
    alt((parse_number, parse_word))(input)
}

// 函数式风格实现
use std::collections::HashMap;

pub struct ForthFunctional {
    stack: Vec<i32>,
    definitions: HashMap<String, Box<dyn Fn(&mut Vec<i32>) -> Result<(), Error>>>,
}

impl ForthFunctional {
    pub fn new() -> Self {
        let mut forth = ForthFunctional {
            stack: Vec::new(),
            definitions: HashMap::new(),
        };
        
        forth.definitions.insert("+".to_string(), Box::new(|stack: &mut Vec<i32>| {
            if stack.len() < 2 {
                return Err(Error::StackUnderflow);
            }
            let b = stack.pop().unwrap();
            let a = stack.pop().unwrap();
            stack.push(a + b);
            Ok(())
        }));
        
        forth
    }
    
    pub fn eval(&mut self, input: &str) -> Result<()> {
        for token in input.split_whitespace() {
            if let Ok(num) = token.parse::<i32>() {
                self.stack.push(num);
            } else if let Some(func) = self.definitions.get(token) {
                func(&mut self.stack)?;
            } else {
                return Err(Error::UnknownWord);
            }
        }
        Ok(())
    }
}

总结

通过 forth 练习,我们学到了:

  1. 解释器设计:掌握了构建简单解释器的基本原理
  2. 栈操作:理解了基于栈的语言如何工作
  3. 词法分析:学会了如何解析输入文本
  4. 错误处理:熟练处理各种边界情况和错误
  5. 数据结构:了解了如何使用HashMap存储词汇定义
  6. 内存管理:学会了如何避免内存分配攻击
  7. 递归处理:理解了如何处理递归定义

这些技能在实际开发中非常有用,特别是在构建DSL(领域特定语言)、脚本引擎和配置处理器时。Forth虽然是一个古老的语言,但其设计思想在现代编程中仍然具有重要价值。通过这个练习,我们不仅实现了Forth解释器,还深入理解了解释器的工作原理。

通过这个练习,我们也看到了Rust在构建安全且高效的语言解释器方面的强大能力。这种结合了安全性和性能的语言特性正是Rust的魅力所在。

Logo

AtomGit 是由开放原子开源基金会联合 CSDN 等生态伙伴共同推出的新一代开源与人工智能协作平台。平台坚持“开放、中立、公益”的理念,把代码托管、模型共享、数据集托管、智能体开发体验和算力服务整合在一起,为开发者提供从开发、训练到部署的一站式体验。

更多推荐