Advanced Compiler 作業三 r91922010 蔡方培 Question: What are the so-called "action table" and "goto table" of GCC? (Hint: Try to find them in c-parse.c.) Answer: 1. As we know, the parser of GCC is automatically generated by Bison, which is a YACC-compatible compiler compiler. According to Bison's homepage, "Bison is a general-purpose parser generator that converts a grammar description for an LALR context-free grammar into a C program to parse that grammar." Looking inside the GCC source tree, we have found that the grammar rules for the C programming language is contained in c-parse.y. After the bootstrapping build, c-parse.y is converted to c-parse.c. Since GCC uses Bison to convert c-parse.y to c-parse.c, we couldn't be familiar with the internals of the parser used by GCC without looking deeper into the general actions taken by Bison. So the first step is to read the source code for Bison. 2. In Bison version 1.28 there are only 10 C header files, and 23 C source files in src/ directory. We then have a closer look at output.c, in which the codes used by Bison to write its output in the C programming language is located. 3. The functions inside output.c are: action_row (), default_goto (), free_itemsets (), free_reductions (), free_shifts (), goto_actions (), matching_state (), output (), output_actions (), output_base (), output_check (), output_defines (), output_gram (), output_headers (), output_parser (), output_program (), output_rule_data (), output_stos (), output_table (), output_token_translations (), output_trailers (), pack_table (), pack_vector (), save_column (), save_row (), sort_actions (), token_actions () 4. The tables (both action and goto), parser codes, output headers ,and output trailers are all written via these functions. Moreover, there are important comments provided: ///////////////////////////////////////////////////////////////////////////////////// output.c (60): yydefact[s] = default rule to reduce with in state s, when yytable doesn't specify something else to do. Zero means the default is an error. yydefgoto[i] = default state to go to after a reduction of a rule that generates variable ntokens + i, except when yytable specifies something else to do. yypact[s] = index in yytable of the portion describing state s. The lookahead token's type is used to index that portion to find out what to do. If the value in yytable is positive, we shift the token and go to that state. If the value is negative, it is minus a rule number to reduce by. If the value is zero, the default action from yydefact[s] is used. yypgoto[i] = the index in yytable of the portion describing what to do after reducing a rule that derives variable i + ntokens. This portion is indexed by the parser state number, s, as of before the text for this nonterminal was read. The value from yytable is the state to go to if the corresponding value in yycheck is s. yytable = a vector filled with portions for different uses, found via yypact and yypgoto. ///////////////////////////////////////////////////////////////////////////////////// 5. Now we trace into c-parse.c to see how the tables work. (以下只扼要列出) c-parse.c(2239): (1). /* First try to decide what to do without reference to lookahead token. */ yyn = yypact[yystate]; if (yyn == YYFLAG) goto yydefault; // 如果不用看 lookahead token,就跳到 yydefault 去了 [動作 (3).]。 (2). // 需要看 lookahead token 的話… /* Not known => get a lookahead token if don't already have one. */ // …就做下面這些事 (2-0). /////////從 YYLEX 讀進一個 token//////////// if (yychar == YYEMPTY) { YYDPRINTF ((stderr, "Reading a token: ")); yychar = YYLEX; } (2-1). ////////////設定 yychar1 和 yyn//////////// /* Convert token to internal form (in yychar1) for indexing tables with */ if (yychar <= 0) /* This means end of input. */ { yychar1 = 0; yychar = YYEOF; /* Don't call YYLEX any more */ YYDPRINTF ((stderr, "Now at end of input.\n")); } else { yychar1 = YYTRANSLATE (yychar); } yyn += yychar1; if (yyn < 0 || yyn > YYLAST || yycheck[yyn] != yychar1) goto yydefault; yyn = yytable[yyn]; <======================用到 yytable[] 了。 (2-2). c-parse.c(2292): /* yyn is what to do for this token type in this state. Negative => reduce, -yyn is rule number. Positive => shift, yyn is new state. New state is final state => don't bother to shift, just return success. 0, or most negative number => error. */ // 略 // c-parse.c(2330): yystate = yyn; (3). c-parse.c(2334): <<不必看 lookahead token>> /*-----------------------------------------------------------. | yydefault -- do the default action for the current state. | `-----------------------------------------------------------*/ yydefault: yyn = yydefact[yystate]; if (yyn == 0) goto yyerrlab; goto yyreduce; //////////////////////////////////////////////////// /////////進入一個巨大的 switch-case。/////////////// //////////////////////////////////////////////////// c-parse.c(4621): /* Now `shift' the result of the reduction. Determine what state that goes to, based on the state we popped back to and the rule number reduced by. */ yyn = yyr1[yyn]; yystate = yypgoto[yyn - YYNTBASE] + *yyssp; if (yystate >= 0 && yystate <= YYLAST && yycheck[yystate] == *yyssp) yystate = yytable[yystate]; else yystate = yydefgoto[yyn - YYNTBASE]; goto yynewstate; //////////////////////////////////////////////////////////////////// //////////error handling or error recovering //////////////////// ////////////////////////////////////////////////////////////////////