Table of Contents

What is the relation between ORC and Pro64?
Will there always be two separate sources, one ORC, the other Pro64?
How well were the compilers tested?
Can I use ORC to build the Linux/IA-64 kernel?
What is the status of ORC native compiler on IA-64 Itanium machines?
Are there tools that come with the compilers?
What are the differences between release 1.0 and 1.1?
What are the differences between release 1.1 and 2.0?
What are the differences between release 2.0 and 2.1?
How does one generate best optimized code using ORC?
How to use Whirl and code generation phase profile/feedback?
How to use the compilers to achieve peak performance?
Are there documents that come with the compilers?

Interactions of ORC and IA-64 Linux questions
How do I install ORC compilers on IA-64 Linux systems?
How do I install ORC compilers on IA32 Redhat 6.2 Linux systems?
How do I install ORC compilers on IA32 Redhat 7.1 Linux systems?
How do I install ORC compilers on IA32 Redhat 7.2 Linux systems?
How do I make a native built compiler?
How do I install multiple versions of ORC compilers on IA-64 systems?
If I do cross compile on an IA32 machine, how do I produce IA-64 binaries?
How do I rebuild the Fortran front-end?

Reporting bugs and problems
How do I decide if the bug is ORC specific?

Using tools that come with the compiler
How to use the hot path enumeration tool
How to use the cycle counting tool

Adding new optimizations or phases
Are there any coding conventions?
How do I add my own tracing options?
How do I change the driver and related files when I add an optimization?
How do I effectively use gdb to debug the compilers?
How do I make sure my new phase is not the cause of unnecessarily long compile time?
What is the relation between ORC and Pro64?

ORC is based on Pro64. Major changes have been made in the code generator and the profiling framework. We intend to continue improving ORC, in both features and performance, for the Itanium processor and its successors.
Will there always be two separate sources, one ORC, the other Pro64?

Currently the Pro64 source is hosted by the open64-user group, also found on SourceForge. The ORC 1.0 release has been merged into the Pro64 source and incorporated into the Open64 0.14 release. Further merging has been discussed, but there is no concrete proposal at this point.
How well were the compilers tested?
For the 1.0 release, a number of test suites for C, C++, and Fortran90 were used, as well as a number of open source programs. Tests were run at the -O2 and -O3 levels of optimization, on both simulators and Itanium systems. A number of large (>400,000 source lines) applications were also used, thanks to the University of Alberta, TsingHua University, and the University of Minnesota.
The 1.1 release is not a full-scale release. Hence, it has not gone through the same rigorous testing as 1.0, but we have done our best to ensure its quality. The same test suites and open source programs were used during our testing, with more optimization combinations (-O2, -O3, -IPA, profiling) than for 1.0. No external organizations were involved in this release, and the large applications used during 1.0 testing were not used for the 1.1 release either.
For the 2.0 release, we went through similarly rigorous testing as for the 1.0 release. Most of the testing was done with the cross compiler; the native compiler has gone through less testing due to lack of resources.
For the 2.1 release, we went through similarly rigorous testing as for the 2.0 release, and most of the testing was done with both the cross compiler and the native compiler.
Can I use ORC to build the Linux/IA-64 kernel?

We have not attempted to build the Linux/IA64 kernel, and we have no plan to do so. Hence, it is safest to assume that ORC cannot build Linux. However, the following instruction is extracted from the Readme file of the 0.13 Pro64(tm) release:

Edit the Linux/Makefile and replace the definitions of CROSS_COMPILE, CC, and CFLAGS with the following:

CROSS_COMPILE =
CC = orcc -D__KERNEL__ -I$(HPATH) -D__LP64__
CFLAGS = $(CPPFLAGS) -Wall -Wstrict-prototypes -O0 -fomit-frame-pointer -ffixed-r15 -CG:emit_unwind_info=off -Wf,-O2 -PHASE:p -D__OPTIMIZE -OPT:Olimit=5000
What is the status of ORC native compiler on IA-64 Itanium machines?

For the 1.1 release, we included a natively built binary, in the tar file orc-1.1.0.native.tgz. This compiler, although built at -O0, still outperforms a cross-built compiler running on an Itanium machine. This binary has not been tested as thoroughly as the cross-built compiler.

For the 2.0 release, we included an optimized build of the compiler (-O2 with all internal consistency checks removed, except that WOPT.so and BE.so are built at -O0). The binary is in the tar file orc-2.0-bin.tar.gz. You should see a large compile-time improvement from this compiler. This native compiler is not tested as thoroughly as the cross-built compiler.

We have also included two new tools for performance tuning and analysis: "hpe.pl", a hot path enumeration tool, and "cycount.pl", a cycle counting tool. Details are described below.
What are the differences between release 1.0 and 1.1?

There were three major focuses in 1.1: making IPA functional, improving performance, and making full profile feedback functional. To turn on InterProcedural Optimization, add -IPA to your compile line. You still need to specify -O0, -O2, or -O3 besides the -IPA flag. Performance has also improved compared with 1.0: we have measured an average 10%+ performance gain compiling with the 1.1 release under the same optimization flags. Two types of profile/feedback are now fully supported: Whirl-level profiling and code-generation-level profiling.

There is also a change in default optimization behavior. With the 1.1 release, alias analysis (memory disambiguation) defaults to assuming that user code is ANSI-type compliant. Also, compiled code is assumed to become part of the main executable, where we assume that functions defined in a main executable (non-DSO) will not be preempted. To get back the 1.0 behavior for both, use -OPT:Olegacy.
What are the differences between release 1.1 and 2.0?

The major focus in 2.0 has been performance. Although performance is not the primary goal of this Open Research Compiler, it is vital for researchers who base their work on this compiler to know that the binary it produces is very competitive, so that their work is also trustworthy and sound.

The concentration for ORC has always been on the integer side, mainly due to insufficient resources. We have not studied, nor conducted measurements of, the relative performance of floating-point code from ORC compared to other research or production compilers. Given that ORC is based on SGI's product compiler, which is strong on scientific applications, we have reason to expect that it has the capability to be among the best, should resources be put into it. On the integer side, we are pleased to say that, to the best of our knowledge, the code produced by this compiler is very competitive with all existing IA64 compilers at optimization levels -O2, -O3, and -IPA, with and without feedback information.

A major performance improvement of 2.0 over 1.1 is in C++ code quality. In 2.0 we spent considerable effort to ensure that C++-style code can achieve the kind of speedup people expect when they write C code.

Besides performance enhancements, the 2.0 release also includes Itanium-2 micro-architecture support. A new Itanium-2 machine model has been added to the compiler, and compiler optimizations such as instruction scheduling and bundling are now Itanium-2 aware. The option to turn on Itanium-2 mode is -TARG:platform=itanium2 or -itanium2; the default for the 2.0 release is the Itanium1 machine. Performance investigation and enhancements specific to Itanium-2 will come in the next release.
What are the differences between release 2.0 and 2.1?

The major focus in 2.1 has been performance on Itanium-2 systems. The 2.0 release was already capable of generating code based on the Itanium-2 machine model, but its performance tuning targeted Itanium rather than Itanium-2. After extensive work, we are pleased to say that, to the best of our knowledge, the code generated for both Itanium-2 and Itanium by ORC is very competitive with all existing IA64 compilers, at optimization levels -O2, -O3, and -IPA, with and without feedback information.
In addition to all the features in 2.0, the 2.1 release has added or improved a number of features: cache optimizations, loop-invariant code motion in the code generator, switch optimization, multi-way branches and renaming in instruction scheduling, SWP, unrolling, etc. In addition, we provide a new inter-procedural framework to balance RSE traffic and explicit spills. A dynamic instrumentation tool, called Pin, is also available along with 2.1.

The 2.1 release is also more robust compared with the 2.0 release, with a number of defects fixed in both the cross and native build environments.

By default, ORC 2.1 generates code for Itanium-2 systems. It can also generate code for Itanium by adding the option "-itanium".
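The target selection described above can be sketched as a dry run; the commands below are only echoed, not executed, and foo.c is a hypothetical source file, not one shipped with the release:

```shell
# Dry-run sketch of target selection in ORC 2.1 (nothing is compiled).
CMD_DEFAULT="orcc -O2 foo.c -o foo"
CMD_ITANIUM="orcc -O2 -itanium foo.c -o foo"
echo "$CMD_DEFAULT    # default in 2.1: Itanium-2 code"
echo "$CMD_ITANIUM    # generate Itanium code instead"
```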
How does one generate best optimized code using ORC?

Optimization results will vary with your specific program/application. Different optimization phases are involved at the various optimization levels: -O2 goes through the global optimizer and code generator, -O3 additionally goes through the loop-level optimizer, and -IPA invokes the inter-procedural optimizer. Hence, in general, -O3 code should run faster than -O2 code, and -O3 -IPA should outperform -O3 alone. -O3 also allows the compiler to perform more aggressive optimizations that may cause differences in results (such as assuming wrap-around is safe, or allowing changes in floating-point operation ordering). Occasionally, when the compiler sees that the procedure it is compiling is too big, it might choose to turn off optimization to curtail compile time (you will see a warning message when this happens). You can use the option -OPT:Olimit=0 to ensure the compiler never turns off optimizations.
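The level-by-level behavior above can be summarized as a dry run; foo.c is a placeholder and the orcc commands are only printed, not executed:

```shell
# Each level adds a phase on top of the previous one (nothing is compiled).
O2="orcc -O2 foo.c"                     # global optimizer + code generator
O3="orcc -O3 foo.c"                     # adds the loop-level optimizer
IPA="orcc -O3 -IPA foo.c"               # adds inter-procedural optimization
NOLIMIT="orcc -O3 -OPT:Olimit=0 foo.c"  # never disable optimization for huge PUs
for cmd in "$O2" "$O3" "$IPA" "$NOLIMIT"; do echo "$cmd"; done
```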
How to use Whirl and code generation phase profile/feedback?
The options to do instrumentation are:

-fb_create feed_back_file_name -fb_type={1,2,4,8} -fb_phase={0,4}

The options to use feedback data are:

-fb_opt feed_back_file_name

where fb_type = 1 : whirl; 2 : cg edge; 4 : cg value; 8 : cg stride
and fb_phase = 0 : before the very high whirl optimizer (VHO); 4 : before cg region formation.

The directory of feed_back_file_name specifies where the feedback data file will be produced. Currently, only the above types and phases are supported.
For example, to use feedback data from both whirl profiling and edge profiling, go through the following steps:

1. Compile with options: -fb_create feed_back_file_name -fb_type=1 -fb_phase=0
2. Run the binary.
3. Compile with options: -fb_opt feed_back_file_name -fb_create feed_back_file_name -fb_type=2 -fb_phase=4 -O3
4. Run the binary.
5. Compile with options: -fb_opt feed_back_file_name
6. Do the final run of the binary.
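The steps above can be sketched as a dry run. The commands are echoed rather than executed, and app.c / app are placeholder names; the options mirror the steps exactly:

```shell
# Whirl-then-edge feedback recipe (dry run, nothing is compiled).
FB=feed_back_file_name
STEP1="orcc -fb_create $FB -fb_type=1 -fb_phase=0 app.c -o app"
STEP2="orcc -fb_opt $FB -fb_create $FB -fb_type=2 -fb_phase=4 -O3 app.c -o app"
STEP3="orcc -fb_opt $FB app.c -o app"
echo "$STEP1"; echo "run ./app (training input)"
echo "$STEP2"; echo "run ./app (training input)"
echo "$STEP3"; echo "final run of ./app"
```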
How to use the compilers to achieve peak performance?
To use the compilers to generate peak-performance binary code, you need to turn on all optimizations in ORC (including inter-procedural analysis (IPA), inlining, stride pre-fetching, procedure reordering, etc.), as well as the various kinds of profiling. The process consists of three phases, as described below:
1. The first phase is Whirl profiling instrumentation. An example of the compiler options used is shown below:
"-O3 -ipa -fb_create fb.mid -fb_type=1 -fb_phase=0 "
After linkage, run the application with the "train" input data set. It will generate whirl feedback files with names like fb.mid.instr0.aginqw.
2. The second phase involves whirl profiling annotation, stride profiling instrumentation, and edge profiling instrumentation, e.g.,
"-O3 -ipa -fb_opt fb.mid -fb_create fb.mid -fb_type=10 -fb_phase=4"
After linkage, run the application with the "train" input data set. It will generate feedback files with names like fb.mid.instr0.01324, which combine the information obtained by edge profiling and stride profiling.
3. The last phase uses all the profiling information collected and turns on all optimizations, e.g.,
"-O3 -ipa -fb_opt fb.mid"
For further optimization opportunities, you may use -OPT:Olimit=0 as described above.
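The three phases can be put together as a dry-run script; app.c / app are placeholders, the commands are only echoed, and the option strings are taken verbatim from the phases above:

```shell
# Three-phase peak-performance recipe (dry run, nothing is compiled).
PHASE1="orcc -O3 -ipa -fb_create fb.mid -fb_type=1 -fb_phase=0 app.c -o app"
PHASE2="orcc -O3 -ipa -fb_opt fb.mid -fb_create fb.mid -fb_type=10 -fb_phase=4 app.c -o app"
PHASE3="orcc -O3 -ipa -fb_opt fb.mid app.c -o app"
echo "$PHASE1"; echo "run app on the train input"
echo "$PHASE2"; echo "run app on the train input"
echo "$PHASE3"
```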
Are there documents that come with the compilers?

We have included in the release some documents related to our added features and optimizations, and more will be coming. The documents are mostly in the code generator area, related to our changes/additions. For documents related to Pro64, please refer to the publication list; those marked with * in the list reflect actual implementation in various components of the compilers. We have also given three tutorials in the past two years: two at Micro34 and Micro35, and one at PACT02. Each tutorial covers different aspects of the compiler. You can find the tutorial material on the ORC site at SourceForge.
Interactions of ORC and IA-64 Linux questions

How do I install ORC compilers on IA-64 Linux systems?

The binaries are packaged in tar-ball form. Just untar the downloaded file and run the script "install-bin.sh". The help message of this script is:

Usage: install.sh [-hHnc] [-t toolroot] [-l native-archive-root]
  -h --help    give this help

Invoke "install.sh -H" to get a better understanding of "toolroot" and "libroot".

To install the compiler on an IA-64 Linux system, simply invoke ./install -n.
The environment variable $TOOLROOT affects installation as well as orcc's run-time behavior. $TOOLROOT refers to the root of the binary hierarchy of the ORC suite. For example, if this variable is not set, the full path of orcc is /usr/bin/orcc (which obviously requires root privileges); otherwise it is ${TOOLROOT}/usr/bin/orcc. If $TOOLROOT is set, you need to add $TOOLROOT/usr/bin/ to $PATH.
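A minimal sketch of the corresponding shell setup, assuming a per-user install prefix of $HOME/orc (the prefix is an assumption, not a path the release mandates):

```shell
# Hypothetical ~/.bashrc fragment for a non-root ORC install.
export TOOLROOT=$HOME/orc            # orcc then lives at $TOOLROOT/usr/bin/orcc
export PATH=$TOOLROOT/usr/bin:$PATH  # make orcc resolvable from the shell
```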
The following is an installation example. Assume the account name is joesmith, the IA-64 host name is uranus, and joesmith's login shell is GNU bash:

[joesmith@uranus joesmith]$ cat $HOME/.bashrc | grep "TOOLROOT\|PATH"
Please note that the binary in orc-2.0-bin.tar.gz is prepared for RedHat 7.2 Linux systems. Some users have experienced problems when installing RedHat 7.2 on Itanium 1 machines; our tests show that the compiler works fine on successfully installed machines.
How do I install ORC compilers on IA32 Redhat 6.2 Linux systems?

Sometimes it is desirable to install the compilers on IA32 machines and do cross compiles on a bare Linux 6.2 box. In this case you can use NUE, an Itanium simulation environment, to get a "virtual native IA64 system" on IA32.
1. Download NUE from HP website and install it.
2. After installing NUE, you might need to re-link the following two directories (depending on your NUE version) to make orcc work outside NUE as well.
If /nue/usr/include/asm is a symbolic link, change it to point to /nue/usr/src/linux/include/asm:

cd /nue/usr/include
rm asm
ln -s /nue/usr/src/linux/include/asm asm

If /nue/usr/include/linux is a symbolic link, change it to point to /nue/usr/src/linux/include/linux/:

cd /nue/usr/include
rm linux
ln -s /nue/usr/src/linux/include/linux/ linux
3. Download gcc (we have only used the 2.95.2 and 2.96 releases), then:

configure --prefix=/usr; make all install

Then run gcc -v to make sure you have the right version.
4. You don't need to build the binary at this point; just install it from the download.
To build the cross environment, keep in mind a number of things:

1. The NUE environment is a simulated IA64 system on top of the IA32 Linux box. However, we do not need to enter NUE to perform cross compilation.
2. ORC relies on NUE's cpp (C preprocessing) functionality as well as its native header files. Therefore NUE is an essential part of the setup for cross compilation.
3. By entering "nue", you are switched to a simulated native box, but you need not do so for cross compilation.
4. From the IA32 Linux box, the simulated file structure is really under /nue.
5. To do a cross compile, run ${TOOLROOT}/usr/ia64-orc-linux/bin/orcc file.c. NOTE: The path of the cross orcc is slightly different from that of the native one, i.e., "ia64-orc-linux" is interposed between "/usr" and "/bin".
How do I install ORC compilers on IA32 Redhat 7.1 Linux systems?

orc-2.0-bin.tar.gz does not work on this platform due to a shared object compatibility problem. A cross compiler on a 7.1 system is possible, but you need to build it from the source tree.
How do I install ORC compilers on IA32 Redhat 7.2 Linux systems?

For an IA32 system with Redhat 7.2 installed, the ORC compiler will be used as a cross compiler. We have provided a script INSTALL.cross that will install the compiler binaries. The binaries are packaged in tar-ball form. Just untar the downloaded file and run the script "install.sh -c". We recommend that users install NUE 1.1 on RedHat 7.2 or higher; since NUE 1.0 does not run on this system, you will be doing strictly cross compilation in this case.
How do I make a native built compiler?

We have provided a sample makefile (Make.native) in the same directory as Make.cross. Simply do a make with Make.native.
How do I install multiple versions of ORC compilers on IA-64 systems?

Sometimes it is desirable to install multiple versions of the compilers on one machine, for debugging or experimental purposes. Assuming you have two different versions of the ORC compiler binaries installed in different directories, simply set the environment variables TOOLROOT and COMP_TARGET_ROOT to point to the directories for the desired binary and archives, and it should work.
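A small sketch of switching between two installs by repointing the environment; the install paths ($HOME/orc-2.0, $HOME/orc-2.1) and the helper name are hypothetical:

```shell
# Helper to select one of several installed ORC versions.
use_orc() {
    TOOLROOT=$1            # root of the binary hierarchy for this version
    COMP_TARGET_ROOT=$1    # root for the matching archives
    PATH=$TOOLROOT/usr/bin:$PATH
    export TOOLROOT COMP_TARGET_ROOT PATH
    echo "orcc now resolves under $TOOLROOT/usr/bin"
}

use_orc $HOME/orc-2.0    # e.g. a stable install
use_orc $HOME/orc-2.1    # e.g. an experimental build
```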
How do I rebuild the Fortran front-end?
The Fortran front-end (i.e., mfef90) will not be built by default. If you want to rebuild it, you need to install the ORC binary first.
Then do the following:
cd ${ORC_SRC_ROOT}/src/osprey1.0/targia64_ia64_nodebug/crayf90/sgi; make BUILD_COMPILER=SGI
If I do cross compile on an IA32 machine, how do I produce IA-64 binaries?

In order to pick up the right set of archives or dynamic shared libraries, the simplest way is to produce object files via cross compilation, copy the objects to the target IA64 system, and do the final link there to produce the binary, as follows:
1. At IA32 side:
[IA32]% orcc -c hello.c -o hello.o
[IA32]% ftp IA64 (transfer hello.o to an Itanium machine)
2. At IA64 side:
[IA64]% orcc hello.o -o hello
[IA64]% ./hello
hello world!
%
The ORC 2.0 release allows you to produce IA64 binaries on IA32 machines directly. Simply follow the install instructions to set the environment variables and install the pre-built native archives properly.
Reporting bugs and problems

We cannot promise to fix every bug reported. If you think that you have uncovered a bug in the ORC compilers, you can post it to ipf-orc-support@lists.sourceforge.net. We will do our best if the bug is found to be ORC specific. For Pro64-specific bugs, we encourage you to post the problem to open64-devel@lists.sourceforge.net.
How do I decide if the bug is ORC specific?

Our changes are primarily in the code generator area. To find out whether the problem you have is specific to our changes:

1. If the bug persists at the -O0 level, it is likely Pro64 specific.
2. Otherwise, add "-CG:opt=0"; if the bug persists, it is most likely not in the code generator area.
3. Otherwise, add "-ORC:=off"; if the bug goes away, the bug is ORC related.
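The triage ladder above can be written out as a dry run; bug.c is a placeholder test case, the -O2 base level in the second and third compiles is an assumption (the steps only say to add the extra flag), and nothing is actually compiled:

```shell
# Bug-triage ladder (dry run, commands are echoed only).
T1="orcc -O0 bug.c"            # still fails?  likely Pro64 specific
T2="orcc -O2 -CG:opt=0 bug.c"  # still fails?  likely outside the code generator
T3="orcc -O2 -ORC:=off bug.c"  # now passes?   the bug is ORC related
for t in "$T1" "$T2" "$T3"; do echo "$t"; done
```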
You can help us quickly turn around fixes by doing some up-front work:

1. Minimize the test case. Provide a fully preprocessed file (-E option) to avoid dependence on specific header files.
2. Give us the full command line used to compile the test.
3. Tell us how to run your program if the symptom occurs at runtime. If the program needs input data, please attach it as well.
4. You can also help by using the triage tool we provide to narrow down the optimization, procedure, and BB/region/instruction that shows the problem.
Using tools that come with the compiler

How to use the hot path enumeration tool

The hot path enumeration tool (hpe.pl) can be used to enumerate the hot paths in a PU, which is useful for performance analysis. It works on an assembly file generated by ORC.

usage: hpe.pl [options] file
How to use the cycle counting tool

Usage: cycount.pl [options] benchmarks
Options:
  -h --help                Display this information
  -d --spec_home <dir>     Set <dir> as the SPEC2KINT home. Default is $HOME/spec2000
  -l --log <filename>      Give a log file name. Default is STDOUT
benchmarks                 SPEC2KINT benchmarks. Type 'int' for all 12 SPEC2KINT benchmarks
Adding new optimizations or phases

Are there any coding conventions?

Please find the coding convention guideline here. This guideline is designed to stay close to the Pro64 coding style and conventions. You can also find documentation on how the compiler handles memory management and various other issues.
How do I add my own tracing options?

One can obtain either a summarized or a detailed trace of any specific optimization performed. The traces are dumped to the file xxx.t, where xxx.o is the desired output object file. To add your own tracing options:
1. Make sure what you want to add is a phase-specific trace flag. Assume your phase is XYZ (a corresponding number xyz can be found in osprey1.0/common/util/tracing.h). It will be used like this:

-Wb,-ttXYZ,0xnn

(-Wb,-ttxyz,0xnn is the same)

2. Modify xyz_defs.h to define your flags (the minor number above, nn; the major number is mm). Please add them at the end.
3. Your code using these flags should be like:
if (Get_Trace(TP_XYZ, YOUR_FLAG)) {
YOUR_CLASS.Print(TFile);
}
where TFile is the file handle of the trace file.
4. common/util/tracing.h is the file of interest to add tracing.
How do I change the driver and related files when I add an optimization?

When you add your own phase to the compilers, the following points are important:
1. Suppose you want to add a phase inside the code generator.
2. All files you add should have lower-case file names, with words separated by underscores, such as if_conv.cxx.
3. You should define (declare) a flag in cg_flags.cxx(.h) controlling whether your phase should be run.
4. You should add an element to the array Options_IPFEC in cgdriver.cxx describing your flags. There are already several flags there that you can take as examples.
5. Your flags should have the prefix IPFEC_Enable_XXX, such as IPFEC_Enable_If_Conversion, or at the very least the prefix IPFEC_XXX.
6. The other components are all similar; the files common/com/xxx_config.h and common/util/flags.{h,c} are the files of interest.
How do I effectively use gdb to debug the compilers?

Prerequisites:
1. Your gcc version needs to be 2.95.2 or 2.96.
2. To enable debugging, you can build the entire compiler with "make BUILD_OPTIMIZE=DEBUG", or build individual components (such as wopt.so) by going into the corresponding targia_xxx directory and building with "BUILD_OPTIMIZE=DEBUG".
To single step inside the backend components (we'll show the CG portion, other components are similar):
1. Remember to set the environment variable LD_LIBRARY_PATH. For the cross compiler this variable should be set as:

export LD_LIBRARY_PATH=${TOOLROOT}/usr/ia64-orc-linux/lib/gcc-lib/ia64-orc-linux/2.0:$LD_LIBRARY_PATH

For the native compiler it should be:

export LD_LIBRARY_PATH=${TOOLROOT}/usr/lib/gcc-lib/ia64-orc-linux/2.0:$LD_LIBRARY_PATH
2. First run orcc with the options "-show -keep"; it will keep the needed intermediate files.
orcc -show -keep kk16.c
It will print some info like this:
/home/xyz/orc-2.0/usr/ia64-orc-linux/altbin/gcc -D_LANGUAGE_C -D_SGI_COMPILER_VERSION=10. -D__host_ia32 -D__INLINE_INTRINSICS -D_LP64 -D__ia64=1 kk16.c -E > kk16.i
/usr/ia64-orc-linux/lib/gcc-lib/ia64-orc-linux/2.0/gfec -O0 -dx -quiet -dumpbase kk16.c kk16.i -o kk16.B
/usr/ia64-orc-linux/lib/gcc-lib/ia64-orc-linux/2.0/be -PHASE:c -G8 -O0 -TENV:PIC -m1 -INTERNAL:return_val=on -INTERNAL:mldid_mstid=on -INTERNAL:return_info=on -show -TARG:abi=i64 -LANG:=ansi_c -fB,kk16.B -s -fs,kk16.s kk16.c
Compiling kk16.c (kk16.B) -- Back End
Compiling main(0)
/usr/ia64-orc-linux/bin/as kk16.s -o kk16.o
/usr/ia64-orc-linux/bin/ld -dynamic-linker /lib/ld-linux-ia64.so.2 -rpath-link ...
3. Now we can use the input file and options for BE as the input to gdb when debugging BE:
gdb /usr/ia64-orc-linux/lib/gcc-lib/ia64-orc-linux/2.0/be
(gdb) break be_debug
(gdb) run -PHASE:c -G8 -O0 -TENV:PIC -m1 -INTERNAL:return_val=on -INTERNAL:mldid_mstid=on -INTERNAL:return_info=on -show -TARG:abi=i64 -LANG:=ansi_c -fB,kk16.B -s -fs,kk16.s kk16.c
4. After gdb stops, set another breakpoint and step into the component you are interested in.
How do I make sure my new phase is not the cause of unnecessarily long compile time?

All optimization phases are timed by the compiler for ease of measurement; please make your phase timed as well. It's simple: you only need to add two lines to your code, one after entering your phase, for example:
Start_Timer(T_Aurora_LOCS_CU);
and another before leaving your phase, for example:
Stop_Timer(T_Aurora_LOCS_CU);
Remember to include timing.h in your code. Naturally, you'll need to define your own timing phase; please look at the file be/com/timing.h. After recompiling, you can get timing info using -Wb,-ti1.