Porting of programs from VAX to AXP.
====================================

I)    To get optimal performance from the AXP architecture.
II)   Features/problems in the new compilers on AXP.
III)  Porting problems VAX -> AXP.

I)  To get optimal performance from the AXP architecture.
=========================================================

A)  proper alignment of data
B)  choosing algorithms to maximize benefit from cache
C)  choosing programming style to allow maximal pipelining

II)  Features/problems in the new compilers on AXP.
===================================================

A)  C
B)  FORTRAN
C)  PASCAL

III)  Porting problems VAX -> AXP.
==================================

A)  problems with system services
B)  problems with macro32
C)  architectural problems
D)  problems with undocumented features
E)  MACRO64 assembler
F)  VEST

Illustration of the importance of proper data alignment and cache.
==================================================================

Performance chart:

                   VAX 4200                    DEC 3400
                   (rated at 5 VUPS)           (rated at 110 Specmark)

                   aligned    not aligned      aligned      not aligned
                   4 byte     4 byte           8 byte       8 byte

small data           1095        365            26665         2300
(64KB)

medium data          1040        350            10800         1200
(3.2MB)

large data            440        240             7700         1160
(16MB)

Numbers=number of REAL*8 initialized pr. millisecond CPU time (not real time).

Conclusions:
  - REAL*8 must be longword aligned on VAX but quadword aligned on AXP
  - performance increase with correct alignment is a factor 2-3 on VAX
    but a factor 5-10 on AXP
  - the performance increase in keeping the data in cache on AXP is
    a factor 2

TEST_ALIGN.FOR
==============

      PROGRAM TEST_ALIGN
      INTEGER*4 I,J,K
      INTEGER*4 SIZ(3)
      BYTE DUMMY(9)
      REAL*8 X(2000001)
      EQUIVALENCE (DUMMY,X)
      DATA SIZ/8000,400000,2000000/
      DO 400 K=1,3
        DO 200 J=1,100
          DO 100 I=1,SIZ(K)+1
            X(I)=I
100       CONTINUE
          CALL FOOL(X)
200     CONTINUE
        DO 300 I=1,9
          CALL TEST(DUMMY(I),SIZ(K))
300     CONTINUE
400   CONTINUE
      END
C
      SUBROUTINE TEST(X,SIZ)
      INTEGER*4 SIZ
      REAL*8 X(*)
      INTEGER*4 I,J,T
      INTEGER*4 PAS$CLOCK2
      T=PAS$CLOCK2()
      DO 200 J=1,100
        DO 100 I=1,SIZ
          X(I)=I
100     CONTINUE
        CALL FOOL(X)
200   CONTINUE
      WRITE(6,'(1X,F10.3)') (100.0*SIZ)/(PAS$CLOCK2()-T)
      RETURN
      END
C
      SUBROUTINE FOOL
      RETURN
      END

Compilers and alignment.
========================

FORTRAN
-------

(FORTRAN 6.1 for AXP)

The alignment within COMMON blocks and STRUCTUREs are determined by the
ALIGNMENT quailfier (default is /ALIGNMENT=(COMMONS=PACKED,RECORDS=NATURAL)).
The WARNING qualifier controls whether warnings are issued for non
natural aligned variables in COMMON blocks (default is /WARNING=ALIGNMENT).

Other variables are always natural aligned.

PASCAL
------

(PASCAL 5.0 for AXP)

The alignment within RECORDs are determined by the ALIGNMENT qualifier
(default is /ALIGNMENT=ALPHA_AXP, which use natural alignment - on VAX
the default is /ALIGNMENT=VAX, which use packed alignment).

Other variables are always natural aligned unless another alignment is
explicit defined in the code as an attribute for the variable.

C
-

(DEC C 1.3 for AXP)

The alignment within STRUCTs are determined by the MEMBER_ALIGNMENT
qualifier (default is /MEMMBER_ALIGNMENT, which use natural alignment -
on VAX the default is /NOMEMBER_ALIGN, which use packed alignment).
The qualifier can be override temporarrily with the #pragma member_alignment
and #pragma nomember_alignment directives.

Other variables are always natural aligned.

Points
------

1)  Natural alignment is necesarry for optimal performance (see previous
    example).
2)  The AXP compilers "like" natural alignment, while the VAX compilers
    "like" packed alignment, which makes identical records have
    non-identical sizes on the two architectures.
3)  Be carefull when calling f.ex. MACRO32 routines with natural aligned
    records.

ALIGN_SIZE.C
============

#include <stdio.h>

struct example {char a; double b;};

main()
{
   struct example test;
   printf("%d\n",sizeof(test));
}

Maximizing benefit from pipelining.
===================================

The AXP are a highly pipelined architecture meaning that it benefits
greatly from optimization techniques such as loop unrolling.

Example:

                           VAX 4200               DEC 3400
                      (rated at 5 VUPS)     (rated at 110 Specmark)
                                        /opt=unrol:1   /opt=unroll:5  
FORTRAN standard loop (1)    219820         8360           5350  
FORTRAN unrolled loop (5)    163430         5340           4710  
assembler unrolled loop      157490         4450           4490  

numbers=millisecond CPU time used for 100000 calls to dot poduct of
        1000 elements array

Conclusions:
  - the benefit of a 1->5 loop unrolling is a saving of 25% on VAX and
    a saving of 35% on AXP
  - the AXP compiler is just as good to unroll as the programmer, so
    manual loop unrolling on AXP are a waste of time
  - both on VAX and AXP are handwritten assembler code still faster
    (not much 5% on VAX and 15% on AXP in this example)

Compiler defaults for loop unrolling:

                      VAX             AXP

FORTRAN                no            0 = depends on the code, often 4
C                      no            0 = depends on the code, often 4
PASCAL                 no             no

LOOP_UNROLL.FOR
===============

      PROGRAM LOOP_UNROLL
      INTEGER*4 N
      PARAMETER (N=1003)
      INTEGER*4 I,T
      REAL*8 X(N),SUM
      INTEGER*4 PAS$CLOCK2
      REAL*8 DOT1,DOT5,DOTASM
      DO 100 I=1,N
        X(I)=I
100   CONTINUE
      T=PAS$CLOCK2()
      DO 200 I=1,100000
        SUM=DOT1(N,X)
200   CONTINUE
      WRITE(*,*) '1   :',SUM,PAS$CLOCK2()-T
      T=PAS$CLOCK2()
      DO 300 I=1,100000
        SUM=DOT5(N,X)
300   CONTINUE
      WRITE(*,*) '5   :',SUM,PAS$CLOCK2()-T
      T=PAS$CLOCK2()
      DO 400 I=1,100000
        SUM=DOTASM(N,X)
400   CONTINUE
      WRITE(*,*) 'ASM :',SUM,PAS$CLOCK2()-T
      END
C
      REAL*8 FUNCTION DOT1(N,X)
      INTEGER*4 N
      REAL*8 X(*)
      INTEGER*4 I
      DOT1=0
      DO 100 I=1,N
        DOT1=DOT1+X(I)*X(I)
100   CONTINUE
      RETURN
      END
C
      REAL*8 FUNCTION DOT5(N,X)
      INTEGER*4 N
      REAL*8 X(*)
      INTEGER*4 I
      DOT5=0
      DO 100 I=1,MOD(N,5)
        DOT5=DOT5+X(I)*X(I)
100   CONTINUE
      DO 200 I=MOD(N,5)+1,N,5
        DOT5=DOT5+X(I)*X(I)+
     +            X(I+1)*X(I+1)+
     +            X(I+2)*X(I+2)+
     +            X(I+3)*X(I+3)+
     +            X(I+4)*X(I+4)
200   CONTINUE
      RETURN
      END

DOTASM.MAR
==========

        .title  dotasm
        .psect  $CODE quad,pic,con,lcl,shr,exe,nowrt
        .entry  dotasm,^m<iv,r2,r3,r4,r5,r6,r7>
        movl    @4(ap),r2               ; r2=N
        movl    8(ap),r3                ; r3=X
        clrd    r0                      ; r0=DOTASM
        divl3   #5,r2,r4
        mull3   #5,r4,r4
        subl3   r4,r2,r4                ; r4=MOD(N,5)
        mull2   #8,r4
        addl2   r3,r4                   ; r4=X[MOD(N,5)+1]
        mull3   #8,r2,r5
        addl2   r3,r5                   ; r5=X[N+1]
        cmpl    r3,r4
        bgeq    200$
100$:   muld3   (r3),(r3),r6
        addd2   r6,r0
        addl2   #8,r3
        cmpl    r3,r4
        blss    100$
200$:   cmpl    r3,r5
        bgeq    400$
300$:   muld3   (r3),(r3),r6
        addd2   r6,r0
        muld3   8(r3),8(r3),r6
        addd2   r6,r0
        muld3   16(r3),16(r3),r6
        addd2   r6,r0
        muld3   24(r3),24(r3),r6
        addd2   r6,r0
        muld3   32(r3),32(r3),r6
        addd2   r6,r0
        addl2   #40,r3
        cmpl    r3,r5
        blss    300$
400$:   ret
        .end

DOTASM.M64
==========

        $routine DOTASM,kind=stack,saved_regs=<r2,r3>
        $linkage_section
c8:     .long   8
        $code_section
        ldl     r22,(r16)               ; r22=N
        mov     r17,r23                 ; r23=X
        .base   r27,$ls
        $call   OTS$REM_I,args=<(r16)/l,c8/l>
        mull    r0,8,r2
        addl    r2,r23,r3               ; r3=X[MOD(N,5)+1]
        mull    r22,8,r22
        addl    r22,r23,r24             ; r24=X[N+1]
        fmov    f31,f0                  ; f0=DOTASM
        subl    r3,r23,r1
        ble     r1,200$
100$:   ldg     f10,(r23)
        mulg    f10,f10,f22
        addg    f22,f0,f0
        addl    r23,8,r23
        subl    r3,r23,r1
        bgt     r1,100$
200$:   subl    r24,r23,r1
        ble     r1,400$
300$:   ldg     f10,(r23)
        ldg     f11,8(r23)
        ldg     f12,16(r23)
        ldg     f13,24(r23)
        ldg     f14,32(r23)
        addl    r23,40,r23
        mulg    f10,f10,f22
        mulg    f11,f11,f23
        mulg    f12,f12,f24
        mulg    f13,f13,f25
        mulg    f14,f14,f26
        subl    r24,r23,r1
        addg    f22,f23,f27
        addg    f24,f25,f28
        addg    f26,f0,f0
        addg    f27,f28,f29
        addg    f29,f0,f0
        bgt     r1,300$
400$:   $return
        $end_routine DOTASM
        .end

Porting from VAX C to DEC C.
============================

DEC C is fully ANSI compatible, which VAX C is not !

There are a lot of differences !

1)  Much stricter syntax-check even with the default /STANDARD=RELAXED_ANSI89
    (and guess what happend with /STANDARD=ANSI89).

    You will have to make a lot of small modifications (f.ex. type casts).

    And if we want to see all the warnings about sloppy code, then the
    qualifier /WARNING=ENABLE:ALL will surely find them.

    Most of theese problems can be worked around with the /STANDARD=VAXC
    qualifier.

2)  The C RTL are now automaticly searched by LINK, so no VAXC.OPT file
    anymore. (the routines are prefixed DECC$, if you need them in other
    languages)

3)  The preprocessor are now a real ANSI C preprocessor.

VAX C: #module ident string
DEC C: #pragma module ident string

VAX C: #define dmpi(i) printf("i" "=%d\n",i)
DEC C: #define dmpi(i) printf(#i "=%d\n",i)

VAX C: #define init(i) x_/**/i=i
DEC C: #define init(i) x_##i=i

4)  The builtins has changes significantly (obviously, since many of the
    builtins gives directly access to the instruction set). Among the many
    small goodies are __ALLOCA, which have been missed in VAX C for years.

5)  In VAX C sizeof('a') returns sizeof(char), while DEC C sizeof('a')
    returns sizeof(int), also when /STANDARD=VAXC is specified.

6)  Variables explicit declared as "extern" are no longer default
    implemented as psects but as global symbols. And as a consequence
    they do no longer match f.ex. COMMON BLOCK's in FORTRAN.

    To restore the old behaviour use: /EXTERN_MODEL=COMMON_BLOCK or
    #pragma extern_model common_block.

    The default behaviour is now /EXTERN_MODEL=RELAXED_REFDEF or
    #pragma extern_model relaxed_refdef.

There are an entire manual "DEC C Migration Guide for OpenVMS VAX Systems"
that discuss the VAX C to DEC C migration (on VMS VAX, but many of the
facts can be used on AXP too).

Conclusion: a proper port can take some time, but the /STANDARD=VAXC kludge
can be used for a fast temporary port.

Porting from FORTRAN 5.x to 6.x.
================================

There are some small differences.

1)  Unless the /NOSEPERATE qualifier is used, then all subroutines/functions
    in one source-file is compiled to one object-module, which enables
    global optimization. [AXP only]

2)  A new qualifier /RECURSIVE allows recursive programming in FORTRAN
    (it uses stack-allocation for local variables).

    I strongly recommend to always use /WARNING=UNINITIALIZED with
    /RECURSIVE, since uninitialized stack-variables are not zero !

3)  Capability of reading/writing non-native binary-formats (big endian,
    IBM floatings,IEEE floatings on VAX).

4)  A new intrinsic function IARGCOUNT returns the number of arguments
    a subroutine/function is actually called with. This superseedes
    all kinds of small MACRO routine son VAX that do not work on AXP.
    [V6.1]

5)  A new qualifier /WARNING=UNUSED helps you to clean up your code.

6)  If you install FORTRAN 6.x on VMS VAX 5.x/6.0, then you get new FORRTL
    and MTHRTL, so EXE-files linked here can not be run on other VAX'es
    running the same VMS version, but without FORTRAN 6.x installed. And
    the images can not be VESTed on AXP either. The release notes describe
    how to workaround this problem (link with the old "standard" libraries)
    in section 1.1.1.3 !

    In short:
       - use the /MATH=V5 qualifier
       - use the defines:
           $ DEFINE MTHRTL FORTRAN$MTHRTL-VMS
           $ DEFINE VMTHRTL FORTRAN$VMTHRTL-VMS
         before linking

7)  FORTRAN on VAX supports the following piece of pseudo FORTRAN 66:

        CALL SUBBO('This is a string')

        SUBROUTINE SUBBO(S)
        INTEGER*4 S(*)

    (the compiler issues some information, that the linker uses to fixup
    the call)

    FORTRAN on AXP does not support this.

8)  There is a bug in some versions of the compiler regarding generation
    of DST records for subroutines with extra ENTRY points.

Steve Lionel has written a nice little guide, which is available via
anonymous FTP !

(URL: ftp://ftp.digital.com/pub/Digital/info/whitepaper/migrating-fortran.ps)

Conclusion: Easy porting, since most changes are new features.

Porting from PASCAL 4.x to 5.x.
===============================

I have not encountered any problems (but I have not ported very much
code either, so do not jump to conclusions).

The PASCAL compilers optimization on AXP is much better than on VAX
(compared to the other compilers).

Differences in the LINKer.
==========================

There are a few differences with the linker too:

1)  The C RTL are linked in by default (one options file out).

2)  The transfer vectors are created different.

VAX MAR-file:
        .MACRO  TVGEN ZZZZ
        .TRANSFER ZZZZ
        .MASK   ZZZZ
        JMP     ZZZZ+2
        .ENDM   TVGEN

        TVGEN   name1
        TVGEN   name2
        TVGEN   name3

AXP OPT-file:
SYMBOL_VECTOR=(name1=PROCEDURE,name2=PROCEDURE,name3=PROCEDURE)

Note that psects are tread the same way as procedures: namepsect=PSECT !

3)  The very usefull but undocumented/unsupported option UNSUPPORTED=1
    are missing.

There are other differences too. Study the LINKER manual carefull, if
you have complex LINK setups.

Problems with system services.
==============================

In general all system services are present and behaves identical on VAX and
AXP. Most of the differences are due to the differences in versions of
VMS on VAX and AXP, which will eventually disappear when VSM VAX and
VMS AXP merges.

There are a few system-services that refer to pagecount and start/end
addresses. The pagecounts are not a problem, because they refer to
pagelets of 512 byte, not physical pages of 8192 bytes.

But the start/end addresses can be a problem as input arguments.
The addresses are rounded down and up to nearest page boundry and
on AXP that is a physical page boundry of 8192 bytes !

INADR.C
=======

#include <stdio.h>
#include <prtdef.h>

#define N 100

int mem[N];

int sys$setprt();

main()
{
   int stat;
   int inadr[2] = {&mem[0],((int)&mem[N])-1};
   int retadr[2];
   stat = sys$setprt(inadr,retadr,0,PRT$C_UR,0);
   printf("stat   = %8x\ninadr  = %8x %8x\nretadr = %8x %8x\n",
          stat,
          inadr[0],inadr[1],
          retadr[0],retadr[1]);
}

Problems with macro32.
======================

The purpose of the MACRO32 is easy porting of VMS and special priviliged
user applications. As a consequence it is very effecient and low-level
(and supports writing device drivers etc.). It is also not as good as
it could have been for porting ordinary user applications.

There are many good porting hints in "Migrating to an OpenVMS AXP System:
Porting VAX MACRO Code".

Declaration of JSB-routines.
----------------------------

JSB routines must be declared in the MACRO32 compiler.

VAX MACRO32:
        JSB     name
        .
        .
        .
name:   .
        .
        .
        RSB

AXP MACRO32:
        JSB     name
        .
        .
        .
name:   .JSB_ENTRY INPUT=<Rx,Ry,Rz>,OUTPUT=<Ra,Rb,Rc>
        .
        .
        .
        RSB

If not input and output are declared, then the compiler assumes that there
are no register dependencies and saves/restores everything, so the routine
do not work properly. And no warnings !

Poor floating point support.
----------------------------

The MACRO32 compiler do not support floating point register operations.
It only supports floating point operations on memory locations. This
gives poor performance (it gets even worse because the floating point
instructions are being translated into JSB calls). And be aware: the
compiler do not give any warnings about the register operations -
the results are simply wrong.

Ignore directive.
-----------------

The .LINK directive are ignored. The compiler do not give a warning, but
the directive has no effect.

Warnings.
---------

Quadword operations, which the compiler can not be sure are natural aligned
results in warnings.

Entry masks.
------------

Remove the iv bit specification from entry-masks. Compilation results
in errors.

Bugs.
-----

The MACRO32 compiler on VMS AXP 6.1 has a bug, which sometimes makes the
EDIV instruction generate wrong results. Workaround: use LIB$EDIV !


Architectural features/problems.
================================

Integers.
---------

AXP is a real 64 bit architecture, so it has 64 bit integers.

FORTRAN:  INTEGER*8 gives a 64 bit integer, and the qualifier
          /INTEGER_SIZE=64 makes it default for INTEGER

C:        you need to use the type __int64 (or if you include ints.h, then
          you can use int64) - this is not exactly emphasized in online
          help

PASCAL:   INTEGER64 is the dattype for 64 bit integers

Conclusions:
  - new integer type with greater range

Floating point.
---------------

navn     type/size/precisison       VAX                   AXP

 F       VAX 4 byte single          full support          full support
 D       VAX 8 byte double          full support          only load/store
                                                          all calculation as G
 G       VAX 8 byte double          full support          full support
 H       VAX 16 byte quadrouple     hardware old VAX      no support
                                    emulation new VAX
 S       IEEE 4 byte single         FORTRAN 6.x can       full support
                                    read/write
 T       IEEE 8 byte double         FORTRAN 6.x can       full support
                                    read/write
 X       IEEE 16 byte quadrouple    ?                     emulation in
                                                          FORTRAN 6.2 ?

Compiler switches:

           FORTRAN/PASCAL/C

F          default on AXP
S          /FLOAT=IEEE

D          /FLOAT=D
G          default on AXP
T          /FLOAT=IEEE

Conclusions: 
  - new standard datatype (IEEE, used by Intel and various RISC/UNIX platforms)
  - H-floating is missing
  - G-floating has beaten D-floating

Procedure calling standard.
---------------------------

AXP is not using the VAX procedure calling standard. In an attempt to
speed up procedure calls the first 6 arguments are no longer
passed in a seperate argumentlist, but in 6 hardware registers R16-R21
for integer and F16-F21 for floating point. The VAX proceduure stack frame
are not used anymore either.

Consequences:

1)  It is not easy to write a routine NARG, which in the call-sequence
    X -> Y -> NARG can tell Y how many arguments X called Y with.

    FORTRAN now has the IARGCOUNT function to get this information.
    C has the standard stdarg/vararg. PASCAL also has facilities to
    do this. MACRO32 code can still refer directly to (AP).

2)  It is no longer possible to write a routine that inserts an
    error-handler for the routine calling this routine. The
    LIB$ESTABLISH/LIB$REVERT no longer exist as routines. The
    FORTRAN and C compilers on AXP recognizes them and generates inline
    code. PASCAL and MACRO32 programs can not use them. MACRO32 code
    can still refer directly to (FP). Both C and PASCAL has its own
    error-handling routines, that still works.

3)  Fake returns do not work any longer too. (A calls B calls C returns to A)

4)  Tracing via the call stack is not possible the same way on VMS AXP
    as VMS VAX. All MACRO code refferring to something(FP) should be
    carefully examined.

There are some good descriptions in "OpenVMS Calling Standard".

For those who want to start playing with the VMS AXP stacks, then the
routines LIB$GET_CURRENT_INVO_CONTEXT and LIB$GET_PREVIOUS_INVO_CONTEXT 
are the way to go (they are documented in the manual mentioned above).

Problems with undocumented features.
====================================

Program linked directly with VMS.
---------------------------------

VAX: $ LINK anything+SYS$SYSTEM:SYS.STB/SELECTIVE_SEARCH
     (or .link "sys$system:sys.stb"/selective_search in MACRO32 code)
AXP: $ LINK/SYSEXE anything
     (or SYS$LOADABLE_IMAGES:SYS$BASE_IMAGE.EXE/SHARE in options-file)

(DCLDEF.STB are still there)

Size of SYS0 fields.
--------------------

Certain fields in SYS0 space has changed size.
Example: SYS$GW_IJOBCNT -> SYS$GL_IJOBCNT.

SYS0.C on VAX
-------------

#include <stdio.h>

globalref sys$gw_ijobcnt;

main()
{
   printf("Number of interactive jobs=%d\n",sys$gw_ijobcnt);
}

SYS0.C on AXP
-------------

#include <stdio.h>

globalref sys$gl_ijobcnt;

main()
{
   printf("Number of interactive jobs=%d\n",sys$gl_ijobcnt);
}

Read/write in other process address-space.
------------------------------------------

The well-known hack for reading/writing in another process
address-space via a double/single special kernel mode AST are
extremely difficult in VMS AXP. It is said to be possible, but
DEC has supplied to new routines (undocumented/unsupported)
EXE$READ_PROCESS and EXE$WRITE_PROCESS to do it.

Format of EXE-files.
--------------------

The format of EXE-files has changed. Any program reading and interpreting
image header and GST/DST must be rewritten. See the difference between
the $IHADEF/$IHDDEF and $EIHADEF/$EIHDDEF macros for details.

SYS$IMGACT.
-----------

I have not yet been able to get the undocumented/unsupported system service
SYS$IMGACT to work properly on VMS AXP.

MACRO64 assembler.
==================

It is actually rather simple to use. It i mostly like MACRO32, but one has
to get used to the load/store way of addressing.

I doubt that it will ever be very musch used. It is not bundled with VMS.
MACRO32 is still used for priviliged code. DEC has announced that
writing device-drivers in C will be supported soon.

VEST.
=====

The purpose of VEST is easy porting of user applications with no direct
dependence of VMS.

It is based on both translation and interpretation.

I have moved two applicatiosn with VEST:
  - a 75 block executable with a 500 block shareable
  - a 500 block executable

Both programs worked fine after VESTing.

Efficiency chart:

                   VAX 4200                    DEC 3400
                   (rated at 5 VUPS)           (rated at 110 Specmark)

                                               VESTed       native

integer                  13240                 2260          430
calculations

floating point           36510                 2470          420
calculations

character                25420                 8500         2250
moves

Numbers=milliseconds of CPU usage on specified operation on small data.

Conclusions:
  - VEST gived an overhead of a factor 3-5
  - the gain of the VAX -> AXP change depends on the type of operations
    (integer: factor 25, floating point: factor 75,character: 10)

[DEC's tests on the Specmark programs says a factor 2-6 in VEST overhead
and the hardware upgrade scaled to same number of VUPS gives a factor
12-56]

VEST.FOR
========

      PROGRAM VEST
      CALL TESTINT
      CALL TESTFLOAT
      CALL TESTCHAR
      END
C
      SUBROUTINE TESTINT
      INTEGER*4 I,J,T,A(4000),B(4000)
      INTEGER*4 PAS$CLOCK2
      DO 100 I=1,4000
        A(I)=I
100   CONTINUE
      T=PAS$CLOCK2()
      DO 300 J=1,1000
        DO 200 I=4,3997
          B(I)=(A(I-3)+A(I-2)+2*A(I-1)+8*A(I)+2*A(I+1)+A(I+2)+A(I+3))/16
200     CONTINUE
        CALL FOOL(B)
300   CONTINUE
      WRITE(*,*) PAS$CLOCK2()-T
      RETURN
      END
C
      SUBROUTINE TESTFLOAT
      INTEGER*4 I,J,T
      REAL*8 A(4000),B(4000)
      INTEGER*4 PAS$CLOCK2
      DO 100 I=1,4000
        A(I)=I
100   CONTINUE
      T=PAS$CLOCK2()
      DO 300 J=1,1000
        DO 200 I=4,3997
          B(I)=0.0625D0*A(I-3)+
     +         0.0625D0*A(I-2)+
     +         0.125D0*A(I-1)+
     +         0.5D0*A(I)+
     +         0.125D0*A(I+1)+
     +         0.0625D0*A(I+2)+
     +         0.0625D0*A(I+3)
200     CONTINUE
        CALL FOOL(B)
300   CONTINUE
      WRITE(*,*) PAS$CLOCK2()-T
      RETURN
      END
C

      SUBROUTINE TESTCHAR
      INTEGER*4 I,J,T
      CHARACTER*4000 A,B
      INTEGER*4 PAS$CLOCK2
      DO 100 I=1,4000
        A(I:I)=CHAR(MOD(I,256))
100   CONTINUE
      T=PAS$CLOCK2()
      DO 300 J=1,1000
        DO 200 I=1,3901
          B(I:I+49)=A(I+50:I+99)
200     CONTINUE
        CALL FOOL(B)
300   CONTINUE
      WRITE(*,*) PAS$CLOCK2()-T
      RETURN
      END
C
      SUBROUTINE FOOL
      RETURN
      END