Subject: Solution to median, rows and columns problem
From: richard.levine@canrem.com (Richard Levine)
Date: Thu,  8 Jun 95 23:35:00 -0500
Organization: CRS Online  (Toronto, Ontario)

Preamble

        The following question was part of the discussion on
"Collecting Questions about J" in comp.lang.apl.  This document
is an organized summary of all the messages and fragments
which led to a solution.  It's a little long, but, then again,
so was the discussion. (:>) (::>>) (:::>>>)

Thanks to all who contributed.  If there are any mistakes or
significant omissions, please let me know.

Introduction

Consider the function "mean" which can be defined as follows:

mean =. +/ % #

This function has the "nice" property that it can be applied for
each row, for each column, or, in the case of a rank-3 array, for
each "tube". (A "tube" is an "ad hoc" term for all the elements
with the same row and column specification, but in different
planes.)

The syntax to apply mean is:

mean"0   mean for each atom (Note: for "mean" the result is
         identical to the argument.)
mean"1   mean for each row
mean"2   mean for each column
mean"3   mean for each tube
mean     same as mean"3  (in general, mean is the same as mean"n,
         where n is the rank of the argument)

This syntax generalizes to arguments of any rank.
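
As a small illustration (the table t below is made-up data, not taken
from the discussion), a rank-2 argument behaves as follows.  Here the
argument has rank 2, so plain mean is the same as mean"2:

   mean =. +/ % #
   t =. 2 3 $ 1 2 3 4 5 6        NB. illustrative 2-by-3 table
   mean"1 t                      NB. one mean for each row
2 5
   mean"2 t                      NB. one mean for each column
2.5 3.5 4.5
   mean t                        NB. same as mean"2 for a table
2.5 3.5 4.5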

The Question

        Given an algorithm f (for example, the median), where f
works correctly for a list argument, but not necessarily for each
row, column, tube, etc., can we provide a suitable method for
defining a new function  nf  that has the nice property of the
mean; that is, it can be applied for each row, for each column, for
each tube, etc. as described above?
(Note:  Such a function f was described as having a "natural" rank
of 1, defined as:  f  is identical to   f"1  )

The Solution

The new function  nf  can be defined as follows:

nf =. ((''&$) @ f)"1 @ (0&|:) :: [

This has the nice property, as described above, that (for rank-3
arguments) ...

nf"0   applies f to each atom
nf"1   applies f to each row
nf"2   applies f to each column
nf"3   applies f to each tube
nf       is the same as nf"3

The results are placed in an array of the appropriate shape.

A brief explanation of the function ("verb") follows.

0&|:
Move the first axis to the tail  (that is, all elements whose
coordinate specifications are identical except for the first
coordinate are put on the same row)

((''&$) @ f)"1
For each row, apply f and ensure result is a scalar (atom)

 :: [
This is a simple adjustment so that the function  nf  does not fail
with atomic ("scalar") arguments, including (nf"0).  An atom need
not (and cannot) be transposed.  The "identity" function ( [ ) is
invoked through the adverse conjunction when the transpose fails
with an atomic argument (which is outside the domain of the
transpose function.)

J does the rest, placing the result of each application of
((''&$) @ f)"1 in an array of appropriate shape.
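
The pieces described above can be tried one at a time.  The session
below is only a sketch: the table t is made-up data, and f is taken to
be the median from the Example section (its definition is repeated
here so that the lines can be entered on their own).

   am =. +/ % #
   sort =. /:~
   midindices =. (<. , >.) @ -: @ <: @ #
   midvalues =. midindices { ]
   median =. am @ midvalues @ sort

   t =. 3 2 $ 1 10 2 30 3 20     NB. illustrative 3-by-2 table
   (0&|:) t                      NB. columns of t become rows
 1  2  3
10 30 20
   median"1 (0&|:) t             NB. f applied to each of those rows
2 20
   $ (0&|:) i. 2 3 4             NB. rank 3: first axis moves to the tail
3 4 2
   # $ (''&$) , 5                NB. ''&$ makes a 1-item list an atom (rank 0)
0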

Example

For example, consider the case of  (f =. median).
   NB. Define median. Adapted from ...
   NB. "Some Notes on Introducing J with Statistical Examples"
   NB. (Smillie, June 1995)
   am =. +/ % #
   sort =. /:~
   midindices =. (<. , >.) @ -: @ <: @ #
   midvalues =. midindices { ]
   median =. am @ midvalues @ sort

   NB. Set box characters
   9!:7 '+++++++++|-'

   [y =. 3 4$7 4 5 50 1001 1002 1003 1004 3 1 7 51
   7    4    5   50
1001 1002 1003 1004
   3    1    7   51

   nf =. ((''&$) @ f)"1 @ (0&|:) :: [
   f =. median

   NB. Observe that median and nf give different answers ...
   NB. ... over the columns of a table.
   NB. (nf is correct!)

   (median"1 ; nf"1) y
+----------+----------+
|6 1002.5 5|6 1002.5 5|
+----------+----------+
   (median"2 ; nf"2) y
+--------+--------+
|7 4 5 50|7 4 7 51|
+--------+--------+
   (median ; nf) y
+--------+--------+
|7 4 5 50|7 4 7 51|
+--------+--------+

   NB. Check it out!  median gives correct answers when ...
   NB. ... the values are in a list.

   median 50 1004 51
51

Discussion

User Point of View

        This solution illustrates that, from a "user" point of view,
if we desire a "median" function with the nice property of
applying correctly for each row, column, tube, etc., the median
algorithm as provided for us (above) need not be touched.  The
transformation  (nf)  where (f=.median) will automatically have
the desired property.  The same comment applies to any function
f  which works correctly for rows, but does not generalize easily
(if at all) to columns, tubes, etc.

Designer Point of View

        From a "designer" or "software provider" point of view, it
was suggested that designers design "infinite rank" into their
functions, where appropriate.  As noted, many functions are
easily defined to have infinite rank (i.e. easily defined to work on
arrays of any rank):  for example, variance, standard deviation,
sum of squares, average (mean), moving average, maximum,
minimum, range, etc.  For such functions  f"r  suffices for f to
apply to cells of any rank  r.
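
For instance, "range" might be written as below.  This definition and
the table t are assumptions made for illustration only; they were not
given in the discussion.

   range =. >./ - <./            NB. hypothetical: maximum minus minimum
   t =. 2 3 $ 1 5 3 8 2 9        NB. illustrative 2-by-3 table
   range"1 t                     NB. for each row
4 7
   range"2 t                     NB. for each column
7 3 6

No transpose is needed here, because both insertions (>./ and <./)
already work along the first axis of whatever cell they are given.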

Comparison with "Insertion"

        Comparison could be made with the use (in J) of dyadic
verbs and the "insertion" adverb (/), where the rank conjunction
may be used as in (+/"0), (+/"1), (+/"2), etc. and may be
interpreted as applying, respectively, "for each atom", "for each
row", "for each column", etc.
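
A short session with sum shows the same pattern (the table t is
made-up data for illustration):

   t =. 2 3 $ 1 2 3 4 5 6        NB. illustrative 2-by-3 table
   +/"0 t                        NB. for each atom: argument unchanged
1 2 3
4 5 6
   +/"1 t                        NB. for each row
6 15
   +/"2 t                        NB. for each column
5 7 9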

Terminology

        There was a related discussion concerning the merits and
user preferences for the various terms for the above
computations; for example,  "sum over the rows", "sum for each
row", "sum of the rows" or even to avoid these terms altogether in
favour of "cells of rank-n" terminology.   (Note similar
computations in APL using the "reduction operator" and a
"coordinate specification" and the terms used to describe the
results in these cases.)

Use of Transpose

        Concerning the use of "transpose" in the definition, it was
noted that the use of "transpose" may involve considerable data
movement.  Data movement is indeed expensive, and should be
avoided, but it was further noted that this should not dissuade us
from using "transpose" where required.  For future work in APL
(and J) it was also noted that good interpreter or compiler design
can make the performance of transpose, etc., a non-issue.

Acknowledgments

        Thanks to all who contributed to this discussion.
(Principal public messages were from Fraser Jackson, Roger Hui,
Robert Bernecky, and Richard Levine; there were also several private
messages.)
