Best way to compare two arrays for "equality" (same subscripts and values)
I've been trying to write a method to compare two local variables, which may be arrays, for "equality" - that is, to see if they have all the same subscripts (if they're arrays) and values. This is the best I've come up with so far - are there any better/simpler approaches out there?
/// Returns true if arrays <var>first</var> and <var>second</var> have all the same subscripts and all
/// the same values at those subscripts. <br />
/// If <var>first</var> and <var>second</var> both happen to be either undefined or unsubscripted variables,
/// returns true if they're both undefined or have the same value.<br />
/// <var>pMessage</var> has details of the first difference found, if any.
Method CompareArrays(ByRef first, ByRef second, Output pMessage) As %Boolean [ ProcedureBlock = 0 ]
{
New tEqual,tRef1,tRef2,tRef1Data,tRef1Value,tRef2Data,tRef2Value
Set pMessage = ""
Set tEqual = 1
Set tRef1 = "first"
Set tRef2 = "second"
While (tRef1 '= "") || (tRef2 '= "") {
#; See if the subscript is the same for both arrays.
#; If not, one of them has a subscript the other doesn't, and they're not equal.
If ($Piece(tRef1,"first",2) '= $Piece(tRef2,"second",2)) {
Set tEqual = 0
Set pMessage = "Different subscripts encountered by $Query: "_
$Case(tRef1,"":"<end>",:tRef1)_"; "_$Case(tRef2,"":"<end>",:tRef2)
Quit
}
Kill tRef1Value,tRef2Value
Set tRef1Data = $Data(@tRef1,tRef1Value)
Set tRef2Data = $Data(@tRef2,tRef2Value)
#; See if the $Data values are the same for the two.
#; This is really only useful to detect if one of the arrays is undefined on the first pass;
#; $Query only returns subscripts with data.
#; This will catch only one being defined, or one being an array and
#; the other being a regular variable.
If (tRef1Data '= tRef2Data) {
Set tEqual = 0
Set pMessage = "$Data("_tRef1_")="_tRef1Data_"; $Data("_tRef2_")="_tRef2Data
Quit
} ElseIf (tRef1Data#2) && (tRef2Data#2) {
#; See if the value at the subscript is the same for both arrays.
#; If not, they're not equal.
If (tRef1Value '= tRef2Value) {
Set tEqual = 0
Set pMessage = tRef1_"="_@tRef1_"; "_tRef2_"="_@tRef2
Quit
}
}
Set tRef1 = $Query(@tRef1)
Set tRef2 = $Query(@tRef2)
}
Quit tEqual
}Comments
I dug up a pre-dynamic objects version of a utility method from a REST test and cleaned it up a bit (hopefully not introducing any bugs in the process):
ClassMethod compareArrays(ByRef actual, ByRef expected) As %Status [ PublicList = (actual, expected) ]
{
; compare root node
set deix=$d(expected,eval),daix=$d(actual,aval)
if deix'=daix {
quit $$$ERROR($$$GeneralError,"$d(actual)="_daix_" instead of "_deix)
}
if deix#2,aval'=eval {
quit $$$ERROR($$$GeneralError,"actual="""_aval_""" instead of """_eval_"""")
}
set status=$$$OK
set eix="expected",aix="actual"
for i=1:1 {
set eix=$q(@eix),aix=$q(@aix)
quit:""=eix&&(""=aix)
set seix="("_$p(eix,"(",2,*),saix="("_$p(aix,"(",2,*)
if seix'=saix {
set status=$$$ERROR($$$GeneralError,"found """_aix_""" instead of """_eix_""" at position "_i)
quit
}
set deix=$d(@eix,eval),daix=$d(@aix,aval)
if deix'=daix {
set status=$$$ERROR($$$GeneralError,"$d(aix)="_daix_" instead of "_deix_" at position "_i)
quit
}
if deix#2,aval'=eval {
set status=$$$ERROR($$$GeneralError,"actual("""_aix_"""))="""_aval_""" instead of """_eval_""" at position "_i)
quit
}
}
quit status
}
Comparing them, I only see two things I prefer in my version. First, in this line of your method I would use four-argument $piece with * as the fourth argument, just in case the subscript contains "first" or "second":
If ($Piece(tRef1,"first",2) '= $Piece(tRef2,"second",2)) {
Second, I would use a public list with first and second, rather than turning off procedure block for the entire method.
Actually I don't see any value for checking $data for intermediate subscript (and check their consistency only at the most beginning of a function). Here is my [hopefully] simpler version
if $data(@refL) '= $data(@refR) {
// they are not consistent: one is non-array
return 0
}
do {
// fetch next data node subscript and it's value
set refL = $query(@refL, 1, valueL), refR = $query(@refR, 1, valueR)
if refL="" || (refR="") {
quit
}
set subL = $qlength(refL), subR = $qlength(refR)
if subL'=subR || (valueL '= valueR) {
return 0
}
// check each subscipt individually
for i=1:1:subL {
if $qsubscript(refL, i) '= $qsubscript(refR, i) {
return 0
}
}
} while refL'="" && (refR'="")
// only after all checks passed
return refL=refR
DebugArrayCompare()
new
set m(1,1,1)=11,m(1,2)=12,m(2,1)=133
set n(1,1,1)=11,n(1,2)=12,n(2,1)=133
write $$CompareArrays($name(m),$name(n)),!
set n(3,1)=0
write $$CompareArrays($name(m),$name(n)),!
quit
Here's the one I thought up. Does not use indirection.
/// Returns true if arrays <var>pFirst</var> and <var>pSecond</var> have all the same subscripts and all
/// the same values at those subscripts. <br />
/// If <var>pFirst</var> and <var>pSecond</var> both happen to be either undefined or unsubscripted variables
/// returns true if they're both undefined or have the same value or one is undefined and the other empty
/// <var>pMessage</var> has details of the first difference found, if any.
ClassMethod CompareArrays2(ByRef pFirst, ByRef pSecond, Output pMessage) As %Boolean
{
Set pMessage = ""
Return:(($Data(pFirst) '= 10) || ($Data(pSecond) '= 10)) $Get(pFirst) = $Get(pSecond)
Merge First = pFirst
Merge Second = pSecond
Set Key = $Order(First(""))
/// Iterate over first array
While (Key '= "") {
/// $Data on undefined var does not modify second argument
Kill SecondVal
/// If the second array does not have the same subscript
/// or the values are different, quit
If (($Data(Second(Key), SecondVal) = 0) || ($Get(SecondVal) '= First(Key))) {
Set pMessage = "Different subscripts at " _ Key
Return $$$NO
} Else {
/// Otherwise remove this element from the second array
/// In here: Second(Key) = First(Key)
Kill Second(Key)
}
Set Key = $Order(First(Key))
}
/// Second array should have no subscripts
/// If there are any, that means they are not present
/// in the first array, and so different
If $Data(Second) = 10 {
Set pMessage = "Different subscripts at " _ $Order(Second(""))
Return $$$NO
}
Return $$$YES
}How is "best" defined here? If you wan't fastest, and shortest, I have two options for you. This following code also works with both locals and globals, and avoids the bug of using $PIECE() to trim off the global/local name which won't work on globals which contain a "(" in their namespace (admittedly unlikely).
This is the fast version:
; $$FDQ($NAME(a),$NAME(b))
; Find first different nodes in two trees (or subtrees). Will
; work with locals or globals, except locals of the form % or
; %<digit>. Returns a string containing the two references where
; the first difference separated by "'=". If a node is found in
; one tree that is not present in the other, the missing
; reference is replaced by a question mark ("?"). If both trees
; are the same, an empty string is returned.
;
FDQ(%1,%2) ; [10]
NEW %3,%4,%5,%6,%7,%8,%9,%0,% ; [20]
SET %3=$DATA(@%1,%5)#10,%4=$DATA(@%2,%6)#10
QUIT:%3'=%4||(%3&&(%5'=%6)) $S(%3:%1,1:"?")_"'="_$S(%2:b,1:"?") ; [30]
SET %7=%1,%8=%2,%3=$QLENGTH(%1),%4=$QLENGTH(%2)
lq SET %1=$QUERY(@%1,1,%5),%2=$QUERY(@%2,1,%6) ; [40]
SET:%1'=""&&(%7'=$NAME(@%1,%3)) %1="" ; [50]
SET:%2'=""&&(%8'=$NAME(@%2,%4)) %2=""
QUIT:%1="" $SELECT(%2="":"",1:"?'="_%2) QUIT:%2="" %1_"'=?" ; [60]
FOR %=1:1 SET %9=$QS(%1,%3+%),%0=$QS(%2,%4+%) Q:%9'=%0 Q:%9="" ; [70]
IF %9="",%0="" GOTO:%5=%6 lq QUIT %1_"'="_%2 ; [80]
QUIT:%9]]%0 "?'="_%2 QUIT %1_"'=?" ; [90]
; ------------
; [10] %1,%2 Reference to nodes under test.
; [20] %3,%4 Upto [30] used for Do %1,%2 exist (respectively)?
; After [30] used for count of subscripts of %1,%2.
; %5,%6 Values of %1,%2.
; %7,%8 Copies of %1,%2 used to help find end subtree.
; %9,%0 First different subscript of %1,%2.
; % Loop index for scanning down subscript list.
; [30] Return if the existence of %1 and %2 differ or if either exist
; (doesn't matter which), and the values differ.
; [40] Go to next node on each side (which we know exist).
; [50] Check if we have moved past the end of the subtree.
; [60] If either or both %1,%2 put us at end of subtree, return.
; [70] Find the first different subscript or both will be "".
; [80] If both final subscripts "", subscripts are the same so check
; values, and either return of loop.
; [90] Subscripts don't match, return determine order so we can return
; node that is missing.
; $$FDR($NAME(a),$NAME(b))
; Find first different nodes in two trees (or subtrees). Will
; work with locals or globals, except locals of the for %, %1,
; %2, %3, or %4. Returns a string containing the two references
; where the first difference separated by "'=". If a node is
; found in one tree that is not present in the other, the missing
; reference is replaced by a question mark ("?"). If both trees
; are the same, an empty string is returned.
;
FDR(%1,%2) ; [10]
NEW %3,%4,% ; [20]
SET %3=$DATA(@%1,%5)#10,%4=$DATA(@%2,%6)#10
QUIT:%3'=%4||(%3&&(%5'=%6)) $S(%3:%1,1:"?")_"'="_$S(%2:b,1:"?") ; [30]
SET (%3,%4)=""
lr SET %3=$ORDER(@%1@(%3)),%4=$ORDER(@%2@(%4)) Q:%3=""&&(%4="") "" ; [40]
IF %3=%4 SET %=$$FDR($NA(@%1@(%3)),$NA(@%2@(%4))) G:%="" lr Q % ; [50]
QUIT:%3]]%4 "?'="_$NAME(@%2@(%4)) QUIT $NAME(@%1@(%3))_"'=?" ; [60]
; ------------
; [10] %1,%2 Reference to nodes under test.
; [20] %3,%4 Upto [30] used for Do %1,%2 exist (respectively)?
; After [30] Subscripts of %1,%2.
; % Results of recursive call.
; [30] Return if the existence of %1 and %2 differ or if either exist
; (doesn't matter which), and the values differ.
; [40] Go to next subscript at this level.
; [50] If the subscripts are the same, check the sub-tree
; recursively. Loop or quit, depending upon finding a difference.
; [60] If subscripts differ, there is a missing node. Return the
; missing one.
Not tested for speed, while I expect this version should be rather fast as it compares common parts of both references rather than individual suscripts. Enjoy!
tttcmp(fgname,tgname,bKill,nErrTotal,nErrTop) ; Compare [sub]array @fgname with [sub]array @tgname;In:; fgname - "original" [sub]array; tgname - its copy to check with;; bKill - kill @tgname if it matches to @fgname (default = 0); nErrTop - # of mismatches to find to stop comparison;;Out:; returns 1 on full subscripts and data match, else - 0.; ByRef nErrTotal - # of mismatches.;new x,y,xtop,ytop,i,flOK,flQ,xquit,yquit,nErr,xstart,ystartset bKill=$get(bKill,0)set nErrTop=$get(nErrTop,1)set x=fgname,y=tgname write !,"Comparing original "_fgname_" with imported "_tgname_":"set xstart=$length($name(@x,$qlength(x)))+$select($qlength(x):1,1:2)set xtop=$select($qlength(x):$extract(x,1,$length(x)-1)_",",1:x)set ystart=$length($name(@y,$qlength(y)))+$select($qlength(y):1,1:2)set ytop=$select($qlength(y):$extract(y,1,$length(y)-1)_",",1:y)set flOK=1,flQ=0,nErr=0,nErrTotal=0for i=1:1 do quit:flQ. set x=$query(@x),xquit=x=""!(x'[xtop). set y=$query(@y),yquit=y=""!(y'[ytop). if xquit,yquit write " OK. i=",i set flQ=1 quit. if xquit!yquit write " NO!!!: i=",i,$select(xquit:" "_fgname_" is shorter than "_tgname,1:" "_tgname_" is shorter than "_fgname) set nErrTotal=nErrTotal+1,flOK=0,flQ=1 quit. if $extract(x,xstart,$length(x))'=$extract(y,ystart,$length(y)) write !,"!!! Ref NEQ: i=",i write !," x=",x,!," y=",y set nErrTotal=nErrTotal+1,nErr=nErr+1,flOK=0 set:nErr'<nErrTop flQ=1 quit:flQ ;!,$e(x,xstart,$l(x)),!,$e(y,ystart,$l(y)),. if $get(@x)'=$get(@y) write !,"!!! Data NEQ: i=",i write !," *** x = ",x,!," x => ",@x,!," *** y = ",y,!," @y => ",@y set nErrTotal=nErrTotal+1,nErr=nErr+1,flOK=0 set:nErr'<nErrTop flQ=1 quit:flQ. else set nErr=0if flOK,bKill write !,"Killing "_tgname_"..." kill @tgnameelse write !,"Not Killing "_tgnamequit flOK
One more hint from Russian forum:
%GCMP - Compares two globals in the same or different namespace.