While working on first-passage probabilities, I encountered this problem. I want to find a NumPythonic way (without explicit loops) to keep only the first occurrence of each strictly increasing value (that is, each new running maximum) in every row of a NumPy
array, while replacing repeated or non-increasing values with zeros. For instance, given
arr = np.array([
    [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5],
    [1, 1, 2, 2, 2, 3, 2, 2, 3, 3, 3, 4, 4],
    [3, 2, 1, 2, 1, 1, 2, 3, 4, 5, 4, 3, 2]])
I would like to get as output:
out = np.array([
    [1, 0, 0, 2, 0, 0, 3, 0, 0, 4, 0, 5, 0],
    [1, 0, 2, 0, 0, 3, 0, 0, 0, 0, 0, 4, 0],
    [3, 0, 0, 0, 0, 0, 0, 0, 4, 5, 0, 0, 0]])
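To pin down the intended behaviour, here is a plain loop version that any vectorized solution should reproduce. It is only a reference sketch; the function name keep_new_maxima_loop is made up for illustration.

```python
import numpy as np

def keep_new_maxima_loop(arr):
    """Loop baseline: keep a value only if it exceeds every earlier value in its row."""
    out = np.zeros_like(arr)
    for i, row in enumerate(arr):
        running_max = None  # no maximum seen yet in this row
        for j, value in enumerate(row):
            if running_max is None or value > running_max:
                out[i, j] = value      # first time this new maximum appears
                running_max = value
    return out

arr = np.array([
    [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5],
    [1, 1, 2, 2, 2, 3, 2, 2, 3, 3, 3, 4, 4],
    [3, 2, 1, 2, 1, 1, 2, 3, 4, 5, 4, 3, 2]])
print(keep_new_maxima_loop(arr))
```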
You can accumulate the running maximum along each row:
>>> arr
array([[1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5],
       [1, 1, 2, 2, 2, 3, 2, 2, 3, 3, 3, 4, 4],
       [3, 2, 1, 2, 1, 1, 2, 3, 4, 5, 4, 3, 2]])
>>> np.maximum.accumulate(arr, axis=1)
array([[1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5],
       [1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4],
       [3, 3, 3, 3, 3, 3, 3, 3, 4, 5, 5, 5, 5]])
Then you can mask out every position where the running maximum does not increase:
>>> m_arr = np.maximum.accumulate(arr, axis=1)
>>> np.where(np.diff(m_arr, axis=1, prepend=0), arr, 0)
array([[1, 0, 0, 2, 0, 0, 3, 0, 0, 4, 0, 5, 0],
       [1, 0, 2, 0, 0, 3, 0, 0, 0, 0, 0, 4, 0],
       [3, 0, 0, 0, 0, 0, 0, 0, 4, 5, 0, 0, 0]])
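If you want this as a reusable helper, a minimal sketch could look like the following. The name keep_new_maxima is made up, and it swaps np.diff(..., prepend=0) for an explicit comparison of neighbouring columns, so the first column is always treated as a new maximum and nothing depends on the prepend argument.

```python
import numpy as np

def keep_new_maxima(arr):
    """Keep values that set a new running maximum in their row; zero out the rest."""
    arr = np.asarray(arr)
    running_max = np.maximum.accumulate(arr, axis=1)
    # The first column always starts a new maximum; later columns qualify
    # only where the running maximum is larger than in the previous column.
    is_new_max = np.ones(arr.shape, dtype=bool)
    is_new_max[:, 1:] = running_max[:, 1:] > running_max[:, :-1]
    return np.where(is_new_max, arr, 0)

arr = np.array([
    [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5],
    [1, 1, 2, 2, 2, 3, 2, 2, 3, 3, 3, 4, 4],
    [3, 2, 1, 2, 1, 1, 2, 3, 4, 5, 4, 3, 2]])
print(keep_new_maxima(arr))
```

For the example array this gives the same result as the np.where call above.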
Here's one approach:
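m = np.hstack(
    # Always keep the first column.
    (np.ones((arr.shape[0], 1), dtype=bool),
     # True wherever the running maximum increases (by at least 1).
     np.diff(np.fmax.accumulate(arr, axis=1)) >= 1)
)
# Start from zeros and copy over only the masked values.
out = np.zeros_like(arr)
out[m] = arr[m]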
Output:
array([[1, 0, 0, 2, 0, 0, 3, 0, 0, 4, 0, 5, 0],
       [1, 0, 2, 0, 0, 3, 0, 0, 0, 0, 0, 4, 0],
       [3, 0, 0, 0, 0, 0, 0, 0, 4, 5, 0, 0, 0]])
Explanation
Use np.fmax + np.ufunc.accumulate to get the running maximum for each row.
Check where np.diff of that running maximum is bigger than or equal to 1.
Use np.hstack to prepend a column of True for the first column (via np.ones).
Create an array of zeros with the same shape as arr (via np.zeros_like) and set the values selected by the mask.
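One caveat: the >= 1 comparison fits the integer example, where any increase is at least 1. For floating-point data, increases smaller than 1 would be dropped, so comparing with > 0 may be the safer choice. A small sketch of the difference, using made-up data (arr_f and the helper name masked_copy are only for illustration):

```python
import numpy as np

# Made-up float data where the running maximum grows by less than 1.
arr_f = np.array([[0.5, 0.5, 0.75, 0.6, 1.25],
                  [2.0, 2.5, 2.5, 1.0, 2.75]])

running_max = np.fmax.accumulate(arr_f, axis=1)
first_col = np.ones((arr_f.shape[0], 1), dtype=bool)  # always keep column 0
step = np.diff(running_max, axis=1)

def masked_copy(mask):
    """Copy arr_f where mask is True, zeros elsewhere."""
    out = np.zeros_like(arr_f)
    out[mask] = arr_f[mask]
    return out

out_ge1 = masked_copy(np.hstack((first_col, step >= 1)))  # misses small increases
out_gt0 = masked_copy(np.hstack((first_col, step > 0)))   # keeps every increase

print(out_ge1)
print(out_gt0)
```

With this input, the >= 1 mask keeps only the first column, while the > 0 mask keeps every new running maximum.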