[m-rev.] for review: speed up find_closest_seq

Julien Fischer jfischer at opturion.com
Fri Jan 26 13:36:15 AEDT 2024


On Fri, 26 Jan 2024, Zoltan Somogyi wrote:

> For review by Julien, since he reviewed the related earlier diff.
>
> I am in two minds about the effect of this change on the speed
> of find_edit_distance when invoked directly, i.e. not via find_closests_seq.
> On the one hand, it it a slight slowdown, for the reason listed in the
> log message. On the other, find_edit_distance should still be much faster
> than it was two days ago, before my previous diff.
>
> We *could* keep around the old code of find_edit_distance, which
> does not have the overhead of computing the minimum cost in each row.
> However, this would require at least some code duplication, and it would
> not help the compiler, since the compiler *always* invokes find_edit_distance
> through find_closests_seq, not directly. That's why this diff does not do that,
> though I could be persuaded otherwise.

I don't think keeping the old version is worth it.

> Speed up find_closest_seqs.
> 
> library/edit_distance.m:
>     When find_closest_seqs computes the distance between the query sequence
>     and a second or later candidate sequence, it already knows the minimum
>     distance it has seen so far, and it knows that it will throw away
>     any distance that exceeds this minimum. Allow it to tell find_edit_distance
>     to stop computing the distance once it is certain that it will be
>     thrown away.
>
>     This increases the cost of computing each row slightly (since we now
>     compute the minimum cost in each row), but it reduces the number of rows
>     we have to compute, and in typical uses cases of find_closest_seqs,
>     we expect the second effect to be dominant.
>
>     In situations in which the caller invokes find_edit_distance directly,
>     not through find_closest_seqs, we get a slight slowdown, since the second
>     effect cannot happen.

That's fine.

Julien.


More information about the reviews mailing list