Saturday, May 28, 2011

Case Insensitive Diff Algorithm in PHP

So, one day the business analyst wanted to compare what people were typing in as their address with the corrected/validated results we use. My first thought was "Oh, that'd be easy, we just use a diff function on a report!" What I didn't think about was that most people type in lowercase, but the corrected addresses are in all caps (as per Post Office specs).

Okay, so we just need a case insensitive diff algorithm, but wait, there's more. "Marshal Hollow Rd" and "Maral Hollow Rd" differ by only two characters, so a character-based diff shows them as a tiny difference, but obviously they're completely different addresses. We needed the differences between words to stand out more. So, now we need a case insensitive diff that judges differences based on words rather than characters.
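To make that requirement concrete, here is a minimal sketch of a case insensitive, word-based diff. It is not the original code from this post; it assumes a plain longest-common-subsequence (LCS) approach, and the function name `wordDiffCaseInsensitive` and the `=`/`-`/`+` markers are made up for illustration.

```php
<?php
// Hypothetical helper, NOT the original code from this post: a word-level,
// case insensitive diff built on a longest-common-subsequence (LCS) table.
function wordDiffCaseInsensitive(string $old, string $new): array
{
    // Split on runs of whitespace so whole words are the unit of comparison.
    $a = preg_split('/\s+/', trim($old), -1, PREG_SPLIT_NO_EMPTY);
    $b = preg_split('/\s+/', trim($new), -1, PREG_SPLIT_NO_EMPTY);
    $n = count($a);
    $m = count($b);

    // $lcs[$i][$j] = length of the LCS of $a[$i..] and $b[$j..].
    $lcs = array_fill(0, $n + 1, array_fill(0, $m + 1, 0));
    for ($i = $n - 1; $i >= 0; $i--) {
        for ($j = $m - 1; $j >= 0; $j--) {
            // strcasecmp() ignores ASCII case, which is enough for "rd" vs "RD".
            if (strcasecmp($a[$i], $b[$j]) === 0) {
                $lcs[$i][$j] = $lcs[$i + 1][$j + 1] + 1;
            } else {
                $lcs[$i][$j] = max($lcs[$i + 1][$j], $lcs[$i][$j + 1]);
            }
        }
    }

    // Walk the table front to back, emitting [operation, word] pairs:
    // '=' unchanged, '-' only in the old text, '+' only in the new text.
    $ops = [];
    $i = 0;
    $j = 0;
    while ($i < $n && $j < $m) {
        if (strcasecmp($a[$i], $b[$j]) === 0) {
            $ops[] = ['=', $b[$j]];
            $i++;
            $j++;
        } elseif ($lcs[$i + 1][$j] >= $lcs[$i][$j + 1]) {
            $ops[] = ['-', $a[$i]];
            $i++;
        } else {
            $ops[] = ['+', $b[$j]];
            $j++;
        }
    }
    // Whatever is left over on either side is a pure deletion or insertion.
    while ($i < $n) {
        $ops[] = ['-', $a[$i++]];
    }
    while ($j < $m) {
        $ops[] = ['+', $b[$j++]];
    }
    return $ops;
}
```

Calling it with 'maral hollow rd' as the typed address and 'MARSHAL HOLLOW RD' as the corrected one marks the whole first word as removed and replaced, while "hollow" and "rd" count as unchanged despite the difference in case.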

What follows is my solution to this need. I wrote it a number of years ago, and I no longer remember much about it. I do know that I did not invent the diff algorithm. There was a comment with this link in it. Most likely I just implemented what that page describes and modified it to handle our special needs. If I read the website in the comment and studied the code, I could probably regain my understanding of the algorithm, but I don't think I really need to, as I haven't needed a similar algorithm in the intervening years.

Here's the code!
