Given a string of words: How to find every word (case insensitive) in a varchar(1000) column in MySQL ignoring punctuation?
I have two database
b with titles of scientific papers in them. I want to merge those databases to one single database
- It is possible that
acontains titles which are not in
band vice versa.
- It is possible that a title is in both databases
- It is possible that the cases of the letters and the punctuation do not match:
- "This is a Title." VS. "this is a title"
- "This is - yet another - title." VS. "This is yet another title"
- "The k-mean algorithm based on bla." VS "The k mean Algorithm based on bla"
First I thought of using a levenstein distance function inside of MySQL to match the same titles in both databases, but looking at millions of rows I don't know if this would perform well enough. Then I thought of a fulltext search to match the titles but as far as I know fulltext searches do not match common words so the matchings would not perform well on similar titles which are in fact different.
I do not need a 100 % in the matching procedure as a result. But I want to have the rate as high as possible. Any advice?
One idea that comes to mind is to create a search column containing the text without any punctuation and in lower case, and to compare that.
If you are using some other language or platform in connection with mySQL, it might be easiest to do the normalization there - I can't think of a native mySQL function to strip punctuation and such. It's surely possible but maybe only using a hellishly complex set of
- A Linux, Apache, MySQL, PHP (LAMP) stack is a combination of open source software that is typically installed together to enable a server to host dynamic […]
- MySQL is an the most popular open-source database management system, in most of hosting company its has been bundle by LAMP(Linux, Apache, MySQL, PHP/Python/Perl) stack. Another alternative […]
- MySQL is the world’s most popular open-source database. Despite its powerful features, MySQL is simple to set up and easy to use. Below are some instructions […]