Given a string of words: How to find every word (case insensitive) in a varchar(1000) column in MySQL ignoring punctuation?


Problem :

I have two database a and b with titles of scientific papers in them. I want to merge those databases to one single database c.

  • It is possible that a contains titles which are not in b and vice versa.
  • It is possible that a title is in both databases a and b.
  • It is possible that the cases of the letters and the punctuation do not match:
    • "This is a Title." VS. "this is a title"
    • "This is - yet another - title." VS. "This is yet another title"
    • "The k-mean algorithm based on bla." VS "The k mean Algorithm based on bla"

First I thought of using a levenstein distance function inside of MySQL to match the same titles in both databases, but looking at millions of rows I don't know if this would perform well enough. Then I thought of a fulltext search to match the titles but as far as I know fulltext searches do not match common words so the matchings would not perform well on similar titles which are in fact different.

I do not need a 100 % in the matching procedure as a result. But I want to have the rate as high as possible. Any advice?



Solution :

One idea that comes to mind is to create a search column containing the text without any punctuation and in lower case, and to compare that.

If you are using some other language or platform in connection with mySQL, it might be easiest to do the normalization there - I can't think of a native mySQL function to strip punctuation and such. It's surely possible but maybe only using a hellishly complex set of REPLACE() calls.


Mysql Tutorials

Mysql Howto..

how to export only selected files in mysql to a .csv [closed]

How to export a MySQL database to JSON?

How to fetch values with a MySQL query?

How to create foreign key that is also a primary key in MySQL?

How to display data in multiple columns with php?

Spring3, Security3, Hibernate, MYSQL - How to install user tracking into database

How to define a deferred constraint in MySQL

Sqlite or MySql? How to decide? [closed]

How to query the order result in this table structure in MYSQL/ SQL?

How to fetch data between weeks in different year in Mysql [duplicate]

How to remove accents in MySQL?

how to move mysql database or data directory to another location

How to remove partial duplicates from a mysql table?

How to protect user specified table name from SQL Injection in C#, using MySQL

How to compare two DB tables of different number of rows based on one attribute in MySql?

How to optimize repeat MySQL sorted selects

How to decrease the timeout for my python application connecting to mysql server

Mysql - How to get last record using GROUP BY

How to create dynamic table in mySql

Mysql show data in Pivot View

How to safely escape quotes in R to submit text to SQL?

How to get count of two fields from two different table with grouping a field from another table in mysql

How do I add millions of rows to a live production mysql table?

MySql: Show columns but exclude everything except the field names

How to improve speed of this mysql query processing

how to display the contents of a single row in a table using mysql command line?

How to count the results of a where clause in order to calculate a proportion a MYSQL select clause?

How to get all table names from MySQL in CakePHP

show the list of selected id in mysql php

how to do auto increment?