Sunday, May 1, 2011

How do you write a search function that is easy to comprehend (/maintainable)? Maybe in a modular manner?

Every time I see a search function, the code behind it is a mess. Several hundreds of lines, spaghetti code, and almost ALWAYS as one huge method. A programming language (Java/C#/PHP/etc) is used to construct one big fat SQL query. Many, many if else's.

There must be more elegant ways to do this than this? Or is this what you get when you use RMDBS instead of a flat data structure?

I'd be willing to learn more about this topic, perhaps even buy a book. /Adam

From stackoverflow
  • Use the query object pattern. If you can, also use an ORM, it will make things easier. The implementation details depend on your platform and architecture, but here are some samples:

    Brian Sullivan : Agreed. NHibernate's ICriteria API makes this a snap (relatively, anyway). Much better than the crazy dynamic SQL stored procs with a gajillion parameters that some of our older systems use.
  • In my current project we use a simplified version of the query object pattern that mausch mentions. In our case we have a search criteria object that consists of a field and a value, and several such objects can be added to a list. We did have an operator property from the start, but it was never used so we removed it. Whether the criteria are treated as AND or OR depends on the search method that is used (I would say that it is AND in 95% of the cases in that project).

    The find methods themselves do not do a lot with this information; they will invoke stored procs in the DB, passing the criteria as parameters. Most of those procs are fairly straight forward, even though we have a couple that does involve some string handling to unpack lists of critera for certain fields.

    The code from a caller's perspective might look something like this (the Controller classes wraps repetetive stuff as instantiating a search object with a configurable implementation*, populating it with search criteria and such):

    CustomerCollection customers = CustomerController.Find(new SearchCriterion("Name", "<the customer name>"));
    

    If more than one search criterion is needed a collection can be passed instead. Inside the finder function the code will loop over the collection, map the present values to appropriate parameters in an SqlCommand object.

    This approach has worked out rather well for us.

    *) The "configurable implementation" means that we have created an architecture wher the search objects are defined as abstract classes that merely will define the interface and contain some generic pre- and post validation. The actual search code is implemented in separate decendent classes; which amongst other things allowed us to quickly create a "fake data layer" that could be used for mocking away the database for some of the unit tests.

  • Have you looked at the Lucene project (http://lucene.apache.org)? Its designed exactly for this purpose. The idea is that you build and then maintain a set of indexes that are then easily searchable. The lifecycle works like this:

    • Write a bunch of sql statements that index all of the searchable areas of your database
    • Run them against the full database to create an initial index of your data
    • Every time the data changes, update these indexes.

    The query language is much simpler then, you're queries become much more targeted.

    There is a great project in the hibernate tool suite called hibernate search (http://search.hibernate.org) that does the maintenance of your indexes for you if you are using hibernate as your ORM.

    Esko : Additionally, if you're looking into Lucene, you may be interested of Compass -> http://www.compass-project.org/
  • I've been tinkering with this thought a bit (since I actually had to implement something like this some time ago) and I've come to the conclusion that there's two ways I'd do it to make it both work and especially maintainable. But before going into those, here's some history first.

    1. Why does the problem even exist

    Most search functions are based on algorithms and technologies derivated from the ones in databases. SQL was originally developed in the early 1970's (Wikipedia says 1974) and back then programming was a whole another kind of beast than it is today because every byte counted, every extra function call could make the difference between excellent performance and bankruption, code was made by people who thought in Assembly...well you get the point.

    The problem is that those technologies originally have mostly been carried over to modern world without changing them (and why should they be changed, don't fix something which isn't broken) which means the old paradigms creep around too. And then there's cases when the original algorithm is misinterpreted for some reason and you end up with what you now have, like slow regular expressions. A bit of underlining here is required though, the technologies themselves aren't bad, it's usually just the legacy paradigms which are!

    2. Solutions to the problem

    The solution I ended up using was a system which was a mix of builder pattern and query object pattern (linked by mausch already). As an example if I were to make a pragmatic system to build SQL queries, it would look something like this:

    SQL.select("column1", "column2")
       .from("relation")
       .where().valueEquals("column1", "hello")
               .and().valueIsLargerThan("column2", 3)
       .toSQL();
    

    The obvious downside of this is that the builder pattern has the tendency to be a bit too verbose. Upsides are that the each of the build steps (=methods) are quite small by nature, for example .valueIsLargerThan("a", x) merely may just be return columnName + ">=" + x;. This means they're easily unit-testable and one of the biggest upsides is that they can be generated easily from external sources like XML/whatnot and most notably it's rather easy to create a converter from, say, SQL query to Lucene query (Lucene has automation for this already afaik, this is just an example).

    The second one I'd rather use but really avoid is because it's not order-safe (unless you spend a good amount of time creating metadata helper classes) while builders are. It's easier to write an example than to go into more detail what I mean, so:

    import static com.org.whatever.SQL.*;
    
    query(select("column1", "column2"),
          from("relation"),
          where(valueEquals("column1", "hello"), 
                valueIsLargerThan("column2", 3)));
    

    I do count static imports as a downside but other than that, that looks like something I'd really want to use.

0 comments:

Post a Comment