Sunday, May 1, 2011

Which is more important? DB design or coding?

Which is more important: The design of the database? Or the design of the application code?

There is a lot of information out there about reusable code (from Carl Franklin at dnrtv.com, CSLA.net, et. al.), but I don't see too much information about Database Design and its impact on the life of an application (particularly how bad design decisions early on affect the application later in its 'life'.

From stackoverflow
  • If you are careless with either one you are doomed.

  • Short answer: both. The chain is just as strong as the weakest link.

  • Depends one where you are in knowledge about both...and product requirements.

    Neglect either side and your product could be in trouble.

    That said, I tend to follow a DDD style of coding, where I define everything but my database first. That gives me a better idea of what data needs to be stored.

    Then, once that is complete I can create and tune my database to suite.

  • Design if you're a DBM.

    Coding if you're a programmer.

    These aren't mutually exclusive and should both be well executed.

  • Both

    Great code can be ruined by a horrible db design and a great db design can be ruined by horrible code.

    George Mauer : Can't say I've ever seen a _great_ db design. I'm just sayin'...
  • I don't think you can separate the two in the manner you are describing. One will invariably influence the other. For example, a solid database design that is easy to maintain and performs well will mean fewer code changes. Well architected code and a strong understanding of your use-cases will lead to a neat and maintainable database schema.

    For my money, I would spend more on a solid business layer and build my database to support it, but that is my knowledge bias.

  • They are not mutually exclusive. Both must be rock solid to have a chance at a rock solid solution.

  • In one sense, you cannot separate the two: DB design is coding -- it's just not coding in a procedural language.

    However, I've worked with systems that had poorly-designed procedural software, and I've worked with systems that had poorly-designed database schemas (schemata?). In my experience, fixing the schemas is far more difficult due to upgrade and compatibility issues. I can imagine systems where this might not have been the case.

  • Generally, Database structure is more important, since it provides the structural framework on which your code is developed. In general (and YMMV quite considerably), refactoring your DB structure after completing a phase of development is significantly harder than simply refactoring code which depends on a stable database. The reason is simple; refactoring your DB structure usually forces code changes; the reverse is rarely true.

    Quite simply, your code depends on your database more than your database depends on your code. (If this is not the case, you may need to rethink your design.)

    To address your edit; I think a lot of folks writing / blogging about this type of issue tend to come quite strongly from the "coding" side of things, these types of folks tend to consider database design to be trivial, and less interesting than coding interesting solutions. Essentially, to someone who likes to solve "tricky problems" (which tends to be the people who blog more), the coding side is more interesting than the fundamental design issues. And while the fundamental design issues aren't "sexy", they're extremely important (and Database Design is a VERY fundamental design issue).

    Dan Breslau : I'm not sure why this was downvoted. +1 to counter it.
    George Mauer : Important Caveat: The database is more important in applications that are coded a certain way. If you abstract the data behind a strong domain model (such as in DDD) then code design is FAR more important than what the database looks like. Most applications of significant size can probably benefit from a hybrid approach. For example a strong domain model to encapsulate business logic (DB not so important) and forms over data for reporting (db design becomes quite important then).
    le dorfier : The database design should *never* be the structural framework for your code. They must be isolated from each other by a strong abstraction layer. Your code should not depend on your database at all; the database should be designed to support your code. You've got it exactly backwards. (But sadly, that's how it *should* be, not how it is in practice.)
    George Mauer : @dorfier Don't be silly, there's a lot of applications where it is perfectly acceptable to do it "the wrong way". When clients just want a simple window into an already existing database for example, demo projects, or just a simple CRUD app. Fowler discusses quite a few scenarios in PoEA and we all know that what he says is mandate. That being said, the further you abstract from the database the more maintainable you're likely to find your code (assuming you havent messed that up of course)
    le dorfier : Yup, I overstated it to make my point. But I still think that the utility screens need to have some kind of User Story to start with. And Fowler is my personal hero but I think he's weakest on database refactoring - not in error, but not as much depth as I'd like to see.
    McWafflestix : The question comes, what purpose does the database serve? Is it merely a persistence layer? Or is there inherent value in the structure of the data persisted, and the operations that can be performed on that data due to the structure of it?
    Steve Schnepp : I used to say coding is much easier when you have very expressive structures since you just have to connect the dots. A DB is just that : queryable data structures with integrity.
    Stephanie Page : @le Dorfier: So your app will live forever, but your data comes and goes? I tend to believe the opposite. I can think of dozens of languages/frameworks/styles/distribution methods but really only one significant platform in databases for the last 30+ years. Our databases have apps that come and go. Apps are added to do new analysis... Apps are removed when the technology isn't supported anymore... but the data remains.
  • Both are just part of the implementation of that on which, of course, you spend the bulk of your up-front time - requirements and design.

  • I agree that both are critical, but there are substantial techniques you can use at the view, function, stored procedure level to make up for fairly horrendous underlying schema faults. On the other hand, if your coding is bad, short of fixing it of course, there's not much you can do at the design level to fix that.

  • If you are unsure of your skills in either area then do your best to seperate the two as much as possible. The worst scenario is writing a tangled mess that can't be easily corrected or maintained later.

    McWafflestix : Good point; the separation is critical.
  • Having worked with an appalling database design before, I must put my hat in the database (or data model / ORM) design ring.

    Get together with some people knowledgeable in your company/client about the problem area, and get all of the data required on paper, the group it by logical areas, then you will start forming a data model which you could turn into Objects, Database Schemas or an .xsd, etc. Each item of data will have a name, a type, maybe maximum length for strings, or be a set or list or map of certain minimum or maximum capacities.

    Whether you design the database first after this, or the OO model is up to you, but at least you made an effort to get a sane partitioned model up front.

    In fact in an MVC design, I would classify the OO data model (classes in Java/C#) as the Model and intrinsically linked with the database schema (with added transients and utility methods of course). Your controller - the "coding" in your question - should really implement business logic using the model as represented by the objects you extract from the database via a DAO/ORM.

    McWafflestix : It's funny how people tend to come down on the "both matters" side until they've experienced a truly hideous DB design... :-)
    JeeBee : It nearly made me cry, literally, and we couldn't change it as it was a third party system. Least fun month of coding ever.
    le dorfier : Interesting - you vote for the database, but end up describing how important the OO model and outward are - which is the software.
    SnOrfus : @le dorfier: If I understand JeeBee correctly, I think in this case (as is the case with some OR mappers) the 'OO Model' is a code representation of the DB, so the two would logically be the same (but not physically).
    le dorfier : Then the design process is flawed to start with. If you start with the schema and work back to the code this way, then you end up with the worst kind of application - a thin layer over the database that turns users into data entry clerks, doing table maintenance without regard to users' workflow.
    JeeBee : I don't see how the user interface would be restricted in design by having a logical data model designed from user and task requirements.
  • It has been my experience (and I've been involved in fixing database problems for around 30 years and have dealt with hundreds of different databases) that all too many of the problems in database performance are from the inappropriate attempts to reuse code. Functions are far slower than inline code. Cursors reusing a stored proc that inserts one record at a time are far, far, far (light years) slower than set-based code. Reusing a proc that gives back what you need and ten other fields is wasteful of server and network resources. Using an existing view that joins to ten tables when you only need information from three of them is wasteful of server resources. Code reuse in databases is not nearly as good a thing as it is in other places. This is not to say that code shouldn't be reused if need be. Just that it should never take precedence over performance. Databases are unfortunately not well designed to reuse code. Most databases are not object oriented and object-oriented thinking in designing or accessing them will often result in a poor design.

    Before you can even think about code reuse, you need to have a rock solid normalized database design. You need to think about how you will be extracting as well as inserting data into the database. You need to think about how well will this work once there are many users and records because redesigning a basic database structure at this point often becomes too expensive and time consuming. It is often far easier to refactor the application code than the basic database structure and, far too often, the database refactoring does not get done. If you change the table structure to go from a denormalized table to a parent child structure because you find out the current structure isn't suiting your needs, then you may end up changing hundreds or even thousands of queries against this table. This is why it is important to spend a lot of time on database design, you won't get the chance to revisit it later due to time/money constraints. If you think of the database as the foundation of the house and the application code as the structure you will see why this is true. It is far harder to change the foundation with the structure on top of it than it is to move the internal walls. Databases and the applications which access them are the same way.

  • It depends on what is important to your business. Ideally, you shouldn't short change either, but if you must, you should also ask yourself this question:

    Is your application there to handle data, or does the data transcend your application?

    In other words, if the code part of your application blew up today, but your data is still there, how bad of a disaster would it be? If you answer is:

    I can always write the code to replace the app, but without the data, we're doomed.

    Then you had better make sure that your data is sound, because it will probably outlive any code you write today. That's not to say you shouldn't put a great amount of effort into writing a solid codebase, but the code is ultimately transient whereas your data is not. If you're stuck with bad code, you can rewrite, but if you have bad data, it will likely have much broader implications.

    On the other hand, if the data is really only there to make sure that your code works well, and the code itself is more important (the inverse of the above scenario), you should ensure that you have a good codebase, and revisit any deficiencies in the data later.

    EDIT

    In most enterprise applications, the data are far more important. I have worked on conversion projects in the past where the code was far past its life, but migration was delayed for so long (sometimes decades) because the data was so bad that it took a significant and very discretionary effort to get the data to a point of health where it could be migrated.

    le dorfier : They are three entirely different things - software, database schema, and data. Without data neither the software nor the schema are worth anything.
    Michael Meadows : Agreed, but a bad schema makes the data less available, especially when the code reaches the end of its lifecycle.
  • I can see I'm going to be swimming upstream, but I'm pretty strongly biased toward software being the answer to your question.

    While your software can adapt to a weak schema, there's not much your database can help you with if your software is disfunctional. I've had a couple cases where I've been able to take a popular front-end application and totally rebuild the database without serious disruption, because the users don't see the database directly. (Which won't be true if the software is crap.)

    So I'd say pay attention to what's closest to the user first.

    Michael Meadows : The problem is that a bad data design can force its influence on your code design choices. For example, if you find that you have to make a design compromise due to a performance deficiency created by a bad database structure or relationship. This often happens when databases are not normalized at the appropriate level (either over normalized or undernormalized).
    Michael Meadows : To finish my thought, it's much easier to refactor code than it is to refactor a database.
    le dorfier : Having done a lot of both, I my experience is strongly the other way. A good database developer can refactor a database much more effectively and quickly than it takes to refactor any substantial codebase. I'd love to face you in single combat on that one any day.
    le dorfier : But the basic point you make I agree with, which is that either can degrade your application. And there seem to be fewer people around who can refactor your database than your code. Or at least have any confidence that they can succeed.
    Michael Meadows : @le dorfier, refactoring the database itself can be done well by a skilled DBA, but the problem is application dependencies, especially if multiple applications depend on the schema. It gets even worse if you're doing ETL operations on your tables.
    le dorfier : First, ETL requirements shouldn't have have mucb influence on OLTP database design. Sorry, but that's the priorities.
    le dorfier : Second, any schema changes to the database are only justified if they improve its fitness for accurately persisting domain data; and if the abstraction correctly reflects domain data, it should be for the purpose of better fulfilling the interface contract with the abstractions that represent the domain.
    Michael Meadows : ETL tends extract from OLTP and load to OLAP. Refactoring can affect that extraction step.
    Michael Meadows : I'm not necessarily disagreeing with you on any of your points, but reality is a harsh mistress, and good design is usually the first thing to jump out the window. Over an application (code and data) lifecycle, most enterprise applications don't retain their architectural purity. Future coupling tends to hurt more on the database end than the application end.
    le dorfier : Here's a fundamental reason I think databases are more adaptable than software. They both deal with the same artifacts/classes/domain/whathaveyou. But applications must deliver both data and behavior, while databases mostly deliver data. Second and probably more important, software is what users see. The only data they see is what the software shows them.
  • The domain model should be independent of persistence implementation details (although the technology does place some constraints on the model) — http://www.infoq.com/articles/ddd-in-practice

    You should focus on the domain model. With great ORM technology like Hibernate/NHibernate the database will only be an implementation detail.

    Books you should read if you are doing .NET web development:

    1. ASP.NET MVC Framework Preview (only 100 pages, and will get you started)
    2. NHibernate in Action
    3. Domain Driven Design and Development In Practice (not read it, but I will, and it's free)
    4. Refactoring: Improving the Design of Existing Code
    Lotus Notes : I personally find it bizarre and questionable the notion that an ORM would be a substitute for good database design when you're dealing with any sort of complex financial or resource data. Not to mention a lot of reporting platforms pull data directly from the database.
  • Neither is unimportant, but...

    Bad database design might make writing good programs impossible. Also, you can usually rewrite bad code but if you have lots of data in the database, you just have to live with the bad decisions made in design phase.

    le dorfier : Bad programs make the database design irrelevant. And there's absoluotely no reason you should lose data when you change the schema.
    Michael Meadows : @le dorfier, I have seen bad schema create bad data (because the mapping becomes less clear). Once an application is in the maintenance phase, project teams tend to only modify the application enought to compensate for the bad data (not fix it), which only compounds the problem. It becomes less about losing the data than figuring out what it's supposed to mean.
    Lotus Notes : @le dorfier Unless your data is actually used by something other than your application. Data lives on, but applications are temporary. Have fun convincing your boss that the structure of a 10 year old database needs to be changed so that you can have more convenient syntax in your new application.
  • Maybe, I'm not very experienced in database design, but my feeling is : if your business classes are well designed, the only point where you access the databse is in your repositories (DDD speaking).

    So a change in the database is just a change in the implementation of your repository. A bad database design will make your repository hard to code, and slow to perform, but it will not impact your business layer (90% of your code).

    If you try to modify your business layer because of your DAO layer, why not to modify your business layer because of your presentation layer ? and then good luck to satisfy all constraint and good practices!

    I think that both are important, but coding and database design should not be in the same hands. The more important for the developer is to isolate himself from the work of the db designer.(Even if the Db designer and the developer are the same person, you should not have to think about two thing at the same time)

  • You DB design is most important.

    I'm a coder, and I prefer code, but ... if you screw up the DB design, your code will be a nightmare. Your code won't have a chance!

    Even when you try to refactor the DB design, you will have so much work around code, that fixing it all will be overwhelming.

    This isn't a preference or even a close one, it's very much leaning toward the DB design being most important.

    EDIT: Even if you were to have key-value pair tables where everything got dumped into it, that would still be a DB design based on business requirements.

  • It depends on your perspective. If you're a DBA then the db, if you're a developer than the code.

    I've seen developers utterly abuse database with "bag" tables and i've seen DBAs create monsterous application code that is fine if you understand the structure of the database but opaque otherwise.

    Ergo, both are critically important and if you're only experienced in the one you should get someone experienced to look at the other or improve your own skill set where it is lacking.

  • Both are important but poor design in any of the upstream activities are generally more difficult to correct. Changes to the functional specs for example will naturally generate a lot more work then some verbiage changes on the interface.

    Any meaningful changes to the database generally require coding changes as well, so I personally spend a lot more time agonizing over database design decisions then coding decisions (although I'm sure I'm not the only one who has spent an hour or two trying to find the perfect name for a class).

  • Paraphrasing Knuth -

    It is far more important to use an efficient data structure. A bad structure will slow down your application regardless of your algorithm and a good structure can even eliminate the need for certain algorithms.

    I think that this applies equally to DBs. You are ultimately building a massively linked data structure. If you don't use the right methods, your application will be slow, regardless of how much trickery you put into it.

  • A database is an artifact of a computer program that models a given process and system

    Ideally it should not be of any concern

    Current object persistence technology is not quite there yet, so it probably will be some of your concern for some time to come

    The question is: What is the database there for?

    If it is there to persist the objects in your model of the processes and systems then you shouldn't have much to do with it

    If it is there to fix holes in your model then you will spend a lot of time on it

  • Both are important of course, it's a symbiotic relationship.

    But if your DB is jacked, no amount of good code can make your app shine.

    However, if your DB is really good, then good code can make it even better (but bad code can still ruin it).

  • IMHO Good database design in more important then good coding (good coding is also important but compare to database design).

    1. Simply because Database will store your valuable data.
    2. Bad coding may not affect the database but will affect the scalability, performance of the whole application, etc. Over a period of time this can be fixed by refactoring(ofcourse some cost involved).
    3. Refactoring a database is much more costlier and difficult.
    4. Also normally we have multiple different apps running on same database so its a common factor.
  • I will not attempt to duplicate many of the fine comments made thus far. Nor will I spend any time identifying the dubious statements also made.

    But I will add the following points.

    If you are like most people you are referring to a RDBMS as the database in your question. An RDBMS is essentially a slave. It listens on a port and is duty bound to attempt to service all requests that come over that port. It has no way of know which of those requests are just plain stupid or ill-advised. Thus it is is easy for the most perfectly designed DB to be abused to the point of locking up the server. This implies that the code is more important. DBAs everywhere can be found pulling out their hair in response to some of the dumber things that app developers throw at the servers they manage.

    So the best advice I can give you on the topic is to ensure that the DB is accessed by an API that is written by the same developer who designed the DB. Make it the responsibility of one dude (or one group reporting to the same dude) to ensure sound design decisions are made for both. If you're that dude, then don't skimp on one at the expense of the other. Design your API so that a refactoring of the DB can be done transparently to the clients of the API.

  • Depends what you enjoy of course, but the future is toward abstracting away the data layer -- that is, considering the data store to be an implementation detail rather that the core of the app.

    You'll hear the term "persistence ignorance" in certain circles. The goal is that the domain model (business entities) are designed without knowing how the data will be stored. Behind the scenes, it may be in SQL, AmazonDB or XML.

    If you enjoy DBA stuff, that's where you should be. But if you want to be more on the application side, get to know ORM frameworks.

  • Look at it this way

    1. Changing the code means changing the code
    2. Changing the database will probably force you to change the code as well

    Ergo it is important to get the database as stable as possible as early as possible

  • This is an error of reasoning that data are fundamentally different from executable code. The theory of cellular automata is full of example were data and code are strongly "intricated".

    But, as once choosen BD is less easily modifiable, you should thing about the global architecture. But if you are unsure of the final one, you should do the most to think about the most easily expendable BD architecture.

    My experience show it is easier to modify the code than the BD.

  • I couldn't say that database design is more important then code design but it's at least the same. If you wrote great code but your DB is bad then your application will have problem with performance and vice versa.

    What about DB design there is lots of materials about it (including books). Look at the subject of database normalization

  • "Apps are temporary, but data is permanent"

    for me db design is MORE important, it is the foundation of the app you are going to code.

  • Its important to do both accurately and professionally. To avoid future headaches do it properly now, as changes, fixes or enhancements are much harder when things are set in stone.

  • The design of the database and how your code interacts with that data, will have a huge impact on your performance and your agility.

    I can't say much about importance, but once you've set your database and it's full of millions of rows.. changing it is going to be tricky and time-consuming. Sort of.. the sins of the DBA will be patch-fixed in code.

    There's also always the choices you have to make between being agile (creating an extendable database able to solve problems or answer questions you didn't know about when designing it) and performing/scaling well. Again, sometimes it might be better to use code to create a cache outside of the actual DB to create some of that performance.

  • Domain understanding is most important. You have to understand what you want to do with your application. From domain understanding, you need to understand the kinds of information you plan to store and the relationships between information units. You also need to understand the use-cases.

    After that, database design is most important. A database is the best place to codify the information from your domain analysis. Once you have the data you need, you should normalize the data to the highest degree possible. Then you must understand the range of legal values for each field and write data validation procedures for each one (in the form of check constraints, stored procedures, regexes, or whatever else makes sense).

    I have found more often than not that when your data is highly normalized, the volume of code required to complete your application drops significantly for a few reasons:

    • You are not writing code to do search, sort, index, join, and group data. SQL does a lot of that for you.
    • Normalized data is not duplicated. That means each update happens in one and only one place. If you have to write code to keep data synchronized in many places, code becomes bloated and error-prone.
    • You don't have to write as much error-handling code, because a lot of it is in your data validation code.
    • When your code volume drops, you need fewer developers to maintain the code. That means that modules owned by different people are less likely to diverge from the overall intent of the program.

    Code is still important, but now you just have to write a lot less of it.

  • we've re-written the website 3 times in 4 years. the database hasn't changed. (well, a little)

0 comments:

Post a Comment