Thursday, February 10, 2011

Generate webpages directly from database or cache?

[I'm not asking about the architecture of SO, but it would be helpful to the question.]

On SO, when a user clicks on his/her name and clicks on "responses" they see other users responses to comment threads, questions, and answers in which they have participated. I've had the sneaking suspicion that I've missed certain responses out there, which made me wonder: if you had to build that thing, would you pull everything dynamically from the database every time a user requested it? Or would you modify it when there is new related activity in the application? Or would you build it in a nightly daemon process?

I imagine that the real answer is that it's dynamically constructed every time, but that the tables are denormalized in such a way so as to make the thing less time-consuming. How would you build it?

I'm asking about any platform, of course, not only on .Net.

  • I would pull it dynamically from the database every time. I think this gives you the best result from a user experience and then I would apply the principal that premature optimization is evil. Later if there were performance issues I would look into caching.

    I think doing it as a daemon/push process would actually result in more overall work being done. That is the updates would happen more frequently than the users are requesting the info.

    Yar : You could do the push only for users that have had activity on their account in the last X minutes.
    Brian Fisher : Good point. I would just try the simplest way first as I would guess the performance would be adequate.
    Yar : Yeah, I'm just beginning to question all the anti-optimization rhetoric. I think that if you want the responses tab to conain 100% of the responses all the time (it doesn't), you'd have to plan for that complexity in advance... async updates through a message queue are fun, perhaps...
    le dorfier : A database query unioned to a "Pending Transactions" query of the queue could work, if the queue has the proper fields and indexes. The queue would be 100% in-memory with any reasonable design.
    DrJokepu : I agree, I think trying to design a scalable architecture is not premature optimization.
  • Obviously, when an answer or comment is posted, you'll want to identify the user that should be informed in their responses tab. Then just add a row to a responses table containing the response text, timestamp, and the user to which it belongs. That way you can dynamically generate the tab with a simple

    select * from responses where user=<userid> order by time desc limit 30
    

    or something like that.

    p.s. Extra credit to anyone that can write a query that will remove old responses - assume that each person should have the last 30 responses in their responses tab.

    Yar : But you can delete comments, for instance, and eliminate yourself from a thread. I should try that and see if I still get informed.
    Kyle Cronin : Deleted comments do, or at least used to, appear in the responses tab.
    Yar : Neither of your two comments appeared in my responses tab, so I would really like to know how it works on SO itself.
    Kyle Cronin : In the responses tab you'll find answers to your questions and comments on your posts. You won't find responses to comments; for example, you won't find this comment in your responses thread, as this is really a response to my answer, and would appear in my responses tab were it not posted by me.
    Yar : Aha! Never noticed that. Thanks, Nobody.
  • I expect that userid would be a natural option for the clustered index. If you have an "Active" boolean field then you don't need to worry much about locks; the table could be write-only except to update the (unindexed) Active column. I bet it already works that way, since it appears that everything is recoverable.

    Don't need no stinking extra-credit response remover.

    Yar : Hahaha re: the extra-credit response... I have to admit that I'm not interested in the actual syntax of the queries, but rather just the general structure.
    Yar : What's the table? A RESPONSES tables like @nobody recommends? Or can you do it without mirroring the UI in the Database?
    From le dorfier
  • I would assume this is denormalized in the database. The Comment table probably has both and answer_id and an answer_uid so the SQL to find comments on you answers just run against the comment table. The same setup would work on the Answer table. Each answer has a question_id and a question_uid.

    Having said that, these are probably the same table and you have response_to_id and response_to_uid and that makes lots of code simpler and makes the "recent" tab a single select as well. In fact the difference between the two selects is one uses the uid and one uses the response_to_uid.

  • I'd say that your UI and your database should both be driven by your Application Domain; so they will reflect each other based on their common provenance there.

    Some quick notes to illustrate, using simplified Object Role Modeling as discussed by Fowler et al.

    Entities

    Users
    Questions
    Answers
    Comments

    Entity Roles
    (Note: In Object Role Modeling, most Roles are reflexive. Some, e.g. booleans here, are monopolar)

    Question has User
    Question has QuestionVersions
    Question as Answers
    Question has Comments

    Answer has AnswerVersions
    Answer has Comments

    Question has User
    QuestionVersion has Text
    QuestionVersion has Timestamp
    QuestionVersion has IsDeleted (could be inferred from nonNULL timestamp eg)
    QuestionVersion has DeltedByUser
    QuestionVersion has DeletedTimestamp

    Answer has User
    AnswerVersion has Text
    AnswerVersion has Timestamp
    AnswerVersion has IsDeleted
    AnswerVersion has DeltedByUser
    AnswerVersion has DeletedTimestamp

    Comment has Text
    Comment has User
    Comment has Timestamp
    Comment IsDeleted (boolean)

    (note - no versions on comments)

    I think that's the basics. These assertions drive ERDs in ORM. Hopefully it's self-evident how they drive the User Stories as well.

    I don't think an implementation of a normalized design like this would require denormalization - especially since I think it's clear (from behavior) that queries => UI displays are cached to be refreshed 1X per minute.

    Yar : You mean cached "naturally" by the Web Server, DB Server, and all other entities in the system?
    le dorfier : No, it looks to me like at least the main question list is explicitly cached and refreshed 1X per minute. Inferred from behavior - if you try to force a refresh, you'll only get anything new about that often.
    From le dorfier

0 comments:

Post a Comment