Thursday, July 30, 2009

Cooperative Update Exchange in the Youtopia System

In the Youtopia project, we are building a system to allow communities to share structured data. There is an increasing amount of such data being shared on the Web, but there are few tools to make this sharing and management easy for everyone. What is needed is the equivalent of Wiki software for structured data, i.e. a collaborative database management system.

Designing a collaborative DBMS poses unique and exciting technical challenges because all functionality must be fully decentralized. In a recent paper to appear in VLDB 2009, we tackle one such challenge, namely, the problem of update exchange.

Update exchange is a process where changes to data in one table are automatically propagated to other tables as directed by a set of mappings (or tuple-generating dependencies). For example, suppose table A contains information about restaurants in New York State and table B contains data about restaurants in Ithaca, as well as restaurant reviews. If a new restaurant is opened in Ithaca and
entered into table A, a mapping can automatically propagate that information to table B, notifying the owner of table B of the new restaurant and giving them a chance to supply a review.

In our paper, we introduce a model for update exchange which is fundamentally cooperative. It is a variant of the classical chase that includes human intervention at crucial points. Thus, the users have a great deal of control over how the mappings fire. An additional advantage of our model is that the mappings may form arbitrary cycles - we are able to drop the acyclicity restrictions often found in the literature without risking infinite sequences of new tuple insertions.

Our paper presents our update exchange model and explains how to design a practical system that incorporates it. The key observation is that human intervention is slow and unpredictable, and the system must never delay new incoming queries and updates just because it is awaiting human input on an older update. This means that in general, multiple chases will be going on in the system simultaneously. We study the potential for interference among such chases. We provide a serializability framework and concrete algorithms that make it possible to avoid all such interference, should it be unacceptable in a given deployment scenario.

We are currently putting our work into practice and building a prototype of the Youtopia system. Stay tuned for updates, and we hope to see you at the talk in Lyon!

Saturday, July 25, 2009

What is Creativity?

The root of the words "create" and "creativity" come from the Latin creatus and creare, meaning "to make or produce," or literally, "to grow." - Jane Piirto

Creativity does not simply apply to art or writing, as early schooling tends to lead us to believe. Creativity is applicable to business, to physics, politics, pretty much every imaginable aspect of living. It is just a matter of making something new, something novel. (Of course, ideas don't grow in vacuum chambers, so interpretation of creativity does depend on zeitgeist.)

"To say that Thomas Edison invented electricity or that Albert Einstein discovered relativity is a convenient simplification.... But Edison's or Einstein's discoveries would be inconceivable without the prior knowledge, without the intellectual and social network that stimulated their thinking, and without the social mechanisms that recognized and spread their innovations. To say that the theory of relativity was created by Einstein is like saying that it is the spark that is responsible for the fire. The spark is necessary, but without air and tinder there would be no flame."
- Mihaly Csikszentmihalyi

Csikszentmihalyi defines "Big C" and "little c" as the two types of creativity, differing in magnitude. "Big C" Creativity is what leads to changing domains, changing entire ways of life, by someone who is (or becomes) well known by others in the relevant field. Einstein's theory of relativity, Beethoven's Moonlight Sonata, and Twain's literary works fall under Big C. "little c" creativity, on the other hand, concerns the individual's day-to-day life, and activities such as finding the fastest way to a destination using side streets, or figuring out how to keep rabbits out of the vegetable garden.

Everyone is born with a propensity towards creativity. Many debates have occurred over the years on whether the "amount" of creativity can be measured, or if it is correlated with other factors, such as intelligence, giftedness, talent, socioeconomic standing, and so on. Perhaps there are some facets of creativity that are only innate, or some factors that facilitate creativity (verbal acuity, an eye for detail, a particular physique) that are innate, but there are other aspects that can be actively developed.

Being creative is necessary for a fulfilling life, and it is something that should be developed and cultivated throughout one's lifetime. Unfortunately, creativity can be suppressed by the process of being educated or growing up, in favor of practicality or pragmatism. The good news is that at any time, creativity can be picked up again, re-nurtured, and re-embraced, and that it does not have to exist entirely separate from practicality.

Next time: Teaching Creativity with Too Much Freedom, and Being a Creative Individual

Further Reading

Creativity: Flow and the Psychology of Discovery and Invention (Mihaly Csikszentmihalyi)

Understanding Creativity (Jane Piirto)

Thursday, July 23, 2009

Invited talk at CIAA'09 on applications of automata in XML processing

Last week I gave an invited talk at the 14th International Conference on Implementation and Application of Automata (CIAA'09) in Sydney, Australia. I talked about applications of automata in XML processing. More specifically, I addressed three problem areas:
  1. Using automata for XML validation, where I addressed DTDs and XML Schema (more precisely, restraint competition grammars), their automata theoretic analogs, and efficient streaming validation.
  2. Using automata in XML publish-subscribe and complex-event processing systems, and
  3. Using automata for evaluating highly expressive node selecting queries with a constant number of sequential passes over the data.
I also talked a little about the industrial importance of complex-event processing systems.

Overall, the conference was a very nice opportunity to meet some old friends. Unfortunately, it was not a equally great opportunity to make new ones among the Koala population (i.e., form a coalition). All the local Koala's seemed to live in zoos and were said to die from stress if one touches them. (So that's illegal.) Generally, the wildlife is well protected there. I saw a sign at the beach which put picking up seashells or barnacles under a fine of up to $22,000.

(Note: the picture is from Wikipedia. I saw no Koalas, just barnacles.)

SIGMOD 2009 New Researcher Symposium

The goal of this year's SIGMOD New Researchers Symposium, which I co-chaired with Nesime Tatbul from ETH Zurich, was to give graduate students and junior researchers advice on various challenges involved in human factors, such as successfully making the move from graduate student to being part of, and productive in, a team of researchers in an industry lab or to becoming junior faculty and having to build up a research group. The two panels, consisting of well-known members of the data management community, also covered issues such as attracting and mentoring graduate students or junior researchers and providing good leadership of a research group. The slides used by the panelists are now available here. (Note: some panelists did not use slides.)

Wednesday, July 22, 2009

Teaching Creativity with Little Freedom

I remember going through elementary school, and coming to greatly dislike certain phrases that my teachers seemed to enjoy repeating, whether in praise—"Wow, that's so creative!"—or admonishment—"Try to be more creative."

At that young age, I had already cynical/scornful tendencies of such banal forms of encouragement. I was looking around at my peers' work, and oftentimes couldn't, for the life of me, see how it represented anything that I hadn't seen done before. To quibble, the term used was 'creative', not 'original', but I'm sure the teacher had seen it all as well.

So what did it mean to be creative? And furthermore, how was one supposed to become creative?

We had plenty of time for art, "creative" writing, and other such activities, which appeared on the schedule once or twice a week. It seemed, however, that these activities were either too structured, or not structured enough. I recall from second grade, a project where we had to draw leaves using provided stencils, cut them out, and color them. As the teacher demonstrated the process, she colored her leaves yellow-green. We all took out our boxes of crayons. I saw my left and right neighbors taking different shades of green, but I decided I didn't want green leaves. Out of my box of twenty-four, I picked a deep red and a bright orange, mixing them on paper to create a beautiful, fiery-looking leaf.

While I was admiring the blending of the waxy hues, my right-side neighbor noticed, and elbowed me, saying "You're supposed to make the leaves green."

I looked around, and quailed, seeing that everybody else in the room was using green. It could be that I hadn't been paying attention, but I had assumed the leaves could be any color we liked. The teacher came around to tell me that the leaves should be green. Why did they have to be green? Because they all had to look alike.

My seven-year-old self was completely stumped. I couldn't turn the leaf over and color that side; it would be facing the wrong way. I didn't have any more paper to work with; we were given only one sheet, which I had already cut into pieces. I decided to try and cover the orange crayon with dark green. Suffice to say, it turned out badly.

At the end of the week, all of the leaves were hung up above the blackboard, including my orange-and-green mess. Only now do I wonder what would have happened to me if I had simply said "So what?" to my neighbor and let it be.

Next: What is Creativity?

Tuesday, July 21, 2009

An Evaluation of Checkpoint Recovery for Massively Multiplayer Online Games

One problem that current Massively Multiplayer Online Games (MMOs) face is how to provide persistence for their virtual worlds. We have studied this problem in a recent paper accepted for publication at VLDB 2009.

MMOs use standard DBMS on the back-end to provide transactional guarantees for state updates. This decision is appropriate for updates that require full ACID guarantees, such as in-game financial transactions and item exchanges. The bulk of state updates in MMOs does not need, however, full ACID properties. For example, character movement comprises a large amount of the updates applied to a virtual world. As it turns out, game logic such as collision detection prevents character movement updates from generating any conflicts. We call these updates local updates.

The amount of local updates a game must process may exceed hundreds of thousands or millions per second. For performance, these updates are applied in main memory only, and game developers hand-code persistence logic for durability.

In our paper, we have experimentally evaluated the performance of main-memory checkpoint recovery techniques for MMOs. Our study shows that these techniques are a viable alternative to provide durability for local updates. Notwithstanding, not all techniques we have studied are equally suited for MMOs. MMOs have stringent latency requirements, ruling out methods that introduce long pauses in the game. Here is a summary of our recommendations to game developers:

  1. Methods that perform copy on update of dirty objects only have clear latency advantages over methods based on eager copies of the game state. They avoid latency peaks by spreading their overhead over a number of game ticks.

  2. When update rates are so dramatically large and skewed that the entire game state gets updated in a single tick of the game, little can be done to reduce the latency impact of the checkpoint algorithms. In this extreme situation, an algorithm based on an eager copy of the entire game state introduces the minimum pause in the game.

  3. Methods based on a double-backup organization either match or outperform log-based alternatives in terms of recovery time.

  4. The best method for a wide range of parameters is copy on update combined with a double backup. This method outperforms alternatives by up to a factor five in latency without any degradation in recovery time.

Our evaluation is based on a detailed simulation model of the checkpoint methods, available for download here. This simulation model has been validated against a real implementation of a relevant subset of the techniques. We plan to make our implementation also available for download. Keep tuned for updates in the near future!

Hope to see you at our talk in Lyon!

Sunday, July 19, 2009

New Social Gaming Company?

Techcrunch has an article about a new social gaming company called Tiny Speck. I think social gaming will be huge: We will all be constantly connected, we will carry around various intellectual prostethics, and we will have time to spend as the checkout line at the grocery store. It will also be interesting to see how games adapt once our attention span for a game is only five minutes? To sample what's out there check out the various games at Zynga (social games) or Kongregate (online flash games, many for the five-minute attention span).

Friday, July 17, 2009

Teaching a Foreign Language In a Virtual World

The New York Times has an article yesterday about an interesting type of game: A Virtual World where children will learn languages. Unfortunately, Wiz World Online is all in Chinese so I could not play it, but this is definitively a development to watch, and I would love to hear from people who have played it to understand how the virtual world works. I think that games have great potential to be used as a teaching tool where we can use AI and machine learning to personalize the game and learning experience without the personal attention of a teacher.

PS: When I just made a small modification to this entry, Google showed me an add for "Learn Chinese online with Beijing's best teachers."

Demo of CAT (the Cornell Anonymization Toolkit) at SIGMOD 2009

At the recent SIGMOD conference, we presented our new tool for limiting disclosure in data publishing: CAT, the Cornell Anonymization Toolkit. While there has been a lot of interesting work on different algorithms for privacy-preserving data publishing, there are few tools that combine them with a usable GUI that visualize what these algorithms achieve. CAT has an easy-to-use interface; it guides the user through the process of preparing a dataset forpublication while limiting disclosure through the identification of records that have high risk under various attacker models.

CAT currently implements l-diversity and t-closeness.

Below are a few screenshots of CAT; the code is available here. We welcome feedback from users about features that you would like to have and what aspects of the tool you think work well and which need improvement.

Blackboard buys TerriblyClever

Blackboard, the company that (among other applications) sells the course content management system that Cornell uses, has purchased MobilEdu, a company run by undergraduates from Stanford that developers university-centric services (such as GPS-supported maps, a directory, bills, courses, etc.) for the iPhone. Congratulations to the founders! I can only hope that this will have impact on Blackboard beyond new apps on the iPhone; the interface of the version of the Blackboard course management system that I have used has no Web 2.0 features and is very clunky. In addition, management of assignments and submission of solutions does not work well.

MayBMS demo at SIGMOD

MayBMS demo was presented at the 2009 ACM SIGMOD conference in Providence, RI. This was the first MayBMS demo after its official release on Sourceforge earlier this year. As part of the demonstration we showed two web applications built on top of the MayBMS server. The first one was a human resource management application that uses what-if analysis to aid managerial decisions, and the second one showed how random graphs and social networks can be modeled and queried as probabilistic databases. See the conference poster here.

Saturday, July 11, 2009

Bulding an OS for netbooks with two developers

Techcrunch reported on July 8 that besides Google, French company Jolicloud is building an operating system for netbooks. They seem to be building on Linux, but what impressed me most is that their team consists of just two developers. I found this encouraging for us at a university where we are always limited by just working with a few selected graduate students. It is the idea and the people that matter, not the size of the team!

Friday, July 10, 2009

Ashwin's thesis recognized

Cornell DB group alumnus Ashwin Machanavajjhala has won the 2009 ACM SIGMOD Doctoral Dissertation Award honorable mention. Congratulations Ashwin!