learnedax | (no subject)

Current Mood: Underwhelmed (as usual?)

So, LiveJournal now has, in their words, "full Tags support." Of course, their idea of "full" is letting you view one specific user's entries filtered by content tag, which really makes it an inline version of the Memory system*. It would be a pretty trivial addition to the S2 infrastructure to let you filter any view by tag, and a not-too-difficult advanced configuration feature to allow filtering by user, or even substring/regex matching. I've toyed with adding the functionality to my own style, which is impractical mainly because I can't add a configuration page for it. Well, they can, and did; it's just weak.
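To make the kind of filtering described above concrete, here is a minimal Python sketch of filtering a list of entries by exact tag, by user, or by a regex over tag names. It says nothing about how S2 or LiveJournal actually store or render entries: the `Entry` class, the `filter_entries` function, and the sample data are all made up for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Entry:
    # Hypothetical stand-in for a journal entry; not LiveJournal's real data model.
    user: str
    subject: str
    tags: set = field(default_factory=set)


def filter_entries(entries, tag=None, user=None, tag_pattern=None):
    """Return the entries that match every criterion that was supplied."""
    regex = re.compile(tag_pattern) if tag_pattern else None
    result = []
    for entry in entries:
        if user is not None and entry.user != user:
            continue  # wrong author
        if tag is not None and tag not in entry.tags:
            continue  # exact tag not present
        if regex is not None and not any(regex.search(t) for t in entry.tags):
            continue  # no tag matches the pattern
        result.append(entry)
    return result


# Invented sample data, standing in for something like a friends view.
entries = [
    Entry("alice", "S2 hacking", {"lj", "s2"}),
    Entry("bob", "Dinner", {"food"}),
    Entry("alice", "Lunch", {"food", "lj-meetup"}),
]

food_posts = filter_entries(entries, tag="food")
alice_food = filter_entries(entries, tag="food", user="alice")
lj_prefixed = filter_entries(entries, tag_pattern=r"^lj")
```

On an in-memory list this is a simple linear pass; whether the same thing is cheap on the server side is a separate question.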

Well, at least they're beginning to address the (frequently requested and easy to implement) tagging feature; maybe it will improve over time.

*That is, it lets you memo(r)ize keyword filtering at post creation, rather than making it an extra step, but is otherwise functionally the same as the Memory system, AFAIK.


Perhaps if LJ allowed a more straightforward API to their data, such that I could cook up my own version of the user side, without compromising their management of private data...

I suspect that the problem has nothing at all to do with them not having enough cycles to implement these features in the UI; rather, it's probably all about efficiency. The kinds of features you're looking for are easy in a small-scale system, but generally pretty hard to scale properly unless you specifically design for them at the data level.

So I'm betting that the reason this degree of flexibility isn't available in the APIs is that the internal data structures can't support them scalably, and they don't want to expose functions that can easily bring the system to its knees. We can wish for otherwise, and hope that they improve it, but I'm pretty sympathetic to their situation: there's no such thing as arbitrary scalability, and they have to pick their battles.

(Yes, I spend a good deal of my time nowadays having these arguments from the other side. Indeed, I spend a fair amount of time explaining to my own management why we don't want to add function X to our APIs, because a benignly-intentioned user could accidentally crush us with it...)

It's possible, I suppose. Many of these features are niche-interest enough that they probably haven't felt it worthwhile to implement them. But from what I know of LJ's structure, it seems unlikely to me, though not impossible, that a carefully constructed open accessor API would be that strenuous on their system. What I want, broadly speaking, is a way to run what are effectively compound queries on their database. It is certainly possible to set that up so that it's insecure, but I think it can be constructed in a limited-enough way to keep it from being too dangerous. In terms of the load, it ought to be similar to current usage, since that involves what I would presume are complex DB queries already. Maybe people writing their own front-ends are more likely to hammer the accessors, but there are already scripts that rifle through a large number of LJ pages, and being able to grab the information without all the HTML and do the processing client-side could actually decrease their server load.

I don't know, maybe that's naïve. I've never tried to write a convenient and safe API of that nature, so it's quite possible there are major problems with doing it that I haven't considered. Superficially, it seems like they could do it.

Generally, it isn't that simple. While a normal relational DB *can* do arbitrary compound queries, the efficiency of those queries varies wildly. In general, you have to build your clustering and keys pretty carefully around the expected queries, and avoid running too many queries that fall outside those structures. An ad hoc query can take literally orders of magnitude more DB time than one the schema was designed for.
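A small sketch of that difference, using Python's built-in `sqlite3` module as a stand-in for a real production database. The table layout, index name, and sample rows are invented for illustration and have nothing to do with LJ's actual schema. The first query follows the index the table was built around; the second is the sort of ad hoc substring search that no index on this table can help with.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: entries are keyed for "one user's posts, in date order".
conn.execute(
    "CREATE TABLE entries ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, posted TEXT, body TEXT)"
)
conn.execute("CREATE INDEX idx_user_posted ON entries (user_id, posted)")
conn.executemany(
    "INSERT INTO entries (user_id, posted, body) VALUES (?, ?, ?)",
    [(i % 100, f"2005-01-{(i % 28) + 1:02d}", f"entry {i}") for i in range(1000)],
)


def plan(sql, params=()):
    """Return SQLite's query plan for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " ".join(row[-1] for row in rows)


# The query the index was designed for: an index search.
designed = plan("SELECT id FROM entries WHERE user_id = ? ORDER BY posted", (42,))

# An ad hoc query: a leading-wildcard LIKE forces a scan of the whole table.
ad_hoc = plan("SELECT id FROM entries WHERE body LIKE ?", ("%entry 4%",))

print("designed:", designed)
print("ad hoc:  ", ad_hoc)
```

The exact plan wording varies between SQLite versions, but the designed query should report a search using `idx_user_posted`, while the ad hoc one should report a scan of `entries`. On a thousand rows the gap is invisible; on a table with many millions of rows it is the difference between touching a handful of pages and reading all of them.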

The issue is mainly if something catches on. While a small number of *any* kind of query isn't going to cause problems, they have to assume that any API they open may potentially get a great deal of traffic. An open query API is *very* dangerous under those circumstances, unless you've specifically built the DB structures for that kind of open access from the very beginning.

I've only seen one major server-based company recently that has that open an API -- and that is a *very* expensive piece of software (on the order of hundreds of dollars a year per seat), so the company can afford to throw as much hardware at it as they need. And they clearly made a strategic decision that this is what their company does: they're trying to be an open platform as a key element of company strategy, so they've put a great deal of effort and resources into it.

And of course, then there are the security implications. The more open and flexible the API, the greater the chance of unintended security leaks via clever data joins. This is why most large-scale systems (like we have at Convoq) don't even have open APIs *internally* -- our internal systems address themselves to carefully-designed and vetted stored procedures buried in the database, so that we understand both their security and performance implications extremely well. That's a major consideration: the more powerful the API, the more dangerous it is, because it gets that much harder to comprehensively understand what can be done with it.
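As a rough illustration of the "vetted procedures only" idea, here is a minimal Python sketch in which callers can only run queries from a fixed, reviewed list. SQLite has no stored procedures, so a dictionary of named, parameterized statements stands in for them here. The names `VETTED_QUERIES`, `run_vetted`, and `recent_public_entries`, along with the schema, are all hypothetical and do not describe any real system's code.

```python
import sqlite3

# Each entry is (SQL text, number of parameters). In this sketch, being on
# this list is what "vetted" means: someone has reviewed both what the query
# can return and how it uses the indexes.
VETTED_QUERIES = {
    "recent_public_entries": (
        "SELECT subject FROM entries "
        "WHERE user_id = ? AND is_public = 1 "
        "ORDER BY posted DESC LIMIT 20",
        1,
    ),
}


def run_vetted(conn, name, *params):
    """Run a query from the vetted list; refuse anything else."""
    if name not in VETTED_QUERIES:
        raise PermissionError(f"query {name!r} is not on the vetted list")
    sql, expected = VETTED_QUERIES[name]
    if len(params) != expected:
        raise ValueError(f"{name} takes {expected} parameter(s), got {len(params)}")
    # Parameters are bound by the driver, never spliced into the SQL text.
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries ("
    "user_id INTEGER, subject TEXT, posted TEXT, is_public INTEGER)"
)
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?, ?)",
    [
        (1, "public post", "2005-06-01", 1),
        (1, "friends-only post", "2005-06-02", 0),
        (2, "someone else", "2005-06-03", 1),
    ],
)

rows = run_vetted(conn, "recent_public_entries", 1)
```

The caller never gets to write a join of their own, so the private row cannot leak through a clever query, and every statement that reaches the database is one whose cost is already understood. The price is exactly the inflexibility complained about earlier in the thread.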

Frankly, the past couple of years have been a real education for me: building large-scale client/server systems is just *different* in a number of ways. When you're trying to build a server that is intended for millions of users, you structure it quite differently than you would normally expect -- performance and security trump most other considerations, and impose a lot of interesting constraints. Dealing with those effectively is a lot harder than you'd expect...
