learnedax | (no subject)

Current Mood: Underwhelmed (as usual?)


So, Livejournal now has, in their words, "full Tags support." Of course, their idea of "full" is letting you view one specific user's entries filtered by content tag, making it really an inline version of the Memory system*. It's a pretty trivial addition to the S2 infrastructure to let you filter any view by tag, and a not-too-difficult advanced configuration feature to allow specification by user, or even substring/regex matching. I've toyed with adding the functionality to my own style, which is impractical mainly because I can't add a configuration page for it. Well, they can, and did, it's just weak.
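To make the "filter any view by tag, user, or regex" idea concrete, here is a minimal sketch in Python rather than S2. Everything in it is an assumption made for illustration: the `Entry` class, the `filter_entries` function, and the sample users are hypothetical and do not reflect LiveJournal's actual data model.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Entry:
    # Hypothetical stand-in for a journal entry; not LiveJournal's real schema.
    user: str
    subject: str
    tags: set = field(default_factory=set)


def filter_entries(entries, tag=None, user=None, tag_pattern=None):
    """Yield the entries that match every criterion that was supplied."""
    regex = re.compile(tag_pattern) if tag_pattern else None
    for entry in entries:
        if user is not None and entry.user != user:
            continue
        if tag is not None and tag not in entry.tags:
            continue
        # Regex matching is applied to each tag on the entry.
        if regex is not None and not any(regex.search(t) for t in entry.tags):
            continue
        yield entry


# A made-up friends page with entries from several users.
friends_page = [
    Entry("learnedax", "Tags support", {"lj", "tagging"}),
    Entry("alice", "Event notes", {"sca", "boston"}),
    Entry("bob", "More on tagging", {"tagging", "del.icio.us"}),
]

by_tag = [e.subject for e in filter_entries(friends_page, tag="tagging")]
by_user_and_pattern = [
    e.subject
    for e in filter_entries(friends_page, user="alice", tag_pattern=r"^bos")
]
```

The same function works on any list of entries, which is the point: the filter does not care whether the view is one journal, a friends page, or a day archive.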

Well, at least they're beginning to address the (frequently requested and easy to implement) tagging feature; maybe it will improve over time.

*That is, it lets you memo(r)ize keyword filtering on post creation, rather than making it an extra step, but is otherwise functionally the same as the Memory system, AFAIK.


And really, I'd like to tag other users' entries, as well as my own. Then I could keep all my active conversations in one place, and they wouldn't die out when they fell off whatever filter pages people are reading them on.

Well, that seems more likely than full view-filtering, at this point, because that's how memories work now; they're just slightly cumbersome to use. Having optional filtering on the union of multiple factors like tag and date would be even better, but LJ has not so far been forthcoming with the level of flexibility I'd like in managing my viewing.

If they'd just indicate when a memory has a new comment, I'd be golden. I already have a memory heading called "current conversations", where I keep stuff I'm interested in: unfortunately I have to dive in to see if there's anything new.

Except, of course, for people who have these styles that don't seem to allow adding their entries to a memory. Or am I missing something here?

There's an option to always view people's posts using your own style; it appends "&style=mine" to the URL. I like it, and use it, as some of my friends have odd notions about readability. It also ensures the navigation buttons are present and where I want them, including the Memory button.

Hmm, not sure now whether I took that out or whether my original template had the button missing. I should probably add it back in at some point.

Yet another thing LJ could do to let us customize our information: allow monitoring comment trees by root node, whether that's a post or a given comment that sparks a new thread. Ordinarily marking a whole post is fine-grained enough, but ideally one of the available options for 'watching' a thread would be to have later comments emailed to you the way the comments/replies are now; there are lots of times when I'd like to see if there's a reply to someone's reply to me, but I wouldn't want to track all the comments on a given post.
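Watching by root node mostly comes down to one check: is a new comment the watched node or one of its descendants? A rough sketch follows, assuming each comment records the id of its parent. The `parent_of` mapping and the function name are hypothetical, not anything LJ exposes.

```python
def is_in_watched_thread(comment_id, parent_of, watched_roots):
    """Return True if comment_id is a watched root or descends from one."""
    current = comment_id
    while current is not None:
        if current in watched_roots:
            return True
        # Walk one step up the tree; top-level comments have parent None.
        current = parent_of.get(current)
    return False


# comment id -> parent comment id (None means a top-level comment on the post)
parent_of = {1: None, 2: 1, 3: 2, 4: None, 5: 4}

# Watch only the sub-thread rooted at comment 2, not the whole post.
watched = {2}

notify_for_3 = is_in_watched_thread(3, parent_of, watched)  # reply inside the watched thread
notify_for_5 = is_in_watched_thread(5, parent_of, watched)  # unrelated thread on the same post
```

Watching a whole post is then just the special case where every top-level comment counts as a watched root.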

So, again, LJ doesn't offer nearly enough in the way of content management. But then, not many systems do. Livejournal does a good enough job for most of the time, but the somewhat random set of options they allow is frustrating for an open source project. The entire point of an open platform such as theirs is to allow anyone to add what they want, but I can't do that unless I start an entirely separate installation, which of course would have no user base. Perhaps if LJ allowed a more straightforward API to their data, such that I could cook up my own version of the user side, without compromising their management of private data...

Anyway. Maybe I'll see what I can do.

Perhaps if LJ allowed a more straightforward API to their data, such that I could cook up my own version of the user side, without compromising their management of private data...

I suspect that the problem has nothing at all to do with them not having enough cycles to implement these features in the UI; rather, it's probably all about efficiency. The kinds of features you're looking for are easy in a small-scale system, but generally pretty hard to scale properly unless you specifically design for them at the data level.

So I'm betting that the reason this degree of flexibility isn't available in the APIs is that the internal data structures can't support them scalably, and they don't want to expose functions that can easily bring the system to its knees. We can wish for otherwise, and hope that they improve it, but I'm pretty sympathetic to their situation: there's no such thing as arbitrary scalability, and they have to pick their battles.

(Yes, I spend a good deal of my time nowadays having these arguments from the other side. Indeed, I spend a fair amount of time explaining to my own management why we don't want to add function X to our APIs, because a benignly-intentioned user could accidentally crush us with it...)

It's possible, I suppose. While many of these features are niche-interest enough that they haven't felt like it's worthwhile to implement them, it seems unlikely to me from what I know of LJ's structure, though not impossible, that a carefully constructed open accessor API would be that strenuous on their system. What I want, broadly speaking, is a way to run what are effectively compound queries on their database. It is certainly possible to set that up so that it's insecure, but I think it can be constructed in a limited-enough way to keep it from being too dangerous. In terms of the load, it ought to be similar to current usage, since that involves what I would presume are complex DB queries already. Maybe people writing their own front-ends are more likely to hammer the accessors, but there are already scripts that rifle through a large number of LJ pages, and being able to grab the information without all the HTML and do the processing client-side could actually decrease their server load.

I don't know, maybe that's naïve. I've never tried to write a convenient and safe API of that nature, so it's quite possible there are major problems with doing it that I haven't considered. Superficially, it seems like they could do it.

Generally, it isn't that simple. While a normal relational DB *can* do arbitrary compound queries, the efficiency of those queries varies wildly. In general, you have to build your clustering and keys pretty carefully around the expected queries, and avoid running too many queries that fall outside those structures. An ad hoc query can take literally orders of magnitude more DB time than one that was designed for.
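A small way to see the difference is to ask a database how it plans to run each query. The sketch below uses Python's built-in sqlite3 module on an in-memory database. The `entries` table and its index are invented for the example, and SQLite is only a convenient stand-in, not whatever LJ actually runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT)"
)
# The schema is built around the expected query: look up entries by user.
conn.execute("CREATE INDEX idx_entries_user ON entries (user_id)")


def plan(sql, params=()):
    """Return SQLite's query plan for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row is the human-readable plan detail.
    return " | ".join(row[-1] for row in rows)


# A query the keys were designed for: the planner can use the index.
designed_for = plan("SELECT id FROM entries WHERE user_id = ?", (42,))

# An ad hoc substring search: no index helps, so every row gets examined.
ad_hoc = plan("SELECT id FROM entries WHERE body LIKE ?", ("%tagging%",))

print(designed_for)
print(ad_hoc)
```

The exact wording of the plans varies between SQLite versions, but the first should mention the index and the second should report a scan of the whole table. On an empty table neither costs anything; the gap only starts to matter as the row count grows.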

The issue is mainly if something catches on. While a small number of *any* kind of query isn't going to cause problems, they have to assume that any API they open may potentially get a great deal of traffic. An open query API is *very* dangerous under those circumstances, unless you've specifically built the DB structures for that kind of open access from the very beginning.

I've only seen one major server-based company recently that has that open an API -- and that is a *very* expensive piece of software (on the order of hundreds of dollars a year per seat), so the company can afford to throw as much hardware at it as they need. And they clearly made a strategic decision that this is what their company does: they're trying to be an open platform as a key element of company strategy, so they've put a great deal of effort and resources into it.

And of course, then there are the security implications. The more open and flexible the API, the greater the chance of unintended security leaks via clever data joins. This is why most large-scale systems (like we have at Convoq) don't even have open APIs *internally* -- our internal systems address themselves to carefully-designed and vetted stored procedures buried in the database, so that we understand both their security and performance implications extremely well. That's a major consideration: the more powerful the API, the more dangerous it is, because it gets that much harder to comprehensively understand what can be done with it.

Frankly, the past couple of years have been a real education for me: building large-scale client/server systems is just *different* in a number of ways. When you're trying to build a server that is intended for millions of users, you structure it quite differently than you would normally expect -- performance and security trump most other considerations, and impose a lot of interesting constraints. Dealing with those effectively is a lot harder than you'd expect...

Tagging is all well and good in terms of adding meta-information that can be wrangled in any number of ways, but the big problem I have with it is that too many places (del.icio.us, gmail) are using tags as a substitute for genuine sorting and filing. I like the ability to have multiple 'categories', but I don't want to throw everything in one giant pool and then have to filter through it based on metadata, because then I have to go through the effort of screening material and deciding what tags I'm going to be looking for in the future. I'd rather be able to sort things into folders and subfolders (or some other hierarchy) and then add tags as well. Then, if something only goes into one part of my taxonomy, I can just put it there, a process that is generally much easier than any of the existing methods for adding tags. And if I don't care, it can go tag-free without getting in the way of any new material.

As an addendum, I use del.icio.us, and like doing so, but I have over 50 tags showing in my column on the right simply because tagging is a very flat hierarchy. If I try to increase my search speed by reducing the number of tags visible, I have to reduce the flexibility and functionality of tags. It's a bad UI conflict.

Well, see this article (http://www.shirky.com/writings/ontology_overrated.html) for a somewhat interesting, if fairly obvious, analysis of the reasons for using tagging the way del.icio.us does. There are good reasons for both routes, but I think a happier medium could be obtained by allowing categorization on a per user basis. That is, if you have 50 tags which to your mind at the moment fit in 6 categories, you should be able to make 6 groups to handle them, refining the groups' contents at any time as you see fit.

Proper handling of user-defined grouping, ideally including addition, subtraction, union, and intersection, is a well-understood and cheap to implement process that far too many systems lack for no obvious reason. Bringing things back to Livejournal, I ought to be able to define a Carolingia friend group as the intersection of Boston and SCA, and then lock a post to Carolingia + a specific 3 people, without jumping through inordinate hoops. Granted, the majority of users are not clamoring for this feature - but it's not really very expensive at all, either.
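For what it's worth, the group arithmetic itself is only a few lines with ordinary sets. This is a sketch with made-up usernames and group memberships, not a description of how LJ stores friend groups.

```python
# Hypothetical friend groups; the usernames are invented for illustration.
boston = {"alice", "bob", "carol"}
sca = {"bob", "carol", "dave"}

# Intersection: friends who are in both Boston and the SCA.
carolingia = boston & sca

# Union ("addition"): lock a post to Carolingia plus three specific people.
extra_readers = {"erin", "frank", "grace"}
audience = carolingia | extra_readers

# Subtraction: SCA friends who are not in Boston.
sca_elsewhere = sca - boston


def can_view(reader, allowed):
    """Return True if the reader is in the post's allowed audience."""
    return reader in allowed
```

If the derived groups are recomputed whenever a base group changes, "Carolingia" stays correct as people are added to or dropped from Boston or SCA, with no list maintained by hand.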

I have yet to get the tags to work properly - I suspect I may not be using S2, and I only have so much time to wank on LJ. I'm not inclined to change my style just to get use of that feature.

That being said, I'm probably going to use them to keep track of which posts are public, friends-only, or custom. Anything else is, as you pointed out, made mostly redundant by Memories.
