Tagging Management

This document specifies the basic user actions for:

For the site administrator

General Observations on Tagging

Tagging is a highly dynamic activity. Certain tags are long-lived and designate topics of permanent interest, e.g. science, politics. Many more tags are short-lived and refer to topics of current discourse, e.g. Michael Jackson Trial, Apple Tiger. Tagging practice spreads by example, and its main purpose is to get content listed in the right category. Tagging is thus an advertisement of the content by its author. A secondary use of tagging is for the user's own convenience in classifying archival material. In the latter case, the tags may come from the user rather than from the content author.

In any case, we must take into account that the tagging vocabulary changes constantly and that no fixed set of tagging rules will remain valid permanently.

We note that a rule set may both imply tags based on content words and designate phrases of interest that may link to partner services. For example, there can be a Wikipedia rule set which only links to Wikipedia whenever article headings from Wikipedia appear in the text. This will not imply tags. On the other hand, there may be rule sets that do imply tags based on various text criteria.

Tag Administration

The ODS DataSpace Settings page contains a link to "Content Tagging Settings."

Content Tagging Settings

This page is the main page governing the account's cross-application tagging settings.

Above the list appears a search box. This is a text search on titles and terms included in rule sets. Looking for "semantic web" will find rule sets that have semantic and web as matched words or where semantic and web occur in the title or description.
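The search behavior described above can be sketched as follows. This is a minimal sketch only: the RuleSet record and its field names are hypothetical, and pooling the title, description and patterns into one word set is an assumption, since the storage and text index used by ODS are not specified here.

```python
from dataclasses import dataclass, field


@dataclass
class RuleSet:
    # Hypothetical record; the field names are assumptions for illustration.
    title: str
    description: str = ""
    patterns: list = field(default_factory=list)  # matched words or phrases


def matches(rule_set, query):
    """True if every query word occurs in the title, description or patterns."""
    haystack = " ".join([rule_set.title, rule_set.description, *rule_set.patterns])
    words = set(haystack.lower().split())
    return all(term in words for term in query.lower().split())


def search(rule_sets, query):
    """Return the rule sets matching all words of the query, in input order."""
    return [rs for rs in rule_sets if matches(rs, query)]
```

With this sketch, a query for "semantic web" finds a rule set titled "Semantic Web" as well as one whose patterns contain both words.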

Rule Set Edit

This page allows editing the rules which compose a rule set.

The rule name link goes to a page that shows the above as editable fields. If the public check-box is unchecked, all uses of the rule set by non-owner users are revoked, so that the rule set disappears from their lists of selected rule sets.
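The revocation rule can be illustrated with a small sketch. The dictionary layout (keys name, owner, public) and the selections mapping are hypothetical stand-ins for whatever tables hold this data.

```python
def set_public(rule_set, selections, is_public):
    """Update the public flag; on un-publishing, drop non-owner selections.

    rule_set:   dict with hypothetical keys "name", "owner" and "public".
    selections: dict mapping a user name to the set of rule set names
                that user has selected (also hypothetical).
    """
    rule_set["public"] = is_public
    if not is_public:
        for user, chosen in selections.items():
            if user != rule_set["owner"]:
                # discard() is a no-op if the user never selected the rule set
                chosen.discard(rule_set["name"])
```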

Rule Sets

This page may or may not allow editing the rule set, depending on whether the user is the owner of the rule set. If editing is not allowed, the name is static text and the add rule form is not shown. The import rules option will not be enabled but the export option will be offered.

Rule Set Export Format

The rule set XML format is:


<?xml version="1.0" encoding="ISO-8859-1" ?>
<tagging-rules xmlns="http://www.openlinksw.com/tags/rule-set">
<rule-set name="--name of the rule set" shared="--0 or 1 depending on whether Public is checked or not"
          xmlns="http://www.openlinksw.com/tags/rule-set">
  <rule>
    <pattern>text of search pattern.</pattern>
    <is-phrase>0 or 1</is-phrase>
    <tags>tags implied by pattern, separated with commas; the tags element may be absent
          if no tags are implied and the pattern is a phrase pattern.</tags>
  </rule>
  </rule>
</rule-set>
...  
</tagging-rules>
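As an illustration of how an importer might read this format, here is a minimal sketch using Python's standard xml.etree.ElementTree. The function name and the shape of the returned dictionaries are assumptions; only the element names, attribute names and namespace are taken from the format above.

```python
import xml.etree.ElementTree as ET

NS = {"t": "http://www.openlinksw.com/tags/rule-set"}


def parse_rule_sets(xml_data):
    """Return the rule sets of an export document as plain dictionaries."""
    root = ET.fromstring(xml_data)
    result = []
    for rs in root.findall("t:rule-set", NS):
        rules = []
        for rule in rs.findall("t:rule", NS):
            tags_text = rule.findtext("t:tags", default="", namespaces=NS)
            rules.append({
                "pattern": rule.findtext("t:pattern", default="", namespaces=NS).strip(),
                "is_phrase": rule.findtext("t:is-phrase", default="0", namespaces=NS).strip() == "1",
                # The tags element may be absent, which yields an empty list.
                "tags": [t.strip() for t in tags_text.split(",") if t.strip()],
            })
        result.append({
            "name": rs.get("name"),
            "shared": rs.get("shared") == "1",
            "rules": rules,
        })
    return result
```

Passing the document as bytes lets the parser honor the encoding named in the XML declaration.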

Application Tagging Widgets

Different applications will reference tagging capabilities in different contexts but will typically not have entire pages devoted to tag functions.

Blog

Submitting a Post

A blog post will have tags associated with it. These can come from:

  1. Tags in the text of the post.
  2. The user associating tags through the posting form.
  3. Automatic tagging by matching the content against tagging rules.

To retag an existing post, one simply opens it for editing and enters tags in the Tags text-area. The tags in the text itself will not be removed but other tags will be removed and reinserted by the same logic as at post submit time.
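The tagging logic at submit and retag time can be sketched as below. Several points are assumptions made for illustration: the exact URL layout of the Technorati anchors, the reading of is-phrase as "match the pattern as a contiguous phrase" versus "match all words", and the lower-casing of tag names. All function names are hypothetical.

```python
import re

# Assumed anchor shape: <a href="http://technorati.com/tag/NAME" ...>.
# The URL layout is an assumption, so both /tag/ and /tags/ are accepted.
ANCHOR_RE = re.compile(r'<a\s[^>]*href="[^"]*technorati\.com/tags?/([^"/?#]+)"', re.I)


def tags_in_text(body):
    """Tags embedded in the post body; these are never removed on retag."""
    return {name.replace("+", " ").lower() for name in ANCHOR_RE.findall(body)}


def tags_from_rules(body, rules):
    """rules: iterable of (pattern, is_phrase, tags) tuples."""
    text = body.lower()
    words = set(re.findall(r"\w+", text))
    implied = set()
    for pattern, is_phrase, tags in rules:
        p = pattern.lower()
        hit = p in text if is_phrase else all(w in words for w in p.split())
        if hit:
            implied.update(tags)
    return implied


def compute_post_tags(body, user_tags, rules):
    """Recompute all tags of a post; the same function serves submit and retag."""
    user = {t.strip().lower() for t in user_tags if t.strip()}
    return tags_in_text(body) | user | tags_from_rules(body, rules)
```

Because the full set is recomputed from the body, the form input and the rules each time, retagging needs no separate code path: tags in the text always reappear, and the others follow the current form contents and rules.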

The post tagging controls form a group with:

Additionally, as an editing accelerator, a list of tags used in posts by the author can be shown. Clicking on an item will add the item to the user tags list if it is not already there. The basic tagging form should allow typing tag names without passing through any "define new tag" operation. The tag vocabulary is, after all, dynamic and open-ended.

Display of Tags

- blog-tags widget


<vm:blog-tags n="top-n-shown" />

This widget is a possible part of a blog template. It renders as a list of the top n tags of the blog, with font attributes reflecting relative importance. See the Technorati list of popular tags for font effects.

All occurrences of tags in the blog are counted, as by:


select top n tag, count (*) 
from tags 
where blog = this_blog 
group by tag
order by count (*) desc;

Here, tags is assumed to be a table encoding the blog-post-tag relation.

If the list is truncated, "..." is shown as the last entry. Each tag will be a link to a page showing the posts having the tag in question on the regular blog view template page.
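How the widget might derive its entries can be sketched as follows. The linear scaling of font size between a minimum and maximum point size is an assumption; the text only asks that font attributes reflect relative importance. The function and parameter names are hypothetical.

```python
from collections import Counter


def top_tags(post_tags, n, min_pt=10, max_pt=24):
    """post_tags: iterable of (post_id, tag) pairs for one blog.

    Returns (entries, truncated). Each entry is (tag, count, font_pt);
    truncated tells the renderer whether to append "..." as the last entry.
    """
    counts = Counter(tag for _post, tag in post_tags)
    # Highest count first; ties are broken alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    shown = ranked[:n]
    if not shown:
        return [], False
    hi, lo = shown[0][1], shown[-1][1]
    span = max(hi - lo, 1)  # avoid division by zero when all counts are equal
    entries = [
        (tag, count, min_pt + round((count - lo) * (max_pt - min_pt) / span))
        for tag, count in shown
    ]
    return entries, len(ranked) > n
```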

FeedsManager

When reading a news article, the tags of the article can be shown as links underneath the text. The tags are again a relation of user account, news feed, article and tag, with the primary key consisting of the above four parts.

Some tags of the article are in the body of the article, are provided by the source and are immutable. These are in the <a href="...technorati.com/tags..."> format. Other tags are added by the user. All tags are in the above-mentioned table, but only tags set by the user can be removed. Every transaction that sets tags rereads the content to see what tags come from the content.
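The relation and the removal rule can be illustrated with an in-memory SQLite table. This is a sketch under stated assumptions: the table and column names are hypothetical, and the from_body flag is just one possible way to mark tags supplied by the source as immutable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table article_tags (
        account   text not null,
        feed      text not null,
        article   text not null,
        tag       text not null,
        from_body integer not null default 0,  -- 1: supplied by the source, immutable
        primary key (account, feed, article, tag)
    )
""")


def remove_tag(conn, account, feed, article, tag):
    """Delete a tag only if the user set it; return True if a row was removed."""
    cur = conn.execute(
        "delete from article_tags where account = ? and feed = ? "
        "and article = ? and tag = ? and from_body = 0",
        (account, feed, article, tag),
    )
    return cur.rowcount == 1
```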

The post tagging controls group can be identical with the blog one.

Addendum to FeedManager tagging spec

When a feed is received and new articles are detected in it, the articles are tagged according to the tagging rules chosen by the owner of each FeedManager instance that subscribes to the feed. The tagging coming from the feed itself is stored with the user 'nobody'; the tagging implied by the tagging rules of the news reader owners is stored with the user equal to the news reader owner.

When a non-owner user of a news reader searches for tags, this user will find only the tags that this user has defined, plus the tags that come from the feed itself.
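The visibility rule for a non-owner search can be sketched as a simple filter. The row layout (user, article, tag) is a hypothetical simplification of the tag relation; only the 'nobody' user comes from the text above.

```python
FEED_USER = "nobody"  # per the text, tags from the feed itself are stored under this user


def visible_tags(rows, searching_user):
    """rows: iterable of (user, article, tag) tuples.

    Keeps the searching user's own tags and the tags from the feed itself;
    tags stored under any other user, including the owner, are hidden.
    """
    return {
        (article, tag)
        for user, article, tag in rows
        if user in (searching_user, FEED_USER)
    }
```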

As an option, the owner may be allowed to make their own tags searchable by non-owner users. We may also consider making the owner's tags visible by default to the users with access to this reader instance.

For now, each subscribing instance owner's rules are applied to the incoming articles, and the results are recorded as that user's tags. The rest of automatic tagging will be specified later.

News Tag Reports

Tagging allows new types of hype metrics. These may be relevant at the scale of the news aggregation site as well as at the scale of a single set of subscriptions, i.e. a single FeedManager instance. The tags considered for such metrics should primarily be tags occurring in the feeds themselves. In the absence of these, the tags may be those inferred by the owner user's tagging rule sets. Showing personalized statistics according to multiple tagging rule sets is prohibitively expensive.

Generally, arbitrary data mining queries on news cannot be supported in a hosted context. This is because, first, their time and space requirements are unpredictable and, second, users often do not know how to write efficient queries. Export of data for off-line analysis by the user's own software may be supported.

The news hosting site may offer various canned reports. Certain assumptions have to be made on the user's information needs and preferences. These center on following trends and identifying opportunities for visibility.

Tags Ranking

The first report ranks the tags occurring in the feeds during the last day or the last week. The count of each tag is shown together with the count of posts with the tag. A third column shows the change in ranking compared with the situation a day or a week earlier. We may borrow graphics from top 40 hit charts or similar. New entries should be specially marked, as should items that have gained significantly.

The report may have a cutoff at 40 tags or so, with a next page button for the next 40 tags.

Tags may be shown as links, going to a view of posts with the tag in question, newest first.
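The ranking report can be sketched as below. This is a simplified illustration: it takes one tag per (post, tag) occurrence and shows a single count per tag, leaving out the separate post count, and the function name, the "new" marker and the alphabetical tie-breaking are assumptions.

```python
from collections import Counter


def ranking_report(current_tags, previous_tags, page=0, page_size=40):
    """Each argument lists one tag per (post, tag) occurrence in a period.

    Returns rows of (rank, tag, count, change), where change is the number
    of places gained (positive) or lost (negative) since the previous
    period, or "new" for tags absent from the previous period.
    """
    def ranks(tags):
        ordered = sorted(Counter(tags).items(), key=lambda kv: (-kv[1], kv[0]))
        return [(i + 1, tag, count) for i, (tag, count) in enumerate(ordered)]

    previous_rank = {tag: rank for rank, tag, _count in ranks(previous_tags)}
    rows = []
    for rank, tag, count in ranks(current_tags):
        change = previous_rank[tag] - rank if tag in previous_rank else "new"
        rows.append((rank, tag, count, change))
    start = page * page_size
    return rows[start:start + page_size]
```

Calling the function with page=1 returns the next 40 rows, which corresponds to the next page button.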

Reports across the whole hosting site will have to be prepared on a batch schedule. Reports on individual instances' content may be made on demand.

Wiki

The page tagging controls from blog are directly applicable to wiki. The tag relation there consists of wiki cluster, wiki word and tag. A direct Technorati tag in the body counts as an immutable tag.

The basic viewing page can just show the tags as text links with an edit tags link at the end. The edit tags control group can be identical with the blog one.

Tags and RSS

All applications produce RSS. The Dublin Core subject field contains the names of the tags associated with the item. If explicit Technorati tag anchors are in the body, these are also in the RSS feed. Other tags are not in the body but only in the DC subject element.
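As an illustration, an RSS item carrying its tags in Dublin Core subject elements might be built as follows. Emitting one dc:subject element per tag is an assumption, since the text does not say whether the tags share one element; the function name is hypothetical. The namespace URI is the standard Dublin Core elements namespace.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)


def rss_item(title, link, body_html, tags):
    """Build an RSS <item>; anchors in the body travel inside <description>."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    ET.SubElement(item, "description").text = body_html
    for tag in sorted(tags):
        # One dc:subject element per tag (an assumption, see above).
        ET.SubElement(item, f"{{{DC}}}subject").text = tag
    return item
```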
