As a follow-up to the previous article on tag management, and to the great feedback we received on it, we would like your feedback on whitespace in tags.
Here is the question: is a space a legitimate character within a tag, or is it only a valid separator?
Why does it matter? Here are the arguments for the jury, for and against whitespace in tags.
- Pro: Tags represent concepts. A concept is sometimes a single word, but sometimes several words that cannot be separated. If the tag is “dog animal”, you are really saying it is both a dog and an animal, so the two words are two concepts. If, on the other hand, the tag is “Sierra Leone” (a country in West Africa), this is one concept: it is not both in Sierra and in Leone. We can think of many other examples where two words together make one concept.
- Con: Most tagging systems do not allow whitespace; indeed, many treat it as a separator. In many tagging systems, if I type “cat dog” I automatically get two tags, one cat and one dog, with the whitespace treated as a separator. Wiki systems work similarly. For obvious reasons, maintaining maximum compatibility with other tagging systems, or at least not locking ourselves out of it, makes a lot of sense.
Here are our possibilities:
- Separator: Whitespace is a separator. If you type in “dog cat” or “Sierra Leone”, that is two distinct tags in each case. If you want to keep “Sierra Leone”, you need to manually and explicitly type Sierra_Leone, SierraLeone, Sierra-Leone or some variant. This is the del.icio.us method.
- Allow: Whitespace is allowed. If you type in “dog cat” or “Sierra Leone”, that is one tag in each case. For the “dog cat” case, if you want two tags, you will either need to type “dog” then hit an add button (or similar) then “cat” and add, or use a recognized separator, e.g. “dog, cat”. Obviously whitespace at the beginning or end of a tag name will always be ignored. This is the Gmail method.
- Auto-Convert: Whitespace is not allowed, but isn’t a separator either. Rather, every time you type in whitespace, it is automatically converted to an underbar ‘_’. So, if you type “dog, cat” you will get two tags, one of “dog” and one of “cat”. On the other hand, if you type “Sierra Leone”, you will automatically get “Sierra_Leone”. This is the Wikipedia/MediaWiki method.
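To make the three options concrete, here is a minimal Python sketch of how each might split a typed-in string into tags. The function names are hypothetical and this is not Surfulater's code. It assumes the comma is the "recognized separator" for the Allow and Auto-Convert methods, and it also treats commas as separators in the Separator method so that “dog, cat” does not produce a tag “dog,”.

```python
import re


def parse_separator(text: str) -> list[str]:
    """Separator method: whitespace splits tags (commas are assumed to split too)."""
    # Split on any run of whitespace and/or commas, then drop empty strings.
    return [tag for tag in re.split(r"[\s,]+", text) if tag]


def parse_allow(text: str) -> list[str]:
    """Allow method: only the comma splits tags; inner whitespace is kept."""
    # Leading/trailing whitespace is always ignored, and empty tags are dropped.
    return [tag.strip() for tag in text.split(",") if tag.strip()]


def parse_auto_convert(text: str) -> list[str]:
    """Auto-Convert method: the comma splits tags; inner whitespace becomes '_'."""
    # Each run of inner whitespace collapses to a single underscore.
    return [re.sub(r"\s+", "_", tag) for tag in parse_allow(text)]


if __name__ == "__main__":
    sample = "Sierra Leone, dog cat"
    for parse in (parse_separator, parse_allow, parse_auto_convert):
        print(parse.__name__, parse(sample))
```

For the input “Sierra Leone, dog cat”, the three functions return ['Sierra', 'Leone', 'dog', 'cat'], ['Sierra Leone', 'dog cat'] and ['Sierra_Leone', 'dog_cat'] respectively.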
One thing we definitely do not want is multiple options: they are too confusing, and support becomes a nightmare, creating the worst of all worlds. Thoughts and feedback are always appreciated. If a working example would help, let us know in the comments.
Avi.

I’m quite comfortable using _ as a separator, and I wouldn’t advocate using whitespace as a separator just to maintain compatibility, as you said.
Auto-convert could be OK. Maybe a tooltip could pop up whenever something has been auto-converted (which you could then turn off)?
Personally, I like the idea of allowing white space, because it is more natural and, as you say, there are too many times when you want to use two or more words as a concept. Most of the Mac OS X applications I’ve tried allow white space: anything you type until you press ENTER, which ends the concept and puts you in a mode to write an additional concept.
That being said, I also favor doing that which is most compatible… it wouldn’t be difficult to adjust.
I would prefer to allow the use of white space within tags (as in “Sierra Leone”). Although I have been using del.icio.us almost since its inception, I find the white space “restriction” artificial and awkward.
The inclusion of white space in tags makes them feel more akin to natural language, and hence easier for anyone not welded to a computer from birth (if you see what I mean 😉).
I can see the attraction of only using white space as a separator from an ease-of-implementation perspective, but my preference as an end user is to allow its inclusion.
I also prefer allowing whitespace, but it does raise a host of problems. The comments so far seem to lean towards the third solution, which is not to use whitespace as a separator, but to auto-convert it. Is that correct?
Please allow white space in tags. Simply include designation of a tag delimiter in preferences (e.g., “,”) and take the tag to be anything between delimiters.
Also, be sure to allow most punctuation and special characters in tags, and keep the usual practice of allowing the delimiter too if properly escaped (e.g., with double delimiter).
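The delimiter-plus-escape suggestion above can be sketched in a few lines of Python. This is a hypothetical illustration, not a specified design: it assumes a single-character delimiter (a comma by default), that a doubled delimiter stands for one literal delimiter inside a tag, and that surrounding whitespace is stripped. The name parse_tags is made up for this example.

```python
def parse_tags(text: str, delim: str = ",") -> list[str]:
    """Split text on a single-character delimiter; a doubled delimiter is a literal one."""
    tags: list[str] = []
    current: list[str] = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == delim:
            if i + 1 < len(text) and text[i + 1] == delim:
                # Escaped delimiter: keep one copy inside the current tag.
                current.append(delim)
                i += 2
                continue
            # Unescaped delimiter: close the current tag.
            tags.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    tags.append("".join(current))
    # Strip surrounding whitespace and drop empty tags; inner spaces are kept.
    return [tag.strip() for tag in tags if tag.strip()]


if __name__ == "__main__":
    print(parse_tags("Sierra Leone, Smith,, John, dog"))
    print(parse_tags("a; b c", delim=";"))
```

One consequence of using doubling as the escape is that an empty tag between two delimiters can no longer be written, which is usually harmless because empty tags are discarded anyway.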
The underscore option is so ugly (and conflicts with text underlines), and hyphens are already legal punctuation in text. So don’t use them between tag words.
It used to be that file names were limited to 8 characters. Now multiple words and many special characters are allowed, because computers can work with the patterns, letting people see the names and phrases they are familiar with. Adopting the convention of something named “del.icio.us” seems almost absurd.
I’m for allowing white space, a la “Sierra Leone” on the basis that it’s easier, more natural and more aesthetically pleasing.
I’m not sure where the compatibility issue comes into play, but there are already numerous other standards out there and Surfulater can’t be compliant with all of them. So maybe the best approach is to have a utility to convert tags to other formats as required, e.g. convert all white space between delimiters to an underscore or convert the first character of any term following a white space to upper case and delete the white space (wiki-ize).
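The export utility suggested above could be as small as a pair of converters. This is a hypothetical Python sketch, assuming tags are stored with their internal whitespace and converted only on export; the names to_underscore, to_wiki and export_tags are illustrative, not part of Surfulater.

```python
def to_underscore(tag: str) -> str:
    """Join the words of a multi-word tag with underscores."""
    # split() with no argument also collapses repeated and outer whitespace.
    return "_".join(tag.split())


def to_wiki(tag: str) -> str:
    """Upper-case the first letter of each word after a space, then drop the spaces."""
    words = tag.split()
    if not words:
        return ""
    # w[:1].upper() + w[1:] keeps the rest of each word as typed;
    # str.capitalize() would lower-case it ("iPhone" -> "Iphone").
    return words[0] + "".join(w[:1].upper() + w[1:] for w in words[1:])


def export_tags(tags: list[str], style: str = "underscore") -> list[str]:
    """Convert every stored tag to the requested export style."""
    converters = {"underscore": to_underscore, "wiki": to_wiki}
    convert = converters[style]  # raises KeyError for an unknown style
    return [convert(tag) for tag in tags]


if __name__ == "__main__":
    print(export_tags(["Sierra Leone", "dog"], "underscore"))
    print(export_tags(["Sierra Leone", "dog"], "wiki"))
```

Following the wording of the suggestion, only words that follow a space are capitalized, so a tag typed as “sierra leone” becomes “sierraLeone” rather than “SierraLeone”.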
Yes, it is desirable to allow whitespace. The choice of an encoding is less clear. From previous comments I take it that you intend the database to be independently accessible. If that is the case, you want a simple encoding scheme like the underscore replacement. But that conflicts somewhat with other suggestions to allow the underscore character itself, and even the punctuation characters used to separate tags.
My take would be to allow any character in a tag, including whitespace embedded in a tag, except for a nominated separation character, such as a comma. You should also strip whitespace from the start/end of tags to avoid confusion.
New questions: should tags be case-sensitive? What about other character sets?
I just reread your intro and realise that underscores are NOT necessary at all, so I’m repeating your suggested ‘allow’ option (i.e. Gmail). John’s idea of using a configurable separator makes it a no-brainer, IMHO.
White space is white space and should be ignored. Has anyone ever heard of a “white space delimited file”? No, comma delimited (CSV) is standard. “Sierra Leone” is a single tag. If you want to separate the words into individual tags, delimit them properly, e.g. with a comma. All leading and trailing spaces are always thrown out.
Yes, you could offer a configurable tag separator but why? All touch typists create lists separated by commas, tabs, or carriage returns. Surfulater doesn’t need to be any fancier than that, in my humble opinion (BTW, I don’t like acronyms either, ergo, no IMHO).
And auto-convert can be done, but “auto” anything is always fraught with peril. Most of the time “auto” is acceptable. But when I don’t want something to “auto”, I must have a way to either prevent it or revert it. In the absence of a prevent or revert option, I suggest avoiding “auto”. Besides, Sierra Leone is the country, not Sierra_Leone or SierraLeone. So my tags will be words/phrases I don’t use in everyday life (“natural language”)? I don’t think so. I agree with David Laing’s comment re: del.icio.us and with everything John Hanna wrote.
Start with the KISS principle before adding complexity. And multiple options is an absolute no-no.
Craig
I am glad that I put in the response comment above. People seem to be very strongly in favour of whitespace, and every other character, as legitimate in a tag, except for a well-defined separator, which would be a comma.
We agree wholeheartedly with the comment on options. We do not want additional options, as they would both violate KISS from a user perspective and greatly increase our support burden.
Avi
Hi Nev, maybe I’m missing something, but here’s a question I thought of:
I would ask: why does the conceptual difference (between white space as a separator vs. a non-separator) really matter? Aren’t separators conceptually kind of irrelevant when it comes to tags?
Let’s say I typed in “sierra leone”.
Scenario 1: White space is a separator, and it goes in as two tags. Later on, I search for ‘sierra leone’, and it shows all results that have both ‘sierra’ and ‘leone’, so the article I want is in the results list. (Articles that only have ‘sierra’ or only ‘leone’ as tags could perhaps show up in a section called ‘close matches’, if they show up at all.)
Scenario 2: White space is not a separator, and it goes in as one tag, ‘sierra leone’. Later I search for ‘sierra leone’, and the article I want shows up in the results page.
In either scenario, the article I’m looking for shows up in the results page. What am I missing? Is it a question of being able to prioritize the search results?
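The two scenarios can be compared with a small Python sketch. It assumes tags are stored in lower case, treats a multi-word query in Scenario 1 as an AND over its words, and puts partial matches in a separate “close matches” list as the comment suggests. The articles and tags are made-up sample data and the function names are hypothetical.

```python
def search_separator(articles: dict[str, set[str]], query: str) -> tuple[list[str], list[str]]:
    """Scenario 1: each query word is a separate tag; all of them must match."""
    terms = set(query.lower().split())
    exact: list[str] = []
    close: list[str] = []
    for title, tags in articles.items():
        hits = terms & tags
        if terms and hits == terms:
            exact.append(title)  # every query word is one of the article's tags
        elif hits:
            close.append(title)  # only some words matched: a "close match"
    return exact, close


def search_phrase(articles: dict[str, set[str]], query: str) -> list[str]:
    """Scenario 2: the whole query is one tag and must match it exactly."""
    wanted = " ".join(query.lower().split())  # normalise inner whitespace
    return [title for title, tags in articles.items() if wanted in tags]


if __name__ == "__main__":
    # Scenario 1 data: "sierra leone" was entered as two tags.
    split_tags = {
        "Freetown travel notes": {"sierra", "leone", "travel"},
        "Sierra Nevada hike": {"sierra", "hiking"},
    }
    # Scenario 2 data: "sierra leone" was entered as one tag.
    phrase_tags = {
        "Freetown travel notes": {"sierra leone", "travel"},
        "Sierra Nevada hike": {"sierra nevada", "hiking"},
    }
    print(search_separator(split_tags, "Sierra Leone"))
    print(search_phrase(phrase_tags, "Sierra Leone"))
```

Both searches find the Freetown article, as the comment predicts. The difference shows up at the edges: in Scenario 1, any article that happens to carry both “sierra” and “leone” as unrelated tags also lands in the exact results, because the stored tags no longer record that the two words were meant as one concept.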
Well, if the white space question really does matter, then I for one would vote against either CamelCase or underscores (or anything that moves TOO far away from natural language). Simpy.com, for instance, separates tags with commas, which feels natural and is easy to type rapidly. I think MetaFilter does the same.
But again, it seems to me that white space could just be a separator, and linking ‘sierra’ with ‘leone’ may be less of an issue than it seems. Again, maybe I’m missing something…
Morning all, gray skies and probably rain ahead today, at least here at The Cape.
For folks addressing me (Neville) here, note that Avi wrote this blog post, hopefully the first of many.
Before I started work on Tags I did a lot of research on tagging systems and how people use tags. You will find that I’ve written more posts on the blog on tagging than on any other subject.
The bad news is there are no standard use cases for tags. Some allow single words, others multiple words. Some get you to enclose multiple words in quotes, others use a comma separator. Some only let you pick pre-defined tags from a list, others let you type them in and create new tags on the fly.
Rightly or wrongly most tagging systems only allow single words. Delicious is a stand-out example of this. My decision was therefore to start off with single words, and treat space as a separator, as this fitted in with popular use.
Further when you are *typing* in tags, space sort of makes sense as a separator. The other comment I’d make is when you *look* at a text field full of tags (as in Surfulater) to my eyes each word looks like a separate tag, versus words grouped by a comma separator. Finally as Avi wrote there are issues of compatibility with other tagging systems as we move forward.
Note that I said “start” with single words. If I’d started with allowing multiple words and folks didn’t want that, then going backwards to single word tags would have been troublesome.
My personal preference aligns with most of you here: allow multiple words and don’t treat space as anything special. Further, I would use the comma as a separator, as we do now, and not make this configurable. Blog followers will know I hate options.
Having said that, there remains a strong case for single word, space separated tags. There are no outright winners here.
PS. I loathe the notion of auto-converting whitespace. Folks will think their keyboard is acting up and want to throw it against the wall.
One more try…
The conclusion last expressed in @12 doesn’t seem to make sense.
It suggests that people don’t effectively use punctuation [“when you *look* at a text field full of tags (as in Surfulater) to my eyes each word looks like a separate tag”].
It would seem to keep us from using natural multi-word terms for tagging.
It would force us to use awkward and unnatural work-arounds, like making single-word tags from multiple words connected with some “special” characters (like “_”, “-”, “.”, or capitalization shifts) that almost everyone indicated they didn’t like.
Why not just take the tags to be whatever people enter between commas? Then no changes are needed to shift to multi-word tags later!
John,
The sentiments expressed by you here and most of the others, seem pretty strongly in favour of natural language, i.e. space is a valid character within a tag, and commas are natural separators.
The *only* reasons to keep whitespace as a separator are:
– cross-compatibility with other systems that might not allow whitespace
– user experience with other systems that use whitespace as a delimiter
Neither of these seems sufficient to override the natural language and ability to use compound words that users want.
In other words: we agree with you, and intend to keep whitespace as a valid character within the tag, not as a separator.
Avi
@John,
“Why not just take the tags to be whatever people enter between commas? Then no changes are needed to shift to multi-word tags later!”
That is the plan. Avi and I discussed all of your feedback at length yesterday and decided to change to multi-word comma separated tags.
The only question was when to do it, and in the end we decided it needs to be in the V3.0 release, else some folks get too used to space as a separator.
Neville