Client-side full text search in CSS (redotheweb.com)
213 points by fzaninotto on Sept 10, 2013 | 56 comments



First off, this is fascinating, and I love it. I never thought of doing anything like this, and I have no idea how they came up with it. Awesome idea and great execution.

But this:

>The advantage of using CSS selectors rather than JavaScript indexOf() for search is speed: you only change one element at each keystroke (the <style> tag) instead of changing all the elements matching the query.

makes it sound like they just don't know how to write DOM-efficient JS, and probably never profiled it or their implementation. I would be shocked if you can't relatively-trivially make a faster JS implementation, and even more if you can't make a significantly faster 'smarter' one with e.g. a more optimized search index, since you can make tradeoffs the CSS processor very likely cannot.


I used a similar technique once to dynamically align some elements into columns (no, I couldn't use an array).

> I would be shocked if you can't relatively-trivially make a faster JS implementation

I'm not brave enough to do a jsperf test case, but from my frontend profiling experience, I would guess that his solution is significantly faster than the naive one you see in most cases using jQuery, i.e.

    $('.elements').each(function() {
      $(this).toggle(matchSomeCriterias(this))
    })
This naive solution is slow because:

    - jQuery's .toggle() assigns an inline style to each element.
    - It fetches values directly from the DOM to match nodes.
    - It triggers a lot of reflows.
But yes, like you, I'm pretty sure that a decent implementation that works on an in-memory mapping of <value to match> => <DOM node>, uses class toggling to show/hide elements, and makes use of document.createDocumentFragment() would beat this CSS method.
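A rough sketch of the kind of implementation described above, not a benchmarked one. The `hidden` class name, the `buildIndex` and `applyFilter` helpers, and the `getText` callback are all made up for illustration, and a rule like `.hidden { display: none; }` is assumed to exist in the page's stylesheet. It reads the text once into memory, then only toggles a class on nodes whose visibility actually changes (the document fragment part is left out).

```javascript
// Build the index once: read every node's text up front so that later
// keystrokes never have to read from the DOM again.
function buildIndex(nodes, getText) {
  return Array.from(nodes, (node) => ({
    node,
    text: getText(node).toLowerCase(),
    visible: true, // assumption: every node starts out shown
  }));
}

// Apply a query: plain string matching against the in-memory index, then
// one class write per node whose visibility changed. No layout reads are
// interleaved with the writes.
function applyFilter(index, query) {
  const needle = query.trim().toLowerCase();
  let changed = 0;
  for (const entry of index) {
    const shouldShow = needle === '' || entry.text.includes(needle);
    if (shouldShow !== entry.visible) {
      entry.node.classList.toggle('hidden', !shouldShow);
      entry.visible = shouldShow;
      changed += 1;
    }
  }
  return changed; // number of nodes that were actually touched
}
```

In a browser this would be wired up with something like `buildIndex(document.querySelectorAll('.elements'), (el) => el.textContent)` and a call to `applyFilter` from the search field's `input` event.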

Poor JavaScript performance is most of the time due to excessive reflow triggering; this CSS approach beats naive JS solutions because it only triggers one.

Edit: oh and an issue I had at the time was that for IE8 and older you can't use element.innerHTML on a style element. You have to use element.styleSheet.cssText
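For what it's worth, the IE8 workaround can be wrapped in a tiny helper. This is only a sketch: `setStyleText` is a made-up name, and it assumes the element passed in is a `<style>` element that is already in the document.

```javascript
// Write CSS text into a <style> element. Old IE (8 and older) exposes the
// sheet through `styleSheet.cssText`; other browsers accept text content.
function setStyleText(styleEl, css) {
  if (styleEl.styleSheet) {
    styleEl.styleSheet.cssText = css; // legacy IE path
  } else {
    styleEl.textContent = css; // standards path
  }
}
```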


Yeah, the jQuery case is kind of what I was picturing they were comparing it against, but I didn't want to say it. But reasonably-performant plain JS is pretty simple, it's just a bit different than you tend to see in jQuery-infested realms[1]. And reflows aren't that hard to understand or control, and it can also easily only do one - just do it all in a loop and don't try to read values after setting related ones.

[1]: http://cl.ly/image/2I1v3H053L0Y jQuery really is great, but getting away from it is great too.


> And reflows aren't that hard to understand or control, and it can also easily only do one

Yeah, you easily get bitten by jQuery because it mixes reads with writes under the hood.

I prefer to use it most of the time for browser compat / easier maintainability, but it's crazy the performance you can save by bypassing it in hotspots.


Pretty cool, but yeah, you have to be careful of CSS injection (as mentioned by the author). There isn't too much harm that can be done if the user is typing this in himself or herself, but if the search query is pulled from the URL there might be some security implications.

For example, enter this into the search field:

    "]), body, a:not([data-index="
This will hide the entire page. The last "a:not" selector is really inconsequential-- I just had to close the opening parenthesis and this just happens to work.
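One way to blunt this is to escape the query before it goes inside the quoted attribute value. The sketch below is an assumption-laden illustration, not the author's code: `escapeCssString` and `buildRule` are hypothetical helpers, and the rule shape (`.searchable:not([data-index*="..."])`) is inferred from the injection string above and the selectors quoted elsewhere in this thread.

```javascript
// Escape a value so it can sit inside a double-quoted CSS string.
// Backslashes must be handled first, otherwise the escapes added for
// quotes would themselves get escaped.
function escapeCssString(value) {
  return value
    .replace(/\\/g, '\\\\') // \  ->  \\
    .replace(/"/g, '\\"')   // "  ->  \"
    .replace(/\n/g, '\\a ') // newline -> CSS hex escape
    .replace(/\r/g, '\\d ') // carriage return
    .replace(/\f/g, '\\c '); // form feed
}

// Assumed rule shape: hide every .searchable whose index lacks the query.
function buildRule(query) {
  const safe = escapeCssString(query.toLowerCase());
  return '.searchable:not([data-index*="' + safe + '"]) { display: none; }';
}
```

With this in place the string above is treated as a literal (and simply matches nothing) instead of closing the selector early. It only covers breaking out of the CSS string; writing the rule with `textContent` rather than `innerHTML` avoids the separate problem of the text being parsed as HTML.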


I'm curious if this is truly a 'security' concern, or more like a hackability issue. Though I could enter that string into the page myself, I can't actually start running scripts on the page or anything, can I? Really, this doesn't seem any more 'dangerous' than opening the web-inspector and changing the CSS to hide. Do you agree?

(It may sound like I'm trying to be an ass, but I am actually curious).


It's possible to execute remote Javascript through CSS in Internet Explorer and possibly Firefox according to this: http://stackoverflow.com/questions/476276/using-javascript-i...

You can also clickjack, i.e. make a button that does something important invisible and stretch it across the entire page. Next time the user clicks, they'll inadvertently be clicking the button.

Edit: I did some research and testing and it looks like XBL and element behaviors are no longer possible in Firefox and IE 10, thankfully:

http://stackoverflow.com/questions/9679527/do-moz-behaviors-... http://msdn.microsoft.com/en-us/library/ie/hh801219%28v=vs.8...


I can't get that to work in Firefox 25.


If you can get a user to click a link which exposes CSS injection, it can be a security issue. For instance, you could change the text of links and buttons, and otherwise trick the user into doing something that s/he wouldn't otherwise.


My initial reaction is that you would need an additional moving part to turn this into a vulnerability. Say you had the ability to permalink to a filtered view -- the querystring param could carry an XSS payload... breaking out of the CSS context early, or maybe (and this is off-the-cuff speculation) staying in the CSS context but adding a rule which refers to an attacker-controlled file.


This is a very clever idea. There are a few limitations that I think would prevent this from being very usable in practice. One is that it only supports a single word. If you type "Ona Bednar" into the field, you get nothing.

Another problem that would only start to show up on a larger dataset is that because the index is all concatenated directly together, it matches strings that span several words. A user searching for their pal Harry Mesbro in the list might be confused to find that typing in his last name also brings up Yvette Hammes.


"Ona" and "Bednar" are in separate data fields. Do standard JS implementations handle this case? Is that even desired - treating each word as an isolated search? Interestingly, because of the concatenated data issue, "onabednar" does return the record ("it's a feature!").

The example only supports single words because it contains only single-word data. If you edit one of the indexes to include a space it works.


I have never seen another search implementation that only allowed you to search one word. You don't treat each word as an isolated search, you simply check that each word is contained in a field. I guarantee that almost any user will be confused by a search that works like this demo the moment they try to search for someone's full name.

But no, it won't work correctly just by adding spaces to the index. That will only work if the fields you enter are adjacent. So if you typed "Ona Bednar", it would work, but "Bednar Ona" won't. That's not how users expect search to work.

For a more ordinary example of what I mean, suppose the dataset included the middle name. Users will expect to find Ona Justine Bednar if they type "Ona Bednar". If it doesn't work that way, it's broken.
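To make the expected behaviour concrete, here is a minimal sketch of "every word must appear somewhere" matching. `matchesAllTerms` is a made-up name and the record is just an array of field strings; it is one possible reading of the rule described above, not taken from any particular library.

```javascript
// A record matches when every whitespace-separated word of the query is
// contained in the record's text, in any order and with gaps allowed.
function matchesAllTerms(fields, query) {
  const haystack = fields.join(' ').toLowerCase();
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  // Note: an empty query has no terms, so it matches every record.
  return terms.every((term) => haystack.includes(term));
}
```

With the fields `['Ona', 'Justine', 'Bednar']`, both "Ona Bednar" and "Bednar Ona" match, which is the behaviour argued for here.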


Well, I was talking about live search filtering of a dataset on the page, like this demo. Most are simple Ctrl+F text searches (when I said adding spaces would work, that was intended to only cover the case of an unbroken substring within one field). After a quick look I didn't find any demos that support multi-field searches, though it would certainly be possible.


I didn't find any demos that involved multiple fields at all. But "multiple fields" is not a concept that exists in most users' worlds. Names exist, and email addresses exist. They wouldn't try to search for "ona schamberger.frank@wuckert.com", but they will try to search for "ona bednar".


That's because the data-index field has been created with all whitespace removed. If the field was created with whitespace preserved, you could search with the original whitespace.


With the original whitespace, yes, but not with fields reordered or omitted between other fields. Users expect search to work basically the same way that a simple Google search works, and that's not hard to implement.


Indeed it isn't. My other comment on this post actually outlined a simple method to do as much.


Interesting, but I can't help feeling that a better implementation would be to split the input on white space, and build a slightly more complex selector such that a search for "term1 term2" would set the style to:

  .searchable { display: none; }
  .searchable[data-index*="term1"][data-index*="term2"] { display; block; }
and an empty input would have no selectors (or .searchable {display:block;} ).

It's slightly more code, but much more usable.
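A minimal sketch of generating that stylesheet text from the input, written with `data-index` in both attribute selectors and `display: block`. `buildSearchCss` is a hypothetical helper name, and it assumes the index attribute is lower-cased so the query is lower-cased to match.

```javascript
// Turn "term1 term2" into a hide-all rule plus one show rule that chains
// an attribute selector per term, so every term must be present.
function buildSearchCss(query) {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  if (terms.length === 0) return ''; // empty input: no rules, show everything
  const attrs = terms
    // Escape quotes and backslashes so a term cannot end the string early.
    .map((term) => '[data-index*="' + term.replace(/(["\\])/g, '\\$1') + '"]')
    .join('');
  return '.searchable { display: none; }\n' +
    '.searchable' + attrs + ' { display: block; }';
}
```

The returned string is what would be written into the `<style>` element on each keystroke.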


I ended up implementing it this way. Nice idea. Just FYI there's a typo on line two (display; block should be display: block).

Got stuck on that for about 5 minutes. That's what I get for copy pasting :)


Fun hack but since it relies on JS it's difficult to see why cutting IE8 out makes sense for such a negligible speed up (assuming said speed up actually exists).


It makes sense on paper, but I'd be interested to see actual benchmarks, just to see how much of a speed boost you get with this method. I certainly didn't notice it on my machine (tested against the jQuery UI Autocomplete)

I chuckled after reading the article at the somewhat misleading title. I was assuming it was pure CSS, no JS...and was wondering how the CSS was firing events! :)

This is still pretty neat, IMO, and I like how it emphasizes the "markup as your data model" concept.


This can be made to work in IE8 using two selectors instead of one. See Kbenson's solution.


This is quite simply a very clever hack. Obviously it isn't up to production standard, but from a hack point-of-view it's thinking outside-of-the-box and I love it. Good to see people thinking of nifty ideas like this. CSS and HTML are getting to the point where they can do what was once only possible in Flash, then Javascript and now in CSS.


That is great, I could immediately think of a few possible uses for this. Client side searching can be quite heavy on the machine if there's a lot of data.


If you have lots of data, you can offload your searching to something like swiftype.


Thanks!


The point about it being efficient because you're only changing one element doesn't sound correct. If you change the styles on the page, the browser is at least going to have to iterate over all of the items with the searchable class (assuming that it doesn't build some sort of index). If you did it in JS, you could try to make it more efficient by indexing the data first.
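As an illustration of what "indexing the data first" could look like, here is a small trigram index. This is one possible technique chosen for the sketch, not something the article or the browser does; `buildTrigramIndex` and `search` are made-up names, and whether it beats a plain scan on a list this size is untested.

```javascript
// Map every 3-character substring to the set of record ids containing it.
function buildTrigramIndex(records) {
  const index = new Map();
  records.forEach((record, id) => {
    const text = record.toLowerCase();
    for (let i = 0; i + 3 <= text.length; i++) {
      const gram = text.slice(i, i + 3);
      if (!index.has(gram)) index.set(gram, new Set());
      index.get(gram).add(id);
    }
  });
  return index;
}

// Return the ids of records containing the query as a substring.
function search(records, index, query) {
  const needle = query.toLowerCase();
  let candidates = records.map((_, id) => id); // short queries: scan all
  if (needle.length >= 3) {
    // Intersect the posting sets of every trigram in the query.
    let ids = null;
    for (let i = 0; i + 3 <= needle.length; i++) {
      const posting = index.get(needle.slice(i, i + 3));
      if (!posting) return []; // some trigram never occurs: no match
      ids = ids === null ? [...posting] : ids.filter((id) => posting.has(id));
    }
    candidates = ids;
  }
  // Trigram hits are only candidates; confirm with a real substring test.
  return candidates.filter((id) => records[id].toLowerCase().includes(needle));
}
```

The index is built once up front, and each keystroke then only inspects the records that share all of the query's trigrams.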


This is interesting, but requires you to transfer essentially your entire data-set to the client, increasing your overall transfer.


You wouldn't use this in that case.

This will be used when the data is already sent to the client side.


In what case would the data be "already sent to the client side" that doesn't involve essentially transferring the entire data-set?

The search is certainly useful with 100 records, but if I can return only the 10 matching records from an AJAX request, that's 90 records I didn't have to return in the first place.


Great point. You could progressively backfill the search to protect the initial load and user experience. You could cut down on overall bandwidth by only backfilling search data if the user has shown a preference for using search. The first search experience is a little slower, but overall system work and client side implementation using the posted method could provide a good balance.


One example I can think of is searching for matching documents with some other method (on the server side), and then searching inside each document with this method.


But what if you had an infinite scroll à la Twitter feed? Then you can do client-side search on that list without going back to the server.


At which point, you've already transferred the entire data-set. I'm not saying it's not useful. I'm saying that it requires a lot of data to be "better" than server-side search.


First off, very clever. Second, I have a case for which I might actually use this.

Using multiple textboxes, apply each search term to a nested mongo resultset. Visually narrow down the data structure you want to get out of mongo, and generate the mongo query you'd use to get that data.


It needs separators between the fields in the data attribute, otherwise it could have false positives. e.g. "abe" will find "ona bednar" because "abe" is in the data attribute "on_abe_dnar..."
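A quick sketch of the difference a separator makes. The `|` character and the `buildDataIndex` name are arbitrary choices for the example; any character users are unlikely to type would do.

```javascript
// Hypothetical separator placed between fields in the data attribute.
const SEPARATOR = '|';

// Join the lower-cased fields with the separator so that a match cannot
// start in one field and end in the next.
function buildDataIndex(fields) {
  return fields.map((field) => field.trim().toLowerCase()).join(SEPARATOR);
}
```

`buildDataIndex(['Ona', 'Bednar'])` gives `ona|bednar`, which no longer contains "abe", while "ona" and "bednar" still match.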


Pretty cool. The only drawback that I can see right now is the need to send all searchable data twice, increasing overall amount of data that needs to be sent to client.


You could have Javascript populate it on page load, since it's still required to update the CSS.


If you are generating the HTML on the client side then your overall data transfer would be the same.


If one is inserting the raw text into all those data attributes then hopefully there isn't too much text involved. Without too much extra markup involved as well.

Plus, one would have to write that client side code to parse all that text to insert into said data attributes.

Although, I suppose if you're just doing this on a list of data, as in the example, then it would likely be reasonable.


> The advantage of using CSS selectors rather than JavaScript indexOf() for search is speed

Where's the performance comparison?


The biggest drawback, compared with what I could easily do in JavaScript, is that it can't highlight the matching terms.


Presumably you can query for the matched elements pretty easily and do your highlighting/whatever then.
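A sketch of the highlighting half, assuming the matched elements have already been found (for example with `document.querySelectorAll` and the same selector). `splitOnMatches` is a made-up helper that only does the string work; wrapping the matching segments in something like a `<mark>` element is left to the caller.

```javascript
// Split text into segments, flagging the case-insensitive occurrences of
// term. Assumes lower-casing does not change the string's length, which
// holds for ASCII but not for every Unicode character.
function splitOnMatches(text, term) {
  if (term === '') return [{ text, match: false }];
  const lowerText = text.toLowerCase();
  const lowerTerm = term.toLowerCase();
  const segments = [];
  let pos = 0;
  while (pos < text.length) {
    const hit = lowerText.indexOf(lowerTerm, pos);
    if (hit === -1) break;
    if (hit > pos) segments.push({ text: text.slice(pos, hit), match: false });
    segments.push({ text: text.slice(hit, hit + term.length), match: true });
    pos = hit + term.length;
  }
  if (pos < text.length) segments.push({ text: text.slice(pos), match: false });
  return segments;
}
```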


I would probably use Angular's filter for this


off - the site is not mobile friendly and I hate that.


[deleted]


This is cool. But, am I missing something? I can already do this with CTRL+F.


You, me and the rest of us can. But our users may not appreciate the Ctrl+F.

Besides, there is a big difference between highlight matching (Ctrl+F) and show only matching, especially in apps where the dataset at client side is large.


You don't need to use javascript to hide/show matching elements. He also mentions something about it being faster than just an indexOf(), but gives no real benchmarks, so take that with a grain of salt.


(Rereading again..) The JS claim is silly. He is saying that because we are dynamically creating a single style, that show/hide logic will be faster. I disagree with this, since adding a CSS selector to the page means the page needs to rematch that new selector against every element of the page and re-render each matching element, same as if we had just changed the matching elements with javascript.

EDIT: thinking on this some more, it actually should be a bit faster for huge collections, since javascript has to update items individually, whereas the browser has probably optimized this code path.


Well, it's fewer DOM manipulations, right? Touch a single style element vs updating a bunch of them.


It's fewer DOM manipulations from Javascript yes. The browser will still internally change the styles associated with the DOM elements, just without the JS overhead.


Yeah, I think you're missing the point that it's not using Javascript at all but only CSS.


Actually, it's rather heavily dependent on Javascript as that's how most of the functionality is handled. Plus some server side is likely required to insert the raw text to be searched in the data attributes. I guess you could hand-code all that text inside numerous data attributes but different people may do things differently.

Although, it is clever but rather limited.


... what do you think is handling the key pressing?... It's not just CSS. The only thing this does that's different than iterating over every element is it splits the hiding to the CSS engine.


woosh



