How to build a search engine with Ruby on Rails (testdouble.com)
196 points by alokrai 38 days ago | 63 comments



This is a bit off topic but I wanted to say I’m really glad there are still people out there hacking on some ruby + Postgres projects and writing about it. I feel that ruby is an excellent language and dread the demise of it.


I think if Rails didn't exist and someone released it today, it would be exciting and worth switching to.


As someone who has recently started using Ruby for some personal projects - I agree. I like the simplicity of it and that I can do simple command line one liners with it.


Exactly, I only started maybe a month ago and it's incredibly simple to get into.


Ironically, I think that's part of the reason it became trendy to hate on Ruby and Rails: they make it really easy to do things, which means they also make it really easy to do ill-advised things. But that's not really the fault of the language or the framework - writing good and maintainable code is a skill you need to work at and develop over time, no matter what language you're using.


Ruby is a beautiful language - Rails is opinionated (much like its creator). If the ways Rails is opinionated work well with your use case, it's wonderful to use, but if not, it's simply horrible.

I don't think RoR usage has declined (maybe as evidenced by the number of "I'm a newb" comments here?) but rather think it's become mainstream so it's simply missing the hype it had when it was the new hipster technology.


It's not so different to the same kind of elitism that gives, say, PHP or JS a bad rap.

They just happen to be successful, and popular.

I don't think the trope of Ruby being old and boring is such a bad thing either, it just means it's stabilised and has a strong ecosystem that requires little to no effort to get set up. And it still gets a lot of love with every Christmas release adding something desirable and new.

I also think it will require some immense innovation or paradigm shift to unseat Ruby/Rails as a de-facto framework for rapid web prototyping. I would still kick off a project with Rails in favour of trying to early-adopt some new approach to development.


Some of the most interesting projects I stumble on as part of the public sector digitalisation in EU are made in Ruby on Rails (Southern Europe, especially Spain/Portugal), PHP (all of EU) and Python (Eastern Europe).

Having recently switched most of our internal development from C# to Python and Powershell I get it. I can’t imagine not building small web apps in Python and I imagine PHP and RoR are exactly the same experience of getting things to simply work extremely fast.


Parts of GOV.UK [1] and the Paris.fr website [2] are also good examples

[1] https://github.com/DFE-Digital?q=&type=&language=ruby&sort=

[2] https://www.paris.fr


First one that comes to my mind is Decidim (https://github.com/decidim/decidim, https://decidim.org/)


>Having recently switched most of our internal development from C# to Python and Powershell I get it.

How was the move from C# to Python?


How is it dying? Shopify, Stripe and many new YC startups use it.


I think it's easy to say it's declining based on things like RedMonk's ranking, but with GitHub, Stripe and Shopify behind it, it's not going anywhere soon. And apparently Rails use is slowly increasing: https://arstechnica.com/gadgets/2021/09/php-maintains-an-eno...


Isn't that graph hopelessly tainted by eg. the dominance of WordPress? I mean, yes that's PHP, but just because 80% of sites are running it doesn't mean 80% of development is in PHP. Far from it.


All those kind of charts have the same problem. It’s like asking “what food do people like the most?”, and answering “rice” because so many people eat it every day. The problem is that the question is so vague you can answer it many different ways, and all of the answers are correct.

What people usually mean when they use those stats is “look how popular my choice of language is”.


It is, but discounting PHP because of WordPress from that graph is still interesting. I doubt there are free one-click installs for ASP.NET like there are for WordPress, for example, but are there for Ruby on Rails?


The comparison does not make sense. WordPress is a CMS that happens to be written in PHP, while ASP.NET and Ruby on Rails are frameworks for building web services which don't do anything by themselves. At best you would get a blank "Hello world" page and nothing more from a "one-click" installer.


That graph counts sites using Shopify. And Shopify represents 80%+ of RoR usage.


Yeah, three companies worth an estimated $300B can keep any language alive. It’s fascinating to track the second wind of out-of-fashion languages, held aloft by a few mature companies that depend on it.


I'd argue even without them Ruby would be alive in the sense that the core language would get frequent updates. Even Perl is still alive https://github.com/Perl/perl5, even in a healthier state than I would have thought.


Oh wow, Perl is still getting monthly updates - I knew it was on the decline but I thought the whole Perl 6/Perl 7/Perl 11 fragmentation had left things in a worse state.


I’ve heard of Raku (Perl 6), but didn’t realize there was a Perl 7 or Perl 11. Pretty interesting problem. In Java land you get Java 8, current Java, Groovy, Kotlin, etc. So Perl is in good company in this regard.


Perl 11 was a thought experiment. It never actually became anything beyond a website expressing that notion. It got no traction from anybody apart from the originators. And of that I'm not really sure either.


Facebook and PHP are kind of an interesting twist on that. Facebook spent a lot of time and money on Hack/HHVM, then the PHP core devs responded with a new version of PHP that is, in many cases, faster. Though Hack/HHVM is still better for async work.


GitLab, too


Stripe does not use Rails. It uses Ruby for some backend services with its own custom ORM framework, logging etc.


Rails hasn't had any interesting updates since version 4, and arguably introduced a few regressions IMHO (I'm looking at you, active storage and action cable)

Ruby hasn't evolved much either.

Rails only scales so far, and then scaling gets really challenging; obviously the named companies have figured it out, but it's not easy. For example, because of the high coupling between models and the database in Rails, RSpec tests get very contrived trying to set up the DB state for each test run, compared to other languages like Python or golang, where the SQL layer is abstracted from the model layer.
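A minimal plain-Ruby sketch of what keeping the SQL layer separate from the model layer can look like, so a business rule can be tested without seeding database rows. This is the generic repository pattern, not something Rails provides; `Listing`, `InMemoryListingRepository` and the "active" rule are hypothetical names chosen for illustration.

```ruby
require "date"

# Hypothetical plain domain object: no ORM base class, no database.
class Listing
  attr_reader :id, :expired_at, :suspended

  def initialize(id:, expired_at:, suspended: false)
    @id = id
    @expired_at = expired_at
    @suspended = suspended
  end

  # The business rule lives on the object and can be tested without a DB.
  def active?(today)
    expired_at > today && !suspended
  end
end

# Hypothetical repository. A production version would wrap SQL;
# tests can swap in this in-memory one instead of building DB state.
class InMemoryListingRepository
  def initialize(listings)
    @listings = listings
  end

  def active(today)
    @listings.select { |listing| listing.active?(today) }
  end
end

today = Date.new(2021, 10, 1)
repo = InMemoryListingRepository.new([
  Listing.new(id: 1, expired_at: today + 30),
  Listing.new(id: 2, expired_at: today - 1),
  Listing.new(id: 3, expired_at: today + 30, suspended: true)
])
p repo.active(today).map(&:id) # => [1]
```

The trade-off is an extra layer to maintain; whether that beats Rails' "the model is the table" approach depends on how much the tests actually hurt.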

Compared to languages like Python which provide very similar developer experiences but also the power of ML modeling, ruby is falling behind.


Rails doesn't scale? Github's the largest code repository site in the world. Stripe's one of the largest fintech sites in the world. Shopify is one of the largest ecommerce sites in the world. There's also Airbnb, TripAdvisor, many others that are huge.

Also, you can write shitty, slow code in any language. You can make too many calls to the DB in any language. You can also scale to Google size in any language.

Python isn't a quick language. Also most of the ML bits are C++ bits. Ruby is faster than Python for most things. But Google (and others) have done well with Python (and other bits). Facebook got huge with PHP (which no one truly likes).

Ruby is a fine language (also very pleasant to use), Rails powers many huge companies, sometimes nitpicks are just that.


> Rails doesn't scale? Github's the largest code repository site in the world.

You know, i think i understand both of the viewpoints here. Personally, i'd say that Rails doesn't scale as well as i'd expect it to. You can definitely build scalable systems in it, though you'll end up throwing a whole bunch of hardware resources, when compared to certain other languages and technology stacks, to serve similar load.

For example, right now i self-host a GitLab (https://about.gitlab.com/) instance for managing my code repositories, CI builds and so on. Even with just me using it (alongside some automated processes), it routinely eats up close to 4 GB of RAM, which in my case is an entire VPS's worth and costs me about 60 Euros a year with Time4VPS (affiliate link, if you'd like to check it out: https://www.time4vps.com/?affid=5294), but would cost me way more in AWS, GCP etc. One could argue that that's not too expensive, but not everyone earns a lot of money, and running 10-20 VPSes does eventually add up. I can't afford colocation, and my residential homelab setup (a WireGuard tunnel to a proxy VPS to bypass ISP NAT) is pretty slow, even if i can afford more storage, RAM and CPU power that way.

Compare that situation to projects like Gogs (https://gogs.io/), Gitea (https://gitea.com/), GitBucket (https://gitbucket.github.io/) and sourcehut (https://sourcehut.org/) - i'd argue that all of them on average use less CPU resources and memory for accomplishing similar tasks.

However, we cannot ignore the fact that using Ruby might have been exactly what allowed for quickly creating the functionality of GitLab and many other platforms and tools out there, GitHub included, so the choice between usable software and innovation in the near future and performant software possibly years from now is a tricky one.

There are probably good arguments for both, but no one can declare either to be better. Personally, i don't mind using Ruby, Python or even PHP when it makes sense and i don't need to worry about scalability from day 0.


> it routinely eats up close to 4 GB of RAM, which in my case is an entire VPSes worth and costs me about 60 Euros a year

You are absolutely correct that running other people's Ruby code is expensive. I would argue similarly for Java because of the high RAM requirements.

However, if you are a company developing your own software, if you use a more productive technology and it spares just 1 programmer then you have paid for a few hundred VMs and running costs become insignificant compared to labour costs.

This is an argument in favour of both Rails and the Java monsters.
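A back-of-the-envelope version of the labour-versus-hosting argument, using the ~60 EUR/year 4 GB VPS quoted earlier and an assumed 50,000 EUR/year developer cost. The salary figure is hypothetical, not from the thread; substitute your own.

```ruby
VPS_EUR_PER_YEAR = 60           # price quoted earlier in the thread
DEVELOPER_EUR_PER_YEAR = 50_000 # HYPOTHETICAL fully loaded cost, not from the thread

# Integer division: whole VPS-years one developer-year would pay for.
vps_per_developer = DEVELOPER_EUR_PER_YEAR / VPS_EUR_PER_YEAR
puts "One developer-year pays for about #{vps_per_developer} VPS-years"
# prints "One developer-year pays for about 833 VPS-years"
```

Even with a modest salary assumption the result lands in the hundreds, which is the scale the comment has in mind.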


Github does make it work, but you can see there's a tradeoff for that. Like in this blog post:

https://github.blog/2020-08-25-upgrading-github-to-ruby-2-7/

You see mentions of 70 second application boot times. Also interesting is the reference that a future upgrade to Ruby 3.x might increase performance by a factor of 3. That would mean they are leaving quite a lot on the table right now.

Basically, yes, they make it scale, but that's not free.


Ruby 3 being 3x faster is based on a video game emulation benchmark. Rails sites won't see anything like that. The ruby team chose a really specific benchmark to target and it's frustrating that it's not communicated well, leading to understandable misunderstandings.


Somewhat interesting then, that Github cites the 3x without that context.


Stripe never used Rails. It uses Ruby with in-house ODM libraries, logging frameworks, and more. And a lot of those features are being moved to Java services.


Man, is calling ActiveStorage a regression ever spot on. Wish we'd never have used it and stuck with Paperclip.

ActiveJob was Rails4, so I can see where you're coming from there, but ActionMailbox from Rails6 has been a huge win for us. However Webpack has been a disaster. And then the odd evolution of Turbolinks and Stimulus in parallel.

With Rails7 rolling back Webpack and uniting Turbolinks and Stimulus under Hotwire, ActiveStorage finally getting to a good place, and ActiveRecord getting more and more niceties between 4 and 7, then factor in Ruby3's nice perf improvements, I'd say Rails is actually in a much more interesting place than it has ever been. The road from 3 to 7 has been rocky for sure, but for the first time in a while I'm actually excited to spin up a new Rails project.


I highly doubt that you will reach the scaling limits of rails unless you have a massive product and even if you do Python would not be a good replacement.


Depends on the product. The Rails architecture is fine for CRUD apps, and Heroku will happily take more and more money for more and more database capacity. As long as your ARPU is solid, you're fine scaling like that.

But there are more things in the world than CRUD apps. I'm building tools to track hate on social media, and there's a lot of social media out there. Rails, or any RDBMS-centered architecture, won't cut it. And if you're not using Rails, then I think the case for using Ruby isn't very strong these days; Python has a bigger community and is much stronger in important niches, like ML. So we picked Python as our default language not because the Python runtime is vastly more efficient than the Ruby one, but because the ecosystem is a much better match for our needs.

I wish it were the other way, as honestly I like Ruby better as a language. But my personal tastes are ultimately a pretty small factor when I'm picking the tooling for a project.


For implementing ye olde business logic, which is far more common in software development than cool tech such as ML that is discussed a disproportionately high amount compared to its real-world usage, Ruby on Rails wins. Sure, there are lots of frameworks that are good for shipping business features quickly, but Ruby on Rails is still among the leaders of that group due to concise code, magical defaults that give you a boost, an excellent ORM, view helpers for sooooo many common scenarios and edge cases that the JS world is still playing catch-up to, etc. And a lot of the new defaults in Rails 7 are aimed at reducing complexity and rescuing developers from the insanity of front-end development.

Great for CRUD, yes, but plenty more than that. Soooooo many apps don’t need to wrestle with a huge amount of data or even a data warehouse or analytics or a data lake or any data science or anything requiring even a blazing fast query time on a super complex gnarly data model, and most of those would ship faster by using Rails, and be easier to maintain as well.

But the devs don’t tell management that……


There's a lot of "ye olde business logic" in what I'm building, for people who often change their mind about what they want. A Rails API (with Postgres and Elasticsearch) makes this fairly easy to handle, especially with the limited staff we have. A Vue.js client provides the fancy UI the users want.


You and the person you're responding to are agreeing that you should use the right tool for the job, and there are plenty of jobs that Rails isn't right for.


If you are just trying to pull as many records as you can from an API and run them through ML, Python would make a good choice.

If you wanted a website to handle traffic, Rails or even PHP would be a better choice. Both have a more mature ecosystem and both allow for rapid development.


If you will grow big enough surely you will need some kind of web presence - then you can introduce Rails :) Or don't. Honestly having the team on one language makes a lot of sense.


You still have the GIL in python to overcome.


Not really. ML in Python basically means lots of C++ libraries and you can just split processes as much as you want as it's easy to parallelize. You could do the same in R, Ruby, any scripting language really (but Python already has all the glue stuff ready to go and is easy).
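A minimal Ruby sketch of the "just split processes" approach: each forked worker is a separate interpreter with its own lock, so CPU-bound work can run in parallel. `parallel_map` is a hypothetical helper; it relies on `fork` (Linux/macOS, not Windows) and ships results back over pipes with `Marshal`.

```ruby
# Hypothetical helper: split items into slices, map each slice in a forked
# child process, and collect the results (in original order) through pipes.
def parallel_map(items, workers: 4, &block)
  return [] if items.empty?

  slice_size = (items.size / workers.to_f).ceil
  jobs = items.each_slice(slice_size).map do |slice|
    reader, writer = IO.pipe
    pid = fork do
      reader.close
      writer.write(Marshal.dump(slice.map(&block)))
      writer.close
      exit!(0) # skip at_exit hooks inherited from the parent
    end
    writer.close
    [reader, pid]
  end

  jobs.flat_map do |reader, pid|
    result = Marshal.load(reader.read)
    reader.close
    Process.wait(pid)
    result
  end
end

# CPU-bound work: naive Fibonacci, which threads alone would not speed up under a GIL.
def fib(n)
  n < 2 ? n : fib(n - 1) + fib(n - 2)
end

p parallel_map([20, 21, 22, 23]) { |n| fib(n) } # => [6765, 10946, 17711, 28657]
```

Process-level parallelism costs memory per worker and serialization of results, which is why it suits coarse-grained batches better than fine-grained tasks.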


Ruby (MRI) also has a GIL


It sure does.
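A small Ruby sketch of what MRI's GVL means in practice: threads overlap while blocked (here `sleep` stands in for network or disk IO) but not while running Ruby code. `elapsed` and `spin` are illustrative helpers, and the timings are machine-dependent.

```ruby
# Wall-clock time of a block, using a monotonic clock.
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Busy loop that holds the GVL while it runs.
def spin(iterations)
  x = 0
  iterations.times { x += 1 }
  x
end

# Blocking waits release the GVL, so four 0.2 s sleeps overlap.
io_time = elapsed { 4.times.map { Thread.new { sleep 0.2 } }.each(&:join) }

# Pure Ruby computation does not overlap: threaded time is typically close to serial.
cpu_serial  = elapsed { 4.times { spin(2_000_000) } }
cpu_threads = elapsed { 4.times.map { Thread.new { spin(2_000_000) } }.each(&:join) }

puts format("IO-bound, 4 threads:  %.2fs (4 x 0.2s = 0.80s if run one after another)", io_time)
puts format("CPU-bound, serial:    %.2fs", cpu_serial)
puts format("CPU-bound, 4 threads: %.2fs", cpu_threads)
```

This is why threaded Rails servers still help with request handling (much of it is waiting on the database), while CPU-heavy work usually goes to separate processes.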


You don't know what you are talking about. Every Rails release from 4 onward adds a ton of new stuff and bug fixes to the framework.

ActiveStorage is a great addition - storing files on AWS now takes 30 minutes at most to set up. You don't need to use external gems for the same functionality now. Labeling it as a regression is nonsense, as Rails didn't have the functionality at all before.

Ruby also evolved a ton since Rails 4. Just look at changelogs or something.

Scaling Rails is same as scaling any other stack. If your software grows it will run into bottlenecks no matter what you use. Rails is no harder to scale than anything else. It mostly is about people, not stacks anyways.

And i think mentioning ML points in a direction that you don't know much about normal web development since for 99% of web things you don't need any ML at all.


ActiveStorage was launched half-baked. Downloading files on the back end was missing from 5 and not introduced until 6. This seriously burned us, as we had to put off implementing a ton of what should be easy reporting features until we could get around to upgrading from 5 to 6. Rails' strategy of bundling everything together was a huge problem. With Rails7 it looks like ActiveStorage has finally reached parity with Paperclip, so while it's nice that there's a built-in option, fundamentally we're still back to where we were years ago, as it's not like it took longer than 30 minutes to set up Paperclip either.


Arguably the most popular file type stored with a Rails app is images, but ActiveStorage didn't even support CDNs. 3rd-party gems like CarrierWave supported CDNs.

Instead of adding exactly the gems you need, Rails forces you to jump through hoops to remove bloatware (ActionText, ActionCable, ActiveStorage).

Moving off of the rails asset pipeline to webpack was silly. If you're running a SPA, you should have a fully separated web-app and tooling stream instead of trying to merge your backend and frontends. For server side rendering, asset pipeline was good enough IMHO.

ActionText doesn't even support CRDTs. Otherwise it doesn't really add much value to advanced text editor ecosystems.

The change logs are either adding bloatware that only Basecamp asked for, bug fixes, or security patches.

> And i think mentioning ML points in a direction that you don't know much about normal web development since for 99% of web things you don't need any ML at all.

This sounds like someone who has only worked on webapps with very few users, not at big tech scale. You can get away with not using ML in B2B products with 5 customers, but at some point you will need to add ML to scale.


This feels like the idiom of 'those who can't do, teach' except it's...

puts "those who aren't building interesting systems in #{RUBY_ENGINE} write comments about #{RUBY_ENGINE}'s demise"

No one cares what you are developing your product in. Maybe some intern during an interview is salivating about your $BESPOKE_IMPLEMENTATION, but at the end of the day, Ruby and other dynamic languages solved a very explicit problem of "hey compiled languages are great and I can design a very high-level system and then needle down into whatever I want to behave in very explicit ways, but damn I want to get this done in like 30 minutes instead of over 3 days"

I feel like anyone who needs a blog post about some sorceress' concoction of systems to feel validated about making something that solves a problem, especially a problem of theirs - even if it's for your grandma, or a few close friends, or even just yourself... you just don't get it. waxing philosophically about tech is fun and gets any number of rocks off but at the end of the day, if you love scheme and know how to whip up something in 30 seconds that would take an hour in java 17 pro max (ellison edition), go for it, publish it, run it on your raspberry pi. This fixation on chasing the dragon every 15 minutes is a complete waste of time and is effectively super counterproductive.

How many breaking changes has node gone through? Our boy Ryan Dahl literally reimplemented a JS runtime as Deno in Rust because he was like "ya i made node and it was cool for a bit but this is getting out of hand" Great! Use Deno! Use Node! Use Fortran if you're productive in it. The idea that you have to curate what shit you're learning because of a predicted decline in the ecosystem of said shit is such a waste of time and I shake my head in sadness at anyone who isn't learning something because they fear it might be deprecated tomorrow. Be the change you want to see in the world. If anything, the APIs you're using in $POPULAR_LANGUAGE will be deprecated and the APIs that are not getting blog posts on HN are actually still stable. Run it in a container! Who cares!

You know what won't be important? Deep ponderings about some tech rolled out over an API like AWS Lambdas or Google's AMP stuff, or React, or Angular (remember that? Lmao! Only a few years ago!). Focus on fundamentals and avoid the constant churn of new features that get deprecated 6 quarters from now.


Ruby is still an amazing language

Rails needs some help


How so? Plenty of large companies still use Rails. Plenty of startups still use it. I remember when SPAs were all the rage, now every JS framework is doing SSR, just like Rails did/does. What full-stack framework is as complete *and* allows for the same speed of development?


What would you change about Rails?


Ruby is alive and well. We don’t use Rails specifically (Hanami + ROM.rb + DRY.rb), but absolutely love Ruby as a language and the ecosystem surrounding it. It’s productive, and as powerful as you need it to be.

When it comes to building feature-rich web applications quickly and sanely, Ruby is still hard to beat IMO.


Doesn't React beat it quite easily?


Quick... Rails generates front-end and backend with a one line command.

How many components in React would you need to add for one CRUD resource?


no


Great article. By the way, ActiveRecord's #merge is golden, and I'm under the impression it's not as mainstream as it should be.

I use it extensively to avoid duplicating scope code.

For instance:

class Listing < ApplicationRecord
  # Active = not yet expired and not suspended
  scope :active, -> { where("expired_at > ?", Date.current).where.not(suspended: true) }
end

class User < ApplicationRecord
  has_many :listings
end

So instead of doing this (which is terrible, and duplicates the scope's SQL by hand):

User.joins(:listings).distinct.where("listings.expired_at > ? AND listings.suspended != TRUE", Date.current)

You can simply reuse the scope:

User.joins(:listings).distinct.merge(Listing.active)

Rails docs are amazing, but #merge doesn't get enough love. Maybe I'll issue a pull request to improve it with some examples like this and the ones from the article.
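To make the idea behind #merge concrete without pulling in Rails, here is a toy, dependency-free Ruby model of it. `ToyRelation` is hypothetical and is not how ActiveRecord is implemented; it only shows why merging works: the scope's conditions are written once against the listings table and carried, table-qualified, into a query on users. DISTINCT and bind parameters are omitted for brevity.

```ruby
# Hypothetical stand-in for a query relation. NOT ActiveRecord.
class ToyRelation
  attr_reader :table, :joined, :conditions

  def initialize(table, joined = [], conditions = [])
    @table = table
    @joined = joined
    @conditions = conditions
  end

  # Conditions are qualified with this relation's own table name,
  # so they stay valid when reused inside a query on another table.
  def where(column, operator, value)
    ToyRelation.new(table, joined, conditions + ["#{table}.#{column} #{operator} #{value}"])
  end

  def joins(other_table)
    ToyRelation.new(table, joined + [other_table], conditions)
  end

  # Merging appends the other relation's conditions to this query.
  def merge(other)
    ToyRelation.new(table, joined, conditions + other.conditions)
  end

  def to_sql
    sql = "SELECT #{table}.* FROM #{table}"
    fk = "#{table.chomp('s')}_id" # naive "users" -> "user_id" foreign key
    joined.each { |t| sql += " INNER JOIN #{t} ON #{t}.#{fk} = #{table}.id" }
    sql += " WHERE #{conditions.join(' AND ')}" unless conditions.empty?
    sql
  end
end

# The "scope" is defined once, against listings only.
ACTIVE_LISTINGS = ToyRelation.new("listings")
                             .where("expired_at", ">", "CURRENT_DATE")
                             .where("suspended", "!=", "TRUE")

users_with_active_listings = ToyRelation.new("users").joins("listings").merge(ACTIVE_LISTINGS)
puts users_with_active_listings.to_sql
```

Because the conditions already carry the `listings.` prefix, the same scope can be dropped into any query that joins listings, which is the duplication the comment wants to avoid.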


that’s one of those little niggles i’ve had about rails that gets solved so neatly eventually. i started with rails 3 (#merge is a rails 4 addition) so i didn’t know about #merge for a while (#or was another nice addition), and was doing a lot of that ugly chaining in the beginning. same with js - with the move to frontend without webpacker and node, i’m eager to try out rails 7 alpha to see what the dev ergonomics are like now.


I love the section about getting rid of Active Record only to discover that the handwritten SQL equivalent is unmaintainable and slower.

Also if you're on the fat-models team and you have lots of behaviors and business logic on your models, it's often easier to just go with Active Record, because the second you wander off the beaten path (e.g. handwritten joins) you start dealing with weird franken-models that contain attributes from multiple tables, but not their behaviors (methods, callbacks).


Great post at the perfect time. I am currently wrestling with some gnarly, expensive queries in postgres and this gives me some great leads to try out.

Also, I appreciate the Rails love.

I know this is not a Who Is Hiring post, but if you are into Rails/Postgres and in the market (between UTC-4 and UTC-8 timezones), feel free to send me a note: gabe at instrumentl.com. We are doing some "biggish" data work helping nonprofits find grants and other fundraising opportunities.


Good post, well written, taught me a couple of things I didn't know and uses Ruby in a clean way. I don't think it's meant to be uber scalable, but last time I had to build search in Ruby I had to use Ferret (a great lib) and it was not this straightforward.


Everyone asks how to build a search engine with Ruby on Rails, but nobody asks why build a search engine with Ruby on Rails...



