The End of Moore’s Law and Faster General Purpose Computing, and a Road Forward [pdf] (p4.org)
80 points by banjo_milkman 12 days ago | 70 comments

We've built up layers and layers and layers of inefficiencies in the entire OS and software stack since the gigahertz wars took us from 66 MHz to multiple GHz in the 90s.

The software industry is awful at conserving code and approaches, with its every-five-years total redo of programming languages and frameworks. Or more often, in Javascript's case.

That churn also means optimization from hardware --> program execution doesn't happen. Instead we plow through layers upon layers of both conceptual abstraction layers and actual software execution barriers.

Also, why the hell aren't standard libraries more ... standardized? I get that lots of languages differ in mechanics and syntax... But a standard library set could be repeatedly optimized behind the interface, optimized at the hardware/software boundary, etc.

Why have ruby, python, javascript, c#, java, rust, C++, etc. not evolved toward an efficient common underpinning and design? Linux, Windows, Android, and iOS need to converge on this too. It would mean less wasted space in memory, less wasted OS complexity, less wasted app complexity and size. I guess ARM/Intel/AMD would also need to get in the game to optimize down to the chip level.

Maybe that's what he means by "DSLs", but to me "DSLs" are an order of magnitude more complex in infrastructure and coordination if we are talking about dedicated hardware for dedicated processing tasks while still having general task ability. DSLs just seem to constrain too much freedom.

Correct me if I'm wrong, but isn't this exactly the problem LLVM was designed to tackle?

If you're targeting non-CPU designs--such as GPUs, FPGAs, systolic arrays, TPUs, etc.--it is very much the case that you have to write your original source code differently to be able to get good speedups on those accelerators. It has long been known in the HPC community that "performance portability" is an unattainable goal, not that it stops marketing departments from trying to claim that they've achieved it.

LLVM/Clang makes it much easier to bootstrap support for a new architecture, and to add the necessary architecture-specific intrinsics for your new architecture, but it doesn't really make it possible to make architecture-agnostic code work well on weirder architectures.

True. If you want performance, you have to rewrite the code for the new architecture; otherwise it is pointless to develop the new core.

The problem with developing a good processor architecture is that you always have to maintain legacy compatibility without sacrificing performance, because, you know, software.

With each passing generation of the processor, this leaves layers of extra hardware lying around for some legacy code.

So is the reference to DSL in the article an attempt at performance portability by providing a language that hardware can optimize better?

For those who have not looked yet: this is a John Hennessy presentation. It argues, with a lot of detail, that Moore's law has closed out, that energy efficiency is the next key metric, and that specialized hardware (like the TPU) might be the future.

When I buy a machine, I am now perfectly happy buying an old CPU, and I think this shows you why. You can buy something from as far back as 2012, and you're okay.

However, I do look for fast memory. SSDs at least, and I wish he had added a slide about the drop in memory speed. Am I at inflection?

Perhaps the future is: you buy an old laptop with specs like today and then you buy one additional piece of hardware (TPU, ASIC, Graphics for gaming etc).

(for the average user) the CPU speed isn't relevant anymore, nor even the number of cores. Internet speed plays the biggest role when watching a movie, opening a web page, or using some cloud-based app. Then memory speed and GPU speed (even in cell phones) come second: how fast your CPU grabs that data and processes something.

In niches, of course, CPU speed matters. I want to compile something in half the time. I want to train my AI model faster. But what really bugs me is when I have to wait for a refresh on some site; this really makes me lose focus (and then I come to HN to see some news and time goes by).

>CPU speed isn't relevant anymore

Only because most software is conceptually stuck in the 50s and Intel fell asleep at the wheel. Having extra cores can allow you to move away from kludgy and insanely complicated solutions to much simpler code. In turn, this enables faster development and experimentation.

Generally code becomes more complicated when you have to run it on multiple cores.

I personally am working on a toy actor library to test whether easy but still reasonably efficient parallelization is possible even with heavy communication needs. I see a 2.3x slowdown when I run it on a single core vs. a baseline that uses locks, but it reaches parity at higher levels of parallelism and contention: 10 million actors, each 32 bytes in size, running on 4 threads.

As soon as you need to acquire two locks that are only known at runtime it starts to become increasingly difficult to find a solution. It will usually be highly specific to the problem at hand and not generalize at all when the problem changes slightly.

If you limit yourself to acquiring only one lock at a time then generalizing becomes simpler, but you now have to implement something very similar to transaction rollback. It isn't the end of the world and it is definitely possible, but compared to not worrying about it at all in single-threaded code it's a massive headache.
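For the two-lock case the parent describes, the standard workaround is to impose a single global order on lock acquisition, so no two threads can ever hold the locks in opposite orders. A minimal sketch (all names hypothetical, not from the parent's library):

```python
import threading

class RankedLock:
    """A lock with a stable global rank assigned at creation time.
    Always acquiring locks in rank order prevents the classic deadlock
    where thread 1 holds A waiting for B while thread 2 holds B waiting for A."""
    _counter = 0
    _counter_lock = threading.Lock()

    def __init__(self):
        with RankedLock._counter_lock:
            RankedLock._counter += 1
            self.rank = RankedLock._counter
        self.lock = threading.Lock()

def acquire_both(a, b):
    """Acquire two locks discovered only at runtime, in rank order."""
    first, second = (a, b) if a.rank < b.rank else (b, a)
    first.lock.acquire()
    second.lock.acquire()
    return first, second

def release_both(first, second):
    # Release in reverse order of acquisition.
    second.lock.release()
    first.lock.release()
```

This generalizes, but as the parent notes it only works when you can enumerate the locks up front; once lock sets grow dynamically mid-operation, you are back to rollback-style schemes.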

Some of the most drastic simplifications I've made to real-life code in the recent years were achieved by splitting a "classic" program into multiple independent agents. Making things "parallel" wasn't even the goal. The goal was to clean up some messes.

>As soon as you need to acquire two locks that are only known at runtime

Why would you need to acquire two locks that are only known at runtime when using the actor model?
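Part of the answer: in the actor model each actor owns its state outright and processes one message at a time, so there is nothing to lock; cross-actor operations become message sends. A toy illustration (not the parent's library, names hypothetical):

```python
import queue
import threading

class Actor:
    """Single-owner state: only the actor's own thread ever touches
    self.balance, so no lock on the state is needed."""
    def __init__(self):
        self.mailbox = queue.Queue()
        self.balance = 0
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            msg = self.mailbox.get()
            if msg is None:          # shutdown sentinel
                break
            kind, amount, reply = msg
            if kind == "deposit":
                self.balance += amount
            elif kind == "get":
                reply.put(self.balance)

    def send(self, msg):
        self.mailbox.put(msg)
```

The mailbox queue serializes all access, which is also where the parent's reported single-core overhead vs. plain locks presumably comes from.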

I'm an average computer user. I like to watch videos and game on weekends. For me CPU speed remains a massive bottleneck. I like to play Kerbal Space Program and, with realism mods, it really needs general computing horsepower.

That said, Moore's law is irrelevant. I don't care about transistor density, or even CPU count. I care about processing power. Give me a 4-socket motherboard for AMD Threadrippers and I would consider that as an option. What matters to me is price.

Hi! A 4-socket motherboard for AMD? That's something the "average user" wouldn't even know where to begin with.

I understand your point, but when I think of "average user", I mean users who wouldn't dare to open a case, and wouldn't even know where the CPU is or why there are so many coolers inside.

>energy efficiency is the next key metric, and that specialized hardware (like TPU) might be the future.

This is nonsense pushed forward by large corporations who want to own all your data and computational capacity.

Admittedly knowing little about how multi-core CPUs work, I've always thought the next breakthrough in CPU tech would be a hardware-based scheduler, a chip that effectively load balances threads to ensure all cores are used equally and simplifies the writing of software. The dev writes thread-safe code and the hardware does the rest. I wonder how feasible that really is.

That sounds perfectly reasonable to me, but John Hennessy literally wrote the book on computer architecture. Towards the end of the deck he has a slide that we shouldn't expect large gains from improved architecture (on general purpose chips) in the future. I'm inclined to believe him, although I would be interested in hearing a deeper proof/disproof of the architecture you proposed.

This is essentially what hyper-threading in modern processors is attempting to do.

No, it's not?

Point one: Indeed, untold millions of dollars can be saved by improving data center efficiency. But if the processors are in desktops instead, where the price pain isn't actually enough to drive improvements in efficiency, then we as a society just pay a higher power bill... Furthermore, data center power tends to be greener than general purpose city grids.

Point two: they could have as easily said GPUs to avoid the fear of data centre only proprietary lock in... The point is the same: we seem to have hit the point where specialized linear algebra coprocessors make a huge difference.

>price pain isn't actually enough to drive improvements in efficiency

You have something reversed here.

Hardware companies that make consumer devices are pumping barrels of money in improving CPU power usage to enable longer battery life on mobile devices.

Google, on the other hand, promotes insanely power-hungry neural nets as the future of computing, because that's the kind of future that gives them the deciding edge in the market.

Energy efficiency (performance/watt) is the key metric because servers are limited by cooling/power dissipation and mobile is limited by battery capacity.

His chart shows TPU with ~80x the performance/watt of a CPU, indicating the potential advantages of domain-specific architectures over general purpose for specific applications.

None of this should be surprising to anyone who uses a GPU.

> "that energy efficiency is the next key metric"

What limits cpu speed? => https://electronics.stackexchange.com/questions/122050/what-...

More efficiency means more overclocking. Also, better cooling/dissipation methods and less gate delay.

We may be hitting the limits of wire thinness, but I get a strong feeling (ie not an expert) we've got a decent ways to go before we hit the limits of clock speed.

I'm not an expert either, but from what I do recall of my EE courses in college...

With circuits, past a given frequency, capacitance starts to behave like inductance. The charge-and-discharge sloshing happens fast enough that real power gets wasted as heat or other leakage (but mostly heat, for the things we build).

My personal gut feeling is that faster clocks probably aren't happening without some kind of radically different/exotic design or process. And even if that materializes, it's unlikely to scale down. That kind of system might be useful for AI, large governments/corporations, and maybe game consoles that need a lot of power and are wall-powered, so they don't care.

That's... extremely not what's limiting clock speed. For one, at this regime the resistance tends to swamp out the inductance.

It absolutely is; performance is mostly limited by power usage and dissipation, and the above is describing why at faster frequencies more power is used.
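The first-order model behind "faster frequencies use more power" is the classic CMOS dynamic-power equation, P = aCV^2f; since raising f usually also requires raising V, power grows roughly with the cube of frequency. A quick check (the component values below are illustrative, not any real chip's):

```python
def dynamic_power(alpha, c_farads, v_volts, f_hz):
    """CMOS dynamic (switching) power: P = alpha * C * V^2 * f,
    where alpha is the activity factor and C the switched capacitance."""
    return alpha * c_farads * v_volts ** 2 * f_hz

# Hypothetical numbers: 1 nF of switched capacitance, 20% activity.
base = dynamic_power(0.2, 1e-9, 1.0, 3.0e9)   # 3.0 GHz at 1.0 V
fast = dynamic_power(0.2, 1e-9, 1.2, 4.5e9)   # 1.5x the clock, higher V
# A 1.5x clock bump more than doubles power once the voltage rises too.
```

This is why the GHz race stalled: the performance gain is linear in f while the power (and heat) cost is superlinear.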

Power dissipation isn't coming from wires looking like inductors...

My i7-2600k @ 4.4GHz agrees with you. The only reason I would upgrade would be for better USB 3+ support (I have a hard time with anything more complex than a SuperSpeed flash drive).

SSDs are by far the most cost-effective upgrade you can get nowadays. HDDs tend to be the bottleneck for boot times and general "snappiness".

The reason I would upgrade from there is PCIe 3.0+, and thus support for fast NVMe cards. They are almost as much of a step up from SATA SSDs as SATA SSDs were from spinning rust, IMHO.

IIRC the sandy bridge generation could only do PCIe 2.0.

Very true, I did just recently pass up a great deal on an NVMe drive because I can't use it in my current setup. I believe PCIe 2.0 will also bottleneck USB 3.2 (or Gen 2x2 or whatever the naming scheme is now), and whatever GPU I upgrade to next (I read that it's a bottleneck with the GTX 1080 and up)

NVMe is the only thing that will cause a noticeable improvement for me though, seeing as I still game on a 1080p60 monitor and generally don't need that sort of speed from any USB peripheral.

Still, the processor itself kicks ass and I think the only reason why most people would need to upgrade are for newer peripherals.

I seem to recall something about some carbon technology recently that is supposedly the next big revolution in semiconductors, and might revive Moore's Law? I wanna say it was about how carbon nanotubes can be used as an excellent semiconductor, and because their width is at the atomic scale, they can result in even smaller transistors.

or something to that effect. Not sure where I read it tho. Maybe on here

There was a thing here recently about carbon nanotube transistors. I also watched a video on youtube where they showed briefly how it works, it was very interesting. Essentially very very very small mechanical switches!

>Perhaps the future is: you buy an old laptop with specs like today and then you buy one additional piece of hardware (TPU, ASIC, Graphics for gaming etc).

If Google has its way, the future will be: you buy a Chromebook and do all the real work on Google Cloud. Note that John Hennessy is chairman of Alphabet. And in case someone here forgot, TPUs were developed by Google and so was Tensorflow.

This ties in nicely with chiplets: https://semiengineering.com/the-chiplet-race-begins/ - a way to integrate dies in a package, where the dies can use specialized processes for different functions - e.g. analog or digital or memory or accelerators or CPUs or networking etc. This would make it easier to iterate memory/CPU/GPU/FPGA/accelerator designs at different rates, and reduce development costs (don't need to support/have IP for every function, just an accelerated set of operations on an optimized process within each chiplet). But it will need progress on inter-chiplet PHY/interface standardization.

So yes, if you compare matrix multiply in Python vs SIMD instructions, you will find a big improvement. Much harder to do that for more general purpose workloads.

And it doesn't scale: https://spectrum.ieee.org/nanoclast/semiconductors/processor...

And in many cases, if you normalize all the metrics (e.g. precision, process node), you'll find that the advantage of ASICs is greatly exaggerated and is often within ~2-4x of the more general purpose processor. E.g. the small GEMM cores in the Volta GPU actually beat the TPUv2 on a per-chip basis. Anton 2, normalized for process, is within roughly 5x of manycore MIMD processors in energy efficiency.

In other cases, e.g. the marquee example of bitcoin ASICs, that only works because of extremely low memory and memory bandwidth requirements.

A possibly stupid question from a neophyte: what was the driving force behind Moore's law when it was in operation? Did it become a self-fulfilling prophecy by becoming a performance goal after becoming enshrined in folklore, or is there an underlying physical reason?

Moore's law is part of a general set of principles regarding learning curves, see generally:


During WWII, it was observed that each doubling of output reduced labour costs (or increased labour efficiency) by 20%.

Moore's law is dependent on the density of transistors (the count doubles for a given cost every two years). Increased density => increased computing power, efficiency, and speed.
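That doubling rule compounds quickly; stated as plain arithmetic:

```python
def transistor_multiplier(years, doubling_period_years=2.0):
    """Moore's observation: density doubles every ~2 years,
    i.e. a factor of 2^(years / period) over a given span."""
    return 2 ** (years / doubling_period_years)

# Over one decade at a 2-year cadence, density grows 32x;
# over the ~50 years the law held, the compounding is astronomical.
decade = transistor_multiplier(10)
```

The 2-year period is the conventional figure; Moore's original 1965 paper used a faster cadence that he later revised.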

Chip design is dependent on numerous factors: feature size (e.g., 14nm vs. 9nm photolithography), silicon purity, fab cleanliness (much as with cascade refrigeration, chip fabs now have multiple concentric zones of increasing cleanliness), and the power and capacity of the software used in chip modelling itself.

The law is also not entirely exogenous as it relies on market forces and demand: need for increased computing power tends to proceed at a predictable rate, and the ability to make use of more capacity is also constrained by existing practices, software, programmer skill, etc.

Then there are the other non-CPU bottlenecks. Disk and memory have long been the foundations of that; increasingly it's networking. The tendency of old technology and layers not to die but to be buried in ever deeper levels of encapsulation means that efficiencies which might be gained aren't, due to multiple transitions and translations -- the reason why a 1980 CP/M or Apple II system had a faster response than today's digital wireless Bluetooth keyboard talking to a rendered graphical display. Bufferbloat, at the network stack, is another example.

But: the main driver for Moore's law is increased density leading to increased efficiency (the same centralising tendency present in virtually all networks), bound and limited by the ability to get power in and heat out (Amdahl's observation that all problems ultimately break down to plumbing).

Dennard scaling explains the physics of why smaller transistors are faster and more efficient, but overall Moore's Law was closer to a self-fulfilling prophecy. There's no intrinsic reason why each generation targeted 2x density, and 18/24-month cycles are probably convenient from a business perspective but not essential.


The transistor can only get so small before it stops working. There are many issues with the required extreme ultraviolet light sources (lasers) and the allowed amount of impurities in the silicon wafer. And the R&D cost for each iteration of lithography is getting higher while bringing less benefit.

Yes, the existence of an upper bound on transistor count follows easily from the atomic nature of matter. The Wikipedia article on Moore's law lists multiple disparate "enabling factors" which do not seem to have much to do with one another; their conjunction comprises an explanation of sorts, but I'm wondering whether there's a simple observation or fact that ties them all together, apart from my sociological theory.

Slide 36 compares the TPU with a CPU/GPU. This is an apples-to-oranges comparison. One uses an 8-bit integer multiply while the other uses a 32-bit floating-point multiply, which inherently uses at least 4x more energy [1]. If you scale the TPU numbers by 4, it is no longer an order of magnitude better. The proper comparison would be between the TPU and an equivalent DSP doing 8-bit computations. That would show whether eliminating the energy consumed by register file accesses is significant. I suspect most of the energy saving comes from having a huge on-chip memory.

[1] From slide 21

Function                Energy (pJ)
8-bit add               0.03
32-bit add              0.1
16-bit FP multiply      1.1
32-bit FP multiply      3.7
Register file access    6
L1 cache access         10
L2 cache access         20
L3 cache access         100
Off-chip DRAM access    1,300-2,600
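Using the slide-21 numbers quoted above, the normalization the parent suggests is easy to check: the FP-width gap is only a few x, while the memory hierarchy is where the orders of magnitude live.

```python
# Energies in pJ, as quoted from Hennessy's slide 21 above.
energy_pj = {
    "add8": 0.03, "add32": 0.1,
    "fmul16": 1.1, "fmul32": 3.7,
    "regfile": 6, "l1": 10, "l2": 20, "l3": 100,
}

# A 32-bit FP multiply costs ~3.4x a 16-bit one...
ratio_fp = energy_pj["fmul32"] / energy_pj["fmul16"]

# ...but an off-chip DRAM access (1,300-2,600 pJ) dwarfs everything,
# hundreds of multiplies' worth, which supports the suspicion that the
# TPU's huge on-chip memory matters more than its ALUs.
dram_vs_fmul32 = 1300 / energy_pj["fmul32"]
```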

Big chipmakers are turning to architectural improvements such as chiplets, faster throughput both on-chip and off-chip, and concentrating more work per operation or cycle, in order to ramp up processing speeds and efficiency https://semiengineering.com/chiplets-faster-interconnects-an...

Scaling certainly isn't dead. There will still be chips developed at 5nm and 3nm, primarily because you need to put more and different types of processors/accelerators and memories on a die. But this isn't just about scaling of logic and memory for power, performance and area reasons, as defined by Moore's Law.

The big problem now is that some of the new AI/ML chips are larger than reticle size, which means you have to stitch multiple die together. Shrinking allows you to put all of this on a single die. These are basically massively parallel architectures on a chip. Scaling provides the means to make this happen, but by itself it is a small part of the total power/performance improvement. At 3nm, you'd be lucky to get 20% P/P improvements, and even that will require new materials like cobalt and a new transistor structure like gate-all-around FETs.

A lot of these new chips are promising orders-of-magnitude improvement (100 to 1,000x), and you can't achieve that with scaling alone. That requires other chips, like HBM memory, with a high-speed interconnect like an interposer or a bridge, as well as more efficient/sparser algorithms. So scaling is still important, but not for the same reasons it used to be.

It is not that I disagree with Hennessy, but I think it is premature to conclude that general-purpose processors have reached the end of the road. There is a healthy middle in between specialized and general-purpose design. Exploiting that middle is what I think will deliver the next generation of growth. That is exactly what naturally occurred with SoC and mobile design.

The raw computational capabilities of the TPU don't really prove anything. Of course co-design wins. Whether it is vision or NLP, NN training has dominant characteristics. The arithmetic is known: GEMM. The control is known: SGD. Tailoring control and memory hierarchy to this is a no-brainer, and of course the economic incentives at Google push them in this direction, and of course the expertise available at Google powered this success. For other applications it is not so clear.

Finding similar dominance in other applications is trickier. To accelerate an application with a specialized architecture you need dominating characteristics in the app's memory-access, computational, and control profiles.

It's odd that the presentation doesn't discuss alternatives to using silicon. Ultimately, this is akin to saying that there are limits on how small a vacuum tube we can make. We already know of a number of other potential computing platforms such as graphene, photonics, memristors, and so on. These things have already been discovered, and they have been shown to work in the lab. It's really just a matter of putting the effort into producing these technologies at scale.

Another interesting aspect of moving to a more efficient substrate would be that power requirements for the devices will also lower as per Koomey's law https://en.wikipedia.org/wiki/Koomey%27s_law

Well... no. What it says is that there are limits on how small we can make wires, and on how few layers a material can have and remain functional (about 5).

Wires can't get smaller without compromising RC delay (and thus speed). Quite horrifically, this is much more of an issue than the transistor.

Graphene and photonics don't help this. At all. It isn't a matter of how small a tube. You physically need 5nm to insulate, and 5nm for a functional material. So a 5nm device with a 5nm spacer and a 5nm space to the next device is about it. The smallest pitch of any physical device is 20nm. The critical pitches in wafer are about 30nm and 40nm, so in an ideal world, we can reach 3x, ever. It doesn't matter which material you choose.

And yeah, you can stack up, but not quite in the way you dream, and thermal and processing issues make this hard in most domains. When I build, I deposit at temperature, which affects the underlying layers. So stacking doesn't quite work as you might expect. Again, real materials in a real flow behave differently, and not in a trivial "just make it work" reducible fashion.

Memristors may not really exist, and are mainly useful in the context of high-speed memory. That has real physical challenges, and people have spent billions over decades on this problem.

Anyway, this is missing some background, but the presentation is great.

We already know you can use individual atoms as transistors [1]. So clearly we can go a lot smaller than 5nm here. Obviously there are challenges in scaling these new substrates up to create useful chips, and in creating the infrastructure to put them into mass production. My point is that we know this is possible, and an inflection point will come where investing in these new substrates starts being more lucrative than trying to squeeze more out of silicon.

[1] https://www.sciencedaily.com/releases/2018/08/180816101939.h...

> It's odd that the presentation doesn't discuss alternatives to using silicon.

You must have missed slide 41, which has a "beyond silicon" bullet.

Ah, it appears that I did.

That we are exploring other computing substrates does not mean that those substrates will be economical or practical to use. They either will be or they won't (a determination which is ultimately dependent on the laws of physics). Our exploration of other substrates is a necessary but insufficient precondition to actually putting other computing substrates into production.

It's kind of silly to assume that what we have now can't be improved significantly. We've only been doing this for a very short time in the grand scheme of things. We're certainly nowhere close to hitting any limitations when it comes to the laws of physics here.

We just won't know until we get there. We have hit fundamental limits of physics for other technologies. Planes don't really go any faster today than they did 40 years ago. Cars can go faster than they did 40 years ago, but they don't in practice. We don't have flying cars. Etc.

I'm not saying that these other computing substrates won't work, just pointing out that the simple fact that we are exploring them does not mean that they will. Technological progress is neither guaranteed nor automatic.

I mean yes we literally don't know right now. I'm just saying given where our civilization is at, I think it's pretty safe to assume that we have a lot of room to grow here. Once we've done computing for a thousand years or so we'll be looking back at this the way we look at cavemen banging rocks together.

Seems like a decent probability outcome to me. But it also seems like a decent probability outcome that we hit the end of the line of what’s feasible with silicon lithography in the next couple of decades and computing substrate progress basically stalls out for decades or longer until some other breakthrough happens. A parallel would be the stall in progress on personal automobiles until lithium ion batteries made electrification possible in the last decade.

Agree completely.

"WASTED WORK ON THE INTEL CORE I7", slide#12 (page 13 in pdf) is fascinating to me. But I want to know how the data was collected, and what the % wasted work actually means.

40% wasted work, does that mean that they checked the branch-predictor and found that 40% of the time was spent on (wrongfully) speculated branches?

It also suggests that for all of the power-efficiency faults of branch predictors (aka: running power-consuming computations when it was "unnecessary"), the best you could do is maybe a 40% reduction in power consumption (no task seems to be 40% inefficient).
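The slide doesn't state its methodology, so here's one plausible reading (an assumption on my part): "wasted work" counts micro-ops fetched and executed down mispredicted paths and later squashed, as a fraction of all micro-ops executed.

```python
def wasted_fraction(useful_uops, squashed_uops):
    """Fraction of executed micro-ops that were thrown away after a
    branch misprediction -- one plausible definition of 'wasted work'.
    This is also an upper bound on the power you could save by never
    executing speculative work at all."""
    return squashed_uops / (useful_uops + squashed_uops)

# "40% wasted work" would then mean 2 squashed uops per 3 useful ones:
frac = wasted_fraction(60, 40)
```

Under this reading, eliminating speculation entirely could save at most that 40% of the dynamic work, as the parent notes, and in exchange you'd pay the full misprediction/stall latency on every branch.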


When someone says Intel i5 or i7, I immediately wonder if they're talking about 2008 i7 or 2019 model.

Intel would be smart to retire whole i3/i5/i7/i9 branding. People seem to think every i5 or i7 is the same.

> People seem to think every i5 or i7 is the same.

Unfortunately, this is a feature, not a bug. Intel wants their branding to have this effect... the lay-person isn't supposed to understand Sandy Bridge (i7-2700k) vs Skylake (i7-6700k)

So Intel wants laypersons not to realize there's something faster available to upgrade their x86-based systems to?

Before that era people didn't know much about the details either, but they did understand 800 MHz was faster than 533 MHz.

Still, it's too early to call the end of the march of microprocessors.


The limits they are running up against are indeed crises, but they're probably going to find that they can copy whatever it is that biology is doing and squeeze out quite a bit more. The tradeoffs will get a lot weirder though.

Humans are not good at general purpose computation. Your linked article states the brain achieves 1 exaflops, and cites http://people.uwplatt.edu/~yangq/csse411/csse411-materials/s... for this number. That document states the value with no citation or rationale.

I can do far less than 0.0001 single precision floating point operations per second, so whatever the context for "1 exaflops" is, it isn't general purpose computation.

EDIT: this seems sort of like saying that throwing a brick through a window achieves many exaflops because simulating the physics in real time would require that performance. I'd like to read more about this value and how someone came up with it, but googling just gives me that same scienceabc article and stuff referencing it.

Nahh, you could do more than 0.0001 floating point operations per second. To beat that you need to do a single floating point operation in two hours, which is quite achievable with paper and pencil ;-)

0.01 floating point operations per second seems harder, but perhaps humanly doable.

I'm easily distracted.

Amin's keynote is relevant here: https://onfconnect2019.sched.com/event/RzZl

The basic form of computing is becoming distributed. More are coming.

I'm amazed that it's less than a picojoule to do an 8 bit add.

The Landauer limit (~3 zJ per bit erased at room temperature) is roughly seven orders of magnitude smaller than this, so there's enormous room for power savings before we hit any fundamental physical limits.
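That headroom is easy to sanity-check from the constants involved (room temperature assumed):

```python
import math

# Landauer limit: minimum energy to erase one bit of information
# at absolute temperature T, E = k_B * T * ln(2).
k_B = 1.380649e-23              # Boltzmann constant, J/K
T = 300.0                       # room temperature, K
landauer_j = k_B * T * math.log(2)   # ~2.87e-21 J per bit erased

add8_j = 0.03e-12               # 8-bit add from Hennessy's slide: 0.03 pJ
ratio = add8_j / landauer_j     # ~1e7x above the per-bit limit
```

The comparison is loose (an 8-bit add erases more than one bit of information), but even so, today's logic sits millions of times above the thermodynamic floor.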

So what's the name of the metric flops/sec/USD? Because that keeps growing exponentially thanks to GPUs/TPUs, a paradigm shift predicted by Ray Kurzweil.

Is there a video of this talk available somewhere?

Also can someone tell me what p4 is? Looks like almost every company and a bunch of universities are "contributors" there.

P4 is a domain-specific language for specifying packet forwarding pipelines, i.e. the hardware that takes packets in one port, decodes their headers (e.g. destination MAC or IP address), munges them somehow (e.g. updating TTL, destination MAC, and checksum), and sends them out another port. This enables you to build all sorts of network devices, from Ethernet switches to IP routers to RDMA fabrics. You can compile P4 onto a CPU, a smart NIC, an NPU, a programmable ASIC, an FPGA, etc. It can also be used a bit like eBPF and compiled into a pipeline in the Linux kernel.

Basically P4 allows you to (re)program your network data plane to do whatever you want, and you can create new network protocols or change the way existing ones work without having to change your hardware and without losing line rate performance.
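P4 has its own syntax, but the core abstraction it compiles down to is the match-action table: match on a header field, apply an action. A rough Python analogue of that idea (illustrative only, not real P4; real P4 declares parsers, tables, and actions that a compiler maps onto hardware pipeline stages):

```python
# Actions transform a packet (a dict of header fields) or drop it.
def decrement_ttl(pkt):
    pkt["ttl"] -= 1
    return pkt

def drop(pkt):
    return None

class MatchActionTable:
    """Match a packet's key field against installed rules and
    apply the matching action (or the default, typically drop)."""
    def __init__(self, key, default=drop):
        self.key = key
        self.rules = {}          # match value -> action
        self.default = default

    def add_rule(self, value, action):
        self.rules[value] = action

    def apply(self, pkt):
        action = self.rules.get(pkt[self.key], self.default)
        return action(pkt)

# Forward packets destined to 10.0.0.1; drop everything else.
ipv4 = MatchActionTable(key="dst_ip")
ipv4.add_rule("10.0.0.1", decrement_ttl)
```

The hardware point is that a programmable ASIC evaluates tables like this at line rate, with the control plane installing and removing rules at runtime, which is what makes the reprogrammable-data-plane story work.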

It's also somewhat like eBPF, but it compiles to hardware as well as software.

One interesting example of switch company using P4 is Arista, who have rolled out multi-function programmable switches (7170 series) that can be repurposed/reprogrammed with different personality profiles/operational modes as needed. Some of the profiles are things like stateful firewall/ACLs (up to 100k), large-scale NAT (again 100k), large-scale tunnel termination (up to 192k), packet inspection/telemetry (first 128 bytes), and segment routing (basically source routing over network segments.) And it is also user-programmable.


P4 is a domain-specific programming language for accelerated packet header processing in switches and NICs.

One of the more interesting things I've read on HN in a while. Seems like this will result in a large paradigm shift for the computing industry.

I think we've already seen the shift with cellphones.

I think consumer facing performance processors will fade.

Data centers will continue to push for more performance. It could mean less rack space, less power consumption, and less to manage.

Cell phone/tablet focused processors will become powerful enough to handle the majority of daily tasks while enjoying extended battery life.

There's an internet meme about "Imminent death of Moore's law predicted".

All Moore's law talks about is the density of transistors on a chip, and it's never been a linear progression of numbers. Recently I've seen news articles about some research into 5nm processes and other methods for increasing density of components on silicon, so it seems Moore's law (really Moore's rule of thumb or Moore's casual observation) isn't done yet.
