If someone were building a network of workstations at a company and needed high performance from commodity hardware, what would be the best thing to use, starting from scratch?
You said 'network of workstations'. Then you would match the application(s) to the hardware. For instance, in a 100% ground-up custom call center I used to run, we distributed diskless Linux to the individual nodes and had them pre-emptively cache portions of the NFS-mounted environment to tmpfs (i.e. RAM) to preclude IO blocking. The point of the system (i.e. the application) should come first and the architecture second (in reality the architecture has to second-guess changes to the application, which generally come thick and fast). RAM is cheap. (Note: a non-workstation, general-purpose computing cluster would call for different architectural strategies.)
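Not the actual scripts we ran, just a minimal sketch of the idea in Python: walk selected subtrees of the NFS mount and copy them into a tmpfs-backed directory so later reads come from RAM. The paths and subtree names here are hypothetical.

```python
#!/usr/bin/env python3
"""Pre-warm a tmpfs cache from an NFS mount (illustrative sketch only).

Assumes the NFS export is mounted at /mnt/nfs/env and that /dev/shm
(or another tmpfs mount) is large enough to hold the selected subtrees.
"""
import os
import shutil

NFS_ROOT = "/mnt/nfs/env"          # hypothetical NFS-mounted environment
CACHE_ROOT = "/dev/shm/envcache"   # tmpfs-backed cache directory

def prewarm(subdir: str) -> None:
    """Copy one subtree of the NFS mount into tmpfs so reads hit RAM."""
    src = os.path.join(NFS_ROOT, subdir)
    dst = os.path.join(CACHE_ROOT, subdir)
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    # dirs_exist_ok lets repeated runs refresh an existing cache copy
    shutil.copytree(src, dst, dirs_exist_ok=True)

if __name__ == "__main__":
    # Cache only the hot parts of the environment, not the whole export.
    for subtree in ("bin", "lib", "etc/app"):
        prewarm(subtree)
```

In practice you'd then point the applications at the cache path (bind mounts, PATH tweaks, etc.), which is outside the sketch.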
There's not enough information to really answer this question. What is "high performance"? What is "commodity hardware"? How much storage do you actually need? What is the use of the data?
Of course, if you're mainly using Hadoop you'll probably be playing around with HDFS.
If you just have lots of data and need to process it, maybe Ceph.
If you need POSIX/NFS, you'll probably look at some sort of clustered NFS solution (a la GPFS).
You've also got Gluster and Lustre.
My general recommendation would probably be to examine Ceph first.
Edit: I was assuming you meant some sort of networked storage system; I didn't realize it was for workstations. My answers are probably super overkill for what you need.
I'm wondering about workstations where everyone is accessing shared directories and gigabytes of data, so lots of IOPS and throughput are needed to replace a NetApp without being stuck at 50 MB/s. Basically my thought is that with some combination of SSDs, PCIe SSDs and something like FreeNAS (or some BSD+ZFS configuration), this shouldn't be an extravagant luxury.
You can now get 1 TB SSDs for ~$380. Put 8 of them in RAID 10 (or RAID 5 with a hardware controller), shared over a 10 GbE NIC with NFS, to workstations on 1 GbE with jumbo frames. Nothing fancy; performance should be good.
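Rough back-of-envelope numbers for that box (the per-SSD figure is an assumption for SATA-class drives, not a measurement):

```python
# Back-of-envelope throughput for the proposed box (assumed figures).
ssd_seq_mbps = 500             # assumed per-SSD sequential throughput, MB/s (SATA)
ssds = 8

# RAID 10: 4 mirrored pairs striped; reads can hit all 8 drives,
# writes effectively hit 4 (each block goes to both halves of a mirror).
raid10_read = ssds * ssd_seq_mbps           # ~4000 MB/s
raid10_write = (ssds // 2) * ssd_seq_mbps   # ~2000 MB/s

nic_10gbe = 10_000 / 8         # ~1250 MB/s, the real ceiling on the server side
client_1gbe = 1_000 / 8        # ~125 MB/s per workstation link

print(f"array read ~{raid10_read} MB/s, write ~{raid10_write} MB/s")
print(f"server NIC caps shared throughput at ~{nic_10gbe:.0f} MB/s")
print(f"each client tops out at ~{client_1gbe:.0f} MB/s, so "
      f"~{nic_10gbe / client_1gbe:.0f} clients can saturate their links at once")
```

In other words the 10 GbE uplink, not the SSD array, is the ceiling, and each client's 1 GbE link still gives it far more than 50 MB/s.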
That's great for a couple of computers, but that isn't a full solution for a couple hundred people all hammering network storage. Also, the original post is about NFS, so what do you do with the commodity setup: use NFS or something else?
>but that isn't a full solution for a couple hundred people all hammering network storage.
Come on, I have seen 8 10k SCSI drives in RAID 10 in a Windows 2000 box used by "a couple hundred people". It wasn't great, but this setup would blow that away, especially if the hosting machine has a decent amount of RAM (32 GB+).
That's still too thin a specification. For instance: how much are you prepared to pay for the performance in terms of usability, and what are the actual use cases?
For instance, NFS is pretty poor if your use case is serving shared network directories to workstations (not really great with complex ACLs), or if you require proper authentication for the resources (the Achilles' heel of NFS - where's my Kerberos support?!?!!). You might be better off with CIFS.
As the other comments already pointed out, there are several alternatives for several needs. Performance alone should not guide your selection.
We're using Kerberos to authenticate the NFSv4 shares, which seems to work fine. The usernames of the clients need to exist on the server, though (they don't have to have the same UID as on the client). You can probably set up a centralized user database with LDAP, sssd, IPA or something like that; we haven't done that yet, though.
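The username point is because NFSv4 idmapping is name-based (user@domain), so the numeric UIDs can differ between client and server as long as the names resolve on both ends. A trivial sketch of the sort of sanity check you could run on the server; the username list is made up:

```python
# Sanity-check that client-side usernames also exist on the NFS server.
# NFSv4 idmapping matches on user@domain names, so the names must resolve
# here even though the numeric UIDs may differ from the clients'.
import pwd

client_users = ["alice", "bob", "carol"]   # hypothetical workstation users

for name in client_users:
    try:
        entry = pwd.getpwnam(name)
        print(f"{name}: ok (server uid {entry.pw_uid})")
    except KeyError:
        print(f"{name}: MISSING on server - idmapping will fall back to nobody")
```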
NFS has many problems but Kerberos support isn't one of them. It was designed from the start to be part of NFSv4, and is the most straightforward way to secure it.
v4 (or v4.1) added a few things for increased security. You can probably use LDAP/Kerberos with it. In my experience, R/W throughput with CIFS is fairly bad.