I've been going down this rabbit hole lately and it's really fascinating.
Workflow 1: feed a sawtooth oscillator through filters controlled by knobs. Eventually I realized it would always sound mechanical, no matter how many filters you stack. That led to
Workflow 2: feed a sawtooth oscillator through a convolution reverb that uses a custom impulse response. For impulse responses, use random sounds downloaded from the internet (like wood strikes), or mixtures of existing instrument sounds. But that felt limiting, so I moved on to
Workflow 3: generate an impulse-response WAV file with Python, and use that in a convolution reverb to filter the sawtooth. This gave me more interesting and configurable echoes, but then why start with an oscillator at all? So
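Here's a minimal sketch of what workflow 3 can look like, assuming numpy/scipy; the mode frequencies and decay times below are arbitrary example values, not anything specific from the thread:

```python
# Sketch of workflow 3: synthesize an impulse response, write it as a WAV,
# and convolve a sawtooth with it. Assumes numpy and scipy.
import numpy as np
from scipy.io import wavfile
from scipy.signal import fftconvolve, sawtooth

sr = 44100
t = np.arange(int(0.5 * sr)) / sr          # 0.5 s impulse response

# A few resonant "modes": (frequency Hz, decay time s), chosen by ear.
modes = [(220, 0.40), (460, 0.25), (950, 0.12), (1830, 0.06)]
ir = sum(np.sin(2 * np.pi * f * t) * np.exp(-t / tau) for f, tau in modes)
ir /= np.abs(ir).max()
wavfile.write("ir.wav", sr, ir.astype(np.float32))  # load into any convolution reverb

# Or apply it directly: a 1 s sawtooth at 110 Hz, filtered by the IR.
ts = np.arange(sr) / sr
saw = sawtooth(2 * np.pi * 110 * ts)
out = fftconvolve(saw, ir)
out /= np.abs(out).max()
wavfile.write("saw_through_ir.wav", sr, out.astype(np.float32))
```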
Workflow 4: write code to generate the sound on the fly, as a sequence of samples. This way I can mimic some nice properties of physical sounds, like "the nth harmonic has initial amplitude 1/n and decay time 1/n". I can also get inharmonicity, smearing of harmonics (like in the PADsynth algorithm), and other nice things that are out of reach if you start with periodic oscillators.
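A minimal sketch of that "1/n amplitude, 1/n decay" rule (the fundamental, harmonic count, and base decay are example choices, not values from the thread):

```python
# Sketch of workflow 4: generate samples directly from the rule
# "nth harmonic has initial amplitude 1/n and decay time 1/n".
import numpy as np
from scipy.io import wavfile

sr = 44100
t = np.arange(2 * sr) / sr   # 2 seconds
f0 = 220.0                   # fundamental (Hz)
base_decay = 1.0             # decay time of the fundamental (s)

out = np.zeros_like(t)
for n in range(1, 40):
    amp = 1.0 / n            # initial amplitude falls as 1/n
    tau = base_decay / n     # higher harmonics die out faster
    out += amp * np.exp(-t / tau) * np.sin(2 * np.pi * n * f0 * t)

out /= np.abs(out).max()
wavfile.write("pluckish.wav", sr, out.astype(np.float32))
```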
If I could go back and give advice to my months-younger self, I'd tell myself to skip oscillators and filters and jump straight into generating the sound with code. You need to learn some math about how sound works, but then you'll be unstoppable. For example, here's a short formula I came up with a month ago for generating a pluck-like sound: https://jsfiddle.net/yd4nv5Ls/ It's much simpler than doing the same with prebuilt bricks.
The whole experience made me suspect that there's an alternative approach to building modular synths, based on physical facts about sound (as opposed to either starting with oscillators, or going all the way to digital modeling of strings and bows). It would be similar to physically based rendering in graphics: for example, it would enforce a physically correct relationship between how high a harmonic is and how long it rings, and maybe other relationships about what happens to harmonics at the start of the sound, etc. But I'm not an expert and can't fully figure out how such a synth would work.
It seems like it would be easy to implement digitally, but it really isn't: a lot of the nuances and non-linearities that add weight and colour to synthesis with real electronics aren't present in simple digital emulations.
For pure DSP the choice is more or less between open additive; modal (which is a kind of constrained additive); AI-constrained additive, which is what Google have been playing with (e.g. their DDSP work); and physical modelling, which is digital modelling of strings and bows.
If you want to "enforce a physically correct relationship" between harmonics and their behaviour, you're going to want AI-constrained additive or physical modelling.
The aesthetics of all of this are a different topic altogether.
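To make "modal, a kind of constrained additive" concrete, here's a toy sketch (not anyone's production code): a bank of two-pole resonators rung by a single impulse, with all mode frequencies, decays, and gains invented for illustration.

```python
# A toy "modal" voice: excite a bank of two-pole resonators with one impulse.
import numpy as np
from scipy.signal import lfilter

sr = 44100
x = np.zeros(sr)
x[0] = 1.0                                   # unit impulse excitation

def resonator(x, f, tau, sr):
    """Two-pole filter ringing at f Hz with roughly tau seconds of decay."""
    r = np.exp(-1.0 / (tau * sr))            # pole radius sets the decay
    w = 2 * np.pi * f / sr                   # pole angle sets the pitch
    b, a = [1.0], [1.0, -2 * r * np.cos(w), r * r]
    return lfilter(b, a, x)

modes = [(196.0, 0.9, 1.0), (392.5, 0.5, 0.5), (590.0, 0.3, 0.3)]  # (f, tau, gain)
out = sum(g * resonator(x, f, tau, sr) for f, tau, g in modes)
out /= np.abs(out).max()
```

The point is the constraint: instead of freely drawing any spectrum, you only get to pick a handful of (frequency, decay, gain) triples, which is exactly where physically motivated rules could plug in.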
I was thinking of relationships like these:
1) Decay time of the nth partial falls off as some function of n.
2) Frequency of the nth partial is slightly different from n * fundamental, by a factor that is itself a function of n.
3) Spectrum of the nth partial isn't a delta function, but a hump whose width is a function of n.
All these ideas come from physical effects, but you can use them to generate sounds directly, without any physical modeling or AI. My hunch is that there could be more such ideas, and that they could play together nicely.
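As a sketch, here are all three relationships driving sample generation directly, with no physical model or AI. The fundamental, the inharmonicity coefficient B (a stiff-string-style formula), the decay scale, and the smear width are illustrative numbers, not values from the thread:

```python
# (1) decay time of partial n falls as 1/n,
# (2) partial n sits at n*f0*sqrt(1 + B*n^2), slightly sharp of n*f0,
# (3) each partial is a hump of width growing with n (PADsynth-like smearing,
#     done here crudely by summing a few randomly detuned sines per partial).
import numpy as np
from scipy.io import wavfile

sr = 44100
t = np.arange(3 * sr) / sr
f0, B = 110.0, 1e-4          # fundamental and inharmonicity coefficient
rng = np.random.default_rng(1)

out = np.zeros_like(t)
for n in range(1, 40):
    fn = n * f0 * np.sqrt(1 + B * n * n)   # relationship (2)
    tau = 1.5 / n                          # relationship (1)
    width = 0.5 * n                        # relationship (3), hump width in Hz
    for _ in range(4):                     # smear: random sines inside the hump
        f = fn + width * rng.standard_normal()
        phase = 2 * np.pi * rng.random()
        out += (0.25 / n) * np.exp(-t / tau) * np.sin(2 * np.pi * f * t + phase)

out /= np.abs(out).max()
wavfile.write("three_relationships.wav", sr, out.astype(np.float32))
```

Setting B to 0 and the width to near zero collapses this back to plain additive, which makes it easy to hear what each rule contributes.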
Yamaha experimented with this in the 90s with their VL series physical modeling synths, but it never caught on, I think mostly because convincing results really require alternative MIDI controllers, like a breath controller for woodwind and brass instruments.
An alternative take on why physical modeling synths never really caught on is GigaSampler. It was the first sampler (as far as I can remember) that could play back samples from hard disk, keeping only the first second or so of each sample in memory. This made it possible to have sampled instruments where, for example, each key of a piano was sampled at various velocity/loudness values, resulting in a sampled piano that could span multiple gigabytes. At a time when 128MB of RAM was still quite a lot, this was revolutionary. While physical modeling can produce convincing sounds with a potential expressiveness that no sample-based instrument will ever match, its base sound still doesn't sound as 'real' as a properly sampled instrument, recorded in a nice room with good microphones.
A simple overview, including some soft-synth alternatives: https://www.musicradar.com/news/tech/blast-from-the-past-yam...
An example breath controller: https://www.akaipro.com/ewi5000
A review of GigaSampler's successor: https://www.soundonsound.com/reviews/tascam-gigastudio-4
(It helps that some of my favorite producers and composers have used Pianoteq - for me that's Guy Sigsworth and Thomas G:Son, but the Ludovico Einaudi endorsement really clinches it for me.)
I'd much rather be able to just plug MIDI into a plug-in to get, say, a saxophone line for a song than have to buy a top-tier saxophone and learn to play it in a perfectly soundproofed room with a great microphone, DAC, etc.
You could also ask someone who already knows how to play the sax to do it for you, and use a MIDI-based sax sound as a stopgap until you have the score perfected.
They are also developing a completely new type of multidimensional embouchure sensing mouthpiece with which to play the instruments. It should be easy to learn but offer deep potential for expressiveness.
Also note that the goal of these instruments is not to faithfully emulate the timbre of any existing instrument (for that, use a sampler) but to emulate dynamic behavior, which is where the true expressiveness of wind instruments comes from.
Disclosure: I am developing the mouthpiece.
The famous Karplus–Strong algorithm used a burst of white noise as the excitation, but I've had more success using an asymmetric triangle-shaped impulse that resembles the shape of the drawn string.
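A minimal sketch of that variant (the pluck position, a quarter of the way along the delay line, is an example value, not necessarily what works best):

```python
# Karplus-Strong with an asymmetric triangular excitation instead of the
# classic white-noise burst.
import numpy as np
from scipy.io import wavfile

sr, f0, dur = 44100, 196.0, 2.0
L = int(round(sr / f0))                 # delay-line length sets the pitch

# Asymmetric triangle: the displaced shape of a string "drawn" at 1/4 of its length.
peak = L // 4
excite = np.concatenate([np.linspace(0, 1, peak, endpoint=False),
                         np.linspace(1, 0, L - peak)])

n = int(dur * sr)
out = np.zeros(n)
out[:L] = excite
for k in range(L, n):
    a = out[k - L]
    b = out[k - L - 1] if k - L - 1 >= 0 else 0.0
    # the two-point average in the feedback loop damps high partials faster
    out[k] = 0.5 * (a + b)

out /= np.abs(out).max()
wavfile.write("ks_triangle.wav", sr, out.astype(np.float32))
```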
A couple of friends and I are experimenting with collaborating on patches using GitHub and pull requests.
1. I was using VCV Rack on Windows, but I think my friend was not, so you may need to tweak Git's line-ending settings (see the .gitattributes sketch after this list)
2. Some changes you just don't want to merge. The other person could have different audio output settings, and there can be rounding errors in the knob values, so a knob that read exactly 1.0 can come back slightly off even if you never touched it, which makes merges a little noisier. But `git add -p` made short work of cleaning that up.
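For the line-ending issue in point 1, a one-line .gitattributes at the repo root should normalize things, assuming (as in recent VCV Rack versions) that .vcv patches are plain JSON text:

```
# .gitattributes: treat VCV Rack patches as text and normalize line endings,
# rather than relying on everyone's core.autocrlf setting
*.vcv text eol=lf
```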
Edit: This is also assuming that you coordinate who edits at any given time - "ping pong" is easier than free-for-all.
That said, I got myself a Moog Mother-32 recently, which is a hardware semi-modular synth, and I found it much more musical than software alternatives. I could immediately create much more musical pieces than I could with the software – even though I have MIDI peripherals for the software and the software can create any module I can imagine.
Another good option is SunVox; it's part modular synth, part tracker in a really cool UI. It includes a bunch of simple examples that demonstrate how to use the synth modules.
Back to the topic at hand. I've played with software gizmos, and recently picked up three secondhand units from the Korg Volca range. I'm not particularly musically talented (trying to change that over time), and I find the physical controls way way way more intuitive.