
Is multimedia programming inherently hard? - Brazilian-Sigma
I spent 100+ hours developing a system that performs fairly simple functionality: capture the screen/connected webcams on motion detection, add overlays, and save the result as .mp4. The system grew into a mess, so I rewrote it from scratch, spending another 60+ hours, this time with the ability to capture sources in parallel. It uses the Windows Media Foundation API and is written in 7000 lines of C++ code.

The thing is, it doesn't work and has nasty bugs on slower systems: it crashes once in a while, sometimes has audio/video synchronization issues, etc. The bugs are mostly very hard to reproduce. Prior to this, I considered myself a good programmer, but this has almost made me reevaluate whether I should pursue programming at all (I'm a CS student), not to mention the toll it took on my self-worth and mental health.

The question is: why, after so much time spent, is the end result mostly unusable? Is it the quirkiness of the Windows API, C++, or just my incompetence as a programmer? Or is multimedia programming tricky by nature?

P.S. Please ignore my username and new account; I'm too embarrassed to ask from my real account, which gives away my identity.
======
photawe
If I understand what you're saying correctly, you would need to pretty much be
medium-to-intermediate at multi-threaded programming + C++.

Capturing screen/webcams and adding extra layers can get really complex. If
you can't simulate and easily test it, it's a recipe for disaster.

The MF API is quite old (it relies on DirectX 9, as far as I remember).

C++/COM/multi-media is not such a good match - I always felt like puking when
having to deal with COM interfaces.

I would strongly suggest you switch to C# (note: I've programmed 13 years in
C++, and the last 9 in C#). There are pitfalls here too: namely, I'm not sure
if, or how easily, you can do multimedia without UWP (Universal Windows
Platform). UWP is a big can of worms, really hard to grasp, with a big learning
curve, but well worth it in time (if you want to see what you can achieve, you
can take a quick look at www.phot-awe.com - that's UWP).

Having said all the above - I highly suggest C#/UWP, but DO NOT expect an easy
ride. It will take quite a while to be productive, but that will be time well
spent.

------
thecupisblue
60+ hours is not that much tho.

> The thing is, it doesn't work

What doesn't work? Why doesn't it work? Is the source of the problems in code
or in the libraries?

Have you covered all the edge cases and handled all error states?

>and has nasty bugs on slower systems

Due to concurrency issues or?

>it crashes once in a while

That happens! Find/Write a reporting tool and analyse the traces :)

>Sometimes has audio/video synchronization issues

Smells like concurrency issues to me.

Hey, it's all good. You wrote something cool that connects a dozen different
technologies, did it all in C++ in a short time frame - and you're just a
student! That's great!

You're not a bad programmer - maybe you're just bad at system design/software
architecture. And that's not bad - that's great. A lot of folks go into the
field and work for a few years until they start learning about architecture
and experience what problems it can cause. You just had that experience, now
go read up and figure out how to rewrite it into V3. Take some extra time to
think about the models, the abstractions, the components, go through edge
cases in your head, figuring out if and how your system accounts for them.
Try not to overengineer it - you probably will - but don't think that means
you will suck forever, you will learn with time and experience.

Imagine if Picasso quit after his first sucky drawing. Maybe even HN wouldn't
be here.

~~~
Brazilian-Sigma
Thanks for your answer, I think I want to address a few points.

>What doesn't work? Why doesn't it work? Is the source of the problems in code
or in the libraries?

>Due to concurrency issues or?

It's difficult to name one, but probably the biggest issue at the moment is
that the frame rate randomly drops at some point and the resulting video is
garbled (1 minute is played in a few seconds, etc.).

Concurrency is possibly one of the reasons. The Media Foundation encoder
expects video and audio frames at regular intervals, and if some frames drop
because of performance issues, it gets into a vicious cycle: frame drops
accelerate until it comes to a standstill.
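
One way to picture the "1 minute is played in a few seconds" symptom, as a minimal Python sketch rather than a diagnosis of the actual code: if the frames that survive a drop are stamped by counting them at the nominal rate, the timeline shrinks, whereas stamping them from the capture clock keeps the real duration. The function names and the 2 fps delivery figure are made up for illustration; the 100-nanosecond unit is the one Media Foundation uses for sample times.

```python
HNS_PER_SECOND = 10_000_000  # Media Foundation sample times are in 100 ns units


def stamp_by_index(num_frames, nominal_fps):
    """Timestamps that assume no frame was ever dropped (hypothetical helper)."""
    return [i * HNS_PER_SECOND // nominal_fps for i in range(num_frames)]


def stamp_by_clock(capture_times_s):
    """Timestamps taken from the capture clock, relative to the first frame."""
    start = capture_times_s[0]
    return [round((t - start) * HNS_PER_SECOND) for t in capture_times_s]


# Assumed scenario: a 30 fps source on a slow machine that only manages to
# deliver 2 frames per second over one minute (120 frames in total).
capture_times_s = [i * 0.5 for i in range(120)]

by_index = stamp_by_index(len(capture_times_s), nominal_fps=30)
by_clock = stamp_by_clock(capture_times_s)

print("last timestamp, index-based (s):", by_index[-1] / HNS_PER_SECOND)
print("last timestamp, clock-based (s):", by_clock[-1] / HNS_PER_SECOND)
```

With index-based stamps the minute of footage collapses to about four seconds of playback; with clock-based stamps it stays at roughly a minute, just at a lower frame rate.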

>You're not a bad programmer - maybe you're just bad at system design/software
architecture.

<rant> Unfortunately, I think I am because I suck at algorithmic challenges
(competitive programming), system design, reading documentation for external
APIs; in short, nearly everything there is to programming. I learned programming
at 13, so I have been programming for quite some time, and yet others
who started in university went on to do amazing things while I suck at the basics.
I dreamed of starting a tech startup someday, but it seems unlikely now.</rant>

>figure out how to rewrite it into V3

My client is a SOB who neither appreciates my efforts nor rewards them, so it's
unlikely. Besides, I have signed an NDA, so starting an open source project is
out of the question. I'm thinking of starting some other projects, though.

>Imagine if Picasso quit after his first sucky drawing. Maybe even HN wouldn't
be here.

I liked that lowkey compliment but I doubt I am worthy of this.

------
psyklic
You show admirable qualities that good programmers have. For example --
boldness to take on a challenging project, tenacity to pursue a rewrite, and
drive to pursue improvements such as parallelism. After all, if you only
pursue easy projects then you are unlikely to improve your skills!

There are definitely challenges involved with multimedia apps. They demand
good performance, and they require you to rely on third-party libraries and
drivers.

I assume you are decent at debugging to already have gotten as far as you
have. So, I would continue to take guesses at what the problem might be then
test whether you are correct. Based on your description, perhaps the CPU
cannot always keep up with the framerate. You could test this (for example) by
seeing if rendering a lower-resolution output would alleviate some of the
bugs.
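
A minimal Python sketch of how that guess could be tested with numbers instead of by eye: time each frame's processing and count how often it exceeds the per-frame budget of 1/fps. `process_frame` stands in for whatever the real pipeline does per frame, and both helper names are hypothetical.

```python
import time


def time_frames(process_frame, frames):
    """Run process_frame on each frame and return the seconds each call took."""
    durations = []
    for frame in frames:
        t0 = time.perf_counter()
        process_frame(frame)
        durations.append(time.perf_counter() - t0)
    return durations


def overrun_fraction(durations_s, fps):
    """Fraction of frames whose processing took longer than the 1/fps budget."""
    if not durations_s:
        return 0.0
    budget_s = 1.0 / fps
    late = sum(1 for d in durations_s if d > budget_s)
    return late / len(durations_s)


# Example with made-up timings: at 30 fps the budget is about 33 ms per frame,
# so the 50 ms and 40 ms frames below count as overruns.
sample_durations_s = [0.010, 0.020, 0.050, 0.040]
print(overrun_fraction(sample_durations_s, fps=30))
```

Running the same measurement at full and at reduced resolution would show whether the overrun fraction actually falls, which is the comparison suggested above.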

If none of your guesses seem to work, pare your code down to make it as simple
as possible (e.g. a basic webcam app without overlays). Once you get it
reliably working, then slowly re-add your features. Worst case, you can always
post-process the video after streaming it to disk, or have a tool such as
FFmpeg add the overlay.

Good luck!

~~~
Brazilian-Sigma
I am starting to appreciate the incremental advice. As for taking on
challenging projects: on the surface, this seems like a very trivial problem to
solve. People keep talking about hard "technical problems" (you know, "what's
the hardest technical problem you solved?" is a common interview question).

A particular question piques my interest: if this is a simple problem, what
are the hard problems and can I even approach them if I fail at a simple one?
Or alternatively, if this is a hard problem, why is it hard despite delivering
little value to the end user, and what are the simple problems?

~~~
psyklic
I suggest focusing on problems that interest and challenge you, and not worry
if others consider them "hard" or not. The interview question probably depends
more on how you explain the problem, rather than its actual difficulty.

------
lucozade
Multimedia is tricky but, from the way you describe it, you've come across the
more general problem that writing concurrent software in general, and parallel
software in particular, can be a real sod.

What you're likely to be experiencing are bugs due to race conditions. These
are notoriously hard to debug.

It's one of the reasons why there's so much fuss around languages like Rust
and Go. They have builtin features that try to make these types of issue less
prevalent. I appreciate this doesn't help you directly but should make you
feel a bit better; there are whole engineering teams working to avoid the
types of issues that you're experiencing.

In terms of how to help you move forward, that's quite hard in this format.
Sorry.

~~~
Brazilian-Sigma
I appreciate your comment. You mentioned Rust and Go; it never fails to amaze
me how languages differ in their power, as Paul Graham would put it.

As a sidenote, I wanted to use Python/Electron.js for it but my client
wouldn't allow it. I had to convince him to use Media Foundation, else he
would have asked me to write an H264 encoder from scratch!

>In terms of how to help you move forward, that's quite hard in this format.
Sorry.

Didn't get it, do you mean the way I formatted my question? I'm sorry but I'm
not a native English speaker.

~~~
lodi
Your formatting and English are fine; parent poster meant that it's hard to
give good advice over Hacker News comments.

------
saltcured
It sounds like you are talking about real-time data capture, analysis,
visualization, encoding, and archiving. If so, what you describe is not just
multimedia. It is a real-world hybrid of multiple specialities, and definitely
becomes more difficult because you have to address all these aspects at once
to meet your original goal.

If you had started with pre-recorded videos, you could imagine having pursued
the challenges incrementally. Process a video and output some frame-by-frame
analysis metadata. Process the analysis results and visualize them in the
context of the source video imagery. Re-encode the results to a good storage
format. Explore options for quality, correctness, performance, and resource
requirements.

But, if you need to do this on live streams and run for indefinite periods of
time, you suddenly need to be much more aware of computational delays, working
memory sizes, buffering of input and output data, etc. You need it to work
consistently and predictably while handling the full workload and any
potential variability in other elements of the system. How much can the
incoming video stream vary in bandwidth? How much can decoding, analysis,
visualization, and encoding speeds vary with changes in the video content? How
reliably can the input and output IO paths support the bandwidth?

And, as you mentioned, it becomes challenging to then squeeze this into a
slower system than it was originally designed and tested on. Ultimately, you
have to design for failure: think about how you want it to perform if it
cannot satisfy the full objective in the time and resources available. Drop
frames? Degrade stream quality? Drop entire low-priority streams...?
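
As one illustration of the "drop frames" option, here is a minimal Python sketch of a bounded buffer between capture and encoding that discards the oldest frame when the consumer falls behind, and counts the drops so they show up in logs. The class name and the drop-oldest policy are assumptions chosen for the example, not a recommendation for this particular system.

```python
import threading
from collections import deque


class DropOldestQueue:
    """Bounded frame buffer: when full, the oldest frame is discarded."""

    def __init__(self, capacity):
        # A deque with maxlen silently discards from the left on append.
        self._frames = deque(maxlen=capacity)
        self._lock = threading.Lock()
        self.dropped = 0  # number of frames discarded so far

    def put(self, frame):
        with self._lock:
            if len(self._frames) == self._frames.maxlen:
                self.dropped += 1
            self._frames.append(frame)

    def get(self):
        """Return the oldest buffered frame, or None if the buffer is empty."""
        with self._lock:
            return self._frames.popleft() if self._frames else None


# The producer pushes 5 frames while the consumer is stalled; only the newest
# 3 survive, and the queue reports that 2 were dropped.
queue = DropOldestQueue(capacity=3)
for frame_id in range(5):
    queue.put(frame_id)
print(queue.dropped, queue.get())
```

The point of the design is that memory use and latency stay bounded on a slow machine, and the failure is visible as a counter rather than as a gradually growing backlog.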

------
kleer001
Yes, it's hard. Plugging anything into the real world at real time is hard.
That is, constraining data to certain bandwidths to play or compress. The real
world is noisy at every level and that's difficult to program around.

------
cyberdrunk
Your intuition is correct, programs which have to do complex processing on
lots of data (e.g. a video feed) in real-time are just inherently very hard.

------
s1t5
Not an answer to your question but I'm struggling with getting anything done
at all and I'm impressed by the amount of work that you've put into this.

------
brudgers
You are not your code.

------
lodi
Don't beat yourself up about it. My impression is that you're probably an
above-average coder.

First of all, doubting your own competence is a _good_ sign. Especially
combined with a willingness to ask questions and act on feedback. I'd be
extremely skeptical hiring someone who had no self-doubt.

It sounds like you just tackled too many novel things at the same time:

- 160h, 7000 line project is a very large project for a student. Managing
software complexity is its own skill.

- Multithreaded programming is hard.

- Performant, low-latency programming is hard.

- "Systems" programming is hard.

- Video capture, codecs, etc. are a narrow niche. Microsoft COM APIs are not
exactly the simplest things...

- C++ is a particularly unforgiving language. It doesn't help that the
language is so versatile that 90's C++ libraries/tutorials/etc look nothing
like "modern" C++ from the 00's, which was itself obsoleted by more "modern"
C++ from the 10's...

In terms of advice, I get the impression that you're not looking forward to
writing a third version of this project, so here are a few generic tips for
whatever you choose to do next:

- It sounds like you wrote the code that does whatever the application is
supposed to do, but didn't write the supporting tests/tooling/etc. For
example, elsewhere you mention that you think the application might be
dropping frames due to this or that, but you're not sure. The next step--and
honestly you need to do this preemptively even before there are any problems--
is to add very detailed logging to provide you with hard data. Furthermore, in
an audiovisual application like this, you'll probably have to write
specialized tools to parse and analyze those logs graphically since they'll be
dumping a tremendous amount of data to disk. Don't be afraid to write more
test and tool code than main application code.

- Sometimes it's worth it to take some extra time to perform "studies": code
that you write to experiment with something, then throw away. For example,
elsewhere in this thread you asked if it would be better to buffer 10 frames
at a time or to process frames synchronously. Well, this is a great
opportunity to do some "science": think about what you expect will happen,
fork the code to a new branch in source control, implement the change in a
quick-and-dirty way, and check the results. If it confirmed your hypothesis
and helped, switch back to your original branch and re-implement it in a clean
way. If it confirmed your prediction and didn't help, switch back to your
original branch and just make a note of this experiment in the documentation.
Negative results like this are important since they can reduce the surface
area of things you need to troubleshoot later. If the result was surprising in
some sense, investigate further, and post to IRC or Stack Overflow if you get
stuck.

- Regarding "solving hard problems". Forget about how hard or easy something
is in an absolute sense. I wouldn't even worry too much about comparing
yourself with peers. Just worry about self improvement. And the best way to
improve yourself is to introduce a little bit of novelty at a time. For
example, if you're learning a new language, re-implement some toy application
you've already written before in C++. Don't try to learn a new language while
also introducing known-hard things like networking, concurrency, etc.
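
To make the logging tip above concrete, here is a minimal Python sketch of the kind of small analysis tool it describes: it reads per-frame log lines and reports where frames arrived too far apart. The log format (a millisecond timestamp, a stream name, an event) and the function name are invented for the example; a real tool would match whatever the application actually writes.

```python
def find_gaps(log_lines, expected_interval_ms, tolerance=1.5):
    """Return (previous_ms, current_ms) pairs where frames arrived too late.

    Each non-empty line is assumed to start with an integer timestamp in
    milliseconds, e.g. "66 webcam0 frame".
    """
    times = [int(line.split()[0]) for line in log_lines if line.strip()]
    limit_ms = expected_interval_ms * tolerance
    return [(a, b) for a, b in zip(times, times[1:]) if b - a > limit_ms]


# Made-up log from a roughly 30 fps stream with one stall after 66 ms.
log_lines = [
    "0 webcam0 frame",
    "33 webcam0 frame",
    "66 webcam0 frame",
    "200 webcam0 frame",
    "233 webcam0 frame",
]
for previous_ms, current_ms in find_gaps(log_lines, expected_interval_ms=33):
    print(f"gap of {current_ms - previous_ms} ms after t={previous_ms} ms")
```

Even a tool this small turns "I think frames are being dropped" into a list of exact moments to correlate with CPU load, disk writes, or other streams.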

