
Genuine question: I'm aware that 100% automation of pretty much all agricultural processes is likely highly impractical, if not borderline infeasible. As someone with no experience in either robotics or agriculture, are you saying that chasing the "100%" is bad, or that we're already mostly past the point of diminishing returns? My uninformed opinion is that I agree with the statement "let farmers farm", but we should keep looking at robots / software / AI for ways to let farmers farm more for less. Keeping the mindset of building tools that enable individuals to do more is essential, more so than the idea of replacing the farmer.


This is part of the misconception.

Many agricultural processes are already automated, in the field.

Combines drive themselves, drones handle spraying, and on and on. If you go to a real, modern farm you will find a ton of automation. Farms don't need to be vertical or indoors for that.


Farming is HUGELY automated. It's not automation that is the problem; it's people outside of farming thinking they can improve processes without FIRST understanding what they are doing.

For example, vertical farming aims to replace the SUN with LED lights. The "garden bot" aims to replace rain and watching the weather with a watering robot. These are not the things plants "want".


I came back to this reply after a bit of research and investigation into our own stack. Admittedly, I ended up in this role after hopping around the startup scene a little bit, but after doing a little digging... honestly, I am a little embarrassed as an MLOps person that I have only barely heard of this name and never really looked into it. We have lots of in-house tooling that tries to imitate what Airflow does, but we spend a lot of time maintaining it ourselves. It might make a lot of sense if this just works for our cloud environment, and it would lift a huge burden off the rest of our engineering team so they could go work on other stuff.
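For anyone else in the same boat, the core abstraction looks refreshingly small. A rough sketch of what one of our jobs might look like as a DAG, with made-up names and assuming the Airflow 2.x PythonOperator import path:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator


    def retrain_model():
        # Placeholder for the in-house training job we'd be migrating.
        ...


    with DAG(
        dag_id="nightly_retrain",        # hypothetical pipeline name
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        PythonOperator(task_id="retrain", python_callable=retrain_model)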

Thanks again. I am a big fan of stable, mature software that just does its job, and this looks exactly like that.


I think I will read this multiple times and look into Apache Airflow. Thank you for this.


7 years of the same title format is all you need.


I am building a TCG card management app as a way to keep my machine learning skills sharp and to learn iOS development.

Pretty standard CRUD flows + plenty of opportunity to get comfy with iterating and deploying software on iOS.

This app will basically be the playground for me to work on:

- Frontend / UX
- Computer vision (scan in cards with the camera)
- Recommender systems (what cards are good suggestions to add to this deck?)
- Clustering (what decks are similar to this deck?)

Websites like TappedOut and Archidekt are great products, but as someone who plays casually and doesn't do much to keep up with releases, I find assembling fun Commander decks from singles to be quite a time investment. I'd like to see if I can focus on that UX specifically for myself.


Tangentially related to the post: I have what I think is a related computer vision problem I would like to solve and need some pointers on how you would go about doing it.

My desk is currently set up such that I have a large monitor in the middle. I'd like to look at the center of the screen when taking calls, and I'd also like it to appear as though I am looking straight into the camera and the camera is pointed at my face. Obviously, I cannot physically place the camera right in front of the monitor, as that would be seriously inconvenient. Some laptops solve this, but I don't think their methods apply here, as the top of my monitor ends up quite a bit higher than what would look "good" for simple eye correction.

I have multiple webcams that I can place around the monitor to my liking. I would like to have something similar to what is seen when you open this webpage, but for video, and hopefully at higher quality since I'm not constrained to a monocular source.

I've dabbled a bit with OpenCV in the past, but the most I've done is a little camera calibration for de-warping fisheye lenses. Any ideas on what work I should look into to get started with this?

In my head, I'm picturing two camera sources: one above and one below the monitor. The "synthetic" projected perspective would be in the middle of the two.

Is capturing a point cloud from a stereo source and then reprojecting with splats the most "straightforward" way to do this? Any and all papers/advice are welcome. I'm a little rusty on the math side but I figure a healthy mix of Szeliski's Computer Vision, Wolfram Alpha, a chatbot, and of course perseverance will get me there.


This is a solved problem on some platforms (Zoom and Teams), which alter your eyes so they look like they are staring into the camera. Basically you drop your monitor down low (so the camera is more centered on your head) and let software fix your eyes.

If you want your head to actually be centered, there are also some "center screen webcams" that plop into the middle of your screen during a call. There are a few types: thin webcams that drape down, and clear "webcam holders" that hold your webcam at the center of your screen, which are a bit less convenient.

Nvidia also has a software package you can use, but I believe it is a bit fiddly to get set up.


> Some laptops solve this, but I don't think their methods apply here, as the top of my monitor ends up quite a bit higher than what would look "good" for simple eye correction.

I appreciate the pragmatism of buying another thing to solve the problem but I am hoping to solve this with stuff I already own.

I’d be lying if I said the nerd cred of overengineering the solution wasn’t attractive as well.


If you want overengineered and some street cred, instead of changing the image to make it seem like you're looking in a new place, how about creating a virtual camera exactly where you want to look, from a 3D reconstruction?

Here's how I'd have done it in grad school a million years ago (my advisor was the main computer vision teacher at my uni).

If you have two webcams, you can put them on either side of your monitor at eye level (or halfway up the monitor), do stereo reconstruction in real time (using, e.g., OpenCV), create an artificial viewpoint between the two cameras, and re-project the reconstruction to the point that is the average of the two camera positions to create a new image. Then feed that image to a virtual camera device; the Zoom call connects to the virtual camera device. (On Linux this might be as simple as setting up a /dev/ node.)

It's much easier to reconstruct a little left / right of a face when you have both left and right images, than it is to reconstruct higher / lower when you have only above or below. This is because faces are not symmetric up/down.

This would work. It would be kinda janky, but it can be done in real time with modern hardware using cheap webcams, Python, and some coding.
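A minimal sketch of that core loop with OpenCV, assuming you've already done the stereo calibration/rectification; K, Q, and the baseline below are placeholders for your own calibration output, and the splatting is deliberately naive:

    import cv2
    import numpy as np

    # Assumes both cameras are already calibrated and rectified
    # (cv2.stereoCalibrate / cv2.stereoRectify). K, Q, and the baseline are
    # placeholders -- swap in your own calibration output.
    K = np.array([[700.0, 0.0, 320.0],
                  [0.0, 700.0, 240.0],
                  [0.0, 0.0, 1.0]])
    Q = np.eye(4)            # disparity-to-depth matrix from stereoRectify
    BASELINE = 0.30          # metres between the two webcams

    cap_l, cap_r = cv2.VideoCapture(0), cv2.VideoCapture(1)
    stereo = cv2.StereoSGBM_create(minDisparity=0, numDisparities=96, blockSize=7)

    while True:
        ok_l, left = cap_l.read()
        ok_r, right = cap_r.read()
        if not (ok_l and ok_r):
            break

        disparity = stereo.compute(
            cv2.cvtColor(left, cv2.COLOR_BGR2GRAY),
            cv2.cvtColor(right, cv2.COLOR_BGR2GRAY),
        ).astype(np.float32) / 16.0

        # Back-project valid pixels to a 3D point cloud in the left camera frame.
        points = cv2.reprojectImageTo3D(disparity, Q)
        mask = disparity > 0
        pts, cols = points[mask].astype(np.float64), left[mask]
        if len(pts) == 0:
            continue

        # Re-project into a virtual camera halfway between the two real ones:
        # shift the viewpoint by half the baseline along the stereo axis.
        tvec = np.array([BASELINE / 2.0, 0.0, 0.0])
        uv, _ = cv2.projectPoints(pts, np.zeros(3), tvec, K, np.zeros(5))
        uv = uv.reshape(-1, 2).astype(int)

        synth = np.zeros_like(left)
        ok = (uv[:, 0] >= 0) & (uv[:, 0] < synth.shape[1]) & \
             (uv[:, 1] >= 0) & (uv[:, 1] < synth.shape[0])
        synth[uv[ok, 1], uv[ok, 0]] = cols[ok]   # naive point splat, holes and all

        cv2.imshow("synthetic viewpoint", synth)
        if cv2.waitKey(1) == 27:                 # Esc to quit
            break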

The hardest part is creating the virtual webcam device that the Zoom call would connect to, but my guess is there's a pip package for that.
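I believe pyvirtualcam is one such package (it sits on top of v4l2loopback on Linux, if I remember right). Roughly, with render_synthetic_view as a stand-in for the reprojection loop above:

    import numpy as np
    import pyvirtualcam   # pip install pyvirtualcam

    def render_synthetic_view():
        # Stand-in for the stereo reprojection loop above; must return an
        # RGB uint8 array of shape (480, 640, 3).
        return np.zeros((480, 640, 3), dtype=np.uint8)

    with pyvirtualcam.Camera(width=640, height=480, fps=30) as cam:
        while True:
            cam.send(render_synthetic_view())
            cam.sleep_until_next_frame()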

Any imager would do, but quality would improve with:

* Synchronized capture, e.g., an edge-triggered camera with, say, a Raspberry Pi triggering capture

* Additional range information, say, from a Kinect or cell phone lidar

* A little delay to buffer frames so you can do time-series matching and interpolation


Have you seen the work done with multiple Kinect cameras in 2015? https://www.inavateonthenet.net/news/article/kinect-camera-a...

Creating a depth field with a monocular camera is now possible, so that may help you get further with this.
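For instance, something like MiDaS via torch.hub gets you a relative depth map from a single webcam frame. A rough sketch (the repo and entry-point names are as I remember them from the MiDaS readme, so double-check):

    import cv2
    import torch

    # Relative depth from a single frame with MiDaS.
    midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
    midas.eval()
    transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform

    frame = cv2.cvtColor(cv2.imread("webcam_frame.png"), cv2.COLOR_BGR2RGB)
    with torch.no_grad():
        pred = midas(transform(frame))
        depth = torch.nn.functional.interpolate(
            pred.unsqueeze(1), size=frame.shape[:2],
            mode="bicubic", align_corners=False,
        ).squeeze().numpy()   # relative depth, not metric, but fine for experiments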


One approach you could try is to use the webcam input to create a deepfake that you place onto a 3D model, one you can rotate around.

It should be doable real-time, but might be stuck in the uncanny valley.

Also maybe look at what Meta and Apple's Vision Pro are doing to create their avatars.


Nvidia Broadcast does a pretty good job of deepfaking your eyes to look at the camera... then again, that's not as fun.


FaceTime can do the eye correction.


If you really want to see some esoteric computer architecture ideas, check out Mill Computing: https://millcomputing.com/wiki/Architecture. I don't think they've etched any of their designs into silicon, but very fascinating ideas nonetheless.


Something which wasn't addressed fully but might be worth discussing further: Both Unity and Unreal Engine 4/5 make use of Vulkan, and thus every game made with these engines which runs on Windows and Linux almost certainly is using Vulkan somewhere (please correct me if I'm wrong!). I have a very hard time believing that people are making fewer games with these engines now.

This isn't to say that Windows games built on these engines aren't running entirely on DX12 code, either. I think most games these days give you the choice to pick your graphics backend. It's an impossible ask, but I'd love to see the stats for Unreal/Unity graphics API usage across games.

That being said, on the iOS/macOS front, I don't know what these games are using to deploy to that platform. It could be that they use MoltenVK, but I could also see them using OpenGL or their own Metal rendering pipelines. As someone who grew up gaming on a desktop PC, I forget that smartphones and tablets are the future of the gaming industry. It felt weird, but simultaneously really cool, to see Apple showing someone sitting on their couch with their iPhone 15 connected to the television, playing No Man's Sky via a Bluetooth controller.

It appears to me that languages born out of a design-by-committee process struggle to make anyone exceptionally happy, because the only way the language moves forward is by keeping all of the committee's members equally miserable, or by masquerading as one language when in reality it's closer to five different languages held together by compiler flags and committee meetings.


To a first approximation, games using Unity/UE use D3D on Windows, OpenGL and Vulkan on Android, and Metal on iOS. Native Linux builds are not worth it; Proton (which uses Vulkan) is good enough.

Vulkan is not really a design-by-committee API; it is pretty much what happens when an IHV (AMD in this case) with poorly performing drivers gets to design an API (Mantle) without any adults in the room. D3D12 strikes a somewhat better balance, and Metal a much better one, in terms of usability.


The design-by-committee point is spot on; if anything, AMD rescued OpenGL vNext from turning into yet another Longs Peak, or OpenCL 2.0.

Had it not been for them offering Mantle to Khronos, to this day you would most likely be getting OpenGL 5 with another batch of extensions; not that Vulkan isn't already an extension soup anyway.


I know Unity at least defaults to DX11, and I believe it doesn't package other renderers at all unless you explicitly enable them in Project Settings or use a rendering feature that requires them. I can't imagine many people are digging into the renderer settings without good reason.


Perhaps I am projecting my experience from the before times, when GPUs only supported certain versions of DirectX, or certain features were causing crashes on different systems.

It was never super common, but I remember doing it recently for Path of Exile while trying to get the most out of my MacBook's performance.


Both engines support DirectX 11 and 12, Metal, Vulkan, GNM and GNMX, and NVN.

Mostly game builds tend to use the native option of the target platform by default.


This is the sort of thing I expected to see when Chris Lattner moved to Google and started working on the Swift for Tensorflow project. I am so grateful that someone is making it happen!

I remember being taught how to write Prolog in university, and then being shown how close the relationship was between building something that parses a grammar and building something that generates valid examples of that grammar. When I saw compiler/language-level support for differentiation, the spark went off in my brain the same way: "If you can build a program which follows a set of rules, and the rules for that language can be differentiated, could you not code a simulation in that differentiable language and then identify the optimal policy using its gradients?"
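To make that intuition concrete, here's a toy version of the idea in Python, with PyTorch autograd standing in for language-level differentiation; the "simulation" and its numbers are made up:

    import torch

    # Toy "differentiable simulation": a point mass pushed toward a target,
    # where the policy is a single gain k that we optimize by differentiating
    # through the whole rollout.
    def simulate(k, steps=50, dt=0.1, target=5.0):
        x = torch.tensor(0.0)
        v = torch.tensor(0.0)
        for _ in range(steps):
            force = k * (target - x)   # the "policy": proportional control
            v = 0.9 * v + dt * force   # damped dynamics
            x = x + dt * v
        return (x - target) ** 2       # cost of the final state

    k = torch.tensor(1.0, requires_grad=True)
    opt = torch.optim.SGD([k], lr=0.01)
    for _ in range(100):
        opt.zero_grad()
        loss = simulate(k)             # gradient flows back through the rollout
        loss.backward()
        opt.step()
    print(k.item(), simulate(k).item())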

Best of luck on your work!


Thanks! You may find DeepProbLog by Manhaeve et al. interesting, which brings together logic programming, probabilistic programming and gradient descent/neural networks. Also, more generally, I believe in the field of program synthesis there is some research on deriving programs with gradient descent. However, as also pointed out in the comment below, gradient descent may not always be the best approach to such problems (e.g., https://arxiv.org/abs/1608.04428).


>> "If you can build a program which follows a set of rules, and the rules for that language can be differentiated, could you not code a simulation in that differentiable language and then identify the optimal policy using it's gradients?"

What's a "policy" here? In optimal control (and reinforcement learning) a policy is a function from a set of states to a set of actions, each action a transition between states. In a program synthesis context I guess that translates to a function from a set of _program_ states to a set of operations?

What is an "optimal" policy then? One that transitions between an initial state and a goal state in the least number of operations?

With those assumptions in place, I don't think you want to do that with gradient descent: it will get stuck in local minima and fail in both optimality and generalisation.

Generalisation is easier to explain. Consider a program that has to traverse a graph. We can visualise it as solving a maze. Suppose we have two mazes, A and B, as below:

        A               B
  S □ □ ■ □ □ □   S □ □ ■ □ □ □ 
  ■ ■ □ ■ □ ■ □   ■ ■ □ ■ □ ■ □ 
  □ ■ □ ■ □ ■ □   □ ■ □ ■ □ ■ □ 
  □ ■ □ ■ ■ ■ □   □ ■ □ ■ ■ ■ □ 
  □ ■ □ ■ □ □ □   □ ■ □ ■ □ □ □ 
  □ ■ □ ■ □ ■ □   □ ■ □ ■ □ ■ □ 
  □ □ □ □ □ ■ E   E □ □ □ □ ■ □ 
Black squares are walls. Note that the two mazes are identical but the exit ("E") is in a different place. An optimal policy that solves maze A will fail on maze B, and vice versa. Meaning that for some classes of problem there is no policy that is optimal for every instance in the class, and finding an optimal solution requires computation. You can't just set some weights in a function and call it a day.

It's also easy to see which classes of problems are not amenable to this kind of solution: any decision problem that cannot be solved by a regular automaton (i.e. one that is no more than regular). Where there's branching structure that introduces ambiguity (think of two different parses for one string in a language), you need a context-free grammar or above.

That's a problem in Reinforcement Learning where "agents" (i.e. policies) can solve any instance of complex environment classes perfectly, but fail when tested in a different instance [1].

You'll get the same problem with program synthesis.

___________

[1] This paper:

Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit Partial Observability

https://arxiv.org/abs/2107.06277

makes the point with what felt like a very convoluted example about a robotic zoo keeper looking for the otter habitat in a new zoo etc. I think it's much more obvious what's going on when we study the problem in a grid like a maze: there are ambiguities and a solution cannot be left to a policy that acts like a regular automaton.


Thanks for taking the time to explain such a worked-out example. I was indeed picturing something along the lines of "If you could write a program equivalent to a game where you solve a maze, could you produce a maze-solver program if the game were made in this runtime?"


Not really. The world of Bayesian modelling has much fancier tools: Hamiltonian MC. See MC Stan. There have also been Gibbs samplers and other techniques that support discrete decisions for donkey's years.

You can write down just about anything as a BUGS model, for example, but "identifying the model" (finding the uniquely best parameters, even though it's a global optimisation) is often very difficult.

Gradient descent is significantly more limiting than that. Worth understanding MC. The old school is a high bar to jump.
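For anyone who hasn't touched it, the core of MC is tiny. A toy random-walk Metropolis sampler for a standard normal target (nothing like the HMC/NUTS machinery Stan actually runs) looks like:

    import numpy as np

    # Toy random-walk Metropolis sampler for a standard normal target --
    # purely illustrative.
    def log_target(x):
        return -0.5 * x ** 2

    rng = np.random.default_rng(0)
    x, samples = 0.0, []
    for _ in range(10_000):
        proposal = x + rng.normal(scale=1.0)
        if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
            x = proposal                       # accept; otherwise keep the old state
        samples.append(x)
    print(np.mean(samples), np.std(samples))   # should land near 0 and 1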


I wrote a Gibbs Sampler to try and fit a Latent Dirichlet Allocation model on arXiv abstracts many moons ago! I'd probably have to start from primitive stuff if I were to give it another go today.

I agree with everything you've said so far: getting to the point where you can use gradient descent to solve your problem often requires simplifying your model down to the point where you're not sure how well it represents reality.

My lived experience (and perhaps this is just showing my ignorance) is that I've had a much harder time getting anything Bayesian to scale up to larger datasets, and every time I've worked with graphical models it's been such a PITA compared to what we're seeing now, where we can slap a Transformer layer on some embeddings and get a decent baseline. The Bitter Lesson has empowered the lazy, proverbially speaking.

Tensorflow has a GPU-accelerated implementation of Black Box Variational Inference, and I've been meaning to revisit that project for some time. No clue about their MC sampler implementations. Then I stumbled across https://www.connectedpapers.com/ and Twitter locked up its API, so admittedly both of those took a lot of the wind out of my sails.

Currently saving up my money so that I can buy Kevin Murphy's (I think he's on here as murphyk) two new books that were released not too long ago: https://probml.github.io/pml-book/. The draft PDFs are on the website, but unfortunately I'm one of those people who can't push themselves to actually read a text if it's not something I can hold in my hands.


MC: Monte Carlo


I have been planning to work on something like this. I think that eventually someone will crack the "binary in -> good source code out" LLM pipeline, but we are probably a few years away from that still. I say a few years because I don't think there's a huge pile of money sitting at the end of this problem, but maybe I'm wrong.

A really good "stop-gap" approach would be to build a decompilation pipeline using Ghidra in headless mode and then combine the strict syntax correctness of a decompiler with the "intuition/system 1 skills" of an LLM. My inspiration for this setup comes from two recent advancements, both shared here on HN:

1. AlphaGeometry: The Decompiler and the LLM should complement each other, covering each other's weaknesses. https://deepmind.google/discover/blog/alphageometry-an-olymp...

2. AICI: We need a better way of "hacking" on top of these models, and something like AICI could be the "glue" that coordinates the generation of C source. I don't really want the weights of my LLM to be used to generate syntactically correct C source; I want the LLM to think in terms of variable names, "snippet patterns" and architectural choices while other tools (Ghidra, LLVM) worry about the rest. https://github.com/microsoft/aici
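For the Ghidra half of that stop-gap pipeline, headless mode is scriptable enough that the glue could be as small as the sketch below; the paths and the post-script name are hypothetical, stand-ins for whatever you'd actually write against Ghidra's decompiler API:

    import subprocess
    from pathlib import Path

    # Rough glue for the "strict" half of the pipeline: run Ghidra's headless
    # analyzer over a binary and hand the decompiled C to an LLM for naming and
    # structure. GHIDRA_DIR, the output path, and ExportDecompiledC.java are all
    # assumptions -- point them at your own setup.
    GHIDRA_DIR = Path("/opt/ghidra")
    BINARY = Path("./target.bin")

    subprocess.run(
        [
            str(GHIDRA_DIR / "support" / "analyzeHeadless"),
            "/tmp/ghidra_projects", "decomp_project",
            "-import", str(BINARY),
            "-postScript", "ExportDecompiledC.java",   # hypothetical post-script
            "-scriptPath", "./ghidra_scripts",
            "-deleteProject",
        ],
        check=True,
    )

    decompiled = Path("/tmp/decompiled.c").read_text()   # wherever the script dumps it
    prompt = (
        "Rewrite this decompiler output as idiomatic C with meaningful names, "
        "without changing behaviour:\n\n" + decompiled
    )
    # ...hand `prompt` to whatever LLM client you use; the model worries about
    # names and structure, the decompiler already guaranteed the syntax.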

Obviously this is all hand-wavey armchair commentary from a former grad student who just thinks this stuff is cool. Huge props to these researchers for diving into this. I know the authors already mentioned incorporating Ghidra into their future work, so I know they're on the right track.

