Ask HN: What problem in your industry is a potential startup?
658 points by takinola on Dec 9, 2016 | 706 comments

Data management.

1) Cleaning the data as it comes in rather than in batches so we can use it sooner, invalid data is discarded, outlier detection, normalizing inputs etc....

2) Warehousing of the data with proper indexes so you can perform some advanced queries on unstructured data

3) Some data is sent in bulk at the end of day, some of the data is streamed in fire hose style. How can we preprocess the fire hose data so that we don't have to wait until the end of the day to parse it all.

4) Oh and all of this data is unstructured and comes from 75 different sources.
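As a rough illustration of point 1 (cleaning records as they arrive rather than in batches), here is a minimal Python sketch. It assumes each record is a single numeric reading, treats anything that does not parse as a finite number as invalid, and flags outliers with a running z-score (Welford's online mean/variance). The function name `clean_stream` and the `z_threshold` and `warmup` parameters are hypothetical, chosen only for this example; real feeds would need per-source rules.

```python
import math
from typing import Iterable, Iterator, Tuple


def clean_stream(raw_values: Iterable[object],
                 z_threshold: float = 3.0,
                 warmup: int = 5) -> Iterator[Tuple[float, bool]]:
    """Yield (value, is_outlier) one record at a time.

    Invalid records are discarded. Outliers are flagged with a running
    z-score and are not folded into the running statistics.
    """
    count, mean, m2 = 0, 0.0, 0.0  # Welford accumulators
    for raw in raw_values:
        try:
            x = float(raw)
        except (TypeError, ValueError):
            continue  # unparseable record: discard
        if math.isnan(x) or math.isinf(x):
            continue  # non-finite record: discard

        is_outlier = False
        if count >= warmup:
            std = math.sqrt(m2 / (count - 1))
            is_outlier = std > 0 and abs(x - mean) / std > z_threshold

        if not is_outlier:
            # Update running mean and sum of squared deviations.
            count += 1
            delta = x - mean
            mean += delta / count
            m2 += delta * (x - mean)

        yield x, is_outlier


feed = ["10", "11", "bad", "9", "10", None, "10", "11", "500", "10"]
cleaned = list(clean_stream(feed))
outliers = [value for value, flagged in cleaned if flagged]
```

Because the function is a generator, each record is usable downstream as soon as it arrives, which is the property the list above asks for.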

Soon the average hedge fund will have more people just cleaning and managing data than they do in quantitative research, dev ops, software development and trading.

Oh and lots of the data is considered proprietary so while AWS/Azure, etc is fine, sending it to a third party to process is not.

TL/DR Help me, I'm drowning in data. How do I get the time from when I acquire data to when I trade based on it down to a reasonable time frame, where reasonable is closer to hours rather than days/weeks.

Great questions. I worked on similar problems in the weather/ag space for a few years, trying to minimize the time between when data was acquired and when it was ready to inform a decision.

We threw every rule out the window in the name of performance _when fetching raw data from external sources_. So we had weather station networks, NOAA forecast runs and NASA satellite data in a workable schema in our shop way faster than average. Mix of C, PowerShell, Perl, and the nonstandard parts of T-SQL, highly parallelized, tricky but fast.

After the "workable schema" was established, the rules came back and we acted more responsibly. Smart instead of clever.

Ran this stuff all day long, getting every piece of data asap. Things that can only be calculated with a full day of data we poked and prodded the meteorologists to express in "partial aggregates", which to me were just like the map steps before an EOD reduce.

Took a lot of mutual understanding and iterating but worth it in the end. When the ultimate data source (satellite or radar site for us) posted its last hour of data, we were 95% done with the day's computation work. We do our last step, publish our numbers, and bam, our ag clients have this stuff a day earlier than they are used to.
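The "partial aggregates" idea can be sketched roughly as follows. This is not the code described above, only a minimal Python illustration under the assumption that the daily figure is a mean and a maximum over hourly batches; `Partial`, `map_batch`, `merge` and `finalize` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Partial:
    """Mergeable summary of the readings seen so far."""
    count: int = 0
    total: float = 0.0
    maximum: float = float("-inf")


def map_batch(readings: Iterable[float]) -> Partial:
    """Map step: summarize one batch as soon as it arrives."""
    values = list(readings)
    if not values:
        return Partial()
    return Partial(len(values), sum(values), max(values))


def merge(a: Partial, b: Partial) -> Partial:
    """Combine two partial aggregates; order of arrival does not matter."""
    return Partial(a.count + b.count, a.total + b.total,
                   max(a.maximum, b.maximum))


def finalize(p: Partial) -> Dict[str, float]:
    """End-of-day reduce: cheap, because the heavy work is already done."""
    if p.count == 0:
        raise ValueError("no readings received")
    return {"mean": p.total / p.count, "max": p.maximum}


running = Partial()
for batch in [[10.0, 12.0], [14.0], [8.0, 16.0]]:  # batches arrive during the day
    running = merge(running, map_batch(batch))
daily = finalize(running)
```

The design point is that every statistic is expressed in a form that can be merged, so only the last batch and the final division remain once the source posts its last hour of data.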

Hi mmsmatt, you mentioned you worked in the weather/ag space. Is it OK to ask you a few questions? Thank you!

Check out Striim


It's a native streaming platform so your data will be cleansed, processed, scanned for outliers event-by-event rather than in batches. We have dozens of streaming connectors for IT/Enterprise/Web data sources. We also support initial load for your firehose data. For unstructured data, we have support for RegEx based parsers.

Shoot me a message if you have any more questions. We have many big name users in Aerospace, Banking, Device manufacturing, and Logistics industries.

We solved RegEx and can guarantee worst-case linear time matching for any number of expressions. We should talk!


    guarantee worst-case linear time matching for any
    number of expressions
Or you could just use RE2: https://github.com/google/re2/wiki/WhyRE2

It's open source, and match time is linear in the length of the input string.
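For readers wondering how matching can be linear in the input length at all, here is a minimal Python sketch of the general set-of-states NFA simulation that automata-based engines build on. It is not RE2's implementation or API; the automaton is hand-built for the single pattern (a|b)*abb, and `NFA`, `START`, `ACCEPT` and `nfa_match` are names invented for this example.

```python
from typing import Dict, List, Set, Tuple

# Hand-built NFA for the pattern (a|b)*abb.
# Each state maps to its outgoing (symbol, next_state) transitions.
NFA: Dict[int, List[Tuple[str, int]]] = {
    0: [("a", 0), ("b", 0), ("a", 1)],
    1: [("b", 2)],
    2: [("b", 3)],
    3: [],
}
START, ACCEPT = 0, 3


def nfa_match(text: str) -> bool:
    """Return True if the whole text matches (a|b)*abb.

    All possible states are tracked at once, so each input character is
    examined exactly once and there is no backtracking. The work per
    character is bounded by the size of the automaton.
    """
    current: Set[int] = {START}
    for ch in text:
        current = {nxt
                   for state in current
                   for symbol, nxt in NFA[state]
                   if symbol == ch}
        if not current:
            return False  # no state can consume this character
    return ACCEPT in current
```

The total cost is proportional to len(text) times the number of transitions, which is where the linear-in-input guarantee comes from.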

(Disclosure: I work at Google on a different open source project)

RE2 is great! I first discovered it while building namegrep.com and after benchmarking it against PCRE I never looked back. :)

Interesting. We worked in this area for many years, did the 'startup' in this area, now acquired by Intel, open sourced the product (ob plug: https://github.com/01org/hyperscan ) and I would never claim that we are anything near 'solving' regex.

More like "mitigating" or "occasionally getting regex slightly more right than some other solutions". There are many different approaches to regex and all seem to focus on different parts of functionality (RE2 focuses on quick compiles and simplicity, libpcre has 'all the functionality', we're about streaming + large scale + high performance if you can tolerate long compiles and lots of complexity). A number of new projects are trying very interesting approaches, like icgrep and the Rust regex guys.

I would be curious to hear about your approach.

This is an area I like to work in.

I have seen local companies work for months or years before they can finally use their BI package, because the trouble at step 1 is big (and also the trouble of putting the data in a "nice" schema).

The problem is that entering this space is hard. Years ago I was at a company that had a niche product (in FoxPro) for this kind of task, and I have dreamed about building something like it based on my experience, but getting funding for this kind of "boring" task is hard (more so in my country, Colombia).

P.S.: If you want help, we can talk. I can't give a magical solution but at least I find this kind of "boring" job compelling ;)

Check out Holistics.io (disclaimer: I'm a cofounder). While I can't say we can solve all your listed problems above, we provide enough tooling on top of your DW to help you pre-process (clean, aggregate) them.


And on the part about firehose data, you might already know this, but Kafka and their line of work should be aligned with what you're after.

Integration platform as a service (iPaaS) that can work with streaming, batch, and Big Data using a drag-n-drop UI e.g. https://www.youtube.com/watch?v=SfEuG7Dg_O8

Check out my company, SnapLogic: https://www.snaplogic.com/

I think startups Trifacta and Paxata are doing this or some aspects of this ... see a comparison (from 2015): https://www.quora.com/How-does-Paxata-compare-to-Trifacta

I don't know much about Paxata but I think Trifacta are well-regarded in industry and academia. Trifacta founders worked on research / open-source-? project Data Wrangler http://vis.stanford.edu/wrangler/ and turned it into Trifacta.

I don't know much about either product in truth.

There are good open source options for each step here - is the solution you are looking for just a UI and easy install process? Or would your ideal solution make all of the decisions for you - data structure and format, which data is/isn't valid, what output options are possible, managing server resources, etc.?

I'm new to this sort of thing - can you elaborate on some of the open-source options for those steps?

I'm not very familiar with the open source options since after many years of coding this by hand, I work with what I know. I am a developer that works with data, not a Data Scientist, so I don't really know the lingo and whatever hipstery terms people are using these days. I will answer to the best of my ability, though (mostly for my sake, who knows if this will be useful):


Data cleaning:

Open Refine seems to be the best product in this category. I haven't used anything but my own tools to do this before, so I can't really offer any advice.


Data warehousing:

My understanding is that this is just a fancy way to talk about a database with a schema designed for analytics. There are many open source databases which do this very well, the one I use being Cassandra (and/or KairosDB), though it is also likely the one that is hardest to use. For a beginner, you might want to refer to this SO answer: http://stackoverflow.com/questions/8816429/is-there-a-powerf...

Data processing/collection:

This is something that is incredibly dependent on the data sources, so I likely can't tell you anything that will help. Most of my data sources I've worked with have been internally sourced log files, messages from ZMQ, or CSV data - you might be working with something far different though, since there are lots of public data sets and such which are common. Ideally, this would be integrated into the tools that you are using to clean the data, but I don't know if that exists.

Handling input from many different sources at different rates is not a very hard problem to solve if your system is built correctly - you could for example run a daemon for each data source which will populate the database when there is new data available, then send a message off to the processing engine, which will integrate the data into whatever reports you are running.

Specifically for a use case of a hedge fund, the reports could be triggered by a message which is sent when the new data is available, and processing could be done in parallel in Lambda or similar dependent on need to get a nearly instant return, enabling nearly real-time reporting.
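A minimal sketch of the daemon-per-source idea, using only the Python standard library. Threads stand in for daemons, a dict stands in for the database, and a `queue.Queue` stands in for the message bus; all names (`run_pipeline`, `source_daemon`, `processing_engine`) are hypothetical, and the "report" is just a running total per source.

```python
import queue
import threading
from collections import defaultdict
from typing import Dict, List


def run_pipeline(sources: Dict[str, List[float]]) -> Dict[str, float]:
    """Ingest every source concurrently and return per-source totals."""
    database: Dict[str, List[float]] = defaultdict(list)  # stand-in for a real DB
    db_lock = threading.Lock()
    events: "queue.Queue" = queue.Queue()  # messages to the processing engine
    report: Dict[str, float] = defaultdict(float)

    def source_daemon(name: str, records: List[float]) -> None:
        # One daemon per source: store the record, then announce it.
        for record in records:
            with db_lock:
                database[name].append(record)
                index = len(database[name]) - 1
            events.put((name, index))

    def processing_engine() -> None:
        # React to each "new data" message and update the report.
        while True:
            message = events.get()
            if message is None:  # shutdown signal
                break
            name, index = message
            with db_lock:
                value = database[name][index]
            report[name] += value

    engine = threading.Thread(target=processing_engine)
    engine.start()
    daemons = [threading.Thread(target=source_daemon, args=(name, records))
               for name, records in sources.items()]
    for d in daemons:
        d.start()
    for d in daemons:
        d.join()
    events.put(None)  # all sources are done
    engine.join()
    return dict(report)


totals = run_pipeline({"eod_bulk": [1.0, 2.0], "firehose": [0.5, 0.5, 0.5]})
```

Sources that arrive in bulk and sources that stream look the same to the engine: both just produce messages, so the report is updated at whatever rate the data shows up.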

In my current job I use Informatica Cloud, which either by itself or in combination with other Informatica products can do these things. I have two main complaints about it:

1. The UX is subpar. It insists on running in only a single tab at a time, and attempts to open multiple tabs will instead override whatever it considers to be the master tab. This is a huge pain, because I often need to have a mapping workflow open in one window and some other relevant part of the application open in another. Instead I have to save, go find the thing I want, and go back. Another problem is that when working with data sources containing tons of fields, there's no easy way to search.

2. It offers an expression language to perform some computational tasks, similar to what you'd find in Excel, but it's hamstrung by a poor UI and a limited number of functions. The built-in editor for expressions is really poor (see Tableau for an example of a great editor for a simple Excel-like language; it even has type linting) and, unless I've misunderstood something, you can't declare any variables so you end up with huge nested expressions. There aren't many functions available, so something as simple as removing whitespace ends up as lstrip(rstrip(foo)). In combination with no support for statements (or at least a let expression like in lisp) this makes any nontrivial data munging completely indecipherable.
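To illustrate the readability point, here is the same small cleanup written twice in Python. This is not Informatica's expression language, whose syntax is not reproduced here; `clean_nested` and `clean_named` are hypothetical functions made up for the comparison.

```python
def clean_nested(raw: str) -> str:
    # Without variables, the trimmed value has to be recomputed inline
    # everywhere it is needed.
    return "UNKNOWN" if len(raw.lstrip().rstrip()) == 0 else raw.lstrip().rstrip().upper()[:10]


def clean_named(raw: str) -> str:
    # With a named intermediate (the role a let expression would play),
    # each step reads on its own.
    trimmed = raw.strip()
    if not trimmed:
        return "UNKNOWN"
    return trimmed.upper()[:10]
```

Both functions return the same result; the difference is only in how much of the expression a reader has to hold in their head, and it grows quickly as more steps are added.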

I've looked around in this space and it seems like there are a variety of products, but the supplier of our main CRM will only support Informatica Cloud. I think that a company that can offer a product that does what you've said but makes a serious effort at UX could cause users to revolt and demand to use it! I know the joke is that Slack is just a pretty IRC with better UX... but that's exactly why it has become so successful.

In terms of data munging, take a look at Microsoft's Power BI. It's visualization software but it has a nice data munging mode that, crucially, keeps track of all the changes you make and displays them in a linear format. This is great for getting a quick idea as to what was done with the data and is essential for doing reproducible data analyses. Unfortunately, Power BI also suffers from poor UX in insisting on tiny fonts and gray-on-gray palettes that are totally unreadable to anyone over 30.

In my limited experience this is science, art, and craft. I don't think there is a specific method to do this in an automated way.

What I envision as a solution would be something like a configurable/codeable OpenRefine (was Google Refine) with streaming ingestion/extraction, with a validation engine/parsing engine (something more elegant than regex, but you can drop into that if necessary) and maybe a pluggable event processor (i.e. a Spark or Flink). I would love to work on such a problem, and solve it.

One thing you don't really mention here, but it's mentioned a lot in the comments, is the data extraction piece. Is data extraction a pretty solved problem at this point, and it's really the intelligent cleaning, transforming, then warehousing / analysis that's the unsolved issue?

> TL/DR Help me, I'm drowning in data

Omg, so many sales pitches. You should figure out which of those were automatically generated by someone whose bot is crawling HN and using NLP to find posts like this, and then hire them. There's basically 0 chance that isn't happening...

I've been working on similar issues, and developing towards solving them. I work with mixed schemas, handle user defined processes inline and allow you to gather stats and show everything as it streams + accommodate a workflow. I'm only 1 month from launching my beta and want to give away licenses to people who can use it and give me feedback. Here is the landing page for it: http://ohm.ai. Oh, and it uploads nothing... it all runs in your browser. Email anthony dot aragues at gmail dot com if interested.

> 4) Oh and all of this data is unstructured and comes from 75 different sources.

Sounds like a job for Apache Camel?

I think Palantir provides solutions to this problem

The Palantir solution, from what I understand, boils down to "lots of consultants (FDEs) writing glue code".

Palantir doesn't really have a product, they write tons of code to put everything together and make it look seamless from the UI but they don't have anything drop in as far as I know.

For $$$

I tried to evangelize THIS very problem with my previous company (a somewhat successful managed data infra service) for a year, it was AMAZINGLY difficult to make my executives even understand the problem and its magnitude.

Have you tried Quandl? They solve some of those problems for financial data.

Chollida. I have been working with a startup in Seattle tackling these very issues. Super great team and great software, please get in touch and try it out! datablade.io

Have you looked into Snowflake? Seems like their solution satisfies most of your requirements, including native ingestion of unstructured data. The one caveat is that all source data must first be loaded to S3. https://www.snowflake.net/

Take a look at Apache Metron's architecture. They seem to have dealt with a number of the issues inherent in putting together data processing pipelines from open source components and ensuring they work together. The project is active but incubating right now.

Will hedge funds take such a huge risk of using 3rd party centralized (or centrally developed) data management system?

Provided they can deploy the 3rd party solution onsite (aka "on premise"), this wouldn't be seen as a "huge risk".

And what about the risk of 3rd party dev error which will throw several funds at once out of the business?

Great question, but I don't see this much different from other industries or companies with millions or billions of dollars at stake. They don't all roll their own software inhouse, and most companies (hedgefunds or not, small or even large) simply cannot afford to for financial and commercial risk-management reasons unless they can justify the software truly being a core competitive competency.

The same line of inquiry has been evaluated for most 3rd party software that companies rely on. For this specific instance of data collection and cleaning, I'm imagining it's not going to be a much different calculus, although perhaps you'll see a higher percentage of firms choosing to roll their own if they have the chops and pockets (e.g. Two Sigma, Bridgewater, Goldman Sachs, etc.).

I will note that there are commercial mechanisms firms could try to implement to try to limit the downsides in case something like this happens: warranty & damages provisions, and insurance are two that spring to mind. I'm sure there are numerous other considerations in the age-old "build or buy" cost-benefit analysis.


On a smaller scale my EasyMorph might be of help. It's a lightweight ETL and you can do with it way more than with data preparation tools http://easymorph.com

Hey - we did this. We should talk. Well not #2 exactly...but we solve most of your problems.


I've actually been working on something like this for a while now, and found your comment about proprietary data interesting. Would this mean that hosting this data in a third party server is out of the question for you? OK with NDA?

Splunk provides this kind of solution and it's pretty amazing :)

This may be something for you: https://www.snaplogic.com/

My last company did this. You don't know how hard it is to get people to turn their data over to you.



Problem solved on all points.

100% fully functional, fully featured, 100% free download right from the website.

Disclaimer: I work for splunk.

A family member is a lawyer in the Worker's Comp, SS, and Family Law space. THE software for lawyers in this space is called A1 Law. It solves a lot of real problems lawyers in that space have (form letter generation, calendar integration, case management)... but it's so slow to use new technologies. They advertise PalmOS integration. My family member has to have their own server in a closet running the server version of this so his team can use it! He has no idea how to manage a server, it's absurd that he has to.

Everyone I know in law is dissatisfied with every part of their tech stack. If someone could come up with an integrated SaaS solution, and be SUPER careful about compliance... they would be printing money.

I would strongly encourage people to think twice about trying to sell software to lawyers.

It's a Sisyphean task. They are, as a rule, extremely anti-technology and conservative. At a previous startup, we had built software which was saving customers many hours a week—yet it was still an uphill battle to get paying approval.

If even after all the warnings in this thread you really want to build legal software, focus on disrupting lawyers instead of selling to them.

In general, law and media are two of the worst fields for technology.

I worked in a tech department of a rather large law services organization. There was a desire to maintain certain inefficiencies so that more hours could be billed to clients. If it could be done in 25% of the time, that's 75% less they could bill clients for 'attorney time spent.'

See how well pitching 'do it faster and make less' goes over.

Can't they still bill whatever they want? If 10 hours of proofreading, form filling, photocopying, and filing would be billed for $1500 (10 hours x $150 an hour), couldn't they still charge $1500 if the software took 80 milliseconds to do the same job?

Oh, the clients expect an itemized bill? Simple, the above charges would be "10 legal intern equivalent hours @ $150/hour". If a client questions it, the lawyer can explain that they are now using a very expensive piece of software instead of interns and attorneys for certain tasks, but felt it was an ethical obligation to quote the cost in a human understandable way. Turn the arbitrary pricing into a positive!

And of course your software should be able to quote all its tasks in these legal intern equivalent hours. This also leaves the lawyers' hands clean since they can say that the software came up with the hourly figure, not them.

I think the problem is one of mindset. From my perspective, even looking at accountants. Their industry and the clients they serve think that they are selling their "time", and not a service by itself. So, reducing the time it takes the lawyer/accountant to do something simply means less time being billed. It does not mean that they can now charge "more" for the time, as the cost per unit of that expert's time appears to be fixed by some other mechanism. Like seniority and years of experience, and not efficiency.

Perhaps bringing it back to a development perspective might shine some more light on it for us. Imagine you're a freelance developer and you've now developed (or bought) a fancy piece of software that allows you to do plenty of code-generation and reduce the amount of menial database layer code that needs to be written. You're now say 1.5x more efficient at delivering a product. What are you to do then? I doubt many clients would agree to a once-off fee for usage of your fancy code generation tool, even if you phrase it as saving "4 intern developer hours", and charge appropriately. There is also probably a cap on the hourly rate they're willing to pay you. Either that, or you change to a per-deliverable or product pricing model.

Sometimes change does need to be slow.

Exactly. The legal industry predominantly uses hourly billing and making up "equivalent hours" would be extremely unethical.

It's part of why I encourage everyone I know (particularly developers) to switch from hourly to fixed-price billing. Any efficiencies you gain should belong to you, not the customer. (There's also the fact that I find a lot more people are willing to pay $10k for X than $250/hr for 40 hours.)

The problem with fixed price as a developer is that rarely are requirements exactly understood or detailed enough to actually be able to bid the job. "Export to PDF" ok.. no problem -- that then turns into: "can you add page numbers? Can you support A4 - and US letter sizes?" And you have scope creep-- one more little thing isn't reasonable to say no to, however it quickly becomes a death by a thousand paper cuts. Ok, so now you price in that inevitable scope creep, so now your price is much higher. "$5000 to export to PDF? That's crazy!" -- yes but I am anticipating the fact that you don't know exactly what you want. "But we do, we made it clear!"

You see how that goes. Project pricing leads to a guessing game. Billing hourly is fair for everyone, at least in software. If I am more efficient, I pass that onto the customer. I don't 'lose' money -- it usually results in more work.

Imagine charging $80 for some corn because I want to make the same money as if I had guys hand-picking and hand seeding and doing the entire farming process without machines. That corn only cost me $0.10 to produce but I am charging a price as if I didn't have modern efficiencies. I would sell a lot less corn and actually profit less due to both competition and price elasticity. People would look for alternatives to corn.

In software, not passing on efficiencies means that there would actually be a smaller market for software development. Imagine how bad the market would be for us if we wrote everything in assembly. A simple web site might cost $100m and there'd be exactly 5 people in the world building websites.


I did some fixed price work this summer for a project where I thought the scope was unusually well understood by both sides. About 3 months, 60k USD if done by a fixed deadline (yes - fixed scope, fixed price, fixed deadline!) and as far as I was concerned from the original spec I had it done within about 6 weeks.

Of course, I spent the rest of the project time politely asking the customer to sign it off and doing the odd freebie to try and keep them happy but mostly at home, not working and not wanting to take anything else on in case they turned round and said I'd screwed up somewhere massively.

Perhaps unrelated, but I still haven't been paid for all of it either. Still, if I do eventually get paid it all it will have worked out better than charging per hour.

That's why Scope of Work documents exist: to protect against scope creep.

If a customer demands additional features, you prepare a Change Order and say, "OK, here is how long it will take and how much extra it will cost."

After a while they learn discipline and stop asking for changes half-way (or more) into the project.

Here is another perspective: the vast majority of features I've built as part of Change Orders rarely, if ever, got used. Granted, I make sure all relevant stakeholders are involved in the creation of the initial Scope of Work. That way, there are no late-comers who demand changes/additions.

The way I think about it is that general efficiency gains flow to the customer, while my unique efficiency gains are mine. So if it might take the average developer 100 hours to finish something but I can do it in 80, then I should charge as though it took 100.

The problem with hourly billing is it very poorly aligns incentives. It actually discourages efficiency because the easiest way for me to make more money is to take longer.

Also, psychologically, most clients are not comfortable with the vast differences in appropriate pay between developers. Even in the worst case (where scope was poorly defined and/or I estimated poorly), I'm making more now than I ever did with hourly billing.

If you had a monopoly on modern farming, it would absolutely make sense to charge $40 for corn. You'd soak up all the demand (since you're undercutting the $80 hand-harvesters) while still having massive profit margins.

Getting good at scoping is difficult but by no means impossible.

You can pass some, but not all efficiency gains on for a net win. Windows would cost billions if Microsoft only had one customer.

I chose my accountant because I could record my spending online. He can generate my yearly reports with one click in his backend, which he bills €80 for, "per click". He gives better advice because he has more customers; thanks to this efficiency, he doesn't spend time on mundane stuff and spends most of his time meeting customers like me who present him with various problems to solve, so he's more accustomed to problem-solving.

Companies in my coworking space are switching one after another. One has gone from a ~6500€ yearly bill to ~3500€ (3 employees), while improving reportability.

Non-industrialized accountants are just as necessary as human cashiers: Not. Lawyers are a bit harder to industrialize.

Yes - it's like Amdahl's law: the faster they do their job, the larger the share of the remaining work that is marketing and finding clients.
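The analogy can be made concrete with the standard Amdahl's law formula. In this minimal Python sketch the split of an accountant's time (80% routine work that software speeds up 4x, 20% client-facing work) is an invented example, not a figure from this thread, and `amdahl` is a hypothetical helper.

```python
from typing import Tuple


def amdahl(accelerated_fraction: float, speedup: float) -> Tuple[float, float]:
    """Return (overall speedup, share of the new total spent on untouched work).

    accelerated_fraction: part of the original time that gets faster.
    speedup: how many times faster that part becomes.
    """
    untouched = 1.0 - accelerated_fraction
    new_total = untouched + accelerated_fraction / speedup
    return 1.0 / new_total, untouched / new_total


# Assumed split: 80% routine work made 4x faster, 20% finding and meeting clients.
overall, untouched_share = amdahl(0.8, 4.0)
```

With these assumed numbers the overall speedup is about 2.5x, and the work that was not accelerated grows from 20% to about half of the remaining time.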

Not saying something like this would be impossible, just saying what I saw. However, the line of reasoning could be slippery. For example, they could say "1000 lawyer hours at X dollars an hour." When questioned, the company says "we could have had them trace all the documents with a pen to copy them by hand. Instead we used a photocopier, so we're billing you for the awesome technology." Seems like it'd be a hard sell and possibly unethical (in another way than it already is). I do think things might be able to be changed in terms of mentality, though.

> Can't they still bill whatever they want?

I think contracts (between law firms and their clients) use hours worked because they don't know upfront how complicated cases will be, how long it will take etc. It's not just for "understandable pricing". Your "bill whatever they want" suggestion is basically saying that at the end, the law firm can quote whatever price they want, and the client agrees up front to pay that.

> In general, law and media are two of the worst fields for technology

Could you expand on the "media" part?

I could go on for days, but it's generally a low-profit and low-margin business where even the most successful companies are not huge successes.


Actually, I'm a lawyer and a product owner at a big accounting firm, so I'm pretty well positioned to discuss this topic. I have half a dozen solid startup ideas that I'm tempted to take and start running with right now. Documentation management alone is in desperate need of innovation. I build toy apps in my spare time that cut thousands of dollars of charge hours young associates waste on bullshit tasks, just within my one small practice group. The whole legal software industry is a joke.

But here's the real problem for anyone looking to innovate in that space: the customers. Lawyers are as a rule anti-technology, slow to adopt new techniques, and set in their ways. Worse yet, they just bill their clients for their shitty software like Lexis or WestLaw, so they aren't even personally motivated to reduce costs.

> cut thousands of dollars of charge hours young associates waste on bullshit tasks

Doesn't this take money away from your firm? It is only when firms are competing on cost, time or client recognized quality that they will institute better workflows via software.

Depends, there are billable hours e.g. time spent with customers and non-billable hours e.g. administrative tasks not specific to any customer such as payroll, transfer of knowledge between work colleagues. So by targeting reduction of non-billable hours, you would have a compelling reason to sell to firms who bill by time.

That's pretty much true in general. Our fees are dictated more by the market than our actual costs. So if we were billing fewer hours, we'd just raise the hourly billing rate for our associates.

From our perspective in the national standards group, we would actually want our associates to just spend more time on value added activities. Instead of wasting time organizing PDFs of exhibits and monkeying around in spreadsheets, we want them evaluating the relevant legal and technical tax issues. So it's not precisely cost control that is the primary concern, but quality assurance.

>Doesn't this take money away from your firm?

No, because we are in fact competing on cost and client-recognized quality, and to a lesser extent time. Plus our fees are driven more by the market than by our actual costs, so if we billed fewer hours, we would simply bill at a higher rate to reach the same expected fee while still maintaining our position in the marketplace. Or if we could reduce our fee, we might be able to win more market share.

The pejorative term in the industry for padding billing with useless busy work is "fee justification," which really shouldn't ever be necessary. Especially in my practice area, because there's always more work that can be done to flesh out our deliverables, which in turn makes them more effective for convincing the IRS (or state equivalent) or an appeals judge. When I say I've cut thousands of dollars of charge hours, we didn't simply stop charging those hours, we allocated them to more useful, value added activities.

Right now, staff spend far too much time inefficiently manipulating data in Excel, manually organizing exhibits, and a variety of other mundane, low cognitive effort tasks (I can't really specify what kinds because that would essentially doxx me). They feel productive, they look productive, and they meet their charge hour goals. And it allows them to procrastinate on the more mentally taxing work, like evaluating the relevant legal and technical tax issues, which in turn detracts from the quality of our service. Our clients aren't paying us to be extra-expensive outsourced spreadsheet monkeys. They're paying us to eliminate uncertainty about complicated legal and tax issues. So freeing up engagement budget and the staff's mental bandwidth to focus on the high value added cognitive services is tremendously useful in improving quality.

And in terms of time, we compete on that in some cases where there's an audit, exam, or appeal deadline and the client came to us late in the game. But that's an edge case and relatively rare. Certainly having a reputation for being quick, efficient, and timely wouldn't hurt our market position, though.

This is great news! I did a PoC for a document search system for discovery, OCR and full-text search. We deemed it too hard of a sell for law firms. Maybe the landscape is changing.

Ha, never thought about it but I guess it's problematic to compete on cost in a scenario where the clients often need the best advice they can get...

PactSafe is disrupting the online agreements space, has already raised some serious capital and landed some big name clients with its SaaS platform: http://www.forbes.com/sites/matthunckler/2016/11/03/pactsafe...

A lawyer friend of mine told me his firm doesn't use software to check all the "hereafter referred to" are bound to their template and vice-versa. They instead have to print the document and go through it with a highlighter.

The firm charges their clients on an hourly basis, so they don't really have an incentive to be more efficient.
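A check like the one described could be sketched in a few lines of Python. This is only a rough illustration under strong assumptions: definitions always use the exact phrase `hereafter referred to as "Term"`, and the caller supplies the list of terms the template expects. `check_defined_terms` and the sample contract text are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Assumed definition phrasing; real contracts vary widely.
DEFINITION = re.compile(r'hereafter referred to as "([^"]+)"')


def check_defined_terms(text: str,
                        expected_terms: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (defined but never used, used but never defined)."""
    defined = set(DEFINITION.findall(text))
    body = DEFINITION.sub("", text)  # ignore the definitions themselves

    def used(term: str) -> bool:
        return re.search(r"\b" + re.escape(term) + r"\b", body) is not None

    unused = sorted(t for t in defined if not used(t))
    undefined = sorted(t for t in expected_terms
                       if t not in defined and used(t))
    return unused, undefined


contract = ('Acme Corp (hereafter referred to as "Supplier") and '
            'Bob Ltd (hereafter referred to as "Client") agree. '
            'The Supplier shall deliver goods to the Buyer.')
unused, undefined = check_defined_terms(contract, ["Supplier", "Client", "Buyer"])
```

On this sample, "Client" is defined but never used and "Buyer" is used but never defined, which is the kind of mismatch the highlighter pass is looking for.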

I feel like charging on an hourly basis is a common pattern in many industries that opens the doors to competition from startups with different pricing structures...as long as the startup can do everything in a manner compliant with the existing industry.

Logojoy, for instance, is an example of a service that supplants human labor with a single "good-enough" deliverable at a low price, and does so in a fraction of the amount of time. I imagine this would be much more difficult in legal settings, but LegalZoom seems to be alive and kicking, so it must be possible.

I completely agree.

To your second paragraph, I would add that it's hard for customers (and lawyers) to figure out what is "good-enough" in the legal setting. I'm a lawyer and there's a lot of stuff you can find on the internet that I personally think is good enough (I would use it in my personal affairs because the risk of the missing edge cases being an actual problem is slim) but I wouldn't be comfortable recommending it as a solution to a client because those missing edge cases are a real malpractice risk.

In the case of a logo, good enough is whatever the client thinks is good enough. In the case of a lot of legal solutions, good enough is often a murky risk/reward calculation based on legal concepts the client may not understand completely.

I still think there's enormous room for improvement, both in helping clients understand the concepts and the risks they're taking, and also in providing better automated solutions.

Is there a place for a law consulting company that just consults you on how to save money or how to find what's "good enough"?

I think it could be done, especially if you could gather enough information to show people the likelihood of certain problems happening given their circumstances. The biggest problem is that without software to do the heavy lifting, you're spending so much time talking to the client that you might as well be their lawyer. And then even if you save them money, their "real" attorney might argue against your advice or retrace all your steps at an hourly rate.

I'm sure that there are lots of legal consulting companies that do this for people and entities that consume lots of legal services but the real trick is providing it profitably to "unsophisticated" people doing a one time thing.

> I feel like charging on an hourly basis is a common pattern in many industries that opens the doors to competition from startups with different pricing structures...as long as the startup can do everything in a manner compliant with the existing industry.

That last step's a real doozy, though. Startups are a field that thinks "move fast and break stuff" is actually a good idea. That kind of thinking works when you're slinging viral social media and personal productivity services, but it is catastrophic when you try to move into an industry where your customers' lives or livelihoods are on the line.

Yes, it is insanely frustrating. I think I did fairly well in law school in part because I wrote a program to auto-format my cites, which saved me hours of mindless, awful, pedantic, irrelevant blue booking.

>The firm charges their clients on an hourly basis, so they don't really have an incentive to be more efficient.

While I agree that the billable hours system reduces the incentive to be more efficient, I don't think it removes it entirely. Otherwise lawyers would still be using typewriters to draft memos. In my experience, removing some of the inefficiencies frees up time and mental bandwidth to focus on activities which actually benefit the client. More time reading cases, researching, evaluating issues. And you can bill for that.

You hit the nail on the head. There is an incentive to automate legal services à la LegalZoom, but for lawyers themselves the more tedious and paper based the process is the more they can make.

Hourly billing is (very slowly) going away. Fixed fee arrangements will be king (people want predictability). So then a lawyer will want to be as efficient as possible. The legal industry is admittedly behind the times, but they do continue to move forward.

> for lawyers themselves the more tedious and paper based the process is the more they can make.

I don't get it. I'm pretty sure they're not hurting for new cases, so they'd make up any losses in fewer hours with more clients.

This is what got Henry Ford fired from making watches, isn't it?

I have a family member who is a lawyer and who has a similar type of software setup. The biggest obstacle I notice in the legal industry is that many simply don't care. They actively dismiss software as being unimportant even though they rely on it every day. It's a very bizarre case of the legal industry hating the very industry that could help them.

Note: perhaps my experiences aren't representative of the industry as a whole.

This is exactly correct. As a lawyer and a software developer, I long ago gave up on the idea of selling software to solve the problems of lawyers and/or law firms. Lawyers tend to be terrible customers of technology, if for no other reason than that they have established completely backward incentives that reward inefficiencies and information deficits.

The only "legal tech" that can succeed (in my opinion) is the kind that eliminates the need for lawyers, but then you're up against a different problem: people who think lawyers are magical wizards who can invoke spells to keep lawsuits and regulators at bay. It's really hard to convince many people that they don't need a lawyer, even though lawyers and law firms are almost never accountable for the advice they give.

I agree with eliminating the need for lawyers for most things, but the biggest problem about it is that in small cities/towns (maybe big ones too, I just have no experience in that domain) judges and lawyers are "buddies". People with the exact same charges can get radically different sentences depending on if they have a paid lawyer vs no lawyer or a public defender. There's a public defender in my town who also has his own private firm, and it's amazing how differently the judge and DA respond to whether or not you hired him or the town did. If all of that isn't bad enough, you can see the judge, DA , and lawyers all making backroom deals and exchanging favors. And they do it fairly blatantly in my town. I've rarely seen an objective case and it's a shame because law is perceived as a "sacred" domain where objectivity rules.

So you don't agree that 'the person who represents themself has an idiot for a client'?

If you're going to court, bring a lawyer. Courts are the domain of arcane procedures and common sense has no place there. My comments above refer to transactions, compliance, etc.

Probably smart to bring one to court with you but maybe not required for drawing up a standard will where the few assets you own should just go to your next of kin

Lawyers write human readable code that's compiled and run by a judge or interfacing APIs (institutions such as financial ones)

Your advice is akin to saying "hey you inexperienced coder, write some production ready code but don't test it and when the only time it needs to run, give it a try. Hope you don't screw it up! When there's another coder in the room who can claim 'oh no he meant to set my financial variable 100X not 10x' and can convince the compiler to agree with them"

This analogy falls apart pretty quickly. You can't compile legal work product and no one is accountable if it doesn't run, unless you're at the point where malpractice comes into play. Malpractice is really, really hard to prove, though.

But most judges are making subjective decisions and not just "running code". The US constitution is law; can you compile it into code such that a computer could tell you whether a particular piece of legislation was unconstitutional? If you could, why hasn't such a computer replaced most of the US Supreme Court?

I investigated doing a SaaS product in the legal space. One of the things I heard multiple times from lawyers is that they are more likely to buy a product if they can bill their clients directly. What you are talking about would not fall into that category. That doesn't mean you couldn't build a successful business, but I think it is important to understand that the market has different rules.

I've heard the same thing from multiple people attempting to build for the legal space. Lawyers won't pay for software to save time because that has a negative ROI for them. OTOH they may pay for tools that helps them provide more services, bill more time, find more clients, or decrease the chance of mistakes.

> OTOH they may pay for tools that helps them provide more services, bill more time, find more clients, or decrease the chance of mistakes.

But... saving time means they have more time to provide more services, accept new clients, and review their documents to decrease mistakes. Is the relationship not apparent in their minds?

That was done in the UK. I wrote the first working version for Legal Cost Finance, who offered an instant credit facility "to make justice affordable to everyone". It took them 3 years to take off, even though the whole case was that they literally brought a bulk of pre-paid (!) customers to legal firms.

(...) if they can bill their clients directly.

Can you explain what you mean? Letter generation etc. is still useful, I don't see what billing has to do with it. They can still charge what they want to.

I think he means pricing schemes are more simple when you go with a flat "200$/hour rate". Obviously it's shit for the client since they have no idea how many hours will be spent on the case, but that's not the lawyer's problem.

Sure. The purpose of a SaaS product would be to improve efficiency and save time. Given lawyers typically charge by the hour, they would lose money unless they could bill their clients directly for the use of the more efficient software to make up for the lost revenue due to saved time. However billing clients for legal software is not the norm.

Make the slightest mistake and you are getting sued!

And getting sued by lawyers, at that.

Exactly this. I wouldn't code legal software for any amount of money.

Last time I had a consultation with my lawyer, she fired up WordPerfect. There is a blast from the past.

“Initial release: 1979; 37 years ago”

Quite amazing that it’s still around.


Many people here use emacs or vim...

That's subjective, so you should've kept your subjective opinion away from HN.

I still get virus / malware emails from the AOL account of an IP lawyer I worked with in 2011 / 2012. Also a blast from the past every time.


My impression is that the legal industry is most of the reason why WP is still being used.

Word Perfect is the vi of word processors. I still program in vi.

You most likely program in vim or another vi successor. Possibly with a ton of configuration settings and plugins that weren't around in vi. Big difference.

Fair enough. It is vim -- I still refer to it as vi because I'm that old. And yes, I do take advantage of undo and syntax highlighting. I'm sure quite a few of the modalities I use over the years have been added (oh! Tabs, very important). But the point is, relatively old control interfaces can be very effective even after 30 years.

O.o Last time I had a consultation with my lawyer, he fired up One Note on his tablet.

Lots of commenters here have mentioned parts of law practice that involve lawsuits, complex negotiations, and so forth.

Workers Comp, Social Security, Family Law, and elder law in general aren't as glamorous. Clients for those services don't have such deep pockets as the other corners of the law do.

It's likely that a good SaaS-based system with embedded knowledge of jurisdictional rules (in the US, federal and state rules) could be successful.

But the sales cycle for a new product? Getting early adopters? Prepare for some pain.

What about MyCase? https://www.mycase.com/

Not familiar with the space, but seems like what you are describing.

If I didn't have to risk becoming a lawyer, I'd love to combine some legal education with my programming and machine learning skills.

I always thought there should be a one year masters of law program - no path to practicing, but gives you the frameworks and knowledge to be a good consumer of law

That's interesting! A1 looks archaic. An interesting startup in this space is https://www.upcounsel.com. They provide document management, calendaring and such but operate as a marketplace platform. Would absolutely agree that the barriers to entry (compliance etc.) seem to be a big opportunity to offer solutions in the law space. As an interesting note: It looks like UpCounsel serves primarily as a marketplace due to upending the current law firm system which other comments have mentioned as a big barrier to technology adoption in the space. Interesting space nonetheless!

There's a new startup called Doxly that might be a good SaaS solution: http://www.forbes.com/sites/matthunckler/2016/12/05/doxly-co...

I really hope the tech talent doesn't help lawyers sue people more efficiently. What a waste. If you do, I truly hope lawyers are your customers.

What's wrong with suing people? If someone's actions damaged you, would you want recompense?

The oil and gas industry is ripe with potential start ups. Here are a few that come to mind:

1) A better system for automation and measurement. Current solutions aren't ideal when it comes to setting up new systems as well as updating and maintaining existing systems. We build several million dollar facilities a month and each one has automation and measurement equipment that has to be individually set up and programmed. Each technician does things a slightly different way, and the end result is a different set of automation and measurement logic at each facility.

2) Fiber optic DATS (distributed acoustic and temperature sensing) data handling and interpretation. This is a fairly new type of technology in which a fiber optic line is installed in the wellbore. The fiber optic line basically acts as a 15,000' strand of thermometers and microphones placed every 3'. The data from one installation is on the order of terabytes per hour. Oil and gas service companies that offer this service don't know how to handle this amount of data. The problem could probably be solved with S3 or something.

3) Drilling optimization. Create a software suite that utilizes ML/AI to help drilling engineers figure out the best way to drill a well. It's a perfect ML/AI application. Lots and lots of training data available, easily defined input and output parameters, etc. Drilling engineering is full of hard, non-linear problems and humans are just really bad at it. The only way to be good at it is to drill lots and lots of wells and then listen to your gut.
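
On item 2, a quick back-of-envelope calculation shows why the volume gets out of hand. The fiber length and 3' spacing are the figures quoted above; the 10 kHz sample rate and 4 bytes per sample are purely illustrative assumptions, not vendor specifications, and the function name `das_data_rate` is made up for this sketch.

```python
def das_data_rate(length_ft=15_000, spacing_ft=3,
                  sample_rate_hz=10_000, bytes_per_sample=4):
    """Estimate sensing channels and raw data volume in TB per hour.

    length_ft and spacing_ft come from the post; sample_rate_hz and
    bytes_per_sample are assumed values chosen only for illustration.
    """
    channels = length_ft // spacing_ft
    bytes_per_second = channels * sample_rate_hz * bytes_per_sample
    terabytes_per_hour = bytes_per_second * 3600 / 1e12
    return channels, terabytes_per_hour

channels, tb_per_hour = das_data_rate()
print(f"{channels} channels, roughly {tb_per_hour:.2f} TB/hour raw")
```

Under those assumptions a single well produces a bit under a terabyte of raw samples per hour, so the hard part is less the storage than moving and indexing the data fast enough to interpret it.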

Any more insights about applying drones to 1? I work at a deep learning for aerial imagery startup (tensorflight.com). It seems like we could structure some information from drone images.

If you are interested in helping us understand the problem and potentially solve it together contact me at kozikow@tensorflight.com

Are you looking to develop applications in forest inventory?

Yes, our current application is focused on Orchards, but we are planning to enter forestry as well.

Hi, would you happen to have any additional information on #3? (links to companies that require these services and/or industry reports)?

I work with technology that might be useful for this type of application. Thanks!

I guess I'll explain how the business side of drilling a well works. An operator (think Chevron) will decide where and when the well is drilled. They will then hire a drilling company (say H&P) that owns and operates rigs to drill the well. Even though the drilling company operates the rig, they basically drill it however the operator asks them to.

So to answer your question, any operator requires these services, though most don't know it. A company called Pason is the leading company in the drilling data industry. Their bread and butter is just data measurement and streaming, though they recently have entered the analytics space. Their technology seems pretty promising.

Are you in this industry? Where could I learn more about #2? Sounds like something up my alley...

I am. Google "fiber optic DAS" and you will find quite a lot.


Yeah I found a few articles but not a whole lot of specifics. You're suggesting something like this box? http://www.optasense.com/wp-content/uploads/2015/01/FiberOpt...

I feel like it'd be really hard to break into this field without some data to play with. Catch-22...

If you have access/desire to share some data like this I'd love to chat more (email in bio). Sounds like an interesting problem.

All of our data is proprietary unfortunately. This leads to another start up idea: a data consortium company for this type of work. I don't think we would mind giving the data up if there was a legitimate way to do so and if there was some benefit for us (I.e. advancing the rate of progress in this field).

Interesting. I'm actually part of an agricultural genomics data consortium with a similar concept (companies contribute $$ and data in exchange for licensing rights to research results).

@athollywood Are you available for offline discussions? Maybe you could put some contact info in the public section of your profile. Feel free to email me: xenon@mailworks.org

I might have a pretty killer solution to #1. Could you answer some followup questions? My email is in my profile if you want to reach out.

@athollywood, I work in Energy research, is there an email I could reach you on?


Every time I change jobs as an H1-B employee, I've to fill in the same ridiculous data with every law firm's weird interface. I wish the US Digital Services would focus on streamlining forms and having auto-import from all the data they already have about me (e.g. automatically translate I-94 records to how much time I actually spent in the US, infer my past I-797 records automatically, have a one time education related upload since that obviously never changes). I realize there are certain valid reasons the agencies don't share data, but I find that hard to believe in an era of infinite surveillance, they can't use the surveilled data to at least make my life easier. I can see how the immigration law industry would never allow this, but I can hope.
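
The "how much time I actually spent in the US" part is simple arithmetic once the travel history is in a usable form. Here is a rough Python sketch, assuming the records have already been reduced to non-overlapping (entry date, exit date) pairs. That tuple format and the `days_in_country` name are hypothetical, not the actual I-94 export format, and counting both the entry and exit day as days present is my assumption, not a statement of the official rule.

```python
from datetime import date

def days_in_country(stays, window_start, window_end):
    """Count days present within a window, including entry and exit days.

    `stays` is a list of non-overlapping (entry_date, exit_date) tuples,
    a hypothetical simplification of a travel history.
    """
    total = 0
    for entry, exit_ in stays:
        start = max(entry, window_start)  # clip the stay to the window
        end = min(exit_, window_end)
        if start <= end:
            total += (end - start).days + 1
    return total

stays = [
    (date(2016, 1, 10), date(2016, 3, 10)),
    (date(2016, 6, 1), date(2016, 6, 30)),
]
days_2016 = days_in_country(stays, date(2016, 1, 1), date(2016, 12, 31))
print(days_2016, "days present in 2016")
```

The same function covers the Schengen-style "list every visit in the past 5 years" question by changing the window.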

The green card process is another minefield.

Also for Schengen countries, I've to apply for a visa every time I travel, and they make me list every time I visited the Schengen zone in the past 5 years, fill out the same application form across different countries, and get the same paystubs and letters from employers. Even a tool that could just machine read all the documentation a particular country requires for a specific visa, and just goes and pulls everything that can be pulled (bank statements, pay stubs, fill in travel dates based on the flight ticket emails in my inbox, hotel reservations and so on.) Just make it convenient for me to travel :)

Unfortunately any assumptions built into an immigration business are likely to be upended soon. I'd wait at least a year before trying to solve this problem because the regulatory environment could break your resulting startup.

> I'd wait at least a year

I'd say closer to four to be safe.

But then you'd have to wait four more. And then four more? Four more after that?

He means Trump is much worse with regards to immigration than anyone else in recent years.

Airside solves for part of this. http://airsidemobile.com

like a TurboTax for all immigration forms?


I am in the same boat. There is a YC startup, SimpleLegal, that is trying to do this. But startups can't fix the main issues with immigration.

SimpleLegal is not working on immigration. You may be thinking of Simplecitizen. Teleborder also tried. Along with many other non-YC companies. It's not an easy problem.

Source: am cofounder of SimpleLegal

But they try to make it hard because they only want the most perseverant/rich immigrants to make it into the country.

Well, the perseverant ones are having their time spent on bureaucracy, and living in a constant state of foreboding, decreasing the economic value they can add to the country :)

As a structural engineer, I see a good opportunity to make reinforced concrete design software available in a SaaS format. The competition is outdated, clunky, requires local installation and messing about with licenses. Design firms are paying $1000-$3000/year per user/seat for what amounts to a pretty basic app.

Unfortunately, there are very few people that understand both computer science and structural engineering.

I suspect much engineering software is ripe for innovation. My wife is a Water Resources Engineer- a specialized form of Civil that focuses on "when it rains, where the hell is all this water going to go?".

The software for that kind of modeling is apparently pretty basic, pretty expensive, buggy, etc.

A friend of mine was an environmental consultant, and went to startup weekend. They successfully made a SaaS app that would spit out an environmental report in minutes instead of days of manual entry. Just a really good example of small applications that have a huge benefit in old industries http://www.enterratech.com/reports/1/

Reminds me of the story that made the podcast rounds not too long ago, about the Mississippi River Basin Model.

I thought it was shut down largely because much better software simulations were made available; are they still kinda crummy?

99% Invisible's America's Last Top Model. The Mississippi River Basin model was shut down because the computer models were cheaper and "good enough." They still use physical models today for other projects, albeit on a much smaller scale.

Fluid flow is a very difficult problem to simulate; structures are a lot simpler.


I think the models are quite advanced, and that is where the money goes.

That does not mean you can't just take a couple of web developers and make it usable, but as the market is small it might pass the price point with a supersonic bang...

My brother in law would agree with this, he is building an open source project for collecting all of the analysis that civil and structural engineers have to do on every project into a single repository where people can share and augment the calculations. (Think of it as GitHub for civil and structural engineers.)

I would love to see that as well. I have a project with similar goals (posted above). I would like for users to be able to simply call built in functions to do the normal tedious stuff like stress in rods, beams, etc.
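
As a rough idea of what such built-in functions might look like, here is a small Python sketch using textbook formulas: axial stress as force over area, and peak bending stress in a rectangular section as M·c/I. The function names and the SI-unit convention are assumptions made for this example and are not taken from any existing library.

```python
import math

def circular_area(diameter_m):
    """Cross-sectional area of a solid round rod, in m^2."""
    return math.pi * diameter_m ** 2 / 4

def axial_stress(force_n, area_m2):
    """Average axial stress in a rod: sigma = F / A, in Pa."""
    return force_n / area_m2

def rectangular_bending_stress(moment_nm, width_m, height_m):
    """Peak bending stress in a rectangular beam: sigma = M * c / I, in Pa."""
    inertia = width_m * height_m ** 3 / 12  # second moment of area
    c = height_m / 2                        # distance to the extreme fibre
    return moment_nm * c / inertia

rod = axial_stress(10_000, circular_area(0.02))        # 10 kN on a 20 mm rod
beam = rectangular_bending_stress(1_000, 0.1, 0.2)     # 1 kN*m on a 100x200 mm section
print(f"rod: {rod / 1e6:.1f} MPa, beam: {beam / 1e6:.2f} MPa")
```

A shared library would mostly add value through unit handling and code-specific checks, which this sketch leaves out.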

Can you post the link to the github repo?

I have a friend who works in construction glass. (It's NYC, that's a big market). They ship millions of dollars of complex, custom molded glass every year. Everything's kept track of by emailing excel spreadsheets.

That's disgusting. I feel filthy. I replaced a system in my first client school that also used Excel attachments for tracking student data, exam results, attendance, assignments, etc. I could physically feel better once I saw my own implementation replacing the email attachment system. Since then email-excel has been a sensitive topic for me.

I've recently found a department for the company I'm working for needs help as they have literally reached the limits of Excel.

They have a sheet with around 30 rows and 150 columns, and they have 100 of these sheets (in a single Excel file). Some parts use formulas, but usually when somebody needs to change something they need to go through every single sheet. The issue is now when they try to add new data Excel won't let them.

I don't even want to know how they share the file or do backups.

I work in the healthcare industry. Basically we ARE the industry nowadays, and we use Excel and Word to keep everything "organized". There are some half-ass designed software, websites, and databases that are used as well, but it's amazing how a multi-billion dollar company can rely on this level of technology. I think they get these bids to run state government programs, and have absolutely no plan in place. And for some reason instead of just automating or updating things, the company just throws bodies at the problems and makes everything "production" based. I'm sure a lot of places are similar, but this is a white collar factory on such a massive scale, it literally sickens me. There are so many channels that approval for changes has to go through, that by the time some small minor change is implemented it's already way too late, too distorted by having so many hands touch the problem, and too outdated.

"it literally sickens me."

Good thing you're in healthcare...

hahaha idk why but for some reason that made my day.

An app that you can import any spreadsheet into and that generates web/mobile apps to collect the same information would be cool.

And that pushes the data back into the same spreadsheet and a relational database with change history.
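
The storage half of that idea can be sketched in a few lines of Python with the standard `csv` and `sqlite3` modules. This is only an illustration under simplifying assumptions: the sheet is exported as CSV, one column acts as the row key, and values are stored as text in a cell-per-row layout. The table names and the `import_sheet` function are invented for the example; generating the web/mobile forms and writing back to the spreadsheet are not covered.

```python
import csv
import io
import sqlite3

def import_sheet(conn, csv_text, key):
    """Load CSV rows into `cells`, logging every changed value in `history`."""
    conn.execute("CREATE TABLE IF NOT EXISTS cells ("
                 "row_key TEXT, col TEXT, value TEXT, PRIMARY KEY (row_key, col))")
    conn.execute("CREATE TABLE IF NOT EXISTS history ("
                 "row_key TEXT, col TEXT, old_value TEXT, new_value TEXT, "
                 "changed_at TEXT DEFAULT CURRENT_TIMESTAMP)")
    changes = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        for col, value in row.items():
            if col == key:
                continue
            found = conn.execute(
                "SELECT value FROM cells WHERE row_key = ? AND col = ?",
                (row[key], col)).fetchone()
            old = found[0] if found else None
            if found is None or old != value:
                conn.execute("INSERT OR REPLACE INTO cells VALUES (?, ?, ?)",
                             (row[key], col, value))
                conn.execute("INSERT INTO history (row_key, col, old_value, new_value) "
                             "VALUES (?, ?, ?, ?)", (row[key], col, old, value))
                changes += 1
    conn.commit()
    return changes

conn = sqlite3.connect(":memory:")
first = import_sheet(conn, "part,qty,status\nA1,10,cut\nB2,5,queued\n", key="part")
second = import_sheet(conn, "part,qty,status\nA1,10,shipped\nB2,5,queued\n", key="part")
print(first, "cells loaded,", second, "changed on re-import")
```

Re-importing the emailed spreadsheet then records only what actually changed, and the `history` table can be queried like any other relational data.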

Isn't that essentially Google Sheets + Forms? You create a spreadsheet and then generate a data entry form from it automatically.


Is there a queryable relational database?

Same with the electronics manufacturing industry, where inventories and BOMs are done in Excel. It is a pain in the ass, time consuming and error-prone. But that's what managers want.

Smartsheet is a pretty good direct replacement for emailing excel spreadsheets.


A fellow structural engineer here. I think that there's little room for innovation in the "calc" area. The cost of doing calcs is a small fraction of the overall budget of a structural project. Modeling/drawings is where it's at. The analysis/design toolset of a structural engineer (the FEA/design programs) hasn't changed in essence since the 80's outside of drafting/modeling, and for good reason: the marginal cost of an engineer perfecting their analysis exceeds marginal revenue. There's a sweet spot where an experienced structural engineer knows to stop refining their calcs, with the rest of the effort spent on detailing, which sets apart good structural design from mediocre.

Where I would invest (if I were Autodesk or their competitor) is in releasing CAD tools for free in exchange for a consent to use the designs/details internally for ML purposes. Would love to contribute if anyone is working on such a product.

I might be working on something that you would be interested in. MVP is at www.cadwolf.com. I am a structural engineer with an MS from UT Austin. The website is written in PHP using Laravel, with the Angular framework for JS.

The plan is to link CAD to the mathematics and then link the finite element to this as well. The system would also function as a sort of github for engineering where users can find and use functions to do most standard analysis. Email is in my profile if anyone is interested in talking.

I checked out cadwolf and as an engineer myself I find it very interesting. I am curious to understand a few points:
- Why do you center everything around documents? Is it more because people are used to it, or do you believe that they are best fit for design-tasks?
- I saw that to update multiple documents after a requirement changes, you need to open them one by one, in the order of their dependencies. Have you tested that this is still a viable approach, once you have thousands of dependencies and multiple users in a complex design?

I really like the equations and how you only allow to make formally correct equations (including units). Anxious to see how this develops.

(Full disclosure: I am co-founder of a software product that tries to achieve the same aims using different concepts: www.valispace.com)

What I call "documents" are not files in the sense of a Word file. They look like text files because that is what I thought engineers would be comfortable dealing with. However, they function as programs. Documents can be used as programs within other documents as well. Documents fill both the need to solve the calculations and to document them in one place. It eliminates the need to update documentation, have multiple platforms, etc.

There are places for users to upload and store data as well - datasets.

As of now, the code solves equations in JavaScript within the browser. This is why documents have to be opened when a requirement changes: I have no server-side code to solve them without the browser. It isn't a long-term solution, merely a step in building the platform. My next step is to add server-side code capable of solving larger and more complex equations. When that is done, changes to requirements will update documents without the need to open them manually. I plan on using Python, and there are several large libraries available.
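
To illustrate the kind of server-side update this implies, here is a hedged Python sketch of working out which documents need recalculating after a change, and in what order, using the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The document names and the `update_order` function are hypothetical; this is a generic dependency-ordering technique, not a description of how the platform is or will be implemented.

```python
from graphlib import TopologicalSorter

def update_order(depends_on, changed):
    """Return the documents to recalculate after `changed`, dependencies first.

    `depends_on` maps each document to the set of documents it reads from.
    """
    order = TopologicalSorter(depends_on).static_order()
    dirty = {changed}
    result = []
    for doc in order:
        # A document is stale if it changed or reads from a stale document.
        if doc in dirty or dirty & set(depends_on.get(doc, ())):
            dirty.add(doc)
            result.append(doc)
    return result

deps = {
    "loads": {"requirements"},
    "beam": {"loads"},
    "column": {"loads"},
    "report": {"beam", "column"},
    "site": set(),
}
plan = update_order(deps, "requirements")
print(plan)
```

Documents that do not depend on the change (here "site") are left alone, which matters once there are thousands of dependencies.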

This will allow me to link documents to CAD. When the math changes, the CAD will change as well. Once that is done, I will add a finite element meshing and solution system to create an engineering platform that essentially does everything.

I like your site. It's nice to see other people addressing these problems. I am also an aerospace guy. I worked on the shuttle for a while and then designed some components for the Orion. Shoot me an email if you want to talk more.

Have you got an example/names of existing software? I'm interested to see what kind of features they have.

Tekla Structures has dedicated reinforced concrete design features. The licence cost is probably closer to 10k per seat.

You need computational geometry, computer graphics, and structural engineering expert level domain knowledge to implement anything. You need to create traditional 2D machine/construction design drawings from the 3D models. Then you need to sell it to corporations, whose work, most of all, must be dependable and free of guess work.

You need to know what sort of geometries you can use to model the reinforcements. Then you need to know how to design the system so it can handle very large amounts of geometry.

The worst of all is you need to deal with god awful industry standard formats- DWG, DGN, IFC, Step/Iges and so on. Maybe DWG import and export first.

To have any real chance you need a guy or two who are good with numerical code, someone who is familiar with e.g. game engines, someone who knows computer graphics, a structural engineer to tell how he does his job and what the thousand inconsistencies in the field are (this is not a trivial domain like housing or transport), a sales/marketing guy to connect and push the product.

And, like someone else estimated, the potential market is not gigantic - which is kinda funny because we all depend on reinforced concrete but don't need so many engineers for the design work...

In my original post, I was more leaning towards member design software, such as spColumn and S-Concrete.

These are much more standalone, and don't have many of the issues you listed.

No one I know is using the automated concrete design built into analysis programs like ETABS, Tekla, etc.

> I was more leaning towards member design software, such as spColumn and S-Concrete.

The utility of your software tools will be very limited if you are restricting yourself to only member design instead of total structure solutions like ETABS. Why should an engineer pay you at all if they can use a spreadsheet for free to do what you do with your SaaS?

> No one I know is using the automated concrete design built into analysis programs like ETABS, Tekla, etc.

Not too sure about this because I know quite a lot of people who are using these tools. Any reason why the people you know don't use ETABS or Tekla?

> Why should an engineer pay you at all if they can use a spreadsheet for free to do what you do with your SaaS?

Why do businesses invest in new tech? Why pay for excel when I can use a pen and calculator? The answer is because it makes them more efficient. We have excel sheets to do the same thing, matlab code to do the same thing, and yet here we are paying for these member design tools because they are the most efficient for us. If you save an engineer even a couple of minutes for each element they are designing, you essentially pay for the software.

>Any reason why the people you know don't use ETABS or Tekla?

We do use ETABS extensively for analysis. We don't use it for design. It is foolhardy to trust the automated RC design in these software. That seemed to be the standard of practice around here, but perhaps it is different in other areas of the world.

> It is foolhardy to trust the automated RC design in these software

Do you mind if I ask why? I'm working on a sort of general approach toward designing trustworthy engineering software, and I'm trying to collect as many reasons as possible for "can't trust the software".

Since you can already do the analysis (like ETABS), and you are planning to do individual member design, why not put two and two together and build an automated RC analysis+design software? There is no reason to distrust an automated software any more than separate analysis+design software.

There absolutely is a reason: seismic design. We end up doing a lot of data manipulation between the FEA stage and member design stage.

It's not a distrust so much as a fundamental flaw. For simple gravity design it works fine, but even then we are using spColumn because it's just quicker for us.

Care to explain why you have to manually do lots of data manipulation between FEA and member design? Why not write software that automates whatever data manipulation this is? Seems to me that an all-in-one software should have no problem doing the analysis, the data manipulation, and the member design.

I am a fellow engineer (satellites in my case) and we have been fed up with engineering tools in general (especially systems engineering), which seem to consist only of Excel spreadsheets and document-management systems. Even in the space industry there has been practically no innovation since the 60's beyond digitalization of documents.

We have been working for 1.5 years with some engineers on software to solve this: www.valispace.com

I would be curious to hear from you whether what we are building with a focus on the space-industry also applies to structural engineering.

What is the total size of this market - how many users/seats are there out there for this?

Not sure.

In the US there are about 281,400 civil engineers [1]. I couldn't find more detailed information on structural engineers.

-Assume about 10% are practicing structural engineers who need to design concrete structures = 28140.

-Assume a company wants 1 license for every 2 engineers = 14070. (I base this off the fact that my company has 6 licenses for 12 engineers, but we may be higher than average)

-Assume we could get 10% market share = 1407 subscribers.

-Assume $1000/subscriber/year = $1,407,000 from the US market
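For anyone who wants to poke at the assumptions, the funnel above can be written out as a few lines of Python. This is only a sketch of the same back-of-envelope arithmetic: the function name `estimate_market` and the dictionary keys are made up here, and every ratio is the assumption stated in the list, not measured data.

```python
# Back-of-envelope market sizing using only the figures quoted above.
# Every ratio below is an assumption from the comment, not measured data.

US_CIVIL_ENGINEERS = 281_400      # BLS figure cited as [1]
PRICE_PER_SEAT_USD = 1_000        # assumed $/subscriber/year


def estimate_market(civil_engineers: int, price_per_seat: int) -> dict:
    """Walk the funnel step by step with integer arithmetic."""
    structural = civil_engineers // 10   # assume 10% design concrete structures
    licenses = structural // 2           # assume 1 license per 2 engineers
    subscribers = licenses // 10         # assume 10% market share
    revenue = subscribers * price_per_seat
    return {
        "structural_engineers": structural,
        "licenses": licenses,
        "subscribers": subscribers,
        "annual_revenue_usd": revenue,
    }


estimate = estimate_market(US_CIVIL_ENGINEERS, PRICE_PER_SEAT_USD)
for step, value in estimate.items():
    print(f"{step}: {value:,}")
```

Changing any one ratio (say, market share) and rerunning shows how sensitive the final number is to each guess.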

Obviously this isn't a very rigorous analysis.

[1]: http://www.bls.gov/ooh/architecture-and-engineering/civil-en...

I think this is an overoptimistic view of the size of the market. The field is highly fragmented. A large fraction of structural engineers are contractors or work for smaller firms (think wood design, rather than concrete/steel) and wouldn't be target customers for section analysis and concrete detailing software. An absolute majority of those that work for the larger firms don't require anything more than Excel spreadsheets. From my experience, a typical structural engineer is a rather savvy computer user, often times writing Excel macros or AutoLISP scripts to automate their tasks.

It's at least a structured analysis. Thanks for teaching me something today.

So, current market size about 10k * 14,070 = 140M.

A large not-going-to-name-it software package for modeling steel and concrete structures alone generates over 100M a year. It hardly dominates the market, so depending on how the market is segmented a good estimate is probably any number between 1 and 10 times this.

Single seat licences are not the only revenue model. Once a product gains traction consulting, training and providing VIP helpdesk and bugfixing services factor in as well.

Out of curiosity, what do you think of what flux.io is doing? If a SaaS structural engineering application could be integrated with that, would it address your needs?

(For what it's worth, I'm doing something similar in the transport planning space. And yes, bridging the gap between that and modern CS is a substantial piece of work.)

To be honest, I looked at flux's homepage and still don't know exactly what it does or how it would be integrated into the problem at hand.

I am working on doing this. The project is called cadwolf. The MVP is up now and I will be coming out with a full version in January. If anyone has comments, I'd love to hear them. If you are interested in collaborating, let me know too.

Is this something you're interested in talking more about? My company specializes in comp sci + engineering work. Your email isn't listed but mine is in my profile.

A ticketing system that doesn't sux (I like RequestTracker, but it shows its age). Top players are ridiculously overpriced.

My management style is like this: every task/request is numbered, placed in a queue and assigned to a professional.

What I expect from my ticketing system:

- every manager should be able to assign tasks to someone and set the order they must be executed. He needs know what his team is doing and when they finish each task.

- every professional should know what to do and what are the priorities.

- everything is numbered and linked, all communication recorded.

Everything should be well integrated with email (please, don't send me a notification email about an answer and an url, send me the f* answer). If I answer the email, everything goes into the system, I should be able to send commands to the system by email (for example, add a keyword in order to make it a comment instead of answering).
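To make the "commands by email" idea concrete, here is a minimal sketch of how an inbound reply might be routed. The `#comment` keyword and the function name `classify_reply` are invented for illustration; no existing ticketing product is being described.

```python
# Sketch: decide whether an inbound email reply becomes a public answer
# or an internal comment. The "#comment" keyword is a made-up convention.

from typing import Tuple

COMMENT_KEYWORD = "#comment"  # hypothetical keyword, not from any real product


def classify_reply(body: str) -> Tuple[str, str]:
    """Return (kind, text), where kind is 'comment' or 'answer'."""
    lines = body.strip().splitlines()
    if lines and lines[0].strip().lower() == COMMENT_KEYWORD:
        # Drop the keyword line and keep the rest as an internal comment.
        return "comment", "\n".join(lines[1:]).strip()
    return "answer", body.strip()


kind, text = classify_reply("#comment\nWaiting on the vendor, do not reply yet.")
print(kind, "->", text)
```

A real system would also need to strip quoted history and signatures from the reply, which this sketch ignores.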

The problem here seems to be that users/customers insist on customizing any such app to death.

Personally, I think the optimal ticket system would have this data for each ticket:

* A unique, prefixed ticket # (JIRA gets this right)

* An assignee (like an email To:)

* A reporter (like an email From:)

* A one-line summary (like an email Subject:)

* A multi-line body (like an email body, but ideally with markdown)

* Attachments (like email attachments)

* History for edits of all of these (not like email!)

That's it! It really is basically email, but with a unique ID, and editable with history instead of immutable with replies, and a decent UI, perhaps RSS + notifications.
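As a rough sketch, that record fits in one small Python dataclass. The class name, field names, and the `edit` helper are my own choices for illustration, and the ticket ID "OPS-42" is a made-up example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Ticket:
    ticket_id: str            # unique, prefixed, e.g. "OPS-42" (made-up example)
    assignee: str             # like an email To:
    reporter: str             # like an email From:
    summary: str              # like an email Subject:
    body: str = ""            # multi-line, ideally markdown
    attachments: List[str] = field(default_factory=list)
    # Each history entry is (field name, old value, new value).
    history: List[Tuple[str, object, object]] = field(default_factory=list)

    def edit(self, field_name: str, new_value) -> None:
        """Change one field and record the previous value."""
        if field_name in ("ticket_id", "history"):
            raise ValueError(f"{field_name} is not editable")
        old_value = getattr(self, field_name)
        self.history.append((field_name, old_value, new_value))
        setattr(self, field_name, new_value)


ticket = Ticket("OPS-42", "alice", "bob", "Printer on fire")
ticket.edit("assignee", "carol")
ticket.edit("summary", "Printer on fire (3rd floor)")
print(ticket.history)
```

Note what is absent: no status, priority, or workflow fields, which is the whole point of the list above.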

Unfortunately, everybody else seems to think that their ticketing system should embody their vaguely defined and ever-changing workflow, prioritization, approval, and release management system, so they want to be able to add any number of possible statuses, approvals, workflows and all the rest. Once you add that, you end up with another JIRA or ClearQuest or BugZilla, and the cycle repeats itself.

This sounds very like Redmine [0]. It's ostensibly for project "issues" however it's extremely customizable and all of the above are included in the default config and not much more. It sounds like if you removed about 2 default fields it'd be perfect for the ticketing system you describe above. Plus RSS + Notifications + a solid API.

[0] http://www.redmine.org/

Even the older Trac has all of those features.

A ticketing system is a tool to support a workflow, and any friction it creates with the preferred workflow is waste.

As is (consequently) friction it creates in changing the workflow as needs change.

That's just the fundamental and immutable nature of the problem domain.

I am pretty happy with Enchant. It's pretty simple, but flexible enough for up to medium scale environments.

The app you seek is Asana.


I'm not associated with them, but I have used them successfully for months at a time (better than most productivity software). The reason is it is well integrated and similar to email.

Asana is pretty bad. It takes forever to load, which sucks when people send you URLs to tasks/projects and you have to open them individually and wait almost 10 seconds for each to open.

Asana's start up time is just ridiculously slow. Probably the slowest webapp I've ever used. Also, you can't assign more than one person to a ticket, which is a pretty big limitation

JIRA also doesn't let you assign more than one person to a ticket. Could you expand on why this is a problem? I'm not familiar with this problem space so I'm just curious.

The recent GitHub updates let you assign multiple people to reviews and such, but I find it's usually better to tag everyone you want to look at something. I don't think assigning something will send a notification.

I blogged recently about some problems I have with JIRA:


In a nutshell, I argue that the problem with most ticket systems is that they do not constrain the domain enough, so they wind up having similar problems to email (sifting through a chronologically-ordered pile of text rather than structured, semantically-ordered information).

Your comments make me think the crux of the problem is that people want tickets to be like email and use email to manage them. I'm not sure you can ever overcome the "chronological pile-up" problem if you allow email as a user interface to ticketing.

The simplest solution to the 'chronological pile-up problem' (nice name BTW) is a Wiki model, where replies are appended by default but the entire content can be edited if necessary. (C2 demonstrates this quite well.) For simple problems, conversations behave exactly the way they used to, but when it starts getting complex someone can go in and rearrange the conversation into a more logical form. This actually maps quite well to email: by default replies are appended to the bottom, but they can also be inserted inline (some mailing list etiquettes even demand this) or indeed the entire conversation can be rewritten. You'd probably want some sort of merge algorithm in case someone replies to an older email.
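A tiny sketch of that wiki model, under my own naming (`WikiThread`, `append_reply`, `rewrite` are all hypothetical): replies append by default, anyone can replace the whole text, and every version is kept. The merge step for replies to older versions is left out.

```python
from typing import List


class WikiThread:
    """Replies append by default; anyone may also rewrite the whole text."""

    def __init__(self) -> None:
        self.revisions: List[str] = [""]  # every saved version, oldest first

    @property
    def text(self) -> str:
        return self.revisions[-1]

    def append_reply(self, author: str, message: str) -> None:
        # Default behaviour: the conversation grows at the bottom.
        entry = f"{author}: {message}"
        new_text = f"{self.text}\n{entry}" if self.text else entry
        self.revisions.append(new_text)

    def rewrite(self, new_text: str) -> None:
        # Wiki behaviour: replace the content, keep the old version around.
        self.revisions.append(new_text)


thread = WikiThread()
thread.append_reply("ann", "Login fails on Safari")
thread.append_reply("raj", "Cookie flag issue, fix pushed")
thread.rewrite("Problem: login fails on Safari\nCause: cookie flag\nStatus: fixed")
print(thread.text)
```

Keeping all revisions is what makes the rewrite safe: the chronological version is still there if someone disagrees with the tidy-up.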

In fact, my usual approach to dealing with tickets/issues/emails which start to develop this problem is to make my own private copy of the thread and edit it in precisely this manner, though I'm the only one this benefits since it doesn't get sent back upstream.

I also have an idiosyncratic way of organizing this stuff, which is basically to use Emacs + mu4e to search my mail, and if I need to create order, write a new document from scratch. I have a coworker who does what you do, now that you mention it—he will take a series of email, dump it into Word and edit it until it is a useful document of some sort.

I still think there is something here though. Stack Overflow replaced message boards, which were basically HTML versions of mailing lists, and part of that was identifying the semantics of question, answer and comment and defining new operators and new expectations for them.

A wiki is a good approach but because it's totally free-form, the user gets stuck doing the work of keeping things hygienic.

JIRA allows you to edit all the properties of a ticket whenever, but it generates such a huge cloud of email notifications in the process, it kind of disincentivises you from using it. And nobody is in the habit of rereading the page to see what is different since last time.

> Your comments make me think the crux of the problem is that people want tickets to be like email and use email to manage them. I'm not sure you can ever overcome the "chronological pile-up" problem if you allow email as a user interface to ticketing.

I agree that's partly it, but that seems ok when you're in the thick of discussing a problem/fix. If you're doing a code review or something after a fix has been pushed, you actually want certain messages to stand out to describe resolutions and whatnot.

So like gmail where you can star/mark certain replies as important and those messages would show up at top-level in the ticket, where all other messages are collapsed.

We're in our first month of releasing Zammad [1], an open source Zendesk alternative with pretty neat features. You can check out some screenshots or a free 30 day trial of our hosted solution on our commercial site [2]. I really like your feature ideas and will later create issues for them. Would be great if you add some too if you have more of them.

Full disclosure: I'm part of the maintainer staff.

[1] https://github.com/zammad/zammad/ [2] https://zammad.com

Additional features: custom fields w/ user-chosen types (free text field, drop down list, etc.); time tracking (I spent n hours/minutes on this ticket); these should be searchable.

Major feature that allows me to work around any shortcomings in your office: API access to everything and/or database access (preferably direct read/write access, but even if it's just a downloadable .sql.gz it's a huge benefit).

I'm probably not a typical user, though, FWIW.

I've been building support and dev/ops ticketing systems for years and I still haven't found a platform that suits all needs.

For my latest startup I went looking for a service desk tool. The key criteria was "feels like email". The moment any alternative required a user signup just to lodge a support request, I ruled it out.

I ended up choosing Groove. I don't recommend it. All ticketing systems suck, this one just sucked the least for my support desk. Groove doesn't extend to other ticket types, and it's nowhere near as flexible or extensible as JIRA, and the mobile experience is horrible. But it does "feels like email" for my customers better than every alternative you care to mention.

> every manager should be able to assign tasks to someone and set the order they must be executed. He needs know what his team is doing and when they finish each task.

That sounds like unnecessary micromanaging. You couldn't possibly have enough detailed knowledge to know the proper order of tasks in all cases. Possibly even most cases.

I agree that communicating the priorities is important, but the boots on the ground have a much better understanding of what they're working with than you do.

we use Github issues + Zenhub for that (though Github has recently implemented a lot of Zenhub's features). Managers mainly use the 'boards' and 'milestones' views; devs use 'boards' and whatever else. Messages are included in emails and you can reply by email.

What would a reasonable price for this product be for you?

anything billing for data volume instead of by attendant would be a good start.

how about using an app like Trello for that?

Currently we use a mix of trello, smartsheet, slack and god knows what. It is a mess.

Semi-related... I work in wellness and healthcare.

I don't know about you, but I despise filling out the same forms over and over again when seeing new healthcare providers. I'd love to start a service modeled after granular smartphone permissions where

(a) I check in at a new office (scan a code, they scan my code, beacon, something like that)

(b) the office then requests x, y, and z information

(c) a push is sent to my phone where I can review the information and approve or disapprove some or all permissions

(d) a final step of either entering my pin at the office, using my thumbprint on my device, or something else.

The key components would be storing the data encrypted at rest, following HIPAA and then some, having a solid auth protocol (keys, jwts, etc).
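Steps (b) and (c) boil down to "release only the fields that were both requested and approved". Here is a minimal sketch of just that part; the field names, sample values, and the `release_fields` function are all invented, and encryption, auth, and the final confirmation in step (d) are deliberately left out.

```python
# Sketch of steps (b) and (c): the office requests fields, the patient
# approves a subset, and only approved fields are released.
# All field names and values are invented for illustration.

from typing import Dict, Iterable, Set

patient_record: Dict[str, str] = {
    "name": "Jane Doe",
    "allergies": "penicillin",
    "insurance_id": "XYZ-000",
    "ssn": "000-00-0000",
}


def release_fields(record: Dict[str, str],
                   requested: Iterable[str],
                   approved: Set[str]) -> Dict[str, str]:
    """Return only fields that were both requested and approved."""
    return {name: record[name]
            for name in requested
            if name in approved and name in record}


office_request = ["name", "allergies", "ssn"]
patient_approval = {"name", "allergies"}   # patient declines to share the SSN
shared = release_fields(patient_record, office_request, patient_approval)
print(shared)
```

Fields the office never asked for (here `insurance_id`) stay private even if the patient would have approved them, which mirrors how per-app phone permissions behave.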

I think adoption would be helped because the public are already used to permissions like these when installing apps.

The benefits: no paper trail, so no one can forget to shred a form with my SSN on it; my most up-to-date data is always available; and instead of hosting N apps/databases, I'm storing one, which reduces maintenance and customer support issues (one for all, all for one).

Edit: edited for readability.

Too much inertia on the provider side for this to catch on and reach critical mass - many septuagenarian sole practitioners out there using paper diaries / files, and larger organisations with some monstrosity written in COBOL (or MUMPS?) that will never change to accommodate this.

I'd suggest something much more low-tech - a website where you can punch in all your details - insurance, allergies, medical history, etc, etc... and then you can print it out (or a subset of it, for different kinds of providers) or generate a PDF that they can copy & paste into their horrible legacy system (an improvement on retyping), or, for those truly at the cutting edge - the kind of electronic transmission you speak of.

I'm probably biased because I've lived in two areas now where healthcare is one of a few, if not the, major industry in the area. They're always trying out new apps and services here.

I am on board with what you're saying; an escape hatch for non- or semi-adopters. Obviously, printing is a way to go, so maybe on the mobile app, the ability to check each piece of information required then export/email to your preferred destination.

It'd also be interesting to look to make money on conversion, i.e. replacing, or integrating with, the outdated monsters you're talking about.

Maybe we're not even talking about healthcare anymore, maybe just the ability to piece together PII (personally identifiable information) and deliver it to X.

>>>> on another note

This goes into a topic I've seen posts on recently, and something of interest to me, personal indexing; a better way to throw blobs against the wall and have it indexed for me, leading to a personal Google. I mean, that's already coming, really, between Facebook and Google (especially Google Photos) but currently I see nothing about piecing together information I'd like to share on a professional level.

Hmm, Google Drive does a reasonable job of that. It indexes everything (including OCR for images + PDFs), has decent search, and has per-folder permissioning and sharing.

It's actually a pretty good solution for ad-hoc "working together" with someone (a lawyer / architect / whatever) on a project, where you have lots of files you need to share and refer to during the project.

Totally didn't think of that, especially the OCR which is great, having used it before with the mobile app.

I wonder if you could stitch together a workflow as a reseller for Google Apps (no clue what their current name is)?

Either way, good suggestion.

Maybe the problem is we're trying to get the wrong people to pay. Since the pain point is with patients, fix it for them and make them pay. Gets around the industry inertia.

Sell the service to the patients for some smallish fee ($5 per month) and then provide the integrations into the various provider systems for free.

Later on you could scale it up to be an add-on to employee benefits or the health plans.

Insurance card scanning and recognition is available. Costs $999.[1] This has apparently been around for years; there's Windows 98 support. It's been acquired by AcuFill [2]

They also offer identity document verification with facial recognition crosscheck. They want to use this to detect visa overstayers for immediate deportation.[3] That now looks like a market with potential.

[1] http://www.card-reader.com/medical_cards.htm [2] http://www.acuantcorp.com/autofill-software/ [3] http://sandhill.com/article/its-high-time-we-build-border-te...

The military is terrible for this. You are constantly filling out forms that amount to a half page of your basic information, followed by a couple of text fields that the form is actually for.

> The military is terrible for this.

Agreed. Especially considering they have an ENTIRE department devoted to personnel along with an office at every single unit level above platoon.

It might not work for the military specifically, but for browser based forms, Lastpass has a form fill feature: https://helpdesk.lastpass.com/fill-form-basics/

I'm sure it's possible to hack together an AHK script[1], combined with Pulover's macro creator[2] to automate virtually anything repetitive on a Windows PC, or use Selenium to automate browser actions[3]. Of course then you run the risk of having to fall into the classic XKCD automation time sink[4].

[1] http://ahkscript.org/

[2] http://www.macrocreator.com/

[3] http://www.seleniumhq.org/

[4] https://xkcd.com/1319/

It's mostly paper.

Can't they just use their dog tags like an ink stamp or something? It is supposed to be enough to ID them right?

France has national healthcare and everyone has a smart card with vital information, and all doctors have the hardware to read it and software to process it. Or at least they did 15 years ago when I was there.

Edit to add: https://en.wikipedia.org/wiki/Carte_Vitale

We just built something like this for the health market. Users can auto-sign in to websites with one of their identities or set it to ask for each visit. Here is a lil demo I made that uses Craigslist as an example: http://tricorder.org/cl When the user goes to your website, say wellness.com/newcustomer, there are JavaScript APIs to get at your standard data; these bring up a perm dialog and, if the user accepts, the data is sent to the website. Send me an email (profile) if you want to talk biz, tho I was planning to open source it. Auth is very solid, but it's currently Android only.

I am working on a patient-driven platform which brings together all key stakeholders with support of a few good partners, and I am currently conducting user interviews for it. I would love to talk with you more about your ideas. To make scheduling painless, I have a link in my LinkedIn profile summary:


I would love to hear from anyone else who has big ideas relating to, or is working on, driving outcomes towards holistic wellness with patient-centered healthcare, patient data collection/quantified self, and patient-powered research networks. In the bigger picture, I am passionate about making the world a better place through innovation and working on what really matters for humanity.

we're making something similar, currently sitting at ~40k patient users, 200+ health systems adopted -


Here's an observation. HIPAA applies to health care providers, insurance companies, and other entities like that. HIPAA does NOT apply to me when I am in possession of my own personal health records. Not saying such an app should not be secure, but for me to hold my own records is regulatorily simpler than HIPAA.

I'm surprised none of the comments mentioned ZocDoc, which does most of this already. You can fill out forms once, schedule appointments and click to send your info to the office.

Agreed, providing the same exact 4 pages of info to every doctor's office is insane.

It is but you need to understand why that happens (hint: $$)

When you go to the doctor for a sinus infection, the cost of this is not fixed, even across insurance companies. The other factor is the "level" of service. (ref: http://medicaleconomics.modernmedicine.com/medical-economics...)

The more in-depth the examination and the more time you spend with the patient, the more they can charge. All those forms are "taking family history", etc. and it is free money since you have to do the work. Those are then scanned so they can be used later in an audit.

(Source: I also worked at a start-up that was trying to disrupt outpatient medical systems. It's very hard and has lots of roadblocks. BTW, of the top 50 EMRs in the US, only 3 have APIs and these are mostly to pull data, not push it back in.)

All the ballroom dance competitions use this old, disliked software to organize and run the events. The guy who wrote it isn't interested in making improvements (and it can certainly use improvements), and is happy living off the income from people's per event usage rights. I am sure if something modern and regularly updated came out, it would get a lot of uptake. Thing is, the portion of it that runs during the event needs to be able to run offline since venues don't always have reliable internet, and that also means you would be going to at least the first few events for support. And your tests better be good, since time is of the essence if something does go wrong mid event. I thought about it, and decided I was not interested in dealing with all that when my job pays pretty well. Still, it's a real opportunity.

Or just wait it out until he snuffs it... http://www.douglassassociates.com/compmngr_history.htm#long_...

"So what happens when the Douglasses are no longer around? We have every reason to believe that we'll be around for a good long time, but we wanted a plan to provide for our loyal customers just in case we aren't so lucky. So we made one.

Here is how the plan works. Immediately upon learning of our deaths the executors of Dick's estate (his two highly computer literate kids) will post two files on our www.compmngr.com web site and will send out a broadcast email advising our customers how to download the files. The first file is a small standalone computer program called RegisterEvent.exe, which allows you to create your own registration files. So you won't have to register with Douglass Associates and you won't have to pay a registration fee. You can read more about RegisterEvent and how to use it below. The second file is a ZIP file containing all the source code for COMPMNGR and its supporting programs. This file will only be of interest to those few users who want to continue COMPMNGR development and who either know C++ programming or or willing to hire a C++ programmer."

> I am sure if something modern and regularly updated came out, it would get a lot of uptake.

Would it? See, here's a dirty little secret: people can't deal with change.

Any change made to the software means people have to learn something new. And that results in tech support.

I once had a very nice chat with the CEO of a CNC company and asked him why certain features weren't implemented since his hardware was clearly capable of it. He was quite blunt that a single new feature added about 30% to his tech support budget for almost 3 years, and his tech support budget was almost 1/3 of his annual budget.

So, he simply will not add a feature until it results in an expected 500K in increased revenue or he has to fend off a competitor.
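Putting the CEO's two numbers together shows why the bar is so high. This is just my arithmetic on the figures quoted above, treating "about 30%", "almost 3 years" and "almost 1/3" as exact for simplicity; the variable names are made up.

```python
from fractions import Fraction

support_share_of_budget = Fraction(1, 3)   # "almost 1/3 of his annual budget"
support_increase = Fraction(30, 100)       # "about 30%" per new feature
years = 3                                  # "almost 3 years"

# Extra support cost per year, as a fraction of the whole annual budget.
extra_per_year = support_share_of_budget * support_increase
# Cumulative cost over the period, in units of one annual budget.
extra_total = extra_per_year * years

print(f"per year: {float(extra_per_year):.0%} of annual budget")
print(f"over {years} years: {float(extra_total):.0%} of one annual budget")
```

So each feature costs roughly a tenth of the entire annual budget per year in support alone, before counting any development time.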

> Still, it's a real opportunity.

Is it? Actually?

And do you know ballroom competitions well enough to get all the corner cases correct? The Douglasses have been to a LOT of competitions and probably wrote this because they got tired of the grief caused by badly run competitions.

How many ballroom competitions exist (<1000)? How much are they willing to pay (<$1000)? And how much will tech support cost?

So, this is less than $1,000,000 per year in revenue MAX. And, this software is already in place with people who know how to use it.

Your revenue will likely be $10-20K per year for a long while unless you completely displace this. And they can always drop their prices and block you out if they feel like it. And your tech support costs will be quite high.

I suspect the Douglasses made this same calculation and that's why they aren't improving it. It's just not worth the money.

This is an idea I also considered, given that in Europe the software is similarly awful, but at least the guy (yes, the one guy) here is still doing some improvements.

See http://www.topturnier.de/ for what he is doing.

If one would like to do something in this space I'd go with a solution where you can rent the equipment, get it shipped to you in boxes and ship it back later. For larger organisers you could arrange for leasing options or an on-premise installation that has an auto-update.

The advantage would be to provide offline capabilities including a controlled network environment for adjudicators.

Is this the software you are talking about: http://www.douglassassociates.com/ ?

It looks like it's offered for free now. The thing with this kind of software, is it must be "good enough" for the task.

It looks like you have to pay "to have more than 250 entries [I don't know what's typical], to sign up for web page creation options, and to receive technical support".

And here they ask for credit card details over http...


Typical is often larger than 250, and not having things like heat lists up online before the event would make people think you are not running things seriously, whether your event is over 250 or not. I hadn't looked at this web site, and haven't personally used the product. If this website is at all indicative of the user friendliness and modernity of the product itself, I can see why the event organizers complain.

Holy crap. I've danced with Liz (she's an amazing dancer--I was not an amazing leader--she was very gracious).

Geez. Small world.

I posted a similar question last year; great discussion there too. https://news.ycombinator.com/item?id=9799007

I would suggest we make a monthly of these as they provide important insight into industries.

Seconded. Might even help us get out of the mindset of building things for other 20-somethings to replace their moms.

Exactly. The whole point is to get access to people with insights about an industry and who can point to the problems and why they haven't been solved and connect them with people who might have solutions to those problems.

Just imagine how much valuable knowledge and insights get lost every time someone retires.

P.S: I love using all the things that replace my mom and fully support them being built. But, there's obviously other problems to solve.

Do the problems change rapidly enough for the monthly discussion to be radically different than before?

Perhaps Quarterly?

Well, I'm sure you would still get different people commenting due to one set being home on a Friday night one month and then a different set the next. I would prefer monthly, personally.

Yep, I was reading through the discussion in your post and thought it would be interesting to shake that tree once more so I reposted the question.

Great to hear.

I got so excited about the thread and it made me realise something very interesting which I turned into an essay. I call it looking for hidden problems underneath obvious solutions.


that is a great idea!

Rapid generation of high quality 3D models of existing objects. Process should be independent of object size, e.g. a coke can should use the same process as a car, and process time should scale with object size.

Think somewhere on the order of 10,000 models per day throughput.

There's $BNs waiting for you. It's ridiculously hard.

This might already exist depending on your exact requirements and it's a fairly common technology in the world of metrology. I regularly work with manufacturers to reverse engineer and/or measure molds, jigs, and fixtures for which there are no drawings available. A ROMER or FARO arm with a laser scanning head outputting its point cloud to a software like the PolyWorks suite can generate an incredibly accurate CAD model of incredibly large parts in a very short amount of time (an hour or two at worst if the mesh needs a lot of cleanup).

I assume that process would be easy to speed up if the requirement for absolute accuracy was removed. The 8' ROMER arm we use is accurate to ~!2 microns over its entire volume, which is absolutely overkill for something intended to produce models for visual arts applications. A quick and dirty approach to generating the mesh might increase the inaccuracy by several orders of magnitude, but when a coke can has dimensional tolerances to the tune of tenths of a millimeter, the quick and dirty mesh will still be representative of the end product.

Unfortunately not. FARO and other structured light systems don't export texture and are generally too precise (micron) in current form. So they take too much post processing by default.

> There's $BNs waiting for you.

Who would be the primary customers? The entire 3D capturing market is currently several $B per year, including services. Where would be the customers that aren't getting served today that would double this market?

Well it would siphon everything away from the existing 3D capture industry and open it up to smaller groups and those that aren't savvy on it yet. The consumer space generally isn't doing this so anyone that sells anything would get on board at a low enough price point and simplicity.

I think Intel has been trying to do this for a while with their RealSense technology and accompanying cameras.


Mapillary has also been moving quickly in this space, not for objects per se, but entire environments.


Not really. I know some folks on the RS team and they don't really develop around applications, they are more focused on miniaturizing and making RS more available and lower power.

That said, some people have tried to use RS for this problem, but from what I've seen end up just using Kinects.

My friend's working on this, for generating 3d floor plans from point cloud data. It's a pretty complex problem.

That's a different problem set altogether actually.

But yea, there are a lot of us working on that.

What are the best approaches you've found? Specifically for generating clean, useful NURBS/poly mesh models from lidar or SfM dense point cloud data?

My colleagues at Creaform have something that works pretty well. Their latest handheld scanner can generate a wire mesh on the fly with 0.030mm resolution.


10K scans/day is way beyond their limits, and I'm not sure that's a very common use case. But I bet they could get there if they wanted.

I'm sure I speak for many of my fellow fans of physics when I say that technology capable of scanning a complex smaller object every 8.6 seconds would have applications beyond just making 3D models.

What is the use case for the models themselves?

Product development engineer here. In the early stages of a project it can be useful to have CAD models of a competitor's product when analyzing how to improve upon them. Recently we had an intern reverse engineer a competitor's product, and we've used some of these CAD models as the basis for our new designs.

Would it be useful to capture the full appearance properties of the model (BRDF) etc to be able to accurately render the object?

This combined with 3D printers would make fixing stuff much easier by printing the broken piece.

There are multiple but mine specifically is AR. It's valuable also for VR, 3D space planning for Designers/Architects/Engineers, Assets for Game Dev, Objects for modeling and simulation, Training Deep Vision nets and on and on...

Anyone who does work with 3d content: visual effects, video games, vr, ar, etc. Being able to quickly build your scenes from a huge library of accurate models would be amazing and would save businesses lots of money.

What kind of structural integrity do you have in mind? Something with the density of industrial packing foam? I've seen set pieces constructed / carved from such material and it can be painted quite well. Putting aside the environmental / toxicity concerns for a moment regarding the type of material to be used, I'm genuinely curious how "rigid" such pieces might need to be.

Perhaps I should have specified that I am talking about Computer Generated 3D models, not physical models.

Oh, so you mean like a big box that could fit XYZ items inside it and capture something like 10,000 per day? I mean, to me it's kind of hard to believe nobody's tried making a "conveyor belt" like process inside a closed system (a shipping container?) with the right optics and resolution to pull it off. Fidelity plus speed plus software consistency. Considering what I saw the gaming industry doing with static models about 10 years ago I kind of thought it'd be a lot further along now, but I guess not. Sounds like a good project for a few Rensselaer Polytechnic Institute grads that otherwise would've been destined for Kodak.

I was musing another kind of 'real world capture' with videogames, because I want to race around my neighborhood in Forza. https://hackernoon.com/dashcam-google-maps-dev-kit-custom-ne...

Oh, so you mean like a big box that could fit XYZ items inside it and capture something like 10,000 per day?

Maybe but I actually think that's the wrong approach.

I mean, to me it's kind of hard to believe nobody's tried making a "conveyor belt" like process inside a closed system

Yea they have - kinda. None of it works well or fast enough though. We put up a patent for one a year ago before I thought there was a better way to do it. The manpower required to move items onto/off of a line is a big part of the problem.

Well yeah, the human element is exactly what I'd want to eliminate as much as possible. I'm thinking of it more along the lines of what I've seen on How It's Made: Dream Cars, in the sense that the various layers you're going to want (basic dimensions, surface features, coloration, reflective properties) aren't going to happen in one quick grab, I don't think, and I get the feeling the process would work best in "absolute darkness" and isolated as much as possible from vibration.

Taking that 10k number - assuming disparate types of items that might be part of a series like "Bathroom" (toothbrush, hair brush, toilet brush, plunger) - in 24 hours that means cycling each item through in about 8 seconds. The only way I remotely see that possible is essentially having a robot hand pick up the item at the entry point, hold it for the capture sequence (perhaps have a custom-designed 'mount' that can allow for true 360 via a couple positions), and then drop it out the other side.
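
A quick sketch of that arithmetic, for anyone who wants to play with the numbers. The function name and the assumption of lines running around the clock with zero downtime are mine, not part of the original proposal; the parallel-lines parameter is a hypothetical knob to show how the per-item budget loosens.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def seconds_per_item(items_per_day: int, parallel_lines: int = 1) -> float:
    """Time budget per item on each line, assuming nonstop operation."""
    if items_per_day <= 0 or parallel_lines <= 0:
        raise ValueError("items_per_day and parallel_lines must be positive")
    return SECONDS_PER_DAY * parallel_lines / items_per_day


print(seconds_per_item(10_000))     # 8.64
print(seconds_per_item(10_000, 4))  # 34.56
```

So the "about 8 seconds" figure holds for a single line, and the budget covers loading, capture, and unloading together.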

It's the scale part I'm wondering about, re: one size machine fits all doesn't seem to make sense. One machine for items under a certain dimension (e.g. "hand held") then another for items where the machine has to essentially have super-powers to pick up and rotate objects to complete the imaging process (e.g. a couch, a dresser, a motorcycle, etc). I think trying too hard to accommodate outliers ends up tainting the balance of operations a little? Just thinking out loud, really cool puzzle.

Do you know what has been tried before or references to other attempts?

Yes there have been a million attempts since the early days of 3D. Most of them are photogrammetry or structured light setups of some kind, that aren't fast enough and don't scale for sizing. Part of it is logistics of getting objects through a scanner, with the accuracy being poor or muddy at best.

IMO it should be done with a mixture of image segmentation and procedural generation.

There isn't a combination of laser scanning and/or structured light projection scanning that can accomplish this? Or is it a speed/quality control issue of the output?

Triangulation laser scanning is about the closest you get in terms of accuracy. It can work on virtually any surface, including specularly reflective ones (I've worked on bespoke systems for the steel industry). It's accurate down to microns, but the usual problem is the field of view sucks - either you go further away and sacrifice resolution or you go really close and accept that you need to move the scanner (or object) around a lot. For small things, it's fine. You put your doodad on a turntable. For cars, forget it.
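
To put rough numbers on the field-of-view versus resolution trade-off, here is a minimal sketch using the idealized pinhole triangulation relation z = f*b/d. Real scanners have calibration, optics, and speckle effects that this ignores, and the baseline, focal length, and sensor width below are hypothetical values picked only for illustration.

```python
def depth_resolution(z_m, baseline_m, focal_px, disparity_step_px=0.1):
    """Approximate smallest resolvable depth change at range z_m.

    From z = f*b/d, differentiating gives |dz| ~= z**2 / (f*b) * |dd|,
    so depth error grows with the square of the distance.
    """
    return z_m ** 2 / (focal_px * baseline_m) * disparity_step_px


def field_of_view_width(z_m, focal_px, sensor_px):
    """Width of scene covered by the sensor at range z_m (grows linearly)."""
    return z_m * sensor_px / focal_px


# Hypothetical scanner: 10 cm baseline, 2000 px focal length, 2000 px sensor.
for z in (0.2, 1.0, 5.0):
    dz = depth_resolution(z, baseline_m=0.10, focal_px=2000)
    fov = field_of_view_width(z, focal_px=2000, sensor_px=2000)
    print(f"range {z:3.1f} m: view width {fov:3.1f} m, "
          f"depth step ~{dz * 1e6:.0f} microns")
```

Backing off by 5x widens the view by 5x but coarsens the depth step by 25x in this model, which is the "go further away and sacrifice resolution" problem in one line.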

Stereo structured light is great, but doesn't work on specularly reflective objects. You've seen those amazing depth maps from the guys at Middlebury? Wonder how they get perfect ground truth on motorbike cowls that are essentially mirrors? Well they have to spray paint them grey so that you can see the light. The next problem is that you're limited by the resolution of the projector (so I guess if you own a cinema, yay!) and the cameras. Then you have to do all the inter-image code matching which sounds trivial in the papers, but in practice a lot harder (and since you don't get codes at all pixels you need to interpolate, etc, etc).
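
For a feel of what the code matching involves, here is a minimal sketch assuming Gray-code stripe patterns, which is one common scheme; the comment above doesn't say which coding is used. All function names are made up for illustration, and the gap filling is plain linear interpolation along one camera row, far simpler than what a real pipeline needs.

```python
from typing import List, Optional


def gray_encode(n: int) -> int:
    return n ^ (n >> 1)


def gray_decode(g: int) -> int:
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def patterns_for_column(column: int, num_bits: int) -> List[int]:
    """Bits (MSB first) the projector shows at this column, one per frame."""
    g = gray_encode(column)
    return [(g >> i) & 1 for i in reversed(range(num_bits))]


def column_from_observations(bits: List[Optional[int]]) -> Optional[int]:
    """Recover the projector column; None marks a frame we couldn't threshold."""
    if any(b is None for b in bits):
        return None
    g = 0
    for b in bits:
        g = (g << 1) | b
    return gray_decode(g)


def fill_gaps(cols: List[Optional[float]]) -> List[Optional[float]]:
    """Linearly interpolate missing codes between decoded neighbours."""
    out = list(cols)
    known = [i for i, c in enumerate(out) if c is not None]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            t = (i - left) / (right - left)
            out[i] = out[left] + t * (out[right] - out[left])
    return out


row = [column_from_observations(patterns_for_column(c, 10)) for c in (5, 6, 8)]
row.insert(2, None)    # a pixel where the pattern was unreadable (e.g. a highlight)
print(fill_gaps(row))  # [5, 6, 7.0, 8]
```

On a mirror-like surface most pixels come back unreadable rather than one in four, which is why interpolation stops helping and the spray paint comes out.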

There are handheld scanners like the Creaform which work pretty well on small things, but I don't know what the accuracy is like.

The ultimate system would probably be a high-resolution, high-accuracy, scanned LIDAR system. Then you lose the problems with scanning ranges/depth of field, but you accept massively higher cost and possibly a much longer scan time for accurate systems.

Beyond just the 3D: Do you think that capturing the full appearance properties (BRDF etc.) of the object would be useful? This would allow users to very accurately render objects.

Definitely, though UV mapping from that data takes it to another level of difficulty.

Yeah think about it as icons but for 3d models.

That's been turned into an industry with very high throughput.

You mean better than those inferred from parallax/pictures?

And inside the object as well?

You mean better than those inferred from parallax/pictures?

I'm not sure what you're asking here. Are you asking if it should be better than what can be done with photogrammetry?

And inside the object as well?

Doing just the outside is a big enough market/problem.

Where exactly are the $BNs in this tech? Also, any datasets? I'd love to experiment.

You may be interested in the COIL dataset, it contains images of objects from many angles.


There's a method for generating 3D point clouds from images using SIFT already.

See other responses for the markets it would serve. Use 3D modeling by hand pricing as a comp (Low end $10/model, average in the $50-100/model range, sky's the limit for super HQ stuff).

Not sure what kind of datasets you're looking for. You'll see actual products to test with.

That makes sense. I think there's opportunity for generative ML to eventually help here. An open dataset of (images, description) -> 3d model would go a long way. Check out this paper on using GANs to generate voxel-based models: http://3dgan.csail.mit.edu/

I've been studying and working with GANs for about a year now. They are still very exciting, and I'd love to try to expand my codebase to new types of data.

Additionally, there are some recent techniques that haven't been tried with voxel-based renderings.

Perhaps there is another algorithm that can help go from voxel -> polygons as well.
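
On voxel -> polygons: the usual answer is marching cubes, but even a much cruder approach shows the idea. Below is a minimal sketch of exposed-face extraction (not marching cubes): emit a quad for every voxel face that doesn't touch another filled voxel. The function name and data layout are my own choices, and the result is blocky, so a real pipeline would smooth or remesh afterwards.

```python
# Each entry: (offset to the neighbouring voxel, corners of that cube face).
# Corners are ordered counter-clockwise when viewed from outside the cube.
FACES = [
    ((-1, 0, 0), [(0, 0, 0), (0, 0, 1), (0, 1, 1), (0, 1, 0)]),
    ((1, 0, 0), [(1, 0, 0), (1, 1, 0), (1, 1, 1), (1, 0, 1)]),
    ((0, -1, 0), [(0, 0, 0), (1, 0, 0), (1, 0, 1), (0, 0, 1)]),
    ((0, 1, 0), [(0, 1, 0), (0, 1, 1), (1, 1, 1), (1, 1, 0)]),
    ((0, 0, -1), [(0, 0, 0), (0, 1, 0), (1, 1, 0), (1, 0, 0)]),
    ((0, 0, 1), [(0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)]),
]


def voxels_to_quads(filled):
    """filled: set of (x, y, z) integer voxels. Returns (vertices, quads)."""
    vertices, index, quads = [], {}, []
    for (x, y, z) in sorted(filled):
        for (dx, dy, dz), corners in FACES:
            if (x + dx, y + dy, z + dz) in filled:
                continue  # face is hidden between two filled voxels
            quad = []
            for cx, cy, cz in corners:
                v = (x + cx, y + cy, z + cz)
                if v not in index:  # share vertices between adjacent faces
                    index[v] = len(vertices)
                    vertices.append(v)
                quad.append(index[v])
            quads.append(tuple(quad))
    return vertices, quads


verts, quads = voxels_to_quads({(0, 0, 0), (1, 0, 0)})
print(len(verts), len(quads))  # 12 10
```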

I think with the right tech, time, and execution this could be a matter of:

1. Take a picture

2. Generate until you get the 3d model you want

That's the exact approach that I think is going to work.

Well, not exact cause I don't like their voxel building generation method.

I think a GAN + Procedural Generator is the winner.

edit: Let me know if you want to work on this cause it's an active area of research for us. See my HN profile for contact.

Can you clarify what you mean by Procedural Generator? Isn't a generative model already a procedural generator? It's just that the model generated in the referred paper is voxel based. Did you mean generating the parameters of a pre-specified model, e.g. https://graphics.ethz.ch/~edibra/Publications/HS-Nets%20-%20... ? That paper, though, is just learning a regression to the human body model (not using GANs).

Curious to know more about your train of thought. I am working as a researcher in the domain and thinking of experimenting with GANs for 3D model estimation using similar inputs as the one in the paper I referred to.
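
To make the "parameters of a pre-specified model" reading concrete, here is a toy sketch: a hand-written generator that turns three numbers into a closed cylinder mesh (think coke can). This is only my illustration of the idea, not the method from either paper; the function name and parameters are hypothetical. A learned model would then only have to predict the parameters, and mesh resolution becomes a free choice at generation time.

```python
import math


def make_can(radius, height, segments=32):
    """Closed cylinder as (vertices, triangles) with outward-facing winding."""
    if segments < 3:
        raise ValueError("segments must be at least 3")
    vertices = []
    for z in (0.0, height):  # bottom ring, then top ring
        for i in range(segments):
            a = 2 * math.pi * i / segments
            vertices.append((radius * math.cos(a), radius * math.sin(a), z))
    bottom_c = len(vertices)
    vertices.append((0.0, 0.0, 0.0))
    top_c = len(vertices)
    vertices.append((0.0, 0.0, height))

    triangles = []
    for i in range(segments):
        j = (i + 1) % segments
        b0, b1, t0, t1 = i, j, segments + i, segments + j
        triangles += [(b0, b1, t1), (b0, t1, t0)]  # side wall
        triangles.append((bottom_c, b1, b0))       # bottom cap
        triangles.append((top_c, t0, t1))          # top cap
    return vertices, triangles


verts, tris = make_can(radius=0.033, height=0.122, segments=64)
print(len(verts), len(tris))  # 130 256
```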

I think generative models could work, but both 2D GAN and 3D voxel GAN outputs are very low res.
