Show HN: AI co-worker for system software development (Rust,C,C++,pdf) (h2loop.ai)
22 points by yosai 31 days ago | 19 comments
Hey Everybody,

We are really excited to release the first version of H2LooP Studio today: https://h2loop.ai/

H2LooP Studio helps system software engineers generate code from technical specs, debug issues, and understand complex code in C, C++, Go, and Rust. Under the hood, it uses the H2LooP Data Engine to create instruction-tuned datasets from data sheets and source code.

Models are what they eat. We create high-quality, pre-vetted domain-specific training data (telecom, IoT, automotive, consumer electronics) at scale for fine-tuning small language models. We leverage both LLMs and human expertise (system knowledge) to build this dataset.

Why are we building H2LooP?

1. Challenges in System Code:

- System code presents significant challenges for LLMs that lack specialised pre-training.
- Existing tools like GitHub Copilot struggle with tasks such as generating device driver code, debugging network kernel crashes, and interpreting hardware schematics.

2. Limitations of Current Coding Assistants:

- Results from generic coding assistants are often unclear and insufficient.
- These tools are unable to handle technical specifications or crash logs, which are essential for system software development.
- System developers frequently need to reference specifications like Wi-Fi, Bluetooth, or network protocols while coding, but current tools fail to meet these needs.

3. Specialised Requirements for System Software:

- System software is typically written in languages like C, C++, Go, and Rust, often in closed-source projects.
- Enterprises need specialised solutions that understand their specific domain and coding standards.

Challenges in Generating Accurate Code from Technical Specifications:

1. Unstructured Format of Technical Specifications:

- Technical specifications are often in PDF format, which is inherently unstructured.
- Parsing PDFs that include images, tables, and various text elements, and aligning them with reference sample code, presents a significant challenge.

2. Difficulty in Creating Domain-Specific Datasets:

- Developing a question-and-answer coding dataset for specialised domains like automotive or telecom, suitable for LLM training, is a complex task.

3. Necessity of Expert Review:

- Expert review of the training dataset is crucial. For example, if a dataset is created for socket creation in a networking protocol, it must be meticulously checked by an expert before being used for fine-tuning.

The Solution:

1. RAG-Based Parsing and Chunking:

- We employ a Retrieval-Augmented Generation (RAG) solution to parse and chunk PDFs effectively.
- By combining LLM and manual methods, we align the content from PDFs with source code to create an instruction-tuned dataset.

2. Expert Review and Validation:

- Our team of system and domain experts thoroughly reviews and validates the training datasets, which are formatted in JSON.

3. Collaborative Fine-Tuning:

- We partner with enterprises to transform their code and technical specifications into expert-vetted, domain-specific datasets.
- We then assist in fine-tuning a small language model tailored to their domain and coding standards.
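To make the chunking and JSON dataset steps concrete, here is a minimal sketch. It is not H2LooP's actual pipeline: the fixed-size word window, the helper names `chunk_text` and `make_record`, and the `instruction` / `input` / `output` / `reviewed_by` field names are all assumptions chosen for illustration. It assumes the PDF text has already been extracted.

```python
import json

def chunk_text(text, max_words=40, overlap=10):
    """Split extracted spec text into overlapping word windows (hypothetical helper)."""
    if max_words <= overlap:
        raise ValueError("max_words must be larger than overlap")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once this window has reached the end of the text.
        if start + max_words >= len(words):
            break
    return chunks

def make_record(spec_chunk, code, reviewed_by=None):
    """Pair a spec excerpt with reference code as one training record.

    The field names are an assumed layout, not a documented H2LooP schema.
    reviewed_by stays None until a domain expert has signed off.
    """
    return {
        "instruction": "Write C code that implements the behaviour "
                       "described in the specification excerpt.",
        "input": spec_chunk,
        "output": code,
        "reviewed_by": reviewed_by,
    }

# Usage: one made-up spec sentence paired with a made-up C snippet.
spec = "The driver shall open a TCP socket before sending any frame."
code = "int fd = socket(AF_INET, SOCK_STREAM, 0);"
records = [make_record(chunk, code) for chunk in chunk_text(spec)]
print(json.dumps(records, indent=2))
```

Keeping a reviewer field on every record is one simple way to ensure that only expert-checked examples reach fine-tuning.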

Who can use H2LooP: H2LooP is a valuable tool for professionals like developers, product managers, and CTOs. If you're working on proprietary software and frequently coding from technical specifications, H2LooP is for you.

Demo: https://studio.h2loop.ai/

H2LooP Studio is hosted in the cloud. You can download sample technical specifications and experiment with the H2LooP model to generate system software code.

We will soon be releasing the H2LooP Data Engine, which will allow you to create training datasets by uploading code and PDFs.

For more details, refer to https://h2loop.notion.site/

Also, please join our community at:

- Slack: https://h2loopstudio.slack.com
- Twitter: https://x.com/h2loopinc

Would love to hear your feedback & how we can make this better. Thank you, Team H2LooP




Long ago, I was a system developer. Now that I think of it, the whole development / logs / debugging / deploying cycle was abysmally slow. Even loading software via JTAG took about 5 minutes! So there is a lot of room to improve things in this space.

Do you have clients using this now? How are you planning to land inside semiconductor-ish corporates?


Yes rnavi, I have had the same experience. The complete system software development cycle can be expedited using AI. Yes, we are doing a private beta with a Japanese semiconductor conglomerate. Our philosophy is: your data, your model, and your cloud. We create an instruction dataset from their tech specs and code. Everything is on-prem. We have partnered with AMD and Nvidia for on-prem GPU deployment. We help them select the right small language coding model and deploy it securely.


...and how does it fare against Devin/Cognition etc.? I have not tried them, just watched YouTube videos.


Devin and Cognition are great as co-workers for application software. They're proficient in JavaScript and Python and excel at solving logical problems in software. However, they aren't trained in or familiar with system software, which typically runs on physical devices and requires verification against technical specifications for programming.


This is an exciting development for system software engineers working with complex code in languages like C, C++, Go, and Rust. The integration of expert review and validation adds a layer of reliability that’s often missing in generic coding assistants. I’m particularly interested in seeing how the RAG-based parsing and collaborative fine-tuning work in practice. Looking forward to experimenting with the demo and seeing how this evolves. Great work, Team H2LooP!


Thanks shrianshag. You can try some sample datasheets (PDF) and ask code generation questions.


Please visit https://h2loop.ai/ to know more about h2loop.ai



very cool! ive been using chatgpt for OS dev, to summarize specs and try to make bulletpoints for stuff todo etc. but its a nightmare :'). 100k documentation pages is a pain to chew through to find what u need and can be hard to read :/. will definitely give it a whirl!

update: tried some basic questions on general stuff during the boot phase. its really nice honestly. that stuff took looong to figure out 10 years back :') all the examples etc. are usually for unix type stuff. chatgpt mixes and mashes tons of things so its basically unusable. this gave some really nice pointers, good structures and mixes a bunch of complex topics into quite coherent responses which so far dont disagree with what ive managed to pick up. amazing!


Thanks for validating the pain point. I was also using ChatGPT, and it was a pain. We are building something really big to solve this problem for system developers. Stay tuned.


Looks interesting to me, but I had some questions. Could you elaborate on the process of expert review and validation? How do you ensure the quality and accuracy of the datasets created?


We have a team of domain experts who vet the instruction dataset. We do typical RLHF (reinforcement learning from human feedback) and feed the results back into our SFT (supervised fine-tuning) loop. That's why we call ourselves "hardware and human in loop". Humans play an important role in ensuring the quality and accuracy of our dataset.


Got it, and how well does it work with more complex documents, like those with a lot of images or intricate tables? I'm curious about how accurately it aligns the content with the source code in those cases.


We use multimodal RAG and tools similar to unstructured.io. We generate structured output and then use an LLM again to match it against our AST-parsed source code. The matching part is really complex and needs manual inspection and validation.
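A toy sketch of the matching idea, with two loudly stated substitutions: Python's standard `ast` module stands in for whatever parser is used on the real C/C++ sources, and a simple token-overlap (Jaccard) score stands in for the LLM matching step. The helper names and the sample source are hypothetical.

```python
import ast
import re

def tokens(text):
    """Lowercase alphanumeric tokens; snake_case names split into their parts."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def extract_functions(source):
    """Map each function name to its source text using the parsed AST."""
    tree = ast.parse(source)
    funcs = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            funcs[node.name] = ast.get_source_segment(source, node)
    return funcs

def best_match(spec_chunk, funcs):
    """Return (score, name) of the function sharing the most tokens with the chunk.

    Jaccard overlap is a stand-in for the LLM-based matching described above.
    """
    spec_tokens = tokens(spec_chunk)
    scored = []
    for name, body in funcs.items():
        code_tokens = tokens(body)
        union = spec_tokens | code_tokens
        score = len(spec_tokens & code_tokens) / len(union) if union else 0.0
        scored.append((score, name))
    return max(scored) if scored else (0.0, None)

# Made-up source file with two functions.
SOURCE = '''
def open_socket(host, port):
    """Create a TCP socket and connect to host and port."""
    return (host, port)

def read_register(addr):
    """Read a 32-bit hardware register at addr."""
    return 0
'''

funcs = extract_functions(SOURCE)
score, name = best_match(
    "The driver shall read the status register at the given addr.", funcs
)
print(name, round(score, 2))
```

A lexical score like this only produces candidates. As noted above, the real matching still needs manual inspection and validation.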


Please visit https://h2loop.ai/ to know more about H2LooP


I don't want to be that dude, but the text seems AI generated. Cool product though! I'll give it a go


Will there be a self-hosted version of H2LooP Studio in the future, or will it remain cloud-only?


Is this another kind of co-pilot? What kind of performance improvement do you see in H2LooP compared to the other established copilots?


Hey cjtechie, it's not just another co-pilot. Existing coding co-pilots can only do code generation and code comprehension; they do not allow you to upload your technical datasheet, crash log, etc. Most coding agents are SaaS-based and SOC2, GDPR, and even HIPAA certified. However, these certifications do not guarantee data privacy, since they are relatively easy to obtain. Moreover, most SaaS products make use of third-party providers for different sub-processes (LLMs, embeddings, reranking…), resulting in private data being transferred and stored on numerous servers across the internet. Privacy-aware coding SaaS services anonymize sensitive data before sending it to the AI. This approach combines the ease of adoption and power of SaaS with data privacy. However, it comes with a single point of failure, which is the anonymization algorithm itself. We solve this by generating coding datasets and fine-tuning a small open-source coding model on-prem. You can compare the responses side by side with the gpt-4o model on our platform.
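To illustrate the single-point-of-failure argument, here is a deliberately naive anonymizer. It is a hypothetical sketch, not any vendor's real algorithm: the two regex patterns and the sample log line are made up. Anything the patterns do not anticipate leaves the machine untouched.

```python
import re

# Hypothetical redaction rules: only the shapes listed here get masked.
PATTERNS = {
    "IPV4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}

def anonymize(text):
    """Replace every match of each known pattern with a <LABEL> placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text

# Made-up crash log line: the IP and email are masked, but the internal
# build name matches no pattern and would be sent out as-is.
log = "panic at 10.0.0.12: owner bob@example.com, device fw-build-internal-7731"
redacted = anonymize(log)
print(redacted)
```

Here the internal device identifier survives redaction, which is exactly the kind of leak an on-prem model avoids by never sending the data out.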



