
Show HN: CSV file merging - jampoole
https://mightymerge.io/
======
arjie
I just use `xsv` for this. `xsv concatenate` and `xsv join` do a pretty good
job. Blazing fast:
[https://github.com/BurntSushi/xsv](https://github.com/BurntSushi/xsv)

~~~
jampoole
That tool looks pretty cool.

Funny thing is several years ago I made a very similar tool to what you
pointed out above written entirely in bash/awk and is also blazing fast.

That's where I got the idea for this online tool. I wanted to create the same
exact concept of working with files locally without having to open up a
terminal/console.

------
abarrettwilsdon
Is the code open source? How would one know with certainty their data isn't
being surreptitiously copied?

(You can also recreate this functionality locally with a few lines of Python):

import pandas as pd

import os

list_of_files = [f for f in os.listdir('.') if os.path.isfile(f) and ".csv" in
f]

print(f"Files that will be concatenated are: {list_of_files}")

original_df = pd.DataFrame()

for file in list_of_files:

    
    
      df = pd.read_csv(file, keep_default_na=False)
     
      original_df = pd.concat([original_df, df], axis=0, ignore_index=True)
    

original_df.to_csv(f"Output file.csv", index=False)

~~~
cuchoi
I like glob for getting the list of files:

    
    
        import glob
        list_of_files = glob.glob("*.csv")
    

You can also use Pathlib:

    
    
        from pathlib import Path
        dir = Path(".").glob('*.csv')

~~~
dr_kiszonka
Glob is great but on some OS (Linux maybe?) it doesn't guarantee maintaining
the order of files. If order matters, it would make sense to sort the results
prior to processing.

------
MiracleUser
I tried vertical merge with 2 files that were each a single column of items
(no header). The resulting file was 3 columns wide, 1 file in first column
with second and third columns blank and then below that the items from the
second file were in the 3rd column with columns 1 and 2 blank

I was expecting a 1 column result file with both files items stacked

~~~
jampoole
This is expected as merging currently requires a header row for each file so
that it can automatically line up the data accurately.

For example if you had two csv files...

[File1.csv]

    
    
        Id,ProductId
        1,200
        2,300
    

[File2.csv]

    
    
        ProductId,Name,Id
        400,NameFor400,3
        500,NameFor500,4
    

Final output would be...

[Output.csv]

    
    
        Id,ProductId,Name
        1,200,""
        2,300,""
        3,400,NameFor400
        4,500,NameFor500
    

With that said, thanks for commenting! I will add an option for "No Header"
files, or maybe even auto detect for these situations and do a straight
merge:)

