Issue reading lists from parquet file into a dataframe showing as None on MacOS but working for Windows

Question

I have a number of parquet files with pricing data. The bid and ask prices and sizes are stored as lists of float values, e.g.:

                                                bidprices  \
0       [4.51088, 4.51079, 4.51065, 4.51051, 4.51011, ...   
1       [4.51088, 4.51079, 4.51065, 4.51051, 4.51011, ...   
2                    [4.51073, 4.51052, 4.51029, 4.51002]   
3                             [4.51049, 4.51049, 4.51039]   
4                                      [4.51049, 4.51039]   
...                                                   ...      
633621                [4.52003, 4.52001, 4.51988, 4.5195]   

                                                 bidsizes  \
0       [1000000, 5000000, 10000000, 20000000, 4000000...   
1       [1000000, 5000000, 10000000, 20000000, 4000000...   
2                   [1000000, 4000000, 5000000, 10000000]   
3                              [500000, 1000000, 3000000]   
4                                      [1000000, 3000000]   
...                                                   ...      
633621                 [500000, 500000, 2000000, 7000000]

I am using boto3 to connect to an AWS S3 bucket and read the files into a dataframe. There are no connectivity or permission issues: the code has been tested and works when run from a Windows machine.

import io

import boto3
import pandas as pd

session = boto3.Session(profile_name='aws-profile')
s3 = session.client('s3')
for key in key_name:
    # Object keys look like <key>/<symbol>_<x>.parquet
    response = s3.get_object(Bucket=bucket, Key=f'{key}/{self.symbol}_{x}.parquet')
    content = response['Body'].read()
    file_obj = io.BytesIO(content)  # wrap the raw bytes so pandas can read them
    df = pd.read_parquet(file_obj)
    files.append(df)

However, when I run it from my own machine (macOS Sequoia 15.1 (24B83), Python 3.9.6), the dataframe has empty columns where the lists should be. The same thing happens when the file is stored locally.

df.isnull().all() gives

[1739342 rows x 11 columns]
time         False
sym          False
provider     False
valuedate    False
received     False
bid          False
ask          False
bidprices     True
bidsizes      True
askprices     True
asksizes      True
dtype: bool

I have tried updating Python versions, checking permissions, and verifying that the files aren't broken. The strangest thing is that I have one file saved locally that doesn't lose the list values when read into a dataframe, but I can't see any difference in how it is stored compared with the other local files that don't work.

I haven't included the full code as it doesn't appear to be the cause, but I am happy to include it if necessary. Any help is greatly appreciated.

Sasa Trivic · Accepted Answer · 2024-11-13 13:10:34Z


I had the same issue just now. Try using a different engine: fastparquet didn't work for me, but pyarrow did.

df = pd.read_parquet(file_obj, engine="pyarrow")
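
One possible explanation for the platform difference, offered as a guess rather than a diagnosis: when no engine is given, pandas uses engine="auto", which tries pyarrow first and falls back to fastparquet if pyarrow is not available. Two machines with different packages installed can therefore end up reading the same file with different engines. The sketch below checks which engines are importable and refuses to fall back silently. The helper names installed_parquet_engines and read_parquet_with_pyarrow are made up for illustration and are not part of pandas.

```python
import importlib.util


def installed_parquet_engines():
    """Return the parquet engines that can be imported in this environment."""
    candidates = ("pyarrow", "fastparquet")
    return [name for name in candidates if importlib.util.find_spec(name) is not None]


def read_parquet_with_pyarrow(source):
    """Read a parquet path or file-like object, insisting on the pyarrow engine.

    Hypothetical helper: raises instead of letting pandas fall back
    to another engine when pyarrow is missing.
    """
    if "pyarrow" not in installed_parquet_engines():
        raise ImportError("pyarrow is not installed; try `pip install pyarrow`")
    import pandas as pd  # imported here so the engine check works without pandas

    return pd.read_parquet(source, engine="pyarrow")


if __name__ == "__main__":
    # Run this on both machines and compare the output.
    print("Installed parquet engines:", installed_parquet_engines())
```

If the Mac reports only fastparquet while the Windows machine reports pyarrow, installing pyarrow on the Mac and passing engine="pyarrow" explicitly, as above, is worth trying first.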

