Python Pandas - how to read in data from list (data) and columns (separate list) - Stack Overflow

I've run into a situation and don't know what to do.

The data is a list with no index. Sample data:

data = [
 {'fields': ['2024-10-07T21:22:01', 'USER-A', 21,  0,  0, 21]},
 {'fields': ['2024-10-07T21:18:28', 'USER-B', 20, 20,  0,  0, 0, 45]}
]

The column headers are in a separate list:

cols = ['Created On', 'Created By', 'Transaction Count (ALL)',
        'X Pending', 'X Cancelled (X)', 'X Completed (Y)']

I have tried pandas.DataFrame as well as json_normalize. I either get a single-column table with each value as a row, or all the values in one column. When I try indexing with "fields", I get "list indices must be integers or slices, not str", and I don't understand why. What is the best way to get this information into a DataFrame?

(The number of data elements and the number of column headers are inconsistent only in this example; in the real data they are aligned.)

asked Jan 29 at 23:04 by Alex F, edited Jan 30 at 7:58 by mozway
  • "the number of data elements and number of column headers may not be consistent just for example sake, the real data has things aligned". If the real data does not have this problem, why are you presenting things this way? The second dict in the list has 8 elements. How should that be aligned with a col list of length 6? More generally, please provide a minimal reproducible example, i.e., please explicitly add the expected desired output based on the sample provided. – ouroboros1 Commented Jan 29 at 23:14

1 Answer

You could combine two DataFrame constructors:

import pandas as pd

data = [{'fields': ['2024-10-07T21:22:01', 'USER-A', 21, 0, 0, 21]},
        {'fields': ['2024-10-07T21:18:28', 'USER-B', 20, 20, 0, 0, 0, 45]},
       ]

out = pd.DataFrame(pd.DataFrame(data)['fields'].tolist())

Output:

                     0       1   2   3  4   5    6     7
0  2024-10-07T21:22:01  USER-A  21   0  0  21  NaN   NaN
1  2024-10-07T21:18:28  USER-B  20  20  0   0  0.0  45.0
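
The NaN values and the 0.0 / 45.0 in the output come from the rows having different lengths. The sketch below reproduces this with two plain lists; the names rows and padded are only illustrative and do not appear in the original code.

```python
import pandas as pd

# Two rows of unequal length (6 and 8 values), as in the sample data.
rows = [
    ['2024-10-07T21:22:01', 'USER-A', 21, 0, 0, 21],
    ['2024-10-07T21:18:28', 'USER-B', 20, 20, 0, 0, 0, 45],
]

padded = pd.DataFrame(rows)

print(padded.shape)      # the width follows the longest row
print(padded[6].isna())  # the shorter row has no value in column 6
print(padded.dtypes)     # columns that contain NaN are stored as floats
```

If the real data is aligned, as stated in the question, no padding should occur and the numeric columns should stay integers.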

If you also have a list of column names, cols, you could truncate each row to len(cols) values:

cols = ['Created On', 'Created By', 'Transaction Count (ALL)',
        'X Pending', 'X Cancelled (X)', 'X Completed (Y)']

out = pd.DataFrame(pd.DataFrame(data)['fields'].str[:len(cols)].tolist(),
                   columns=cols)

Output:

            Created On Created By  Transaction Count (ALL)  X Pending  X Cancelled (X)  X Completed (Y)
0  2024-10-07T21:22:01     USER-A                       21          0                0               21
1  2024-10-07T21:18:28     USER-B                       20         20                0                0
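
The truncation relies on the .str accessor slicing each element of the Series, which also works when the elements are lists rather than strings. A minimal sketch with made-up values (s and truncated are illustrative names):

```python
import pandas as pd

# A Series whose elements are Python lists of different lengths.
s = pd.Series([[1, 2, 3, 4], [5, 6, 7]])

# .str[:2] applies the slice [:2] to every element, lists included.
truncated = s.str[:2].tolist()
print(truncated)
```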

Or rename to keep the extra columns:

out = (pd.DataFrame(pd.DataFrame(data)['fields'].tolist())
         .rename(columns=dict(enumerate(cols)))
       )

Output:

            Created On Created By  Transaction Count (ALL)  X Pending  X Cancelled (X)  X Completed (Y)    6     7
0  2024-10-07T21:22:01     USER-A                       21          0                0               21  NaN   NaN
1  2024-10-07T21:18:28     USER-B                       20         20                0                0  0.0  45.0
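
The renaming works because dict(enumerate(cols)) maps each positional column label to a name, and rename leaves labels without a mapping untouched. A small sketch, using a shortened, hypothetical header list:

```python
import pandas as pd

cols = ['Created On', 'Created By']  # shortened header list, for illustration only

# enumerate pairs each name with its position: {0: 'Created On', 1: 'Created By'}
mapping = dict(enumerate(cols))

df = pd.DataFrame([['2024-10-07T21:22:01', 'USER-A', 21]])

# Column 2 has no entry in the mapping, so it keeps its integer label.
renamed = df.rename(columns=mapping)
print(list(renamed.columns))
```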

But honestly, it is better to pre-process in pure Python, which is more efficient and more explicit:

# truncation
out = pd.DataFrame((dict(zip(cols, d['fields'])) for d in data))

# alternative truncation
out = pd.DataFrame([d['fields'][:len(cols)] for d in data], columns=cols)

# renaming
out = (pd.DataFrame([d['fields'] for d in data])
         .rename(columns=dict(enumerate(cols)))
      )
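
As for the error quoted in the question, a plausible cause (an assumption, since the failing code was not posted) is indexing the outer list with the string key. The key belongs to each dict inside the list, not to the list itself. A sketch:

```python
data = [
    {'fields': ['2024-10-07T21:22:01', 'USER-A', 21, 0, 0, 21]},
    {'fields': ['2024-10-07T21:18:28', 'USER-B', 20, 20, 0, 0, 0, 45]},
]

# Assumed failing attempt: data is a list, so a string index raises TypeError.
try:
    data['fields']
except TypeError as exc:
    message = str(exc)
    print(message)

# Index each dict in the list instead.
rows = [d['fields'] for d in data]
print(rows[0])
```

This is why the pure-Python snippets above loop over data and read d['fields'] from each element.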