So, while there may be a cleverer way to do this with joins and explode, I think you can normalize the data in two steps and then use .explode:
In [107]: df = pd.DataFrame(
     ...:     dict(
     ...:         file=[1, 2, 3, 4],
     ...:         color=["blue", "green", "red", "black"],
     ...:         file_value=[
     ...:             ["123", "abc", "tvr"],
     ...:             ["jlak", "abc", "vds"],
     ...:             ["vssf", "blue", "15a"],
     ...:             "fvg",
     ...:         ],
     ...:         true_value=[
     ...:             ["123", "abc", "tvr"],
     ...:             ["123", "ffs", "tvr"],
     ...:             ["15a", "asd", "15a", "234"],
     ...:             "fvg",
     ...:         ],
     ...:     )
     ...: )
The following functions normalize the lists to the same length (after first making sure the values are always lists).
In [108]: def normalize_to_list(x):
     ...:     # wrap scalars in a list; leave existing lists alone
     ...:     if not isinstance(x, list):
     ...:         x = [x]
     ...:     return x
     ...:

In [109]: def normalize_list_lengths(S):
     ...:     # this could all probably be more elegant
     ...:     # note: the shorter list is padded with None in place
     ...:     file_value = S['file_value']
     ...:     true_value = S['true_value']
     ...:     if len(file_value) == len(true_value):
     ...:         return S
     ...:     elif len(file_value) > len(true_value):
     ...:         diff = len(file_value) - len(true_value)
     ...:         true_value += [None]*diff
     ...:     else:
     ...:         diff = len(true_value) - len(file_value)
     ...:         file_value += [None]*diff
     ...:     return S
     ...:

In [110]: df
Out[110]:
   file  color         file_value            true_value
0     1   blue    [123, abc, tvr]       [123, abc, tvr]
1     2  green   [jlak, abc, vds]       [123, ffs, tvr]
2     3    red  [vssf, blue, 15a]  [15a, asd, 15a, 234]
3     4  black                fvg                   fvg

In [111]: df[['file_value', 'true_value']] = df[['file_value','true_value']].applymap(normalize_to_list)
In [112]: df
Out[112]:
   file  color         file_value            true_value
0     1   blue    [123, abc, tvr]       [123, abc, tvr]
1     2  green   [jlak, abc, vds]       [123, ffs, tvr]
2     3    red  [vssf, blue, 15a]  [15a, asd, 15a, 234]
3     4  black              [fvg]                 [fvg]

In [113]: df[['file_value', 'true_value']] = df[['file_value','true_value']].apply(normalize_list_lengths, axis=1)
In [114]: df
Out[114]:
   file  color               file_value            true_value
0     1   blue          [123, abc, tvr]       [123, abc, tvr]
1     2  green         [jlak, abc, vds]       [123, ffs, tvr]
2     3    red  [vssf, blue, 15a, None]  [15a, asd, 15a, 234]
3     4  black                    [fvg]                 [fvg]

Finally, call df.explode on both columns at once (passing a list of columns requires pandas 1.3 or later).
In [115]: df.explode(column=['file_value','true_value'])
Out[115]:
   file  color file_value true_value
0     1   blue        123        123
0     1   blue        abc        abc
0     1   blue        tvr        tvr
1     2  green       jlak        123
1     2  green        abc        ffs
1     2  green        vds        tvr
2     3    red       vssf        15a
2     3    red       blue        asd
2     3    red        15a        15a
2     3    red       None        234
3     4  black        fvg        fvg
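
If you would rather have this as a plain script than an interactive session, here is a minimal sketch of the same three steps (wrap scalars in lists, pad to equal length with None, explode). The helper names `to_list` and `pad_and_explode` are my own and not part of pandas, and unlike the functions above this version builds new lists instead of padding the originals in place, so treat it as one possible variant rather than the canonical way to do it.

```python
import pandas as pd


def to_list(x):
    """Wrap a scalar in a list; return existing lists unchanged."""
    return x if isinstance(x, list) else [x]


def pad_and_explode(df, cols):
    """Pad the list cells in `cols` to equal length per row, then explode them together.

    Hypothetical helper for illustration; requires pandas >= 1.3 for
    multi-column explode.
    """
    cols = list(cols)
    out = df.copy()
    # For every row, collect the cells of all target columns as lists.
    rows = [[to_list(v) for v in values] for values in zip(*(df[c] for c in cols))]
    for i, c in enumerate(cols):
        padded = [
            # `+` builds a new list, so the caller's lists are not modified.
            lists[i] + [None] * (max(map(len, lists)) - len(lists[i]))
            for lists in rows
        ]
        out[c] = pd.Series(padded, index=df.index, dtype=object)
    return out.explode(cols)


df = pd.DataFrame(
    dict(
        file=[1, 2, 3, 4],
        color=["blue", "green", "red", "black"],
        file_value=[
            ["123", "abc", "tvr"],
            ["jlak", "abc", "vds"],
            ["vssf", "blue", "15a"],
            "fvg",
        ],
        true_value=[
            ["123", "abc", "tvr"],
            ["123", "ffs", "tvr"],
            ["15a", "asd", "15a", "234"],
            "fvg",
        ],
    )
)

result = pad_and_explode(df, ["file_value", "true_value"])
print(result)
```

The result keeps the original index repeated once per exploded element, so add `.reset_index(drop=True)` if you want a fresh 0..n-1 index.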