pandas DataFrame edge list to networkX graph object

I am trying to create an undirected graph from a DataFrame formatted_unique_edges - the 'weight' column will purely be used for edge colouring in downstream visualisation using plotly:

          source     target  weight
    0  protein_2  protein_3       3
    1  protein_2  protein_6       2
    2  protein_3  protein_6       2
    3  protein_2  protein_4       2
    4  protein_2  protein_5       2
    5  protein_3  protein_4       2
    6  protein_3  protein_5       2
    7  protein_4  protein_5       2
    8  protein_4  protein_6       1
    9  protein_5  protein_6       1

The first lines of the linked plotly example, which I am trying to emulate, are:

    G = nx.random_geometric_graph(200, 0.125)

    edge_x = []
    edge_y = []
    for edge in G.edges():
        x0, y0 = G.nodes[edge[0]]['pos']
        x1, y1 = G.nodes[edge[1]]['pos']
        edge_x.append(x0)
        edge_x.append(x1)
        edge_x.append(None)
        edge_y.append(y0)
        edge_y.append(y1)
        edge_y.append(None)

I first convert formatted_unique_edges to a Graph, then try to emulate the code above, with some diagnostic print statements:

    G = nx.from_pandas_edgelist(formatted_unique_edges, edge_attr=True)
    # also tried G = nx.random_geometric_graph(200, 0.125) as per plotly example

    edge_x = []
    edge_y = []
    for edge in G.edges():
        print(edge)  # ('proteinN', 'proteinM')
        print(G.nodes[edge[0]])  # {}
        print(G.nodes[edge[1]])  # {}
        x0, y0 = G.nodes[edge[0]]['pos']
        #####
        # THROWS KeyError: 'pos' if G is from formatted_unique_edges
        #####
        # prints {'pos': [float, float]} if G is from nx.random_geometric_graph
        x1, y1 = G.nodes[edge[1]]['pos']
        edge_x.append(x0)
        edge_x.append(x1)
        edge_x.append(None)
        edge_y.append(y0)
        edge_y.append(y1)
        edge_y.append(None)

As stated in the comments, I am getting a KeyError from G.nodes[edge[0]]['pos']. I had a look in the Spyder variable explorer, and G.nodes._nodes from nx.random_geometric_graph has the format:

    {0 : {'pos' : [pos_float, pos_float]},
     1 : {'pos' : [pos_float, pos_float]},
     ...
     199 : {'pos' : [pos_float, pos_float]}
    }

Whereas G.nodes._nodes from formatted_unique_edges has the format:

{'protein_2' : {}, 'protein_3' : {}, 'protein_4' : {}, 'protein_5' : {}, 'protein_6' : {}} 

This all suggests I am making my Graph object from formatted_unique_edges incorrectly with nx.from_pandas_edgelist - can someone advise how I should be doing it?

Thanks! Tim

1 Answer

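You forgot to generate a layout for your graph. random_geometric_graph does more than build a graph: it also assigns each node random coordinates and stores them in the 'pos' node attribute. from_pandas_edgelist only creates nodes and edges, so you have to compute the positions yourself and attach them as 'pos'.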
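A quick way to see that difference for yourself is sketched below. The two-row DataFrame is a hypothetical stand-in for formatted_unique_edges, and the variable names are mine, not from the question:

```python
import networkx as nx
import pandas as pd

# Hypothetical two-row stand-in for formatted_unique_edges
edges = pd.DataFrame(
    {
        "source": ["protein_2", "protein_2"],
        "target": ["protein_3", "protein_6"],
        "weight": [3, 2],
    }
)

geometric = nx.random_geometric_graph(5, 0.5)
from_frame = nx.from_pandas_edgelist(edges, edge_attr=True)

# Does every node of the geometric graph carry a 'pos' attribute?
has_pos_geometric = all("pos" in data for _, data in geometric.nodes(data=True))
# Does any node built from the edge list carry one?
has_pos_frame = any("pos" in data for _, data in from_frame.nodes(data=True))

print(has_pos_geometric, has_pos_frame)
```

So the graph itself is built correctly (the 'weight' column ends up on the edges); only the node positions are missing.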

    import networkx as nx

    # Convert your dataframe to graph
    G = nx.from_pandas_edgelist(formatted_unique_edges, edge_attr=True)

    # Generate the layout and set the 'pos' attribute
    pos = nx.drawing.layout.spring_layout(G)
    nx.set_node_attributes(G, pos, 'pos')

    edge_x = []
    edge_y = []
    for edge in G.edges():
        x0, y0 = G.nodes[edge[0]]['pos']
        x1, y1 = G.nodes[edge[1]]['pos']
        edge_x.append(x0)
        edge_x.append(x1)
        edge_x.append(None)
        edge_y.append(y0)
        edge_y.append(y1)
        edge_y.append(None)

Output:

    >>> G.nodes._nodes
    {'protein_2': {'pos': array([0.5830424, 0.0301945])},
     'protein_3': {'pos': array([-0.42158911,  0.33654032])},
     'protein_6': {'pos': array([0.30069049, 1.        ])},
     'protein_4': {'pos': array([-0.71990583, -0.51877307])},
     'protein_5': {'pos': array([ 0.25776204, -0.84796174])}}
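spring_layout starts from random positions, so your coordinates will differ from the ones above on each run. If you want a repeatable layout and also need the weights for colouring, something along these lines should work. It is only a sketch: the DataFrame is retyped from the question, seed=42 is an arbitrary choice, and edge_weights is a name I made up for illustration.

```python
import networkx as nx
import pandas as pd

# Edge list retyped from the question
formatted_unique_edges = pd.DataFrame(
    [
        ("protein_2", "protein_3", 3),
        ("protein_2", "protein_6", 2),
        ("protein_3", "protein_6", 2),
        ("protein_2", "protein_4", 2),
        ("protein_2", "protein_5", 2),
        ("protein_3", "protein_4", 2),
        ("protein_3", "protein_5", 2),
        ("protein_4", "protein_5", 2),
        ("protein_4", "protein_6", 1),
        ("protein_5", "protein_6", 1),
    ],
    columns=["source", "target", "weight"],
)

G = nx.from_pandas_edgelist(formatted_unique_edges, edge_attr=True)

# Fixing the seed (arbitrary value) makes the layout repeatable between runs
pos = nx.spring_layout(G, seed=42)
nx.set_node_attributes(G, pos, "pos")

edge_x, edge_y, edge_weights = [], [], []
for u, v, data in G.edges(data=True):
    x0, y0 = G.nodes[u]["pos"]
    x1, y1 = G.nodes[v]["pos"]
    # Same x0, x1, None pattern as the plotly example in the question
    edge_x += [x0, x1, None]
    edge_y += [y0, y1, None]
    # Keep the weight of each edge, in the same order as the coordinates
    edge_weights.append(int(data["weight"]))
```

edge_weights follows the iteration order of G.edges(data=True), so entry i belongs to the i-th triple in edge_x and edge_y and can be used to pick a colour per edge downstream.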