Introduction¶
In many real-world datasets, relationships are bipartite, i.e. they exist between two disjoint sets of nodes, for example:
- Customers and the products they buy
- Authors and the papers they write
- Countries and the goods they export
However, sometimes we want to study relationships within one set:
- How similar are customers based on shared purchases?
- Which countries have similar export patterns?
We can do this by creating a network projection, connecting nodes that share neighbours. There are different ways to project a network, and the choice of weighting or filtering can significantly affect the results.
Additionally, projections can be very dense. Network filtering helps by extracting only statistically significant edges, revealing the network’s "skeleton".
In this tutorial, we will learn to create projections and apply filtering using Michele Coscia’s projection and backboning packages. Then, we will focus on analysing a real-world remittance network using these tools. Make sure the .py files are stored in the same directory as this notebook.
# Import packages
import networkx as nx
import numpy as np
import pandas as pd
PART I: Network Projections¶
Creating a Random Bipartite Network¶
We start with a toy example:
- 100 nodes on one side (let’s call them U)
- 50 nodes on the other side (call them V)
- Random edges between them, generated using an Erdős–Rényi model
- Each edge is assigned a random weight, sampled from a uniform distribution between 0 and 10
This gives us a weighted bipartite graph suitable for experimentation.
# Set random seed for reproducibility
np.random.seed(1999)
# Define sizes of the two bipartite sets
n_U = 100
n_V = 50
p = 0.1  # probability of connection between U and V; check that your network is connected!
# Generate bipartite network
B = nx.bipartite.random_graph(n_U, n_V, p)
# Separate node sets
U_nodes = {n for n, d in B.nodes(data=True) if d["bipartite"] == 0}
V_nodes = set(B) - U_nodes
# Assign random weights to edges
for u, v in B.edges():
    B[u][v]["weight"] = np.random.uniform(0, 10)
print("Number of U nodes:", len(U_nodes))
print("Number of V nodes:", len(V_nodes))
print("Number of edges:", B.number_of_edges())
# Show a few edges with weights
list(B.edges(data=True))[:5]
Number of U nodes: 100
Number of V nodes: 50
Number of edges: 506
[(0, 101, {'weight': 8.245201736973794}),
(0, 114, {'weight': 9.84626758540893}),
(1, 111, {'weight': 8.93144516735905}),
(1, 131, {'weight': 3.19646914578513}),
(1, 135, {'weight': 5.93256704242059})]
Simple Projections¶
The simplest way to project a bipartite network is to connect two nodes in set U if they share at least one common neighbour in set V.
- In the count projection, the weight of the edge is the number of shared neighbours.
- Other simple similarity measures include cosine similarity, Pearson correlation, Euclidean distance, and Jaccard similarity.
Below we show an example using cosine similarity. Use this as a template to explore the other methods.
import network_map2 as nm2
# Sort nodes for nm2
rows = sorted(list(U_nodes)) # Projecting onto nodes U
cols = sorted(list(V_nodes)) # Nodes V serve as links
# Apply cosine projection
Gp_cosine = nm2.cosine(B, rows)
print("Nodes in projection:", Gp_cosine.number_of_nodes())
print("Edges in projection:", Gp_cosine.number_of_edges())
print(list(Gp_cosine.edges(data=True))[:5])
# Save projection edgelist for future use
edges = nx.to_pandas_edgelist(Gp_cosine)
edges = edges.rename(columns={'source': 'src', 'target': 'trg'})
edges.to_csv("cosine_projection_edges.csv", index=False)
Nodes in projection: 100
Edges in projection: 2020
[(0, 15, {'weight': 0.3071075949191353}), (0, 17, {'weight': 0.4632641931741317}), (0, 21, {'weight': 0.14782815051596288}), (0, 31, {'weight': 0.0010825166066117387}), (0, 33, {'weight': 0.12295669334062476})]
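To make these weights concrete, the cosine similarity between two U nodes can be computed by hand from their weight vectors over the V side. A minimal sketch with hypothetical toy vectors (this assumes the edge weights enter the comparison, as the fractional output above suggests; the package's exact treatment may differ):

```python
import numpy as np

def cosine_similarity(u_weights, v_weights):
    """Cosine similarity between two nodes' weight vectors over the V side."""
    u = np.asarray(u_weights, dtype=float)
    v = np.asarray(v_weights, dtype=float)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0

# Toy weight vectors over five V nodes (hypothetical values):
# a zero entry means the node has no edge to that V node
node_a = [8.2, 0.0, 3.1, 0.0, 5.9]
node_b = [9.8, 0.0, 0.0, 4.4, 1.2]
print(cosine_similarity(node_a, node_b))
```

Nodes with no overlapping neighbours get similarity 0, and nodes with proportional weight vectors get similarity 1, which is why all the projected weights above fall between 0 and 1.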
Questions:
- Try using the other functions (pearson, jaccard, euclidean, hyperbolic).
- How do the edge weights and structure of the projection differ between these methods?
- Give examples of financial data that might be analysed well with each of these projection methods.
- NetworkX also has a built-in bipartite projection function (nx.bipartite.weighted_projected_graph). Which of the nm2 projections is it most similar to? Why might the built-in projections in NetworkX be insufficient when analysing more complex or large financial data?
Suggested Answer:
Different bipartite projection methods produce different edge weights and network structures. Pearson correlation generates edges based on linear co-movement, producing strong connections between nodes with similar patterns. Jaccard similarity considers only the fraction of shared neighbours, resulting in a sparser network emphasising overlap rather than magnitude. Euclidean distance weights edges according to the similarity of values across connections, highlighting nodes with similar magnitudes. Hyperbolic projection captures structural similarity in a non-linear way, often identifying nodes in similar positions within the network.
Each projection method is suited to different types of financial data. Pearson correlation works well for spending or investment patterns over time, where linear co-movement matters. Jaccard similarity is useful for presence/absence data, such as which clients perform the same types of transactions. Euclidean distance is appropriate when comparing magnitudes, like total transaction amounts or portfolio weights. Hyperbolic projections help detect actors with similar structural positions, such as banks with comparable exposure patterns across multiple asset classes.
The NetworkX built-in function nx.bipartite.weighted_projected_graph is most similar to the simple count projection algorithm, where edge weights correspond to the number of shared neighbours between two nodes. While convenient, it can be insufficient for complex or large financial networks because it ignores connection magnitudes and cannot implement more advanced similarity metrics such as Pearson or hyperbolic measures.
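As a point of comparison, the built-in NetworkX projection can be sketched on a tiny bipartite graph; its edge weights simply count shared neighbours, like the count projection:

```python
import networkx as nx

# Tiny bipartite graph: u1 and u2 share two V-side neighbours (v1, v2)
B = nx.Graph()
B.add_nodes_from(["u1", "u2"], bipartite=0)
B.add_nodes_from(["v1", "v2", "v3"], bipartite=1)
B.add_edges_from([("u1", "v1"), ("u1", "v2"),
                  ("u2", "v1"), ("u2", "v2"), ("u2", "v3")])

# Project onto the U side: weight = number of shared V-side neighbours
P = nx.bipartite.weighted_projected_graph(B, ["u1", "u2"])
print(P["u1"]["u2"]["weight"])  # 2
```

Note that any edge weights on B are ignored here, which is exactly the limitation discussed above.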
More Complex Projection Methods¶
Beyond the simple projections we previously explored, there are more advanced projection methods that can capture subtle relationships in bipartite networks. These are implemented in network_map2:
- Hyperbolic projection: Reduces the effect of highly connected neighbours; useful when shared connections are less meaningful in dense areas.
- YCN Random Walks: Uses random walks to measure associations; useful for probabilistic relationships such as user–item transitions.
- ProbS: Probability spreading across the network; captures indirect connections, useful for recommendation or influence networks.
- HeatS: Simulates heat diffusion; highlights the influence of central nodes, useful for identifying key players or hubs.
- Hybrid: Combines ProbS and HeatS; balances capturing indirect connections and central node influence, useful for recommendation systems seeking accuracy and diversity.
Most of these methods also accommodate directed bipartite networks, differentiating between in– and out–degrees. This makes them more suitable for datasets where the direction of interaction matters (e.g. money flowing from customers to merchants, or citations from one paper to another).
Below, we present the implementation of the ProbS projection algorithm as an example.
rows = sorted(list(U_nodes)) # Projecting onto nodes U
cols = sorted(list(V_nodes)) # Nodes V serve as links
# Apply ProbS projection
Gp_probs = nm2.probs(B, rows, directed=False)
print("Nodes in projection:", Gp_probs.number_of_nodes())
print("Edges in projection:", Gp_probs.number_of_edges())
list(Gp_probs.edges(data=True))[:5]
Nodes in projection: 100
Edges in projection: 2144
[(0, 18, {'weight': 0.0177532148698498}),
(0, 42, {'weight': 0.03950443490554001}),
(0, 61, {'weight': 0.0307276147074634}),
(0, 62, {'weight': 0.019468941630164598}),
(0, 73, {'weight': 0.01854469505645067})]
Questions (optional):
Here are the papers introducing each of the advanced methods: Hyperbolic, YCN Random Walks, ProbS, HeatS & Hybrid. After the tutorial, choose one of the advanced projection methods, read the corresponding paper, and explain in your own words:
- What is the intuition behind the method?
- How does it assign edge weights differently from simple methods (like count or cosine)?
- In what type of real-world dataset would this method be especially useful?
PART II: Network Filtering¶
When we project a bipartite network, the resulting graph is often very dense: almost every node connects to many others. To make sense of this, we need to extract only the most meaningful edges.
A naive approach would be to simply remove edges below a certain weight threshold, or to remove nodes along with all of their incident edges. However, this is often ineffective: an edge with weight 3 might be very significant for a node with only one neighbour, but trivial for a hub with hundreds of neighbours.
Instead, network backboning calculates a statistical score for each edge, evaluating how surprising or important it is given its position in the network. We can then apply thresholding to these scores, keeping only the most significant connections; the resulting subgraph is the network’s backbone.
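For contrast, the naive weight threshold described above takes only a few lines (toy graph with hypothetical weights); backboning replaces the raw weight in this comparison with a statistical score:

```python
import networkx as nx

# Toy weighted graph (hypothetical weights)
G = nx.Graph()
G.add_weighted_edges_from([("a", "b", 1.0), ("a", "c", 3.0),
                           ("b", "c", 7.5), ("c", "d", 0.4)])

# Naive filtering: drop every edge whose raw weight falls below a fixed
# threshold, regardless of how important the edge is to its endpoints
threshold = 2.0
weak = [(u, v) for u, v, w in G.edges(data="weight") if w < threshold]
G.remove_edges_from(weak)
print(sorted(G.edges()))
```

Here node d loses its only edge and becomes isolated, even though weight 0.4 was everything that node had: exactly the failure mode the noise-corrected score avoids.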
Applying filtering algorithms¶
As with network projections, there are different algorithms for network filtering. One of the most recent and versatile ones is the Noise-Corrected backboning algorithm, developed by Michele Coscia and Frank Neffke.
This method accounts for the noise introduced during data collection when calculating edge scores. Instead of simply thresholding raw weights, it evaluates whether an observed edge is stronger than what would be expected given the degree of the nodes it connects.
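The core idea can be illustrated with a simplified null model (this is not the package's exact formula, which also models sampling noise): compare each observed weight with the weight expected from the endpoints' total strengths.

```python
import pandas as pd

# Toy weighted edge table (hypothetical flows)
edges = pd.DataFrame({"src": ["a", "a", "b"], "trg": ["b", "c", "c"],
                      "weight": [10.0, 1.0, 4.0]})

# Node strength: total weight of incident edges
strength = (edges.groupby("src")["weight"].sum()
                 .add(edges.groupby("trg")["weight"].sum(), fill_value=0))
total = edges["weight"].sum()

# Null expectation: an edge's weight is proportional to its endpoints' strengths
edges["expected"] = edges.apply(
    lambda r: strength[r["src"]] * strength[r["trg"]] / (2 * total), axis=1)
edges["ratio"] = edges["weight"] / edges["expected"]
print(edges[["src", "trg", "weight", "ratio"]])
```

An edge with ratio above 1 is stronger than its endpoints' degrees alone would predict; thresholding on such a score, rather than on the raw weight, is what makes backboning degree-aware.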
Below, we apply the Noise-Corrected backboning algorithm to the dense projected network we previously obtained with the cosine projection Gp_cosine.
import backboning
# Read in the table from Gp_cosine
table, nnodes, nnedges = backboning.read("cosine_projection_edges.csv", "weight", sep=",")
# Apply Noise-Corrected Backboning
nc_table = backboning.noise_corrected(table, undirected=True)
# Apply thresholding
threshold_value = 0.05  # we can define different threshold values
nc_backbone = backboning.thresholding(nc_table, threshold_value)
G_backbone = nx.from_pandas_edgelist(
    nc_backbone,
    source='src',
    target='trg',
    edge_attr='nij'
)
print("Nodes in backbone:", G_backbone.number_of_nodes())
print("Edges in backbone:", G_backbone.number_of_edges())
print(list(G_backbone.edges(data=True))[:5])  # "nij" is the same as "weight" in our data
# Write the backbone to file
backboning.write(nc_backbone, "cosine_projection", "nc", ".")
Nodes in backbone: 100
Edges in backbone: 1397
[(0, 15, {'nij': 0.3071075949191353}), (0, 17, {'nij': 0.4632641931741317}), (0, 21, {'nij': 0.1478281505159628}), (0, 33, {'nij': 0.1229566933406247}), (0, 34, {'nij': 0.0390928565192532})]
Calculating NC score...
The threshold_value in the example above determines how many edges are kept in the backbone: higher values remove more edges, while lower values retain more of the original network. Choosing a threshold is a tradeoff between reducing noise to keep only significant connections and keeping enough edges (and nodes) to preserve the overall network structure.
Exercise:¶
- We want to explore how the threshold value in the Noise-Corrected backboning algorithm affects the structure of the network. In the cell below, write code that iterates over a range of threshold values, applies Noise-Corrected backboning to the cosine projection network for each threshold, and records the number of nodes and edges in the resulting backboned networks. Then produce two line plots with threshold values on the x-axis, one showing the number of edges and one showing the number of nodes on the y-axis, so you can visually assess how increasing the threshold changes the network.
# SUGGESTED ANSWER
import matplotlib.pyplot as plt
# Define the range of thresholds to explore
thresholds = np.linspace(0, 20, 50)
# Lists to store results
num_nodes = []
num_edges = []
# Iterate over thresholds and record network metrics
for thr in thresholds:
    nc_backbone = backboning.thresholding(nc_table, thr)
    G = nx.from_pandas_edgelist(nc_backbone, source='src', target='trg', edge_attr='nij')
    num_nodes.append(G.number_of_nodes())
    num_edges.append(G.number_of_edges())
# Plot number of nodes vs threshold
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.plot(thresholds, num_nodes, marker='o')
plt.xlabel("Threshold")
plt.ylabel("Number of nodes")
plt.title("Number of nodes vs NC backboning threshold")
# Plot number of edges vs threshold
plt.subplot(1, 2, 2)
plt.plot(thresholds, num_edges, marker='o', color='orange')
plt.xlabel("Threshold")
plt.ylabel("Number of edges")
plt.title("Number of edges vs NC backboning threshold")
plt.tight_layout()
plt.show()
- Based on your plots, what would be the optimal threshold to choose if we want to remove as many edges as possible while preserving all the nodes in the projected network?
Answer: About 3.4, the largest threshold value at which all nodes of the projected network are still present in the backbone.
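This value can also be found programmatically rather than read off the plots. A sketch of a generic helper (hypothetical name find_max_threshold), assuming a score table with src, trg, and a score column like the nc_table produced above; shown here on a toy table:

```python
import numpy as np
import pandas as pd

def find_max_threshold(table, score_col="nij", steps=200):
    """Largest grid threshold at which every node still has at least one edge."""
    all_nodes = set(table["src"]) | set(table["trg"])
    best = 0.0
    for thr in np.linspace(0, table[score_col].max(), steps):
        kept = table[table[score_col] >= thr]
        if set(kept["src"]) | set(kept["trg"]) == all_nodes:
            best = thr
        else:
            break
    return best

# Toy score table (hypothetical values): node b loses its last edge above 5.0
toy = pd.DataFrame({"src": ["a", "b", "c"], "trg": ["b", "c", "a"],
                    "nij": [5.0, 2.0, 9.0]})
print(find_max_threshold(toy))
```

On the real nc_table the same call would return a value near the 3.4 read off the plots, up to the grid resolution chosen with steps.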
- Now let's repeat this exercise by directly eliminating nodes (simple filtering), removing the weakest nodes first. Create a line plot with the number of nodes removed from the network on the x-axis and the corresponding remaining number of edges on the y-axis. How does this compare to the results of the backboning algorithm?
edge_list = pd.read_csv("cosine_projection_edges.csv")
# Get all unique nodes
all_nodes = pd.concat([edge_list['src'], edge_list['trg']]).unique()
n_total = len(all_nodes)
n_edges_total = len(edge_list)
# Sort nodes by total strength (sum of weights of incident edges);
# use .add with fill_value=0 so nodes appearing only as src or only as trg keep their one-sided sum
node_strength = edge_list.groupby('src')['weight'].sum().add(
    edge_list.groupby('trg')['weight'].sum(), fill_value=0)
node_strength = node_strength.sort_values(ascending=False)
sorted_nodes = node_strength.index.tolist()
nodes_dropped = []
edges_retained = []
# Iteratively remove nodes from weakest to strongest
for k in range(n_total, 0, -1):
    retained = sorted_nodes[:k]  # keep top-k nodes
    sub_edges = edge_list[edge_list['src'].isin(retained) & edge_list['trg'].isin(retained)]
    G = nx.from_pandas_edgelist(sub_edges, 'src', 'trg', edge_attr='weight')
    nodes_dropped.append(n_total - G.number_of_nodes())
    edges_retained.append(G.number_of_edges())
# Plot
plt.figure(figsize=(6, 5))
plt.plot(nodes_dropped, edges_retained, marker='o', label='Simple filtering')
plt.xlabel("Number of nodes dropped")
plt.ylabel("Number of edges retained")
plt.title("Node filtering: edges retained vs nodes dropped")
plt.grid(True)
plt.legend()
plt.show()
Answer: In the simple filtering approach, edge removal is node-driven: edges disappear only when their connected nodes are removed. This creates a roughly linear relationship between nodes dropped and edges lost, as each node elimination takes all its incident edges with it, producing a steady decline in the network’s connectivity. In contrast, the backboning algorithm implements an edge-driven removal: edges are pruned based on their statistical significance regardless of whether their nodes remain in the network. As a result, many weak connections are eliminated early, causing a rapid initial drop in the number of edges while most nodes are still present. This highlights the most meaningful connections and preserves the network’s core structure, producing a concave-down curve when plotting edges remaining against increasing threshold.
- Noise-Corrected backboning is just one way to extract significant edges from a network. There are several other methods that can be applied, including the Disparity Filter (Serrano et al., 2009), the High Salience Skeleton (Grady et al., 2012), and the Doubly Stochastic Transformation (Beckmann et al., 2009). Choose one of these alternative backboning methods and repeat the same threshold analysis you performed with Noise-Corrected backboning. Investigate how different threshold values affect the number of nodes and edges in the resulting backbone, and visualise your results with line plots.
table, nnodes, nnedges = backboning.read("cosine_projection_edges.csv", "weight", sep=",")
# Apply Disparity Filter
df_table = backboning.disparity_filter(table) # produces disparity filter table
# Define range of thresholds to explore
thresholds = np.linspace(0, 1, 50) # disparity filter values are usually between 0 and 1
# Lists to store results
num_nodes = []
num_edges = []
# Iterate over thresholds and record network metrics
for thr in thresholds:
    df_backbone = backboning.thresholding(df_table, thr)
    G = nx.from_pandas_edgelist(df_backbone, source='src', target='trg', edge_attr='nij')
    num_nodes.append(G.number_of_nodes())
    num_edges.append(G.number_of_edges())
# Plot number of nodes vs threshold
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.plot(thresholds, num_nodes, marker='o')
plt.xlabel("Threshold")
plt.ylabel("Number of nodes")
plt.title("Number of nodes vs Disparity Filter threshold")
# Plot number of edges vs threshold
plt.subplot(1, 2, 2)
plt.plot(thresholds, num_edges, marker='o', color='orange')
plt.xlabel("Threshold")
plt.ylabel("Number of edges")
plt.title("Number of edges vs Disparity Filter threshold")
plt.tight_layout()
plt.show()
Calculating DF score...
- (Optional) After completing the threshold analysis with the backboning method you chose, take some time after the tutorial to read the original paper describing that algorithm. Try to understand why the threshold–network structure relationship differs between this method and the Noise-Corrected backboning algorithm. Consider how each algorithm calculates edge significance and how this affects the number of nodes and edges retained at different thresholds. How would this influence your choice of threshold values for different algorithms?
PART III: Real-World Implementation: Bilateral Remittance Matrix¶
In this exercise, we will move beyond the small remittance subset we explored in Tutorial 5 and instead work with the full Bilateral Remittance Matrix dataset.
This dataset presents the remittance estimates for 2018 using migrant stocks, host country incomes, and origin country incomes (in millions of US$). The version we are using is the October 2019 release from the World Bank.
Why use backboning instead of subsetting?¶
In Tutorial 5, we analysed a subset of the dataset with the top 54 countries by remittance to work with a manageable network for analysis. However, as we saw above, subsetting in this way risks missing interesting structural relationships in the data.
Instead, here we will apply the noise-corrected backboning algorithm introduced earlier. While subsetting by "top countries" focuses on size (biggest senders/receivers), backboning filters edges by statistical significance, which often reveals different structural insights.
Exercise 1: Converting the matrix into an edge list¶
The remittance matrix is in CSV format. Each row represents a sending country, and each column represents a receiving country. The cell values are the remittance flows (in millions of US$).
To apply the backboning algorithm, we first need to convert the matrix into a standard edge list format:
- src: the sending country (row)
- trg: the receiving country (column)
- weight: the remittance value
In pandas, the final edge list should look like this:
edge_list = pd.DataFrame(edges, columns=['src', 'trg', 'weight'])
# SUGGESTED ANSWER
file_path = './Bilateralremittancematrix2018Oct2019.csv'
data = pd.read_csv(file_path)
# Skip the first row (NaNs)
data = data.iloc[1:].reset_index(drop=True)
# Extract sending countries
senders = data.iloc[:, 0].values
# Extract receiving countries
receivers = data.columns[2:] # skip first two columns
# Build edge list as a DataFrame, excluding 'WORLD'
edges = []
for i, sender in enumerate(senders):
    for receiver in receivers:
        value = data.at[i, receiver]
        value_str = str(value).replace(',', '').strip()
        # Only process numeric values that are not '-', empty, or NaN
        if value_str not in ('-', '', 'nan', 'NaN'):
            weight = float(value_str)
            if weight != 0:
                edges.append([sender, receiver, weight])
# Convert to DataFrame
edge_list = pd.DataFrame(edges, columns=['src', 'trg', 'weight'])
print(edge_list.head())
# Save to CSV (optional)
edge_list.to_csv("remittance_edge_list.csv", index=False)
             src                  trg  weight
0  United States          Afghanistan    26.0
1  United States              Albania   169.0
2  United States              Algeria    20.0
3  United States  Antigua and Barbuda    15.0
4  United States            Argentina   100.0
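The nested loop above can also be expressed with pandas' melt, which flattens a wide matrix into long format in one call. A sketch on a toy matrix (hypothetical values, not the actual World Bank file, which needs the extra cleaning shown above):

```python
import pandas as pd

# Toy remittance matrix: rows are senders, columns are receivers
matrix = pd.DataFrame(
    {"sender": ["United States", "Germany"],
     "Albania": [169.0, 12.0],
     "Algeria": [20.0, 0.0]}
)

# Wide -> long: one row per (src, trg) pair, then drop zero flows
edge_list = (matrix.melt(id_vars="sender", var_name="trg", value_name="weight")
                   .rename(columns={"sender": "src"}))
edge_list = edge_list[edge_list["weight"] != 0].reset_index(drop=True)
print(edge_list)
```

For large matrices this vectorised route is considerably faster than iterating cell by cell.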
Exercise 2: Applying the Noise-Corrected Backboning Algorithm¶
With the edge list prepared, we can now apply the Noise-Corrected (NC) backboning algorithm to the remittance network. Filter the network to retain only the most significant edges, by setting a threshold on nij.
Questions:
- When setting the threshold to 0, which countries are eliminated (fully disconnected) from the network? Does this make sense given the geographical location and income levels of these countries? Why?
- Experiment with higher threshold values. Do you notice differences compared to the thresholds used earlier on the cosine projection network? What differences do you observe, and why do you think they occur?
#SUGGESTED ANSWER
table, nnodes, nnedges = backboning.read("remittance_edge_list.csv", "weight", sep=",")
# Apply Noise-Corrected Backboning
nc_table = backboning.noise_corrected(table, undirected=True)
# Apply thresholding
threshold_value = 0 # adjust as needed
nc_backbone = backboning.thresholding(nc_table, threshold_value)
# Convert to NetworkX graph
G_backbone = nx.from_pandas_edgelist(
    nc_backbone,
    source='src',
    target='trg',
    edge_attr='nij'  # 'nij' is the corrected weight from backboning
)
# Write the backbone to file
backboning.write(nc_backbone, "remittance_edge_list", "nc", ".")
original_nodes = set(edge_list['src']).union(set(edge_list['trg']))
backbone_nodes = set(G_backbone.nodes())
removed_nodes = original_nodes - backbone_nodes
print("Nodes removed in the backbone:", removed_nodes)
Nodes removed in the backbone: {'San Marino', 'Virgin Islands (U.S.)', 'Suriname'}
Calculating NC score...
Suggested Answer:
When setting the threshold to 0, the countries eliminated from the backbone are Virgin Islands (U.S.), San Marino, and Suriname. This makes sense because these countries are small economies with relatively minor roles in the global remittance system. San Marino, for example, is geographically located within Italy and has very limited independent remittance flows. Similarly, Virgin Islands (U.S.) and Suriname have relatively low connectivity and small remittance volumes. Once the backboning algorithm filters out statistically insignificant links, these nodes become fully disconnected from the network.
In this network we need much larger thresholds to remove edges compared to the cosine projection network. The reason is that here the edge weights represent actual remittance flows, which can reach very large absolute values, while in the cosine projection network the weights are bounded between 0 and 1. This difference in scale matters because the significance scores produced by the noise-corrected backboning algorithm are influenced by the magnitude of the edge weights, meaning that in networks with large-valued weights, higher thresholds are necessary to filter out edges.
Exercise 3: Retaining 54 countries through simple filtering vs NC backboning¶
In Tutorial 5 we analysed the top 54 countries by total remittance to form a network. Here, instead of simple filtering, we will use the noise-corrected backboning method and adjust the threshold until only about 54 countries remain in the backbone. Once you have identified such a threshold, compare the resulting backbone with the Tutorial 5 network. Compare the retained number of links and density of these networks. How do the two approaches differ, and what does this tell you about the effect of statistical filtering compared to simple size-based selection?
# SUGGESTED ANSWER
# Keep the nc_table from Exercise 2 available
target_nodes = 54
threshold_found, G_backbone = None, None
for thr in np.linspace(0, nc_table['nij'].max(), 200):
    df = backboning.thresholding(nc_table, thr)
    G = nx.from_pandas_edgelist(df, 'src', 'trg', edge_attr='nij')
    if G.number_of_nodes() <= target_nodes:
        threshold_found, G_backbone = thr, G
        break
print("Chosen threshold:", threshold_found)
print("Nodes:", G_backbone.number_of_nodes())
print("Edges:", G_backbone.number_of_edges())
print("Density:", nx.density(G_backbone))
Chosen threshold: 5318.045226130653
Nodes: 53
Edges: 30
Density: 0.02177068214804064
The backboning approach produces a very sparse network, retaining only 53 nodes and 30 edges with a density of approximately 0.022. This occurs because the algorithm filters edges based on statistical significance, keeping only connections that are unusually strong relative to the overall distribution of remittance flows. As a result, the backbone highlights the most meaningful and structurally important ties while removing weaker or incidental connections. This method is particularly useful when the goal is to identify the core structure of a network, focus on key relationships, or reduce noise in highly connected weighted networks, for example when studying which remittance corridors are most critical in global money flows.
# Load edge list from tutorial 5
edge_list = pd.read_csv("net_directional_remittance_edge_list.csv")
# Build directed graph
G = nx.from_pandas_edgelist(
    edge_list,
    source='Source',
    target='Target',
    edge_attr='Net_Remittance',
    create_using=nx.DiGraph()
)
print("Nodes:", G.number_of_nodes())
print("Edges:", G.number_of_edges())
print("Density:", nx.density(G))
Nodes: 54 Edges: 775 Density: 0.27078965758211043
In contrast, simple filtering of the top countries produces a much denser network with 54 nodes and 775 edges, and a density of approximately 0.271. Here, all connections between the selected countries are retained, regardless of their relative importance. This gives a broader overview of interactions among the largest remittance economies and can be useful for visualisation, policy analysis, or when the aim is to capture the general connectivity and flow patterns among major players. However, it includes many links that may be weak or less informative. While backboning focuses on structural significance, simple filtering provides a more comprehensive picture of activity and can complement the backbone analysis for a fuller understanding of the network.
Exercise 4 (Optional, Challenge)¶
For this optional challenge, create an interactive world map using Dash with a slider that controls the threshold of the Noise-Corrected backboning algorithm. As the threshold changes, the map should dynamically display only the countries (nodes) retained in the backbone at that threshold, allowing you to explore how the network structure evolves as weaker connections are filtered out. This exercise is similar to the interactive network in Tutorial 5, but instead of simply plotting a dynamic network, here we plot a map where the retained countries are determined by the statistical significance of their connections.
# SUGGESTED ANSWER
import plotly.express as px
from dash import Dash, dcc, html, Input, Output
# Load original CSV to get country codes
csv_file = './Bilateralremittancematrix2018Oct2019.csv'
data = pd.read_csv(csv_file)
data = data.iloc[1:].reset_index(drop=True)
# Map country names to country codes (second column)
country_name_to_code = dict(zip(data.iloc[:, 0].values, data.iloc[:, 1].values))
# Load edge list
edge_list = pd.read_csv("remittance_edge_list.csv")
# Apply Noise-Corrected Backboning once (undirected)
table, nnodes, nnedges = backboning.read("remittance_edge_list.csv", "weight", sep=",")
nc_table = backboning.noise_corrected(table, undirected=True)
# Dash app
app = Dash(__name__)
app.layout = html.Div([
    html.H1("Remittance Backbone Map"),
    html.P("Adjust threshold to see which countries remain in the backbone."),
    dcc.Slider(
        id='threshold-slider',
        min=0,
        max=10000,
        step=50,
        value=0,
        marks={i: str(i) for i in range(0, 10001, 2000)}
    ),
    dcc.Graph(id='map-graph')
])

@app.callback(
    Output('map-graph', 'figure'),
    Input('threshold-slider', 'value')
)
def update_map(threshold_value):
    # Apply thresholding
    nc_backbone = backboning.thresholding(nc_table, threshold_value)
    # Build NetworkX graph from backbone
    G_backbone = nx.from_pandas_edgelist(
        nc_backbone,
        source='src',
        target='trg',
        edge_attr='nij'
    )
    # Get retained countries
    retained_countries = list(G_backbone.nodes())
    # Map retained country names to codes
    retained_codes = [country_name_to_code.get(name.strip()) for name in retained_countries]
    # Prepare a DataFrame for the map (all countries in the original edge list)
    all_codes = [country_name_to_code.get(name.strip()) for name in edge_list['src'].unique()]
    map_df = pd.DataFrame({"iso_alpha": all_codes})
    map_df["retained"] = map_df["iso_alpha"].apply(lambda x: 1 if x in retained_codes else 0)
    # Plotly choropleth map
    fig = px.choropleth(
        map_df,
        locations="iso_alpha",
        color="retained",
        color_continuous_scale=["lightgray", "blue"],
        range_color=(0, 1),
        labels={"retained": "Retained in backbone"},
        title=f"Countries Retained in Backbone (Threshold={threshold_value})"
    )
    return fig

if __name__ == '__main__':
    app.run(debug=True)
Calculating NC score...