Graph Neural Networks Part 3: How GraphSAGE Scales Up Learning

In the previous parts of this series, we looked at Graph Convolutional Networks (GCNs) and Graph Attention Networks (GATs). Both architectures work fine, but they also have some limitations! A big one is that for large graphs, computing the node representations with GCNs and GATs becomes slow. Another limitation is that if the graph structure changes, GCNs and GATs will not be able to generalize. So if nodes are added to the graph, a GCN or GAT cannot make predictions for them. Luckily, these problems can be solved!
In this post, I will explain GraphSAGE and how it solves the common problems of GCNs and GATs. We will train GraphSAGE and use it for graph predictions to compare its performance with GCNs and GATs.
New to GNNs? You can start with post 1 about GCNs (which also contains the initial setup for the code samples), and post 2 about GATs.
Two Key Problems with GCNs and GATs
I touched on them briefly in the introduction, but let's dive a bit deeper. What are the problems with the previous GNN models?
Problem 1. They don't generalize
GCNs and GATs struggle to generalize to unseen graphs. The graph structure needs to be the same as the training data. This is known as transductive learning, where the model trains and makes predictions on the same fixed graph. It essentially overfits to specific graph topologies. In reality, graphs will change: nodes and edges can be added or removed, and this happens often in real-world scenarios. We want our GNNs to be able to learn generalizable patterns that transfer to unseen nodes or to entirely new graphs (this is called inductive learning).
Problem 2. They have scalability issues
Training GCNs and GATs on large-scale graphs is computationally expensive. GCNs require repeated neighborhood aggregation that grows with the size of the graph, while GATs involve (multi-head) attention mechanisms that scale poorly with an increasing number of nodes.
In big production recommendation systems with graphs of millions of users and products, GCNs and GATs are inefficient and slow.
Let's take a look at how GraphSAGE fixes these issues.
GraphSAGE (SAmple and aggreGatE)
GraphSAGE makes training much faster and more scalable. It does this by sampling only a subset of neighbors. For huge graphs it is computationally impossible to process all neighbors of a node (unless you have unlimited compute time, which we all don't…), as traditional GCNs do. Another important step of GraphSAGE is combining the features of the sampled neighbors with an aggregation function.
We will walk through all the GraphSAGE steps below.
1. Sampling Neighbors
With tabular data, sampling is easy. It's something you do in every standard machine learning project when creating train, test, and validation sets. With graphs, you cannot just select random nodes. This can result in disconnected graphs, nodes without neighbors, etcetera:
What you can do with graphs is select a random fixed-size subset of neighbors. For example, in a social network you could sample 3 friends for each user (instead of all friends):
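To make this concrete, here is a minimal sketch of fixed-size neighbor sampling on a plain adjacency list (the adjacency dict and sample_neighbors helper are illustrative names of mine, not part of GraphSAGE or PyG):

import random

# toy adjacency list: user -> friends (illustrative data, not from the post)
adjacency = {
    'anna': ['bob', 'carol', 'dave', 'eve', 'frank'],
    'bob': ['anna', 'carol'],
    'carol': ['anna', 'bob', 'dave', 'frank'],
}

def sample_neighbors(node, sample_size=3):
    """Return at most sample_size randomly chosen neighbors of node."""
    neighbors = adjacency[node]
    if len(neighbors) <= sample_size:
        return list(neighbors)
    return random.sample(neighbors, sample_size)

print(sample_neighbors('anna'))  # e.g. ['eve', 'bob', 'dave']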

2. Aggregate Information
After selecting the neighbors in the previous step, GraphSAGE combines their features into one single representation. There are multiple ways to do this (multiple aggregation functions). The most common types, and the ones described in the paper, are mean aggregation, LSTM aggregation, and pooling aggregation.
With mean aggregation, the average is taken over the features of all sampled neighbors (very simple and effective). In a formula:
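Following the notation of the GraphSAGE paper (with $h_u^{k-1}$ the representation of neighbor $u$ from the previous layer and $\mathcal{N}(v)$ the sampled neighborhood of node $v$), the mean aggregator can be written as:

$$h_{\mathcal{N}(v)}^{k} = \frac{1}{|\mathcal{N}(v)|} \sum_{u \in \mathcal{N}(v)} h_u^{k-1}$$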
LSTM aggregation uses an LSTM (a type of neural network) to process the neighbor features sequentially. It can capture more complex relationships and is more powerful than mean aggregation.
The third type, pooling aggregation, applies a (non-linear) pooling function (think of max-pooling in a neural network, where you keep the maximum value out of a set of numbers).
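As described in the GraphSAGE paper, the pooling aggregator first passes each sampled neighbor's representation through a small fully connected layer (with weights $W_{\text{pool}}$, bias $b$, and non-linearity $\sigma$) and then takes an element-wise maximum:

$$h_{\mathcal{N}(v)}^{k} = \max\left(\left\{\sigma\!\left(W_{\text{pool}}\, h_u^{k-1} + b\right) : u \in \mathcal{N}(v)\right\}\right)$$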
3. Update Node Representation
After sampling and aggregating, the node combines its own previous features with the aggregated neighbor features. Nodes learn from their neighbors but also keep their own identity, just like we saw before with GCNs and GATs. This way, information can flow through the graph effectively.
This is the formula for this step:
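Using the same notation as above, with $h_{\mathcal{N}(v)}^{k}$ the aggregated neighborhood representation from step 2, the update (as in Algorithm 1 of the GraphSAGE paper) is:

$$h_v^{k} = \sigma\!\left(W^{k} \cdot \mathrm{CONCAT}\!\left(h_v^{k-1},\, h_{\mathcal{N}(v)}^{k}\right)\right), \qquad h_v^{k} \leftarrow \frac{h_v^{k}}{\lVert h_v^{k} \rVert_2}$$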
The aggregation from step 2 is computed over all sampled neighbors and then concatenated with the node's current representation. This vector is multiplied by a weight matrix and passed through a non-linearity (for example ReLU). As a final step, normalization can be applied.
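In plain PyTorch, the update for a single node looks roughly like this (a minimal sketch with made-up dimensions, just to illustrate the concat–transform–normalize pattern; the real layer is SAGEConv, which we use later in this post):

import torch
import torch.nn.functional as F

h_v = torch.randn(16)                  # node's current representation
h_neigh = torch.randn(5, 16).mean(0)   # mean-aggregated features of 5 sampled neighbors (step 2)
W = torch.randn(32, 16)                # learnable weight matrix (concat dim 32 -> output dim 16)

h_new = F.relu(torch.cat([h_v, h_neigh]) @ W)  # concatenate, transform, apply non-linearity
h_new = F.normalize(h_new, dim=0)              # optional L2 normalization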
4. Repeat for Multiple Layers
The previous steps can be repeated multiple times; when this happens, information can flow in from more distant neighbors. In the picture below you can see a node with three neighbors sampled in the first layer (direct neighbors) and two neighbors sampled per node in the second layer (neighbors of neighbors).

To summarize, the key strengths of GraphSAGE are its scalability (sampling makes it efficient for massive graphs); flexibility, since you can use it for inductive learning (it works well when used to predict on unseen nodes and graphs); aggregation, which helps with generalization because it smooths out noisy features; and multiple layers, which allow the model to learn from far-away nodes.
Cool! And the best thing: GraphSAGE is implemented in PyG, so we can use it easily with PyTorch.
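As a small preview of the code: the per-layer fan-out from the picture above (three direct neighbors, then two neighbors of neighbors) maps directly to the num_neighbors argument of PyG's NeighborLoader, which we will use below. A minimal sketch, assuming a data object has already been loaded:

from torch_geometric.loader import NeighborLoader

# 3 neighbors per node in the first layer, 2 per node in the second layer
loader = NeighborLoader(data, num_neighbors=[3, 2], batch_size=32)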
Predicting with GraphSAGE
In the previous posts, we implemented an MLP, GCN, and GAT on the Cora dataset (CC BY-SA). To refresh your memory a bit: Cora is a dataset of scientific publications where the task is to predict the subject of each paper, with seven classes in total. This dataset is quite small, so it might not be the best showcase for GraphSAGE, but we will use it anyway so we can compare. Let's see how well GraphSAGE performs.
Some interesting parts of the code related to GraphSAGE that I'd like to highlight:
- The NeighborLoader, which takes care of sampling the neighbors for each layer:
from torch_geometric.loader import NeighborLoader
# 10 neighbors sampled in the first layer, 10 in the second layer
num_neighbors = [10, 10]
# sample data from the train set
train_loader = NeighborLoader(
    data,
    num_neighbors=num_neighbors,
    batch_size=batch_size,
    input_nodes=data.train_mask,
)
- The aggregation type is set in the SAGEConv layer. The default is mean; you can change it to max or lstm:
from torch_geometric.nn import SAGEConv
SAGEConv(in_c, out_c, aggr='mean')
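For completeness, the other options look like this (a small sketch; note that, as far as I know, the LSTM aggregator expects the graph's edges to be sorted by destination node, so check the SAGEConv documentation before using it):

SAGEConv(in_c, out_c, aggr='max')   # element-wise max over the sampled neighbors
SAGEConv(in_c, out_c, aggr='lstm')  # LSTM aggregation over the (sorted) neighbors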
- Another important difference is that GraphSAGE is trained in mini-batches, while GCN and GAT are trained on the full dataset. This touches the essence of GraphSAGE: the neighbor sampling makes it possible to train on small batches, so we don't need the full graph anymore. GCNs and GATs do need the full graph to compute the correct feature propagation and attention scores, which is why we train GCN and GAT on the full graph.
- The rest of the code is similar to before, except that we now have one class in which the different models are instantiated based on the model_type (GCN, GAT, or SAGE). This makes it easy to compare them or make small adjustments.
This is the complete script. We train for 100 epochs and repeat the experiment 10 times to calculate the average accuracy and standard deviation for each model:
import torch
import torch.nn.functional as F
from torch_geometric.nn import SAGEConv, GCNConv, GATConv
from torch_geometric.datasets import Planetoid
from torch_geometric.loader import NeighborLoader
# dataset_name can be 'Cora', 'CiteSeer', 'PubMed'
dataset_name = 'Cora'
hidden_dim = 64
num_layers = 2
num_neighbors = [10, 10]
batch_size = 128
num_epochs = 100
model_types = ['GCN', 'GAT', 'SAGE']
dataset = Planetoid(root='data', name=dataset_name)
data = dataset[0]
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
data = data.to(device)
class GNN(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels, num_layers, model_type='SAGE', gat_heads=8):
        super().__init__()
        self.convs = torch.nn.ModuleList()
        self.model_type = model_type
        self.gat_heads = gat_heads

        def get_conv(in_c, out_c, is_final=False):
            if model_type == 'GCN':
                return GCNConv(in_c, out_c)
            elif model_type == 'GAT':
                # final GAT layer: single head, no concatenation
                heads = 1 if is_final else gat_heads
                concat = False if is_final else True
                return GATConv(in_c, out_c, heads=heads, concat=concat)
            else:
                return SAGEConv(in_c, out_c, aggr='mean')

        if model_type == 'GAT':
            # hidden GAT layers concatenate the heads, so the next layer's
            # input dimension is hidden_channels * gat_heads
            self.convs.append(get_conv(in_channels, hidden_channels))
            in_dim = hidden_channels * gat_heads
            for _ in range(num_layers - 2):
                self.convs.append(get_conv(in_dim, hidden_channels))
                in_dim = hidden_channels * gat_heads
            self.convs.append(get_conv(in_dim, out_channels, is_final=True))
        else:
            self.convs.append(get_conv(in_channels, hidden_channels))
            for _ in range(num_layers - 2):
                self.convs.append(get_conv(hidden_channels, hidden_channels))
            self.convs.append(get_conv(hidden_channels, out_channels))

    def forward(self, x, edge_index):
        for conv in self.convs[:-1]:
            x = F.relu(conv(x, edge_index))
        x = self.convs[-1](x, edge_index)
        return x

@torch.no_grad()
def test(model):
    # evaluate on the full graph for all models
    model.eval()
    out = model(data.x, data.edge_index)
    pred = out.argmax(dim=1)
    accs = []
    for mask in [data.train_mask, data.val_mask, data.test_mask]:
        accs.append(int((pred[mask] == data.y[mask]).sum()) / int(mask.sum()))
    return accs

results = {}
for model_type in model_types:
    print(f'Training {model_type}')
    results[model_type] = []
    for i in range(10):
        model = GNN(dataset.num_features, hidden_dim, dataset.num_classes, num_layers, model_type, gat_heads=8).to(device)
        optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)

        if model_type == 'SAGE':
            # GraphSAGE: mini-batch training with neighbor sampling
            train_loader = NeighborLoader(
                data,
                num_neighbors=num_neighbors,
                batch_size=batch_size,
                input_nodes=data.train_mask,
            )

            def train():
                model.train()
                total_loss = 0
                for batch in train_loader:
                    batch = batch.to(device)
                    optimizer.zero_grad()
                    out = model(batch.x, batch.edge_index)
                    loss = F.cross_entropy(out, batch.y[:out.size(0)])
                    loss.backward()
                    optimizer.step()
                    total_loss += loss.item()
                return total_loss / len(train_loader)
        else:
            # GCN / GAT: full-batch training on the complete graph
            def train():
                model.train()
                optimizer.zero_grad()
                out = model(data.x, data.edge_index)
                loss = F.cross_entropy(out[data.train_mask], data.y[data.train_mask])
                loss.backward()
                optimizer.step()
                return loss.item()

        best_val_acc = 0
        best_test_acc = 0
        for epoch in range(1, num_epochs + 1):
            loss = train()
            train_acc, val_acc, test_acc = test(model)
            # keep the test accuracy of the epoch with the best validation accuracy
            if val_acc > best_val_acc:
                best_val_acc = val_acc
                best_test_acc = test_acc
            if epoch % 10 == 0:
                print(f'Epoch {epoch:02d} | Loss: {loss:.4f} | Train: {train_acc:.4f} | Val: {val_acc:.4f} | Test: {test_acc:.4f}')
        results[model_type].append([best_val_acc, best_test_acc])

for model_name, model_results in results.items():
    model_results = torch.tensor(model_results)
    print(f'{model_name} Val Accuracy: {model_results[:, 0].mean():.3f} ± {model_results[:, 0].std():.3f}')
    print(f'{model_name} Test Accuracy: {model_results[:, 1].mean():.3f} ± {model_results[:, 1].std():.3f}')
Here are the results:
GCN Val Accuracy: 0.791 ± 0.007
GCN Test Accuracy: 0.806 ± 0.006
GAT Val Accuracy: 0.790 ± 0.007
GAT Test Accuracy: 0.800 ± 0.004
SAGE Val Accuracy: 0.899 ± 0.005
SAGE Test Accuracy: 0.907 ± 0.004
Impressive improvement! Even on this small dataset, GraphSAGE outperforms GAT and GCN easily! I repeated the test for the CiteSeer and PubMed datasets, and GraphSAGE always came out best.
What I like to note here is that GCN is still very useful; it's one of the most effective baselines (if the graph structure allows it). Also, I didn't do any hyperparameter tuning, I just went with some standard values (such as 8 heads for the GAT multi-head attention). In larger, more complex, and noisier graphs, the advantages of GraphSAGE become clearer than in this example. We didn't do any timing comparison here, because for a graph this small GraphSAGE isn't faster than GCN.
Conclusion
GraphSAGE brings us nice improvements and benefits compared to GATs and GCNs. Thanks to inductive learning, GraphSAGE can handle changing graph structures well. And although we didn't test it in this post, neighbor sampling makes it possible to create feature representations for larger graphs with good performance.
Related
Optimizing Connections: Mathematical Optimization within Graphs
Graph Neural Networks Part 1. Graph Convolutional Networks Explained
Graph Neural Networks Part 2. Graph Attention Networks vs. GCNs