Machine Learning

Graph Neural Networks Part 3: How GraphSAGE Handles Changing Graph Structure

In the previous parts of this series, we looked at Graph Convolutional Networks (GCNs) and Graph Attention Networks (GATs). Both architectures work fine, but they also have some limitations! A big one is that for large graphs, computing the node representations with GCNs and GATs becomes slow. Another limitation is that if the graph structure changes, GCNs and GATs will not be able to generalize. So if nodes are added to the graph, a GCN or GAT cannot make predictions for them. Luckily, these issues can be solved!

In this post, I will explain GraphSAGE and how it solves the common problems of GCNs and GATs. We will train GraphSAGE and use it for graph predictions to compare its performance with GCNs and GATs.

New to GNNs? You can start with post 1 about GCNs (which also contains the initial setup for running the code samples), and post 2 about GATs.


Two Key Problems with GCNs and GATs

I touched on them briefly in the introduction, but let's dive a bit deeper. What are the problems with the previous GNN models?

Problem 1. They don't generalize

GCNs and GATs struggle with generalizing to unseen graphs. The graph structure needs to be the same as the training data. This is known as transductive learning, where the model trains and makes predictions on the same fixed graph. It actually overfits to specific graph topologies. In reality, graphs will change: nodes and edges can be added or removed, and this happens often in real-world scenarios. We want our GNNs to be able to learn patterns that generalize to unseen nodes, or to entirely new graphs (this is called inductive learning).

Problem 2. They have scalability issues

Training GCNs and GATs on large-scale graphs is computationally expensive. GCNs require repeated neighbor aggregation, which grows with the size of the graph, while GATs involve (multi-head) attention mechanisms that scale poorly with an increasing number of nodes.
In big production recommendation systems that have graphs with millions of users and products, GCNs and GATs are impractical and slow.

Let's take a look at how GraphSAGE fixes these issues.

GraphSAGE (SAmple and aggreGatE)

GraphSAGE makes training much faster and more scalable. It does this by sampling only a subset of neighbors. For huge graphs it's computationally impossible to process all neighbors of a node (unless you have unlimited time, which we all don't…), as traditional GCNs do. Another important step of GraphSAGE is combining the features of the sampled neighbors with an aggregation function.
We will walk through all the steps of GraphSAGE below.

1. Sampling Neighbors

With tabular data, sampling is easy. It's something you do in every standard machine learning project when creating train, test, and validation sets. With graphs, you cannot just select random nodes. This can result in disconnected graphs, nodes without neighbors, et cetera:

Randomly selected nodes; some of them end up disconnected. Image by author.

What you can do with graphs is sample a random fixed-size subset of neighbors. For example, in a social network you can sample 3 friends for each user (instead of all friends):

Selecting three random rows from tabular data; all neighbors used in a GCN; three sampled neighbors in GraphSAGE. Image by author.
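To make this concrete, here is a tiny sketch of fixed-size neighbor sampling (the toy adjacency list and the helper name sample_neighbors are invented for illustration, not part of the original code):

import random

# toy adjacency list: each user maps to a list of friends
graph = {
    'alice': ['bob', 'carol', 'dave', 'eve', 'frank'],
    'bob': ['alice', 'carol'],
}

def sample_neighbors(node, k=3):
    """Sample at most k neighbors instead of using the full neighborhood."""
    neighbors = graph[node]
    return neighbors if len(neighbors) <= k else random.sample(neighbors, k)

print(sample_neighbors('alice'))  # e.g. ['eve', 'bob', 'dave']
print(sample_neighbors('bob'))    # fewer than k neighbors, so all of them are kept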

2. Aggregate Information

After selecting the neighbors in the previous step, GraphSAGE combines their features into one representation. There are multiple ways to do this (multiple aggregation functions). The most common types, and the ones described in the paper, are mean aggregation, LSTM aggregation, and pooling aggregation.

With mean aggregation, the average is taken over all the features of the sampled neighbors (very simple and effective). In a formula:
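Following the notation of the GraphSAGE paper, the mean aggregator can be written as:

$$h_{N(v)}^{(k)} = \frac{1}{|N(v)|} \sum_{u \in N(v)} h_u^{(k-1)}$$

where $N(v)$ is the sampled set of neighbors of node $v$, and $h_u^{(k-1)}$ is the embedding of neighbor $u$ from the previous layer.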

LSTM aggregation uses an LSTM (a type of neural network) to process the neighbor features sequentially. It can capture more complex relationships, and it is more powerful than mean aggregation.
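As a minimal sketch (hypothetical dimensions, not code from this post), an LSTM aggregator can be implemented by running torch.nn.LSTM over a random permutation of the sampled neighbor embeddings and keeping the final hidden state:

import torch

# 3 sampled neighbors, each with a 64-dimensional embedding
neighbors = torch.randn(1, 3, 64)

# the LSTM is order-sensitive, so the paper feeds the neighbors in a random order
perm = torch.randperm(neighbors.size(1))

lstm = torch.nn.LSTM(input_size=64, hidden_size=64, batch_first=True)
_, (h_n, _) = lstm(neighbors[:, perm, :])
aggregated = h_n.squeeze(0)  # final hidden state used as the aggregated representation
print(aggregated.shape)      # torch.Size([1, 64])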

The third type, pooling aggregation, applies a non-linear transformation followed by max-pooling (think of max-pooling in a neural network, where you take the highest value of a set of numbers).
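In the paper, the (max-)pooling aggregator is defined roughly as:

$$h_{N(v)}^{(k)} = \max\left(\left\{\sigma\left(W_{\text{pool}} \, h_u^{(k-1)} + b\right) \;:\; u \in N(v)\right\}\right)$$

where each sampled neighbor embedding is first passed through a small fully connected layer with non-linearity $\sigma$, and the max is taken element-wise.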

3. Update Node Representation

After sampling and aggregating, the node combines its own previous features with the aggregated neighbor features. Nodes will learn from their neighbors but also keep their own identity, just as we saw before with GCNs and GATs. This way information can flow across the graph effectively.

This is the formula for this step:
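In the notation of the GraphSAGE paper, the update (with the optional normalization) is:

$$h_v^{(k)} = \sigma\left(W^{(k)} \cdot \text{CONCAT}\left(h_v^{(k-1)}, \; h_{N(v)}^{(k)}\right)\right), \qquad h_v^{(k)} \leftarrow \frac{h_v^{(k)}}{\lVert h_v^{(k)} \rVert_2}$$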

The aggregation from step 2 is performed over all the sampled neighbors, and then it is concatenated (CONCAT) with the node's own embedding. This vector is multiplied with a weight matrix, and passed through a non-linearity (for example ReLU). As a final step, normalization can be applied.

4. Repeat for Multiple Layers

The first three steps can be repeated multiple times; when this happens, information can flow from more distant neighbors. In the picture below you can see a node with three neighbors sampled in the first layer (direct neighbors), and two neighbors sampled in the second layer (neighbors of neighbors).

The selected node with its sampled neighbors: three in the first layer, two in the second layer. Interesting to note is that one of the neighbors of the original node is the selected node itself, so it can end up among the two neighbors sampled in the second layer. Image by author.
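In PyG this per-layer fan-out is specified as a list with one entry per layer. A minimal sketch matching the figure (using the same Cora data object that is loaded in the full script further below):

from torch_geometric.datasets import Planetoid
from torch_geometric.loader import NeighborLoader

data = Planetoid(root='data', name='Cora')[0]

# 3 direct neighbors in the first layer, 2 neighbors-of-neighbors in the second,
# so each seed node pulls in at most 3 + 3 * 2 = 9 sampled nodes instead of its
# full two-hop neighborhood
loader = NeighborLoader(data, num_neighbors=[3, 2], batch_size=1)
batch = next(iter(loader))
print(batch.num_nodes)  # a small subgraph around one seed node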

To summarize, the main strengths of GraphSAGE are its scalability (sampling makes it efficient for massive graphs); its flexibility, since you can use it for inductive learning (it works well when predicting on unseen nodes and graphs); aggregation, which helps with generalization because it smooths out noisy features; and multiple layers, which allow the model to learn from far-away nodes.
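To tie the steps together, here is a minimal toy sketch of a single GraphSAGE layer with mean aggregation (the function name, dimensions, and adjacency format are invented for illustration; repeating the layer gives the multi-layer behavior of step 4, and later in this post we use the ready-made PyG implementation instead):

import torch
import torch.nn.functional as F

def graphsage_layer(h, neighbor_ids, weight, num_samples=3):
    """One GraphSAGE layer with mean aggregation.

    h:            [num_nodes, in_dim] node embeddings from the previous layer
    neighbor_ids: list of neighbor index lists, one per node
    weight:       [2 * in_dim, out_dim] weights applied to the concatenated vector
    """
    new_h = []
    for v, neighbors in enumerate(neighbor_ids):
        # step 1: sample a fixed-size subset of neighbors
        idx = torch.tensor(neighbors)
        if len(neighbors) > num_samples:
            idx = idx[torch.randperm(len(neighbors))[:num_samples]]
        # step 2: mean aggregation of the sampled neighbor features
        aggregated = h[idx].mean(dim=0)
        # step 3: concatenate with the node's own embedding, transform, apply ReLU
        out = F.relu(torch.cat([h[v], aggregated]) @ weight)
        new_h.append(out / out.norm().clamp(min=1e-12))  # and normalize
    return torch.stack(new_h)

# toy usage: 4 nodes with 8-dimensional features, projected to 16 dimensions
h = torch.randn(4, 8)
neighbor_ids = [[1, 2, 3], [0], [0, 3], [0, 2]]
weight = torch.randn(2 * 8, 16)
print(graphsage_layer(h, neighbor_ids, weight).shape)  # torch.Size([4, 16])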

Pretty cool! And the best thing: GraphSAGE is implemented in PyG, so we can use it easily with PyTorch.

Predicting with GraphSAGE

In the previous posts, we implemented an MLP, GCN, and GAT on the Cora dataset (CC BY-SA). To refresh your memory: Cora is a dataset of scientific papers where you have to predict the subject of each paper, out of seven classes. This dataset is quite small, so it might not be the best dataset to showcase GraphSAGE. We will do it anyway, so we can compare. Let's see how well GraphSAGE performs.

Some interesting parts of the code I'd like to highlight related to GraphSAGE:

  • The NeighborLoader, which takes care of selecting the neighbors for each layer:
from torch_geometric.loader import NeighborLoader

# 10 neighbors sampled in the first layer, 10 in the second layer
num_neighbors = [10, 10]

# sample data from the train set
train_loader = NeighborLoader(
    data,
    num_neighbors=num_neighbors,
    batch_size=batch_size,
    input_nodes=data.train_mask,
)
  • The aggregation type is specified in the SAGEConv layer. The default is mean, but you can change this to max or lstm:
from torch_geometric.nn import SAGEConv

SAGEConv(in_c, out_c, aggr='mean')
  • Another important difference is that GraphSAGE is trained in mini-batches, while GCN and GAT are trained on the full dataset. This touches the essence of GraphSAGE: neighbor sampling makes it possible to train in small batches, so we don't need the full graph. GCN and GAT need the full graph for correct feature propagation and attention scores, which is why we train GCN and GAT on the full graph.
  • All other code is similar to before, except that we now have one class in which the different models are created based on the model_type ('GCN', 'GAT', or 'SAGE'). This makes it easy to compare them or make small changes.

This is the complete script; we train for 100 epochs and repeat the experiment 10 times to calculate the average accuracy and standard deviation for each model:

import torch
import torch.nn.functional as F
from torch_geometric.nn import SAGEConv, GCNConv, GATConv
from torch_geometric.datasets import Planetoid
from torch_geometric.loader import NeighborLoader

# dataset_name can be 'Cora', 'CiteSeer', 'PubMed'
dataset_name = 'Cora'
hidden_dim = 64
num_layers = 2
num_neighbors = [10, 10]
batch_size = 128
num_epochs = 100
model_types = ['GCN', 'GAT', 'SAGE']

dataset = Planetoid(root='data', name=dataset_name)
data = dataset[0]
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
data = data.to(device)

class GNN(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels, num_layers, model_type='SAGE', gat_heads=8):
        super().__init__()
        self.convs = torch.nn.ModuleList()
        self.model_type = model_type
        self.gat_heads = gat_heads

        def get_conv(in_c, out_c, is_final=False):
            if model_type == 'GCN':
                return GCNConv(in_c, out_c)
            elif model_type == 'GAT':
                heads = 1 if is_final else gat_heads
                concat = False if is_final else True
                return GATConv(in_c, out_c, heads=heads, concat=concat)
            else:
                return SAGEConv(in_c, out_c, aggr='mean')

        if model_type == 'GAT':
            self.convs.append(get_conv(in_channels, hidden_channels))
            in_dim = hidden_channels * gat_heads
            for _ in range(num_layers - 2):
                self.convs.append(get_conv(in_dim, hidden_channels))
                in_dim = hidden_channels * gat_heads
            self.convs.append(get_conv(in_dim, out_channels, is_final=True))
        else:
            self.convs.append(get_conv(in_channels, hidden_channels))
            for _ in range(num_layers - 2):
                self.convs.append(get_conv(hidden_channels, hidden_channels))
            self.convs.append(get_conv(hidden_channels, out_channels))

    def forward(self, x, edge_index):
        for conv in self.convs[:-1]:
            x = F.relu(conv(x, edge_index))
        x = self.convs[-1](x, edge_index)
        return x

@torch.no_grad()
def test(model):
    model.eval()
    out = model(data.x, data.edge_index)
    pred = out.argmax(dim=1)
    accs = []
    for mask in [data.train_mask, data.val_mask, data.test_mask]:
        accs.append(int((pred[mask] == data.y[mask]).sum()) / int(mask.sum()))
    return accs

results = {}

for model_type in model_types:
    print(f'Training {model_type}')
    results[model_type] = []

    for i in range(10):
        model = GNN(dataset.num_features, hidden_dim, dataset.num_classes, num_layers, model_type, gat_heads=8).to(device)
        optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)

        if model_type == 'SAGE':
            train_loader = NeighborLoader(
                data,
                num_neighbors=num_neighbors,
                batch_size=batch_size,
                input_nodes=data.train_mask,
            )

            def train():
                model.train()
                total_loss = 0
                for batch in train_loader:
                    batch = batch.to(device)
                    optimizer.zero_grad()
                    out = model(batch.x, batch.edge_index)
                    # compute the loss only on the seed nodes of the mini-batch,
                    # not on the neighbors that were sampled around them
                    loss = F.cross_entropy(out[:batch.batch_size], batch.y[:batch.batch_size])
                    loss.backward()
                    optimizer.step()
                    total_loss += loss.item()
                return total_loss / len(train_loader)

        else:
            def train():
                model.train()
                optimizer.zero_grad()
                out = model(data.x, data.edge_index)
                loss = F.cross_entropy(out[data.train_mask], data.y[data.train_mask])
                loss.backward()
                optimizer.step()
                return loss.item()

        best_val_acc = 0
        best_test_acc = 0
        for epoch in range(1, num_epochs + 1):
            loss = train()
            train_acc, val_acc, test_acc = test(model)
            if val_acc > best_val_acc:
                best_val_acc = val_acc
                best_test_acc = test_acc
            if epoch % 10 == 0:
                print(f'Epoch {epoch:02d} | Loss: {loss:.4f} | Train: {train_acc:.4f} | Val: {val_acc:.4f} | Test: {test_acc:.4f}')

        results[model_type].append([best_val_acc, best_test_acc])

for model_name, model_results in results.items():
    model_results = torch.tensor(model_results)
    print(f'{model_name} Val Accuracy: {model_results[:, 0].mean():.3f} ± {model_results[:, 0].std():.3f}')
    print(f'{model_name} Test Accuracy: {model_results[:, 1].mean():.3f} ± {model_results[:, 1].std():.3f}')

Here are the results:

GCN Val Accuracy: 0.791 ± 0.007
GCN Test Accuracy: 0.806 ± 0.006
GAT Val Accuracy: 0.790 ± 0.007
GAT Test Accuracy: 0.800 ± 0.004
SAGE Val Accuracy: 0.899 ± 0.005
SAGE Test Accuracy: 0.907 ± 0.004

Impressive improvement! Even on this small dataset, GraphSAGE outperforms GAT and GCN easily! I repeated this test for the CiteSeer and PubMed datasets, and GraphSAGE always came out best.

What I like to note here is that GCN is still very useful; it's one of the most effective baselines (if the graph structure allows it). Also, I didn't do any hyperparameter tuning, but just went with some common values (such as 8 heads for the GAT multi-head attention). With larger, more complex, and noisier graphs, the advantages of GraphSAGE become clearer than in this example. We didn't do any timing comparison, because for graphs this small GraphSAGE isn't faster than GCN.


Conclusion

GraphSAGE brings us nice improvements and benefits compared to GATs and GCNs. Thanks to inductive learning, GraphSAGE can handle changing graph structures well. And although we didn't test it in this post, neighbor sampling makes it possible to create embeddings for large graphs with good performance.

Related

Optimizing Connections: Mathematical Optimization within Graphs

Graph Neural Networks Part 1. Graph Convolutional Networks Explained

Graph Neural Networks Part 2. Graph Attention Networks vs. GCNs
