Patching GitHub repositories in bulk with APIs and Go’s concurrency features

I recently came across GitHub’s changelog about the set-output command being deprecated. When I found over 220,000 workflow files still using this deprecated syntax, I knew I needed an automated solution.
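For context, the change itself is a one-line rewrite per occurrence. The sketch below assumes the common double-quoted form `echo "::set-output name=KEY::VALUE"`. The function name `rewriteSetOutput` and the regular expression are illustrative, not taken from set-output-janitor, and other quoting styles are deliberately left out.

```go
package main

import (
    "fmt"
    "regexp"
)

// setOutputRe matches the double-quoted single-line form only:
//   echo "::set-output name=KEY::VALUE"
// Unquoted and single-quoted variants are not handled in this sketch.
var setOutputRe = regexp.MustCompile(`echo "::set-output name=([A-Za-z0-9_-]+)::(.*)"`)

// rewriteSetOutput replaces the deprecated workflow command with an append
// to the file named by the GITHUB_OUTPUT environment variable.
// In the replacement template, "$$" produces a literal "$".
func rewriteSetOutput(line string) string {
    return setOutputRe.ReplaceAllString(line, `echo "${1}=${2}" >> "$$GITHUB_OUTPUT"`)
}

func main() {
    fmt.Println(rewriteSetOutput(`run: echo "::set-output name=version::$(cat VERSION)"`))
}
```

Lines that do not match the pattern are returned unchanged, so the function can be applied to every line of a workflow file.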

Architecture

The solution combines two GitHub APIs with a patch library:

  • GraphQL API for precise file operations and commits
  • REST API for repository and PR management
  • patch2pr, a Go library, for applying patches through those APIs

GraphQL Query Structure

First, I needed to fetch specific workflow files. Here’s the GraphQL query structure:

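import (
    "context"

    "github.com/shurcooL/githubv4"
)

// FileContentQuery selects the text of a single blob, addressed by a Git
// revision expression such as "HEAD:path/to/file".
type FileContentQuery struct {
    Repository struct {
        Object struct {
            Blob struct {
                Text githubv4.String
            } `graphql:"... on Blob"`
        } `graphql:"object(expression: $expression)"`
    } `graphql:"repository(name: $name, owner: $owner)"`
}

// fetchFileContent returns the text of the blob that expression resolves to.
func fetchFileContent(client *githubv4.Client, owner, name, expression string) (string, error) {
    var query FileContentQuery
    variables := map[string]interface{}{
        "owner":      githubv4.String(owner),
        "name":       githubv4.String(name),
        "expression": githubv4.String(expression),
    }
    // Return early so callers never see partial data alongside an error.
    if err := client.Query(context.Background(), &query, variables); err != nil {
        return "", err
    }
    return string(query.Repository.Object.Blob.Text), nil
}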
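The `expression` variable is a Git revision expression of the form `<ref>:<path>`. A small helper keeps that format in one place. This is a sketch: `blobExpression` and the example workflow path are hypothetical names, not part of the original tool.

```go
package main

import (
    "fmt"
    "strings"
)

// blobExpression builds a "<ref>:<path>" expression for the
// object(expression:) field. A leading slash is dropped because paths in
// these expressions are relative to the repository root.
func blobExpression(ref, path string) string {
    return ref + ":" + strings.TrimPrefix(path, "/")
}

func main() {
    fmt.Println(blobExpression("HEAD", ".github/workflows/ci.yml"))
}
```

The result can be passed as the last argument of `fetchFileContent`.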

Creating Commits via GraphQL Mutation

The interesting part is using GraphQL mutations to create commits. Here’s how I structured it:

// oid is the SHA of the base commit that the new commit builds on.
// The Get* accessors from go-github are nil-safe, unlike pointer dereferences.
graphqlApplier := patch2pr.NewGraphQLApplier(
    client,
    patch2pr.Repository{
        Owner: fork.GetOwner().GetLogin(),
        Name:  fork.GetName(),
    },
    oid,
)
// Create the commit on the fork's default branch using the mutation.
sha, err := graphqlApplier.Commit(
    context.Background(),
    "refs/heads/"+fork.GetDefaultBranch(),
    &gitdiff.PatchHeader{
        Author: &gitdiff.PatchIdentity{
            Name:  "Arun",
            Email: "[email protected]",
        },
        Title: "ci: Use GITHUB_OUTPUT envvar instead of set-output command",
        Body:  "Updating deprecated GitHub Actions commands",
    },
)
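To make the mutation less of a black box, here is a rough sketch of the input that GitHub's `createCommitOnBranch` mutation expects, built with the standard library only. It assumes that this is the mutation in play. The type and function names (`commitInput`, `fileAddition`, `newCommitInput`), the repository, and the all-zero OID are placeholders for illustration; the JSON field names follow GitHub's `CreateCommitOnBranchInput`.

```go
package main

import (
    "encoding/base64"
    "encoding/json"
    "fmt"
)

// fileAddition mirrors GitHub's FileAddition input type.
type fileAddition struct {
    Path     string `json:"path"`
    Contents string `json:"contents"` // must be base64-encoded
}

// commitInput mirrors the parts of CreateCommitOnBranchInput used here.
type commitInput struct {
    Branch struct {
        RepositoryNameWithOwner string `json:"repositoryNameWithOwner"`
        BranchName              string `json:"branchName"`
    } `json:"branch"`
    Message struct {
        Headline string `json:"headline"`
    } `json:"message"`
    FileChanges struct {
        Additions []fileAddition `json:"additions"`
    } `json:"fileChanges"`
    // ExpectedHeadOid makes the mutation fail if the branch moved meanwhile.
    ExpectedHeadOid string `json:"expectedHeadOid"`
}

// newCommitInput fills the input for a commit that adds or replaces one file.
func newCommitInput(repo, branch, headOid, headline, path string, content []byte) commitInput {
    var in commitInput
    in.Branch.RepositoryNameWithOwner = repo
    in.Branch.BranchName = branch
    in.Message.Headline = headline
    in.FileChanges.Additions = []fileAddition{{
        Path:     path,
        Contents: base64.StdEncoding.EncodeToString(content),
    }}
    in.ExpectedHeadOid = headOid
    return in
}

func main() {
    in := newCommitInput("octo/demo", "main", "0000000000000000000000000000000000000000",
        "ci: Use GITHUB_OUTPUT envvar instead of set-output command",
        ".github/workflows/ci.yml", []byte("name: ci\n"))
    out, err := json.MarshalIndent(in, "", "  ")
    if err != nil {
        panic(err)
    }
    fmt.Println(string(out))
}
```

Because the whole file content travels in the request, no local clone is needed, which is what makes this approach attractive for bulk edits.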

REST API for Pull Request Creation

After creating commits, I use GitHub’s REST API to create pull requests:

prRequest := &github.NewPullRequest{
    Title:               &prTitle,
    Body:                &prBody,
    MaintainerCanModify: &maintainerCanModify,
    Draft:               &draft,
    Base:                &base,
    Head:                &head,
}
pr, _, err = clientv3.PullRequests.Create(
    context.Background(),
    repoOwner,
    repoName,
    prRequest,
)
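One detail that is easy to miss: when the branch lives in a fork, the REST API expects `head` in the form `<fork owner>:<branch>`, while `base` is just a branch name in the upstream repository. A tiny sketch, with `crossRepoHead` and the example values being hypothetical:

```go
package main

import "fmt"

// crossRepoHead returns the head value for a pull request opened from a
// fork, in the "<fork owner>:<branch>" form.
func crossRepoHead(forkOwner, branch string) string {
    return forkOwner + ":" + branch
}

func main() {
    head := crossRepoHead("my-bot", "main")
    fmt.Println(head)
}
```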

Concurrency Management

To handle multiple repositories efficiently, I implemented concurrent processing with proper error handling:

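var wg sync.WaitGroup
errChan := make(chan error, len(scannedLines)) // buffered so senders never block
for _, scannedLine := range scannedLines {
    wg.Add(1)
    go func(line string) {
        defer wg.Done()
        // Each line is expected to look like "owner/name".
        parts := strings.Split(line, "/")
        if len(parts) != 2 {
            errChan <- fmt.Errorf("invalid repository %q", line)
            return
        }
        repoOwner, repoName := parts[0], parts[1]
        fork, _, err := client.Repositories.CreateFork(context.Background(),
            repoOwner, repoName, nil)
        // Forks are created asynchronously, so go-github reports the 202
        // response as *github.AcceptedError; that is not a failure.
        var accepted *github.AcceptedError
        if err != nil && !errors.As(err, &accepted) {
            errChan <- err
            return
        }
        // Process repository updates
    }(scannedLine)
}
// Wait for every goroutine, then drain the collected errors.
wg.Wait()
close(errChan)
for err := range errChan {
    log.Println(err)
}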
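The loop above starts one goroutine per repository, which can mean thousands of simultaneous API calls. One possible refinement is to cap the number in flight with a buffered channel used as a semaphore. This is a standard-library sketch, not the code from set-output-janitor; `processAll`, `validateRepo`, and `errBadFormat` are illustrative names, and `validateRepo` stands in for the real fork-and-patch work.

```go
package main

import (
    "errors"
    "fmt"
    "strings"
    "sync"
)

var errBadFormat = errors.New("expected owner/name")

// validateRepo is a stand-in for the real per-repository work.
func validateRepo(repo string) error {
    if len(strings.Split(repo, "/")) != 2 {
        return errBadFormat
    }
    return nil
}

// processAll runs fn for every repo with at most limit goroutines in flight
// and returns the errors that occurred, in no particular order.
func processAll(repos []string, limit int, fn func(repo string) error) []error {
    if limit < 1 {
        limit = 1
    }
    var (
        wg   sync.WaitGroup
        mu   sync.Mutex
        errs []error
    )
    sem := make(chan struct{}, limit) // counting semaphore
    for _, repo := range repos {
        wg.Add(1)
        sem <- struct{}{} // blocks while limit goroutines are running
        go func(repo string) {
            defer wg.Done()
            defer func() { <-sem }()
            if err := fn(repo); err != nil {
                mu.Lock()
                errs = append(errs, fmt.Errorf("%s: %w", repo, err))
                mu.Unlock()
            }
        }(repo)
    }
    wg.Wait()
    return errs
}

func main() {
    errs := processAll([]string{"octo/a", "octo/b", "bad"}, 2, validateRepo)
    for _, err := range errs {
        fmt.Println(err)
    }
}
```

A small limit keeps the number of concurrent requests predictable, which also makes it easier to stay under GitHub's limits.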

Current Limitations

I hit a few technical roadblocks:

  1. Fine-grained tokens lack access to public-but-unowned data
  2. GitHub’s rate limiting affects mass operations
  3. Proper authentication is needed without a full GitHub App installation

I’m working on implementing a bot token solution, similar to how Dependabot handles authentication, but the current token limitations are blocking progress.
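On the rate-limiting point, GitHub's REST responses carry headers that say how long to back off: `Retry-After` (seconds) for secondary limits, and `X-RateLimit-Remaining` with `X-RateLimit-Reset` (a Unix timestamp) for the primary limit. A minimal sketch of turning those into a wait time follows. The function names and the one-minute fallback are choices made for this sketch, not something the original tool is known to do.

```go
package main

import (
    "fmt"
    "net/http"
    "strconv"
    "time"
)

// waitSeconds returns how many seconds to wait given the raw header values.
func waitSeconds(retryAfter, remaining, reset string, nowUnix int64) int64 {
    // Secondary rate limit: Retry-After is a number of seconds.
    if secs, err := strconv.ParseInt(retryAfter, 10, 64); err == nil && secs >= 0 {
        return secs
    }
    // Primary rate limit exhausted: X-RateLimit-Reset is a Unix timestamp.
    if remaining == "0" {
        if resetAt, err := strconv.ParseInt(reset, 10, 64); err == nil {
            if resetAt > nowUnix {
                return resetAt - nowUnix
            }
            return 0
        }
    }
    // Neither header is usable: fall back to one minute (sketch assumption).
    return 60
}

// retryDelay adapts waitSeconds to an HTTP response header set.
func retryDelay(h http.Header, now time.Time) time.Duration {
    secs := waitSeconds(h.Get("Retry-After"), h.Get("X-RateLimit-Remaining"),
        h.Get("X-RateLimit-Reset"), now.Unix())
    return time.Duration(secs) * time.Second
}

func main() {
    h := http.Header{}
    h.Set("X-RateLimit-Remaining", "0")
    h.Set("X-RateLimit-Reset", strconv.FormatInt(time.Now().Add(90*time.Second).Unix(), 10))
    fmt.Println(retryDelay(h, time.Now()))
}
```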

This approach shows how GitHub’s APIs can be combined for efficient repository maintenance, though it also highlights areas where the platform could offer native support for such mass updates.

The complete implementation is available in the set-output-janitor repository.