Patching GitHub repositories in bulk with APIs and Go’s concurrency features
I recently came across GitHub’s changelog about the set-output command being deprecated. When I found over 220,000 workflow files still using this deprecated syntax, I knew I needed an automated solution.
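To make the target of the patch concrete, here is a minimal sketch of the rewrite for the simplest form of the command. The regular expression and the rewriteSetOutput helper are illustrative assumptions, not the code used by the tool; real workflows contain variants (unquoted arguments, values with embedded quotes) that this pattern deliberately leaves untouched.

```go
package main

import (
	"fmt"
	"regexp"
)

// setOutputRe matches only the simplest deprecated form:
//
//	echo "::set-output name=KEY::VALUE"
//
// Values containing double quotes are not matched.
var setOutputRe = regexp.MustCompile(`echo\s+"::set-output name=([A-Za-z0-9_-]+)::([^"]*)"`)

// rewriteSetOutput (hypothetical helper) converts matching commands to the
// GITHUB_OUTPUT form and returns every other line unchanged.
func rewriteSetOutput(line string) string {
	// "$$" in a replacement template produces a literal "$".
	return setOutputRe.ReplaceAllString(line, `echo "${1}=${2}" >> "$$GITHUB_OUTPUT"`)
}

func main() {
	fmt.Println(rewriteSetOutput(`run: echo "::set-output name=version::1.2.3"`))
	// prints: run: echo "version=1.2.3" >> "$GITHUB_OUTPUT"
}
```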
Architecture
The solution combines two GitHub APIs with a patch library:
- GraphQL for precise file operations and commits
- REST API for repository and PR management
- patch2pr for handling patches
GraphQL Query Structure
First, I needed to fetch specific workflow files. Here’s the GraphQL query structure:
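import (
	"context"

	"github.com/shurcooL/githubv4"
)

// FileContentQuery selects the text of a single blob (file) in a repository.
type FileContentQuery struct {
	Repository struct {
		Object struct {
			Blob struct {
				Text githubv4.String
			} `graphql:"... on Blob"`
		} `graphql:"object(expression: $expression)"`
	} `graphql:"repository(name: $name, owner: $owner)"`
}

// fetchFileContent returns the content of the file named by expression,
// a git revision expression such as "HEAD:.github/workflows/ci.yml".
func fetchFileContent(client *githubv4.Client, owner, name, expression string) (string, error) {
	var query FileContentQuery
	variables := map[string]interface{}{
		"owner":      githubv4.String(owner),
		"name":       githubv4.String(name),
		"expression": githubv4.String(expression),
	}
	// Return early on failure so callers never see a partially filled result.
	if err := client.Query(context.Background(), &query, variables); err != nil {
		return "", err
	}
	return string(query.Repository.Object.Blob.Text), nil
}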
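For reference, the struct tags above correspond roughly to the hand-written query below, and the expression variable is a git revision expression of the form ref:path. This is a sketch for illustration: the blobExpression helper and the example path are assumptions, and the query text the githubv4 library actually generates may differ in formatting.

```go
package main

import (
	"fmt"
	"strings"
)

// fileContentQuery is a hand-written equivalent of the query described by the
// struct tags; it is shown for orientation only.
const fileContentQuery = `query($owner: String!, $name: String!, $expression: String!) {
  repository(owner: $owner, name: $name) {
    object(expression: $expression) {
      ... on Blob { text }
    }
  }
}`

// blobExpression (hypothetical helper) builds a "<ref>:<path>" expression.
// A leading slash on the path is dropped because paths are repository-relative.
func blobExpression(ref, path string) string {
	return ref + ":" + strings.TrimPrefix(path, "/")
}

func main() {
	fmt.Println(fileContentQuery)
	fmt.Println(blobExpression("HEAD", ".github/workflows/ci.yml"))
}
```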
Creating Commits via GraphQL Mutation
The interesting part is using GraphQL mutations to create commits. Here’s how I structured it:
graphqlApplier := patch2pr.NewGraphQLApplier(
	client,
	patch2pr.Repository{
		Owner: *fork.Owner.Login,
		Name:  *fork.Name,
	},
	oid,
)

// Create commit using the mutation
sha, err := graphqlApplier.Commit(
	context.Background(),
	"refs/heads/"+*fork.DefaultBranch,
	&gitdiff.PatchHeader{
		Author: &gitdiff.PatchIdentity{
			Name:  "Arun",
			Email: "[email protected]",
		},
		Title: "ci: Use GITHUB_OUTPUT envvar instead of set-output command",
		Body:  "Updating deprecated GitHub Actions commands",
	},
)
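To show what a commit mutation carries on the wire, the sketch below builds an input for GitHub’s createCommitOnBranch mutation with only the standard library. It assumes that this is the mutation the GraphQL applier relies on; the commitInput type, the newCommitInput helper, the repository name, and the placeholder OID are all illustrative rather than taken from patch2pr.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// fileAddition is one entry of fileChanges.additions.
type fileAddition struct {
	Path     string `json:"path"`
	Contents string `json:"contents"` // the API expects base64-encoded contents
}

// commitInput mirrors the subset of CreateCommitOnBranchInput used here.
type commitInput struct {
	Branch struct {
		RepositoryNameWithOwner string `json:"repositoryNameWithOwner"`
		BranchName              string `json:"branchName"`
	} `json:"branch"`
	Message struct {
		Headline string `json:"headline"`
		Body     string `json:"body,omitempty"`
	} `json:"message"`
	FileChanges struct {
		Additions []fileAddition `json:"additions"`
	} `json:"fileChanges"`
	ExpectedHeadOid string `json:"expectedHeadOid"`
}

// newCommitInput (hypothetical helper) fills the input for one changed file.
func newCommitInput(repo, branch, headOid, headline, path string, contents []byte) commitInput {
	var in commitInput
	in.Branch.RepositoryNameWithOwner = repo
	in.Branch.BranchName = branch
	in.Message.Headline = headline
	in.FileChanges.Additions = []fileAddition{{
		Path:     path,
		Contents: base64.StdEncoding.EncodeToString(contents),
	}}
	in.ExpectedHeadOid = headOid
	return in
}

func main() {
	in := newCommitInput("octocat/hello-world", "main",
		"0000000000000000000000000000000000000000", // placeholder OID
		"ci: Use GITHUB_OUTPUT envvar instead of set-output command",
		".github/workflows/ci.yml", []byte("name: ci\n"))
	out, err := json.MarshalIndent(in, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

The expectedHeadOid field makes the mutation fail if the branch has moved since the file was read, which is a useful safety net when patching many forks in parallel.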
REST API for Pull Request Creation
After creating commits, I use GitHub’s REST API to create pull requests:
prRequest := &github.NewPullRequest{
	Title:               &prTitle,
	Body:                &prBody,
	MaintainerCanModify: &maintainerCanModify,
	Draft:               &draft,
	Base:                &base,
	Head:                &head,
}
pr, _, err = clientv3.PullRequests.Create(
	context.Background(),
	repoOwner,
	repoName,
	prRequest,
)
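One detail the snippet leaves implicit is the value of head. When the branch lives in a fork, the REST endpoint expects it in the form owner:branch. The sketch below shows the JSON body such a request would carry; the pullRequestPayload type, the crossRepoHead helper, and the example values are assumptions for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pullRequestPayload mirrors the JSON body of POST /repos/{owner}/{repo}/pulls
// for the fields used in this post.
type pullRequestPayload struct {
	Title               string `json:"title"`
	Body                string `json:"body"`
	Head                string `json:"head"`
	Base                string `json:"base"`
	Draft               bool   `json:"draft"`
	MaintainerCanModify bool   `json:"maintainer_can_modify"`
}

// crossRepoHead (hypothetical helper) namespaces a fork's branch with the
// fork owner's login, as required for cross-repository pull requests.
func crossRepoHead(forkOwner, branch string) string {
	return forkOwner + ":" + branch
}

func main() {
	p := pullRequestPayload{
		Title:               "ci: Use GITHUB_OUTPUT envvar instead of set-output command",
		Body:                "Updating deprecated GitHub Actions commands",
		Head:                crossRepoHead("fork-owner", "main"), // example values
		Base:                "main",
		MaintainerCanModify: true,
	}
	out, err := json.Marshal(p)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```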
Concurrency Management
To handle multiple repositories efficiently, I implemented concurrent processing with proper error handling:
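var wg sync.WaitGroup
// Buffered so that workers never block when reporting an error.
errChan := make(chan error, len(scannedLines))
for _, scannedLine := range scannedLines {
	wg.Add(1)
	go func(line string) {
		defer wg.Done()
		// Each line is expected to look like "owner/repo".
		parts := strings.Split(line, "/")
		if len(parts) != 2 {
			errChan <- fmt.Errorf("invalid repository %q", line)
			return
		}
		repoOwner, repoName := parts[0], parts[1]
		fork, _, err := client.Repositories.CreateFork(context.Background(),
			repoOwner, repoName, nil)
		// go-github reports an asynchronously scheduled fork as an
		// *github.AcceptedError; the returned fork is still usable.
		var accepted *github.AcceptedError
		if err != nil && !errors.As(err, &accepted) {
			errChan <- err
			return
		}
		// Process repository updates
	}(scannedLine)
}
// Wait for every worker, then drain the collected errors.
wg.Wait()
close(errChan)
for err := range errChan {
	log.Printf("error: %v", err)
}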
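Starting one goroutine per repository can put thousands of requests in flight at once. A bounded variant of the same pattern is sketched below using a buffered channel as a semaphore. The splitRepo and processAll helpers and the limit parameter are assumptions for illustration; errors.Join requires Go 1.20 or later.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"sync"
)

// splitRepo (hypothetical helper) parses a line of the form "owner/name".
func splitRepo(line string) (string, string, error) {
	owner, name, ok := strings.Cut(strings.TrimSpace(line), "/")
	if !ok || owner == "" || name == "" || strings.Contains(name, "/") {
		return "", "", fmt.Errorf("invalid repository %q, want owner/name", line)
	}
	return owner, name, nil
}

// processAll (hypothetical helper) calls process for every line with at most
// limit goroutines running at once and returns all errors joined together.
func processAll(lines []string, limit int, process func(owner, name string) error) error {
	if limit < 1 {
		limit = 1
	}
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		errs []error
	)
	sem := make(chan struct{}, limit)
	for _, line := range lines {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit workers are busy
		go func(line string) {
			defer wg.Done()
			defer func() { <-sem }()
			owner, name, err := splitRepo(line)
			if err == nil {
				err = process(owner, name)
			}
			if err != nil {
				mu.Lock()
				errs = append(errs, err)
				mu.Unlock()
			}
		}(line)
	}
	wg.Wait()
	return errors.Join(errs...) // nil when no worker failed
}

func main() {
	err := processAll([]string{"octocat/hello-world", "not-a-repo"}, 4,
		func(owner, name string) error {
			fmt.Println("processing", owner, name)
			return nil
		})
	fmt.Println("errors:", err)
}
```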
Current Limitations
I hit a few technical roadblocks:
- Fine-grained tokens lack access to public-but-unowned data
- GitHub’s rate limiting affects mass operations
- Need for proper authentication without full GitHub App installation
I’m working on implementing a bot token solution, similar to how Dependabot handles authentication, but the current token limitations are blocking progress.
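For the rate-limiting item in the list above, one mitigation is to pause based on the headers GitHub returns with each REST response. The sketch below reads Retry-After, X-RateLimit-Remaining, and X-RateLimit-Reset; the retryDelay helper is an assumption for illustration, it treats Retry-After as a number of seconds, and it is not part of the tool.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"time"
)

// retryDelay (hypothetical helper) reports how long to pause before the next
// request. get looks up a response header (for example resp.Header.Get) and
// nowUnix is the current time in Unix seconds.
func retryDelay(get func(string) string, nowUnix int64) time.Duration {
	// Retry-After, when present, gives the wait in seconds.
	if secs, err := strconv.ParseInt(get("Retry-After"), 10, 64); err == nil && secs > 0 {
		return time.Duration(secs) * time.Second
	}
	// With no requests remaining, wait until the reset time (Unix seconds).
	if get("X-RateLimit-Remaining") == "0" {
		reset, err := strconv.ParseInt(get("X-RateLimit-Reset"), 10, 64)
		if err == nil && reset > nowUnix {
			return time.Duration(reset-nowUnix) * time.Second
		}
	}
	return 0
}

func main() {
	now := time.Now().Unix()
	h := http.Header{}
	h.Set("X-RateLimit-Remaining", "0")
	h.Set("X-RateLimit-Reset", strconv.FormatInt(now+30, 10))
	fmt.Println(retryDelay(h.Get, now)) // prints: 30s
}
```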
This approach shows how GitHub’s APIs can be combined for efficient repository maintenance, though it also highlights some areas where the platform could potentially offer native support for such mass updates.
The complete implementation is available in the set-output-janitor repository.