
SysML v2 from a GitHub repo: deriving an evidence-backed system model from source

Guide · Updated Jun 2026 · ~7 min read

Deriving a SysML v2 model from source code instead of hand-authoring it is now plausible — but only if every model element stays traceable to the file and line that justifies it, and only if a human gets the final say. This guide explains what that means, why the derive-versus-author distinction matters, and exactly what Arcwoven produces today, scope and limits included.

A note on scope before anything else: Arcwoven's analysis currently covers TypeScript/TSX, Node, and React only. If your system is built on another stack, the rest of this is useful background, but the tool does not analyze it yet.

What SysML v2 is, and why "derive vs. hand-author" matters

SysML v2 is the next-generation OMG Systems Modeling Language for model-based systems engineering (MBSE). It is not an incremental update to SysML v1.x. Where v1 was a UML profile — stereotypes layered onto UML — v2 drops the UML-profile approach and is built on a purpose-built metamodel called KerML (the Kernel Modeling Language). The OMG approved final adoption of SysML v2.0, KerML v1.0, and the Systems Modeling API & Services v1.0 on 21 July 2025, and published the formal specification on 3 September 2025.

Two changes make v2 relevant to people who work in code. First, v2 adds a standard textual notation alongside the graphical one — the same underlying model, viewable and authored as text. That text can be version-controlled, diffed, and merged like source. Second, v2 defines a standard API for programmatic create, read, update, and query access, so tools can synchronize against one model rather than swapping lossy XMI files.

That textual, Git-friendly form is precisely what makes deriving a model from a repository newly feasible. Hand-authoring SysML — in v1 or v2 — has always been the default, and it is costly: you learn the new syntax and the v2 conceptual model at the same time, then you keep the model current by hand as the system changes. Deriving the model from code that already exists inverts that effort. The model starts from what the system actually is, not from what someone remembered to write down.

The problem: diagrams drift, and hand-authoring is expensive

Two failure modes drive interest in source-derived models.

Drift. A hand-authored model or architecture diagram is a snapshot. The moment code merges past it, it begins to lie. Teams either pay continuously to keep the model synchronized or quietly stop trusting it. This is a familiar reason MBSE adoption stalls: the model becomes a parallel artifact to maintain rather than a reflection of the system.

Cost and difficulty. Vendor and practitioner literature on SysML v2 openly acknowledges a steep learning curve and tends to recommend small pilots and workshops to get started. AI-assisted authoring tools have appeared specifically because manual authoring is laborious and error-prone — but note their direction. Tools such as Visual Paradigm's SysML v2 Studio and the academic SysTemp system generate SysML v2 from natural-language descriptions, not from an existing codebase. Much of the established MBSE tooling also runs the other way: model-to-code (generating implementation from a model), not code-to-model.

That gap is the point. As of 2025–2026, there is no widely known, established tool that auto-derives a SysML v2 model from an arbitrary source repository. MBSE incumbents are largely forward-engineering and digital-thread tools; they generally do not ingest a raw repo. The "paste a repo, get a picture" tools do ingest repos, but they typically emit informal diagrams (such as Mermaid) rather than SysML, and — the part that matters most here — they do not tie their claims back to the evidence behind them.

How an evidence-backed, source-derived model works

The naive version of "model from code" is to feed a repository to a language model and accept whatever architecture it describes. Published research is blunt about why that fails. Benchmarks of LLM-generated architecture views report that models can struggle with completeness and consistency, that traceability between code and generated components is frequently absent, and that such output should be treated as a preliminary draft requiring substantial human review rather than a finished artifact. Separate work on generating architecture documents from code found that LLMs omitted whole categories of behavior for complex components. Even repo-wiki generators that link to sources, such as DeepWiki, recommend human review for critical use and note that output quality tracks input quality.

So source-linking alone is not the answer. The crux is traceability to evidence at the level of each claim: every element in the model — every boundary, component, interface, dependency — points to the specific files (and the lines within them) that justify its existence, and is reviewable on its own terms. A model element you cannot trace to evidence is a hypothesis, and it should be treated as one until a person confirms it.
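To make claim-level traceability concrete, here is a minimal sketch of how a model element might carry its evidence, and how elements without usable evidence could be set aside as hypotheses. The type names, field names, and sample paths are assumptions made for illustration; they are not Arcwoven's actual schema.

```typescript
// Hypothetical shapes for illustration only; not Arcwoven's real data model.
type ElementKind = "boundary" | "component" | "interface" | "dependency";

interface Evidence {
  file: string;      // repository-relative path
  startLine: number; // 1-based, inclusive
  endLine: number;   // 1-based, inclusive
}

interface ModelElement {
  id: string;
  kind: ElementKind;
  name: string;
  evidence: Evidence[];
}

// An evidence entry is usable only if it names a file and a sane line range.
function isValidEvidence(e: Evidence): boolean {
  return (
    e.file.length > 0 &&
    Number.isInteger(e.startLine) &&
    Number.isInteger(e.endLine) &&
    e.startLine >= 1 &&
    e.endLine >= e.startLine
  );
}

// Split a model into elements backed by evidence and untraceable hypotheses.
function partitionByEvidence(elements: ModelElement[]): {
  traced: ModelElement[];
  hypotheses: ModelElement[];
} {
  const traced: ModelElement[] = [];
  const hypotheses: ModelElement[] = [];
  for (const el of elements) {
    const hasEvidence = el.evidence.some(isValidEvidence);
    (hasEvidence ? traced : hypotheses).push(el);
  }
  return { traced, hypotheses };
}

// Sample data (invented): one traced component, one dependency with no evidence.
const sample: ModelElement[] = [
  {
    id: "c1",
    kind: "component",
    name: "OrderService",
    evidence: [{ file: "src/orders/service.ts", startLine: 12, endLine: 48 }],
  },
  { id: "d1", kind: "dependency", name: "OrderService -> PaymentsApi", evidence: [] },
];

const result = partitionByEvidence(sample);
console.log(result.hypotheses.map((el) => el.name));
```

The design choice worth noting is that an element with an empty or malformed evidence list is not dropped. It is kept and flagged, so a reviewer can decide whether the claim deserves confirmation or removal.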

What Arcwoven produces and exports

You paste a public GitHub repository URL. Arcwoven runs automated analysis and produces an evidence-backed architecture and system model in a private, per-user workspace. The model covers:

System boundaries — what is inside the system and what is external.
Services and components — the units the system is built from.
APIs and interfaces — how those units expose and consume behavior.
Data stores — where state lives.
Dependencies — how components and services relate.
Implementation details — grounded in the actual code.

Every element is traceable to the source evidence behind it: the files and lines that support it. Exporting the reviewed model as diagrams, documents, HTML, and SysML v2 is a Pro feature. The SysML v2 export gives MBSE and systems-engineering practitioners a starting model in a standard, textual form, derived from code rather than typed from scratch, which you can then carry into a SysML v2 toolchain and refine.
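As a rough picture of what "a starting model in textual form" can look like, the sketch below renders a few reviewed components as SysML v2 part definitions inside a package, with the supporting evidence recorded in a doc comment. The input shape, the function name, and the choice to record evidence in doc comments are all assumptions for illustration; this is not Arcwoven's export format, and a real export would carry far more structure.

```typescript
// Hypothetical input shape; invented for this sketch.
interface ReviewedComponent {
  name: string;         // must be a simple identifier, e.g. "OrderService"
  evidenceFile: string; // repository-relative path that justifies the element
  evidenceLine: number; // 1-based line within that file
}

// Restrict names to simple identifiers so the output needs no quoting.
const IDENTIFIER = /^[A-Za-z_][A-Za-z0-9_]*$/;

// Render components as part definitions inside one SysML v2 package.
function toSysmlText(packageName: string, components: ReviewedComponent[]): string {
  for (const n of [packageName, ...components.map((c) => c.name)]) {
    if (!IDENTIFIER.test(n)) throw new Error(`Not a simple identifier: ${n}`);
  }
  const lines: string[] = [`package ${packageName} {`];
  for (const c of components) {
    // Keep a stray "*/" in a path from closing the doc comment early.
    const safeFile = c.evidenceFile.split("*/").join("* /");
    lines.push(`    part def ${c.name} {`);
    lines.push(`        doc /* Evidence: ${safeFile}:${c.evidenceLine} */`);
    lines.push(`    }`);
  }
  lines.push(`}`);
  return lines.join("\n");
}

// Sample data (invented).
const sysml = toSysmlText("Shop", [
  { name: "OrderService", evidenceFile: "src/orders/service.ts", evidenceLine: 12 },
  { name: "PaymentsClient", evidenceFile: "src/payments/client.ts", evidenceLine: 5 },
]);
console.log(sysml);
```

Because the result is plain text, it can be committed next to the code it describes and reviewed in an ordinary diff.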

A deliberate scope statement: the list above describes what the model represents and how it is exported. It does not promise a specific accuracy figure, a guaranteed-complete model, or coverage of anything the evidence does not support. Where the code is silent, the model should be silent or explicitly uncertain.

The honest review loop: provisional until a human agrees

Generated outputs are provisional. They come from repository evidence plus automated analysis, and they may be incomplete or wrong until a human reviews them. Arcwoven is built around that review loop rather than around the pretense that automation is authoritative.

In practice this means each assertion in the model cites the evidence that supports it, so an engineer can open the citation, check the claim against the code, and accept, correct, or reject it. The model is "finished" when an engineer agrees with it, not when the analysis run completes. This is not a hedge bolted on for caution; it directly answers the documented failure modes of automated architecture extraction — incompleteness, run-to-run inconsistency, and missing code-to-component traceability. The evidence trail exists so that review is fast and grounded, not so the output can skip review.
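The accept, correct, or reject loop described above can be sketched as a small state model. Everything here is an assumption made for illustration: the status names, the `decide` and `isFinished` helpers, and the rule that a review is finished once no assertion is still pending. It is not a description of Arcwoven's internals.

```typescript
// Hypothetical review states; invented for this sketch.
type ReviewStatus = "pending" | "accepted" | "corrected" | "rejected";
type Decision = Exclude<ReviewStatus, "pending">;

interface Assertion {
  id: string;
  claim: string;
  citation: string; // e.g. "src/gateway/routes.ts:21"
  status: ReviewStatus;
}

// Record an engineer's decision on one assertion, returning a new list.
function decide(
  assertions: Assertion[],
  id: string,
  decision: Decision,
  correctedClaim?: string
): Assertion[] {
  if (!assertions.some((a) => a.id === id)) {
    throw new Error(`Unknown assertion: ${id}`);
  }
  if (decision === "corrected" && !correctedClaim) {
    throw new Error("A correction needs the corrected claim text");
  }
  return assertions.map((a) =>
    a.id === id
      ? {
          ...a,
          status: decision,
          claim: decision === "corrected" && correctedClaim ? correctedClaim : a.claim,
        }
      : a
  );
}

// Assumption: the review is finished once nothing is left pending.
function isFinished(assertions: Assertion[]): boolean {
  return assertions.every((a) => a.status !== "pending");
}

// Only accepted or corrected assertions survive into the reviewed model.
function reviewedModel(assertions: Assertion[]): Assertion[] {
  return assertions.filter((a) => a.status === "accepted" || a.status === "corrected");
}

// Sample data (invented).
let draft: Assertion[] = [
  { id: "a1", claim: "ApiGateway calls OrderService", citation: "src/gateway/routes.ts:21", status: "pending" },
  { id: "a2", claim: "OrderService writes to Redis", citation: "src/orders/store.ts:8", status: "pending" },
];

draft = decide(draft, "a1", "accepted");
const finishedAfterOne = isFinished(draft);
draft = decide(draft, "a2", "corrected", "OrderService writes to Postgres");
const finishedAfterTwo = isFinished(draft);
console.log(finishedAfterOne, finishedAfterTwo, reviewedModel(draft).length);
```

Note that completion depends only on human decisions. Nothing in the sketch lets an analysis run mark its own output as accepted.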

The pilot scope, stated plainly

Honesty about scope is a trust signal, so here it is up front rather than buried:

Languages: analysis currently covers TypeScript/TSX, Node, and React only. Arcwoven does not support other languages or stacks today; do not assume coverage beyond this set.
Repositories: public repositories only, submitted as a public repository root URL (for example, https://github.com/owner/repo).
Access: a free account gives you a private workspace that stays available to your account. Import quotas are 2 per day, 6 per week, and 10 per month.
Export: deriving and reviewing models is included free; exporting them (diagrams, documents, HTML, and SysML v2) is a Pro feature.
Beyond the pilot: higher limits and export are available on Pro. Private-repository support is coming to Pro and is handled today through a private pilot — contact [email protected].
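For the "repository root URL" requirement above, here is a minimal sketch of a client-side shape check. The function name, the allowed character set for owner and repository names, and the decision to reject query strings and fragments are assumptions; the actual validation Arcwoven applies is not documented here. The check is offline, so it cannot tell whether a repository is public.

```typescript
// Hypothetical helper: accepts only URLs shaped like https://github.com/owner/repo.
function parseRepoRootUrl(input: string): { owner: string; repo: string } | null {
  let url: URL;
  try {
    url = new URL(input);
  } catch {
    return null; // not a parseable absolute URL
  }
  if (url.protocol !== "https:" || url.hostname !== "github.com") return null;
  if (url.search !== "" || url.hash !== "") return null;

  // A root URL has exactly two path segments; a trailing slash is tolerated.
  const segments = url.pathname.split("/").filter((s) => s.length > 0);
  if (segments.length !== 2) return null;

  const owner = segments[0];
  const repo = segments[1];
  if (!owner || !repo) return null;

  // Assumed character set for names; chosen for this sketch only.
  const NAME = /^[A-Za-z0-9_.-]+$/;
  if (!NAME.test(owner) || !NAME.test(repo)) return null;

  return { owner, repo };
}

const rootUrl = parseRepoRootUrl("https://github.com/owner/repo");
const deepUrl = parseRepoRootUrl("https://github.com/owner/repo/tree/main");
console.log(rootUrl, deepUrl);
```

A deep link into a branch or file fails the check, which matches the guide's requirement that the submitted URL point at the repository root.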

Arcwoven is a product of Arcwoven Systems. It is not affiliated with, sponsored by, or endorsed by GitHub; GitHub is a trademark of GitHub, Inc. Submitted repository content is not used to train third-party foundation models.

Try it on your repo

If you work in TypeScript, Node, or React and want to see a system model derived from your code — with every claim tied back to the file that justifies it — create a free account, paste a public repo URL, and review the result. Upgrade to Pro to export the SysML v2 (and diagrams, documents, or HTML) into your toolchain.


For how the end-to-end flow fits together, see the workflow overview. For why evidence and human review sit at the center of the model, see how generated outputs are reviewed against evidence.

FAQ

What languages does the pilot support?

TypeScript/TSX, Node, and React only. Support for other languages and stacks is not available today.

Public or private repositories?

Public repositories only today. Private-repository support is coming to Pro; for early access, contact [email protected] about the private pilot.

Is the SysML v2 export hand-editable?

Yes. SysML v2's textual notation is plain text by design — version-controllable, diffable, and editable like source. Arcwoven exports SysML v2 as a derived starting model; treat it as provisional until you have reviewed it, then edit and refine it as you would any model.

Can I export on the free plan?

Deriving and reviewing models is free. Exporting — diagrams, documents, HTML, and SysML v2 — is a Pro feature.

Is the generated model authoritative on its own?

No. Outputs are provisional and may be incomplete or wrong until a human reviews the evidence behind them. The model is finished when an engineer agrees with it.

Are there usage limits?

Yes — 2 imports per day, 6 per week, and 10 per month on a free account.

Does Arcwoven replace hand-authored MBSE models?

No. It gives you an evidence-backed starting model derived from code, which a person then reviews and refines. It is a faster starting point and a way to keep architecture grounded in source, not a substitute for engineering judgment.