The website used to say a simple thing: every paper should have a working prototype. That is still true, but it is no longer specific enough.
A pass over the GitHub organization on June 18, 2026, shows a clearer shape: Crafter Research is becoming a lab for evidence systems. The public surface is no longer just apps. It is corpora, engines, SDKs, benchmarks, graph artifacts, and research logs that show how conclusions moved.
The repo graph
The GitHub organization has 27 repositories: 23 public and 4 private. The most active work this week clusters into four arcs.
| Arc | Repos | What changed |
|---|---|---|
| Legal research stack | legalize-pe, legalize-pe-engine, amicus, amicus-sdk | The work now spans corpus, engine, assistant, SDK, docs, and retrieval evaluation. |
| Citizen process graph | tramites-pe, tramites-pe-engine | The scraper and engine split corpus truth from derived graph research. |
| Evaluation culture | latambench, website | Benchmarks and logs now record when the first conclusion was wrong. |
| Civic interfaces | peru-financia, political-graph, sunat-cli, andenar | The older civic-tech layer remains useful, but it is now part of a broader research infrastructure story. |
What moved this week
The latest commit pattern matters more than the repo count.
- amicus scaled the retrieval gold set to 35 firm pairs using dual blind annotation. That turned a flashy early result into a more cautious finding: the rerank step carried the result after the small-set story collapsed.
- tramites-pe harvested the gob.pe citizen-service taxonomy, process graph, and canonical entity directory.
- tramites-pe-engine turned the corpus into a web explorer and process map, including a goal library derived from public-service graphs.
- LatamBench tightened evaluation accounting: valid counts, hallucination metrics, abstention metrics, and judge calibration artifacts.
This is the actual strategy: make public data into durable substrates, put agents on top, and then measure the parts that can fail.
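To make "measure the parts that can fail" concrete, here is a minimal sketch of the kind of accounting the LatamBench bullet describes. It is not LatamBench's code: the `Outcome` record, its field names, and the choice of denominators (abstention over valid responses, hallucination over answered responses) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """One judged model response. Hypothetical schema, not LatamBench's."""
    valid: bool         # the response parsed and could be scored
    abstained: bool     # the model declined to answer
    hallucinated: bool  # the answer asserted something unsupported


def account(outcomes: list[Outcome]) -> dict[str, float]:
    """Report counts and rates with explicit denominators.

    Assumed convention: abstention rate is measured over valid responses,
    hallucination rate over valid responses that actually answered.
    """
    valid = [o for o in outcomes if o.valid]
    answered = [o for o in valid if not o.abstained]
    n_valid, n_answered = len(valid), len(answered)
    n_hallucinated = sum(o.hallucinated for o in answered)
    return {
        "total": len(outcomes),
        "valid": n_valid,
        "answered": n_answered,
        "abstention_rate": (n_valid - n_answered) / n_valid if n_valid else 0.0,
        "hallucination_rate": n_hallucinated / n_answered if n_answered else 0.0,
    }


# Toy data, invented for the example.
report = account([
    Outcome(valid=True, abstained=False, hallucinated=False),
    Outcome(valid=True, abstained=False, hallucinated=True),
    Outcome(valid=True, abstained=True, hallucinated=False),
    Outcome(valid=False, abstained=False, hallucinated=False),
])
```

Keeping the valid count separate from the total is the point: an invalid response that silently drops out of the denominator makes every downstream rate look better than it is.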
The important shift
The old website grouped projects as active apps and tools. That made sense in March. It is now misleading.
The unit of work is no longer “project”. The unit is inspectable evidence:
- corpus: legal norms, gob.pe services, political finance, election data
- engine: retrieval, graph construction, pathfinding, scoring
- evaluation: gold sets, ablations, judge agreement, precision gates
- interface: assistant, explorer, CLI, SDK, website
- log: what we believed, what broke, what survived
That is why the website needed to change. A visitor should understand the lab by its evidence chain, not by a portfolio grid.
What stays honest
Some claims are still early:
- The amicus gold set is 35 firm pairs, not a legal-grade benchmark.
- The tramites graph has useful signals, but the engine found that real process dependencies often live outside a single public page.
- LatamBench is an evaluation platform, but benchmark methodology is still moving.
- Some active systems, like amicus and Andenar, remain partly private; only their public interfaces or SDKs are exposed.
The research standard is not to avoid weak spots. It is to publish the weak spot next to the artifact.
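One way to see why 35 pairs is not a legal-grade benchmark is to look at how wide the uncertainty is at that sample size. The sketch below computes a Wilson score interval for a proportion. The hit count of 28 out of 35 is hypothetical, chosen only to illustrate the width; it is not a reported amicus result.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical counts: same observed rate (0.8), two gold-set sizes.
small = wilson_interval(28, 35)
large = wilson_interval(280, 350)
```

With the hypothetical 28 of 35, the interval runs from roughly 0.64 to 0.90. The same observed rate on a set ten times larger gives a much narrower band, which is the practical argument for treating small-set results as provisional.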
Website update
The home page now says what the repos say:
Research systems that can be inspected.
That is the current operating frame. Every new public claim should point to a source, run, repo, live interface, or research log. If it cannot, it is not ready for the homepage.
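That rule is simple enough to sketch as a check. The version below is an illustration, not the website's actual build logic: the `Claim` and `Evidence` types, the kind labels, and `ready_for_homepage` are all hypothetical names.

```python
from dataclasses import dataclass, field

# The five kinds of pointer named in the rule above (labels are assumed).
EVIDENCE_KINDS = {"source", "run", "repo", "interface", "log"}


@dataclass(frozen=True)
class Evidence:
    kind: str  # one of EVIDENCE_KINDS
    ref: str   # URL, run ID, repo name, or log entry (free-form here)


@dataclass
class Claim:
    text: str
    evidence: list[Evidence] = field(default_factory=list)


def ready_for_homepage(claim: Claim) -> bool:
    """A claim passes only if at least one pointer has a known kind
    and a non-empty reference."""
    return any(
        e.kind in EVIDENCE_KINDS and bool(e.ref.strip())
        for e in claim.evidence
    )


backed = Claim(
    "The amicus gold set has 35 firm pairs.",
    [Evidence("repo", "amicus")],
)
unbacked = Claim("Our retrieval is state of the art.")
```

Here `ready_for_homepage(backed)` is true and `ready_for_homepage(unbacked)` is false. A real gate would presumably also verify that the reference resolves; this sketch only checks that one was supplied.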