Summary
- Each channel server now runs as a fully independent instance with its own listener, goroutines, and
state — one channel crashing or shutting down no longer affects the others
- Introduces a ChannelRegistry interface for cross-channel operations (find session, disconnect user,
worldcast) replacing direct iteration over a shared []*Server slice
- Adds cmd/protbot, a headless MHF protocol bot that exercises the full sign → entrance → channel
flow for automated testing
- Fixes several data races and panics found by the race detector during isolation testing
Changes
Channel server isolation (server/channelserver/)
- ChannelRegistry interface + LocalChannelRegistry implementation for cross-channel lookups
- done channel for clean goroutine shutdown signaling, idempotent Shutdown()
- Race-free acceptClients/manageSessions using select on done instead of closing acceptConns
- invalidateSessions rewritten with proper locking (snapshot under lock, process outside)
- logoutPlayer guards nil DB and logs errors instead of panicking
- Session loops use per-server erupeConfig instead of global _config.ErupeConfig
- Per-channel Enabled flag in config for selectively disabling channels
Protocol bot (cmd/protbot/)
- Standalone Blowfish connection package (no dependency on server config)
- Sign, entrance, and channel protocol implementations
- 5 scenario actions: login, lobby, session, chat, quests
- 19 unit tests covering packet building, parsing, and connection handling
Bug fixes
- Nil decompSave panic on disconnect before character data loads
- Docker Postgres 18 volume mount path (/var/lib/postgresql/ not /data/)
Test plan
- go test -race ./... passes (27 packages, 0 races)
- 5 channel isolation tests verify: independent shutdown, listener failure recovery, session panic
containment, cross-channel registry after shutdown, stage isolation
- Protbot live-tested against Docker stack (all 5 actions)
- Existing config.json files work unchanged (Enabled defaults to true when omitted, and
config.example.json sets it explicitly)
The channel server had several concurrency issues found by the race
detector during isolation testing:
- acceptClients could send on a closed acceptConns channel during
shutdown, causing a panic. Replace close(acceptConns) with a done
channel and select-based shutdown signaling in both acceptClients
and manageSessions.
- invalidateSessions read isShuttingDown and iterated sessions without
holding the lock. Rewrite with ticker + done channel select and
snapshot sessions under lock before processing timeouts.
- sendLoop/recvLoop accessed global _config.ErupeConfig.LoopDelay
which races with tests modifying the global. Use the per-server
erupeConfig instead.
- logoutPlayer panicked on DB errors and crashed on nil DB (no-db
test scenarios). Guard with nil check and log errors instead.
- Shutdown was not idempotent: calling it twice closed the done
channel twice and panicked. Make Shutdown() safe to call repeatedly.
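The shutdown signaling described in this list can be sketched as follows. This is a minimal illustration, not the actual channel server code: the `server` struct, its field names, the `deliver` helper, and the use of `sync.Once` are all assumptions made for the example, and an `int` stands in for a network connection.

```go
package main

import (
	"fmt"
	"sync"
)

// server is a hypothetical stand-in for the channel server.
type server struct {
	done         chan struct{} // closed exactly once to signal shutdown
	acceptConns  chan int      // never closed; int stands in for net.Conn
	shutdownOnce sync.Once
}

func newServer() *server {
	return &server{done: make(chan struct{}), acceptConns: make(chan int)}
}

// Shutdown may be called any number of times; only the first call closes done.
func (s *server) Shutdown() {
	s.shutdownOnce.Do(func() { close(s.done) })
}

// deliver hands a connection to the session manager, or gives up if the
// server is shutting down. Because acceptConns is never closed, a send on
// it cannot panic.
func (s *server) deliver(conn int) bool {
	select {
	case s.acceptConns <- conn:
		return true
	case <-s.done:
		return false
	}
}

func main() {
	s := newServer()
	s.Shutdown()
	s.Shutdown()              // second call is a no-op, not a double close
	fmt.Println(s.deliver(1)) // false: nobody is receiving and done is closed
}
```

Selecting on `done` rather than closing the data channel means the sender and the shutdown path never race over who owns `acceptConns`.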
Add 5 channel isolation tests verifying independent shutdown,
listener failure, session panic recovery, cross-channel registry
after shutdown, and stage isolation.
The protbot sent "DSGN:\x00" as the sign request type, but the server
strips the last 3 characters as a version suffix. Send "DSGN:041"
(ZZ client mode 41) to match the real client format.
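To see why the suffix matters, here is a small sketch of the stripping behaviour described above. The `requestKind` function is hypothetical and only models "drop the last 3 characters"; it is not the sign server's actual parsing code.

```go
package main

import "fmt"

// requestKind drops a fixed 3-character version suffix from the request
// type string. The name and the short-input handling are assumptions.
func requestKind(reqType string) string {
	const suffixLen = 3
	if len(reqType) < suffixLen {
		return ""
	}
	return reqType[:len(reqType)-suffixLen]
}

func main() {
	fmt.Printf("%q\n", requestKind("DSGN:041"))  // "DSGN:"
	fmt.Printf("%q\n", requestKind("DSGN:\x00")) // "DSG"
}
```

With the three-digit suffix present, the remaining prefix is the expected request type; without it, the prefix is truncated and no longer matches.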
The entrance channel entry parser read 14 bytes for remaining fields
but the server writes 18 bytes (9 uint16, not 7), causing a panic
when parsing the server list.
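A defensive version of such a read might look like the sketch below. It assumes big-endian encoding and treats the nine trailing fields as opaque uint16 values; the function name and error text are illustrative, not taken from the protbot source.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// trailingFields is the number of uint16 values at the end of a channel
// entry, per the description above (9 * 2 = 18 bytes).
const trailingFields = 9

// readTrailingFields checks the length before reading so a short buffer
// yields an error instead of an out-of-range panic.
func readTrailingFields(b []byte) ([]uint16, error) {
	if len(b) < trailingFields*2 {
		return nil, errors.New("channel entry too short")
	}
	out := make([]uint16, trailingFields)
	for i := range out {
		out[i] = binary.BigEndian.Uint16(b[i*2:])
	}
	return out, nil
}

func main() {
	_, err := readTrailingFields(make([]byte, 14))
	fmt.Println(err != nil) // true: 14 bytes is not enough

	fields, err := readTrailingFields(make([]byte, 18))
	fmt.Println(len(fields), err)
}
```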
The channel server panicked on disconnect when a session had no
decompressed save data (e.g. protbot or early client disconnect).
Guard Save() against nil decompSave.
Also fix docker-compose volume mount for Postgres 18 which changed
its data directory layout.
Copy MHBridge into the Erupe module as cmd/protbot/ so it can be
built, tested, and maintained alongside the server. The bot
implements the full sign → entrance → channel login flow and
supports lobby entry, chat, session setup, and quest enumeration.
The conn/ package keeps its own Blowfish crypto primitives to avoid
importing erupe-ce/config (which requires a config file at init).
Enable multiple Erupe instances to share a single PostgreSQL database
without destroying each other's state, fix existing data races in
cross-channel access, and lay groundwork for future distributed
channel server deployments.
Phase 1 — DB safety:
- Scope DELETE FROM servers/sign_sessions to this instance's server IDs
- Fix ci++ bug where failed channel start shifted subsequent IDs
Phase 2 — Fix data races in cross-channel access:
- Lock sessions map in FindSessionByCharID and DisconnectUser
- Lock stagesLock in handleMsgSysLockGlobalSema
- Snapshot sessions/stages under lock in TransitMessage types 1-4
- Lock channel when finding mail notification targets
Phase 3 — ChannelRegistry interface:
- Define ChannelRegistry interface with 7 cross-channel operations
- Implement LocalChannelRegistry with proper locking
- Add SessionSnapshot/StageSnapshot immutable copy types
- Delegate WorldcastMHF, FindSessionByCharID, DisconnectUser to Registry
- Migrate LockGlobalSema and guild mail handlers to use Registry
- Add comprehensive tests including concurrent access
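As a rough illustration of Phase 3, the sketch below shows one cross-channel operation behind an interface, returning an immutable snapshot instead of a live session pointer. Only one of the seven operations is shown, and the `channel` struct, its fields, and the snapshot fields are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// SessionSnapshot is a copy of session data that is safe to use after the
// channel lock has been released. Fields are illustrative.
type SessionSnapshot struct {
	CharID    uint32
	ChannelID uint16
}

// channel is a hypothetical stand-in for a channel server instance.
type channel struct {
	sync.Mutex
	id       uint16
	sessions map[uint32]string // charID -> name
}

// ChannelRegistry shows a single cross-channel lookup for brevity.
type ChannelRegistry interface {
	FindSessionByCharID(charID uint32) (SessionSnapshot, bool)
}

type LocalChannelRegistry struct {
	channels []*channel
}

// FindSessionByCharID locks each channel only while reading its map.
func (r *LocalChannelRegistry) FindSessionByCharID(charID uint32) (SessionSnapshot, bool) {
	for _, c := range r.channels {
		c.Lock()
		_, ok := c.sessions[charID]
		c.Unlock()
		if ok {
			return SessionSnapshot{CharID: charID, ChannelID: c.id}, true
		}
	}
	return SessionSnapshot{}, false
}

func main() {
	var reg ChannelRegistry = &LocalChannelRegistry{channels: []*channel{
		{id: 1, sessions: map[uint32]string{100: "a"}},
		{id: 2, sessions: map[uint32]string{200: "b"}},
	}}
	snap, ok := reg.FindSessionByCharID(200)
	fmt.Println(snap.ChannelID, ok) // 2 true
}
```

Callers depend on the interface rather than on a shared slice of servers, which is what would allow a non-local implementation to be substituted later.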
Phase 4 — Per-channel enable/disable:
- Add Enabled *bool to EntranceChannelInfo (nil defaults to true)
- Skip disabled channels in startup loop, preserving ID stability
- Add IsEnabled() helper with backward-compatible default
- Update config.example.json with Enabled field
Wii U decompilation of all 6 callers of snj_stage_create confirms
the field distinguishes new stage creation (1) from entering an
existing stage (2): lobby/myhouse/quest pass 1, guild room and
move operations pass 2.
A malicious or buggy client could send arbitrarily large payloads
that get written directly to PostgreSQL, wasting disk and memory.
Each save handler now rejects payloads exceeding a generous upper
bound derived from the known data format sizes.
Covers all remaining items from #158: partner, hunternavi,
savemercenary, scenariodata, platedata, platebox, platemyset,
rengokudata, mezfes, savefavoritequest, house_furniture, mission.
Closes #158
Several handlers used packet fields as array indices or SQL column
names without bounds checking, allowing crafted packets to panic the
server or produce malformed SQL.
Panic fixes (high severity):
- handlers_mail: bounds check AccIndex against mailList length
- handlers_misc: validate ArmourID >= 10000 and MogType <= 4
- handlers_mercenary: check RawDataPayload length before slicing
- handlers_house: check RawDataPayload length in SaveDecoMyset
- handlers_register: guard empty RawDataPayload in OperateRegister
SQL column name fixes (medium severity):
- handlers_misc: early return on unknown PointType
- handlers_items: reject unknown StampType in weekly stamp handlers
- handlers_achievement: cap AchievementID at 32
- handlers_goocoo: skip goocoo.Index > 4
- handlers_house: cap BoxIndex for warehouse operations
- handlers_tower: fix MissionIndex=0 bypassing normalization guard
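Both classes of fix follow the same pattern: validate the client-supplied value before using it as an index or a column name. The sketch below illustrates the idea; the column names, the map contents, and the helper names are invented for the example and do not reflect the actual schema.

```go
package main

import "fmt"

// stampColumns maps a client-supplied type to a fixed, known column name.
// Keys and values are illustrative only.
var stampColumns = map[uint8]string{1: "hl", 2: "ex"}

// stampColumn returns ok=false for unknown types so the caller can return
// early instead of building SQL from unchecked input.
func stampColumn(stampType uint8) (string, bool) {
	col, ok := stampColumns[stampType]
	return col, ok
}

// pickMail bounds-checks the index before touching the slice.
func pickMail(mailList []string, accIndex uint8) (string, bool) {
	if int(accIndex) >= len(mailList) {
		return "", false
	}
	return mailList[accIndex], true
}

func main() {
	if col, ok := stampColumn(9); !ok {
		fmt.Println("rejected unknown stamp type")
	} else {
		fmt.Println("column:", col)
	}
	mail := []string{"first", "second"}
	_, ok := pickMail(mail, 5)
	fmt.Println(ok) // false: index 5 is out of range
}
```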
UserBinary type1-5 and EnhancedMinidata are transient session state
resent by the client on every login. Persisting them to the DB on
every set was unnecessary I/O. Both are now served exclusively from
server-scoped in-memory maps (userBinaryParts, minidataParts).
Includes a schema migration to drop the now-unused type2/type3
columns from user_binary and minidata column from characters.
Ref #158
- Reject BinaryType outside 1-5 in SetUserBinary to prevent
dynamic column name with unchecked client input
- Check rengoku payload length before DB write and fixed-offset
reads to prevent panic on short payloads
- Require MercData >= 4 bytes before ReadUint32 to prevent panic
Ref: Mezeporta/Erupe#158
GetStageBinary and WaitStageBinary silently dropped the ACK when
the requested stage did not exist, leaving the client waiting
indefinitely. Additionally, BinaryType1 == 4 and unknown binary
types returned a completely empty response (zero bytes), which
earlier clients cannot parse as a counted structure.
Return a 4-byte zero response (empty entry count) in all fallback
paths so the client always receives a valid ACK it can parse.
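The fallback can be pictured as below. This is a sketch under assumptions: the map-based stage lookup and the function name are hypothetical, and the real handlers send the result through the server's ACK helpers rather than returning a slice.

```go
package main

import "fmt"

// stageBinaryOrEmpty returns the stored binary if present, otherwise a
// 4-byte zero value that a client can parse as "entry count = 0".
func stageBinaryOrEmpty(stages map[string][]byte, stageID string) []byte {
	if data, ok := stages[stageID]; ok && len(data) > 0 {
		return data
	}
	return make([]byte, 4) // zero-initialized: 00 00 00 00
}

func main() {
	stages := map[string][]byte{"lobby": {0x01, 0x02}}
	fmt.Printf("% x\n", stageBinaryOrEmpty(stages, "lobby"))   // 01 02
	fmt.Printf("% x\n", stageBinaryOrEmpty(stages, "missing")) // 00 00 00 00
}
```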
Verified via Wii U decompilation of putUpdate_house: the field is set
to 0 when no password is provided, and 1 when a password string is
present. The previous comment "Always 0x01" was inaccurate.
Add package-level documentation (doc.go) to all 22 first-party
packages and godoc comments to ~150 previously undocumented
exported symbols across common/, network/, and server/.
Ghidra decompilation of hf_gp_main in the Wii U binary revealed that
these three fields are reward eligibility thresholds checked against
the player's Hunter Rank, max Skill Rank, and G Rank respectively.
The extra reward fields (Unk5, Unk6, Unk7) in the InfoFesta response
were gated at >= G1, but G1 clients do not expect these 5 extra bytes
per reward entry. This caused the entire packet after the rewards
section to be misaligned, corrupting MaximumFP, leaderboards, and
bonus rates — which broke the festa UI including trial voting.
Wii U disassembly of import_festa_info (0x02C470EC, 1068 bytes)
confirms G3-Z2 reads these fields. G1 binary analysis shows only
8 festa packets (vs 12 in ZZ), and the intermediate/personal prize
systems were not added until G5.2/G7 respectively.
Move SavePointer type/constants, CharacterSaveData struct, getPointers,
Compress, Decompress, and save data serialization methods out of
handlers_character.go into a dedicated model file.
Split three large files into focused modules:
- handlers_guild.go: extract types/ORM into guild_model.go
- handlers_cast_binary.go: extract command parser into handlers_commands.go
- handlers.go: move seibattle types/handlers into handlers_seibattle.go
Separate the two distinct systems into focused files:
- handlers_shop.go: item shops, exchange shops, frontier point trading
- handlers_gacha.go: normal/stepup/box/free gacha, coin management
- Guard nil listener/acceptConns in Server.Shutdown() to prevent panic
in test servers that don't bind a network listener
- Remove redundant userBinaryPartsLock in TestHandleMsgMhfLoaddata that
caused a deadlock with handleMsgMhfLoaddata's own lock acquisition
- Increase test save blob size from 200 to 150000 bytes to accommodate
ZZ save pointer offsets (up to 146728)
- Initialize MHFEquipment.Sigils[].Effects slices in test data to
prevent index-out-of-range panic in SerializeWarehouseEquipment
- Insert warehouse row before updating it (UPDATE on 0 rows is not an
error, so the INSERT fallback never triggered)
- Use COALESCE for nullable kouryou_point column in kill counter test
- Fix duplicate-add test expectation (CSV helper correctly deduplicates)
Anchor the token regex to ^[A-Za-z0-9]+$ so partial matches on
traversal strings like "../../etc/passwd" are rejected. Refactor
the handler to use early returns so execution stops immediately
on validation failure instead of falling through to os.Create
with tainted input.
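The difference anchoring makes can be checked directly with Go's regexp package. The unanchored pattern below is an assumption about what the previous check looked like; the anchored pattern is the one quoted above.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// unanchored matches if any alphanumeric run appears anywhere.
	unanchored = regexp.MustCompile(`[A-Za-z0-9]+`)
	// anchored requires the whole string to be alphanumeric.
	anchored = regexp.MustCompile(`^[A-Za-z0-9]+$`)
)

func main() {
	for _, tok := range []string{"abc123", "../../etc/passwd"} {
		fmt.Println(tok, unanchored.MatchString(tok), anchored.MatchString(tok))
	}
	// abc123 true true
	// ../../etc/passwd true false
}
```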
Add explicit error discards (_ =) for Close() calls on network
connections, SQL rows, and file handles across 28 files. Also add
.golangci.yml with standard linter defaults to match CI configuration.
golangci-lint v2 removed the --out-format CLI flag, causing the lint
job to fail. The golangci-lint-action v7 already uses problem matchers
to surface issues natively in GitHub Actions.
138 bare db.Exec calls across 22 handler files silently dropped write
errors. Each is now wrapped with error check and zap logging.
4 QueryRow sites that legitimately return sql.ErrNoRows during normal
operation (new player mezfes, festa rankings, empty guild item box)
now filter it out to reduce log noise.
Purge oldest guild posts beyond the limit (100 messages, 4 news) after
each new post is created. Replace misleading alliance application TODO
with a note that the feature is not yet implemented.
Fix unchecked error returns on bf.Seek(), db.Exec(), QueryRow().Scan(),
pkt.Build(), logger.Sync(), and binary.Write() calls. The linter now
passes with 0 errors, the build compiles, and all tests pass with -race.
Re-enable the golangci-lint job in CI (disabled Oct 2025), update to
Go 1.25 and golangci-lint-action v7. Fix errcheck, gosimple S1009,
staticcheck SA4031 and SA2001 errors across 54 files. Remaining ~39
lint errors will be addressed in follow-up commits.
Move time utilities (TimeAdjusted, TimeMidnight, TimeWeekStart, TimeWeekNext,
TimeGameAbsolute) from channelserver into common/gametime to break the
inappropriate dependency where signserver, entranceserver, and api imported
the 38K-line channelserver package just for time functions.
Replace all fmt.Printf debug logging in sys_session.go and handlers_object.go
with structured zap logging for consistent observability.
Add error checking and logging for ~25 database call sites that were
silently dropping errors, preventing resource leaks (unclosed rows),
nil pointer panics, and silent data corruption in festa transactions.
- Remove deprecated version field from docker-compose.yml
- Pin Postgres to 18-alpine (matches existing db-data)
- Remove undocumented web (Apache) service
- Fix config/bin volume mounts to use docker/ directory
- Gitignore docker/savedata, docker/bin, docker/config.json
- Rewrite docker/README.md: fix typos, use docker compose V2
commands, match actual compose file behavior
- Link docker/README.md from main README Docker section
Reorganize README to put Quick Start first with three install paths
(Docker/binary/source), give quest files their own section, consolidate
updating instructions, trim configuration to essentials with wiki link,
and move informational sections (features, architecture) below setup.
Absorb community tool links from the now-removed pastebin FAQ and update
dead URLs: Ferias → English Project, damage calc → fist.moe, armor set
searcher → mhfz-ass GitHub releases.
The final fallback in seasonConversion blindly constructed a filename
without checking if it existed on disk. When the file was missing,
handleMsgSysGetFile would send doAckBufFail, but the original Frontier
client does not gracefully handle this during quest loading — causing a
softlock instead of showing the built-in error dialog.
Now every fallback path validates file existence before returning, and
also tries the opposite time-of-day variant as a last resort. If no
file variant exists at all, the original filename is returned with a
warning log so the failure ack is still sent.
Add a multi-stage Dockerfile for a smaller runtime image, trigger the CD
workflow on main branch pushes and version tags, and default docker-compose
to the prebuilt GHCR image.
Import 18 network packet test files and 5 server infrastructure test
files, adapted for main branch APIs: fix config import alias (_config),
remove non-existent DevMode field, use global handlerTable instead of
per-server handlers map, and correct validateToken mock expectations
to include both token and tokenID arguments.
Adds go-sqlmock dependency for database mocking in signserver tests.
pg_restore would fail because the dump contains CREATE DATABASE but
POSTGRES_DB already creates it. With set -e this aborted the script
before update/patch schemas could run.
- Allow pg_restore to continue past non-fatal errors
- Add --no-owner --no-acl to avoid permission mismatches
- Force LF line endings for .sh files via .gitattributes
- Quote file path variables in schema loops