ZTranslate translates games based on screen grabs from the user's machine, in either package mode or automatic mode. In automatic mode, when the user presses the tilde key (~), the client grabs a screenshot of the window currently in focus, sends the image to the ZTranslate server, and then waits to receive the translated image back to display in the ZTranslate window. The server does this by running a generalized OCR algorithm over the image and then using machine translation to translate the OCRed text into the target language (both via Google APIs).
These two steps can be slow and inaccurate, so the image sent to the server is saved there, where the user can modify and improve the translation and eventually release a packaged translation. In package mode, the user loads a pre-made package, and the client continuously grabs the window in focus and translates the image based on the package. This mode runs much faster than automatic mode (allowing continuous grabbing), doesn't require calls to the server, and uses a curated human translation instead of a machine one. Packages can be made public for other users to download from the downloads page on this site, or users can distribute them themselves however they like.
Creating a package for a game starts with the sourcing step, where you grab images from the game while in automatic mode in the ZTranslate client. After that, you create a game page for the game, move the images you sourced from the (uncategorized) game to the new game page, create the index definitions, and fix the translations. When you're done, set the game to public on the edit game page, or download the package to distribute it yourself.
For package mode to work, the game needs defined "indexes" to be able to match up the text on the screen to the translated text in the package. This matching uses optical character recognition (OCR), but anyone familiar with OCR will tell you that OCR is a finicky beast. For it to work, you have to reduce the image down to clear white text against a clear black background, and even then it will still occasionally misread characters.
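As a rough illustration of that clean-up step, here is a minimal, hypothetical thresholding pass in Python, representing an image as a 2D list of RGB tuples. This is illustrative only; ZTranslate's real pipelines are far more configurable.

```python
def binarize(pixels, threshold=200):
    """Reduce an image to white-on-black for OCR.

    pixels: 2D list of (r, g, b) tuples. Pixels whose average channel
    value is at or above `threshold` become pure white; the rest black.
    """
    white, black = (255, 255, 255), (0, 0, 0)
    return [
        [white if sum(p) / 3 >= threshold else black for p in row]
        for p_row in [None] for row in pixels
    ]

row = [(250, 250, 250), (30, 30, 30), (220, 210, 200)]
print(binarize([row])[0])  # → [(255, 255, 255), (0, 0, 0), (255, 255, 255)]
```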
Writing a generalized algorithm to do this clean-up is difficult; writing one that also runs fast enough for real-time translation on a client machine is nearly impossible. This is why ZTranslate uses a pipeline system that the translating user can define, making the OCR (or plain image-matching) algorithms fast enough. An index consists of two pipelines: the indexer pipeline and the ocr pipeline. The indexer pipeline runs on a translated image and creates indexes that can later be used to select the correct translation. The ocr pipeline runs on an untranslated image to match it up to an existing translation.
A full API specification of how to define these pipelines, and specific image options can be found in the API subsection.
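To make the pipeline idea concrete, here is a hypothetical sketch of how such a definition might be executed: a pipeline is just an ordered list of actions, each with options, applied to the image one after another. The toy action names and string "image" below are stand-ins, not ZTranslate's real action registry.

```python
def run_pipeline(image, steps, actions):
    """Apply each step's named action (with its options) to the image in order."""
    for step in steps:
        handler = actions[step["action"]]
        image = handler(image, step.get("options", {}))
    return image

# Toy actions operating on an "image" represented as a plain string:
actions = {
    "grayscale": lambda img, opts: img + ":gray",
    "sharpen":   lambda img, opts: img + ":sharp%g" % opts.get("sharpness", 1.0),
}
steps = [
    {"action": "grayscale"},
    {"action": "sharpen", "options": {"sharpness": 4.0}},
]
print(run_pipeline("frame", steps, actions))  # → frame:gray:sharp4
```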
This is an example "game_indexes" definition from "Die Höhlenwelt Saga: Der Leuchtende Kristall." It defines two indexes: the "Diff Block" index and the "Deu OCR" index. The "Diff Block" index is a general-purpose index that matches text based on the exact pixels of the bounding box of the text on the screen. When looking for a match, it checks each translated textbox against the corresponding space on the screen and adds its text if the two cropped rectangles are similar enough. Because there could be many (e.g., thousands of) boxes to check, the algorithm includes an HSV (hue, saturation, value) index on the overall color of the screen to cut down the number of textboxes it has to check. It is a good general-purpose algorithm, especially for fixed-position textboxes, but it will fail if the background is too variable, the text can change position, or the text itself can change. "Die Höhlenwelt Saga: Der Leuchtende Kristall" uses this algorithm for most colored text and for the intro cutscene.
The second index is the "Deu OCR" index, the German OCR algorithm. It is designed to pick up white text with black outlines that has no fixed position or formatting on the screen. A more in-depth look at the individual commands can be found in the API section.
{
  "default_index": "deu_ocr",
  "indexes": [
    {
      "displayName": "Screen Diff Block",
      "indexer": [
        {
          "action": "indexHSV",
          "options": {"tolerance": 0.05}
        },
        {
          "action": "sharpen",
          "options": {"sharpness": 4.0}
        },
        {
          "action": "crop",
          "options": {
            "color": "$block.colors",
            "x1": "$block.bounding_box.x1",
            "x2": "$block.bounding_box.x2",
            "y1": "$block.bounding_box.y1",
            "y2": "$block.bounding_box.y2"
          }
        },
        {
          "action": "createIndex",
          "options": {}
        }
      ],
      "name": "diff_block",
      "ocr": [
        {
          "action": "indexHSV",
          "options": {"tolerance": 0.05}
        },
        {
          "action": "sharpen",
          "options": {"sharpness": 4.0}
        },
        {"action": "findShortlistByIndex"},
        {"action": "expandShortlist"},
        {
          "action": "crop",
          "options": {
            "color": "$block.colors",
            "x1": "$block.bounding_box.x1",
            "x2": "$block.bounding_box.x2",
            "y1": "$block.bounding_box.y1",
            "y2": "$block.bounding_box.y2"
          }
        },
        {
          "action": "diffImage",
          "options": {"image": "$block.index_image"}
        }
      ]
    },
    {
      "displayName": "German OCR",
      "indexer": [
        {
          "action": "reduceToMultiColor",
          "options": {
            "base": "FF0000",
            "colors": [
              ["FFFFFF", "FFFFFF"],
              ["FFDF00", "FFDF00"],
              ["EFA20C", "EFA20C"],
              ["000000", "000000"]
            ],
            "threshold": 32
          }
        },
        {
          "action": "segFill",
          "options": {
            "base": "FF0000",
            "colors": ["FFFFFF", "FFDF00", "EFA20C"]
          }
        },
        {
          "action": "reduceToColors",
          "options": {
            "colors": ["FFFFFF", "FFDF00", "EFA20C"],
            "threshold": 32
          }
        },
        {
          "action": "indexOCR",
          "options": {
            "common_errors": ["\u00fc=u", "\u00f6=o", "\u00e4=a", "e=c", "o=0", "D=)", "E=F", "t=l", "vv=W"],
            "lang": "deu",
            "miss": [1, 6, 20, 40],
            "mode": 6,
            "subs": {
              "sub_chars": ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"],
              "sub_placeholder": ["{{", "}}!"]
            },
            "text_colors": ["FFFFFF", "FFDF00", "EFA20C"]
          }
        },
        {"action": "createIndex"}
      ],
      "name": "deu_ocr",
      "ocr": [
        {
          "action": "reduceToMultiColor",
          "options": {
            "base": "FF0000",
            "colors": [
              ["FFFFFF", "FFFFFF"],
              ["FFDF00", "FFDF00"],
              ["EFA20C", "EFA20C"],
              ["000000", "000000"]
            ],
            "threshold": 32
          }
        },
        {
          "action": "segFill",
          "options": {
            "base": "FF0000",
            "colors": ["FFFFFF", "FFDF00", "EFA20C"]
          }
        },
        {
          "action": "reduceToColors",
          "options": {
            "colors": ["FFFFFF", "FFDF00", "EFA20C"],
            "threshold": 32
          }
        },
        {
          "action": "indexOCR",
          "options": {
            "common_errors": ["\u00fc=u", "\u00f6=o", "\u00e4=a", "e=c", "o=0", "D=)", "E=F", "t=l", "vv=W"],
            "lang": "deu",
            "miss": [1, 6, 10, 15, 20, 25],
            "mode": 6,
            "subs": {
              "sub_chars": ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"],
              "sub_placeholder": ["{{", "}}!"]
            },
            "text_colors": ["FFFFFF", "FFDF00", "EFA20C"]
          }
        },
        {"action": "findShortlistByIndex"},
        {"action": "expandShortlist"},
        {"action": "diffTextLines"}
      ]
    }
  ]
}
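As an illustration of the matching the "Diff Block" ocr pipeline performs, here is a simplified Python sketch (not ZTranslate's actual code): crop the block's bounding box out of the live frame and compare it pixel-by-pixel with the image stored when the index was created. The `block` dictionary shape below mirrors the `$block.bounding_box` and `$block.index_image` references in the definition above, but is an assumption for illustration.

```python
def crop(pixels, x1, y1, x2, y2):
    """Cut the rectangle (x1, y1)-(x2, y2) out of a 2D list of pixels."""
    return [row[x1:x2] for row in pixels[y1:y2]]

def similarity(a, b):
    """Fraction of exactly matching pixels between two equal-sized crops."""
    flat_a = [p for row in a for p in row]
    flat_b = [p for row in b for p in row]
    same = sum(1 for pa, pb in zip(flat_a, flat_b) if pa == pb)
    return same / len(flat_a)

def match_block(frame, block, threshold=0.95):
    """True if the live frame's crop is close enough to the indexed image."""
    bb = block["bounding_box"]
    live = crop(frame, bb["x1"], bb["y1"], bb["x2"], bb["y2"])
    return similarity(live, block["index_image"]) >= threshold

frame = [[(1, 1, 1), (2, 2, 2)], [(3, 3, 3), (4, 4, 4)]]
block = {
    "bounding_box": {"x1": 0, "y1": 0, "x2": 2, "y2": 1},
    "index_image": [[(1, 1, 1), (2, 2, 2)]],
}
print(match_block(frame, block))  # → True
```

In the real pipeline, the indexHSV step would first shortlist which blocks are worth running this comparison against.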
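The reduceToColors step used by the "Deu OCR" index can be approximated as follows. This is an assumed reading of its semantics (snap pixels within a per-channel threshold of a listed color to that color, black out everything else, leaving clean text for OCR); the real implementation may differ.

```python
def hex_to_rgb(h):
    """'FFDF00' -> (255, 223, 0)"""
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))

def reduce_to_colors(pixels, colors, threshold=32):
    """Snap pixels near one of the listed hex colors to that color; rest black."""
    keep = [hex_to_rgb(c) for c in colors]
    black = (0, 0, 0)

    def near(p, c):
        return all(abs(pc - cc) <= threshold for pc, cc in zip(p, c))

    return [
        [next((c for c in keep if near(p, c)), black) for p in row]
        for row in pixels
    ]

row = [(255, 223, 0), (120, 80, 60)]           # gold text pixel, background pixel
print(reduce_to_colors([row], ["FFDF00"])[0])  # → [(255, 223, 0), (0, 0, 0)]
```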
The translation server mode allows other applications to use the ZTranslate client without being on the same machine, or in cases where ZTranslate can't grab the window's screen or its key inputs. While the ZTranslate client is running on your PC in this mode, it listens on a TCP port and receives HTTP requests for image translations. This way, an emulator running on a PSP can send the current screen to the ZTranslate client, get the results back, and show them to the player, all without having to do the CPU- and memory-intensive operations on the device itself.
Here is an example config.json setup with translation server mode enabled:
{
  "default_target": "En",
  "server_host": "ztranslate.net",
  "server_port": 443,
  "user_api_key": "",
  "local_server_enabled": true,
  "local_server_api_key_type": "ztranslate",
  "local_server_host": "localhost",
  "local_server_port": 4404,
  "local_server_ocr_key": "",
  "local_server_translation_key": ""
}
The server runs on the client when "local_server_enabled" is true, listening on the "local_server_host" and "local_server_port" specified (usually "localhost" and 4404). The "local_server_api_key_type" specifies which API the client will use: either "ztranslate" or "google". The "ztranslate" option uses the server_host/server_port (the non-local server) and the "user_api_key" to translate the images, while also saving them to the server. The "google" option uses the "local_server_ocr_key" and "local_server_translation_key" to translate the image without contacting the ztranslate server at all, so long as the keys provided are for the Google Cloud OCR and Google Cloud Translation APIs respectively.
The target language, source language, and translation mode can be changed from the UI of the client running on the PC, and you can scroll through previously translated images using the client. Likewise, if a package is loaded in the client, it will translate the incoming image using the package without calling any external API, though automatic capture mode is not supported (unless the application calling the local server implements it).
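As a sketch of how another application might call the local server, the following builds an HTTP request against the host and port from the config above. The request body shape ({"image": <base64 data>}) and the root endpoint path are assumptions for illustration, not the documented API; check the API section for the exact request format.

```python
import base64
import json

def build_payload(image_bytes):
    """JSON body with the screenshot as base64 (assumed request shape)."""
    return json.dumps({"image": base64.b64encode(image_bytes).decode("ascii")})

def translate(image_bytes, host="localhost", port=4404):
    """POST a screenshot to the local translation server and parse the reply.

    Requires the ZTranslate client to be running with
    "local_server_enabled": true; the root path "/" is an assumption.
    """
    import urllib.request
    req = urllib.request.Request(
        "http://%s:%d/" % (host, port),
        data=build_payload(image_bytes).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Usage (uncomment with the client running):
# result = translate(open("frame.png", "rb").read())
```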