Help:Extension:Translate/Translation memories/zh
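The translation memory of the Translate extension supports ElasticSearch. This page guides you through installing ElasticSearch and explores its specifications in more detail.
Unlike other translation aids, such as external machine translation services, the translation memory is constantly updated with new translations from your wiki. If you choose ElasticSearch, advanced search across translations is also available at Special:SearchTranslations.
Comparison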
The database backend is used by default: it has no dependencies and doesn't need configuration. The database backend can't be shared among multiple wikis and it does not scale to large amounts of translated content. Hence we also support ElasticSearch as a backend. It is also possible to use another wiki's translation memory if their web API is open. Unlike ElasticSearch, remote backends are not updated with translations from the current wiki.
Feature | Database | Remote API | ElasticSearch |
---|---|---|---|
Enabled by default | Yes | No | No |
Can have multiple sources | No | Yes | Yes |
Updated with local translations | Yes | No | Yes |
Accesses the database directly | Yes | No | No |
Access to sources | Editor | Link | Editor if local, otherwise link |
Can be shared as an API service | Yes | Yes | Yes |
Performance | Does not scale well | Unknown | Reasonable |
Requirements
ElasticSearch backend
ElasticSearch is relatively easy to set up. If it is not available in your distribution packages, you can get it from their website. You will also need to get the Elastica extension. Finally, please see puppet/modules/elasticsearch/files/elasticsearch.yml for specific configuration needed by Translate.
The bootstrap script will create necessary schemas. If you are using ElasticSearch backend with multiple wikis, they will share the translation memory by default, unless you set the index parameter in the configuration.
When upgrading to the next major version of ElasticSearch (e.g. upgrading from 2.x to 5.x), it is highly recommended to read the release notes and the documentation regarding the upgrade process.
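Installation
Once the requirements are met, installation consists of adjusting the configuration and then running the bootstrap.
Configuration
All translation aids, including translation memories, are configured with the $wgTranslateTranslationServices configuration variable.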
The primary translation memory backend must use the key TTMServer. The primary backend receives translation updates and is used by Special:SearchTranslations.
Example configurations of TTMServers:
Default configuration |
---|
$wgTranslateTranslationServices['TTMServer'] = array(
'database' => false, // Passed to wfGetDB
'cutoff' => 0.75,
'type' => 'ttmserver',
'public' => false,
);
|
Remote API configuration |
$wgTranslateTranslationServices['example'] = array(
'url' => 'http://example.com/w/api.php',
'displayname' => 'example.com',
'cutoff' => 0.75,
'timeout' => 3,
'type' => 'ttmserver',
'class' => 'RemoteTTMServer',
);
|
ElasticSearch backend configuration |
In this case the single back-end service will be used both for reads & writes.
$wgTranslateTranslationServices['TTMServer'] = array(
'type' => 'ttmserver',
'class' => 'ElasticSearchTTMServer',
'cutoff' => 0.75,
/*
* See http://elastica.io/getting-started/installation.html
* See https://github.com/ruflin/Elastica/blob/8.x/src/Client.php
'config' => This will be passed to \Elastica\Client
*/
);
|
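ElasticSearch multiple backends configuration (supported by MLEB 2017.04) |
// Defines the default service used for read operations
// Allows switching quickly to another backend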
// 'mirrors' configuration option is no longer supported since MLEB 2023.10
$wgTranslateTranslationDefaultService = 'cluster1';
$wgTranslateTranslationServices['cluster1'] = array(
'type' => 'ttmserver',
'class' => 'ElasticSearchTTMServer',
'cutoff' => 0.75,
/*
* Defines the list of services to replicate writes to.
* Only "writable" services are allowed here.
*/
'mirrors' => [ 'cluster2' ],
// Elastica expects 'servers' to be a list of per-server arrays
'config' => [ 'servers' => [ [ 'host' => 'elastic1001.cluster1.mynet' ] ] ]
);
$wgTranslateTranslationServices['cluster2'] = array(
'type' => 'ttmserver',
'class' => 'ElasticSearchTTMServer',
'cutoff' => 0.75,
/*
* If "cluster2" is defined as the default service, it will start replicating writes to "cluster1".
*/
'mirrors' => [ 'cluster1' ],
// Elastica expects 'servers' to be a list of per-server arrays
'config' => [ 'servers' => [ [ 'host' => 'elastic2001.cluster2.mynet' ] ] ]
);
|
ElasticSearch multiple services with single readable service using writable configuration (supported by MLEB 2023.04)
|
With writable configuration the following rules are enforced:
- If a service is marked as writable, the mirrors configuration will not be allowed.
// Three services configured with one being readable and the others being writable.
$wgTranslateTranslationServices['dc0'] = [
'type' => 'ttmserver',
'class' => 'ElasticSearchTTMServer',
'cutoff' => 0.75,
// Default service cannot be marked as write-only
];
$wgTranslateTranslationServices['dc1'] = [
'type' => 'ttmserver',
'class' => 'ElasticSearchTTMServer',
'cutoff' => 0.75,
// Marks this service as write-only
'writable' => true,
];
$wgTranslateTranslationServices['dc2'] = [
'type' => 'ttmserver',
'class' => 'ElasticSearchTTMServer',
'cutoff' => 0.75,
'writable' => true
];
$wgTranslateTranslationDefaultService = 'dc0';
|
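Possible keys and values are:
Key | Applies to | Description |
---|---|---|
config | ElasticSearch | Configuration passed to Elastica. |
cutoff | All | Minimum threshold for matching suggestions. Only a few of the best suggestions are displayed, even if more of them exceed the threshold. |
database | Local | If you want to store the translation memory in a different location, you can specify the database name here. You must also configure MediaWiki's load balancer so that it knows how to connect to that database. |
displayname | Remote | The text shown in the tooltip when hovering over the suggestion source link (the bullets). |
index | ElasticSearch | The index to use in ElasticSearch. Default: ttmserver. |
public | All | Whether this TTMServer can be queried through this wiki's api.php. |
replicas | ElasticSearch | If you are running a cluster, you can increase the number of replicas. Default: 0. |
shards | ElasticSearch | How many shards to use. Default: 5. |
timeout | Remote | How many seconds to wait for a response from the remote service. |
type | All | Type of the TTMServer in terms of the result format. |
url | Remote | URL to the api.php of the remote TTMServer. |
use_wikimedia_extra | ElasticSearch | Boolean. When the extra plugin is deployed you can disable dynamic scripting on Elastic v1.x. This plugin is now mandatory for Elastic 2.x clusters. |
mirrors (DEPRECATED since MLEB 2023.04) | Writable services | Array of strings defining the list of services to replicate writes to, which allows several TTM services to be kept up to date. Useful for fast switchovers or for reducing downtime during planned maintenance operations (added in MLEB 2017.04). Cannot be used along with the newly added writable configuration. |
writable (added in MLEB 2023.04) | Write-only services | Boolean value, defined for a service if that service is write-only. The default service ($wgTranslateTranslationDefaultService) cannot be marked as write-only. If none of the configured translation memory services are marked as writable, then all services are considered to be readable and writable. See task T322284. |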
Use the key TTMServer as the array index into $wgTranslateTranslationServices. Remote TTMServers cannot be used for this, because they cannot be updated. As of MLEB 2017.04 the key TTMServer can be configured with the configuration variable $wgTranslateTranslationDefaultService. Support for the Solr backend was dropped in MLEB 2019.10, in October 2019. Currently only MySQL is supported for the database backend.
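The rules stated for the mirrors and writable keys can be checked mechanically before deploying a configuration. The following Python sketch is not part of Translate: `classify_services` is a hypothetical helper that takes the services as a plain dictionary and encodes only the rules stated above. Treating every service that is not marked write-only as readable, and rejecting mirrors on any service once writable is in use, are assumptions about how those rules combine.

```python
def classify_services(services, default):
    """Split configured services into readable and write-only names.

    Hypothetical helper: it encodes the documented rules only, not the
    validation that Translate itself performs.
    """
    if default not in services:
        raise ValueError(f"default service {default!r} is not configured")

    write_only = {name for name, cfg in services.items() if cfg.get("writable")}

    # Rule: the default service cannot be marked as write-only.
    if default in write_only:
        raise ValueError("the default service cannot be write-only")

    # Rule: 'mirrors' cannot be used along with the 'writable' configuration.
    # Assumption: this applies to every service once any is marked writable.
    if write_only:
        for name, cfg in services.items():
            if cfg.get("mirrors"):
                raise ValueError(f"{name!r} uses 'mirrors' together with 'writable'")

    # Assumption: services not marked write-only remain readable. This also
    # covers the documented case where no service is marked at all.
    readable = set(services) - write_only
    return readable, write_only


# Same layout as the dc0/dc1/dc2 example configuration.
services = {
    "dc0": {"type": "ttmserver", "class": "ElasticSearchTTMServer"},
    "dc1": {"type": "ttmserver", "class": "ElasticSearchTTMServer", "writable": True},
    "dc2": {"type": "ttmserver", "class": "ElasticSearchTTMServer", "writable": True},
}
readable, write_only = classify_services(services, "dc0")
print(sorted(readable), sorted(write_only))
```

Calling the helper with "dc1" as the default raises an error, because a write-only service cannot be the default.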
Bootstrap
Once you have chosen ElasticSearch and set up the requirements and configuration, run ttmserver-export.php to bootstrap the translation memory.
Bootstrapping is also required when changing translation memory backend. If you are using a shared translation memory backend for multiple wikis, you'll need to bootstrap each of them separately.
Sites with lots of translations should consider using multiple threads with the --thread parameter to speed up the process. The time depends heavily on how complete the message group completion stats are (incomplete ones will be calculated during the bootstrap). New translations are automatically added by a hook. New sources (message definitions) are added when the first translation is created.
Bootstrap does the following things, which don't happen otherwise:
- adding and updating the translation memory schema;
- populating the translation memory with existing translations;
- cleaning up unused translation entries by emptying and re-populating the translation memory.
When the translation of a message is updated, the previous translation is removed from the translation memory. However, when translations are updated against a new definition, a new entry is added but the old definition and its old translations remain in the database until purged. When a message changes definition or is removed from all message groups, nothing happens immediately. Saving a translation as fuzzy does not add a new translation nor delete an old one in the translation memory.
TTMServer API
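If you want to implement your own TTMServer service, here are the detailed specifications.
Query parameters:
Your service must accept the following parameters:
Key | Value |
---|---|
format | json |
action | ttmserver |
service | Optional service identifier if there are multiple shared translation memories. If not provided, the default service is used. |
sourcelanguage | Language code as used in MediaWiki, see IETF language tags and ISO 639. |
targetlanguage | Language code as used in MediaWiki, see IETF language tags and ISO 639. |
text | Source text in the source language. |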
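As an illustration of the query parameters, the following Python sketch assembles a request URL with the standard library only. `build_ttmserver_query` is a hypothetical helper name, the endpoint is the example.com URL from the remote API configuration, and the source-text parameter is assumed to be named `text`. No request is actually sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_ttmserver_query(api_url, text, source_language, target_language, service=None):
    """Return a TTMServer query URL for the given api.php endpoint."""
    params = {
        "format": "json",
        "action": "ttmserver",
        "sourcelanguage": source_language,
        "targetlanguage": target_language,
        "text": text,  # assumed parameter name for the source text
    }
    # 'service' is optional; when omitted the default service is used.
    if service is not None:
        params["service"] = service
    return f"{api_url}?{urlencode(params)}"


url = build_ttmserver_query("http://example.com/w/api.php", "January", "en", "fi")
# Parse the URL back to inspect what would be sent.
print(parse_qs(urlsplit(url).query))
```

`urlencode` takes care of percent-encoding, so source texts containing spaces or non-ASCII characters can be passed unchanged.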
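Your service must return a JSON object with the key ttmserver containing an array of objects. Those objects must contain the following data:
Key | Value |
---|---|
source | The original source text. |
target | The translation suggestion. |
context | Local identifier for the source, optional. |
location | URL to the page where the suggestion can be seen. |
quality | Decimal number in the range [0..1] describing the quality of the suggestion. 1 means a perfect match. |
Example: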
{
"ttmserver": [
{
"source": "January",
"target": "tammikuu",
"context": "Wikimedia:Messages\\x5b'January'\\x5d\/en",
"location": "https:\/\/translatewiki.net\/wiki\/Wikimedia:Messages%5Cx5b%27January%27%5Cx5d\/fi",
"quality": 0.85714285714286
},
{
"source": "January",
"target": "tammikuu",
"context": "Mantis:S month january\/en",
"location": "https:\/\/translatewiki.net\/wiki\/Mantis:S_month_january\/fi",
"quality": 0.85714285714286
},
{
"source": "January",
"target": "Tammikuu",
"context": "FUDforum:Month 1\/en",
"location": "https:\/\/translatewiki.net\/wiki\/FUDforum:Month_1\/fi",
"quality": 0.85714285714286
},
{
"source": "January",
"target": "tammikuun",
"context": "MediaWiki:January-gen\/en",
"location": "https:\/\/translatewiki.net\/wiki\/MediaWiki:January-gen\/fi",
"quality": 0.85714285714286
},
{
"source": "January",
"target": "tammikuu",
"context": "MediaWiki:January\/en",
"location": "https:\/\/translatewiki.net\/wiki\/MediaWiki:January\/fi",
"quality": 0.85714285714286
}
]
}
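A client consuming such a response typically filters by quality and collapses identical targets. The Python sketch below shows one possible way to do that. `best_suggestions` is a hypothetical helper, the deduplication and the `limit` of three are client-side choices rather than part of the API, and the sample response is a shortened copy of the example above that keeps only the source, target and quality fields.

```python
import json


def best_suggestions(response_text, cutoff=0.75, limit=3):
    """Return up to `limit` distinct suggestions with quality >= cutoff."""
    data = json.loads(response_text)
    suggestions = [s for s in data.get("ttmserver", []) if s["quality"] >= cutoff]
    # The sort is stable, so equal-quality entries keep the server's order.
    suggestions.sort(key=lambda s: s["quality"], reverse=True)

    seen, unique = set(), []
    for suggestion in suggestions:
        if suggestion["target"] not in seen:
            seen.add(suggestion["target"])
            unique.append(suggestion)
    return unique[:limit]


response = """{"ttmserver": [
  {"source": "January", "target": "tammikuu", "quality": 0.85714285714286},
  {"source": "January", "target": "tammikuu", "quality": 0.85714285714286},
  {"source": "January", "target": "Tammikuu", "quality": 0.85714285714286},
  {"source": "January", "target": "tammikuun", "quality": 0.85714285714286},
  {"source": "January", "target": "tammikuu", "quality": 0.85714285714286}
]}"""
targets = [s["target"] for s in best_suggestions(response)]
print(targets)
```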
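Database backend
The backend contains three tables: translate_tms, translate_tmt and translate_tmf. They correspond to sources, targets and fulltext respectively. You can find the table definitions in sql/translate_tm.sql. The sources table contains all the message group definitions. Even though these are usually always in the same language (for example English), the language of the text is also stored, for the rare cases where this is not true.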
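Each entry has a unique ID and two extra fields: length and context. Length is used as the first-pass filter when querying, so that the text being searched for does not have to be compared against every entry in the database. The context stores the title of the page the text comes from, for example "MediaWiki:Jan/en". From this information we can link the suggestion to "MediaWiki:Jan/de", which helps translators to quickly fix problems or to decide which translation to use.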
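The idea behind the length filter can be sketched in a few lines of Python. The exact bounds used by the backend are not given here, so the bounds below are an assumption: they are derived from the cutoff, on the reasoning that a candidate whose length differs too much from the query cannot score above the cutoff. `length_bounds` and `filter_by_length` are hypothetical names, the sample entries are made up for illustration, and lengths are counted in characters.

```python
import math


def length_bounds(query_length, cutoff):
    """Assumed bounds: candidates much shorter or longer cannot pass the cutoff."""
    lower = math.ceil(query_length * cutoff)
    upper = math.floor(query_length / cutoff)
    return lower, upper


def filter_by_length(entries, query, cutoff=0.75):
    """Keep only entries whose stored length falls within the bounds."""
    lower, upper = length_bounds(len(query), cutoff)
    return [entry for entry in entries if lower <= entry["length"] <= upper]


# Made-up entries: only the stored length is inspected at this stage.
entries = [
    {"context": "MediaWiki:Jan/en", "length": 7},
    {"context": "Example:Long message/en", "length": 40},
]
candidates = filter_by_length(entries, "January")
print([entry["context"] for entry in candidates])
```

Because only an integer column is compared, this step can discard most entries before any string comparison takes place.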
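The second-pass filter comes from the fulltext index. Its definition is mingled with an ad hoc algorithm. First the text is split into segments (words) with MediaWiki's Language::segmentByWord. If there are enough segments, we normalize them by stripping essentially everything that is not a word letter. Then we take the first ten unique words that are at least five bytes long (five letters in English, fewer letters for languages with multibyte characters). Those words are then stored in the fulltext index, to be used for further filtering of longer strings.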
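The word selection described above can be approximated as follows. This is a rough Python sketch, not the extension's code: splitting on whitespace stands in for Language::segmentByWord, lowercasing is an assumed part of the normalization, and the "enough segments" check is left out because its threshold is not stated. `index_words` is a hypothetical name.

```python
import re


def index_words(text, max_words=10, min_bytes=5):
    """Pick the first unique words that are long enough to be indexed."""
    words = []
    # Stand-in for Language::segmentByWord: split on whitespace.
    for segment in text.split():
        # Strip everything that is not a letter, then lowercase (assumption).
        word = re.sub(r"[\W\d_]+", "", segment).lower()
        # The minimum length is measured in bytes, not characters.
        if len(word.encode("utf-8")) < min_bytes:
            continue
        if word not in words:
            words.append(word)
        if len(words) == max_words:
            break
    return words


print(index_words("The quick brown foxes jumped over the lazy brown dogs!"))
```

Measuring the minimum length in UTF-8 bytes is what lets a three-letter word made of two-byte characters qualify, while a four-letter ASCII word does not.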
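Once the list of candidates has been filtered, the matching targets are fetched from the targets table. Then the Levenshtein edit distance algorithm is applied for the final filtering and ranking. Let us define:
- E: the edit distance
- S: the text we are searching suggestions for
- Tc: the suggested text
- To: the original text that Tc is a translation of
The quality of the suggestion Tc is calculated as E/min(length(Tc),length(To)). We use PHP's built-in levenshtein function, but when either string is longer than 255 bytes, a PHP implementation of the levenshtein algorithm is used instead.[1] It has not been tested whether the native implementation of levenshtein handles multibyte characters correctly. This might be another weak point when the source language is not English (the others being the fulltext search and segmentation).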
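To make the ranking step concrete, the Python sketch below computes a Levenshtein distance with the standard dynamic-programming recurrence and then the ratio given above. It is an illustration under assumptions: E is taken to be the distance between S and To, lengths are counted in characters rather than bytes, and `distance_ratio` is a hypothetical name. The ratio is 0 for an exact match, whereas the API's quality field uses 1 for the best match. How one is mapped to the other is not stated here, so the sketch stops at the ratio.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def distance_ratio(search_text, original, suggestion):
    """E / min(length(Tc), length(To)), with E assumed to be d(S, To)."""
    shortest = min(len(suggestion), len(original))
    if shortest == 0:
        raise ValueError("empty strings cannot be scored")
    return levenshtein(search_text, original) / shortest


print(levenshtein("kitten", "sitting"))
print(distance_ratio("January", "January", "tammikuu"))
```

Working on Python strings means the distance is counted per code point, which sidesteps the multibyte concern raised above for the byte-oriented native function.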