Find Source Text

Find and extract source text that must be translated.

Usage

find_source(
  path = ".",
  encoding = "UTF-8",
  verbose = getOption("transltr.verbose", TRUE),
  tr = translator(),
  interface = NULL
)

find_source_in_files(
  paths = character(),
  encoding = "UTF-8",
  verbose = getOption("transltr.verbose", TRUE),
  algorithm = algorithms(),
  interface = NULL
)

Arguments

path: A non-empty and non-NA character string. A path to a directory containing R source scripts. All subdirectories are searched. Files that do not have a .R or .Rprofile extension are skipped.
encoding: A non-empty and non-NA character string. The source character encoding. In almost all cases, this should be UTF-8. Other encodings are internally re-encoded to UTF-8 for portability.
verbose: A non-NA logical value. Should progress information be reported?
tr: A Translator object.
interface: A name, a call object, or NULL. A reference to an alternative (custom) function used to translate text. If a call object is passed to interface, it must be a call to operator ::. When interface is specified, calls to method Translator$translate() are ignored and calls to interface are extracted instead. See Details below.
paths: A character vector of non-empty and non-NA values. A set of paths to R source scripts that must be searched.
algorithm: A non-empty and non-NA character string equal to "sha1" or "utf8". The algorithm used to hash source information for identification purposes.
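As a rough illustration of which files the path argument covers, the following base R sketch lists .R and .Rprofile files in a directory and all of its subdirectories. list_r_scripts() is a hypothetical helper, not part of transltr, and it assumes extensions are matched case-sensitively; the package's own file discovery may differ.

```r
# Hypothetical helper (not part of transltr): list .R and .Rprofile files
# in `path` and all of its subdirectories. Matching is case-sensitive here.
list_r_scripts <- function(path = ".") {
  list.files(
    path,
    pattern    = "\\.(R|Rprofile)$",
    all.files  = TRUE,  # needed to include dot files such as .Rprofile
    recursive  = TRUE,  # search all subdirectories
    full.names = TRUE
  )
}

# Usage: build a small directory tree and list the scripts it contains.
demo_dir <- file.path(tempdir(), "list-r-scripts")
dir.create(file.path(demo_dir, "sub"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("a.R", "b.txt", ".Rprofile", "sub/c.R"))))
basename(list_r_scripts(demo_dir))
```

Note that all.files = TRUE is what makes a file named .Rprofile visible; without it, list.files() skips dot files.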

Value

find_source() returns an R6 object of class Translator. If an existing Translator object is passed to tr, it is modified in place and returned.

find_source_in_files() returns a list of Text objects. It may contain duplicated elements, depending on the extracted contents.

Details

find_source() and find_source_in_files() look for calls to method Translator$translate() in R scripts and convert them to Text objects. The former additionally registers the resulting objects in a Translator object. See argument tr.

find_source() and find_source_in_files() work on a purely lexical basis. The source code is parsed but never evaluated (aside from extracted literal character vectors).

The underlying Translator object is never evaluated and does not need to exist (placeholders may be used in the source code).
Only literal character vectors can be passed to arguments of method Translator$translate().
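The following base R snippet illustrates what a purely lexical approach means: code referring to a Translator object named tr can be parsed and inspected even though tr is not defined in the session. It relies only on parse() and standard indexing of language objects; it is a sketch, not how transltr itself is implemented.

```r
# `tr` is deliberately NOT defined: parsing never looks it up.
code <- "tr$translate('Hello, world!')"
expr <- parse(text = code, keep.source = FALSE)[[1L]]

is.call(expr)        # TRUE: a call object, not the result of a call
deparse(expr[[1L]])  # "tr$translate": the function being called
expr[[2L]]           # "Hello, world!": a literal character vector
```

Evaluating expr would require tr to exist; inspecting it does not.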

Interfaces

In some cases, it may not be desirable to call method Translator$translate() directly. A custom function wrapping (interfacing) this method may always be used as long as it has the same signature as method Translator$translate(). In other words, it must minimally have two formal arguments: ... and source_lang.
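A minimal sketch of a custom interface follows. The tr object here is a hypothetical stub standing in for a real Translator (its translate() element simply pastes its inputs together), and the default value "en" for source_lang is an assumption made for illustration only.

```r
# Hypothetical stub standing in for a Translator object (illustration only).
# Its translate() element just pastes its inputs; it does not translate.
tr <- list(
  translate = function(..., source_lang = "en") paste(..., sep = " ")
)

# A custom interface with the two minimal formal arguments (... and
# source_lang), forwarding everything to tr$translate().
# The default "en" is an assumption, not transltr's default.
translate <- function(..., source_lang = "en") {
  tr$translate(..., source_lang = source_lang)
}

names(formals(translate))       # "..."  "source_lang"
translate("Hello,", "world!")   # "Hello, world!" (from the stub)
```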

Custom interfaces must be passed to find_source() and find_source_in_files() for extraction purposes. Since these functions work on a lexical basis, interfaces can be placeholders in the source code (non-existent bindings) at the time these functions are called. However, they must be bound to a function (ultimately) calling Translator$translate() at runtime.

Custom interfaces are passed to find_source() and find_source_in_files() as name or call objects, which can be created in a variety of ways. The most straightforward way is to use base::quote(). See Examples below.
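The snippet below shows, in base R, several equivalent ways of building the name and call objects accepted by interface. is_valid_interface() is a hypothetical helper that mirrors the rule stated for argument interface (NULL, a name, or a call to ::); it is not a transltr function.

```r
# Three equivalent ways to build a name referring to translate().
n1 <- quote(translate)
n2 <- as.name("translate")
n3 <- str2lang("translate")
identical(n1, n2) && identical(n2, n3)  # TRUE

# Three equivalent ways to build a call to operator ::.
c1 <- quote(transltr::translate)
c2 <- str2lang("transltr::translate")
c3 <- call("::", quote(transltr), quote(translate))
identical(c1, c2) && identical(c2, c3)  # TRUE

# Hypothetical helper mirroring the documented rule for `interface`.
is_valid_interface <- function(x) {
  is.null(x) || is.name(x) ||
    (is.call(x) && identical(x[[1L]], as.name("::")))
}

is_valid_interface(c1)                   # TRUE
is_valid_interface(quote(f(translate)))  # FALSE: not a call to ::
```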

Methodology

find_source() and find_source_in_files() go through these steps to extract source text from a single R script.

1. The script is read with text_read() and re-encoded to UTF-8 if necessary.
2. It is parsed with parse(), and the underlying tokens are extracted from the parsed expressions with utils::getParseData().
3. Each expression (expr) token is converted to a language object with str2lang(). Parsing errors and invalid expressions are silently skipped.
4. Valid call objects stemming from step 3 are filtered with is_source().
5. Calls to method Translator$translate() or to interface stemming from step 4 are coerced to Text objects with as_text().
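Steps 2 to 4 can be approximated in base R as follows. extract_calls() is a hypothetical, simplified sketch: it works on a character string rather than a file, skips re-encoding (step 1) and conversion to Text objects (step 5), and replaces is_source() with a plain structural check on the called function. It is not transltr's implementation.

```r
# Hypothetical sketch of steps 2 to 4 (not transltr's implementation).
extract_calls <- function(code, interface = NULL) {
  # Step 2: parse (keeping source references) and extract tokens.
  exprs <- parse(text = code, keep.source = TRUE)
  pd    <- utils::getParseData(exprs)
  ids   <- pd$id[pd$token == "expr"]
  texts <- utils::getParseText(pd, ids)

  # Step 3: convert each expr token to a language object; skip failures.
  langs <- lapply(texts, function(txt) {
    tryCatch(str2lang(txt), error = function(e) NULL)
  })

  # Step 4: keep calls to `<anything>$translate()`, or only calls to
  # `interface` when one is supplied.
  is_target <- function(x) {
    if (!is.call(x)) return(FALSE)
    fn <- x[[1L]]
    if (!is.null(interface)) return(identical(fn, interface))
    is.call(fn) && identical(fn[[1L]], as.name("$")) &&
      identical(fn[[3L]], as.name("translate"))
  }
  Filter(is_target, langs)
}

code <- "tr$translate('Hello, world!')\nx <- 1\ntranslate('Farewell, world!')"
extract_calls(code)                                # tr$translate() call only
extract_calls(code, interface = quote(translate))  # translate() call only
```

Nested expr tokens (such as tr, tr$translate, or the string itself) are also converted in step 3, which is why the structural filter in step 4 is needed.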

These steps are repeated for each R script. find_source() further merges all resulting Text objects into a coherent set with merge_texts() (identical source code is merged into single Text entities).
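The merging step can be pictured with base R's split(). The data frame below is hypothetical output for the two example scripts used in the Examples section. Keying on the text alone is a simplification, since merge_texts() operates on Text objects rather than plain strings.

```r
# Hypothetical extraction results: one row per extracted call,
# recording the script it came from.
found <- data.frame(
  file = c("ex-script-1.R", "ex-script-1.R", "ex-script-2.R", "ex-script-2.R"),
  text = c("Hello, world!", "Farewell, world!", "Hello, world!", "Farewell, world!")
)

# Merge identical source text into single entries while keeping
# every location where it was found.
merged <- split(found$file, found$text)
merged
```

Four extracted calls collapse into two entries, each listing both scripts.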

Extracted character vectors are always normalized for consistency (at step 5). See normalize() for more information.

Limitations

The current version of transltr can only handle literal character vectors. This means it cannot resolve non-trivial expressions that depend on a state. All values passed to argument ... of method Translator$translate() must trivially yield character vectors.
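The following hypothetical helper gives a rough, simplified test of this limitation: it reports whether every argument passed through ... is a literal character string. It ignores a named source_lang argument and is not the check transltr itself performs.

```r
# Hypothetical check (a simplification, not transltr's own rule):
# are all arguments passed through ... literal character strings?
has_literal_args <- function(x) {
  args <- as.list(x)[-1L]
  nms  <- names(args)
  if (is.null(nms)) nms <- rep("", length(args))
  # Ignore the named source_lang argument; only ... matters here.
  dots <- args[nms != "source_lang"]
  all(vapply(dots, is.character, logical(1L)))
}

has_literal_args(quote(tr$translate("Hello, world!")))             # TRUE
has_literal_args(quote(tr$translate(paste("Hello,", user_name))))  # FALSE
```

The second call depends on the runtime value of user_name, so it cannot be resolved without evaluation.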

Examples

# Create a directory containing dummy R scripts for illustration purposes.
temp_dir   <- file.path(tempdir(TRUE), "find-source")
temp_files <- file.path(temp_dir, c("ex-script-1.R", "ex-script-2.R"))
dir.create(temp_dir, showWarnings = FALSE, recursive = TRUE)

cat(
  "tr$translate('Hello, world!')",
  "tr$translate('Farewell, world!')",
  sep  = "\n",
  file = temp_files[[1L]])
cat(
  "tr$translate('Hello, world!')",
  "tr$translate('Farewell, world!')",
  sep  = "\n",
  file = temp_files[[2L]])

# Extract calls to method Translator$translate().
find_source(temp_dir)
find_source_in_files(temp_files)

# Use custom functions.
# For illustration purposes, assume the package
# exports a hypothetical translate() function.
cat(
  "translate('Hello, world!')",
  "transltr::translate('Farewell, world!')",
  sep  = "\n",
  file = temp_files[[1L]])
cat(
  "translate('Hello, world!')",
  "transltr::translate('Farewell, world!')",
  sep  = "\n",
  file = temp_files[[2L]])

# Extract calls to translate() and transltr::translate().
# Since find_source() and find_source_in_files() work on
# a lexical basis, these are always considered to be two
# distinct functions. They also don't need to exist in the
# R session calling find_source() and find_source_in_files().
find_source(temp_dir, interface = quote(translate))
find_source_in_files(temp_files, interface = quote(transltr::translate))