
Permission Denied exception when starting the Flux engine

Caused by: java.net.SocketException: Permission denied
                at sun.nio.ch.Net.bind(Native Method)

The message above (the full stack trace may be longer) is thrown when the user starting the Flux engine does not have permission to create a socket on port 7520 (the default engine port). Contact your system administrator to obtain the necessary permissions.

Where to find the Flux logs

When using the default Flux installation:

The engine logs can be found in the root of your Flux installation folder (usually 'C:\flux-8-0-x').

  • The logs are named 'flux-<server name>-dd-MMM-yyyy.log'.

The Operations Console logs are also located in the root of your Flux installation folder.

  • The logs are named 'opsconsole-dd-MMM-yyyy.log'.

Engine or JVM crashes unexpectedly

Most engine crashes occur due to a bug in the Java Virtual Machine (JVM). To find the known bugs for your Java environment, you can search Sun's bug database for your version of Java. 

When a JVM bug causes the Flux engine to crash, you will see a message written to standard error (stderr) and a fatal error crash report written to the file system (for HotSpot JVMs, the report is named hs_err_pid<pid>.log). The crash report is normally saved in the temp directory (/tmp on Linux / Unix, or C:\Windows\temp on Windows); if you cannot find it there, check the working directory of the JVM (usually the directory the JVM or Flux engine was started from).

Another common cause of crashes, in cases where the crash occurs after the environment has been running for a long period of time, is that the environment has run out of resources. If the crash occurs for no apparent reason (no error message is logged or written to stderr), we recommend monitoring the JVM closely on a daily basis to watch memory usage and see whether memory climbs in the time leading up to the crash. Although Java will typically write an error message to stderr in the event of a memory problem, this can be unreliable, as memory problems are known to cause unexpected behavior. Basic tools for memory monitoring include JConsole (bundled with the JDK since Java 5) and VisualVM (bundled with the JDK since Java 6 update 7).
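If you prefer to sample memory usage from the command line, the JDK's jstat utility can log heap occupancy over time (the process ID below is a placeholder for the JVM running the Flux engine):

# Print heap and garbage collection utilization every 5 seconds
# for JVM process 12345; replace 12345 with your engine's PID.
jstat -gcutil 12345 5000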

If the crash occurs in a long-running environment that was previously stable and does not appear to be caused by a JVM bug or a memory problem, send an email to support@flux.ly for further assistance, taking special care to note any recent changes to the environment in the following areas:

    • System upgrades
    • Java upgrades
    • Java library upgrades
    • Other software upgrades
    • Device driver upgrades
    • Command line argument changes
    • Changes to application code (in particular, any calls to the Java method System.exit())
    • Additional load recently added to the Flux system

For memory usage problems, we also recommend enabling the JVM's heap-dump-on-out-of-memory configuration parameter:

-XX:+HeapDumpOnOutOfMemoryError

This generates a heap dump when the JVM runs out of memory, which the Flux team can use to debug what caused the application to exceed its memory limits.
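For example (the launch command below is hypothetical; substitute however you normally start the Flux engine), you can combine the flag with -XX:HeapDumpPath, a standard HotSpot option that controls where the .hprof dump file is written:

# Hypothetical launch command, shown for illustration only
java -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp -jar flux.jar <your-engine-arguments>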

My engine or workflow is stuck in a certain state and does not respond to commands

If an engine or workflow is unresponsive, it is typically because the MAX_CONNECTIONS parameter in the engine configuration is not large enough to accommodate the number of users or client connections for the engine. Every client or user who connects to Flux may need a database connection available (as well as a certain number for workflows and background tasks), so the MAX_CONNECTIONS parameter must be high enough to accommodate all of the potential database connections the engine might require. For more information on setting an ideal MAX_CONNECTIONS parameter, see Max Connections and Concurrency Level / Concurrency Throttle.
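As a rough illustration only (the exact configuration file name and syntax depend on your installation; check your own engine configuration), the parameter is a single numeric setting, sized for peak concurrent users plus workflows and background tasks:

# Hypothetical excerpt from a properties-style engine configuration:
# 10 concurrent users plus headroom for workflows and background tasks.
MAX_CONNECTIONS=25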

If your MAX_CONNECTIONS is configured correctly, or you are certain that the engine has not reached its total number of available database connections, and your engine is still unresponsive, gather as much of the following information as you are able and send it in an email to support@flux.ly for further assistance:

    • A log file from the engine, preferably at the FINEST level.
    • A thread dump from the JVM where the engine is running, taken at least 5 minutes after the engine or workflow becomes unresponsive (see the example after this list).
    • The full contents of the FLUX_READY table in your database from the time the unresponsiveness occurred.
    • The version of Flux that you are running (if you aren't sure, run the command "java -jar flux.jar" and copy the output).
    • Copies of the workflows, if any, that have become stuck.
    • A database deadlock report from the database, showing any deadlocks or deadlocked transactions that occurred around the time the engine/workflow became stuck.
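To capture the thread dump mentioned above, the JDK's jstack utility works against a running JVM (the process ID below is a placeholder for the JVM running the Flux engine):

# Write a thread dump, including lock information, for JVM process
# 12345 to a file; replace 12345 with your engine's PID.
jstack -l 12345 > flux-thread-dump.txt

# On Unix-like systems, kill -3 (SIGQUIT) also prints a thread dump
# to the JVM's standard output without stopping the process.
kill -3 12345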

A Process Action on my engine is stuck in the FIRING state and won't progress to the next step

In most cases, this happens when the underlying process invoked by the Process Action has not completed and is still running on the system. A Process Action cannot finish executing until the underlying process has exited entirely. In some cases, a process can appear complete from an operator's perspective while the underlying process has not actually exited, causing the Process Action to appear to hang.

A simple way to test this is to check the status of the process using Task Manager (on Windows) or the process status command (on *nix).
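For example, from the command line (the process name is a placeholder for whatever command your Process Action invokes):

# *nix: check whether the launched process is still running
ps aux | grep my-command

# Windows: the equivalent check from a command prompt
tasklist | findstr my-command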

If this problem occurs, it may be necessary to make changes to the Process Action's "command" argument to ensure the underlying process exits when complete, or to make changes to the invoked script / command itself.
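For example, if the invoked script starts a long-running child process, one common fix is to detach the child so that the script itself exits immediately. This is a sketch; the paths are hypothetical:

#!/bin/sh
# Hypothetical wrapper script for a Process Action. The child is
# detached with nohup so this script exits as soon as the child
# starts, allowing the Process Action to leave the FIRING state.
nohup /opt/myapp/run-service.sh > /var/log/myapp.log 2>&1 &
exit 0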

If you encounter this problem and the task manager / process status does not show that the process is open, see My engine or workflow is stuck in a certain state and does not respond to commands above.

Cannot contact a remote engine from a Linux system

This problem is a known Linux and Java issue; it is not specific to Flux. In general, when a Java application tries to look up a remote object on a Linux computer, the remote reference that is returned to the Java application may contain a reference to 127.0.0.1 (localhost) instead of the remote computer’s actual (routable) IP address and host name.

Very likely, the first entry in your Linux system’s /etc/hosts file matches, or is similar to, the following line:

127.0.0.1 localhost


There may be additional lines below this line that specify other IP addresses and host names. However, if the very first line is similar to the above line, this Linux/Java problem can occur.

To resolve this problem, move this first line farther down in your /etc/hosts file, beneath the line that lists your computer’s real (routable) IP address and host name.
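For example, a corrected /etc/hosts might look like the following (the address and host names are placeholders for your machine's real entries):

# The routable address comes first, so remote lookups resolve to it
192.168.1.50   myhost.example.com   myhost
127.0.0.1      localhost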

Workflow fires twice

If a database deadlock occurs while your workflow is firing, the database transaction is rolled back and tried again automatically. At this point, your workflow will fire again but only because it did not run to completion successfully the last time it fired. This behavior is normal.

You can eliminate the possibility of a workflow firing twice by tying your workflow’s work into the same database connection that your Flux workflow uses. That way, if Flux’s database connection rolls back, your workflow’s work rolls back too, and no harm is done.

You can also tie the Flux database connection into your work by using an XA resource or an XA database connection. Again, if the Flux database connection rolls back, your workflow’s work rolls back too, and no harm is done.

  • mmiVerifyTpAndGetWorkSize: stack_height=2 should be zero; exit

    You may see the above message on your console. It is a harmless message emitted directly from the IBM Java Runtime Environment (JRE) or the JRE’s Just In Time (JIT) Compiler. The following IBM website explains that it is an IBM error and provides the solution.

    http://www-1.ibm.com/support/docview.wss?uid=swg1PQ83394
  • I see database deadlocks! What is wrong?

    Probably nothing. Database deadlocks are a normal part of any database application and occur even in normally functioning software.

    If a database deadlock occurs while a workflow is running, Flux rolls back the current database transaction and automatically retries the flow chart. No administrative action is required.

    If a deadlock occurs while using the Flux Designer, you must manually retry the GUI action that you attempted.

    Once your flow chart is successfully added to the engine, database deadlocks do not require any action on your part.

    In general, row-level locking is preferred in databases, because it minimizes the opportunity for deadlock and connection timeouts. If possible, enable row-level locking at the database level.

    If you see more than an average of one deadlock per hour, or if you can reproduce a deadlock regularly by following a well-defined sequence of steps, contact our Technical Support department at support@flux.ly with an explanation of the deadlock situation. We will work with you to reduce the deadlock rate to less than an average of one per hour.

 

EJB 2.0 restriction on Flux client calls

If you are using EJB 2.0 and are making client calls into a Flux engine from your EJB, Flux will not operate properly if the calling EJB has Container Managed Transactions (CMT) enabled. This issue occurs because the EJB 2.0 specification does not allow other applications (in this case, a Flux engine) to look up user transactions while CMT is enabled. Flux engines utilize user transactions in order to allow them to participate in EJB transactions.

The workaround for this issue is to either configure your EJB 2.0 beans to use Bean Managed Transactions (BMT) or simply have your beans use EJB 1.1.

Before troubleshooting any of these issues, make sure the Flux engine’s loggers are enabled. These loggers record useful information about the state of Flux and running flow charts. By default, loggers write their output to the console (standard out, or stdout), but they can be redirected to other destinations.

Workflows firing late or failing to fire after an engine restart

If some of your previously scheduled workflows seem to fire very late, if at all, after you restart your engine, the cause may be that the engine was not properly disposed. If the engine process is terminated before shutdown fully completes, some of your workflows may be left in the FIRING state. In that case, when your engine restarts, it leaves these workflows alone, assuming they are being fired by a second, clustered engine instance. After a few minutes, as governed by the FAILOVER_TIME_WINDOW configuration parameter, your engine will fail over these workflows, and they will begin running again.

To avoid this delay, be sure to shut down cleanly by calling engine.dispose(). Alternatively, configure your engine to run standalone rather than as part of a cluster. To ensure that an engine runs standalone, it must be the only engine pointed at its set of database tables (clustering is enabled by pointing multiple engines at the same database tables).

Flux reporting a lack of CPU or memory resources available when it runs embedded in my application

Because Flux runs inside the same JVM as the rest of your application, if parts of your application exhaust database, memory, or virtual machine resources, this excessive consumption of resources may be revealed in Flux. If a lack of resources is reported by Flux, it does not necessarily imply that Flux itself leaked or consumed these resources. Other parts of your application residing in the same JVM may have consumed most or all of these resources.

For example, suppose you call a Flux engine method and an SQLException is thrown indicating that the database has run out of database cursors. This SQLException does not necessarily imply that Flux is leaking database resources. It may imply that other parts of the application are leaking database resources and that the leak was merely exposed by a call to Flux.

SSH Encryption issues

Keys generated by OpenSSH with default parameters are considered weak, and weak encryption keys are not supported by Flux Sftp Hosts in 8.0.11 or later. You need to generate your keys with a strong encryption algorithm, such as DES-EDE3 in CBC mode. If you upgrade from an earlier version of Flux, you may see the following exception with your existing private key:

Caused by: java.lang.IllegalArgumentException: Unsupported key format.
at com.jscape.inet.ssh.util.SshParameters.a(Unknown Source)
at com.jscape.inet.ssh.util.SshParameters.<init>(Unknown Source)
at com.jscape.inet.ssh.util.SshParameters.<init>(Unknown Source)
at fluximpl.ftp.SftpClient.connectAndLogin(fyc:73)
at fluximpl.variables.file.Ftp_Host.setup(sqb:430)
at fluximpl.variables.file.AbstractFileSystem.i(fwb:400)
... 13 more
Caused by: com.jscape.inet.ssh.util.keyreader.FormatException: cannot restore key pair
at com.jscape.inet.ssh.util.KeyPairAssembler.restoreKeyPair(Unknown Source)
at com.jscape.inet.ssh.util.KeyPairAssembler.restoreKeyPair(Unknown Source)
at com.jscape.inet.ssh.util.KeyPairAssembler.restoreKeyPair(Unknown Source)

To fix this, create a key that uses strong encryption. The following example uses OpenSSL (a larger key size, such as 2048 bits, works the same way and is generally recommended over 1024).

1. Generate the private key with a passphrase (the -des3 option encrypts the key with DES-EDE3 in CBC mode):

openssl genrsa -des3 -out id_rsa 1024

Generating RSA private key, 1024 bit long modulus
................................................................................++++++
....++++++
e is 65537 (0x10001)
Enter pass phrase for id_rsa:
Verifying - Enter pass phrase for id_rsa:

2. Change the permissions of the private key to 600:

chmod 600 id_rsa

3. Generate the public key, providing the private key passphrase when prompted:

ssh-keygen -y -f id_rsa > id_rsa.pub
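If you upgraded and would rather keep an existing key than generate a new one, you can usually re-encrypt it with DES-EDE3 using openssl; the file names below are placeholders. If the key is in the newer OpenSSH format (the file begins with 'BEGIN OPENSSH PRIVATE KEY'), convert it to traditional PEM format first with ssh-keygen:

# Only needed for OpenSSH-format keys: rewrite the key in PEM format
# (prompts for the old and new passphrases; converts in place)
ssh-keygen -p -m PEM -f id_rsa

# Re-encrypt the PEM private key with DES-EDE3 in CBC mode
openssl rsa -in id_rsa -des3 -out id_rsa_des3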

 

 
