RStudio Integration

A complete round-trip workflow for developing validated analysis steps in RStudio

Workflow Overview

The improVerse RStudio integration provides a hybrid development model that combines the comfort of local RStudio development with the rigor of validated execution environments. Work on your scripts locally with full IDE capabilities, then push to the repository for execution in controlled, reproducible environments.

Local Development

  • Full RStudio IDE access with debugging tools
  • Interactive execution and rapid iteration
  • Immediate feedback during development
  • Access to local computing resources

Validated Execution

  • Controlled runserver environment
  • Versioned tool dependencies
  • Standardized execution parameters
  • Reproducible results independent of developer machine

Key Capabilities

Step Management

  • Create consumer steps with explicit data dependencies
  • Configure step parameters and tool specifications
  • Document descriptions and rationales
  • Establish clear data lineage

Workspace Integration

  • Check out steps to dedicated workspace directories
  • Automatic RStudio navigation to step context
  • Isolated environments for each step
  • Direct access to step input data

RStudio Attach Mode

  • Programmatic IDE navigation
  • Automatic working directory synchronization
  • File opening in source editor
  • Global environment management

Push & Execute

  • Upload scripts and files to repository
  • Execute in validated runserver environment
  • Monitor execution status and logs
  • Verify outputs and execution results

Complete Workflow

1. Launch improVerse Connect Addin

Access the improVerse integration through the RStudio Addins menu. The integration opens in the Viewer pane, providing access to repository resources.

2. Select Analysis Tree

Navigate to your target analysis tree using the Tree Explorer. The selected tree becomes the active context for subsequent operations.

3. Inspect Source Data

Open an existing step's details to see which data outputs are available. The Step Details panel has info, inputs, and outputs tabs that together describe what the step does, what it consumes, and what it produces.

4. Create Consumer Step

Create a new consumer step that processes outputs from existing steps. Configure input assignments, tool specification (Rscript), description, and rationale to establish clear dependencies.
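
The fields configured in this step can be pictured as a small record. The sketch below is only an illustration of that idea: `new_step_spec`, its argument names, and the example paths are hypothetical and are not part of the improVerse API.

```r
# Hypothetical helper, not an improVerse function. It models the fields
# named in the text (inputs, tool, description, rationale) as a plain list
# and refuses to build a step that leaves any of them empty.
new_step_spec <- function(name, inputs, tool = "Rscript",
                          description, rationale) {
  stopifnot(
    is.character(name), length(name) == 1, nzchar(name),
    is.character(inputs), length(inputs) >= 1, all(nzchar(inputs)),
    is.character(tool), length(tool) == 1, nzchar(tool),
    is.character(description), length(description) == 1, nzchar(description),
    is.character(rationale), length(rationale) == 1, nzchar(rationale)
  )
  list(name = name, inputs = inputs, tool = tool,
       description = description, rationale = rationale)
}

# Example values are placeholders.
spec <- new_step_spec(
  name        = "clean-data",
  inputs      = c("import-data/outputs/raw.csv"),
  description = "Remove incomplete records from the imported data.",
  rationale   = "Downstream models require complete cases."
)
str(spec)
```

Requiring at least one input and a non-empty rationale up front mirrors the goal stated above: a consumer step should never exist without a declared dependency and a reason.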

5. Attach RStudio Control

Enable RStudio attach mode so that the improVerse integration can drive the IDE. Once attached, the integration can control RStudio operations such as navigation, working directory changes, and file management.
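
To make the idea of programmatic IDE control concrete, here is a minimal sketch using the public `rstudioapi` package. It is an assumption that attach mode does something similar; how improVerse actually drives RStudio is not described here, and `focus_step` is a hypothetical name.

```r
# Sketch only: shows the kind of rstudioapi calls an attach mode could issue.
# focus_step is a hypothetical helper, not part of improVerse.
focus_step <- function(step_dir, script = "script.R") {
  # Do nothing when rstudioapi is missing or the code runs outside RStudio.
  if (!requireNamespace("rstudioapi", quietly = TRUE) ||
      !rstudioapi::isAvailable()) {
    message("RStudio is not available; nothing to do.")
    return(invisible(FALSE))
  }
  setwd(step_dir)                            # sync the working directory
  rstudioapi::filesPaneNavigate(step_dir)    # point the Files pane at the step
  script_path <- file.path(step_dir, script)
  if (file.exists(script_path)) {
    rstudioapi::navigateToFile(script_path)  # open the script in the editor
  }
  invisible(TRUE)
}
```

The guard at the top means the same function is harmless when sourced from a plain `Rscript` session, where no IDE exists to control.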

6. Open Step Workspace

Check out the step to a dedicated workspace directory. RStudio automatically navigates to the step location, providing an isolated environment with direct access to input data.

7. Develop Locally

Write and test your R script within RStudio using standard development workflows. Execute interactively, debug with RStudio tools, and iterate rapidly with immediate feedback.
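
A step script that should behave the same locally and on the runserver can be sketched as follows. The file names `measurements.csv` and `summary.csv` are placeholders, and the summary logic is only an example; the point is that every path is relative to the step workspace.

```r
# script.R: minimal sketch of a step script that only uses relative paths.
# File names are placeholders chosen for illustration.
input_file  <- file.path("inputs", "measurements.csv")
output_file <- file.path("outputs", "summary.csv")

run_step <- function(input_file, output_file) {
  if (!file.exists(input_file)) {
    stop("Missing declared input: ", input_file)
  }
  data <- read.csv(input_file)

  # Compute the mean of every numeric column.
  numeric_cols <- names(data)[vapply(data, is.numeric, logical(1))]
  summary_table <- data.frame(
    column = numeric_cols,
    mean   = unname(vapply(data[numeric_cols], mean, numeric(1), na.rm = TRUE)),
    stringsAsFactors = FALSE
  )

  # Write results only into the outputs directory.
  dir.create(dirname(output_file), showWarnings = FALSE, recursive = TRUE)
  write.csv(summary_table, output_file, row.names = FALSE)
  invisible(summary_table)
}

# Run only when the declared input is present in the workspace.
if (file.exists(input_file)) run_step(input_file, output_file)
```

Wrapping the work in a function keeps it easy to call interactively with test data while the last line preserves the plain `Rscript script.R` entry point.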

8. Push and Execute

Upload your script and files to the repository and execute in the validated runserver environment. Monitor execution status and access real-time logs during processing.

9. Verify Outputs

Check generated files and preview results directly in the integration. The outputs tab lists all generated files including datasets, logs, and visualizations.
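
The same check can be approximated locally before pushing. This sketch assumes outputs are written to an `outputs` directory inside the step workspace; `list_outputs` is a hypothetical helper, not an improVerse function.

```r
# Sketch: summarise the files in a step's outputs directory and flag
# empty files, which often indicate a failed write.
list_outputs <- function(output_dir = "outputs") {
  files <- list.files(output_dir, recursive = TRUE, full.names = TRUE)
  info  <- file.info(files)
  data.frame(
    file     = files,
    bytes    = info$size,
    modified = info$mtime,
    empty    = info$size == 0,
    stringsAsFactors = FALSE
  )
}
```

Calling `list_outputs()` from the step directory returns one row per generated file, so a quick look at the `empty` column is enough to spot missing results.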

Workspace Management

workspace/
└── [tree-name]/          # Analysis tree name
    ├── [step-1-name]/    # First step workspace
    │   ├── inputs/       # Input data and files
    │   ├── script.R      # Development script
    │   └── outputs/      # Generated outputs
    └── [step-2-name]/    # Second step workspace
        ├── inputs/       # Input from step 1
        ├── script.R      # Visualization script
        └── outputs/      # Generated plots

Each step maintains an isolated workspace with explicit input/output separation. This structure ensures clean dependencies and makes it easy to understand data flow through your analysis pipeline.
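
The integration creates this layout when a step is checked out. For local experiments outside the integration, the same structure can be mirrored with base R; `create_step_workspace` is a hypothetical helper, and the tree and step names are placeholders.

```r
# Sketch: build the workspace layout shown above for one step.
# Not an improVerse function; it only mirrors the directory structure.
create_step_workspace <- function(root, tree, step) {
  step_dir <- file.path(root, tree, step)
  for (sub in c("inputs", "outputs")) {
    dir.create(file.path(step_dir, sub), recursive = TRUE, showWarnings = FALSE)
  }
  script <- file.path(step_dir, "script.R")
  if (!file.exists(script)) file.create(script)  # never overwrite existing work
  step_dir
}

# Example: a throwaway workspace under the session's temporary directory.
step_dir <- create_step_workspace(tempdir(), "demo-tree", "step-1")
list.files(step_dir)
```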

Best Practices

Step Organization

  • Create focused steps with single, well-defined purposes
  • Provide clear descriptions and rationales
  • Explicitly declare all input dependencies
  • Avoid monolithic scripts performing multiple operations
  • Balance granularity with workflow complexity

Development Workflow

  • Test scripts thoroughly locally before pushing
  • Verify all edge cases and error conditions
  • Use relative paths within step workspace
  • Ensure scripts run identically in both environments
  • Document changes when updating existing steps

Reproducibility

  • All inputs must be explicitly declared and versioned
  • Avoid reading files from arbitrary paths
  • Scripts should be environment-independent
  • Explicitly specify all required packages
  • Generate self-documenting outputs
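
Two of these points, declaring packages explicitly and producing self-documenting outputs, can be sketched in a few lines of base R. The package list and the `session_info.txt` file name are placeholders, and both helper names are hypothetical.

```r
# Placeholder list: replace with the packages your script actually needs.
required_packages <- c("stats", "utils")

# Fail early, with one clear message, if any declared package is missing.
check_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing_pkgs <- pkgs[!available]
  if (length(missing_pkgs) > 0) {
    stop("Missing required packages: ", paste(missing_pkgs, collapse = ", "))
  }
  invisible(TRUE)
}

# Record the R version and loaded packages next to the results.
write_session_info <- function(path = file.path("outputs", "session_info.txt")) {
  dir.create(dirname(path), showWarnings = FALSE, recursive = TRUE)
  writeLines(utils::capture.output(utils::sessionInfo()), path)
  invisible(path)
}

check_packages(required_packages)
```

Calling `write_session_info()` at the end of a step script leaves a record of the environment that produced the outputs, which helps when comparing a local run with a runserver run.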

Resource Management

  • Consider computational requirements upfront
  • Optimize for runserver environment execution
  • Handle large datasets efficiently
  • Use meaningful commit messages
  • Maintain backward compatibility when possible

Multi-Step Workflow Patterns

Sequential Dependencies

Step 1 (Data Import)
  └─> Step 2 (Data Cleaning)
       └─> Step 3 (Statistical Analysis)
            └─> Step 4 (Report Generation)

Parallel Processing

Step 1 (Data Import)
  ├─> Step 2a (Analysis A) ─┐
  ├─> Step 2b (Analysis B) ─┼─> Step 3 (Combine Results)
  └─> Step 2c (Analysis C) ─┘

Branch and Merge

Step 1 (Data Preparation)
  ├─> Step 2a (Method A) ─┐
  │                       ├─> Step 3 (Comparison)
  └─> Step 2b (Method B) ─┘
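
All three patterns are dependency graphs, so a valid execution order can be derived from the declared inputs alone. The sketch below models the parallel pattern as a named list and orders it with a simple topological sort; the step names and the `execution_order` helper are illustrative and say nothing about how improVerse schedules steps internally.

```r
# Each step maps to the steps whose outputs it consumes.
deps <- list(
  import     = character(0),
  analysis_a = "import",
  analysis_b = "import",
  analysis_c = "import",
  combine    = c("analysis_a", "analysis_b", "analysis_c")
)

# Repeatedly pick every step whose dependencies are already done.
execution_order <- function(deps) {
  unknown <- setdiff(unlist(deps, use.names = FALSE), names(deps))
  if (length(unknown) > 0) {
    stop("Undeclared steps: ", paste(unknown, collapse = ", "))
  }
  done <- character(0)
  while (length(done) < length(deps)) {
    is_ready <- vapply(names(deps), function(step) {
      !(step %in% done) && all(deps[[step]] %in% done)
    }, logical(1))
    ready <- names(deps)[is_ready]
    if (length(ready) == 0) stop("Cyclic dependency detected")
    done <- c(done, ready)
  }
  done
}

execution_order(deps)
```

Steps that become ready in the same pass, here the three analyses, have no dependencies on each other and could run in parallel.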
