RStudio Integration
A complete round-trip workflow for developing validated analysis steps in RStudio
Workflow Overview
The improVerse RStudio integration provides a hybrid development model that combines the comfort of local RStudio development with the rigor of validated execution environments. Work on your scripts locally with full IDE capabilities, then push to the repository for execution in controlled, reproducible environments.
Local Development
- ✓ Full RStudio IDE access with debugging tools
- ✓ Interactive execution and rapid iteration
- ✓ Immediate feedback during development
- ✓ Access to local computing resources
Validated Execution
- ✓ Controlled runserver environment
- ✓ Versioned tool dependencies
- ✓ Standardized execution parameters
- ✓ Reproducible results independent of developer machine
Key Capabilities
Step Management
- Create consumer steps with explicit data dependencies
- Configure step parameters and tool specifications
- Document descriptions and rationales
- Establish clear data lineage
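As a rough illustration of what a consumer step carries, the sketch below models a step as a plain R list with its inputs, tool, description, and rationale. `new_step_spec()`, its field names, and the example paths are hypothetical and are not part of the improVerse API; they only show the kind of information a step declares.

```r
# Hypothetical helper: builds a plain list describing a consumer step.
# Field names are illustrative; improVerse may store these differently.
new_step_spec <- function(name, inputs, tool = "Rscript",
                          description = "", rationale = "") {
  stopifnot(
    is.character(name), length(name) == 1, nzchar(name),
    is.character(inputs), length(inputs) >= 1, all(nzchar(inputs)),
    is.character(tool), length(tool) == 1
  )
  # Missing documentation is flagged but does not block step creation here.
  if (!nzchar(description) || !nzchar(rationale)) {
    warning("Step '", name, "' has no description or rationale")
  }
  list(
    name = name,
    inputs = inputs,
    tool = tool,
    description = description,
    rationale = rationale
  )
}

# Example step that consumes one output of an upstream step (assumed names).
spec <- new_step_spec(
  name = "clean-data",
  inputs = c("import-raw/outputs/raw.csv"),
  description = "Remove incomplete records",
  rationale = "Downstream models require complete cases"
)
str(spec)
```

Listing every input by name in one place is what makes the data lineage readable later: anyone inspecting the step can see which upstream outputs it depends on.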
Workspace Integration
- Check out steps to dedicated workspace directories
- Automatic RStudio navigation to step context
- Isolated environments for each step
- Direct access to step input data
RStudio Attach Mode
- Programmatic IDE navigation
- Automatic working directory synchronization
- File opening in source editor
- Global environment management
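The operations listed above resemble what the `rstudioapi` package exposes to R code. The sketch below shows how such navigation could be done by hand; it is an assumption that the integration works along these lines, and `attach_to_step()` is a hypothetical name, not an improVerse function.

```r
# Illustrative only: how an attach-style helper could drive RStudio.
# attach_to_step() is a hypothetical name, not an improVerse function.
attach_to_step <- function(step_dir, script = "script.R") {
  in_rstudio <- requireNamespace("rstudioapi", quietly = TRUE) &&
    rstudioapi::isAvailable()
  if (!in_rstudio) {
    message("RStudio is not available; nothing to navigate.")
    return(invisible(FALSE))
  }
  step_dir <- normalizePath(step_dir, mustWork = TRUE)
  setwd(step_dir)                            # synchronize working directory
  rstudioapi::filesPaneNavigate(step_dir)    # show the step in the Files pane
  script_path <- file.path(step_dir, script)
  if (file.exists(script_path)) {
    rstudioapi::navigateToFile(script_path)  # open the script in the editor
  }
  invisible(TRUE)
}

# Example call (the path is a placeholder):
# attach_to_step(file.path("workspace", "my-tree", "my-step"))
```

Outside RStudio the helper returns `FALSE` without touching the working directory, so the same code is safe to source in a non-interactive session.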
Push & Execute
- Upload scripts and files to repository
- Execute in validated runserver environment
- Monitor execution status and logs
- Verify outputs and execution results
Complete Workflow
1. Launch improVerse Connect Addin
Access the improVerse integration through the RStudio Addins menu. The integration opens in the Viewer pane, providing access to repository resources.
2. Select Analysis Tree
Navigate to your target analysis tree using the Tree Explorer. The selected tree becomes the active context for subsequent operations.
3. Inspect Source Data
Open an existing step to examine the data outputs it makes available. The Step Details panel has Info, Inputs, and Outputs tabs that together describe the step.
4. Create Consumer Step
Create a new consumer step that processes outputs from existing steps. Configure input assignments, tool specification (Rscript), description, and rationale to establish clear dependencies.
5. Attach RStudio Control
Enable RStudio attach mode so that the improVerse integration can control RStudio operations such as navigation, working directory changes, and file management.
6. Open Step Workspace
Check out the step to a dedicated workspace directory. RStudio automatically navigates to the step location, providing an isolated environment with direct access to input data.
7. Develop Locally
Write and test your R script within RStudio using standard development workflows. Execute interactively, debug with RStudio tools, and iterate rapidly with immediate feedback.
8. Push and Execute
Upload your script and files to the repository and execute in the validated runserver environment. Monitor execution status and access real-time logs during processing.
9. Verify Outputs
Check generated files and preview results directly in the integration. The outputs tab lists all generated files including datasets, logs, and visualizations.
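Before pushing, a similar check can be run locally on a step's `outputs/` directory. The following is a minimal sketch under the assumption that outputs are ordinary files on disk; `summarize_outputs()` is a hypothetical helper, and the demo files are created in a temporary directory only so that the example runs anywhere.

```r
# Hypothetical helper: summarize the files in a step's outputs/ directory.
summarize_outputs <- function(outputs_dir = "outputs") {
  files <- list.files(outputs_dir, recursive = TRUE, full.names = TRUE)
  info <- file.info(files)
  data.frame(
    file = files,
    size_bytes = info$size,
    empty = info$size == 0,   # zero-byte files often indicate a failed write
    stringsAsFactors = FALSE
  )
}

# Demonstration in a temporary directory with one real and one empty file.
demo_dir <- file.path(tempdir(), "outputs_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("a,b\n1,2", file.path(demo_dir, "result.csv"))
file.create(file.path(demo_dir, "empty.log"))

report <- summarize_outputs(demo_dir)
print(report)
```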
Workspace Management
workspace/
└── [tree-name]/              # Analysis tree name
    ├── [step-1-name]/        # First step workspace
    │   ├── inputs/           # Input data and files
    │   ├── script.R          # Development script
    │   └── outputs/          # Generated outputs
    └── [step-2-name]/        # Second step workspace
        ├── inputs/           # Input from step 1
        ├── script.R          # Visualization script
        └── outputs/          # Generated plots
Each step maintains an isolated workspace with explicit input/output separation. This structure ensures clean dependencies and makes it easy to understand data flow through your analysis pipeline.
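A step script that follows this layout reads only from `inputs/` and writes only to `outputs/`, using paths relative to the step directory. The sketch below builds a throwaway step workspace in a temporary directory so it can run anywhere; the file names `raw.csv` and `clean.csv` and the cleaning rule are assumptions made for illustration.

```r
# Set up a throwaway step workspace so the sketch can run anywhere.
# In a real checkout, inputs/ would already be populated.
step_dir <- file.path(tempdir(), "demo-tree", "demo-step")
dir.create(file.path(step_dir, "inputs"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(step_dir, "outputs"), showWarnings = FALSE)
write.csv(
  data.frame(id = 1:3, value = c(10, NA, 30)),
  file.path(step_dir, "inputs", "raw.csv"),
  row.names = FALSE
)

old_wd <- setwd(step_dir)

# --- script.R body: only relative paths inside the step workspace ---
raw <- read.csv(file.path("inputs", "raw.csv"))
clean <- raw[!is.na(raw$value), ]          # drop rows with a missing value
write.csv(clean, file.path("outputs", "clean.csv"), row.names = FALSE)
# --- end of script.R body ---

setwd(old_wd)                              # restore the previous directory
```

Because the script body never mentions an absolute path, it does not depend on where the workspace was checked out.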
Best Practices
Step Organization
- Create focused steps with single, well-defined purposes
- Provide clear descriptions and rationales
- Explicitly declare all input dependencies
- Avoid monolithic scripts performing multiple operations
- Balance granularity with workflow complexity
Development Workflow
- Test scripts thoroughly locally before pushing
- Verify all edge cases and error conditions
- Use relative paths within step workspace
- Ensure scripts run identically in both environments
- Document changes when updating existing steps
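One lightweight way to cover edge cases and error conditions locally is to put input validation in a small function and exercise it with base R assertions. This is a sketch only: `validate_input()` and `fails()` are hypothetical helpers, and the column names are made up.

```r
# Hypothetical validation helper for a step's input table.
validate_input <- function(df, required_cols) {
  if (!is.data.frame(df)) stop("input is not a data frame")
  missing_cols <- setdiff(required_cols, names(df))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  }
  if (nrow(df) == 0) stop("input has no rows")
  invisible(df)
}

# Returns TRUE if evaluating the expression raises an error, FALSE otherwise.
fails <- function(expr) {
  inherits(tryCatch(expr, error = function(e) e), "error")
}

good <- data.frame(id = 1:2, value = c(1.5, 2.5))
stopifnot(
  !fails(validate_input(good, c("id", "value"))),      # valid input passes
  fails(validate_input(good, c("id", "dose"))),        # missing column
  fails(validate_input(good[0, ], c("id", "value"))),  # empty table
  fails(validate_input("not a table", "id"))           # wrong type
)
```

Checks like these behave the same on a developer machine and in a non-interactive run, which helps confirm that a script fails loudly rather than producing partial outputs.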
Reproducibility
- All inputs must be explicitly declared and versioned
- Avoid reading files from arbitrary paths
- Scripts should be environment-independent
- Explicitly specify all required packages
- Generate self-documenting outputs
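Two of these points, declaring packages and producing self-documenting outputs, can be sketched with base R alone. The package list and the `session_info.txt` file name below are assumptions for illustration; a real step would write into its own `outputs/` directory rather than a temporary one.

```r
# Packages this step needs; the names here are only examples.
required_packages <- c("stats", "utils")

# Fail early, with a clear message, if anything is not installed.
missing_pkgs <- required_packages[
  !vapply(required_packages, requireNamespace, logical(1), quietly = TRUE)
]
if (length(missing_pkgs) > 0) {
  stop("Missing packages: ", paste(missing_pkgs, collapse = ", "))
}

# Record the R version and loaded package versions next to the results.
out_dir <- file.path(tempdir(), "outputs")  # a real step would use "outputs"
dir.create(out_dir, showWarnings = FALSE)
session_file <- file.path(out_dir, "session_info.txt")
writeLines(capture.output(sessionInfo()), session_file)
```

Storing the session information alongside the results means each output carries a record of the environment that produced it.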
Resource Management
- Consider computational requirements upfront
- Optimize for runserver environment execution
- Handle large datasets efficiently
- Use meaningful commit messages
- Maintain backward compatibility when possible
Multi-Step Workflow Patterns
Sequential Dependencies
Step 1
└─> Step 2 (Data Cleaning)
    └─> Step 3 (Statistical Analysis)
        └─> Step 4 (Report Generation)
Parallel Processing
Step 1
├─> Step 2a (Analysis A)
├─> Step 2b (Analysis B)
└─> Step 2c (Analysis C)
    └─> Step 3 (Combine Results)
Branch and Merge
Step 1
├─> Step 2a (Method A)
└─> Step 2b (Method B)
    └─> Step 3 (Comparison)
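All three patterns are dependency graphs, so a valid execution order can be derived from the declared inputs alone. The sketch below encodes the branch-and-merge pattern as a named list and orders it level by level; the step names and `execution_order()` are illustrative and do not reflect how improVerse schedules steps.

```r
# Each step lists the steps whose outputs it consumes (names are illustrative).
deps <- list(
  step1  = character(0),
  step2a = "step1",
  step2b = "step1",
  step3  = c("step2a", "step2b")
)

# Level-by-level topological ordering: repeatedly take every step whose
# inputs have all been produced already.
execution_order <- function(deps) {
  done <- character(0)
  remaining <- names(deps)
  while (length(remaining) > 0) {
    ready <- remaining[vapply(
      remaining,
      function(s) all(deps[[s]] %in% done),
      logical(1)
    )]
    if (length(ready) == 0) stop("circular or undeclared dependency")
    done <- c(done, ready)
    remaining <- setdiff(remaining, ready)
  }
  done
}

run_order <- execution_order(deps)
print(run_order)
```

Steps that become ready in the same pass, such as `step2a` and `step2b`, do not depend on each other and could in principle run in parallel.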