An instruction-tuned large language model (LLM) developed by Databricks, known for its commercial-use license and open model weights.
Dolly (Dolly 2.0 and later) is an instruction-following large language model developed by Databricks, built on the EleutherAI Pythia model family. It is distinguished by its **truly open-source nature**: both the model weights and the human-generated instruction dataset (databricks-dolly-15k, roughly 15,000 prompt/response pairs written by Databricks employees) are licensed for commercial use. This allows organizations to own, customize, and deploy the model entirely within their private infrastructure, which avoids reliance on a vendor API and reduces the risk of sensitive data leaving the organization.
Dolly is not offered as a hosted public API; organizations deploy the model on their own infrastructure. Typical use cases include:
Databricks Dolly can be prompted, and optionally fine-tuned, to draft code for data engineering tasks. Instead of hand-writing boilerplate PySpark for complex joins and aggregations, users describe the desired ETL logic in natural language and review the code the model returns. This can shorten pipeline development and let engineering teams focus on architecture, although Dolly is a general instruction-following model rather than a dedicated code model, so generated code should be reviewed and tested before it reaches production.
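As a minimal sketch of this workflow, the snippet below turns a structured description of a join and aggregation into a natural-language instruction and passes it to a locally hosted Dolly model through the Hugging Face `transformers` pipeline. The helper names `describe_etl` and `generate_pyspark`, the table and column names, and the choice of the `databricks/dolly-v2-3b` checkpoint are assumptions made for illustration; the generation call downloads the model weights and is best run on a GPU.

```python
from typing import Dict, List


def describe_etl(left: str, right: str, join_key: str,
                 aggregations: Dict[str, str], group_by: List[str]) -> str:
    """Build a plain-English ETL instruction (hypothetical helper)."""
    agg_text = ", ".join(f"{func} of {col}" for col, func in aggregations.items())
    group_text = ", ".join(group_by)
    return (
        f"Write PySpark code that joins the DataFrame `{left}` with `{right}` "
        f"on `{join_key}`, groups by {group_text}, and computes {agg_text}. "
        "Return only the code."
    )


def generate_pyspark(instruction: str,
                     model_name: str = "databricks/dolly-v2-3b") -> str:
    """Ask a locally hosted Dolly model for a code draft.

    Imports are deferred so the prompt helper above works without the
    heavy dependencies installed. The checkpoint name is an assumption;
    any Dolly 2.0 checkpoint should behave the same way.
    """
    import torch
    from transformers import pipeline

    # trust_remote_code loads the instruction pipeline shipped with the
    # model repository, which applies Dolly's prompt format itself.
    generator = pipeline(
        model=model_name,
        torch_dtype=torch.bfloat16,
        trust_remote_code=True,
        device_map="auto",
    )
    result = generator(instruction)
    return result[0]["generated_text"]


if __name__ == "__main__":
    # Made-up table and column names, for illustration only.
    instruction = describe_etl(
        "orders", "customers", "customer_id",
        aggregations={"amount": "sum"}, group_by=["region"],
    )
    print(instruction)
    # Uncomment to run generation (downloads the model weights):
    # print(generate_pyspark(instruction))
```

Treat the returned text as a draft: run it against a small sample DataFrame and review the join semantics before adopting it in a pipeline.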
Dolly can also be applied to text processing for healthcare data. Because it runs on private infrastructure, it can be fine-tuned on private clinical datasets to recognize key findings and treatment plans within unstructured clinician notes. By converting messy text into structured data points such as "primary diagnosis" and "treatment prescribed," it can speed up population health reporting and support automated billing code assignment, which in turn helps hospital networks allocate resources. Extracted fields should be validated before they feed reporting or billing systems.
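The extraction step can be sketched as follows, assuming the model is instructed to answer with a JSON object. The field names, the helper functions `build_extraction_instruction` and `parse_extraction`, and the sample response are hypothetical; the parser is deliberately defensive because a model may wrap its JSON in extra text or return nothing usable.

```python
import json
from typing import Dict, Optional

# Hypothetical target fields, mirroring the examples in the text above.
FIELDS = ("primary_diagnosis", "treatment_prescribed")


def build_extraction_instruction(note: str) -> str:
    """Build an instruction asking for the target fields as JSON."""
    keys = ", ".join(FIELDS)
    return (
        "Extract the following fields from the clinician note and answer "
        f"with a JSON object containing exactly these keys: {keys}. "
        "Use null for anything the note does not state.\n\n"
        f"Note: {note}"
    )


def parse_extraction(raw: str) -> Dict[str, Optional[str]]:
    """Pull the expected fields out of a model response.

    Returns None for every field that is missing, empty, or not a string,
    and for all fields when no valid JSON object can be found.
    """
    empty: Dict[str, Optional[str]] = {field: None for field in FIELDS}
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return empty
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return empty
    if not isinstance(data, dict):
        return empty

    result = dict(empty)
    for field in FIELDS:
        value = data.get(field)
        if isinstance(value, str) and value.strip():
            result[field] = value.strip()
    return result


if __name__ == "__main__":
    # Made-up model response, for illustration only.
    sample = (
        'Here is the result: {"primary_diagnosis": "Type 2 diabetes", '
        '"treatment_prescribed": "metformin 500 mg"}'
    )
    print(parse_extraction(sample))
```

Returning `None` for missing fields rather than raising keeps a batch job running over thousands of notes, while still making incomplete extractions easy to route to manual review.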