The dataset is provided as supplementary material for the paper entitled "Identifying Refactoring Opportunities for Large Packages by Analyzing Maintainability Characteristics in Java OSS". It contains the package metrics (in CSV format) of the Java projects compiled in the Qualitas Corpus (a collection of open-source Java projects). Each column except the first represents a package metric. The description of each column (and its format) in the dataset is as follows:

Name: Package name (string).
NumCls: Number of classes in the package (integer).
NumCls_tc: Number of classes in the package and its sub-packages (integer).
NumOpsCls: Number of operations (methods) in the classes of the package (integer).
R: Number of relationships between classes/interfaces that are internal to the package (integer).
H: Average number of internal relationships per class/interface in the package (decimal).
Ca: Number of external classes/interfaces that depend on the package (integer).
Ce: Number of external classes/interfaces that the package depends on (integer).
ConnComp: Number of connected components formed by the classes/interfaces of the package (integer).

A Python script written to perform the empirical analysis is also provided in the supplementary material.
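To illustrate the column layout described above, the following is a minimal sketch of reading such a CSV file into typed records using only the Python standard library. It assumes that the file has a header row whose names match the column names listed above exactly and that integer columns are stored without a decimal part; neither is confirmed here. The function name load_package_metrics and the two sample rows are hypothetical and are not taken from the dataset or from the provided script.

```python
import csv
import io

# Column names and types as listed in the dataset description.
INT_COLUMNS = ["NumCls", "NumCls_tc", "NumOpsCls", "R", "Ca", "Ce", "ConnComp"]
FLOAT_COLUMNS = ["H"]


def load_package_metrics(handle):
    """Read package metrics from an open CSV file into a list of typed dicts.

    Hypothetical helper: assumes a header row with the column names above.
    """
    rows = []
    for raw in csv.DictReader(handle):
        row = {"Name": raw["Name"]}
        for col in INT_COLUMNS:
            row[col] = int(raw[col])
        for col in FLOAT_COLUMNS:
            row[col] = float(raw[col])
        rows.append(row)
    return rows


# Synthetic rows for illustration only; these values are NOT from the dataset.
SAMPLE_CSV = """Name,NumCls,NumCls_tc,NumOpsCls,R,H,Ca,Ce,ConnComp
org.example.core,12,30,140,18,1.5,25,9,2
org.example.util,4,4,22,2,0.5,40,3,3
"""

packages = load_package_metrics(io.StringIO(SAMPLE_CSV))
for pkg in packages:
    print(pkg["Name"], pkg["NumCls"], pkg["H"])
```

With a real file, the io.StringIO object would be replaced by a file opened with open(path, newline="").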
The mapping of Python functions to the research questions (RQs) and sections of the research paper is as follows:

Percentile methods (classification of packages) --> RQ1 (section 3.2.4)
Distribution plot (distribution of package sizes) --> RQ1 (section 3.2.4)
largePackagesWithCopuling method (coupling issues in large packages) --> RQ1 (section 3.2.4)
smallPackagesWithCoupling method (coupling issues in small packages) --> RQ1 (section 3.2.4)
moderatePackagesWithCoupling method (coupling issues in moderate packages) --> RQ1 (section 3.2.4)
packagesWithCohesion (cohesion issues in different package sizes) --> RQ1 (section 3.2.4)
packagesWithComplexity (complexity issues in different package sizes) --> RQ1 (section 3.2.4)
kruskal method (Kruskal-Wallis test of significance) --> RQ1 (section 3.2.4)
Risk ratio methods (risk ratio of coupling in different package sizes) --> RQ1 (section 3.2.4)
variance_inflation_factor method (multi-collinearity between independent variables) --> RQ2 (section 3.2.5)
Regression method (regression analysis) --> RQ2 (section 3.2.5)
spearmanr (Spearman's correlation) --> RQ2 (section 3.2.5)
Cohenf2 method (effect size) --> RQ2 (section 3.2.5)
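As a rough illustration of two of the RQ1 steps named above (percentile-based classification of packages and a risk ratio between size groups), the following sketch uses only the Python standard library. It is not the provided script. The 25th/75th percentile cut-offs, the function names size_thresholds, classify and risk_ratio, and all numbers are assumptions made for this example; the thresholds and the definition of a "coupling issue" actually used in the paper are not stated in this description.

```python
from statistics import quantiles


def size_thresholds(sizes, lower_pct=25, upper_pct=75):
    """Return (low, high) percentile cut-offs for package sizes.

    The 25/75 defaults are an assumption, not the paper's values.
    """
    cuts = quantiles(sizes, n=100, method="inclusive")  # 99 cut points
    return cuts[lower_pct - 1], cuts[upper_pct - 1]


def classify(size, low, high):
    """Label a package as small, moderate or large by its class count."""
    if size < low:
        return "small"
    if size > high:
        return "large"
    return "moderate"


def risk_ratio(group_with_issue, group_total, ref_with_issue, ref_total):
    """Risk of an issue in one group divided by the risk in a reference group."""
    return (group_with_issue / group_total) / (ref_with_issue / ref_total)


# Synthetic NumCls values for illustration only; NOT from the dataset.
num_cls = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
low, high = size_thresholds(num_cls)
labels = [classify(n, low, high) for n in num_cls]
print(low, high, labels)

# Synthetic counts: 50 of 100 large packages vs. 25 of 100 small packages
# flagged with a coupling issue (the flagging rule itself is hypothetical).
print(risk_ratio(50, 100, 25, 100))
```

A risk ratio above 1 means the issue is more frequent in the first group than in the reference group; which group serves as the reference is a design choice that should follow the paper.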