very slow on initializing gpmodel for non-consecutive group_data #4
Just to double check, by non-consecutive you mean e.g. [1, 1, 5, 60, 60] instead of [1, 1, 2, 3, 3]? Can you provide a minimal working example?
like [1,2,3,1,2,3]
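A small helper makes the distinction concrete: group labels are "consecutive" when equal labels form contiguous blocks, regardless of the actual label values. This is a hedged illustration only; the function name is made up and is not part of GPBoost.

```python
def is_consecutive(groups):
    """Return True if equal group labels form contiguous blocks,
    i.e. no label reappears after a different label has started."""
    seen = set()
    prev = object()  # sentinel that never equals a real label
    for g in groups:
        if g != prev:
            if g in seen:  # label reappears after a gap -> non-consecutive
                return False
            seen.add(g)
            prev = g
    return True

print(is_consecutive([1, 1, 5, 60, 60]))   # True: contiguous blocks
print(is_consecutive([1, 2, 3, 1, 2, 3]))  # False: labels reappear
```

So [1, 1, 5, 60, 60] counts as consecutive even though the labels are not 1, 2, 3, while [1, 2, 3, 1, 2, 3] does not.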
Which setting do you use? Number of samples and number of different groups?
What do you mean by setting?
[1,2,3,1,2,3] means a total of 6 samples and 3 different groups. I guess this gives no problem :-). When do you experience performance issues?
That happens every time I initialize gpboost. My dataset has about 10 million samples and 2,000+ groups, and the samples in every group are not consecutive, just like the example above.
I see. 10'000'000 samples and 2'000 groups. I will investigate this. |
Btw, I cannot save the gp_model either. I'm using the Python wrapper of gpboost, but gpb.save_model() only saves the GBDT part, not the Gaussian model.
Yes, that is correct. Saving of the GPModel is not implemented. But this is another topic. Please open another issue if you would like this feature.
I have fixed this now. Initialization of a GPModel with group_data that is not ordered now takes approximately the same time as in the ordered case (see #3 (comment)). @aprilffff : Many thanks for raising this issue!
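The general technique for handling arbitrary, unordered group labels efficiently is a single O(n) factorization pass that maps each label to a dense integer code 0..k-1. The sketch below only illustrates that idea; it is not the actual GPBoost implementation, and the function name is hypothetical.

```python
def factorize(groups):
    """Map arbitrary (possibly unordered, non-consecutive) group labels
    to dense integer codes 0..k-1 in one O(n) pass over the data."""
    codes, label_to_code = [], {}
    for g in groups:
        if g not in label_to_code:
            # assign the next free code the first time a label is seen
            label_to_code[g] = len(label_to_code)
        codes.append(label_to_code[g])
    return codes, label_to_code

codes, mapping = factorize([1, 2, 3, 1, 2, 3])
print(codes)    # [0, 1, 2, 0, 1, 2]
print(mapping)  # {1: 0, 2: 1, 3: 2}
```

With such a dense re-indexing, the per-group data structures can be built by direct array indexing instead of repeated searches over the raw labels, which is presumably why initialization time no longer depends on the ordering of group_data.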
As in #3, initialization with consecutive group_data is fast, but non-consecutive data does not seem to be handled properly.
I have tested this on a powerful machine; the timings are 0.4 s for consecutive group data versus 1800+ s for non-consecutive.
Please help with this: the LightGBM model requires the data in its query order, which may be very different from the group_data order expected by the Gaussian model.
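If you need to work around the ordering mismatch yourself, one option is to reorder the rows into contiguous group blocks with a stable sort and keep the inverse permutation to map results back to LightGBM's query order. A minimal NumPy sketch, with all variable names hypothetical:

```python
import numpy as np

group_data = np.array([1, 2, 3, 1, 2, 3])
X = np.arange(len(group_data))  # stand-in for the feature rows

# Stable sort so ties keep their original relative order.
perm = np.argsort(group_data, kind="stable")
inv = np.empty_like(perm)
inv[perm] = np.arange(len(perm))  # inverse permutation

X_blocked = X[perm]        # rows now in contiguous group blocks
restored = X_blocked[inv]  # back to the original (query) order
print(group_data[perm])    # [1 1 2 2 3 3]
print(np.array_equal(restored, X))  # True
```

This only rearranges rows; whether it is needed at all depends on the GPBoost version, since the fix above makes unordered group_data fast without any manual reordering.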