If you want to succeed at this, build trust and crowdsource the data.
Build trust by truly making this a public good: open source it, be the maintainer, and publish a data dump every week as a zipball/tarball. Together these ensure you can't rugpull.
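To make the weekly dump concrete, here is a rough sketch of what the job could look like in Python. Everything in it is an assumption for illustration: the function name `build_weekly_dump`, the `dump-<year>-W<week>.tar.gz` naming, and the idea of publishing a SHA-256 checksum next to the archive so anyone can verify their copy.

```python
import hashlib
import tarfile
from datetime import date
from pathlib import Path


def build_weekly_dump(data_dir: Path, out_dir: Path, today: date) -> Path:
    """Pack data_dir into a gzipped tarball named after the ISO week.

    Hypothetical helper: the naming scheme and the checksum file are
    illustrative choices, not something the original post specifies.
    """
    iso_year, iso_week, _ = today.isocalendar()
    out_dir.mkdir(parents=True, exist_ok=True)
    archive = out_dir / f"dump-{iso_year}-W{iso_week:02d}.tar.gz"

    # Store everything under a top-level "data/" folder inside the archive.
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(data_dir, arcname="data")

    # Write a checksum file in the "<digest>  <filename>" format that
    # `sha256sum -c` understands, so mirrors can be verified.
    digest = hashlib.sha256(archive.read_bytes()).hexdigest()
    checksum_file = out_dir / (archive.name + ".sha256")
    checksum_file.write_text(f"{digest}  {archive.name}\n")
    return archive
```

You would call something like `build_weekly_dump(Path("records"), Path("public"), date.today())` from a weekly cron job and upload the resulting files somewhere others can mirror them. The mirrors are the point: once copies exist outside your control, the data can't be taken back.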
With this trust, offer everyone an extension (open source, of course) which, whenever a user browses Crunchbase, Tracxn, etc., sends any factual (hence non-copyrightable) data to you. If you had earned that trust, I would do this too.
You earn the right to be the maintainer, and can then figure out whether you also want to build a business on top of it.